ISA & ICA: Interactive Alignment of Bitexts


ISA: Interactive Sentence Alignment

ISA is a PHP-based web interface for interactive sentence alignment of parallel XML documents. Its backend is the length-based Gale & Church approach to sentence alignment, but it can also be used for manual alignment. The basic idea is to use the interface for
  • adding hard boundaries to improve the quality and performance of the automatic alignment
  • correcting existing alignments by removing existing or adding new segment boundaries
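The length-based approach can be pictured as a dynamic program over sentence lengths, and hard boundaries as cut points that split the documents into independently aligned segments. The following is a minimal Python sketch, not ISA's PHP code. It assumes lengths are measured in characters, uses the bead priors and per-character variance (6.8) commonly quoted from Gale & Church (1993), and represents a hard boundary as a hypothetical `(source_position, target_position)` pair; the function names are illustrative only.

```python
import math

# Bead priors and length variance as commonly quoted from Gale & Church (1993);
# the values ISA actually uses are an assumption here.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
MEAN_RATIO = 1.0  # expected target characters per source character
VARIANCE = 6.8    # variance of that ratio per character


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bead_cost(len_src, len_tgt, bead):
    """-log P(bead) - log P(delta | bead) for a bead with these character lengths."""
    mean = (len_src + len_tgt / MEAN_RATIO) / 2.0
    delta = 0.0 if mean == 0 else (len_src * MEAN_RATIO - len_tgt) / math.sqrt(mean * VARIANCE)
    # Two-tailed probability of a length deviation at least this large.
    p_delta = max(2.0 * (1.0 - norm_cdf(abs(delta))), 1e-300)
    return -math.log(PRIORS[bead]) - math.log(p_delta)


def align_segment(src, tgt):
    """Align two lists of sentence lengths; returns beads (src_indices, tgt_indices)."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order: every bead moves forward, so cost[i][j] is final when visited.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(sum(src[i:ni]), sum(tgt[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    # Trace the cheapest path back from the end of both segments.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]


def align_with_hard_boundaries(src, tgt, boundaries):
    """boundaries: sorted (i, j) cut points (hypothetical representation);
    source sentences before i may only align with target sentences before j."""
    beads, prev_i, prev_j = [], 0, 0
    for bi, bj in list(boundaries) + [(len(src), len(tgt))]:
        for s, t in align_segment(src[prev_i:bi], tgt[prev_j:bj]):
            # Shift segment-local indices back to document positions.
            beads.append(([prev_i + k for k in s], [prev_j + k for k in t]))
        prev_i, prev_j = bi, bj
    return beads
```

A boundary such as `(10, 12)` would force the first ten source sentences to align only with the first twelve target sentences. Because each segment is aligned separately, the work of the dynamic program drops from roughly the product of the two document lengths to the sum of the per-segment products, which is one way hard boundaries can help both quality and speed.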
The interface lets you work either on small portions of the document or on the entire document. Alignment results can be saved or sent by e-mail (unless these options are disabled) in various formats: XCES align with pointers to external sentence IDs, plain text, or simple TMX.
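Two of these export formats can be sketched with Python's standard `xml.etree.ElementTree` module. This is an illustration, not ISA's exporter: it assumes an alignment is a list of beads, each a pair of source and target sentence index lists, and that the XML documents carry sentence IDs such as `s1`. The document names and TMX header values are placeholders, and the exact attributes ISA writes may differ. The plain text format is not shown because its layout is not specified here.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespace with the reserved "xml" prefix (xml:lang).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_xces(beads, src_ids, tgt_ids, from_doc, to_doc):
    """Write beads as an XCES sentence-alignment document pointing at sentence IDs."""
    root = ET.Element("cesAlign", version="1.0")
    grp = ET.SubElement(root, "linkGrp", targType="s", fromDoc=from_doc, toDoc=to_doc)
    for s, t in beads:
        # xtargets: space-separated source IDs, ';', space-separated target IDs
        xtargets = " ".join(src_ids[k] for k in s) + ";" + " ".join(tgt_ids[k] for k in t)
        ET.SubElement(grp, "link", xtargets=xtargets)
    return ET.tostring(root, encoding="unicode")


def to_tmx(beads, src_sents, tgt_sents, src_lang, tgt_lang):
    """Write beads that have text on both sides as a minimal TMX 1.4 document."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "isa-sketch",  # placeholder header values (hypothetical)
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for s, t in beads:
        if not s or not t:
            continue  # a translation unit needs text on both sides
        tu = ET.SubElement(body, "tu")
        for lang, sents, idx in ((src_lang, src_sents, s), (tgt_lang, tgt_sents, t)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = " ".join(sents[k] for k in idx)
    return ET.tostring(tmx, encoding="unicode")
```

The XCES output stores only sentence IDs, so it stays valid as long as the referenced XML documents keep their IDs; the TMX output copies the sentence text itself and is therefore self-contained.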

ICA: Interactive Clue Alignment

ICA is a PHP-based web interface for interactive word alignment. Its backend is the Clue Aligner, but it can be used for manual alignment as well. You can
  • select clues and clue weights
  • inspect alignment strategies and matching clues
  • correct the alignment by adding and removing links
  • display the contents of clue score databases
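How selected clues and weights might turn into links can be sketched as follows. This minimal Python sketch is not ICA's code. It assumes each clue type yields a matrix of scores in [0, 1] for every source/target token pair, combines the clues as a union of independent events, P(a or b) = P(a) + P(b) - P(a)P(b), and treats a clue weight as a hypothetical multiplier applied before combination. Linking uses competitive linking, one simple search strategy; the strategies ICA actually offers may differ. All names are illustrative.

```python
def combine_clues(clue_matrices, weights):
    """clue_matrices: clue name -> matrix [src][tgt] of scores in [0, 1];
    weights: clue name -> weight (hypothetical multiplier, default 1.0)."""
    if not clue_matrices:
        raise ValueError("at least one clue matrix is required")
    first = next(iter(clue_matrices.values()))
    rows, cols = len(first), len(first[0]) if first else 0
    combined = [[0.0] * cols for _ in range(rows)]
    for name, matrix in clue_matrices.items():
        w = weights.get(name, 1.0)
        for i in range(rows):
            for j in range(cols):
                p = min(1.0, w * matrix[i][j])  # weighted clue, clipped to [0, 1]
                # Probabilistic union of independent evidence.
                combined[i][j] = combined[i][j] + p - combined[i][j] * p
    return combined


def competitive_linking(matrix, threshold=0.0):
    """Greedily link the best-scoring free pairs (one-to-one links only)."""
    if not matrix:
        return []
    cells = sorted(((matrix[i][j], i, j)
                    for i in range(len(matrix)) for j in range(len(matrix[0]))),
                   reverse=True)
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in cells:
        if score <= threshold:
            break  # remaining cells are no better
        if i in used_src or j in used_tgt:
            continue
        links.append((i, j))
        used_src.add(i)
        used_tgt.add(j)
    return sorted(links)
```

Because p(1 - c) is never negative, adding a clue can only raise a pair's combined score; the weight controls by how much. Competitive linking produces one-to-one links only, so multi-word links would still have to be added by hand or by a different strategy.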
ICA works on one sentence pair at a time, taken from a pre-defined parallel corpus (its location is currently hard-coded in the script). Because PHP is a server-side scripting language, the corpus has to be located on the server that runs the script. An upload function could easily be integrated, but it would then require some form of authentication for protection. The script also needs access to the appropriate clues, stored in local (server-side) database files, one for each clue type. These files can be produced off-line by the Clue Aligner.
