The Interactive Clue Aligner (ICA) - A Short User Guide

Introduction

ICA is a PHP-based web interface for interactive word alignment. It uses the Clue Aligner as its backend but can be used for manual alignment as well. ICA works on one sentence pair at a time, taken from a pre-defined parallel corpus (its location is hard-coded in the script for the time being). PHP is a server-side scripting language and, therefore, the corpus has to be located on the server running the script. An upload function could easily be integrated, but it would then require some form of authentication for protection. The script also needs access to appropriate clues stored in local (server-side) database files (one for each clue type). These files can be produced off-line by the Clue Aligner.

ICA calls the Clue Aligner as an external tool, runs it, and parses the log output to display the alignment results. The results can then be modified within the interface, and the final alignment can be saved to disk (on the server, and only if this feature is not disabled).

Getting started

Initially, ICA shows the main CGI form for selecting clues and starting the alignment. The location of the corpus is hard-coded and cannot be modified. First, select the clue types to be used for the alignment using the check-boxes in front of each clue type name. Select, for example, 'dice', 'sim' and 'gw'. Then press the 'align' button to run the word alignment with the current setup. If everything works, the next screen shows the selected sentence pair and the clue matrix. The matrix shows the combined clue scores (multiplied by a certain factor to make them easier to read). The darkness of a cell's background color indicates the strength of its clue value compared to the others. Cells colored in red correspond to the word-to-word links that have been used for the actual word alignment under the currently selected word alignment strategy.
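The display logic described above can be sketched in a few lines of Python. This is only an illustration, not ICA's actual PHP code: the function name `shade_matrix`, the scaling factor of 100, and the linear mapping from score to gray level are all assumptions made for this example.

```python
def shade_matrix(scores, factor=100):
    """Return (display_value, gray_level) for every cell of a score matrix.

    gray_level runs from 255 (white, weakest clue) to 0 (black, strongest).
    The factor and the linear mapping are assumptions for this sketch.
    """
    peak = max((s for row in scores for s in row), default=0.0)
    shaded = []
    for row in scores:
        shaded_row = []
        for s in row:
            # Strength is measured relative to the strongest cell in the matrix.
            strength = s / peak if peak > 0 else 0.0
            gray = round(255 * (1.0 - strength))
            shaded_row.append((round(s * factor), gray))
        shaded.append(shaded_row)
    return shaded


# Hypothetical 2x2 clue matrix (source words x target words).
matrix = [[0.02, 0.40], [0.80, 0.00]]
shaded = shade_matrix(matrix)
```

The strongest cell is drawn darkest and a cell with no clue evidence stays white, so the shading is always relative to the current sentence pair.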

If you move the mouse over the clue matrix you will see that corresponding source and target language words are highlighted in the frame of the matrix and in the sentence pair above. If you move your mouse over aligned word pairs (red cells) the corresponding word alignment in the table to the left (or below) is highlighted as well.

You can also see the scores of the individual clues contributing to the final score. Move the mouse over the scores in the matrix and you will see a small tool-tip window with a list of clues and scores applied to this cell.

Selecting clues and clue weights

You can select any combination of clues for alignment by checking the check-boxes in the alignment form at the top. Change the weight of a clue type with the selection box next to the clue type name. These values will be used throughout your session until you change them.
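To illustrate how selected clues and their weights might turn into a single score per matrix cell, here is a minimal Python sketch. This guide does not specify the combination formula; the sketch assumes a probabilistic disjunction of weighted scores, and the function name `combine_clues` and the example values are hypothetical. See the background literature for the method actually used by the Clue Aligner.

```python
def combine_clues(clue_scores, weights):
    """Combine per-clue scores for one word pair into a single score.

    Assumption for this sketch: weighted scores are combined as a
    probabilistic disjunction, 1 - prod(1 - w_k * score_k).
    Clue types without an entry in `weights` count as not selected.
    """
    remaining = 1.0
    for name, score in clue_scores.items():
        if name not in weights:
            continue  # clue type not selected in the form
        weighted = min(1.0, max(0.0, weights[name] * score))
        remaining *= 1.0 - weighted
    return 1.0 - remaining


# Hypothetical scores for one cell; 'gw' is present but not selected.
cell = {"dice": 0.5, "sim": 0.5, "gw": 0.9}
combined = combine_clues(cell, {"dice": 1.0, "sim": 1.0})
```

With this formula, lowering the weight of a clue type reduces its influence smoothly, and a weight of zero is equivalent to leaving the clue unchecked.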

Selecting the sentence pair to be aligned

You can select the sentence pair by its ID from the selection box to the left, in the same row as the 'align' button. You can also go to the next pair by pressing the 'align next' link, which calls the aligner with the current settings immediately. The same applies to the 'align previous' link, which takes you back to the previous sentence pair and aligns it. If 'save' is not disabled, there is another short-cut link that saves the current alignment, goes to the next sentence pair, and aligns it.

Word alignment

The Clue Aligner is called when you press the 'align' button (or use one of the short-cut links 'align next', 'align previous' or 'save & align next'). The aligner is called with the current settings, using the selected clues and weights. You can choose the alignment strategy with the selection box to the left of the 'align' button. For more information, read the background literature. The selection box immediately to the left of the 'align' button can be used to set an alignment threshold. Word pairs with a score lower than the chosen threshold will not be aligned. The alignment is only displayed, not stored anywhere. You may save the current alignment into a local file (on the server) if the 'save' function is not disabled. Note that there is only one file for each sentence pair from the corpus. Each time you save an alignment, the old one will be overwritten.
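The effect of the threshold can be sketched in Python as follows. The strategy used here, a greedy one-to-one "best first" selection, is only a stand-in chosen for this example; the strategies actually offered by ICA are described in the background literature. The function name `align_best_first` and the example matrix are hypothetical.

```python
def align_best_first(matrix, threshold):
    """Pick word-to-word links from a clue matrix, strongest score first.

    Cells scoring below `threshold` are never linked. The greedy
    one-to-one selection is a stand-in strategy for this sketch.
    """
    candidates = sorted(
        ((score, i, j)
         for i, row in enumerate(matrix)
         for j, score in enumerate(row)
         if score >= threshold),
        reverse=True,
    )
    used_src, used_tgt, links = set(), set(), []
    for score, i, j in candidates:
        # Skip cells whose source or target word is already linked.
        if i not in used_src and j not in used_tgt:
            links.append((i, j))
            used_src.add(i)
            used_tgt.add(j)
    return sorted(links)


# Hypothetical matrix: rows are source words, columns are target words.
scores = [[0.9, 0.2], [0.3, 0.1]]
strict = align_best_first(scores, threshold=0.25)
loose = align_best_first(scores, threshold=0.05)
```

With the stricter threshold the second source word stays unaligned, because its only remaining candidate scores below the cut-off.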

Note that running the clue aligner is limited to 5 seconds. The call to the external program will be killed if this limit is exceeded!
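A time limit of this kind can be sketched in Python with the standard `subprocess` module. This is not how ICA itself enforces the limit (ICA is written in PHP); the helper name `run_with_limit` is hypothetical, and the commands below are harmless stand-ins rather than real Clue Aligner invocations.

```python
import subprocess
import sys


def run_with_limit(command, limit_seconds=5):
    """Run an external command and return its standard output as text.

    Returns None if the command exceeds the time limit; subprocess.run
    kills the child process before raising TimeoutExpired.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=limit_seconds
        )
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


# Stand-in commands: one finishes at once, one exceeds a 1-second limit.
quick = run_with_limit([sys.executable, "-c", "print('done')"])
slow = run_with_limit(
    [sys.executable, "-c", "import time; time.sleep(3)"], limit_seconds=1
)
```

A caller has to treat the `None` case explicitly, which mirrors the situation in ICA where a killed aligner run produces no alignment to display.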

Manual alignment

ICA is not only a visualization tool - it can also be used to alter the alignment. You can click on any cell in the clue matrix to add a word-to-word link (if the words are not linked already) or to remove one (if they have been aligned already). Word-to-word links that overlap with others will be merged (this is how the actual word alignment shown in the table to the left is built). Try to be patient and wait for the screen to refresh before clicking again!
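The click behaviour and the merging of overlapping links can be sketched in Python as follows. This is an illustration under stated assumptions, not ICA's code: links are modelled as (source index, target index) pairs, two links are taken to overlap when they share a source or a target word, and the names `toggle_link` and `merge_links` are hypothetical.

```python
def toggle_link(links, cell):
    """Add the link for a clicked cell, or remove it if it already exists."""
    return links ^ {cell}


def merge_links(links):
    """Merge word-to-word links that share a source or target word.

    Returns a sorted list of (source_indices, target_indices) clusters.
    """
    clusters = []
    for s, t in sorted(links):
        src, tgt = {s}, {t}
        untouched = []
        for c_src, c_tgt in clusters:
            if s in c_src or t in c_tgt:
                # The new link overlaps this cluster: absorb it.
                src |= c_src
                tgt |= c_tgt
            else:
                untouched.append((c_src, c_tgt))
        clusters = untouched + [(src, tgt)]
    return sorted((sorted(cs), sorted(ct)) for cs, ct in clusters)


# Hypothetical session: three links, then a click that bridges two of them.
links = {(0, 0), (1, 1), (2, 3)}
links = toggle_link(links, (1, 0))
merged = merge_links(links)
```

Clicking the same cell a second time removes the link again, and the clusters are simply recomputed from the remaining links.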

You may save the alignment into a local file (on the server) if the 'save' function is not disabled. There is only one file per sentence pair, and saving an alignment will overwrite the old one!

Inspecting clue databases

You can have a look at the contents of the clue databases. Simply click on the link given by the name of the clue type you would like to inspect. This will give you a list of 25 clues from the selected database. You can walk through the database using the links surrounding the clue type name ('<<', '<', '>' and '>>'). You can sort the clues by source language item (click on source), by target language item (click on target), or by score (click on score). You may also search for certain items in the database using the input fields in the source and the target column. Note that these functions use external tools that are limited to an execution time of 5 seconds. Displaying (and especially sorting) large databases will not work correctly! Sorting by score also fails if some of the scores are in exponential notation.
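The paging and the exponential-notation problem can be illustrated with a short Python sketch. The guide does not say why ICA's sort fails on such scores; the example merely shows one plausible cause, namely comparing scores as text instead of as numbers. The entry layout (source, target, score string) and the function names are assumptions made for this example.

```python
PAGE_SIZE = 25  # number of clues listed per page, as in the guide


def sort_by_score(entries):
    """Sort (source, target, score) entries by numeric score, highest first."""
    return sorted(entries, key=lambda e: float(e[2]), reverse=True)


def page(entries, number, size=PAGE_SIZE):
    """Return page `number` (counting from 0) of the entry list."""
    start = number * size
    return entries[start:start + size]


# Hypothetical entries; the last score is written in exponential notation.
entries = [
    ("house", "hus", "0.001"),
    ("the", "huset", "2.5e-05"),
    ("house", "huset", "0.4"),
]
numeric = [e[2] for e in sort_by_score(entries)]
# Comparing the same scores as plain text gives a different, wrong order.
textual = sorted((e[2] for e in entries), reverse=True)
```

Converting each score with `float` before comparing puts "2.5e-05" last, whereas the text comparison ranks it first because the character "2" sorts after "0".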

Plans for the future

Background literature

Jörg Tiedemann, 2003.
Recycling Translations - Extraction of Lexical Data from Parallel Corpora and their Application in Natural Language Processing.
Doctoral Thesis, Studia Linguistica Upsaliensia 1, ISSN 1652-1366, ISBN 91-554-5815-7.

Jörg Tiedemann, 2004.
Word to word alignment strategies.
In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004). Geneva, Switzerland, August 23-27.


- tiedeman@let.rug.nl