Generation of word forms

The paradigm files

For Saami languages we generate word forms for nouns, adjectives, verbs and numerals. For each of these word classes, we need a file with the tags for the forms we want to generate, e.g. N_paradigms.txt:

N+Sg+Nom
N+Sg+Gen
N+NomAg+Sg+Nom
N+NomAg+Sg+Gen
N+Pl+Nom
N+Pl+Gen
N+NomAg+Pl+Nom
N+NomAg+Pl+Gen
..
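
A minimal sketch of how such a paradigm file could be turned into input for a generator. The function names and the `lemma+tags` input format are assumptions made for illustration and are not taken from the actual generation scripts; the lines are passed in as a list so that the example runs without a file on disk.

```python
def read_paradigm(lines):
    """Return the tag strings from paradigm-file lines, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def generation_inputs(lemma, tag_strings):
    """Combine a lemma with each tag string (assumed format: lemma+tags)."""
    return [f"{lemma}+{tags}" for tags in tag_strings]


# Hypothetical content, mirroring the first lines of N_paradigms.txt above.
paradigm = read_paradigm(["N+Sg+Nom\n", "N+Sg+Gen\n", "\n", "N+Pl+Nom\n"])
inputs = generation_inputs("lemma", paradigm)
print(inputs)
```

Each resulting string would then be sent to the generator FST, one analysis string per word form.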

How to restrict the generation

We can restrict the generation for individual lemmas by adding a gen_only attribute to the l element in the lexicon. Here are some examples.

No word forms are generated; the lemma will only be used for Leksa:

<l gen_only="none"  ....>lemma</l> 

Only word forms containing these tags will be generated:

<l gen_only="N+NomAg"  ....>lemma</l>  
(NomAg is a tag for identifying a lemma when there is homonymy)

<l gen_only="N+Pl,N+Ess" ....>lemma</l>  
(Only plural forms and the essive are generated)
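
The restriction described above could be sketched as a filter over the paradigm tag strings. This is an illustration under assumptions, not the actual implementation: it assumes that commas in gen_only separate alternatives, that every tag in an alternative must occur in the tag string, and that a missing attribute means the full paradigm is generated. The function names are hypothetical.

```python
def is_generated(tag_string, gen_only=None):
    """Decide whether a paradigm tag string survives a gen_only restriction."""
    if gen_only is None:        # assumed: no attribute means no restriction
        return True
    if gen_only == "none":      # the lemma is kept, but no forms are generated
        return False
    tags = set(tag_string.split("+"))
    # Assumption: commas separate alternatives; each alternative is a set of
    # tags that must all be present in the tag string.
    alternatives = [set(alt.strip().split("+")) for alt in gen_only.split(",")]
    return any(alt <= tags for alt in alternatives)


def restrict(paradigm, gen_only=None):
    """Keep only the tag strings allowed by gen_only."""
    return [tags for tags in paradigm if is_generated(tags, gen_only)]


paradigm = ["N+Sg+Nom", "N+NomAg+Sg+Nom", "N+Pl+Gen", "N+Ess"]

nomag_only = restrict(paradigm, "N+NomAg")    # ['N+NomAg+Sg+Nom']
pl_or_ess = restrict(paradigm, "N+Pl,N+Ess")  # ['N+Pl+Gen', 'N+Ess']
nothing = restrict(paradigm, "none")          # []
```

Under this reading, a tag string such as N+NomAg+Pl+Nom would also match gen_only="N+Pl", since it contains both tags; whether the real system behaves that way is not stated here.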

Linguistic variation

We generate the word forms for the tasks and the key answers with a restricted FST. The word forms that the program accepts as answers from the students are generated with a norm FST, which contains all the forms belonging to the written standard. Dialectal variation can be handled by making two or more restricted FSTs. The students' answers will still be accepted as long as they belong to the written standard.
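
The division of labour between the two FSTs can be sketched as follows. The word forms are placeholders rather than real Saami forms, the sets stand in for the output of the two FSTs, and the function name is hypothetical.

```python
def is_accepted(answer, norm_forms):
    """Accept a student's answer if it is one of the norm FST forms."""
    return answer.strip() in norm_forms


# Placeholder data: in practice these sets would come from the two FSTs.
key_forms = {"form_a"}             # restricted FST: shown as the key answer
norm_forms = {"form_a", "form_b"}  # norm FST: every form in the written standard

key_ok = is_accepted("form_a", norm_forms)      # the key answer itself
variant_ok = is_accepted("form_b", norm_forms)  # standard variant, not the key
wrong = is_accepted("form_c", norm_forms)       # outside the written standard
```

Here the variant is accepted even though the program would display only form_a as the key answer.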