NAME

apertium-tagger - part-of-speech tagger and trainer for Apertium

SYNOPSIS

apertium-tagger [options] -g serialized_tagger [input [output]]

apertium-tagger [options] -r iterations corpus serialized_tagger

apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus

apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus

apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus

apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger

DESCRIPTION

apertium-tagger trains the Apertium part-of-speech tagger or uses it to tag text, depending on the options given. It reads from standard input only in tagging mode (-g, --tagger), and only when no input file is given.

MODES

-g, --tagger: Tags input text using the Viterbi algorithm.
-r n, --retrain n: Retrains the model with n additional Baum-Welch iterations (unsupervised). This option is incompatible with -u (--unigram).
-s n, --supervised n: Initializes parameters from a hand-tagged text (supervised) by maximum likelihood estimation, then performs n iterations of the Baum-Welch training algorithm (unsupervised). The corpus argument can be omitted only when n = 0.
-t n, --train n: Initializes parameters through Kupiec's method (unsupervised), then performs n iterations of the Baum-Welch training algorithm (unsupervised).
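
As a rough illustration of two of the ideas named above (maximum likelihood initialization from a hand-tagged text, and Viterbi decoding), here is a minimal Python sketch over a toy bigram HMM. It is not apertium-tagger's implementation: the function names, the add-one smoothing, the "<s>" start symbol, and the three-sentence corpus are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence symbol, not an Apertium tag


def mle_estimate(tagged_sentences):
    """Estimate transition and emission log-probabilities by counting.

    Add-one smoothing is an assumption of this sketch, so that unseen
    events do not get zero probability.
    """
    trans = defaultdict(Counter)  # trans[previous_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    tags, words = set(), set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            words.add(word)
            prev = tag
    tags = sorted(tags)
    vocab = len(words) + 1  # one extra slot for unseen words

    def log_trans(prev, tag):
        total = sum(trans[prev].values())
        return math.log((trans[prev][tag] + 1) / (total + len(tags)))

    def log_emit(tag, word):
        total = sum(emit[tag].values())
        return math.log((emit[tag][word] + 1) / (total + vocab))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for words."""
    if not words:
        return []
    # Each column maps tag -> (best log-score so far, backpointer).
    columns = [{t: (log_trans(START, t) + log_emit(t, words[0]), None)
                for t in tags}]
    for word in words[1:]:
        last = columns[-1]
        column = {}
        for t in tags:
            score, prev = max((last[p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


# Toy hand-tagged corpus (invented for this example).
CORPUS = [
    [("the", "det"), ("dog", "n"), ("barks", "v")],
    [("the", "det"), ("bark", "n"), ("falls", "v")],
    [("dogs", "n"), ("bark", "v")],
]

if __name__ == "__main__":
    tags, log_trans, log_emit = mle_estimate(CORPUS)
    print(viterbi(["the", "bark", "falls"], tags, log_trans, log_emit))
```

On this toy corpus the ambiguous word "bark" is tagged as a noun after "the" and as a verb after "dogs", because the transition counts favour det-n and n-v sequences. The Baum-Welch iterations mentioned above would further re-estimate these parameters from untagged text; that step is not shown here.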

MODELS

-u, --unigram=MODEL: Use unigram algorithm MODEL from <https://coltekin.net/cagri/papers/trmorph-tools.pdf>.
-w, --sliding-window: Use the Light Sliding Window algorithm.
-x, --perceptron: Use the averaged perceptron algorithm.

OPTIONS

-d, --debug: Print error (if any) or debug messages while operating.
-e, --skip-on-error: Used with -xs to ignore certain types of errors in the training corpus.
-f, --first: Used with -g (--tagger). Makes the tagger output all lexical forms of each word, with the chosen one first (after the lemma).
-m, --mark: Mark disambiguated words.
-p, --show-superficial: Prints the superficial form of the word alongside the lexical form in the output stream.
-z, --null-flush: Used with -g (--tagger). Flushes the output after each null character read from the input.
--help: Display a help message.
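
To make the idea behind -z (--null-flush) concrete, the following Python sketch shows the general pattern of flushing output each time a null character arrives on the input. It only illustrates that pattern and is not taken from apertium-tagger; the function name process_null_delimited and the handle callback are hypothetical.

```python
import io


def process_null_delimited(reader, writer, handle):
    """Copy reader to writer, transforming each NUL-terminated chunk.

    After every null character the transformed chunk and the null are
    written and the writer is flushed, so a consumer on the other end
    of a pipe sees each chunk as soon as it is complete.
    """
    chunk = []
    while True:
        ch = reader.read(1)
        if ch == "":          # end of input
            break
        if ch == "\0":
            writer.write(handle("".join(chunk)))
            writer.write("\0")
            writer.flush()
            chunk = []
        else:
            chunk.append(ch)
    if chunk:                 # trailing text without a final NUL
        writer.write(handle("".join(chunk)))
        writer.flush()


if __name__ == "__main__":
    source = io.StringIO("first\0second\0")
    sink = io.StringIO()
    process_null_delimited(source, sink, str.upper)
    print(repr(sink.getvalue()))
```

This pattern lets a long-running process serve several requests over one pipe, because the caller can tell from the null character where each response ends.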

FILES

These are the kinds of files used with each option:

dictionary: Full expanded dictionary file
corpus: Training text corpus file
tagger_spec: Tagger specification file, in XML format
serialized_tagger: Tagger data file, built during training and used when tagging
tagged_corpus: Hand-tagged text corpus
untagged_corpus: Untagged text corpus: the morphological analysis of the hand-tagged corpus, used together with it by the -s option
input: Input file, stdin by default
output: Output file, stdout by default
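
The file roles above appear as positional arguments in the SYNOPSIS. As a sketch of how a script might assemble those argument lists before passing them to something like Python's subprocess.run, the helpers below follow the -g and -s forms of the SYNOPSIS. The helper names and every file name in the example are hypothetical; only the flags and the argument order come from this page.

```python
def tag_command(serialized_tagger, input_path=None, output_path=None, options=()):
    """Build argv for: apertium-tagger [options] -g serialized_tagger [input [output]]"""
    if output_path is not None and input_path is None:
        raise ValueError("an output file can only follow an input file")
    argv = ["apertium-tagger", *options, "-g", serialized_tagger]
    if input_path is not None:
        argv.append(input_path)
    if output_path is not None:
        argv.append(output_path)
    return argv


def supervised_command(iterations, dictionary, tagger_spec, serialized_tagger,
                       tagged_corpus, untagged_corpus, corpus=None, options=()):
    """Build argv for the -s forms; corpus may be omitted only when iterations is 0."""
    if corpus is None and iterations != 0:
        raise ValueError("corpus can be omitted only when iterations is 0")
    argv = ["apertium-tagger", *options, "-s", str(iterations), dictionary]
    if corpus is not None:
        argv.append(corpus)
    argv += [tagger_spec, serialized_tagger, tagged_corpus, untagged_corpus]
    return argv


if __name__ == "__main__":
    # All file names below are placeholders invented for this example.
    print(tag_command("en.prob", options=("-p",)))
    print(supervised_command(0, "en.dic", "en.tsx", "en.prob",
                             "en.tagged", "en.untagged"))
```

Building the list explicitly, rather than a single shell string, avoids quoting problems when file names contain spaces.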

COPYRIGHT

BUGS

Many... lurking in the dark and waiting for you!