APERTIUM-TAGGER(1)    FreeBSD General Commands Manual    APERTIUM-TAGGER(1)
apertium-tagger - part-of-speech tagger and trainer for Apertium
apertium-tagger [options] -g serialized_tagger [input [output]]
apertium-tagger [options] -r iterations corpus serialized_tagger
apertium-tagger [options] -s iterations dictionary corpus tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 dictionary tagger_spec serialized_tagger tagged_corpus untagged_corpus
apertium-tagger [options] -s 0 -u model serialized_tagger tagged_corpus
apertium-tagger [options] -t iterations dictionary corpus tagger_spec serialized_tagger
apertium-tagger trains the Apertium part-of-speech tagger or tags text with
it, depending on the options it is called with. It reads from standard input
only when the option --tagger (-g) is used.
-g, --tagger
- Tags input text by means of the Viterbi algorithm.
-r n, --retrain n
- Retrains the model with n additional Baum-Welch iterations (unsupervised).
This option is incompatible with -u (--unigram).
-s n, --supervised n
- Initializes parameters against a hand-tagged text (supervised) through the
maximum likelihood estimate method, then performs n iterations of the
Baum-Welch training algorithm (unsupervised). The corpus argument can be
omitted only when n = 0.
-t n, --train n
- Initializes parameters through Kupiec's method (unsupervised), then
performs n iterations of the Baum-Welch training algorithm (unsupervised).
-u, --unigram=MODEL
- Uses unigram algorithm MODEL from
<https://coltekin.net/cagri/papers/trmorph-tools.pdf>.
-w, --sliding-window
- Uses the Light Sliding Window algorithm.
-x, --perceptron
- Uses the averaged perceptron algorithm.
-d, --debug
- Prints error messages (if any) or debug messages while operating.
-e, --skip-on-error
- Used with -xs to ignore certain types of errors in the training corpus.
-f, --first
- Used in conjunction with -g (--tagger). Makes the tagger output all lexical
forms of each word, with the chosen one in first place (after the lemma).
-m, --mark
- Marks disambiguated words.
-p, --show-superficial
- Prints the superficial form of each word alongside its lexical form in the
output stream.
-z, --null-flush
- Used in conjunction with -g (--tagger) to flush the output after each null
character is read.
--help
- Displays a help message.
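
The -g option above names the Viterbi algorithm. As a rough illustration of that technique only, the following Python sketch decodes the most probable tag sequence for a toy hidden Markov model. It is not apertium-tagger's implementation: the function name viterbi, the two tags N and V, and all probabilities are made up for the example, and a real serialized_tagger may model the problem differently.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state (tag) sequence for the observations."""
    # best[t][s] holds (probability of the best path ending in s at step t,
    # the state that path came from at step t - 1).
    first = observations[0]
    best = [{s: (start_p[s] * emit_p[s].get(first, 0.0), None) for s in states}]

    for word in observations[1:]:
        column = {}
        for s in states:
            # Pick the predecessor that maximizes the path probability.
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s].get(word, 0.0), p)
                for p in states
            )
        best.append(column)

    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Toy model (hypothetical numbers, for illustration only).
STATES = ("N", "V")
START_P = {"N": 0.8, "V": 0.2}
TRANS_P = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
EMIT_P = {
    "N": {"people": 0.4, "fish": 0.5, "swim": 0.1},
    "V": {"fish": 0.4, "swim": 0.6},
}

if __name__ == "__main__":
    print(viterbi(["people", "fish"], STATES, START_P, TRANS_P, EMIT_P))
    # -> ['N', 'V']
```

In this toy run "fish" is ambiguous between N and V, and the transition probabilities after "people" tip the choice toward V.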
These are the kinds of files used with each option:
- dictionary
- Full expanded dictionary file
- corpus
- Training text corpus file
- tagger_spec
- Tagger specification file, in XML format
- serialized_tagger
- Tagger data file, built in the training and used while tagging
- tagged_corpus
- Hand-tagged text corpus
- untagged_corpus
- Untagged text corpus: the morphological analysis of the hand-tagged corpus,
used jointly with it by the -s option
- input
- Input file, stdin by default
- output
- Output file, stdout by default
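
To give a feel for how a hand-tagged corpus can seed a model through a maximum likelihood estimate, as the -s option describes, here is a minimal Python sketch that estimates transition and emission probabilities by relative frequency. It is an assumption-laden illustration, not the tool's code: the function mle_estimate, the "<s>" start marker, and the in-memory list of (word, tag) pairs are all hypothetical and do not reflect the actual tagged_corpus file format.

```python
from collections import Counter, defaultdict


def mle_estimate(tagged_sentences):
    """Estimate tag-transition and word-emission probabilities by counting."""
    trans_counts = defaultdict(Counter)  # previous tag -> counts of next tag
    emit_counts = defaultdict(Counter)   # tag -> counts of emitted word

    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start marker
        for word, tag in sentence:
            trans_counts[prev][tag] += 1
            emit_counts[tag][word] += 1
            prev = tag

    def normalize(counts):
        # Relative frequency: count divided by the total for that context.
        return {
            ctx: {key: n / sum(c.values()) for key, n in c.items()}
            for ctx, c in counts.items()
        }

    return normalize(trans_counts), normalize(emit_counts)


# Toy hand-tagged corpus (made up for the example).
CORPUS = [
    [("people", "N"), ("fish", "V")],
    [("fish", "N"), ("swim", "V")],
]

if __name__ == "__main__":
    trans, emit = mle_estimate(CORPUS)
    print(trans["N"]["V"])     # -> 1.0
    print(emit["N"]["fish"])   # -> 0.5
```

Estimates like these would only be a starting point; per the -s description, Baum-Welch iterations can then refine them on untagged text.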
Copyright © 2005, 2006 Universitat d'Alacant / Universidad de Alicante.
This is free software. You may redistribute copies of it under the terms of
the GNU General Public License.
Many... lurking in the dark and waiting for you!