Hailo::Role::Tokenizer - A role representing a Hailo tokenizer
The constructor, new, takes no arguments.
make_tokens takes a line of input and returns an array reference of tokens.
Each token is an array reference containing two elements: a spacing attribute
and the token text. The spacing attribute is an integer that is stored
along with the token text in the database. The following values are currently
in use:
- 0 - normal token
- 1 - prefix token (no whitespace follows it)
- 2 - postfix token (no whitespace precedes it)
- 3 - infix token (no whitespace follows or precedes it)
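As an illustration of the token structure described above, here is a minimal sketch of a toy tokenizer. It is not Hailo's implementation: the function name toy_make_tokens and the SPACING_* constants are hypothetical, and the sketch assumes the spacing attribute comes first in each token, followed by the text. It only splits on whitespace and peels trailing punctuation off as postfix tokens.

```perl
use strict;
use warnings;

# Hypothetical names for the spacing values listed above.
use constant {
    SPACING_NORMAL  => 0,
    SPACING_PREFIX  => 1,
    SPACING_POSTFIX => 2,
    SPACING_INFIX   => 3,
};

# Toy tokenizer (an assumption for illustration, not Hailo's own):
# split on whitespace, then separate trailing punctuation from each
# word as a postfix token, since no whitespace precedes it.
sub toy_make_tokens {
    my ($line) = @_;
    my @tokens;
    for my $word (split ' ', $line) {
        if ($word =~ /\A(.+?)([.,!?]+)\z/) {
            push @tokens, [SPACING_NORMAL, $1], [SPACING_POSTFIX, $2];
        }
        else {
            push @tokens, [SPACING_NORMAL, $word];
        }
    }
    return \@tokens;
}

# Each token is [spacing, text].
my $tokens = toy_make_tokens('Hello, world!');
printf "%d %s\n", @$_ for @$tokens;
# prints:
# 0 Hello
# 2 ,
# 0 world
# 2 !
```

A real tokenizer would also need to emit prefix and infix tokens (for example, an opening quote or a hyphen inside a word), which this sketch leaves out.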
make_output takes an array reference of tokens and returns a line of output.
Each token is an array reference as described in "make_tokens". The tokens are
joined together into a sentence according to their spacing attributes, as well
as any formatting provided by the tokenizer implementation.
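The joining rule can be sketched as follows. This is a hedged illustration, not Hailo's code: toy_make_output is a hypothetical name, the sketch assumes each token is laid out as [spacing, text], and it applies only the spacing attributes, with none of the extra formatting a real tokenizer implementation may add.

```perl
use strict;
use warnings;

# Toy joiner (an assumption for illustration): put a single space
# between two tokens unless the spacing attributes forbid it.
sub toy_make_output {
    my ($tokens) = @_;
    my $out = '';
    for my $i (0 .. $#$tokens) {
        my ($spacing, $text) = @{ $tokens->[$i] };
        $out .= $text;
        next if $i == $#$tokens;    # never add a trailing space

        my $next_spacing = $tokens->[$i + 1][0];
        # 1 (prefix) and 3 (infix): no whitespace follows this token.
        my $no_space_after  = $spacing == 1 || $spacing == 3;
        # 2 (postfix) and 3 (infix): no whitespace precedes the next token.
        my $no_space_before = $next_spacing == 2 || $next_spacing == 3;
        $out .= ' ' unless $no_space_after || $no_space_before;
    }
    return $out;
}

my $line = toy_make_output([
    [0, 'Hello'], [2, ','], [1, '"'], [0, 'well'], [3, '-'],
    [0, 'known'], [2, '"'], [0, 'world'], [2, '!'],
]);
print "$line\n";    # prints: Hello, "well-known" world!
```

Note how the opening quote is a prefix token and the closing quote a postfix token, so the same character can carry different spacing depending on its position.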
Hinrik Örn Sigurðsson <hinrik.sig@gmail.com>
Ævar Arnfjörð Bjarmason <avar@cpan.org>
Copyright 2010 Hinrik Örn Sigurðsson and Ævar
Arnfjörð Bjarmason <avar@cpan.org>
This program is free software, you can redistribute it and/or
modify it under the same terms as Perl itself.