yaz-icu [-c config] [-p opt] [-s] [-x] [infile]
-c config
    Specifies the file containing the ICU chain configuration, which is XML based.
-p type
    Specifies extra information to be printed about the ICU system. If type is c, then ICU converters are printed. If type is l, then available locales are printed. If type is t, then available transliterators are printed.
-s
    Specifies that output should include the sort key as well. Note that sort keys differ between ICU versions.
-x
    Specifies that output should be XML based rather than "text" based.
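The options above can be assembled programmatically when yaz-icu is driven from a script. The following is a minimal sketch only: the helper names yaz_icu_argv and run_yaz_icu are hypothetical and not part of YAZ, and it uses only the flags shown in the synopsis.

```python
import shutil
import subprocess

# Values accepted by -p: converters, locales, transliterators.
PRINT_TYPES = {"c", "l", "t"}


def yaz_icu_argv(config=None, print_type=None, sort_key=False, xml=False, infile=None):
    """Build an argument list for yaz-icu (hypothetical helper, not part of YAZ)."""
    argv = ["yaz-icu"]
    if config is not None:
        argv += ["-c", config]
    if print_type is not None:
        if print_type not in PRINT_TYPES:
            raise ValueError(f"-p type must be one of {sorted(PRINT_TYPES)}")
        argv += ["-p", print_type]
    if sort_key:
        argv.append("-s")
    if xml:
        argv.append("-x")
    if infile is not None:
        argv.append(infile)
    return argv


def run_yaz_icu(argv, text=None):
    """Run yaz-icu if it is installed; return its stdout, or None if it is missing."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, input=text, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, yaz_icu_argv(config="chain.xml", sort_key=True, xml=True) yields the arguments for an XML report that includes sort keys; passing text to run_yaz_icu feeds it on standard input, as the infile argument is optional.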
casemap
    Converts case (the rule specifies how):
    l
        Lower case using ICU function u_strToLower.
    u
        Upper case using ICU function u_strToUpper.
    t
        To title using ICU function u_strToTitle.
    f
        Fold case using ICU function u_strFoldCase.
display
    This is a meta step which specifies that a term/token is to be displayed. This term is retrieved in an application using function icu_chain_token_display (yaz/icu.h).
transform
    Specifies an ICU transform rule using a transliterator identifier. The rule attribute is the transliterator identifier. See ICU Transforms for more information.
transliterate
    Specifies a rule-based transliterator. The rule attribute is the custom transformation rule to be used. See ICU Transforms for more information.
tokenize
    Breaks / tokenizes a string into components using ICU functions ubrk_open, ubrk_setText, etc. The rule is one of:
    l
        Line. ICU: UBRK_LINE.
    s
        Sentence. ICU: UBRK_SENTENCE.
    w
        Word. ICU: UBRK_WORD.
    c
        Character. ICU: UBRK_CHARACTER.
    t
        Title. ICU: UBRK_TITLE.
join
    Joins tokens into one string. The rule attribute is the joining string, which may be empty. The join conversion element was added in YAZ 4.2.49.
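Because the rule letters are easy to mistype, a chain file can be sanity-checked before it is handed to yaz-icu. The sketch below is illustrative only and is not part of YAZ: check_chain is a hypothetical helper, it assumes the icu_chain root element and the casemap element name as they appear in the example configuration, and it knows only the elements described in this section, so a real YAZ installation may accept more.

```python
import xml.etree.ElementTree as ET

# Rule letters taken from the element descriptions in this section.
CASEMAP_RULES = {"l", "u", "t", "f"}
TOKENIZE_RULES = {"l", "s", "w", "c", "t"}
KNOWN_ELEMENTS = {"casemap", "display", "transform", "transliterate", "tokenize", "join"}


def check_chain(xml_text):
    """Return a list of problems found in an icu_chain configuration string."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "icu_chain":
        problems.append(f"root element is <{root.tag}>, expected <icu_chain>")
    for step in root:
        rule = step.get("rule")
        if step.tag not in KNOWN_ELEMENTS:
            problems.append(f"unknown element <{step.tag}>")
        elif step.tag == "casemap" and rule not in CASEMAP_RULES:
            problems.append(f"casemap rule {rule!r} is not one of l, u, t, f")
        elif step.tag == "tokenize" and rule not in TOKENIZE_RULES:
            problems.append(f"tokenize rule {rule!r} is not one of l, s, w, c, t")
        elif step.tag in {"transform", "transliterate", "join"} and rule is None:
            problems.append(f"<{step.tag}> has no rule attribute")
    return problems
```

An empty list means the file uses only the elements and rule letters listed here. It does not check that a transform identifier or transliterate rule is valid; only ICU itself can decide that.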
cat text | yaz-icu -c chain.xml
<icu_chain locale="en">
  <transform rule="[:Control:] Any-Remove"/>
  <tokenize rule="w"/>
  <transform rule="[[:WhiteSpace:][:Punctuation:]] Remove"/>
  <transliterate rule="xy > z;"/>
  <display/>
  <casemap rule="l"/>
</icu_chain>