NAME
    uniquote - escape special characters using various quoting conventions

SYNOPSIS
    uniquote [options] [ textfile ... ]

    Standard options:
        --version           print version information and exit
        --help              this message
        --man               full manpage
        --debug             add some debugging output

    Character mode options:
        Without a specified encoding, utf8 is assumed unless file has encoding extension.
        --verbose       -v  show full character names like \N{EN DASH}
        --hex           -x  use singleton \x{...} escapes instead of \N{U+XXX}
        --encoding      -E  specify encoding for all input files
        --html          -H  show HTML entities (add --verbose for names)
        --xml           -X  show XML entities

    Binary mode options:
        --bytes         -b  binary file in hex
        --octal         -0  binary file in octal

    Other options:
        --endings       -n  place $ at EOL so trailing spaces visible
        --backslash     -t  use backslash escapes for unprintable ASCII
        --fix-newlines  -l  consider any Unicode linebreak sequence as EOL
        --unbuffer      -u  flush each output line

DESCRIPTION
    The uniquote program is meant as a Unicode-aware replacement for
    programs like od(1) and "cat -v". It converts ASCII control codes and
    all non-ASCII code points into a quoted form such as one might use in
    a Perl literal.

    Use --endings or "-e" to act like "cat -e" and add a dollar sign at the
    end of each line so trailing spaces become apparent. Use --backslash or
    "-t" to show tabs and other ASCII control codes as backslash escapes.

    By default, uniquote converts each such code point into the form
    "\N{U+hex}", making code point 962 appear as "\N{U+3C2}".

    The --hex option instead shows eligible code points in backslash-X
    notation, so code point 962 would be displayed as "\x{3C2}".

    The --verbose option instead displays eligible code points by name.
    Code point 962 would then be shown as
    "\N{GREEK SMALL LETTER FINAL SIGMA}".

    The --xml and --html options show code points using numeric entities.
    Adding --verbose to --html will use named HTML entities where
    available.

  Character Modes vs Binary Mode
    To treat the file as a sequence of bytes, use --binary. This displays
    each byte that is not printable ASCII as an escape of the form "\xXX".
    The other way to specify binary input uses the --octal option.

    If you have not specified binary mode, then you are in character mode.
    The default encoding in character mode is not ASCII but UTF-8. If you
    have not specified an encoding with --encoding, but the filename ends
    with the name of an encoding that Perl recognizes, that encoding will
    be assumed.

    Note that whatever the actual input character encoding, the code points
    shown are always Unicode code point numbers. You can use this property
    to normalize input, or to check that you actually know a file's
    encoding. For example, you can test the same file with various 8-bit
    encodings like Latin1, MacRoman, and CP1252.

    The default input encoding is actually "utf8"; that is, Perl's
    permissive version of UTF-8. If you want strict UTF-8, override it
    with --encoding.

EXAMPLES
    $ perl -E 'say "ascii:\tnayeeve fassodd"' > /tmp/nf.ascii
    $ perl -E 'binmode(STDOUT, "encoding(macroman)")||die; say "macroman:\tna\xEFve fa\xE7ade"' > /tmp/nf.macroman
    $ perl -E 'binmode(STDOUT, "encoding(utf8)")||die; say "utf8:\tna\xEFve fa\xE7ade"' > /tmp/nf.utf8
    $ perl -E 'binmode(STDOUT, "encoding(utf16)")||die; say "utf16:\tna\xEFve fa\xE7ade"' > /tmp/nf.utf16
    $ perl -E 'binmode(STDOUT, "encoding(utf32)")||die; say "utf32:\tna\xEFve fa\xE7ade"' > /tmp/nf.utf32
    $ perl -E 'binmode(STDOUT, "encoding(latin1)")||die; say "latin1:\tna\xEFve fa\xE7ade"' > /tmp/nf.latin1
    $ perl -E 'binmode(STDOUT, "encoding(cp1252)")||die; say "cp1252:\tna\xEFve fa\xE7ade"' > /tmp/nf.cp1252

    $ wc -c /tmp/nf*
          23 /tmp/nf.ascii
          21 /tmp/nf.cp1252
          21 /tmp/nf.latin1
          23 /tmp/nf.macroman
          42 /tmp/nf.utf16
          84 /tmp/nf.utf32
          21 /tmp/nf.utf8
         235 total

    $ uniquote /tmp/nf.*
    ascii:\N{U+09}nayeeve fassodd
    cp1252:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
    latin1:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
    macroman:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
    utf16:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
    utf32:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade
    utf8:\N{U+09}na\N{U+EF}ve fa\N{U+E7}ade

    $ uniquote --backslash --endings /tmp/nf.*
    ascii:\tnayeeve fassodd$
    cp1252:\tna\N{U+EF}ve fa\N{U+E7}ade$
    latin1:\tna\N{U+EF}ve fa\N{U+E7}ade$
    macroman:\tna\N{U+EF}ve fa\N{U+E7}ade$
    utf16:\tna\N{U+EF}ve fa\N{U+E7}ade$
    utf32:\tna\N{U+EF}ve fa\N{U+E7}ade$
    utf8:\tna\N{U+EF}ve fa\N{U+E7}ade$

    $ uniquote --verbose /tmp/nf.*
    ascii:\N{CHARACTER TABULATION}nayeeve fassodd
    cp1252:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    latin1:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    macroman:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    utf16:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    utf32:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    utf8:\N{CHARACTER TABULATION}na\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade

    $ uniquote --binary /tmp/nf.*
    ascii:\x09nayeeve fassodd
    cp1252:\x09na\xEFve fa\xE7ade
    latin1:\x09na\xEFve fa\xE7ade
    macroman:\x09na\x95ve fa\x8Dade
    \xFE\xFF\x00u\x00t\x00f\x001\x006\x00:\x00\x09\x00n\x00a\x00\xEF\x00v\x00e\x00 \x00f\x00a\x00\xE7\x00a\x00d\x00e\x00
    \x00\x00\xFE\xFF\x00\x00\x00u\x00\x00\x00t\x00\x00\x00f\x00\x00\x003\x00\x00\x002\x00\x00\x00:\x00\x00\x00\x09\x00\x00\x00n\x00\x00\x00a\x00\x00\x00\xEF\x00\x00\x00v\x00\x00\x00e\x00\x00\x00 \x00\x00\x00f\x00\x00\x00a\x00\x00\x00\xE7\x00\x00\x00a\x00\x00\x00d\x00\x00\x00e\x00\x00\x00
    utf8:\x09na\xC3\xAFve fa\xC3\xA7ade

    $ uniquote --xml /tmp/nf.*
    ascii:	nayeeve fassodd
    cp1252:	naïve façade
    latin1:	naïve façade
    macroman:	naïve façade
    utf16:	naïve façade
    utf32:	naïve façade
    utf8:	naïve façade

    $ uniquote --html /tmp/nf.*
    ascii:	nayeeve fassodd
    cp1252:	naïve façade
    latin1:	naïve façade
    macroman:	naïve façade
    utf16:	naïve façade
    utf32:	naïve façade
    utf8:	naïve façade

    $ uniquote --html --verbose /tmp/nf.*
    ascii:	nayeeve fassodd
    cp1252:	naïve façade
    latin1:	naïve façade
    macroman:	naïve façade
    utf16:	naïve façade
    utf32:	naïve façade
    utf8:	naïve façade

    $ uniquote --backslash --encoding latin1 --verbose /tmp/nf.*
    ascii:\tnayeeve fassodd
    cp1252:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    latin1:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    macroman:\tna\N{MESSAGE WAITING}ve fa\N{REVERSE LINE FEED}ade
    \N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0v\0e\0 \0f\0a\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0a\0d\0e\0
    \0\0\N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0\0\0a\0\0\0d\0\0\0e\0\0\0
    utf8:\tna\N{LATIN CAPITAL LETTER A WITH TILDE}\N{MACRON}ve fa\N{LATIN CAPITAL LETTER A WITH TILDE}\N{SECTION SIGN}ade

    $ uniquote --backslash --encoding cp1252 --verbose /tmp/nf.*
    ascii:\tnayeeve fassodd
    uniquote: cp1252 "\x8D" does not map to Unicode at /tmp/nf.macroman line 0
    cp1252:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    latin1:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    \N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0v\0e\0 \0f\0a\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0a\0d\0e\0
    \0\0\N{LATIN SMALL LETTER THORN}\N{LATIN SMALL LETTER Y WITH DIAERESIS}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN SMALL LETTER I WITH DIAERESIS}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN SMALL LETTER C WITH CEDILLA}\0\0\0a\0\0\0d\0\0\0e\0\0\0
    utf8:\tna\N{LATIN CAPITAL LETTER A WITH TILDE}\N{MACRON}ve fa\N{LATIN CAPITAL LETTER A WITH TILDE}\N{SECTION SIGN}ade

    $ uniquote --backslash --encoding macroman --verbose /tmp/nf.*
    ascii:\tnayeeve fassodd
    cp1252:\tna\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}ve fa\N{LATIN CAPITAL LETTER A WITH ACUTE}ade
    latin1:\tna\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}ve fa\N{LATIN CAPITAL LETTER A WITH ACUTE}ade
    macroman:\tna\N{LATIN SMALL LETTER I WITH DIAERESIS}ve fa\N{LATIN SMALL LETTER C WITH CEDILLA}ade
    \N{OGONEK}\N{CARON}\0u\0t\0f\01\06\0:\0\t\0n\0a\0\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}\0v\0e\0 \0f\0a\0\N{LATIN CAPITAL LETTER A WITH ACUTE}\0a\0d\0e\0
    \0\0\N{OGONEK}\N{CARON}\0\0\0u\0\0\0t\0\0\0f\0\0\03\0\0\02\0\0\0:\0\0\0\t\0\0\0n\0\0\0a\0\0\0\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}\0\0\0v\0\0\0e\0\0\0 \0\0\0f\0\0\0a\0\0\0\N{LATIN CAPITAL LETTER A WITH ACUTE}\0\0\0a\0\0\0d\0\0\0e\0\0\0
    utf8:\tna\N{SQUARE ROOT}\N{LATIN CAPITAL LETTER O WITH STROKE}ve fa\N{SQUARE ROOT}\N{LATIN SMALL LETTER SHARP S}ade

ERRORS
    Exits 0 if all is well, 1 otherwise.

    Errors include inaccessible files, bogus encodings, and contents that
    do not match a specified encoding.

BUGS
    Good question.

SEE ALSO
    od(1), cat(1), Encode(3)

HISTORY
    First public release February 27, 2011.

AUTHOR
    Tom Christiansen <tchrist@perl.com>

COPYRIGHT AND LICENCE
    Copyright 2010 Tom Christiansen.

    This program is free software; you may redistribute it and/or modify
    it under the same terms as Perl itself.