|
NAMEchkascii - a small C program that checks whether a file is purely ASCII plain-text or else contains any non-ASCII characters.SYNOPSISThe ASCII system uses a 7-bit representation in which every byte's left-most bit is reserved. This allows for the character value range 0-127, many of which are control characters (for instance 0) not found in actual, regular text files.Characters outside of the US-ASCII character set need to be specially encoded when they get into a plain text file. So if a plain text file has the German character 'ö' (o-Umlaut), the character would need to be encoded at the time of writing, and then decoded at the time of reading. Text-encoding schemes are of two varieties: some such as ISO-8859-1 extend the traditional ASCII set by using 8-bit representation; others such as the Unicode family use 2 or 4 bytes per character. As an example, 'ö' (o-Umlaut) uses 1 byte in ISO-8859-1 (decimal 246) but 2 bytes in UTF-8 (the first byte holding decimal 195; the second byte holding decimal 182). chkascii treats valid ASCII values in a text file as the austere set: { 32-126 / 9 (horizontal tab) / 10 (line feed a.k.a. Unix newline) } Anything else is considered junk, although the user can pass in additional ASCII values to treat as good. chkascii additionally checks whether the file is EOL-terminated. If the file is neither empty nor terminated with ASCII 10, chkascii treats the condition as an error too. USAGEThe command below asks <file> to be checked as per the built-in scheme, stopping on the first bad ASCII character:chkascii <file> The command above uses a default maximum-junk (maxjunk) value of 1. If you pass in a maxjunk value of 0, the application treats the upper limit of reportable junk as infinite: chkascii --maxjunk=0 <file> --summary keeps chkascii going till EOF, whence a summary is
printed.
--quiet asks chkascii to suppress normal output.
These options are mutually exclusive: --maxjunk | --summary | --quiet --accept forces an ASCII value to be accepted as good. The command below asks ASCII values 13 (CR) and 11 (VT) to be treated as good: chkascii --accept=13,11 <file> RETURN VALUES-1 is returned for operational failure. Because the operating system might map return values to unsigned char, the shell could pick up 255.If the file (contains all good ASCII values and is EOL-terminated) or else (is empty), 0 is returned. All other cases are treated as errors, and a byte (initially 8 off-bits) is adjusted at the ends (as explained underneath) and returned. If the file has junk, the rightmost bit of the byte (equivalent to 1) is turned on. If the file is not EOL-terminated, the left-most bit of the byte (equivalent to 128) is turned on. SEE ALSO`man 7 ascii` and xascii(1) are the standard tools for printing the table of ASCII values and symbols.vim(1) has the Normal command ga which displays the ASCII value of the character under the cursor. od(1) can be used to print the (decimal) ASCII values in a file: od -t u1 <file> The Ctrl+Q subcommand of hexedit(1) can be used to insert (hexadecimal) ASCII half-chars (0 through f). Besides many editors, unix2dos(1) and flip(1) can convert text files between Unix (LF) and DOS (CRLF) line-endings. BUGS/LIMITATIONSchkascii does not read standard input.If you pass in multiple files, the first path is taken as the
argument.
Optional arguments can be specified a maximum of once. AUTHORManish Jain (bourne.identity@hotmail.com)
Visit the GSP FreeBSD Man Page Interface. |