Jcode(3)
User Contributed Perl Documentation
Jcode - Japanese Charset Handler
use Jcode;
#
# traditional
Jcode::convert(\$str, $ocode, $icode, "z");
# or OOP!
print Jcode->new($str)->h2z->tr($from, $to)->utf8;
A Japanese version of this document is available as Jcode::Nihongo.
Jcode.pm supports both an object-oriented and a traditional approach. With
the object-oriented approach, you can write:
$iso_2022_jp = Jcode->new($str)->h2z->jis;
Which is more elegant than:
$iso_2022_jp = $str;
&jcode::convert(\$iso_2022_jp, 'jis', &jcode::getcode(\$str), "z");
For those unfamiliar with objects, Jcode.pm still supports
"getcode()" and
"convert()".
If the perl version is 5.8.1 or later, Jcode acts as a wrapper to Encode,
the standard charset handler module for Perl 5.8 or later.
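The two approaches described above can be compared side by side. This is only a minimal sketch: it assumes Jcode is installed from CPAN, and the sample string (two halfwidth katakana written as UTF-8 byte escapes) and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Jcode;

# UTF-8 bytes for halfwidth katakana KA and NA (U+FF76, U+FF85); sample data only
my $str = "\xEF\xBD\xB6\xEF\xBE\x85";

# Object approach: halfwidth -> fullwidth, then encode as ISO-2022-JP
my $oop = Jcode->new($str, 'utf8')->h2z->jis;

# Traditional approach: convert a copy in place through a reference,
# with the "z" option asking for the same halfwidth -> fullwidth step
my $trad = $str;
Jcode::convert(\$trad, 'jis', 'utf8', 'z');

print "both approaches agree\n" if $oop eq $trad;
```

The input encoding is passed explicitly here so that the result does not depend on automatic detection of such a short string.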
All methods described here return a Jcode object unless otherwise noted.
- $j = Jcode->new($str [, $icode])
- Creates a Jcode object $j from
$str. The input encoding is detected automatically unless
you explicitly set $icode. For the available charsets,
see getcode below.
For perl 5.8.1 or better, $icode can
be any encoding name that Encode understands.
$j = Jcode->new($european, 'iso-latin1');
When the object is stringified, it returns the EUC-converted
string, so you can write "print $j" instead of
"print $j->euc".
- Passing Reference
- Instead of a scalar value, you can pass a reference:
Jcode->new(\$str);
This saves a little time, at the cost of the value of
$str itself being converted. (In a way,
$str is now "tied" to the Jcode
object.)
- $j->set($str [, $icode])
- Sets $j's internal string to
$str. Handy when you use a Jcode object repeatedly
(saves the time and memory needed to create a new object).
# converts mailbox to SJIS format
my $jconv = Jcode->new;
$/ = ""; # paragraph mode, like "perl -00"
while(<>){
print $jconv->set(\$_)->mime_decode->sjis;
}
- $j->append($str [, $icode]);
- Appends $str to $j's
internal string.
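As a small illustration of reusing one object with set() and append(), here is a sketch that assumes Jcode is installed. The strings are plain ASCII samples and the 'ascii' input code is passed explicitly, so no detection is involved.

```perl
use strict;
use warnings;
use Jcode;

# One object, reused for several strings
my $j = Jcode->new('abc', 'ascii');

# append() adds to the internal string
$j->append('def', 'ascii');
my $appended = $j->utf8;

# set() replaces the internal string without creating a new object
$j->set('xyz', 'ascii');
my $replaced = $j->utf8;

print "$appended\n$replaced\n";
```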
- $j = jcode($str [, $icode]);
- Shortcut for Jcode->new(), so you can call a conversion method directly on its result.
In general, you can retrieve the encoded string as
$j->encoded, where encoded is the name of the target encoding:
- $sjis = jcode($str)->sjis
- $euc = $j->euc
- $jis = $j->jis
- $sjis = $j->sjis
- $ucs2 = $j->ucs2
- $utf8 = $j->utf8
- What you code is what you get :)
- $iso_2022_jp = $j->iso_2022_jp
- Same as "$j->h2z->jis". Hankaku
Kanas are forcibly converted to Zenkaku.
For perl 5.8.1 and better, you can also use any encoding names
and aliases that Encode supports. For example:
$european = $j->iso_latin1; # replace '-' with '_' for names.
FYI: Encode::Encoder uses a similar trick.
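The retrieval methods above can be tried with a short script. This is a sketch that assumes Jcode is installed; the sample text (three kanji given as UTF-8 byte escapes) is illustrative, and the results are printed as hex so the output does not depend on the terminal's encoding.

```perl
use strict;
use warnings;
use Jcode;

# UTF-8 bytes for three kanji (U+65E5 U+672C U+8A9E); sample data only
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
my $j    = Jcode->new($utf8, 'utf8');

# Retrieve the same text in several encodings
my $euc  = $j->euc;
my $sjis = $j->sjis;
my $jis  = $j->jis;

printf "%-4s %s\n", 'euc',  unpack('H*', $euc);
printf "%-4s %s\n", 'sjis', unpack('H*', $sjis);
printf "%-4s %s\n", 'jis',  unpack('H*', $jis);

# Feeding an encoded string back in recovers the original bytes
my $back = Jcode->new($sjis, 'sjis')->utf8;
print "round trip ok\n" if $back eq $utf8;
```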
- $j->fallback($fallback)
- On perl 5.8.1 or better, Jcode stores the internal string in UTF-8.
Any character that does not map to ->encoding is replaced with
a '?', which is the Encode standard.
my $unistr = "\x{262f}"; # YIN YANG
my $j = jcode($unistr); # $j->euc is '?'
You can change this behavior by specifying a fallback, as with
Encode. The values are the same as Encode's.
"Jcode::FB_PERLQQ",
"Jcode::FB_XMLCREF", and
"Jcode::FB_HTMLCREF" are aliased to
those of Encode for convenience.
print $j->fallback(Jcode::FB_PERLQQ)->euc; # '\x{262f}'
print $j->fallback(Jcode::FB_XMLCREF)->euc; # '&#x262f;'
print $j->fallback(Jcode::FB_HTMLCREF)->euc; # '&#9775;'
The global variable $Jcode::FALLBACK
stores the default fallback so you can override that by assigning the
value.
$Jcode::FALLBACK = Jcode::FB_PERLQQ; # set default fallback scheme
- [@lines =] $jcode->jfold([$width, $newline_str, $kref])
- Folds lines in the Jcode string every $width columns (default:
72), where $width is counted in
"halfwidth" characters; fullwidth characters count as two.
Lines are joined with the newline string specified by
$newline_str (default: "\n").
Rudimentary kinsoku support is now available for Perl 5.8.1
and better.
- $length = $jcode->jlength();
- Returns the length in characters, rather than the length in bytes.
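A short sketch of jfold() and jlength() follows. It assumes Jcode is installed, and the sample strings are made up. jfold() is used in a method chain here, so it returns the object rather than a list of lines.

```perl
use strict;
use warnings;
use Jcode;

# 25 ASCII letters; each counts as one halfwidth column
my $text   = 'a' x 25;
my $folded = Jcode->new($text, 'ascii')->jfold(10)->utf8;
print "$folded\n";

# UTF-8 bytes for three kanji (U+65E5 U+672C U+8A9E): 9 bytes, 3 characters
my $kanji  = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
my $length = Jcode->new($kanji, 'utf8')->jlength;
print "bytes: ", length($kanji), ", characters: $length\n";
```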
To use the methods below, you need MIME::Base64. To install it, simply run
perl -MCPAN -e 'CPAN::Shell->install("MIME::Base64")'
If your perl is 5.6 or better, there is no need, since MIME::Base64
is bundled.
- $mime_header = $j->mime_encode([$lf, $bpl])
- Converts $str to a MIME header as documented in
RFC1522. When $lf is specified, it uses
$lf to fold lines (default: \n). When
$bpl is specified, it uses
$bpl as the number of bytes per line (default: 76; this
number must not exceed 76).
For Perl 5.8.1 or better, you can also encode MIME Header
as:
$mime_header = $j->MIME_Header;
In which case the resulting
$mime_header is MIME-B-encoded UTF-8, whereas
"$j->mime_encode()" returns
MIME-B-encoded ISO-2022-JP. Most modern MUAs support both.
- $j->mime_decode;
- Decodes the MIME header held in the Jcode object. On perl 5.8.1 or better,
the following has the same effect:
Jcode->new($str, 'MIME-Header')
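The encode and decode methods can be checked against each other with a round trip. This is a sketch that assumes Jcode (and, on older perls, MIME::Base64) is installed; the subject text is a made-up sample given as UTF-8 byte escapes.

```perl
use strict;
use warnings;
use Jcode;

# UTF-8 bytes for three kanji (U+65E5 U+672C U+8A9E); sample subject text
my $subject = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# mime_encode() returns a plain string, not an object
my $header = Jcode->new($subject, 'utf8')->mime_encode;
print "Subject: $header\n";

# mime_decode() returns the object, so the result can be retrieved as UTF-8
my $decoded = Jcode->new($header)->mime_decode->utf8;
print "round trip ok\n" if $decoded eq $subject;
```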
- $j->h2z([$keep_dakuten])
- Converts JIS X 0201 kana (hankaku, halfwidth) to JIS X 0208 kana (zenkaku, fullwidth). When
$keep_dakuten is set, it leaves dakuten as is
(that is, "ka + dakuten" is left as is instead of being
converted to "ga").
You can retrieve the number of matches via
$j->nmatch;
- $j->z2h
- Converts X208 kana (Zenkaku) to X201 kana (Hankaku).
You can retrieve the number of matches via
$j->nmatch;
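To make the dakuten handling concrete, here is a sketch that assumes Jcode is installed. The input is halfwidth KA followed by the halfwidth voiced sound mark, written as UTF-8 byte escapes; results are printed as hex so the output does not depend on the terminal.

```perl
use strict;
use warnings;
use Jcode;

# UTF-8 bytes for halfwidth KA (U+FF76) + halfwidth dakuten (U+FF9E)
my $hankaku = "\xEF\xBD\xB6\xEF\xBE\x9E";

# Default: the pair is combined into the single fullwidth character GA (U+30AC)
my $j       = Jcode->new($hankaku, 'utf8')->h2z;
my $zenkaku = $j->utf8;
print "h2z:          ", unpack('H*', $zenkaku), " (matches: ", $j->nmatch, ")\n";

# With $keep_dakuten set, KA and the dakuten stay separate characters
my $kept = Jcode->new($hankaku, 'utf8')->h2z(1)->utf8;
print "keep dakuten: ", unpack('H*', $kept), "\n";

# z2h goes the other way
my $half_again = Jcode->new($zenkaku, 'utf8')->z2h->utf8;
print "z2h:          ", unpack('H*', $half_again), "\n";
```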
To use "->m()" and
"->s()", you need perl 5.8.1 or better.
- $j->tr($from, $to, $opt);
- Applies "tr/$from/$to/" on Jcode object
where $from and $to are
EUC-JP strings. On perl 5.8.1 or better, $from and
$to can also be flagged UTF-8 strings.
If $opt is set,
"tr/$from/$to/$opt" is applied.
$opt must be 'c', 'd' or the combination
thereof.
You can retrieve the number of matches via
$j->nmatch;
The following methods are available only for perl 5.8.1 or
better.
- $j->s($pattern, $replace, $opt);
- Applies "s/$pattern/$replace/$opt".
$pattern and
$replace must be in EUC-JP or flagged
UTF-8. $opt takes the same values as the regexp options. See
perlre for regexp options.
Like "$j->tr()",
"$j->s()" returns the object itself,
so you can chain the operations as follows:
$j->tr("a-z", "A-Z")->s("foo", "bar");
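A runnable version of this kind of chain might look like the sketch below. It assumes Jcode is installed on perl 5.8.1 or better (required for s()); the sample string is plain ASCII and made up for illustration.

```perl
use strict;
use warnings;
use Jcode;

my $j = Jcode->new('foo baz', 'ascii');

# tr() uppercases the letters and records how many characters it touched
$j->tr('a-z', 'A-Z');
my $count = $j->nmatch;

# s() works on the already transliterated text, so the pattern is uppercase
$j->s('FOO', 'bar');
my $result = $j->utf8;

print "tr matched $count characters\n";
print "$result\n";
```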
- [@match = ] $j->m($pattern, $opt);
- Applies "m/$pattern/$opt". Note that this
method DOES NOT RETURN AN OBJECT, so you can't chain it like
"$j->s()".
If you need to access instance variables of a Jcode object, use the accessor methods
below instead of accessing them directly (that's what OOP is all about).
FYI, Jcode uses a ref to array instead of a ref to hash (the common way)
to optimize speed. (Actually, you don't have to know this as long as you use the accessor
methods; once again, that's OOP.)
- $j->r_str
- Reference to the EUC-coded String.
- $j->icode
- Input character code detected in the most recent operation.
- $j->nmatch
- Number of matches (Used in $j->tr, etc.)
- ($code, [$nmatch]) = getcode($str)
- Returns the character code of $str. Return codes are as
follows:
ascii   ASCII (contains no Japanese code)
binary  Binary (not a text file)
euc     EUC-JP
sjis    SHIFT_JIS
jis     JIS (ISO-2022-JP)
ucs2    UCS2 (raw Unicode)
utf8    UTF8
When called in list context instead of scalar context, it also returns
how many characters of the detected code were found. As mentioned above,
$str can be \$str instead.
jcode.pl Users: This function is 100% upward compatible
with jcode::getcode() -- well, almost;
* When its return value is a list, the order is the opposite;
jcode::getcode() returns $nmatch first.
* jcode::getcode() returns 'undef' when the number of EUC characters
is equal to that of SJIS. Jcode::getcode() returns EUC. For
Jcode.pm there is no in-between.
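The scalar and list forms of getcode() can be sketched as follows. This assumes Jcode is installed; the sample strings are made up, and the ISO-2022-JP input is produced with Jcode itself rather than typed in.

```perl
use strict;
use warnings;
use Jcode;

# Plain ASCII text: scalar context returns just the code name
my $ascii_code = Jcode::getcode('plain ASCII text');
print "ascii sample: $ascii_code\n";

# Build an ISO-2022-JP string from UTF-8 bytes (three kanji, sample data only)
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";
my $jis  = Jcode->new($utf8, 'utf8')->jis;

# List context: the code name comes first, then the match count.
# A reference is passed here, which getcode() also accepts.
my ($code, $nmatch) = Jcode::getcode(\$jis);
print "jis sample: $code\n";
```

Note the order of the list: the code name comes first, unlike jcode::getcode() from jcode.pl.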
- Jcode::convert($str, [$ocode, $icode, $opt])
- Converts $str to the character code specified by
$ocode. When $icode is
also specified, it assumes $icode for the input string
instead of the one detected by getcode(). As mentioned above,
$str can be \$str instead.
jcode.pl Users: This function is 100% upward compatible
with jcode::convert()!
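In-place conversion through a reference can be sketched like this. It assumes Jcode is installed; the buffer name and sample text are illustrative only, and both $ocode and $icode are given explicitly so that getcode() is not consulted.

```perl
use strict;
use warnings;
use Jcode;

# UTF-8 bytes for three kanji (U+65E5 U+672C U+8A9E); sample data only
my $utf8 = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# Convert a working copy to Shift_JIS in place
my $buf = $utf8;
Jcode::convert(\$buf, 'sjis', 'utf8');
my $sjis = $buf;
print "sjis: ", unpack('H*', $sjis), "\n";

# Convert the same buffer back to UTF-8
Jcode::convert(\$buf, 'utf8', 'sjis');
print "round trip ok\n" if $buf eq $utf8;
```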
On perl 5.8.1 or later, Jcode acts as a wrapper to Encode, which means Jcode is
subject to any bugs therein.
This package owes a lot in motivation, design, and code, to the jcode.pl for
Perl4 by Kazumasa Utashiro <utashiro@iij.ad.jp>.
Hiroki Ohzaki <ohzaki@iod.ricoh.co.jp> has helped me polish
regexp from the very first stage of development.
JEncode by makamaka@donzoko.net has inspired me to integrate
Encode to Jcode. He has also contributed Japanese POD.
And folks at Jcode Mailing list <jcode5@ring.gr.jp>. Without
them, I couldn't have coded this far.
Encode
Jcode::Nihongo
<http://www.iana.org/assignments/character-sets>
Copyright 1999-2005 Dan Kogai <dankogai@dan.co.jp>
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.