NAME

Encode::Detect::CJK - A charset detector optimized for East Asian charsets and website content

SYNOPSIS

    use Encode::Detect::CJK;              # just use the module
    use Encode::Detect::CJK qw(detect);   # use the module and export the detect function

    # simple usage
    my $charset = CharsetDetector::detect($octets);

    # usage with advanced options
    my $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);

    # Returns the charset of the binary string $octets.
    #
    # $max_len
    #   Detection is slow when $octets is large, so you may sometimes need to
    #   limit how much is examined. undef means the DEFAULT (no length limit).
    #
    # $is_consider_html_head_charset
    #   By DEFAULT the detector treats the HTML header
    #   (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />)
    #   as one factor when detecting the charset. To make the detector ignore
    #   the HTML header, set $is_consider_html_head_charset to "" or 0.

Basic Function

detect - detect the charset of a string

    $charset = CharsetDetector::detect($octets, $max_len, $is_consider_html_head_charset);
    $charset = CharsetDetector::detect($octets, $max_len);   # same as CharsetDetector::detect($octets, $max_len, 1)
    $charset = CharsetDetector::detect($octets);             # same as CharsetDetector::detect($octets, undef)

Param $octets - input binary string

    The binary string whose charset is to be detected.

Param $max_len - max length for the charset detector

    Detection is slow when $octets is large, so you may sometimes need to
    specify $max_len. undef means the DEFAULT, which is no length limit.

Param $is_consider_html_head_charset

    By DEFAULT the detector treats the HTML header
    (e.g. <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />)
    as one factor when detecting the charset. To make the detector ignore the
    HTML header, set $is_consider_html_head_charset to "" or 0.

Return Value $charset

    if $octets is undef    return ''
    if $octets is ''       return 'iso-8859-1'
    otherwise              return the charset name

Supported Charset List

    return value    alias
    ------------    -----------------------------------
    ascii           ascii
    iso-8859-1      iso-8859-1
    utf8            utf8 utf-8-strict
    utf16           utf16
    cp936           euc-cn(gb2312) cp936(gbk) gb18030
    big5-eten       big5-eten
    euc-jp          euc-jp
    shiftjis        shiftjis
    iso-2022-jp     iso-2022-jp
    euc-kr          euc-kr
    iso-2022-kr     iso-2022-kr

COPYRIGHT

The CharsetDetector module is Copyright (c) 2003-2008 QIAN YU. All rights reserved.

You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
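
EXAMPLE

A common next step after detection is to decode the octets into a Perl character string. The following is a minimal sketch of that step, not part of the module itself. The helper name decode_detected is hypothetical, and it assumes that the charset name returned by the detector is one the core Encode module recognises. So that it runs without the detector installed, the detector is passed in as a code reference and a stand-in stub is used; in real use you would presumably pass \&CharsetDetector::detect instead.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Hypothetical helper (not provided by the module): decode $octets using the
# charset name returned by the supplied detector.
# $detect is a code reference called as $detect->($octets, $max_len).
sub decode_detected {
    my ($octets, $detect, $max_len) = @_;

    my $charset = $detect->($octets, $max_len);

    # Per the Return Value section, '' is returned when $octets is undef;
    # there is nothing to decode in that case.
    return unless defined $charset && length $charset;

    # Assumption: the returned name is known to Encode.
    my $enc = find_encoding($charset)
        or die "Encode does not know charset '$charset'\n";

    return $enc->decode($octets);
}

# Stand-in detector so the sketch runs on its own; it always answers 'utf8'
# for defined input and '' for undef.
my $stub = sub { defined $_[0] ? 'utf8' : '' };

# Six UTF-8 octets that encode two CJK characters.
my $text = decode_detected("\xE4\xB8\xAD\xE6\x96\x87", $stub);
printf "%d characters\n", length $text;
```

Passing the detector as a code reference keeps the decoding logic separate from the detection step, which also makes it easy to pass a $max_len for large inputs.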