I18N::Charset(3)      User Contributed Perl Documentation      I18N::Charset(3)
I18N::Charset - IANA Character Set Registry names and
    Unicode::MapUTF8 (et al.) conversion scheme names 
  use I18N::Charset;
  $sCharset = iana_charset_name('WinCyrillic');
  # $sCharset is now 'windows-1251'
  $sCharset = umap_charset_name('Adobe DingBats');
  # $sCharset is now 'ADOBE-DINGBATS' which can be passed to Unicode::Map->new()
  $sCharset = map8_charset_name('windows-1251');
  # $sCharset is now 'cp1251' which can be passed to Unicode::Map8->new()
  $sCharset = umu8_charset_name('x-sjis');
  # $sCharset is now 'sjis' which can be passed to Unicode::MapUTF8->new()
  $sCharset = libi_charset_name('x-sjis');
  # $sCharset is now 'MS_KANJI' which can be passed to `iconv -f $sCharset ...`
  $sCharset = enco_charset_name('Shift-JIS');
  # $sCharset is now 'shiftjis' which can be passed to Encode::from_to()
  I18N::Charset::add_iana_alias('my-japanese' => 'iso-2022-jp');
  I18N::Charset::add_map8_alias('my-arabic' => 'arabic7');
  I18N::Charset::add_umap_alias('my-hebrew' => 'ISO-8859-8');
  I18N::Charset::add_libi_alias('my-sjis' => 'x-sjis');
  I18N::Charset::add_enco_alias('my-japanese' => 'shiftjis');
The "I18N::Charset" module
    provides access to the IANA Character Set Registry names for identifying
    character encoding schemes. It also maps those names to the character set
    names used by the Unicode::Map, Unicode::Map8, and Unicode::MapUTF8
    modules, by the iconv library, and by the Encode module. 
So, for example, if you get an HTML document with a META
    CHARSET="..." tag, you can fairly quickly determine which
    Unicode::MapXXX module can be used to convert it to Unicode. 
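As a rough illustration of that workflow, the sketch below pulls a charset label out of an HTML fragment and resolves it to its IANA name. The helper charset_from_html() and its regular expression are assumptions made for this example and are not part of I18N::Charset; a real application would more likely use an HTML parser.

```perl
use strict;
use warnings;
use I18N::Charset;

# Hypothetical helper (not part of I18N::Charset): extract the charset label
# from an HTML fragment with a deliberately simple regular expression.
sub charset_from_html {
    my ($html) = @_;
    return $html =~ /charset\s*=\s*["']?([A-Za-z0-9._:+-]+)/i ? $1 : undef;
}

my $html  = '<meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">';
my $label = charset_from_html($html);

# iana_charset_name() returns undef for labels it cannot identify.
my $iana = defined $label ? iana_charset_name($label) : undef;

print defined $iana
    ? "Document declares '$label', IANA name '$iana'\n"
    : "No recognizable charset declaration found\n";
```

The resolved name can then be handed to whichever of the backend lookups (map8_, umap_, umu8_, libi_, enco_) matches the conversion module you have installed.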
If you don't have the module Unicode::Map installed, the umap_
    functions will always return undef. If you don't have the module
    Unicode::Map8 installed, the map8_ functions will always return undef. If
    you don't have the module Unicode::MapUTF8 installed, the umu8_ functions
    will always return undef. If you don't have the iconv library installed, the
    libi_ functions will always return undef. If you don't have the Encode
    module installed, the enco_ functions will always return undef. 
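Because each family of functions returns undef when its backend is missing, calling code usually has to check every result. The following is a minimal sketch of one way to do that; the helper pick_backend() and its order of preference are assumptions made for illustration, not something the module provides.

```perl
use strict;
use warnings;
use I18N::Charset;

# Hypothetical helper (not part of I18N::Charset): try each backend lookup in
# an arbitrary order of preference and return the first defined result.
sub pick_backend {
    my ($label) = @_;
    my @lookups = (
        [ 'Encode'           => \&enco_charset_name ],
        [ 'Unicode::Map8'    => \&map8_charset_name ],
        [ 'Unicode::Map'     => \&umap_charset_name ],
        [ 'Unicode::MapUTF8' => \&umu8_charset_name ],
        [ 'iconv'            => \&libi_charset_name ],
    );
    for my $pair (@lookups) {
        my ($backend, $lookup) = @$pair;
        my $name = $lookup->($label);
        return ($backend, $name) if defined $name;
    }
    return;    # empty list: no installed backend recognizes the label
}

my ($backend, $name) = pick_backend('windows-1251');
print defined $backend
    ? "Use $backend with the name '$name'\n"
    : "No conversion backend is available for this character set\n";
```

Which backend answers first depends entirely on what is installed on the machine running the script.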
There are four main conversion routines:
    iana_charset_name(),
    map8_charset_name(),
    umap_charset_name(), and
    umu8_charset_name(). 
  - iana_charset_name()
 
  - This function takes a string containing the name of a character set and
      returns a string which contains the official IANA name of the character
      set identified. If no valid character set name can be identified, then
      "undef" will be returned. The case and
      punctuation within the string are not important.
    
    
    $sCharset = iana_charset_name('WinCyrillic');
    
   
  - mime_charset_name()
 
  - This function takes a string containing the name of a character set and
      returns a string which contains the preferred MIME name of the character
      set identified. If no valid character set name can be identified, then
      "undef" will be returned. The case and
      punctuation within the string are not important.
    
    
    $sCharset = mime_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');
    
   
  - enco_charset_name()
 
  - This function takes a string containing the name of a character set and
      returns a string which contains a name of the character set suitable to be
      passed to the Encode module. If no valid character set name can be
      identified, or if Encode is not installed, then
      "undef" will be returned. The case and
      punctuation within the string are not important.
    
    
    $sCharset = enco_charset_name('Extended_UNIX_Code_Packed_Format_for_Japanese');
    
   
  - libi_charset_name()
 
  - This function takes a string containing the name of a character set and
      returns a string which contains a name of the character set suitable to be
      passed to iconv. If no valid character set name can be identified, then
      "undef" will be returned. The case and
      punctuation within the string are not important.
    
    
    $sCharset = libi_charset_name('Extended_UNIX_Code_Packed_Format_for_Korean');
    
   
  - mib_to_charset_name
 
  - This function takes a string containing the MIBenum of a character set and
      returns a string which contains a name for the character set. If the given
      MIBenum does not correspond to any character set, then
      "undef" will be returned.
    
    
    $sCharset = mib_to_charset_name('3');
    
   
  - mib_charset_name
 
  - This is a synonym for mib_to_charset_name.
 
  - charset_name_to_mib
 
  - This function takes a string containing the name of a character set in
      almost any format and returns a MIBenum for the character set. For
      IANA-registered character sets, this is the IANA-registered MIB. For
      non-IANA character sets, this is an unambiguous unique string whose only
      use is to pass to other functions in this module. If no valid character
      set name can be identified, then "undef"
      will be returned.
    
    
    $iMIB = charset_name_to_mib('US-ASCII');
    
   
  - map8_charset_name()
 
  - This function takes a string containing the name of a character set (in
      almost any format) and returns a string which contains a name for the
      character set that can be passed to Unicode::Map8::new(). Note: the
      returned string will be capitalized just like the name of the .bin file in
      the Unicode::Map8::MAPS_DIR directory. If no valid character set name can
      be identified, then "undef" will be
      returned. The case and punctuation within the argument string are not
      important.
    
    
    $sCharset = map8_charset_name('windows-1251');
    
   
  - umap_charset_name()
 
  - This function takes a string containing the name of a character set (in
      almost any format) and returns a string which contains a name for the
      character set that can be passed to Unicode::Map::new(). If no
      valid character set name can be identified, then
      "undef" will be returned. The case and
      punctuation within the argument string are not important.
    
    
    $sCharset = umap_charset_name('hebrew');
    
   
  - umu8_charset_name()
 
  - This function takes a string containing the name of a character set (in
      almost any format) and returns a string which contains a name for the
      character set that can be passed to Unicode::MapUTF8::new(). If no
      valid character set name can be identified, then
      "undef" will be returned. The case and
      punctuation within the argument string are not important.
    
    
    $sCharset = umu8_charset_name('windows-1251');
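The MIBenum functions described above can be combined into a round trip, sketched below. Only functions documented in this page are used; the labels are taken from the examples above, and the printed values depend on the registry data shipped with your installed version.

```perl
use strict;
use warnings;
use I18N::Charset;

# Resolve a name to its MIBenum, then map the MIBenum back to a name.
my $mib  = charset_name_to_mib('US-ASCII');
my $name = defined $mib ? mib_to_charset_name($mib) : undef;

if (defined $name) {
    print "US-ASCII has MIBenum $mib, which maps back to '$name'\n";
}

# Different spellings of one character set should resolve to the same MIBenum.
for my $label ('windows-1251', 'WinCyrillic') {
    my $m = charset_name_to_mib($label);
    printf "%-14s => %s\n", $label, defined $m ? $m : '(unknown)';
}
```

Comparing MIBenums is a convenient way to test whether two labels name the same character set without worrying about which spelling the registry prefers.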
    
   
 
There is one function which can be used to obtain a list of all
    IANA-registered character set names. 
  - all_iana_charset_names()
 
  - Returns a list of all registered IANA character set names. The names are
      not in any particular order.
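Since the list comes back unordered, callers will typically sort or filter it. A small sketch, assuming only that the function returns a plain list of strings; the function is called with its full package name in case it is not exported by default.

```perl
use strict;
use warnings;
use I18N::Charset;

# Sort case-insensitively so the output is stable from run to run.
my @names = sort { lc($a) cmp lc($b) } I18N::Charset::all_iana_charset_names();

printf "%d IANA names registered\n", scalar @names;

# Show only the ISO 8859 family as an example of filtering.
print "  $_\n" for grep { /8859/ } @names;
```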
 
 
This module supports several semi-private routines for specifying
    character set name aliases. 
  - add_iana_alias()
 
  - This function takes two strings: a new alias, and a target IANA Character
      Set Name (or another alias). It defines the new alias to refer to that
      character set name (or to the character set name to which the second alias
      refers).
    
Returns the target character set name of the successfully
        installed alias. Returns 'undef' if the target character set name, or
        the character set name to which the second alias refers, is not
        registered. 
    
      I18N::Charset::add_iana_alias('my-alias1' => 'Shift_JIS');
    
    With this code, "my-alias1" becomes an alias for the
        existing IANA character set name 'Shift_JIS'. 
    
      I18N::Charset::add_iana_alias('my-alias2' => 'sjis');
    
    With this code, "my-alias2" becomes an alias for the
        IANA character set name referred to by the existing alias 'sjis' (which
        happens to be 'Shift_JIS'). 
   
  - add_map8_alias()
 
  - This function takes two strings: a new alias, and a target Unicode::Map8
      Character Set Name (or an existing alias to a Map8 name). It defines the
      new alias to refer to that mapping name (or to the mapping name to which
      the second alias refers).
    
If the first argument is a registered IANA character set name,
        then all aliases of that IANA character set name will end up pointing to
        the target Map8 mapping name. 
    Returns the target mapping name of the successfully installed
        alias. Returns 'undef' if the target mapping name, or the mapping name
        to which the second alias refers, is not registered. 
    
      I18N::Charset::add_map8_alias('normal' => 'ANSI_X3.4-1968');
    
    With the above statement, "normal" becomes an alias
        for the existing Unicode::Map8 mapping name 'ANSI_X3.4-1968'. 
    
      I18N::Charset::add_map8_alias('normal' => 'US-ASCII');
    
    With the above statement, "normal" becomes an alias
        for the existing Unicode::Map8 mapping name 'ANSI_X3.4-1968' (which is
        what "US-ASCII" is an alias for). 
    
      I18N::Charset::add_map8_alias('IBM297' => 'EBCDIC-CA-FR');
    
    With the above statement, "IBM297" becomes an alias
        for the existing Unicode::Map8 mapping name 'EBCDIC-CA-FR'. As a side
        effect, all the aliases for 'IBM297' (i.e. 'cp297' and 'ebcdic-cp-fr')
        also become aliases for 'EBCDIC-CA-FR'. 
   
  - add_umap_alias()
 
  - This function works identically to add_map8_alias() above, but
      operates on Unicode::Map encoding tables.
 
  - add_libi_alias()
 
  - This function takes two strings: a new alias, and a target iconv Character
      Set Name (or existing iconv alias). It defines the new alias to refer to
      that character set name (or to the character set name to which the
      existing alias refers).
    
Returns the target conversion scheme name of the successfully
        installed alias. Returns 'undef' if there is no such target conversion
        scheme or alias. 
    Examples: 
    
      I18N::Charset::add_libi_alias('my-chinese1' => 'CN-GB');
    
    With this code, "my-chinese1" becomes an alias for
        the existing iconv conversion scheme 'CN-GB'. 
    
      I18N::Charset::add_libi_alias('my-chinese2' => 'EUC-CN');
    
    With this code, "my-chinese2" becomes an alias for
        the iconv conversion scheme referred to by the existing alias 'EUC-CN'
        (which happens to be 'CN-GB'). 
   
  - add_enco_alias()
 
  - This function takes two strings: a new alias, and a target Encode encoding
      name (or existing Encode alias). It defines the new alias to refer to
      that encoding name (or to the encoding to which the existing alias
      refers).
    
Returns the target encoding name of the successfully installed
        alias. Returns 'undef' if there is no such encoding or alias. 
    Examples: 
    
      I18N::Charset::add_enco_alias('my-japanese1' => 'jis0201-raw');
    
    With this code, "my-japanese1" becomes an alias for
        the existing encoding 'jis0201-raw'. 
    
      I18N::Charset::add_enco_alias('my-japanese2' => 'my-japanese1');
    
    With this code, "my-japanese2" becomes an alias for
        the encoding referred to by the existing alias 'my-japanese1' (which
        happens to be 'jis0201-raw' after the previous call). 
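The alias routines above are not shown with their return values being checked, so here is a minimal sketch using add_iana_alias(). The alias names are made up for the example, and the routine is called with its full package name, as in the examples above.

```perl
use strict;
use warnings;
use I18N::Charset;

# Install an alias for a registered IANA name, then an alias for that alias.
my $direct = I18N::Charset::add_iana_alias('my-alias1' => 'Shift_JIS');
my $chain  = I18N::Charset::add_iana_alias('my-alias2' => 'my-alias1');

# Both aliases should now resolve through the ordinary lookup function.
for my $alias ('my-alias1', 'my-alias2') {
    my $resolved = iana_charset_name($alias);
    printf "%-10s => %s\n", $alias,
        defined $resolved ? $resolved : '(not installed)';
}
```

Checking the return value for undef is the only way to learn that an alias was rejected, so it is worth testing before relying on the new name.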
   
 
  - Unicode::Map
 
  - Convert strings from various multi-byte character encodings to and from
      Unicode.
 
  - Unicode::Map8
 
  - Convert strings from various 8-bit character encodings to and from
      Unicode.
 
  - Jcode
 
  - Convert strings among various Japanese character encodings and
    Unicode.
 
  - Unicode::MapUTF8
 
  - A wrapper around all three of these character set conversion
      distributions.
 
 
Martin Thurn, "mthurn@cpan.org",
    <http://tinyurl.com/nn67z>. 
This module is free software; you can redistribute it and/or
    modify it under the same terms as Perl itself. 
 
 