HTML::Entities::ImodePictogram(3) User Contributed Perl Documentation HTML::Entities::ImodePictogram(3)

HTML::Entities::ImodePictogram - encode / decode i-mode pictogram

  use HTML::Entities::ImodePictogram;

  $html      = encode_pictogram($rawtext);
  $rawtext   = decode_pictogram($html);
  $cleantext = remove_pictogram($rawtext);

  use HTML::Entities::ImodePictogram qw(find_pictogram);

  $num_found = find_pictogram($rawtext, \&callback);

HTML::Entities::ImodePictogram handles HTML entities for i-mode pictograms (emoji), which are assigned in the Shift_JIS private area.

See http://www.nttdocomo.co.jp/i/tag/emoji/index.html for details about i-mode pictograms.

All functions in this module assume that input and output strings are encoded in Shift_JIS. See Jcode for conversion between Shift_JIS and other encodings such as EUC-JP or UTF-8.
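As a minimal sketch of the conversion step, the following turns UTF-8 bytes into Shift_JIS bytes before they would be handed to this module. It uses the core Encode module rather than Jcode, which is a substitution made here for illustration. Whether pictogram bytes in the private area survive such a conversion depends on the converter, so this sketch converts ordinary text only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes as they might arrive from a file or a form submission.
my $utf8_bytes = "\xE3\x81\x82";    # HIRAGANA LETTER A (U+3042)

# Decode to Perl characters, then encode to Shift_JIS bytes.
my $sjis_bytes = encode('shiftjis', decode('UTF-8', $utf8_bytes));

# Show the resulting bytes in hexadecimal.
my $sjis_hex = uc unpack 'H*', $sjis_bytes;
print "$sjis_hex\n";                # 82A0
```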

This module exports the following functions by default.

encode_pictogram
  $html = encode_pictogram($rawtext);
  $html = encode_pictogram($rawtext, unicode => 1);
    

Encodes pictogram characters in raw text into HTML entities. If $rawtext contains extended pictograms, they are encoded in Unicode format. If you pass the "unicode" option explicitly, all pictogram characters are encoded in Unicode format ("&#xXXXX;"). Otherwise, encoding is done in decimal format ("&#NNNNN;").
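The two entity formats can be illustrated without the module. This is only a sketch of what the formats look like, not the module's implementation. The decimal form is the Shift_JIS code read as a number, and the Unicode form is a hexadecimal reference to a private-use code point. The mapping of 0xF89F to U+E63E is an assumption here and should be checked against NTT DoCoMo's table.

```perl
use strict;
use warnings;

# Shift_JIS bytes of one basic pictogram (code 0xF89F).
my $sjis_bytes = "\xF8\x9F";

# Decimal format: the two bytes read as a big-endian 16-bit number.
my $sjis_code   = unpack 'n', $sjis_bytes;
my $decimal_ref = sprintf '&#%d;', $sjis_code;

# Unicode format: hexadecimal reference to the private-use code point.
# ASSUMPTION: 0xF89F corresponds to U+E63E.
my $unicode_cp  = 0xE63E;
my $unicode_ref = sprintf '&#x%04X;', $unicode_cp;

print "$decimal_ref\n";    # &#63647;
print "$unicode_ref\n";    # &#xE63E;
```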

decode_pictogram
  $rawtext = decode_pictogram($html);
    

Decodes HTML entities for pictograms (both "&#xXXXX;" and "&#NNNNN;") into raw text in Shift_JIS.
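The reverse direction for the decimal format can be sketched as below. The helper decode_decimal_sketch() is hypothetical and is not part of this module. It assumes that pictograms occupy the Shift_JIS codes 0xF89F through 0xF9FC, and it leaves every other numeric entity untouched.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's decode_pictogram().
# ASSUMPTION: pictograms occupy Shift_JIS codes 0xF89F..0xF9FC.
sub decode_decimal_sketch {
    my ($html) = @_;
    $html =~ s{&#(\d+);}{
        ($1 >= 0xF89F && $1 <= 0xF9FC)
            ? pack('n', $1)     # pictogram: back to two Shift_JIS bytes
            : "&#$1;"           # anything else: keep the entity as it was
    }ge;
    return $html;
}

# &#63647; is in the assumed range; &#169; is an ordinary entity.
my $raw = decode_decimal_sketch('Fine &#63647; &#169;');
```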

remove_pictogram
  $cleantext = remove_pictogram($rawtext);
    

Removes pictogram characters from raw text.

This module also exports the following function on demand.

find_pictogram
  $num_found = find_pictogram($rawtext, \&callback);
    

Finds pictogram characters in raw text and executes the callback for each one found. It returns the total number of characters found in the text.

The callback is given three arguments. The first is the pictogram character that was found, the second is a decimal number that represents the Shift_JIS codepoint of the character, and the third is its Unicode codepoint. Whatever the callback returns replaces the original character in the text.

Here is a stub implementation of encode_pictogram(), which is a good example of how to use find_pictogram(). Note that this example version doesn't support extended pictograms.

  sub encode_pictogram {
      my $text = shift;
      find_pictogram($text, sub {
                         my($char, $number, $cp) = @_;
                         return '&#' . $number . ';';
                     });
      return $text;
  }
    

  • This module is slow, because the regex used here has to match every character in the text, not only pictograms. This is due to the difficulty of finding character boundaries in Shift_JIS-encoded text.
  • Extended pictogram support in this module is not complete. If you handle pictogram characters in Unicode, try the Encode module with perl 5.8.0, or Unicode::Japanese.
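The character-boundary problem behind the first point can be sketched as follows. The byte patterns are simplified assumptions and are not copied from the module's source. A naive search for the pictogram byte pattern can match across two neighbouring double-byte characters, whereas a scan that consumes the text one character at a time cannot.

```perl
use strict;
use warnings;

# ASSUMPTION: simplified byte patterns, not taken from the module.
my $re_pict = qr/\xF8[\x9F-\xFC]|\xF9[\x40-\x7E\x80-\xFC]/;              # pictogram area
my $re_char = qr/[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|[\x00-\xFF]/;  # any one character

# Naive search: looks for the byte pattern anywhere, ignoring boundaries.
sub count_naive {
    my ($bytes) = @_;
    my $count = () = $bytes =~ /$re_pict/g;
    return $count;
}

# Boundary-aware search: consumes the text one character at a time,
# so a match can only start where a character starts.
sub count_boundary_aware {
    my ($bytes) = @_;
    my $count = 0;
    while ($bytes =~ /($re_pict)|$re_char/g) {
        $count++ if defined $1;
    }
    return $count;
}

# Two ordinary double-byte codes (0x88F8, 0x9F40) followed by one
# pictogram (0xF89F). The bytes F8 9F also occur across the first boundary.
my $text = "\x88\xF8\x9F\x40\xF8\x9F";

print count_naive($text), "\n";           # 2 (one false positive)
print count_boundary_aware($text), "\n";  # 1
```

The boundary-aware version has to touch every character of the input, which is the cost the note above describes.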

Tatsuhiko Miyagawa <miyagawa@bulknews.net>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

HTML::Entities, Unicode::Japanese, http://www.nttdocomo.co.jp/p_s/imode/tag/emoji/
2003-06-23 perl v5.32.1
