Prima::Drawable::Glyphs(3)    User Contributed Perl Documentation    Prima::Drawable::Glyphs(3)
NAME
Prima::Drawable::Glyphs - helper routines for bi-directional text input and
complex scripts output
SYNOPSIS
use Prima qw(Application);
$::application-> begin_paint;
$::application-> text_shape_out('אפס123', 0,0);
$::application-> end_paint;
# displayed as: 123ספא
DESCRIPTION
The class implements an abstraction over a set of glyphs that can be rendered to
represent text strings. Objects of the class are created and returned by
"Prima::Drawable::text_shape" calls; see more in "text_shape" in Prima::Drawable.
A "Prima::Drawable::Glyphs" object is a blessed array reference that can contain
two, four, or five packed arrays of 16-bit integers, representing, respectively,
glyph indexes, character indexes, glyph advances, per-glyph position offsets,
and font indexes. The class also implements several sets of helper routines
that address common tasks when displaying glyph-based strings.
Structure
Each sub-array is an instance of "Prima::array", an efficient plain-memory
structure that provides a standard Perl array interface over a string scalar
filled with fixed-width integers.
The following methods provide read-only access to these arrays:
- glyphs
- Contains a set of unsigned 16-bit integers, each a glyph number in the font
that was used for shaping the text. These glyph numbers are only applicable to
that font. In vector fonts, zero is usually treated as the default glyph, used
when shaping cannot map a character. In bitmap fonts, this number is usually
the same as "defaultChar".
This array is recognized as a special case when passed to "text_out" or
"get_text_width", which can process it without the other arrays. In that case,
however, no special advances or glyph positions are taken into account.
Glyphs do not necessarily map one-to-one to characters, and quite often do not,
even in English left-to-right text. For example, character combinations like
"ff", "fi", and "fl" may be mapped to single ligature glyphs. When right-to-left
(RTL) text direction is taken into account, the order of the glyphs may change
too. See "indexes" below for how glyphs are mapped to characters.
- indexes
- Contains a set of unsigned 16-bit integers, each a text offset into the text
that was used for shaping. Each glyph position thus points to the first
character in the text that maps to the glyph.
There can be more than one character per glyph, as in the "ff" ligature
example above.
There can also be cases where more than one character maps to more than one
glyph, for example in Indic scripts. In such cases it is easier to operate
neither on character offsets nor on glyph offsets, but on clusters. Each
cluster is an individual syntax unit that contains one or more characters
rendered by one or more glyphs.
In addition to the text offset, each index value can be flagged with the
"to::RTL" bit, which signifies that the character in question has RTL
direction. Semitic letters from RTL languages are not the only characters
with this attribute set. Spaces in these languages normally get the RTL bit
too, and sometimes numbers do as well. Explicit direction control characters
from the U+20XX block can cause any character to be assigned, or not
assigned, the RTL bit.
The array has an extra item appended to its end: the length of the text that
was used for shaping. This makes it easy to calculate cluster lengths in
characters, including that of the last cluster, since the difference between
adjacent cluster indexes is basically the cluster length.
The array is not used for text drawing or width calculation. It is used only
for conversion between character, glyph, and cluster coordinates (see
"Coordinates" below).
- advances
- Contains a set of unsigned 16-bit integers, each the pixel distance that the
corresponding glyph occupies. If the advances array is not present, or if it
was force-filled by the "advances" option in "text_shape", a glyph advance
value is basically the sum of the a, b, and c widths of the corresponding
glyph. However, these values can differ depending on the shaping input.
One such case is combining graphemes. Text consisting of two characters, "A"
and the combining grave accent U+0300, should be drawn as a single "À" symbol,
but the font may not have that single glyph, only the two individual glyphs
"A" and "`". The grave glyph has its own advance for standalone use, and that
advance should be ignored here. The shaper achieves this by setting the
advance of "`" to zero.
The array content is respected by "text_out" and "get_text_width". It can be
changed at will to produce gaps in the text quite easily. For example,
"Prima::Edit" uses this to display tab characters as spaces with an 8x advance.
- positions
- Contains a set of pairs of signed 16-bit integers, each pair being the X and
Y pixel offset of a glyph. For example, in the "À" case above, the grave glyph
"`" may be positioned differently on the vertical axis in the "À" and "à"
graphemes.
The array is respected by "text_out" (but not by "get_text_width").
- fonts
- Contains a set of unsigned 16-bit integers, each an index in the font
substitution list (see "fontMapperPalette" in Prima::Drawable). Zero means the
current font.
Font substitution is applied by "text_shape" when the "polyfont" option is set
(it is by default) and the current font cannot provide all the needed glyphs.
If the current font contains all the needed glyphs, this entry is not present
at all.
The array is respected by "text_out" and "get_text_width".
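The following pure-Perl sketch shows how advances and positions combine. It uses plain Perl arrays in place of the Prima::array objects. It assumes the common shaping convention: each glyph is drawn at the current pen position, shifted by its (X, Y) pair from positions, and the pen then moves right by the glyph's advance. The helper names glyph_origins and advances_width are hypothetical and not part of the class. The real drawing is done by text_out.

```perl
use strict;
use warnings;

# Hypothetical helper: compute where each glyph would be drawn, assuming the
# pen starts at ($x, $y), each glyph is placed at pen + (x_offset, y_offset)
# taken from the flat positions array, and the pen then advances by the
# glyph's advance. Without a positions array all offsets are zero.
sub glyph_origins {
    my ($advances, $positions, $x, $y) = @_;
    my @origins;
    my $pen = $x;
    for my $i (0 .. $#$advances) {
        my ($dx, $dy) = $positions ? @{$positions}[2 * $i, 2 * $i + 1] : (0, 0);
        push @origins, [ $pen + $dx, $y + $dy ];
        $pen += $advances->[$i];
    }
    return @origins;
}

# Hypothetical helper: positions do not affect the width, so under this
# sketch's assumption the string width is simply the sum of the advances.
sub advances_width {
    my ($advances) = @_;
    my $w = 0;
    $w += $_ for @$advances;
    return $w;
}
```

Take the "A" plus combining grave case, with made-up numbers. Advances (10, 0) and an accent offset of (-6, 3) place the accent at (4, 3), over the letter. The width stays at 10 pixels, the advance of "A" alone.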
Coordinates
In addition to the natural character coordinates, where each index is a text
offset that can be used directly in Perl's "substr" function, the
"Prima::Drawable::Glyphs" class offers two additional coordinate systems. These
help abstract the object data for display and navigation.
The glyph coordinate system is a rather straightforward copy of the character
coordinate system, where each number is an offset in the "glyphs" array. These
offsets can likewise be used to address individual glyphs, indexes, advances,
and positions. However, they are not easy to use when one needs, for example,
to select a grapheme with the mouse, or to break a set of glyphs so that no
grapheme is split. Such tasks are easier to manage in the cluster coordinate
system.
The cluster coordinates represent a virtually superimposed set of offsets.
Each offset corresponds to a set of one or more characters displayed by one or
more glyphs. Most of the useful functions below operate in this system.
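The following sketch derives cluster coordinates from the indexes array. It treats each run of consecutive glyphs that share the same character offset as one cluster, and it uses the trailing text length to size the last cluster. It makes two assumptions. First, the RTL flag is taken to be the top bit (0x8000) of each 16-bit value, standing in for the real to::RTL constant. Second, the glyphs of a cluster are assumed to be contiguous. clusters_from_indexes is a hypothetical helper, not the class's glyph2cluster or cluster2range.

```perl
use strict;
use warnings;

# Assumption: 0x8000 stands in for the to::RTL flag; check the actual value
# of to::RTL in your Prima version before relying on it.
use constant RTL_BIT => 0x8000;

# Given the indexes array (including its trailing text length), return
#  - an array ref mapping each glyph to its cluster number, where a new
#    cluster starts whenever the character offset differs from the previous
#    glyph's offset;
#  - an array ref of [first character offset, character count] per cluster.
sub clusters_from_indexes {
    my @indexes     = @_;
    my $text_length = pop @indexes;                   # the extra trailing item
    my @offsets     = map { $_ & ~RTL_BIT } @indexes; # strip the RTL flag

    my (@glyph2cluster, @cluster_offsets);
    for my $i (0 .. $#offsets) {
        push @cluster_offsets, $offsets[$i]
            if $i == 0 || $offsets[$i] != $offsets[$i - 1];
        push @glyph2cluster, $#cluster_offsets;
    }

    # Cluster lengths: sort the first offsets in logical order and take the
    # differences, with the text length closing the last cluster.
    my @sorted = ((sort { $a <=> $b } @cluster_offsets), $text_length);
    my %length = map { ($sorted[$_] => $sorted[$_ + 1] - $sorted[$_]) } 0 .. $#sorted - 1;

    return (\@glyph2cluster, [ map { [ $_, $length{$_} ] } @cluster_offsets ]);
}
```

Consider "office" shaped with an "ffi" ligature, which gives the indexes (0, 1, 4, 5) plus the trailing length 6. The sketch yields four clusters of 1, 3, 1, and 1 characters.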
Selection
In practice, the most useful coordinates for implementing selection are either
character or cluster coordinates, but not glyph ones. Character-based
selections make extraction or replacement of the selected text trivial.
Cluster-based ones make it easier to manipulate the selection itself (for
example with Shift+arrow keys).
The class supports both by operating on selection maps or selection chunks.
Each represents the same information in a different way. For example,
consider a number embedded in bidi text. For the sake of clarity I'll use
Latin characters here. Let's have a text scalar containing these characters:
ABC123
where ABC is right-to-left text, and which, when rendered on screen, should be
displayed as
123CBA
(and the index array is (3,4,5,2,1,0)).
Next, the user clicks the mouse between A and B (at text offset 1), then drags
the mouse to the left, and finally stops between characters 2 and 3 (text
offset 4). The resulting selection should not be, as one might naively expect,
this:
123CBA
__^^^_
but this instead:
123CBA
^^_^^_
This is because, in logical order, the character following C is 1, and the
selected sub-text spans characters 1 to 4.
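The selection in this example can be reproduced with a small sketch. It walks the index array in visual order and marks each glyph with 1 if its character offset falls into the selected logical range, and with 0 otherwise. The sketch makes these assumptions:

- The range is inclusive (offsets 1 to 4, as in the text).
- It works on plain Perl arrays.
- There is one glyph per cluster, which holds for this example.
- The index array carries its usual trailing text length (6).
- The RTL flags on the last three indexes use an assumed 0x8000 bit as a stand-in for to::RTL.

selection_map_for_range is a hypothetical name.

```perl
use strict;
use warnings;

use constant RTL_BIT => 0x8000;   # assumed stand-in for to::RTL

# Hypothetical helper: for each glyph in visual order, return 1 if the
# character it starts at lies within the logical range $from..$to
# (inclusive), else 0.
sub selection_map_for_range {
    my ($indexes, $from, $to) = @_;
    my @glyph_offsets = @{$indexes}[0 .. $#$indexes - 1];  # drop trailing length
    return map {
        my $offset = $_ & ~RTL_BIT;                        # strip the RTL flag
        ($offset >= $from && $offset <= $to) ? 1 : 0;
    } @glyph_offsets;
}
```

For the index array (3, 4, 5, 2, 1, 0) with RTL flags on 2, 1, and 0, and the range 1 to 4, the result is 1,1,0,1,1,0. This corresponds to the "^^_^^_" picture, not the naive "__^^^_".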
The class can encode such information in a map, i.e. an array of integers
"1,1,0,1,1,0", where each entry is 1 or 0 depending on whether the cluster is
selected or not. Alternatively, the same information can be encoded in chunks,
or RLE sets, as the array "0,2,1,2,1". Here the first integer is the number of
non-selected clusters to display, the second the number of selected clusters,
the third the number of non-selected ones again, and so on. If the first
cluster is selected, the first integer in the result is 0.
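The relation between the two encodings can be sketched as a pair of conversion functions. They follow the convention described above: chunks alternate non-selected and selected counts, starting with a non-selected count that may be zero. Both are hypothetical helpers. The class computes its own maps and chunks with the selection_map_XXX and selection_chunks_XXX methods.

```perl
use strict;
use warnings;

# Convert a selection map (one 0/1 entry per cluster) into RLE chunks that
# alternate non-selected/selected counts, starting with a non-selected count.
sub map2chunks {
    my @chunks = (0);
    my $state  = 0;                   # 0 = non-selected, 1 = selected
    for my $bit (@_) {
        my $want = $bit ? 1 : 0;
        if ($want != $state) {        # selection state flips: open a new chunk
            push @chunks, 0;
            $state = $want;
        }
        $chunks[-1]++;
    }
    return @chunks;
}

# Expand RLE chunks back into a flat selection map.
sub chunks2map {
    my @map;
    my $state = 0;
    for my $len (@_) {
        push @map, ($state) x $len;
        $state = 1 - $state;
    }
    return @map;
}
```

With the map from the example, map2chunks(1,1,0,1,1,0) returns (0,2,1,2,1), and chunks2map turns that back into the original map.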
Bidi input
When sending input to a widget in order to type in text, the otherwise trivial
task of figuring out at which position the text should be inserted (or
removed, for that matter) becomes interesting when there are characters with
mixed direction.
For example, take the Latin text "AB" with the cursor positioned between "A"
and "B". It is indeed trivial to figure out that when the user types "C", the
result should become "ACB". Likewise, when the text is RTL and both text and
input are Arabic, the result is the same. Now take the text "A1", which is
displayed as "1A" because of RTL shaping, with the cursor positioned between 1
(LTR) and "A" (RTL). It is not clear whether the new input should be appended
after 1, giving "A1C", or after "A", giving "AC1".
There is no easy solution to this problem, and different programs approach it
differently. Some go as far as providing two cursors, one for each direction.
The class offers its own solution, which uses some primitive heuristics to
detect whether the cursor belongs to the left or to the right glyph. This area
can be enhanced, and any help from native users of RTL languages would be
greatly appreciated.
API
- abc $CANVAS, $INDEX
- Returns the a, b, c metrics of the glyph $INDEX.
- advances
- Read-only accessor to the advances array, see Structure above.
- clone
- Clones the object
- cluster2glyph $FROM, $LENGTH
- Maps a range of clusters starting at $FROM with size $LENGTH into the
corresponding range of glyphs. If $LENGTH is undefined, the range extends from
$FROM to the end of the object.
- cluster2index $CLUSTER
- Returns character offset of the first character in cluster
$CLUSTER.
Note: the result may contain the "to::RTL" flag.
- cluster2range $CLUSTER
- Returns character offset of the first character in cluster
$CLUSTER and how many characters are there in the
cluster.
- clusters
- Returns an array of integers, each being the offset of the first character
of a cluster.
- cursor2offset $AT_CLUSTER, $PREFERRED_RTL
- Given a cursor positioned next to the cluster $AT_CLUSTER, runs simple
heuristics to find which character offset it corresponds to. $PREFERRED_RTL is
used when the object data are not sufficient to decide.
See "Bidi input" above.
- def $CANVAS, $INDEX
- Returns the d, e, f metrics of the glyph $INDEX.
- fonts
- Read-only accessor to the font indexes, see Structure above.
- get_box $CANVAS
- Returns box metrics of the glyph object.
See "get_text_box" in Prima::Drawable.
- get_sub $FROM, $LENGTH
- Extracts and clones a new object that contains data from cluster offset
$FROM, with cluster length $LENGTH.
- get_sub_box $CANVAS, $FROM, $LENGTH
- Calculates box metrics of a glyph string from the cluster $FROM with size
$LENGTH.
- get_sub_width $CANVAS, $FROM, $LENGTH
- Calculates the pixel width of a glyph string from the cluster $FROM with size
$LENGTH.
- get_width $CANVAS, $WITH_OVERHANGS
- Returns the width of the glyph object, including overhangs if
$WITH_OVERHANGS is set.
- glyph2cluster $GLYPH
- Returns the cluster that contains $GLYPH.
- glyphs
- Read-only accessor to the glyph indexes, see Structure above.
- glyph_lengths
- Returns an array where each glyph position holds the number of glyphs that
the cluster occupies at that position.
- index2cluster $INDEX
- Returns the cluster that contains the character offset
$INDEX.
- indexes
- Read-only accessor to the indexes, see Structure above.
- index_lengths
- Returns an array where each glyph position holds the number of characters
that the cluster occupies at that position.
- justify CANVAS, TEXT, WIDTH, %OPTIONS
- Umbrella call for "justify_interspace" if $OPTIONS{letter} or
$OPTIONS{word} is set; for "justify_arabic" if $OPTIONS{kashida} is set; and
for "justify_tabs" if $OPTIONS{tabs} is set.
Returns a boolean flag whether the glyph object was changed or
not.
- justify_arabic CANVAS, TEXT, WIDTH, %OPTIONS
- Performs kashida justification of Arabic TEXT to the given WIDTH. Returns
either a success flag or new text with explicit tatweel characters inserted.
my $text = "\x{6a9}\x{634}\x{6cc}\x{62f}\x{647}";
my $g = $canvas->text_shape($text) or return;
$canvas->text_out($g, 10, 50);                     # original, unjustified
$g->justify_arabic($canvas, $text, 200) or return; # stretch in place to 200 pixels
$canvas->text_out($g, 10, 10);                     # justified result
Inserts tatweels only between Arabic letters that did not form any ligatures
in the glyph object, with at most one set of tatweels per word (if any). Does
not apply the justification if the letters in the word are rendered as LTR due
to embedding or explicit shaping options; only RTL letters are justified. If,
for some reason, the newly inserted tatweels do not form a monotonically
increasing series after shaping, justification of that word is skipped.
Note: does not use the JSTF font table. On Windows, results may differ from
native rendering.
Options:
If justification turns out to be needed, any ligatures with the newly inserted
tatweel glyphs are resolved via a call to "text_shape(%OPTIONS)". Any needed
shaping options, such as "language", may therefore be passed here.
- as_text BOOL = 0
- If set, returns new text with inserted tatweels, or undef if no
justification is possible.
If unset, runs in-place justification on the caller glyph object and returns
the boolean success flag.
- min_kashida INTEGER = 0
- Specifies the minimal width of a kashida strike to be inserted.
- kashida_width INTEGER
- The calculation needs the width of a tatweel glyph. Unless supplied by this
option, it is calculated dynamically. Also, when called in list context and
successful, returns "1, kashida_width", which can be reused in subsequent
calls.
- justify_interspace CANVAS, TEXT, WIDTH, %OPTIONS
- Performs in-place inter-letter and/or inter-word justification of TEXT to
the given WIDTH. Returns either a boolean flag indicating whether any change
was made, or new text with explicit space characters inserted.
Options:
- as_text BOOL = 0
- If set, returns new text with inserted spaces, or undef if no justification
is possible.
If unset, runs in-place justification on the caller glyph object and returns
the boolean success flag.
- letter BOOL = 1
- If set, runs an inter-letter spacing on all glyphs.
- max_interletter FLOAT = 1.05
- When inter-letter spacing is applied, it is applied first and can take up to
"$OPTIONS{max_interletter} * glyph_width" of space.
Inter-word spacing has no such limit. In the worst case, it can push two words
to the left and right edges of the enclosing 0 - WIDTH-1 rectangle.
- space_width INTEGER
- In "as_text" mode, the calculation may need the width of the space glyph.
Unless supplied by $OPTIONS{space_width}, it is calculated dynamically. Also,
when called in list context and successful, returns "1, space_width", which
can be reused in subsequent calls.
- word BOOL = 1
- If set, runs an inter-word spacing by extending advances on all space
glyphs.
- justify_tabs CANVAS, TEXT, %OPTIONS
- Expands tabs to $OPTIONS{tabs} (default: 8) spaces.
Needs the glyph ID and the advance of the space glyph to replace the tab
glyph. If $OPTIONS{glyph} and $OPTIONS{width} are not specified, calculates
them.
Returns a boolean flag indicating whether any change was made. On success, if
called in list context, also returns the space glyph ID and the space glyph
width for use in later calls.
- left_overhang
- First integer from the "overhangs"
result.
- log2vis
- Returns a map of integers where each character position corresponds to a
glyph position. The name is a leftover from pure fribidi shaping, where
"log2vis" and "vis2log" were mapper functions with the same functionality.
- n_clusters
- Calculates how many clusters the object contains.
- new @ARRAYS
- Creates a new object. Normally not called directly but from inside
"text_shape" calls.
- new_array NAME
- Creates an array suitable for the object for direct insertion, if manual
construction of the object is needed. F ex one may set missing
"fonts" array like this:
$obj->[ Prima::Drawable::Glyphs::FONTS() ] = $obj->new_array('fonts');
$obj->fonts->[0] = 1;
The newly created array is filled with zeros.
- new_empty
- Creates a new empty object.
- overhangs
- Calculates two pixel widths for the overhangs at the beginning and at the
end of the glyph string. This is used to emulate a "get_text_width" call with
the "to::AddOverhangs" flag.
- positions
- Read-only accessor to the positions array, see Structure above.
- reorder_text TEXT
- Returns a visual representation of
"TEXT" assuming it was the input of the
"text_shape" call that created the
object.
- reverse
- Creates a new object with all arrays reversed. Used for calculating pixel
offsets from the right end of a glyph string.
- right_overhang
- Second integer from the "overhangs"
result.
- selection2range $CLUSTER_START, $CLUSTER_END
- Converts a cluster selection range into a text selection range.
- selection_chunks_clusters, selection_chunks_glyphs $START, $END
- Given a text selection from positions $START to $END, calculates a set of
chunks, each representing a run of either selected or non-selected
clusters/glyphs.
- selection_diff $OLD, $NEW
- Given two chunk lists in the format returned by "selection_chunks_clusters"
or "selection_chunks_glyphs", calculates the list of chunks affected by the
selection change. When the user changes the text selection interactively,
this allows efficient repaints that redraw only the changed regions.
- selection_map_clusters, selection_map_glyphs $START, $END
- Same as "selection_chunks_XXX", but instead of RLE chunks returns a full
array with one entry per cluster/glyph. Each entry is a boolean value
indicating whether that cluster/glyph is to be displayed as selected.
- selection_walk $CHUNKS, $FROM, $TO = length, $SUB
- Walks the selection chunks array, as returned by "selection_chunks_XXX",
between the $FROM and $TO clusters/glyphs. For each chunk, calls the provided
"$SUB->($offset, $length, $selected)". Each call receives two integers, the
chunk offset and length, and a boolean flag indicating whether the chunk is
selected.
Can also be used on a result of "selection_diff", in which case the $selected
flag is irrelevant.
- sub_text_out $CANVAS, $FROM, $LENGTH, $X, $Y
- Optimized version of "$CANVAS->text_out(
$self->get_sub($FROM, $LENGTH), $X, $Y )".
- sub_text_wrap $CANVAS, $FROM, $LENGTH, $WIDTH, $OPT, $TABS
- Optimized version of "$CANVAS->text_wrap(
$self->get_sub($FROM, $LENGTH), $WIDTH, $OPT, $TABS )". The
result is also converted to chunks.
- text_length
- Returns the length of the text that was shaped and that produced the
object.
- x2cluster $CANVAS, $X, $FROM, $LENGTH
- Given a cluster range starting at $FROM with size $LENGTH, calculates how
many clusters would fit in width $X.
- _debug
- Dumps glyph object content in a readable format.
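The chunk-walking behaviour described for selection_walk can be made concrete with a pure-Perl sketch over an RLE chunk list. This is not the class's implementation. It assumes the following:

- $FROM and $TO delimit a half-open range [$FROM, $TO).
- Chunks start with a non-selected count.
- An undefined $TO means the total length of all chunks.

walk_chunks is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented callback contract: iterate over RLE
# chunks (non-selected count first), clip each chunk to [$from, $to), and
# call $cb->($offset, $length, $selected) for every non-empty piece.
sub walk_chunks {
    my ($chunks, $from, $to, $cb) = @_;
    unless (defined $to) {               # default: walk to the end of the chunks
        $to = 0;
        $to += $_ for @$chunks;
    }
    my ($offset, $selected) = (0, 0);
    for my $len (@$chunks) {
        my $end   = $offset + $len;
        my $start = $offset < $from ? $from : $offset;   # clip on the left
        my $stop  = $end    > $to   ? $to   : $end;      # clip on the right
        $cb->($start, $stop - $start, $selected) if $stop > $start;
        $offset   = $end;
        $selected = 1 - $selected;       # chunks alternate selection state
        last if $offset >= $to;
    }
}
```

Walking the chunks 0,2,1,2,1 from the selection example between 1 and 4 produces the calls (1,1,1), (2,1,0), and (3,1,1). The zero-length leading chunk is skipped, and the chunks that cross the range boundaries are clipped.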
TESTS
This section is only there to test proper rendering.
- Latin
- Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
- Latin combining
- D̍üi̔s͙ a̸u̵t͏eͬ ịr͡u̍r͜e̥ d͎ǒl̋o̻rͫ i̮n̓ r͐e̔p͊rͨe̾h̍e͐n̔ḋe͠r̕i̾t̅ ịn̷ vͅo̖lͦuͦpͧt̪ątͅe̪ v̰e̷l̳i̯t̽ e̵s̼s̈e̮ ċi̵l͟l͙u͆m͂ d̿o̙lͭo͕r̀e̯ ḛu̅ fͩuͧg̦iͩa̓ť n̜u̼lͩl͠a̒ p̏a̽r̗i͆a͆t̳űr̀
- Cyrillic
- Lorem Ipsum используют потому, что тот обеспечивает более или менее стандартное заполнение шаблона. а также реальное распределение букв и пробелов в абзацах
- Hebrew
- זוהי עובדה מבוססת שדעתו של הקורא תהיה מוסחת על ידי טקטס קריא כאשר הוא יביט בפריסתו. המטרה בשימוש ב-Lorem Ipsum הוא שיש לו פחות או יותר תפוצה של אותיות, בניגוד למלל
- Arabic
- العديد من برامح النشر المكتبي وبرامح تحرير صفحات الويب تستخدم لوريم إيبسوم بشكل إفتراضي كنموذج عن النص، وإذا قمت بإدخال "lorem ipsum" في أي محرك بحث ستظهر العديد من
- Hindi
- Lorem Ipsum के अंश कई रूप में उपलब्ध हैं, लेकिन बहुमत को किसी अन्य रूप में परिवर्तन का सामना करना पड़ा है, हास्य डालना या क्रमरहित शब्द , जो तनिक भी विश्वसनीय नहीं लग रहे हो. यदि आप Lorem Ipsum के एक अनुच्छेद का उपयोग करने जा रहे हैं, तो आप को यकीन दिला दें कि पाठ के मध्य में वहाँ कुछ भी शर्मनाक छिपा हुआ नहीं है.
- Chinese
- 无可否认,当读者在浏览一个页面的排版时,难免会被可阅读的内容所分散注意力。
Lorem Ipsum的目的就是为了保持字母多多少少标准及平
- Largest well-known grapheme cluster in Unicode
- ཧྐྵྨླྺྼྻྂ
<http://archives.miloush.net/michkap/archive/2010/04/28/10002896.html>.
AUTHOR
Dmitry Karasik, <dmitry@karasik.eu.org>.