NAME

XML::Compile::Schema - Compile a schema into CODE

INHERITANCE

  XML::Compile::Schema
    is a XML::Compile

  XML::Compile::Schema is extended by
    XML::Compile::Cache

SYNOPSIS

  # compile tree yourself
  my $parser = XML::LibXML->new;
  my $tree   = $parser->parse...(...);
  my $schema = XML::Compile::Schema->new($tree);

  # get schema from string
  my $schema = XML::Compile::Schema->new($xml_string);

  # get schema from file (most used)
  my $schema = XML::Compile::Schema->new($filename);
  my $schema = XML::Compile::Schema->new([glob "*.xsd"]);

  # the "::Cache" extension has more power
  my $schema = XML::Compile::Cache->new(\@xsdfiles);

  # adding more schemas, from parsed XML
  $schema->addSchemas($tree);

  # adding more schemas from files
  # three times the same: well-known url, filename in schemadir, url
  # Just as example: usually not needed.
  $schema->importDefinitions('http://www.w3.org/2001/XMLSchema');
  $schema->importDefinitions('2001-XMLSchema.xsd');
  $schema->importDefinitions(SCHEMA2001);  # from ::Util

  # alternatively
  my @specs  = ('one.xsd', 'two.xsd', $schema_as_string);
  my $schema = XML::Compile::Schema->new(\@specs); # ARRAY!

  # see what types are defined
  $schema->printIndex;

  # create and use a reader
  use XML::Compile::Util qw/pack_type/;
  my $elem = pack_type 'my-namespace', 'my-local-name';
                 # $elem eq "{my-namespace}my-local-name"
  my $read = $schema->compile(READER => $elem);
  my $data = $read->($xmlnode);
  my $data = $read->("filename.xml");

  # when you do not know the element type beforehand
  use XML::Compile::Util qw/type_of_node/;
  my $elem   = type_of_node $xml->documentElement;
  my $reader = $reader_cache{$elem}                # either exists
           ||= $schema->compile(READER => $elem);  # or create
  my $data   = $reader->($xmlmsg);

  # create and use a writer
  my $doc   = XML::LibXML::Document->new('1.0', 'UTF-8');
  my $write = $schema->compile(WRITER => '{myns}mytype');
  my $xml   = $write->($doc, $hash);
  $doc->setDocumentElement($xml);

  # show result
  print $doc->toString(1);

  # to create the type nicely
  use XML::Compile::Util qw/pack_type/;
  my $type = pack_type 'myns', 'mytype';
  print $type;  # shows {myns}mytype

  # using a compiled routines cache
  use XML::Compile::Cache;   # separate distribution
  my $schema = XML::Compile::Cache->new(...);

  # Show which data-structure is expected
  print $schema->template(PERL => $type);

  # Error handling tricks with Log::Report
  use Log::Report mode => 'DEBUG';  # enable debugging
  dispatcher SYSLOG => 'syslog';    # errors to syslog as well
  try { $reader->($data) };         # catch errors in $@

DESCRIPTION

This module collects knowledge about one or more schemas. The most important method provided is compile(), which can create XML file readers and writers based on the schema information and some selected element or attribute type.

Various implementations use the translator, and more can be added later:
Be warned that the schema is not validated; you can develop schemas which do work well with this module, but are not valid according to W3C. In many cases, however, the translator will refuse to accept mistakes, mainly because it cannot produce valid code.

The values (both for reading and for writing) are strictly validated. However, the reader is sloppy with unexpected attributes and many other things: that's too expensive to check.

Extends "DESCRIPTION" in XML::Compile.

METHODS

Extends "METHODS" in XML::Compile.

Constructors

Extends "Constructors" in XML::Compile.
Accessors

Extends "Accessors" in XML::Compile.

Compilers

Extends "Compilers" in XML::Compile.

Administration

Extends "Administration" in XML::Compile.
example: use of importDefinitions

  my $schema = XML::Compile::Schema->new;
  $schema->importDefinitions('my-spec.xsd');

  my $other = "<schema>...</schema>";  # use 'HERE' documents!
  my @specs = ('my-spec.xsd', 'types.xsd', $other);
  $schema->importDefinitions(\@specs, @options);
DETAILS

Extends "DETAILS" in XML::Compile.

Distribution collection overview

Extends "Distribution collection overview" in XML::Compile.

Comparison

Extends "Comparison" in XML::Compile.

Collecting definitions

When starting an application, you will need to read the schema definitions. This is done by instantiating an object via XML::Compile::Schema::new() or XML::Compile::WSDL11::new(). The WSDL11 object has a schema object internally.

Schemas may contain "import" and "include" statements, which specify other resources for definitions. In the idea of the XML design team, those files should be retrieved automatically via an internet connection from the "schemaLocation". However, this is a bad concept; in XML::Compile modules you will have to explicitly provide filenames on local disk using importDefinitions() or XML::Compile::WSDL11::addWSDL().

There are various reasons why I, the author of this module, think the dynamic automatic internet imports are a bad idea. First: you do not always have a working internet connection (travelling with a laptop in a train). Your implementation should work the same way under all environmental circumstances! Besides, I do not trust remote files on my system without inspecting them. Most important: I want to run my regression tests before using a new version of the definitions, so I do not want to have a remote server change the agreements without my knowledge.

So: before you start, you will need to scan (recursively) the initial schema or wsdl file for "import" and "include" statements, and collect all these files from their "schemaLocation" into files on local disk. In your program, call importDefinitions() on all of them (in any order) before you call compile().
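The scanning step can be sketched with XML::LibXML. This is only an illustration, not part of XML::Compile: the helper name "schema_locations" and the example schema are hypothetical, and the sketch handles one parsed document only; following the found files recursively and copying them to local disk is left to you.

```perl
use strict;
use warnings;
use XML::LibXML;

# Name-space of XML Schema 2001 elements such as <import> and <include>.
my $XSD_NS = 'http://www.w3.org/2001/XMLSchema';

# Hypothetical helper: return the schemaLocation values of all <import>
# and <include> elements found in one parsed schema document.
sub schema_locations {
    my ($doc) = @_;
    my @locations;
    for my $kind (qw/import include/) {
        for my $node ($doc->getElementsByTagNameNS($XSD_NS, $kind)) {
            my $loc = $node->getAttribute('schemaLocation');
            push @locations, $loc if defined $loc;
        }
    }
    return @locations;
}

# A made-up schema, only used to try the helper.
my $doc = XML::LibXML->load_xml(string => <<'__XSD');
<schema xmlns="http://www.w3.org/2001/XMLSchema"
        targetNamespace="http://example.com/main">
  <import namespace="http://example.com/types" schemaLocation="types.xsd"/>
  <include schemaLocation="more.xsd"/>
</schema>
__XSD

my @found = schema_locations($doc);
print "$_\n" for @found;
```

Once every listed file is available on local disk, each of them can be passed to importDefinitions() before compile() is called.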
Organizing your definitions

One nice feature to help you organize (especially useful when you package your code in a distribution) is to add these lines to the beginning of your code:

  package My::Package;
  XML::Compile->addSchemaDirs(__FILE__);
  XML::Compile->knownNamespace('http://myns' => 'myns.xsd', ...);

Now, if the package file is located at "SomeThing/My/Package.pm", the definition of the namespace should be kept in "SomeThing/My/Package/xsd/myns.xsd".

Somewhere in your program, you have to load these definitions:

  # absolute or relative path is always possible
  $schema->importDefinitions('SomeThing/My/Package/xsd/myns.xsd');

  # relative search path extended by addSchemaDirs
  $schema->importDefinitions('myns.xsd');

  # knownNamespace improves abstraction
  $schema->importDefinitions('http://myns');

Very probably, the namespace is already in some variable:

  use XML::Compile::Schema;
  use XML::Compile::Util 'pack_type';

  my $myns   = 'http://some-very-long-uri';
  my $schema = XML::Compile::Schema->new($myns);
  my $mytype = pack_type $myns, $myelement;
  my $reader = $schema->compile(READER => $mytype);

Addressing components

Normally, external users can only address elements within a schema, and types are hidden to be used by other schemas only. For this reason, it is permitted to create an element and a type with the same name.

The compiler requires a starting-point. This can either be an element name or an element's id. The format of the element name is "{namespace-uri}localname", for instance

  {http://library}book

You may also start with

  http://www.w3.org/2001/XMLSchema#float

as long as this ID refers to a top-level element, not a type.

When you use a schema without "targetNamespace" (which is bad practice, but sometimes people really do not understand the beneficial aspects of the use of namespaces) then the elements can be addressed as "{}name" or simply "name".

Representing data-structures

The code will do its best to produce a correct translation.
For instance, an accidental 1.9999 will be converted into 2 when the schema says that the field is an "int". It will also strip superfluous blanks when the data-type permits. Especially watch out for the "Integer" types, which produce Math::BigInt objects unless compile(sloppy_integers) is used.

Elements can be complex, and themselves contain elements which are complex. In the Perl representation of the data, this will be shown as nested hashes with the same structure as the XML.

You do not need to take care of character encodings, because XML::LibXML does that for us: you shall not escape characters like "<" yourself.

The schemas define various kinds of data types. There are various ways to define them (with restrictions and extensions), but that knowledge is not important for the resulting data structure.

simpleType

A single value. A lot of single value data-types are built-in (see XML::Compile::Schema::BuiltInTypes).

Simple types may have range limiting restrictions (facets), which will be checked by default. Types may also have some white-space behavior, for instance blanks are stripped from integers: before, after, but also inside the string which represents the number.

Note that some of the reader hooks will change the single value of these elements into a HASH like the one used for complexType/simpleContent (next paragraph), to be able to return some extra collected information.

Example: typical simpleType

In XML, it looks like this:

  <test1>42</test1>

In the HASH structure, the data will be represented as

  test1 => 42

With the reader hook "after => 'XML_NODE'" applied, it will become

  test1 => { _ => 42
           , _XML_NODE => $obj
           }

complexType/simpleContent

In this case, the single value container may have attributes. The number of attributes can be endless, and the value is only one. This value has no name, and therefore gets a predefined name "_".

When passed to the writer, you may specify a single value (not the whole HASH) when no attributes are used.

Example: typical simpleContent example

In XML, this looks like this:

  <test2 question="everything">42</test2>

As a HASH, this shows as

  test2 => { _ => 42
           , question => 'everything'
           }

When specified in the writer, when no attributes are needed, you can use either form:

  test2 => { _ => 7 }
  test2 => 7

complexType and complexType/complexContent

These containers not only have attributes, but also multiple values as content. The "complexContent" is used to create inheritance structures in the data-type definition. This does not affect the XML data package itself.

Example: typical complexType element

The XML could look like:

  <test3 question="everything" by="mouse">
    <answer>42</answer>
    <when>5 billion BC</when>
  </test3>

Represented as HASH, this looks like

  test3 => { question => 'everything'
           , by       => 'mouse'
           , answer   => 42
           , when     => '5 billion BC'
           }

Manually produced XML NODE

For a WRITER, you may also specify a XML::LibXML::Node anywhere.

  test1 => $doc->createTextNode('42');
  test3 => $doc->createElement('ariba');

This data-structure is used without validation, so you are fully on your own with this one. Typically, nodes are produced by hooks to implement work-arounds.

Occurrence

A second factor which determines the data-structure is the element occurrence. Usually, elements have to appear once and exactly once on a certain location in the XML data structure. This order is automatically produced by this module. But elements may appear multiple times.
Default Values [added in v0.91] With compile(default_values) you can control how much information about default values defined by the schema will be passed into your program. The choices, available for both READER and WRITER, are:
Example: use of default_values EXTEND

Let us process a schema using the schema schema. A schema file can contain lines like this:

  <element minOccurs="0" ref="myelem"/>

In mode "EXTEND" (the READER default), this gets translated into:

  element => { ref => 'myelem', maxOccurs => 1
             , minOccurs => 0, nillable => 0 };

With "EXTEND" in the READER, all schema information is used to provide a complete overview of available information. Your code does not need to check whether the attributes were available or not: attributes with defaults or fixed values are automatically added.

Again mode "EXTEND", now for the writer:

  element => { ref => 'myelem', minOccurs => 0 };
  <element minOccurs="0" maxOccurs="1" ref="myelem" nillable="0"/>

Example: use of default_values IGNORE

With option "default_values" set to "IGNORE" (the WRITER default), you would get

  element => { ref => 'myelem', maxOccurs => 1, minOccurs => 0 }
  <element minOccurs="0" maxOccurs="1" ref="myelem"/>

The same in both translation directions. The nillable attribute is not used, so will not be shown by the READER. The writer does not try to be smart, so does not add the nillable default.

Example: use of default_values MINIMAL

With option "default_values" set to "MINIMAL", the READER would do this:

  <element minOccurs="0" maxOccurs="1" ref="myelem"/>
  element => { ref => 'myelem', minOccurs => 0 }

The maxOccurs default is "1", so it will not be included, minimizing the size of the HASH.

For the WRITER:

  element => { ref => 'myelem', minOccurs => 0, nillable => 0 }
  <element minOccurs="0" ref="myelem"/>

Because the default value for nillable is '0', it will not show as attribute value.

Repetitive blocks

Particle blocks come in four shapes: "sequence", "choice", "all", and "group" (an indirect block). This also affects "substitutionGroups".
repetitive sequence, choice, all

In situations like this:

  <element name="example">
    <complexType>
      <sequence>
        <element name="a" type="int" />
        <sequence>
          <element name="b" type="int" />
        </sequence>
        <element name="c" type="int" />
      </sequence>
    </complexType>
  </element>

(yes, schemas are verbose) the data structure is

  <example> <a>1</a> <b>2</b> <c>3</c> </example>

the Perl representation is flattened, into

  example => { a => 1, b => 2, c => 3 }

OK, this is very simple. However, schemas can use repetition:

  <element name="example">
    <complexType>
      <sequence>
        <element name="a" type="int" />
        <sequence minOccurs="0" maxOccurs="unbounded">
          <element name="b" type="int" />
        </sequence>
        <element name="c" type="int" />
      </sequence>
    </complexType>
  </element>

The XML message may be:

  <example> <a>1</a> <b>2</b> <b>3</b> <b>4</b> <c>5</c> </example>

Now, the Perl representation needs to produce an array of the data in the repeated block. This array needs to have a name, because more of these blocks may appear together in a construct. The name of the block is derived from the type of block and the name of the first element in the block, regardless of whether that element is present in the data or not.

So, our example data is translated into (and vice versa)

  example =>
    { a     => 1
    , seq_b => [ {b => 2}, {b => 3}, {b => 4} ]
    , c     => 5
    }

The following label is used, based on the name of the first element (say "xyz") as defined in the schema (not in the actual message):

  seq_xyz    sequence with maxOccurs > 1
  cho_xyz    choice with maxOccurs > 1
  all_xyz    all with maxOccurs > 1

When you have compile(key_rewrite) option PREFIXED, and you have explicitly assigned the prefix "xs" to the schema namespace (see compile(prefixes)), then those names will respectively be "seq_xs_xyz", "cho_xs_xyz", "all_xs_xyz".

Example: always an array with maxOccurs larger than 1

Even when there is only one element found, it will be returned as ARRAY (of one element). Therefore, you can write

  my $data = $reader->($xml);
  foreach my $a ( @{$data->{a}} ) {...}

Example: blocks with maxOccurs larger than 1

In the schema:

  <sequence maxOccurs="5">
    <element name="a" type="int" />
    <element name="b" type="int" />
  </sequence>

In the XML message:

  <a>15</a><b>16</b><a>17</a><b>18</b>

In Perl representation:

  seq_a => [ {a => 15, b => 16}, {a => 17, b => 18} ]

repetitive groups

[behavioral change in 0.93] In contrast to the normal particle blocks, as described above, groups do have names. In this case, we do not need to take the name of the first element, but can use the group name. It will still have "gr_" prepended, because groups can have the same name as an element or a type(!)

Blocks within the group definition cannot be repeated.

Example: groups with maxOccurs larger than 1

  <element name="top">
    <complexType>
      <sequence>
        <group ref="ns:xyz" maxOccurs="unbounded" />
      </sequence>
    </complexType>
  </element>

  <group name="xyz">
    <sequence>
      <element name="a" type="int" />
      <element name="b" type="int" />
    </sequence>
  </group>

translates into

  gr_xyz => [ {a => 42, b => 43}, {a => 44, b => 45} ]

repetitive substitutionGroups

For substitutionGroups which are repeating, the name of the base element is used (the element which has attribute abstract="true"). We do need this array, because the order of the elements within the group may be important; we cannot group the elements based on the extended element's name.

In an example substitutionGroup, the Perl representation will be something like this:

  base-element-name =>
    [ { extension-name  => $data1 }
    , { other-extension => $data2 }
    ]

Each HASH has only one key.

Example: with a list of ints

  <test5>3 8 12</test5>

as Perl structure:

  test5 => [3, 8, 12]
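The repeated-block representation above can be handled with plain Perl. A minimal sketch, assuming a reader has returned the "seq_b" structure shown earlier; the data is typed in by hand here, so no schema is needed to run it.

```perl
use strict;
use warnings;

# Hand-written copy of the reader output for the repeated-sequence example.
my $example = {
    a     => 1,
    seq_b => [ { b => 2 }, { b => 3 }, { b => 4 } ],
    c     => 5,
};

# The repeated block is an ARRAY of HASHes, one HASH per repetition.
my @b_values = map { $_->{b} } @{ $example->{seq_b} };
print "b values: @b_values\n";

# Building input for a writer works the other way around: wrap each
# repetition in its own HASH, even when there is only one.
my $for_writer = {
    a     => 1,
    seq_b => [ map { +{ b => $_ } } 10, 20 ],
    c     => 5,
};
print scalar(@{ $for_writer->{seq_b} }), " repetitions\n";
```

The same pattern applies to "cho_", "all_" and "gr_" blocks: only the key name changes.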
Example: substitutionGroup

  <xs:element name="price"  type="xs:int" abstract="true" />
  <xs:element name="euro"   type="xs:int" substitutionGroup="price" />
  <xs:element name="dollar" type="xs:int" substitutionGroup="price" />

  <xs:element name="product">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string" />
        <xs:element ref="price" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>

Now, valid XML data is

  <product>
    <name>Ball</name>
    <euro>12</euro>
  </product>

and

  <product>
    <name>Ball</name>
    <dollar>6</dollar>
  </product>

The HASH representation is respectively

  product => {name => 'Ball', euro   => 12}
  product => {name => 'Ball', dollar => 6}

Example: of HOOKs:

  my $hook = { type    => '{my_ns}my_type'
             , before  => sub { ... }
             , action  => 'WRITER'
             };

  my $hook = { path    => qr/\(volume\)/
             , replace => 'SKIP'
             , action  => 'READER'
             };

  # path contains "volume" or id is 'aap' or id is 'noot'
  my $hook = { path    => qr/\bvolume\b/
             , id      => [ 'aap', 'noot' ]
             , before  => [ sub {...}, sub { ... } ]
             , after   => sub { ... }
             };

Example: use of the type selector

  type => 'int'
  type => '{http://www.w3.org/2000/10/XMLSchema}int'
  type => qr/\}xml_/   # type starts with xml_
  type => [ qw/int float/ ];

  use XML::Compile::Util qw/pack_type SCHEMA2000/;
  type => pack_type(SCHEMA2000, 'int')

  # with XML::Compile::Cache
  $schema->addPrefixes(xsd => SCHEMA2000);
  type => 'xsd:int'

Example: type hook with XML::Compile::Cache

  use XML::Compile::Util qw/SCHEMA2001/;
  my $schemas = XML::Compile::Cache->new(...);
  $schemas->addPrefixes(xsd => SCHEMA2001, mine => 'http://somens');
  $schemas->addHook(type => 'xsd:int', ...);
  $schemas->addHook(type => 'mine:sometype', ...);

Example: use of the ID selector

  # default schema types have id's with same name
  id => 'ABC'
  id => 'http://www.w3.org/2001/XMLSchema#int'
  id => qr/\#xml_/   # id which starts with xml_
  id => [ qw/ABC fgh/ ];

  use XML::Compile::Util qw/pack_id SCHEMA2001/;
  id => pack_id(SCHEMA2001, 'ABC')
Example: anyAttribute in a READER

Say your schema looks like this:

  <schema targetNamespace="http://mine"
     xmlns:me="http://mine" ...>
    <element name="el">
      <complexType>
        <attribute name="a" type="xs:int" />
        <anyAttribute namespace="##targetNamespace"
           processContents="lax" />
      </complexType>
    </element>
    <simpleType name="non-empty">
      <restriction base="NCName" />
    </simpleType>
  </schema>

Then, in an application, you write:

  my $r = $schema->compile
   ( READER => pack_type('http://mine', 'el')
   , anyAttribute => 'ALL'
   );
  # or lazy: READER => '{http://mine}el'

  my $h = $r->( <<'__XML' );
  <el xmlns:me="http://mine">
    <a>42</a>
    <b type="me:non-empty">
       everything
    </b>
  </el>
  __XML

  use Data::Dumper 'Dumper';
  print Dumper $h;

The output is something like

  $VAR1 =
   { a => 42
   , '{http://mine}a' => ... # XML::LibXML::Node with <a>42</a>
   , '{http://mine}b' => ... # XML::LibXML::Node with <b>everything</b>
   };

You can improve the reader with a callback. When you know that the extra attribute is always of type "non-empty", then you can do

  my $read = $schema->compile
   ( READER => '{http://mine}el'
   , anyAttribute => \&filter
   );

  my $anyAttRead = $schema->compile
   ( READER => '{http://mine}non-empty'
   );

  sub filter($$$$)
  {   my ($fqn, $xml, $path, $translator) = @_;
      return () if $fqn ne '{http://mine}b';
      (b => $anyAttRead->($xml));
  }

  my $h = $read->( ... );   # the same XML message as above
  print Dumper $h;

Which will result in

  $VAR1 =
   { a => 42
   , b => 'everything'
   };

The filter will be called twice, but return nothing in the first case. You can implement any kind of complex processing in the filter.

Example: to trace the paths

  $schema->addHook
    ( action => 'READER'
    , path   => qr/./
    , before => 'PRINT_PATH'
    );

Example: specify anyAttribute

  use XML::Compile::Util qw/pack_type/;

  my $attr = $doc->createAttributeNS($somens, $sometype, 42);
  my $h = { a => 12     # normal element or attribute
          , "{$somens}$sometype"        => $attr # anyAttribute
          , pack_type($somens, $mytype) => $attr # nicer
          , "$prefix:$sometype"         => $attr # [1.28]
          };
Example: before hook on user-provided HASH.

  sub beforeOnComplex($$$$)
  {   my ($doc, $values, $path, $fulltype) = @_;

      my %copy = %$values;
      $copy{extra} = 42;
      delete $copy{superfluous};
      $copy{count} =~ s/\D//g;    # only digits
      \%copy;
  }

Example: before hook on simpleType data

  sub beforeOnSimple($$$$)
  {   my ($doc, $value, $path, $fulltype) = @_;
      $value * 100;    # convert euro to euro-cents
  }

Example: before hook with object for complexType

  sub beforeOnObject($$$$)
  {   my ($doc, $obj, $path, $fulltype) = @_;

      +{ name     => $obj->name
       , price    => $obj->euro
       , currency => 'EUR'
       };
  }

Example: replace hook

  sub replace($$$$$$)
  {   my ($doc, $values, $path, $tag, $r, $fulltype) = @_;
      my $node = $doc->createElement($tag);
      $node->appendText($values->{text});
      $node;
  }

Example: add an extra sibling after the usual process

  sub after($$$$$)
  {   my ($doc, $node, $path, $values, $fulltype) = @_;
      my $child = $doc->createAttributeNS($myns, earth => 42);
      $node->addChild($child);
      $node;
  }

Example: creating nodes with text

  {  my $text;

     sub before($$$)
     {   my ($doc, $values, $path) = @_;
         my %copy = %$values;
         $text = delete $copy{text};
         \%copy;
     }

     sub after($$$)
     {   my ($doc, $node, $path) = @_;
         $node->addChild($doc->createTextNode($text));
         $node;
     }

     $schema->addHook
      ( action => 'WRITER'
      , type   => 'mixed'
      , before => \&before
      , after  => \&after
      );
  }

List type

List simpleType objects are also represented as ARRAY, like elements with a minOccurs or maxOccurs not equal to 1.

Using substitutionGroup constructs

A substitution group is a kind of choice between alternative (complex) types. However, in this case the roles are reversed: instead of a "choice" which lists the alternatives, here the alternative elements register themselves as valid for an abstract (head) element. All alternatives should be extensions of the head element's type, but there is no way to check that.
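When a substitutionGroup is repeated, the representation described earlier is an ARRAY of single-key HASHes under the name of the head element. A minimal sketch of how such data could be walked; the structure below is hand-written for illustration (a repeating "price" is an assumption, it does not repeat in the schema shown before) and nothing here calls XML::Compile.

```perl
use strict;
use warnings;

# Hypothetical reader output: head element "price" repeats, and each
# repetition names the substituting element which was actually used.
my $product = {
    name  => 'Ball',
    price => [ { euro => 12 }, { dollar => 6 }, { euro => 15 } ],
};

my @lines;
for my $alternative (@{ $product->{price} }) {
    # Each HASH has exactly one key, so list assignment yields the pair.
    my ($element, $value) = %$alternative;
    push @lines, "$element=$value";
}

# The ARRAY keeps the order in which the elements appeared in the message.
print join(', ', @lines), "\n";
```

Grouping the values per currency instead would lose that order, which is the reason the ARRAY form is used.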
Wildcards via any and anyAttribute

The "any" and "anyAttribute" elements are referred to as "wildcards": they specify (huge, generic) groups of elements and attributes which are accepted, instead of being explicit.

The author of this module advises against the use of wildcards in schemas: the purpose of schemas is to be explicit about the message in the interface, and that basic idea is simply thrown away by these wildcards. Let people cleanly extend the schema with inheritance! There is always a substitutionGroup alternative possible.

Because wildcards are not explicit about the types to expect, the "XML::Compile" module cannot prepare for them at run-time. You need to go read the documentation and do some tricky manual work to get it to work.

Read about the processing of wildcards in the manual page for each of the back-ends (XML::Compile::Translate::Reader, XML::Compile::Translate::Writer, ...).

ComplexType with "mixed" attribute

[largely improved in 0.86, reader only] ComplexType and ComplexContent components can be declared with the mixed="true" attribute. This implies that text is not limited to the content of containers, but may also be used in between elements. Usually, you will only find ignorable white-space between elements.

In this example, the "a" container is marked to be mixed:

  <a> before <b>2</b> after </a>

Each back-end has its own way of handling mixed elements. The compile(mixed_elements) option currently only modifies the reader's behavior; the writer's capabilities are limited. See XML::Compile::Translate::Reader.

hexBinary and base64Binary

These are used to include images and such in an XML message. Usually, they are quite large with respect to the other elements. When you use SOAP, you may wish to use XML::Compile::XOP instead.

The element value which you need to pass for fields of these types is a binary BLOB, something Perl does not have. So, it is a string containing binary data but not specially marked that way.
If you need to store an integer in such a binary field, you first have to promote it into a BLOB (string) like this

  { color => pack('N', $i) }          # writer
  my $i = unpack('N', $d->{color});   # reader

Module Geo::KML implemented a nice hook to avoid the explicit need for this "pack" and "unpack". The KML schema designers liked colors to be written as "ffc0c0c0" and abused "hexBinary" for that purpose. The "colorType" fields in KML are treated as binary, but just represent an int. Have a look in that Geo::KML code if your schema has some of those tricks. (Geo::KML is only available on BackPAN; it has been withdrawn from CPAN.)

Schema hooks

You can use hooks, for instance, to block processing parts of the message, to create work-arounds for schema bugs, or to extract more information during the process than done by default.

Defining hooks

Multiple hooks can be active during the compilation process of a type, when "compile()" is called. During Schema translation, each of the hooks is checked for all types which are processed. When multiple hooks select the object to get a modified behavior, then all are evaluated in order of definition.

Defining a global hook (where HOOKDATA is the LIST of PAIRS with hook parameters, and HOOK a HASH with such HOOKDATA):

  my $schema = XML::Compile::Schema->new
   ( ...
   , hook  => HOOK
   , hooks => [ HOOK, HOOK ]
   );

  $schema->addHook(HOOKDATA | HOOK);
  $schema->addHooks(HOOK, HOOK, ...);

  my $wsdl = XML::Compile::WSDL11->new(...);
  $wsdl->addHook(HOOKDATA | HOOK);

Local hooks are only used for one reader or writer. They are evaluated before the global hooks.

  my $reader = $schema->compile(READER => $type
   , hook => HOOK, hooks => [ HOOK, HOOK, ...]);

General syntax

Each hook has three kinds of parameters:
Selectors define the schema component of which the processing is modified. When one of the selectors matches, the processing information for the hook is used. When no selector is specified, then the hook will be used on all elements. Available selectors (see below for details on each of them):
As argument, you can specify one element as STRING, a regular expression to select multiple elements, or an ARRAY of STRINGs and REGEXes.

Next to where the hook is placed, we need to know what to do in that case: the hook contains processing information. When more than one hook matches, then all of these processors are called in order of hook definition. However, first the compile hooks are taken, and then the global hooks.

How the processing works exactly depends on the compiler back-end. There are major differences. Each of those manual-pages lists the specifics. The label tells us when the processing is initiated. Available labels are "before", "replace", and "after".

Hooks on matching types

The "type" selector specifies a complexType or simpleType by name. Best is to base the selection on the full name, like "{ns}type", which will avoid all kinds of name-space conflicts in the future. However, you may also specify only the "local type" (in any name-space). Any REGEX will be matched to the full type name. Be careful with the pattern anchors.

If you use XML::Compile::Cache [release 0.90], then you can use "prefix:type" as type specification as well. You have to explicitly define the prefix to namespace mapping beforehand.

Hooks on extended type

[1.48] This hook will match all elements which use a type which is equal to or based on the given type. In the schema, you will find extension and restriction constructs. You may only pass a single full type (no arrays of types or local names) per 'extends' hook.

Using hooks on extended types is quite expensive for the compiler.

example:

  $schemas->addHook(extends => "{ns}local", ...);
  $schemas->addHook(extends => 'mine:sometype', ...);  # need ::Cache

Hooks on matching ids

Matching based on IDs can reach more schema elements: some types are anonymous but still have an ID. Best is to base selection on the full ID name, like "ns#id", to avoid all kinds of name-space conflicts in the future.
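The three accepted selector shapes (STRING, REGEX, ARRAY of those) and the warning about pattern anchors can be tried out without a schema. The function "selector_matches" below is a hypothetical illustration written for this page; it is not the matching code of XML::Compile, and the ids are made up.

```perl
use strict;
use warnings;

# Hypothetical illustration: a selector is a STRING (exact match),
# a REGEX, or an ARRAY of STRINGs and REGEXes (any alternative matches).
sub selector_matches {
    my ($selector, $value) = @_;

    if (ref $selector eq 'ARRAY') {
        for my $alternative (@$selector) {
            return 1 if selector_matches($alternative, $value);
        }
        return 0;
    }

    return $value =~ $selector ? 1 : 0 if ref $selector eq 'Regexp';
    return $selector eq $value ? 1 : 0;
}

my @checks = (
    [ 'http://www.w3.org/2001/XMLSchema#int', 'http://www.w3.org/2001/XMLSchema#int' ],
    [ [ qw/ABC fgh/ ],                        'fgh' ],
    [ qr/\#xml_/,                             'http://mine#xml_date' ],
    [ qr/\#xml_/,                             'http://mine#my_xml_store' ],
    [ qr/xml_/,                               'http://mine#my_xml_store' ],  # too loose
);

for my $check (@checks) {
    my ($selector, $id) = @$check;
    printf "%-40s %s\n", $id, selector_matches($selector, $id) ? 'match' : 'no match';
}
```

The last two checks show why anchoring matters: without the leading "#", the pattern also selects ids which merely contain "xml_" somewhere.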
Hooks on matching paths

When you see error messages, you always see some representation of the path where the problem was discovered. You can use this path as selector, when you know what it is... BE WARNED that the current structure of the path is not really consistent, hence it will be improved in one of the future releases, breaking backwards compatibility.

Typemaps

Often, XML will be used in object oriented programs, where the facts which are transported in the XML message are attributes of Perl objects. Of course, you can always collect the data from each of the Objects into the required (huge) HASH manually, before triggering the reader or writer. As an alternative, you can connect types in the XML schema with Perl objects and classes, which results in cleaner code.

You can also specify typemaps with new(typemap), addTypemaps(), and compile(typemap). Each type will only refer to the last map for that type. When an "undef" is given for a type, then the older definition will be cancelled. Examples of the three ways to specify typemaps:

  my %map = ($x1 => $p1, $x2 => $p2);
  my $schema = XML::Compile::Schema->new(...., typemap => \%map);

  $schema->addTypemaps($x3 => $p3, $x4 => $p4, $x1 => undef);

  my $call = $schema->compile(READER => $type, typemap => \%map);

The latter only has effect for the type being compiled. The definitions are cumulative. In the second example, the $x1 gets disabled.

Objects can come in two shapes: either they do support the connection with XML::Compile (implementing two methods with predefined names), or they don't, in which case you will need to write a little wrapper.

  use XML::Compile::Util qw/pack_type/;
  my $t1 = pack_type $myns, $mylocal;
  $schema->typemap($t1 => 'My::Perl::Class');
  $schema->typemap($t1 => $some_object);
  $schema->typemap($t1 => sub { ... });

The implementation of the READER and WRITER differs. In the READER case, the typemap is implemented as an 'after' hook which calls a "fromXML" method.
The WRITER is a 'before' hook which calls a "toXML" method. See respectively the XML::Compile::Translate::Reader and XML::Compile::Translate::Writer.

Private variables in objects

When you design a new object, it is possible to store the information exactly like the corresponding XML type definition. The only thing the "fromXML" has to do is bless the data-structure into its class:

  $schema->typemap($xmltype => 'My::Perl::Class');
  package My::Perl::Class;
  sub fromXML { bless $_[1], $_[0] }   # for READER
  sub toXML   { $_[0] }                # for WRITER

However... the object may also need some private variables. If you store them in the same HASH for your object, you will get "unused tags" warnings from the writer. To avoid that, choose one of the following alternatives:

  # never complain about unused tags
  ::Schema->new(..., ignore_unused_tags => 1);

  # only complain about unused tags not matching regexp
  my $not_for_xml = qr/^[A-Z]/;  # my XML only has lower-case
  ::Schema->new(..., ignore_unused_tags => $not_for_xml);

  # only for one compiled WRITER (not used with READER)
  ::Schema->compile(..., ignore_unused_tags => 1);
  ::Schema->compile(..., ignore_unused_tags => $not_for_xml);

Typemap limitations

There are some things you need to know:
Handling xsi:type

[1.10] The "xsi:type" is an old-fashioned mechanism, and should be avoided! In this case, the schema does tell you that a certain element has a certain type, but at run-time(!) that is changed. When an XML element has a "xsi:type" attribute, it tells you simply to have an extension of the original type. This whole mechanism does bite the "compilation" idea of XML::Compile... however with some help, it will work.

To make "xsi:type" work at run-time, you have to pass a table of which types you expect at compile-time. Example:

  my %xsi_type_table =
    ( $base_type1 => [ $ext1_of_type1, $ext2_of_type1 ]
    , $base_type2 => [ $ext1_of_type2 ]
    );

  my $r = $schema->compile(READER => $type
    , xsi_type => \%xsi_type_table
    );

When your schema is an XML::Compile::Cache (version at least 0.93), your types look like "prefix:local". With a plain XML::Compile::Schema, they will look like "{namespace}local", typically produced with XML::Compile::Util::pack_type().

When used in a reader, the resulting data-set will contain a "XSI_TYPE" key in between the facts which were taken from the element. The type is in long syntax "{$ns}$type". See XML::Compile::Util::unpack_type().

With the writer, you have to provide such an "XSI_TYPE" value or the element's base type will be used (and no "xsi:type" attribute created). This will probably cause warnings about unused tags. The type can be provided in full (see XML::Compile::Util::pack_type()) or [1.31] prefixed.

[1.25] When the value is not an ARRAY, but only the keyword "AUTO", the parser will try to auto-detect all types which are valid alternatives. This currently only works for non-builtin types. The auto-detection might be slow and (because many schemas are broken) not produce a complete list. When debugging is enabled ("use Log::Report mode => 3;") you will see to which list this AUTO gets expanded.

  xsi_type => { $base_type => 'AUTO' }   # requires X::C v1.25

XML::Compile::Cache (since v1.01) makes using "xsi:type" easier.
When you have a ::Cache based object (for instance a XML::Compile::WSDL11) you can simply say

    $wsdl->addXsiType( $base_type => 'AUTO' )

Now, you do not need to pass the xsi table to each compilation call.

Key rewrite

[improved with release 1.10] The standard practice is to use the localName of the XML elements as key in the Perl HASH; the key rewrite mechanism is used to change that, sometimes to separate elements which have the same localName within different name-spaces, or when an element and an attribute share a name (key rewrite is applied to elements AND attributes); in other cases just for fun or convenience.

Rewrite rules are interpreted at "compile-time", which means that they do not slow down the XML construction or deconstruction. The rules work the same for readers and writers, because they are applied to names found in the schema.

Key rewrite rules can be set during schema object initiation with new(key_rewrite) and added to an existing schema object with addKeyRewrite(). These rules will be used in all calls to compile().

Next, you can use compile(key_rewrite) to add rules which are only used for a single compilation. These are applied before the global rules. All rules will always be attempted, and each rule will be applied to the result of the previous change.

The last defined rewrite rules will be applied first, with one major exception: the "PREFIXED" rules will be executed before any other rule.

key_rewrite via table

When a HASH is provided as rule, then the XML element name is looked up. If found, the value is used as translated key. First the full name of the element is tried, and then the localName of the element.
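The lookup order for a HASH rule can be sketched in plain Perl. This is only an illustration of the behaviour described above, not the code XML::Compile runs: the helper "lookup_key" and the namespace URIs are hypothetical, and the sketch assumes that a name without a matching rule keeps its localName as key.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of XML::Compile: it mimics the documented
# lookup order of a HASH rule (full name first, then localName).
sub lookup_key {
    my ($table, $ns, $local) = @_;
    my $full = "{$ns}$local";           # same shape as pack_type() output
    return $table->{$full}  if exists $table->{$full};
    return $table->{$local} if exists $table->{$local};
    return $local;                      # assumption: no rule, keep localName
}

my $myns  = 'http://example.com/ns';    # placeholder namespace
my %table = (
    "{$myns}el1" => 'nice_name1',       # matches only in $myns
    el3          => 'in_any_namespace', # matches in every namespace
);

# One name per case: full-name hit, localName hit, no hit at all.
my @keys = map { lookup_key(\%table, $_->[0], $_->[1]) }
    [$myns, 'el1'], ['urn:other', 'el3'], [$myns, 'el4'];

print "$_\n" for @keys;
```

An entry keyed on the full name is the way to treat one name-space differently from the rest, while an entry keyed on the localName alone applies everywhere.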
The full name can be created with XML::Compile::Util::pack_type() or by hand:

    use XML::Compile::Util qw/pack_type/;

    my %table =
      ( pack_type($myns, 'el1') => 'nice_name1'
      , "{$myns}el2" => 'alsoNice'
      , el3          => 'in any namespace'
      );
    $schema->addKeyRewrite( \%table );

Rewrite via function

When a CODE reference is provided, it will get called for each key which is found in the schema. Passed are the name-space of the element and its local-name. Returned is the key, which may be the local-name or something else.

For instance, some people use capitals in element names and personally I do not like them:

    sub dont_like_capitals($$)
    {   my ($ns, $local) = @_;
        lc $local;
    }
    $schema->addKeyRewrite( \&dont_like_capitals );

for short:

    my $schema = XML::Compile::Schema->new( ...,
        key_rewrite => sub { lc $_[1] } );

key_rewrite when localNames collide

Let's start with an apology: we cannot auto-detect when these rewrite rules are needed, because the colliding keys are within the same HASH, but the processing is fragmented over various (sequence) blocks: the parser does not have the overview on which keys of the HASH are used for which elements.

The problem occurs when one complex type or substitutionGroup contains multiple elements with the same localName, but from different name-spaces. In the perl representation of the data, the name-spaces get ignored (to make the programmer's life simple) but that may cause these nasty conflicts.

Rewrite for convenience

In XML, we often see names like "my-elem-name", which in Perl would be accessed as

    $h->{'my-elem-name'}

In this case, you cannot leave out the quotes in your perl code, which is quite inconvenient, because only 'barewords' can be used as keys unquoted. When you use option "key_rewrite" for compile() or new(), you could decide to map dashes onto underscores.
    key_rewrite
       => sub { my ($ns, $local) = @_; $local =~ s/\-/_/g; $local }

    key_rewrite => sub { $_[1] =~ s/\-/_/g; $_[1] }

With either form, "my-elem-name" in XML will get mapped onto "my_elem_name" in Perl, in both the READER and the WRITER. Be warned that the substitute command returns the number of substitutions made, not the modified value! That is why both examples end by returning the variable itself.

Pre-defined key_rewrite rules
SEE ALSO
This module is part of XML-Compile distribution version 1.63, built on July 02, 2019. Website: http://perl.overmeer.net/xml-compile/

LICENSE
Copyrights 2006-2019 by [Mark Overmeer <markov@cpan.org>]. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://dev.perl.org/licenses/