XML::Fast(3)          User Contributed Perl Documentation          XML::Fast(3)
XML::Fast - Simple and very fast XML - hash conversion
use XML::Fast;
my $hash = xml2hash $xml;
my $hash2 = xml2hash $xml, attr => '.', text => '~';
This module implements a simple, state-machine-based XML parser written in C.
It can parse, and recover from, some kinds of broken XML. If you need an
XML validator, use XML::LibXML.
Another similar module is XML::Bare. I used it for some time, but it has
some shortcomings:
- If your XML has a node with a TextNode, then a CDATANode, then another
TextNode, you'll get a broken value
- It doesn't support charsets
- It doesn't support any kind of entities.
So, after a number of attempts to fix XML::Bare, I decided to write a
parser from scratch.
Here are some features and principles:
- It uses a minimal number of memory allocations.
- All XML is parsed in one scan.
- All values are copied from the source XML only once (to destination
keys/values)
- If some types of nodes (for example, comments) are ignored, no memory is
allocated or copied for them.
I've removed the benchmark results, since they vary widely between
XML documents. Sometimes XML::Bare is faster, sometimes not. So XML::Fast
should mainly be considered not "faster-than-bare", but
"format-other-than-bare".
- order [ = 0 ]
- Not implemented yet. Strictly keep the output order. When
enabled, structures become more complex, but the XML can be completely
reconstructed from them.
- attr [ = '-' ]
- Attribute prefix
<node attr="test" /> => { node => { -attr => "test" } }
- text [ = '#text' ]
- Key name for storing text
When undef, text nodes will be ignored
<node>text<sub /></node> => { node => { sub => '', '#text' => "text" } }
- join [ = '' ]
- Join separator for text nodes split by subnodes
Ignored when "order" is in effect
# default:
xml2hash( '<item>Test1<sub />Test2</item>' )
: { item => { sub => '', '#text' => 'Test1Test2' } };
xml2hash( '<item>Test1<sub />Test2</item>', join => '+' )
: { item => { sub => '', '#text' => 'Test1+Test2' } };
- trim [ = 1 ]
- Trim leading and trailing whitespace from text nodes
- cdata [ = undef ]
- When defined, CDATA sections will be stored under this key
# cdata = undef
<node><![CDATA[ test ]]></node> => { node => 'test' }
# cdata = '#'
<node><![CDATA[ test ]]></node> => { node => { '#' => 'test' } }
- comm [ = undef ]
- When defined, comments will be stored under this key
When undef, comments will be ignored
# comm = undef
<node><!-- comm --><sub/></node> => { node => { sub => '' } }
# comm = '/'
<node><!-- comm --><sub/></node> => { node => { sub => '', '/' => 'comm' } }
- array => 1
- Force all nodes to be kept as arrays.
# no array
<node><sub/></node> => { node => { sub => '' } }
# array = 1
<node><sub/></node> => { node => [ { sub => [ '' ] } ] }
- array => [ 'node', 'names']
- Force nodes with the listed names to be stored as arrays
# no array
<node><sub/></node> => { node => { sub => '' } }
# array => ['sub']
<node><sub/></node> => { node => { sub => [ '' ] } }
- utf8decode => 1
- Force decoding of UTF-8 sequences, instead of just upgrading them (may be
useful for broken XML)
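As a rough illustration of how the options above are passed, the following sketch makes three separate calls, each using inputs taken from the option descriptions. The variable names are made up for the example, and the hash keys it reads are the ones the descriptions above say to expect; treat it as a sketch under those assumptions, not as a reference for the module's behaviour.

```perl
use strict;
use warnings;
use XML::Fast;

# Custom attribute prefix and text key (the same options as in the synopsis).
# Assumption: the attribute is then stored under '.attr' and the text under '~'.
my $h1 = xml2hash( '<node attr="test">text<sub /></node>', attr => '.', text => '~' );
my $attr = $h1->{node}{'.attr'};
my $text = $h1->{node}{'~'};

# Force <sub> into an array even though it occurs only once,
# so the calling code can always iterate over it.
my $h2 = xml2hash( '<node><sub/></node>', array => ['sub'] );
my @subs = @{ $h2->{node}{sub} };

# Join text fragments that are separated by a child element.
# The text key is left at its default, '#text'.
my $h3 = xml2hash( '<item>Test1<sub />Test2</item>', join => '+' );
my $joined = $h3->{item}{'#text'};

print "attr=$attr text=$text subs=", scalar(@subs), " joined=$joined\n";
```

Keeping `array` restricted to a list of names, rather than `array => 1`, leaves the rest of the structure in its compact form while still making the chosen nodes safe to loop over.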
- XML::Bare
Another fast parser
- XML::LibXML
The most powerful XML parser for Perl, as long as you don't need to
parse gigabytes of XML ;)
- XML::Hash::LX
An XML parser that uses XML::LibXML for parsing and then
constructs a hash structure identical to the one generated by this module.
(At least, it should ;)). But of course it is much slower than
XML::Fast
- Limitation: wide charsets (UTF-16/32) are not supported (see RT71534
<https://rt.cpan.org/Ticket/Display.html?id=71534>)
- TODO: ordered mode (as implemented in XML::Hash::LX)
- TODO: create hash2xml, identical to the one in XML::Hash::LX
- TODO: partial content event-based parsing (I need this for reading XML
streams)
Patches, proposals and bug reports are welcome ;)
Mons Anderson, <mons@cpan.org>
Copyright (C) 2010 Mons Anderson
This library is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.