NAME

xmlrewrite - clean up XML based on schemas

SYNOPSIS

 # EXPERIMENTAL!
 xmlrewrite infile.xml schema-files >outfile.xml
 xmlrewrite -x infile.xml -s schema-files -o outfile.xml
 cat x.xml | xmlrewrite - schemas/*.xsd | lpr
 xmlrewrite -p schema2001 file.xml
 xmlrewrite --repair in.xml --xmlns =http://somens

DESCRIPTION

Convert an XML message into another XML message which carries the same information. A schema is required to guarantee that the information is preserved: for instance, whitespace may only be removed when the type definition permits it.
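What "the type definition permits" means is defined by the XML Schema whiteSpace facet, which every simple type carries: preserve, replace or collapse. The Python sketch below applies those three rules from XML Schema Part 2 to a string. It only illustrates the facet; it is not code taken from xmlrewrite, and the function name apply_whitespace_facet is hypothetical.

```python
import re


def apply_whitespace_facet(value: str, facet: str) -> str:
    """Normalize a lexical value according to an XML Schema whiteSpace facet."""
    if facet == "preserve":
        # No normalization: every whitespace character is significant.
        return value
    # 'replace': each tab, line feed and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # 'collapse': after replacing, squeeze runs of spaces and trim both ends.
        return re.sub(r" {2,}", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace facet: {facet!r}")


# xs:string uses 'preserve', xs:normalizedString 'replace', xs:token 'collapse'.
print(repr(apply_whitespace_facet("  a\tb\n ", "collapse")))  # -> 'a b'
```

So a tool may only strip the surrounding blanks of a value when its type collapses whitespace; for an xs:string they are part of the information.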

The command has TWO MODES:

--transform (default): The input message is read as is, and then transformations are applied to it. All options are used to change the output.
--repair: The input message is assumed to be corrupt. First, it is read. Then the options are used to change the input XML. After this, no transformations are applied: the corrected message is written. Many XML generators (especially schema editing tools) produce broken output, which can be fixed in this mode.

CURRENT LIMITATIONS

This is the first release of xmlrewrite. It still lacks most of the more interesting features which I have in mind. There are also a few real limitations in the current version:

Comments and processing instructions are lost
The result is not seriously tested

Message options

You can either specify an XML message filename and one or more schema filenames as arguments, or use the options.

--repair | --transform

The execution mode. The effect of many options depends on the mode, so be careful.

--xml|-x filename

The file which contains the XML message. A single dash (-) means "stdin".

--plugin|-p <pre-defined or CLASS>

Plugins add transformations to the available options. You can either specify a pre-defined name, or the name of a CLASS which will be loaded and then used. The CLASS must extend XML::Rewrite.

Pre-defined plugins:

  schema2001       Schema version 2001

These plugins load the required schemas automatically, so you only need to provide your own.

  xmlrewrite -p schema2001 myschema.xsd >clean.xsd

--schema|-s filename(s)

This option can be repeated, or the filenames can be separated by commas, if you have more than one schema file to parse. All imported and included schema components have to be provided explicitly, except schema-2001, which is always loaded.

--type|-t TYPE

The type of the root element. It is required if the XML is not namespace-qualified while the schema is. If not specified, the root element is inspected automatically.

The TYPE notation is "{namespace}localname". Remember to quote it on the UNIX command line, because curly braces have a special meaning for the shell.
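To illustrate the TYPE notation, this Python sketch splits a "{namespace}localname" string into its two parts. The function split_type is hypothetical, and treating a name without braces as having no namespace is an assumption of the sketch, not documented xmlrewrite behavior.

```python
def split_type(type_name: str) -> tuple[str, str]:
    """Split a "{namespace}localname" TYPE into (namespace, localname)."""
    if type_name.startswith("{"):
        end = type_name.find("}")
        if end == -1:
            raise ValueError(f"missing '}}' in {type_name!r}")
        return type_name[1:end], type_name[end + 1:]
    # Assumption: a name without braces is taken to be in no namespace.
    return "", type_name


print(split_type("{http://myns}mytype"))  # -> ('http://myns', 'mytype')
```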

--output|-o filename

The file to write the result to. By default (or when the filename is '-'), the output is printed to stdout.

--blanks-before all|containers|none

Put a blank line before (the comments before) each element, only before containers (elements with children), or never.

Namespaces

--xmlns PREFIX=NAMESPACE[,PREFIX=NS]

PREFIX and NAMESPACE combinations to be used in the output. You may use this option more than once, and separate several definitions in one string with commas.

   abc=http://myns  # prefix abc
   =http://myns     # default namespace

General Options

--encoding|-c <character-set>

The character-set of the document. If not specified and not in the source document, then "UTF-8" will be used. It is not possible to fix erroneous encoding information while reading.

--version|-v <string>

Override the XML version indicator of the document. If not specified and not in the source document, then 1.0 will be used.

--compress -1|0..8

Set the output compression level. A value of -1 means that there should be no compression. By default, the compression level of the input document is used.

--standalone | --no-standalone

If specified, this overrides the value found in the source document. If not provided, the value from the source document is used, but only when present.
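The encoding, version, compression and standalone settings share one precedence rule: the command-line option wins, then the value found in the source document, then a built-in default where one exists. A small Python sketch of that rule, using the hypothetical helper resolve_declaration:

```python
from typing import Optional


def resolve_declaration(option: Optional[str], source: Optional[str],
                        default: Optional[str]) -> Optional[str]:
    """Pick a value: command-line option first, then source document, then default."""
    if option is not None:
        return option
    if source is not None:
        return source
    return default


# --encoding falls back to UTF-8, --version to 1.0.
encoding = resolve_declaration(None, "ISO-8859-1", "UTF-8")  # 'ISO-8859-1'
version = resolve_declaration(None, None, "1.0")             # '1.0'
# --standalone has no default: nothing is written when neither is present.
standalone = resolve_declaration(None, None, None)           # None
```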

Change Options

--rm-elements NAME[,NAME]

Remove all appearances of the NAMEd (namespace-qualified) elements. This option can appear more than once.

  --rm-elements xs:annotation
  --rm-elements '{http://myns}mytype'
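The removal itself can be sketched with Python's xml.etree.ElementTree, which also uses "{namespace}localname" tags. This sketch only matches names in that notation; resolving a prefixed name such as xs:annotation would need a prefix-to-namespace table, which is left out. remove_elements is a hypothetical helper, not part of xmlrewrite.

```python
import xml.etree.ElementTree as ET


def remove_elements(root: ET.Element, names: set[str]) -> None:
    """Remove every element below root whose {ns}local tag is in names."""
    # Collect (parent, child) pairs first, then remove them, so the tree
    # is not modified while it is being iterated.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag in names]
    for parent, child in doomed:
        parent.remove(child)


XS = "http://www.w3.org/2001/XMLSchema"
doc = ET.fromstring(
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:annotation><xs:documentation>note</xs:documentation></xs:annotation>'
    '<xs:element name="a" type="xs:string"/>'
    '</xs:schema>')
remove_elements(doc, {f"{{{XS}}}annotation"})
print([child.tag for child in doc])
# -> ['{http://www.w3.org/2001/XMLSchema}element']
```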

--comments | --no-comments

Controls whether comments are kept or removed. Comments are interpreted as being related to the element which follows them. Comments at the end of a block relate to the last element before them.
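The association rule can be illustrated with Python's xml.etree.ElementTree, whose TreeBuilder keeps comments in the tree when created with insert_comments=True (Python 3.8 and later). The sketch pairs each comment with the following sibling element and gives trailing comments to the last element of the block; comments in a block without any element are simply dropped here. comments_by_element is a hypothetical name, and this is not xmlrewrite's implementation.

```python
import xml.etree.ElementTree as ET


def comments_by_element(root: ET.Element) -> dict[ET.Element, list[str]]:
    """Attach each comment to the element that follows it.

    Comments at the end of a block go to the last element before them.
    """
    attached: dict[ET.Element, list[str]] = {}
    for parent in root.iter():
        pending: list[str] = []
        last = None
        for child in parent:
            if child.tag is ET.Comment:
                pending.append(child.text)
            else:
                if pending:
                    attached.setdefault(child, []).extend(pending)
                    pending = []
                last = child
        # Trailing comments relate to the last element of the block.
        if pending and last is not None:
            attached.setdefault(last, []).extend(pending)
    return attached


parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring("<r><!-- about a --><a/><b/><!-- trailing --></r>", parser=parser)
for element, notes in comments_by_element(root).items():
    print(element.tag, notes)
# a [' about a ']
# b [' trailing ']
```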

Schema Change Options

The behavior of these options differs between repair mode and transform mode.

--element-form qualified|unqualified

--attribute-form qualified|unqualified

--target-ns URI

Change the target namespace of the output schema. All elements and types defined by the schema are in this namespace from now on.

--annotations | --no-annotations

--default-values extend|ignore|minimal

The default is "ignore", which means that the output message will not add or remove elements and attributes based on their known defaults. With "extend", the defaults will be made explicit in the output. With "minimal", elements and attributes which have the default value will be removed.

--id-constraints | --no-id-constraints

Remove "key", "keyref", and "unique" elements from the schema. They are used for optimizing XML database queries.

--expand-includes | --no-expand-includes

Take the content of include files, and merge it with the schema at hand. The default is not to expand. The includes defined in the included files are consumed as well. Include statements for files which cannot be found are left in, without an error message.
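The three --default-values modes can be sketched in Python for the attributes of one element, given the defaults the schema declares for them. This is a simplification under stated assumptions: it handles attributes only (not element defaults), and it compares lexical strings, whereas a schema-aware tool would compare typed values (so "01" and "1" would count as equal for an integer). apply_default_values is hypothetical.

```python
def apply_default_values(attrs: dict[str, str], defaults: dict[str, str],
                         mode: str) -> dict[str, str]:
    """Apply a --default-values mode to one element's attributes.

    attrs:    attributes present in the message
    defaults: default values the schema declares for this element's attributes
    """
    if mode == "ignore":
        # Leave the message as it is: nothing added, nothing removed.
        return dict(attrs)
    if mode == "extend":
        # Make every known default explicit, keeping values already present.
        return {**defaults, **attrs}
    if mode == "minimal":
        # Drop attributes whose (lexical) value equals the declared default.
        return {name: value for name, value in attrs.items()
                if defaults.get(name) != value}
    raise ValueError(f"unknown mode: {mode!r}")


defaults = {"lang": "en", "size": "10"}
print(apply_default_values({"size": "12"}, defaults, "extend"))
# -> {'lang': 'en', 'size': '12'}
```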

LICENSE

Copyrights 2008 by Mark Overmeer. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://www.perl.com/perl/misc/Artistic.html