NAME

Data::Domain - Data description and validation

SYNOPSIS

    use Data::Domain qw/:all/;

    # some basic domains
    my $int_dom      = Int(-min => 3, -max => 18);
    my $nat_dom      = Nat(-max => 100); # natural numbers
    my $num_dom      = Num(-min => 3.33, -max => 18.5);
    my $string_dom   = String(-min_length => 2, -optional => 1);
    my $handle_dom   = Handle;
    my $enum_dom     = Enum(qw/foo bar buz/);
    my $int_list_dom = List(-min_size => 1, -all => Int);
    my $mixed_list   = List(String, Int(-min => 0), Date, True, Defined);
    my $struct_dom   = Struct(foo => String, bar => Int(-optional => 1));
    my $obj_dom      = Obj(-can => 'print');
    my $class_dom    = Class(-can => 'print');

    # using the domain to check data
    my $error_messages = $domain->inspect($some_data);
    reject_form($error_messages) if $error_messages;

    # same, using the smart match API
    $some_other_data ~~ $domain
      or die "did not match because $Data::Domain::MESSAGE";

    # custom name and custom messages (2 different ways)
    $domain = Int(-name => 'age', -min => 3, -max => 18,
                  -messages => "only for people aged 3-18");

    $domain = Int(-name => 'age', -min => 3, -max => 18,
                  -messages => {
                    TOO_BIG   => "not for old people over %d",
                    TOO_SMALL => "not for babies under %d",
                   });

    # examples of subroutines for specialized domains
    sub Phone   { String(-regex    => qr/^\+?[0-9() ]+$/,
                         -messages => "Invalid phone number", @_) }
    sub Email   { String(-regex    => qr/^[-.\w]+\@[\w.]+$/,
                         -messages => "Invalid email", @_) }
    sub Contact { Struct(-fields => [name   => String,
                                     phone  => Phone,
                                     mobile => Phone(-name     => 'Mobile',
                                                     -optional => 1),
                                     emails => List(-all => Email)   ], @_) }

    # lazy subdomain
    $domain = Struct(
      date_begin => Date(-max => 'today'),
      date_end   => sub {my $context = shift;
                         Date(-min => $context->{flat}{date_begin})},
    );

    # recursive domain
    my $expr_domain;
    $expr_domain = One_of(Num, Struct(operator => String(qr(^[-+*/]$)),
                                      left     => sub {$expr_domain},
                                      right    => sub {$expr_domain}));

    # constants in deep datastructures
    $domain = Struct( foo => 123,                     # 123 becomes a domain
                      bar => List(Int, 'buz', Int) ); # 'buz' becomes a domain

    # list with repetitive structure (here : triples)
    my $domain = List(-all => [String, Int, Obj(-can => 'print')]);

DESCRIPTION

A data domain is a description of a set of values, either scalar or structured (arrays or hashes). The description can include many constraints, like minimal or maximal values, regular expressions, required fields, forbidden fields, and also contextual dependencies. From that description, one can then invoke the domain's "inspect" method to check if a given value belongs to the domain or not. In case of mismatch, a structured set of error messages is returned, giving detailed explanations about what was wrong.

The motivation for writing this package was to be able to express in a compact way some possibly complex constraints about structured data. Typically the data is a Perl tree (nested hashrefs or arrayrefs) that may come from XML, JSON, from a database through DBIx::DataModel, or from postprocessing an HTML form through CGI::Expand. "Data::Domain" is a kind of tree parser on that structure, with some facilities for dealing with dependencies within the structure, and with several options to finely tune the error messages returned to the user.

The main usage for "Data::Domain" is to check input from forms in interactive applications : the structured error messages make it easy to display a form again, highlighting which fields were rejected and why. Another usage is for writing automatic tests, with the help of the companion module Test::InDomain.

There are several other packages in CPAN doing data validation; these are briefly listed in the "SEE ALSO" section.
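To highlight rejected fields, client code usually has to turn the structured messages returned by "inspect" (hashrefs and arrayrefs mirroring the data, with plain strings at the leaves and "undef" for items that passed) into field/message pairs. The following is a minimal plain-Perl sketch of that step; the helper name "flatten_errors" and the dotted path notation are hypothetical choices for illustration, not part of the Data::Domain API.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Domain: walk a structured error
# report and return a flat list of [path, message] pairs.
sub flatten_errors {
    my ($errors, @path) = @_;
    return () unless defined $errors;      # undef: no error at this node
    if (ref $errors eq 'HASH') {           # Struct-like report
        return map { flatten_errors($errors->{$_}, @path, $_) }
               sort keys %$errors;
    }
    if (ref $errors eq 'ARRAY') {          # List-like report
        return map { flatten_errors($errors->[$_], @path, $_) }
               0 .. $#$errors;
    }
    return [join('.', @path), $errors];    # leaf: a plain message string
}

# A report in the shape that inspect() returns for a Struct domain
my $errors = {
    anInt => "smaller than minimum 3",
    aDate => "not a valid date",
    aList => ["message for item 0", undef, undef, "message for item 3"],
};

my @flat = flatten_errors($errors);
printf "%-8s => %s\n", @$_ for @flat;   # one line per rejected field
```

Items 1 and 2 of "aList" produce no output because their slots are "undef", which is exactly the information a form needs to leave those fields unmarked.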
EXPORTS

Domain constructors

    use Data::Domain qw/:all/;
    # or
    use Data::Domain qw/:constructors/;
    # or
    use Data::Domain qw/Whatever Empty
                        Num Int Nat Date Time String
                        Enum List Struct One_of All_of/;

Internally, domains are represented as Perl objects; however, it would be tedious to write

    my $domain = Data::Domain::Struct->new(
      anInt => Data::Domain::Int->new(-min => 3, -max => 18),
      aDate => Data::Domain::Date->new(-max => 'today'),
      ...
    );

so for each of its builtin domain constructors, "Data::Domain" exports a plain function that just calls "new" on the appropriate subclass; these functions are all exported in a group called ":constructors", and allow us to write more compact code :

    my $domain = Struct(
      anInt => Int(-min => 3, -max => 18),
      aDate => Date(-max => 'today'),
      ...
    );

The list of available domain constructors is given below in "BUILTIN DOMAIN CONSTRUCTORS".

Shortcuts (domains with predefined options)

    use Data::Domain qw/:all/;
    # or
    use Data::Domain qw/:shortcuts/;
    # or
    use Data::Domain qw/True False Defined Undef Blessed Unblessed Regexp
                        Obj Class/;

The ":shortcuts" export group contains a number of convenience functions that call the "Whatever" domain constructor with various pre-built options. Precise definitions for each of these functions are given below in "BUILTIN SHORTCUTS".

Renaming imported functions

Short function names like "Int", "String", "List", "Obj", "True", etc. are convenient but may cause name clashes with other modules. Thanks to the powerful features of Sub::Exporter, these functions can be renamed in various ways. Here is an example :

    use Data::Domain -all => { -prefix => 'dom_' };
    my $domain = dom_Struct(
      anInt => dom_Int(-min => 3, -max => 18),
      aDate => dom_Date(-max => 'today'),
      ...
    );

There are a number of other ways to rename imported functions; see Sub::Exporter and Sub::Exporter::Tutorial.
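The idea behind the constructor functions (a plain sub that forwards its arguments to "new" on a class, optionally installed under a prefixed name) can be sketched in plain Perl. This is only an illustration: the classes "My::Domain::Int" and "My::Domain::String" and the helper "install_constructors" are hypothetical stand-ins, whereas the real module delegates exporting and renaming to Sub::Exporter.

```perl
use strict;
use warnings;

# Hypothetical minimal classes standing in for Data::Domain::Int etc.
package My::Domain::Int;
sub new { my ($class, %opts) = @_; return bless {%opts}, $class }

package My::Domain::String;
sub new { my ($class, %opts) = @_; return bless {%opts}, $class }

package main;

# Install one wrapper per class into the target namespace; each wrapper
# just forwards its arguments to ->new, under an optionally prefixed name.
sub install_constructors {
    my ($target, $prefix, @names) = @_;
    no strict 'refs';
    for my $name (@names) {
        my $class = "My::Domain::$name";
        *{"${target}::${prefix}${name}"} = sub { $class->new(@_) };
    }
}

install_constructors('main', 'dom_', qw/Int String/);

# Same as My::Domain::Int->new(-min => 3, -max => 18)
my $dom = dom_Int(-min => 3, -max => 18);
print ref($dom), " min=", $dom->{'-min'}, "\n";   # My::Domain::Int min=3
```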
Removing symbols from the import list

To preserve backwards compatibility with Exporter, the present module also supports exclamation marks to exclude some specific symbols from the import list. For example

    use Data::Domain qw/:all !Date/;

will import everything except the "Date" function.

METHODS COMMON TO ALL DOMAINS

new

The "new" method creates a new domain object, from one of the domain constructors listed below ("Num", "Int", "Date", etc.). The "Data::Domain" class itself has no "new" method, because it is an abstract class.

This method is seldom called explicitly; it is usually more convenient to use the wrapper subroutines introduced above, i.e. to write "Int(@args)" instead of "Data::Domain::Int->new(@args)". All examples below use this shorter notation.

Arguments to the "new" method may specify various options for the domain to be constructed. Option names always start with a dash. If no option name is given, parameters to the "new" method are passed to the default option defined in each constructor subclass. For example the default option in "Data::Domain::List" is "-items", so

    my $domain = List(Int, String, Int);

is equivalent to

    my $domain = List(-items => [Int, String, Int]);

So in short, the "default option" is syntactic sugar for using positional parameters instead of named parameters.

Each domain constructor has its own list of available options; these will be presented below, together with each subclass (for example options for setting minimal/maximal values, regular expressions, string length, etc.). However, there are also some generic options, available in every domain constructor; these are listed here, in several categories.

Options for customizing the domain behaviour
Options for checking boolean properties

Options in this category check if the data possesses, or does not possess, a given property; hence, the argument to each option must be a boolean. For example, here is a domain that accepts all blessed objects that are not weak references and are not readonly :

    $domain = Whatever(-blessed => 1, -weak => 0, -readonly => 0);

Boolean property options are :
Options for checking other general properties

Options in this category do not take a boolean argument, but a class name, method name, role or smart match operand.
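For orientation, the questions these options ask about class membership, available methods and roles correspond to Perl's built-in UNIVERSAL methods "isa", "can" and "DOES". The sketch below shows those core tests in plain Perl on a hypothetical Animal/Dog hierarchy; it is not a claim about how Data::Domain implements its "-isa", "-can" or "-does" options.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical classes used only for this illustration
package Animal;
sub new   { my $class = shift; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;
our @ISA = ('Animal');
sub fetch { return 'stick' }

package main;

my $rex = Dog->new;

# Each entry records whether the corresponding core test succeeds
my %checks = (
    isa_animal => (blessed($rex) && $rex->isa('Animal')) ? 1 : 0,
    can_all    => (grep { !$rex->can($_) } qw/speak fetch/) ? 0 : 1,
    does_dog   => $rex->DOES('Dog') ? 1 : 0,   # DOES defaults to isa
    can_fly    => $rex->can('fly') ? 1 : 0,
);
print "$_: $checks{$_}\n" for sort keys %checks;
```

Note that "can" with a list of method names has to be applied to each name in turn, which is why the sketch counts the missing methods with "grep".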
EXPERIMENTAL: options for checking return values

Disclaimer: options in this section are still experimental; the call API or structure of returned values might change in future versions of "Data::Domain".

These options call methods or coderefs within the data, and then check the results against the supplied domains. This is somewhat contrary to the principle of "domains", because a function call or method call not only inspects the data : it might also alter the data. However, one could also argue that peeking into an object's internals is contrary to the principle of encapsulation, so in this sense, method calls are more appropriate. You decide ... but beware of side-effects in your data!
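To see why the side-effect warning matters, consider an object whose method changes its own state when called, such as an iterator. The sketch below uses a hypothetical "Counter" class in plain Perl (Data::Domain is not involved); any check that calls "next_item" merely to look at its return value also consumes an item.

```perl
use strict;
use warnings;

# Hypothetical iterator class: each call to next_item() consumes one item
package Counter;
sub new       { my ($class, @items) = @_; return bless { items => [@items] }, $class }
sub next_item { my $self = shift; return shift @{ $self->{items} } }
sub remaining { my $self = shift; return scalar @{ $self->{items} } }

package main;

my $it = Counter->new(10, 20, 30);

# A "check" that only wants to look at the return value of next_item() ...
my $got_value = defined($it->next_item) ? 1 : 0;

# ... has nevertheless consumed the first item
print "check passed: $got_value, items left: ", $it->remaining, "\n";
# prints "check passed: 1, items left: 2"
```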
Note that this option can be invoked not only on "Obj", but on any domain; hence, it is possible to simultaneously check if an object has some given internal structure, and also answers to some method calls :

    $domain = Struct(                # must be a hashref
      -fields => {foo => String},    # must have a {foo} key with a String value
      -has    => [foo => String],    # must have a ->foo method that returns a String
    );
inspect

    my $messages = $domain->inspect($some_data);

This method inspects the supplied data, and returns an error message (or a structured collection of messages) if anything is wrong. If the data successfully passed all domain tests, the method returns "undef".

For scalar domains ("Num", "String", etc.), the error message is just a string. For structured domains ("List", "Struct"), the return value is an arrayref or hashref of the same structure, like for example

    {anInt => "smaller than minimum 3",
     aDate => "not a valid date",
     aList => ["message for item 0", undef, undef, "message for item 3"]}

The client code can then exploit this structure to dispatch error messages to appropriate locations (like for example the form fields from which the data was gathered).

smart match

"Data::Domain" overloads the smart match operator "~~", so one can write

    if ($data ~~ $domain) {...}

instead of

    if (!(my $msg = $domain->inspect($data))) {...}

The error message from the last smart match operation can be retrieved from $Data::Domain::MESSAGE.

stringification

When printed, domains stringify to a compact Data::Dumper representation of their internal attributes; these details can be useful for debugging or logging purposes.

BUILTIN DOMAIN CONSTRUCTORS

Whatever

    my $just_anything = Whatever;
    my $is_defined    = Whatever(-defined => 1);
    my $is_undef      = Whatever(-defined => 0);
    my $is_true       = Whatever(-true => 1);
    my $is_false      = Whatever(-true => 0);
    my $is_of_class   = Whatever(-isa  => 'Some::Class');
    my $does_role     = Whatever(-does => 'Some::Role');
    my $has_methods   = Whatever(-can  => [qw/jump swim dance sing/]);

The "Data::Domain::Whatever" domain can contain any kind of Perl value, including "undef" (actually this is the only domain that contains "undef"). The only specific option is :
The "Whatever" domain is mostly used together with some of the general options described above, like "-true", "-does", "-can", etc.

Empty

The "Data::Domain::Empty" domain always fails when inspecting any data. This is sometimes useful within lazy constructors, like in this example :

    Struct(
      foo => String,
      bar => sub {
        my $context = shift;
        if (some_condition($context)) {
          return Empty(-messages => 'your data is wrong')
        }
        else {
          ...
        }
      }
    )

The "LAZY CONSTRUCTORS" section gives more explanations about lazy domains.

Num

    my $domain = Num(-range =>[-3.33, 999], -not_in => [2, 3, 5, 7, 11]);

Domain for numbers (including floats). Numbers are recognized through "looks_like_number" in Scalar::Util. Options for the domain are :
Int

    my $domain = Int(-min => -999, -max => 999, -not_in => [2, 3, 5, 7, 11]);

Domain for integers. Integers are recognized through the regular expression "/^-?\d+$/". This domain accepts the same options as "Num" and returns the same error messages.

Nat

    my $domain = Nat(-max => 999);

Domain for natural numbers (i.e. non-negative integers, including 0). Natural numbers are recognized through the regular expression "/^\d+$/". This domain accepts the same options as "Num" and returns the same error messages.

Date

    Data::Domain::Date->parser('EU'); # default
    my $domain = Date(-min    => '01.01.2001',
                      -max    => 'today',
                      -not_in => ['02.02.2002', '03.03.2003', 'yesterday']);

Domain for dates, implemented via the Date::Calc module. By default, dates are parsed according to the European format, i.e. through the Decode_Date_EU function; this can be changed by setting

    Data::Domain::Date->parser('US'); # will use Decode_Date_US

or

    Data::Domain::Date->parser(\&your_own_date_parsing_function);
    # that func. should return an array ($year, $month, $day)

Options to this domain are:
When outputting error messages, dates will be printed according to Date::Calc's current language (English by default); see that module's documentation for changing the language.

Time

    my $domain = Time(-min => '08:00', -max => 'now');

Domain for times in format "hh:mm:ss" (minutes and seconds are optional). Options to this domain are:
String

    my $domain = String(qr/^[A-Za-z0-9_\s]+$/);

    my $domain = String(-regex     => qr/^[A-Za-z0-9_\s]+$/,
                        -antiregex => qr/$RE{profanity}/,  # see Regexp::Common
                        -range     => ['AA', 'zz'],
                        -length    => [1, 20],
                        -not_in    => [qw/foo bar/]);

Domain for strings. Things considered as strings are either scalar values, or objects with an overloaded stringification method; by contrast, a hash reference is not considered to be a string, even if it can stringify to something like "HASH(0x3f9fc4)" or "Some::Class=HASH(0x3f9fc4)" through Perl's internal rules. Options to this domain are:
Handle

    my $domain = Handle();

Domain for filehandles. This domain has no options. Domain membership is checked through "openhandle" in Scalar::Util.

Enum

    my $domain = Enum(qw/foo bar buz/);

Domain for a finite set of scalar values. Options are:
List

    my $domain = List(String, Int, String, Num);
    my $domain = List(-items => [String, Int, String, Num]); # same as above
    my $domain = List(-all  => String(qr/^[A-Z]+$/),
                      -any  => String(-min_length => 3),
                      -size => [3, 10]);
    my $domain = List(-all => [String, Int, Whatever(-can => 'print')]);

Domain for lists of values (stored as Perl arrayrefs). Options are:
Struct

    my $domain = Struct(foo => Int, bar => String);
    my $domain = Struct(-fields => {foo => Int, bar => String}); # same as above
    my $domain = Struct(-fields  => [foo => Int, bar => String],
                        -exclude => '*'); # only 'foo' and 'bar', nothing else
    my $domain = Struct(-keys   => List(-all => String(qr/^[abc]/)),
                        -values => List(-all => Int));

Domain for associative structures (stored as Perl hashrefs). Options are:
One_of

    my $domain = One_of($domain1, $domain2, ...);

Union of domains : successively checks the member domains, until one of them succeeds. Options are:

All_of

    my $domain = All_of($domain1, $domain2, ...);

Intersection of domains : checks all member domains, and requires that all of them succeed. Options are:
BUILTIN SHORTCUTS

Below are the precise definitions for the shortcut functions exported in the ":shortcuts" group. Each of these functions sets some initial options, but also accepts further options as arguments, so for example it is possible to write something like "Obj(-does => 'Storable', -optional => 1)", which is equivalent to "Whatever(-blessed => 1, -does => 'Storable', -optional => 1)".

    True        Whatever(-true => 1)
    False       Whatever(-true => 0)
    Defined     Whatever(-defined => 1)
    Undef       Whatever(-defined => 0)
    Blessed     Whatever(-blessed => 1)
    Unblessed   Whatever(-blessed => 0)
    Regexp      Whatever(-does => 'Regexp')
    Obj         Whatever(-blessed => 1)    (synonym for "Blessed")
    Class       Whatever(-blessed => 0, -isa => 'UNIVERSAL')

LAZY CONSTRUCTORS (CONTEXT DEPENDENCIES)

Principle

If an element of a structured domain ("List" or "Struct") depends on another element, then we need to lazily construct that subdomain. Consider for example a struct in which the value of field "date_end" must not be earlier than "date_begin" : the subdomain for "date_end" can only be constructed when the argument to "-min" is known, namely when the domain inspects an actual data structure.

Lazy domain construction is achieved by supplying a subroutine reference instead of a domain object. That subroutine will be called with some context information, and should return the domain object. So our example becomes :

    my $domain = Struct(
      date_begin => Date,
      date_end   => sub {my $context = shift;
                         Date(-min => $context->{flat}{date_begin})}
    );

Structure of context

The supplied context is a hashref containing the following information:
To illustrate this, the following code :

    my $domain = Struct(
      foo => List(Whatever,
                  Whatever,
                  Struct(bar => sub {my $context = shift;
                                     print Dumper($context);
                                     String;})
                 )
    );
    my $data = {foo => [undef, 99, {bar => "hello, world"}]};
    $domain->inspect($data);

will print :

    $VAR1 = {
      'root' => {'foo' => [undef, 99, {'bar' => 'hello, world'}]},
      'path' => ['foo', 2, 'bar'],
      'list' => $VAR1->{'root'}{'foo'},
      'flat' => {
        'bar' => 'hello, world',
        'foo' => $VAR1->{'root'}{'foo'}
      }
    };

Examples of lazy domains

Contextual sets

The domain below accepts hashrefs with a "country" and a "city", but also checks that the city actually belongs to the given country :

    %SOME_CITIES = (
       Switzerland => [qw/Geneve Lausanne Bern Zurich Bellinzona/],
       France      => [qw/Paris Lyon Marseille Lille Strasbourg/],
       Italy       => [qw/Milano Genova Livorno Roma Venezia/],
    );
    my $domain = Struct(
       country => Enum(keys %SOME_CITIES),
       city    => sub {
          my $context = shift;
          Enum(-values => $SOME_CITIES{$context->{flat}{country}});
        });

Ordered lists

A domain for ordered lists of integers:

    my $domain = List(-all => sub {
        my $context = shift;
        my $index = $context->{path}[-1];
        return $index == 0 ? Int
                           : Int(-min => $context->{list}[$index-1]);
      });

The subdomain for the first item in the list has no specific constraint; but the next subdomains have a minimal bound that comes from the previous list item.

Recursive domain

A domain for expression trees, where leaves are numbers, and intermediate nodes are binary operators on subtrees :

    my $expr_domain;
    $expr_domain = One_of(Num, Struct(operator => String(qr(^[-+*/]$)),
                                      left     => sub {$expr_domain},
                                      right    => sub {$expr_domain}));

Observe that recursive calls to the domain are encapsulated within "sub {...}" so that they are treated as lazy domains.

WRITING NEW DOMAIN CONSTRUCTORS

Implementing new domain constructors is fairly simple : create a subclass of "Data::Domain" and implement a "new" method and an "_inspect" method.
See the source code of "Data::Domain::Num" or "Data::Domain::String" for short examples.

However, before writing such a class, consider whether the existing mechanisms are already sufficient for your needs. For example, many domains could be expressed as a "String" constrained by a regular expression; therefore it is just a matter of writing a subroutine that wraps a call to the domain constructor, while supplying some of its arguments :

    sub Phone   { String(-regex    => qr/^\+?[0-9() ]+$/,
                         -messages => "Invalid phone number", @_) }
    sub Email   { String(-regex    => qr/^[-.\w]+\@[\w.]+$/,
                         -messages => "Invalid email", @_) }
    sub Contact { Struct(-fields => [name   => String,
                                     phone  => Phone,
                                     mobile => Phone(-optional => 1),
                                     emails => List(-all => Email)   ], @_) }

Observe that these examples always pass @_ to the domain call : this is so that the client can still add its own arguments to the call, like

    $domain = Phone(-name     => 'private phone',
                    -optional => 1,
                    -not_in   => [ 1234567, 9999999 ]);

CONSTANT SUBDOMAINS

For convenience, elements of "List()" or "Struct()" may be plain scalar constants, and are automatically translated into constant domains :

    $domain = Struct(foo => 123,
                     bar => List(Int, 'buz', Int));

This is exactly equivalent to

    $domain = Struct(foo => Int(-min => 123, -max => 123),
                     bar => List(Int, String(-min => 'buz', -max => 'buz'), Int));

CUSTOMIZING ERROR MESSAGES

Messages returned by validation rules have default values, but can be customized in several ways.

General structure of error messages

Each error message has an internal string identifier, like "TOO_SHORT", "NOT_A_HASH", etc. The section "Message identifiers" below tells which message identifiers may be generated by each domain constructor.

Message identifiers are then associated with user-friendly strings, either within the domain itself, or via a global table.
Such strings are actually sprintf format strings, with placeholders for printing some specific details about the validation rule : for example the "String" domain defines default messages such as

    TOO_SHORT    => "less than %d characters",
    SHOULD_MATCH => "should match '%s'",

The "-messages" option to domain constructors

Any domain constructor may receive a "-messages" option to locally override the messages for that domain. The argument may be
Here is an example :

    sub Phone {
      String(-regex      => qr/^\+?[0-9() ]+$/,
             -min_length => 7,
             -messages   => {
               TOO_SHORT    => "phone number should have at least %d digits",
               SHOULD_MATCH => "invalid chars in phone number",
              }, @_);
    }

The "messages" class method

Default strings associated with message identifiers are stored in a global table. The "Data::Domain" distribution contains builtin tables for English (the default) and for French : these can be chosen through the "messages" class method :

    Data::Domain->messages('english');  # the default
    Data::Domain->messages('francais');

The same method can also receive a custom table.

    my $custom_table = {...};
    Data::Domain->messages($custom_table);

This should be a two-level hashref : first-level entries in the hash correspond to "Data::Domain" subclasses (e.g. "Num => {...}", "String => {...}"), or to the constant "Generic"; for each of those, the second-level entries should correspond to message identifiers as specified in the doc for each subclass (for example "TOO_SHORT", "NOT_A_HASH", etc.). Values should be strings suitable to be fed to sprintf. Look at $builtin_msgs in the source code to see an example.

Finally, it is also possible to write your own message generation handler :

    Data::Domain->messages(sub {my ($msg_id, @args) = @_;
                                return "you just got it wrong ($msg_id)"});

What is received in @args depends on which validation rule is involved; it can be for example the minimal or maximal bounds, or the regular expression being checked.

The "-name" option to domain constructors

The name of the domain is prepended in front of error messages.
The default name is the subclass of "Data::Domain", so a typical error message for a string would be

    String: less than 7 characters

However, if a "-name" is supplied to the domain constructor, that name will be printed instead;

    my $dom = String(-min_length => 7, -name => 'Phone');
    # now error would be: "Phone: less than 7 characters"

Message identifiers

This section lists all possible message identifiers generated by the builtin constructors.
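As a rough illustration of the message pipeline described under "CUSTOMIZING ERROR MESSAGES" (an identifier is mapped to a sprintf format, filled with rule-specific arguments, then prefixed with the domain name), here is a plain-Perl sketch. Everything in it is an assumption for illustration: the table contents, the helper "format_msg", and the lookup order (domain-local "-messages" first, then the subclass entry, then "Generic") are hypothetical and do not reproduce Data::Domain's internal "msg" method.

```perl
use strict;
use warnings;

# Hypothetical two-level table: subclass name (or 'Generic') => id => format
my %table = (
    Generic => { INVALID      => "invalid data" },
    String  => { TOO_SHORT    => "less than %d characters",
                 SHOULD_MATCH => "should match '%s'" },
);

# Hypothetical helper: pick a format (local messages, then the subclass
# entry, then 'Generic', else the bare id), fill it with sprintf and
# prefix it with the domain name.
sub format_msg {
    my ($name, $subclass, $local, $msg_id, @args) = @_;
    my $fmt = $local->{$msg_id}
           // $table{$subclass}{$msg_id}
           // $table{Generic}{$msg_id}
           // $msg_id;
    return "$name: " . sprintf($fmt, @args);
}

print format_msg('String', 'String', {}, TOO_SHORT => 7), "\n";
# String: less than 7 characters
print format_msg('Phone', 'String', {}, TOO_SHORT => 7), "\n";
# Phone: less than 7 characters
print format_msg('Phone', 'String',
                 { TOO_SHORT => "phone number should have at least %d digits" },
                 TOO_SHORT => 7), "\n";
# Phone: phone number should have at least 7 digits
```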
INTERNALS

Variables

MAX_DEEP

In order to avoid infinite loops, the "inspect" method will raise an exception if the depth of recursive calls exceeds $MAX_DEEP. The default limit is 100, but it can be changed like this :

    local $Data::Domain::MAX_DEEP = 999;

Methods

node_from_path

    my $node = node_from_path($root, @path);

Convenience function to find a given node in a data tree, starting from the root and following a path (a sequence of hash keys or array indices). Returns "undef" if no such path exists in the tree. Mainly useful for contextual constraints in lazy constructors.

msg

Internal utility method for generating an error message.

subclass

Method that returns the short name of the subclass of "Data::Domain" (e.g. returns 'Int' for "Data::Domain::Int").

_expand_range

Internal utility method for converting a "range" parameter into "min" and "max" parameters.

_build_subdomain

Internal utility method for dynamically converting lazy domains (coderefs) into domains.

SEE ALSO

Doc and tutorials on complex Perl data structures: perlref, perldsc, perllol.

Other CPAN modules doing data validation : Data::FormValidator, CGI::FormBuilder, HTML::Widget::Constraint, Jifty::DBI, Data::Constraint, Declare::Constraints::Simple, Moose::Manual::Types, Smart::Match, Test::Deep, Params::Validate, Validation::Class. Among those, "Declare::Constraints::Simple" is the closest to "Data::Domain", because it is also designed to deal with substructures; yet it has a different approach to combinations of constraints and scope dependencies.

Some inspiration for "Data::Domain" came from the wonderful Parse::RecDescent module, especially the idea of passing a context where individual rules can grab information about neighbour nodes. Ideas for some features were borrowed from Test::Deep and from Moose::Manual::Types.
AUTHOR

Laurent Dami, <dami at cpan.org>

COPYRIGHT AND LICENSE

Copyright 2006, 2007, 2012 by Laurent Dami.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.