NAME

Google::ProtocolBuffers - simple interface to Google Protocol Buffers

SYNOPSIS

    ##
    ## Define structure of your data and create serializer classes
    ##
    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse("
        message Person {
          required string name  = 1;
          required int32 id     = 2; // Unique ID number for this person.
          optional string email = 3;
        
          enum PhoneType {
            MOBILE = 0;
            HOME = 1;
            WORK = 2;
          }
        
          message PhoneNumber {
            required string number = 1;
            optional PhoneType type = 2 [default = HOME];
          }
        
          repeated PhoneNumber phone = 4;
        }
    ",
        {create_accessors => 1 }
    );
    
    ##
    ## Serialize Perl structure and print it to file
    ##
    open my $fh, '>', 'person.dat' or die "Cannot write person.dat: $!";
    binmode $fh;
    print $fh Person->encode({
        name    => 'A.U. Thor',
        id      => 123,
        phone   => [ 
            { number => 1234567890 }, 
            { number => 987654321, type=>Person::PhoneType::WORK() }, 
        ],
    });
    close $fh;
    
    ##
    ## Decode data from serialized form
    ##
    my $person;
    {
        open my $fh, '<', 'person.dat' or die "Cannot read person.dat: $!";
        binmode $fh;
        local $/;
        $person = Person->decode(<$fh>);
        close $fh;
    }
    print $person->{name}, "\n";
    print $person->name,   "\n";  ## ditto

DESCRIPTION

Google Protocol Buffers is a data serialization format. It is binary (and hence compact and fast to serialize) and as extensible as XML; its nearest analogues are Thrift and ASN.1. There are official mappings for C++, Java and Python; this library is a mapping for Perl.

METHODS

Google::ProtocolBuffers->parse($proto_text, \%options)

Google::ProtocolBuffers->parsefile($proto_filename, \%options)

Protocol Buffers is a typed protocol, so work with it starts with an Interface Definition Language named 'proto'. For the description of the language, please see the official page (<http://code.google.com/p/protobuf/>). Methods 'parse' and 'parsefile' take the description of the data structure as a text literal or as the name of a proto file, respectively. After successful compilation, Perl serializer classes are created for each message, group or enum found in the proto. In case of error, these methods will die. On success, a list of names of the created classes is returned. Options are given as a hash reference; the recognized options are:

include_dir => [ $dir_name ]

One proto file may include others; this option sets where to look for the included files. Multiple directories should be specified as an ARRAYREF.

generate_code => $filename or $filehandle

Compilation of proto source is a relatively slow and memory-consuming operation, so it is not recommended in a production environment. Instead, with this option you may specify a filename or filehandle where the Perl code of the created serializer classes is saved for future use. Example:

    ## in helper script
    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "message Foo {optional int32 a = 1; }",
        { generate_code => 'Foo.pm' }
    );
    
    ## then, in production code
    use Foo;
    my $str = Foo->encode({a => 100});

create_accessors (Boolean)

If this option is set, then the result of 'decode' will be a blessed structure with accessor methods for each field; see Class::Accessor for more info. Example:

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "message Foo { optional int32 id = 1; }",
        { create_accessors => 1 }
    );
    my $foo = Foo->decode("\x{08}\x{02}");
    print $foo->id; ## prints 2
    $foo->id(100);  ## now it is set to 100
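
The two bytes passed to 'decode' above follow the Protocol Buffers wire format: a field starts with a key, (field_number << 3) | wire_type, followed by its value, and int32 uses wire type 0 (varint). Below is a minimal pure-Perl sketch of that arithmetic. It does not use this module; the helper names encode_varint and encode_int32_field are hypothetical, and only non-negative values are handled.

```perl
use strict;
use warnings;

## Hypothetical helper: encode a non-negative integer as a base-128 varint.
sub encode_varint {
    my ($n) = @_;
    my $bytes = '';
    while ($n > 0x7F) {
        $bytes .= chr(($n & 0x7F) | 0x80);  ## low 7 bits, continuation bit set
        $n >>= 7;
    }
    return $bytes . chr($n);                ## last byte, continuation bit clear
}

## Hypothetical helper: key followed by value for a wire type 0 field.
sub encode_int32_field {
    my ($field_number, $value) = @_;
    return encode_varint(($field_number << 3) | 0) . encode_varint($value);
}

my $bytes = encode_int32_field(1, 2);
print join(' ', map { sprintf '%02x', ord } split //, $bytes), "\n";  ## 08 02
```

In real code, Foo->encode({ id => 2 }) does this work for you; the sketch only shows where the literal in the example comes from.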

follow_best_practice (Boolean)

This option comes from Class::Accessor too; it has no effect without 'create_accessors'. If set, names of getters (read accessors) will start with get_ and names of setters with set_:

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "message Foo { optional int32 id = 1; }",
        { create_accessors => 1, follow_best_practice => 1 }
    );
    ## Class::Accessor provides a constructor too
    my $foo = Foo->new({ id => 2 }); 
    print $foo->get_id;  
    $foo->set_id(100);

simple_extensions (Boolean)

If this option is set, then extensions are treated as if they were regular fields in messages or groups:

    use Google::ProtocolBuffers;
    use Data::Dumper;
    Google::ProtocolBuffers->parse(
        "   
            message Foo { 
                optional int32 id = 1;
                extensions 10 to max;     
            }
            extend Foo {
               optional string name = 10;
            }
        ",
        { simple_extensions=>1, create_accessors => 1 }
    );
    my $foo = Foo->decode("\x{08}\x{02}R\x{03}Bob");
    print Dumper $foo; ## { id => 2, name => 'Bob' }
    print $foo->id, "\n";
    $foo->name("Sponge Bob");

This option is off by default because extensions live in a separate namespace and may have the same names as fields. Compilation of such a proto with the 'simple_extensions' option will die. If the option is off, you have to use special accessors for extension fields, setExtension and getExtension, as in the C++ Protocol Buffer API. Hash keys for extension fields in Plain Old Data structures will be enclosed in brackets:

    use Google::ProtocolBuffers;
    use Data::Dumper;
    Google::ProtocolBuffers->parse(
        "   
            message Foo { 
                optional int32 id = 1;
                extensions 10 to max;     
            }
            extend Foo {
               optional string id = 10; // <-- id again!
            }
        ",
        {   simple_extensions   => 0,   ## <-- no simple extensions 
            create_accessors    => 1, 
        }
    );
    my $foo = Foo->decode("\x{08}\x{02}R\x{05}Kenny");
    print Dumper $foo;      ## { id => 2, '[id]' => 'Kenny' }
    print $foo->id, "\n";                   ## 2
    print $foo->getExtension('id'), "\n";   ## Kenny
    $foo->setExtension("id", 'Kenny McCormick');

no_camel_case (Boolean)

By default, names of created Perl classes are taken from "camel-cased" names of proto's packages, messages, groups and enums: first characters are capitalized, all underscores are removed and the characters following them are capitalized too. For example, the fully qualified name 'package_test.Message' will result in the Perl class 'PackageTest::Message'. Option 'no_camel_case' turns this name mangling off. Names of fields, extensions and enum constants are not affected in either case.
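
The naming rule can be sketched in a few lines of plain Perl. This is an illustration of the rule as described here, not the module's own code; the helper name camel_case_class_name is hypothetical, and its treatment of repeated or trailing underscores is an assumption.

```perl
use strict;
use warnings;

## Hypothetical helper: illustrates the naming rule only.
sub camel_case_class_name {
    my ($proto_name) = @_;
    my @parts = map {
        my $part = ucfirst $_;              ## capitalize the first character
        $part =~ s/_+([^_]?)/\U$1/g;        ## drop '_' and capitalize what follows
        $part;
    } split /\./, $proto_name;
    return join '::', @parts;               ## proto '.' becomes Perl '::'
}

print camel_case_class_name('package_test.Message'), "\n";  ## PackageTest::Message
```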

package_name (String)

Package name to be put into the generated Perl code; it has no effect on Perl class names and no effect unless 'generate_code' is also set.

MessageClass->encode($hashref)

This method may be called as a class or instance method. 'MessageClass' must already have been created by the compiler. Input is a hash reference. Output is a scalar (string) with serialized data. Unknown fields in the hashref are ignored. In case of errors (e.g. a required field is not set and has no default value) an exception is thrown. Examples:

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "message Foo {optional int32 id = 1; }",
        {create_accessors => 1}
    );
    my $string = Foo->encode({ id => 2 });
    my $foo = Foo->new({ id => 2 });
    $string = $foo->encode;                 ## ditto

MessageClass->decode($scalar)

Class method. Input: serialized data string. Output: data object of class 'MessageClass'. Unknown fields in serialized data are ignored. An exception is thrown in case of errors (e.g. the message is broken or partial) or if the data string is a wide-character (utf-8) string.
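
One way to make sure that data handed to 'decode' is a byte string is the core utf8::downgrade function. The sketch below assumes that the wide-character check concerns Perl's internal UTF-8 flag, which is not spelled out here; the helper name to_byte_string is hypothetical.

```perl
use strict;
use warnings;

## Hypothetical helper: return a byte-string copy of $data, or die if
## the data contains characters that do not fit into a single byte.
sub to_byte_string {
    my ($data) = @_;
    utf8::downgrade($data, 1)
        or die "data contains characters above 0xFF, not a byte string\n";
    return $data;
}

my $upgraded = "\x08\x02";
utf8::upgrade($upgraded);           ## same characters, UTF-8 flag set internally
my $bytes = to_byte_string($upgraded);
print utf8::is_utf8($bytes) ? "wide\n" : "bytes\n";   ## bytes
```

Reading the data from a filehandle in binmode, as in the SYNOPSIS, avoids the problem in the first place.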

PROTO ELEMENTS

Enums

For each enum in proto, a Perl class will be constructed with constants for each enum value. You may import these constants via a ClassName->import(":constants") call. Please note that the Perl compiler will know nothing about these constants at compile time, because this import occurs at run time, so parentheses after the constant's name are required.

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "
            enum Foo {
                   FOO = 1;
                   BAR = 2; 
            }
        ", 
        { generate_code => 'Foo.pm' }
    ); 
    print Foo::FOO(), "\n";     ## fully qualified name is fine
    Foo->import(":constants");
    print FOO(), "\n";          ## now FOO is defined in our namespace
    print FOO;                  ## <-- Error! FOO is bareword!

Or, do the import inside a BEGIN block:

    use Foo;                    ## Foo.pm was generated in previous example
    BEGIN { Foo->import(":constants") }
    print FOO, "\n";            ## ok, Perl compiler knows about FOO here

Groups

Though groups are considered deprecated, they are supported by Google::ProtocolBuffers. They are like nested messages, except that the nested type definition and the field definition go together:

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "
            message Foo {
                optional group Bar = 1 {
                    optional int32 baz = 1;
                }
            }
        ",
        { create_accessors => 1 }
    );
    my $foo = Foo->new;
    $foo->Bar( Foo::Bar->new({ baz => 2 }) );
    print $foo->Bar->baz, ", ", $foo->{Bar}->{baz}, "\n";   # 2, 2

Default values

A proto file may specify a default value for a field. The default value is returned by the accessor if there is no value for the field or if this value is undefined. The default value is not accessible via the plain old data hash, though. Default string values are always byte-strings; if you need a wide-character (Unicode) string, use "decode_utf8" in Encode.

    use Google::ProtocolBuffers;
    Google::ProtocolBuffers->parse(
        "message Foo {optional string name=1 [default='Kenny'];} ",
        {create_accessors => 1}
    );
    
    ## no initial value
    my $foo = Foo->new; 
    print $foo->name(), ", ", $foo->{name}, "\n"; # Kenny, (undef)   
    
    ## some defined value        
    $foo->name('Ken');           
    print $foo->name(), ", ", $foo->{name}, "\n"; # Ken, Ken   
    
    ## empty, but still defined value    
    $foo->name('');   
    print $foo->name(), ", ", $foo->{name}, "\n"; # (empty), (empty)  
    
    ## undef value == default value 
    $foo->name(undef);
    print $foo->name(), ", ", $foo->{name}, "\n"; # Kenny, (undef)
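
As a small illustration of the byte-string remark, here is how a UTF-8 encoded byte string turns into a character string with "decode_utf8". The literal below is only a stand-in for a value returned by an accessor; it is not produced by this module.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

## Stand-in for an accessor result: 'Zoe' with a diaeresis, UTF-8 encoded.
my $bytes = "Zo\xC3\xAB";               ## four bytes
my $chars = decode_utf8($bytes);        ## three characters

print length($bytes), " ", length($chars), "\n";   ## 4 3
```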

Extensions

From the point of view of serialized data, it makes no difference whether a field is declared as a regular field or as an extension, as long as the field number is the same. That is why there is an option 'simple_extensions' (see above) that treats extensions like regular fields. From the point of view of named accessors, however, extensions live in a namespace different from the namespace of fields, so their simple names (i.e. not fully qualified ones) may conflict. (That is why this option is off by default.) The names of extensions are obtained from their fully qualified names, from which the longest leading part shared with the name of the extended class is stripped. Hash keys are enclosed in brackets; arguments to methods 'getExtension' and 'setExtension' are not. Here is a self-explanatory example of the rules:

    use Google::ProtocolBuffers;
    use Data::Dumper;
    
    Google::ProtocolBuffers->parse(
        "
            package some_package;
            // message Plugh contains one regular field and three extensions
            message Plugh {
                optional int32 foo = 1;
                extensions 10 to max;
            }
            extend Plugh {
                optional int32 bar = 10;
            }
            message Thud {
                extend Plugh {
                    optional int32 baz = 11;
                }
            }
            
            // Note: the official Google's proto compiler does not allow 
            // several package declarations in a file (as of version 2.0.1).
            // To compile this example with the official protoc, put lines
            // above to some other file, and import that file here.
            package another_package;
            // import 'other_file.proto';
            
            extend some_package.Plugh {
                optional int32 qux = 12;
            }
            
        ",
        { create_accessors => 1 }
    );
    
    my $plugh = SomePackage::Plugh->decode(
        "\x{08}\x{01}\x{50}\x{02}\x{58}\x{03}\x{60}\x{04}"
    );
    print Dumper $plugh; 
    ## {foo=>1, '[bar]'=>2, '[Thud.baz]'=>3, '[another_package.qux]'=>4}
    
    print $plugh->foo, "\n";                            ## 1
    print $plugh->getExtension('bar'), "\n";            ## 2
    print $plugh->getExtension('Thud.baz'), "\n";       ## 3
    print $plugh->getExtension('Thud::baz'), "\n";      ## ditto
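
The byte string decoded above can also be read by hand, which shows that the regular field and the three extensions differ on the wire only by field number. Below is a pure-Perl sketch that does not use this module; the helper name read_varint_fields is hypothetical and it handles only single-byte keys and values of wire type 0 (varint), which is all this example needs.

```perl
use strict;
use warnings;

## Hypothetical helper: split a string of varint fields into
## [field_number, value] pairs. Single-byte keys and values only.
sub read_varint_fields {
    my ($data) = @_;
    my @bytes = unpack 'C*', $data;
    my @fields;
    while (@bytes) {
        my ($key, $value) = splice @bytes, 0, 2;
        die "truncated or multi-byte input\n"
            unless defined $value && $key < 0x80 && $value < 0x80;
        die "only wire type 0 (varint) is handled\n" if ($key & 0x07) != 0;
        push @fields, [ $key >> 3, $value ];    ## upper bits are the field number
    }
    return @fields;
}

print join(', ', map { "$_->[0] => $_->[1]" }
    read_varint_fields("\x08\x01\x50\x02\x58\x03\x60\x04")), "\n";
## 1 => 1, 10 => 2, 11 => 3, 12 => 4
```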

Another point is that 'extend' block doesn't create new namespace or scope, so the following proto declaration is invalid:

    // proto:
    package test;
    message Foo { extensions 10 to max; } 
    message Bar { extensions 10 to max; }
    extend Foo { optional int32 a = 10; }
    extend Bar { optional int32 a = 20; }   // <-- Error: name 'a' in package
                                            // 'test' is already used!

Well, extensions are the most complicated part of proto syntax, and I hope that you either got it or you don't need it.

RUN-TIME MESSAGE CREATION

You don't like to mess with proto files? Structure of your data is known at run-time only? No problem, create your serializer classes at run-time too with the method Google::ProtocolBuffers->create_message('ClassName', \@fields, \%options). (Note: the order of the parts of a field description is the same as in a proto file. The API is going to change to accept named parameters, but backward compatibility will be preserved.)

    use Google::ProtocolBuffers;
    use Google::ProtocolBuffers::Constants(qw/:labels :types/);
    
    ##
    ## proto:
    ## message Foo {
    ##      message Bar {
    ##           optional int32 a = 1 [default=12];
    ##      }
    ##      required int32 id = 1;
    ##      repeated Bar   bars = 2;    
    ## }
    ##
    Google::ProtocolBuffers->create_message(
        'Foo::Bar',
        [
            ## optional      int32        a = 1 [default=12]
            [LABEL_OPTIONAL, TYPE_INT32, 'a', 1, '12']
        ],
        { create_accessors => 1 }
    );
    Google::ProtocolBuffers->create_message(
        'Foo',
        [
            [LABEL_REQUIRED, TYPE_INT32, 'id',   1],
            [LABEL_REPEATED, 'Foo::Bar', 'bars', 2],
        ],
        { create_accessors => 1 }
    );
    my $foo = Foo->new({ id => 10 });
    $foo->bars( Foo::Bar->new({a=>1}), Foo::Bar->new({a=>2}) );
    print $foo->encode;

There are methods 'create_group' and 'create_enum' also; the following constants are exported: labels (LABEL_REQUIRED, LABEL_OPTIONAL, LABEL_REPEATED) and types (TYPE_INT32, TYPE_UINT32, TYPE_SINT32, TYPE_FIXED32, TYPE_SFIXED32, TYPE_INT64, TYPE_UINT64, TYPE_SINT64, TYPE_FIXED64, TYPE_SFIXED64, TYPE_BOOL, TYPE_STRING, TYPE_BYTES, TYPE_DOUBLE, TYPE_FLOAT).

KNOWN BUGS, LIMITATIONS AND TODOs

All proto options are ignored except default values for fields; extension numbers are not checked. Unknown fields in serialized data are skipped. No stream API (encoding to or decoding from filehandles) is present. Ask for what you need most.

Introspection API is planned.

Declarations of RPC services are currently ignored, but their support is planned (btw, which Perl RPC implementation would you recommend?)

AUTHOR, ACKNOWLEDGEMENTS, COPYRIGHT

Author: Igor Gariev <gariev@hotmail.com>
the CSIRT Gadgets Foundation <csirtgadgets.org>

Proto grammar is based on work by Alek Storm <http://groups.google.com/group/protobuf/browse_thread/thread/1cccfc624cd612da>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.