NAME
mok - an awk for molecules

SYNOPSIS
    mok [OPTION]... 'CODE' FILE...

DESCRIPTION
The purpose of mok is to read all the molecules found in the files given on the command line and, for each molecule, execute the given CODE. The CODE is written in Perl and has at its disposal all of the methods of the PerlMol toolkit.

This mini-language is intended to provide a powerful environment for writing "molecular one-liners" that extract and munge chemical information. It was inspired by the AWK programming language by Aho, Kernighan, and Weinberger, the SMARTS molecular pattern description language by Daylight, Inc., and the Perl programming language by Larry Wall. Mok takes its name from Ookla the Mok, an unforgettable character from the animated TV series "Thundarr the Barbarian", and from shortening "molecular awk".

For more details about the Mok mini-language, see LANGUAGE SPECIFICATION below.

Mok is part of the PerlMol project, <http://www.perlmol.org>.

OPTIONS
LANGUAGE SPECIFICATION
A Mok script consists of a sequence of pattern-action statements and optional subroutine definitions, much as in the AWK language:

    pattern_type:/pattern/options { action statements }
    { action statements }
    sub name { statements }
    BEGIN { statements }
    END { statements }
    # comment

When the whole program consists of a single unconditional action block, the braces may be omitted.

Program execution proceeds as follows:

1) The BEGIN block is executed as soon as it is compiled, before any other actions are taken.

2) For each molecule in the files given on the command line, each pattern is applied in turn; if the pattern matches, the corresponding statement block is executed. The pattern is optional; statement blocks without a pattern are executed unconditionally. Subroutines are executed only when called explicitly.

3) Finally, the END block is executed.

The statements are evaluated as Perl statements in the Chemistry::Mok::UserCode::Default package. For convenience, the following chemistry modules are loaded by default:

    Chemistry::Mol;
    Chemistry::Atom ':all';
    Chemistry::Bond;
    Chemistry::Pattern;
    Chemistry::Pattern::Atom;
    Chemistry::Pattern::Bond;
    Chemistry::File;
    Chemistry::File::*;
    Math::VectorReal ':all';

Besides these, one more convenience function is available, "println", which is defined as:

    sub println { print "@_", "\n" }

Pattern Specification
The pattern must be a SMARTS string readable by the Chemistry::File::SMARTS module, unless a different type is specified with the -p option or a pattern_type is given explicitly before the pattern itself.

The pattern is given within slashes, in a way reminiscent of AWK and Perl regular expressions. As in Perl, certain one-letter options may be included after the closing slash. An option is turned on by giving the corresponding lowercase letter and turned off by giving the corresponding uppercase letter.
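The pattern syntax above can be illustrated with a small stand-alone Perl sketch that splits a pattern specification into its type, its pattern string and its option flags. The helper parse_pattern_spec and the default type name 'smarts' are assumptions made for illustration. This is not mok's own parser, and it rejects patterns that themselves contain slashes (for example, SMARTS double-bond stereo such as F/C=C/F).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of mok): split a pattern specification
# such as 'formula_pattern:/C6H12O6/g' into type, pattern and options.
# Assumes the pattern itself contains no '/' characters.
sub parse_pattern_spec {
    my ($spec, $default_type) = @_;
    $default_type //= 'smarts';    # assumed name for the default type

    # optional "type:" prefix, the pattern between slashes, then letters
    $spec =~ m{^(?:(\w+):)?/([^/]*)/([A-Za-z]*)$}
        or die "not a pattern specification: $spec\n";
    my ($type, $pattern, $letters) = ($1 // $default_type, $2, $3);

    my %options;
    for my $letter (split //, $letters) {
        # lowercase turns the option on, uppercase turns it off
        $options{ lc $letter } = $letter eq lc($letter) ? 1 : 0;
    }
    return { type => $type, pattern => $pattern, options => \%options };
}

my $p = parse_pattern_spec('formula_pattern:/C6H12O6/g');
printf "%s %s g=%d\n", $p->{type}, $p->{pattern}, $p->{options}{g};
```

Passing a second argument, as in parse_pattern_spec('/CS/', 'formula_pattern'), plays roughly the role of the -p option. An explicit prefix still takes precedence over it.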
Special Variables
When blocks with action statements are executed, some variables are defined automatically. The variables are local, so you can assign to them freely without side effects. However, the objects they refer to may still be altered through their methods.

NOTE: Mok 0.10 defined $file, $mol, $match, and $patt in lowercase. The lowercase variables still work, but they are deprecated and may be removed in the future.
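The execution order in LANGUAGE SPECIFICATION and the behavior of the local special variables can be mimicked in plain Perl. This is a minimal sketch, not mok's implementation, and it makes the following simplifications:

- Molecules are faked as hashes.
- A regular expression on a hypothetical formula field stands in for a SMARTS match.
- Only $MOL and $FILE are modeled.
- BEGIN and END are ordinary code references called before and after the loop, rather than Perl BEGIN/END blocks.

The point it illustrates is Perl's local. For each molecule, local temporarily assigns the special variables and restores their previous values afterwards.

```perl
use strict;
use warnings;

# Sketch of mok's documented execution order; NOT mok's real code.
# Molecules are faked as hashes and a regex on a hypothetical 'formula'
# field stands in for a SMARTS match.
our ($MOL, $FILE);    # package variables, so they can be localized
my @log;

# begin/end: code refs; rules: [ [test_or_undef, action], ... ];
# files: { filename => [ molecule, ... ] }
sub run_script {
    my %args = @_;
    $args{begin}->() if $args{begin};                 # 1) BEGIN
    for my $file (sort keys %{ $args{files} }) {      # 2) each molecule
        for my $mol (@{ $args{files}{$file} }) {
            # temporarily set the special variables; previous values
            # are restored when this iteration ends
            local $FILE = $file;
            local $MOL  = $mol;
            for my $rule (@{ $args{rules} }) {
                my ($test, $action) = @$rule;
                # a rule without a test runs unconditionally
                $action->() if !$test || $test->($MOL);
            }
        }
    }
    $args{end}->() if $args{end};                     # 3) END
}

my $n = 0;
run_script(
    begin => sub { push @log, 'begin' },
    rules => [
        [ sub { $_[0]{formula} =~ /S/ }, sub { push @log, "$FILE: $MOL->{name}" } ],
        [ undef, sub { $n++ } ],
    ],
    end   => sub { push @log, "total $n" },
    files => {
        'a.mol' => [ { name => 'thiol',   formula => 'CH4S' } ],
        'b.mol' => [ { name => 'methane', formula => 'CH4' } ],
    },
);
print "$_\n" for @log;
```

Running it prints three lines: "begin", "a.mol: thiol" and "total 2". After run_script returns, $MOL and $FILE are undefined again.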
Special Blocks
Within action blocks, the following block names can be used with Perl functions such as "next" and "last":
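In plain Perl, next and last accept a loop label, which is what makes named blocks useful inside nested loops. The sketch below shows that mechanism on two nested loops over hypothetical data. The labels FILE and MOL are placeholders chosen for this example and are not necessarily the block names mok defines.

```perl
use strict;
use warnings;

# Hypothetical data: file name => molecule names. 'STOP' is a made-up
# marker used only to trigger the early exit in this sketch.
my %files = (
    'a.sdf' => [qw(benzene water toluene STOP xylene)],
    'b.sdf' => [qw(ethanol)],
);

my @seen;
FILE: for my $file (sort keys %files) {       # FILE, MOL: placeholder labels
    MOL: for my $mol (@{ $files{$file} }) {
        next MOL  if $mol eq 'water';          # skip just this molecule
        next FILE if $mol eq 'STOP';           # skip the rest of this file
        push @seen, "$file:$mol";
    }
}
print "$_\n" for @seen;
```

This prints a.sdf:benzene, a.sdf:toluene and b.sdf:ethanol. Replacing next FILE with last FILE would instead stop processing all remaining files.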
EXAMPLES
Print the names of all the molecules found in all the .sdf files in the current directory:

    mok 'println $MOL->name' *.sdf

Find esters among *.mol; print the filename, molecule name, and formula:

    mok '/C(=O)OC/{ printf "$FILE: %s (%s)\n", $MOL->name, $MOL->formula }' *.mol

Find the total number of atoms:

    mok '{ $n += $MOL->atoms } END { print "Total: $n atoms\n" }' *.mol

Find the average C-S bond length:

    mok '/CS/g{ $n++; $len += $B[0]->length }
        END { printf "Average C-S bond length: %.3f\n", $len/$n; }' *.mol

Convert PDB files to MDL molfiles:

    mok '{ $FILE =~ s/\.pdb$/.mol/; $MOL->write($FILE, format => "mdlmol") }' *.pdb

Find molecules with a given formula by setting the pattern type to formula_pattern globally (this example requires Chemistry::FormulaPattern):

    mok -p formula_pattern '/C6H12O6/{ println $MOL->name }' *.sdf

Find molecules with a given formula by overriding the pattern type for one specific pattern only. This is useful when more than one pattern type is needed in one script:

    mok 'formula_pattern:/C6H12O6/{ println $MOL->name }' *.sdf

SEE ALSO
awk(1), perl(1), Chemistry::Mok, Chemistry::Mol, Chemistry::Pattern, <http://dmoz.org/Arts/Animation/Cartoons/Titles/T/Thundarr_the_Barbarian/>.

Tubert-Brohman, I. Perl and Chemistry. The Perl Journal 2004-06 (<http://www.tpj.com/documents/s=7618/tpj0406/>).

The PerlMol project site at <http://www.perlmol.org>.

VERSION
0.25

AUTHOR
Ivan Tubert-Brohman <itub@cpan.org>

COPYRIGHT
Copyright (c) 2005 Ivan Tubert-Brohman. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.