NAME
Games::Dissociate - a Dissociated Press algorithm and filter

SYNOPSIS
    use Games::Dissociate;
    ...
    $brilliant_prose = dissociate($normal_prose);

or

    perl -MGames::Dissociate -e dissociate_filter meno.txt

ABSTRACT
This module provides the function "dissociate", which implements a Dissociated Press algorithm, well known to Emacs users as "meta-x dissociate". The algorithm here is by no means a straight port of Emacs's "dissociate.el", but is instead merely inspired by it.

(I actually intended to make it a straight port, but couldn't manage it -- the code in "dissociate.el" is totally uncommented, and is especially obscure Lisp.)

This module also provides a procedure "dissociate_filter", for use in the one-liner context:

    perl -MGames::Dissociate -e 'dissociate_filter(2)' < thesis.txt > snip.txt

or

    perl -MGames::Dissociate -e 'dissociate_filter(-2)' < thesis.txt > snip.txt

or in a script consisting of

    #!/usr/local/bin/perl
    use Games::Dissociate;
    dissociate_filter;

Sample Dissociation
I got this text from feeding the UNIX man page for "regexp" (in plaintext) to "dissociate" with a $group_size parameter of 3:

nd of then the full list of the more branch is zero or
"*", "." (matching thand regexp(n) right initional
argumented by a pieces of the left to match that (ab|a) general other worDS
match to the first, followed by "?". It matcheS In of the next start
was been could exp. The characters in expreSSIons belowed in the full matching
the in starticular EXpression in "[0-9]" include a list of sequence
of the are may before the regexp even therwise. REgexp(n) Tcl regular
expression to regexp(n) regexp(n) right. Input string), "\",

About Dissociated Press algorithms
"Dissociated Press" algorithms produce text with token-patterns (patterns of words, or patterns of characters) similar to those found in an input text.

This may be implemented in terms of Markov chains (basically, statistical modeling of the frequency of token-groups), although both this module and Emacs's "dissociate.el" take shortcuts to avoid having to construct and manipulate a real statistical model of the input text.

Basically, the way Dissociated Press algorithms (at least mine -- I can't speak for the exact details of all others) work is:

1. Start at a random point in the text, and read a group of tokens (characters or words -- where group size is a parameter you can change) from there. Call this the last-matched group.

2. Output the last-matched group.

3. Look for the other times the last-matched group occurs in the text, and randomly select one of them. (Or: select the next time that group occurs -- a shortcut I've made in the code, which seems to still produce random-looking results.) Look at the group of tokens that occurs right after that. Make that the last-matched group. Loop back to step 2 until we think we've outputted enough.

4. But if the last-matched group from step 2 occurred just that once in the text, go back to step 1.

Since the groups of characters or words (at least, when you look at them as bits of text only group-size tokens long) are all taken from the input text, you get somewhat natural-looking text -- as opposed to what you'd get if you just randomly outputted single characters or single words from the input text.

The process of applying a Dissociated Press algorithm to a bit of text is called "dissociation".

PARAMETERS AND USAGE
To use this module after you've installed it, say "use Games::Dissociate". This imports the function "dissociate" and the procedure "dissociate_filter".
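
To make the four steps listed under "About Dissociated Press algorithms" concrete, here is a minimal by-character sketch in plain Perl. It is not the code this module uses: the name "sketch_dissociate" and its three arguments are hypothetical, it searches with "index" rather than regexps, and in step 3 it always picks a random other occurrence rather than taking the "next occurrence" shortcut.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Games::Dissociate.
# $size is the group size in characters; $want is how many
# characters of output to produce.
sub sketch_dissociate {
    my ($text, $size, $want) = @_;
    my $len = length $text;
    return '' if $size < 1 || $len < $size;

    my $out = '';
    my $at;    # offset of the last-matched group; undef means "do step 1"

    while (length($out) < $want) {
        # Step 1: start at a random point in the text.
        $at = int rand($len - $size + 1) unless defined $at;

        # Step 2: output the last-matched group.
        my $group = substr $text, $at, $size;
        $out .= $group;

        # Step 3: find the other places where this group occurs...
        my @others;
        my $pos = -1;
        while (($pos = index($text, $group, $pos + 1)) >= 0) {
            push @others, $pos if $pos != $at;
        }

        # Step 4: it occurred just that once, so go back to step 1.
        if (!@others) { undef $at; next; }

        # ...pick one at random, and move to the group right after it.
        # If no full group follows (end of text), restart at step 1.
        my $next = $others[ int rand @others ] + $size;
        $at = $next + $size <= $len ? $next : undef;
    }
    return substr $out, 0, $want;
}

my $sample = "the cat sat on the mat and the cat ate the rat ";
print sketch_dissociate($sample, 3, 60), "\n";
```

Every three-character chunk of the result is copied from somewhere in the input, which is why the output looks locally plausible even though it is globally nonsense. Note that the sketch rescans the whole input on every pass, so it shares the slowness on large inputs described under "Efficiency Notes".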

Efficiency Notes
This module has to search the input string by performing regexp searches on it. In the current version of this module, control over compilation of regular expressions may not be optimally efficient. Perl 5.005 provides options to better control regexp compilation; once Perl 5.005 is in wider use, I may come out with a new version of Games::Dissociate requiring Perl 5.005 or later, using these new regexp compilation control features.

If you feed this module a lot of text (over 50K, say), it will indeed get very slow (notably with by-word dissociation), since that whole chunk of text has to be searched over and over and over. If you have an idea for making this module more efficient, feel free to email it to me.

Internationalization Notes
When dealing with text in heavily inflected languages (like Finnish -- lots of unique word endings, frequently used), this module will require longer input text to produce interesting results for by-word dissociation, compared to relatively inflection-poor languages like English.

For text written with no inter-word spacing (often the case with Thai, for example), there's no way for this module to tell where the word breaks are -- in such cases, use only the by-character mode.

The current version of this library assumes "/./" matches a single character, for by-character dissociation; and, for by-word dissociation, that "/\w+/" matches whole words and "/\W+/" matches non-word strings. These are locale-dependent functions, and Games::Dissociate has a "use locale" in it, hopefully triggering correct behavior for your favorite locale, language, and character-encoding. Consult perllocale and locale for more information on locales.

I have found "use locale" to do unwelcome things (like unceremoniously dumping core) on a few very strange, very old (and otherwise barely-working) machines. If this is a problem for you, or if you don't plan to use locales, comment out the "use locale" in the Games::Dissociate source code.
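
As a rough illustration of the "/\w+/" and "/\W+/" assumption, the following sketch splits a string into alternating word and non-word tokens. The helper name "sketch_tokens" is hypothetical and is not part of this module; the sketch assumes plain ASCII input and leaves out "use locale", so it shows only the general idea, not how Games::Dissociate tokenizes internally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of Games::Dissociate: return the
# alternating runs of word characters (\w+) and non-word
# characters (\W+) that make up $text. Call it in list context.
sub sketch_tokens {
    my ($text) = @_;
    return $text =~ /(\w+|\W+)/g;
}

my @tokens = sketch_tokens("Hello, world!");
print join('|', @tokens), "\n";    # prints: Hello|, |world|!

# With no inter-word spacing there is nothing to split on, so the
# whole string comes back as a single "word".
my @unspaced = sketch_tokens("nospaceshere");
print scalar(@unspaced), "\n";     # prints: 1
```

The second call shows why by-word dissociation has nothing to work with when the text has no inter-word spacing: the whole input is one token.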

The treatment of locales and support for them may change in future versions of this module, depending on how future Perl versions shape up, particularly in their support of Unicode.

Randomness Notes
This library uses "rand" extensively, but never calls "srand". If you're getting the same dissociated output all the time, then you're using an old (pre-5.004) version of Perl that doesn't do implicit randomness seeding -- just call "srand();", maybe right after you say "use Games::Dissociate;".

SEE ALSO
* Emacs's "dissociate.el" (written circa 1985?).

COPYRIGHT
Copyright (c) 1998-2001, Sean M. Burke. All rights reserved.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

REMINDER
It's just a toy.

AUTHOR
Current maintainer Avi Finkel <avi@finkel.org>; original author Sean M. Burke <sburke@cpan.org>