NAME
extract_url -- extract URLs from email messages

SYNOPSIS
extract_url [options] file

DESCRIPTION
This is a Perl script that extracts URLs from correctly encoded MIME email messages. It can be used either as a pre-parser for urlview or as a complete replacement for it.

Urlview is a great program, but it has some deficiencies. In particular, it is not very configurable, and it cannot handle URLs that have been broken over several lines in format=flowed delsp=yes email messages. Nor can it handle quoted-printable email messages. Also, urlview does not eliminate duplicate URLs. This Perl script handles all of that. It also sanitizes URLs so that they cannot break out of the command shell.

This is designed primarily for use with the mutt mail client. The idea is that if you want to access a URL in an email, you pipe the email to a URL extractor (like this one), which then lets you select a URL to view in some third program (such as Firefox). An alternative design is to access URLs from within mutt's pager by defining macros and tagging the URLs in the display to indicate which macro to use. A script you can use to do that is tagurl.pl.

OPTIONS
DEPENDENCIES
Mandatory dependencies are MIME::Parser and HTML::Parser. These usually come with Perl.

Optional dependencies are:

    URI::Find     recognizes more exotic URL variations in plain text (without HTML tags)
    Curses::UI    allows extract_url.pl to fully replace urlview
    MIME::Quoted  does a more standardized decode of quoted-printable characters in plain text
    Getopt::Long  if present, extract_url.pl recognizes the long options --version and --list

EXAMPLES
This Perl script expects a valid email either piped in via STDIN or in a file named as the script's only argument. Its STDOUT can be a pipe into urlview (it will detect this). Here's how you can use it:

    cat message.txt | extract_url.pl
    cat message.txt | extract_url.pl | urlview
    extract_url.pl message.txt
    extract_url.pl message.txt | urlview

For use with mutt 1.4.x, here's a macro you can use:

    macro index,pager \cb "\
    <enter-command> \
    unset pipe_decode<enter>\
    <pipe-message>extract_url.pl<enter>" \
    "get URLs"

For use with mutt 1.5.x, here's a more complicated macro you can use:

    macro index,pager \cb "\
    <enter-command> set my_pdsave=\$pipe_decode<enter>\
    <enter-command> unset pipe_decode<enter>\
    <pipe-message>extract_url.pl<enter>\
    <enter-command> set pipe_decode=\$my_pdsave<enter>" \
    "get URLs"

Here's a suggestion for how to handle encrypted email:

    macro index,pager ,b "\
    <enter-command> set my_pdsave=\$pipe_decode<enter>\
    <enter-command> unset pipe_decode<enter>\
    <pipe-message>extract_url.pl<enter>\
    <enter-command> set pipe_decode=\$my_pdsave<enter>" \
    "get URLs"

    macro index,pager ,B "\
    <enter-command> set my_pdsave=\$pipe_decode<enter>\
    <enter-command> set pipe_decode<enter>\
    <pipe-message>extract_url.pl<enter>\
    <enter-command> set pipe_decode=\$my_pdsave<enter>" \
    "decrypt message, then get URLs"

    message-hook .  'macro index,pager \cb ,b "URL viewer"'
    message-hook ~G 'macro index,pager \cb ,B "URL viewer"'

CONFIGURATION
If you're using it with Curses::UI (i.e. as a standalone URL selector), this Perl script will try to figure out what command to use based on the contents of your ~/.urlview file. However, it also has its own configuration file (~/.extract_urlview), which will be used instead if it exists. So far, there are eight kinds of lines you can have in this file.
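The example configuration below suggests a simple line format: a keyword, optionally followed by whitespace and a value (SHORTCUT appears with no value at all). The following Python sketch parses that format and fills the %s placeholder in COMMAND with the URL, after percent-encoding the single quote and dollar sign as described under CAVEATS. It is an illustration under those assumptions, not the script's actual Perl code: the names parse_config, sanitize and build_command are hypothetical, and the sketch does not claim to reproduce how extract_url.pl quotes the URL or interprets the other keywords.

```python
def parse_config(text):
    """Parse ~/.extract_urlview-style text into a dict (sketch).

    Assumption: each non-blank line is KEYWORD [VALUE]; a keyword given
    without a value (such as SHORTCUT) is stored as True.
    """
    config = {}
    for raw_line in text.splitlines():
        parts = raw_line.strip().split(None, 1)
        if not parts:
            continue  # skip blank lines
        keyword = parts[0]
        config[keyword] = parts[1] if len(parts) == 2 else True
    return config


def sanitize(url):
    """Percent-encode the two characters named under CAVEATS: ' and $."""
    return url.replace("'", "%27").replace("$", "%24")


def build_command(config, url):
    """Fill the %s placeholder in COMMAND with the sanitized URL.

    Hypothetical helper: the real script's quoting rules are not shown here.
    """
    template = config.get("COMMAND", "")
    return template.replace("%s", sanitize(url))


if __name__ == "__main__":
    example = """SHORTCUT
COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
HTML_TAGS a,iframe,link
ALTSELECT Q
DEFAULT_VIEW context
"""
    cfg = parse_config(example)
    # prints: mozilla-firefox -remote "openURL(http://example.com/?q=%24HOME%27x,new-window)"
    print(build_command(cfg, "http://example.com/?q=$HOME'x"))
```

Because the $ in the URL becomes %24, a shell that expands the double-quoted openURL(...) argument sees no variable reference, and the browser still receives an equivalent URL.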
Here is an example config file:

    SHORTCUT
    COMMAND mozilla-firefox -remote "openURL(%s,new-window)"
    HTML_TAGS a,iframe,link
    ALTSELECT Q
    DEFAULT_VIEW context

STANDARDS
None.

AVAILABILITY
http://www.memoryhole.net/~kyle/extract_url/

SEE ALSO
mutt(1), urlview(1), urlscan(1)

CAVEATS
In every URL, the potentially dangerous shell characters (namely the single quote and the dollar sign) are replaced with their percent-encoded forms (%27 and %24) before the URL is used in a shell command. This should eliminate the possibility of a malicious URL breaking out of the shell.

If you are using Curses::UI and a URL is too long to fit in your terminal, then when you select it, extract_url.pl will (by default) ask you to review it in a view that shows the whole URL.

AUTHOR
Program was written by Kyle Wheeler <kyle@memoryhole.net>.

Released under the BSD-2-Clause (simplified) license. For more information about the license, visit <http://spdx.org/licenses/BSD-2-Clause>.
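The DESCRIPTION above says extract_url copes with quoted-printable bodies, with URLs wrapped across lines in format=flowed delsp=yes messages, and with duplicate URLs. The following is a minimal Python sketch of those three steps, not the script's actual Perl implementation. The regular expression, the helper names unflow and extract_urls, and the UTF-8 decoding are assumptions made for illustration, and RFC 3676 details such as space-stuffing and quoted (">") lines are ignored.

```python
import quopri
import re

# Hypothetical, deliberately simple URL pattern; URI::Find and HTML
# parsing recognize many more forms than this.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def unflow(text, delsp=True):
    """Rejoin format=flowed lines (RFC 3676), simplified.

    A line ending in a space is a soft break that continues on the next
    line. With DelSp=yes that trailing space was inserted by the sender,
    so it is deleted before joining, which restores a wrapped URL intact.
    The signature separator "-- " is never treated as flowed.
    """
    joined = []
    pending = ""
    for line in text.split("\n"):
        if line.endswith(" ") and line != "-- ":
            pending += line[:-1] if delsp else line
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)
    return "\n".join(joined)


def extract_urls(body):
    """Decode, unflow, find URLs and drop duplicates (first seen wins)."""
    # 1. Undo quoted-printable: "=\n" soft breaks and =XX escapes.
    text = quopri.decodestring(body).decode("utf-8", errors="replace")
    # 2. Rejoin flowed lines so a URL wrapped across lines is contiguous.
    text = unflow(text)
    # 3. dict keeps insertion order, so this deduplicates in order.
    return list(dict.fromkeys(URL_RE.findall(text)))


if __name__ == "__main__":
    message_body = (
        b"See http://example.com/a/very/long/=20\n"
        b"path and http://example.com/?x=3D1\n"
        b"http://example.com/a/very/long/path again\n"
    )
    for url in extract_urls(message_body):
        print(url)
```

Run on the sample body, this prints http://example.com/a/very/long/path and http://example.com/?x=1, each once. With delsp=False the first URL would instead end at "long/", which is the kind of breakage the DESCRIPTION attributes to urlview.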