faup - Finally An Url Parser!
faup [-ablptu] [-d delim]
     [-f scheme | credential | subdomain | domain | domain_without_tld |
      host | tld | port | resource_path | query_string | fragment]
     [-m module1 module2 ...] [-o csv | json | module] [-r count]
     [-w ip:port] ⟨url | file⟩
Every time faup is called with a URL or a file containing a list of URLs,
it splits each URL into its component parts. faup is designed to do this
quickly, much faster than a regex, and to provide many more features.
The problem is harder than it first appears. To get fields such as the
subdomain from a URL, you need to extract the TLD correctly. A regex alone
cannot do that reliably, because the set of valid TLDs (for example 'co.uk')
is defined by a list rather than by a pattern.
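To illustrate why a suffix list is needed rather than a pattern, here is a minimal Python sketch of longest-suffix matching. The three-entry `SUFFIXES` set and the `split_host` helper are illustrative assumptions, not faup's implementation or data; a real suffix list has thousands of entries plus wildcard and exception rules that this sketch ignores.

```python
# Hypothetical, tiny stand-in for a real TLD suffix list.
SUFFIXES = {"org", "uk", "co.uk"}


def split_host(host):
    """Split a host name into (subdomain, domain_without_tld, tld).

    The longest known suffix wins, which a fixed regex cannot decide
    without embedding the whole list.
    """
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, leaving at least one label.
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SUFFIXES:
            name = labels[i - 1]
            subdomain = ".".join(labels[: i - 1])
            return subdomain, name, candidate
    # No known suffix: treat the whole host as the name.
    return "", ".".join(labels), ""


print(split_host("www.example.co.uk"))
print(split_host("www.slashdot.org"))
```

With only 'uk' in the set, the same host would split as subdomain 'www.example', name 'co', TLD 'uk', which is roughly the distinction the -t option controls.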
The following options are available:
-a
- Skip the open check on the file given as argument.
-b
- Run the webserver in the background.
-d delim
- Use delim as the field delimiter character instead of the comma character.
-f field
- Extract only the wanted field, which can be one of the following: scheme,
credential, subdomain, domain, domain_without_tld, host, tld, port,
resource_path, query_string, fragment.
-l
- Prefix each line with the line number. Works only if the output is CSV.
-m module1 module2 ...
- Load the modules in the given order. If the list is empty, faup will run
without any module.
-o format
- Output in the wanted format, which can be one of the following: csv, json,
module.
-p
- Print the header (applies to CSV only).
-r count
- Remove count characters from the end. Useful for removing the trailing dot
from DNS names given as URLs.
-t
- Do not extract TLDs with more than one level (e.g. get only 'uk' instead of
'co.uk').
-u
- Update the TLD suffix list.
-w ip:port
- Start faup in webserver mode.
The faup utility exits 0 on success, and >0 if an error occurs.
Extract the TLD from www.slashdot.org:
faup -f tld www.slashdot.org
Run faup in webserver mode in the background:
faup -b -w 0.0.0.0:9876
List all the available modules:
faup $ modules list all
Enable the uppercase Lua module:
faup $ modules enable uppercase.lua
Output the given URL as JSON:
faup -o json http://www.example.co.uk:1234/foo.html#bar
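From a script, the JSON output can be consumed directly. The following Python sketch builds a command line from the -o and -f options described on this page and decodes the result. The helper names `faup_command` and `parse_url` are hypothetical, the sketch assumes a `faup` binary on the PATH, and it assumes the JSON output for one URL is a single JSON document, which this page does not specify.

```python
import json
import subprocess


def faup_command(url, output=None, field=None):
    """Build an argv list for faup using only the -o and -f options."""
    argv = ["faup"]
    if output is not None:
        argv += ["-o", output]
    if field is not None:
        argv += ["-f", field]
    argv.append(url)
    return argv


def parse_url(url):
    """Run faup on one URL and decode its JSON output.

    check=True raises subprocess.CalledProcessError when faup exits
    with a status greater than 0.
    """
    result = subprocess.run(
        faup_command(url, output="json"),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# Only show the command here; parse_url() needs faup to be installed.
print(faup_command("http://www.example.co.uk:1234/foo.html#bar", output="json"))
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as '&' or '#'.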
The following environment variable affects the execution of faup:
FAUP_DATA_DIR
- If the environment variable FAUP_DATA_DIR is set, it tells faup where to
find its data directory, which holds the modules and the TLD suffix list
('mozilla.tlds').
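When faup is called from another program, the variable can be set for that one process only. A minimal Python sketch follows; the `faup_env` helper and the `/opt/faup/data` path are hypothetical and stand in for wherever the data directory actually lives.

```python
import os


def faup_env(data_dir, base=None):
    """Return a copy of the environment with FAUP_DATA_DIR set."""
    env = dict(os.environ if base is None else base)
    env["FAUP_DATA_DIR"] = data_dir
    return env


env = faup_env("/opt/faup/data")  # hypothetical location
print(env["FAUP_DATA_DIR"])
# Pass it to the child process, e.g.:
# subprocess.run(["faup", "-f", "tld", "www.slashdot.org"], env=env)
```

Copying the environment instead of modifying `os.environ` keeps the setting from leaking into the rest of the calling program.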