KHTTP_PARSE(3)          FreeBSD Library Functions Manual          KHTTP_PARSE(3)
khttp_parse, khttp_parsex - parse a CGI instance for kcgi
#include <sys/types.h>
#include <stdarg.h>
#include <stdint.h>
#include <kcgi.h>
enum kcgi_err
khttp_parse(struct kreq *req, const struct kvalid *keys, size_t keysz,
    const char *const *pages, size_t pagesz, size_t defpage);
enum kcgi_err
khttp_parsex(struct kreq *req, const struct kmimemap *suffixes,
    const char *const *mimes, size_t mimesz, const struct kvalid *keys,
    size_t keysz, const char *const *pages, size_t pagesz, size_t defmime,
    size_t defpage, void *arg, void (*argfree)(void *arg),
    unsigned int debugging, const struct kopts *opts);
extern const char *const kmimetypes[KMIME__MAX];
extern const char *const khttps[KHTTP__MAX];
extern const char *const kschemes[KSCHEME__MAX];
extern const char *const kmethods[KMETHOD__MAX];
extern const struct kmimemap ksuffixmap[];
extern const char *const ksuffixes[KMIME__MAX];
The khttp_parse() and khttp_parsex() functions parse and validate input and
the HTTP environment (compression, paths, MIME types, and so on). They are the
central functions in the kcgi(3) library, parsing and validating key-value
form (query string, message body, cookie) data and opaque message bodies.
They must be matched by khttp_free(3) if and only if the return value is
KCGI_OK. Otherwise, resources are internally freed.
The collective arguments are as follows:
- arg
- A pointer to private application data. It is not touched unless
argfree is provided.
- argfree
- Function invoked with arg by the child process
starting to parse untrusted network data. This makes sure that no
unnecessary data is leaked into the child.
- debugging
- This bit-field enables debugging of the underlying parse and/or write
routines. It may have
KREQ_DEBUG_WRITE for writes
and KREQ_DEBUG_READ_BODY for the pre-parsed body.
Debugging messages to
kutil_info(3)
consist of the process ID followed by “-tx” or
“-rx” for writing or reading, a colon and space, then the
logged data. A newline will flush the existing line, as will reaching 80
characters. If flushed at 80 characters and not a newline, an ellipsis
will follow the line. The total logged bytes will be emitted at the end of
all reads or writes.
- defmime
- If no MIME type is specified (that is, there's no suffix to the page
request), use this index in the mimes array.
- defpage
- If no page was specified (e.g., the default landing page), this is
provided as the requested page index.
- keys
- An optional array of input and validation fields or
NULL .
- keysz
- The number of elements in keys.
- mimesz
- The number of elements in mimes. Also the MIME index
used if no MIME type was matched. This differs from
defmime, which is used if there is no MIME suffix at
all.
- mimes
- An array of MIME types (e.g., “text/html”), mapped into a
MIME index during MIME body parsing. This relates both to pages and input
fields with a body type. Any array should include at least
text/plain , as this is the default content type
for MIME documents.
- opts
- Tunable options regarding socket buffer sizes and so on. If set to
NULL , meaningful defaults are used.
- pages
- An array of recognised pathnames. When pathnames are parsed, they're
matched to indices in this array.
- pagesz
- The number of elements in pages. Also used as the page index if the
requested page was not in pages.
- req
- This structure is cleared and filled with input fields and HTTP context
parsed from the CGI environment. It is the main structure carried around
in a
kcgi(3)
application.
- suffixes
- Define the MIME type (suffix) mapping.
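The interplay of pages, pagesz, and defpage can be sketched in a few lines.
The following is a minimal illustration, not the code used by kcgi(3):
page_index() is a hypothetical helper, and the exact byte-for-byte string
comparison is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of kcgi): map a page name to an index
 * the way the pages, pagesz, and defpage arguments are described.
 */
static size_t
page_index(const char *pagename, const char *const *pages,
    size_t pagesz, size_t defpage)
{
	size_t	 i;

	assert(pagename != NULL);

	/* No page requested: use the default landing page. */
	if (pagename[0] == '\0')
		return defpage;

	/* A recognised page maps to its position in the array. */
	for (i = 0; i < pagesz; i++)
		if (strcmp(pages[i], pagename) == 0)
			return i;

	/* Anything else maps to pagesz, the "not found" index. */
	return pagesz;
}
```

An application would typically switch on the resulting index and treat the
value pagesz as a request for an unknown page.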
The first form, khttp_parse(), is for applications using the
system-recognised MIME types. This should work well enough for most
applications. It is equivalent to invoking the second form, khttp_parsex(),
as follows:
    khttp_parsex(req, ksuffixmap, kmimetypes, KMIME__MAX, keys, keysz,
        pages, pagesz, KMIME_TEXT_HTML, defpage, NULL, NULL, 0, NULL);
A struct kreq object is filled in by
khttp_parse () and
khttp_parsex (). It consists of the following fields:
- void *arg
- Private application data. This is set during
khttp_parse ().
- enum kauth auth
- Type of “managed” HTTP authorisation performed by the web
server according to the
AUTH_TYPE header variable,
if any. This is KAUTH_DIGEST for the
AUTH_TYPE of “digest”,
KAUTH_BASIC for “basic”,
KAUTH_BEARER for “bearer”,
KAUTH_UNKNOWN for other values of
AUTH_TYPE , or KAUTH_NONE
if AUTH_TYPE is not set. See the
rawauth field for raw (i.e., not processed by the
web server) authorisation requests.
- struct kpair **cookiemap
- An array of keysz singly linked lists of elements of
the cookies array. If
cookie->key is equal to one
of the entries of keys and
cookie->state is
KPAIR_VALID or
KPAIR_UNCHECKED , the cookie is added to the list
cookiemap[cookie->keypos].
Empty lists are NULL . If a list contains more than
one cookie, cookie->next
points to the next cookie. For the last cookie in a list,
cookie->next is NULL.
- struct kpair **cookienmap
- Similar to cookiemap, except that it contains the
cookies where cookie->state
is
KPAIR_INVALID .
- struct kpair *cookies
- Key-value pairs read from request cookies found in the
HTTP_COOKIE header variable, or
NULL if cookiesz is 0. See
fields for key-value pairs from the request query
string or message body.
- size_t cookiesz
- The size of the cookies array.
- struct kpair **fieldmap
- Similar to cookiemap, except that the lists contain
elements of the fields array.
- struct kpair **fieldnmap
- Similar to fieldmap, except that it contains the
fields where field->state
is
KPAIR_INVALID .
- struct kpair *fields
- Key-value pairs read from the
QUERY_STRING header
variable and from the message body, or NULL if
fieldsz is 0. See cookies
for key-value pairs from request cookies.
- size_t fieldsz
- The number of elements in the fields array.
- char *fullpath
- The full requested path as contained in the
PATH_INFO header variable. For example, requesting
“https://bsd.lv/app.cgi/dir/file.html?q=v”, where
“app.cgi” is the CGI program, this value would be
/dir/file.html. It is not guaranteed to start with
a slash and it may be an empty string.
- char *host
- The host name received in the
HTTP_HOST header
variable. When using name-based virtual hosting, this is typically the
virtual host name specified by the client in the HTTP request, and it
should not be confused with the canonical DNS name of the host running the
web server. For example, a request to
“https://bsd.lv/app.cgi/file” would have a host of
“bsd.lv”. If HTTP_HOST is not
defined, host is set to
“localhost”.
- struct kdata *kdata
- Internal data. Should not be touched.
- const struct kvalid *keys
- Value passed to
khttp_parse ().
- size_t keysz
- Value passed to
khttp_parse ().
- enum kmethod method
- The
KMETHOD_ACL ,
KMETHOD_CONNECT ,
KMETHOD_COPY ,
KMETHOD_DELETE ,
KMETHOD_GET , KMETHOD_HEAD ,
KMETHOD_LOCK ,
KMETHOD_MKCALENDAR ,
KMETHOD_MKCOL ,
KMETHOD_MOVE ,
KMETHOD_OPTIONS ,
KMETHOD_POST ,
KMETHOD_PROPFIND ,
KMETHOD_PROPPATCH ,
KMETHOD_PUT ,
KMETHOD_REPORT ,
KMETHOD_TRACE , or
KMETHOD_UNLOCK submission method obtained from the
REQUEST_METHOD header variable. If an unknown
method was requested, KMETHOD__MAX is used. If no
method was specified, the default is KMETHOD_GET .
Applications will usually accept only
KMETHOD_GET and
KMETHOD_POST , so be sure to emit a
KHTTP_405 status for undesired methods.
- size_t mime
- The MIME type of the requested file as determined by its
suffix matched to the
suffixes map passed to
khttp_parsex() or the default
ksuffixmap if using
khttp_parse(). This defaults to the
mimesz value passed to
khttp_parsex() or the default
KMIME__MAX if using
khttp_parse() when no suffix is specified or when
the suffix is specified but not known.
- size_t page
- The page index found by looking up pagename in the
pages array. If pagename is
not found in pages, pagesz is
used; if pagename is empty,
defpage is used.
- char *pagename
- The first component of fullpath or an empty string
if there is none. It is compared to the elements of the
pages array to determine which
page it corresponds to. For example, for a
fullpath of “/dir/file.html” this
component corresponds to dir. For
“/file.html”, it's file.
- char *path
- The middle part of fullpath, after stripping
pagename/ at the beginning and
.suffix at the end, or an empty string if there is
none. For example, if the fullpath is
bar/baz.html, this component is
baz.
- char *pname
- The script name received in the
SCRIPT_NAME header
variable. For example, for a request to a CGI program
/var/www/cgi-bin/app.cgi mapped by the web server
from “https://bsd.lv/app.cgi/file”, this would be
app.cgi. This may not reflect a file system entity
and it may be an empty string.
- uint16_t port
- The server's receiving TCP port according to the
SERVER_PORT header variable, or 80 if that is not
defined or an invalid number.
- struct khttpauth rawauth
- The raw authorisation request according to the
HTTP_AUTHORIZATION header variable passed by the
web server. This is only set if the web server is not managing
authorisation itself.
- char *remote
- The string form of the client's IPv4 or IPv6 address taken from the
REMOTE_ADDR header variable, or
“127.0.0.1” if that is not defined. The address format of
the string is not checked.
- struct khead
*reqmap[
KREQU__MAX ]
- Mapping of enum krequ enumeration values to
reqs parsed from the input stream.
- struct khead *reqs
- List of all HTTP request headers, known via enum
krequ and not known, parsed from the input stream, or
NULL if reqsz is 0.
- size_t reqsz
- Number of request headers in reqs.
- enum kscheme scheme
- The access scheme according to the
HTTPS header
variable, either KSCHEME_HTTPS if
HTTPS is set and equal to the string
“on” or KSCHEME_HTTP otherwise.
- char *suffix
- The suffix part of the last component of fullpath or
an empty string if there is none. For example, if the
fullpath is /bar/baz.html,
this component is html. See the
mime field for the MIME type parsed from the
suffix.
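The relationship between fullpath, pagename, path, and suffix can be
illustrated with a small splitter. This is a sketch that follows the examples
given above, not the parser used by kcgi(3): struct pathparts and
split_fullpath() are hypothetical names, and the fixed-size buffers (which
silently truncate long components) are an assumption made for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container; kcgi stores these as members of struct kreq. */
struct pathparts {
	char	 pagename[64];
	char	 path[256];
	char	 suffix[32];
};

/* Split a PATH_INFO-style string into page name, middle path, and suffix. */
static void
split_fullpath(const char *fullpath, struct pathparts *pp)
{
	const char	*cp, *slash, *dot, *last, *end;

	assert(fullpath != NULL && pp != NULL);
	memset(pp, 0, sizeof(*pp));

	/* The leading slash is optional, so skip it if present. */
	cp = (fullpath[0] == '/') ? fullpath + 1 : fullpath;
	end = cp + strlen(cp);

	/* The suffix follows the last dot of the last component. */
	last = strrchr(cp, '/');
	last = (last == NULL) ? cp : last + 1;
	if ((dot = strrchr(last, '.')) != NULL) {
		snprintf(pp->suffix, sizeof(pp->suffix), "%s", dot + 1);
		end = dot;
	}

	/* The page name is the first component, without the suffix. */
	slash = memchr(cp, '/', (size_t)(end - cp));
	if (slash == NULL) {
		snprintf(pp->pagename, sizeof(pp->pagename), "%.*s",
		    (int)(end - cp), cp);
		return;
	}
	snprintf(pp->pagename, sizeof(pp->pagename), "%.*s",
	    (int)(slash - cp), cp);

	/* Whatever lies between page name and suffix is the middle path. */
	snprintf(pp->path, sizeof(pp->path), "%.*s",
	    (int)(end - (slash + 1)), slash + 1);
}
```

With this sketch, "/dir/file.html" yields the page name "dir", the path
"file", and the suffix "html", while "/file.html" yields the page name "file"
and an empty path.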
The application may optionally define keys
provided to khttp_parse () and
khttp_parsex () as an array of struct
kvalid. This structure is central to the validation of input data. It
consists of the following fields:
- const char *name
- The field name, i.e., how it appears in the HTML form input name. This
cannot be
NULL . If the field name is an empty
string and the HTTP message consists of an opaque body (and not key-value
pairs), then that field will be used to validate the HTTP message body.
This is useful for KMETHOD_PUT style
requests.
- int (*)(struct kpair *)
valid
- A validation function returning non-zero if parsing and validation succeed
or 0 otherwise. If it is
NULL , then no validation
is performed, the data is considered as valid, and it is bucketed into
cookiemap or fieldmap as such.
User-defined valid functions usually set
the type and parsed fields
in the key-value pair. When working with binary data or with a key that
can take different data types, it is acceptable for a validation
function to set the type to
KPAIR__MAX and for the application to ignore the
parsed field and to work directly with
val and valsz.
The validation function is allowed to allocate new memory for
val: if the val pointer
changes during validation, the memory pointed to after validation will
be freed with
free(3)
after the data is passed out of the sandbox.
These functions are invoked from within a system-specific
sandbox that may not allow some system calls, for example opening files
or sockets. In other words, validation functions should only do pure
computation.
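The shape of a validation function can be sketched as follows. To keep the
example self-contained it uses stand-in types rather than the real struct
kpair: struct demo_pair, enum demo_type, and valid_age() are hypothetical, and
the accepted range of 0 to 150 is arbitrary. A real validator would take a
struct kpair pointer and set its type and parsed fields instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the kcgi types; only the members used here are modelled. */
enum demo_type { DEMO_INTEGER, DEMO__MAX };

struct demo_pair {
	char		*val;		/* input bytes; val[valsz] is NUL */
	size_t		 valsz;		/* length of val in bytes */
	enum demo_type	 type;		/* set by the validator */
	int64_t		 parsed_i;	/* set by the validator */
};

/* Accept a decimal age in [0, 150]; return non-zero on success. */
static int
valid_age(struct demo_pair *p)
{
	char		*ep;
	long long	 v;

	assert(p != NULL && p->val != NULL);

	/* Reject empty input and input with embedded NUL bytes. */
	if (p->valsz == 0 || strlen(p->val) != p->valsz)
		return 0;

	/* Pure computation only: no files, sockets, or other system calls. */
	errno = 0;
	v = strtoll(p->val, &ep, 10);
	if (errno != 0 || *ep != '\0' || v < 0 || v > 150)
		return 0;

	p->type = DEMO_INTEGER;
	p->parsed_i = v;
	return 1;
}
```

The check of strlen() against valsz matters because the value may contain NUL
bytes, in which case it must not be treated as a C string.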
The struct kpair structure presents the user
with fields parsed from input and (possibly) matched to the
keys variable passed to
khttp_parse () and
khttp_parsex (). It is also passed to the validation
function to be filled in. In this case, the MIME-related fields are already
filled in and may be examined to determine the method of validation. This is
useful when validating opaque message bodies.
- char *ctype
- The value's MIME content type (e.g.,
image/jpeg ),
or an empty string if not defined.
- size_t ctypepos
- If ctype is not an empty string, it is
looked up in the mimes parameter passed to
khttp_parsex() or kmimetypes
if using khttp_parse(). If found, it is set to the
appropriate index. Otherwise, it's mimesz.
- char *file
- The value's MIME source filename or an empty string if not defined.
- char *key
- The NUL-terminated key (input) name. If the HTTP message body is opaque
(e.g.,
KMETHOD_PUT ), then an empty-string key is
cooked up. The key may contain an arbitrary sequence of non-NUL bytes,
even non-ASCII bytes, control characters, and shell metacharacters.
- size_t keypos
- If found in the keys array passed to
khttp_parse (), the index of the matching key.
Otherwise keysz.
- struct kpair *next
- In a cookie or field map, next points to the next
parsed key-value pair with the same key name. This
occurs most often in HTML checkbox forms, where many fields may have the
same name.
- union parsed parsed
- The parsed, validated value. This may be an integer
i, for a 64-bit signed integer; a string
s, for a NUL-terminated character string; or a
double d, for a double-precision floating-point
number. This is intentionally basic because the resulting data must be
reliably passed from the parsing context back into the web
application.
- enum kpairstate state
- The validation state:
KPAIR_VALID if the pair was
successfully validated by a validation function,
KPAIR_INVALID if a validation function was invoked
but failed, or KPAIR_UNCHECKED if no validation
function is defined for this key.
- enum kpairtype type
- If parsed, the type of data in parsed, otherwise
KPAIR__MAX.
- char *val
- The (input) value, which may contain an arbitrary sequence of bytes, even
NUL bytes, non-ASCII bytes, control characters, and shell metacharacters.
The byte following the end of the array,
val[valsz], is always
guaranteed to be NUL. The validation function may modify the contents. For
example, for integer numbers and e-mail addresses, trailing whitespace may
be replaced with NUL bytes.
- size_t valsz
- The length of the val buffer in bytes. It is not a
string length.
- char *xcode
- The value's MIME content transfer encoding (e.g.,
base64 ), or an empty string if not defined.
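The way keypos, state, and next combine into per-key lists can be sketched
with stand-in types. This is an illustration of the bucketing described for
the cookie and field maps, not kcgi's own code: struct demo_field and
bucket_fields() are hypothetical, and appending in input order is an
assumption.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_VALID, DEMO_INVALID, DEMO_UNCHECKED };

/* Stand-in for struct kpair with only the members this sketch needs. */
struct demo_field {
	const char		*val;
	size_t			 keypos;	/* index into keys, or keysz */
	enum demo_state		 state;
	struct demo_field	*next;
};

/* Fill map[0..keysz-1] with lists of valid or unchecked fields. */
static void
bucket_fields(struct demo_field *fields, size_t fieldsz,
    struct demo_field **map, size_t keysz)
{
	struct demo_field	**tail;
	size_t			  i;

	assert(map != NULL && (fields != NULL || fieldsz == 0));

	/* Empty lists are NULL. */
	for (i = 0; i < keysz; i++)
		map[i] = NULL;

	for (i = 0; i < fieldsz; i++) {
		fields[i].next = NULL;
		/* Skip unmatched keys and failed validations. */
		if (fields[i].keypos >= keysz ||
		    fields[i].state == DEMO_INVALID)
			continue;
		/* Append to the end of the list for this key. */
		tail = &map[fields[i].keypos];
		while (*tail != NULL)
			tail = &(*tail)->next;
		*tail = &fields[i];
	}
}
```

Walking a list is then a matter of following next until it is NULL, which is
how an application would collect every value of a multi-valued checkbox.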
The struct khttpauth structure holds
authorisation data if passed by the server. The specific fields are as
follows.
- enum kauth type
- If no data was passed by the server, the type value
is
KAUTH_NONE . Otherwise it's
KAUTH_BASIC , KAUTH_BEARER ,
or KAUTH_DIGEST .
KAUTH_UNKNOWN signals that the authorisation type
was not recognised.
- int authorised
- For
KAUTH_BASIC ,
KAUTH_BEARER , or
KAUTH_DIGEST authorisation, this field indicates
whether all required values were specified for the application to perform
authorisation.
- char *digest
- An MD5 digest of the
REQUEST_METHOD,
SCRIPT_NAME, and PATH_INFO
header variables and the request body. It is not a NUL-terminated string,
but an array of exactly MD5_DIGEST_LENGTH bytes.
Only filled in when HTTP_AUTHORIZATION is
“digest” and authorised is non-zero.
Otherwise, it remains NULL. Used in
khttpdigest_validatehash(3).
- d
- An anonymous union containing parsed fields per type:
struct khttpbasic basic for
KAUTH_BASIC or
KAUTH_BEARER , or struct
khttpdigest digest for
KAUTH_DIGEST .
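Because digest is a fixed-length byte array and not a string, it must not be
printed or measured with string functions. A minimal sketch for logging it in
hexadecimal follows. The names digest_to_hex() and DEMO_MD5_LEN are
hypothetical; DEMO_MD5_LEN is assumed to equal MD5_DIGEST_LENGTH, which is 16
bytes for MD5.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match MD5_DIGEST_LENGTH (16 bytes for MD5). */
#define DEMO_MD5_LEN 16

/*
 * Write the digest as 32 lowercase hex characters plus a NUL.
 * The out buffer must hold at least DEMO_MD5_LEN * 2 + 1 bytes.
 */
static void
digest_to_hex(const unsigned char *digest, char *out)
{
	static const char	 hex[] = "0123456789abcdef";
	size_t			 i;

	assert(digest != NULL && out != NULL);

	/* Iterate by length, never by searching for a NUL byte. */
	for (i = 0; i < DEMO_MD5_LEN; i++) {
		out[i * 2] = hex[digest[i] >> 4];
		out[i * 2 + 1] = hex[digest[i] & 0x0f];
	}
	out[DEMO_MD5_LEN * 2] = '\0';
}
```

Since the field itself is declared as char *, a caller would cast it to
const unsigned char * before passing it to a helper like this one.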
If the field for an HTTP authorisation request is
KAUTH_BASIC or KAUTH_BEARER ,
it will consist of the following for its parsed entities in its
struct khttpbasic structure:
- response
- The hashed and encoded response string for
KAUTH_BASIC , or an opaque string for
KAUTH_BEARER .
If the field for an HTTP authorisation request is
KAUTH_DIGEST , it will consist of the following in
its struct khttpdigest structure:
- alg
- The encoding algorithm, parsed from the possible
MD5 or MD5-Sess
values.
- qop
- The quality of protection algorithm, which may be unspecified,
Auth or Auth-Int.
- user
- The user coordinating the request.
- uri
- The URI for which the request is designated. (This must match the request
URI).
- realm
- The request realm.
- nonce
- The server-generated nonce value.
- cnonce
- The (optional) client-generated nonce value.
- response
- The hashed and encoded response string, which entangles fields depending
on algorithm and quality of protection.
- count
- The (optional) cnonce counter.
- opaque
- The (optional) opaque string requested by the server.
The struct kopts structure consists of
tunables for network performance. You probably don't want to use these
unless you really know what you're doing!
- sndbufsz
- The size of the output buffer. The output buffer is a heap-allocated
region into which writes (via
khttp_write(3)
and
khttp_head(3))
are buffered instead of being flushed directly to the wire. The buffer is
flushed when it is full, when the HTTP headers are flushed, and when
khttp_free(3)
is invoked. If the buffer size is zero, writes are flushed immediately to
the wire. If the buffer size is less than zero, it is filled with a
meaningful default.
Lastly, the struct khead structure holds
parsed HTTP headers.
- key
- Holds the HTTP header name. This is not the CGI header name (e.g.,
HTTP_COOKIE), but the reconstituted HTTP name
(e.g., Cookie).
- val
- The opaque header value, which may be an empty string.
A number of variables are defined in
<kcgi.h> to simplify
invocations of the khttp_parse() family. Applications
are strongly encouraged to use these variables (and associated enumerations) in
khttp_parse() instead of overriding them with
hand-rolled sets in khttp_parsex().
- kmimetypes
- Indexed list of common MIME types, for example, “text/html”
and “application/json”. Corresponds to enum
kmime.
- khttps
- Indexed list of HTTP status code and identifier, for example, “200
OK”. Corresponds to enum khttp.
- kschemes
- Indexed list of URL schemes, for example, “https” or
“ftp”. Corresponds to enum
kscheme.
- kmethods
- Indexed list of HTTP methods, for example, “GET” and
“POST”. Corresponds to enum
kmethod.
- ksuffixmap
- Map of MIME types defined in enum kmime to possible
suffixes. This array is terminated with a MIME type of
KMIME__MAX and name
NULL .
- ksuffixes
- Indexed list of canonical suffixes for MIME types corresponding to
enum kmime. This may be a
NULL pointer for types that have no canonical
suffix, for example, “application/octet-stream”.
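A map terminated in the manner of ksuffixmap can be scanned without knowing
its length. The sketch below uses a stand-in structure and table so that it is
self-contained: struct demo_mimemap, demo_map, and suffix_to_mime() are
hypothetical, and the case-sensitive comparison is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct kmimemap: a suffix name and a MIME index. */
struct demo_mimemap {
	const char	*name;
	size_t		 mime;
};

enum { DEMO_HTML, DEMO_JSON, DEMO__MAX };

/* Terminated by a NULL name paired with the "max" index. */
static const struct demo_mimemap demo_map[] = {
	{ "html", DEMO_HTML },
	{ "htm", DEMO_HTML },
	{ "json", DEMO_JSON },
	{ NULL, DEMO__MAX },
};

/* Return the MIME index for a suffix, or the terminator's index. */
static size_t
suffix_to_mime(const struct demo_mimemap *map, const char *suffix)
{
	assert(map != NULL && suffix != NULL);

	for (; map->name != NULL; map++)
		if (strcmp(map->name, suffix) == 0)
			break;

	/* On a miss, map points at the terminator entry. */
	return map->mime;
}
```

Several suffixes may map to the same MIME index, which is why the map is kept
separate from the one-suffix-per-type ksuffixes array.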
khttp_parse () and khttp_parsex ()
return an error code:
KCGI_OK
- Success (not an error).
KCGI_ENOMEM
- Memory failure. This can occur in many places: spawning a child,
allocating memory, creating sockets, etc.
KCGI_ENFILE
- Could not allocate file descriptors.
KCGI_EAGAIN
- Could not spawn a child.
KCGI_FORM
- Malformed data between parent and child whilst parsing an HTTP request.
(Internal system error.)
KCGI_SYSTEM
- Opaque operating system error.
On failure, the calling application should terminate as soon as
possible. Applications should not try to write an HTTP 505
error or similar, but allow the web server to handle the empty CGI response
on its own.
The khttp_parse () and
khttp_parsex () functions were written by
Kristaps Dzonsons
<kristaps@bsd.lv>.