Mail::SpamAssassin::PerMsgStatus(3)
User Contributed Perl Documentation
Mail::SpamAssassin::PerMsgStatus - per-message status (spam or not-spam)
  use Mail::SpamAssassin;

  my $spamtest = Mail::SpamAssassin->new({
      'rules_filename'     => '/etc/spamassassin.rules',
      'userprefs_filename' => $ENV{HOME}.'/.spamassassin/user_prefs'
  });
  my $mail   = $spamtest->parse();
  my $status = $spamtest->check ($mail);

  my $rewritten_mail;
  if ($status->is_spam()) {
      $rewritten_mail = $status->rewrite_mail ();
  }
  ...
The Mail::SpamAssassin "check()" method
returns an object of this class. This object encapsulates all the per-message
state.
- $status->check ()
- Runs the SpamAssassin rules against the message pointed to by the
object.
- $status->learn()
- After a mail message has been checked, this method can be called. If the
score is outside a certain range around the threshold, i.e. if the message
is judged more-or-less definitely spam or definitely non-spam, it will be
fed into SpamAssassin's learning systems (currently the naive Bayesian
classifier), so that future similar mails will be caught.
- $score = $status->get_autolearn_points()
- Return the message's score as computed for auto-learning. Certain tests
are ignored:
- rules with tflags set to 'learn' (the Bayesian rules)
- rules with tflags set to 'userconf' (user white/black-listing rules, etc)
- rules with tflags set to 'noautolearn'
Also note that auto-learning occurs using scores from either
scoreset 0 or 1, depending on what scoreset is used during message
check. It is likely that the message check and auto-learn scores will be
different.
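The exclusions listed above can be pictured with a short sketch. This is not SpamAssassin's implementation: the %hits data, the rule names and the helper autolearn_points are hypothetical, and only the three tflags values come from the text.

```perl
use strict;
use warnings;

# Hypothetical rule hits (name => score and tflags), invented for illustration.
my %hits = (
    EXAMPLE_BAYES    => { score => 3.5,  tflags => 'learn' },
    EXAMPLE_USERCONF => { score => -100, tflags => 'userconf nice' },
    EXAMPLE_BODY     => { score => 1.5,  tflags => '' },
    EXAMPLE_SKIPPED  => { score => 0.75, tflags => 'noautolearn' },
    EXAMPLE_HEADER   => { score => 2.0,  tflags => '' },
);

# Sum the scores, skipping rules whose tflags include learn, userconf
# or noautolearn (the three exclusions described in the text).
sub autolearn_points {
    my ($hits) = @_;
    my $total = 0;
    for my $name (sort keys %$hits) {
        my %flags = map { $_ => 1 } split ' ', $hits->{$name}{tflags};
        next if $flags{learn} || $flags{userconf} || $flags{noautolearn};
        $total += $hits->{$name}{score};
    }
    return $total;
}

my $points = autolearn_points(\%hits);
printf "autolearn points: %.2f\n", $points;
```

Only EXAMPLE_BODY and EXAMPLE_HEADER contribute here, which is why the auto-learn score can differ from the score used for the spam decision.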
- $score = $status->get_head_only_points()
- Return the message's score as computed for auto-learning, ignoring all
rules except for header-based ones.
- $score = $status->get_learned_points()
- Return the message's score as computed for auto-learning, ignoring all
rules except for learning-based ones.
- $score = $status->get_body_only_points()
- Return the message's score as computed for auto-learning, ignoring all
rules except for body-based ones.
- $score = $status->get_autolearn_force_status()
- Return whether a message's score included any rules that are flagged as
autolearn_force.
- $rule_names = $status->get_autolearn_force_names()
- Return a comma-separated list of rule names if a message's score
included any rules that are flagged as autolearn_force.
- $isspam = $status->is_spam ()
- After a mail message has been checked, this method can be called. It will
return 1 for mail determined likely to be spam, 0 if it does not seem
spam-like.
- $list = $status->get_names_of_tests_hit ()
- After a mail message has been checked, this method can be called. It will
return a comma-separated string, listing all the symbolic test names of
the tests which were triggered by the mail.
- $list = $status->get_names_of_tests_hit_with_scores_hash ()
- After a mail message has been checked, this method can be called. It will
return a reference to a hash of rule/score pairs for all the symbolic
test names and individual scores of the tests which were triggered by the
mail.
- $list = $status->get_names_of_tests_hit_with_scores ()
- After a mail message has been checked, this method can be called. It will
return a comma-separated string of rule=score pairs for all the symbolic
test names and individual scores of the tests which were triggered by the
mail.
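As a minimal sketch of consuming the rule=score string described above, the following splits it back into a hash. The sample string, rule names and scores are invented; in real code the string would come from $status->get_names_of_tests_hit_with_scores().

```perl
use strict;
use warnings;

# Sample string in the documented comma-separated rule=score shape.
# Rule names and scores are invented for illustration.
my $list = 'EXAMPLE_RULE_A=1.5,EXAMPLE_RULE_B=-0.25,EXAMPLE_RULE_C=3';

# Split on commas, then split each pair on its first '='.
my %score_for = map {
    my ($name, $score) = split /=/, $_, 2;
    ($name => $score);
} split /,/, $list;

# Print the rules, highest score first.
for my $rule (sort { $score_for{$b} <=> $score_for{$a} } keys %score_for) {
    printf "%-16s %6.2f\n", $rule, $score_for{$rule};
}
```

When a hash is what the caller needs, get_names_of_tests_hit_with_scores_hash() returns one directly and avoids the parsing step.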
- $list = $status->get_names_of_subtests_hit ()
- After a mail message has been checked, this method can be called. It will
return a comma-separated string, listing all the symbolic test names of
the meta-rule sub-tests which were triggered by the mail. Sub-tests are
the normally-hidden rules, which score 0 and have names beginning with two
underscores, used in meta rules.
If a parameter of collapsed or dbg is passed, the output will
be a condensed array of sub-tests with multiple hits reduced to one
entry.
If the parameter of dbg is passed, the output will be a
condensed string of sub-tests with multiple hits reduced to one entry
with the number of hits in parentheses. Some information is also added
at the end regarding the multiple hits.
- $num = $status->get_score ()
- After a mail message has been checked, this method can be called. It will
return the message's score.
- $num = $status->get_required_score ()
- After a mail message has been checked, this method can be called. It will
return the score required for a mail to be considered spam.
- $num = $status->get_autolearn_status ()
- After a mail message has been checked, this method can be called. It will
return one of the following strings depending on whether the mail was
auto-learned or not: "ham", "no", "spam",
"disabled", "failed", "unavailable".
If the message is flagged with autolearn_force, the returned string will
also include the status and the rules hit. For example:
"autolearn_force=yes (AUTOLEARNTEST_BODY)"
- $report = $status->get_report ()
- Deliver a "spam report" on the checked mail message. This
contains details of how many spam detection rules it triggered.
The report is returned as a multi-line string, with the lines
separated by "\n" characters.
- $preview = $status->get_content_preview ()
- Give a "preview" of the content.
This is returned as a multi-line string, with the lines
separated by "\n" characters,
containing a fully-decoded, safe, plain-text sample of the first few
lines of the message body.
- $msg = $status->get_message()
- Return the object representing the message being scanned.
- $status->rewrite_mail ()
- Rewrite the mail message. This will at minimum add headers, and at maximum
MIME-encapsulate the message text, to reflect its spam or not-spam status.
The function will return a scalar of the rewritten message.
The actual modifications depend on the configuration (see
"Mail::SpamAssassin::Conf" for more
information).
The possible modifications are as follows:
- To:, From: and Subject: modification on spam mails
- Depending on the configuration, the To: and From: lines can have a
user-defined RFC 2822 comment appended for spam mail. The subject line may
have a user-defined string prepended to it for spam mail.
- X-Spam-* headers for all mails
- Depending on the configuration, zero or more headers with names beginning
with "X-Spam-" will be added to mail
depending on whether it is spam or ham.
- spam message with report_safe
- If report_safe is set to true (1), then spam messages are encapsulated
into their own message/rfc822 MIME attachment without any modifications
being made.
If report_safe is set to false (0), then the message will only
have the above headers added/modified.
- $status->action_depends_on_tags($tags, $code, @args)
- Enqueue the supplied subroutine reference $code,
to become runnable when all the specified tags become available. The
$tags may be a simple scalar - a tag name, or a
listref of tag names. The subroutine &$code,
when called, will be passed a
PerMsgStatus object as its first
argument, followed by the supplied (optional) list
@args.
- $status->set_tag($tagname, $value)
- Set a template tag, as used in
"add_header", report templates, etc.
This API is intended for use by plugins. Tag names will be converted to an
all-uppercase representation internally. Tag names must consist of ONLY
alphanumeric characters.
$value can be a simple scalar (string
or number), or a reference to an array, in which case the public method
get_tag will join array elements using a space as a separator, returning
a single string for backward compatibility.
$value can also be a subroutine
reference, which will be evaluated each time the template is expanded.
The first argument passed by get_tag to a called subroutine will be a
PerMsgStatus object (this module's object), followed by optional
arguments provided by a caller to get_tag.
Note that perl supports closures, which means that variables
set in the caller's scope can be accessed inside this
"sub". For example:
  my $text = "hello world!";
  $status->set_tag("FOO", sub {
      my $pms = shift;
      return $text;
  });
See
"Mail::SpamAssassin::Conf"'s
"TEMPLATE TAGS" section for more
details on how template tags are used.
"undef" will be returned if
a tag by that name has not been defined.
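The three kinds of $value described above (scalar, array reference, subroutine reference) can be illustrated without a running SpamAssassin. The helper resolve_tag_value below is hypothetical and is not part of the module; it only mimics the documented get_tag behaviour, and $pms is a plain stand-in for a PerMsgStatus object.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of SpamAssassin: mimics how get_tag is
# documented to treat the value stored by set_tag.
sub resolve_tag_value {
    my ($pms, $value, @args) = @_;
    return undef unless defined $value;                      # tag not defined
    return $value->($pms, @args) if ref($value) eq 'CODE';   # evaluated on each call
    return join(' ', @$value)    if ref($value) eq 'ARRAY';  # joined with a space
    return $value;                                           # simple scalar
}

my $pms  = {};                 # stand-in for a PerMsgStatus object
my $text = "hello world!";     # captured by the closure below

my $from_scalar = resolve_tag_value($pms, 'plain');
my $from_array  = resolve_tag_value($pms, [ 'a', 'b', 'c' ]);
my $from_code   = resolve_tag_value($pms, sub { my $p = shift; return $text });

print "$from_scalar | $from_array | $from_code\n";
```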
- $string = $status->get_tag($tagname)
- Get the current value of a template tag, as used in
"add_header", report templates, etc.
This API is intended for use by plugins. Tag names will be converted to an
all-uppercase representation internally. See
"Mail::SpamAssassin::Conf"'s
"TEMPLATE TAGS" section for more details
on tags.
"undef" will be returned if
a tag by that name has not been defined.
- $string = $status->get_tag_raw($tagname, @args)
- Similar to "get_tag", but keeps a tag
name unchanged (does not uppercase it), and does not convert arrayref tag
values into a single string.
- $status->set_spamd_result_item($subref)
- Set an entry for the spamd result log line.
$subref should be a code reference for a
subroutine which will return a string in
'name=VALUE' format, similar to the other entries
in the spamd result line:
Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
rport=33153,mid=<9PS291LhupY>,autolearn=spam
"name" and
"VALUE" must not contain
"=" or
"," characters, as it is important
that these log lines are easy to parse.
The code reference will be called by spamd after the message
has been scanned, and the
"PerMsgStatus::check()" method has
returned.
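A minimal sketch of a suitable $subref follows. The helper result_item and the tag name mytag are hypothetical, and replacing forbidden characters with "_" is a choice made for this sketch, not something the module prescribes.

```perl
use strict;
use warnings;

# Hypothetical helper: build a 'name=VALUE' string, replacing any '=' or ','
# so that the spamd result line stays easy to parse.
sub result_item {
    my ($name, $value) = @_;
    s/[=,]/_/g for $name, $value;
    return "$name=$value";
}

# The code reference spamd would call after the scan has finished.
my $subref = sub { return result_item('mytag', 'a,b=c') };

# In a plugin this would be registered with:
#   $status->set_spamd_result_item($subref);
print $subref->(), "\n";
```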
- $status->finish ()
- Indicate that this $status object is finished
with, and can be destroyed.
If you are using SpamAssassin in a persistent environment, or
checking many mail messages from one
"Mail::SpamAssassin" factory, this
method should be called to ensure Perl's garbage collection will clean
up old status objects.
- $name = $status->get_current_eval_rule_name()
- Return the name of the currently-running eval rule.
"undef" is returned if no eval rule is
currently being run. Useful for plugins to determine the current rule name
while inside an eval test function call.
- $status->get_decoded_body_text_array ()
- Returns the message body, with base64 or quoted-printable
encodings decoded, and non-text parts or non-inline attachments stripped.
This is the same result text as used in 'rawbody' rules.
It is returned as an array of strings, with each string being
a 2-4kB chunk of the body, split from boundaries if possible.
- $status->get_decoded_stripped_body_text_array ()
- Returns the message body, decoded (as described in
get_decoded_body_text_array()), with HTML rendered, and with
whitespace normalized.
This is the same result text as used in 'body' rules.
It will always render text/html.
It is returned as an array of strings, with each string
representing one 'paragraph'. Paragraphs, in plain-text mails, are
double-newline-separated blocks of multi-line text.
- $status->get (header_name [, default_value])
- Returns a message header, pseudo-header or a real name, email-address or
some other parsed value set by modifiers.
"header_name" is the name of a mail
header, such as 'Subject', 'To', etc.
Since 4.0, this should be called in list context. It returns a list
of header contents, or other values when modifiers are used.
If "default_value" is given,
it will be used if the requested
"header_name" does not exist. This is
mainly useful when called in scalar context to set 'undef' instead of
legacy '' return value when header does not exist.
Appending ":raw" modifier to
the header name will inhibit decoding of quoted-printable or base-64
encoded strings.
Appending ":addr" modifier
to the header name will return all email-addresses found in the header.
It is mainly applicable to header fields 'From', 'Sender', 'To', 'Cc'
along with their 'Resent-*' counterparts, and the 'Return-Path'. For
example, all of the following will result in "example@foo"
(and "example@bar"):
- example@foo
- example@foo (Foo Blah), <example@bar>
- example@foo, example@bar
- display: example@foo (Foo Blah), example@bar ;
- Foo Blah <example@foo>
- "Foo Blah" <example@foo>
- "'Foo Blah'" <example@foo>
Appending ":name" modifier to
the header name will return all "display names" from the header
field. As with ":addr", it is mainly
applicable to header fields 'From', 'Sender', 'To', 'Cc' along with their
'Resent-*' counterparts, and the 'Return-Path'. For example, all of the
following will result in "Foo Blah" (and "Bar Baz"). One
level of single quotes is stripped too, as it is often seen.
- example@foo (Foo Blah)
- example@foo (Foo Blah), "Bar Baz" <example@bar>
- display: example@foo (Foo Blah), example@bar ;
- Foo Blah <example@foo>
- "Foo Blah" <example@foo>
- "'Foo Blah'" <example@foo>
Appending ":host" to the header
name will return the first hostname-looking string that ends with a valid
TLD. First it tries to find a match after @ character (possible email), then
from any part of the header. Normal use of this would be for example
'From:addr:host' to return the hostname portion of a From-address.
Appending ":domain" to the
header name implies ":host", but will
return only the domain part of the hostname, as returned by
RegistryBoundaries::trim_domain().
Appending ":ip" to the header
name, will return the first IPv4 or IPv6 address string found. Could be used
for example as 'X-Originating-IP:ip'.
Appending ":revip" to the header
name implies ":ip", but will return the
found IP in reverse (usually for DNSBL usage).
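For readers unfamiliar with DNSBL lookups, the sketch below shows what reversing an address usually means for IPv4. The helper reverse_ipv4 and the zone dnsbl.example are hypothetical, IPv6 is not handled, and the exact string returned by ":revip" should be checked against the module itself.

```perl
use strict;
use warnings;

# Hypothetical helper, IPv4 only: reverse the octets of a dotted-quad address.
sub reverse_ipv4 {
    my ($ip) = @_;
    return unless $ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;
    return join '.', $4, $3, $2, $1;
}

my $rev = reverse_ipv4('192.0.2.1');   # address from the documentation range

# A DNSBL query name is typically the reversed address plus the list's zone.
print "$rev.dnsbl.example\n";
```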
Appending ":first" modifier to
the header name will return only the first (topmost) header, in case there
are multiple ones. Similarly ":last" will
select the last one. These affect only the physical header line selection.
If selected header is parsed further with
":addr" or similar, it may return multiple
results, if the selected header contains multiple addresses.
There are several special pseudo-headers that can be
specified:
- "ALL" can be used to mean the text of all the message's headers.
Each header is decoded and unfolded to single line, unless called with
:raw.
- "ALL-TRUSTED" can be used to mean the text of all the message's
headers that could only have been added by trusted relays.
- "ALL-INTERNAL" can be used to mean the text of all the message's
headers that could only have been added by internal relays.
- "ALL-UNTRUSTED" can be used to mean the text of all the
message's headers that may have been added by untrusted relays. To make this
pseudo-header more useful for header rules the 'Received' header that was
added by the last trusted relay is included, even though it can be
trusted.
- "ALL-EXTERNAL" can be used to mean the text of all the message's
headers that may have been added by external relays. Like
"ALL-UNTRUSTED" the 'Received' header added by the last internal
relay is included.
- "ToCc" can be used to mean the contents of both the 'To' and
'Cc' headers.
- "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of
the SMTP transaction that delivered this message, if this data has been made
available by the SMTP server.
- "MESSAGEID" is a symbol meaning all Message-Id's found in the
message; some mailing list software moves the real 'Message-Id' to
'Resent-Message-Id' or 'X-Message-Id', then uses its own one in the
'Message-Id' header. The value returned for this symbol is the text from all
3 headers, separated by newlines.
- "X-Spam-Relays-Untrusted" is the generated metadata of untrusted
relays the message has passed through
- "X-Spam-Relays-Trusted" is the generated metadata of trusted
relays the message has passed through
- "X-Spam-Relays-External" is the generated metadata of external
relays the message has passed through
- "X-Spam-Relays-Internal" is the generated metadata of internal
relays the message has passed through
- $status->get_uri_list ()
- Returns an array of all unique URIs found in the message. It takes a
combination of the URIs found in the rendered (decoded and HTML stripped)
body and the URIs found when parsing the HTML in the message. Will also
set $status->{uri_list} (the array as returned
by this function).
The returned array will include the "raw" URI as
well as "slightly cooked" versions. For example, the single
URI 'http://%77w%77.example.com/' will get turned into: (
'http://%77w%77.example.com/', 'http://www.example.com/'
)
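The raw and "slightly cooked" pair in the example above can be reproduced with plain percent-decoding. This is only a sketch: the helper decode_percent is hypothetical and handles %XX escapes alone, whereas SpamAssassin's own canonicalization does more than this.

```perl
use strict;
use warnings;

# Hypothetical helper: decode %XX escapes and nothing else.
sub decode_percent {
    my ($uri) = @_;
    (my $cooked = $uri) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $cooked;
}

my $raw    = 'http://%77w%77.example.com/';
my @result = ($raw);                         # the raw form is always kept

my $cooked = decode_percent($raw);
push @result, $cooked if $cooked ne $raw;    # add the cooked form if it differs

print join("\n", @result), "\n";
```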
- $status->get_uri_detail_list ()
- Returns a hash reference of all unique URIs found in the message and
various data about where the URIs were found in the message. It takes a
combination of the URIs found in the rendered (decoded and HTML stripped)
body and the URIs found when parsing the HTML in the message. Will also
set $status->{uri_detail_list} (the hash
reference as returned by this function).
The hash format looks something like this:
  raw_uri => {
    types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
               unlinked => 1, schemeless => 1 },
    cleaned => [ canonicalized_uri ],
    anchor_text => [ "click here", "no click here" ],
    domains => { domain1 => 1, domain2 => 1 },
    hosts => { host1 => domain1, host2 => domain2 },
  }
"raw_uri" is whatever the
URI was in the message itself (http://spamassassin.apache%2Eorg/). Uris
parsed from text will be prefixed with scheme if missing (http://,
mailto: etc). HTML uris are as found.
"types" is a hash of the
HTML tags (lowercase) which referenced the raw_uri. parsed is a
faked type which specifies that the raw_uri was seen in the rendered
text. domainkeys is defined when raw_uri was found from DK/DKIM
d= field. unlinked is defined when it's assumed that MUA will not
linkify uri (found in body without scheme or www. prefix).
schemeless is always added for uris without scheme, regardless of
linkifying (i.e. email address found in body without mailto:).
"cleaned" is an array of the
raw and canonicalized version of the raw_uri
(http://spamassassin.apache%2Eorg/,
https://spamassassin.apache.org/).
"anchor_text" is an array of
the anchor text (text between <a> and </a>), if any, which
linked to the URI.
"domains" is a hash of the
domains found in the canonicalized URIs.
"hosts" is a hash of
unstripped hostnames found in the canonicalized URIs as hash keys, with
their domain part stored as a value of each hash entry.
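A sketch of walking a structure of this shape follows. The sample data is invented and only imitates the documented layout; in real code the hash reference would come from $status->get_uri_detail_list().

```perl
use strict;
use warnings;

# Sample data in the documented shape. All values are invented.
my $detail = {
    'http://www.example%2Ecom/' => {
        types       => { a => 1, parsed => 1 },
        cleaned     => [ 'http://www.example%2Ecom/', 'http://www.example.com/' ],
        anchor_text => [ 'click here' ],
        domains     => { 'example.com' => 1 },
        hosts       => { 'www.example.com' => 'example.com' },
    },
    'http://img.example.net/x.png' => {
        types   => { img => 1 },
        cleaned => [ 'http://img.example.net/x.png' ],
        domains => { 'example.net' => 1 },
        hosts   => { 'img.example.net' => 'example.net' },
    },
};

# Collect every unique domain, and the raw URIs referenced from <a> tags.
my (%domains, @anchor_uris);
for my $raw_uri (sort keys %$detail) {
    my $info = $detail->{$raw_uri};
    $domains{$_} = 1 for keys %{ $info->{domains} || {} };
    push @anchor_uris, $raw_uri if $info->{types}{a};
}

my @domain_list = sort keys %domains;
print "domains: @domain_list\n";
print "linked via <a>: @anchor_uris\n";
```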
- $status->add_uri_detail_list ($raw_uri, $types, $source,
$valid_domain)
- Adds values to internal uri_detail_list. When used from Plugins,
recommended to call from parsed_metadata (along with
register_method_priority, -10) so other Plugins calling
get_uri_detail_list() will see it.
"raw_uri" is the URI to be
added. The only required parameter.
"types" is an optional hash
reference, contents are added to uri_detail_list->{types} (see
get_uri_detail_list for known keys). parsed is the default if no hash is
given. nocanon does not run uri_list_canonicalize (no redirector,
uri fixing). noclean skips adding uri_detail_list->{cleaned},
so it would not be used in "uri" rule checks, but domain/hosts
would still be used for URIBL/RBL purposes.
"source" is an optional
simple string, only used for debug logging purposes to identify where
uri originates from (default: "parsed").
"valid_domain" is an
optional boolean (0/1). If true, uri will not be added unless
hostname/domain is in valid format and contains a valid TLD. (default:
0)
- $status->clear_test_state()
- DEPRECATED, UNNEEDED SINCE 4.0
- $status->got_hit ($rulename, $desc_prepend [, name => value,
...])
- Register a hit against a rule in the ruleset.
There are two mandatory arguments. These are
$rulename, the name of the rule that fired, and
$desc_prepend, which is a short string that will
be prepended to the rule's "describe"
string in output reports.
In addition, callers can supplement that with the following
optional data:
- score => $num
- Optional: the score to use for the rule hit. If unspecified, the value
from the "Mail::SpamAssassin::Conf"
object's "{scores}" hash will be used (a
configured score), and in its absence the
"defscore" option value.
- defscore => $num
- Optional: the score to use for the rule hit if neither the option
"score" is provided, nor a configured
score value is provided.
- value => $num
- Optional: the value to assign to the rule; the default value is
1. tflags multiple rules use values of
greater than 1 to indicate multiple hits. This value is accessible to meta
rules.
- ruletype => $type
- Optional, but recommended: the rule type string. This is used in the
"hit_rule" plugin call, called by this
method. If unset, 'unknown' is used.
- tflags => $string
- Optional: a string, i.e. a space-separated list of additional tflags to be
appended to an existing list of flags in
$self->{conf}->{tflags}, such as: "nice
noautolearn multiple". No syntax checks are performed.
- description => $string
- Optional: a custom rule description string. This is used in the
"hit_rule" plugin call, called by this
method. If unset, the static description is used.
Backward compatibility: the two mandatory arguments have been part
of this API since SpamAssassin 2.x. The optional name => value
pairs, however, are a new addition in SpamAssassin 3.2.0.
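The fallback order described for the score and defscore options can be sketched as follows. The helper resolve_score and the rule names are hypothetical, %conf_scores merely stands in for the Conf object's {scores} hash, and what got_hit does when no score is available at all is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the documented precedence:
# the explicit 'score' option, then the configured score, then 'defscore'.
sub resolve_score {
    my ($rulename, $conf_scores, %opts) = @_;
    return $opts{score}              if defined $opts{score};
    return $conf_scores->{$rulename} if defined $conf_scores->{$rulename};
    return $opts{defscore};          # undef if nothing was supplied
}

# Stand-in for the Mail::SpamAssassin::Conf {scores} hash.
my %conf_scores = ( EXAMPLE_CONFIGURED => 2.5 );

my $s1 = resolve_score('EXAMPLE_CONFIGURED', \%conf_scores, score => 4, defscore => 1);
my $s2 = resolve_score('EXAMPLE_CONFIGURED', \%conf_scores, defscore => 1);
my $s3 = resolve_score('EXAMPLE_UNSCORED',   \%conf_scores, defscore => 1);

print "$s1 $s2 $s3\n";
```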
- $status->rule_pending ($rulename)
- Register a pending rule. Must be called from a rule's eval function if the
result can arrive later than when exiting the function (async lookups).
$status->rule_ready($rulename) or
$status->got_hit(...) must be called when the
result has arrived. If these are not used, it can break dependent meta
rule evaluation.
- $status->rule_ready ($rulename)
- Mark a previously marked
$status->rule_pending() rule ready.
Alternatively $status->got_hit() will
also mark the rule ready. If these are not used, it can break dependent meta
rule evaluation.
- $status->test_log ($text [, $rulename])
- Add $text log entry for a hit rule in final
message REPORT/SUMMARY.
Usually called just before got_hit(), to describe, for
example, what URI the rule matched on. The optional $rulename
argument is recommended to make sure the log is written to the correct rule. If
rulename is not provided, get_current_eval_rule_name() is used as a
fallback.
Can be called multiple times per rule for additional
entries.
- $status->create_fulltext_tmpfile (fulltext_ref)
- This function creates a temporary file containing the passed scalar
reference data. If no scalar is passed, full/pristine message text is
assumed. This is typically used by external programs like pyzor and
dccproc, to avoid hangs due to buffering issues.
All tempfiles are automatically cleaned up by PerMsgStatus
destructor.
- $status->delete_fulltext_tmpfile (tmpfile)
- Will cleanup after a
$status->create_fulltext_tmpfile() call.
Deletes the temporary file and uncaches the filename. Generally there is no
need to call this; the PerMsgStatus destructor cleans up all tmpfiles.
- all_from_addrs_domains
- This function returns all the various from addresses in a message using
all_from_addrs() and then returns only the domain names.
Mail::SpamAssassin(3) spamassassin(1)