NAME

Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender reputation records

SYNOPSIS

The TxRep (Reputation) plugin is designed as an improved replacement for the AWL (Auto-Whitelist) plugin. It adjusts the final spam score of a message by looking up the reputation of the sender and taking it into consideration.

To try TxRep out, you first have to disable the AWL plugin (if enabled) and back up its database. AWL is loaded in v310.pre and can be disabled by commenting out the loadplugin line:

    # loadplugin Mail::SpamAssassin::Plugin::AWL

When AWL is not disabled, TxRep will refuse to run.

TxRep should be enabled by uncommenting the following line in v341.pre:

    loadplugin Mail::SpamAssassin::Plugin::TxRep

Use the supplied 60_txreputation.cf file or add these lines to a .cf file:

    header   TXREP eval:check_senders_reputation()
    describe TXREP Score normalizing based on sender's reputation
    tflags   TXREP userconf noautolearn
    priority TXREP 1000

DESCRIPTION

This plugin is intended to replace the former AWL (AutoWhiteList). Although the concept and the scope differ, the purpose remains the same: normalizing spam score results based on the sender's previous history. The name was intentionally changed from "whitelist" to "reputation" to avoid any confusion, since the resulting score can be adjusted in both directions.

The TxRep plugin keeps track of the average SpamAssassin score for senders. Senders are tracked using multiple identifiers, or their combinations: the From: email address, the originating IP and/or an originating block of IPs, the sender's domain name, the DKIM signature, and the HELO name. TxRep then uses the average score to reduce the variability in scoring from message to message, and modifies the final score by pushing the result towards the historical average. This improves the accuracy of filtering for most email.

In comparison with the original AWL plugin, several conceptual changes were implemented in TxRep:

1. Scoring - AWL tracks the number of messages received from each sender, but it does not take that count into account in any way when calculating the corrective score for a new message. For example, if a sender previously sent a single ham message with a score of -5 and then sends a second one with a score of +10, AWL will issue a corrective score that pulls the result towards -5. With the default "auto_whitelist_factor" of 0.5, the resulting score would be only 2.5. It would be exactly the same even if the sender had previously sent 1,000 messages with an average of -5. TxRep tries to take maximal advantage of the collected data, and adjusts the final score not only by the mean reputation score stored in the database, but also according to the number of messages already seen from the sender. You can see the exact formula in the section "txrep_factor".

2. Learning - AWL ignores any spam/ham learning. In fact it acts against it, which often leads to a frustrating situation: a user repeatedly tags all messages of a given sender as spam (resp. ham), but for every new message from that sender, AWL adjusts the score back towards the historical average, which does not include the learned scores. TxRep changes this. Every spam/ham learning is recorded in the reputation database, and hence taken into consideration for future email from the respective sender. See the section "LEARNING SPAM / HAM" for more details.

3. Auto-Learning - in certain situations SpamAssassin may declare a message an obvious spam resp. ham, and launch the auto-learning process, so that the message can be re-evaluated. AWL, by design, did not perform any auto-learning adjustments. This plugin will readjust the stored reputation by the value defined by "txrep_learn_penalty" resp. "txrep_learn_bonus". Auto-learning score thresholds may be tuned, or auto-learning disabled completely, through the setting "txrep_autolearn".

4. Relearning - messages that were wrongly learned or auto-learned can be relearned. Old reputations are removed from the database, and new ones are added in their place. Relearning works better when message tracking is enabled through the "txrep_track_messages" option. Without it, the relearned score is simply added to the reputation, without removing the old ones.

5. Aging - with AWL, every historical record of a given sender has the same weight. This means that changes in a sender's behavior, or modified SA rules, may take a long time to show an effect, or may be virtually negated by the AWL normalization, especially for senders with a high count of past messages and a low recent frequency. It is particularly counterproductive when the administrator detects new patterns in certain messages and applies new rules to better tag such messages as spam or ham. AWL will practically eliminate the effect of the new rules by adjusting the score back towards the (wrong) historical average. Only setting the "auto_whitelist_factor" lower would help, but at the same time it would also reduce the overall impact of AWL, and put its purpose in doubt. Besides "txrep_factor" (the replacement for "auto_whitelist_factor"), TxRep also introduces "txrep_dilution_factor" to help cope with this issue by progressively reducing the impact of past records. More details can be found in the description of the factor below.

6. Blacklisting and Whitelisting - when whitelisting or blacklisting is requested through SpamAssassin's API, AWL adjusts the historical total score of the plain email address without IP (and deletes the records bound to an IP), but since new records with IP are added during reception, the blacklisted entry would cease to act during scanning. TxRep always uses the record of the plain email address without IP together with the one bound to an IP address, DKIM signature, or SPF pass (unless the weight factor for the EMAIL reputation is set to zero).
AWL uses the score of 100 (resp. -100) for blacklisting (resp. whitelisting) purposes. TxRep increases the value proportionally to the weight factor of the EMAIL reputation. This is explained in detail in the section "BLACKLISTING / WHITELISTING". TxRep can also blacklist or whitelist IP addresses, domain names, and dotless HELO names.

7. Sender Identification - AWL identifies a sender on the basis of the email address used and the originating IP address (more precisely, the part of it defined by the mask setting). The main purpose of this measure is to avoid assigning false good scores to spammers who spoof known email addresses. The disadvantage shows with senders who send from frequently changing locations, or who connect through dynamic IP addresses that are not within the block defined by the mask setting. Their score is difficult or sometimes impossible to track. Another disadvantage shows, for example, with a spammer persistently sending spam from the same IP address, just under different email addresses. AWL will not find his previous scores unless he reuses the same email address. TxRep uses several identifiers, and creates separate database entries for each of them. It tracks not only the email/IP address combination like AWL, but also the standalone email address (regardless of the originating IP), the standalone IP (regardless of the email address used), the domain name of the email address, the DKIM signature, and the HELO name of the connecting PC. The influence of each individual identifier may be tuned with the help of the weight factors described in the section "REPUTATION WEIGHTS".

8. Message Tracking - TxRep (optionally) keeps track of the message IDs of already scanned and/or learned messages. This prevents the reputation score from being strengthened simply by rescanning or relearning the same message multiple times.
At the same time, it also allows proper relearning of messages that were once wrongly learned, or relearning them after the learn penalty or bonus has been changed. See the option "txrep_track_messages".

9. User and Global Storages - it is usually recommended to use the per-user setup of SpamAssassin, because each user may have quite different requirements, and may receive quite different sorts of email. Especially when using the Bayesian and AWL plugins, the efficiency is much better when SpamAssassin learns spam and ham separately for each user. The disadvantage, however, is that senders and emails already learned many times by different users have to be relearned without any recognized history whenever they arrive at another user. TxRep uses the advantages of both systems. It can use dual storages: the global common storage, where all email processed by SpamAssassin is recorded, and a local storage separate for each user, with reputation data from his email only. See more details at the setting "txrep_user2global_ratio".

10. Outbound Whitelisting - when a local user sends messages to an email address, we assume that he needs to see the eventual answer too, hence the recipient's address should be whitelisted. When SpamAssassin is also used for scanning outgoing email, when local users send email through the SMTP server where SA is installed, and when internal networks are defined, TxRep will improve the reputation of all 'To:' and 'CC' addresses from messages originating in the internal networks. Details can be found at the setting "txrep_whitelist_out".

The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled to allow TxRep to run. TxRep reuses the database handling of the original AWL module, and some of its parameters bound to the database handler modules. By default, TxRep creates its own database, but the original auto-whitelist can be reused as a starting point.
The AWL database can be renamed to the name defined in the TxRep settings, and TxRep will start using it. The original auto-whitelist database has to be backed up, to allow switching back to the original state.

The spamassassin/Plugin/TxRep.pm file replaces both spamassassin/Plugin/AWL.pm and spamassassin/AutoWhitelist.pm. Two other AWL files, spamassassin/DBBasedAddrList.pm and spamassassin/SQLBasedAddrList.pm, are still needed.

TEMPLATE TAGS

This plugin module adds the following "tags" that can be used as placeholders in certain options. See Mail::SpamAssassin::Conf for more information on TEMPLATE TAGS.

    _TXREPXXXY_             TXREP modifier
    _TXREPXXXYMEAN_         Mean score on which TXREP modification is based
    _TXREPXXXYCOUNT_        Number of messages on which TXREP modification is based
    _TXREPXXXYPRESCORE_     Score before TXREP
    _TXREPXXXYUNKNOWN_      New sender (not found in the TXREP list)

The XXX part of the tag takes the form of one of the following IDs, depending on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or HELO. The Y suffix is used only in the case of dual storage, and takes the form of either U (for user storage reputations) or G (for global storage reputations).

USER PREFERENCES

The following options can be used in both site-wide ("local.cf") and user-specific ("user_prefs") configuration files to customize how SpamAssassin handles incoming email messages.
REPUTATION WEIGHTS

The overall reputation of the sender comprises several elements: the EMAIL, EMAIL_IP, DOMAIN, IP, and HELO reputations.

Each of these partial reputations is weighted by its own weight parameter, and the overall reputation is calculated as the weighted sum of the individual reputations, divided by the sum of all their weights:

    sender_reputation = weight_email    * rep_email    +
                        weight_email_ip * rep_email_ip +
                        weight_domain   * rep_domain   +
                        weight_ip       * rep_ip       +
                        weight_helo     * rep_helo

You can disable individual partial reputations by setting their respective weight to zero. This will also reduce the size of the database, since each partial reputation requires a separate entry in the database table. Disabling some of the partial reputations in this way may also help performance on busy servers, because the respective database lookups and processing will be skipped too.
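As an illustration of the calculation described above, the following Python sketch computes the weighted sum of the partial reputations and divides it by the sum of the enabled weights, as the text states. It is not taken from the plugin source: the function name sender_reputation, the weight values, and the partial reputation values are all assumptions made up for this example.

```python
def sender_reputation(reps, weights):
    """Weighted mean of the partial reputations.

    reps and weights are dicts keyed by the partial reputation name
    (email, email_ip, domain, ip, helo). Entries with a zero weight
    are skipped, mirroring a disabled partial reputation.
    """
    enabled = [name for name, w in weights.items() if w > 0]
    total_weight = sum(weights[name] for name in enabled)
    if total_weight == 0:
        return 0.0  # assumption: nothing enabled means no adjustment
    weighted_sum = sum(weights[name] * reps[name] for name in enabled)
    return weighted_sum / total_weight


# Illustrative values only, not the plugin's documented defaults.
weights = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 0.5}
reps = {"email": -2.0, "email_ip": -4.0, "domain": 1.0, "ip": 0.0, "helo": 0.0}

print(sender_reputation(reps, weights))

# Disabling everything except EMAIL_IP leaves only that reputation.
only_email_ip = {name: (10 if name == "email_ip" else 0) for name in weights}
print(sender_reputation(reps, only_email_ip))
```

With every weight except one set to zero, the result collapses to that single partial reputation, which is the simplified setup the text describes for reducing database size.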
ADMINISTRATOR SETTINGS

These settings differ from the ones above in that they are considered 'more privileged', even more than the ones in the PRIVILEGED SETTINGS section. No matter what "allow_user_rules" is set to, these can never be set from a user's "user_prefs" file.
BLACKLISTING / WHITELISTING

When asked by SpamAssassin to blacklist or whitelist a user, the TxRep plugin adds a score of 100 (for blacklisting) or -100 (for whitelisting) to the given sender's email address. For a plain address without any IP address, the value is multiplied by the ratio of the total reputation weight to the EMAIL reputation weight, to account for the reduced impact of the standalone EMAIL reputation when calculating the overall reputation.

    total_weight           = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
    blacklisted_reputation = 100 * total_weight / weight_email

When a standalone email address is blacklisted/whitelisted, all records of the email address bound to an IP address, DKIM signature, or SPF pass are removed from the database, and only the standalone record is kept.

Besides blacklisting/whitelisting of standalone email addresses, the same method may also be used for blacklisting/whitelisting of IP addresses, domain names, and HELO names (only dotless Netbios HELO names can be used).

When whitelisting/blacklisting an email address or domain name, you can bind it to a specified DKIM signature or SPF record by appending the DKIM signing domain or the tag 'spf' after the ID in the following way:

    spamassassin --add-addr-to-blacklist=spamming.biz,spf
    spamassassin --add-addr-to-whitelist=friend@good.org,good.org

When a message contains both a DKIM signature and an SPF pass, the DKIM signature takes priority, so the record bound to the 'spf' tag won't be checked. Only email addresses and domains can be bound to DKIM or SPF. Records of IP addresses and HELO names are always without DKIM/SPF.

In case of dual storage, the black/whitelisting is performed only in the default storage.

REPUTATION LOGICS

1. The most significant sender identifier is, just as with AWL, the combination of the email address and the originating IP address, or rather the part of it defined by the IPv4 or IPv6 mask setting.

2. No IP checking for standalone EMAIL address reputation

3. No signature checking for IP reputation, and for HELO name reputation

4. The EMAIL_IP weight, and not the standalone EMAIL weight, is used when no IP address is available (EMAIL_IP is the main indicator, and has the highest weight)

5. No IP checking for signed emails (the signature authenticates the email instead of the IP address)

6. No IP checking at SPF pass (we assume the domain owner is responsible for all IPs he authorizes to send from, hence we use the same identity for all of them)

7. No signature used for standalone EMAIL reputation (would be redundant, since no IP is used at signed EMAIL_IP reputation, and we would store two identical hits)

8. When available, the DKIM signer is used instead of the domain name for the DOMAIN reputation

9. No IP and no signature used for HELO reputation (despite the possible existence of multiple computers with the same HELO)

10. The full (unmasked) IP address is used (in the address field, instead of the IP field) for the standalone IP reputation

LEARNING SPAM / HAM

When SpamAssassin is told to learn (or relearn) a given message as spam or ham, all reputations relevant to the message (email, email_ip, domain, ip, helo) in both global and user storages will be updated using the "txrep_learn_penalty" or the "txrep_learn_bonus" value, respectively. The new reputation of a given sender property (email, domain,...) will be the result of one of the following formulas:

    new_reputation = old_reputation + learn_penalty
    new_reputation = old_reputation - learn_bonus

Unless message tracking is enabled (see "txrep_track_messages"), the TxRep plugin does not track each message individually, hence it does not detect when you learn the same message repeatedly. It will add/subtract the penalty/bonus score each time the message is fed to the spam learner.
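The two learning formulas, and the effect of feeding the same message to the learner more than once, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the penalty and bonus values, and the message ID are invented for the example, and the tracked variant is a simplification that merely skips repeated message IDs, whereas the description above says that a real relearn with tracking replaces the old reputation.

```python
def learn(reputation, as_spam, penalty, bonus):
    """Apply one learning step to a single stored reputation value."""
    if as_spam:
        return reputation + penalty
    return reputation - bonus


def learn_tracked(reputation, msg_id, as_spam, penalty, bonus, seen):
    """Same update, but skipped when this message ID was already learned.

    'seen' is a hypothetical set of message IDs standing in for the
    per-message records kept when message tracking is enabled.
    """
    if msg_id in seen:
        return reputation
    seen.add(msg_id)
    return learn(reputation, as_spam, penalty, bonus)


PENALTY, BONUS = 20.0, 20.0  # illustrative values only

# Without tracking: three learns of the same spam add the penalty three times.
untracked = 1.5
for _ in range(3):
    untracked = learn(untracked, True, PENALTY, BONUS)

# With tracking: only the first learn of a given message ID counts.
seen_ids = set()
tracked = 1.5
for _ in range(3):
    tracked = learn_tracked(tracked, "<msg-1@example.org>", True, PENALTY, BONUS, seen_ids)

print(untracked, tracked)
```

The difference between the two results is the accumulated penalty from the repeated learns, which is the drift that message tracking is meant to avoid.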
OPTIMIZING TXREP

TxRep can be optimized for speed and simplicity, or for precision in assigning the reputation scores.

First of all, TxRep can be quickly disabled and re-enabled through the option "use_txrep". This can be done globally, or individually in each respective "user_prefs". Disabling TxRep will not destroy the database, so it can be re-enabled at any later time.

On many systems, SQL-based storage may perform faster than the default Berkeley DB storage, so you should consider setting it up.

Then there are multiple settings that can reduce the number of records stored in the database, hence reducing the size of the storage and also the processing time:

1. Setting "txrep_user2global_ratio" to zero will disable the dual storage, thus halving the disk space requirements and the processing time of this plugin.

2. You can disable all but one of the "REPUTATION WEIGHTS". EMAIL_IP is the most specific option, so it is the most likely choice in such a case, but you could base the reputation system on any of the remaining scores. Each of the enabled reputations adds a new entry to the database for each new identifier. So while, for example, the number of recorded and scored domains may be big, the number of stored IP addresses will probably be higher, and would require more space in the storage.

3. Disabling "txrep_track_messages" avoids storing a separate entry for every scanned message, hence also reducing the disk space requirements and the processing time.

4. Disabling the option "txrep_autolearn" will save processing time for messages that trigger the auto-learning process.

5. Disabling "txrep_whitelist_out" will reduce the processing time for outbound connections.

6. Keeping the option "auto_whitelist_distinguish_signed" enabled may help to slightly reduce the size of the database, because for signed messages the originating IP address is ignored, hence no additional database entries are needed for each separate IP address (resp. a masked block of IP addresses).

Since TxRep reuses the storage architecture of the former AWL plugin, the same instructions for initializing the SQL storage also apply to TxRep. Although the old AWL table can be reused for TxRep, by default TxRep expects the SQL table to be named "txrep".

To install a new SQL table for TxRep, run the appropriate SQL file for your system under the /sql directory.

If you get a syntax error with an older version of MySQL, use TYPE=MyISAM instead of ENGINE=MyISAM at the end of the command. You can also use other types of ENGINE (depending on what is available on your system). For example, the MEMORY engine stores the entire table in the server memory, achieving performance similar to Redis. You would need to take care of replicating the RAM table to disk through a cronjob, to avoid loss of data at reboot. The InnoDB engine is used by default, offering high scalability (database size and concurrency of accesses). In conjunction with a high value of innodb_buffer_pool_size or with the memcached plugin (MySQL v5.6+), it can also offer performance comparable to Redis.