karma - a scoring engine
Karma is a heuristic scoring engine that uses connection metadata and other Haraka plugins as inputs. Connections scoring in excess of specified thresholds are rewarded or penalized in proportionate ways.
Haraka includes some excellent plugins that detect message or sender patterns that are indicative of spam. Some plugins have accuracy rates above 95%. Detecting messages incorrectly 5% of the time means that some portion of ham will get blocked and some portion of spam will get delivered. The traditional solution is to permit users to specify their filtering aggressiveness, and then suffer the consequences (blocked ham -vs- too much spam).
By scoring and weighting several "95+" plugins, accuracy rates above 99% are attainable. With a half dozen such plugins, 99.99% accuracy is attainable. With karma, good senders with good history can occasionally fail tests (false positives) and still deliver their mail. Senders with poor history will have a much harder time.
One challenge for mail filtering is that filters (or plugins, in Haraka's case) are usually called in sequence. A message passes from one filter to the next in a pipeline and each filter gets to choose "yeah or nay." In order for karma to transcend this limit, karma requires that every filter passes every message. Nearly all Haraka plugins have a
deny option for this reason.
In order to score a filters results, filters must save their results to the Result Store. Karma will see that and apply the award specified in
How Karma Works
Karma takes a holistic view of connections. During the connection, karma collects these results and applies the result_awards defined in
karma.ini. Once a connection/message exceeds the threshold.negative score (default: -8), karma rejects it at the next [deny]hook.
The scoring mechanism is not dissimilar to SpamAssassin, but Karma has some particular advantages:
* Runs entirely in Node, so it's very fast * Async and very scalable. * Builds sender and network reputation databases * Has access to connection properties (relaying, port, auth attempts, etc..) * Access to raw SMTP commands (data + formatting inspection) * Can reject connections before DATA (save lots of bandwidth)
Karma is not a replacement for content filters. Karma focuses on the quality of the connection. Content filters (bayes*) focus on the content of the message. Karma works best with content filters.
See config/karma.ini for options and inline documentation.
Karma allows the site administrator to control how much weight to assign to plugin results, providing a great deal of control over what results are worth rejecting for.
Karma begins scoring the connection when the first packet arrives. The IP reputation, sender OS, GeoIP location, DNSBL listing, and FCrDNS are often a sufficient basis for rejecting a connection without ever blocking a ham.
Karma performs checks early and often, maximizing the penality it can exact upon bad mailers.
Deny / Reject
When connections become worse than [thresholds]negative, they are denied during the next [deny]hook.
Karma history is computed as the number of good - bad connections.
When each connection ends, karma records the result. When a sufficient history has been built for an IP or ASN, future connections from that address(es) will get a positive or negative karma award.
The reward is purposefully small, to permit good senders in bad neighborhoods to still send.
Connection delays (aka tarpitting, teergrubing, early talker) slow down a SMTP conversation by inserting artificial delays. Karma increases connection delays adaptively as connection quality changes.
Karma's delay goals:
1. Don't delay valid senders 2. Penalize senders in proportion to their karma score 3. Dampen bruteforce AUTH attacks. 4. Since the only *cost* we can exact from miscreants is time, and connections are cheap to maintain, keep miscreants online as long as possible.
There are three tarpit options:
[tarpit] * delay=0 (seconds to delay, default: 0) * max=5 (max seconds to delay, default: 5)
When set to zero, the value of the delay is adaptive, calculated proportional to the karma score of the connection. Connections with good karma will see no delay and bad ones will see long delays.
When delay is non-zero, each SMTP response will be delayed by that many seconds.
In practice, most naughty senders abandon the connection when forced to
wait more than a handful of seconds.
max sets the maximum delay between
karma, do not use Haraka's
Connection data that karma considers:
- IP Reputation
- ASN reputation
- DENY events by other plugins
- envelope sender from a spammy TLD
- malformed envelope addresses
- unrecognized SMTP commands
- matching env from and env to name (rare in ham, frequent in spam)
The data from these tests are helpful but the real power of karma is scoring the results of other plugins. See karma.ini for a rich set of examples.
Karma records the number of good, bad, and total connections. The results are accessible to other plugins as well.
var karma = connection.results.get('karma');
The karma result object contains at least the following:
score: 0, <- score for this connection history: 0, <- score for all connections good: 0, <- qty of past 'good' connections bad: 0, <- qty of past 'bad' connections connections: 0, <- total past connections pass: , <- tests that added positive karma fail: , <- tests that added negative karma
If an IP has at least 5 connections and all are good or bad, and
all_bad result will be added to the
fail test list. This are
very good indicators of future connection quality and are scored in karma.ini.
[asn] enable=true (default: true) report_as
When [asn]enable is true, karma records the number of good and bad connections from each ASN.
ASNs with less than 5 karma points in either direction are ignored.
Store the ASN results as another plugin. Example: I set
report_as=asn, so that karma history for an ASN is reported with the ASN plugin data. A practical consequence of changing report_as is that the award location in karma.ini would need to change from:
NNN karma | pass | equals | asn_all_good | 2 NNN karma | fail | equals | asn_all_bad | -3
NNN asn | pass | equals | asn_all_good | 2 NNN asn | fail | equals | asn_all_bad | -3
Very old versions of Outlook Express and some malware senders don't bother complying with the RFC (5321, 2821, 821) address format. Karma checks the envelope from and to addresses for a common RFC ignorant pattern that is highly correlated with malware.
Certain bruteforce password hacking tools have a pre-programmed SMTP path that ignores SMTP responses. After EHLO, they attempt AUTH,LOGIN with a valid username. To bruteforce a password often requires millions of attempts so each bot sprays a couple dozen connections at the target server. Better quality MTAs like Haraka have built-in auth protection that inserts timeouts between successive auth attempts. The bots work around that by dropping the connection after each failure and reconnecting. The attempts are distributed so IP blocking is of limited effectiveness.
To combat these bruteforce attacks several strategies are called for:
1. Impose [connection delays](#delay) 2. Disable SMTP-AUTH when not encrypted. The bots rarely use STARTTLS. Besides preventing user passwords from transiting the internet in clear text, requiring TLS encryption also means AUTH is not available to poorly written bots. 3. Having done #2, bot AUTH attempts show up as unrecognized commands. Penalizing these with tarpitting and rate limiting will almost never harm a legit sender but it will make it take much much longer for attackers to bruteforce passwords.
Karma is most effective at filtering mail delivered by bots and rogue servers. Spam delivered by servers with good reputations normally pass karma's checks. Expect to use karma with content filters.