
Although naive Bayesian filters did not become popular until later, multiple programs were released in 1998 to address the growing problem of unwanted email. The first scholarly publication on Bayesian spam filtering was by Sahami et al. That work was soon thereafter deployed in commercial spam filters.

Variants of the basic technique have been implemented in a number of research works and commercial software products. Many modern mail clients implement Bayesian spam filtering, and users can also install separate email filtering programs. Server-side email filters, such as DSPAM, SpamAssassin, SpamBayes, Bogofilter and ASSP, make use of Bayesian spam filtering techniques, and the functionality is sometimes embedded within mail server software itself. CRM114, often cited as a Bayesian filter, is not intended to use a Bayes filter in production, but includes the "unigram" feature for reference.

Particular words have particular probabilities of occurring in spam email and in legitimate email. For instance, most email users will frequently encounter the word "Viagra" in spam email, but will seldom see it in other email. The filter doesn't know these probabilities in advance, and must first be trained so it can build them up. To train the filter, the user must manually indicate whether a new email is spam or not. For each word in a training email, the filter adjusts the probabilities, stored in its database, that the word will appear in spam or in legitimate email. As a result, Bayesian spam filters will typically have learned a very high spam probability for the words "Viagra" and "refinance", but a very low spam probability for words seen only in legitimate email, such as the names of friends and family members.

After training, the word probabilities (also known as likelihood functions) are used to compute the probability that an email with a particular set of words in it belongs to either category. Each word in the email, or only the most interesting words, contributes to the email's spam probability. This contribution is called the posterior probability and is computed using Bayes' theorem.
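The training and scoring process described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the implementation used by any of the filters named here. The class `WordFilter` and its method names are hypothetical, emails are split into words on whitespace, both categories are given equal prior probability, add-one smoothing is applied to the word counts, and words never seen in training are ignored. Real filters differ on all of these points, and many select only the most interesting words before combining.

```python
import math
from collections import Counter


class WordFilter:
    """Toy per-word Bayesian spam scorer (illustrative sketch only)."""

    def __init__(self):
        self.spam_counts = Counter()  # spam emails containing each word
        self.ham_counts = Counter()   # legitimate emails containing each word
        self.n_spam = 0               # spam emails seen in training
        self.n_ham = 0                # legitimate emails seen in training

    def train(self, text, is_spam):
        """Record one email that the user has labelled as spam or not."""
        words = set(text.lower().split())
        if is_spam:
            self.spam_counts.update(words)
            self.n_spam += 1
        else:
            self.ham_counts.update(words)
            self.n_ham += 1

    def word_spam_probability(self, word):
        """Pr(spam | word) from Bayes' theorem, assuming equal priors."""
        # Add-one smoothing (an assumption) keeps the estimates off 0 and 1.
        p_word_given_spam = (self.spam_counts[word] + 1) / (self.n_spam + 2)
        p_word_given_ham = (self.ham_counts[word] + 1) / (self.n_ham + 2)
        return p_word_given_spam / (p_word_given_spam + p_word_given_ham)

    def spam_probability(self, text):
        """Combine the per-word probabilities, treating words as independent."""
        log_odds = 0.0
        for word in set(text.lower().split()):
            # Assumption: words never seen in training carry no evidence.
            if word not in self.spam_counts and word not in self.ham_counts:
                continue
            p = self.word_spam_probability(word)
            # Summing log-odds equals multiplying odds, without underflow.
            log_odds += math.log(p) - math.log(1.0 - p)
        # Numerically stable conversion from log-odds back to a probability.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1.0 + e)


# Usage: train on four hand-labelled emails, then score two new ones.
f = WordFilter()
f.train("cheap viagra refinance now", is_spam=True)
f.train("refinance your mortgage viagra", is_spam=True)
f.train("lunch with alice tomorrow", is_spam=False)
f.train("alice sent the report", is_spam=False)
print(f.spam_probability("viagra refinance offer"))  # well above 0.5
print(f.spam_probability("report from alice"))       # well below 0.5
```

With this tiny training set, words seen only in spam push the score toward 1, while a name seen only in legitimate email pushes it toward 0. An email made up entirely of unseen words scores exactly 0.5, because no word contributes evidence either way.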
