In the bag-of-words model, a text (such as a sentence or a document) is represented as the bag (multiset) of its words, disregarding grammar and word order but keeping multiplicity. A bag can be stored as a mapping in which each key is a word and each value is the number of times that word occurs in the text.
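As a rough illustration of that key/value representation, here is a minimal Python sketch. The function name `bag_of_words` and the simple regular-expression tokenizer are assumptions made for this example, not part of the model's definition; real systems usually tokenize more carefully.

```python
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Return a multiset mapping each lowercased word to its count."""
    # Assumed tokenizer: runs of letters, digits, and apostrophes.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


bag = bag_of_words("John likes to watch movies. Mary likes movies too.")
print(bag["likes"], bag["movies"], bag["john"])  # 2 2 1

# Word order is discarded: reordering the words gives the same bag.
same = bag_of_words("movies likes John") == bag_of_words("John likes movies")
print(same)  # True
```

`Counter` is a convenient fit here because it returns 0 for words that never occurred, so lookups need no special handling.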
Example usage: spam filtering
In Bayesian spam filtering, an e-mail message is modeled as an unordered collection of words drawn from one of two probability distributions: one representing spam and one representing legitimate e-mail. To classify a message, the filter treats it as a pile of words poured out at random from one of two bags, and uses Bayesian probability to determine which bag it more likely came from.
Source: https://en.wikipedia.org/wiki/Bag-of-words_model
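The two-bag idea can be sketched in a few lines of Python. This is one possible reading, not the method of any particular filter: it assumes a naive Bayes model (words treated as independent), add-one smoothing so unseen words do not zero out a score, and equal priors by default. The helper names and the tiny training messages are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Pour every word of every message into one bag (a Counter)."""
    bag = Counter()
    for message in messages:
        bag.update(tokenize(message))
    return bag


def log_likelihood(words, bag, vocab_size):
    """Log-probability of drawing these words from the bag (add-one smoothing)."""
    total = sum(bag.values())
    return sum(math.log((bag[w] + 1) / (total + vocab_size)) for w in words)


def classify(message, spam_bag, ham_bag, prior_spam=0.5):
    """Return "spam" or "ham", whichever bag more likely produced the message."""
    vocab_size = len(set(spam_bag) | set(ham_bag))
    words = tokenize(message)
    spam_score = math.log(prior_spam) + log_likelihood(words, spam_bag, vocab_size)
    ham_score = math.log(1 - prior_spam) + log_likelihood(words, ham_bag, vocab_size)
    return "spam" if spam_score > ham_score else "ham"


# Toy training data, invented for this example.
spam_bag = train(["win cash now", "cheap pills win big"])
ham_bag = train(["meeting at noon", "lunch at noon tomorrow"])

print(classify("win cheap cash", spam_bag, ham_bag))          # spam
print(classify("lunch meeting tomorrow", spam_bag, ham_bag))  # ham
```

Scores are summed in log space to avoid numeric underflow when a message contains many words.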