Using TF-IDF to Determine Word Relevance in Document Queries
In this paper, we examine the results of applying
Term Frequency-Inverse Document Frequency
(TF-IDF) to determine which words in a corpus of
documents might be more favorable to use in a
query. As the name implies, TF-IDF assigns each
word in a document a value that increases with
the frequency of the word in that document and
decreases with the fraction of documents in the
corpus that contain the word. Words with
high TF-IDF numbers imply a strong
relationship with the document they appear in,
suggesting that if that word were to appear in a
query, the document could be of interest to the
user. We provide evidence that this simple
algorithm efficiently categorizes relevant words
that can enhance query retrieval.
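
The weighting described above can be sketched in a few lines of Python. This is a minimal illustration and not necessarily the exact formulation used in the paper: it assumes the common variant in which a word's score is its raw count in the document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {word: score} dict per document in `corpus`.

    `corpus` is a list of documents, each a list of tokens.
    Assumed weighting: raw term count * log(N / document frequency).
    """
    n_docs = len(corpus)

    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        scores.append({
            word: count * math.log(n_docs / doc_freq[word])
            for word, count in term_freq.items()
        })
    return scores


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the stock market fell".split(),
]
scores = tf_idf(corpus)
```

In this toy corpus, "the" occurs in every document and therefore scores zero, "cat" occurs in two of the three documents and scores low, and "mat" occurs only in the first document and scores highest there, which is the behavior the abstract describes.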