Introduction
1- The Cranfield collection is a standard IR text collection (included in this directory), consisting of 1400 documents from the aerodynamics field. Write a program that preprocesses the collection. Determine the frequency of occurrence of every word in the collection. Integrate the Porter stemmer and a stopword eliminator into your code.
2- For weighting, use the TF/IDF weighting scheme. For each of the ten queries provided on the class webpage, determine a ranked list of documents, in descending order of their similarity to the query.
3- Implement an efficient and effective spam filter (a text classifier).
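
The preprocessing step in task 1 could be sketched roughly as follows. This is a minimal illustration, not the code shipped in this directory: the names `STOPWORDS`, `preprocess` and `word_frequencies` are hypothetical, the stopword list is a tiny placeholder, and the stemmer is left as a pluggable function (the default does nothing) so that a real Porter stemmer can be passed in.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real run would load a fuller list).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def preprocess(text, stem=lambda w: w, stopwords=STOPWORDS):
    """Lowercase, tokenize on runs of letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in stopwords]


def word_frequencies(documents, stem=lambda w: w):
    """Count how often each (stemmed) term occurs across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(preprocess(text, stem=stem))
    return counts


# Two made-up sentences standing in for Cranfield documents.
docs = ["The flow of air over the wing.",
        "Boundary layer flow in a wing section."]
freq = word_frequencies(docs)
```

If NLTK is installed, passing `stem=nltk.stem.PorterStemmer().stem` is one way to plug in a Porter stemmer; `freq.most_common()` then lists terms by descending frequency.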
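
For task 2, one possible shape of the ranking step is shown below. It assumes documents and queries are already token lists, uses raw term frequency times log(N/df) as the weight (only one of several common TF/IDF variants), and uses cosine similarity as the similarity measure, which the task statement does not prescribe. The function names `tfidf_vectors`, `cosine` and `rank` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(tokenized_docs):
    """Return one {term: weight} dict per document, plus the idf table."""
    n = len(tokenized_docs)
    df = Counter()
    for tokens in tokenized_docs:
        df.update(set(tokens))  # count each term once per document
    idf = {t: math.log(n / d) for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in tokenized_docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query_tokens, tokenized_docs):
    """Return (doc_index, score) pairs, highest similarity first."""
    vectors, idf = tfidf_vectors(tokenized_docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    scores = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[1], s[0]))


# Made-up token lists standing in for preprocessed documents.
docs = [["boundary", "layer", "flow"], ["wing", "flow"], ["heat", "transfer"]]
ranking = rank(["boundary", "layer"], docs)
```

Ties are broken by document index so that the output order is deterministic.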
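
Task 3 does not say which classifier to use. As one illustration, the sketch below uses multinomial Naive Bayes with add-one (Laplace) smoothing, a common baseline for spam filtering; this is an assumed choice, not necessarily the approach taken in this code. The class name `NaiveBayesSpamFilter` and the labels `"spam"` and `"ham"` are hypothetical, and the sketch assumes each class has at least one training message.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over token lists, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, tokens, label):
        """Add one labelled message (a list of tokens) to the model."""
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1

    def classify(self, tokens):
        """Return the label with the higher log-posterior score."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in ("spam", "ham"):
            # Log prior; assumes this class was seen at least once in training.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for t in tokens:
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages, already tokenized.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train(["win", "money", "now"], "spam")
spam_filter.train(["free", "money"], "spam")
spam_filter.train(["meeting", "at", "noon"], "ham")
spam_filter.train(["project", "report"], "ham")
```

Working in log space avoids floating-point underflow on long messages, and both training and classification are linear in the number of tokens.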