Information Retrieval (IR) is the discipline that deals with retrieval of unstructured
data, especially textual documents, in response to a query or topic statement, which
may itself be unstructured, e.g., a sentence or even another document, or which may
be structured, e.g., a Boolean expression. The need for effective methods of automated
IR has grown in importance because of the tremendous explosion in the
amount of unstructured data, both in internal corporate document collections and in the
immense and growing number of document sources on the Internet. This report is a
tutorial and survey of the state of the art, both research and commercial, in this
dynamic field. The topics covered include: formulation of structured and unstructured
queries and topic statements, indexing (including term weighting) of document
collections, methods for computing the similarity of queries and documents,
classification and routing of documents in an incoming stream to users on the basis
of topic or need statements, clustering of document collections on the basis of language
or topic, and statistical, probabilistic, and semantic methods of analyzing and
retrieving documents.
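
To make two of the listed topics concrete (term weighting during indexing, and computing the similarity of a query to each document), the following is a minimal sketch and not a method taken from the report itself. It assumes whitespace tokenization, a smoothed TF-IDF weight, and cosine similarity; the function names `tokenize`, `build_index`, `cosine`, and `rank` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    return text.lower().split()


def build_index(docs):
    """Return (idf, vectors): per-term IDF and one TF-IDF vector per document."""
    n = len(docs)
    term_counts = [Counter(tokenize(d)) for d in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for counts in term_counts:
        df.update(counts.keys())

    # Smoothed IDF (an illustrative choice): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}

    # Term weight = raw term frequency * IDF.
    vectors = [{t: tf * idf[t] for t, tf in counts.items()}
               for counts in term_counts]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    idf, vectors = build_index(docs)
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))
```

Calling `rank("document clustering", docs)` on a small list of strings returns the documents ordered by similarity to the query. A production system would normally replace the per-query scan with an inverted index, and the weighting formula here is only one of many variants.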
 
Date : 2009-01-05