Description: tf-idf algorithm for retrieving Chinese documents. It ranks the words in multiple documents by weight, from smallest to largest. Words are taken from the lexicon supplied in the text. Platform: |
Size: 648192 |
Author:min |
Hits:
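The entry above does not show its code. A minimal Python sketch of the pipeline it describes rests on three assumptions:

- The lexicon is a plain set of words.
- Segmentation uses forward maximum matching. This is a swapped-in technique; the entry only says that words come from the lexicon.
- tf is a word's count divided by the number of lexicon words found in the document, and idf is the natural log of N / df.

The names `fmm_segment` and `rank_terms_ascending` are hypothetical.

```python
import math
from collections import Counter


def fmm_segment(text, lexicon, max_len):
    """Forward maximum matching: at each position, take the longest lexicon word.

    Characters that start no lexicon word are skipped (an assumption; the
    original tool may handle them differently).
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in lexicon:
                words.append(piece)
                i += size
                break
        else:
            i += 1  # no lexicon word starts here; skip one character
    return words


def rank_terms_ascending(docs, lexicon):
    """Return (weight, doc_id, word) tuples sorted from smallest to largest weight.

    Assumes a non-empty lexicon.
    """
    max_len = max(len(w) for w in lexicon)
    segmented = [fmm_segment(d, lexicon, max_len) for d in docs]
    n_docs = len(docs)
    # document frequency: in how many documents each word appears
    df = Counter()
    for words in segmented:
        df.update(set(words))
    weights = []
    for doc_id, words in enumerate(segmented):
        counts = Counter(words)
        for word, count in counts.items():
            tf = count / len(words)
            idf = math.log(n_docs / df[word])
            weights.append((tf * idf, doc_id, word))
    weights.sort()  # ascending by weight; ties broken by doc_id, then word
    return weights
```

Calling `rank_terms_ascending(docs, lexicon)` returns the least distinctive terms first. A word that occurs in every document gets weight 0 because its idf is log(1).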
Description: tf-idf algorithm for retrieving English documents. It ranks the words (English words) in multiple documents by weight, from smallest to largest. Platform: |
Size: 377856 |
Author:min |
Hits:
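For the English variant, the main difference is tokenization. A minimal Python sketch follows. It assumes that tokens are lowercase runs of ASCII letters with an optional apostrophe part, that tf is length-normalized, and that idf is the natural log of N / df. The entry specifies none of these choices, and the names `tokenize` and `tfidf_ascending` are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed token rule: lowercase ASCII words, keeping contractions like "don't"
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def tfidf_ascending(docs):
    """Return (weight, doc_id, word) tuples sorted from smallest to largest weight."""
    token_lists = [tokenize(d) for d in docs]
    n = len(token_lists)
    # document frequency of each word
    df = Counter(word for tokens in token_lists for word in set(tokens))
    rows = []
    for doc_id, tokens in enumerate(token_lists):
        for word, count in Counter(tokens).items():
            weight = (count / len(tokens)) * math.log(n / df[word])
            rows.append((weight, doc_id, word))
    return sorted(rows)
```

Words that appear in every document, such as "the" in most English collections, sort to the top with weight 0.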
Description: The tf–idf weight (term frequency–inverse document frequency) is a weight often used in information retrieval and text mining. This weight is a statistical measure used to evaluate how important a word is to a document in a collection or corpus. The importance increases proportionally to the number of times a word appears in the document but is offset by the frequency of the word in the corpus. Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. Platform: |
Size: 5120 |
Author:oplachko84 |
Hits:
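The entry above describes the weight only in words. One common variant can be written as follows. It uses raw counts normalized by document length for tf and unsmoothed idf, with N documents in the collection D. Other variants exist, and the entry does not say which one its code uses. The worked example assumes a base-10 logarithm.

```latex
\mathrm{tf}(t,d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}, \qquad
\mathrm{idf}(t,D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
\text{tf-idf}(t,d,D) = \mathrm{tf}(t,d) \cdot \mathrm{idf}(t,D)

% Example: t occurs 3 times in a 100-word document and appears in 10 of N = 1000 documents
\mathrm{tf} = \frac{3}{100} = 0.03, \qquad
\mathrm{idf} = \log_{10} \frac{1000}{10} = 2, \qquad
\text{tf-idf} = 0.03 \times 2 = 0.06
```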
Description: Information retrieval systems have evolved from the earliest purely manual systems to today's systems built on information technology. Throughout this evolution, two themes have remained constant. The first is adapting to new retrieval environments, meaning new information resources and technologies. The second is improving a system's recall, precision, and response time. Extracting the most useful information from large volumes of text has always been a central goal of information processing.

This work designs a text retrieval system around the vector space model. After introducing the model, it presents the general architecture of an information retrieval system built on it and the function of each component. It then discusses the key techniques involved. The system uses:

- the vector space model for feature representation;
- TF-IDF (Term-Frequency Inverse-Document-Frequency) to weight feature terms;
- an inverted index for indexing;
- the cosine of the angle between vectors as the similarity measure;
- recall and precision to evaluate retrieval performance.

Building on the vector space model and related theory, the work also explores Chinese information retrieval.

The vector space model must solve a series of problems, including generating and weighting feature terms and computing similarity (the retrieval operation). Vector retrieval uses a distance measure between vectors to reflect how well a document satisfies a query. The similarity value should therefore agree with actual relevance as closely as possible and be simple to compute. Platform: |
Size: 713728 |
Author:Peng Jin |
Hits:
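The thesis entry above lists the components of its vector space model system but shows no code. The following compact Python sketch shows how tf-idf weighting, an inverted index, cosine ranking, and precision/recall might fit together. It assumes the following:

- Documents are already segmented into token lists; Chinese segmentation is not shown.
- tf is length-normalized and idf is unsmoothed.
- The function names `build_index`, `cosine_search`, and `precision_recall` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """docs: list of token lists. Returns (inverted index, idf table, vector norms)."""
    n = len(docs)
    df = Counter(t for toks in docs for t in set(toks))
    idf = {t: math.log(n / c) for t, c in df.items()}
    index = defaultdict(list)  # term -> list of (doc_id, tf-idf weight) postings
    norms = [0.0] * n
    for doc_id, toks in enumerate(docs):
        for term, count in Counter(toks).items():
            w = (count / len(toks)) * idf[term]
            if w > 0:  # zero-weight terms contribute nothing to cosine scores
                index[term].append((doc_id, w))
                norms[doc_id] += w * w
    return index, idf, [math.sqrt(x) for x in norms]


def cosine_search(query_tokens, index, idf, norms):
    """Rank documents by cosine similarity to the query, highest first."""
    counts = Counter(t for t in query_tokens if t in idf)
    if not counts:
        return []
    total = sum(counts.values())
    q = {t: (c / total) * idf[t] for t, c in counts.items()}
    q_norm = math.sqrt(sum(w * w for w in q.values()))
    if q_norm == 0:
        return []
    scores = defaultdict(float)
    # Only the postings of query terms are visited, thanks to the inverted index
    for term, qw in q.items():
        for doc_id, dw in index.get(term, ()):
            scores[doc_id] += qw * dw
    ranked = [(s / (q_norm * norms[d]), d) for d, s in scores.items()]
    ranked.sort(reverse=True)
    return ranked


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

The inverted index lets `cosine_search` touch only documents that share at least one term with the query, rather than scoring every document.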
Description: tf–idf, short for term frequency–inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.[1]:8 It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others.
Variations of the tf–idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document's relevance given a user query. tf–idf can be successfully used for stop-word filtering in various subject fields, including text summarization and classification.
One of the simplest ranking functions is computed by summing the tf–idf for each query term; many more sophisticated ranking functions are variants of this simple model. Platform: |
Size: 17408 |
Author:adel |
Hits:
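The last entry mentions two uses of tf-idf: the simplest ranking function, which sums the tf-idf weights of the query terms, and stop-word filtering. The following minimal Python sketch shows both. It assumes pre-tokenized documents, length-normalized tf, and unsmoothed natural-log idf. It also assumes a hypothetical `threshold` parameter: a term whose idf is at or below it is treated as a stop-word candidate. The function names are illustrative only.

```python
import math
from collections import Counter


def tfidf_table(docs):
    """docs: list of token lists. Returns one {term: tf-idf weight} dict per document."""
    n = len(docs)
    df = Counter(t for toks in docs for t in set(toks))
    return [
        {t: (c / len(toks)) * math.log(n / df[t]) for t, c in Counter(toks).items()}
        for toks in docs
    ]


def rank_by_sum(query_terms, table):
    """Score each document as the sum of its tf-idf weights for the query terms."""
    scores = [
        (sum(weights.get(t, 0.0) for t in query_terms), doc_id)
        for doc_id, weights in enumerate(table)
    ]
    # highest score first; ties broken by lower doc_id
    return sorted(scores, key=lambda s: (-s[0], s[1]))


def stop_word_candidates(docs, threshold=0.0):
    """Terms whose idf is at or below `threshold` (hypothetical cut-off)."""
    n = len(docs)
    df = Counter(t for toks in docs for t in set(toks))
    return sorted(t for t, c in df.items() if math.log(n / c) <= threshold)
```

With the default threshold of 0, only terms that occur in every document are flagged. Raising the threshold also catches terms that are merely very common.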