Description: The k-means algorithm works as follows. First, k objects are chosen arbitrarily from the n data objects as the initial cluster centers. Each remaining object is then assigned to the cluster whose center it is most similar to (closest to in distance). Next, the center of each resulting cluster is recomputed as the mean of all objects in that cluster. These steps are repeated until the criterion function converges. Mean squared error is generally used as the criterion function. The resulting k clusters have the following property: each cluster is as compact as possible, and the clusters are as well separated from one another as possible. Here is the source code I wrote. Platform: |
Size: 2048 |
Author:xiaojade |
Hits:
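
The k-means procedure described in the entry above can be sketched in Java roughly as follows. This is not the uploaded source code, only a minimal illustration. The class name `KMeansSketch` and its methods are hypothetical, the first k points are used as initial centers (the description only says the choice is arbitrary), and the loop stops when no assignment changes, a simpler test than tracking the mean squared error directly (once assignments are stable the centers, and so the error, no longer change).

```java
import java.util.Arrays;

// Hypothetical illustration class; not taken from the uploaded package.
final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /**
     * Clusters the points into k groups and returns the cluster index of
     * each point. Assumes 1 <= k <= points.length and equal dimensions.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int n = points.length;
        int dim = points[0].length;

        // Assumption: take the first k points as the initial centers.
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }

        int[] labels = new int[n];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point joins its nearest center.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centers[c])
                            < squaredDistance(points[i], centers[best])) {
                        best = c;
                    }
                }
                if (labels[i] != best) {
                    labels[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // assignments are stable, so the centers are too
            }

            // Update step: each center becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[labels[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous center
                }
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return labels;
    }
}
```

Calling `KMeansSketch.cluster(points, 2, 100)` on two well-separated groups of 2-D points should place each group in its own cluster. Results depend on the initial centers, so a real implementation would usually randomize them and possibly run several restarts.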
Description: Algorithm idea: extract the TF/IDF weights of each document, then compute the similarity of two documents from the cosine of the angle between their multidimensional weight vectors; with that measure, the standard k-means algorithm can be used for text clustering. The source code is implemented in Java. Platform: |
Size: 15360 |
Author:startrek |
Hits:
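
The TF/IDF and cosine steps described in the entry above might look like the following in Java. This is a hedged sketch, not the uploaded code. The class name `TfIdfCosineSketch` and its methods are hypothetical, documents are assumed to be already tokenized, and the weighting used here (term frequency divided by document length, times the natural log of N over document frequency) is one common TF/IDF variant chosen for illustration; the package may use a different one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not taken from the uploaded package.
final class TfIdfCosineSketch {

    /** Builds one sparse TF/IDF vector (term -> weight) per tokenized document. */
    static List<Map<String, Double>> tfIdf(List<List<String>> docs) {
        // Document frequency: number of documents containing each term.
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }

        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> doc : docs) {
            // Raw term counts for this document.
            Map<String, Double> vec = new HashMap<>();
            for (String term : doc) {
                vec.merge(term, 1.0, Double::sum);
            }
            // Convert counts to weights (assumed variant: tf * ln(N / df)).
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                double tf = e.getValue() / doc.size();
                double idf = Math.log((double) docs.size() / df.get(e.getKey()));
                e.setValue(tf * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    /** Cosine of the angle between two sparse vectors; 0 if either is all zeros. */
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        for (double v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Because k-means needs a distance rather than a similarity, one simple option is to use `1 - cosine(a, b)` when assigning documents to cluster centers. Note that with this weighting a term appearing in every document gets weight zero, so such terms contribute nothing to the similarity.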