Description: ICA is used to classify text as an extension of the latent semantic indexing framework. ICA has been shown to align well with the context grouping structure in a human sense [1], and can thus be used for unsupervised classification. The demonstration shows this on medical abstracts (the MED dataset), using BIC to estimate the number of classes and producing keywords for each class. The icaML algorithm is used. Platform: |
Size: 2496134 |
Author:海心 |
Hits:
Description: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. The software can classify with a variety of methods, using the attributes of each object: decision trees, distance-based methods, density-based methods, and so on. Platform: |
Size: 15446626 |
Author:马何坛 |
Hits:
Description: ICA is used to classify text as an extension of the latent semantic indexing framework. ICA has been shown to align well with the context grouping structure in a human sense [1], and can thus be used for unsupervised classification. The demonstration shows this on medical abstracts (the MED dataset), using BIC to estimate the number of classes and producing keywords for each class. The icaML algorithm is used. Platform: |
Size: 2495488 |
Author:海心 |
Hits:
Description: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. The software can classify with a variety of methods, using the attributes of each object: decision trees, distance-based methods, density-based methods, and so on. Platform: |
Size: 15446016 |
Author:马何坛 |
Hits:
Description: A well-known text classification dataset used to test classifier performance. It is essential for anyone writing a paper or thesis on the topic. Platform: |
Size: 8151040 |
Author:Yishi Zhang |
Hits:
Description: Weka is a powerful machine learning development package. It is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Platform: |
Size: 18667520 |
Author:Alan |
Hits:
Description: Abstract. In this paper, we propose a method of hiding sensitive classification
rules from data mining algorithms for categorical datasets. Our
approach is to reconstruct a dataset according to the classification rules
that have been checked and approved by the data owner for release in
data sharing. Unlike other heuristic modification approaches, our method
first classifies a given dataset. Subsequently, the set of classification
rules is shown to the data owner, who identifies the sensitive rules that
should be hidden. After that, we build a new decision tree consisting
only of non-sensitive rules. Finally, a new dataset is reconstructed.
Our experiments show that the sensitive rules can be hidden completely
in the reconstructed datasets, while non-sensitive rules can still be
discovered without any side effects. Moreover, our method also
preserves high usability of the reconstructed datasets. Platform: |
Size: 274432 |
Author:Rishi |
Hits:
Description: ORL face image database: 40 people with 10 images each. For each person, the first 5 images are used as training samples and the last 5 as test samples, and the correct classification rate is computed. The classification criterion is the nearest-neighbor rule. The actual image size is 112x92; each image is stacked into a column vector that forms one column of the face database matrix. Platform: |
Size: 3500032 |
Author:limei |
Hits:
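The evaluation protocol described in the ORL entry above (column stacking, first 5 images per person for training, last 5 for testing, nearest-neighbor rule) can be sketched roughly as follows. This is only an illustration, not the uploaded code: the function names are hypothetical, and small synthetic 8x8 "images" stand in for the real 112x92 ORL images so that the sketch runs without the dataset.

```python
import random


def stack_columns(image):
    """Flatten a 2-D image (a list of rows) column by column into one vector."""
    n_rows, n_cols = len(image), len(image[0])
    return [image[r][c] for c in range(n_cols) for r in range(n_rows)]


def split_first_last(vectors_by_person, n_train=5):
    """First n_train vectors of each person go to training, the rest to testing."""
    train, test = [], []
    for person, vectors in vectors_by_person.items():
        train += [(vec, person) for vec in vectors[:n_train]]
        test += [(vec, person) for vec in vectors[n_train:]]
    return train, test


def nearest_neighbor(train, query):
    """Return the label of the training vector closest to query (squared Euclidean)."""
    best_label, best_dist = None, float("inf")
    for vec, label in train:
        dist = sum((a - b) ** 2 for a, b in zip(vec, query))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def correct_rate(train, test):
    """Fraction of test vectors whose nearest training neighbor has the right label."""
    hits = sum(nearest_neighbor(train, vec) == label for vec, label in test)
    return hits / len(test)


def make_synthetic_people(n_people=40, n_images=10, rows=8, cols=8, seed=0):
    """Assumption: noisy copies of a random prototype stand in for one person's photos."""
    rng = random.Random(seed)
    people = {}
    for person in range(n_people):
        prototype = [[rng.random() for _ in range(cols)] for _ in range(rows)]
        vectors = []
        for _ in range(n_images):
            noisy = [[p + rng.gauss(0, 0.05) for p in row] for row in prototype]
            vectors.append(stack_columns(noisy))
        people[person] = vectors
    return people


people = make_synthetic_people()
train, test = split_first_last(people)
rate = correct_rate(train, test)
print(f"{len(train)} training vectors, {len(test)} test vectors, correct rate {rate:.2%}")
```

With the real database, each 112x92 image would be stacked into a 10304-element column in the same way; the rate obtained on synthetic data says nothing about performance on ORL.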
Description: Implementation of an SVM-based training set enhancement algorithm for face detection. According to support vector machines (SVMs), for classification methods based on a geometric approach, examples close to the class boundary are usually more informative than others. Taking face detection as an example, this paper addresses the problem of enhancing a given training set and presents a nonlinear method to tackle the problem effectively. Based on SVM and an improved reduced set (IRS) algorithm, the method generates new examples lying close to the face/non-face class boundary to enlarge the original dataset and hence improve its sample distribution. The IRS algorithm greatly improves the approximation performance of the original reduced set (RS) method by embedding a new distance metric, the image Euclidean distance (IMED), into the kernel function. To verify the generalization capability of the proposed method, the enhanced dataset is used to train an AdaBoost-based face detector, which is then tested on the MIT+CMU frontal face test set. The experimental results show that this approach effectively improves the face detection performance of the final classifier. Platform: |
Size: 649216 |
Author:郭事业 |
Hits:
Description: A classification exercise based on naive Bayes, tested on the breast dataset from the UCI repository. Platform: |
Size: 1024 |
Author:王善民 |
Hits:
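For readers unfamiliar with the technique named in the naive Bayes entry above, a minimal categorical naive Bayes classifier with Laplace smoothing might look like the sketch below. It is not the uploaded code: the function names, the smoothing parameter `alpha`, and the six toy rows are all made up for illustration and are not taken from the UCI data.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> values seen in training
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the class with the highest log posterior under the naive assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | label) for feature i
            numer = value_counts[(i, label)][value] + alpha
            denom = count + alpha * len(feature_values[i])
            score += math.log(numer / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy rows (hypothetical): (tumor size, cell shape) -> diagnosis
rows = [
    ("small", "regular"), ("small", "regular"), ("small", "irregular"),
    ("large", "irregular"), ("large", "irregular"), ("large", "regular"),
]
labels = ["benign", "benign", "benign", "malignant", "malignant", "malignant"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("small", "regular")))
print(predict(model, ("large", "irregular")))
```

Working in log space avoids underflow when many features are multiplied together, and the smoothing keeps a feature value unseen for one class from forcing that class's probability to zero.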
Description: In this paper, we present two novel class-based
weighting methods for the Euclidean nearest neighbor algorithm
and compare them with global weighting methods
considering empirical results on a widely accepted time series
classification benchmark dataset. Our methods provide
higher accuracy than every global weighting in nearly half
of the cases and they have better overall performance. We
conclude that class-based weighting has great potential for
improving time series classification accuracy and that it might be
extended to distance functions other than the Euclidean
distance. Platform: |
Size: 153600 |
Author:amijeet |
Hits: