Search results: data set mining
Description: C4.5 decision tree code written in C. C4.5 is a data mining classification algorithm; the code classifies a given data set and extracts rules from it.
Platform: |
Size: 222732 |
Author: 李雪 |
Hits:
Description: C4.5 decision tree code written in C. C4.5 is a data mining classification algorithm; the code classifies a given data set and extracts rules from it.
Platform: |
Size: 223232 |
Author: 李雪 |
Hits:
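C4.5 chooses split attributes by gain ratio (information gain divided by split information). The uploaded C source is not shown here, so the following is only a minimal Python sketch of that criterion for categorical attributes; the function names and the four toy rows are made up for illustration and are not taken from the package.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attr_index, label_index=-1):
    """Gain ratio of one categorical attribute, as used by C4.5 to rank splits."""
    labels = [row[label_index] for row in rows]
    # Group the class labels by the value of the candidate attribute.
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[label_index])
    total = len(rows)
    # Expected entropy of the labels after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute's own value distribution.
    split_info = entropy([row[attr_index] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy rows: (outlook, constant, label).
rows = [
    ("sunny", "x", "no"),
    ("sunny", "x", "no"),
    ("rain", "x", "yes"),
    ("rain", "x", "yes"),
]
outlook_ratio = gain_ratio(rows, 0)   # attribute separates the classes perfectly
constant_ratio = gain_ratio(rows, 1)  # attribute never varies, so it is useless
```

A full C4.5 implementation also handles continuous attributes, missing values, and pruning, none of which this sketch attempts.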
Description: A system modeled on Weka that accepts the same input file format as Weka. It performs decision tree analysis and derives rule sets through data mining. Well suited to beginners.
Platform: |
Size: 37888 |
Author: 郑磊 |
Hits:
Description: Rough set application software for data mining and knowledge summarization.
Platform: |
Size: 4174848 |
Author: xmj |
Hits:
Description: Using 3,022 cases obtained from a hospital medical records room as the sample, and after designing the sample database and a multidimensional data set for diabetic complications, this work focuses on knowledge discovery in the epidemiology of diabetic complications. It studies the design and implementation of a mining model and algorithm engine that quantifies qualitative data: association models are introduced into the epidemiological study of diabetic complications, and the Apriori property from set theory is applied to design an association rule mining engine.
Platform: |
Size: 313344 |
Author: Eric Cheng |
Hits:
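The Apriori property mentioned in the entry above states that every subset of a frequent itemset must itself be frequent, so candidates with any infrequent subset can be pruned before counting. The package's own engine is not shown here; the following is a minimal Python sketch of level-wise frequent itemset mining under that property. The function name, the support threshold, and the transactions with items "A", "B", "C" are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        candidates = set()
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) != k:
                    continue
                # Prune step (Apriori property): every (k-1)-subset must be frequent.
                if all(frozenset(sub) in prev for sub in combinations(union, k - 1)):
                    candidates.add(union)
        # Count the support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical transactions; the item labels carry no clinical meaning.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]
frequent_itemsets = apriori(transactions, min_support=3)
```

Association rules would then be derived from the frequent itemsets by checking a confidence threshold, a step this sketch leaves out.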
Description: A book on rough set data mining. It also introduces several other data mining methods and compares them.
Platform: |
Size: 2049024 |
Author: 阿米 |
Hits:
Description: Implementation of a data mining algorithm: the maximum tree algorithm based on fuzzy clustering. The data set is darpa99, the data set used in KDD-CUP99.
Platform: |
Size: 32768 |
Author: 谢松林 |
Hits:
Description: Entropy Based Subspace Clustering for Mining Data (ENCLUS), a new version of the PROCLUS algorithm for clustering high-dimensional data sets.
Platform: |
Size: 133120 |
Author: volkanbaykan |
Hits:
Description: A recent Springer textbook on data mining; not to be missed.
Platform: |
Size: 26700800 |
Author: Qiusong Yang |
Hits:
Description: The 2008 paper "Research on Clustering Algorithms" (聚类算法研究) from the Journal of Software (软件学报), by Sun Jigui (孙吉贵), Liu Jie (刘杰), and Zhao Lianyu (赵连宇). PDF, 14 pages. The paper summarizes the state of research and recent progress in clustering algorithms. First, it analyzes representative clustering algorithms proposed in recent years in terms of their underlying ideas, key techniques, strengths, and weaknesses. Second, it selects several typical clustering algorithms and well-known data sets and runs simulation experiments that measure accuracy and running efficiency, comparing how one algorithm performs on different data sets and how different algorithms perform on the same data set. Finally, combining both analyses, it identifies the research hotspots, difficulties, shortcomings, and open problems in cluster analysis. The work is intended as a reference for research in cluster analysis and data mining.
Platform: |
Size: 470016 |
Author: dengyue |
Hits:
Description: Data mining: K-means source code, using the iris data set.
Platform: |
Size: 294912 |
Author: 刘凡 |
Hits:
Description: Data mining: implementation of the CURE algorithm, using the iris data set.
Platform: |
Size: 304128 |
Author: 刘凡 |
Hits:
Description: Data mining: implementation of the DIANA algorithm, using the iris data set.
Platform: |
Size: 290816 |
Author: 刘凡 |
Hits:
Description: KDD Cup 1999 Data.
This is the data set used for the Third International Knowledge Discovery and Data Mining Tools Competition, which was held in conjunction with KDD-99, the Fifth International Conference on Knowledge Discovery and Data Mining.
Platform: |
Size: 18115584 |
Author: Luyen |
Hits:
Description: Red and white wine quality data sets, usable as data for data mining in machine learning.
Platform: |
Size: 417792 |
Author: 张先韬 |
Hits:
Description: Forest fires data set, usable as data for data mining.
Platform: |
Size: 2048 |
Author: 张先韬 |
Hits:
Description: Public affairs data set intended as MATLAB input for data mining; the input is read by the user in text format.
Platform: |
Size: 6144 |
Author: shankar.m |
Hits:
Description: The data mining process model selected is KDD, which starts with data selection. Initially, the researcher took Kddcup.data-10-perecnt, which contains a total of 311,027 records, including both labeled and unlabeled records.
Platform: |
Size: 2656256 |
Author: darmaan |
Hits:
Description: Geolife GPS trajectory data set: User Guide
This GPS trajectory data set was collected in the Geolife project (Microsoft Research Asia) by 178 users over a period of four years (April 2007 to October 2011). A GPS trajectory in the data set is represented by a sequence of time-stamped points, each containing latitude, longitude, and altitude information. The data set contains 17,621 trajectories with a total distance of 1,251,654 km and a total duration of 48,203 hours. The trajectories were recorded by different GPS loggers and GPS phones and have a variety of sampling rates; 91% of them are logged in a dense representation, for example every 1 to 5 seconds or every 5 to 10 meters per point. The data set can be used in many research fields, such as mobility pattern mining, user activity recognition, location-based social networks, location privacy, and location recommendation.
Platform: |
Size: 22576128 |
Author: 李白43 |
Hits:
Description: Performance comparison and tuning of classifiers:
The decision tree, Bayes, and KNN classifiers from the scikit-learn package are trained on the data, with the aim of understanding their principles and use.
The performance of the three classifiers in the experiments is compared and their characteristics are analyzed.
The data sets used in this experiment are house and segment.
Platform: |
Size: 1301504 |
Author: Ryan112 |
Hits: