Description: Sequential pattern discovery is an active research branch of data mining. This upload is an implementation of the PrefixSpan algorithm. Platform: |
Size: 121856 |
Author:闻晞 |
Hits:
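Description: An open-source project from Kosmix (http://www.kosmix.com/), a vertical search engine startup. It is another open-source distributed file system for Google MapReduce-style workloads, suited to data-intensive network applications such as image storage, search engines, grid computing, and data mining. It integrates well with Hadoop, so existing Hadoop functionality can be reused, and it is written in C++. Applications that process large volumes of data (such as search engines, grid computing applications, and data mining applications) require a backend infrastructure for storing data. Such infrastructure is required to support applications whose workload could be characterized as: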
Primarily write-once/read-many workloads
Few millions of large files, where each file is on the order of a few tens of MB to a few tens of GB in size
Mostly sequential access
We have developed the Kosmos Distributed File System (KFS), a high performance distributed file system to meet this infrastructure need.
Platform: |
Size: 449536 |
Author:湖北 |
Hits:
Description: Sequential pattern mining is an important branch of data mining with wide application in processing sequential transactions and related information, such as customer shopping habits, web access patterns, analysis of scientific experiment processes, natural disaster prediction, disease treatment, drug testing, and DNA. Sequential pattern mining algorithms include AprioriAll, GSP, and FreeSpan. This work designs and implements an algorithm for string data in order to examine sequential pattern mining more closely. Platform: |
Size: 288768 |
Author:谢亚妮 |
Hits:
Description: The data mining stage begins by determining the task or goal of the mining. The purpose of data mining is to extract valuable information hidden in data. Data mining is a broad interdisciplinary field that draws on machine learning, mathematical statistics, neural networks, databases, pattern recognition, rough sets, fuzzy mathematics, and related techniques. It is often also called "knowledge discovery". Knowledge discovery (KDD) is regarded as the whole process of discovering useful knowledge from data. Data mining is regarded as one specific step in the KDD process, in which dedicated algorithms extract patterns from data, such as classification, clustering, association rule discovery, or sequential pattern discovery. The main steps are data preparation, data mining, and interpretation and evaluation of the results. Platform: |
Size: 12288 |
Author:dlufl |
Hits:
Description: Basic idea of the AprioriAll algorithm
1) Sort phase: Sort database D using the customer identifier customer-id as the primary key and the transaction time transaction-time as the secondary key. This step converts the original transaction database into a database of customer sequences.
2) Frequent itemset phase: Use an association rule mining algorithm to find all frequent itemsets.
3) Transformation phase: In the transformed customer sequences, each transaction is replaced by the set of all large itemsets contained in that transaction. A transaction that contains no large itemset is not retained in the transformed sequence.
4) Sequence phase: Use the core algorithm to find all sequential patterns.
Sequential pattern mining finds frequent subsequences in a sequence database as patterns. It is an important data mining problem with very wide application, including analysis of customer buying behavior, analysis of network access patterns, analysis of scientific experiments, early diagnosis of disease, natural disaster forecasting, and deciphering of DNA sequences. This paper studies two sequential pattern mining algorithms, AprioriAll and GSP. It first introduces the basic concepts and principles of sequential patterns, then demonstrates the execution of the algorithms through concrete examples to build understanding. Finally, the algorithms are implemented in VC with an Access database, and the results of running them are analyzed and summarized. Platform: |
Size: 2048 |
Author:hou ruilian |
Hits:
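The first three AprioriAll phases listed in the entry above can be illustrated with a minimal Python sketch. It is not the uploaded implementation. The function names, the `(customer_id, transaction_time, items)` row layout, and the brute-force itemset enumeration are assumptions made for illustration, and the sequence phase is left out.

```python
from itertools import combinations, groupby
from operator import itemgetter


def customer_sequences(rows):
    """Sort phase: rows are (customer_id, transaction_time, items) tuples.

    Sorting by customer id, then time, and grouping by customer id
    turns the transaction table into one ordered sequence per customer.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        cid: [frozenset(items) for _, _, items in group]
        for cid, group in groupby(ordered, key=itemgetter(0))
    }


def large_itemsets(sequences, min_support):
    """Frequent itemset phase (brute force, small inputs only).

    An itemset counts once per customer, however many of that
    customer's transactions contain it.
    """
    counts = {}
    for seq in sequences.values():
        seen = set()
        for transaction in seq:
            for size in range(1, len(transaction) + 1):
                for combo in combinations(sorted(transaction), size):
                    seen.add(frozenset(combo))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return {s for s, n in counts.items() if n >= min_support}


def transform(sequences, litemsets):
    """Transformation phase: replace each transaction by the set of
    large itemsets it contains; drop transactions that contain none."""
    result = {}
    for cid, seq in sequences.items():
        new_seq = []
        for transaction in seq:
            contained = {s for s in litemsets if s <= transaction}
            if contained:
                new_seq.append(contained)
        if new_seq:
            result[cid] = new_seq
    return result
```

A real implementation would replace the brute-force enumeration in `large_itemsets` with an Apriori-style level-wise search, since enumerating every subset of every transaction grows exponentially with transaction size.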
Description: Mining sequential patterns is one of the common data mining tasks in many real-life applications. Existing algorithms such as CAMLS (Constraint-based Apriori Algorithm for Mining Long Sequences) mine the complete set of frequent (long) sequences satisfying a min-sup threshold in a sequence. However, mining long sequences generates an explosive number of frequent sequences, which is prohibitively costly in both run time and storage space. In this paper, we propose to improve the CAMLS algorithm to produce only closed sequences. Instead of mining the full set of sequences, we plan to mine only short (closed) sequences, i.e., those having no super-sequences with the same support. Ou Platform: |
Size: 124928 |
Author:varun |
Hits:
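The notion of a closed sequence used in the entry above (a frequent sequence with no super-sequence of the same support) can be shown with a small Python sketch. This is only a post-filter over already mined patterns, not the improved CAMLS algorithm the abstract proposes. The function names and the representation of a pattern as a tuple of single items are assumptions for illustration.

```python
def is_subsequence(short, long):
    """True if `short` can be obtained from `long` by deleting items."""
    it = iter(long)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in short)


def closed_patterns(supports):
    """Keep only patterns with no longer super-sequence of equal support.

    `supports` maps each frequent pattern (a tuple of items) to its
    support count.
    """
    closed = {}
    for pattern, support in supports.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in supports.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed
```

The pairwise comparison is quadratic in the number of patterns, which is the cost a closed-sequence miner tries to avoid by pruning during the search instead of afterwards.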
Description: In September 2006, the ICDM conference invited winners of the ACM KDD Innovation Award and the IEEE ICDM Research Contributions Award to take part in selecting the top 10 data mining algorithms, with each nominating the 10 algorithms they considered most important. Top 10 Algorithms in Data Mining: materials on classification, statistical learning, association analysis, link mining, clustering, bagging and boosting, sequential patterns, integrated mining, rough sets, and graph mining. Platform: |
Size: 1840128 |
Author:yz |
Hits:
Description: A C++ implementation of AprioriAll, a basic data mining algorithm used for sequential pattern mining. Platform: |
Size: 3051520 |
Author:倪武 |
Hits:
Description: To reduce candidate sequence generation and the number of sequence database scans in the AprioriAll algorithm, an efficient sequential pattern mining method based on an improved AprioriAll algorithm is presented. First, the data are preprocessed. Then sequential pattern mining is performed with the improved AprioriAll algorithm. There are two main improvements: one changes how candidate sequences are joined, to reduce the number of candidates generated; the other eliminates needless database scans, to improve efficiency. Finally, the efficiency and validity of the improved AprioriAll algorithm are validated through experiments. Platform: |
Size: 836608 |
Author:sensensen |
Hits: