
Bilingual recommendations: frequent itemsets

For uncertain data, the traditional test of whether an itemset is frequent cannot accurately express how frequent the itemset actually is, and for large datasets the full set of frequent itemsets is large and redundant. To address both shortcomings, this paper proposes UFCIM, an algorithm for mining frequent closed itemsets from uncertain data that builds on the level-wise mining algorithm Apriori. It uses a confidence probability to express how reliable the frequentness judgment is: the higher the confidence, the more likely the itemset is truly frequent. Because frequent closed itemsets are a lossless compressed representation of the frequent itemsets, the algorithm reports the compact closed itemsets in place of the much larger set of frequent itemsets. Experimental results show that the algorithm quickly mines frequent closed itemsets from uncertain data, reducing itemset redundancy while preserving accuracy and completeness.
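The abstract does not give UFCIM's confidence formula, so the following is only a sketch of a common baseline for uncertain data, expected support, and not the paper's method. It assumes each transaction maps items to existence probabilities and that items are independent; the names `expected_support` and `uncertain_db` are hypothetical.

```python
from math import prod

def expected_support(itemset, uncertain_db):
    """Expected number of transactions that contain every item in `itemset`.

    Assumption of this sketch: each transaction is a dict of
    item -> existence probability, and items are independent.
    """
    return sum(
        prod(t.get(item, 0.0) for item in itemset)  # P(all items present in t)
        for t in uncertain_db
    )

uncertain_db = [
    {"a": 0.9, "b": 0.8},
    {"a": 0.5, "c": 1.0},
    {"a": 1.0, "b": 0.5, "c": 0.2},
]

# {"a", "b"} contributes 0.9*0.8 + 0 + 1.0*0.5, roughly 1.22 expected occurrences.
print(expected_support({"a", "b"}, uncertain_db))
```

An itemset would then be called frequent when this expectation reaches a chosen threshold. A probabilistic criterion such as the one the abstract describes additionally attaches a confidence to that judgment.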


A frequent pattern is a pattern (such as an itemset, subsequence, or substructure) that appears frequently in a dataset. For example, a set of products that frequently appear together in a transaction dataset is a frequent itemset. Efficient frequent itemset mining algorithms are used to discover such itemsets, and analyzing them makes it possible to predict product sales.
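To make the definition concrete, here is a minimal brute-force sketch that counts how many transactions contain each candidate itemset and keeps those that meet a threshold. The function name, the sample baskets, and the absolute-count threshold `min_support` are illustrative assumptions; real miners avoid enumerating every combination.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, k):
            # Support = number of transactions containing the whole candidate.
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[frozenset(candidate)] = count
                found = True
        if not found:
            break  # no frequent k-itemset implies no frequent (k+1)-itemset
    return result

baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
for itemset, count in frequent_itemsets(baskets, min_support=2).items():
    print(sorted(itemset), count)
```

With these four baskets and a threshold of 2, the pairs bread+milk and bread+butter qualify, while milk+butter appears together only once and is dropped.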


Frequent itemset mining is a fundamental problem in data mining research. Its bottleneck is that the complete set of frequent itemsets is usually very large and highly redundant, whereas the frequent closed itemsets uniquely determine the frequent itemsets and are much smaller. To generate frequent closed itemsets quickly, this paper analyzes the relationship among the indiscernibility matrix, the concept lattice, and frequent closed itemsets, and proposes a new lattice structure better suited to generating frequent closed itemsets. The corresponding incremental generation algorithm and frequent closed itemset extraction algorithm are also presented. Experiments show that the method mines frequent closed itemsets efficiently.
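The claim that closed itemsets determine the frequent itemsets can be illustrated with a small sketch. It does not use the lattice structure from the paper. It simply filters a table of frequent itemsets down to the closed ones (no proper superset with equal support) and then recovers a dropped itemset's support from them. The support table and both function names are hypothetical.

```python
def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with the same support."""
    return {
        s: sup
        for s, sup in frequent.items()
        if not any(s < t and sup == sup_t for t, sup_t in frequent.items())
    }

def support_from_closed(itemset, closed):
    """Support of a frequent itemset = largest support among its closed supersets."""
    return max((sup for c, sup in closed.items() if itemset <= c), default=0)

# Illustrative support table (itemset -> number of transactions containing it).
frequent = {
    frozenset({"bread"}): 3,
    frozenset({"milk"}): 3,
    frozenset({"butter"}): 2,
    frozenset({"bread", "milk"}): 2,
    frozenset({"bread", "butter"}): 2,
}

closed = closed_itemsets(frequent)
# {"butter"} is not closed: {"bread", "butter"} has the same support of 2.
print(len(closed))                                        # 4
print(support_from_closed(frozenset({"butter"}), closed)) # 2
```

A result of 0 from `support_from_closed` means the itemset has no closed superset and is therefore not frequent.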


Uncertain data mining has become a new hotspot in data mining, and frequent itemset mining is one of its central problems. Most existing algorithms, however, target the complete set of frequent itemsets, and few mine maximal or closed frequent itemsets. This paper proposes UMF-growth, a UF-Tree-based algorithm for mining maximal frequent itemsets from uncertain data. Mining proceeds in two steps: the first finds the local maximal frequent itemsets that have each frequent 1-itemset as a suffix, and the second derives all global maximal frequent itemsets. Experiments show that the algorithm performs well and is especially suited to dense datasets with short transactions.
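The two-step structure (local maxima per suffix, then global maxima) can be sketched without the UF-Tree. The code below is not UMF-growth: it assumes the frequent itemsets are already known, groups them by their last item under a fixed item order, keeps the maximal sets in each group, and then filters those candidates globally. All names are hypothetical.

```python
from collections import defaultdict

def maximal_two_step(frequent, order):
    """Return the maximal itemsets among `frequent` (an iterable of frozensets)."""
    rank = {item: i for i, item in enumerate(order)}

    # Step 1: group by suffix (the last item in `order`) and keep local maxima.
    groups = defaultdict(list)
    for s in frequent:
        suffix = max(s, key=rank.__getitem__)
        groups[suffix].append(s)
    local = [
        s
        for group in groups.values()
        for s in group
        if not any(s < t for t in group)
    ]

    # Step 2: a local maximum is global only if no other candidate contains it.
    return {s for s in local if not any(s < t for t in local)}

frequent = [frozenset(x) for x in ("a", "b", "c", "ab", "ac")]
print(sorted(sorted(s) for s in maximal_two_step(frequent, order="abc")))
# [['a', 'b'], ['a', 'c']]
```

Step 2 only needs to compare local candidates with each other, because every globally maximal itemset is also maximal within its own suffix group.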


This paper studies the optimization of frequent itemset mining. Traditional mining algorithms often generate large numbers of candidate itemsets and scan the database repeatedly, so mining takes too long and uses space inefficiently. To improve time and space efficiency, an efficient frequent itemset mining algorithm, CPT-Mine, is proposed. The algorithm stores the frequent itemset information of the transaction database in a coded pattern tree and builds an FP array to speed up itemset generation. It does not construct conditional pattern trees recursively, and it needs only two database scans to generate all frequent itemsets. Experiments show that the algorithm shortens mining time by 3 to 10 s and improves space efficiency by 43%.


To address the low time and space efficiency of frequent itemset mining, an efficient frequent itemset mining algorithm based on New FP-tree is proposed. The algorithm uses the New FP-tree structure to store the frequent itemset information of the transaction database. It does not construct conditional pattern trees recursively, and it needs only two database scans to generate all frequent itemsets. Experiments confirm the effectiveness of the algorithm.
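The abstract does not describe how New FP-tree differs from the classic structure, so the sketch below shows only the standard FP-tree construction that such variants start from, to illustrate where the two database scans go. The `Node` class and `build_fp_tree` function are illustrative names, not the paper's.

```python
from collections import Counter

class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(transactions, min_support):
    # Scan 1: count each item once per transaction and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    # Scan 2: insert each transaction's frequent items, most frequent first,
    # so that transactions sharing a prefix share a path in the tree.
    root = Node(None)
    for t in transactions:
        path = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),  # ties broken alphabetically
        )
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent

root, frequent = build_fp_tree(
    [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]], min_support=2
)
print(root.children["b"].count)                # 4
print(root.children["b"].children["a"].count)  # 2
```

All four transactions share the single top-level branch for `b`, which is the compression the tree provides. How the itemsets are then extracted without recursive conditional trees is specific to the paper and is not shown here.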


In association rule mining, the Apriori algorithm scans the transaction database many times and generates a large number of candidate itemsets, which makes the computation too expensive. To solve this problem, an improved Apriori algorithm based on a frequent 2-itemset support matrix is proposed. By analyzing how frequent (k+1)-itemsets are generated, the algorithm combines the support matrix with the frequent 2-itemset matrix to prune quickly and to greatly reduce the computation needed to verify frequent k-itemsets. Experimental results show that the improved algorithm mines frequent itemsets markedly more efficiently than both Apriori and the ABTM algorithm.
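One way to see why a frequent 2-itemset table enables fast pruning: a candidate can be frequent only if every pair of its items is frequent. The sketch below is an assumption about the general idea, not the paper's matrix layout. It stores the frequent pairs in a set (a sparse stand-in for the matrix) and rejects candidates that contain an infrequent pair, without touching the database again.

```python
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Count every item pair in one pass and return the frequent ones."""
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p for p, c in counts.items() if c >= min_support}

def survives_pair_check(candidate, pairs):
    """A candidate is kept only if all of its 2-item subsets are frequent."""
    return all(p in pairs for p in combinations(sorted(candidate), 2))

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]
pairs = frequent_pairs(transactions, min_support=2)

print(survives_pair_check({"a", "b"}, pairs))       # True
print(survives_pair_check({"a", "b", "c"}, pairs))  # False: ("b", "c") occurs once
```

Passing the check is necessary but not sufficient, so surviving candidates still need their support verified.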


Mining frequent itemsets in data streams is a current research hotspot. Drawing on the structure of the FP-tree data model, the authors propose FP-NEW, an improved method adapted to mining complete frequent itemsets from data streams. In the preprocessing stage, the algorithm saves the potentially frequent items it generates and uses them as the input records for building the NFP-tree. Users can prune the stored results with strategies such as setting time weights, and the complete frequent itemsets in the landmark window are finally mined iteratively. Experiments show that the algorithm is suited to mining frequent itemsets from data streams and has some advantage in time and space efficiency and in mining accuracy.


To further reduce the number of database scans and the memory burden, and thereby improve the efficiency of frequent itemset mining, an Apriori-based optimization algorithm, M-Apriori, is proposed. The method builds a frequent state matrix to store the frequent state of itemsets and a transaction Boolean matrix to store the relationship between transactions and itemsets. The algorithm scans the database only once, during initialization, to produce the initial frequent state matrix and transaction Boolean matrix, and then derives all frequent itemsets directly from them by recurrence. Experiments show that M-Apriori has better performance and efficiency than Apriori.
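A Boolean transaction matrix lets support be computed by bitwise AND instead of rescanning the database. The sketch below is a simplified illustration of that idea, not M-Apriori itself: it omits the frequent state matrix and stores each item's column as a Python integer bitmask. The function names are hypothetical.

```python
def build_bitmaps(transactions):
    """Single scan: bit i of bitmaps[item] is 1 if transaction i contains item."""
    bitmaps = {}
    for i, t in enumerate(transactions):
        for item in t:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps

def support(itemset, bitmaps, n_transactions):
    """AND the item columns together and count the transactions left set."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)
    return bin(mask).count("1")

transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
bitmaps = build_bitmaps(transactions)  # the only pass over the data

print(support({"a"}, bitmaps, len(transactions)))       # 3
print(support({"a", "b"}, bitmaps, len(transactions)))  # 2
print(support({"b", "c"}, bitmaps, len(transactions)))  # 1
```

After the single scan, the support of any larger itemset is derived from the stored columns alone, which is what allows the later levels to run without further database access.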


Based on a study of the principles and performance of the Apriori algorithm, and targeting its shortcomings, this paper proposes a more efficient mining algorithm that groups frequent itemsets and processes the groups in parallel. The algorithm partitions the frequent (k-1)-itemsets into groups according to a fixed rule, each group generates frequent k-itemsets directly, and the results from all groups are then combined. Grouping avoids many join attempts during the self-join and allows the join and pruning steps to run in parallel, which reduces waiting time and speeds up the search for frequent itemsets. Experiments confirm that the improved algorithm performs substantially better.
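The abstract does not state the grouping rule. A natural choice, assumed here, is to group the sorted (k-1)-itemsets by their first k-2 items, since Apriori's self-join only ever combines itemsets that share that prefix. The sketch generates pruned k-candidates group by group. It runs sequentially, and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def join_by_prefix(frequent_prev):
    """Build k-candidates from sorted (k-1)-tuples, joining only within groups."""
    groups = defaultdict(list)
    for itemset in frequent_prev:
        groups[itemset[:-1]].append(itemset[-1])  # key: shared (k-2)-prefix

    prev = set(frequent_prev)
    candidates = []
    # Each group is independent of the others, so this loop is the part
    # that could be handed to separate workers.
    for prefix, tails in groups.items():
        for x, y in combinations(sorted(tails), 2):
            cand = prefix + (x, y)
            # Apriori prune: every (k-1)-subset of the candidate must be frequent.
            if all(cand[:i] + cand[i + 1:] in prev for i in range(len(cand))):
                candidates.append(cand)
    return sorted(candidates)

frequent_2 = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
print(join_by_prefix(frequent_2))  # [('a', 'b', 'c')]
```

No join is attempted between `("a", "d")` and `("b", "c")` because they sit in different groups, which is where the saving in join attempts comes from. The surviving candidates still need a support check before they count as frequent.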
