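English translation: Data mining is the process of discovering implicit, novel knowledge and rules with potential value for decision-making from databases, and it is now widely applied in many fields. Cluster analysis is one of the most important techniques in data mining. Cluster analysis...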
Source: 学生作业帮助网 Editor: 六六作业网 Time: 2024/12/25 22:40:32
English translation
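Data mining is the process of discovering implicit, novel knowledge and rules with potential value for decision-making from databases, and it is now widely applied in many fields. Cluster analysis is one of the most important techniques in data mining; it is the process of dividing a set of physical or abstract objects into multiple clusters made up of similar objects. A cluster produced by clustering is a collection of objects: objects in the same cluster are similar to one another, and objects in different clusters are dissimilar. Among the many clustering algorithms, K-means is the most classic.
The K-means algorithm is a typical partition-based clustering algorithm. Its advantages are that the idea is simple and easy to carry out, that it is efficient and scalable when mining large-scale data, and that its time complexity is close to linear. The algorithm also has drawbacks: it is sensitive to the initial values; because the initial values are chosen at random, it is not stable enough; it easily falls into a local minimum and can generally find only spherical clusters; and the number of clusters K must be given in advance.
This paper mainly introduces and analyzes the traditional K-means clustering algorithm, examines its advantages and disadvantages, and finally improves it. The improvement targets the algorithm's dependence on the initial values: the initial points are selected by means of certain algorithms, which overcomes drawbacks such as the instability of K-means and makes the clustering results more accurate.
The main work and research results are as follows:
1. Introduce and analyze the idea of the K-means clustering algorithm and implement it, then use some data to understand its advantages and disadvantages.
2. Improve on the shortcomings of the K-means clustering algorithm, mainly its dependence on the initial values. Two improved methods are adopted: the first borrows the idea of Huffman, and the second borrows the ideas of the greedy algorithm and Kruskal's algorithm.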
Please do not use Google Translate, Youdao Translate, or similar tools!
Data mining is the process of extracting hidden, previously unknown knowledge and rules of potential value to decision-making from databases, and it has been widely used in many fields. Clustering analysis is one of the most important techniques in the data mining field: it partitions a collection of physical or abstract objects into several clusters of similar objects. Each cluster generated by clustering is a group of objects; objects within the same cluster resemble each other, while objects in different clusters differ. Of the many clustering algorithms, K-means is the most classic.
K-means is a typical partitioning-based clustering algorithm. It has several strengths: its idea is simple and easy to implement, it is efficient and scalable on large-scale data, and its time complexity is close to linear. It also has weaknesses: it is sensitive to the initial values; with randomly chosen initial values its results are unstable; it is easily trapped in a local minimum and can usually discover only spherical clusters; and the number of clusters K has to be specified in advance.
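The sensitivity to initial values can be illustrated with a small sketch. The code below is not taken from the paper being translated; it is a minimal pure-Python version of the standard K-means iteration (assign each point to its nearest center, then move each center to the mean of its cluster), and the names `kmeans`, `assign`, `mean_point`, and `squared_distance` are chosen here for illustration. Instead of drawing the initial centers at random, it passes two different starting choices explicitly so that the effect is reproducible.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def assign(points, centers):
    """Assignment step: put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, initial_centers, max_iter=100):
    """Run K-means from the given initial centers.

    Returns (centers, clusters, sse), where sse is the sum of squared
    distances from each point to the center of its cluster.
    """
    centers = [tuple(float(x) for x in c) for c in initial_centers]
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Update step: an empty cluster keeps its previous center.
        new_centers = [mean_point(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # no center moved, so the run has converged
            break
        centers = new_centers
    clusters = assign(points, centers)
    sse = sum(squared_distance(p, centers[i])
              for i, cluster in enumerate(clusters) for p in cluster)
    return centers, clusters, sse


# Four corners of a 4 x 1 rectangle, K = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, _, sse_left_right = kmeans(points, [(0, 0), (4, 0)])  # one start per short side
_, _, sse_top_bottom = kmeans(points, [(0, 0), (0, 1)])  # both starts on one side
print(sse_left_right, sse_top_bottom)  # 1.0 16.0
```

Both runs converge, but the second start is trapped in a local minimum whose error is sixteen times larger, which is the kind of instability the passage describes.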
This paper introduces and analyzes the traditional K-means clustering algorithm, discusses its advantages and disadvantages, and then improves the algorithm. The improvement addresses the dependence of K-means on its initial values. The initial points are chosen by dedicated algorithms, which overcomes weaknesses such as instability and yields more accurate clustering results.
The main contents and research results are as follows:
1. The idea of the K-means clustering algorithm is introduced and analyzed, and the algorithm is implemented. Some data sets are then used to examine its advantages and disadvantages.
2. The shortcomings of the K-means clustering algorithm are addressed, mainly its dependence on the initial values. Two improved methods are used: the first draws on the Huffman idea, and the second draws on the ideas of the greedy algorithm and Kruskal's algorithm.
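The abstract does not say how the Huffman idea is turned into an initialization method, so the following is only one possible reading, offered as a hedged sketch rather than the paper's actual procedure. It assumes that, as in building a Huffman tree, the two closest nodes are merged repeatedly into a weighted mean until K nodes remain, and that these K nodes then serve as the initial centers. The function name `huffman_style_centers` is hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def huffman_style_centers(points, k):
    """Pick k initial centers by repeatedly merging the two closest nodes.

    Assumption (not from the source): each node is a (centroid, weight) pair,
    and a merge replaces two nodes with their weighted mean, much as a Huffman
    tree replaces its two smallest nodes with a combined parent.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    nodes = [(tuple(float(x) for x in p), 1) for p in points]
    while len(nodes) > k:
        # Find the closest pair of nodes with a simple O(n^2) scan.
        best = None
        for i in range(len(nodes)):
            for j in range(i + 1, len(nodes)):
                d = squared_distance(nodes[i][0], nodes[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (ci, wi), (cj, wj) = nodes[i], nodes[j]
        merged = tuple((a * wi + b * wj) / (wi + wj) for a, b in zip(ci, cj))
        del nodes[j]  # delete the larger index first so index i stays valid
        del nodes[i]
        nodes.append((merged, wi + wj))
    return [center for center, _ in nodes]


points = [(0, 0), (0, 1), (4, 0), (4, 1)]
print(sorted(huffman_style_centers(points, 2)))  # [(0.0, 0.5), (4.0, 0.5)]
```

Because the result is deterministic, repeated runs start K-means from the same centers, which is the stability property the improvement is aiming for. The repeated pairwise scan is slow on large data sets; it is kept here only for readability.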