Similar Literature
Found 20 similar documents; search took 62 ms
1.
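K-means is a widely used clustering algorithm, but choosing the initial cluster centers and the value of K remains difficult. This paper proposes an improved K-means algorithm that selects the initial cluster centers and K based on co-citation analysis of academic literature. The algorithm is a two-step clustering procedure. In the first step, co-citation analysis of the academic documents yields a co-citation matrix, on which hierarchical clustering is performed. At each iteration the algorithm records the distance between the documents merged into one cluster and the difference in that distance between successive iterations. When this difference reaches its maximum, the corresponding number of clusters is taken as K for the second-step K-means, and the cluster centers at that point are used as its initial centers. The second step runs K-means on the document content. Experiments comparing the method with classic K-means and with an improved K-means based on agglomerative hierarchical clustering show that the proposed algorithm produces better clustering results.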
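The first step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses plain Euclidean points and centroid linkage in place of the co-citation matrix, and the function names `centroid` and `choose_k_and_centers` are hypothetical.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def choose_k_and_centers(points):
    """Agglomerative clustering (centroid linkage, an assumption).

    Returns K and the cluster centers at the cut just before the merge
    whose distance jumps the most. Assumes at least 3 points.
    """
    clusters = [[p] for p in points]
    history = []  # (merge distance, clusters as they were before that merge)
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((d, [list(c) for c in clusters]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    # Difference in merge distance between successive iterations.
    gaps = [history[t + 1][0] - history[t][0] for t in range(len(history) - 1)]
    t = max(range(len(gaps)), key=gaps.__getitem__)
    snapshot = history[t + 1][1]  # state right before the costly merge
    return len(snapshot), [centroid(c) for c in snapshot]
```

The returned K and centers would then seed an ordinary K-means run on the content features, which is the second step in the abstract.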

2.
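A Survey of Research on the K-means Algorithm   Total citations: 4, self-citations: 0, citations by others: 4
This paper surveys the main problems of K-means, a basic algorithm in cluster analysis: determining the value of K, selecting the initial cluster centers, and handling categorical attribute data. It traces the development of the K-means algorithm, identifies the current research hotspots and difficulties, and suggests directions for improving K-means clustering.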

3.
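Implementation of a Document Clustering Algorithm Based on Improved K-means   Total citations: 1, self-citations: 1, citations by others: 0
After describing the meaning, role, and general process of document clustering, this paper analyzes the basic idea of an improved K-means clustering method that selects initial centroids according to the "min-max" principle. It then designs the corresponding clustering algorithm, implements a clustering system, and uses the system to run clustering experiments on 300 academic documents and their associated feature terms. The experimental results show that the improved K-means clustering algorithm designed and implemented in this paper performs well.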
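One common reading of min-max initial centroid selection is the maximin rule: each new centroid is the point whose minimum distance to the centroids already chosen is largest. The sketch below assumes that reading. The abstract does not give the paper's exact rule, and `maximin_centroids` is a hypothetical name.

```python
import math

def maximin_centroids(points, k):
    """Pick k initial centroids by the maximin rule (an assumed reading)."""
    centroids = [points[0]]  # assumption: seed with the first point
    while len(centroids) < k:
        # Choose the point farthest from its nearest already-chosen centroid.
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids
```

Spreading the starting centroids this way tends to avoid seeding two centroids inside the same dense group, which is a typical failure of random initialization.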

4.
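This paper proposes using ant colony clustering to produce an initial clustering, then applying the K-means algorithm to cluster the initial results further in layers, and extracting a label for each cluster by computing a combined term similarity, thereby constructing the hierarchical relations among terms. Finally, a sample of the experimental results is evaluated by domain experts and the results are analyzed.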

5.
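Based on an analysis of two classic clustering algorithms, DBSCAN and K-means, and taking the characteristics of Chinese text data into account, this article combines and improves the two methods and proposes a Chinese text clustering method called DKTC. The algorithm determines the number of clusters automatically, is insensitive to noise and outliers, and is insensitive to the input order of the data. It is also more efficient than DBSCAN. Experiments show that DKTC can cluster Chinese text and that its clustering results improve to some degree on those of traditional DBSCAN and K-means.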

6.
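常娥 (Chang E). 《图书情报工作》, 2012, 56(11): 89-92
Combining latent semantic indexing (LSI) theory with K-means clustering, this paper proposes an improved automatic text clustering method. Keywords are first extracted from the documents using N-gram statistics, LSI is then applied to reduce the dimensionality of the documents' vector space model, and finally K-means is used to cluster the texts. Experiments show that the method reaches a text clustering accuracy of up to 84.7%.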
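The keyword extraction step can be illustrated with a simple character N-gram frequency count. This is only a sketch: the abstract does not specify which N-gram statistic the paper uses, and `top_ngrams` is a hypothetical helper. The LSI and K-means stages are not shown.

```python
from collections import Counter

def top_ngrams(text, n=2, top=3):
    """Count character n-grams and return the most frequent ones.

    Raw frequency is an assumption; the paper's statistic may differ.
    """
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return grams.most_common(top)
```

Character N-grams suit Chinese text because they need no prior word segmentation.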

7.
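宋江春 (Song Jiangchun), 沈钧毅 (Shen Junyi). 《情报学报》, 2006, 25(4): 488-492
This paper proposes a new multi-level document clustering algorithm based on a bidirectional nearest-neighbor technique. A new document feature extraction method is used to construct topic and keyword feature vectors for each document. First, in the topic feature vector space, the traditional nearest-neighbor technique is improved so that the nearest-neighbor relation becomes bidirectional rather than one-way, and the improved method is used to produce an initial clustering of the documents. Then, in a new feature vector space based on topic keywords, the initial document clusters are refined using inter-cluster distance and connectivity to obtain the final clustering. Because clustering is done in two levels, both the efficiency and the accuracy of the algorithm improve considerably. Finally, the effectiveness, scalability, and time complexity of the algorithm are studied.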
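A bidirectional nearest-neighbor relation can be read as mutual nearest neighbors: two documents are linked only if each is the other's closest neighbor. The sketch below assumes that reading and Euclidean distance. Neither is confirmed by the abstract, and `mutual_nearest_pairs` is a hypothetical name.

```python
import math

def mutual_nearest_pairs(vectors):
    """Return pairs (i, j), i < j, that are each other's nearest neighbor.

    Assumes at least two vectors and Euclidean distance.
    """
    def nearest(i):
        return min((j for j in range(len(vectors)) if j != i),
                   key=lambda j: math.dist(vectors[i], vectors[j]))
    nn = [nearest(i) for i in range(len(vectors))]
    return [(i, j) for i, j in enumerate(nn) if i < j and nn[j] == i]
```

A point whose nearest neighbor does not reciprocate (index 4 in a set such as `[(0, 0), (0, 1), (5, 5), (5, 6), (2, 2)]`) is left unpaired, which makes the initial clusters more conservative than one-way nearest-neighbor linking.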

8.
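A Particle Swarm-Based Fuzzy C-Means Text Clustering Algorithm   Total citations: 1, self-citations: 0, citations by others: 1
When fuzzy C-means is used for text clustering, randomly chosen initial cluster centers and cluster counts lead to different clustering results, and the algorithm easily falls into local optima. This paper proposes using particle swarm optimization to determine the initial cluster centers for fuzzy C-means, and then clustering documents with fuzzy C-means after building a vector space model and extracting features. Experiments show that this particle swarm-based fuzzy C-means algorithm needs fewer iterations, overcomes the sensitivity of classic fuzzy C-means to initial values and its tendency to get trapped in local minima, and clearly improves clustering speed and quality.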
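For reference, the standard fuzzy C-means formulation that such a method builds on is given below. The notation is an assumption, since the abstract gives none: $x_i$ are the $N$ document vectors, $c_j$ the $C$ cluster centers, $u_{ij}$ the membership of $x_i$ in cluster $j$, and $m > 1$ the fuzzifier.

```latex
J_m = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \, \lVert x_i - c_j \rVert^{2},
\qquad \sum_{j=1}^{C} u_{ij} = 1 \quad \text{for each } i

u_{ij} = \frac{1}{\sum_{k=1}^{C}
  \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

The two update rules are alternated until the memberships stop changing. Because the iteration only descends to a local minimum of $J_m$, the result depends on the starting centers, which is the sensitivity the particle swarm initialization is meant to reduce.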

9.
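A Collaborative Filtering Recommendation Model Based on Rough User Clustering   Total citations: 1, self-citations: 0, citations by others: 1
[Objective] To introduce rough sets into user-clustering-based collaborative filtering and improve recommendation quality. [Methods] A collaborative filtering recommendation model based on rough user clustering is proposed. Offline, a rough K-means user clustering algorithm assigns each user to the upper and lower approximations of K clusters according to the user's similarity to the cluster centers, forming the user's initial neighbor set. Online, the model searches the target user's initial neighbor set for nearest neighbors, predicts item ratings, and generates recommendations. [Results] Experimental comparison shows that the model reduces mean absolute error by about 14% relative to traditional and item-based collaborative filtering algorithms, and reduces mean error by about 10% relative to user-clustering-based collaborative filtering. [Limitations] When weighing the importance of the upper and lower approximations in adjusting the cluster centers, the study ignores the effect of changes in the number of user clusters and in the threshold on the size of the nearest-neighbor set. [Conclusions] The model effectively improves recommendation accuracy and is feasible and practically significant.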
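The offline assignment to upper and lower approximations might look like the sketch below. It follows the general rough K-means idea under assumptions the abstract does not confirm: Euclidean distance stands in for the paper's similarity measure, and `ratio` is a hypothetical threshold parameter.

```python
import math

def rough_assign(users, centers, ratio=1.2):
    """Assign users to lower and upper approximations of each cluster.

    A user close to exactly one center goes into that cluster's lower
    (and upper) approximation; a user nearly as close to another center
    goes only into the upper approximations of all such clusters.
    """
    k = len(centers)
    lower = [[] for _ in range(k)]
    upper = [[] for _ in range(k)]
    for u in users:
        d = [math.dist(u, c) for c in centers]
        best = min(range(k), key=d.__getitem__)
        # Other centers whose distance is within `ratio` of the best one.
        close = [j for j in range(k) if j != best and d[j] <= ratio * d[best]]
        upper[best].append(u)
        if close:
            for j in close:
                upper[j].append(u)
        else:
            lower[best].append(u)
    return lower, upper
```

In a model of this kind, the upper approximations of the clusters a target user falls in would serve as the initial neighbor set searched online.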

10.
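Web User Clustering Based on a Combination of the Ant Colony Algorithm and the K-means Algorithm   Total citations: 1, self-citations: 1, citations by others: 0
Web user clustering, which uses a clustering algorithm to group user sessions, is an important problem in e-commerce. The difficulty is that thousands of sessions must be clustered and each session is described by a high-dimensional vector. In addition, the problem has exponential complexity in the number of clusters and is NP-hard. This paper proposes a new clustering method that combines the ant colony algorithm with K-means to optimize the clustering of user sessions. Experimental results show that, compared with K-means alone, the method performs better when applied to Web navigation recommendation.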

11.
This article surveys a sample of sources of information about Romania available to British readers in nineteenth-century British newspapers and periodicals. It traces first contacts between the Romanian lands and Britain after the union of the principalities of Wallachia and Moldavia in 1859, then after their independence from the Ottoman Empire. The article highlights an increased Romanian interest in British periodicals, which reported and reviewed Romanian literature and scholarship. The article concludes that nineteenth-century British newspapers and periodicals offer a great variety and wealth of new material previously unavailable or unknown to researchers. It also states that only a portion of this large body of material has been indexed and is therefore available via the bibliographic sources mentioned in the article. The author argues for the need for a new and updated British-Romanian bibliography, which can draw on new online resources offering access to thousands of new newspaper and periodical records.

12.
ABSTRACT

The paper looks at library approval plans for material published in Slavic, East European, and Eurasian countries from the selector's point of view. Reasons why a selector would or would not want one are examined. Success with approval plans requires monitoring receipts, as well as good and ongoing communication among the selector, the acquisitions department, and the vendor. A preliminary list of vendors offering approval plans for the countries of the region appears in the appendix.

13.
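To further improve the efficiency with which the Wuhan science and technology information sharing service platform is used, this paper analyzes the current state of Wuhan's science and technology information resource services in terms of the platform's resource development, resource application, supply-demand matching methods, and supply-demand characteristics. Taking the perspective of demand and use, and drawing on platform management practice, user visits, and questionnaire surveys, it identifies the main factors that constrain the matching of supply and demand for science and technology resources along two dimensions: the entities that need information resources, and the platform's own construction and management. Guided by the ideas of marketization and institutionalization, it proposes measures in policy innovation, mechanism innovation, market-oriented services, environment building, and talent development to shift platform construction from "resource aggregation" to "demand orientation".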

14.
ABSTRACT

The article examines the most important periodicals of ethnic minorities in Poland. After 1989, many ethnic groups (e.g., Germans and Romanies) were allowed to publish journals and newspapers for the first time since the end of World War II. The publications examined show the rich cultural life of the various ethnic groups as well as their current status in Poland. In addition to popular titles, some scholarly publications are also discussed.

15.
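Based on an analysis of how the temporal characteristics of word usage in the literature tend to differ across research stages, this paper proposes a topic-model-based method for identifying stages of research development. It focuses on how the method is constructed, including the steps of temporal feature extraction, development stage delimitation, and analysis of changes in topic popularity. To validate the method, the paper compares the term-frequency statistics approach and the topic model approach in topic evolution analysis. The results show that the method can identify hot topics and development trends while effectively distinguishing the research development stages that different topics reflect.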

16.
ABSTRACT

In the current political environment, raising support for millage renewals, bond campaigns, or even millage continuations for public libraries is affected by national politics and by a tendency toward tax aversion on principle across the country. The lessons we can learn from the governing boards of small and rural public libraries are worth raising up to a greater national consciousness. Board governance, community consciousness, and facilities management are clear and logical tools that elected boards of public libraries can use to politick for support of libraries.

17.
The author answers a reference question on bibliographic sources for the Ukrainian periodical press 1840–1850. Helpful publications include bibliographies, guides, and library catalogs. These potentially make mention of revolutionary developments in Hungary (such as the Twelve Points paragraph of the Demands of the Hungarian Nation in March 1848, the subsequent April Laws, and Hungary's declaration of independence in April 1849), and elsewhere in the Hapsburg Empire.

18.
Chromebooks and the G Suite group of products, like Google Search, Gmail, and Google Docs, have rapidly expanded in American schools during the past 5 years. The impact of one-to-one Chromebook devices and the pervasive use of Google's software products in American education cannot be overstated. This article explores some of the influences of these products on research, based on the experiences of a librarian and technology coordinator at an elementary education level. The author has several suggestions for effective research with these products in mind.

19.
Library historians can learn a great deal from studying the gender of authorship and institutional affiliation of a scholarly journal. The focus of this study is to examine these two aspects of journal production in library history to see who is producing published research in this field. Twenty-three years of Libraries & Culture were chosen as the target volumes. The study reveals that more men than women published in library history and identifies which institutions were represented. This type of information is useful to the library historian engaged in the analysis of published scholarship, and more generally to scholars with an interest in patterns of literature production in fields closely related to the social and behavioral sciences.

20.
ABSTRACT

The history of the almanac in Croatia is reconstructed through primary research in bibliographic and archival sources. The almanac is a vehicle for knowledge communication in informal contexts, engaging both oral tradition and literary forms traceable to medieval literacy and ways of structuring knowledge. The history of the almanac in Croatia reflects the changing context of the book trade, literacy, and the evolution of language. Four main stages are identified: (1) the beginning of the annual almanac in the seventeenth century; astrological almanacs reflecting the sensibility of the Baroque period; (2) the Enlightenment's stimulation of almanac publishing in the spirit of contemporary secular reforms in agriculture and education; (3) nineteenth- and twentieth-century almanac trade, showing complex and overlapping networks for the production, distribution and appropriation of printed almanacs; (4) roughly the end of World War II, when the almanac slowly moved out of the role of a popular mass medium and into specialized niches represented by regional, diaspora, and religious almanacs.
