Clustering tagged documents with labeled and unlabeled documents
Authors:Chien-Liang Liu  Wen-Hoar Hsaio  Chia-Hoang Lee  Chun-Hsien Chen
Institution:Department of Computer Science, 1001 University Road, Hsinchu 300, Taiwan, ROC
Abstract:This study applies our proposed semi-supervised clustering method, Constrained-PLSA, to cluster tagged documents using a small number of labeled documents, and evaluates system performance on two data sets. In the first data set the boundaries among clusters are not clear, while in the second they are clear. The documents are clustered using the abstracts of papers and the tags annotated by users, and four combinations of tags and words are used as feature representations. The experimental results indicate that almost all of the methods benefit from tags. However, on the data set with noisy information the unsupervised learning methods fail to function properly, whereas Constrained-PLSA still functions properly. In many real applications background knowledge is readily available, so it is appropriate to use it in the clustering process to make learning faster and more effective.
Keywords:Text mining  Document clustering  Semi-supervised clustering  Tagged document clustering
This article is indexed in ScienceDirect and other databases.