Integrating External Knowledge to Supplement Training Data in Semi-Supervised Learning for Text Categorization
Authors:Mohammed Benkhalifa  Abdelhak Mouradi  Houssaine Bouyakhf
Institution:1. School of Science and Engineering, Al Akhawayn University in Ifrane (AUI), Av. Hassan II, Ifrane, 53000, Morocco
2. Ecole Nationale Supérieure d'Informatique et d'Analyse des Systèmes (ENSIAS), Mohammed V University, Agdal Rabat, p??
3. Computer Science Department, Mohammed V University, Faculty of Sciences in Rabat, Morocco
Abstract:Text categorization (TC) is the automated assignment of text documents to predefined categories based on their contents. Many machine learning approaches have been applied to TC and have proven effective; nevertheless, TC still poses many challenges to machine learning. In this paper, we propose integrating external lexical information from WordNet to supplement the training data for a semi-supervised clustering algorithm that can learn from both training and test documents in order to classify new, unseen documents. This algorithm is "Semi-Supervised Fuzzy c-Means" (ssFCM). Our experiments use the Reuters-21578 collection and consist of binary classifications for categories selected from its 115 TOPICS classes. Using the Vector Space Model, each document is represented by its original feature vector augmented with an external feature vector generated using WordNet. We verify experimentally that integrating WordNet improves the performance of ssFCM, effectively addresses the classification of documents into categories with few training documents, and does not interfere with the use of training data.
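The representation and learning scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `augment` and `ssfcm`, the toy data, and the specific update rule (labeled documents keep fixed crisp memberships, unlabeled documents receive the standard fuzzy c-means membership update) are assumptions made for illustration, and the external feature vectors are placeholder numbers rather than real WordNet output.

```python
import numpy as np


def augment(original, external):
    """Concatenate original term vectors with external feature vectors.

    Both arrays have one row per document; the external columns stand in
    for WordNet-derived features (hypothetical values in this sketch).
    """
    return np.hstack([original, external])


def ssfcm(X, labels, n_clusters=2, m=2.0, n_iter=50, eps=1e-9):
    """Simplified semi-supervised fuzzy c-means (illustrative assumption).

    labels[i] is a class index for a labeled document and -1 for an
    unlabeled one. Labeled memberships stay clamped to their class;
    unlabeled memberships follow the standard FCM update.
    """
    n = X.shape[0]
    labeled = labels >= 0
    # Uniform start for unlabeled rows, one-hot rows for labeled documents.
    U = np.full((n, n_clusters), 1.0 / n_clusters)
    U[labeled] = np.eye(n_clusters)[labels[labeled]]
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the documents.
        centers = (W.T @ X) / (W.sum(axis=0)[:, None] + eps)
        # Squared Euclidean distance from every document to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2) + eps
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        U_new[labeled] = U[labeled]  # keep supervision fixed
        U = U_new
    return U, centers


# Toy data (made up): four documents, two original and two external features.
original = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [4.8, 5.1]])
external = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = np.array([0, -1, 1, -1])  # documents 1 and 3 are unlabeled

X = augment(original, external)
U, centers = ssfcm(X, labels)
predicted = U.argmax(axis=1)  # hard category per document
```

Each row of `U` holds a document's fuzzy memberships across the categories, so a binary decision for an unlabeled document can be read off with `argmax`, as in the last line.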
Keywords:semi-supervised fuzzy c-means  text categorization  vector space model  WordNet lexical database  Reuters database
This article is indexed in SpringerLink and other databases.