Cross-lingual text categorization: Conquering language boundaries in globalized environments

Authors:	Chih-Ping Wei, Yen-Ting Lin, Christopher C. Yang

Institution:	1. Department of Information Management, College of Management, National Taiwan University, Taipei, Taiwan, ROC; 2. Science & Technology Policy Research and Information Center, National Applied Research Laboratories, Taipei, Taiwan, ROC; 3. College of Information Science and Technology, Drexel University, Philadelphia, PA, USA

Abstract:	Text categorization pertains to the automatic learning of a text categorization model from a training set of preclassified documents on the basis of their contents and the subsequent assignment of unclassified documents to appropriate categories. Most existing text categorization techniques deal with monolingual documents (i.e., written in the same language) during both the learning of the text categorization model and category assignment (or prediction) for unclassified documents. However, with the globalization of business environments and advances in Internet technology, an organization or individual may generate documents in one language, organize them into categories, and subsequently archive documents in different languages into those existing categories, which necessitates cross-lingual text categorization (CLTC). Specifically, cross-lingual text categorization deals with learning a text categorization model from a set of training documents written in one language (e.g., L₁) and then classifying new documents in a different language (e.g., L₂). Motivated by the significance of this demand, this study aims to design a CLTC technique with two different category assignment methods, namely, individual- and cluster-based. Using monolingual text categorization as a performance reference, our empirical evaluation results demonstrate the cross-lingual capability of the proposed CLTC technique. Moreover, the classification accuracy achieved by the cluster-based category assignment method is statistically significantly higher than that attained by the individual-based method.
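
The problem setting described in the abstract (learn a model from documents in L₁, then classify documents in L₂) can be illustrated with a minimal sketch. This is not the CLTC technique proposed by the authors, whose individual- and cluster-based assignment methods are not detailed in the abstract. It assumes a simple dictionary-based term translation step followed by a naive Bayes classifier; the function names (`train_nb`, `classify_nb`, `translate`), the toy dictionary `TOY_DICT`, and the toy training data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Learn a naive Bayes model from (tokens, category) pairs written in L1."""
    term_counts = defaultdict(Counter)  # category -> term frequencies
    cat_counts = Counter()              # category -> number of training documents
    vocab = set()
    for tokens, cat in docs:
        cat_counts[cat] += 1
        term_counts[cat].update(tokens)
        vocab.update(tokens)
    return term_counts, cat_counts, vocab

def classify_nb(model, tokens):
    """Return the category with the highest log-posterior for L1 tokens."""
    term_counts, cat_counts, vocab = model
    total_docs = sum(cat_counts.values())
    best_cat, best_score = None, -math.inf
    for cat in cat_counts:
        total_terms = sum(term_counts[cat].values())
        score = math.log(cat_counts[cat] / total_docs)
        for t in tokens:
            if t in vocab:  # terms unseen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((term_counts[cat][t] + 1) / (total_terms + len(vocab)))
        if score > best_score:
            best_cat, best_score = cat, score
    return best_cat

def translate(tokens, dictionary):
    """Map L2 tokens to L1 terms; tokens missing from the dictionary are dropped."""
    out = []
    for t in tokens:
        out.extend(dictionary.get(t, []))
    return out

# Hypothetical toy data: L1 is English, L2 is Spanish.
TRAIN_L1 = [
    (["stock", "market", "price"], "finance"),
    (["bank", "interest", "price"], "finance"),
    (["match", "team", "goal"], "sports"),
    (["team", "player", "score"], "sports"),
]
TOY_DICT = {"mercado": ["market"], "precio": ["price"],
            "equipo": ["team"], "gol": ["goal"]}

model = train_nb(TRAIN_L1)
pred_finance = classify_nb(model, translate(["mercado", "precio"], TOY_DICT))
pred_sports = classify_nb(model, translate(["equipo", "gol"], TOY_DICT))
```

In this sketch the L₂ document is translated before classification, so the model itself never sees L₂ terms. Real bilingual dictionaries map one term to several candidates, and how to handle that ambiguity is exactly the kind of design choice a full CLTC technique has to address.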

Keywords:	Document management; Text mining; Text categorization; Cross-lingual text categorization
This article is indexed in ScienceDirect and other databases.
