首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Adaptive sampling for thresholding in document filtering and classification
Institution:1. Department of Computer Science and Engineering, Akshaya College of Engineering and Technology, Coimbatore, Tamil Nadu, India;2. Department of Information Technology, PSG College of Technology, Coimbatore, Tamil Nadu, India
Abstract:Document filtering (DF) and document classification (DC) are often integrated together to classify suitable documents into suitable categories. A popular way to achieve integrated DF and DC is to associate each category with a threshold. A document d may be classified into a category c only if its degree of acceptance (DOA) with respect to c is higher than the threshold of c. Therefore, tuning a proper threshold for each category is essential. A threshold that is too high (low) may mislead the classifier to reject (accept) too many documents. Unfortunately, thresholding is often based on the classifier's DOA estimations, which cannot always be reliable, due to two common phenomena: (1) the DOA estimations made by the classifier cannot always be correct, and (2) not all documents may be classified without any controversy. Unreliable estimations are actually noises that may mislead the thresholding process. In this paper, we present an adaptive and parameter-free technique AS4T to sample reliable DOA estimations for thresholding. AS4T operates by adapting to the classifier's status, without needing to define any parameters. Experimental results show that, by helping to derive more proper thresholds, AS4T may guide various classifiers to achieve significantly better and more stable performances under different circumstances. The contributions are of practical significance for real-world integrated DF and DC.
Keywords:
本文献已被 ScienceDirect 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号