Character contiguity in N-gram-based word matching: the case for Arabic text searching
Abstract: This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of word stemming, a text corpus of approximately 25,000 words (about 160 KB in total), and a set of 100 query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially when stemming was used. These results, together with the inconsistent findings of previous studies, raise questions about the efficiency of conventional, purely contiguous N-gram matching and about how it should be applied to languages other than English.
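The contrast between the two matching schemes described in the abstract can be illustrated with a minimal sketch, which is not taken from the paper: conventional matching compares only adjacent-character bigrams, while a hybrid scheme also includes skip-one (non-contiguous) bigrams before scoring word similarity. The Dice coefficient is used here as a commonly used similarity measure for N-gram word matching; the paper's exact formulation and example words below are assumptions.

```python
# Sketch of contiguous vs. hybrid (contiguous + skip-one) bigram matching.
# Not the paper's implementation; the similarity measure and examples are
# illustrative assumptions.

def contiguous_bigrams(word):
    """All adjacent character pairs of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def hybrid_bigrams(word):
    """Contiguous bigrams plus skip-one (non-contiguous) bigrams."""
    grams = contiguous_bigrams(word)
    grams |= {word[i] + word[i + 2] for i in range(len(word) - 2)}
    return grams

def dice_similarity(a, b, grams=contiguous_bigrams):
    """Dice coefficient over the chosen bigram sets of two words."""
    ga, gb = grams(a), grams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

if __name__ == "__main__":
    # Hypothetical Arabic surface forms sharing the root k-t-b.
    query, candidate = "كتاب", "مكتبة"
    print("contiguous:", dice_similarity(query, candidate, contiguous_bigrams))
    print("hybrid:    ", dice_similarity(query, candidate, hybrid_bigrams))
```

Because the hybrid set tolerates intervening characters (such as Arabic infixes), morphologically related forms tend to share more bigrams under the hybrid scheme, which is consistent with the improvement the abstract reports when stemming is applied.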
This document is indexed in ScienceDirect and other databases.