Character contiguity in N-gram-based word matching: the case for Arabic text searching
Abstract: This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of word stemming, a text corpus of approximately 25,000 words (about 160 KB in total), and a set of 100 query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially when stemming was used. These results, together with the inconsistent findings of previous studies, raise questions about the efficiency of conventional, purely contiguous N-gram matching and about how it should be applied to languages other than English.
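The contrast between the two matching schemes described in the abstract can be illustrated with a minimal sketch, which is not taken from the paper: conventional matching compares only adjacent-character bigrams, while a hybrid scheme also includes skip-one (non-contiguous) bigrams before scoring word similarity. The Dice coefficient is used here as a commonly used similarity measure for N-gram word matching; the paper's exact formulation and example words below are assumptions.

```python
# Sketch of contiguous vs. hybrid (contiguous + skip-one) bigram matching.
# Not the paper's implementation; the similarity measure and examples are
# illustrative assumptions.

def contiguous_bigrams(word):
    """All adjacent character pairs of a word."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def hybrid_bigrams(word):
    """Contiguous bigrams plus skip-one (non-contiguous) bigrams."""
    grams = contiguous_bigrams(word)
    grams |= {word[i] + word[i + 2] for i in range(len(word) - 2)}
    return grams

def dice_similarity(a, b, grams=contiguous_bigrams):
    """Dice coefficient over the chosen bigram sets of two words."""
    ga, gb = grams(a), grams(b)
    if not ga or not gb:
        return 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

if __name__ == "__main__":
    # Hypothetical Arabic surface forms sharing the root k-t-b.
    query, candidate = "كتاب", "مكتبة"
    print("contiguous:", dice_similarity(query, candidate, contiguous_bigrams))
    print("hybrid:    ", dice_similarity(query, candidate, hybrid_bigrams))
```

Because the hybrid set tolerates intervening characters (such as Arabic infixes), morphologically related forms tend to share more bigrams under the hybrid scheme, which is consistent with the improvement the abstract reports when stemming is applied.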
This document is indexed in ScienceDirect and other databases.