An Improved Text Feature Extraction Algorithm Based on N-Gram


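Cite as:	Yu Jinkai, Wang Yingxue, Chen Huaichu. An Improved Text Feature Extraction Algorithm Based on N-Gram[J]. Library and Information Service, 2004, 48(8): 48-50, 43.

Authors:	Yu Jinkai, Wang Yingxue, Chen Huaichu

Institution:	Computer & Information Management Center, Tsinghua University, Beijing 100084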

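Abstract:	This paper introduces an improved algorithm for text feature extraction and matching. The algorithm follows the N-Gram approach to text processing and feature extraction, and designs a gram correlative matrix to count and merge feature terms, so that feature terms of different lengths can be extracted on top of the fixed-length N-Gram algorithm. Experiments show that the algorithm describes text features more accurately and can be applied to information-processing tasks such as text retrieval and Web mining.
Keywords:	text feature extraction; N-Gram algorithm; gram correlative matrix
Received:	2003-11-03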
An Improved Text Feature Extraction Algorithm Based on N-Gram

Yu Jinkai, Wang Yingxue, Chen Huaichu. An Improved Text Feature Extraction Algorithm Based on N-Gram[J]. Library and Information Service, 2004, 48(8): 48-50, 43.

Authors:	Yu Jinkai, Wang Yingxue, Chen Huaichu

Institution:	Computer & Information Management Center, Tsinghua University, Beijing 100084

Abstract:	This paper introduces an improved text feature extraction algorithm based on the N-Gram approach. It designs a gram correlative matrix that merges consecutive bigrams into multigrams, which removes the fixed-length restriction of N-Gram extraction and yields variable-length multigram features.

Keywords:	text feature extraction N-Gram algorithm gram correlative matrix
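
The abstract does not give the merging rule, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes character bigrams, stores the gram correlative matrix sparsely as counts of adjacent bigram pairs, and merges a run of bigrams whenever each adjacent pair occurs at least `min_count` times. The function name `extract_multigrams` and the threshold `min_count` are hypothetical.

```python
from collections import Counter


def extract_multigrams(text, min_count=2):
    """Merge frequent consecutive bigrams into variable-length features.

    Assumption: two adjacent bigrams are merged when that pair occurs at
    least `min_count` times in the text. The paper's actual criterion is
    not stated in the abstract.
    """
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    # Sparse "correlative matrix": (bigram, following bigram) -> count.
    links = Counter(zip(bigrams, bigrams[1:]))

    features = Counter()
    i = 0
    while i < len(bigrams):
        j = i
        # Extend the run while the link to the next bigram is frequent enough.
        while (j + 1 < len(bigrams)
               and links[(bigrams[j], bigrams[j + 1])] >= min_count):
            j += 1
        if j > i:
            # Bigrams i..j cover characters i..j+1 of the text.
            features[text[i:j + 2]] += 1
        i = j + 1
    return features
```

For example, `extract_multigrams("abcxabcyabc")` merges the recurring bigrams `ab` and `bc` into the three-character feature `abc`. A dense matrix indexed by bigram would work equally well; the sparse counter is used here only to keep the sketch short.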
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.