
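Construction of a Domain-Specific Chinese-English Comparable Corpus for Bilingual Terminology Extraction
Cite this article: Kang Xiaoli, Zhang Chengzhi. Construction of a Domain-Specific Chinese-English Comparable Corpus for Bilingual Terminology Extraction[J]. New Technology of Library and Information Service, 2012(2): 28-33.
Authors: Kang Xiaoli, Zhang Chengzhi
Affiliations: Nanchang University Library; Department of Information Management, Nanjing University of Science and Technology
Funding: This work is one of the outputs of the National Natural Science Foundation of China project "Research on Multilingual Text Clustering Based on Comparable Corpora" (Grant No. 70903032) and of the Nanjing University of Science and Technology Independent Research Program project "Research on Multilingual Tag Clustering" (Grant No. 2011ZDJH15).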
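Abstract: With bilingual terminology extraction as the target application, this paper proposes a scheme for building domain-specific comparable corpora and validates it experimentally. For a given subject domain, Chinese and English domain corpora are collected separately, and Chinese and English keywords are extracted from each; further related keywords for the domain are then obtained from word co-occurrence statistics. Using these keywords as queries, candidate comparable documents are retrieved from the Web through an academic search engine. The candidates are then evaluated quantitatively to eliminate unsuitable documents, which finally yields a comparable corpus for the specific subject domain.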
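The keyword-expansion step (obtaining further related keywords from word co-occurrence statistics) can be illustrated with a minimal sketch. The abstract does not say which co-occurrence statistic the authors use; the sketch below assumes document-level co-occurrence scored with the Dice coefficient, and `expand_keywords`, `top_k` and `min_df` are hypothetical names. Documents are assumed to be tokenized already (Chinese text would need word segmentation first).

```python
from collections import Counter
from itertools import chain


def expand_keywords(docs, seeds, top_k=5, min_df=2):
    """Rank non-seed terms by document-level Dice coefficient with the seed set.

    Assumption (not from the paper): Dice = 2 * co(t) / (df(t) + df(seeds)),
    where co(t) counts documents containing t and at least one seed keyword.
    docs  : list of token lists, one per document (already segmented)
    seeds : set of seed keywords extracted from the sample corpus
    """
    doc_sets = [set(d) for d in docs]
    df = Counter(chain.from_iterable(doc_sets))        # document frequency per term
    seed_docs = sum(1 for s in doc_sets if s & seeds)  # documents containing any seed
    co = Counter()
    for s in doc_sets:
        if s & seeds:
            co.update(s - seeds)  # non-seed terms co-occurring with a seed
    scores = {
        t: 2 * c / (df[t] + seed_docs)
        for t, c in co.items()
        if df[t] >= min_df  # drop rare terms that are likely noise
    }
    # Highest score first; ties broken alphabetically for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:top_k]]
```

The expanded terms would be run separately on the Chinese and the English sample corpora; `min_df` filters out terms that appear in too few documents to be reliable query words.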

Keywords: Comparable corpus; Corpus construction; Bilingual terminology extraction

Chinese-English Comparable Corpus Construction for Bilingual Terminology Extraction
Kang Xiaoli,Zhang Chengzhi.Chinese-English Comparable Corpus Construction for Bilingual Terminology Extraction[J].New Technology of Library and Information Service,2012(2):28-33.
Authors:Kang Xiaoli  Zhang Chengzhi
Institution: 1 (Library of Nanchang University, Nanchang 330031, China); 2 (Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Abstract: This paper designs a process for building domain-specific comparable corpora for bilingual terminology extraction. First, bilingual sample corpora in a specific domain are collected, keywords are extracted from them, and further related keywords are obtained from word co-occurrence statistics. Then these keywords are submitted as queries to an academic search engine, and the retrieved results serve as candidate comparable corpora. Finally, the domain-specific comparable corpus is obtained after noisy documents are filtered out through quantitative evaluation.
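The final filtering step can likewise be sketched. The abstract only says that candidates are "evaluated quantitatively"; here, as an assumption, each candidate is scored by the cosine similarity between its term-frequency vector and the aggregated term profile of the sample corpus, and anything below a hypothetical `threshold` is dropped. `filter_candidates` and `cosine` are illustrative names, not the authors' code.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0  # empty vectors score 0


def filter_candidates(candidates, sample_docs, threshold=0.3):
    """Keep candidate documents whose vocabulary resembles the sample corpus.

    Assumption (not from the paper): relevance = cosine(candidate TF, sample TF).
    candidates  : dict mapping document id -> token list
    sample_docs : list of token lists from the domain sample corpus
    """
    profile = Counter()
    for doc in sample_docs:
        profile.update(doc)  # aggregate term frequencies of the sample corpus
    kept = []
    for doc_id, tokens in candidates.items():
        score = cosine(Counter(tokens), profile)
        if score >= threshold:
            kept.append((doc_id, score))
    # Most relevant candidates first.
    return sorted(kept, key=lambda x: -x[1])
```

In practice the threshold would have to be tuned per domain and per language; a candidate sharing no vocabulary with the sample corpus always scores 0 and is discarded.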
Keywords: Comparable corpus; Corpus construction; Bilingual terminology extraction
This article is indexed in CNKI and other databases.