Similar literature
 Found 20 similar documents; search took 78 ms
1.
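This paper presents a term extraction method based on the longest common substring (LCS) algorithm: domain documents are segmented at punctuation marks; all longest common substrings of the resulting sentence fragments are extracted as the candidate term set; and rules such as stop-word filtering, screening against a domain lexicon, and screening of nested term substrings are applied to obtain the final term set. Experiments on term extraction in the preschool education domain show that the algorithm extracts Chinese domain terms effectively: average precision reaches 84.2%, and performance is best for two-word terms of 4 to 6 characters, where precision approaches 100%.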
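The pipeline described in this abstract (split at punctuation, take common substrings of the fragments as candidates, then filter) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the punctuation set, the `min_len` threshold, and the choice to keep only the single longest common substring per fragment pair are all assumptions made here, and the domain-lexicon and nested-substring filters are omitted.

```python
import re
from itertools import combinations


def split_fragments(text):
    """Split text into fragments at common Chinese/ASCII punctuation and whitespace."""
    return [f for f in re.split(r"[,。;:!?、,.;:!?\s]+", text) if f]


def longest_common_substring(a, b):
    """Classic dynamic-programming longest common substring, O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def candidate_terms(text, min_len=2, stopwords=frozenset()):
    """Collect one candidate per fragment pair, dropping short strings and stop words."""
    candidates = set()
    for a, b in combinations(split_fragments(text), 2):
        s = longest_common_substring(a, b)
        if len(s) >= min_len and s not in stopwords:
            candidates.add(s)
    return candidates
```

For example, `candidate_terms("学前教育很重要,发展学前教育,重视教育公平")` yields the candidates `学前教育` and `教育`; a nested-substring rule of the kind the abstract mentions would then decide whether the shorter one is kept.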

2.
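This study investigates methods for extracting scientific and technical terms, beyond author keywords, from the text of scientific papers. Because of the indexing effect, using only a paper's keywords as candidate terms limits the size and quality of a term base, so terms need to be extracted from the paper text itself. Most existing term extraction methods emphasize termhood measures and neglect unithood measures. To address this, the study builds on the C-value algorithm and proposes Chinese term formation rules for generating candidate terms and a unithood measure of the internal cohesion of a term, enabling extraction of Chinese scientific and technical terms from paper text. The method is validated on term extraction in the information resource management domain; the experimental results show that it extracts domain terms effectively and with relatively high precision.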

3.
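To make full use of knowledge organization in corporate patent strategy, and based on an analysis of patent documents and the syntactic characteristics of Chinese patent text, a semi-automatic domain ontology construction system is designed using algorithms including maximum string-frequency matching, ant colony clustering, multi-level KMeans clustering, improved association rule computation, and rule-based and CRF-based term relation extraction. The system comprises modules for term extraction, taxonomic relation extraction, non-taxonomic relation extraction, and ontology formalization, and provides an initial implementation of semi-automatic ontology construction from structured data and unstructured text.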

4.
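Social network analysis is applied to the problem of multi-attribute association rule mining, offering a new perspective on this class of problems. First, the problem is introduced through the pairing of different beer brands with diapers of different colors, and it is noted that this class of problems also covers a broad range of evaluation and recommendation problems. Then, from the social network analysis perspective, a graph model and its equivalent matrix are constructed, and analysis of the graph and matrix leads to a method for multi-attribute association rule mining. To make the method easier to express programmatically, the existing methods are unified by introducing an "indicator vector", which facilitates a recursive implementation. Finally, the algorithm steps are given, and the method is evaluated empirically on a data set on the scale of 100,000 evaluations. The results show that the social network analysis perspective makes abstract association rule mining visual and eases the introduction of matrix representations, so that the resulting method has low algorithmic complexity and is intuitive and easy to grasp, giving it advantages over existing multi-attribute association rule mining algorithms.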

5.
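A new method for automatically extracting terms for a government-affairs ontology is proposed. Words are first extracted from government documents as candidate terms using Chinese word segmentation and single-character merging; the candidates are then filtered using the C-value method and the TF-IDF algorithm, achieving automatic extraction of terms in the government-affairs domain. Comparative experiments show that the method improves the precision of the extracted terms without reducing recall.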
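As a rough illustration of the TF-IDF filtering step mentioned in this abstract, the sketch below scores already-segmented candidate terms per document. It assumes the plain textbook weighting (relative term frequency times the natural log of inverse document frequency); the abstract does not say which TF-IDF variant or threshold the authors used, and the function name and input format are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: score} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

With this weighting, a term that occurs in every document scores 0, so candidates below a chosen cutoff could be discarded as too general to be domain terms.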

6.
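An extraction method for patent technical terms   Total citations: 2 (self-citations: 0; citations by others: 2)
To address the lack of technical keywords in patents, and building on a study of the main term extraction methods, the C-value method is adopted with modified term formation rules and a modified termhood formula; a PC-value score measures the termhood of a word or phrase. A process model for extracting patent technical terms is proposed and used to extract technical terms from patents. The model has four stages: (1) word segmentation and part-of-speech tagging; (2) applying linguistic rules to obtain a list of possible terms; (3) computing termhood scores to obtain a list of candidate terms; (4) evaluation and confirmation of terms by domain experts. Experimental results show that the method extracts Chinese patent technical terms well and outperforms the C-value method in extracting long terms and in precision.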
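For orientation, the baseline that this abstract modifies can be sketched as below. This is the commonly cited form of C-value, not the paper's PC-value, whose formula is not given in the abstract: a candidate's frequency is reduced by the mean frequency of the longer candidates that contain it, then weighted by the base-2 log of its length in words. Representing terms as tuples of words and the helper names are assumptions for illustration.

```python
import math


def is_nested(short, long):
    """True if `short` occurs as a contiguous, strictly shorter sub-sequence of `long`."""
    n, m = len(short), len(long)
    return n < m and any(long[i:i + n] == short for i in range(m - n + 1))


def c_value(freq):
    """freq: {term (tuple of words): corpus frequency}. Returns {term: C-value}."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of the longer candidate terms that contain `a`.
        containing = [f_b for b, f_b in freq.items() if is_nested(a, b)]
        if containing:
            f_a = f_a - sum(containing) / len(containing)
        scores[a] = math.log2(len(a)) * f_a
    return scores
```

Note that single-word candidates always score 0 under this form because log2(1) = 0, and that the length weighting alone does not guarantee good results on long terms, which is the weakness the abstract says PC-value improves on.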

7.
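A method for extracting information from the Web   Total citations: 1 (self-citations: 0; citations by others: 1)
韩立新, 谢立. 《情报学报》, 2004, 23(1): 45-51
Because much of the information on the WWW is stored in HTML pages, extracting useful information from HTML documents is a pressing problem. This paper proposes a method for extracting information from HTML documents that combines association rules, pattern matching, grammar rules, and clustering, and thereby better addresses the low accuracy, poor generality, and heavy manual intervention of existing extraction methods.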

8.
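Term extraction is the foundation of domain ontology construction and determines the quality of the resulting ontology. Extracted terms must not only be accurately identified as phrases but also have high domain relevance. This paper studies a method for filtering terms by domain relevance that does not depend on a background corpus. The work focuses on two aspects: first, candidate terms (phrases) are extracted from a domain corpus by combining statistics and rules; second, a multi-strategy term extraction method is proposed that scores candidate terms by their distribution, activity, and topicality, and is validated and analyzed experimentally. Validation experiments on a small aerospace corpus show that the method effectively improves the quality of domain term extraction without a large increase in computational time complexity, with fairly satisfactory results.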

9.
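Motivated by practical needs in information analysis, this study examines term extraction, rare-term identification, and field comparison on 5,405 patent records related to electric vehicles. The results show that key-phrase extraction is feasible, and that the documents containing terms extracted by mutual information have an average length closer to the average document length of the collection; terms from the abstract and First Claim fields differ somewhat but are equally important for classification or clustering; and the rare-term identification algorithm can discover correspondences between rare words and high-frequency words. The findings provide results and methods for patent text mining and patent information analysis, and supply reference terms needed for information analysis work.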

10.
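侯丽, 李姣, 侯震, 陈松景. 《图书情报工作》, 2015, 59(23): 115-123
[Purpose/Significance] To discover the health terms used by the public from Internet query data, providing a basis for mapping consumer health terms to professional medical terms and, in turn, improving knowledge organization and management on health knowledge service platforms. [Method/Process] A model combining rules with N-Gram is designed to identify new health terms; public query data are collected and experiments are run; over repeated experiments the filtering corpus is progressively refined and, together with manual review, the approach is continually optimized and its effectiveness verified. [Result/Conclusion] Rules are extracted from questions posed by the public on the Internet and combined with statistical algorithms to extract new health terms used by the public. The approach is reasonably general for identifying consumer health terms and can provide a data basis for mapping public terms to medical terms. The experiments show that rule-based preprocessing of public log data yields good input text for the subsequent steps, and that a term identification method combining N-Gram with various filtering rules identifies new words in short texts well.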
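The N-Gram part of such an approach might look roughly like the sketch below, which counts character n-grams across short queries and keeps those that recur. It is only an illustration under assumptions made here: the function names, the n-gram length range, the `min_count` threshold, and the single stop-character filter are hypothetical stand-ins for the paper's rules and filtering corpus, which the abstract does not specify.

```python
from collections import Counter


def char_ngrams(text, n):
    """All contiguous character n-grams of `text` (empty list if text is shorter than n)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def frequent_ngrams(queries, n_min=2, n_max=4, min_count=2, stop_chars=frozenset()):
    """Count n-grams over all queries; keep those seen at least `min_count` times."""
    counts = Counter()
    for query in queries:
        for n in range(n_min, n_max + 1):
            # Skip any n-gram containing a stop character (e.g. function words).
            counts.update(g for g in char_ngrams(query, n) if not (set(g) & stop_chars))
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

For instance, `frequent_ngrams(["高血压吃什么药", "高血压怎么办"], n_max=3)` keeps `高血`, `血压`, and `高血压`, each with count 2; further rules would be needed to discard fragments such as `高血` that are not terms on their own.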

11.
This article surveys a sample of the sources of information about Romania available to British readers in nineteenth-century British newspapers and periodicals. It traces the first contacts between the Romanian lands and Britain after the union of the principalities of Wallachia and Moldavia in 1859, and then after their independence from the Ottoman Empire. The article highlights a growing interest in Romania among British periodicals, which reported on and reviewed Romanian literature and scholarship. The article concludes that nineteenth-century British newspapers and periodicals offer a great variety and wealth of material previously unavailable or unknown to researchers. It also notes that only a portion of this material has been indexed and is therefore available via the bibliographic sources mentioned in the article. The author argues for a new and updated British-Romanian bibliography, which can draw on new online resources offering access to thousands of newspaper and periodical records.

12.
ABSTRACT

The paper looks at library approval plans for material published in Slavic, East European, and Eurasian countries from the selector's point of view. Reasons why a selector would or would not want one are examined. Success with approval plans requires monitoring receipts, as well as good and ongoing communication among the selector, the acquisitions department, and the vendor. A preliminary list of vendors offering approval plans for the countries of the region appears in the appendix.

13.
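To further improve the utilization of the Wuhan Science and Technology Information Sharing Service Platform, this paper analyzes the current state of Wuhan's science and technology information resource services in terms of platform resource development, resource use, supply-demand matching methods, and supply-demand characteristics. From the perspective of demand and use, and drawing on platform management practice, user visits, and questionnaire surveys, it identifies the main factors constraining the matching of science and technology resource supply and demand along two dimensions: the parties who need information resources, and the platform's own development and management. Taking marketization and institutionalization as guiding ideas for innovation, it proposes measures for shifting platform development from "resource aggregation" to "demand orientation", covering policy innovation, mechanism innovation, market-oriented services, environment building, and talent development.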

14.
ABSTRACT

The article examines the most important periodicals of ethnic minorities in Poland. After 1989, many ethnic groups (e.g., Germans and Romanies) were allowed to publish journals and newspapers for the first time since the end of World War II. The publications examined show the rich cultural life of the various ethnic groups as well as their current status in Poland. In addition to popular titles, some scholarly publications are also discussed.

15.
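Based on an analysis of how word usage in the literature tends to show temporal characteristics at different research stages, a topic-model-based method for identifying stages of research development is proposed. The construction of the method is described in detail, including temporal feature extraction, definition of development stages, and analysis of topics becoming hot or cold. To verify the method's effectiveness, word-frequency statistics and the topic model approach are compared for topic evolution analysis. The results show that the method identifies topic hot spots and development trends while also effectively distinguishing the research development stages reflected by different topics.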

16.
ABSTRACT

In the current political environment, raising support for millage renewals, bond campaigns, or even millage continuations for public libraries is affected by national politics and by a nationwide tendency toward tax aversion on principle. The lessons we can learn from the governing boards of small and rural public libraries are worth raising to a greater national consciousness. Board governance, community consciousness, and facilities management are clear and logical tools that elected boards of public libraries can use to politick for support of libraries.

17.
The author answers a reference question on bibliographic sources for the Ukrainian periodical press 1840–1850. Helpful publications include bibliographies, guides, and library catalogs. These potentially make mention of revolutionary developments in Hungary (such as the Twelve Points paragraph of the Demands of the Hungarian Nation in March 1848, the subsequent April Laws, and Hungary's declaration of independence in April 1849), and elsewhere in the Hapsburg Empire.

18.
Chromebooks and the G Suite group of products, like Google Search, Gmail, and Google Docs, have rapidly expanded in American schools during the past 5 years. The impact of one-to-one Chromebook devices and the pervasive use of Google's software products in American education cannot be overstated. This article explores some of the influences of these products on research, based on the experiences of a librarian and technology coordinator at an elementary education level. The author has several suggestions for effective research with these products in mind.

19.
Library historians can learn a great deal from studying the gender of authorship and institutional affiliation of a scholarly journal. The focus of this study is to examine these two aspects of journal production in library history to see who is producing published research in this field. Twenty-three years of Libraries & Culture were chosen as the target volumes. The study reveals that more men than women published in library history and identifies which institutions were represented. This type of information is useful to the library historian engaged in the analysis of published scholarship, and more generally to scholars with an interest in patterns of literature production in fields closely related to the social and behavioral sciences.

20.
ABSTRACT

The history of the almanac in Croatia is reconstructed through primary research in bibliographic and archival sources. The almanac is a vehicle for knowledge communication in informal contexts, engaging both oral tradition and literary forms traceable to medieval literacy and ways of structuring knowledge. The history of the almanac in Croatia reflects the changing context of the book trade, literacy, and the evolution of language. Four main stages are identified: (1) the beginning of the annual almanac in the seventeenth century; astrological almanacs reflecting the sensibility of the Baroque period; (2) the Enlightenment's stimulation of almanac publishing in the spirit of contemporary secular reforms in agriculture and education; (3) nineteenth- and twentieth-century almanac trade, showing complex and overlapping networks for the production, distribution and appropriation of printed almanacs; (4) roughly the end of World War II, when the almanac slowly moved out of the role of a popular mass medium and into specialized niches represented by regional, diaspora, and religious almanacs.
