Similar Articles
20 similar articles found.
1.
Peng Yufang, Chen Jianghao. Information Science (《情报科学》), 2022, 39(1): 141-147
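[Purpose/Significance] Extracting the target data that researchers need from the vast body of academic literature both improves researchers' efficiency and helps improve the retrieval services of current literature databases. [Method/Process] Starting from researchers' academic needs, target data are first extracted from a large volume of academic literature using deep learning. Next, NER and TF-IDF are used to extract the "5W" rules of the target data, and a second layer of requirement-rule filtering is applied: any data that satisfy the "5W" rules are identified as target data. Finally, a third layer of manual verification produces the final academic-literature "target data". [Results/Conclusion] The "target data" extraction model built in this paper reaches an accuracy of up to 0.88. Combined with "5W" rule filtering and the final manual verification, it not only helps researchers improve the precision of their literature searches but also, to some extent, supports the retrieval work of literature database providers. [Innovation/Limitations] By combining deep learning with requirement rules, the approach moves literature retrieval results from the level of bibliographic records down to the level of the data inside the literature content.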
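
The TF-IDF step and the "5W" filter can be illustrated with a minimal sketch. The abstract does not state which TF-IDF weighting the authors used, so the smoothed IDF below is an assumption, and reading "5W" as who/what/when/where/why slots filled by an NER step is likewise an assumption; `passes_5w` is a hypothetical helper, not the paper's rule set.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: score} dict per document.

    docs: list of token lists. Term frequency is normalised by document
    length; IDF uses the smoothed form log((1 + N) / (1 + df)) + 1
    (an assumed variant, not necessarily the paper's).
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per doc
    scores = []
    for tokens in docs:
        tf = Counter(tokens)
        length = len(tokens) or 1
        scores.append({
            t: (c / length) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return scores


def passes_5w(entities, required=("who", "what", "when", "where", "why")):
    """Hypothetical second-layer filter: keep a candidate only if the NER
    step filled every assumed "5W" slot with a non-empty value."""
    return all(entities.get(slot) for slot in required)
```

Terms that occur in fewer documents get a larger IDF factor, so in a three-document corpus a term unique to one document outranks a term shared by two, at equal term frequency.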

2.
Digital libraries of scientific articles contain collections of digital objects that are usually described by bibliographic metadata records. These records can be acquired from different sources and be represented using several metadata standards. These metadata standards may be heterogeneous in both content and structure. As a result, many records may be duplicated in the repository, which affects the quality of services such as searching and browsing. In this article we present an approach that identifies duplicated bibliographic metadata records in an efficient and effective way. We propose similarity functions especially designed for the digital library domain and experimentally evaluate them. Our results show that the proposed functions improve the quality of metadata deduplication by up to 188% compared to four different baselines. We also show that our approach achieves results statistically equivalent to those of a state-of-the-art method for replica identification based on genetic programming, without the burden and cost of any training process.
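
The abstract does not give the proposed similarity functions, so the following is only a generic sketch of field-wise record matching: token Jaccard on titles, surname overlap on authors, and an exact year match, combined with weights and a threshold. The record schema, the weights (0.6/0.3/0.1) and the 0.8 threshold are all hypothetical choices for illustration.

```python
import re


def tokens(text):
    """Lower-case alphanumeric tokens, so case and punctuation differences
    between metadata sources are ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def record_similarity(r1, r2, w_title=0.6, w_authors=0.3, w_year=0.1):
    """Weighted field-wise similarity of two records given as dicts with
    'title', 'authors' (list of names) and 'year' keys (hypothetical schema)."""
    title = jaccard(tokens(r1["title"]), tokens(r2["title"]))
    # Compare authors by surname only, assuming a "Surname, Given" form.
    s1 = {a.split(",")[0].strip().lower() for a in r1["authors"]}
    s2 = {a.split(",")[0].strip().lower() for a in r2["authors"]}
    authors = jaccard(s1, s2)
    year = 1.0 if r1.get("year") == r2.get("year") else 0.0
    return w_title * title + w_authors * authors + w_year * year


def is_duplicate(r1, r2, threshold=0.8):
    """Flag a pair as duplicates when the combined score reaches the threshold."""
    return record_similarity(r1, r2) >= threshold
```

Comparing surnames rather than full name strings tolerates the common "J." versus "John" variation between sources, at the cost of merging different authors who share a surname.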

3.
Zhang Xia, Tan Lijuan, Zhou Rui. Journal of Modern Information (《现代情报》), 2011, 31(8): 134-137
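Based on a comparative analysis of the bibliographic export interfaces and export methods of Chinese and foreign online databases, this paper takes the journal-article output format of the CNKI (China National Knowledge Infrastructure) Academic Literature Database and the reference management software NoteExpress as examples. It analyzes in detail how journal articles are exported from online databases, how bibliographic records are imported into NoteExpress, and how those records are output after import. Using NoteExpress's "online update of bibliographic records" feature and modified output styles, it then improves the bibliographic information in sci-tech novelty search results.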
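
As an illustration of what a bibliographic export record looks like, here is a minimal sketch that serializes a journal-article record in RIS, a widely supported plain-text interchange format for reference managers. Whether a given CNKI export or NoteExpress version uses RIS is an assumption, and the record schema (dict keys such as `start_page`) is hypothetical.

```python
def to_ris(record):
    """Serialize a journal-article record (dict, hypothetical schema) as RIS.

    Each RIS line is a two-letter tag, two spaces, a hyphen, a space and the
    value; a record starts with TY and ends with ER.
    """
    lines = ["TY  - JOUR"]
    lines += [f"AU  - {a}" for a in record.get("authors", [])]
    for tag, key in (("TI", "title"), ("JO", "journal"), ("PY", "year"),
                     ("VL", "volume"), ("IS", "issue"),
                     ("SP", "start_page"), ("EP", "end_page")):
        if record.get(key):  # skip fields the source database did not supply
            lines.append(f"{tag}  - {record[key]}")
    lines.append("ER  - ")
    return "\n".join(lines)
```

Fields missing from an export are simply omitted, which is one reason imported records may later need the kind of online updating the paper describes.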

4.
5.
This paper is concerned with the mechanisms through which medical knowledge emerges, grows and transforms itself. It is a large-scale empirical analysis of the development of treatments for coronary artery disease, which is the most common cause of death in developed countries. We uncover the structure of medical understanding of the disease and the path-dependent co-evolution of scientific and technical knowledge in the search for solutions to the relevant set of problems. After reviewing a broad range of secondary sources and conducting a number of interviews with leading clinicians, we use new tools recently developed for the longitudinal analysis of large citation networks. We apply them to a bibliographic database of 11,240 papers published in the area of coronary artery disease between 1979 and 2003 and to a patent dataset of 5136 US patent documents granted between 1976 and 2003 for angioplasty-related devices. The results are consistent maps of the major scientific and technological trajectories associated with one of the most important medical procedures of the last 30 years, which we critically discuss.
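
The abstract does not name its citation-network tools. One widely used technique for tracing trajectories in such networks is main path analysis with search path count (SPC) edge weights; the sketch below computes SPC for a small acyclic citation graph and is offered as an illustration of that technique, not as the authors' exact method. It assumes Python 3.9+ for `graphlib`.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def search_path_counts(edges):
    """SPC weight for each edge of a citation DAG.

    edges: iterable of (cited, citing) pairs, so arcs follow the direction of
    knowledge flow. SPC(u, v) = (number of paths from any source to u) *
    (number of paths from v to any sink). Raises graphlib.CycleError if the
    graph is not acyclic.
    """
    edges = list(edges)
    preds, succs = defaultdict(list), defaultdict(list)
    nodes = set()
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        nodes.update((u, v))
    # static_order() yields every node after all of its predecessors.
    order = list(TopologicalSorter({n: preds[n] for n in nodes}).static_order())
    n_minus = {}
    for n in order:  # forward pass: a source node counts as one path
        n_minus[n] = sum(n_minus[p] for p in preds[n]) or 1
    n_plus = {}
    for n in reversed(order):  # backward pass: a sink node counts as one path
        n_plus[n] = sum(n_plus[s] for s in succs[n]) or 1
    return {(u, v): n_minus[u] * n_plus[v] for u, v in edges}
```

A main path is then typically traced by starting from a source and repeatedly following the outgoing edge with the highest SPC until a sink is reached.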

6.
Bibliographic collections in traditional libraries often compile records from distributed sources where variable criteria have been applied to the normalization of the data. Furthermore, the source records often follow classical standards, such as MARC21, where a strict normalization of author names is not enforced. The identification of equivalent records in large catalogues is therefore required, for example, when migrating the data to new repositories which apply modern specifications for cataloguing, such as the FRBR and RDA standards. An open-source tool has been implemented to assist authority control in bibliographic catalogues when external features (such as the citations found in scientific articles) are not available for the disambiguation of creator names. This tool is based on similarity measures between the variants of author names combined with a parser which interprets the dates and periods associated with the creator. An efficient data structure (the unigram frequency vector trie) has been used to accelerate the identification of variants. The algorithms employed and the attribute grammar are described in detail and their implementation is distributed as an open-source resource to allow for an easier uptake.
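
The abstract names the unigram frequency vector trie but not its layout. One plausible reading, sketched below purely as an assumption: each name becomes a vector of letter counts, the trie branches on one letter count per level, and a search walks only the branches whose accumulated L1 distance to the query stays within a bound `k`. The class `UnigramTrie` is hypothetical, and names are assumed to be already transliterated to a to z.

```python
import string

ALPHABET = string.ascii_lowercase  # assumption: names already transliterated to a-z


def unigram_vector(name):
    """Character-unigram counts of a name over a-z (other characters ignored)."""
    name = name.lower()
    return tuple(name.count(c) for c in ALPHABET)


class UnigramTrie:
    """Hypothetical trie whose i-th level branches on the count of the i-th letter."""

    def __init__(self):
        self.root = {}

    def insert(self, name):
        node = self.root
        for count in unigram_vector(name):
            node = node.setdefault(count, {})
        node.setdefault(None, []).append(name)  # the leaf stores the names

    def search(self, name, k):
        """Stored names whose unigram vectors lie within L1 distance k of name."""
        query = unigram_vector(name)
        results = []

        def walk(node, depth, dist):
            if depth == len(query):
                results.extend(node.get(None, []))
                return
            for count, child in node.items():
                d = dist + abs(count - query[depth])
                if d <= k:  # prune any branch that is already too far away
                    walk(child, depth + 1, d)

        walk(self.root, 0, 0)
        return results
```

Pruning works because the L1 distance can only grow as more letters are compared. Anagrams share a vector, so candidates returned this way would still need a string-level similarity check.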

7.
The database to be used with an online bibliographic information system must meet a number of requirements which are often not satisfied by conventional database management systems. Most important of these is the requirement for full authority file control over the indexes to the database. This paper reviews the special requirements of a bibliographic database and shows how they are met in the database system of DOBIS-LIBIS (Dortmund Library System-Leuven Library System).
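
The abstract does not describe how DOBIS-LIBIS implements authority control. The hypothetical `AuthorityFile` class below only illustrates the general principle: records are indexed by an authority identifier rather than by heading text, so every variant form retrieves the same records, and changing the authorized form updates every linked record at once.

```python
class AuthorityFile:
    """Minimal illustration of authority control (hypothetical, not DOBIS-LIBIS)."""

    def __init__(self):
        self.headings = {}  # authority id -> authorized form
        self.variants = {}  # normalized form (authorized or variant) -> authority id
        self.postings = {}  # authority id -> set of record ids

    @staticmethod
    def _norm(text):
        """Case-fold and collapse whitespace before lookup."""
        return " ".join(text.lower().split())

    def add_heading(self, auth_id, authorized, variants=()):
        self.headings[auth_id] = authorized
        for form in (authorized, *variants):
            self.variants[self._norm(form)] = auth_id

    def index_record(self, record_id, heading):
        """Link a record to the authority entry matching heading."""
        auth_id = self.variants.get(self._norm(heading))
        if auth_id is None:
            raise KeyError(f"no authority entry for {heading!r}")
        self.postings.setdefault(auth_id, set()).add(record_id)
        return auth_id

    def search(self, heading):
        """All records linked to the entry for heading (empty set if unknown)."""
        auth_id = self.variants.get(self._norm(heading))
        return self.postings.get(auth_id, set())

    def rename(self, auth_id, new_form):
        """Change the authorized form; linked records need no rewriting."""
        self.variants[self._norm(new_form)] = auth_id
        self.headings[auth_id] = new_form
```

Keeping the old form in `variants` after a rename is a deliberate choice here: searches on the former heading keep working as a cross-reference.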

8.
A prototype system is created that integrates a microfiche catalog into an online computer system for bibliographic control. Costs and operational data are collected and analyzed. The system permits the more economical microfiche storage of catalog records than would be feasible for comparable online magnetic disk storage. Experimental tests demonstrate the feasibility of the online microfiche catalog system for use in library technical services and retrieval of bibliographic data. The primary result of the project is the creation of a completely operational facility, including all equipment, software, procedures, and data bases necessary to demonstrate the system. A second set of results is derived from the experimental use of the system and the evaluation of costs and times for various operations. The cost effectiveness of the online microfiche catalog is demonstrated.

9.
Xie Fahui. Journal of Modern Information (《现代情报》), 2011, 31(6): 80-82
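In designing and developing an online store system, accessing the data of the original management system normally requires an API for database access. Without API support, this work combined database design, data export, data modification, and data import with ASP technology to migrate the original management system's data into the online store's database, keeping the store system's bibliographic data synchronized with that of the original management system.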
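
The original work used ASP. The sketch below uses Python's built-in `sqlite3` module only to show the export, modify, import sequence described above. The table and column names (`books`, `products`, `sku`) and the sample row are hypothetical.

```python
import sqlite3


def make_demo_databases():
    """In-memory stand-ins for the legacy and store databases (made-up sample data)."""
    src = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE books (isbn TEXT, title TEXT, author TEXT)")
    src.execute("INSERT INTO books VALUES (?, ?, ?)",
                ("978-7-111-11111-1", "  Example Title ", "Li Ming"))
    src.commit()
    dst = sqlite3.connect(":memory:")
    dst.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, author TEXT)")
    return src, dst


def migrate(src, dst):
    """Export rows from the legacy table, reshape them, import them into the store."""
    # 1. Export: read the bibliographic rows from the legacy database.
    rows = src.execute("SELECT isbn, title, author FROM books").fetchall()
    # 2. Modify: map legacy fields onto the store schema and clean the values.
    converted = [(isbn.replace("-", ""), title.strip(), author.strip())
                 for isbn, title, author in rows]
    # 3. Import: upsert on the primary key so re-running keeps both sides in sync.
    dst.executemany(
        "INSERT OR REPLACE INTO products (sku, name, author) VALUES (?, ?, ?)",
        converted,
    )
    dst.commit()
    return len(converted)
```

Making the import an upsert keyed on the product identifier means the same job can be rerun on a schedule for ongoing synchronization instead of a one-off copy.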

10.
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law which estimates the number of words of a particular frequency for a given author and text. His formulation has been adopted as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text, and in its inclusion of all searchable elements from the system's inverted file. This model is applied to the National Library of Medicine's MEDLINE. The model carries implications for the determination of database storage requirements, search response time, and search exhaustiveness.
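
The abstract does not reproduce Booth's formula. The form usually cited for Booth's low-frequency law is I_n / I_1 = 2 / (n(n + 1)), where I_n is the number of distinct words occurring exactly n times. The sketch applies that form and turns it into a rough postings estimate; this is an illustration of the law, not the paper's full model.

```python
def booth_expected_counts(i1, max_n):
    """Expected number of distinct words occurring exactly n times, for
    n = 1..max_n, using I_n = I_1 * 2 / (n * (n + 1))."""
    return {n: i1 * 2 / (n * (n + 1)) for n in range(1, max_n + 1)}


def estimated_postings(i1, max_n):
    """Total occurrences (inverted-file postings) contributed by words of
    frequency 1..max_n: the sum of n * I_n."""
    counts = booth_expected_counts(i1, max_n)
    return sum(n * count for n, count in counts.items())
```

Summed over all n, 2/(n(n + 1)) totals 2, so under this law the vocabulary is about 2 I_1: roughly half of all distinct terms occur only once. Misspellings in free text would fall disproportionately into that singleton band.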

11.
A prefix trie index (originally called trie hashing) is applied to the problem of providing fast search times, fast load times and fast update properties in a bibliographic or full text retrieval system. For all but the largest dictionaries a single key search in the dictionary under trie hashing takes exactly one disk read. Front compression of search keys is used to enhance performance. Partial combining of the postings into the dictionary is analyzed as a method to give both faster retrieval and improved update properties for the trie hashing inverted file. Statistics are given for a test database consisting of an online catalog at the Graduate School of Library and Information Science Library of the University of Western Ontario. The effects of changing various parameters of prefix tries are tested in this application.
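
Front compression is a standard technique for sorted key lists. Each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix. The sketch shows the encoding and its inverse. The paper's on-disk layout is not described in the abstract, so this covers only the core idea.

```python
def front_encode(sorted_keys):
    """Encode sorted keys as (shared, suffix) pairs, where shared is the
    length of the prefix each key has in common with the previous key."""
    encoded, prev = [], ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def front_decode(encoded):
    """Rebuild the original keys by reusing the previous key's prefix."""
    keys, prev = [], ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        keys.append(prev)
    return keys
```

Because neighbouring dictionary terms in sorted order often share long prefixes (librarian, libraries, library), more keys fit in each disk block. That helps keep a single key search to one disk read.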

12.
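[Purpose/Significance] Advances in networking and information technology have changed the information environment. Data sources are increasingly diverse, and science and technology (S&T) data of different dimensions each emphasize different kinds of intelligence, so research on intelligence-perception methods for multi-dimensional S&T data is necessary. In recent years, leading countries have placed great emphasis on life sciences research and have strengthened forward-looking planning and research investment in the field. Foresight research on S&T development in the life sciences therefore has practical significance. [Method/Process] Building on the concept of intelligence perception, the study proposes a combined quantitative and qualitative intelligence-perception method for foresight research and analysis of S&T development along three dimensions: strategic R&D plans, R&D investment in S&T projects, and scientific papers and patents. The data used are publicly available open-source data, drawn mainly from the Internet and S&T literature databases. [Results/Conclusion] By analyzing genuine, valid S&T data along the three dimensions, and by synthesizing the results across dimensions together with expert opinion, the study arrives at intelligence-perception conclusions for foresight on S&T development in the life sciences.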

13.
The national standard machine-readable bibliographic format for films was expanded and modified for use in cataloging a collection of medical illustrations at the University of California, San Francisco. SNOMED and MeSH subject terms are stored in the computer catalog records and used to produce precoordinated entry terms for an Anatomic Index and a Disease/Procedures Index to the collection.

14.
The strategies, heuristics, and tradeoffs involved in online searching of bibliographic citation networks are discussed. Results from the field of citation analysis are used to describe the mathematical structure of citation networks. An experimental environment called DBASE (Data Base Access and Search Environment) is discussed and its use in two studies of human information seeking behavior considered. Variables examined in these studies included the nature of the search question and the interconnectivity of the data base.
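
Two basic moves in searching a citation network are backward chaining, which follows a paper's references, and forward chaining, which follows the papers that cite it. The sketch implements both as a depth-limited breadth-first search. The helper `chain` and its dict-based graph are hypothetical and are not part of DBASE.

```python
from collections import deque


def chain(start, references, max_depth=2, direction="backward"):
    """Depth-limited breadth-first citation chaining from a seed paper.

    references: dict mapping each paper to the list of papers it cites.
    direction "backward" follows references; "forward" follows citing papers.
    Returns {paper: depth} for every paper reached within max_depth steps.
    """
    if direction == "forward":
        graph = {}
        for paper, cited in references.items():
            for c in cited:  # invert the edges: cited paper -> citing papers
                graph.setdefault(c, []).append(paper)
    else:
        graph = references
    seen = {start: 0}
    queue = deque([start])
    while queue:
        paper = queue.popleft()
        if seen[paper] == max_depth:
            continue  # do not expand beyond the depth limit
        for nxt in graph.get(paper, []):
            if nxt not in seen:
                seen[nxt] = seen[paper] + 1
                queue.append(nxt)
    return seen
```

The interconnectivity variable studied above shows up directly here: in a denser network each extra level of depth can enlarge the result set much faster, which is the tradeoff a searcher has to manage.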

15.
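To address the low efficiency, heavy resource consumption, and limited accuracy of current patent data preprocessing, this paper draws on preprocessing techniques from data mining. Taking patent data from the European Patent Office's documentation database (DOCDB) as an example, it designs and implements a preprocessing system for DOCDB patent data. The system parses the structure of DOCDB patent data files, extracts the relevant patent information, and stores the processed data in a database. Experimental results show that the system processes patent data efficiently and substantially raises the level of automation in patent preprocessing.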
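
The parse, extract, store pipeline described above can be sketched with the standard library. The element and attribute names in `SAMPLE` (`document`, `country`, `number`, `kind`, `title`, `applicant`) are simplified placeholders and do not reproduce the real DOCDB schema. The sample record is made up.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample in a simplified placeholder structure (not the DOCDB schema).
SAMPLE = """<documents>
  <document country="EP" number="0000001" kind="A1">
    <title lang="en">Example invention</title>
    <applicant>Example Corp</applicant>
  </document>
</documents>"""


def parse_documents(xml_text):
    """Yield (publication number, title, applicant) tuples from the XML."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("document"):
        pub = doc.get("country") + doc.get("number") + doc.get("kind")
        title = doc.findtext("title", default="").strip()
        applicant = doc.findtext("applicant", default="").strip()
        yield pub, title, applicant


def load(xml_text, conn):
    """Store the extracted fields; the primary key makes reloading idempotent."""
    conn.execute("CREATE TABLE IF NOT EXISTS patents "
                 "(pub TEXT PRIMARY KEY, title TEXT, applicant TEXT)")
    conn.executemany("INSERT OR REPLACE INTO patents VALUES (?, ?, ?)",
                     parse_documents(xml_text))
    conn.commit()
```

For bulk files too large to hold in memory, `xml.etree.ElementTree.iterparse` can stream elements instead of building the whole tree with `fromstring`.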

16.
Technology transfer, research and development and engineering projects frequently require in-depth literature reviews. These reviews are carried out using computerized, bibliographic data bases. The review and/or searching process involves keywords selected from data base thesauri. The search strategy is formulated to provide both breadth and depth of coverage and yields both relevant and nonrelevant citations. Experience indicates that about 10–20% of the citations are relevant. As a consequence, significant amounts of time are required to eliminate the nonrelevant citations. This paper describes statistically based, lexical association methods which can be employed to determine citation relevance. In particular, the searcher selects relevant terms from citation-derived indexes and this information along with lexical statistics is used to determine citation relevance. Preliminary results are encouraging with the techniques providing an effective concentration of relevant citations.
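
The abstract does not specify its lexical statistics. The sketch below uses inverse document frequency as a stand-in to show the workflow: the searcher picks relevant terms, each retrieved citation is scored by the summed IDF of the selected terms it contains, and citations are ranked so that relevant ones concentrate at the top. The function `rank_citations` is hypothetical.

```python
import math
import re
from collections import Counter


def rank_citations(citations, relevant_terms):
    """Rank citation texts by the summed IDF weight, log(N / df), of the
    searcher-selected terms each one contains (an illustrative statistic)."""
    docs = [set(re.findall(r"[a-z]+", c.lower())) for c in citations]
    n = len(docs)
    df = Counter(t for d in docs for t in d)  # citations containing each term
    terms = [t.lower() for t in relevant_terms]
    scored = []
    for text, words in zip(citations, docs):
        score = sum(math.log(n / df[t]) for t in terms if t in words)
        scored.append((score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored
```

Terms that appear in every retrieved citation get weight log(1) = 0, so only the selected terms that discriminate within the result set affect the ranking.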

17.
This research focuses specifically on uncertainty and information seeking in a digital environment. In this research we argue that different types of uncertainty are associated with the information seeking process and that, with the proliferation of new and different search tools, sources and channels, uncertainty, positive/desirable or negative/undesirable, continues to be a significant factor in the search process. Users may feel uncertain at any stage of the information search and retrieval process, and uncertainty may remain even after completion of the process, resulting in what may be called persistent uncertainty. An online questionnaire was used to collect data from users in the higher education sector. The questionnaire had three parts, focusing on information seeking activities, information seeking problems, and access to specific information channels or sources. Quantitative analysis was carried out on the data collected through the online questionnaire. A total of 668 responses were returned from the chosen user categories of academic staff, research staff and research students. This research has shown that some information seeking activities and information seeking problems are the most common causes of uncertainty among a significant number of users across different disciplines, ages, genders, ICT skill levels, etc. The same holds for access to and use of specific information sources/channels, although the degrees of uncertainty involved are relatively small. Possible implications of this study and further research issues are indicated.

18.
The purpose of this study was to propose a design for a Superintendent of Documents (SuDocs) number search key to retrieve bibliographic records for United States Government documents from OCLC's On-Line Union Catalog. Experimentation with a test file of 25,000 records indicated that a search key derived from a maximum of the first 14 digits in the SuDocs number is sufficiently distinctive to obtain an expected average retrieval of 2.5 records per search. OCLC will implement a SuDocs number search key in the future. It is expected that this key will be a valuable tool for library catalogers and users.
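
The key-length experiment can be sketched as follows. The abstract does not say how the SuDocs number is normalized, so the sketch assumes that spaces are dropped, punctuation is kept, and "digits" means characters. It also assumes "average retrieval" is the mean number of records per distinct key. Both helpers are hypothetical, and the class numbers in the usage are made-up examples.

```python
from collections import defaultdict


def sudocs_key(number, length=14):
    """Hypothetical search key: drop spaces, upper-case, and keep the first
    `length` characters of the SuDocs class number."""
    return number.replace(" ", "").upper()[:length]


def average_retrieval(numbers, length=14):
    """Mean number of records that share a key, averaged over distinct keys."""
    buckets = defaultdict(int)
    for n in numbers:
        buckets[sudocs_key(n, length)] += 1
    return sum(buckets.values()) / len(buckets)
```

Shortening the key merges more records into each bucket. Running `average_retrieval` over a test file for several lengths is one way to pick the shortest key that keeps retrieval near a target such as the reported 2.5 records per search.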

19.
Wang Xinglan. Journal of Modern Information (《现代情报》), 2016, 36(1): 90-95
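To adapt to the networked environment, the Library of Congress proposed the BIBFRAME bibliographic format. As a new bibliographic data format, BIBFRAME places no restrictions on content, has a broad scope of application, and is oriented toward the Semantic Web. The use cases presented show that BIBFRAME can benefit both users and librarians. Several institutions have begun empirical research on BIBFRAME. This paper gives a comprehensive overview of BIBFRAME's progress in both theory and application.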

20.
The evaluation of exploratory search relies on the ongoing paradigm shift from focusing on the search algorithm to focusing on the interactive process. This paper proposes a model-driven formative evaluation approach, in which the goal is not the evaluation of a specific system, per se, but the exploration of new design possibilities. This paper gives an example of this approach where a model of sensemaking was used to inform the evaluation of a basic exploratory search system(s) in the context of a sensemaking task. The model suggested that, rather than just looking at simple search performance measures, we should examine closely the interwoven, interactive processes of both representation construction and information seeking. Participants were asked to make sense of an unfamiliar topic using an augmented query-based search system. The processes of representation construction and information seeking were captured and analyzed using data from experiment notes, interviews, and a system log. The data analysis revealed users’ sources of ideas for structuring representations and a tightly coupled relationship between search and representation construction in their exploratory searches. For example, users strategically used search to find useful structure ideas instead of just accumulating information facts. Implications for improving current search systems and designing new systems are discussed.
