Similar Documents
20 similar documents found
1.
Subheadings (qualifiers) are a class of terms that qualify subject headings, denoting the natural categories or commonly occurring aspects of the key topics a subject heading covers. Because subheadings themselves refer to specific domains of medical research, statistical analysis of subheading frequencies can reveal research hotspots and directions in medicine. Using basic search, this paper retrieved literature indexed in recent years in the Chinese BioMedical Literature Database (CBM), sampled 3,000 records, counted the frequencies of the subheadings occurring in them, and analyzed the statistics in chart form. The aim is to help information users better understand the research hotspots and characteristics of medical literature within the discipline, so as to better serve clinical practice and scientific research.

2.
1 The necessity of the classification search approach. 1.1 Classification search versus subject search. In literature retrieval, the main approaches based on the content features of documents are the classification approach and the subject approach. The classification approach searches from the perspective of the disciplinary category of document content, while the subject approach searches by means of the subject headings that express document content.

3.
Existing subject indexing methods generally can only extract terms that appear in the text and cannot select semantically related but non-occurring terms from tens or hundreds of thousands of subject headings; machine-learning-based multi-label classification algorithms require training data for every label, which limits their application to subject indexing. To meet the need of indexing massive document collections against a large-scale subject heading vocabulary, a hybrid automatic indexing method based on distributed word vectors is proposed: word embeddings trained on a large corpus are used to generate subject-heading vectors and text vectors of the same dimensionality, enabling computation of the semantic similarity between subject headings and texts. A mapping table between subject headings and ordinary words, built from the large corpus, lets each text vector be compared with only a small number of strongly related subject-heading vectors, greatly reducing computation and improving indexing efficiency. The automatic indexing tool developed with this method indexed nearly 100 million documents at high speed. Compared with jieba keyword extraction, the subject headings extracted by this method overlap less with author keywords, and after non-subject terms are removed from the jieba keywords, it achieves higher indexing accuracy than jieba; compared with manual indexing, as the number of manually assigned terms increases, the effectiveness of this method and the agreement between its results and the manual results keep increasing.
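The core computation described in this abstract is a cosine similarity between an embedding-based text vector and embedding-based subject-heading vectors, restricted to a small candidate set via a word-to-heading mapping table. A minimal Python sketch of that idea (not the authors' implementation; the averaging scheme and all names are assumptions):

```python
import numpy as np

def avg_vector(tokens, word_vectors, dim=100):
    """Represent a text or a subject heading as the mean of its word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b) / denom if denom else 0.0

def index_document(doc_tokens, heading_vectors, word_to_headings, word_vectors, top_k=5):
    """Rank candidate subject headings by similarity to the document vector.

    word_to_headings maps ordinary words to the few headings they are strongly
    related to, so only a small candidate set is scored (the 'mapping table').
    """
    doc_vec = avg_vector(doc_tokens, word_vectors)
    candidates = {h for t in doc_tokens for h in word_to_headings.get(t, ())}
    scored = [(h, cosine(doc_vec, heading_vectors[h])) for h in candidates]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```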

4.
Exploring the evolution of hot topics in medical informatics with an improved co-word clustering method (total citations: 4; self-citations: 0; citations by others: 4)
The traditional co-word clustering method is refined in three ways: high-frequency terms are selected according to a high/low-frequency word boundary formula; cohesion is computed to determine the central term of each cluster; and two time periods are compared to reveal topic evolution. Taking medical informatics as an example, literature on the discipline published in 1999-2003 and in 2004-2008 was downloaded from PubMed, major MeSH terms were extracted, and co-word clustering analysis was performed to explore how the disciplinary structure of medical informatics has evolved.
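The abstract does not say which high/low-frequency boundary formula is used; Donohue's threshold is a common choice in co-word studies and is shown below purely as an illustration (a sketch with hypothetical names, not the study's code):

```python
import math
from collections import Counter

# Donohue's threshold: T = (-1 + sqrt(1 + 8 * I1)) / 2,
# where I1 is the number of terms that occur exactly once.
def high_frequency_terms(term_counts: Counter):
    i1 = sum(1 for c in term_counts.values() if c == 1)
    threshold = (-1 + math.sqrt(1 + 8 * i1)) / 2
    return {t for t, c in term_counts.items() if c > threshold}, threshold
```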

5.
Development of a system for mining co-occurrence of bibliographic information in literature databases (total citations: 9; self-citations: 0; citations by others: 9)
Targeting authoritative biomedical databases and citation index data, a text mining system based on co-occurrence relationships among bibliographic information in literature databases is introduced. The system provides basic bibliometric analysis functions and visualizes the corresponding results; it performs co-occurrence analysis of high-frequency subject headings, prolific authors, highly cited papers, and highly cited authors, and on that basis carries out clustering and association analysis, yielding research topic clusters, subject heading/subheading association rules, co-authorship clusters, co-citation clusters of highly cited papers, and co-citation clusters of highly cited authors, together with visual displays. The association rule analysis can uncover latent semantic rules among subject headings, and the other bibliometric indicators and co-occurrence results can be used for scientometric analysis.

6.
Objective: To survey the Chinese herbal medicine research literature published by foreign authors and analyze their research interests. Methods: MeSH co-occurrence clustering analysis was applied to the literature on Chinese herbal medicine published by foreign authors in PubMed over the past ten years; the MeSH terms of these papers were downloaded, their frequencies were counted, high-frequency terms were selected, a MeSH co-occurrence matrix was constructed, and the matrix was clustered. Research hotspots were obtained by analyzing the literature in each cluster. Results: 2,609 papers by foreign authors were retrieved; 43 MeSH terms occurred more than 20 times, and clustering yielded 5 research hotspots. Conclusions: (1) pharmacological research on the anti-inflammatory and immunological effects of Chinese herbs, plant-derived antitumor agents, the antioxidant effects of Chinese herbs, herbal preparations as neuroprotective agents, and the pharmacology of flavonoids and alkaloids are focal points of herbal pharmacology research; (2) diseases such as diabetes and asthma are hot areas of applied research abroad on herbal treatment; (3) the nephrotoxicity of herbs containing aristolochic acid and toxic hepatitis caused by Chinese herbs are hotspots of adverse-effect research; (4) plant-derived antitumor drugs for the treatment of prostate cancer once attracted great attention from foreign researchers; (5) the chemical constituents of medicinal plants such as Panax and the pharmacological activity of ginsenosides also receive attention.
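The workflow in the Methods section (count MeSH frequencies, keep high-frequency terms, build a co-occurrence matrix, cluster it) can be sketched roughly as follows; the distance measure and clustering settings are illustrative assumptions, not the study's actual choices:

```python
import numpy as np
from itertools import combinations
from scipy.cluster.hierarchy import linkage, fcluster

def coword_clusters(papers, terms, n_clusters=5):
    """papers: iterable of sets of high-frequency MeSH terms per paper."""
    idx = {t: i for i, t in enumerate(terms)}
    cooc = np.zeros((len(terms), len(terms)))
    for paper_terms in papers:
        for a, b in combinations(sorted(paper_terms & set(terms)), 2):
            cooc[idx[a], idx[b]] += 1
            cooc[idx[b], idx[a]] += 1
    # Cluster terms hierarchically on their co-occurrence profiles.
    labels = fcluster(linkage(cooc, method="ward"), n_clusters, criterion="maxclust")
    return {t: int(labels[i]) for t, i in idx.items()}
```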

7.
A quantitative analysis of ontology research papers based on Web of Science (total citations: 3; self-citations: 0; citations by others: 3)
Using Web of Science as the information source and topic terms as the search mode, quantitative analysis was applied to the papers' distribution by year, distribution by journal, authors, keywords, and citation frequencies, identifying the core journals, core institutions, core authors, and classic literature of the ontology research field; the research topics of ontology were also analyzed quantitatively.

8.
The semantic fuzziness and disorder of folksonomy tags make retrieval increasingly inefficient. To locate tag semantics precisely, this study investigates a new method that infers the semantic relations between tags from the inter-term relations of a thesaurus. Related tags obtained by searching the Delicious website with education subject headings from the Chinese Classified Thesaurus (《中国分类主题词表》) were used as the data source; the overlap between tags and subject headings was analyzed, and the Chinese Classified Thesaurus and the ERIC online education thesaurus were chosen as tools for extracting the semantic relations of tags. The study finds that semantic relations within related tag sets are relatively rich.

9.
Taking stem cell research literature as an example and Medical Subject Headings (MeSH) as the basis, this paper classifies the subject headings and restructures the knowledge so that documents of the same category are grouped together through headings of the same category, building a classified subject heading vocabulary oriented toward information analysis and solving the problem of special-topic literature retrieval; retrieval experiments verify the effectiveness of the constructed classified vocabulary. This lays the foundation for further realizing classification-based literature navigation and automatic document categorization based on the classified vocabulary, and ultimately for serving science and technology management.

10.
Text mining analysis of genes related to acute leukemia (total citations: 2; self-citations: 0; citations by others: 2)
闫雷  崔雷 《情报学报》2008,27(2):169-174
A total of 3,529 papers on the relationship between leukemia and genes published between 1966 and September 6, 2005 were retrieved from PubMed. A term-document matrix of subject headings was generated programmatically and clustered. The dendrogram divided the extracted subject headings/subheadings into 13 classes; verification against the original literature showed that, of all 29 genes, 3 (10.34%) were related only to ALL and 8 (27.59%) only to AML, while 11 genes (37.93%) could specifically distinguish ALL from AML. Clustering on the co-occurrence relations of subject headings can basically discover links between genes and diseases, but the method yields relatively few related genes, which hinders a comprehensive understanding of the disease-gene relationship.

11.
[Objective/Significance] Indicators for evaluating the impact of academic literature keep emerging, but evaluation of impact at the level of research topics is still lacking. To identify documents with high impact and citation value within different research topics, this study proposes a topic-based method for evaluating document impact. [Method/Process] Taking 500 highly cited information science papers published in 2011-2015 from the Web of Science database as the sample, the LDA model was used to build topic models of the sample documents; the support of a topic for a document was combined with the document's citation count to compute the specific topic cited frequency (STCF), and documents were ranked for impact within each topic according to their STCF values. [Result/Conclusion] The results show that STCF values reflect the topical content of documents, express their academic standing at a fine granularity, and reveal the multiplicity of a document's research topics, effectively compensating for the shortcomings of raw citation counts and Altmetrics indicators.
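The abstract describes STCF as combining a topic's support for a document with the document's citation frequency; the exact formula is not spelled out, so the simple product below is only one plausible reading (theta_dt stands for the LDA document-topic weight, a hypothetical name):

```python
def stcf(theta_dt: float, citations: int) -> float:
    """Specific topic cited frequency of a document under one topic:
    topic support multiplied by citation count (illustrative reading)."""
    return theta_dt * citations

def rank_within_topic(docs):
    """docs: iterable of (doc_id, theta_dt, citation_count) for one topic."""
    return sorted(((d, stcf(th, c)) for d, th, c in docs),
                  key=lambda x: x[1], reverse=True)
```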

12.
The objective of this study was to evaluate the HealthInsite topic query technique, which uses a dynamic database search to assign resources to a topic. It is an alternative to the explicit classification technique, which relies on the classification of each resource using a predefined classification scheme. We performed a recall-precision analysis on all topics within the broad topic area of Child Health. Recall and precision errors were checked to determine which part of the information retrieval process was at fault. We then compared the topic query technique with the explicit classification technique. The results show errors or problems at every stage of the information retrieval process. This has initiated a review of all the tools used in the process, from indexing guidelines to the search engine. While many errors could be corrected, there were still features of the explicit classification technique that could not be achieved by the topic query technique. In conclusion, the topic query technique has the advantage of flexibility, but close co-operation between the different information retrieval specialists is needed to get the best results. The HealthInsite topic navigation structure should be regarded as an organized set of predefined searches rather than a full classified listing.

13.

Background

Degenerative cervical myelopathy (DCM) is a recently proposed umbrella term for symptomatic cervical spinal cord compression secondary to degeneration of the spine. Currently literature searching for DCM is challenged by the inconsistent uptake of the term ‘DCM’ with many overlapping keywords and numerous synonyms.

Objectives

Here, we adapt our previous Ovid medline search filter for the Ovid embase database, to support comprehensive literature searching. Both embase and medline are recommended as a minimum for systematic reviews.

Methods

References contained within embase identified in our prior study formed a ‘development gold standard’ reference database (N = 220). The search filter was adapted for embase and checked against the reference database. The filter was then validated against the ‘validation gold standard’.

Results

A direct translation was not possible, as medline indexing for DCM and the keywords search field were not available in embase. We also used the ‘focus’ function to improve precision. The resulting search filter has 100% sensitivity in testing.

Discussion and Conclusion

We have developed a validated search filter capable of retrieving DCM references in embase with high sensitivity. In the absence of consistent terminology and indexing, this will support more efficient and robust evidence synthesis in the field.

14.
Entity ranking has recently emerged as a research field that aims at retrieving entities as answers to a query. Unlike entity extraction where the goal is to tag names of entities in documents, entity ranking is primarily focused on returning a ranked list of relevant entity names for the query. Many approaches to entity ranking have been proposed, and most of them were evaluated on the INEX Wikipedia test collection. In this paper, we describe a system we developed for ranking Wikipedia entities in answer to a query. The entity ranking approach implemented in our system utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the entity examples (when provided) to retrieve relevant entities as answers to the query. We also extend our entity ranking approach by utilising the knowledge of predicted classes of topic difficulty. To predict the topic difficulty, we generate a classifier that uses features extracted from an INEX topic definition to classify the topic into an experimentally pre-determined class. This knowledge is then utilised to dynamically set the optimal values for the retrieval parameters of our entity ranking system. Our experiments demonstrate that the use of categories and the link structure of Wikipedia can significantly improve entity ranking effectiveness, and that topic difficulty prediction is a promising approach that could also be exploited to further improve the entity ranking performance.

15.
Two-stage statistical language models for text database selection (total citations: 2; self-citations: 0; citations by others: 2)
As the number and diversity of distributed Web databases on the Internet exponentially increase, it is difficult for users to know which databases are appropriate to search. Given database language models that describe the content of each database, database selection services can provide assistance in locating databases relevant to the information needs of users. In this paper, we propose a database selection approach based on statistical language modeling. The basic idea behind the approach is that, for databases that are categorized into a topic hierarchy, individual language models are estimated at different search stages, and then the databases are ranked by their similarity to the query according to the estimated language model. Two-stage smoothed language models are presented to circumvent inaccuracy due to word sparseness. Experimental results demonstrate that such a language modeling approach is competitive with current state-of-the-art database selection approaches.
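As a rough illustration of ranking databases by query likelihood under a smoothed language model (the paper's two-stage estimation along a topic hierarchy is more elaborate; a single Jelinek-Mercer interpolation with a global collection model stands in for it here):

```python
import math

def db_score(query_terms, db_tf, db_len, coll_tf, coll_len, lam=0.7):
    """Log-likelihood of the query under the smoothed database language model."""
    score = 0.0
    for t in query_terms:
        p_db = db_tf.get(t, 0) / db_len if db_len else 0.0
        p_coll = coll_tf.get(t, 0) / coll_len if coll_len else 0.0
        p = lam * p_db + (1 - lam) * p_coll
        score += math.log(p) if p > 0 else math.log(1e-12)  # floor for unseen terms
    return score

def select_databases(query_terms, databases, coll_tf, coll_len, top_k=3):
    """databases: dict name -> (term_frequency_dict, total_token_count)."""
    ranked = sorted(
        ((name, db_score(query_terms, tf, n, coll_tf, coll_len))
         for name, (tf, n) in databases.items()),
        key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```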

16.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing the variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community-generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
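The automatic evaluation strategy mentioned above relies on Kullback-Leibler divergence; a small sketch of the standard computation, with additive smoothing added here as an assumption so that neither distribution assigns zero probability:

```python
import math

def kl_divergence(p, q, vocab, eps=1e-9):
    """D_KL(P || Q) = sum_w P(w) * log(P(w) / Q(w)) over the shared vocabulary."""
    total_p = sum(p.get(w, 0) for w in vocab) + eps * len(vocab)
    total_q = sum(q.get(w, 0) for w in vocab) + eps * len(vocab)
    d = 0.0
    for w in vocab:
        pw = (p.get(w, 0) + eps) / total_p
        qw = (q.get(w, 0) + eps) / total_q
        d += pw * math.log(pw / qw)
    return d
```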

17.
It is widely accepted that data is fundamental for research and should therefore be cited just as textual scientific publications are. However, issues like data citation and the handling and counting of the credit generated by such citations remain open research questions. Data credit is a new measure of value built on top of data citation, which enables us to annotate data with a value representing its importance. Data credit can be considered as a new tool that, together with traditional citations, helps to recognize the value of data and its creators in a world that is ever more dependent on data. In this paper we define data credit distribution (DCD) as a process by which the credit generated by citations is given to the individual elements of a database. We focus on a scenario where a paper cites data from a database obtained by issuing a query. The citation generates credit which is then divided among the database entities responsible for generating the query output. One key aspect of our work is to credit not only the explicitly cited entities, but also those that contribute to their existence yet are not accounted for in the query output. We propose a data credit distribution strategy (CDS) based on data provenance and implement a system that uses the information provided by data citations to distribute the credit in a relational database accordingly. As a use case and for evaluation purposes, we adopt the IUPHAR/BPS Guide to Pharmacology (GtoPdb), a curated relational database. We show how credit can be used to highlight areas of the database that are frequently used. Moreover, we also underline how credit rewards data and authors based on their research impact, and not merely on the number of citations. This can lead to designing new bibliometrics for data citations.
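As a toy illustration of the credit distribution idea (the paper's provenance-based strategy is considerably richer), one citation's credit could be split among the database tuples in the provenance of the cited query result:

```python
from collections import defaultdict

def distribute_credit(citation_credit, provenance_tuples, credit_ledger=None):
    """Split one citation's credit equally among the tuples that contributed
    to the cited query result (equal split is an illustrative assumption)."""
    ledger = credit_ledger if credit_ledger is not None else defaultdict(float)
    if provenance_tuples:
        share = citation_credit / len(provenance_tuples)
        for t in provenance_tuples:
            ledger[t] += share
    return ledger
```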

18.
Patent prior art search is a type of search in the patent domain where documents are searched for that describe the work previously carried out related to a patent application. The goal of this search is to check whether the idea in the patent application is novel. Vocabulary mismatch is one of the main problems of patent retrieval which results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address the vocabulary mismatch. We propose a method for query modeling estimation which utilizes the citation links in a pseudo relevance feedback set. We first build a topic dependent citation graph, starting from the initially retrieved set of feedback documents and utilizing citation links of feedback documents to expand the set. We identify the important documents in the topic dependent citation graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights. We then use these terms to expand our original query. We use CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves the recall over a state-of-the-art baseline which uses the link-based structure of the citation graph but not the term distribution of the cited documents.
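A simplified sketch of the query-model step described above: terms from the documents of the topic-dependent citation graph are weighted (here by importance-weighted frequency, a placeholder for the paper's actual estimation) and the top terms expand the original query:

```python
from collections import Counter

def expansion_terms(graph_docs, doc_importance, original_query, n_terms=10):
    """graph_docs: dict doc_id -> list of tokens;
    doc_importance: dict doc_id -> citation-analysis importance score."""
    weights = Counter()
    for doc_id, tokens in graph_docs.items():
        w = doc_importance.get(doc_id, 1.0)
        for t in tokens:
            if t not in original_query:
                weights[t] += w
    total = sum(weights.values()) or 1.0
    # Return normalized weights for the top distinguishing terms.
    return [(t, c / total) for t, c in weights.most_common(n_terms)]
```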

19.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.

20.
This paper reviews literature on dictionary-based cross-language information retrieval (CLIR) and presents CLIR research done at the University of Tampere (UTA). The main problems associated with dictionary-based CLIR, as well as appropriate methods to deal with the problems are discussed. We will present the structured query model by Pirkola and report findings for four different language pairs concerning the effectiveness of query structuring. The architecture of our automatic query translation and construction system is presented.
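Pirkola's structured query model groups the translation alternatives of each source-language word under a synonym operator, so that a word with many dictionary translations does not outweigh the other query words. A minimal sketch of constructing such a query (the InQuery-style #syn/#sum syntax and the toy dictionary are illustrative assumptions):

```python
def structured_query(source_terms, bilingual_dict):
    """Build a structured target-language query from a source query and a
    bilingual dictionary, grouping each word's translations under #syn."""
    groups = []
    for term in source_terms:
        translations = bilingual_dict.get(term, [term])  # keep untranslatable terms as-is
        groups.append("#syn(" + " ".join(translations) + ")")
    return "#sum(" + " ".join(groups) + ")"

# Example with hypothetical dictionary entries:
# structured_query(["talous", "kasvu"],
#                  {"talous": ["economy", "economics"], "kasvu": ["growth"]})
# -> '#sum(#syn(economy economics) #syn(growth))'
```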
