
1.

Studying machine translation technologies for large-data CLIR tasks: a patent prior-art search case study

Walid Magdy Gareth J. F. Jones 《Information Retrieval》2014,17(5-6):492-519

Prior-art search in patent retrieval is concerned with finding all existing patents relevant to a patent application. Since patents often appear in different languages, cross-language information retrieval (CLIR) is an essential component of effective patent search. In recent years machine translation (MT) has become the dominant approach to translation in CLIR. Standard MT systems focus on generating proper translations that are morphologically and syntactically correct. Development of effective MT systems of this type requires large training resources and high computational power for training and translation. This is an important issue for patent CLIR, where queries are typically very long, sometimes taking the form of a full patent application, meaning that query translation using MT systems can be very slow. However, in contrast to MT, the focus for information retrieval (IR) is on the conceptual meaning of the search words regardless of their surface form, or the linguistic structure of the output. Thus much of the complexity of MT is not required for effective CLIR. We present an adapted MT technique specifically designed for CLIR. In this method IR text pre-processing in the form of stop word removal and stemming is applied to the MT training corpus prior to the training phase. Applying this step leads to a significant decrease in the computational and training resource requirements of MT. Experimental application of the new approach to the cross-language patent retrieval task from CLEF-IP 2010 shows the new technique to be up to 23 times faster than standard MT for query translations, while maintaining IR effectiveness statistically indistinguishable from standard MT when large training resources are used. Furthermore the new method is significantly better than standard MT when only limited translation training resources are available, which can be a significant issue for translation in specialized domains. The new MT technique also enables patent document translation in a practical amount of time with a resulting significant improvement in the retrieval effectiveness.
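
The pre-processing step described in this abstract (stop word removal and stemming applied to the text before MT training) can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the tiny `STOP_WORDS` set and the suffix-stripping `crude_stem` function are hypothetical stand-ins for a full stop list and a real stemmer, and the paper's actual tools are not specified in the abstract.

```python
import re

# Hypothetical, deliberately tiny stop list; a real system would use a full one.
STOP_WORDS = {"a", "an", "the", "of", "for", "is", "are", "in", "to", "and"}


def crude_stem(token: str) -> str:
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence: str) -> str:
    """Lowercase, tokenize, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    kept = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return " ".join(kept)


if __name__ == "__main__":
    print(preprocess("The sensors are measuring the temperature of a heated plate"))
```

Applying the same kind of function to each side of a parallel corpus (with language-specific stop lists and stemmers) shrinks the vocabulary the translation model has to learn, which is the source of the resource savings the abstract reports.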

2.

Using multiple query representations in patent prior-art search

Dong Zhou Mark Truran Jianxun Liu Sanrong Zhang 《Information Retrieval》2014,17(5-6):471-491

Before a patent application is made, it is important to search the appropriate databases for prior-art (i.e., pre-existing patents that may affect the validity of the application). Previous work on prior-art search has concentrated on single query representations of the patent application. In the following paper, we describe an approach which uses multiple query representations. We evaluate our technique using a well-known test collection (CLEF-IP 2011). Our results suggest that multiple query representations significantly outperform single query representations.

3.

Using query logs of USPTO patent examiners for automatic query expansion in patent searching

Wolfgang Tannebaum Andreas Rauber 《Information Retrieval》2014,17(5-6):452-470

In the patent domain, significant efforts are invested to assist researchers in formulating better queries, preferably via automated query expansion. Currently, automatic query expansion in patent search is mostly limited to computing co-occurring terms for the searchable features of the invention. Additional query terms are extracted automatically from patent documents based on entropy measures. Learning synonyms in the patent domain for automatic query expansion has been a difficult task. No dedicated sources providing synonyms for the patent domain, such as patent domain specific lexica or thesauri, are available. In this paper we focus on the highly professional search setting of patent examiners. In particular, we use query logs to learn synonyms for the patent domain. For automatic query expansion, we create term networks based on the query logs specifically for several USPTO patent classes. Experiments show good performance in automatic query expansion using these automatically generated term networks. Specifically, the performance of the learned term networks increases as more query logs become available for a specific US patent class.

4.

Noun phrases in interactive query expansion and document ranking

Olga Vechtomova 《Information Retrieval》2006,9(4):399-420

The paper presents several techniques for selecting noun phrases for interactive query expansion following pseudo-relevance feedback and a new phrase-based document ranking method. A combined syntactico-statistical method was used for the selection of phrases for query expansion. Several statistical measures of phrase selection were evaluated. Experiments were also conducted studying the effectiveness of noun phrases in document ranking. One of the major problems in phrase-based document retrieval is weighting of overlapping and non-contiguous word sequences in documents. The paper presents a new method of phrase weighting, which addresses this problem, and its evaluation on the TREC dataset.

5.

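Key technology mining and enterprise competitive technical intelligence: the case of DVD laser head technology  Total citations: 2, self-citations: 0, citations by others: 2

Sun Taotao, Jin Bihui 《图书情报工作》 (Library and Information Service) 2008,52(5):129-129

Based on patent data from the United States Patent and Trademark Office (USPTO) database for 1995–2004, a patent dataset on topics related to the DVD laser head (optical pickup) is constructed by combining keyword search with patent citation search. Bibliometric methods, namely patent bibliographic coupling and patent citation relations, are then used to analyze the year-by-year evolution of the sub-technology topics within DVD laser head technology and the knowledge flows between technologies.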

6.

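Constructing technology implementation paths using technology-effect semantic associations

Zhang Jinzhu, Yu Wenqian, Li Yifeng 《图书馆论坛》 (Library Tribune) 2021,(3):31-41

The article clarifies the multiple semantic relations between technologies and effects, and designs an automated method for constructing technology implementation paths that supports real-time updating and visualization. Taking the characteristics of patent data into account, technical terms are extracted from patent titles using rules, and effect phrases are extracted from patent abstracts using a BiLSTM-CRF deep learning model. Rules are then designed to automatically identify, within the effect phrases, the effect words and the relation words that express semantic relations between technologies and effects, yielding technology-effect semantic associations with a "technical term - relation word - effect word" structure. Technical terms and effect words are aligned by computing semantic similarity between entities, which refines the technology-effect associations. Technology implementation paths are built on this basis and visualized as a knowledge network. An empirical study in the 5G technology field shows that the method effectively reveals the multiple semantic relations between technologies and effects, constructs technology implementation paths automatically, and keeps the paths up to date and clearly displayed.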

7.

The effect of citation analysis on query expansion for patent retrieval

Parvaz Mahdabi Fabio Crestani 《Information Retrieval》2014,17(5-6):412-429

Patent prior art search is a type of search in the patent domain where documents are searched for that describe the work previously carried out related to a patent application. The goal of this search is to check whether the idea in the patent application is novel. Vocabulary mismatch is one of the main problems of patent retrieval, which results in low retrievability of similar documents for a given patent application. In this paper we show how the term distribution of the cited documents in an initially retrieved ranked list can be used to address the vocabulary mismatch. We propose a method for query modeling estimation which utilizes the citation links in a pseudo relevance feedback set. We first build a topic dependent citation graph, starting from the initially retrieved set of feedback documents and utilizing citation links of feedback documents to expand the set. We identify the important documents in the topic dependent citation graph using a citation analysis measure. We then use the term distribution of the documents in the citation graph to estimate a query model by identifying the distinguishing terms and their respective weights. We then use these terms to expand our original query. We use the CLEF-IP 2011 collection to evaluate the effectiveness of our query modeling approach for prior art search. We also study the influence of different parameters on the performance of the proposed method. The experimental results demonstrate that the proposed approach significantly improves the recall over a state-of-the-art baseline which uses the link-based structure of the citation graph but not the term distribution of the cited documents.

8.

Mining information across multiple domains: A case study of application to patent laws and regulations in biotechnology

Hang Yu Siddharth Taduri Jay Kesan Gloria Lau Kincho H. Law 《Government Information Quarterly》2012

In this paper, we present a framework that can process a user query for retrieval of information from documents of different properties across multiple domains, with specific application to patent laws and regulations. The framework has three basic components. The first component is ontology mapping and generation: the keywords entered by users are mapped into a subset of relevant keywords by looking up those words in an ontology database. The second component is the joint and cross search in various document domains; in our case, they are patents and scientific publications. The last component is to modify the search results by applying user feedback statistics. The results of feedback will be saved as metadata for future uses. A case example is given to demonstrate how results from multiple domain searches can be combined using ontology and cross referencing. We use an example of well-known biotechnology patents on erythropoietin (EPO) and give detailed analysis on each document domain with this keyword. Relationships between each domain are demonstrated. A user feedback mechanism is also discussed in this paper. The ability to take user feedback into the framework is important. There is no doubt that domain knowledge from expert or experienced users could be a very good complement to the proposed system. Both direct and indirect user feedback are discussed.

9.

Identifying top relevant dates for implicit time sensitive queries

Ricardo Campos Gaël Dias Alípio Mário Jorge Célia Nunes 《Information Retrieval》2017,20(4):363-398

Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in his/her search intents (e.g., “philip seymour hoffman”) search engines may likely fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of relevant temporal expressions to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query, while filtering out the non-relevant ones. Through extensive experimental evaluation, we mean to demonstrate that our approach offers promising results in the field of temporal information retrieval (T-IR), as demonstrated by the experiments conducted over several baselines on web corpora collections.

10.

Patent citation spectroscopy (PCS): Online retrieval of landmark patents based on an algorithmic approach

Jordan A. Comins Stephanie A. Carmack Loet Leydesdorff 《Journal of Informetrics》2018,12(4):1223-1231

One essential component in the construction of patent landscapes in biomedical research and development (R&D) is identifying the most seminal patents. Hitherto, the identification of seminal patents required subject matter experts within biomedical areas. In this article, we report an analytical method and tool, Patent Citation Spectroscopy (PCS), for the online identification of landmark patents in user-specified areas of biomedical innovation. Using USPTO data, PCS mines the cited references within large sets of patents on the internet and provides an estimate of the historically most impactful prior work. We show the efficacy of PCS in three case studies of biomedical innovation with clinical relevance: (1) RNA interference (RNAi), (2) cholesterol and (3) cloning. PCS mined and analyzed cited references related to patents on RNA interference and correctly identified the foundational patent of this technology, as independently reported by subject matter experts on RNAi intellectual property. Secondly, we apply PCS to a broad set of patents dealing with cholesterol – a case study chosen to reflect a more general, as opposed to expert, patent search query. PCS mined through cited references and identified the seminal patent as that for Lipitor, the groundbreaking medication for treating high cholesterol, as well as the pair of patents underlying Repatha. The final case study, cloning, highlights some of the advantages conferred by the PCS methodology in identifying seminal patents. These cases suggest that PCS provides a useful method for identifying seminal patents in areas of biomedical innovation and therapeutics. The interactive tool is free-to-use at: http://www.leydesdorff.net/comins/pcs/index.html.

11.

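Methods for identifying core patents and their application in the wind power industry

Luo Tianyu 《图书情报工作》 (Library and Information Service) 2012,56(24):96-101

The paper examines the definition and characteristics of core patents and compares the strengths and weaknesses of existing methods for identifying them. Drawing on bibliometrics, expert scoring, and case studies, it designs a method for identifying core patented technologies in a large body of patent documents: multiple patent indicators are defined and combined through the analytic hierarchy process (AHP) into a core patent identification indicator system. The method is applied to the wind power industry to identify core patented technologies in the wind power control field, offering a reference for the innovation and development of Chinese enterprises.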

12.

Evaluation of query expansion using MeSH in PubMed

Zhiyong Lu Won Kim W. John Wilbur 《Information Retrieval》2009,12(1):69-80

This paper investigates the effectiveness of using MeSH® in PubMed through its automatic query expansion process: Automatic Term Mapping (ATM). We run Boolean searches based on a collection of 55 topics and about 160,000 MEDLINE® citations used in the 2006 and 2007 TREC Genomics Tracks. For each topic, we first automatically construct a query by selecting keywords from the question. Next, each query is expanded by ATM, which assigns different search tags to terms in the query. Three search tags: [MeSH Terms], [Text Words], and [All Fields] are chosen to be studied after expansion because they all make use of the MeSH field of indexed MEDLINE citations. Furthermore, we characterize the two different mechanisms by which the MeSH field is used. Retrieval results using MeSH after expansion are compared to those solely based on the words in MEDLINE title and abstracts. The aggregate retrieval performance is assessed using both F-measure and mean rank precision. Experimental results suggest that query expansion using MeSH in PubMed can generally improve retrieval performance, but the improvement may not affect end PubMed users in realistic situations.
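
For readers unfamiliar with the F-measure mentioned in this abstract, the following Python sketch computes it for a single Boolean search result. The function name `f_measure` and the example document IDs are hypothetical; the abstract does not say which weighting the authors used, so the balanced case (beta = 1) shown as the default is an assumption.

```python
def f_measure(retrieved, relevant, beta=1.0):
    """F-measure of a retrieved set against a set of relevant documents.

    beta > 1 weights recall more heavily; beta = 1 gives the balanced F1.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    if hits == 0:
        return 0.0  # no overlap: precision and recall are both zero
    precision = hits / len(retrieved)
    recall = hits / len(relevant)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical document IDs for one topic.
    print(f_measure(retrieved=[1, 2, 3, 4], relevant=[2, 4, 6]))
```

Because a Boolean search returns an unranked set, a set-based measure such as this one can be computed without choosing a rank cutoff.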

13.

Exploiting entity relationship for query expansion in enterprise search

Xitong Liu Fei Chen Hui Fang Min Wang 《Information Retrieval》2014,17(3):265-294

Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust in improving the search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods can also work well for short keyword queries in the general search domain.

14.

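Research on patent map technology for mainland China  Total citations: 1, self-citations: 0, citations by others: 1

Sun Lingyun, Sun Shouqian 《情报学报》 (Journal of the China Society for Scientific and Technical Information) 2008,27(5)

Addressing the characteristics of patents in mainland China, this work introduces natural language processing and content-based image retrieval techniques to study methods for analyzing and drawing patent maps. For invention and utility model patents, techniques such as semantic measurement and new word recognition are used to process their vocabulary, and patent specifications and claims are structured; a document feature representation based on near-synonym groups is defined, from which the similarity of invention and utility model patents is computed and clustering is performed. For design patents, content-based image retrieval is used to extract color, texture, and shape features from patent images, with weights determined through relevance feedback; the similarity of design patents is computed from these features and the patents are clustered. Patent map software was developed on this basis; it can analyze and draw patent maps for a specified range of mainland China patents, helping designers and enterprises with decision analysis and product positioning.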

15.

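Automatic selection of domain stop words in patent text topic modeling

Yu Yan, Zhao Naixuan 《图书情报工作》 (Library and Information Service) 2018,62(11):120-126

[Purpose/Significance] The automatic selection of domain stop words for patent text topic modeling has not been studied sufficiently. This paper proposes a new automatic selection method for domain stop words, for use in patent text topic model analysis, to improve the discriminability and modeling quality of patent topic models. [Method/Process] Domain stop words are essentially words that carry little information and discriminate poorly between patent texts of different categories. An auxiliary patent text collection is therefore introduced, category entropy is used to measure how each word is distributed, the words are ranked by category entropy, and the words with the highest category entropy are selected as domain stop words. [Result/Conclusion] Experiments on patent text data verify the feasibility and effectiveness of the method, which effectively improves the discriminability of patent topic models.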
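
The category-entropy ranking described in this abstract can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the entropy is computed from raw term frequencies per category with no normalization for category size, and the function names `category_entropy` and `select_stop_words` as well as the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def category_entropy(docs):
    """Entropy (in bits) of each term's frequency distribution over categories.

    docs is an iterable of (category, tokens) pairs. A term spread evenly over
    many categories gets high entropy; a term confined to one category gets 0.
    """
    counts = defaultdict(Counter)  # term -> Counter mapping category -> frequency
    for category, tokens in docs:
        for tok in tokens:
            counts[tok][category] += 1
    entropy = {}
    for term, per_cat in counts.items():
        total = sum(per_cat.values())
        h = -sum((c / total) * math.log2(c / total) for c in per_cat.values())
        entropy[term] = max(0.0, h)  # avoid returning -0.0 for single-category terms
    return entropy


def select_stop_words(docs, k):
    """Return the k terms with the highest category entropy (ties broken A-Z)."""
    ent = category_entropy(docs)
    return sorted(ent, key=lambda t: (-ent[t], t))[:k]


if __name__ == "__main__":
    # Toy auxiliary collection with two hypothetical patent categories.
    docs = [("A", ["method", "battery"]), ("B", ["method", "antenna"])]
    print(select_stop_words(docs, 1))
```

In the toy data, "method" appears equally in both categories and is therefore the first candidate for the domain stop list, while "battery" and "antenna" each occur in a single category and are kept.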

16.

Improving search via personalized query expansion using social media

Dong Zhou Séamus Lawless Vincent Wade 《Information Retrieval》2012,15(3-4):218-242

Social tagging systems have gained increasing popularity as a method of annotating and categorizing a wide range of different web resources. Web search that utilizes social tagging data suffers from an extreme example of the vocabulary mismatch problem encountered in traditional information retrieval (IR). This is due to the personalized, unrestricted vocabulary that users choose to describe and tag each resource. Previous research has proposed the utilization of query expansion to deal with search in this rather complicated space. However, non-personalized approaches based on relevance feedback and personalized approaches based on co-occurrence statistics only showed limited improvements. This paper proposes a novel query expansion framework based on individual user profiles mined from the annotations and resources the user has marked. The underlying theory is to regularize the smoothness of word associations over a connected graph using a regularizer function on terms extracted from top-ranked documents. The intuition behind the model is the prior assumption of term consistency: the most appropriate expansion terms for a query are likely to be associated with, and influenced by terms extracted from the documents ranked highly for the initial query. The framework also simultaneously incorporates annotations and web documents through a Tag-Topic model in a latent graph. The experimental results suggest that the proposed personalized query expansion method can produce better results than both the classical non-personalized search approach and other personalized query expansion methods. Hence, the proposed approach significantly benefits personalized web search by leveraging users’ social media data.

17.

The possibility of using the Google Patents search tool in patentometric analysis (based on the example of the world’s largest innovative companies)

V. M. Moskovkin N. A. Shigorina Dieter Popov 《Scientific and Technical Information Processing》2012,39(2):107-112

The possibility of using the Google Patents search tool in patentometric analysis, based on the world's largest innovative companies, is substantiated. For these companies, the dynamics of issued patents over 2-year intervals across a 10-year period (2001–2010) were analyzed; with this method it was possible to classify the companies according to their patent activity. It is shown that, alongside the very high patent activity (from 1.0 to 1.2 mln patents in 10 years) of Sony, Samsung Electronics, Intel, Hewlett-Packard, and Siemens, 76% of the most innovative companies of the world included in the TOP-50 Business Week 2010 have very low patent activity (from 0 to 0.2 mln patents in 10 years). The conclusion is that most patent-active innovative companies have stably growing or established dynamics of patent activity.

18.

Identifying missing relevant patent citation links by using bibliographic coupling in LED illuminating technology

Dar-Zen Chen Mu-Hsuan Huang Hui-Chen Hsieh Chang-Pin Lin 《Journal of Informetrics》2011,5(3):400-412

This study uses bibliographic coupling to identify missing relevant patent links, in order to construct a comprehensive citation network. Missing citation links can be added by taking the missing relevant patent links into account. The Pareto principle is used to determine the threshold of bibliographic coupling strength, in order to identify the missing relevant patent links. Comparisons between the original patent citation network and the comprehensive patent citation network with the missing relevant patent links are illustrated at both the patent and assignee levels. Light emitting diode (LED) illuminating technology is chosen as the case study. The relationships between the patents and the assignees are obviously enhanced after adding the missing relevant patent links. The results show that the growth rates on both the total number and the average number of links have apparently improved at the patent level. At the assignee level, the number of linked assignees and the average number of links between two assignees are increased. The differences between the two citation networks are further examined by means of the Freeman vertex betweenness centrality and Johnson's hierarchical clustering. The patents with more new links to other patents have distinct results in terms of the Freeman vertex betweenness centrality. The enhancement of links among patents also results in different clustering.
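
As a rough illustration of the bibliographic coupling idea in this abstract, the Python sketch below counts shared cited references between pairs of patents and lists strongly coupled pairs that do not cite each other directly. The names `coupling_strength` and `candidate_missing_links`, the patent IDs, and the fixed `threshold` argument are all hypothetical; the study derives its threshold from the Pareto principle, and that derivation is not reproduced here.

```python
from itertools import combinations


def coupling_strength(refs):
    """Bibliographic coupling strength: number of references two patents share.

    refs maps each patent ID to the set of IDs it cites. Returns a dict
    {(a, b): shared_count} for pairs with a < b and at least one shared reference.
    """
    strength = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])
        if shared:
            strength[(a, b)] = shared
    return strength


def candidate_missing_links(refs, threshold):
    """Pairs coupled at or above the threshold with no direct citation between them."""
    out = []
    for (a, b), s in coupling_strength(refs).items():
        directly_linked = b in refs[a] or a in refs[b]
        if s >= threshold and not directly_linked:
            out.append((a, b, s))
    # Strongest candidates first, then by ID for a stable order.
    return sorted(out, key=lambda x: (-x[2], x[0], x[1]))


if __name__ == "__main__":
    # Hypothetical citation data: P2 cites P1 directly, P1 and P3 do not cite each other.
    refs = {
        "P1": {"R1", "R2", "R3"},
        "P2": {"R1", "R2", "P1"},
        "P3": {"R2", "R3"},
        "P4": {"R9"},
    }
    print(candidate_missing_links(refs, threshold=2))
```

In this toy data, P1 and P3 share two references without a direct citation, so they surface as a candidate missing link, whereas P1 and P2 are already connected and are skipped.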

19.

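A WI-LDA model for patent technology topic analysis

Wu Hong, Yi Huifang, Ma Yongxin, Li Chang 《图书情报工作》 (Library and Information Service) 2018,62(17):68-74

[Purpose/Significance] Existing LDA-based patent technology topic analysis suffers from low topic distinguishability, weak interpretability, and blurred topic boundaries; remedying these problems matters for grasping technology hot spots and tracking the technology frontier. [Method/Process] The International Patent Classification (IPC) code is introduced into LDA patent topic analysis as the context of technical terms. The model is trained on a WI (Word IPC) structure of <word/phrase, classification code> pairs to build the WI-LDA model, which identifies and analyzes the topics of patent documents. [Result/Conclusion] An empirical study of the graphene field in China and a comparison with the traditional LDA model show that WI-LDA generalizes well, effectively lowers the difficulty of distinguishing topics in patent technology topic analysis, increases topic interpretability, and makes the division of text topics clearer.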

20.

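Patent technology topic analysis: an LDA topic model method based on SAO structure

Yang Chao, Zhu Donghua, Wang Xuefeng, Zhu Fujin, Heng Xiaofan 《图书情报工作》 (Library and Information Service) 2017,61(3):86-96

[Purpose/Significance] Existing methods for patent technology topic analysis suffer from low topic distinguishability, ambiguous topic words, and an inability to identify the "problems" and corresponding "solutions" in technical information; this paper aims to remedy these issues. [Method/Process] SAO structures are extracted from patent text and the "problem and solution" (P&S) pattern is identified from them. Based on a "bag of P&S" assumption, an LDA topic model built on the subject-action-object (SAO) structure is constructed to identify and analyze the topic structure of patent documents. [Result/Conclusion] A case study shows that the method effectively identifies topic distributions and has a clear advantage over the traditional LDA model in topic distinguishability and semantic disambiguation.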