Similar literature
 Found 20 similar documents; search took 31 ms
1.
Both structured and unstructured data, as well as structured data representing several different types of tuples, may be integrated into a single list for browsing or retrieval. Data may be arranged in the Gray code order of the features and metadata, producing optimal ordering for browsing. We provide several metrics for evaluating the performance of systems supporting browsing, given some constraints. Metadata and indexing terms are used as sorting keys and attributes for structured data, as well as for semi-structured or unstructured documents, images, media, etc. Economic and information theoretic models are suggested that enable the ordering to adapt to user preferences. Different relational structures and unstructured data may be integrated into a single, optimal ordering for browsing or for displaying tables in digital libraries, database management systems, or information retrieval systems. Adaptive displays of data are discussed.
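As a rough illustration of the Gray code ordering mentioned in this abstract, the sketch below sorts items by the position of their binary feature vector in the binary reflected Gray code sequence. The item names, the two-feature vectors, and the helper names `gray_rank` and `gray_order` are assumptions made for this example; the abstract does not specify how features are encoded.

```python
def gray_rank(bits):
    """Position of a bit tuple (most significant bit first) in the
    binary reflected Gray code sequence (hypothetical helper)."""
    rank = 0
    prev = 0
    for g in bits:
        prev ^= g                  # Gray-to-binary: b_i = b_(i-1) XOR g_i
        rank = (rank << 1) | prev
    return rank


def gray_order(items):
    """Sort (name, feature_bits) pairs into Gray code order."""
    return sorted(items, key=lambda item: gray_rank(item[1]))


# Illustrative data: each document has two binary features.
docs = [("d1", (0, 0)), ("d2", (1, 0)), ("d3", (1, 1)), ("d4", (0, 1))]
ordered = [name for name, _ in gray_order(docs)]
print(ordered)  # ['d1', 'd4', 'd3', 'd2']
```

In the resulting order, neighbouring feature vectors (00, 01, 11, 10) differ in exactly one feature, which is the property that makes a Gray code ordering attractive for browsing.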

2.
Classical information retrieval and overlap measures such as the Jaccard index, the Dice coefficient, and Salton’s cosine measure can be characterized by Lorenz curves. This result demonstrates the existence of a formal link between information retrieval and the information sciences on the one hand, and concentration and diversity theory, as used, e.g., in social economics and ecology on the other.
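For reference, the three overlap measures named in this abstract can be written for plain sets as in the sketch below. These are the standard set-based forms; the Lorenz curve characterization itself is not reproduced here, and the example sets are illustrative only.

```python
import math


def jaccard(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B| (0.0 if both sets are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|) (0.0 if both sets are empty)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Salton's cosine measure for sets: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


# Illustrative example: two sets of index terms sharing two elements.
a = {1, 2, 3}
b = {2, 3, 4, 5}
print(jaccard(a, b))  # 0.4
print(dice(a, b))
print(cosine(a, b))
```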

3.
A bottom-up approach to sentence ordering for multi-document summarization   Total citations: 1, self-citations: 0, citations by others: 1
Ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. However, improper ordering of information in a summary can confuse the reader and degrade the readability of the summary. Therefore, it is vital to properly order the information in multi-document summarization. We present a bottom-up approach to arrange sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are combined into a single criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. We evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. We introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task.

4.
An information retrieval performance measure that is interpreted as the percent of perfect performance (PPP) can be used to study the effects of the inclusion of specific document features or feature classes or techniques in an information retrieval system. Using this, one can measure the relative quality of a new ranking algorithm, the result of incorporating specific types of metadata or folksonomies from natural language, or determine what happens when one makes modifications to terms, such as stemming or adding part-of-speech tags. For example, knowledge that removing stopwords in a specific system improves the performance 5% of the way from the level of random performance to the best possible result is relatively easy to interpret and to use in decision making; using this percent-based measure also allows us to simply compute and interpret that there remains 95% of the possible performance to be obtained using other methods. The PPP measure as used here is based on the average search length (ASL), a measure of the ordering quality of a set of data, and may be used when evaluating all the documents or just the first N documents in an ordered list of documents. Because the ASL may be computed empirically or may be estimated analytically, the PPP measure may also be computed empirically or performance may be estimated analytically. Different levels of upper bound performance are discussed.
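One way to read the "percent of the way from random performance to the best possible result" description is as a linear rescaling of the ASL between its random and best-case values. The sketch below follows that reading. It assumes the ASL is the mean 1-based rank of the relevant documents, that random ordering gives an ASL of (N + 1) / 2, and that the best case places all R relevant documents first. These are assumptions for illustration; the paper's exact definitions and upper bounds may differ.

```python
def average_search_length(relevance):
    """Mean 1-based rank of the relevant documents in a ranked list
    (assumed empirical form of the ASL)."""
    ranks = [i for i, rel in enumerate(relevance, start=1) if rel]
    if not ranks:
        raise ValueError("no relevant documents in the ranking")
    return sum(ranks) / len(ranks)


def percent_of_perfect(relevance):
    """Percent of the way from random to best-case ASL (assumed form)."""
    n = len(relevance)
    r = sum(1 for rel in relevance if rel)
    if r == 0 or r == n:
        raise ValueError("undefined when no or all documents are relevant")
    observed = average_search_length(relevance)
    random_asl = (n + 1) / 2   # expected mean rank under random ordering
    best_asl = (r + 1) / 2     # all relevant documents ranked first
    return 100 * (random_asl - observed) / (random_asl - best_asl)


# Illustrative ranking: 1 marks a relevant document, 0 a non-relevant one.
ranking = [1, 0, 1, 0, 0, 0]
print(percent_of_perfect(ranking))  # 75.0
```

Under this reading, a score of 0 corresponds to random ordering and 100 to the best possible ordering, so the remaining headroom is simply 100 minus the score.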

5.
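This paper discusses how the different dimensions of information retrieval are reflected in relevance measurement formulas, and offers several suggestions for improving relevance measurement.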

6.
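Analysis of the WWW-based ProQuest 6.0 full-text database retrieval system   Total citations: 1, self-citations: 0, citations by others: 1
吴丹 《情报科学》2003,21(12):1331-1334
Taking the ProQuest 6.0 full-text database retrieval system developed by the US company ProQuest as an example, this paper analyzes its retrieval performance, identifies the functions added since earlier versions, evaluates its characteristics, and offers a suggestion for the development of Chinese-language retrieval systems.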

7.
With the advent of various services and applications of the Semantic Web, semantic annotation has emerged as an important research topic. The application of semantically annotated ontologies has been evident in numerous information processing and retrieval tasks. One such task is the use of a semantically annotated ontology in product design, which supports many important applications that aid design-related tasks. However, ontology development in design engineering remains a time-consuming and tedious task that demands considerable human effort. In the context of product family design, management of different product information that features efficient indexing, update, navigation, search and retrieval across product families is both desirable and challenging. For instance, an efficient way of retrieving timely information on a product family can be useful for tasks such as product family redesign and new product variant derivation when requirements change. However, current research on and application of information search and navigation in product families is mostly limited to the structural aspect, which is insufficient to handle advanced information search, especially when the query targets multiple aspects of a product. This paper attempts to address this problem by proposing an information search and retrieval framework based on the semantically annotated multi-facet product family ontology. In particular, we propose a document profile (DP) model to suggest semantic tags for annotation purposes. Using a case study of digital camera families, we illustrate how the faceted search and retrieval of product information can be accomplished. We also exemplify how we can derive new product variants based on the designer’s query of requirements via the faceted search and retrieval of product family information. Lastly, in order to highlight the value of our current work, we briefly discuss some further research and applications in design decision support, e.g. commonality analysis and variety comparison, based on the semantically annotated multi-facet product family ontology.

8.
Clusters of queries submitted to a given information retrieval system can be used as a basis for an effective method of clustering documents. This indirect procedure of document clustering requires the availability of a similarity measure for queries. Research carried out along these lines has resulted in the development of some methodologies for estimating such query similarities, applicable both in the case of queries characterized by sets of weighted or unweighted index terms and in the case of queries represented by Boolean combinations of index terms. This paper reports the results of further research by the author into a methodology of the latter type, i.e. a methodology for determining the similarity between queries characterized by Boolean search request formulations. The novelty of the presented approach, as compared with the methodology introduced in an earlier paper by the author, is that some relations among index terms are now taken into account. A number of similarity measures for Boolean combinations of index terms are discussed here in some detail. The rationale behind these measures is outlined, and the conditions to be met for ensuring their equivalence are identified. Moreover, the results of an experiment concerning two of the similarity measures introduced are given.

9.
With over 60,000 US K-12 science and mathematics education standards and a rapid proliferation of Web-enabled curriculum, retrieving curriculum that aligns with the standards to which teachers must teach is a key objective for educational digital libraries. However, previous studies of such alignment use single-dimensional and binary measures of the alignment concept. As a consequence, they suffer from low inter-rater reliability (IRR), with experts agreeing about alignments only some 20–40% of the time. We present the results of an experiment in which the alignment variable was operationalized using the Saracevic model of relevance ‘clues’ taken from the everyday practice of K-12 teaching. Results show high IRR across all clues with IRR on several specific alignment dimensions significantly higher than on overall alignment. In addition, a model of overall alignment is derived and estimated. The structure and explanatory power of the model as well as the relationships between alignment clues differ significantly between alignments of curriculum found by users themselves and curriculum found by others. These results illustrate the usefulness of clue-based relevance measures for information retrieval and have important consequences for both the formulation of automated retrieval mechanisms and the construction of a gold standard or benchmark set of standard-curriculum alignments.

10.
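As economic globalization accelerates and international exchange grows, society's demand for English-major graduates keeps increasing. In practice, however, the employment rate of English majors at higher vocational colleges is not high. This is mainly because society and employers have raised their requirements for English-major graduates, so traditional English teaching no longer meets society's needs. This paper therefore first analyzes the problems in higher vocational English teaching and then, in response to these problems, proposes corresponding measures concerning teaching philosophy, teaching methods, teaching models, and faculty development in order to strengthen the reform of higher vocational English teaching.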

11.
Content-based image retrieval (CBIR) with global features is notoriously noisy, especially for image queries with low percentages of relevant images in a collection. Moreover, CBIR typically ranks the whole collection, which is inefficient for large databases. We experiment with a method for image retrieval from multimedia databases, which improves both the effectiveness and efficiency of traditional CBIR by exploring secondary media. We perform retrieval in a two-stage fashion: first rank by a secondary medium, and then perform CBIR only on the top-K items. Thus, effectiveness is improved by performing CBIR on a ‘better’ subset. Using a relatively ‘cheap’ first stage, efficiency is also improved via the fewer CBIR operations performed. Our main novelty is that K is dynamic, i.e. estimated per query to optimize a predefined effectiveness measure. We show that our dynamic two-stage method can be significantly more effective and robust than similar setups with static thresholds previously proposed. In additional experiments using local feature derivatives in the visual stage instead of global ones, such as the emerging visual codebook approach, we find that two-stage retrieval does not work very well. We attribute the weaker performance of the visual codebook to the enhanced visual diversity produced by the textual stage, which diminishes the codebook’s advantage over global features. Furthermore, we compare dynamic two-stage retrieval to traditional score-based fusion of results retrieved visually and textually. We find that fusion is also significantly more effective than single-medium baselines. Although there is no clear winner between two-stage retrieval and fusion, the methods exhibit different robustness features; nevertheless, two-stage retrieval provides efficiency benefits over fusion.
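The two-stage idea described in this abstract can be sketched as follows: rank by the cheap secondary-medium score, keep the top K, then re-rank only those K items by the visual (CBIR) score. The function name `two_stage_rank`, the score dictionaries, and the fixed `k` are assumptions for illustration. In the paper K is estimated per query, and that estimation step is not shown here. Returning only the re-ranked top-K items is likewise a simplifying assumption.

```python
def two_stage_rank(text_scores, visual_scores, k):
    """Rank by text score, then re-rank only the top-k items by visual score.

    text_scores and visual_scores map item ids to scores (higher is better).
    Only the k items surviving the first stage need a visual score.
    """
    first_stage = sorted(text_scores, key=text_scores.get, reverse=True)
    top_k = first_stage[:k]
    return sorted(top_k, key=lambda item: visual_scores[item], reverse=True)


# Illustrative scores for four images.
text_scores = {"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7}
visual_scores = {"a": 0.2, "b": 0.9, "c": 1.0, "d": 0.5}
result = two_stage_rank(text_scores, visual_scores, k=3)
print(result)  # ['b', 'd', 'a']
```

Image "c" has the highest visual score but is filtered out by the first stage, which is how the textual stage restricts CBIR to a smaller and presumably better subset.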

12.
This paper reviews the career and legacy of William (Bill) Goffman, who served as a researcher, Professor, Dean and Emeritus at Case Western Reserve University, Cleveland, Ohio, from 1959 to 2000. Goffman pioneered mathematical information science broadly and in several key areas. First, he applied disease epidemiology concepts to model accurately the spread of knowledge and the formation of knowledge systems and their ecologies, including the dynamics of scientific discovery. Second, he proposed significant improvements in information retrieval through the deployment of multi-valued logic, appropriate file ordering, effective and efficient retrieval measures, and simplified retrieval approaches, including early work in citation-based searching. Third, Goffman applied Bradford-like distributions to model effective core research literature collection development and usage. Fourth, he developed original epidemiology models, and was an early contributor in biomedical informatics. His mathematical contributions have stood the test of time and will continue to be applicable indefinitely.

13.
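蔡淑琴  邱洁  王旸  周鹏  林勇 《情报杂志》2012,(3):168-173,167
To address information overload and disorientation in online reviews, this paper studies the ordering of online review information. Taking Dianping (dianping.com) as the setting, it studies the characteristics of review information and user needs and uses them as the basis for ordering online reviews. It examines the ordering process, proposes and designs surface-level and content-level ordering of review information, designs metrics for the orderliness of review information based on user needs, and gives a worked example of ordering review information.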

14.
15.
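王磊  朱学芳 《情报科学》2005,23(9):1414-1417
With the rapid development of image retrieval technology in recent years, the mismatch between content-based and text-based image retrieval has become increasingly apparent: their respective metadata sets are hard to make compatible, and the link between content-based image retrieval and image metadata is relatively weak. Addressing this mismatch, this paper starts from users' image retrieval needs and, using image metadata standards as a platform, explores the integration of content-based and text-based image retrieval, which helps resolve compatibility problems in image retrieval.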

16.
In this paper, we present a comparison of collocation-based similarity measures (the Jaccard, Dice and Cosine similarity measures) for the proper selection of additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean value of two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All these similarity measures are functions of the two terms' frequencies and the collocation frequency, but differ in the method of measurement. The selected measure changes the order of additional search terms and their weights, and hence has a strong influence on the retrieval performance. In our experiments of query expansion using these five similarity measures, the additional search terms of the Jaccard, Dice and Cosine similarity measures include more frequent terms with lower similarity values than ACP or NMI. In overall assessments of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas NMI and ACP are better in terms of execution efficiency.
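Since ACP is described as the mean of the two conditional probabilities between a query term and a candidate term, it can be sketched directly from the two term frequencies and the collocation frequency. The function name `acp` and the example counts are assumptions for illustration. NMI is omitted because the abstract does not state which normalization is used.

```python
def acp(freq_a, freq_b, freq_ab):
    """Average conditional probability of two terms.

    freq_a and freq_b are the individual term frequencies and freq_ab is
    their collocation frequency, so P(b|a) is estimated as freq_ab / freq_a
    and P(a|b) as freq_ab / freq_b.
    """
    if freq_a <= 0 or freq_b <= 0:
        raise ValueError("term frequencies must be positive")
    return 0.5 * (freq_ab / freq_a + freq_ab / freq_b)


# Illustrative counts: a rare query term and a more frequent candidate term.
score = acp(freq_a=10, freq_b=40, freq_ab=8)
print(score)  # 0.5
```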

17.
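A brief analysis of intellectual property loss in personnel mobility and protective countermeasures   Total citations: 2, self-citations: 1, citations by others: 2
Based on an analysis of intellectual property loss during personnel mobility, this paper proposes three effective protective measures: (1) strengthening patent awareness, which serves as both "spear and shield" against intellectual property loss caused by personnel mobility; (2) promoting a non-compete ethos, an important measure for preventing departing personnel from taking trade secrets with them; and (3) proactively litigating to seek judicial protection.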

18.
To resolve some of the lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. At indexing time, the proposed system clusters the logs of users’ queries into predefined FAQ categories. To increase the precision and the recall rate of clustering, the proposed system adopts a new similarity measure using a machine-readable dictionary. At search time, the proposed system calculates the similarities between users’ queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the proposed system could partially bridge lexical chasms between queries and FAQs. In addition, the proposed system outperforms traditional information retrieval systems in FAQ retrieval.

19.
The use of natural language information can improve decision-making. Darwinian considerations suggest that language may have developed because it leads to improved decision-making and survival, justifying the study of language's contribution to decision-making. The study of information-based decision-making within the context of evolution provides a view of information use that allows us both to describe the phenomenon of information use and to explain why an information use occurs as it does. Increasing information retrieval performance using phrases and part-of-speech (POS) information is one example of a type of decision-making performance that is improved when using this linguistic information. By studying a set of phrases used in a text retrieval system, we are able to show the relative effectiveness of using multi-term phrases as opposed to individual terms, as well as the relative worth of POS tagged terms or phrases, as opposed to untagged terms or phrases. An explanation is suggested for why POS tags contribute less to higher order grammatical constructs. We propose a measure of those needs for POS disambiguation that can be addressed by tagging; some example terms are analyzed using this measure, and specific degrees of ambiguity are proposed.

20.
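Error factors in scientific and technical literature retrieval strategies and retrieval techniques   Total citations: 3, self-citations: 0, citations by others: 3
徐英  张玉花  朱斌  卞福荃 《情报科学》2001,19(11):1178-1180
From the perspective of practical documentation and information science, this paper explores approaches and strategies for retrieving scientific and technical literature and, from the standpoint of both retrieval theory and actual practice, identifies the error factors that commonly arise in literature retrieval together with corrective measures. It offers guidance to scientific and technical personnel on using standardized retrieval languages and retrieval techniques, developing and exploiting scientific and technical literature, indexing literature, and achieving optimal retrieval efficiency.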
