Similar Literature
 20 similar documents found (search time: 15 ms)
1.
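Linked Data: Concepts, Technologies, and Application Prospects   Total citations: 10 (self-citations: 2, citations by others: 8)
This article outlines the origin of the Linked Data concept, its basic meaning, its technical implementation, and the current state of research and application in China and abroad. It briefly introduces applications in the library sector, comments on research and development in China, and emphasizes the significance of Linked Data for libraries publishing bibliographic and authority data on the Web. The author argues that Linked Data is closely tied to library and information work in the network era: as the Internet evolves into the Semantic Web, it is a foundational technology for "cataloging" and "authority control" of online resources and digital objects, and one of the core technologies for publishing information resources and delivering services in digital libraries. Finally, the author calls on China's library and information community to take this technology seriously and to commit resources and staff early to research, development, and promotion, so that libraries' large volume of authoritative data gains a place on the Internet.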

2.
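Linked Data Publishing Technology and Its Implementation: The Case of Drupal   Total citations: 4 (self-citations: 1, citations by others: 3)
Linked Data is a lightweight approach to implementing the Semantic Web. Its main value lies in using the RDF data model to convert unstructured data on the Web, and structured data that follows differing standards, into structured data that follows a unified standard so that machines can interpret it. The article reviews how Linked Data is implemented. Current publishing patterns are static publishing, batch storage, on-demand generation, and after-the-fact conversion (D2R). Publishing tools include the VoID vocabulary, front-end conversion tools, OWL and SKOS tools, Web services, Web application frameworks, CMSs, and RDFa. The article then describes the open-source CMS platform Drupal in detail. Drupal has long followed Semantic Web technology, and its highly modular architecture provides a good basis for publishing Linked Data; the RDF, SPARQL, and SPARQL_Endpoint modules, together with related modules, form a fairly mature framework for publishing and consuming Linked Data. Finally, the article uses this framework to publish the "Comparison Table of Chinese Historical Chronology and the Common Era" as Linked Data.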

3.
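This paper aims to provide reference and support for classification scheme owners in China and abroad who want to quickly deliver networked sharing services for existing schemes: a Web edition, Web Service terminology services, Linked Data publication, and automatic classification. Taking the fourth edition of the Chinese Library Classification (《中国图书馆分类法》) as an example, the authors describe the scheme semantically with CNKOS and implement a prototype Classification Sharing Service (CLSS) system using key technologies including the Lucene full-text search engine, Ext JS plug-ins, the Axis SOAP engine, and URL Rewrite. Verification showed that all functional modules ran and could be called as expected. Practice shows that the solution is ready for use and that other types of Chinese knowledge organization systems can be deployed quickly in the same way. However, for some complex automatic classification needs, obtaining more accurate class numbers still requires further research or manual assistance. 5 figures. 4 tables. 21 references.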

4.
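Ranking algorithms based on solving the Web link matrix are the mainstream ranking algorithms in current Web information retrieval systems. They fall into three broad classes: algorithms based on the random walk model, algorithms based on the authority/hub page relationship model, and algorithms that combine the two. After explaining the basic principles of representative algorithms in each class and discussing their strengths and weaknesses, the paper proposes a hybrid algorithm based on the link matrix.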
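As an illustration of the first class (random walk models), the following is a minimal PageRank sketch using power iteration. It is not the hybrid algorithm proposed in the paper, which the abstract does not specify; the function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by power iteration over a link graph.

    links maps each page to the list of pages it links to.
    All parameter defaults are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the uniform "teleportation" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A dangling page spreads its damped rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page graph: "c" is linked from three pages, "d" from none.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(toy_graph)
```

In this toy graph the most-linked page `c` ends up with the highest score and the unlinked page `d` with the lowest, which is the qualitative behavior the random walk model is meant to capture.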

5.
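Semantic Conversion of Chinese Thesauri   Total citations: 1 (self-citations: 0, citations by others: 1)
Ou Shiyan (欧石燕). Library and Information Service (《图书情报工作》), 2015, 59(16): 110-118
[Purpose/Significance] With the rise of the Semantic Web and Linked Data, describing thesauri semantically in SKOS has become mainstream, creating new opportunities for publishing and sharing thesauri on the Web and for using them in networked environments. [Method/Process] Taking the Chinese Thesaurus (《汉语主题词表》) as an example, this paper explores the semantic representation, validation, and Linked Data publication of Chinese thesauri. First, it defines a representation scheme based on SKOS, SKOS-XL, and SKOS extensions that describes the thesaurus without semantic loss, and develops a conversion program that outputs N-Triples, which makes converting large thesauri simpler and more efficient. Second, it uses the emerging SPIN framework to validate the integrity of the converted thesaurus, ensuring that the SKOS vocabulary is correct and valid. Finally, it publishes the SKOS/RDF thesaurus data as Linked Data on the Web using the combination "Jena TDB + Fuseki + Pubby" and develops a search interface for the thesaurus Linked Data. [Result/Conclusion] Experimental results show that the method achieves efficient semantic conversion, validation, and publication of the entire Chinese Thesaurus and promotes the sharing and use of Chinese thesauri on the Web.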
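To make the N-Triples conversion step concrete, here is a minimal sketch of how one thesaurus record might be serialized as SKOS triples. It is not the author's program. The record layout (`id`, `pref`, `alt`, `broader`, `narrower`, `related`), the `http://example.org/thesaurus/` namespace, and the sample term are hypothetical; only the SKOS and RDF property URIs are standard.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace for concepts


def literal(text, lang):
    """Serialize a language-tagged literal with N-Triples escaping."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"@{lang}'


def term_to_ntriples(term):
    """Turn one thesaurus record (a dict, assumed layout) into N-Triples lines."""
    subject = f"<{BASE}{term['id']}>"
    lines = [f"{subject} <{RDF_TYPE}> <{SKOS}Concept> ."]
    lines.append(f"{subject} <{SKOS}prefLabel> {literal(term['pref'], 'zh')} .")
    # Non-preferred (entry) terms become alternative labels.
    for alt in term.get("alt", []):
        lines.append(f"{subject} <{SKOS}altLabel> {literal(alt, 'zh')} .")
    # Hierarchical and associative relations map onto SKOS semantic relations.
    for prop in ("broader", "narrower", "related"):
        for target in term.get(prop, []):
            lines.append(f"{subject} <{SKOS}{prop}> <{BASE}{target}> .")
    return lines


# Hypothetical record; identifiers and labels are for illustration only.
example = term_to_ntriples(
    {"id": "c001", "pref": "关联数据", "alt": ["链接数据"], "broader": ["c000"]}
)
print("\n".join(example))
```

Because N-Triples puts one complete statement on each line, a converter of this shape can stream a large thesaurus record by record without building an in-memory graph, which is one plausible reason the format suits large-scale conversion.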

6.
Semantic Web Identity (SWI) is proposed as the condition in which search engines recognize the existence and nature of entities. The display of a Knowledge Graph Card in Google search results is an indicator of SWI, as it demonstrates that Google has gathered verifiable facts about the entity. Such recognition is likely to improve the accuracy and relevancy of Google's referrals to that entity. This article summarizes part of the research conducted for a recent doctoral dissertation, showing that SWI is poor for ARL libraries. The study hypothesizes that the failure to populate records in appropriate Linked Open Data and proprietary Semantic Web knowledge bases contributes to poor SWI.

7.
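A Web Information Extractor Combining Ontology and DOM   Total citations: 1 (self-citations: 0, citations by others: 1)
Information extraction based on a Web page information ontology cannot accurately delimit the extraction region. To address this shortcoming, the paper designs a Web information extractor that combines ontology with DOM. Using the DOM tree, it designs an algorithm that inductively learns the paths of information items in sample pages; the algorithm accurately delimits the extraction region, reduces page noise, and preprocesses Web pages. Experiments show that the improved method raises the precision of Web information extraction.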

8.
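RDA and Linked Data   Total citations: 2 (self-citations: 0, citations by others: 2)
As a new-generation cataloging code, RDA differs substantially from its predecessor AACR2 in its application model, the structure of bibliographic records, terminology, and cataloging rules. RDA first distinguishes the entities related to bibliographic objects, then determines the attributes to be described for each type of entity and the relationships among entities, attributes, and values, and specifies controlled value vocabularies. Description based on a conceptual model of this kind is particularly well suited to implementation with Semantic Web technology. Linked Data is a simplified form of the Semantic Web, and publishing RDA-cataloged bibliographic data as Linked Data allows RDA to realize its full potential. In the near future, semantically rich bibliographic data on the Internet will keep growing, and the universal linking of bibliographic data is within reach.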

9.
Web search algorithms that rank Web pages by examining the link structure of the Web are attractive from both theoretical and practical aspects. Today's prevailing link-based ranking algorithms rank Web pages by using the dominant eigenvector of certain matrices, such as the co-citation matrix or variations thereof. Recent analyses of ranking algorithms have focused attention on the case where the corresponding matrices are irreducible, thus avoiding singularities of reducible matrices. Consequently, rank analysis has been concentrated on authority connected graphs, which are graphs whose co-citation matrix is irreducible (after deleting zero rows and columns). Such graphs conceptually correspond to thematically related collections, in which most pages pertain to a single, dominant topic of interest. A link-based search algorithm A is rank-stable if minor changes in the link structure of the input graph, which is usually a subgraph of the Web, do not affect the ranking it produces; algorithms A and B are rank-similar if they produce similar rankings. These concepts were introduced and studied recently for various existing search algorithms. This paper studies the rank-stability and rank-similarity of three link-based ranking algorithms (PageRank, HITS and SALSA) in authority connected graphs. For this class of graphs, we show that neither HITS nor PageRank is rank-stable. We then show that HITS and PageRank are not rank-similar on this class, nor is either of them rank-similar to SALSA. This research was supported by the Fund for the Promotion of Research at the Technion, and by the Barnard Elkin Chair in Computer Science.
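The "dominant eigenvector of the co-citation matrix" idea can be illustrated with a small HITS sketch. With adjacency matrix A, repeatedly updating authority scores from hub scores and hub scores from authority scores amounts to power iteration on AᵀA (the co-citation matrix) for authorities. The code below is a plain textbook version, not the paper's analysis; the function name `hits`, the iteration count, and the toy graph are assumptions.

```python
def hits(links, iterations=50):
    """Compute HITS authority and hub scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score of a page: sum of authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors to unit Euclidean length to avoid overflow.
        for scores in (auth, hub):
            norm = sum(v * v for v in scores.values()) ** 0.5
            if norm:
                for p in scores:
                    scores[p] /= norm
    return auth, hub


# Hypothetical graph: three hub pages pointing at two candidate authorities.
toy = {"h1": ["x", "y"], "h2": ["x", "y"], "h3": ["x"]}
authority, hub_scores = hits(toy)
```

Here `x` is cited by all three hubs and `y` by two, so `x` receives the higher authority score. Rank-stability in the abstract's sense asks whether such an ordering survives small edits to `toy`.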

10.
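The growth of the Internet has produced large volumes of information of many types, but because traditional formats for publishing Web data lack structure and semantics, most documents and data on the Web remain isolated. Linked Data, a new way of publishing data on the Web, builds its data model with RDF (Resource Description Framework), names data entities with URIs (Uniform Resource Identifiers), publishes interlinked Internet information, and retrieves that information over HTTP. This achieves semantic integration of Internet information, supports applications such as Linked Data browsers and search engines, and lets computers help people organize and manage information more intelligently.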

11.
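Applications of Linked Data in Web Information Management   Total citations: 1 (self-citations: 0, citations by others: 1)
The growth of the Internet has produced large volumes of information of many types, but because traditional formats for publishing Web data lack structure and semantics, most documents and data on the Web remain isolated. Linked Data, a new way of publishing data on the Web, builds its data model with RDF (Resource Description Framework), names data entities with URIs (Uniform Resource Identifiers), publishes interlinked Internet information, and retrieves that information over HTTP. This achieves semantic integration of Internet information, supports applications such as Linked Data browsers and search engines, and lets computers help people organize and manage information more intelligently.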

12.
13.
The rise of Big, Open and Linked Data (BOLD) enables Big Data Algorithmic Systems (BDAS) which are often based on machine learning, neural networks and other forms of Artificial Intelligence (AI). As such systems are increasingly requested to make decisions that are consequential to individuals, communities and society at large, their failures cannot be tolerated, and they are subject to stringent regulatory and ethical requirements. However, they all rely on data which is not only big, open and linked but varied, dynamic and streamed at high speeds in real-time. Managing such data is challenging. To overcome such challenges and utilize opportunities for BDAS, organizations are increasingly developing advanced data governance capabilities. This paper reviews challenges and approaches to data governance for such systems, and proposes a framework for data governance for trustworthy BDAS. The framework promotes the stewardship of data, processes and algorithms, the controlled opening of data and algorithms to enable external scrutiny, trusted information sharing within and between organizations, risk-based governance, system-level controls, and data control through shared ownership and self-sovereign identities. The framework is based on 13 design principles and is proposed incrementally, for a single organization and multiple networked organizations.

14.
ABSTRACT

One of the big challenges facing academic libraries today is to increase the relevance of the libraries to their user communities. If the libraries can increase the visibility of their resources on the open web, it will increase the chances of the libraries reaching their user communities via the user's first search experience. BIBFRAME and library Linked Data will enable libraries to publish their resources in a way that the Web understands, consume Linked Data to enrich their resources relevant to the libraries' user communities, and visualize networks across collections. However, one of the important steps for transitioning to BIBFRAME and library Linked Data involves crosswalks, mapping MARC fields and subfields across data models and performing necessary data reformatting to be in compliance with the specifications of the new model, which is currently BIBFRAME 2.0. This article looks into how the Library of Congress has mapped library bibliographic data from the MARC format to the BIBFRAME 2.0 model and vocabulary, published and updated since April 2016 and available from http://www.loc.gov/bibframe/docs/index.html, based on the recently released conversion specifications and converter, developed by the Library of Congress with input from many community members. The BIBFRAME 2.0 standard and conversion tools will enable libraries to transform bibliographic data from MARC into BIBFRAME 2.0, which introduces a Linked Data model as the improved method of bibliographic control for the future, and make bibliographic information more useful within and beyond library communities.

15.
The most common approach to measuring the effectiveness of Information Retrieval systems is by using test collections. The Contextual Suggestion (CS) TREC track provides an evaluation framework for systems that recommend items to users given their geographical context. The specific nature of this track allows the participating teams to identify candidate documents either from the Open Web or from the ClueWeb12 collection, a static version of the web. In the judging pool, the documents from the Open Web and ClueWeb12 collection are distinguished. Hence, each system submission should be based only on one resource, either Open Web (identified by URLs) or ClueWeb12 (identified by ids). To achieve reproducibility, ranking web pages from ClueWeb12 should be the preferred method for scientific evaluation of CS systems, but it has been found that the systems that build their suggestion algorithms on top of input taken from the Open Web consistently achieve higher effectiveness. Because most of the systems take a rather similar approach to making CSs, this raises the question whether systems built by researchers on top of ClueWeb12 are still representative of those that would work directly on industry-strength web search engines. Do we need to sacrifice reproducibility for the sake of representativeness? We study the difference in effectiveness between Open Web systems and ClueWeb12 systems through analyzing the relevance assessments of documents identified from both the Open Web and ClueWeb12. Then, we identify documents that overlap between the relevance assessments of the Open Web and ClueWeb12, observing a dependency between relevance assessments and the source of the document being taken from the Open Web or from ClueWeb12. After that, we identify documents from the relevance assessments of the Open Web which exist in the ClueWeb12 collection but do not exist in the ClueWeb12 relevance assessments. We use these documents to expand the ClueWeb12 relevance assessments.
Our main findings are twofold. First, our empirical analysis of the relevance assessments of two years of the CS track shows that Open Web documents receive better ratings than ClueWeb12 documents, especially if we look at the documents in the overlap. Second, our approach for selecting candidate documents from the ClueWeb12 collection based on information obtained from the Open Web makes an improvement step towards partially bridging the gap in effectiveness between Open Web and ClueWeb12 systems, while at the same time we achieve reproducible results on a well-known representative sample of the web.

16.
Knowledge transfer for cross domain learning to rank   Total citations: 1 (self-citations: 1, citations by others: 0)
Recently, learning to rank technology has been attracting increasing attention from both academia and industry in the areas of machine learning and information retrieval. A number of algorithms have been proposed to rank documents according to the user-given query using a human-labeled training dataset. A basic assumption behind general learning to rank algorithms is that the training and test data are drawn from the same data distribution. However, this assumption does not always hold true in real world applications. For example, it can be violated when the labeled training data become outdated or originally come from a domain different from that of the test data. Such situations bring a new problem, which we define as cross domain learning to rank. In this paper, we aim at improving the learning of a ranking model in the target domain by leveraging knowledge from the outdated or out-of-domain data (both are referred to as source domain data). We first give a formal definition of the cross domain learning to rank problem. Following this, two novel methods are proposed to conduct knowledge transfer at feature level and instance level, respectively. These two methods both utilize Ranking SVM as the basic learner. In the experiments, we evaluate these two methods using data from benchmark datasets for document retrieval. The results show that the feature-level transfer method performs better with steady improvements over baseline approaches across different datasets, while the instance-level transfer method comes out with varying performance depending on the dataset used.
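For readers unfamiliar with the base learner, Ranking SVM is commonly trained by reducing ranking to binary classification over document pairs from the same query. The sketch below shows only that standard pairwise reduction, not the feature-level or instance-level transfer methods of the paper; the function name `pairwise_examples`, the tuple layout, and the sample data are assumptions.

```python
from itertools import combinations


def pairwise_examples(docs):
    """Build pairwise training examples from labeled documents.

    docs is a list of (query_id, feature_vector, relevance_label) tuples.
    Returns (difference_vector, +1 or -1) pairs; a linear classifier
    trained on them yields a weight vector usable as a ranking function.
    """
    by_query = {}
    for qid, feats, label in docs:
        by_query.setdefault(qid, []).append((feats, label))
    examples = []
    for items in by_query.values():
        # Only documents retrieved for the same query are comparable.
        for (fa, la), (fb, lb) in combinations(items, 2):
            if la == lb:
                continue  # equal relevance carries no preference
            diff = [a - b for a, b in zip(fa, fb)]
            examples.append((diff, 1 if la > lb else -1))
    return examples


# Hypothetical data: three documents for query q1, one for query q2.
sample = [
    ("q1", [1.0, 0.0], 2),
    ("q1", [0.0, 1.0], 0),
    ("q1", [0.5, 0.5], 2),
    ("q2", [1.0, 1.0], 1),
]
pairs = pairwise_examples(sample)
```

In a cross-domain setting, pairs built from source-domain data and from target-domain data would follow different distributions, which is the mismatch the abstract's transfer methods are designed to handle.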

17.
Information Retrieval Journal - Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a...

18.
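After analyzing the architecture of topic-specific search engines, the paper proposes an OSS-based system implementation strategy, focusing on topic modeling methods, topic relevance algorithms, and techniques for integrating open-source systems based on a shared coding convention, the Web Service interface specification, and the JNI interface specification.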

19.
20.
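Resource Selection Techniques and Algorithms in Integrated Retrieval Systems   Total citations: 2 (self-citations: 0, citations by others: 2)
This paper introduces the basic ideas and algorithms of the main resource selection techniques that have emerged in recent years, including techniques based on ranking resources by relevance, on the distribution of documents, on retrieval cost calculation, and on the hierarchical structure of resource content, and then analyzes and evaluates current resource selection techniques.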
