Similar literature
Found 20 similar documents; search took 515 ms
1.
Although deep learning breakthroughs in NLP are built on distributed word representations learned by neural language models, these methods suffer from a classic drawback of unsupervised learning techniques. Furthermore, the performance of general-purpose word embeddings has been shown to be heavily task-dependent. To tackle this issue, recent studies have proposed learning sentiment-enhanced word vectors for sentiment analysis. However, a common limitation of these approaches is that they require external sentiment lexicon sources, and constructing and maintaining these resources involves a set of complex, time-consuming, and error-prone tasks. In this regard, this paper proposes a sentiment lexicon embedding method that represents the semantic relationships of sentiment words better than existing word embedding techniques, without a manually annotated sentiment corpus. The major distinguishing feature of the proposed framework is that it jointly encodes morphemes and their POS tags and trains only important lexical morphemes in the embedding space. To verify the effectiveness of the proposed method, we conducted experiments comparing it with two baseline models. As a result, the revised embedding approach mitigated the problem of the conventional context-based word embedding method and, in turn, improved sentiment classification performance.

2.
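Folksonomy (social tagging) is a form of self-organized information that has become popular on the Internet in recent years. It lets users store and manage their own information resources with tags and provides a platform for sharing and exchange. To address the shortcomings of subject portals, this article proposes that subject portals adopt folksonomy as a way of organizing resources, combining the strengths of existing general-purpose social bookmarking services and subject information gateways to build a subject bookmarking portal. It also designs a construction architecture for such a portal.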

3.
Taxonomies enable organising information in a form that both humans and machines can understand, but constructing them for reuse and maintainability remains difficult. The paper presents a formal underpinning that provides quality metrics for a taxonomy under development. It proposes a methodology for semi-automatic building of maintainable taxonomies and outlines key features of the knowledge engineering context where the metrics and methodology are most suitable. The strength of the approach is that it is applied during the actual construction of the taxonomy. Users provide terms to describe different domain elements, as well as their attributes, and the methodology uses metrics to assess the quality of this input. Changes that satisfy given quality constraints are then proposed while the taxonomy is being developed.

4.
In this paper, the task of text segmentation is approached from a topic modeling perspective. We investigate the use of two unsupervised topic models, latent Dirichlet allocation (LDA) and multinomial mixture (MM), to segment a text into semantically coherent parts. The proposed topic-model-based approaches consistently outperform a standard baseline method on several datasets. A major benefit of the proposed LDA-based approach is that, along with the segment boundaries, it outputs the topic distribution associated with each segment. This information is of potential use in applications such as segment retrieval and discourse analysis. However, the proposed approaches, especially the LDA-based method, have high computational requirements. Based on an analysis of the dynamic programming (DP) algorithm typically used for segmentation, we suggest a modification to DP that dramatically speeds up the process with no loss in performance. The proposed modification is not specific to topic models; it is applicable to all algorithms that use DP for text segmentation.
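The dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the segment cost below is a plain unigram negative log-likelihood standing in for an LDA or MM score, `penalty` is a hypothetical per-segment charge, and the loop is the ordinary quadratic DP rather than the sped-up variant the abstract describes.

```python
import math
from collections import Counter


def segment_cost(sentences, i, j):
    """Negative log-likelihood of sentences[i:j] under a unigram model
    fitted to that span (a stand-in for a topic-model score)."""
    counts = Counter(word for sent in sentences[i:j] for word in sent)
    total = sum(counts.values())
    return -sum(c * math.log(c / total) for c in counts.values())


def segment(sentences, penalty):
    """Return the end index of each segment in the lowest-cost segmentation."""
    n = len(sentences)
    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of segmenting sentences[:j]
    back = [0] * (n + 1)           # back[j]: start index of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(sentences, i, j) + penalty
            if cost < best[j]:
                best[j] = cost
                back[j] = i
    # Walk the back-pointers from the end to recover the boundaries.
    bounds = []
    j = n
    while j > 0:
        bounds.append(j)
        j = back[j]
    return sorted(bounds)


# Toy input: two sentences about cats followed by two about dogs.
toy = [["cat", "cat"], ["cat", "cat"], ["dog", "dog"], ["dog", "dog"]]
print(segment(toy, penalty=1.0))  # [2, 4]
```

Swapping `segment_cost` for a topic-model likelihood would leave the DP itself unchanged, which is why a speed-up at the DP level carries over to other segmentation algorithms.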

5.
Information lifecycle management, as a hierarchical information storage strategy, can optimize storage and reduce cost. This paper analyzes the problem of information resource construction in current personalized information services and proposes a model of personalized information resource construction based on information lifecycle management. The model implements hierarchical storage and hierarchical service of information resources and addresses two problems: information that has lost its value after user needs change still occupying storage media, and the disordered storage of resource information. Finally, the paper analyzes and summarizes the problems in implementing the model.

6.
王日花 《情报科学》2021,39(10):76-87
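[Purpose/Significance] To address the high cost of dataset construction when building automatic question answering systems, and the limitation that the question answering process considers only the relevance of the question or the answer itself. [Method/Process] A dataset construction method that fuses an annotated question-answer base with community question answering data is proposed; a multi-layer heterogeneous network model of question keywords, questions, answers, and answer clusters is built; and an automatic question answering algorithm based on this model is given. A library corpus was collected and processed as experimental data, and the BERT-Cos, AINN, and BiMPM models were used as baselines for experiments and analysis. [Results/Conclusions] The experiments measured each model's performance on the library automatic question answering task. The proposed model outperforms the other models on every evaluation metric, reaching an accuracy of 87.85%. [Innovation/Limitations] The proposed multi-source dataset construction method and automatic question answering model perform better on question answering tasks than existing methods, and recommendations on the word length of user questions are given based on an analysis of model performance.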

7.
This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To support the learning approach, we present a number of features that may influence the MLIR merging process. These features are extracted mainly from three levels: query, document, and translation. After feature extraction, we use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the cross-lingual information retrieval (CLIR) task in NTCIR3, 4, and 5 are employed to assess the performance of the proposed method. Several merging methods are also run for comparison, including traditional merging methods, the 2-step merging strategy, and a merging method based on logistic regression. The experimental results show that the proposed method can significantly improve merging quality on two different types of datasets. Beyond effectiveness, the merge model generated by FRank allows our method to identify key factors that influence the merging process. This information may give us more insight into MLIR merging.

8.
In recent years, user-generated data in collaborative tagging (a.k.a. folksonomy-based) systems has grown rapidly owing to the prevalence of Web 2.0 communities. To help users find the resources they want, it is critical to understand user behaviors and preferences. Tag-based profile techniques, which model users and resources as vectors of relevant tags, are widely employed in folksonomy-based systems, mainly because personalized search and recommendation can be supported by measuring the relevance between user profiles and resource profiles. However, conventional measurements neglect the sentiment aspect of user-generated tags. In fact, tags can be very emotional and subjective, as users usually express their perceptions of and feelings about resources through tags. It is therefore necessary to take sentiment relevance into account in these measurements. In this paper, we present a novel generic framework, SenticRank, that incorporates various kinds of sentiment information into personalized search based on user profiles and resource profiles. Within this framework, content-based sentiment ranking and collaborative sentiment ranking methods are proposed to obtain sentiment-based personalized rankings. To the best of our knowledge, this is the first work to integrate sentiment information to address personalized tag-based search in collaborative tagging systems. We compare the proposed sentiment-based personalized search with baselines in experiments, and the results verify the effectiveness of the proposed framework. In addition, we study the influence of popular sentiment dictionaries and find that SenticNet stands out as the knowledge base that most improves the performance of personalized search in folksonomies.

9.
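Peer review is currently one of the main ways of scientifically evaluating the quality of research projects, but differences in experts' reviewing ability affect review outcomes. This paper therefore proposes a PageRank-based method for measuring the credibility of review experts. The method first uses a Gaussian distribution function to compute each expert's reviewing ability, then solves for the experts' credibility with the iterative PageRank algorithm, and finally introduces a time factor into the credibility measure. Experiments on a real peer-review dataset verify the effectiveness of the proposed method, which offers a useful reference for research project review and expert selection.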
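The iterative PageRank step this abstract relies on can be sketched as follows. This is only the standard power iteration; the Gaussian ability score and the time factor are not reproduced because the abstract does not specify them, and the toy graph with nodes `a`, `b`, `c` is a hypothetical example rather than reviewer data.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Standard PageRank by power iteration.

    links maps each node to the list of nodes it points to.
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            targets = links.get(u, [])
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += share
            else:
                # A node with no outgoing links spreads its rank evenly.
                share = damping * rank[u] / n
                for v in nodes:
                    new_rank[v] += share
        rank = new_rank
    return rank


# Hypothetical graph: a and b both point to c, and c points back to a.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(max(scores, key=scores.get))  # c
```

In a credibility setting, the edge weights would presumably carry the ability scores; here all edges count equally.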

10.
In information retrieval systems, one of the most important and difficult operations is extracting appropriate keywords from documents. This paper proposes an effective substring search method that extends the multi-keyword pattern matching machine of Aho and Corasick, called the AC machine. The proposed method makes it possible to extract as many keyword candidates as possible and to select the keywords that suit the user's purpose at the retrieval stage. It covers four types of substring search: exact, prefix, suffix, and proper substring search. The paper also proposes an algorithm for constructing the retrieval structure that speeds up the substring search. Simulation results show that the retrieval time of the presented method is as fast as that of key retrieval based on a trie.
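For readers unfamiliar with the AC machine, a minimal sketch of the classic Aho-Corasick automaton is given here. It covers exact multi-keyword matching only; the prefix, suffix, and proper substring extensions proposed in the paper are not attempted, and the function names are illustrative.

```python
from collections import deque


def build_automaton(keywords):
    """Build the goto, failure, and output tables of an Aho-Corasick machine."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> keywords that end at this state
    for word in keywords:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: failure links of shallow states are ready
    # before deeper states need them.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, keyword) for every keyword occurrence in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            hits.append((pos - len(word) + 1, word))
    return hits


machine = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", machine))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many keywords are loaded, which is the property that makes the machine attractive for keyword extraction.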

11.
俞扬信  刘瀛泽 《情报杂志》2012,31(2):136-140
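To address the problems that traditional retrieval methods face in today's networked information environment, a new method for personalized user information retrieval is proposed. Based on formal concept analysis (FCA) theory, user preferences are defined as a concept network, in which the concepts define the scope and targets of the user's preferences. Using the traditional TF-IDF weighting scheme and the reference concept hierarchy of the ODP, user preferences are represented as concept vectors to extend the user concept network. Comparative tests show that the proposed method is not only feasible to implement but also outperforms traditional retrieval models in retrieval effectiveness, and that it has promising application prospects.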

12.
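[Research purpose] Strategic intelligence analysis is carried out mainly through manual analysis by intelligence experts, and supporting information systems remain underdeveloped. Following the intelligence analysis process, this article proposes building a tag system based on Sun Tzu's theory of intelligence analysis as a reference for designing and implementing such information systems. [Research method] Drawing on Sun Tzu's theory of intelligence analysis, the definition of tags, and the strategic intelligence analysis process, the article describes the construction and role of a tag system for strategic intelligence analysis, and proposes a strategic problem analysis model that takes top-level tags as its entry point, together with a computational model for strategic intelligence analysis. [Research conclusion] Tag systems are widely applicable. Building a strategic intelligence analysis tag system on the "Way, Heaven, Earth, Command, and Method" (道、天、地、将、法) of Sun Tzu's theory, and using it to model and compute strategic problem analysis, offers guidance for research on strategic intelligence analysis and for building related information systems.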

13.
Social networks have recently attracted increasing attention. Accurate community detection in social networks can support better product design, accurate information recommendation, and public services. This paper therefore proposes a community detection (CD) algorithm based on network topology and user interests. The paper has two main parts. In the first part, a focused crawler algorithm is used to acquire personal tags from the tags posted by other users. Tags are then selected from the tag set based on the TFIDF weighting scheme, the semantic extension of tags, and the user semantic model. In addition, the tag vector of user interests is derived, with each tag weight calculated by an improved PageRank algorithm. In the second part, to detect communities, an initial social network consisting of directed, unweighted edges and vertices with interest vectors is constructed from the following/follower relationship. The initial social network is then converted into a new social network with undirected, weighted edges. The edge weights are calculated from the edge directions and interest vectors in the initial social network, and the similarity between edges is calculated from the edge weights. Communities are detected by a hierarchical clustering algorithm based on the edge-weighted similarity. Finally, the number of communities is determined by the partition density. An extensive experimental study shows that the proposed user interest detection (PUID) algorithm performs better than the CF algorithm and the TFIDF algorithm with respect to F-measure, Precision, and Recall. Moreover, the Precision of the proposed community detection (PCD) algorithm is improved, on average, by up to 8.21% compared with the Newman algorithm and by up to 41.17% compared with the CPM algorithm.

14.
In this paper, we consider parameter estimation for a class of multivariate output-error systems. A decomposition-based recursive least squares identification method is proposed using the hierarchical identification principle and the auxiliary model idea, and its convergence is analyzed through stochastic process theory. Compared with existing results on parameter estimation of multivariate output-error systems, a distinct feature of the proposed algorithm is that the system is decomposed into several sub-systems of smaller dimension so that the parameters to be identified can be estimated interactively. The analysis shows that the estimation errors converge to zero in mean square under certain conditions. Finally, numerical simulations are provided to show the effectiveness of the proposed approach.
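As background for this abstract, the basic recursive least squares update can be sketched in a few lines. This is the textbook single-output recursion, not the decomposition-based, auxiliary-model algorithm the paper proposes; the initial covariance scale `p0` and the toy data are assumptions made for illustration.

```python
def rls_fit(samples, n_params, p0=1e6):
    """Textbook recursive least squares for y = phi . theta.

    samples is an iterable of (phi, y) pairs, with phi a list of n_params floats.
    """
    theta = [0.0] * n_params
    # Covariance matrix, started large to express little prior knowledge.
    P = [[p0 if i == j else 0.0 for j in range(n_params)] for i in range(n_params)]
    for phi, y in samples:
        P_phi = [sum(P[i][j] * phi[j] for j in range(n_params)) for i in range(n_params)]
        denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n_params))
        gain = [value / denom for value in P_phi]
        error = y - sum(phi[i] * theta[i] for i in range(n_params))
        theta = [theta[i] + gain[i] * error for i in range(n_params)]
        # P stays symmetric, so phi^T P is the transpose of P phi.
        P = [[P[i][j] - gain[i] * P_phi[j] for j in range(n_params)]
             for i in range(n_params)]
    return theta


# Noise-free toy data generated from y = 2*x1 - 3*x2.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -3.0), ([1.0, 1.0], -1.0), ([2.0, 1.0], 1.0)]
print([round(t, 3) for t in rls_fit(data, n_params=2)])
```

Each update costs on the order of the square of the parameter count, which is one motivation for splitting a large system into smaller sub-systems.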

15.
This paper aims to provide new design approaches for positive observers of discrete-time positive linear systems, based on a construction method for linear copositive Lyapunov functions for positive systems. First, an efficient positive observer design approach is proposed using linear programming such that the observer error system is exponentially stable. Furthermore, an interval observer design is proposed for uncertain positive systems. The results are then extended to positive time-delay systems. In contrast with previous design approaches, the new method provides a general observer design with a lower computational burden. Finally, three comparison examples are given to show the merit of the new design approach.

16.
蔡皎洁  张玉峰 《现代情报》2013,33(5):105-111
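Because domain ontology development lacks a unified process, this paper proposes a standardized process for building enterprise ontologies based on software engineering development. A structured development method divides the enterprise ontology construction process into five stages: planning, analysis, design, implementation, and operation. Within the analysis stage, a prototyping method is used to build an initial conceptual framework of the enterprise ontology, so as to improve the quality of enterprise ontology development within limited time. In addition, a mobile phone product ontology for an enterprise was built with this process in an experiment, and its performance in a text filtering application was compared with that of an ontology for the same domain built with the "skeleton method" (骨架法) process.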

17.
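Research on the theoretical basis of Ontology-based content analysis   Total citations: 1 (self-citations: 0, cited by others: 1)
After analyzing the new demands and limitations that content analysis faces in the network environment, and drawing on the basic meaning and applications of Ontology, this article proposes an approach to innovative research on content analysis based on Ontology. Through a survey and analysis of the state of research in China and abroad, it explores the theoretical basis of Ontology-based content analysis: the basic idea, the role of the Ontology, and the operating procedure. Finally, it analyzes the advantages of the method and the issues to watch for in its current application.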

18.
The breeding and spread of negative emotion during public emergencies poses severe challenges to social governance. Traditional government information release strategies ignore the mechanism by which negative emotion evolves. Focusing on government information release policies during public emergencies and using cognitive big data analytics, our research applies deep learning to the construction of a news framing framework and explores how government information release strategies influence the contagion and evolution of negative emotion. In particular, this paper first uses Word2Vec, cosine similarity between word vectors, and the SO-PMI algorithm to build an emotional lexicon oriented to public emergencies. It then proposes an emotion computing method based on dependency parsing, designing an emotion binary tree and dependency-based emotion calculation rules. Finally, an experiment shows that the proposed emotional lexicon has wider coverage and higher accuracy than existing ones, and an emotion evolution analysis of an actual public event is performed using the lexicon and the proposed emotion computing method. The empirical results show that the algorithm is feasible and effective: the model can conduct fine-grained emotion computing and improves the accuracy and computational efficiency of sentiment classification. The final empirical analysis found that, owing to defects such as slow speed, non-transparent content, poor pertinence, and weak coordination between departments, existing government information release strategies had a significant negative impact on the contagion and evolution of anxiety and disgust and could not regulate negative emotions effectively. These results provide theoretical implications and technical support for social governance. They could also help establish a mode of negative emotion management and a new pattern of public opinion guidance.
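The SO-PMI score used for lexicon building can be sketched as below. This follows the general definition of semantic orientation from pointwise mutual information (PMI with positive seed words minus PMI with negative seed words) over document-level co-occurrence; the additive `smoothing` constant, the seed words, and the toy corpus are assumptions, and the paper's actual corpus and seed sets are not known from the abstract.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Semantic orientation of word from document-level co-occurrence.

    A positive score suggests the word co-occurs more with the positive
    seeds; a negative score suggests the opposite.
    """
    n = len(docs)
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*terms):
        # Number of documents that contain every term.
        return sum(1 for d in doc_sets if all(t in d for t in terms))

    def pmi(a, b):
        p_joint = (doc_freq(a, b) + smoothing) / n
        p_a = (doc_freq(a) + smoothing) / n
        p_b = (doc_freq(b) + smoothing) / n
        return math.log2(p_joint / (p_a * p_b))

    return (sum(pmi(word, seed) for seed in pos_seeds)
            - sum(pmi(word, seed) for seed in neg_seeds))


# Hypothetical toy corpus with one positive and one negative seed word.
corpus = [["rescue", "good"], ["rescue", "good"], ["panic", "bad"], ["panic", "bad"]]
print(so_pmi("rescue", corpus, ["good"], ["bad"]) > 0)  # True
print(so_pmi("panic", corpus, ["good"], ["bad"]) < 0)   # True
```

Candidate words whose score clears a chosen threshold in either direction would be added to the lexicon with the corresponding polarity.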

19.
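Analysis of ontology construction based on folksonomy   Total citations: 3 (self-citations: 1, cited by others: 2)
Traditional ontology creation methods rely mainly on the efforts of a small group of people and adapt poorly to the dynamic and complex nature of network information. Folksonomy, popular in the Web 2.0 environment, can supply a rich corpus and concept-level semantic information for building and evolving ontologies, and thus provides strong support for ontology construction. Drawing on the theory and methods of social network analysis and adopting the idea of emergent semantics, this paper analyzes the method and process model for uncovering semantic information between concepts and building an ontology from a folksonomy-based tripartite graph model of taggers, tag concepts, and instances.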

20.
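[Purpose/Significance] Exploring the structure and characteristics of government information collaboration is important for effectively advancing smart government. Taking the analysis of Shenzhen's smart government information collaboration structure as an example, this article proposes a method for analyzing such structures. [Method/Process] Information subjects and information chains are extracted from Shenzhen's government service processes; the information collaboration structure is analyzed on the basis of business processes and an information collaboration network is built; and indicators including degree, degree distribution, centrality, clustering coefficient, and average path length are used to analyze the network's topology and its scale-free and small-world characteristics. [Results/Conclusions] Two main types of information subject are found in Shenzhen's government information collaboration network, which forms a collaboration structure with applicants, the public security bureau, and the planning and land resources department as core nodes. Applicants, as the most distinctive information subject in the network, deserve particular attention when optimizing the information collaboration network. Using data on specific government business processes and complex network analysis methods, the smart government information collaboration structure and characteristics of individual cities and regions can be analyzed precisely, providing a basis for building information collaboration in smart government.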
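Three of the indicators named in this abstract (degree, clustering coefficient, and average path length) can be computed with a short sketch. The edge list is a hypothetical toy graph, not the Shenzhen data, and the code assumes an undirected, connected network with at least two nodes.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering(graph, node):
    """Local clustering coefficient: share of neighbour pairs that are linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    linked = sum(1 for u in neighbours for v in neighbours
                 if u < v and v in graph[u])
    return 2.0 * linked / (k * (k - 1))


def average_path_length(graph):
    """Mean shortest-path length over all reachable ordered node pairs."""
    total = 0
    pairs = 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical network: a triangle a-b-c with a pendant node d attached to c.
g = build_graph([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(len(g["c"]))          # degree of c: 3
print(clustering(g, "a"))   # 1.0
```

A short average path length combined with a high average clustering coefficient is the usual signature of a small-world network, while a heavy-tailed degree distribution points to scale-free structure.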
