Similar literature
20 similar documents found (search time: 15 ms)
1.
Aspect-level sentiment analysis is important for numerous opinion mining and market analysis applications. In this paper, we study the problem of identifying and rating review aspects, which is the fundamental task in aspect-level sentiment analysis. Previous review aspect analysis methods seldom consider the entity or its rating and instead use only 2-tuples, i.e., head and modifier pairs; e.g., in the phrase “nice room”, “room” is the head and “nice” is the modifier. To solve this problem, we present a novel Quad-tuple Probability Latent Semantic Analysis (QPLSA), which incorporates the entity and its rating together with the 2-tuples into the PLSA model. Specifically, QPLSA not only generates fine-grained aspects, but also captures the correlations between words and ratings. We also develop two novel prediction approaches, the Quad-tuple Prediction (from the global perspective) and the Expectation Prediction (from the local perspective). Systematic experiments show that Quad-tuple PLSA significantly outperforms 2-tuple PLSA on both aspect identification and aspect rating prediction for publication datasets. Moreover, for aspect rating prediction, QPLSA shows significant superiority over state-of-the-art baseline methods. In addition, the Quad-tuple Prediction and the Expectation Prediction also perform strongly in aspect rating on different datasets.

2.
Aspect-based sentiment analysis can be a very practical methodology for securities trading, commodity sales, movie rating websites, etc. Most recent studies adopt recurrent neural networks or attention-based neural networks to infer aspect sentiment using opinion context terms and sentence dependency trees. However, because a sentence often expresses sentiment toward multiple aspects, these models struggle to achieve satisfactory classification results. In this paper, we address these problems by encoding the sentence syntax tree, word relations, and opinion dictionary information in a unified framework. We call this method heterogeneous graph neural networks (Hete_GNNs). First, we use the interaction between aspect words and their contexts to encode the sentence sequence information for parameter sharing. Then, we use a novel heterogeneous graph neural network to encode each sentence’s syntax dependency tree, a prior sentiment dictionary, and part-of-speech tagging information for sentiment prediction. We evaluate Hete_GNNs on datasets from five domains, and the results confirm that heterogeneous context information can be better captured with heterogeneous graph neural networks. The improvement of the proposed method is demonstrated by comparison on the aspect sentiment classification task.

3.
The physical interpretation of LDLt factorization and Bennett's method for matrix factor modification applied to structural analysis are studied. The sparsity of the matrix factors and their effect on matrix modification are examined.
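
For readers unfamiliar with the LDLt factorization named in this abstract, the following is a minimal sketch of the standard algorithm without pivoting. The function names and the small tridiagonal test matrix are illustrative assumptions and are not taken from the paper; Bennett's modification method is not implemented here.

```python
def ldlt(a):
    """Factor a symmetric matrix a (list of lists) as L * D * L^T.

    Returns (L, d): L is unit lower triangular, d holds the diagonal of D.
    No pivoting is done, so every pivot d[j] is assumed to be nonzero.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot: diagonal entry minus contributions of earlier columns.
        d[j] = a[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (a[i][j] - s) / d[j]
    return L, d


def reconstruct(L, d):
    """Multiply the factors back together to check the factorization."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical symmetric tridiagonal matrix (illustrative values only).
stiffness = [[4.0, -2.0, 0.0],
             [-2.0, 5.0, -3.0],
             [0.0, -3.0, 6.0]]
L, d = ldlt(stiffness)
print(d)  # [4.0, 4.0, 3.75]
print(L)  # L[2][0] stays zero: no fill-in occurs for this tridiagonal matrix
```

In this small example the zero in position (2, 0) of the input survives in L, which is the kind of factor sparsity the abstract refers to.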

4.
The paper explores the intent of knowledge sharing in complex organizational contexts. Findings from semi-structured interviews with 54 subjects in two large organizations in Saudi Arabia indicate that self-perception and contextual interpretation create tensions that affect the way knowledge is managed and shared. The dichotomy between self-centeredness and self-doubt was found to affect the trust and openness necessary for genuine knowledge sharing. Mutual trust, developed through timely self-disclosure, was found to offer psychological safety for employees to share knowledge more openly. Inner tensions become the stimuli for maximizing the social aspect of interaction to negotiate meanings, strategize knowledge sharing, and redefine role identity. The interplay of cognitive and behavioural participation challenges one’s knowing and becoming, increasing the complexity and dynamics of knowledge sharing. Knowledge-sharing intent determines the learning of individuals and learning in organizations. A conceptual framework is introduced and implications for practice are discussed.

5.
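Ding Xueping (丁雪平), 《大众科技》, 2013, (10): 21-23, 48
Starting from data mining algorithms, this paper briefly describes the strengths and weaknesses of several data mining modeling methods and focuses on automated modeling for binary targets. Automated modeling of binary targets combines the benefits of automation with those of combining multiple models, which often yields more accurate predictions than any single model. The paper analyzes the role of the binary classifier node in automated modeling and in comparing binary outcomes, and presents an empirical study using a financial company as an example.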

6.
Integrating useful input information is essential to provide efficient recommendations to users. In this work, we focus on improving item rating prediction by merging the multi-context and multi-criteria research directions, which have been addressed separately in most of the existing literature. Throughout this article, criteria refer to item attributes, while context denotes the circumstances in which the user uses an item. Our goal is to capture more fine-grained preferences to improve item recommendation quality using users’ multi-criteria ratings under specific contextual situations. Therefore, we examine the recommender data from a graph-theoretic perspective by representing three types of entities (users, contextual situations, and criteria) as well as their relationships as a tripartite graph. On the assumption that contextually similar users tend to have similar interests in similar item criteria, we perform a high-order co-clustering on the tripartite graph to simultaneously partition the graph entities representing users in similar contextual situations and their evaluated item criteria. To predict cluster-based multi-criteria ratings, we introduce an improved rating prediction method that considers the dependency between users and their contextual situations, and also takes into account the correlation between criteria in the prediction process. The predicted multi-criteria ratings are finally aggregated into a single representative output corresponding to an overall item rating. To guide our investigation, we formulate a research hypothesis to provide insights about the tripartite graph partitioning, and we design preliminary experiments, including quantitative and qualitative analyses, to validate it. Further thorough experiments on the two available context-aware multi-criteria datasets, TripAdvisor and Educational, demonstrate that our proposal exhibits substantial improvements over alternative recommendation approaches.

7.
Predicting the probability that a user will click on a specific advertisement has been a prevalent issue in online advertising, attracting much research attention in the past decades. As a hot research frontier driven by industrial needs, advertising CTR prediction has in recent years seen more and more novel learning models employed to improve it. Although extant research provides necessary details on algorithmic design for addressing a variety of specific problems in advertising CTR prediction, the methodological evolution and the connections between modeling frameworks are overlooked. Moreover, to the best of our knowledge, there are few comprehensive surveys on this topic. We conduct a systematic literature review of state-of-the-art and recent CTR prediction research, with a special focus on modeling frameworks. Specifically, we give a classification of state-of-the-art CTR prediction models in the extant literature, within which basic modeling frameworks and their extensions, advantages and disadvantages, and performance assessment for CTR prediction are presented. Moreover, we summarize CTR prediction models with respect to the complexity and the order of feature interactions, and performance comparisons on various datasets. Furthermore, we identify current research trends, main challenges, and potential future directions worthy of further exploration. This review is expected to provide fundamental knowledge and efficient entry points for IS and marketing scholars who want to engage in this area.

8.
The matrix factorization model based on user-item rating data has been widely studied and applied in recommender systems. However, data sparsity, the cold-start problem, and poor explainability have restricted its performance. Textual reviews usually contain rich information about items’ features and users’ sentiments and preferences, which can compensate for the insufficient information available from user ratings alone. However, most recommendation algorithms that take sentiment analysis of review texts into account are either fine- or coarse-grained, but not both, leading to uncertain accuracy and comprehensiveness regarding user preference. This study proposes a deep learning recommendation model (i.e., DeepCGSR) that integrates textual review sentiments and the rating matrix. DeepCGSR uses the review sets of users and items as a corpus to perform cross-grained sentiment analysis by combining fine- and coarse-grained levels to extract sentiment feature vectors for users and items. Deep learning is used to map between the extracted feature vectors and the latent factors of the rating-based matrix factorization model and to obtain deep, nonlinear features for predicting the user's rating of an item. Iterative experiments on e-commerce datasets from Amazon show that DeepCGSR consistently outperforms the recommendation models LFM, SVD++, DeepCoNN, TOPICMF, and NARRE. Overall, compared with other recommendation models, the DeepCGSR model improved evaluation results by 14.113% over LFM, 13.786% over SVD++, 9.920% over TOPICMF, 5.122% over DeepCoNN, and 2.765% over NARRE. Meanwhile, DeepCGSR has great potential for addressing the overfitting and cold-start problems. Built upon previous studies and findings, DeepCGSR is the state of the art, moving the design and development of recommendation algorithms forward with improved recommendation accuracy.

9.
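With the rapid development of the Internet in China, data mining technology, and Web mining in particular, has become an important means for enterprises to search for business information and provide personalized services to customers, and it inevitably runs into the "minefield" of privacy protection. Privacy protection in the network environment is a hot topic both in the legal community and in e-commerce research. Privacy protection restricts the collection of data in Web mining as well as the sharing and dissemination of knowledge; how to balance Web mining against privacy protection is the starting point of this article. Considering the current state of online privacy protection in China, and by studying the content of the right to privacy and the forms of infringement that may occur, the article discusses the challenges facing privacy protection and proposes a framework of solutions for protecting the right to privacy.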

10.
A methodology for automatically identifying and clustering semantic features or topics in a heterogeneous text collection is presented. Textual data is encoded using a low rank nonnegative matrix factorization algorithm to retain natural data nonnegativity, thereby eliminating the need to use subtractive basis vector and encoding calculations present in other techniques such as principal component analysis for semantic feature abstraction. Existing techniques for nonnegative matrix factorization are reviewed and a new hybrid technique for nonnegative matrix factorization is proposed. Performance evaluations of the proposed method are conducted on a few benchmark text collections used in standard topic detection studies.
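
As background for the low rank nonnegative matrix factorization mentioned in this abstract, here is a minimal sketch of the classic multiplicative-update scheme (Lee and Seung) for minimizing squared reconstruction error. This is a swapped-in textbook technique, not the hybrid method the paper proposes; the helper names, the toy term-document matrix, and the parameter defaults are all illustrative assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frob_error(v, w, h):
    """Squared Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate nonnegative v (n x m) as w (n x rank) times h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random start so the updates never divide by zero.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical term-document counts: rows are terms, columns are documents.
v = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]
w, h = nmf(v, rank=2)
print(frob_error(v, w, h))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, both factors stay nonnegative throughout, so no subtractive combinations of basis vectors arise.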

11.
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on a hypothesis that a term’s role in accumulated retrieval sessions in the past affects its general importance regardless. It utilizes the availability of past retrieval results consisting of the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term’s evidential weight, as we propose in this paper, depends on the degree to which the mean frequency values for the relevant and non-relevant document distributions in the past are different. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental results using standard test collections show that the proposed term weighting scheme improves on conventional TF*IDF and language model based schemes. This indicates that evidential term weights bring in a new aspect of term importance and complement the collection statistics based on TF*IDF. We also show how the proposed term weighting scheme based on the notion of evidential weights is related to the well-known weighting schemes based on language modeling and probabilistic models.
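
The conventional TF*IDF baseline that this abstract compares against can be sketched as follows. This shows only one common variant (raw term frequency times the natural log of inverse document frequency); the function name and the toy documents are illustrative assumptions, and the evidential weight proposed in the paper is not implemented here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (count of term in doc) * ln(N / number of docs containing term)
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights


# Hypothetical three-document collection, already tokenized.
docs = [["term", "weighting", "retrieval"],
        ["term", "ranking", "ranking"],
        ["term", "retrieval", "model"]]
weights = tf_idf(docs)
print(weights[0]["term"])     # 0.0: "term" occurs in every document
print(weights[1]["ranking"])  # 2 * ln(3): frequent in one document only
```

A term that appears in every document gets zero weight under this variant, which illustrates why collection statistics alone may miss other evidence of term importance.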

12.
How open is innovation?   Total citations: 6 (self-citations: 1, citations by others: 5)
This paper is motivated by a desire to clarify the definition of ‘openness’ as currently used in the literature on open innovation, and to re-conceptualize the idea for future research on the topic. We combine bibliographic analysis of all papers on the topic published in Thomson's ISI Web of Knowledge (ISI) with a systematic content analysis of the field to develop a deeper understanding of earlier work. Our review indicates two inbound processes, sourcing and acquiring, and two outbound processes, revealing and selling. We analyze the advantages and disadvantages of these different forms of openness. The paper concludes with implications for theory and practice, charting several promising areas for future research.

13.
Recommendation is an effective marketing tool widely used in e-commerce, and can be made based on ratings predicted from the rating data of purchased items. To improve the accuracy of rating prediction, user reviews or product images have been used separately as side information to learn the latent features of users (items). In this study, we developed a hybrid approach that analyzes both user sentiments from review texts and user preferences from item images to make item recommendations more personalized for users. The hybrid model consists of two parallel modules that perform a procedure named multiscale semantic and visual analyses (MSVA). The first module conducts semantic analysis on review documents in various aspects with word-aware and scale-aware attention mechanisms, while the second module extracts visual features with block-aware and visual-aware attention mechanisms. The MSVA model was trained, validated, and tested using Amazon Product Data containing sampled reviews varying from 492,970 to 1 million records across 22 different domains. Three state-of-the-art recommendation models were used as the baselines for performance comparisons. On average, MSVA reduced the mean squared error (MSE) of predicted ratings by 6.00%, 3.14%, and 3.25% relative to the three baselines. It was demonstrated that combining semantic and visual analyses enhanced MSVA's performance across a wide variety of products, and that the multiscale scheme used in both the review and visual modules of MSVA made significant contributions to the rating prediction.

14.
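The application of big data across the disciplines of Earth science is attracting growing attention. Cases of data-driven discovery in Earth science keep emerging, the number of Earth data and information centers, Earth big data platforms, and related academic conferences is gradually increasing, and Earth big data is showing great potential for scientific research. Scientists have a strong demand for scientific methods and tools for Earth big data; however, its theoretical foundations, storage and management, and analysis methods are still under development, and research and discussion on Earth big data remain limited. Using bibliometric methods, this article analyzes the Earth big data literature indexed in the Science Citation Index (SCI) and the Social Sciences Citation Index (SSCI), and reveals the current state of Earth big data research from multiple perspectives, including global publication output, the research influence of countries and institutions, the distribution of research themes, shifts in research hotspots, and international collaboration. Finally, it recommends that future efforts focus on strengthening interdisciplinary sharing and integration of Earth big data and on improving the theory and methods for deep mining of Earth science big data, so as to enable the analysis, modeling, and prediction of the complex Earth system and to support and serve global change and sustainable development.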

15.
This paper examines several different approaches to exploiting structural information in semi-structured document categorization. The methods under consideration are designed for categorization of documents consisting of a collection of fields, or arbitrary tree-structured documents that can be adequately modeled with such a flat structure. The approaches range from trivial modifications of text modeling to more elaborate schemes, specifically tailored to structured documents. We combine these methods with three different text classification algorithms and evaluate their performance on four standard datasets containing different types of semi-structured documents. The best results were obtained with stacking, an approach in which predictions based on different structural components are combined by a meta classifier. A further improvement of this method is achieved by including the flat text model in the final prediction.

16.
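[Purpose/Significance] With the rapid development and spread of MOOCs, how to use intelligent recommendation technology to help learners "find the best course" among a massive number of MOOCs has become an important problem in MOOC development. [Method/Process] Based on self-perception theory and a learning behavioral engagement framework, the study makes full use of learning behavior logs and rating data to mine implicit trust relationships among learners, and builds a MOOC community trust network through trust propagation, thereby constructing a hybrid recommendation method that dynamically combines interest and implicit trust awareness. To address data sparsity, a trust-based joint probabilistic matrix factorization model (TA-PMF) is proposed, which combines the factorization of the course rating matrix and the trust relationship matrix to mine the latent features of users and courses and thereby predict ratings. [Result/Conclusion] Tests on a real dataset show that, compared with explicit rating values, learning behavioral engagement information contributes a weight of 0.7 to trust construction; the TA-PMF method is well suited to MOOC recommendation and can alleviate the cold-start problem to some extent.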

17.
While test collections provide the cornerstone for Cranfield-based evaluation of information retrieval (IR) systems, it has become practically infeasible to rely on traditional pooling techniques to construct test collections at the scale of today’s massive document collections (e.g., ClueWeb12’s 700M+ Webpages). This has motivated a flurry of studies proposing more cost-effective yet reliable IR evaluation methods. In this paper, we propose a new intelligent topic selection method which reduces the number of search topics (and thereby costly human relevance judgments) needed for reliable IR evaluation. To rigorously assess our method, we integrate previously disparate lines of research on intelligent topic selection and deep vs. shallow judging (i.e., whether it is more cost-effective to collect many relevance judgments for a few topics or a few judgments for many topics). While prior work on intelligent topic selection has never been evaluated against shallow judging baselines, prior work on deep vs. shallow judging has largely argued for shallow judging, but assuming random topic selection. We argue that for evaluating any topic selection method, ultimately one must ask whether it is actually useful to select topics, or whether one should simply perform shallow judging over many topics. In seeking a rigorous answer to this over-arching question, we conduct a comprehensive investigation over a set of relevant factors never previously studied together: 1) method of topic selection; 2) the effect of topic familiarity on human judging speed; and 3) how different topic generation processes (requiring varying human effort) impact (i) budget utilization and (ii) the resultant quality of judgments. Experiments on NIST TREC Robust 2003 and Robust 2004 test collections show that not only can we reliably evaluate IR systems with fewer topics, but also that: 1) when topics are intelligently selected, deep judging is often more cost-effective than shallow judging in evaluation reliability; and 2) topic familiarity and topic generation costs greatly impact the evaluation cost vs. reliability trade-off. Our findings challenge conventional wisdom in showing that deep judging is often preferable to shallow judging when topics are selected intelligently.

18.
International companies expanding and competing in an increasingly global context are currently discovering the necessity of sharing knowledge across geographical and disciplinary borders. Yet, especially in such contexts, sharing knowledge is inherently complex and problematic in practice. Inspired by recent contributions in science studies, this paper argues that knowledge sharing in a global context must take into account the heterogeneous and locally embedded nature of knowledge. In this perspective, knowledge cannot easily be received through advanced information technologies, but must always be achieved in practice. Empirically, this paper draws on two contrasting initiatives in a major international oil and gas company for improving its current ways of sharing knowledge between geographically distributed sites and disciplines involved in well planning and drilling. The contrasting cases reveal that while a shared database system failed to improve knowledge sharing across contexts, a flexible arrangement supporting collaboration and the use of different representations of knowledge was surprisingly successful. Based on these findings, the paper underscores and conceptualizes various triangulating practices conducted in order to achieve knowledge across borders. More precisely, these practices are central to individuals’ and communities’ abilities to: (i) negotiate ambiguous information, (ii) filter, combine, and integrate various heterogeneous sources of information, and (iii) judge the trustworthiness of information. Concerning the design and use of information technologies, this implies that new designs need to facilitate the triangulating practices of users rather than just providing advanced platforms (“digital junkyards”) for sharing information.

19.
In information retrieval, cluster-based retrieval is a well-known attempt at resolving the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate in a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework, and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. Evaluation results show that retrieval based on query-based similarities significantly improves on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

20.