A large volume of data flowing through location-based social networks (LBSNs) supports the recommendation of points of interest (POIs). One of the major challenges affecting recommendation precision is discovering the dynamic spatio-temporal patterns of visiting behavior, which are hard to identify because many side factors are involved. To address this difficulty, we jointly study the effects of users' social relationships, textual reviews, and the geographical proximity of POIs. The goal is to uncover complex spatio-temporal patterns of visiting behavior for location recommendation in spatio-temporal social networks, even when data quality is unreliable. We design a novel framework that effectively recommends POIs to any user. The framework combines two key techniques: (i) a network embedding method that learns vectors for users and POIs in a low-dimensional embedding space; and (ii) a dynamic factor graph model that models various factors, such as the correlations among the vectors learned in the previous phase. Experiments on two real large-scale datasets show that the proposed method outperforms state-of-the-art baseline algorithms in both the effectiveness and the efficiency of POI recommendation.
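
As a rough illustration of how learned user/POI embeddings and geographical proximity can jointly drive a ranking, the sketch below scores each POI by a squashed embedding affinity multiplied by a distance decay. This is a minimal sketch under our own assumptions: the embeddings are taken as given, and the sigmoid affinity, exponential decay and `bandwidth_km` are illustrative choices. It does not implement the paper's dynamic factor graph.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def rank_pois(user_vec, user_loc, pois, bandwidth_km=5.0, k=3):
    """Rank POIs by embedding affinity damped by geographic distance.

    pois maps a POI id to (embedding, (lat, lon)). The sigmoid affinity,
    exponential distance decay, and bandwidth are illustrative assumptions.
    """
    scores = {}
    for poi_id, (vec, loc) in pois.items():
        dot = sum(u * v for u, v in zip(user_vec, vec))
        affinity = 1.0 / (1.0 + math.exp(-dot))  # squash to (0, 1)
        proximity = math.exp(-haversine_km(user_loc, loc) / bandwidth_km)
        scores[poi_id] = affinity * proximity
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A POI with a well-matching embedding but a location about 111 km away (one degree of latitude) falls below a poorly matching POI next door, because the distance factor dominates at this bandwidth.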

Social emotion refers to the emotion that a textual document evokes in its readers. The emotion cause extraction task analyzes the causes of the author's sentiments based on expressions in the text. By contrast, identifying the causes of the social emotion that a text evokes in readers has not been explored before. Social emotion mining and cause analysis are important research topics in Web-based social media analytics and text mining, and they also have applications in multiple domains. Social emotion cause identification focuses on the causes of the reader's emotions elicited by a text, which the text itself expresses neither explicitly nor implicitly. This makes it a challenging task, fundamentally different from previous research. It also requires a deeper understanding of the cognitive process underlying the inference of social emotion and its causes. In this paper, we propose the new task of social emotion cause identification (SECI). Inspired by the OCC theory of the cognitive structure of emotions, we present a Cognitive Emotion model Enhanced Sequential (CogEES) method for SECI. Specifically, based on the implications of the OCC model, our method first establishes the correspondence between words/phrases in the text and the emotional dimensions identified in OCC. It then builds emotional dimension lexicons containing 1,676 distinct words/phrases. Next, it uses the lexicon information and discourse coherence to segment the document semantically and to enhance clause representation learning. Finally, it combines text segmentation and clause representations in a sequential model that predicts cause clauses. We construct a SECI dataset for this new task and conduct experiments to evaluate CogEES. Our method outperforms the baselines, with an average F1 improvement of over 10%, and its predictions are more interpretable.
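
The first CogEES step, matching words/phrases to OCC emotional dimensions, can be pictured as a lexicon lookup over clauses. This is a minimal sketch under several assumptions:
- the five lexicon entries and the dimension names (the three OCC appraisal branches) are hypothetical stand-ins for the paper's 1,676-entry lexicons;
- clauses are split naively on punctuation;
- only single-word entries are matched.

```python
import re

# Hypothetical lexicon: word -> OCC appraisal dimension (illustrative only).
OCC_LEXICON = {
    "won": "desirability_of_events",
    "disaster": "desirability_of_events",
    "cheated": "praiseworthiness_of_actions",
    "unfair": "praiseworthiness_of_actions",
    "beautiful": "appealingness_of_objects",
}

def split_clauses(document):
    """Split text into clauses at common clause-ending punctuation."""
    parts = re.split(r"[,.;!?]", document)
    return [p.strip() for p in parts if p.strip()]

def clause_features(document, lexicon=OCC_LEXICON):
    """Return (clause, {dimension: count}) pairs, one per clause."""
    features = []
    for clause in split_clauses(document):
        counts = {}
        for token in re.findall(r"[a-z']+", clause.lower()):
            dim = lexicon.get(token)
            if dim:
                counts[dim] = counts.get(dim, 0) + 1
        features.append((clause, counts))
    return features
```

In a full system, these per-clause dimension counts would be one input to the segmentation and sequential cause-clause model rather than a prediction in themselves.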

This paper examines how alternative food networks (AFNs) cultivate engagement on a social media platform. Following the methods proposed in Kar and Dwivedi (2020) and Berente et al. (2019), we contribute to theory by combining exploratory text analysis with model testing. Using the theoretical lens of relationship cultivation and social media engagement, we collected 55,358 original Weibo posts by 90 farms and other AFN participants in China and used Latent Dirichlet Allocation (LDA) for topic analysis. We then drew on the literature to map the topics to constructs and developed a theoretical model. To validate the theoretical model, we constructed a panel dataset at the Weibo account-year level, with Chinese city-level yearly economic data included as control variables, and performed a fixed-effects panel regression analysis. The empirical results revealed that posts centered on openness/disclosure, sharing of tasks, and knowledge sharing have positive effects on social media engagement. By contrast, posts containing irrelevant information, and advertising that repeats the same wording across multiple posts, have negative effects. Our findings suggest that cultivating engagement requires different relationship strategies. Social media platforms should be leveraged according to the context and the purpose of the social cause. Our research is also among the early studies that combine big data analysis of large quantities of textual data with model validation to obtain theoretical insights.
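
The fixed-effects panel regression mentioned above can be illustrated with the within estimator, which demeans each account's observations before pooled OLS. The sketch below is a minimal single-regressor version under our own assumptions: rows are hypothetical (account, x, y) tuples, and the study's city-level economic controls and any year effects are omitted.

```python
from collections import defaultdict

def within_estimator(rows):
    """Fixed-effects (within) slope for y_it = a_i + b * x_it + e_it.

    rows holds (entity_id, x, y) tuples. Demeaning x and y inside each
    entity removes the entity-specific intercept a_i before pooled OLS.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # entity -> [sum_x, sum_y, n]
    for entity, x, y in rows:
        t = totals[entity]
        t[0] += x
        t[1] += y
        t[2] += 1
    num = den = 0.0
    for entity, x, y in rows:
        sum_x, sum_y, n = totals[entity]
        dx, dy = x - sum_x / n, y - sum_y / n
        num += dx * dy
        den += dx * dx
    return num / den
```

For example, consider two hypothetical accounts whose engagement follows y = 10 + 2x (at x = 1, 2, 3) and y = -5 + 2x (at x = 4, 5, 6). The within estimator recovers the common slope 2.0. Pooled OLS on the same rows gives a negative slope, because it ignores the account effects.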

With the explosion of multilingual content on the Web, particularly on social media platforms, identifying the languages present in a text is becoming an important task for various applications. Automatic language identification (ALI) in social media text is already non-trivial because of slang, misspellings, creative spellings, and special elements such as hashtags and user mentions. In a multilingual environment, ALI becomes even more challenging. In a highly multilingual society, code-mixing without affecting the underlying language sense has become a natural phenomenon. In such a dynamic environment, the conversational text alone often fails to reveal the underlying languages. This paper proposes several methods that exploit social conversational features to improve ALI performance. Social conversational features have been explored for ALI before, using methods such as probabilistic language modeling. However, these models often fail to address issues such as code-mixing, phonetic typing, and out-of-vocabulary words, which are prevalent in highly multilingual environments. This paper differs in that it uses social conversational features to propose text refinement strategies suitable for ALI in highly multilingual environments. The contributions of this paper are therefore as follows. First, it analyzes the characteristics of various social conversational features by exploiting language usage patterns. Second, it proposes several text refinement methods suitable for language identification. Third, it investigates the effects of the proposed refinement methods using various sentence-level language identification frameworks.
Experiments on three conversational datasets collected from the Facebook, YouTube, and Twitter social media platforms show that the proposed ALI method using social conversational features outperforms its baseline counterparts.
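
The idea of refining social text before language identification can be sketched as follows. Everything here is an illustrative assumption rather than the strategies or sentence-level frameworks evaluated in the paper:
- the refinement rules: dropping URLs and mentions, keeping hashtag words, and shortening character elongations;
- the character-trigram Jaccard identifier;
- the tiny language profiles used with it.

```python
import re

def refine(text):
    """Normalise a social-media message before language identification."""
    text = re.sub(r"https?://\S+", " ", text)     # drop URLs
    text = re.sub(r"@\w+", " ", text)             # drop user mentions
    text = re.sub(r"#(\w+)", r"\1", text)         # keep hashtag words, drop '#'
    text = re.sub(r"(\w)\1{2,}", r"\1\1", text)   # 'sooooo' -> 'soo'
    return " ".join(text.lower().split())

def trigrams(text):
    """Set of character trigrams, with word-boundary padding at both ends."""
    padded = f" {text} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def identify(text, profiles):
    """Pick the language whose trigram profile overlaps most (Jaccard)."""
    grams = trigrams(refine(text))

    def jaccard(profile):
        union = grams | profile
        return len(grams & profile) / len(union) if union else 0.0

    return max(profiles, key=lambda lang: jaccard(profiles[lang]))
```

Real profiles would be built from much larger per-language corpora. Code-mixed messages would need word-level decisions rather than a single label per message.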

Multimedia objects can be retrieved using their context, for instance the text surrounding them in documents. This text may be either near to or far from the searched objects. Our goal in this paper is to study how the position of text relative to the searched objects affects retrieval effectiveness. The multimedia objects we consider are described in structured documents such as XML documents, so the document structure is exploited to determine the position of text in documents. Structural information has been shown to be an effective source of evidence in textual information retrieval, but only a few works have investigated its value for multimedia retrieval. More precisely, the task addressed in this paper is retrieving multimedia fragments (i.e., XML elements containing at least one multimedia object). Our general approach has two steps: we first retrieve XML elements containing multimedia objects, and we then explore the surrounding information to retrieve relevant multimedia fragments. In both steps, we use the document structure to study the impact of the surrounding information.
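
The two-step idea can be sketched with Python's `xml.etree.ElementTree`. First, find elements that hold multimedia objects. Then score each one by surrounding text, weighted by structural distance. The tag names in `MEDIA_TAGS`, the `id` attribute, and the geometric `decay` per ancestor level are hypothetical choices for illustration, not the paper's scoring model.

```python
import xml.etree.ElementTree as ET

MEDIA_TAGS = {"image", "video"}  # hypothetical multimedia element names

def text_of(elem):
    """All text inside an element (including descendants), lower-cased."""
    return " ".join(elem.itertext()).lower()

def score_fragments(xml_string, query_terms, decay=0.5, max_up=2):
    """Score each element that directly holds a multimedia object.

    Query-term matches in the fragment count with weight 1, matches in its
    parent with weight decay, in the grandparent decay**2, and so on. Ancestor
    text includes the fragment's own text, so nearby text is counted more.
    """
    root = ET.fromstring(xml_string)
    parent = {child: p for p in root.iter() for child in p}
    scores = {}
    for elem in root.iter():
        if not any(child.tag in MEDIA_TAGS for child in elem):
            continue
        score, node, weight = 0.0, elem, 1.0
        for _ in range(max_up + 1):
            words = text_of(node).split()
            score += weight * sum(words.count(t) for t in query_terms)
            if node not in parent:
                break
            node, weight = parent[node], weight * decay
        scores[elem.get("id")] = score
    return scores
```

Varying `decay` and `max_up` is one simple way to probe how much near versus far text contributes. That is the kind of effect the paper studies.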

This paper presents a robust and comprehensive graph-based rank aggregation approach for combining the results of isolated ranker models in retrieval tasks. The method follows an unsupervised scheme that is independent of how the isolated ranks are formulated. Our approach can combine arbitrary models defined by different ranking criteria, such as those based on textual, image, or hybrid content representations. We reformulate the ad-hoc retrieval problem as document retrieval based on fusion graphs. We propose fusion graphs as a new unified representation model that can merge multiple ranks and automatically express the inter-relationships of retrieval results. We claim that, by doing so, the retrieval system can benefit from learning the manifold structure of datasets, leading to more effective results. Another contribution is that our graph-based aggregation formulation, unlike existing approaches, encapsulates contextual information encoded from multiple ranks. This information can be used directly for ranking, without further computation or post-processing steps over the graphs. Based on the graphs, a novel similarity retrieval score is formulated using an efficient computation of minimum common subgraphs. Finally, another benefit over existing approaches is the absence of hyperparameters. A comprehensive experimental evaluation was conducted on diverse well-known public datasets composed of textual, image, and multimodal documents. The experiments show that our method reaches top performance. It yields better effectiveness scores than state-of-the-art baseline methods and large gains over the rankers being fused, demonstrating that it can successfully represent queries through a unified graph-based model of rank fusion.
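
The following is a heavily simplified sketch of fusion-graph construction and comparison, under our own assumptions:
- each ranker returns an ordered list of item ids;
- items that co-occur in a ranker's top-k are linked with weight 1/(rank_i * rank_j);
- a weighted edge intersection stands in for the paper's minimum-common-subgraph score.

Unlike the hyperparameter-free method described above, this sketch exposes `k` as a parameter.

```python
from itertools import combinations

def fusion_graph(ranked_lists, k=3):
    """Merge several rankers' top-k lists into one weighted, undirected edge set.

    Each pair of items co-occurring in a ranker's top-k gets an edge whose
    weight grows as both items rank higher (1/(i*j) weighting is an assumption).
    """
    edges = {}
    for ranking in ranked_lists:
        top = ranking[:k]
        for (i, a), (j, b) in combinations(enumerate(top, start=1), 2):
            key = frozenset((a, b))
            edges[key] = edges.get(key, 0.0) + 1.0 / (i * j)
    return edges

def graph_similarity(g1, g2):
    """Weighted intersection: each shared edge contributes its smaller weight."""
    return sum(min(w, g2[e]) for e, w in g1.items() if e in g2)
```

To rank a collection, one would build a fusion graph for the query and for each candidate, then sort candidates by `graph_similarity` to the query graph.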

Semantic representation reflects the meaning of a text as humans may understand it, and thus facilitates various automated language processing applications. Although semantic representation is very useful for several applications, few models have been proposed for the Arabic language. In that context, this paper proposes a graph-based semantic representation model for Arabic text that aims to extract the semantic relations between Arabic words. Several tools and concepts are employed, such as dependency relations, part-of-speech tags, named entities, patterns, and predefined Arabic linguistic rules. The core idea of the proposed model is to represent the meaning of Arabic sentences as a rooted acyclic graph. The textual entailment recognition challenge is used to evaluate the model's ability to enhance other Arabic NLP applications. The experiments were conducted on a benchmark Arabic textual entailment dataset, ArbTED. The results showed that the proposed graph-based model improves textual entailment recognition compared with other baseline models. On average, it achieved improvements of 8.6%, 30.2%, 5.3%, and 16.2% in accuracy, recall, precision, and F-score, respectively.
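
To make the "rooted acyclic graph" representation concrete, the sketch below works in three steps:
- it stores a sentence as hypothetical (head, relation, dependent) triples;
- it checks that the triples form a single-rooted DAG;
- it scores entailment as the fraction of hypothesis relations found in the text graph.

The English glosses and the coverage score are assumptions for illustration. The paper's extraction rules and entailment features are not reproduced.

```python
def is_rooted_dag(edges):
    """True if (head, relation, dependent) edges form a DAG with exactly one root."""
    children, nodes, has_parent = {}, set(), set()
    for head, _, dep in edges:
        children.setdefault(head, []).append(dep)
        nodes.update((head, dep))
        has_parent.add(dep)
    roots = nodes - has_parent
    if len(roots) != 1:
        return False
    # Kahn's algorithm from the root: every node must be reached with indegree 0.
    indegree = {n: 0 for n in nodes}
    for _, _, dep in edges:
        indegree[dep] += 1
    queue, seen = list(roots), 0
    while queue:
        node = queue.pop()
        seen += 1
        for child in children.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return seen == len(nodes)

def entailment_score(text_edges, hyp_edges):
    """Fraction of hypothesis relations that also appear in the text graph."""
    hyp = set(hyp_edges)
    return len(hyp & set(text_edges)) / len(hyp) if hyp else 1.0
```

A score near 1.0 suggests that the hypothesis adds no relation beyond those in the text. A real system would combine this with richer lexical and structural features.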

Stress has become a growing problem for society because of its high impact not only on individuals but also on health care systems and companies. Early detection of stress is key to addressing this problem. Previous studies have shown the effectiveness of text analysis in detecting sentiment, emotion, and mental illness. However, existing solutions for detecting stress from text focus on a specific corpus, and there is still a lack of well-validated methods that perform well across different datasets. We aim to advance the state of the art by proposing a method to detect stress in textual data and evaluating it on multiple public English datasets. The proposed approach combines lexicon-based features with distributional representations to enhance classification performance. To help organize features for stress detection in text, we propose a lexicon-based feature framework that exploits affective, syntactic, social, and topic-related features. In addition, three different word embedding techniques are studied for exploiting distributional representations. Our approach has been implemented with three machine learning models, whose performance has been evaluated through several experiments. This evaluation uses three public English datasets and provides a baseline for other researchers. The best-performing model combines FastText embeddings with a selection of lexicon-based features and achieves F-scores above 80%.
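
Combining lexicon-based features with distributional representations usually means concatenating the two into one vector per text. This is a minimal sketch under stated assumptions: the lexicons and the tiny embedding dictionary are hypothetical stand-ins, with the latter playing the role of FastText vectors. The resulting vectors would then be passed to a classifier, which is omitted here.

```python
def average_embedding(tokens, embeddings, dim):
    """Mean of the known token vectors; a zero vector if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def lexicon_features(tokens, lexicons):
    """Share of tokens that match each lexicon (e.g. affective, social)."""
    n = max(len(tokens), 1)
    return [sum(t in words for t in tokens) / n for words in lexicons.values()]

def feature_vector(text, embeddings, lexicons, dim):
    """Concatenate lexicon-based features with an averaged word embedding."""
    tokens = text.lower().split()
    return lexicon_features(tokens, lexicons) + average_embedding(tokens, embeddings, dim)
```

Keeping the two feature groups in separate functions makes it easy to test feature selections, for example dropping topic-related lexicons, without touching the embedding part.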

With the popularity of social platforms such as Sina Weibo and Twitter, many public events spread rapidly on social networks, and huge amounts of textual data are generated as netizens discuss them. Social text clustering has become one of the most critical methods for helping people find relevant information. It also provides quality data for subsequent timely public opinion analysis. Most existing neural clustering methods rely on manually labeled training sets and take a long time to learn. Because social media data grow explosively and at large scale, satisfying users' timeliness demands is a challenge for social text clustering. This paper proposes a novel unsupervised event-oriented graph clustering framework (EGC). EGC achieves efficient clustering on large-scale datasets with less time overhead and does not require any labeled data. Specifically, EGC first mines the potential relations in social text data. Taking advantage of graph structure for representing complex relations, it transforms the social media text into an event-oriented graph. Second, EGC uses a keyword-based local importance method to accurately measure the weights of relations in the event-oriented graph. Finally, a bidirectional depth-first clustering algorithm based on these interrelations is proposed to cluster the nodes of the event-oriented graph. By projecting the relations of the graph into a smaller domain, EGC achieves fast convergence. The experimental results show that EGC's clustering performance on the Weibo dataset reaches 0.926 (NMI), 0.926 (AMI), and 0.866 (ARI), which is 13%–30% higher than other clustering methods. In addition, the average query time on EGC-clustered data is 16.7 ms, 90% less than on the original data.
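
The following is a much simplified view of graph-based social text clustering. Posts become nodes, edges link posts that share enough keywords, and a depth-first traversal collects the clusters. The `min_shared` threshold and plain keyword overlap are assumptions. They stand in for EGC's keyword-based local importance weights and its bidirectional depth-first algorithm, neither of which is reproduced here.

```python
def cluster_posts(post_keywords, min_shared=2):
    """Group posts whose keyword sets overlap, via DFS over a threshold graph.

    post_keywords maps a post id to its set of keywords.
    """
    ids = list(post_keywords)
    adj = {i: [] for i in ids}
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            p, q = ids[a], ids[b]
            if len(post_keywords[p] & post_keywords[q]) >= min_shared:
                adj[p].append(q)
                adj[q].append(p)
    clusters, seen = [], set()
    for start in ids:
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters
```

The pairwise loop is quadratic in the number of posts. A scalable version would first index posts by keyword, so that only posts sharing at least one keyword are compared.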

Social media users increasingly use both images and text to express their opinions and share their experiences, rather than text alone as in conventional social media. Consequently, conventional text-based sentiment analysis has evolved into more complex studies of multimodal sentiment analysis. To tackle the challenge of effectively exploiting both the visual and the textual content of image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach explores the correlation between the image and the text, followed by a multimodal adaptive sentiment analysis method. More specifically, mid-level visual features extracted by the conventional SentiBank approach represent visual concepts. These are integrated with other features, including textual, visual, and social features, to develop a machine learning sentiment analysis approach. Extensive experiments demonstrate the superior performance of the proposed approach.
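
One way to let image-text consistency drive multimodal fusion is to weight the image-based sentiment score by how well the two modalities agree. The sketch rests on several assumptions:
- text and image embeddings already live in a shared space;
- each modality yields a sentiment score in [-1, 1];
- the 0.5 cap on the image weight is an arbitrary illustrative choice.

This is not the paper's adaptive method.

```python
import math

def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fuse_sentiment(text_score, image_score, text_emb, image_emb):
    """Blend per-modality sentiment scores in [-1, 1].

    When image and text embeddings agree (high cosine), the image score gets
    more weight; when they disagree, the prediction falls back toward text.
    """
    consistency = max(0.0, cosine(text_emb, image_emb))  # clipped to [0, 1]
    w_image = 0.5 * consistency
    return (1 - w_image) * text_score + w_image * image_score
```

The design choice here is to treat inconsistent images as noise. A post whose image contradicts its text then keeps the text-based prediction.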

Rapid appraisal of damage related to hazard events is important to first responders, government agencies, the insurance industry, and other private and public organizations. Satellite monitoring, ground-based sensor systems, inspections, and other technologies provide data to inform post-disaster response, and crowdsourcing through social media is an additional, novel data source. This study investigates the use of social media data, principally Twitter postings, to make approximate but rapid early assessments of damage following a disaster. The goal is to explore the potential utility of social media data for rapid damage assessment after sudden-onset hazard events, and to identify insights related to potential challenges. The study defines a text-based damage assessment scale for earthquake damage and then develops a text classification model for rapid damage assessment. Accuracy remains a challenge compared with ground-based instrumental readings and inspections. However, the proposed model delivers rapid assessments from large amounts of data, at spatial densities exceeding those of conventional sensor networks. The 2019 Ridgecrest, California earthquake sequence is investigated as a case study.
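
As an illustration of classifying posts onto a damage scale, the sketch below trains a multinomial naive Bayes classifier from scratch. The three-level scale (0 = no damage, 1 = moderate, 2 = severe) and the toy training posts are hypothetical. The study's own scale and classification model may differ.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total)  # class prior
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    logp += math.log((counts[word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label
```

Usage: `NaiveBayes().fit(train_texts, train_levels).predict(new_post)` returns a damage level. Aggregating predicted levels by geotag would then give a coarse damage map.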

Text clustering is a well-known information retrieval method, and numerous methods for classifying words, documents, or both together have been proposed. Textual data are frequently encoded using vector models, so the corpus is transformed into a term-by-document matrix. With this representation, text clustering groups similar objects on the basis of the presence or absence of words in the documents. An alternative way to work with texts is to represent them as a network whose nodes are entities connected by the presence and distribution of words in the documents. In this work, after summarising the state of the art in text clustering, we present a new network approach to textual data. We perform text co-clustering using methods developed for social network analysis. Several experimental results are presented to demonstrate the validity of the approach and the advantages of this technique over existing methods.
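
A term-by-document matrix can be turned into a network by projection. Two documents are linked with a weight equal to the number of terms they share, which is the off-diagonal part of XᵀX for a binary matrix X. This is a minimal sketch, assuming a binary presence matrix with one row per term. The term network needed for co-clustering follows analogously from XXᵀ. The social-network clustering step itself is not shown.

```python
def document_network(term_doc):
    """Project a binary term-by-document matrix onto a document network.

    net[i][j] counts the terms shared by documents i and j (the off-diagonal
    entries of X^T X); the diagonal is left at zero.
    """
    n_docs = len(term_doc[0])
    net = [[0] * n_docs for _ in range(n_docs)]
    for row in term_doc:  # one row per term
        present = [d for d, v in enumerate(row) if v]
        for i in present:
            for j in present:
                if i != j:
                    net[i][j] += 1
    return net
```

The resulting weighted adjacency matrix can be handed to any community detection routine from social network analysis.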

Image-text matching bridges the gap between the visual and textual modalities and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences. These are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits rare textual content to tackle the long-tail effect in image-text matching. Specifically, we first design a rare-aware mining module to model the characteristics of rare content; it contains global prior information construction and a rare fragment detector. Next, rare attention matching uses the prior information as attention to guide the enhancement of rare-content representations. It also introduces a rareness representation to strengthen the similarity calculation. Finally, we design a prior information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, in a zero-shot test on rare content, we improve rSum by 21.0 on Flickr30K (155,000 image-text pairs) and by 41.5 on MSCOCO (616,435 image-text pairs). These results demonstrate the effectiveness of the proposed method against the long-tail effect.
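
The triplet loss that RAAN optimizes alongside its prior information loss is commonly written as a bidirectional hinge loss over an image-text similarity matrix. The sketch below sums violations over all negatives in a batch. The margin value is an assumption, and RAAN's rare-aware components and prior information loss are not reproduced.

```python
def triplet_loss(sim, margin=0.2):
    """Bidirectional hinge triplet loss over an image-text similarity matrix.

    sim[i][j] is the score between image i and caption j; matched pairs lie
    on the diagonal, and every non-matching caption or image is a negative.
    """
    n = len(sim)
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if i == j:
                continue
            loss += max(0.0, margin - pos + sim[i][j])  # image i vs wrong caption j
            loss += max(0.0, margin - pos + sim[j][i])  # caption i vs wrong image j
    return loss
```

A common variant keeps only the hardest negative per anchor instead of summing over all of them. Which variant the paper uses is not stated here.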

Understanding the effects of gender-specific emotional responses on information sharing behavior is of great importance for swift, clear, and accurate public health crisis communication, but it remains underexplored. This study fills this gap by investigating gender-specific anxiety- and anger-related emotional responses and their effects on the virality of crisis information. It draws on social role theory, integrated crisis communication modeling, and text mining. The theoretical model is tested on two datasets collected from Weibo, a leading social media platform in China: the Changsheng vaccine crisis (2,423,074 text items) and the COVID-19 pandemic (893,930 text items). During the Changsheng fake vaccine crisis, females expressed significantly higher levels of anxiety and anger than males (p < 0.001). During COVID-19, females expressed significantly higher levels of anxiety than males (p < 0.001), but not of anger (p = 0.13). Regression analysis suggests that, in both crises, the virality of crisis information is significantly strengthened when the level of anger in males' posts is high or the level of anxiety in females' posts is high. However, these gender-specific differences in the virality of anger and anxiety expressions no longer hold once females have large numbers of followers (influencers). Furthermore, the gender-specific emotional effects on crisis information are more significantly enhanced for male influencers than for female influencers. This study contributes to the literature on the gender-specific emotional characteristics of crisis communication on social media and provides implications for practice.
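
The moderation effect described above, where anger or anxiety affects virality differently by gender, corresponds to an interaction term in a regression. With a single emotion regressor x, the fully interacted OLS model y = b0 + b1·x + b2·female + b3·x·female is equivalent to fitting each gender separately, and the sketch exploits this. It is a simplified illustration under our own assumptions. The study's controls, follower-count moderation and exact specification are not reproduced.

```python
def ols_slope_intercept(xs, ys):
    """Simple least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def gender_interaction(rows):
    """Fit y = b0 + b1*x + b2*female + b3*x*female by OLS.

    rows holds (x = emotion level, female flag 0/1, y = virality measure).
    With one regressor, the fully interacted model equals separate per-group
    regressions, so b2 is the intercept gap and b3 the slope gap.
    """
    groups = {0: ([], []), 1: ([], [])}
    for x, female, y in rows:
        groups[female][0].append(x)
        groups[female][1].append(y)
    b1, b0 = ols_slope_intercept(*groups[0])
    s1, i1 = ols_slope_intercept(*groups[1])
    return {"b0": b0, "b1": b1, "b2": i1 - b0, "b3": s1 - b1}
```

A negative b3, for instance, would mean that the emotion in question raises virality less in females' posts than in males' posts. Significance testing would still require standard errors, which this sketch does not compute.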

In this paper, we focus on the problem of question ranking in Arabic community question answering (cQA) forums. We address the task with machine learning algorithms that use advanced Arabic text representations. These representations are obtained by applying tree kernels to constituency parse trees, combined with textual similarities including word embeddings. Our two main contributions are: (i) an Arabic language processing pipeline based on UIMA, covering everything from segmentation to constituency parsing, built on top of Farasa, a state-of-the-art Arabic language processing toolkit; and (ii) the application of long short-term memory neural networks to identify the best text fragments in questions for use in our tree-kernel-based ranker. Our thorough experimentation on a recently released cQA dataset shows that the Arabic linguistic processing provided by Farasa produces strong results. It also shows that neural networks combined with tree kernels further boost performance in terms of both efficiency and accuracy. Our approach also enables an implicit comparison between different processing pipelines, as our tests with the Farasa and Stanford parsers demonstrate.
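
Tree kernels over constituency parse trees can be illustrated with the Collins-Duffy subset-tree kernel, which counts the tree fragments shared by two parse trees. In this sketch, trees are hand-written nested tuples `(label, *children)` with words as string leaves. The decay factor `lam` is an assumption. The Farasa/UIMA pipeline and the LSTM-based fragment selection are outside its scope.

```python
def production(node):
    """Grammar production at a node: (label, labels of its children)."""
    label, *children = node
    return label, tuple(c if isinstance(c, str) else c[0] for c in children)

def is_preterminal(node):
    """True if all children are words (string leaves)."""
    return all(isinstance(c, str) for c in node[1:])

def internal_nodes(tree):
    """All non-leaf nodes of a tree written as nested (label, *children) tuples."""
    nodes = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            nodes.extend(internal_nodes(child))
    return nodes

def delta(n1, n2, lam):
    """Decayed count of common fragments rooted at nodes n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if is_preterminal(n1):
        return lam
    result = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if isinstance(c1, str):  # identical word leaves add no fragments
            continue
        result *= 1.0 + delta(c1, c2, lam)
    return result

def subset_tree_kernel(t1, t2, lam=1.0):
    """Sum of delta over all pairs of internal nodes of the two trees."""
    return sum(delta(a, b, lam) for a in internal_nodes(t1) for b in internal_nodes(t2))
```

For the tree (NP (DT the) (NN dog)) compared with itself, the kernel with `lam=1.0` counts six shared fragments. Values below 1 down-weight large fragments, which keeps the kernel from being dominated by near-identical trees.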

Event relations specify how the different event flows expressed within a textual passage relate to each other in terms of temporal and causal sequences. There has already been impactful work on temporal and causal event relation extraction. However, these approaches face two challenges: (1) they are mostly supervised methods, and (2) they rely on syntactic and grammatical structure patterns at the sentence level. In this paper, we address these challenges by proposing an unsupervised event network representation for temporal and causal relation extraction that operates at the document level. More specifically, we use existing Open IE systems to generate a set of triple relations, which are then used to build an event network. The event network is bootstrapped by labeling the temporal disposition of events that are directly linked to each other. We then systematically traverse the event network to identify the temporal and causal relations between indirectly connected events. We perform experiments on the widely adopted TempEval-3 and Causal-TimeBank corpora and compare our work with several strong baselines, showing that our method improves performance over them.
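
The traversal step infers relations between indirectly connected events from direct links. For the temporal BEFORE relation, it can be sketched as a breadth-first search over the event network. The event names are hypothetical, and treating BEFORE as transitive is our simplifying assumption. Causal relations and the Open IE extraction step are not covered.

```python
from collections import defaultdict, deque

def infer_before(direct_pairs):
    """Infer indirect BEFORE relations by traversing a directed event network.

    direct_pairs holds (earlier_event, later_event) links, e.g. obtained by
    labelling directly connected events; returns only the newly inferred pairs.
    """
    direct = set(direct_pairs)
    graph = defaultdict(set)
    for earlier, later in direct:
        graph[earlier].add(later)
    inferred = set()
    for source in list(graph):
        queue, seen = deque(graph[source]), set(graph[source])
        while queue:  # breadth-first search from each event
            node = queue.popleft()
            for nxt in graph.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        inferred |= {(source, t) for t in seen if (source, t) not in direct}
    return inferred
```

For example, from the direct links earthquake → collapse, collapse → rescue and rescue → hospitalization, the traversal adds earthquake → rescue, earthquake → hospitalization and collapse → hospitalization.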
