Similar Documents
20 similar documents found (search time: 78 ms)
1.
2.
In this era, the proliferating role of social media in our lives has popularized the posting of short texts. Short texts contain limited context and have unique characteristics that make them difficult to handle. Every day, billions of short texts are produced in the form of tags, keywords, tweets, phone messages, messenger conversations, social network posts, etc. The analysis of these short texts is imperative in the fields of text mining and content analysis. Extracting precise topics from large-scale short-text collections is a critical and challenging task. Conventional approaches fail to capture word co-occurrence patterns in topics because of the sparsity of short texts such as text over the web, social media like Twitter, and news headlines. Therefore, in this paper, the sparsity problem is ameliorated by a novel fuzzy topic modeling (FTM) approach for short text, developed from a fuzzy perspective. In this research, local and global term frequencies are computed through a bag-of-words (BOW) model. To remove the negative impact of high dimensionality on global term weighting, principal component analysis is adopted; thereafter, the fuzzy c-means algorithm is employed to retrieve semantically relevant topics from the documents. Experiments are conducted on three real-world short-text datasets: the snippets dataset is small, whereas the other two, Twitter and questions, are larger. Experimental results show that the proposed approach discovers topics more precisely and performs better than state-of-the-art baseline topic models such as GLTM, CSTM, LTM, LDA, Mix-gram, BTM, SATM, and DREx+LDA. The performance of FTM is also demonstrated on classification, clustering, topic coherence, and execution time. FTM's classification accuracy on the snippets dataset is 0.95, 0.94, 0.91, 0.89, and 0.87 with 50, 75, 100, 125, and 200 topics, respectively; on the questions dataset it is 0.73, 0.74, 0.70, 0.68, and 0.78. The classification accuracies of FTM on both datasets are higher than those of the state-of-the-art baseline topic models.
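To make the pipeline above concrete, here is a minimal sketch of its three stages in Python: bag-of-words term counts, PCA to tame dimensionality, and a plain fuzzy c-means loop for soft topic assignment. The toy corpus, parameter values, and the hand-rolled `fuzzy_c_means` are illustrative assumptions, not the authors' FTM implementation.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import PCA

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means; returns (centers, soft membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.dirichlet(np.ones(n_clusters), size=len(X))   # random soft init
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]      # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-9
        U = d ** (-2 / (m - 1))                           # standard FCM update
        U /= U.sum(axis=1, keepdims=True)
    return centers, U

docs = ["cheap flights to new york", "senate passes climate bill",
        "new york marathon results", "climate talks stall in senate"]
X = CountVectorizer().fit_transform(docs).toarray().astype(float)  # BOW counts
X = PCA(n_components=2).fit_transform(X)   # reduce high-dimensional term space
centers, U = fuzzy_c_means(X, n_clusters=2)
print(U.round(2))   # each row: a document's soft membership over topics
```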

3.
Misinformation has captured the interest of academia in recent years, with several studies examining the topic broadly and reporting inconsistent results. In this research, we attempt to bridge the gap in the literature by examining the impacts of user-, time-, and content-based characteristics on the virality of real versus misinformation during a crisis event. Using a big-data-driven approach, we collected over 42 million tweets during Hurricane Harvey and obtained 3589 original verified real or false tweets by cross-checking with fact-checking websites and a relevant federal agency. Our results show that virality is higher for misinformation, novel tweets, and tweets with negative sentiment or lower lexical density. In addition, we reveal opposite impacts of sentiment on the virality of real news versus misinformation. We also find that tweets on the environment are less likely to go viral than the baseline religious news, while real social-news tweets are more likely to go viral than misinformation on social news.

4.
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition, and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further investigated to improve the state of the art.
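As a quick illustration of the named-entity-recognition stage such pipelines rely on, the snippet below runs an off-the-shelf tagger on a noisy, tweet-like string. spaCy and its `en_core_web_sm` model are stand-ins for the systems evaluated in the paper, and the example assumes the model has been downloaded.

```python
import spacy

# Assumes: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("omg Boris Johnson visiting Leeds 2moro, BBC says!!")
for ent in doc.ents:
    # Noisy, abbreviated tweet text often degrades recognition quality,
    # which is exactly the robustness question the paper studies.
    print(ent.text, ent.label_)
```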

5.
In the context of social media, users often post information relevant to the events mentioned in a Web document. This information possesses two important properties: (i) it reflects the content of an event, and (ii) it shares hidden topics with sentences in the main document. In this paper, we present a novel model to capture the relationships between document sentences and post information (comments or tweets) in sharing hidden topics, for summarizing Web documents by utilizing relevant post information. Unlike previous methods, which are usually based on hand-crafted features, our approach ranks document sentences and user posts by their importance to the topics. The sentence-user-post relation is formulated in a shared topic matrix, which represents their mutual reinforcement. Our proposed matrix co-factorization algorithm computes the score of each document sentence and user post and extracts the top-ranked document sentences and comments (or tweets) as a summary. We apply the model to the summarization task on three social-context-summarization datasets in two languages, English and Vietnamese, as well as on DUC 2004 (a standard corpus for the traditional summarization task). According to the experimental results, our model significantly outperforms basic matrix factorization and achieves ROUGE scores competitive with state-of-the-art methods.
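The co-factorization idea can be sketched as two nonnegative factorizations that share a topic-term factor H, with sentences and posts then ranked by the strength of their topic loadings. The multiplicative updates and toy matrices below are an assumption in the spirit of the model, not the authors' exact algorithm.

```python
import numpy as np

def co_factorize(S, P, k=5, n_iter=200, eps=1e-9, seed=0):
    """Minimize ||S - A H||^2 + ||P - B H||^2 with a shared topic factor H."""
    rng = np.random.default_rng(seed)
    A = rng.random((S.shape[0], k))          # sentence-topic loadings
    B = rng.random((P.shape[0], k))          # post-topic loadings
    H = rng.random((k, S.shape[1]))          # shared topic-term factor
    for _ in range(n_iter):                  # NMF-style multiplicative updates
        A *= (S @ H.T) / (A @ H @ H.T + eps)
        B *= (P @ H.T) / (B @ H @ H.T + eps)
        H *= (A.T @ S + B.T @ P) / ((A.T @ A + B.T @ B) @ H + eps)
    return A, B, H

S = np.random.rand(8, 30)    # sentence-term weights (toy data)
P = np.random.rand(20, 30)   # post-term weights, same vocabulary
A, B, H = co_factorize(S, P)
summary_idx = np.argsort(-np.linalg.norm(A, axis=1))[:3]
print("top-ranked sentences:", summary_idx)  # extract as the summary
```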

6.
Recently, geolocalisation of tweets has become important for a wide range of real-time applications, including real-time event detection, topic detection, and disaster and emergency analysis. However, the number of relevant geotagged tweets available for such tasks remains insufficient. To overcome this limitation, predicting the location of non-geotagged tweets, while challenging, can increase the sample of geotagged data, with consequences for a wide range of applications. In this paper, we propose a location inference method that combines a ranking approach with majority voting over tweets, where each vote is weighted by evidence gathered from the ranking. Using geotagged tweets from two cities, Chicago and New York (USA), our experimental results demonstrate that our method (statistically) significantly outperforms state-of-the-art baselines in terms of accuracy and error distance in both cities, at the cost of decreased coverage. Finally, we investigate the applicability of our method in a real-time scenario by means of a traffic incident detection task. Our analysis shows that our fine-grained geolocalisation method can overcome the limitations of geotagged tweets and precisely map incident-related tweets to the real location of the incident.
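The ranking-plus-weighted-voting step might look like the sketch below: each retrieved geotagged neighbour votes for its location, with the vote weighted by evidence from the ranking. The `score / rank` weighting and the neighbourhood names are illustrative assumptions, not the paper's exact weighting scheme.

```python
from collections import defaultdict

def infer_location(ranked_neighbors):
    """ranked_neighbors: list of (location, similarity_score), best first."""
    votes = defaultdict(float)
    for rank, (loc, score) in enumerate(ranked_neighbors, start=1):
        votes[loc] += score / rank    # evidence-weighted vote from the ranking
    return max(votes, key=votes.get)  # majority (weighted) decision

# Geotagged tweets most similar to a non-geotagged one (toy data, Chicago):
neighbors = [("Lincoln Park", 0.91), ("The Loop", 0.88),
             ("Lincoln Park", 0.80), ("Hyde Park", 0.42)]
print(infer_location(neighbors))      # -> "Lincoln Park"
```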

7.
Climate change has become one of the most significant crises of our time. Public opinion on climate change is shaped on social media platforms such as Twitter, where users are often divided into believers and deniers. In this paper, we propose a framework to classify a tweet's stance on climate change (denier/believer). Existing approaches to stance detection and classification of climate change tweets either pay little attention to the characteristics of deniers' tweets or lack an appropriate architecture. The relevant literature reveals that the sentiment and time perspective of climate change conversations on Twitter have a major impact on public attitudes and environmental orientation. Therefore, in our study, we explore the role of temporal orientation and sentiment analysis (auxiliary tasks) in detecting the stance of tweets on climate change (main task). Our proposed framework STASY integrates word- and sentence-based feature encoders with intra-task and shared-private attention to better encode the interactions between task-specific and shared features. We conducted experiments on our novel curated climate change CLiCS dataset (2465 denier and 7235 believer tweets), two publicly available climate change datasets (ClimateICWSM-2022 and ClimateStance-2022), and two benchmark stance detection datasets (SemEval-2016 and COVID-19-Stance). Experiments show that our proposed approach improves stance detection performance over the baseline methods by benefiting from the auxiliary tasks, with average F1 improvements of 12.14% on our climate change dataset, 15.18% on ClimateICWSM-2022, 12.94% on ClimateStance-2022, 19.38% on SemEval-2016, and 35.01% on COVID-19-Stance.

8.
李明帅  管桦 《情报探索》2014,(12):12-15
Government-affairs microblogs of the Sichuan provincial government are classified by content, and 2233 government microblog posts from 16 cities and autonomous prefectures are extracted for analysis. The analytic hierarchy process (AHP) is used to build an evaluation index system for microblog attention and to determine the indicator weights; attention scores for each content type of government microblog are then computed and ranked, and the reasons behind the rankings are analysed. A two-dimensional analysis of the fit between posting volume and microblog attention is conducted, and suggestions for improvement are proposed, providing a reasonable orientation for the future information release of government-affairs microblogs.
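As background on the AHP step, the sketch below derives indicator weights from a pairwise comparison matrix via its principal eigenvector and checks the consistency ratio. The 3x3 judgment matrix is a made-up example, not the paper's indicator system.

```python
import numpy as np

A = np.array([[1,   3,   5],
              [1/3, 1,   2],
              [1/5, 1/2, 1]])            # pairwise importance judgments

vals, vecs = np.linalg.eig(A)
i = np.argmax(vals.real)                 # principal eigenvalue (lambda_max)
w = np.abs(vecs[:, i].real)
w /= w.sum()                             # normalized indicator weights

n = A.shape[0]
CI = (vals.real[i] - n) / (n - 1)        # consistency index
RI = {3: 0.58, 4: 0.90, 5: 1.12}[n]      # Saaty's random index table
print("weights:", w.round(3), "CR:", round(CI / RI, 3))  # CR < 0.1 is acceptable
```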

9.
Politicians’ tweets can have important political and economic implications. However, limited context makes it hard for readers to understand them instantly and precisely, especially from a causal perspective. The triggers for these tweets may have been reported in news prior to the tweets, but simply finding similar news articles would not serve the purpose, for the following reasons. First, readers may only be interested in finding the reasons and contexts (which we call causal backgrounds) for a certain part of a tweet. Intuitively, such content would be politically relevant and accord with the public's recent attention, which is not usually reflected within the context. Besides, the content should be human-readable, while the noisy and informal nature of tweets hinders regular Open Information Extraction systems. Second, similarity does not capture causality, and the causality between tweet contents and news contents is beyond the scope of causality extraction tools. Meanwhile, it is non-trivial to construct a high-quality tweet-to-intent dataset. We propose the first end-to-end framework for discovering the causal backgrounds of politicians' tweets by: 1. designing an Open IE system with rule-free representations for tweets; 2. introducing sources such as Wikipedia linkage and edit history to identify focal contents; 3. finding implicit causalities between different contexts using explicit causalities learned elsewhere. We curate a comprehensive dataset of interpretations from political journalists for 533 tweets from 5 US politicians. On average, we obtain the correct answers within the top-2 recommendations. We make our dataset and framework code publicly available.

10.
Content-based filtering can be deployed for personalised information dissemination on the web, but this possibility has been largely ignored: there are currently no successful content-based filtering applications available online. Nootropia is an immune-inspired user profiling model for content-based filtering. It has the advantageous property of being able to represent a user's multiple interests and to adapt to a variety of changes in them. In this paper we describe our early efforts to develop real-world personalisation services based on Nootropia. We present the architecture, implementation, usage, and evaluation of a personalised news and paper aggregator, which aggregates news and papers relevant to an individual's interests. Our user study shows that Nootropia can effectively learn a user's interests and identify relevant information. It also indicates that information filtering is a complicated task, with many factors affecting its successful application in a real setting.

11.
Modeling discussions on social networks is a challenging task, especially for sensitive topics such as politics or healthcare. However, the knowledge hidden in these debates helps to investigate trends and opinions and to identify the cohesion of users around a specific topic. To this end, we propose a general multilayer network approach for investigating discussions on a social network. To validate our model, we apply it to a Twitter dataset of tweets expressing opinions on COVID-19 vaccines. We extract a set of relevant (i.e., gold-standard) hashtags for each line of thought (i.e., pro-vaxxer, neutral, and anti-vaxxer). Our multilayer network model then shows that anti-vaxxers tend to have ego networks that are denser (+14.39%) and more cohesive (+64.2%) than those of pro-vaxxers, which leads to a higher number of interactions among anti-vaxxers than among pro-vaxxers (+393.89%). Finally, we compare our approach with one based on single-network analysis and show that our model extracts influencers whose ego networks have more nodes (+40.46%), edges (+39.36%), and interactions with their neighbours (+28.56%) than those found by the other approach. These influential users are therefore more informative to analyse and can provide more valuable information.
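The ego-network comparison reported above can be reproduced in miniature with networkx: build each user's ego network from an interaction edge list and average the densities per group. The edges and group labels below are toy data, not the paper's dataset.

```python
import networkx as nx

# Toy interaction graph and stance labels per user.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("a", "d"), ("d", "e")]
G = nx.Graph(edges)
group = {"a": "anti", "b": "anti", "c": "anti", "d": "pro", "e": "pro"}

density = {"anti": [], "pro": []}
for node in G:
    ego = nx.ego_graph(G, node)              # node plus its direct neighbours
    density[group[node]].append(nx.density(ego))

for g, vals in density.items():
    print(g, sum(vals) / len(vals))          # average ego-network density
```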

12.
[Purpose/Significance] A quantitative model of microblog posting-commenting behaviour is built from objective behavioural data to explain the information interaction among social network users and to provide a theoretical basis for monitoring and guiding microblog public opinion. [Method/Process] Taking Sina Weibo as the research object, five important indicators characterising posting-commenting behaviour in the group-level many-to-many mode are analysed, a quantitative model describing this behaviour is constructed, and its validity is verified through simulation. [Result/Conclusion] The frequency distribution of comment counts follows a power law with an exponent of 1.6659; the number of comments already received, the influence of the microblog, and the visibility of the information are the attachment mechanisms driving new comments. The proposed quantitative model reproduces real posting-commenting behaviour well. [Innovation/Limitation] The many-to-many posting-commenting process is explained from a human dynamics perspective, revealing the generation mechanism of posting-commenting behaviour at the group level. This paper does not distinguish commenting behaviour across different types of microblogs; this will be explored further in future research.
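As an illustration of the power-law finding, the sketch below estimates the exponent of a comment-count distribution by least squares on the log-log frequency plot. The data are synthetic (drawn from a Zipf distribution near the reported exponent of 1.6659), and maximum-likelihood fitting would be the more robust choice for real data.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
comments = rng.zipf(1.7, size=5000)        # synthetic per-post comment counts

freq = Counter(comments)
x = np.array(sorted(freq))                 # distinct comment counts
y = np.array([freq[v] for v in x], dtype=float)  # their frequencies
slope, _ = np.polyfit(np.log(x), np.log(y), 1)   # log-log linear fit
print("estimated power-law exponent:", round(-slope, 2))
```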

13.
The spread of misinformation and disinformation is a serious problem on microblogs, making user evaluation of information credibility a critical issue. This study incorporates two message-format factors related to multimedia usage on microblogs (vividness and multimedia diagnosticity) with two well-studied factors for information credibility (argument quality and source credibility) into a holistic framework for investigating user evaluation of microblog information credibility. Further, the study draws on two-factor theory and its three-factor variant to explain the nonlinear effects of these factors on microblog information credibility. An online survey of microblog users was conducted to test the proposed framework. The findings reveal that, with respect to microblog information credibility: (1) argument quality (a hygiene factor) exerts a decreasing incremental effect; (2) source credibility (a bivalent factor) exerts only a linear effect; and (3) multimedia diagnosticity (a motivating factor) exerts an increasing incremental effect. This study adds to current knowledge about information credibility by proposing an insightful framework for understanding the key predictors of microblog information credibility and by examining their nonlinear effects.

14.
高校微博信息发布研究 (Research on Information Release via University Microblogs)   Cited by: 1 (self-citations: 0; citations by others: 1)
范哲  周计刚 《现代情报》2013,33(4):90-95
Microblogs have become an important platform for universities to release information. Taking the official microblogs of 24 universities on the Sina Weibo platform as samples, this paper applies content analysis, statistical analysis, and cluster analysis to examine the current state of microblog information release, summarises the main modes of university microblog information release and their characteristics, and, based on release effectiveness, discusses more reasonable ways for universities to publish information via microblogs. The conclusions are of practical value for guiding universities in building official microblogs and in making full use of microblog features for information release.

15.
Health misinformation has become an unfortunate truism of social media platforms, where lies can spread faster than truth. Despite considerable work devoted to suppressing fake news, health misinformation, including low-quality health news, has persisted and even increased in recent years. One promising approach to fighting bad information is studying the temporal and sentiment effects of health news stories and how they are discussed and disseminated on social media platforms like Twitter. As part of the search for innovative ways to fight health misinformation, this study analyzes a dataset of more than 1600 objectively and independently reviewed health news stories published over a 10-year span and nearly 50,000 Twitter posts responding to them. Specifically, it examines the source credibility of health news circulated on Twitter and the temporal and sentiment features of the tweets containing or responding to the health news reports. The results show that health news stories rated low by experts are discussed more, persist longer, and produce stronger sentiment than highly rated ones in the tweetosphere. However, highly rated stories retain fresh interest, in the form of new tweets, for a longer period. An in-depth understanding of how health news is distributed and discussed is the first step toward mitigating the surge of health misinformation. The findings provide insights into the mechanism of health information dissemination on social media and practical implications for fighting and mitigating health misinformation on digital media platforms.

16.
With the information explosion of news articles, personalized news recommendation has become important for helping users quickly find news they are interested in. Existing methods for news recommendation mainly include collaborative filtering methods, which rely on direct user-item interactions, and content-based methods, which characterize the content of a user's reading history. Although these methods achieve good performance, they still suffer from the data sparsity problem, since most of them fail to extensively exploit high-order structure information (similar users tend to read similar news articles) in news recommendation systems. In this paper, we propose building a heterogeneous graph to explicitly model the interactions among users, news, and latent topics. The incorporated topic information helps indicate a user's interest and alleviates the sparsity of user-item interactions. We then take advantage of graph neural networks to learn user and news representations that encode high-order structure information by propagating embeddings over the graph. The learned user embeddings, which incorporate complete historic user clicks, capture users' long-term interests. We also model a user's short-term interest from the recent reading history with an attention-based LSTM model. Experimental results on real-world datasets show that our proposed model significantly outperforms state-of-the-art methods on news recommendation.
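The core propagation step of such graph models can be sketched as neighbour averaging over the user-news interaction matrix; stacking such layers is what injects high-order structure into the embeddings. The snippet below is an untrained, simplified illustration of one propagation layer, not the paper's architecture.

```python
import numpy as np

n_users, n_news, d = 4, 5, 8
R = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1]], dtype=float)   # user-news click matrix

U = np.random.default_rng(0).random((n_users, d))   # initial user embeddings
V = np.random.default_rng(1).random((n_news, d))    # initial news embeddings

# One propagation layer: each node becomes the mean of its neighbours'
# embeddings (row/column-normalized adjacency). Stacking layers would
# mix in progressively higher-order neighbourhood information.
U_next = (R / R.sum(1, keepdims=True)) @ V          # users gather from news
V_next = (R.T / R.sum(0)[:, None]) @ U              # news gather from users
print(U_next.shape, V_next.shape)
```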

17.
With the onset of COVID-19, the pandemic aroused huge discussions on social media like Twitter, followed by many social media analyses concerning it. Despite such an abundance of studies, however, little work has been done on reactions from the public and officials on social networks and their associations, especially during the early outbreak stage. In this paper, a total of 9,259,861 COVID-19-related English tweets published from 31 December 2019 to 11 March 2020 are collected to explore the participatory dynamics of public attention and news coverage during the early stage of the pandemic. An easy numeric data augmentation (ENDA) technique is proposed for generating new samples while preserving label validity. It attains superior performance on text classification tasks with deep models (BERT) compared with a simpler data augmentation method. To further demonstrate the efficacy of ENDA, experiments and ablation studies are also conducted on other benchmark datasets. The classification results of COVID-19 tweets show tweet peaks triggered by momentous events and a strong positive correlation between the daily number of personal narratives and news reports. We argue that there were three periods, divided by turning points on January 20 and February 23, and that the low level of news coverage suggests missed windows for government response in early January and February. Our study not only contributes to a deeper understanding of the dynamic patterns and relationships of public attention and news coverage on social media during the pandemic but also sheds light on early emergency management and government response on social media during global health crises.

18.
[Purpose/Significance] Using a knowledge graph to organize and visualize the content of regional government-affairs microblogs can improve the efficiency with which users read and acquire knowledge. [Method/Process] First, topics in regional government microblogs are modelled with LDA, and semantic triples are extracted from the microblog content through dependency parsing. Second, a knowledge model for regional government microblogs is constructed, forming the semantic architecture of the knowledge graph. Finally, the knowledge graph is visualized and its associations stored using the Neo4j graph database and the D3.js library. [Result/Conclusion] Through theoretical construction and empirical verification, this study builds a topic-partitioned knowledge graph of regional government microblogs, offering ideas and methods for constructing knowledge graphs from social media content.

19.
Recently, the high popularity of social networks has accelerated the development of item recommendation. Integrating the influence diffusion of social networks into recommendation systems is a challenging task, since the topic distribution over users and items is latent and users' topic interests may change over time. In this paper, we propose a dynamic generative model for item recommendation that captures potential influence logs based on community-level topic influence diffusion to infer the latent topic distribution over users and items. Our model tracks the time-varying distributions of topic interest and topic popularity over communities in social networks. A collapsed Gibbs sampling algorithm is proposed to train the model, and an improved diversification algorithm is proposed to obtain a diversified item recommendation list. Extensive experiments are conducted to evaluate the effectiveness and efficiency of our method. The results validate our approach and show its superiority over state-of-the-art diversified recommendation methods.
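For readers unfamiliar with the training machinery, below is a compact collapsed Gibbs sampler for vanilla LDA as background. The paper's model additionally handles communities, influence diffusion, and time, which this sketch deliberately omits.

```python
import numpy as np

def lda_gibbs(docs, V, K=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA; docs are lists of word ids."""
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), K))   # doc-topic counts
    nkw = np.zeros((K, V))           # topic-word counts
    nk = np.zeros(K)                 # topic totals
    z = []
    for d, doc in enumerate(docs):   # random initial topic assignments
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]          # remove current assignment from counts
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Collapsed conditional: p(z=k | rest)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
                k = rng.choice(K, p=p / p.sum())
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return (nkw + beta) / (nk[:, None] + V * beta)  # topic-word distributions

docs = [[0, 1, 2, 1], [3, 4, 3, 2], [0, 1, 4, 3]]   # toy word-id documents
print(lda_gibbs(docs, V=5).round(2))
```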

20.
[Purpose/Significance] Evaluating the efficiency of public-opinion information dissemination on government-affairs microblogs benefits their operation and management. [Method/Process] A Cobb-Douglas production function is applied to analyse the input and output indicators of the evaluation index system for the dissemination efficiency of government-microblog public-opinion information; a DEA model is used to measure and evaluate this dissemination efficiency; and cluster analysis is used to classify government microblogs and summarise their information-transmission indicators. [Result/Conclusion] The poor scale efficiency of government microblogs stems from their poor information dissemination efficiency, and the low scale efficiency of information transmission is caused by insufficient numbers of followers and followings. Finally, based on projection analysis, improvement plans for the information transmission efficiency of government microblogs are proposed.
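The DEA measurement step can be sketched as an input-oriented CCR model solved as a linear program per decision-making unit (DMU), here with scipy. The input/output columns (e.g., posts and followings in, reposts out) are illustrative assumptions, not the paper's indicator system.

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0, 3.0], [4.0, 2.0], [3.0, 5.0]])  # inputs per DMU
Y = np.array([[5.0], [4.0], [6.0]])                  # outputs per DMU

def ccr_efficiency(j0):
    """Input-oriented CCR: min theta s.t. sum(l*x) <= theta*x0, sum(l*y) >= y0."""
    n, m, s = X.shape[0], X.shape[1], Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                      # variables: [theta, lambdas]
    A_ub = np.zeros((m + s, 1 + n)); b_ub = np.zeros(m + s)
    A_ub[:m, 0] = -X[j0]; A_ub[:m, 1:] = X.T         # input constraints
    A_ub[m:, 1:] = -Y.T;  b_ub[m:] = -Y[j0]          # output constraints
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (1 + n))
    return res.fun                                   # theta = 1 means efficient

for j in range(len(X)):
    print(f"DMU {j}: efficiency = {ccr_efficiency(j):.3f}")
```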

