Similar Literature
 20 similar documents found (search time: 78 ms)
1.
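Research on a Topic Sentence Extraction Algorithm Based on Sentence Similarity   Total citations: 1, self-citations: 0, citations by others: 1
Text topic extraction is an important research area in text mining and an important means of coping with information overload. To address the tendency of existing topic sentence extraction methods to overlook local topics, this paper proposes a "segment first, then extract" approach. The text is first represented as a linear sequence of sentences and each sentence as a linear sequence of words; every sentence is preprocessed into a lexical chain of content words, and the similarity of adjacent sentences is then computed using HowNet. Based on these similarities, text segmentation divides the text into several sentence bags, each covering one subtopic, and topic sentences are extracted from these bags via a sentence relation graph. Finally, acceptability tests were designed and run on different corpora; the experimental results confirm that the algorithm is feasible and effective.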
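The "segment first, then extract" idea in entry 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: word-overlap (Jaccard) similarity stands in for the HowNet-based similarity, a simple "most similar to the rest of its bag" rule stands in for the sentence relation graph, and the names `jaccard`, `segment`, `topic_sentence`, and the `threshold` value are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two tokenized sentences.

    Assumption: a stand-in for the HowNet-based similarity in the abstract.
    """
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def segment(sentences, threshold=0.2):
    """Start a new sentence bag wherever adjacent similarity falls below threshold."""
    if not sentences:
        return []
    bags = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(prev, cur) < threshold:
            bags.append([cur])      # similarity dropped: a new subtopic begins
        else:
            bags[-1].append(cur)    # still the same subtopic
    return bags


def topic_sentence(bag):
    """Pick the sentence most similar to the other sentences in its bag.

    Assumption: a simplified proxy for extraction via a sentence relation graph.
    """
    return max(bag, key=lambda s: sum(jaccard(s, t) for t in bag if t is not s))


# Hypothetical, already-tokenized input.
sentences = [
    ["text", "mining", "topic"],
    ["topic", "mining", "method"],
    ["weather", "rain", "today"],
    ["rain", "today", "cold"],
]
bags = segment(sentences)
topics = [topic_sentence(bag) for bag in bags]
```

Because each bag yields its own topic sentence, a short local subtopic is not drowned out by the dominant topic of the whole text, which is the problem the abstract sets out to solve.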

2.
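[Purpose/Significance] Accurately computing the similarity between microblog posts can improve the efficiency of microblog topic mining and is of practical significance for managing public opinion and safeguarding information security. To address the semantic sparsity and high dimensionality of microblog text, a hyperedge similarity algorithm that incorporates non-textual microblog features is proposed. [Method/Process] The mechanism by which microblog public opinion arises is analyzed, a supernetwork model is used to represent the formation of public opinion topics, and the hyperedge similarity algorithm is constructed by computing the similarity of each subnetwork layer and each layer's contribution to topic formation. [Results/Conclusions] The proposed similarity method helps improve topic clustering of microblog public opinion information and discriminates clearly between topics, particularly for posts whose wording is highly similar.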

3.
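To investigate the time lags that arise when fusing multi-source scientific and technical literature for detecting emerging topics in a discipline, this paper designs a time-lag calculation scheme for multi-source sci-tech literature. First, disciplinary topics are extracted from four types of sci-tech literature datasets, the similarity between topics is computed, and a similarity matrix is built; second, the Hungarian optimal matching algorithm is used to find the optimal combination under minimum similarity loss; finally, a linear equation model is built and fitted to compute the degree of time lag. Using 337,790 abstracts from the agricultural sciences for 2009-2016 as experimental data, 250 disciplinary topics were extracted from funded project texts, 260 from patent documents, 260 from journal articles, and 240 from conference papers, and the scheme was applied to them. The results show that journal articles lag funded project texts and conference papers by one year, and that patent documents lag journal articles by one year. Taken together with earlier results on data from other disciplines, this confirms the feasibility and effectiveness of the scheme and offers new ideas for designing fusion strategies for multi-source sci-tech literature.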
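The matching step in entry 3 (pairing topics from two sources so that total similarity loss is minimal) can be illustrated with a small sketch. Note the substitution: for readability this uses a brute-force search over all permutations rather than the Hungarian algorithm itself; both return an optimal assignment, but brute force is only practical for a handful of topics. The loss definition `1 - similarity`, the function name `best_matching`, and the sample matrix are assumptions for illustration.

```python
from itertools import permutations


def best_matching(sim):
    """Return (assignment, loss) minimizing total similarity loss.

    sim[i][j] is the similarity between topic i of source A and topic j of
    source B (square matrix). assignment[i] is the B-topic paired with
    A-topic i. Assumption: loss is defined here as 1 - similarity.
    Brute-force stand-in for the Hungarian algorithm: O(n!) time.
    """
    n = len(sim)
    best_perm, best_loss = None, float("inf")
    for perm in permutations(range(n)):
        loss = sum(1.0 - sim[i][perm[i]] for i in range(n))
        if loss < best_loss:
            best_perm, best_loss = perm, loss
    return list(best_perm), best_loss


# Hypothetical 3x3 topic similarity matrix.
sim = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.4, 0.7],
]
assignment, loss = best_matching(sim)
```

For matrices of the size reported in the abstract (hundreds of topics), a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the practical choice.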

4.
张培晶  宋蕾 《图书情报工作》2012,56(24):120-126
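After introducing the development of probabilistic topic models and the basic principles of LDA, their representative model, the paper analyzes the characteristics of LDA and its advantages for mining microblog-style web text; it reviews and evaluates existing LDA-based text topic modeling methods for microblogs, summarizing and comparing how they extend the model and how well they perform; finally, it looks ahead to future directions for microblog topic modeling.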

5.
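Research on Sentence-Based Text Representation and Chinese Text Classification   Total citations: 1, self-citations: 0, citations by others: 1
Text mining is a key technology for information resource management. The vector space model is a mature text representation model in text mining; it usually takes words or phrases as features, but these carry limited semantic information. To enable content-based text mining, this paper raises the granularity of text segmentation from words or phrases to sentences: a text is represented as a bag of sentences, text similarity is defined in terms of sentence similarity, and the KNN algorithm is used for Chinese text classification to test the feasibility of the model. Experiments show that the sentence-bag KNN algorithm achieves satisfactory average precision (92.12%) and recall (92.01%).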
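A rough sketch of the sentence-bag idea in entry 5 follows. The abstract does not give the exact formulas, so everything concrete here is an assumption: sentence similarity is word-overlap (Jaccard), text similarity is the symmetrized average of each sentence's best match in the other text, and the names `sent_sim`, `text_sim`, `knn_classify`, and the toy data are hypothetical.

```python
from collections import Counter


def sent_sim(a, b):
    """Assumed sentence similarity: Jaccard overlap of the two word sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def text_sim(doc_a, doc_b):
    """Text similarity defined through sentence similarity.

    Each document is a bag (list) of tokenized sentences. Assumption: average
    each sentence's best match in the other document, in both directions.
    """
    if not doc_a or not doc_b:
        return 0.0

    def directed(x, y):
        return sum(max(sent_sim(s, t) for t in y) for s in x) / len(x)

    return 0.5 * (directed(doc_a, doc_b) + directed(doc_b, doc_a))


def knn_classify(doc, labeled_docs, k=3):
    """Majority vote among the k labeled documents most similar to doc."""
    ranked = sorted(labeled_docs, key=lambda pair: text_sim(doc, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (sentence bag, label) pairs.
train = [
    ([["team", "wins", "match"], ["coach", "praises", "team"]], "sports"),
    ([["player", "scores", "goal"], ["team", "wins", "cup"]], "sports"),
    ([["stock", "price", "rises"], ["bank", "raises", "rate"]], "finance"),
    ([["market", "price", "falls"], ["bank", "cuts", "rate"]], "finance"),
]
query = [["team", "wins", "final"], ["player", "scores", "twice"]]
predicted = knn_classify(query, train, k=3)
```

Working at sentence granularity keeps word co-occurrence within a sentence intact, which is the extra semantic information the abstract argues word-level features lose.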

6.
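Content Topic Mining and Evolution Based on a Dynamic LDA Topic Model   Total citations: 1, self-citations: 0, citations by others: 1
The paper points out that mining text content topics and studying their evolution play an important role in text modeling and classification and in improving recommendation quality. Starting from the principles of LDA-based content topic mining and considering the characteristics of text content in today's online environment, it builds an LDA model suited to mining topics from dynamic text content, improves mining accuracy with a modified Gibbs sampling estimator, and then studies how content topics evolve over time in terms of topic similarity and topic strength. Experiments show that the proposed method is feasible and effective and is of practical significance for subsequent work on semantic text modeling and classification.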

7.
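This paper proposes a method for rewriting the sentences extracted in Chinese text summarization. First, statistical methods are used to compute features of the text and the weights of words and sentences, and the highest-weighted sentences are extracted; next, an algorithm based on vector similarity is applied to these sentences for anaphora resolution, and a new method for computing sentence vector similarity is proposed to remove redundancy; finally, heuristic rules are applied to produce the summary. Experimental results show that the revised summaries are more coherent and fluent, and that summary quality is clearly higher than before revision.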

8.
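To remedy the shortcomings of current topic mining methods for microblog platforms, and taking into account the sparsity, multidimensionality, and sheer volume of microblog data, the paper proposes preprocessing tailored to these characteristics and then mining microblog topics with LDA (Latent Dirichlet Allocation), a latent semantic analysis model based on prior probabilities. On top of the LDA model, an incremental text clustering algorithm is designed to identify topic structure, helping users better understand topics and their structure. Experiments on a real microblog dataset show that the model mines topics and identifies topic structure effectively.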

9.
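An algorithm for automatically identifying clickbait news on the web is proposed. By analyzing how news web pages are structured, the news headline and body are extracted; based on a sentence relation matrix, a sentence-level topic sentence extraction algorithm is proposed; the judgment is then made from the computed sentence similarity. Experiments show that the method reaches an identification precision of 80% and is effective.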

10.
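[Purpose/Significance] A technology opportunity discovery method is proposed that combines review topic identification with multidimensional analysis of technical attributes; it identifies technology opportunities from a technology-demand-driven perspective and supports enterprise decisions on forward-looking R&D directions and research management planning. [Method/Process] Using online product reviews as the data source, an LDA topic model first identifies technical topics in the reviews, and two indicators, topic strength and topic novelty, are proposed to screen for emerging key technical review topics. Next, technical attribute words are manually selected from academic papers and patents, high-frequency review words are obtained by computing TF-IDF values, and technical feature words are further filtered with expert knowledge to build a table mapping product technical attribute words to technical feature words. Correlation calculations then yield the technical attributes related to the reviews and those related to the emerging key technical review topics. Finally, an indicator model for identifying important product technical attributes is proposed and a multidimensional analysis method is designed to analyze the characteristics of those attributes, ultimately identifying the emerging technology opportunities contained in the review text. [Results/Conclusions] Experimental results show that the method identifies technology opportunities effectively and provides a reference for managing product technology R&D.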
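The TF-IDF step mentioned in entry 10 can be sketched as below. The abstract does not say which TF-IDF variant was used, so this assumes the common form (relative term frequency times the natural log of inverse document frequency, no smoothing); the function name `tf_idf` and the sample reviews are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


# Hypothetical tokenized product reviews.
reviews = [
    ["battery", "life", "good"],
    ["battery", "drains", "fast"],
    ["screen", "good"],
]
scores = tf_idf(reviews)
```

Under this weighting a term that appears in every review scores zero, so ranking by TF-IDF favors words that are frequent in some reviews but not ubiquitous, which is one reason a further expert filtering pass like the one the abstract describes is still useful.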

11.
This article surveys a sample of the sources of information about Romania available to British readers in nineteenth-century British newspapers and periodicals. It traces the first contacts between the Romanian lands and Britain after the union of the principalities of Wallachia and Moldavia in 1859, and then after their independence from the Ottoman Empire. The article highlights an increased interest in Romania among British periodicals, which reported on and reviewed Romanian literature and scholarship. It concludes that nineteenth-century British newspapers and periodicals offer a great variety and wealth of new material previously unavailable or unknown to researchers. It also states that only a portion of this large body of material has been indexed and is therefore available via the bibliographic sources mentioned in the article. The author argues for a new and updated British-Romanian bibliography, which can draw on new online resources offering access to thousands of new newspaper and periodical records.

12.
ABSTRACT

The paper looks at library approval plans for material published in Slavic, East European, and Eurasian countries from the selector's point of view. Reasons why a selector would or would not want one are examined. Success with approval plans requires monitoring receipts, as well as good and ongoing communication among the selector, the acquisitions department, and the vendor. A preliminary list of vendors offering approval plans for the countries of the region appears in the appendix.

13.
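To further improve the efficiency of the Wuhan Science and Technology Information Sharing Service Platform, this paper analyzes the current state of Wuhan's sci-tech information resource services in terms of platform resource construction, resource use, supply-demand matching mechanisms, and supply-demand characteristics. From the perspective of demand and use, and drawing on platform management practice, user visits, and questionnaire surveys, it identifies the main factors constraining the matching of sci-tech resource supply and demand along two dimensions: the parties demanding information resources and the platform's own construction and management. Taking marketization and institutionalization as guiding ideas for innovation, it proposes measures for shifting platform construction from "resource aggregation" to "demand orientation", covering policy innovation, mechanism innovation, market-oriented services, environment building, and talent development.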

14.
ABSTRACT

The article examines the most important periodicals of ethnic minorities in Poland. After 1989, many ethnic groups (e.g., Germans and Romanies) were allowed to publish journals and newspapers for the first time since the end of World War II. The publications examined show the rich cultural life of the various ethnic groups as well as their current status in Poland. In addition to popular titles, some scholarly publications are also discussed.

15.
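Based on an analysis of how the temporal characteristics of word use in the literature vary across research stages, a topic-model-based method for identifying stages of research development is proposed. The paper details how the method is built, including temporal feature extraction, definition of development stages, and analysis of rising and falling topic popularity. To validate the method, the word frequency method and the topic model method are compared on topic evolution analysis. The results show that the method identifies topic hotspots and development trends while effectively distinguishing the research development stages that different topics reflect.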

16.
The author answers a reference question on bibliographic sources for the Ukrainian periodical press 1840–1850. Helpful publications include bibliographies, guides, and library catalogs. These potentially make mention of revolutionary developments in Hungary (such as the Twelve Points paragraph of the Demands of the Hungarian Nation in March 1848, the subsequent April Laws, and Hungary's declaration of independence in April 1849), and elsewhere in the Hapsburg Empire.

17.
Chromebooks and the G Suite group of products, like Google Search, Gmail, and Google Docs, have rapidly expanded in American schools during the past 5 years. The impact of one-to-one Chromebook devices and the pervasive use of Google's software products in American education cannot be overstated. This article explores some of the influences of these products on research, based on the experiences of a librarian and technology coordinator at an elementary education level. The author has several suggestions for effective research with these products in mind.

18.
Library historians can learn a great deal from studying the gender of authorship and institutional affiliation of a scholarly journal. The focus of this study is to examine these two aspects of journal production in library history to see who is producing published research in this field. Twenty-three years of Libraries & Culture were chosen as the target volumes. The study reveals that more men than women published in library history and identifies which institutions were represented. This type of information is useful to the library historian engaged in the analysis of published scholarship, and more generally to scholars with an interest in patterns of literature production in fields closely related to the social and behavioral sciences.

19.
ABSTRACT

The history of the almanac in Croatia is reconstructed through primary research in bibliographic and archival sources. The almanac is a vehicle for knowledge communication in informal contexts, engaging both oral tradition and literary forms traceable to medieval literacy and ways of structuring knowledge. The history of the almanac in Croatia reflects the changing context of the book trade, literacy, and the evolution of language. Four main stages are identified: (1) the beginning of the annual almanac in the seventeenth century; astrological almanacs reflecting the sensibility of the Baroque period; (2) the Enlightenment's stimulation of almanac publishing in the spirit of contemporary secular reforms in agriculture and education; (3) nineteenth- and twentieth-century almanac trade, showing complex and overlapping networks for the production, distribution and appropriation of printed almanacs; (4) roughly the end of World War II, when the almanac slowly moved out of the role of a popular mass medium and into specialized niches represented by regional, diaspora, and religious almanacs.

20.
《The Reference Librarian》2013,54(74):121-164
Summary

Although the Internet provides access to a wealth of information, there is little, if any, control over the quality of that information. Side-by-side with reliable information, one finds disinformation, misinformation, and hoaxes. The authors of this paper discuss numerous examples of fabricated historical information on the Internet (ranging from denials of the Holocaust to personal vendettas), offer suggestions on how to evaluate websites, and argue that these fabrications can be incorporated into bibliographic instruction classes.
