Similar literature
 Found 20 similar documents (search time: 171 ms)
1.
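Using changes in the search volume of web search keywords to analyze and forecast trends in related phenomena is a research area attracting growing attention. This paper proposes that the temporal variation of web search keywords exhibits three characteristics: leading, synchronous, and lagging. By collecting keyword search-volume data from search sites and performing time-lag correlation analysis against the target of analysis and forecasting, the temporal characteristics of related keywords can be identified. An experiment identifying the temporal characteristics of H7N9 avian influenza keywords demonstrates the feasibility of the method.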
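
The time-lag correlation idea in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`pearson`, `best_lag`, `classify_keyword`), the choice of the Pearson coefficient, the `max_lag` parameter, and the sample series are all assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def best_lag(search, target, max_lag):
    """Return (lag, r) maximizing the correlation of search[t] with target[t + lag].

    Both series are assumed to have the same length and sampling interval.
    """
    best = (0, -2.0)  # any real correlation is greater than -2
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = search[:len(search) - lag], target[lag:]
        else:
            a, b = search[-lag:], target[:len(target) + lag]
        r = pearson(a, b)
        if r > best[1]:
            best = (lag, r)
    return best


def classify_keyword(search, target, max_lag=3):
    """Label a keyword series as leading, synchronous, or lagging the target."""
    lag, _ = best_lag(search, target, max_lag)
    if lag > 0:
        return "leading"      # search volume moves before the target
    if lag < 0:
        return "lagging"      # search volume moves after the target
    return "synchronous"


# Hypothetical data: the target repeats the search series two periods later.
search_volume = [1, 2, 5, 9, 5, 2, 1, 1, 1, 1]
case_counts = [1, 1, 1, 2, 5, 9, 5, 2, 1, 1]
print(classify_keyword(search_volume, case_counts))
```

A positive best lag means the keyword's search volume tends to move before the target series, which is the property that makes a keyword useful for forecasting.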

2.
王旭艳 《现代情报》2003,23(7):83-84,68
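美国化工网 is a global, comprehensive search site for the chemical industry. This paper describes how to search for chemical information on the site in three ways: keyword search, category browsing, and restriction with "filters". It also describes the synonym lookup function of the site's chemical dictionary.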

3.
李佳  曾平 《情报探索》2011,(8):99-101
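This paper analyzes the functional characteristics of Baidu's related-search feature and the principle behind the relatedness of its keywords, with the aim of providing a reference for enterprises that use Baidu related search for website promotion.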

4.
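Web search data reflect the real footprints consumers leave while gathering information and making purchase decisions, and are valuable for understanding purchase demand. Using a keyword acquisition method that differs from existing studies, this paper examines the relationship between web search data and sales in China's automobile market. First, the keywords for the web search data are determined, mainly through text mining: (1) crawled automobile forum text is segmented with Jieba; (2) a Word2vec model converts the segmentation results into vector space model form; (3) the TF-IDF algorithm and cosine similarity are combined to select the keywords. Next, a fixed-effects model of web search and automobile sales is built on long panel data covering 108 months. Finally, automobile sales for the most recent 12 months are forecast with a rolling window. The empirical results show that a long-run equilibrium relationship exists between web search and automobile sales, that the regression model explains 76% of the variance, and that web search data help forecast automobile sales in China.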
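
The cosine-similarity part of the keyword selection step might look like the sketch below. It is only an illustration under stated assumptions: the toy two-dimensional vectors stand in for Word2vec output, and the names `cosine` and `related_keywords`, the seed word, and the 0.8 threshold are hypothetical, not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def related_keywords(seed, vectors, threshold=0.8):
    """Return words whose vectors are at least `threshold` similar to the seed word."""
    seed_vec = vectors[seed]
    return sorted(
        word
        for word, vec in vectors.items()
        if word != seed and cosine(seed_vec, vec) >= threshold
    )


# Hypothetical embeddings; real Word2vec vectors have far more dimensions.
toy_vectors = {
    "car": [1.0, 0.0],
    "sedan": [0.9, 0.1],
    "weather": [0.0, 1.0],
}
print(related_keywords("car", toy_vectors))
```

In a fuller pipeline the candidate words would first be weighted (for example by TF-IDF) so that only informative terms are compared against the seed.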

5.
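A Chinese keyword extraction algorithm based on the TFIDF method   Total citations: 4, self-citations: 1, citations by others: 3
Building on 海量智能分词 word segmentation, this paper proposes a Chinese keyword extraction algorithm based on the vector space model and the TFIDF method. After automatically segmenting the text, the algorithm uses TFIDF to compute a weight for every word in the document space and then extracts the keywords of scientific and technical documents according to those weights. Experiments with purpose-built software show that the algorithm is highly effective at automatically extracting keywords from Chinese scientific and technical literature.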
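
A minimal sketch of TF-IDF keyword ranking over already-segmented documents is shown below. The abstract does not give the paper's exact weighting formula, so the standard form tf × log(N / df) is assumed here; the function name `tfidf_keywords`, its parameters, and the sample documents are hypothetical.

```python
from collections import Counter
from math import log


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF and return the top_k.

    `docs` is a list of token lists (word segmentation is assumed done).
    Weight = (count / doc length) * log(number of docs / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    tokens = docs[doc_index]
    counts = Counter(tokens)
    weights = {
        word: (count / len(tokens)) * log(n_docs / doc_freq[word])
        for word, count in counts.items()
    }
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_k]]


# Hypothetical segmented corpus.
corpus = [
    ["tfidf", "keyword", "keyword", "text"],
    ["text", "search", "engine"],
    ["text", "search", "query"],
]
print(tfidf_keywords(corpus, 0, top_k=2))
```

Words that occur in every document (here "text") receive a weight of zero, which is why TF-IDF favors terms that are frequent in one document but rare across the collection.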

6.
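With the rapid development of the Internet and the widespread use of search engines, search engine optimization (SEO) has become a trend for websites, especially large ones. This paper introduces SEO strategies for large websites from seven aspects, including keyword analysis, reverse page optimization, front-end page optimization, internal link strategy, external link strategy, and search-engine-friendly writing strategy.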

7.
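To improve the efficiency with which users use websites and the search performance of website ontology models, this paper studies an efficient method for constructing web-page semantic concept trees in order to expand search coverage layer by layer. Traditional methods expand searches using search engines' word similarity algorithms and reduce the formal context with rules, clustering, and similar techniques; they cannot effectively establish hypernym-hyponym relations between concepts and perform poorly. A layered search-coverage expansion method based on feature matching over a semantic topic tree is proposed. It builds a Web semantic model and topic tree, constructs a document term-frequency vector model over mutual-information regions of the feature space, and classifies and abstracts the attribute fields of database records to form concept convergence points. On this basis it builds the semantic topic tree and designs the coverage expansion, constructs a semantic topic tree feature-matching algorithm, and tunes the search engine's sensitivity to text features, thereby improving search coverage and achieving layered expansion of text search coverage. Experimental analysis shows that the method yields good text feature classification results and a clear semantic hierarchy, effectively improves recall and precision on text data, and shows good application value.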

8.
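1. At present, entering the keyword "中国科技信息" on search sites returns many results that imitate our official website. Please remember our official website address: www.cnkjxx.com.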

9.
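1. At present, entering the keyword "中国科技信息" on search sites returns many results that imitate our official website. Please remember our official website address: www.cnkjxx.com.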

10.
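1. At present, entering the keyword "中国科技信息" on search sites returns many results that imitate our official website. Please remember our official website address: www.cnkjxx.com.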

11.
The study of query performance prediction (QPP) in information retrieval (IR) aims to predict retrieval effectiveness. The specificity of the underlying information need of a query often determines how effectively a search engine can retrieve relevant documents at top ranks. The presence of ambiguous terms makes a query less specific to the sought information need, which in turn may degrade IR effectiveness. In this paper, we propose a novel word embedding based pre-retrieval feature which measures the ambiguity of each query term by estimating how many ‘senses’ each word is associated with. Assuming each sense roughly corresponds to a Gaussian mixture component, our proposed generative model first estimates a Gaussian mixture model (GMM) from the word vectors that are most similar to the given query terms. We then use the posterior probabilities of generating the query terms themselves from this estimated GMM in order to quantify the ambiguity of the query. Previous studies have shown that post-retrieval QPP approaches often outperform pre-retrieval ones because they use additional information from the top ranked documents. To achieve the best of both worlds, we formalize a linear combination of our proposed GMM based pre-retrieval predictor with NQC, a state-of-the-art post-retrieval QPP. Our experiments on the TREC benchmark news and web collections demonstrate that our proposed hybrid QPP approach (in linear combination with NQC) significantly outperforms a range of other existing pre-retrieval approaches in combination with NQC used as baselines.

12.
Traditional topic models are based on the bag-of-words assumption, which states that the topic assignment of each word is independent of the others. However, this assumption ignores the relationship between words, which may hinder the quality of extracted topics. To address this issue, some recent works formulate documents as graphs based on word co-occurrence patterns. They assume that if two words co-occur frequently, they should have the same topic. Nevertheless, this introduces noise edges into the model and thus hinders topic quality, since frequent co-occurrence of two words does not mean that they are on the same topic. In this paper, we use the commonsense relationship between words as a bridge to connect the words in each document. Compared to word co-occurrence, the commonsense relationship can explicitly imply the semantic relevance between words, which can be utilized to filter out noise edges. We use a relational graph neural network to capture the relation information in the graph. Moreover, manifold regularization is utilized to constrain the documents’ topic distributions. Experimental results on a public dataset show that our method is effective at extracting topics compared to baseline methods.

13.
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora in various languages is important. However, previous works based on statistical methods are insufficient because of the sparse data problem. Our learning method automatically acquires rules, which are effective in solving the sparse data problem, from parallel corpora alone, without any prior preparation of a bilingual resource (e.g., a bilingual dictionary, a machine translation system). We call this learning method Inductive Chain Learning (ICL). Moreover, the system using ICL can extract bilingual word pairs even from bilingual sentence pairs for which the grammatical structures of the source language differ from those of the target language, because the acquired rules carry the information needed to cope with the different word orders of the source and target languages in local parts of bilingual sentence pairs. Evaluation experiments demonstrated that the recall of systems based on several statistical approaches was improved through the use of ICL.

14.
刘君 《科教文汇》2011,(8):71-73
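"Self-restraint (Sichenthalten)" is a Heideggerian term that scholars have long neglected. Its significance is considerable: it reflects the line of inheritance between Heidegger's early thought and Hegel's philosophy. This paper first attempts to clarify the problems the term still poses in existing Chinese and English translations, and then explores its theoretical content in depth.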

15.
赵瑞  李政 《科技广场》2006,(7):85-86
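This paper introduces a computer-aided method for drawing up university teaching plans. With the template described, various figures are tallied automatically while the plan is drawn up, the required information is generated, and the results are printed. The design avoids the trouble of repeated manual calculation and transcription when revising teaching plans, and plays an important role in improving teaching management and work efficiency.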

16.
Measuring the similarity between the semantic relations that exist between words is an important step in numerous tasks in natural language processing such as answering word analogy questions, classifying compound nouns, and word sense disambiguation. Given two word pairs (A, B) and (C, D), we propose a method to measure the relational similarity between the semantic relations that exist between the two words in each word pair. Typically, a high degree of relational similarity can be observed between proportional analogies (i.e. analogies that exist among the four words, A is to B as C is to D). We describe eight different types of relational symmetries that are frequently observed in proportional analogies and use those symmetries to robustly and accurately estimate the relational similarity between two given word pairs. We use automatically extracted lexical-syntactic patterns to represent the semantic relations that exist between two words and then match those patterns in Web search engine snippets to find candidate words that form proportional analogies with the original word pair. We define eight types of relational symmetries for proportional analogies and use those as features in a supervised learning approach. We evaluate the proposed method using the Scholastic Aptitude Test (SAT) word analogy benchmark dataset. Our experimental results show that the proposed method can accurately measure relational similarity between word pairs by exploiting the symmetries that exist in proportional analogies. The proposed method achieves an SAT score of 49.2% on the benchmark dataset, which is comparable to the best results reported on this dataset.

17.
陈晓华  刘慧 《科学学研究》2013,31(8):1178-1190
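Abstract: By revising and improving the Schott (2006) model, this paper constructs a new method for measuring export technology structure and, applying it to export data on 65 chapters of HS-coded products from China's 31 provincial-level regions, measures the export technology structure of each region for 2002-2008. On this basis, two-step system GMM estimation is used to study, at the level of the eastern, central, and western regions, the effect of international fragmentation of production on China's export technology structure. The main conclusions are as follows. First, the effect of international fragmentation of production on China's export technology structure follows a V shape, negative at first and then positive. Second, in recent years the export technology structure of every provincial-level region has improved markedly, though it is not as high as measured by Rodrik (2006), and the export technology structure across regions tends toward "multi-equilibrium convergence". Finally, China's export technology structure has an upgrading pattern different from that of ordinary developing countries, and the contribution of foreign direct investment to upgrading it shows a significant diminishing marginal effect.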

18.
Recently, using pretrained word embeddings to represent words has achieved success in many natural language processing tasks. Depending on their objective functions, different word embedding models capture different aspects of linguistic properties. However, the Semantic Textual Similarity task, which evaluates the similarity/relation between two sentences, requires taking all of these linguistic aspects into account. Therefore, this research aims to encode various characteristics from multiple sets of word embeddings into one embedding and then learn the similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via Multi-level comparison. Our method M-MaxLSTM-CNN consistently shows strong performance in several tasks (i.e., measuring textual similarity, identifying paraphrases, recognizing textual entailment). Our model does not use hand-crafted features (e.g., alignment features, Ngram overlaps, dependency features) and does not require the pre-trained word embeddings to have the same dimension.

19.
李果 《科教文汇》2011,(2):81-82
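Given that Chinese characters are square in shape, let us begin with the following supposition: each character contains an invisible horizontal axis and an invisible vertical axis, and these two lines divide the character evenly into four parts.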

20.
A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects the retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model from comparable corpora, without requiring any additional linguistic resources, for the CLIR task. In the first step, translations are extracted by deriving correlations between source–target word pairs. These correlations are used to estimate word translation probabilities in the second step. We propose a language modeling approach for the first step, where modeling based on probability distribution provides two key advantages. First, our approach can be tuned more easily than heuristically adjusted previous work. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations from comparable corpora. As an indication, we integrate monolingual relations of word co-occurrences into the process of translation extraction, which helps to extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English–Persian comparable corpus show that our method outperforms the previous approaches in terms of both translation quality and the performance of CLIR. Indeed, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of word translation probabilities, estimated in the second step of our approach, on the performance of CLIR.
