Similar Documents
19 similar documents retrieved.
1.
Starting from the nature of corpora and of language testing, this paper sets out a reasoned account of how corpora can be applied to language testing, analyses several ways in which computer-based corpora may be used in testing, and points out the advantages and limitations of such use as well as the prospects for applying corpora to language testing.

2.
张翯  裴云红 《科教文汇》2013,(28):126-127
Corpus linguistics, an emerging field, has drawn growing attention from scholars at home and abroad because of its distinctive research perspective and application prospects, and its application to language teaching has become a research focus. Chinese scholars have made progress in combining corpus linguistics with English teaching, but work on basic-level teaching and micro-level applications remains limited and calls for further study. Taking English reading instruction as its entry point, this paper discusses the necessity and feasibility of corpus-informed English reading instruction and explores corpus-based reading teaching from three angles: text introduction, stylistic analysis, and guessing word meaning from context.

3.
Corpus linguistics, an emerging field, has drawn growing attention from scholars at home and abroad because of its distinctive research perspective and application prospects, and its application to language teaching has become a research focus. Chinese scholars have made progress in combining corpus linguistics with English teaching, but work on basic-level teaching and micro-level applications remains limited and calls for further study. Taking English reading instruction as its entry point, this paper discusses the necessity and feasibility of corpus-informed English reading instruction and explores corpus-based reading teaching from three angles: text introduction, stylistic analysis, and guessing word meaning from context.

4.
The automatic language profiling technique proposed in this paper, based on a tripartite (triple) comparable corpus, broadens the scope of the field to include application-oriented research in natural language processing. With engineering feasibility in mind, the authors propose building a triple comparable corpus and applying automatic extraction techniques such as n-gram word strings, keyword clusters, and semantic multiword expressions; by contrasting Chinglish expressions with native usage, native-speaker English language models are uncovered, with the aim of improving and advancing NLP applications such as machine translation and cross-language information retrieval.
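The abstract names the extraction techniques but gives no detail; as a rough illustration only, the Python sketch below (the tokenised toy sentences, frequency threshold, and add-one smoothing are my assumptions, not the paper's design) counts n-gram word strings in two comparable corpora and ranks those over-represented in the learner corpus relative to the native one.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield contiguous n-gram tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

def ngram_counts(sentences, n):
    """Count n-grams over a list of tokenised sentences."""
    counts = Counter()
    for sent in sentences:
        counts.update(ngrams(sent, n))
    return counts

def overused_ngrams(learner_sents, native_sents, n=3, min_freq=5):
    """Rank n-grams by how much more frequent they are (per token)
    in the learner corpus than in the native corpus."""
    learner = ngram_counts(learner_sents, n)
    native = ngram_counts(native_sents, n)
    l_total = sum(learner.values()) or 1
    n_total = sum(native.values()) or 1
    ratios = {}
    for gram, freq in learner.items():
        if freq < min_freq:
            continue
        rel_l = freq / l_total
        rel_n = (native.get(gram, 0) + 1) / n_total  # add-one smoothing
        ratios[gram] = rel_l / rel_n
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with made-up sentences; real input would be the comparable corpora.
learner = [["we", "should", "pay", "attention", "to", "the", "problem"]] * 6
native = [["we", "should", "address", "the", "problem"]] * 6
print(overused_ngrams(learner, native, n=2, min_freq=3)[:5])
```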

5.
A lack of research on corpus-based language teaching models is one of the difficulties currently facing the use of corpora in teaching in China. This paper introduces and analyses data-driven learning, a corpus-based teaching model, and argues that it differs greatly from traditional models in teaching philosophy, learning content, and learning materials, and that it offers real benefits for grammar learning, studying collocations, comparing synonyms and near-synonyms, correcting language errors, and checking translation choices.

6.
康大伟 《科教文汇》2009,(36):144-144,159
A corpus is a large electronic collection of naturally occurring language material, and corpus linguistics is an emerging discipline that analyses, describes, and studies language on the basis of corpora. English corpora have a positive influence on vocabulary teaching, grammar teaching, textbook compilation, dictionary making, and the cultivation of learner autonomy in English teaching in China; appropriately adopting corpus methods in English teaching for the animal husbandry and veterinary professions would strongly promote syllabus design, selection of teaching content, compilation of textbooks and dictionaries, and teaching assessment.

7.
夏秸  李严严 《科教文汇》2012,(34):116-116,182
As a large-capacity, fast, and effective tool for language research, the modern corpus plays an increasingly important role in research on language teaching. This study, grounded in the realities of reading instruction for English-major undergraduates, uses corpora to support students' extracurricular reading, with the aim of improving the effectiveness of reading classes and strengthening students' independent reading ability.

8.
Compatibility and interconversion of information retrieval languages is one of the important research topics of 21st-century information linguistics, and making contemporary retrieval languages compatible matters greatly in the Internet era. After studying and comparing methods for the compatibility and conversion of information retrieval languages at home and abroad, the paper argues that multiple classification schemes can be converted automatically on the basis of a large corpus of already-indexed documents, and designs a scheme for automatic conversion among multiple classification systems based on a large corpus.
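The abstract does not spell out the conversion mechanism. One plausible baseline, assumed here rather than taken from the paper, is to estimate a mapping between two classification schemes from documents that already carry labels in both schemes, using simple label co-occurrence counts:

```python
from collections import Counter, defaultdict

def learn_mapping(dual_labelled_docs):
    """Estimate a scheme-A -> scheme-B mapping from documents that carry
    a class label in both schemes.  Each item is a (label_a, label_b) pair."""
    cooc = defaultdict(Counter)
    for label_a, label_b in dual_labelled_docs:
        cooc[label_a][label_b] += 1
    # For each class in scheme A, keep the scheme-B class it co-occurs with most.
    return {a: counts.most_common(1)[0][0] for a, counts in cooc.items()}

# Toy usage with hypothetical class codes from two different schemes.
docs = [("TP391", "006.3"), ("TP391", "006.3"), ("TP391", "005.1"), ("H319", "428")]
print(learn_mapping(docs))  # {'TP391': '006.3', 'H319': '428'}
```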

9.
Web-based corpora are a language-learning resource that has emerged in recent years, characterised above all by the massive amount of language they provide and the natural authenticity of their material. This paper briefly discusses the positive influence of English corpora, as a new teaching and learning resource, on teaching content, grammar teaching, vocabulary teaching, analysis of learners' language errors, and learner autonomy in English teaching in China, and briefly analyses the limitations of corpora and the issues to bear in mind when using them.

10.
Modern corpus linguistics has had a major influence on English language teaching. Writing instruction is one of the weak links in foreign language teaching, and the rise of corpora is of great significance for teaching English writing. This paper discusses the important role of corpora in English writing instruction from three perspectives: helping students improve the accuracy of their language, strengthening the coherence of essay structure, and enriching essay content.

11.
Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high quality translation knowledge from our constructed Persian–English comparable corpus, University of Tehran Persian–English Comparable Corpus (UTPECC). We extract translation knowledge based on a Term Association Network (TAN) constructed from term co-occurrences in the same language as well as term associations across different languages. We further propose a post-processing step that checks term translation validity by detecting mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and the proposed methods significantly outperforms simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective in translating Out-Of-Vocabulary terms and in expanding query words based on their associated terms.
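UTPECC and the full TAN construction are specific to the paper; the sketch below is only a rough approximation showing how a term association network can be built from within-document co-occurrence counts (the documents and the threshold are made up), using networkx for the graph structure.

```python
from collections import Counter
from itertools import combinations

import networkx as nx

def build_tan(documents, min_cooc=2):
    """Build a term association network whose edge weights are the number
    of documents in which two terms co-occur."""
    cooc = Counter()
    for doc in documents:
        for t1, t2 in combinations(sorted(set(doc)), 2):
            cooc[(t1, t2)] += 1
    graph = nx.Graph()
    for (t1, t2), weight in cooc.items():
        if weight >= min_cooc:
            graph.add_edge(t1, t2, weight=weight)
    return graph

# Toy usage with made-up tokenised documents.
docs = [["corpus", "retrieval", "query"], ["corpus", "query", "translation"],
        ["corpus", "retrieval", "index"]]
tan = build_tan(docs, min_cooc=2)
print(sorted(tan.edges(data="weight")))
```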

12.
Treating a large collection of Chinese–English parallel patent texts as a parallel corpus, this paper proposes a method for automatically extracting a Chinese–English dictionary. A seed bilingual dictionary is first built from Wikipedia, an external semantic resource; candidate Chinese–English word pairs are then obtained by computing pointwise mutual information, and a threshold is applied to select the pairs used to enrich the seed dictionary. Experimental results show that grouping the words of the English documents into phrases helps improve the overall performance of automatic extraction, whereas sentence alignment raises the precision of the extracted pairs but hurts their recall. The bilingual patent dictionary built with this method can play a positive role in constructing multilingual technical knowledge graphs.
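The abstract names pointwise mutual information as the association measure. A minimal sketch along those lines, with tokenisation, the seed-dictionary step, and the threshold value all assumed or omitted, could compute PMI over aligned document pairs as follows:

```python
import math
from collections import Counter

def pmi_pairs(aligned_docs, threshold=3.0):
    """Score (zh_term, en_term) candidates by pointwise mutual information
    over aligned document pairs: PMI = log( p(zh, en) / (p(zh) * p(en)) )."""
    zh_count, en_count, pair_count = Counter(), Counter(), Counter()
    n = len(aligned_docs)
    for zh_doc, en_doc in aligned_docs:
        zh_terms, en_terms = set(zh_doc), set(en_doc)
        zh_count.update(zh_terms)
        en_count.update(en_terms)
        for zh in zh_terms:
            for en in en_terms:
                pair_count[(zh, en)] += 1
    scored = {}
    for (zh, en), c in pair_count.items():
        pmi = math.log((c / n) / ((zh_count[zh] / n) * (en_count[en] / n)))
        if pmi >= threshold:
            scored[(zh, en)] = pmi
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage with three aligned "documents"; real input would be patent pairs.
aligned = [(["语料库", "检索"], ["corpus", "retrieval"]),
           (["语料库", "翻译"], ["corpus", "translation"]),
           (["检索", "索引"], ["retrieval", "index"])]
print(pmi_pairs(aligned, threshold=0.0))
```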

13.
Probabilistic topic models are unsupervised generative models which model document content as a two-step generation process, that is, documents are observed as mixtures of latent concepts or topics, while topics are probability distributions over vocabulary words. Recently, a significant research effort has been invested into transferring the probabilistic topic modeling concept from monolingual to multilingual settings. Novel topic models have been designed to work with parallel and comparable texts. We define multilingual probabilistic topic modeling (MuPTM) and present the first full overview of the current research, methodology, advantages and limitations in MuPTM. As a representative example, we choose a natural extension of the omnipresent LDA model to multilingual settings called bilingual LDA (BiLDA). We provide a thorough overview of this representative multilingual model from its high-level modeling assumptions down to its mathematical foundations. We demonstrate how to use the data representation by means of output sets of (i) per-topic word distributions and (ii) per-document topic distributions coming from a multilingual probabilistic topic model in various real-life cross-lingual tasks involving different languages, without any external language pair dependent translation resource: (1) cross-lingual event-centered news clustering, (2) cross-lingual document classification, (3) cross-lingual semantic similarity, and (4) cross-lingual information retrieval. We also briefly review several other applications present in the relevant literature, and introduce and illustrate two related modeling concepts: topic smoothing and topic pruning. In summary, this article encompasses the current research in multilingual probabilistic topic modeling. By presenting a series of potential applications, we reveal the importance of the language-independent and language pair independent data representations by means of MuPTM. We provide clear directions for future research in the field by providing a systematic overview of how to link and transfer aspect knowledge across corpora written in different languages via the shared space of latent cross-lingual topics, that is, how to effectively employ learned per-topic word distributions and per-document topic distributions of any multilingual probabilistic topic model in various cross-lingual applications.
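Implementing BiLDA itself requires full Gibbs or variational inference; the short sketch below only illustrates the downstream step the abstract highlights: once per-document topic distributions from a multilingual topic model are available, cross-lingual semantic similarity reduces to a comparison in the shared topic space (the distributions here are invented, not taken from the paper).

```python
import numpy as np

def topic_cosine(theta_a, theta_b):
    """Cosine similarity between two per-document topic distributions.
    Because the topics are shared across languages, the two documents
    may be written in different languages."""
    a, b = np.asarray(theta_a, float), np.asarray(theta_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical per-document topic distributions over K = 4 shared topics,
# e.g. one English and one Spanish news article from a BiLDA-style model.
theta_en = [0.70, 0.10, 0.15, 0.05]
theta_es = [0.65, 0.05, 0.20, 0.10]
print(round(topic_cosine(theta_en, theta_es), 3))
```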

14.
The paper presents new annotated corpora for performing stance detection on Spanish Twitter data, most notably Health-related tweets. The objectives of this research are threefold: (1) to develop a manually annotated benchmark corpus for emotion recognition taking into account different variants of Spanish in social posts; (2) to evaluate the efficiency of semi-supervised models for extending such corpus with unlabelled posts; and (3) to describe such short text corpora via specialised topic modelling. A corpus of 2,801 tweets about COVID-19 vaccination was annotated by three native speakers to be in favour (904), against (674) or neither (1,223) with a 0.725 Fleiss' kappa score. Results show that the self-training method with SVM base estimator can alleviate annotation work while ensuring high model performance. The self-training model outperformed the other approaches and produced a corpus of 11,204 tweets with a macro averaged f1 score of 0.94. The combination of sentence-level deep learning embeddings and density-based clustering was applied to explore the contents of both corpora. Topic quality was measured in terms of the trustworthiness and the validation index.
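The self-training setup is described only at a high level; a minimal scikit-learn sketch in that spirit (the feature extraction, confidence threshold, and toy data are my assumptions, not the paper's pipeline) might look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.svm import SVC

# Tiny illustrative data set: -1 marks unlabelled tweets, as scikit-learn expects.
texts = [
    "vaccines save lives", "get your shot", "I trust the vaccine",               # in favour
    "I will never get vaccinated", "vaccines are dangerous", "no jab for me",    # against
    "clinic opens tomorrow", "long queue at the clinic", "news about the rollout",  # neither
    "boosters seem to work", "not sure about the shot", "appointment booked",    # unlabelled
]
labels = [1, 1, 1, 0, 0, 0, 2, 2, 2, -1, -1, -1]

# Self-training wraps a probabilistic SVM and iteratively pseudo-labels the
# unlabelled examples it is most confident about.
model = make_pipeline(
    TfidfVectorizer(),
    SelfTrainingClassifier(SVC(probability=True), threshold=0.6),
)
model.fit(texts, labels)
print(model.predict(["thinking about getting the vaccine"]))
```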

15.
A management system was developed that integrates multimedia corpus construction, annotation, retrieval, and cued playback. With this platform, users can not only build multimedia corpora conveniently on their own, but also quickly carry out multi-dimensional annotation of multimedia material, corpus retrieval, and automatic locating and playback of audio and video content.

16.
The application of natural language processing (NLP) to financial fields is advancing with an increase in the number of available financial documents. Transformer-based models such as Bidirectional Encoder Representations from Transformers (BERT) have been successful in NLP in recent years. These cutting-edge models have been adapted to the financial domain by applying financial corpora to existing pre-trained models and by pre-training with the financial corpora from scratch. In Japanese, by contrast, financial terminology cannot be applied from a general vocabulary without further processing. In this study, we construct language models suitable for the financial domain. Furthermore, we compare methods for adapting language models to the financial domain, such as pre-training methods and vocabulary adaptation. We confirm that the adaptation of a pre-training corpus and tokenizer vocabulary based on a corpus of financial text is effective in several downstream financial tasks. No significant difference is observed between pre-training with the financial corpus and continuous pre-training from the general language model with the financial corpus. We have released our source code and pre-trained models.
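As a hedged illustration of one of the two adaptation routes the abstract compares (continued masked-language-model pre-training of a general model on financial text), a Hugging Face sketch could look roughly as follows; the base model name, the `finance.txt` data file, and all hyperparameters are assumptions, not the paper's actual configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

# Hypothetical setup: continue masked-language-model pre-training of a general
# Japanese BERT on a plain-text file of financial documents ("finance.txt").
model_name = "cl-tohoku/bert-base-japanese"          # assumed base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "finance.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-financial-ja", num_train_epochs=1,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```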

17.
In this paper, we propose a new learning method for extracting bilingual word pairs from parallel corpora in various languages. In cross-language information retrieval, the system must deal with various languages. Therefore, automatic extraction of bilingual word pairs from parallel corpora with various languages is important. However, previous works based on statistical methods are insufficient because of the sparse data problem. Our learning method automatically acquires rules, which are effective to solve the sparse data problem, only from parallel corpora without any prior preparation of a bilingual resource (e.g., a bilingual dictionary, a machine translation system). We call this learning method Inductive Chain Learning (ICL). Moreover, the system using ICL can extract bilingual word pairs even from bilingual sentence pairs for which the grammatical structures of the source language differ from the grammatical structures of the target language because the acquired rules have the information to cope with the different word orders of source language and target language in local parts of bilingual sentence pairs. Evaluation experiments demonstrated that the recalls of systems based on several statistical approaches were improved through the use of ICL.
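ICL itself is rule-based and not reproduced here; for contrast, the sketch below shows the kind of purely statistical co-occurrence baseline (a Dice coefficient over sentence-aligned pairs) that the paper identifies as suffering from data sparseness. The token lists and threshold are illustrative only.

```python
from collections import Counter

def dice_pairs(sentence_pairs, min_count=2):
    """Score candidate bilingual word pairs from sentence-aligned corpora with
    the Dice coefficient: 2*c(s,t) / (c(s) + c(t)).  Rare pairs score poorly,
    which is exactly the data-sparseness problem the paper addresses."""
    src_count, tgt_count, pair_count = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        src_set, tgt_set = set(src), set(tgt)
        src_count.update(src_set)
        tgt_count.update(tgt_set)
        pair_count.update((s, t) for s in src_set for t in tgt_set)
    return sorted(
        ((pair, 2 * c / (src_count[pair[0]] + tgt_count[pair[1]]))
         for pair, c in pair_count.items() if c >= min_count),
        key=lambda kv: kv[1], reverse=True)

# Toy usage: two aligned Japanese-English sentence pairs (tokenised).
pairs = [(["辞書", "を", "使う"], ["use", "a", "dictionary"]),
         (["辞書", "を", "引く"], ["consult", "a", "dictionary"])]
print(dice_pairs(pairs)[:3])
```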

18.
The Yarowsky bootstrapping algorithm resolves the homograph-level word sense disambiguation (WSD) problem, which is the sense granularity level required for real natural language processing (NLP) applications. At the same time it resolves the knowledge acquisition bottleneck problem affecting most WSD algorithms and can be easily applied to foreign language corpora. However, this paper shows that the Yarowsky algorithm is significantly less accurate when applied to domain fluctuating, real corpora. This paper also introduces a new bootstrapping methodology that performs much better when applied to these corpora. The accuracy achieved in non-domain fluctuating corpora is not reached due to inherent domain fluctuation ambiguities.
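As a rough, heavily simplified sketch of the Yarowsky-style bootstrapping idea (seed collocations, an iteratively rebuilt decision list, confidence thresholding), the following is my own toy approximation rather than the paper's algorithm or evaluation setup:

```python
import math
from collections import Counter, defaultdict

def yarowsky_sketch(contexts, seeds, iterations=5, threshold=1.0):
    """Greatly simplified Yarowsky-style bootstrapping for one ambiguous word.
    `contexts` is a list of token lists; `seeds` maps a few collocates to senses.
    Each round, a decision list of collocate -> sense rules is rebuilt from the
    currently labelled contexts and reapplied to the still-unlabelled ones."""
    labels = [next((seeds[t] for t in ctx if t in seeds), None) for ctx in contexts]
    for _ in range(iterations):
        # Re-estimate rule scores: smoothed log-ratio of sense counts per collocate.
        counts = defaultdict(Counter)
        for ctx, sense in zip(contexts, labels):
            if sense is not None:
                for tok in set(ctx):
                    counts[tok][sense] += 1
        rules = {}
        for tok, c in counts.items():
            (best, n1), *rest = c.most_common(2) + [(None, 0)]
            score = math.log((n1 + 0.1) / (rest[0][1] + 0.1))
            if score >= threshold:
                rules[tok] = (best, score)
        # Label unlabelled contexts with their highest-scoring matching rule.
        for i, (ctx, sense) in enumerate(zip(contexts, labels)):
            if sense is None:
                matches = [rules[t] for t in ctx if t in rules]
                if matches:
                    labels[i] = max(matches, key=lambda r: r[1])[0]
    return labels

# Toy usage for the classic "plant" example (factory vs. living plant).
ctxs = [["plant", "manufacturing"], ["plant", "flower"],
        ["plant", "workers", "manufacturing"], ["plant", "leaf", "flower"],
        ["plant", "workers"], ["plant", "leaf"]]
print(yarowsky_sketch(ctxs, {"manufacturing": "factory", "flower": "living"}))
```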

19.
鲍玉来  耿雪来  飞龙 《现代情报》2019,39(8):132-136
[Purpose/Significance] Extracting knowledge elements from unstructured corpora is a key step in building knowledge graphs. This paper explores a knowledge relation extraction method for the tourism domain based on the convolutional neural network (CNN), a deep learning model. [Method/Process] Data crawled from specialist tourism websites were used to build a corpus; part of the corpus was manually annotated as training and test sets, and word segmentation, vectorisation, and the CNN model were implemented in Python for relation extraction experiments. [Result/Conclusion] The results show that a CNN achieves satisfactory performance for relation extraction from unstructured tourism text (Precision 0.77, Recall 0.76, F1-measure 0.76). After the extracted results are refined through manual proofreading, they can lay the foundation for constructing tourism knowledge graphs and domain ontologies.
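The paper's exact architecture and hyperparameters are not given in the abstract; the PyTorch sketch below shows one common CNN layout for sentence-level relation classification consistent with that description, with all sizes and the toy batch chosen arbitrarily.

```python
import torch
import torch.nn as nn

class RelationCNN(nn.Module):
    """Minimal text CNN in the spirit of the abstract: embed token ids, apply
    1-D convolutions of several widths, max-pool over time, classify relation."""
    def __init__(self, vocab_size, num_relations, emb_dim=100, num_filters=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, kernel_size=k) for k in (2, 3, 4)])
        self.classify = nn.Linear(num_filters * 3, num_relations)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)       # (batch, emb_dim, seq_len)
        pooled = [conv(x).relu().max(dim=2).values for conv in self.convs]
        return self.classify(torch.cat(pooled, dim=1))  # (batch, num_relations)

# Toy forward pass; a real run would use segmented, indexed tourism sentences.
model = RelationCNN(vocab_size=5000, num_relations=4)
batch = torch.randint(1, 5000, (8, 20))                 # 8 sentences of 20 tokens
print(model(batch).shape)                               # torch.Size([8, 4])
```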
