Similar Literature
Found 20 similar documents; search took 31 ms.
1.
A nearest neighbour search procedure is described for the automatic correction of misspellings. The procedure involves the replacement of a misspelt word by that word in a dictionary which best matches the misspelling, the degree of match being calculated using a similarity coefficient based on the number of trigrams common to the two words. Experiments with a collection of 1544 misspellings and a dictionary of 64,636 words suggest that the procedure results in the unique identification of the correct spelling for over 75% of the misspellings if the correct form of the word is in the dictionary, and that this figure may be increased to over 90% if near, rather than nearest, neighbours are acceptable.
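
The trigram matching described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not name the similarity coefficient, so the Dice coefficient is assumed here, and the space padding, the function names, and the tiny word list are all hypothetical.

```python
def trigrams(word: str) -> set[str]:
    """Return the set of character trigrams of a word.

    Padding with two leading spaces and one trailing space is an assumption;
    it lets word boundaries contribute trigrams too.
    """
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice(a: str, b: str) -> float:
    """Dice coefficient on shared trigrams (assumed; the abstract only says
    the coefficient is based on the number of common trigrams)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def nearest(misspelling: str, dictionary: list[str], k: int = 1) -> list[str]:
    """Return the k dictionary words most similar to the misspelling."""
    ranked = sorted(dictionary, key=lambda w: dice(misspelling, w), reverse=True)
    return ranked[:k]


# Hypothetical toy dictionary, only for illustration.
words = ["dictionary", "direction", "stationery", "fiction"]
best = nearest("dictionery", words)          # nearest neighbour only
near = nearest("dictionery", words, k=2)     # near neighbours
```

Raising `k` corresponds to accepting near rather than nearest neighbours, which is the relaxation the abstract reports as lifting the success rate.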

2.
In contrast with their monolingual counterparts, little attention has been paid to the effects that misspelled queries have on the performance of Cross-Language Information Retrieval (CLIR) systems. The present work makes a first attempt to fill this gap by extending our previous work on monolingual retrieval in order to study the impact that the progressive addition of misspellings to input queries has on the output of CLIR systems. Two approaches for dealing with this problem are analyzed in this paper. The first is the use of automatic spelling correction techniques, for which we consider two algorithms: one for the correction of isolated words and one for correction based on the linguistic context of the misspelled word. The second approach is the use of character n-grams as both index terms and translation units, seeking to take advantage of their inherent robustness and language-independence. All these approaches have been tested on a Spanish-to-English CLIR system, that is, Spanish queries on English documents. Real, user-generated spelling errors have been used under a methodology that allows us to study the effectiveness of the different approaches and their behavior when confronted with different error rates. The results show the high sensitivity of classic word-based approaches to misspelled queries, although spelling correction techniques can mitigate such negative effects. On the other hand, the use of character n-grams provides great robustness against misspellings.
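
Why character n-grams tolerate misspellings better than whole words can be seen in a small monolingual sketch. It covers only the indexing side, not the use of n-grams as translation units, and the function names, the choice of n = 4, and the sample strings are assumptions made for illustration.

```python
def char_ngrams(text: str, n: int = 4) -> list[str]:
    """Split each word into overlapping character n-grams.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    grams = []
    for word in text.lower().split():
        if len(word) <= n:
            grams.append(word)
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams


def overlap(query_terms: list[str], doc_terms: list[str]) -> float:
    """Fraction of distinct query terms that also occur in the document."""
    q = set(query_terms)
    return len(q & set(doc_terms)) / len(q) if q else 0.0


query = "informaton retrieval"            # misspelled query
doc = "information retrieval systems"     # document text

# Word-based indexing loses the misspelled term entirely.
word_score = overlap(query.split(), doc.split())
# N-gram indexing still matches most fragments of the misspelled word.
ngram_score = overlap(char_ngrams(query), char_ngrams(doc))
```

With these strings the word-level overlap drops to one half, while most 4-grams of the misspelled word still match the document.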

3.
There are many cases where it is necessary to store sets of data that are variable in length, and to search these in order to satisfy requests for subsets with a common characteristic. This article presents a file structure that holds an integrated English dictionary used to locate clusters of words for presentation to an intelligent spelling error correction system. Although the emphasis has been on misspelling, the structure presented is capable of handling any other types of lumpy data provided the characteristics used in search requests can be translated into a set of integer numbers.

4.
This paper describes an intelligent spelling error correction system for use in a word processing environment. The system employs a dictionary of 93,769 words and, provided the intended word is in the dictionary, identifies 80 to 90% of spelling and typing errors.

5.
This paper demonstrates that the vast majority of spelling errors follow specific rules which are based on phonological and sequential considerations. It introduces and describes three categories of spelling errors (consonantal, vowel and sequential) and presents the results of the analysis of 1377 spelling error forms.

6.
An automatic method for correcting spelling and typing errors from teletypewriter keyboard input is proposed. The computerized correcting process is presented as a heuristic tree search. The correct spellings are stored character-by-character in a pseudo-binary tree. The search examines a small subset of the database (selected branches of the tree) while checking for insertion, substitution, deletion and transposition errors. The correction procedure utilizes the inherent redundancy of natural language. Multiple errors can be handled if at least two correct characters appear between errors. Test results indicate that this approach has the highest error correction accuracy to date.
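
A tree search that checks the four error types named in this abstract can be sketched with a plain character trie. This is only an illustration under stated assumptions: it uses nested dictionaries rather than the paper's pseudo-binary tree, tolerates a single error per word rather than multiple separated errors, and searches exhaustively instead of using the paper's heuristics. All names are hypothetical.

```python
END = ""  # sentinel key marking the end of a stored word (never a real character)


def build_trie(words):
    """Store each word character by character in nested dictionaries."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = word
    return root


def correct(trie, typed):
    """Return stored words that match `typed` with at most one error."""
    found = set()

    def walk(node, i, used_error):
        if i == len(typed) and END in node:
            found.add(node[END])
        if i < len(typed) and typed[i] in node:
            walk(node[typed[i]], i + 1, used_error)  # characters agree
        if used_error:
            return  # the single allowed error is already spent
        for ch, child in node.items():
            if ch == END:
                continue
            walk(child, i, True)  # deletion: the typist omitted ch
            if i < len(typed) and ch != typed[i]:
                walk(child, i + 1, True)  # substitution: ch typed wrongly
        if i < len(typed):
            walk(node, i + 1, True)  # insertion: typed[i] is an extra key
        if i + 1 < len(typed):
            first, second = typed[i + 1], typed[i]
            if first in node and second in node[first]:
                walk(node[first][second], i + 2, True)  # transposition

    walk(trie, 0, False)
    return sorted(found)


# Hypothetical toy word list, only for illustration.
trie = build_trie(["receive", "recipe", "believe", "spelling"])
suggestions = correct(trie, "recieve")
```

Because only branches consistent with the typed prefix (plus one error) are followed, most of the trie is never visited, which is the same economy the abstract attributes to examining selected branches.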

7.
Under the joint auspices of the International Union of Pure and Applied Chemistry (IUPAC), China Association for Science and Technology, CAS, National Natural Science Foundation of China, Ministry of Science and Technology, Ministry of Education, and China Petroleum & Chemical Corporation, the Chinese Chemical Society (CCS) will organize the 40th IUPAC Congress from August 14 to 19, 2005, in Beijing. The meeting will be held in conjunction with the 43rd IUPAC General Assembly from A…

8.
王俊丽 《科教文汇》2012,(35):131-132
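Through quantitative statistics and error analysis of mistakes made by senior high school students learning English, the study finds that the largest category of vocabulary errors is the mastery of parts of speech, followed by confusion of synonyms and near-synonyms, with word spelling and collocation ranking third and fourth. Errors in the use of verbs and tenses and errors in complex sentences are the most frequent categories in grammar and discourse, respectively.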

9.
The University of Science and Technology of China (USTC) is located in Hefei, the capital of Anhui province, and has its own characteristics among the universities in China. Established by the Chinese Academy of Sciences (CAS), USTC is distinctively tinted with a scientific color. It is also famous for its ‘Special Class for the Gifted Young’ and is considered one of the best Chinese universities in the fields of science and technology (S&T). Recently, National Science Review interviewed Professor Xinhe Bao, the President of USTC, about the characteristics of the university and the education and research in China. Xinhe Bao is an academician of CAS and has made seminal contributions in catalysis and energy chemistry in the past decades. Before joining USTC, he had worked at Dalian Institute of Chemical Physics (DICP), CAS and Fudan University (Shanghai), and thus possesses an in-depth understanding of the education and research in China.

10.
Bibliometrics and citation analysis have become important sets of methods for library and information science, as well as exceptional sources of information and knowledge for many other areas. Their main sources are citation indices, which are bibliographic databases like Web of Science, Scopus, Google Scholar, etc. However, bibliographical databases lack perfection and standardization. There are several software tools that perform useful information management and bibliometric analysis importing data from them. A comparison has been carried out to identify which of them perform certain pre-processing tasks. Usually, they are not strong enough to detect all the duplications, mistakes, misspellings and variant names, leaving to the user the tedious and time-consuming task of correcting the data. Furthermore, some of them do not import datasets from different citation indices, but mainly from Web of Science (WoS). A new software tool, called STICCI.eu (Software Tool for Improving and Converting Citation Indices – enhancing uniformity), which is freely available online, has been created to solve these problems. STICCI.eu is able to do conversions between bibliographical citation formats (WoS, Scopus, CSV, BibTeX, RIS), correct the usual mistakes appearing in those databases, detect duplications, misspellings, etc., identify and transform the full or abbreviated titles of the journals, homogenize toponymical names of countries and relevant cities or regions and list the processed data in terms of the most cited authors, journals, references, etc.

11.
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law which estimates the number of words of a particular frequency for a given author and text. His formulation has been adopted as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text, and the inclusion of all searchable elements from the system's inverted file. This model is applied to the National Library of Medicine's MEDLINE. The model carries implications for the determination of database storage requirements, search response time, and search exhaustiveness.
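
How a Zipf-type law yields an estimate of the number of words of a given frequency can be sketched with a short derivation. It assumes the simplest form of Zipf's law, with frequency inversely proportional to rank; the symbols k, r, n and I_n are introduced here for illustration, and the formulation actually used in the paper may differ.

```latex
% Assumed form of Zipf's law: the word of rank r occurs f(r) times.
f(r) = \frac{k}{r}

% Words occurring at least n times are those with rank r \le k/n,
% so the number of words occurring exactly n times is approximately
I_n \approx \frac{k}{n} - \frac{k}{n+1} = \frac{k}{n(n+1)}

% Relative to the number of words that occur once, I_1 = k/2:
\frac{I_n}{I_1} = \frac{2}{n(n+1)}

% Summing over all n telescopes to the vocabulary size,
% so under this assumption about half of all distinct words occur once.
\sum_{n=1}^{\infty} I_n = k
```

Under this assumption, a count of the terms that occur once gives a rough estimate of how many terms occur twice, three times, and so on.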

12.
Using a composite sample of over 3600 index terms drawn from 11 different machine-readable bibliographic data bases, estimates were made of the spelling error frequencies of each of these data bases, as well as the frequency of posting to misspelled terms. The terms studied included assigned index terms as well as some terms from titles and abstracts. The frequency of index term misspellings ranged from a high of almost 23% for one data base to a low of less than 12% for another data base. The frequency of posting to misspelled terms ranged from about one posting in 8000 citations for one data base, to about one posting in 160 citations in another data base. The impact of these error rates is discussed for the tape supplier, tape user and end user. Some suggestions are given regarding search strategy.

13.
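Using quantitative statistical methods, this paper analyzes and evaluates the 1988 volume of 《情报科学文摘》 (Information Science Abstracts) in terms of coverage, reporting speed, abstracting quality, and auxiliary indexes. It also compares the statistics with those of two other retrieval periodicals, 《图书馆学文摘》 (Library Science Abstracts) and the American Information Science Abstracts (ISA), to further characterize the quality of 《情报科学文摘》, and offers several suggestions for improving it.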

14.
Co-organized by the CAS Graduate University (GUCAS) and the CAS Kunming Branch, 2007 Science100, a CAS annual forum for outstanding young scientists, was opened on 28 November, 2007 at the CAS Xishuangbanna Tropical Botanical Garden (XTBG), with an attendance of more than 90 experts and scholars from various CAS affiliates.

15.
Documents in computer-readable form can be used to provide information about other documents, i.e. those they cite. To do this efficiently requires procedures for computer recognition of citing statements. This is not easy, especially for multi-sentence citing statements. Computer recognition procedures have been developed which are accurate to the following extent: 73% of the words in statements selected by computer procedures as being citing statements are words which are correctly attributable to the corresponding documents. The retrieval effectiveness of computer-recognized citing statements was tested in the following way. First, for eight retrieval requests in inorganic chemistry, average recall by search of Chemical Abstracts Service indexing and Chemical Abstracts abstract text words was found to be 50%. Words from citing statements referring to the papers to be retrieved were then added to the index terms and abstract words as additional access points, and searching was repeated. Average recall increased to 70%. Only words from citing statements published within a year of the cited papers were used. The retrieval effect of citing statement words alone (published within a year) without index or abstract terms was the following: average recall was 40%. When just the words of the titles of the cited papers were added to those citing statement words, average recall increased to 50%.

16.
The Partner Group on Colloid and Interface Science is based on the International Joint Laboratory, established in the former Beijing Institute of Photographic Chemistry, Chinese Academy of Sciences (CAS), and was incorporated into the Center for Molecular Science of the Institute of Chemistry, CAS in 1999. The Partner Group aims at research on molecular assembly of biomimetic membranes and the synthesis of nano-structured materials by way of training younger Chinese scientists and making use of the advanced facilities in Germany. Both sides cooperate closely in sectors in which they have a common interest to push forward the development of related sciences in China. The group leader, Li Junbai, has received many awards and has obtained much financial support inside China. On the German side, the Partner Group is hosting or participating in several research and development projects commissioned by the Max Planck Society and the Deutsche Forschungsgemeinschaft (DFG). The Partner Group has thus secured goo

17.
Under the aegis of CAS and the Royal Society of London, a high-level Sino-UK seminar on solar energy convened in Beijing on 2 and 3 March.

18.
Search Features and Analysis Functions of SciFinder Scholar 2007   Total citations: 1, self-citations: 0, citations by others: 1
邵诚敏 《现代情报》2008,28(2):178-179,184
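SciFinder Scholar is the desktop search platform of the online edition of the American Chemical Abstracts, provided specifically for academic research institutions. This paper introduces the search features and analysis functions of SciFinder Scholar 2007.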

19.
With the objective of exploring approaches to integrated research into water resources and giving full play to the Steering Committee of the CAS Water Resources Research Center (WRRC), the second meeting of WRRC experts and a symposium on WRRC development were held at the CAS Institute of Geographic Science and Resources Research (IGSNRR) in Beijing on 4 and 5 December,

20.
Profs. LIN Liwu and SHA Guohe, two physical chemists with the CAS Dalian Institute of Chemical Physics (DICP), were recently elected Fellows of the Royal Society of Chemistry (RSC) in the UK.
