Similar literature
20 similar documents found.
1.
A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model for CLIR from comparable corpora, without requiring any additional linguistic resources. In the first step, translations are extracted by deriving correlations between source–target word pairs. In the second step, these correlations are used to estimate word translation probabilities. We propose a language modeling approach for the first step, where modeling based on probability distributions provides two key advantages. First, our approach is easier to tune than previous, heuristically adjusted work. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations extracted from comparable corpora. As an illustration, we integrate monolingual word co-occurrence relations into the translation extraction process, which helps extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English–Persian comparable corpus show that our method outperforms previous approaches in terms of both translation quality and CLIR performance. Moreover, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of the word translation probabilities, estimated in the second step of our approach, on CLIR performance.
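The abstract does not state how the second step turns correlations into probabilities. One minimal sketch, assuming the first step yields non-negative source-target correlation scores, keeps the strongest candidates for each source word and renormalizes them into P(t | s). The `top_k` cutoff, the function name and the toy Persian transliterations are illustrative assumptions, not the paper's estimator.

```python
from collections import defaultdict

def translation_probabilities(scores, top_k=3):
    """Turn source-target correlation scores into P(t | s).

    scores: dict mapping (source_word, target_word) -> non-negative score
    (hypothetical output of the first, correlation-extraction step).
    Only the top_k targets per source word are kept and renormalized.
    """
    by_source = defaultdict(list)
    for (s, t), score in scores.items():
        if score > 0:  # zero-score pairs carry no translation evidence
            by_source[s].append((t, score))

    probs = {}
    for s, cands in by_source.items():
        # Highest score first; ties broken alphabetically for determinism.
        best = sorted(cands, key=lambda x: (-x[1], x[0]))[:top_k]
        total = sum(score for _, score in best)
        probs[s] = {t: score / total for t, score in best}
    return probs
```

For example, scores of 6.0 for ("book", "ketab") and 2.0 for ("book", "daftar") become probabilities 0.75 and 0.25. Truncating to a few candidates is a common way to keep noisy low-score pairs from diluting the model.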

2.
Two probabilistic approaches to cross-lingual retrieval are in wide use today: those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven extremely useful in cross-language information retrieval (CLIR) in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. When queries are expanded, on the other hand, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach that fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable retrieval effectiveness and that good translation probabilities confer a small but significant advantage.
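To make the idea of folding translation probabilities into a language model concrete, here is a minimal sketch of one simple query-likelihood variant: each query word's probability is a translation-weighted mixture over document words, with the document model Jelinek-Mercer smoothed against the collection. The formula, the smoothing weight `lam`, the floor value and the toy dictionary are assumptions for illustration, not the exact model evaluated in the paper.

```python
import math
from collections import Counter

def clir_lm_score(query, trans, doc, collection, lam=0.7, floor=1e-12):
    """Log query likelihood of a target-language document for a source query.

    trans[s][t] holds P(s | t): the probability that document word t
    translates to query word s (a hypothetical probabilistic dictionary).
    P(t | D) is smoothed as lam * P_ml(t | D) + (1 - lam) * P(t | C).
    """
    doc_tf, coll_tf = Counter(doc), Counter(collection)
    doc_len, coll_len = len(doc), len(collection)
    score = 0.0
    for s in query:
        p = 0.0
        for t, p_s_given_t in trans.get(s, {}).items():
            p_doc = doc_tf[t] / doc_len if doc_len else 0.0
            p_coll = coll_tf[t] / coll_len if coll_len else 0.0
            p += p_s_given_t * (lam * p_doc + (1 - lam) * p_coll)
        # Untranslatable query words get a tiny floor instead of log(0).
        score += math.log(max(p, floor))
    return score
```

With `trans = {"house": {"casa": 0.8, "hogar": 0.2}}`, a document containing "casa" scores higher than one that does not, because the translation mass flows to the document term rather than only to the collection background.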

3.
Arabic is a morphologically rich language that presents significant challenges to many natural language processing applications, because a word often conveys complex meanings decomposable into several morphemes (i.e., prefix, stem, suffix). Segmenting words into morphemes can improve the extraction of English/Arabic translation pairs from parallel texts. This paper describes two algorithms, and their combination, for automatically extracting an English/Arabic bilingual dictionary from parallel texts found in the Internet archive, using an Arabic light stemmer as a preprocessing step. Without the light stemmer, the overall system precision and recall were 88.6% and 81.5%, respectively; after the light stemmer was applied to the Arabic documents, precision and recall increased to 91.6% and 82.6%, respectively.
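The abstract does not name its light stemmer. As a rough illustration, a light stemmer strips a small set of frequent prefixes (such as the definite article ال) and suffixes without trying to recover roots. The affix lists and minimum-length checks below are assumptions, loosely modeled on commonly used light-stemming affix lists, and are not the stemmer used in the study.

```python
# Illustrative affix lists (assumptions, not the study's stemmer).
PREFIXES = ["بال", "كال", "فال", "ال", "لل"]  # article forms, longer ones first
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

def light_stem(word):
    """Strip frequent Arabic affixes while keeping at least two letters."""
    # Leading conjunction "و" ("and"), only if three or more letters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Remove at most one article-like prefix.
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= 2:
            word = word[len(p):]
            break
    # One pass over the suffix list; several suffixes may be removed in turn.
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= 2:
            word = word[:-len(s)]
    return word
```

For instance, والكتاب ("and the book") reduces to كتاب, so both surface forms can align with the same English word. Light stemming of this kind is deliberately conservative: it can miss forms that a full morphological analyzer would catch, but it rarely merges unrelated words.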

4.
Cross-lingual semantic interoperability has drawn significant attention in recent digital library and World Wide Web research as information in languages other than English has grown exponentially. Cross-lingual information retrieval (CLIR) across European languages, such as English, Spanish, and French, has been widely explored; however, CLIR between European and Oriental languages is still at an early stage. To cross the language boundary, the corpus-based approach is a promising way to overcome the limitations of knowledge-based and controlled-vocabulary approaches, but collecting parallel corpora between a European and an Oriental language is not an easy task. Length-based and text-based approaches are the two major approaches to aligning parallel documents. In this paper, we investigate several techniques based on these approaches and compare their performance in aligning the English and Chinese titles of parallel documents available on the Web.
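To show what a length-based approach relies on, here is a simplified sketch (not one of the techniques evaluated in the paper): a monotone dynamic-programming alignment of two title lists in which pairing two titles is cheap when their length ratio matches an expected ratio, and leaving a title unaligned costs a fixed penalty. The `ratio` and `skip_cost` values are assumptions; for English–Chinese titles the ratio would have to be estimated from data.

```python
import math

def align_by_length(src, tgt, ratio=1.0, skip_cost=1.0):
    """Monotone alignment of two title lists using only title lengths.

    Pairing src[i] with tgt[j] costs |log(len(tgt[j]) / (ratio * len(src[i])))|;
    leaving a title unaligned costs skip_cost. Titles must be non-empty.
    Returns the (i, j) index pairs on the cheapest path.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every cell is final before it is expanded.
    for i in range(n + 1):
        for j in range(m + 1):
            c = cost[i][j]
            if c == inf:
                continue
            moves = []
            if i < n and j < m:
                match = abs(math.log(len(tgt[j]) / (ratio * len(src[i]))))
                moves.append((i + 1, j + 1, c + match))
            if i < n:
                moves.append((i + 1, j, c + skip_cost))  # source title unaligned
            if j < m:
                moves.append((i, j + 1, c + skip_cost))  # target title unaligned
            for ni, nj, nc in moves:
                if nc < cost[ni][nj]:
                    cost[ni][nj] = nc
                    back[ni][nj] = (i, j)
    # Walk back from (n, m); each diagonal step is a matched pair.
    pairs, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        if pi == i - 1 and pj == j - 1:
            pairs.append((pi, pj))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

A text-based approach would replace the length cost with lexical evidence, for example dictionary matches between title words. The dynamic-programming skeleton can stay the same.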

5.
This work assesses the performance of two N-gram matching techniques for Arabic root-driven string searching: contiguous N-grams and hybrid N-grams, which combine contiguous and non-contiguous N-grams. The two techniques were tested in three experiments involving different levels of textual word stemming, using a textual corpus of about 25 thousand words (about 160 KB in total) and a set of 100 query words. The hybrid approach showed a significant performance improvement over the conventional contiguous approach, especially where stemming was used. These results, together with the inconsistent findings of previous studies, raise questions about the efficiency of pure conventional N-gram matching and about how it should be used in languages other than English.
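The difference between the two techniques can be sketched with bigrams and the Dice coefficient. This is an assumed formulation rather than the study's exact one: "hybrid" here pools contiguous bigrams with one-gap bigrams, so root letters separated by an infixed vowel (ك‌ا‌ت‌ب versus the root ك‌ت‌ب) can still match.

```python
def contiguous_bigrams(word):
    """Adjacent character pairs, e.g. 'abc' -> {'ab', 'bc'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def gapped_bigrams(word):
    """Character pairs separated by exactly one character, e.g. 'abc' -> {'ac'}."""
    return {word[i] + word[i + 2] for i in range(len(word) - 2)}

def dice(a, b):
    """Dice coefficient of two gram sets."""
    return 2 * len(a & b) / (len(a) + len(b)) if (a or b) else 0.0

def contiguous_similarity(w1, w2):
    return dice(contiguous_bigrams(w1), contiguous_bigrams(w2))

def hybrid_similarity(w1, w2):
    # Pool contiguous and one-gap bigrams into a single set per word.
    return dice(contiguous_bigrams(w1) | gapped_bigrams(w1),
                contiguous_bigrams(w2) | gapped_bigrams(w2))
```

For the root كتب and the word كاتب ("writer"), the contiguous similarity is 0.4 (only تب is shared). The hybrid similarity is 0.5, because the gapped bigram كت of كاتب also matches the root's first contiguous bigram.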

6.
Knowledge acquisition and bilingual terminology extraction from multilingual corpora are challenging tasks for cross-language information retrieval. In this study, we propose a novel method for mining high-quality translation knowledge from our Persian–English comparable corpus, the University of Tehran Persian–English Comparable Corpus (UTPECC). We extract translation knowledge based on a Term Association Network (TAN) constructed from term co-occurrences within the same language as well as term associations across languages. We further propose a post-processing step that checks term translation validity by detecting mistranslated terms as outliers. Evaluation results on two different data sets show that translating queries using UTPECC and the proposed methods significantly outperforms simple dictionary-based methods. Moreover, the experimental results show that our methods are especially effective at translating out-of-vocabulary terms and at expanding query words with their associated terms.

7.

Introduction:

Intensive exercise may significantly damage muscles, which is reflected in pain, fatigue, and increased blood concentrations of muscle proteins such as creatine kinase (CK), lactate dehydrogenase (LD), and myoglobin (MB), as well as in other biochemical parameters, including serum urea concentration (SU). Biochemical markers vary with age, sex, race, muscle mass, physical activity, and climate conditions. They also help determine the limit between adaptation to a given training process, which results in supercompensation, and overtraining (OT), which occurs when the load exceeds the physiological capacity for regeneration. For diagnosing and explaining the symptoms of overtraining, markers are sought that can be applied reliably in practice, with sufficient sensitivity and simple interpretation. It is critical to take into account differences among individuals and groups that could hamper interpretation.

The most frequently used markers:

The biomarkers most frequently used to obtain information on physical activity and exercise load are CK, SU, and LD. Serum creatine kinase has been measured and interpreted for years in various scientific and professional investigations and is one of the basic parameters for determining the degree of muscle damage. CK reaches its maximal concentration on the fourth day of exercise, depending on the type of exercise, the nature of the stress it triggers, and individual characteristics. Serum urea is a marker of nitrogen compound metabolism, and urea is the principal nitrogenous compound in mammalian urine. It is thus possible to draw a parallel between increases in serum urea concentration and increased protein degradation. Serum amino acid levels fall significantly after sixty to seventy minutes of exercise, with increases in urea and free tyrosine, and these changes correlate highly with the duration and intensity of exercise. Changes in LD are an important index for well-trained athletes and their capability to withstand the pace and force of strain in the training process. LD level is a good index of exercise intensity and a marker of metabolic exchange in tissues; its serum concentration depends on cell damage.

Conclusion:

There is no single parameter that provides enough information to estimate the quality of exercise and the amount of load, or to identify overtraining. Delayed measurement of biomarkers is far from ideal, but it is evident that the amount of stress/load in training is the most important factor in the development of overtraining. Daily body weight, diet, biochemical index values, and water intake should be known and standardized before measurements. For most parameters, basal levels need to be determined in specific populations for more accurate interpretation and evaluation of results. The sampling process itself should be strictly standardized, with measurements repeated at least every third day. The dependence of these parameters (SU, CK, LD) on exercise intensity varies among individuals, and without such additional measurements and subpopulation evaluations it is difficult to draw firm conclusions, including conclusions about causal relationships between training load and the dynamics of biochemical parameters.

Biochem Med (Zagreb) 2013 Jun; 23(2): A57–A58. Published online 2013 Jun 15. doi: 10.11613/BM.2013.027

Common sports injuries

Miljenko Franić
Dubrava University Hospital, Zagreb
Corresponding author: mfranic@kbd.hr
©Copyright by Croatian Society of Medical Biochemistry and Laboratory Medicine. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Sports injuries are injuries that occur during athletic activities and can be broadly classified as either traumatic or overuse injuries. Traumatic injuries result from the dynamic, high-collision nature of some sports. Overuse injuries cause wear and tear on the body, particularly on joints subjected to repeated activity.

At every age, competitive and recreational athletes sustain a wide variety of soft tissue, bone, ligament, tendon, and nerve injuries, caused by direct trauma or repetitive stress. Different sports are associated with different patterns and types of injuries, while age, gender, and type of activity influence injury prevalence. Sports trauma commonly affects the joints of the extremities or the spine.

The hip, knee, and ankle are at risk of developing osteoarthritis (OA) after injury or in the presence of malalignment, especially in association with high-impact sport. Spine pathologies are more commonly associated with certain sports. Upper extremity syndromes caused by a single stress or by repetitive micro-trauma occur in a variety of sports.

Randomized controlled trials expose some subjects, but not others, to an intervention. This design is more clinical in nature and not typically appropriate for studying injury patterns. Cohort studies monitor both injured and non-injured athletes, thereby providing results on the effects of participation. Case-control studies monitor only those athletes who suffered an injury.
The ideal study would use a cohort design conducted across several teams, with longitudinal prospective data collection and, where possible, a single recorder, as well as a uniform injury definition across sports so that comparisons between studies can be made accurately.

Physical injury is an inherent risk of sports participation and, to a certain extent, must be considered an inevitable cost of athletic training and competition. Injury may lead to incomplete recovery and residual symptoms, may cause athletes to drop out of sport, and can cause joint degeneration in the long term.

Advances in arthroscopic techniques allow operative management of most intra-articular post-traumatic pathologies in the joints of the lower and upper limbs, but long-term outcomes are not yet available. It is important to balance the negative effects of sports injuries against the many benefits that a serious commitment to sport brings.

Biochem Med (Zagreb) 2013 Jun; 23(2): A58–A59. Published online 2013 Jun 15. doi: 10.11613/BM.2013.027

Determination of sample size and number of study groups in sport studies

Mladen Petrovečki
Department of Laboratory Diagnosis, Dubrava University Hospital, Zagreb, Croatia, and Department of Medical Informatics, Rijeka University School of Medicine, Rijeka
Corresponding author: mladenp@medri.hr
Copyright by Croatian Society of Medical Biochemistry and Laboratory Medicine. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

8.
Culture is widely acknowledged to be a critical success factor in knowledge management (KM). This paper presents the case of KM implementation at MKS, an IT consulting firm based in India. Although the KM initiative at MKS had many of the hallmarks associated with successful KM projects, the initiative failed to get off the ground due to the absence of a ‘knowledge culture’ within the organisation. Subsequent interviews with MKS staff uncovered a range of cultural themes that appeared to impede the institutionalisation of KM at MKS. These cultural themes included:
  • internal competitiveness among MKS staff resulting in ‘knowledge hoarding’,
  • a lack of personal reward and incentive to engage in knowledge sharing,
  • concerns over job security and the ‘devaluation’ of employees,
  • stigma associated with the reliance on someone else's ideas,
  • preference for a face-to-face mode of knowledge sharing over a tool-supported approach and
  • doubts over the quality of knowledge shared by more junior staff.

9.
Many operational IR indexes are non-normalized, i.e., no lemmatization, stemming, or similar techniques have been employed in indexing. This poses a challenge for dictionary-based cross-language information retrieval (CLIR), because dictionary translations are mostly lemmas. In this study, we address the challenge of dictionary-based CLIR in a non-normalized index. We test two alternative approaches: FCG (Frequent Case Generation) and s-gramming. The idea of FCG is to automatically generate the most frequent inflected forms of a given lemma. FCG has been tested in monolingual retrieval and shown to be a good method for inflected retrieval, especially for highly inflected languages. S-gramming is an approximate string matching technique (an extension of n-gramming). The language pairs in our tests were English–Finnish, English–Swedish, Swedish–Finnish, and Finnish–Swedish. Both approaches performed quite well, but the results varied by language pair. S-gramming and FCG performed about equally in all language pairs except Finnish–Swedish, where s-gramming outperformed FCG.
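The FCG idea of generating a lemma's most frequent inflected forms can be sketched for Finnish nouns. The ranking of case endings below (nominative, genitive -n, partitive -a/-ä, inessive -ssa/-ssä) is a hypothetical frequency order, and the rules are deliberately naive. They apply vowel harmony but ignore consonant gradation and stem changes, which real FCG rules would have to handle.

```python
BACK_VOWELS = set("aou")

# Hypothetical frequency-ranked endings as (back-vowel form, front-vowel form):
# nominative (no ending), genitive -n, partitive -a/-ä, inessive -ssa/-ssä.
CASE_ENDINGS = [("", ""), ("n", "n"), ("a", "ä"), ("ssa", "ssä")]

def frequent_case_forms(lemma, k=4):
    """Generate the k most frequent inflected forms of a Finnish noun lemma.

    Naive sketch: picks back- or front-vowel endings by vowel harmony, but
    ignores consonant gradation and stem changes (e.g. 'käsi' -> 'käden').
    """
    back = any(ch in BACK_VOWELS for ch in lemma.lower())
    return [lemma + (b if back else f) for b, f in CASE_ENDINGS[:k]]
```

For a dictionary translation "talo" (house), the generated forms talo, talon, taloa and talossa would be combined with OR (or a synonym operator) so that the query can match inflected word forms in a non-normalized index.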

10.
11.
The paper reports on experiments in transitive translation, a branch of cross-language information retrieval (CLIR). By transitive translation we mean translating search queries into the language of the document collection through an intermediate (or pivot) language. In our experiments, queries constructed from CLEF 2000 and 2001 Swedish, Finnish, and German topics were translated into English through Finnish and Swedish by an automated translation process using morphological analyzers, stopword lists, electronic dictionaries, n-gramming of untranslatable words, and structured and unstructured queries. The results of the transitive runs were compared with those of the bilingual runs, i.e., runs translating the same queries directly into English. The transitive runs using structured target queries performed well: the differences between the approaches ranged from −6.6 to +2.9 percentage points (−25.5% to +7.8% in relative terms). Thus transitive translation rivals direct translation and considerably simplifies global CLIR efforts.
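The core of transitive translation, chaining two bilingual dictionaries through the pivot language and grouping the results into a structured query, can be sketched as follows. The dictionaries and function names are hypothetical. The output uses INQUERY-style `#sum` and `#syn` operators as an illustration of structured queries. Untranslatable words are passed through unchanged, where the paper instead applied n-gram matching.

```python
def transitive_translate(query, src_to_pivot, pivot_to_tgt):
    """Translate each source word into the target language through a pivot."""
    result = {}
    for word in query:
        targets = []
        for pivot_word in src_to_pivot.get(word, []):
            for t in pivot_to_tgt.get(pivot_word, []):
                if t not in targets:  # keep first-seen order, drop duplicates
                    targets.append(t)
        result[word] = targets  # empty list means untranslatable
    return result

def structured_query(translations):
    """Group each source word's translations into one synonym set."""
    parts = []
    for word, targets in translations.items():
        # Untranslatable words are kept as-is (the paper n-grammed them).
        parts.append("#syn(" + " ".join(targets) + ")" if targets else word)
    return "#sum(" + " ".join(parts) + ")"
```

For example, Swedish "hus" translated through Finnish "talo" into English yields `#sum(#syn(house building) ...)`. Grouping the translations of each source word in one synonym set keeps a word with many translations from dominating the query.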

12.
This paper presents QACID, an ontology-based Question Answering system applied to the CInema Domain. The system allows users to retrieve information from formal ontologies using natural-language queries as input. The original feature of QACID is the strategy used to bridge the gap between users' expressiveness and formal knowledge representation. This approach is based on collections of user queries and can easily be adapted to multilingual settings, to other domains, and to changes in user information requirements. Together, these capabilities make it possible to develop Question Answering applications for real users. The system has been developed and tested for Spanish, using an ontology that models the cinema domain. The performance achieved allows the system to be used in real environments.

13.
14.

This article develops and tests a model of the relationship between firm globalization, scope of e-commerce use, and firm performance, using data from a large-scale cross-country survey of firms in three industries. We find that globalization leads to both greater scope of e-commerce use and improved performance, measured as efficiency, coordination, and market impacts. Scope of e-commerce use also leads to greater firm performance of all three types. Globalization has differential effects on B2B and B2C e-commerce, however: highly global firms are more likely to engage in B2B but less likely to engage in B2C. Our findings support Porter's (1986) thesis that upstream business activities (namely, B2B) are more global, while downstream business activities (B2C) are more local or multidomestic.
Reference: Porter, M. E., ed. 1986. Competition in Global Industries. Boston: Harvard Business School Press.

15.
Text classification, or categorization, is the process of automatically tagging a textual document with the most relevant labels or categories. When the number of labels is restricted to one, the task becomes single-label text categorization; the multi-label version is more challenging. For Arabic, both tasks (especially the latter) become even more challenging in the absence of large, free, rich, and rational Arabic datasets. We therefore introduce new rich and unbiased datasets for both single-label (SANAD) and multi-label (NADiA) Arabic text categorization. Both corpora are freely available to the Arabic computational linguistics research community. Further, we present an extensive comparison of several deep learning (DL) models for Arabic text categorization to evaluate their effectiveness on SANAD and NADiA. Unlike existing work, our approach does not require a pre-processing phase and is fully based on deep learning models. We also studied the impact of word2vec embedding models on classification performance. Our experimental results showed solid performance of all models on the SANAD corpus, with a minimum accuracy of 91.18%, achieved by convolutional-GRU, and a top accuracy of 96.94%, achieved by attention-GRU. On NADiA, attention-GRU achieved the highest overall accuracy, 88.68%, for a maximum subset of 10 categories on the “Masrawy” dataset.

16.
L-carnitine is popular as a potential ergogenic aid because of its role in converting fat into energy. The present study investigated the effect of short-term L-carnitine supplementation on metabolic markers and physical efficiency tests under short-term calorie restriction. Male albino rats were divided into four groups (n = 12 each): control; calorie restricted (CR, 25% of basal food intake for 5 days); L-carnitine supplemented (CAR, given orally at 100 mg/kg for 5 days); and CR with L-carnitine supplementation (CR + CAR). Food intake and body weight were measured, along with biochemical variables such as blood glucose, tissue glycogen, plasma and muscle protein, and the enzymatic activities of CPT-1 (carnitine palmitoyl transferase-1) and AMP kinase. Short-term L-carnitine supplementation caused marked increases in muscle glycogen, plasma protein, CPT-1 activity, and swim time (P < 0.05). Beyond the substantive effects of CR alone, L-carnitine under CR significantly affected muscle glycogen, plasma protein, CPT-1 activity, and AMP kinase (P < 0.05). Short-term CR with L-carnitine also resulted in longer swim times than in the control, CR, and L-carnitine groups (P < 0.05). The present study was an attempt to develop an approach, using L-carnitine, for better adherence to dietary restriction regimens.

17.
18.
Ocimum sanctum Linn. (also known as Tulsi) is a sacred Indian plant whose beneficial role in obesity and diabetes is traditionally described. This randomized, parallel-group, open-label pilot study investigated the effect of O. sanctum on metabolic and biochemical parameters in thirty overweight/obese subjects, divided into two groups, A and B. Group A (n = 16) received one 250 mg capsule of Tulsi (O. sanctum) extract twice daily on an empty stomach for 8 weeks, and group B (n = 14) received no intervention. After 8 weeks, statistically significant improvements were observed in the O. sanctum intervention group in serum triglycerides (p = 0.019), low-density lipoprotein (p = 0.001), high-density lipoprotein (p = 0.001), very low-density lipoprotein (p = 0.019), body mass index (BMI; p = 0.005), plasma insulin (p = 0.021), and insulin resistance (p = 0.049). The improvement in HDL-C in the intervention group compared with the control group was also statistically significant (p = 0.037). The liver enzymes SGOT and SGPT did not change significantly in either the intervention arm (p = 0.141 and p = 0.074, respectively) or the control arm (p = 0.102 and p = 0.055, respectively). These observations clearly indicate the beneficial effects of O. sanctum on various biochemical parameters in young overweight/obese subjects.

19.
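Using a large collection of Chinese–English parallel patent texts as a parallel corpus, we propose a method for automatically extracting a Chinese–English dictionary. First, a seed bilingual dictionary is built from the external semantic resource Wikipedia; then candidate Chinese–English word pairs are obtained by computing pointwise mutual information, and a threshold is applied to select the word pairs used to supplement the seed dictionary. Experimental results show that grouping words into phrases in the English documents helps improve the overall performance of automatic extraction; on the other hand, although sentence alignment can improve the precision of the extracted results, it negatively affects recall. The patent bilingual dictionary built with this method can play a positive role in constructing multilingual versions of technical knowledge graphs.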
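The pointwise-mutual-information step for proposing candidate word pairs can be sketched over aligned text units (for example, parallel patent documents or sentences) as PMI(c, e) = log(N · n(c, e) / (n(c) · n(e))). Here n counts the units containing a word or a word pair, and N is the number of units. The `threshold` and `min_count` values, the toy data and the function name are assumptions. The seed dictionary and phrase grouping described above are not modeled.

```python
import math
from collections import Counter
from itertools import product

def pmi_candidates(pairs, threshold=1.0, min_count=2):
    """Score Chinese-English word pairs by PMI over aligned text units.

    pairs: list of (chinese_tokens, english_tokens), one entry per aligned unit.
    Returns {(zh_word, en_word): pmi} for pairs seen in at least min_count
    units whose PMI is at or above threshold.
    """
    n = len(pairs)
    zh_df, en_df, joint_df = Counter(), Counter(), Counter()
    for zh_tokens, en_tokens in pairs:
        zh_set, en_set = set(zh_tokens), set(en_tokens)
        zh_df.update(zh_set)
        en_df.update(en_set)
        joint_df.update(product(zh_set, en_set))  # co-occurrence within a unit
    candidates = {}
    for (c, e), n_ce in joint_df.items():
        if n_ce < min_count:  # rare pairs give unreliable PMI estimates
            continue
        pmi = math.log(n_ce * n / (zh_df[c] * en_df[e]))
        if pmi >= threshold:
            candidates[(c, e)] = pmi
    return candidates
```

The `min_count` filter matters because PMI is known to overrate pairs seen only once or twice. Raising the threshold trades recall for precision, in the same way as the sentence-alignment trade-off reported above.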

20.
This paper argues against the moral Turing test (MTT) as a framework for evaluating the moral performance of autonomous systems. Though the term has been carefully introduced, considered, and cautioned about in previous discussions (Allen et al. in J Exp Theor Artif Intell 12(3):251–261, 2000; Allen and Wallach 2009), it has lingered on as a touchstone for developing computational approaches to moral reasoning (Gerdes and Øhrstrøm in J Inf Commun Ethics Soc 13(2):98–109, 2015). While these efforts have not led to the detailed development of an MTT, they nonetheless retain the idea as a way to discuss what kinds of action and reasoning should be demanded of autonomous systems. We explore the flawed basis of an MTT in imitation, even one based on scenarios of morally accountable actions. MTT-based evaluations are vulnerable to deception, inadequate reasoning, and moral performance inferior to a system's capabilities. We propose that verification, which demands the design of transparent, accountable processes of reasoning that reliably prefigure the performance of autonomous systems, serves as a superior framework for designer and system alike. As autonomous social robots in particular take on an increasing range of critical roles within society, we conclude that verification offers an essential, albeit challenging, moral measure of their design and performance.
