Aspects of Swedish morphology and semantics from the perspective of mono- and cross-language information retrieval
Institution:1. Institute of Environmental Research (INFU), Department of Chemistry and Chemical Biology, Chair of Environmental Chemistry and Analytical Chemistry, TU Dortmund, Otto-Hahn-Str. 6, 44221 Dortmund, Germany;2. Department of Chemistry, University of Botswana, Private Bag 0022, Gaborone, Botswana;3. Georg-August University Göttingen, Institute for Organic and Biomolecular Chemistry, Tammannstraße 2, D-37077 Göttingen, Germany;1. Institute of Mathematics “Simion Stoilow” of the Romanian Academy, P.O. Box 1-764, RO-014700 Bucharest, Romania;2. Department of Mathematics, Quaid-i-Azam University, Islamabad 45320, Pakistan;1. Department of Family Medicine, Oregon Health and Science University, Portland, Oregon;2. Department of Public Health and Preventive Medicine, Oregon Health and Science University, Portland, Oregon;3. Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon;4. Department of Medicine, Oregon Health and Science University, Portland, Oregon;5. Department of Internal Medicine, University of Washington School of Medicine, Seattle, Washington;6. Department of Medicine, University of Washington School of Medicine, Seattle, Washington;7. Pathology Associates, Clovis, California;8. Dermatopathology Northwest, Bellevue, Washington;9. Department of Pathology, University of California, Los Angeles, California;10. Department of Pathology, Institut Curie, Paris, France;11. Pathology, University of Pennsylvania, Philadelphia, Pennsylvania;12. Family Medicine, University of Vermont, Burlington, Vermont;13. Epidemiology and of Pediatrics, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire;14. Dermatology and Epidemiology, Center for Dermatoepidemiology, Department of Veterans Affairs Medical Center, Providence, Rhode Island;15. Department of Dermatology, Rhode Island Hospital, Providence, Rhode Island;p. Departments of Dermatology and Epidemiology, Brown University, Providence, Rhode Island;q. Cancer Prevention and Screening, Providence Cancer Center, Providence Health and Services Oregon, Portland, Oregon
Abstract: This paper analyzes features of the Swedish language from the viewpoint of mono- and cross-language information retrieval (CLIR). The study was motivated by the fact that Swedish has been little studied from the IR perspective. The paper shows that Swedish has unique features, in particular gender features, the use of fogemorphemes (linking morphemes) in the formation of compound words, and a high frequency of homographic words. Correct word normalization and compound splitting are essential, especially in dictionary-based CLIR. The study also shows, however, that publicly available morphological analysis tools used for normalization and compound splitting have pitfalls that might decrease the effectiveness of IR and CLIR. A comparative study was performed to test the degree of lexical ambiguity in Swedish, Finnish and English. The results suggest that part-of-speech tagging might be useful in Swedish IR because of the high frequency of homographic words.
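To make the compound-splitting problem concrete, the following is a minimal sketch and not the method or tools evaluated in the paper. It assumes a tiny hand-made lexicon, treats only the linking element -s- as a fogemorpheme, and splits a word into at most two components; the names `LINKERS` and `split_compound` are hypothetical.

```python
# Illustrative sketch only: a toy two-part compound splitter.
# Assumption: the only joins considered are a bare join and the linking -s-.
LINKERS = ("", "s")


def split_compound(word, lexicon, min_len=3):
    """Return [head, tail] if word = head + linker + tail with both parts
    in the lexicon; otherwise return [word] unchanged."""
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            tail = rest[len(linker):]
            if len(tail) >= min_len and tail in lexicon:
                return [head, tail]
    return [word]


# Hypothetical toy lexicon of base forms.
lexicon = {"rätt", "fall", "fot", "boll"}

rattsfall = split_compound("rättsfall", lexicon)  # rätt + s + fall
fotboll = split_compound("fotboll", lexicon)      # fot + boll, no linker
unknown = split_compound("hus", lexicon)          # too short to split
```

A splitter like this returns the first match it finds, so a word with several valid segmentations is resolved arbitrarily. That is one way a normalization or splitting step could feed wrong components into a dictionary-based CLIR lookup.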
This article is indexed in ScienceDirect and other databases.