Similar Literature
20 similar documents found.
1.
In essay scoring, raters are prone to "central tendency scoring," that is, awarding mid-range scores. Central tendency scoring is a systematic error that arises during the rating process and can, to some extent, degrade the quality of essay scores. A study based on the many-facet Rasch model, using operational data from the Level 3 MHK (the Chinese Proficiency Test for Ethnic Minorities of China) with FACETS as the analysis software, shows that… [abstract truncated]

2.
Using the many-facet Rasch model with 913 senior high school students, this study validated an assessment of critical-thinking ability in foreign-language writing across four facets: examinees, raters, tasks, and rating criteria. Results show that (1) an assessment framework comprising posing questions, stating viewpoints, providing evidence, reasoning and argumentation, drawing conclusions, and interpretation and evaluation meets the measurement requirements of the many-facet Rasch model and can represent and reasonably distinguish examinees' critical-thinking ability in foreign-language writing; (2) reasoning and argumentation and providing evidence… [abstract truncated]

3.
Using the many-facet Rasch model, this study compared rater effects under a large and a small rating scale. Results show that, compared with the small scale, raters working with the large scale failed to use the full range of scores and tended toward central scores; inter-rater consistency was also poorer under the large scale. Accordingly, the study recommends improving the design of writing rating scales used in China's examinations and reporting writing scores separately.

4.
This study collected rater scores for 130 essays from the Test for English Majors, Band 8 (TEM-8) and conducted a multidimensional validation of the TEM-8 essay rating criteria, using many-facet Rasch analysis together with think-aloud protocols. Results indicate that the rating criteria broadly reflect the theoretical construct of writing and that the scale categories are reasonably divided; most raters were able to apply the criteria effectively, with high reliability.

5.
To overcome the item dependence and sample dependence of classical test theory, this study applied the Rasch model to the quality analysis of a science-literacy assessment for sixth-grade primary students, illustrating its use for overall quality checks, unidimensionality testing, the Wright map, item-level quality analysis, and bubble charts. The analysis indicates that the items have high reliability and validity and reasonable discrimination, and that the great majority of items met measurement expectations. Applying the Rasch model in assessment design provides a useful source of measurement-quality evidence.

6.
The many-facet Rasch model (MFRM) extends the one-parameter Rasch model and is well suited to detecting consistency differences across measurement facets. This article focuses on FACETS, the analysis software for many-facet measurement, and explains step by step how to generate control (specification) files from TXT and Excel data files. Through a detailed, illustrated walkthrough, it aims to give learners a clearer understanding of how Facets control files are generated and to lay a foundation for subsequent Rasch-based data analysis.
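The control-file workflow described in this abstract can be sketched in code. The following is a minimal, illustrative sketch (not the article's actual procedure): it turns rows of (examinee, rater, task, score) into a three-facet Facets specification file. The facet labels, title, and the 0-6 rating range (`R6`) are assumptions for illustration.

```python
# Minimal sketch: build a 3-facet Facets control (specification) file
# from rows of (examinee, rater, task, score). Labels, title, and the
# 0-6 rating range are illustrative assumptions.

def build_facets_control(rows, title="MFRM essay ratings", max_score=6):
    examinees = sorted({r[0] for r in rows})
    raters = sorted({r[1] for r in rows})
    tasks = sorted({r[2] for r in rows})

    def label_block(num, name, elements):
        # One labels block per facet, terminated by "*"
        return [f"{num}, {name}"] + [str(e) for e in elements] + ["*"]

    lines = [
        f"Title = {title}",
        "Facets = 3",                    # examinee, rater, task
        f"Models = ?,?,?,R{max_score}",  # common rating scale 0..max_score
        "Labels =",
    ]
    lines += label_block(1, "Examinee", examinees)
    lines += label_block(2, "Rater", raters)
    lines += label_block(3, "Task", tasks)
    lines.append("Data =")
    lines += [f"{e},{r},{t},{s}" for e, r, t, s in rows]
    return "\n".join(lines)

rows = [(1, 1, 1, 4), (1, 2, 1, 5), (2, 1, 1, 3)]
print(build_facets_control(rows))
```

In practice one would read the rows from the TXT or Excel export described in the article before writing the resulting string to a `.txt` control file.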

7.
This study used the many-facet Rasch model (MFRM) to assess undergraduates' ability in a "multivariate statistical analysis" course and to analyze item difficulty and rater severity. Results show that many-facet Rasch analysis handles the assessment of subject ability in open-ended examinations well, and its results agreed with student feedback.

8.
Zhang Jie. 《考试研究》 (Examinations Research), 2008(4): 65-78
As a relatively authentic and direct form of testing, oral examinations are increasingly used in language-testing practice. However, factors introduced during testing, such as subjective judgment and the design and use of rating criteria and scales, expose scores to influences beyond examinee ability. Based on data from the 2007 PETS Level 3 oral test at one test center, this study used the Many-facet Rasch Model (MFRM) for post hoc quality control of the ratings. MFRM integrates the many facets of a performance test into a single mathematical model; it not only places all facets on a common scale but can also analyze each facet, and even each individual element, separately, pinpointing potential "problem raters" and possibly misjudged examinees. It is thus an effective quality-control tool for subjective scoring.

9.
Applying the Rasch Model to Quality Analysis of the Postgraduate Entrance Examination
The Rasch model was applied to the psychology subject test of the 2010 national postgraduate entrance examination. Results indicate that, overall, the test is of high quality: its items cover examinees across all ability levels and discriminate ability well, meeting the intended selection purpose. The Rasch analysis also identified a few items that did not meet their intended measurement targets and could be revised in future work. Rasch-based item analysis provides richer measurement information about examinee ability and item quality.

10.
Rasch-based test-paper quality analysis offers the following methods: the Wright Map, which gives readers an overall picture of the paper; Multidimensionality Investigations, which examine whether the paper measures a single latent trait of the examinees (here, reading ability); item fit and error statistics (ITEM: fit order); the Bubble Diagram; and others. Taking the pretest paper of a reading-literacy assessment for fifth- and sixth-grade students in the Guangxi Zhuang Autonomous Region as an example, the article presents the Rasch measurement process. The analysis shows the paper to be of high quality overall: its items cover examinees at all ability levels, difficulties are reasonably set, and the great majority of items achieved the intended measurement effect. However, because measurement goals differ, the choice of Rasch-model functions and indices and the interpretation of results can vary considerably; researchers need to choose according to their measurement goals and handle each situation flexibly.

11.
Applied Measurement in Education, 2013, 26(3): 171-191
The purpose of this study is to describe a Many-Faceted Rasch (FACETS) model for the measurement of writing ability. The FACETS model is a multivariate extension of Rasch measurement models that can be used to provide a framework for calibrating both raters and writing tasks within the context of writing assessment. The use of the FACETS model for solving measurement problems encountered in the large-scale assessment of writing ability is presented here. A random sample of 1,000 students from a statewide assessment of writing ability is used to illustrate the FACETS model. The data suggest that there are significant differences in rater severity, even after extensive training. Small, but statistically significant, differences in writing-task difficulty were also found. The FACETS model offers a promising approach for addressing measurement problems encountered in the large-scale assessment of writing ability through written compositions.
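The FACETS model described above combines person ability, rater severity, and task difficulty additively on the logit scale. As a hedged illustration (the parameter values and thresholds below are invented, not estimates from this study), the rating-scale form of the many-facet Rasch model gives the probability of each score category as follows:

```python
import math

def mfrm_category_probs(theta, rater_severity, task_difficulty, thresholds):
    """Category probabilities under the rating-scale MFRM:
    log(P_k / P_{k-1}) = theta - severity - difficulty - tau_k."""
    logit = theta - rater_severity - task_difficulty
    cumulative = [0.0]  # category 0 is the reference
    for tau in thresholds:
        cumulative.append(cumulative[-1] + logit - tau)
    m = max(cumulative)
    exps = [math.exp(c - m) for c in cumulative]  # stabilized softmax
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative parameters: a moderately able writer facing a harsh rater.
probs = mfrm_category_probs(theta=0.5, rater_severity=0.8,
                            task_difficulty=-0.2,
                            thresholds=[-1.0, 0.0, 1.0])
```

Raising `theta` shifts probability mass toward higher score categories, while raising `rater_severity` shifts it down, which is how the model separates examinee ability from rater harshness.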

12.
Evaluating Rater Accuracy in Performance Assessments
A new method for evaluating rater accuracy within the context of performance assessments is described. Accuracy is defined as the match between ratings obtained from operational raters and those obtained from an expert panel on a set of benchmark, exemplar, or anchor performances. An extended Rasch measurement model called the FACETS model is presented for examining rater accuracy. The FACETS model is illustrated with 373 benchmark papers rated by 20 operational raters and an expert panel. The data are from the 1993 field test of the High School Graduation Writing Test in Georgia. The data suggest that there are statistically significant differences in rater accuracy; the data also suggest that it is easier to be accurate on some benchmark papers than on others. A small example is presented to illustrate how the accuracy ordering of raters may not be invariant over different subsets of benchmarks used to evaluate accuracy.

13.
This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.
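The rater errors catalogued in this abstract (severity, central tendency, restriction of range) can also be screened with simple descriptive indices before fitting a FACETS model. The sketch below is an illustrative heuristic, not the FACETS-based criteria from the study; the fully crossed ratings layout and the "middle third of the scale" definition of central tendency are assumptions.

```python
from statistics import mean, pstdev

def rater_screen(ratings, scale_min, scale_max):
    """ratings[r][p]: score given by rater r to paper p (fully crossed).
    Returns, per rater: severity (mean shortfall from the paper means,
    positive = harsh), score SD (low SD suggests restriction of range),
    and the share of ratings in the middle third of the scale
    (a high share suggests central tendency)."""
    paper_means = [mean(col) for col in zip(*ratings)]
    width = scale_max - scale_min
    mid_lo, mid_hi = scale_min + width / 3, scale_max - width / 3
    report = []
    for row in ratings:
        severity = mean(m - x for x, m in zip(row, paper_means))
        spread = pstdev(row)
        central = sum(mid_lo <= x <= mid_hi for x in row) / len(row)
        report.append({"severity": severity, "sd": spread,
                       "central_share": central})
    return report

# Illustrative data on a 1-6 scale: rater 1 is harsh, rater 2 never
# leaves the middle of the scale.
demo = rater_screen([[4, 5, 3, 4], [2, 3, 1, 2], [3, 3, 3, 3]], 1, 6)
```

Such indices only flag candidates for closer inspection; the abstract's point stands that model-adjusted estimates can look quite different from raw-score summaries.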

14.
15.
A growing body of literature in second-language writing suggests that the writing ability of international second language (L2) learners, who attend post-secondary education abroad after having completed high school in their home countries, and the so-called Generation 1.5 population, that is, L2 learners who enter post-secondary education after attending high school in the new country, differs. The present study provides much-needed empirical evidence concerning potential differences in the writing ability of these two groups. Many-facet Rasch measurement procedures were used to analyze learners’ writing scores in five components, based on a theoretical model of writing ability: grammatical, cohesive, rhetorical, sociopragmatic, and content control. Results revealed that the international learners performed better overall than the Generation 1.5 learners and that the two groups had opposing strengths and weaknesses in grammatical and sociopragmatic control. Language program administrators and practitioners can use these results when designing curricula addressing the needs of diverse groups of L2 learners.

16.
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.

17.
The teaching and assessment of essay writing at primary schools throughout Vietnam is regulated by the Ministry of Education and Training. The analytical error-recognition method of assessment, however, does not facilitate direct interpretation of students’ writing competence. In this study, which involved samples of Grade 5 students in five provinces in Vietnam, a combination of traditional and partial credit scoring rubrics was developed to enable data analysis using the Rasch model. Based on such analysis, a continuum of writing ability at Grade 5 level was identified and a mastery level defined in terms of writing skills. The study has implications for possible changes in future assessment and marking schemes.

18.
Drawing from multiple theoretical frameworks representing cognitive and educational psychology, we present a writing task and scoring system for measurement of students’ informative writing. Participants in this study were 72 fifth- and sixth-grade students who wrote compositions describing real-world problems and how mathematics, science, and social studies information could be used to solve those problems. Of the 72 students, 69 were able to craft a cohesive response that not only demonstrated planning in writing structure but also elaboration of relevant knowledge in one or more domains. Many-facet Rasch Modeling (MFRM) techniques were used to examine the reliability and validity of scores for the writing rating scale. Additionally, comparison of fifth- and sixth-grade responses supported the validity of scores, as did the results of a correlational analysis with scores from an overall interest measure. Recommendations for improving writing scoring systems based on the findings of this investigation are provided.

19.
This study adapts Levels 1 and 2 of Kirkpatrick’s model of training evaluation to evaluate learning outcomes of an English as a second language (ESL) paragraph writing course offered by a major Asian university. The study uses a combination of surveys and writing tests administered at the beginning and end of the course. The survey evaluated changes in students’ perception of their skills, attitude, and knowledge (SAK), and the writing tests measured their writing ability. Rasch measurement was applied to examine the psychometric validity of the instruments. The measured abilities were successively subjected to path modeling to evaluate Levels 1 and 2 of the model. The students reported that the module was enjoyable and useful. In addition, their self-perceived level of skills and knowledge developed across time alongside their writing scores but their attitude remained unchanged. Limitations of Kirkpatrick’s model as well as lack of solid frameworks for evaluating educational effectiveness in applied linguistics are discussed.

20.
