Found 20 similar documents (search time: 265 ms)
1.
2.
Using a many-facet Rasch model with 913 senior high school students as participants, this study validated an assessment of critical thinking in foreign-language writing across four facets: examinees, raters, tasks, and rating criteria. The results show that: (1) an assessment framework comprising posing questions, stating viewpoints, providing evidence, reasoning and argumentation, drawing conclusions, and interpretation and evaluation meets the measurement requirements of the many-facet Rasch model, and can represent and reasonably differentiate examinees' critical thinking ability in foreign-language writing; (2) reasoning and argumentation and providing evidence …
3.
4.
5.
To overcome the test dependence and sample dependence of classical test theory, this study applied the Rasch model to the quality analysis of a science literacy assessment for sixth-grade primary school students, illustrating its use through overall quality checks, unidimensionality testing, the Wright map, item-level quality analysis, and bubble charts. The results indicate that the assessment's items have high reliability and validity and reasonable discrimination, and that the great majority of items met their measurement expectations. Applying the Rasch model in assessment design in this way provides useful measurement-quality evidence to inform such designs.
6.
7.
This study used a many-facet Rasch model (MFRM) to estimate undergraduates' ability in a course on multivariate statistical analysis methods, and to analyze item difficulty and rater severity. The results show that many-facet Rasch analysis handles the assessment of subject-matter ability in open-ended examinations well, and that its results were consistent with student feedback.
8.
As a relatively authentic and direct form of testing, oral examinations are increasingly widely used in language-testing practice. However, the subjective judgment introduced during testing, together with the design and use of rating criteria and scales, exposes scores to influences beyond examinee ability. Based on data from the 2007 PETS Level 3 oral examination at one test center, this study used the Many-Facet Rasch Model (MFRM) to conduct a post hoc quality-control analysis of the examination's ratings. MFRM integrates the many facets of a language performance test into a single mathematical model: it not only places all facets on a common scale, but also supports analysis of individual facets, down to individual elements, making it possible to pinpoint potential "problem raters" and examinees who may have been misjudged. It is thus an effective quality-monitoring tool for subjective scoring.
9.
10.
The Rasch model offers several methods for test quality analysis: the Wright Map, which gives the reader an overall picture of the test; multidimensionality investigations, which examine whether the test measures a single latent trait in the examinees (here, reading ability); item fit and error statistics (ITEM: fit order); bubble diagrams; and so on. Using the quality analysis of a reading-literacy pretest for fifth- and sixth-grade students in the Guangxi Zhuang Autonomous Region as an example, this article presents the Rasch measurement process. The analysis shows the test to be of high overall quality: its items cover examinees at all ability levels, difficulties are reasonably calibrated, and the great majority of items achieved the intended measurement effect. However, because measurement goals differ, the choice of Rasch model functions and indices and the interpretation of results can vary considerably; researchers need to choose based on their measurement goals and adapt flexibly to the situation at hand.
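As a minimal sketch of the dichotomous Rasch model underlying tools such as the Wright map (the function name and values here are illustrative, not taken from any article above): persons and items are placed on a common logit scale, and a person whose ability equals an item's difficulty has exactly a 0.5 probability of success.

```python
import math

def rasch_probability(theta: float, b: float) -> float:
    """Probability that a person with ability theta (logits) answers an
    item of difficulty b (logits) correctly, under the dichotomous
    Rasch model: P = exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# A Wright map reads off this relationship visually: the further a
# person sits above an item on the shared logit scale, the higher the
# expected probability of success on that item.
for theta in (-1.0, 0.0, 1.0):
    p = rasch_probability(theta, 0.0)
    print(f"theta={theta:+.1f}  P(correct | b=0) = {p:.3f}")
```

This is the one-parameter core that the multidimensionality checks and item-fit statistics above are built around; the many-facet extensions in the other entries add further parameters (e.g. rater severity) to the same linear logit form.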
11.
Applied Measurement in Education, 2013, 26(3): 171–191
The purpose of this study is to describe a Many-Faceted Rasch (FACETS) model for the measurement of writing ability. The FACETS model is a multivariate extension of Rasch measurement models that can be used to provide a framework for calibrating both raters and writing tasks within the context of writing assessment. The use of the FACETS model for solving measurement problems encountered in the large-scale assessment of writing ability is presented here. A random sample of 1,000 students from a statewide assessment of writing ability is used to illustrate the FACETS model. The data suggest that there are significant differences in rater severity, even after extensive training. Small, but statistically significant, differences in writing-task difficulty were also found. The FACETS model offers a promising approach for addressing measurement problems encountered in the large-scale assessment of writing ability through written compositions.
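As a sketch of the kind of model these FACETS analyses estimate (the parameterization shown is a common rating-scale form; the symbols are illustrative, not necessarily those of the study above), the many-facet Rasch model for person n, task i, rater j, and rating category k can be written:

```latex
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = \theta_n - \delta_i - \alpha_j - \tau_k
```

where \(\theta_n\) is the writer's ability, \(\delta_i\) the task's difficulty, \(\alpha_j\) the rater's severity, and \(\tau_k\) the threshold between adjacent rating categories. Calibrating all facets jointly is what places raters and tasks on the same logit scale as examinees, so that rater severity and task difficulty can be compared and adjusted for directly.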
12.
Evaluating Rater Accuracy in Performance Assessments
A new method for evaluating rater accuracy within the context of performance assessments is described. Accuracy is defined as the match between ratings obtained from operational raters and those obtained from an expert panel on a set of benchmark, exemplar, or anchor performances. An extended Rasch measurement model called the FACETS model is presented for examining rater accuracy. The FACETS model is illustrated with 373 benchmark papers rated by 20 operational raters and an expert panel. The data are from the 1993 field test of the High School Graduation Writing Test in Georgia. The data suggest that there are statistically significant differences in rater accuracy; the data also suggest that it is easier to be accurate on some benchmark papers than on others. A small example is presented to illustrate how the accuracy ordering of raters may not be invariant over different subsets of benchmarks used to evaluate accuracy.
13.
Examining Rater Errors in the Assessment of Written Composition With a Many-Faceted Rasch Model
This study describes several categories of rater errors (rater severity, halo effect, central tendency, and restriction of range). Criteria are presented for evaluating the quality of ratings based on a many-faceted Rasch measurement (FACETS) model for analyzing judgments. A random sample of 264 compositions rated by 15 raters and a validity committee from the 1990 administration of the Eighth Grade Writing Test in Georgia is used to illustrate the model. The data suggest that there are significant differences in rater severity. Evidence of a halo effect is found for two raters who appear to be rating the compositions holistically rather than analytically. Approximately 80% of the ratings are in the two middle categories of the rating scale, indicating that the error of central tendency is present. Restriction of range is evident when the unadjusted raw score distribution is examined, although this rater error is less evident when adjusted estimates of writing competence are used.
14.
15.
Kristen di Gennaro. Assessing Writing, 2013, 18(2): 154–172
A growing body of literature in second-language writing suggests that the writing ability of international second language (L2) learners, who attend post-secondary education abroad after having completed high school in their home countries, and the so-called Generation 1.5 population, that is, L2 learners who enter post-secondary education after attending high school in the new country, differs. The present study provides much-needed empirical evidence concerning potential differences in the writing ability of these two groups. Many-facet Rasch measurement procedures were used to analyze learners’ writing scores in five components, based on a theoretical model of writing ability: grammatical, cohesive, rhetorical, sociopragmatic, and content control. Results revealed that the international learners performed better overall than the Generation 1.5 learners and that the two groups had opposing strengths and weaknesses in grammatical and sociopragmatic control. Language program administrators and practitioners can use these results when designing curricula addressing the needs of diverse groups of L2 learners.
16.
The term measurement disturbance has been used to describe systematic conditions that affect a measurement process, resulting in a compromised interpretation of person or item estimates. Measurement disturbances have been discussed in relation to systematic response patterns associated with items and persons, such as start‐up, plodding, boredom, or fatigue. An understanding of the different types of measurement disturbances can lead to a more complete understanding of persons or items in terms of the construct being measured. Although measurement disturbances have been explored in several contexts, they have not been explicitly considered in the context of performance assessments. The purpose of this study is to illustrate the use of graphical methods to explore measurement disturbances related to raters within the context of a writing assessment. Graphical displays that illustrate the alignment between expected and empirical rater response functions are considered as they relate to indicators of rating quality based on the Rasch model. Results suggest that graphical displays can be used to identify measurement disturbances for raters related to specific ranges of student achievement that suggest potential rater bias. Further, results highlight the added diagnostic value of graphical displays for detecting measurement disturbances that are not captured using Rasch model–data fit statistics.
17.
The teaching and assessment of essay writing at primary schools throughout Vietnam is regulated by the Ministry of Education and Training. The analytical error-recognition method of assessment, however, does not facilitate direct interpretation of students’ writing competence. In this study, which involved samples of Grade 5 students in five provinces in Vietnam, a combination of traditional and partial credit scoring rubrics was developed to enable data analysis using the Rasch model. Based on such analysis, a continuum of writing ability at Grade 5 level was identified and a mastery level defined in terms of writing skills. The study has implications for possible changes in future assessment and marking schemes.
18.
Evaluating fifth- and sixth-grade students’ expository writing: task development, scoring, and psychometric issues
Drawing from multiple theoretical frameworks representing cognitive and educational psychology, we present a writing task and scoring system for measurement of students’ informative writing. Participants in this study were 72 fifth- and sixth-grade students who wrote compositions describing real-world problems and how mathematics, science, and social studies information could be used to solve those problems. Of the 72 students, 69 were able to craft a cohesive response that not only demonstrated planning in writing structure but also elaboration of relevant knowledge in one or more domains. Many-facet Rasch Modeling (MFRM) techniques were used to examine the reliability and validity of scores for the writing rating scale. Additionally, comparison of fifth- and sixth-grade responses supported the validity of scores, as did the results of a correlational analysis with scores from an overall interest measure. Recommendations for improving writing scoring systems based on the findings of this investigation are provided.
19.
Vahid Aryadoust. Pedagogies: An International Journal, 2017, 12(2): 151–179
This study adapts Levels 1 and 2 of Kirkpatrick’s model of training evaluation to evaluate learning outcomes of an English as a second language (ESL) paragraph writing course offered by a major Asian university. The study uses a combination of surveys and writing tests administered at the beginning and end of the course. The survey evaluated changes in students’ perception of their skills, attitude, and knowledge (SAK), and the writing tests measured their writing ability. Rasch measurement was applied to examine the psychometric validity of the instruments. The measured abilities were successively subjected to path modeling to evaluate Levels 1 and 2 of the model. The students reported that the module was enjoyable and useful. In addition, their self-perceived level of skills and knowledge developed across time alongside their writing scores but their attitude remained unchanged. Limitations of Kirkpatrick’s model as well as lack of solid frameworks for evaluating educational effectiveness in applied linguistics are discussed.