Similar literature
1.
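Generalizability theory (GT) offers a new view of measurement reliability and is increasingly applied to large-scale examinations. This paper brings GT concepts and methods to the research and practice of reliability analysis for the self-taught higher education examination (自学考试) and proposes a GT research framework for that analysis. The framework organizes and summarizes the basic tasks and workflow of reliability analysis for the examination, integrates the use of univariate generalizability theory (UGT) and multivariate generalizability theory (MGT) models, selects more appropriate reliability indices, and discusses how to examine the decision reliability of the passing score. It is intended as a reference for researchers applying GT to the self-taught examination.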

2.
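Generalizability theory offers a new view of measurement reliability and is increasingly applied to large-scale examinations. This paper uses multivariate generalizability theory to examine the measurement reliability, composite total score, and item-type design of the listening test for the higher education self-taught examination course 《综合英语(四)》 (Comprehensive English IV). Findings: the overall measurement reliability of the listening test was high, but the reliability of the "passage comprehension" subtest was low; each subtest's proportional contribution to the variance of the composite universe score differed somewhat from the score weights assigned when the test was constructed; and after removing the "passage comprehension" subtest, increasing every remaining subtest to 10 items would effectively raise the reliability of the listening test.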

3.
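Generalizability theory, a new-generation measurement theory, is increasingly applied to large-scale examinations. This article uses multivariate generalizability theory to examine the measurement reliability, composite total score, passing-score decision reliability, and structural optimization of the paper for the self-taught examination course 《英语水平考试(一)笔试》 (English Proficiency Test I, Written). Findings: the measurement reliability of this examination was high; each subtest's proportional contribution to the variance of the composite universe score was broadly consistent with the intended score weights; a passing score of 60 points has high decision reliability for this paper; and increasing every subtest to 15 items, or the vocabulary subtest alone to 20 items, would effectively raise measurement reliability.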

4.
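Evaluating the Teaching Performance of University Teachers with Generalizability Theory   Total citations: 4 (self-citations: 1; citations by others: 3)
Generalizability theory, a modern measurement theory, was used to evaluate the teaching performance of university teachers and to suggest improvements. Using a self-developed teaching evaluation questionnaire, 543 students rated 16 teachers in a university foreign language department, and the collected data were analyzed with a multivariate generalizability analysis under a nested design. The evaluation was highly dependable overall, but some indicators were not. The questionnaire's original indicator weights were not optimal, and changing the weights can improve the dependability of the evaluation.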

5.
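Bai Juan (白娟), 《考试研究》, 2013, (1): 51-57
The national postgraduate entrance examination in Comprehensive Traditional Chinese Medicine (中医综合) is a nationally unified, selective entrance examination subject set for universities and research institutes admitting master's students in traditional Chinese medicine and pharmacy. This study uses multivariate generalizability theory to evaluate the overall reliability of the 2012 examination, the structure of the paper, and the appropriateness of the allocation across second-level disciplines. Results: (1) by discipline content, formulae (方剂学), Chinese materia medica, acupuncture and moxibustion, TCM internal medicine, and TCM diagnostics were measured with high precision, whereas the basic theory of TCM was measured with relatively low precision, which could be raised by suitably increasing the difficulty and discrimination of the items in that discipline; (2) by item type, every item type was measured with high precision, and the distribution of weights across item types was appropriate.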

6.
Accepting that school-based assessment may have the potential to bring additional reliability to the assessment outcomes of an educational system, this research uses Generalizability Theory to address the question of why school-based assessment is not a universal feature of high-stakes assessment systems. Three major issues are identified: (a) there is a conflict between the psychometric model and classroom assessment practice; (b) different schools are not equally effective; and (c) teachers' judgments are frequently accused of being biased. The role of public examination boards is discussed in this context.

7.
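Using data from two C.TEST oral interviews that a Korean corporate group administered to the same group of employees in April and October of one year, this study analyzes the reliability of the oral test and the changes in these examinees' proficiency levels across the two interviews. A reliability check with Kendall's W shows that scoring consistency among the great majority of interviewers was fairly high. The generalizability coefficients obtained from the generalizability theory analysis show that test reliability was satisfactory for both interviews. Comparing the same examinees' results across the two interviews, the paper concludes that spoken Chinese acquisition among Korean examinees in a work environment changed unevenly by level: beginners improved markedly, while intermediate and advanced examinees changed little and remained stable.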

8.
Peer and self‐ratings have been strongly recommended as the means to adjust individual contributions to group work. To evaluate the quality of student ratings, previous research has primarily explored the validity of these ratings, as indicated by the degree of agreement between student and teacher ratings. This research describes a Generalizability Theory framework to evaluate the reliability of student ratings in terms of the degree of consistency among students themselves, as well as group and rater effects. Ratings from two group projects are analyzed to illustrate how this method can be applied. The reliability of student ratings differs for the two group projects considered in this research. While a strong group effect is present in both projects, the rater effect is different. Implications of this research for classroom assessment practice are discussed.

9.
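The quality of graduates from modern distance education is a question of general concern to the public and to education authorities. Using data from the 2004 follow-up survey of open education graduates of the national Radio and TV University (电大) system as the sample (N=15,602), this study applies multivariate generalizability theory and techniques to verify the soundness and validity of one of the survey instruments, the 《电大学习效果评价表》 (RTVU Learning Outcome Evaluation Form). Results: the measurement precision of the form is fairly high; setting the weights of the composite score from each dimension's proportional contribution to total variance is more sound; and the number of items per dimension in the original form is reasonable, but suitably increasing the number of "performance" (业绩) items (6 items) would economically raise measurement precision above its original level.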

10.
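Test length is one of the important factors affecting the reliability and validity of language tests. Using the s×(i:p) nested design with a fixed facet from Generalizability Theory (GT) and the Law of Diminishing Marginal Utility, this paper reports an empirical study of the test length of China's Hanyu Shuiping Kaoshi, HSK [Intermediate]. Results: the 130-item HSK [Intermediate] test has quite high reliability, with a generalizability coefficient (Eρ²) of 0.8890. Even when the number of items is reduced to 120 or 110, the generalizability coefficient still reaches 0.8856 and 0.8816 (decreases of 0.38% and 0.83%, respectively). Shortening the test in this way markedly lowers development cost and raises testing efficiency, while fully meeting the high error-control requirements of standardized testing and keeping test results and score interpretations highly reliable and valid.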
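
The percentage decreases quoted in this abstract can be checked with simple arithmetic. The sketch below assumes they are relative drops against the 130-item coefficient of 0.8890; the function name `relative_drop_percent` is hypothetical and does not come from the paper.

```python
def relative_drop_percent(baseline: float, reduced: float) -> float:
    """Relative decrease of `reduced` against `baseline`, in percent."""
    return (baseline - reduced) / baseline * 100.0


# Coefficients quoted in the abstract: 130 items (baseline), then 120 and 110 items.
BASELINE = 0.8890
SHORTENED = {120: 0.8856, 110: 0.8816}

for n_items, coefficient in SHORTENED.items():
    drop = relative_drop_percent(BASELINE, coefficient)
    print(f"{n_items} items: coefficient {coefficient:.4f}, relative drop {drop:.2f}%")
```

Under that assumption the computed drops round to the 0.38% and 0.83% given in the abstract.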

11.
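Reforming Item-Writing Methods with Multivariate Generalizability Theory   Total citations: 1 (self-citations: 0; citations by others: 1)
The two-way specification table currently used in test construction has clear shortcomings, which a three-way specification table overcomes. An item-writing method based on the three-way specification table can be proposed and justified from the perspective of multivariate generalizability theory. The method is of considerable value for research on, and application of, psychological and educational tests in China.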

12.
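This paper reviews the development of educational measurement theory, analyzes the theoretical foundations and the practical strengths and weaknesses of classical test theory, generalizability theory, item response theory, and cognitive diagnosis theory, and then discusses problems with their theoretical assumptions.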

13.
Examining the validity of a theory scientifically requires careful attention to how one interprets data. Unfortunately, the process is not clear-cut; the theoretical meaning of empirical data is not obvious. The experimental study by Avery et al. (1976) of empathic understanding (EU) is analyzed along with Horwitz's (1977) rejoinder. Generalizability theory and a multimethod strategy are recommended as ways to clarify reliability and data interpretation problems.

14.
Although federal regulations require testing students with severe cognitive disabilities, there is little guidance regarding how technical quality should be established. It is known that challenges exist with documentation of the reliability of scores for alternate assessments. Typical measures of reliability do little to model the multiple sources of error that are characteristic of alternate assessments. In contrast, generalizability theory (G-theory) allows researchers to identify sources of error and analyze the relative contribution of each source. This study demonstrates an application of G-theory to examine reliability for an alternate assessment. A G-study with the facets rater type, assessment attempts, and tasks was examined to determine the relative contribution of each to observed score variance. Results were used to determine the reliability of scores. The assessment design was modified to examine how changes might impact reliability. As a final step, designs that were deemed satisfactory were evaluated regarding the feasibility of adapting them into a statewide standardized assessment and accountability program.

15.
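Multivariate generalizability theory was used to analyze operational data from a test of education and teaching ability. Results: (1) combining the lesson-presentation (说课), oral-defense, and interview scores into a composite is fairly reasonable; (2) of the three assessment tasks, scoring quality was good for lesson presentation and oral defense and poor for the interview; (3) the overall test results are better suited to relative decisions, and absolute decisions should be made with caution; (4) the main factor affecting test quality is that raters differ in severity; (5) adding raters improves the precision of the test, but with diminishing gains, and 5 raters meet the test's requirements well.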
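
The diminishing gain from adding raters, and the gap between relative and absolute decisions, can be illustrated with a much simpler model than the multivariate one used in the study. The sketch below assumes a univariate persons-by-raters (p × r) crossed design and uses the standard D-study formulas Eρ² = σ²(p) / (σ²(p) + σ²(pr,e)/n) and Φ = σ²(p) / (σ²(p) + (σ²(r) + σ²(pr,e))/n). The variance components are hypothetical values chosen only for illustration; they are not taken from the abstract.

```python
def g_coefficient(var_person, var_residual, n_raters):
    """Generalizability coefficient (relative decisions), p x r design."""
    return var_person / (var_person + var_residual / n_raters)


def phi_coefficient(var_person, var_rater, var_residual, n_raters):
    """Index of dependability (absolute decisions), p x r design."""
    return var_person / (var_person + (var_rater + var_residual) / n_raters)


# Hypothetical variance components (not from the study): person, rater, residual.
VAR_PERSON, VAR_RATER, VAR_RESIDUAL = 0.50, 0.15, 0.35


def d_study(max_raters=7):
    """Return (n_raters, g, phi) for 1..max_raters raters."""
    return [
        (
            n,
            g_coefficient(VAR_PERSON, VAR_RESIDUAL, n),
            phi_coefficient(VAR_PERSON, VAR_RATER, VAR_RESIDUAL, n),
        )
        for n in range(1, max_raters + 1)
    ]


previous_g = None
for n, g, phi in d_study():
    # The gain column shows how much the G coefficient rose from one rater fewer.
    gain = "" if previous_g is None else f"{g - previous_g:+.3f}"
    print(f"raters={n}  G={g:.3f}  Phi={phi:.3f}  gain={gain}")
    previous_g = g
```

With a nonzero rater variance, Φ stays below Eρ² at every number of raters, which is one way to read the caution about absolute decisions when raters differ in severity.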

16.
Generalizability Theory (GT) offers several advantages relevant to educational contexts, including the fact that it can be used to estimate multiple sources of variance (e.g., raters, forms) simultaneously and to derive coefficients for the purposes of both relative and absolute decision making. Although GT has been increasingly applied in recent years to assessment data in K–12 school settings, analysis and critique of its application has not yet taken place. The goals of the current article were therefore to (1) undertake a systematic review of the school-based assessment literature in order to identify relevant applications of this statistical framework and (2) assess the degree of consistency between methodological recommendations for use of GT and reporting practices within this literature. In addition to describing the current state of this literature, suggestions for strengthening future applications are provided.

17.
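Taking the women's diving final of one Olympic Games as an example, this paper applies the three major measurement theories, CTT, GT, and IRT, to analyze rater reliability and to reveal differences between and within raters from different angles. Results: the CTT rater reliabilities were 0.981 and 078; the GT generalizability coefficient and index of dependability were 0.8279 and 0.8271, and the design used in the competition, in which 7 judges each rated the divers' performance over 5 rounds, was a fairly suitable decision; under IRT, judge 5 was relatively the most severe of the 7 judges and judge 2 the most lenient, but the differences in severity among judges were not significant; judges 1 and 4 showed problems of self-consistency; and judges showed bias across divers, degrees of difficulty, and rounds, though not at a significant level. The analysis shows the characteristics and respective strengths of the three approaches to rater reliability and provides useful information for rater training and for improving scoring reliability.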

18.
The authors address the reliability of scores obtained on the summative performance assessments during the pilot year of their research. They discuss the advantages of using generalizability theory, rather than classical test theory, for estimating the reliability of scores on summative performance assessments. Generalizability theory was used as the framework because of the flexibility this approach provides for examining sources of inconsistency within a complex assessment. The two major sources of inconsistency in scores considered in this study were raters and agencies (teachers' ratings vs. researchers' ratings). Overall, results showed that the inconsistency in scores attributable to raters and agencies was relatively small. Suggestions for improving consistency in subsequent years of the research are provided.

19.
Observational measures can add objective data to both research and clinical evaluations of children's behavior in the classroom. However, they pose challenges for training and attaining high levels of interrater reliability between observers. The Behavioral Observation of Students in Schools (BOSS) is a commonly used school-based observation instrument that is well adapted to measure symptoms of attention deficit/hyperactivity disorder (ADHD) in the classroom setting. Reliable use of the BOSS for clinical or research purposes requires training to reach reliable standards (kappa ≥ 0.80). The current study conducted training observations in one suburban and one urban elementary school in the Greater Boston area. To enhance interrater reliability and reduce training time, supplemental guidelines, including 30 additional rules to follow, were developed over two consecutive school years. The complete protocol was then used for training in the third school year. To reach sufficient interrater reliability (kappa ≥ 0.80) during training, 45 training observations were required in the first year while, in the third year, only 17 observations were required. High interrater reliability was sustained after training across all three school years, accumulating a total of 1,001 post-training observations. It is estimated that clinicians or researchers following this proposed protocol, who are naive to the BOSS, will require approximately 30 training observations to reach proficient reliability. We believe this protocol will make the BOSS more accessible for clinical and research usage, and the procedures used to obtain high interrater reliability using the BOSS are broadly applicable to a variety of observational measures.
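
The kappa ≥ 0.80 criterion in this abstract can be made concrete with a small calculation. The sketch below assumes the statistic is Cohen's kappa for two observers coding the same intervals, which the abstract does not state explicitly. The category labels and the two code sequences are hypothetical and are not actual BOSS codes or study data.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers who coded the same sequence of intervals."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both observers must code the same non-empty set of intervals")
    n = len(codes_a)
    # Observed agreement: share of intervals where both observers gave the same code.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement: from each observer's marginal code frequencies.
    counts_a, counts_b = Counter(codes_a), Counter(codes_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used one identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical interval codes from a trainee and a reference observer.
trainee = ["engaged", "engaged", "off_task", "engaged", "off_task",
           "engaged", "engaged", "off_task", "engaged", "engaged"]
reference = ["engaged", "engaged", "off_task", "engaged", "engaged",
             "engaged", "engaged", "off_task", "engaged", "engaged"]

kappa = cohens_kappa(trainee, reference)
print(f"kappa = {kappa:.3f}, meets 0.80 criterion: {kappa >= 0.80}")
```

Kappa corrects raw percent agreement for the agreement expected by chance, so two observers can agree on most intervals and still fall short of the 0.80 criterion when one code dominates.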

20.
A pilot study was conducted to evaluate and improve the rating procedure proposed for use in a research effort designed to assess the essay writing ability of college sophomores. Generalizability theory and the Many-Facet Rasch Model were each used to (a) estimate potential sources of error in the rating, (b) obtain reliability estimates, and (c) make recommendations for improving the rating process. Variance due to Task (writing prompt) and the Person-by-Task interaction was high, while the variance attributable to Raters and Occasion was low. Twenty-two percent of the variability in the ratings was unexplained. The common and unique features of generalizability theory and the Many-Facet Rasch Model are described, and the advantages and disadvantages of each are discussed.
