Similar literature
1.
Classical test theory (CTT), generalizability theory (GT), and multi-faceted Rasch model (MFRM) approaches to detecting and correcting for rater variability were compared. Each of 4,930 students' responses on an English examination was graded on 9 scales by 3 raters drawn from a pool of 70. CTT and MFRM indicated substantial variation among raters; the MFRM analysis identified far more raters as different than the CTT analysis did. In contrast, the GT rater variance component and the Rasch histograms suggested little rater variation. CTT and MFRM correction procedures both produced different scores for more than 50% of the examinees, but 75% of the examinees received identical results after each correction. The demonstrated value of a correction for systems of well-trained multiple graders has implications for all systems in which subjective scoring is used.

2.
Proponents of performance assessments purport that they allow more options for student choice and autonomy and, therefore, are more motivating and more preferred by students. This study explored the role of stakes and the student’s familiarity with the format in these examination preferences. A survey of 148 college students suggested that: their familiarity with open-ended assessments led students to prefer them without reference to stakes; they tended to prefer closed assessments when the stakes are low but open formats when the stakes are high; and their goal orientation had no relationship with these decisions. In the end, students seemed to be rather pragmatic and protective of their grades regardless of their goal orientations, which is only natural. The students’ goal orientations appear, then, to be desiderata for both them and their instructors, but a graded environment stifles their full operation.

3.
Rater training is an important part of developing and conducting large‐scale constructed‐response assessments. As part of this process, candidate raters have to pass a certification test to confirm that they are able to score consistently and accurately before they begin scoring operationally. Moreover, many assessment programs require raters to pass a calibration test before every scoring shift. To support the high‐stakes decisions made on the basis of rater certification tests, a psychometric approach for their development, analysis, and use is proposed. The circumstances and uses of these tests suggest that they are expected to have relatively low reliability. This expectation is supported by empirical data. Implications for the development and use of these tests to ensure their quality are discussed.

4.
This paper continues an exchange between its author and Andrew Davis. Part I addresses the attribution and ontological status of mental constructs and argues that philosophical work on these topics does not undermine high stakes testing. Part II examines the significance for testing of the connectedness of meaningful learning. Part III addresses the high stakes in high stakes testing in connection with the risk entailed by limited scoring reliability. It concludes that there is no straightforward relationship between the magnitude of what is at stake for students and teachers and the threshold of acceptable reliability in scoring.

5.
The study reported here aimed to establish whether the stakes of examinations taken by students in the final two years of compulsory education in the UK were associated with degree of self‐reported examination anxiety, and whether examination stakes moderated the anxiety–examination grade relationship. Data were collected from 615 students who were due to take examinations conceptualised as high stakes (a terminal examination), mid stakes (a modular examination), or low stakes (a mock examination). Findings suggested that students reported the lowest levels of anxiety and attained the highest grades in the mid stakes examination. Regression analysis suggested that examination stakes do moderate the inverse anxiety–grade relationship, but the effect for high stakes examinations was not in the expected direction. Results are interpreted in the context of limitations to this study’s design. Factors associated with the different timing of the examinations may have influenced results. Due to design limitations, these findings should only be considered provisional and an attempt should be made to replicate the findings using a more robust design. This study highlights the difficulties with designing studies and collecting data in an applied educational context.

6.
There has been a growing research interest in the identification and management of disengaged test taking, which poses a validity threat that is particularly prevalent with low‐stakes tests. This study investigated effort‐moderated (E‐M) scoring, in which item responses classified as rapid guesses are identified and excluded from scoring. Using achievement test data composed of test takers who were quickly retested and showed differential degrees of disengagement, three basic findings emerged. First, standard E‐M scoring accounted for roughly one‐third of the score distortion due to differential disengagement. Second, a modified E‐M scoring method that used more liberal time thresholds performed better, accounting for two‐thirds or more of the distortion. Finally, the inability of E‐M scoring to account for all of the score distortion suggests the additional presence of nonrapid item responses that reflect less‐than‐full engagement by some test takers.

7.
This paper describes a procedure for automated test forms assembly based on Classical Test Theory (CTT). The procedure uses stratified random content sampling and test form pre-equating to ensure both content and psychometric equivalence in generating virtually unlimited parallel forms. The procedure extends the usefulness of CTT in automated test form construction, yielding classical item statistics based on representative sample distributions and pre-equated test forms with known psychometric characteristics. A rationale for the procedure is presented followed by an example application and discussion of psychometric considerations related to its use.

8.
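China's ancient imperial examination system (keju) had a history of 1,300 years. While it promoted social progress and cultural prosperity, it also gave rise to many negative social problems, which eventually became too entrenched to reverse and led to the system's own demise. High-stakes testing programs in the United States are the state-level standardized tests and high school exit examinations that the federal government has promoted nationwide over the past decade in order to implement educational accountability. While achieving some positive results, they have also produced many negative effects. Because high-stakes testing has been in place in the United States for a relatively short time, its long-term impact is not yet known, but some of its current drawbacks closely resemble the abuses of the imperial examinations. Understanding the negative effects of the imperial examinations offers lessons for anticipating the harm that high-stakes testing may cause in the United States in the future, so that the US government can take precautions in advance.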

9.
Although online education is popularized, it is in a developing stage that continues to struggle with communicating and engaging with students. The question remains on how students can be better engaged in online educational materials that are presented in asynchronous media, especially in lecture videos. Thus, using engagement theory, the present study explored how online lecture videos can be improved by incorporating entertainment education. Using a public lecture video found on YouTube, an online survey (N = 133) was conducted to identify digital storytelling techniques and their effects. Results revealed that these techniques that are often utilized in entertainment became meaningful components to increase student engagement and learning outcomes. However, they can also negatively affect instructor credibility, which could suggest the need to increase instructors’ skills. The implications for the development of lecture videos using entertainment and its potential to positively impact online education are discussed.

10.
Objective testing techniques, such as multiple-choice examinations, are a widely accepted method of assessment in gross anatomy. In order to deter cheating on these types of examinations, instructors often design several versions of an examination to distribute. These versions usually involve the rearrangement of questions and their corresponding answer choices. This study will determine whether the distribution of different versions of an examination affects student performance in a lower division anatomical science course. Students who receive the original version of an examination may be at an advantage over those that receive a shuffled version of an examination because of the systematic tendencies that go into examination construction. This study concludes that the shuffling of questions and answer choices to produce multiple versions of an examination does not affect student performance.

11.
Over recent years, UK medical schools have moved to more integrated summative examinations. This paper analyses data from the written assessment of undergraduate medical students to investigate two key psychometric aspects of this type of high-stakes assessment. Firstly, the strength of the relationship between examiner predictions of item performance (as required under the Ebel standard setting method employed) and actual item performance (‘facility’) in the examination is explored. It is found that there is a systematic pattern of difference between these two measures, with examiners tending to underestimate the difficulty of items classified as relatively easy, and overestimating that of items classified harder. The implications of these differences for standard setting are considered. Secondly, the integration of the assessment raises the question as to whether the student total score in the exam can provide a single meaningful measure of student performance across a broad range of medical specialties. Therefore, Rasch measurement theory is employed to evaluate psychometric characteristics of the examination, including its dimensionality. Once adjustment is made for item interdependency, the examination is shown to be unidimensional with fit to the Rasch model implying that a single underlying trait, clinical knowledge, is being measured.

12.
The rise of computer‐based testing has brought with it the capability to measure more aspects of a test event than simply the answers selected or constructed by the test taker. One behavior that has drawn much research interest is the time test takers spend responding to individual multiple‐choice items. In particular, very short response time, termed rapid guessing, has been shown to indicate disengaged test taking, regardless of whether it occurs in high‐stakes or low‐stakes testing contexts. This article examines rapid‐guessing behavior: its theoretical conceptualization and underlying assumptions, methods for identifying it, misconceptions regarding its dynamics, and the contextual requirements for its proper interpretation. It is argued that because it does not reflect what a test taker knows and can do, a rapid guess to an item represents a choice by the test taker to momentarily opt out of being measured. As a result, rapid guessing tends to negatively distort scores and thereby diminish validity. Therefore, because rapid guesses do not contribute to measurement, it makes little sense to include them in scoring.

13.
Because the psychological assessment of high ability usually concentrates on intelligence testing, it is pertinent to discuss the validity of intelligence test batteries. The well‐known Wechsler's scales are analyzed and evaluated. Based on psychometric models, especially the Rasch model, analyses are made of some German editions, which show that hardly a single subtest scores fairly. That is, the true extent of testees’ abilities will not be correctly represented by the scores obtained under current scoring rules. Since many of the items of the analyzed editions correspond to items of the American edition (WISC‐R), the same shortcomings must also be suspect for that test battery. In this light, the administration of these tests is no longer acceptable. However, it is shown that Wechsler's basic concept is worthwhile when accompanied by (modern) psychometric tools: a new (German) test battery, AID, is introduced which, in particular, conforms to economic requirements if high ability is to be assessed.

14.
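Related research shows that IRT has many advantages over CTT in the evaluation of educational examinations. Using data from the college entrance mathematics examination (gaokao) in one region, this paper compares CTT and IRT in three respects: item parameters, evaluation methods, and precision estimation. The results show that, under IRT, the parameters more readily reflect the characteristic attributes of each observed item, IRT parameters are more precise than CTT parameters, and the item information function better reflects the information carried by the test items. CTT and IRT also differ in evaluation method: the ability score under IRT is superior to the test score under CTT and better reflects students' ability levels. Finally, CTT and IRT differ in precision estimation: the IRT test information function and ability confidence intervals offer better precision than CTT. The empirical evidence demonstrates the advantages of IRT in evaluating the college entrance mathematics examination, and the approach has important value and application prospects.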

15.
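A brief discussion of strategies and techniques for the emotional education of college students (total citations: 4; self-citations: 0; citations by others: 4)
Drawing on experience as a student counselor, this paper analyzes and discusses strategies and techniques for the emotional education of college students. It proposes three strategies: teachers should love their students, that is, cultivate emotion through emotion; counselors should cultivate their own emotional maturity; and counselors should keep in close contact with students and spend a considerable amount of time with them. It also proposes four techniques: choosing the best moment to act; building friendship on teacher-student equality; applying warmth and coolness in due measure; and guiding students by drawing out their potential. The paper offers a useful exploration of the work of college student counselors.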

16.
The clinical use of ultrasound has dramatically increased, necessitating early ultrasound education and the development of new tools in ultrasound training and assessment. The goal of this study was to devise a novel low-resource examination that tested the anatomical knowledge and technical skill of early undergraduate medical students in a gross anatomy course. The team-based ultrasound objective structured practice examination (OSPE) was created as a method for assessing practical ultrasound competencies, anatomical knowledge, and non-technical skills such as teamwork and professionalism. The examination utilized a rotation of students through four team roles as they scanned different areas of the body. This station-based examination required four models and four instructors, and tested ultrasound skills in the heart, abdominal vessels, abdominal organs, and neck regions. A Likert scale survey assessed student attitudes toward the examination. Survey data from participants (n = 46) were examined along with OSPE examination grades (n = 52). Means and standard deviations were calculated for examination items and survey responses. Student grades were high in both technical (96.5%) and professional (96.5%) competencies with structure identification scoring the lowest (93.8%). There were no statistical differences between performances in each of the body regions being scanned. The survey showed that students deemed the examination to be fair and effective. In addition, students agreed that the examination motivated them to practice ultrasound. The team-based OSPE was found to be an efficient and student-favored method for evaluating integrated ultrasound competencies, anatomical knowledge, team-work, and professional attributes.

17.
Denise Shaver. TechTrends, 2017, 61(5): 438-443
Do you find it challenging to have discussions with instructors about designing online courses and best practices in teaching? This article will highlight key components to conducting effective Learning Design meetings. It outlines techniques used by this institution that inspire faculty to design coherent courses that lead to meaningful learning experiences. These meetings invite instructors to express their feelings about online formats, inform them about expectations and time commitment, and reassure them of support throughout the process. Learning Design (LD) meetings have proven to be a compelling manner of decreasing faculty resistance, while exposing instructors to best practices in pedagogy, andragogy and online learning. Instructional Designers (IDs), Instructional Facilitators (IFs), and Course Authors (CAs) who work in higher educational online settings should find this information useful. Novices in the field may find these practical techniques particularly beneficial.

18.
19.
Accepting that school based assessment may have the potential to bring additional reliability to the assessment outcomes of an educational system, this research uses Generalizability Theory to address the question “why is school based assessment not a universal feature of high stakes assessment systems?” Three major issues are identified: (a) there is a conflict between the psychometric model and classroom assessment practice; (b) different schools are not equally effective; and, (c) teachers’ judgments are frequently accused of being biased. The role of public examination boards is discussed in this context.

20.
Teachers appreciate nonverbally responsive students, but what is missing is an understanding of the direct influence of teachers' self-perceptions on their perceptions of how engaged their students are in class. Using the emotional contagion theory as a lens, this study examines the premise that satisfied instructors expect students to mirror their own behaviors in the classroom through being nonverbally responsive. Results of the regression model confirm that teachers' perceptions of their own confirmation behaviors most strongly predict their perceptions of how nonverbally responsive students are in class. Thus, instructors who are more expressive will likely induce students to be more expressive, leading them to determine their students are being more nonverbally responsive. Further, expressive instructors will be more attuned to student interaction because they may subconsciously expect students to mirror their actions through nonverbal behaviors; they will look for it. Additionally, satisfied instructors view their students as satisfied and look for these feelings to be exposed via nonverbal response behaviors. Implications for teacher training and mentoring programs are discussed.

