Similar literature
20 similar documents found
1.
This module describes some common standard-setting procedures used to derive performance levels for achievement tests in education, licensure, and certification. Upon completing the module, readers will be able to: describe what standard setting is; understand why standard setting is necessary; recognize some of the purposes of standard setting; calculate cut scores using various methods; and identify elements to be considered when evaluating standard-setting procedures. A self-test and annotated bibliography are provided at the end of the module. Teaching aids to accompany the module are available through NCME.

2.
Some writers in the measurement literature have been skeptical of the meaningfulness of achievement standards and described the standard-setting process as blatantly arbitrary. We argue that standard setting is more appropriately conceived of as a measurement process similar to student assessment. The construct being measured is the panelists' representation of student performance at the threshold of an achievement level. In the first section of this paper, we argue that standard setting is an example of stimulus-centered measurement. In the second section, we elaborate on this idea by comparing some popular standard-setting methods to the stimulus-centered scaling methods known as psychophysical scaling. In the third section, we use the lens of standard setting as a measurement process to take a fresh look at the two criticisms of standard setting: the role of judgment and the variability of results. In the fourth section, we offer a vision of standard-setting research and practice as grounded in the theory and practice of educational measurement.

3.
Standard-setting studies utilizing procedures such as the Bookmark or Angoff methods are just one component of the complete standard-setting process. Decision makers ultimately must determine what they believe to be the most appropriate standard or cut score to use, employing the input of the standard-setting panelists as one piece of information among multiple sources. However, guidance for weighing the various components is limited. The current article describes considerations about data that are used to make standard-setting decisions, as previously outlined by Geisinger (1991). The ten points provided by Geisinger have been expanded as they relate to shifts in educational policy and practice in educational measurement. They have been amended with six new components as well. The new considerations addressed are smoothing across grades, raising standards in progression (over grades or over time), opportunity to learn or instructional validity, input from other groups, equating or linking to previous standards, and organizational vision and goals.

4.
An important consideration in standard setting is recruiting a group of panelists with different experiences and backgrounds to serve on the standard-setting panel. This study uses data from 14 different Angoff standard settings from a variety of medical imaging credentialing programs to examine whether people with different professional roles and test development experiences tended to recommend higher or lower cut scores or were more or less accurate in their standard-setting judgments. Results suggested that there were not any statistically significant differences for different types of panelists in terms of the cut scores they recommended or the accuracy of their judgments. Discussion of what these results may mean for panelist selection and recruitment is provided.

5.
Evidence of the internal consistency of standard-setting judgments is a critical part of the validity argument for tests used to make classification decisions. The bookmark standard-setting procedure is a popular approach to establishing performance standards, but there is relatively little research that reflects on the internal consistency of the resulting judgments. This article presents the results of an experiment in which content experts were randomly assigned to one of two response probability conditions: .67 and .80. If the standard-setting judgments collected with the bookmark procedure are internally consistent, both conditions should produce highly similar cut scores. The results showed substantially different cut scores for the two conditions; this calls into question whether content experts can produce the type of internally consistent judgments that are required using the bookmark procedure.

6.
The Bookmark Standard-Setting Method: A Literature Review
The Bookmark method for setting standards on educational tests is currently one of the most popular standard-setting methods. However, research to support the method is scarce. In this report, we review the published and unpublished literature on this method as well as some seminal work in the area of evaluating standard-setting studies. Our review highlights both strengths and limitations of the method. Strengths include its wide acceptance and panelist confidence in the method. Limitations include a potential bias to produce lower-than-intended standards and problems in selecting the most appropriate response probability value for ordering the items presented to panelists. It is clear that more research on this method is needed to support its wide use. Several areas for future research to better understand the validity of the Bookmark method for setting standards on educational tests are presented.

7.
Judgmental standard-setting methods, such as the Angoff (1971) method, use item performance estimates as the basis for determining the minimum passing score (MPS). Therefore, the accuracy of these item performance estimates is crucial to the validity of the resulting MPS. Recent researchers (Shepard, 1995; Impara & Plake, 1998; National Research Council, 1999) have called into question the ability of judges to make accurate item performance estimates for target subgroups of candidates, such as minimally competent candidates. The purpose of this study was to examine the intra- and inter-rater consistency of item performance estimates from an Angoff standard setting. Results provide evidence that item performance estimates were consistent within and across panels within and across years. Factors that might have influenced this high degree of reliability in the item performance estimates in a standard-setting study are discussed.
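
As an illustration of how item performance estimates can feed into a minimum passing score, the sketch below follows one common Angoff-style computation: each judge's item estimates are summed into that judge's recommended score, and the panel's MPS is taken as the mean across judges. The abstract does not specify this exact aggregation, so the function name `angoff_cut_score`, the averaging rule, and the ratings are assumptions made for illustration only.

```python
from statistics import mean


def angoff_cut_score(ratings):
    """Return a panel-level minimum passing score (MPS).

    ratings[j][i] is judge j's estimate of the probability that a
    minimally competent candidate answers item i correctly.
    Assumption: the panel MPS is the mean of the judges' summed estimates.
    """
    # Each judge's recommended raw cut score is the sum of their item estimates.
    judge_totals = [sum(judge_ratings) for judge_ratings in ratings]
    # The panel recommendation averages the judges' totals.
    return mean(judge_totals)


# Hypothetical ratings from two judges on a four-item test.
example_ratings = [
    [0.5, 0.75, 0.25, 1.0],  # judge 1 total: 2.5
    [0.5, 0.5, 0.5, 0.5],    # judge 2 total: 2.0
]
mps = angoff_cut_score(example_ratings)
print(mps)
```

Because the MPS is built directly from the item estimates, any systematic error in those estimates carries straight through to the passing score, which is why their consistency matters.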

8.
9.
Standard-setting procedures are a key component within many large-scale educational assessment systems. They are consensual approaches in which committees of experts set cut-scores on continuous proficiency scales, which facilitate communication of proficiency distributions of students to a wide variety of stakeholders. This communicative function makes standard-setting studies a key gateway for validity concerns at the intersection of evidentiary and consequential aspects of score interpretations. This short review paper describes the conceptual and empirical basis of validity arguments for standard-setting procedures in light of recent research on validity theory. It specifically demonstrates how procedural and internal evidence for the validity of standard-setting procedures can be collected to form part of the consequential basis of validity evidence for test use.

10.
Angoff-based standard setting is widely used, especially for high-stakes licensure assessments. Nonetheless, some critics have claimed that the judgment task is too cognitively complex for panelists, whereas others have explicitly challenged the consistency in (replicability of) standard-setting outcomes. Evidence of consistency in item judgments and passing scores is necessary to justify using the passing scores for consequential decisions. Few studies, however, have directly evaluated consistency across different standard-setting panels. The purpose of this study was to investigate consistency of Angoff-based standard-setting judgments and passing scores across 9 different educator licensure assessments. Two independent, multistate panels of educators were formed to recommend the passing score for each assessment, with each panel engaging in 2 rounds of judgments. Multiple measures of consistency were applied to each round of judgments. The results provide positive evidence of the consistency in judgments and passing scores.

11.
In this digital ITEMS module, Dr. Michael Bunch provides an in-depth, step-by-step look at how standard setting is done. It does not focus on any specific procedure or methodology (e.g., modified Angoff, bookmark, and body of work) but on the practical tasks that must be completed for any standard setting activity. Dr. Bunch carries the participant through every stage of the standard setting process, from developing a plan, through preparations for standard setting, conducting standard setting, and all the follow-up activities that must occur after standard setting in order to obtain the approval of cut scores and translate those cut scores into score reports. The digital module includes a 120-page manual, various ancillary files (e.g., PowerPoint slides, Excel workbooks, sample documents, and forms), links to datasets from the book Standard Setting (Cizek & Bunch, 2007), links to final reports from four recent large-scale standard setting events, quiz questions with formative feedback, and a glossary.

12.
This paper reports two studies of standard setting using Angoff's method. Results of the first study suggest that specialization within broad content areas does not affect an expert's estimates of the performance of the borderline group. This is reassuring because the knowledge base of many professions is so large that no individual can be considered an expert in all aspects of it. Results of the second study support the recommendation that performance data be provided during the standard-setting process. These data are frequently used by experts, but will not have an impact on the standard unless the distribution of item difficulties is skewed markedly. Providing them also increases the correspondence between p-values and estimates of borderline group performance, thereby reducing errors in pass/fail decisions. Overall, the results support recommendations often made in standard-setting literature, but they need to be replicated with other groups of experts.

13.
This article explores the challenge of setting performance standards in a non-Western context. The study is centered on standard-setting practice in the national learning assessments of Trinidad and Tobago. Quantitative and qualitative data from annual evaluations between 2005 and 2009 were compiled, analyzed, and deconstructed. In the mixed methods research design, data were integrated under an evaluation framework for validating performance standards. The quantitative data included panelists’ judgments across standard-setting rounds and methods. The qualitative data included both retrospective comments from open-ended surveys and real-time data from reflective diaries. Findings for procedural and internal validity were mixed, but the evidence for external validity suggested that the final outcomes were reasonable and defensible. Nevertheless, the real-time qualitative data from the reflective diaries highlighted several cognitive challenges experienced by panelists that may have impinged on procedural and internal validity. Additional unique hindrances were lack of resources and wide variation in achievement scores. Ensuring a sustainable system of performance standards requires attention to these deficits.

14.
Setting motor performance standards has long been a process of interest to physical educators. Theoretical advances in the measurement technology appropriate for standard-setting, however, have occurred only in the last decade. The first portion of this paper is devoted to a discussion of issues in setting standards and a brief review of procedures for standard-setting. In the latter section, gender differences in motor performance are examined and the impact of these differences on standard-setting is considered.

15.
《Educational Assessment》2013,18(2):129-153
States are increasingly using test scores as part of the requirements for high school graduation or certification. In these circumstances, a battery of tests or, with writing, analytic traits are considered that usually cover different aspects of the state's content standards. Because pass or fail decisions are made affecting students' futures, the validity of standard-setting procedures and strategies is a major concern. Policymakers and legislators must decide which of these 2 standard-setting strategies to use for making pass or fail decisions for students seeking certification or for meeting a high school graduation requirement. The compensatory strategy focuses on total performance, summing scores across all tests in the battery. The conjunctive strategy requires passing performance for each test in the battery. This article reviews and evaluates compensatory and conjunctive standard-setting strategies. The rationales for each type are presented and discussed. Results from a study comparing the compensatory and conjunctive strategies for a state high school certification writing test provide insight into the problem of choosing either strategy. This article concludes with a set of recommendations for those who must decide which type of standard-setting strategy to use.
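
The difference between the two strategies can be made concrete with a minimal sketch. The trait names, scores, and cut scores below are hypothetical and are not taken from the study; the functions `compensatory_pass` and `conjunctive_pass` are illustrative names only.

```python
def compensatory_pass(scores, total_cut):
    """Pass if the summed score across all tests meets the total cut score."""
    return sum(scores.values()) >= total_cut


def conjunctive_pass(scores, cuts):
    """Pass only if every test meets its own cut score."""
    return all(scores[test] >= cut for test, cut in cuts.items())


# Hypothetical analytic-trait scores for one student on a writing test.
student = {"ideas": 4, "organization": 2, "conventions": 4}
# Hypothetical per-trait cut scores and a total cut score.
trait_cuts = {"ideas": 3, "organization": 3, "conventions": 3}
total_cut = 9

print(compensatory_pass(student, total_cut))  # strengths offset the weak trait
print(conjunctive_pass(student, trait_cuts))  # the weak trait alone causes failure
```

In this example the same student passes under the compensatory rule (total of 10 against a cut of 9) and fails under the conjunctive rule (organization is below its cut), which is the kind of disagreement decision makers have to weigh.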

16.
A common belief is that the Bookmark method is a cognitively simpler standard-setting method than the modified Angoff method. However, a limited amount of research has investigated panelists' ability to perform the Bookmark method well, and whether some of the challenges panelists face with the Angoff method may also be present in the Bookmark method. This article presents results from three experiments where panelists were asked to give Bookmark-type ratings to separate items into groups based on item difficulty data. Results of the experiments showed, consistent with results often observed with the Angoff method, that panelists typically and paradoxically perceived hard items to be too easy and easy items to be too hard. These perceptions were reflected in panelists often placing their Bookmarks too early for hard items and often placing their Bookmarks too late for easy items. The article concludes with a discussion of what these results imply for educators and policymakers using the Bookmark standard-setting method.

17.
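Setting standards for examinations is a systematic undertaking. This paper introduces six steps for selecting a cut point and four commonly used methods. The six steps are: determining the type of standard, determining the standard-setting method, selecting judges, holding the standard-setting meeting, calculating the standard, and deciding on follow-up work. The four commonly used methods are relative methods, absolute methods based on judgments of test items, absolute methods based on judgments of individual examinees, and compromise methods.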

18.
In test-centered standard-setting methods, borderline performance can be represented by many different profiles of strengths and weaknesses. As a result, asking panelists to estimate item or test performance for a hypothetical group study of borderline examinees, or a typical borderline examinee, may be an extremely difficult task and one that can lead to questionable results in setting cut scores. In this study, data collected from a previous standard-setting study are used to deduce panelists’ conceptions of profiles of borderline performance. These profiles are then used to predict cut scores on a test of algebra readiness. The results indicate that these profiles can predict a very wide range of cut scores both within and between panelists. Modifications are proposed to existing training procedures for test-centered methods that can account for the variation in borderline profiles.

19.
Setting performance standards is a judgmental process involving human opinions and values as well as technical and empirical considerations. Although all cut score decisions are by nature somewhat arbitrary, they should not be capricious. Judges selected for standard‐setting panels should have the proper qualifications to make the judgments asked of them; however, even qualified judges vary in expertise and in some cases, such as highly specialized areas or when members of the public are involved, it may be difficult to ensure that each member of a standard‐setting panel has the requisite expertise to make qualified judgments. Given the subjective nature of these types of judgments, and that a large part of the validity argument for an exam lies in the robustness of its passing standard, an examination of the influence of judge proficiency on the judgments is warranted. This study explores the use of the many‐facet Rasch model as a method for adjusting modified Angoff standard‐setting ratings based on judges’ proficiency levels. The results suggest differences in the severity and quality of standard‐setting judgments across levels of judge proficiency, such that judges who answered easy items incorrectly tended to perceive them as easier, but those who answered correctly tended to provide ratings within normal stochastic limits.

20.
Cut scores, estimated using the Angoff procedure, are routinely used to make high-stakes classification decisions based on examinee scores. Precision is necessary in estimation of cut scores because of the importance of these decisions. Although much has been written about how these procedures should be implemented, there is relatively little literature providing empirical support for specific approaches to providing training and feedback to standard-setting judges. This article presents a multivariate generalizability analysis designed to examine the impact of training and feedback on various sources of error in estimation of cut scores for a standard-setting procedure in which multiple independent groups completed the judgments. The results indicate that after training, there was little improvement in the ability of judges to rank order items by difficulty but there was a substantial improvement in inter-judge consistency in centering ratings. The results also show a substantial group effect. Consistent with this result, the direction of change for the estimated cut score was shown to be group dependent.
