1.

Procedures for Selecting Items for Computerized Adaptive Tests

《Applied Measurement in Education》2013,26(4):359-375

Many procedures have been developed for selecting the "best" items for a computerized adaptive test. There is a trend toward the use of adaptive testing in applied settings such as licensure tests, program entrance tests, and educational tests. It is useful to consider procedures for item selection and the special needs of applied testing settings to facilitate test design. The current study reviews several classical approaches and alternative approaches to item selection and discusses their relative merit. This study also describes procedures for constrained computerized adaptive testing (C-CAT) that may be added to classical item selection approaches to allow them to be used for applied testing, while maintaining the high measurement precision and short test length that made adaptive testing attractive to practitioners initially.
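As an illustration of one classical item selection approach, the sketch below picks the unused item with the highest Fisher information at the current ability estimate. It assumes a two-parameter logistic (2PL) model; the abstract does not say which procedures or models the study covers, and the pool layout and the names `p_correct`, `fisher_information`, and `select_item` are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def fisher_information(theta, a, b):
    """Item information under the 2PL model: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_item(theta, pool, administered):
    """Return the id of the unused item with maximum information at theta.

    pool maps a hypothetical item id to (discrimination a, difficulty b).
    """
    candidates = [item for item in pool if item not in administered]
    return max(candidates, key=lambda item: fisher_information(theta, *pool[item]))

# Hypothetical three-item pool, for illustration only.
pool = {"item_a": (1.0, -1.0), "item_b": (1.5, 0.0), "item_c": (0.8, 1.2)}
next_item = select_item(0.0, pool, administered=set())
```

A constrained variant would filter `candidates` by content or exposure rules before taking the maximum.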

2.

Item Review and the Rearrangement Procedure: Its process and its results

Elena C Papanastasiou 《Educational Research and Evaluation》2013,19(4):303-321

Permitting item review is to the benefit of the examinees who typically increase their test scores with item review. However, testing companies do not prefer item review since it does not follow the logic on which adaptive tests are based, and since it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive tests. This study attempts to provide a solution that would allow examinees to revise their answers, without jeopardizing the quality and efficiency of the test. The purpose of this study is to test the efficiency of a “rearrangement procedure” that rearranges and skips certain items in order to better estimate the examinees' abilities, without allowing them to cheat on the test. This was examined through a simulation study. The results show that the rearrangement procedure is effective in reducing the standard error of the Bayesian ability estimates and in increasing the reliability of the same estimates.

3.

APPLICATION OF COMPUTERIZED ADAPTIVE TESTING TO EDUCATIONAL PROBLEMS (total citations: 1; self-citations: 0; citations by others: 1)

DAVID J. WEISS G. GAGE KINGSBURY 《Journal of Educational Measurement》1984,21(4):361-375

Three applications of computerized adaptive testing (CAT) to help solve problems encountered in educational settings are described and discussed. Each of these applications makes use of item response theory to select test questions from an item pool to estimate a student's achievement level and its precision. These estimates may then be used in conjunction with certain testing strategies to facilitate certain educational decisions. The three applications considered are (a) adaptive mastery testing for determining whether or not a student has mastered a particular content area, (b) adaptive grading for assigning grades to students, and (c) adaptive self-referenced testing for estimating change in a student's achievement level. Differences between currently used classroom procedures and these CAT procedures are discussed. For the adaptive mastery testing procedure, evidence from a series of studies comparing conventional and adaptive testing procedures is presented showing that the adaptive procedure results in more accurate mastery classifications than do conventional mastery tests, while using fewer test questions.

4.

An NCME Instructional Module on Multistage Testing

Amy Hendrickson 《Educational Measurement》2007,26(2):44-52

Multistage tests are those in which sets of items are administered adaptively and are scored as a unit. These tests have all of the advantages of adaptive testing, with more efficient and precise measurement across the proficiency scale as well as time savings, without many of the disadvantages of an item-level adaptive test. As a seemingly balanced compromise between linear paper-and-pencil and item-level adaptive tests, development and use of multistage tests is increasing. This module describes multistage tests, including two-stage and testlet-based tests, and discusses the relative advantages and disadvantages of multistage testing as well as considerations and steps in creating such tests.

5.

Computerized Adaptive and Fixed-Item Testing of Music Listening Skill: A Comparison of Efficiency, Precision, and Concurrent Validity

Walter P. Vispoel Tianyou Wang Timothy Bleiler 《Journal of Educational Measurement》1997,34(1):43-63

We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fixed-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fixed-item tonal memory test constructed from items in the adaptive test pool, and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fixed-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fixed-item tests, and it yielded higher levels of reliability and concurrent validity than the fixed-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items.

6.

A New Stopping Rule for Computerized Adaptive Testing

Choi SW Grady MW Dodd BG 《Educational and Psychological Measurement》2010,70(6):1-17

The goal of the current study was to introduce a new stopping rule for computerized adaptive testing. The predicted standard error reduction stopping rule (PSER) uses the predictive posterior variance to determine the reduction in standard error that would result from the administration of additional items. The performance of the PSER was compared to that of the minimum standard error stopping rule and a modified version of the minimum information stopping rule in a series of simulated adaptive tests, drawn from a number of item pools. Results indicate that the PSER makes efficient use of CAT item pools, administering fewer items when predictive gains in information are small and increasing measurement precision when information is abundant.
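For orientation, the baseline minimum standard error rule mentioned in the abstract can be sketched as follows. This is not the PSER itself, whose details the abstract does not give. It assumes the standard error is approximated by the inverse square root of the accumulated test information; the threshold `se_target`, the cap `max_items`, and the function names are hypothetical.

```python
import math

def standard_error(informations):
    """Approximate SE of the ability estimate as 1 / sqrt(total information)."""
    total = sum(informations)
    return math.inf if total <= 0 else 1.0 / math.sqrt(total)

def should_stop(informations, se_target=0.3, max_items=30):
    """Stop once the SE target is met or the (assumed) item cap is reached.

    informations holds the information of each item administered so far,
    evaluated at the current ability estimate.
    """
    if len(informations) >= max_items:
        return True
    return standard_error(informations) <= se_target
```

A predictive rule such as the PSER would additionally ask how much the standard error is expected to shrink if one more item were given, and stop when that gain is small.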

7.

A Comparison of Procedures for Content-Sensitive Item Selection in Computerized Adaptive Tests

《Applied Measurement in Education》2013,26(3):241-261

This simulation study compared two procedures to enable an adaptive test to select items in correspondence with a content blueprint. Trait level estimates obtained from testlet-based and constrained adaptive tests administered to 10,000 simulated examinees under two trait distributions and three item pool sizes were compared to the trait level estimates obtained from traditional adaptive tests in terms of mean absolute error, bias, and information. Results indicate that using constrained adaptive testing requires an increase of 5% to 11% in test length over the traditional adaptive test to reach the same error level, and using testlets requires an increase of 43% to 104% in test length over the traditional adaptive test. Given these results, the use of constrained computerized adaptive testing is recommended for situations in which an adaptive test must adhere to particular content specifications.

8.

Constructing Rotating Item Pools for Constrained Adaptive Testing

Adelaide Ariel Bernard P. Veldkamp Wim J. van der Linden 《Journal of Educational Measurement》2004,41(4):345-359

Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods.

9.

A Conditional Exposure Control Method for Multidimensional Adaptive Testing

Matthew Finkelman Michael L. Nering Louis A. Roussos 《Journal of Educational Measurement》2009,46(1):84-103

In computerized adaptive testing (CAT), ensuring the security of test items is a crucial practical consideration. A common approach to reducing item theft is to define maximum item exposure rates, i.e., to limit the proportion of examinees to whom a given item can be administered. Numerous methods for controlling exposure rates have been proposed for tests employing the unidimensional 3-PL model. The present article explores the issues associated with controlling exposure rates when a multidimensional item response theory (MIRT) model is utilized and exposure rates must be controlled conditional upon ability. This situation is complicated by the exponentially increasing number of possible ability values in multiple dimensions. The article introduces a new procedure, called the generalized Stocking-Lewis method, that controls the exposure rate for students of comparable ability as well as with respect to the overall population. A realistic simulation set compares the new method with three other approaches: Kullback-Leibler information with no exposure control, Kullback-Leibler information with unconditional Sympson-Hetter exposure control, and random item selection.
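The unconditional Sympson-Hetter idea named above can be sketched roughly as a probabilistic filter: each item carries an exposure control parameter, and a selected item is administered only with that probability, otherwise the next-best item is tried. This is a simplified illustration, not the generalized Stocking-Lewis method the article introduces; the function name, the `exposure_params` mapping, and the injectable `rng` are hypothetical, and the calibration of the parameters through simulation is omitted.

```python
import random

def administer_with_exposure_control(ranked_items, exposure_params, rng=random.random):
    """Walk items from most to least informative; pass each through a
    probabilistic filter and return the first one that is accepted.

    exposure_params maps item id -> acceptance probability in [0, 1];
    items without an entry are always accepted (assumed default of 1.0).
    Returns None if every item is rejected.
    """
    for item in ranked_items:
        if rng() < exposure_params.get(item, 1.0):
            return item
    return None

# Hypothetical values: "a" is the most informative item but heavily restricted.
chosen = administer_with_exposure_control(
    ["a", "b"], {"a": 0.2, "b": 0.9}, rng=lambda: 0.5
)
```

Passing `rng` as a parameter keeps the sketch deterministic under test; a conditional method would look up the acceptance probability by ability level as well as by item.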

10.

Psychometric aspects of pupil monitoring systems

Cees A.W. Glas Hanneke Geerlings 《Studies in Educational Evaluation》2009,35(2-3):83-88

Pupil monitoring systems support the teacher in tailoring teaching to the individual level of a student and in comparing the progress and results of teaching with national standards. The systems are based on the availability of an item bank calibrated using item response theory. The assessment of the students’ progress and results can be further supported by using computerized adaptive testing where the items selected from the item bank are targeted at the specific ability level of the student. The present article discusses psychometric issues of pupil monitoring systems, such as ability estimation, the optimal construction of tests from the item bank and monitoring of progress.

11.

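Design and Implementation of a Computerized Adaptive Testing System for Multiple Types of Terminals

Lu Peng, Zhou Dongdai, Zhong Shaochun, Cong Xiao 《Modern Educational Technology》2012,22(6):88-92

In recent years, advances in information technology have led to rapid growth in the use of computerized adaptive testing for assessment; in addition, the availability of mobile technology offers new avenues for assessment. This article designs and develops an adaptive testing system for multiple types of terminals. The item selection process takes full account of problems in existing algorithms, such as high exposure rates for some items, low item bank utilization, and content balancing, and the item selection engine was redesigned accordingly. The system can support formative assessment, summative assessment, and self-assessment.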

12.

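Some New Trends in the Development of Ability Measurement

Qi Shuqing 《Journal of Jiangxi Normal University (Philosophy and Social Sciences Edition)》2005,38(5):107-109

Driven by cognitive psychology, the exploration of modern measurement models, and information technology, ability measurement in the 21st century shows new trends such as continuous test revision, computerized adaptive testing, intelligent item generation, and dynamic assessment integrated with teaching. These trends open a new chapter both in the technical innovation of ability measurement and in its influence on education and social life.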

13.

An Efficiency Balanced Information Criterion for Item Selection in Computerized Adaptive Testing

Kyung T. Han 《Journal of Educational Measurement》2012,49(3):225-246

Successful administration of computerized adaptive testing (CAT) programs in educational settings requires that test security and item exposure control issues be taken seriously. Developing an item selection algorithm that strikes the right balance between test precision and level of item pool utilization is the key to successful implementation and long‐term quality control of CAT. This study proposed a new item selection method using the “efficiency balanced information” criterion to address issues with the maximum Fisher information method and stratification methods. According to the simulation results, the new efficiency balanced information method had desirable advantages over the other studied item selection methods in terms of improving the optimality of CAT assembly and utilizing items with low a‐values while eliminating the need for item pool stratification.

14.

Controlling Bias in Both Constructed Response and Multiple‐Choice Items When Analyzed With the Dichotomous Rasch Model

David Andrich Ida Marais 《Journal of Educational Measurement》2018,55(2):281-307

Even though guessing biases difficulty estimates as a function of item difficulty in the dichotomous Rasch model, assessment programs with tests which include multiple‐choice items often construct scales using this model. Research has shown that when all items are multiple‐choice, this bias can largely be eliminated. However, many assessments have a combination of multiple‐choice and constructed response items. Using vertically scaled numeracy assessments from a large‐scale assessment program, this article shows that eliminating the bias on estimates of the multiple‐choice items also impacts on the difficulty estimates of the constructed response items. This implies that the original estimates of the constructed response items were biased by the guessing on the multiple‐choice items. This bias has implications for both defining difficulties in item banks for use in adaptive testing composed of both multiple‐choice and constructed response items, and for the construction of proficiency scales.

15.

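Research on a Computerized Adaptive Examination System

Lü Lan 《Journal of Jincheng Vocational and Technical College》2013,6(4):56-59

This article combines expert judgment with item response theory to design a concise, practical method for determining item difficulty in a computerized adaptive examination system. It also analyzes in detail the choice of starting and stopping points for the test, the item selection strategy, and the ability estimation method. Finally, it walks through an example of the steps in an adaptive test. The system selects items at random according to each examinee's ability, which shortens the test and improves examination efficiency compared with traditional online examination systems.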

16.

Automatic Item Generation: A More Efficient Process for Developing Mathematics Achievement Items?

Susan E. Embretson Neal M. Kingston 《Journal of Educational Measurement》2018,55(1):112-131

The continual supply of new items is crucial to maintaining quality for many tests. Automatic item generation (AIG) has the potential to rapidly increase the number of items that are available. However, the efficiency of AIG will be mitigated if the generated items must be submitted to traditional, time‐consuming review processes. In two studies, generated mathematics achievement items were subjected to multiple stages of qualitative review for measuring the intended skills, followed by empirical tryout in operational testing. High rates of success were found. Further, items generated from the same item structure had predictable psychometric properties. Thus, the feasibility of a more limited and expedient review process was supported. Additionally, positive results were obtained on measuring the same skills from item structures with reduced cognitive complexity.

17.

Some Practical Examples of Computer-Adaptive Sequential Testing

Richard M. Luecht Ronald J. Nungester 《Journal of Educational Measurement》1998,35(3):229-249

Computerized testing has created new challenges for the production and administration of test forms. Many testing organizations engaged in or considering computerized testing may find themselves changing from well-established procedures for handcrafting a small number of paper-and-pencil test forms to procedures for mass producing many computerized test forms. This paper describes an integrated approach to test development and administration called computer-adaptive sequential testing, or CAST. CAST is a structured approach to test construction which combines adaptive testing methods with automated test assembly to allow test developers to maintain a greater degree of control over the production, quality assurance, and administration of different types of computerized tests. CAST retains much of the efficiency of traditional computer adaptive testing (CAT) and can be modified for computer mastery testing (CMT) applications. The CAST framework is described in detail and several applications are demonstrated using a medical licensure example.

18.

An Automated Item Pool Assembly Framework for Maximizing Item Utilization for CAT

Hwanggyu Lim Kyung T. Han 《Educational Measurement》2024,43(1):39-51

Computerized adaptive testing (CAT) has gained deserved popularity in the administration of educational and professional assessments, but continues to face test security challenges. To ensure sustained quality assurance and testing integrity, it is imperative to establish and maintain multiple stable item pools that are consistent in terms of psychometric characteristics and content specifications. This study introduces the Honeycomb Pool Assembly (HPA) framework, an innovative solution for the construction of multiple parallel item pools for CAT that maximizes item utilization in the item bank. The HPA framework comprises two stages—cell assembly and pool assembly—and uses a mixed integer programming modeling approach. An empirical study demonstrated HPA's effectiveness in creating a large number of parallel pools using a real-world high-stakes CAT assessment item bank. The HPA framework offers several advantages, including (a) simultaneous creation of multiple parallel pools, (b) simplification of item pool maintenance, and (c) flexibility in establishing statistical and operational constraints. Moreover, it can help testing organizations efficiently manage and monitor the health of their item banks. Thus, the HPA framework is expected to be a valuable tool for testing professionals and organizations to address test security challenges and maintain the integrity of high-stakes CAT assessments.

19.

Restrictive Stochastic Item Selection Methods in Cognitive Diagnostic Computerized Adaptive Testing

Chun Wang Hua‐Hua Chang Alan Huebner 《Journal of Educational Measurement》2011,48(3):255-273

This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior weighted Kullback‐Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson‐Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.

20.

Using automatic item generation to meet the increasing item demands of high-stakes educational and occupational assessment

Martin E. Arendasy Markus Sommer 《Learning and Individual Differences》2012,22(1):112-117
