
1.

Cognitive Diagnostic Multistage Testing by Partitioning Hierarchically Structured Attributes

Rae Yeong Kim Yun Joo Yoo 《Journal of Educational Measurement》2023,60(1):126-147

In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes a cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH) as a multistage testing design for CDMs. In CD-MST-PH, multiple testlets can be constructed based on separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or the on-the-fly approach. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of a simulation study show that CD-MST-PH can improve test accuracy and efficiency relative to the conventional test without adaptive stages.

2.

Differences Between Self-Adapted and Computerized Adaptive Tests: A Meta-Analysis

Angela K. Pitkin Walter P. Vispoel 《Journal of Educational Measurement》2001,38(3):235-247

Self-adapted testing has been described as a variation of computerized adaptive testing that reduces test anxiety and thereby enhances test performance. The purpose of this study was to gain a better understanding of these proposed effects of self-adapted tests (SATs); meta-analysis procedures were used to estimate differences between SATs and computerized adaptive tests (CATs) in proficiency estimates and post-test anxiety levels across studies in which these two types of tests have been compared. After controlling for measurement error, the results showed that SATs yielded proficiency estimates that were 0.12 standard deviation units higher and post-test anxiety levels that were 0.19 standard deviation units lower than those yielded by CATs. We speculate about possible reasons for these differences and discuss advantages and disadvantages of using SATs in operational settings.

3.

Longitudinal Multistage Testing

Steffi Pohl 《Journal of Educational Measurement》2013,50(4):447-468

This article introduces longitudinal multistage testing (lMST), a special form of multistage testing (MST), as a method for adaptive testing in longitudinal large‐scale studies. In lMST designs, test forms of different difficulty levels are used, whereas the values on a pretest determine the routing to these test forms. Since lMST allows for testing in paper and pencil mode, lMST may represent an alternative to conventional testing (CT) in assessments for which other adaptive testing designs are not applicable. In this article the performance of lMST is compared to CT in terms of test targeting as well as bias and efficiency of ability and change estimates. Using a simulation study, the effect of the stability of ability across waves, the difficulty level of the different test forms, and the number of link items between the test forms were investigated.

4.

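Separating Teaching from Examination: The Inevitable Trend of University Examination Reform Total citations: 1, self-citations: 0, citations by others: 1

朱军 杨万清 代晶 《成都教育学院学报》2007,21(11):20-20,75

Starting from the drawbacks of the traditional practice in which the same teacher both teaches and examines, this article analyzes the advantages of separating teaching from examination, points out its shortcomings, and argues that this separation is the inevitable trend in the reform of university examinations in China.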

5.

Comprehension tests in mathematics

Jurie Conradie John Frith 《Educational Studies in Mathematics》2000,42(3):225-235

An alternative way for testing a student's understanding of theory in a tertiary mathematics course is presented. Two sample questions are provided and advantages and disadvantages of the method are discussed. We argue that the method is an acceptable and flexible means of testing students, and can be adapted to be used in other contexts as well. This revised version was published online in July 2006 with corrections to the Cover Date.

6.

It Matters: Reference Indicator Selection in Measurement Invariance Tests

Yutian T. Thompson Hairong Song Dexin Shi Zhengkui Liu 《Educational and psychological measurement》2021,81(1):5

Conventional approaches for selecting a reference indicator (RI) could lead to misleading results in testing for measurement invariance (MI). Several newer quantitative methods have been available for more rigorous RI selection. However, it is still unknown how well these methods perform in terms of correctly identifying a truly invariant item to be an RI. Thus, Study 1 was designed to address this issue in various conditions using simulated data. As a follow-up, Study 2 further investigated the advantages/disadvantages of using RI-based approaches for MI testing in comparison with non-RI-based approaches. Altogether, the two studies provided a solid examination on how RI matters in MI tests. In addition, a large sample of real-world data was used to empirically compare the uses of the RI selection methods as well as the RI-based and non-RI-based approaches for MI testing. In the end, we offered a discussion on all these methods, followed by suggestions and recommendations for applied researchers.

7.

The Role of Referent Indicators in Tests of Measurement Invariance

Emily C. Johnson Adam W. Meade Amy M. DuVernet 《Structural equation modeling》2013,20(4):642-657

Confirmatory factor analytic tests of measurement invariance (MI) require a referent indicator (RI) for model identification. Although the assumption that the RI is perfectly invariant across groups is acknowledged as problematic, the literature provides relatively little guidance for researchers to identify the conditions under which the practice is appropriate. Using simulated data, this study examined the effects of RI selection on both scale- and item-level MI tests. Results indicated that while inappropriate RI selection has little effect on the accuracy of conclusions drawn from scale-level tests of metric invariance, poor RI choice can produce very misleading results for item-level tests. As a result, group comparisons under conditions of partial invariance are highly susceptible to problems associated with poor RI choice.

8.

Hybrid Computerized Adaptive Testing: From Group Sequential Design to Fully Sequential Design

Shiyu Wang Haiyan Lin Hua‐Hua Chang Jeff Douglas 《Journal of Educational Measurement》2016,53(1):45-62

Computerized adaptive testing (CAT) and multistage testing (MST) have become two of the most popular modes in large‐scale computer‐based sequential testing. Though most designs of CAT and MST exhibit strength and weakness in recent large‐scale implementations, there is no simple answer to the question of which design is better because different modes may fit different practical situations. This article proposes a hybrid adaptive framework to combine both CAT and MST, inspired by an analysis of the history of CAT and MST. The proposed procedure is a design which transitions from a group sequential design to a fully sequential design. This allows for the robustness of MST in early stages, but also shares the advantages of CAT in later stages with fine tuning of the ability estimator once its neighborhood has been identified. Simulation results showed that hybrid designs following our proposed principles provided comparable or even better estimation accuracy and efficiency than standard CAT and MST designs, especially for examinees at the two ends of the ability range.

9.

Computerized Adaptive and Fixed-Item Testing of Music Listening Skill: A Comparison of Efficiency, Precision, and Concurrent Validity

Walter P. Vispoel Tianyou Wang Timothy Bleiler 《Journal of Educational Measurement》1997,34(1):43-63

We evaluated the efficiency, precision, and concurrent validity of results obtained from adaptive and fixed-item music listening tests in three studies: (a) a computer simulation study in which each of 2,200 simulees completed a computerized adaptive tonal memory test, a computerized fixed-item tonal memory test constructed from items in the adaptive test pool and two standardized group-administered tonal memory tests; (b) a live testing study in which each of 204 examinees took the computerized adaptive test and the standardized tests; and (c) a live testing study in which randomly equivalent groups took either the computerized adaptive test (n = 86) or the computerized fixed-item test (n = 86). The adaptive music test required 50% to 93% fewer items to match the reliability and concurrent validity of the fixed-item tests, and it yielded higher levels of reliability and concurrent validity than the fixed-item tests when test length was held constant. These findings suggest that computerized adaptive tests, which typically have been limited to visually produced items, may also be well suited for measuring skills that require aurally produced items.

10.

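A Brief Discussion of the Effect of Reading Test Methods on Test Results

陈晓颖 《语文学刊:高等教育版》2008,(8):158-163

Testing is not only an important means of evaluating teaching outcomes but also a source of feedback for future English teaching, so research on English testing is necessary. Taking English reading tests as its object of study, this paper examines how two test formats, multiple-choice questions and short-answer questions, affect the results of English reading comprehension tests. The participants were 112 first-year non-English majors. In addition to administering two tests with different item formats, the study used a questionnaire to survey how participants approached items of each format and their everyday English reading habits. Together, these two methods establish the respective advantages and disadvantages of the two test formats and their effects on reading comprehension test results. The study is intended to offer guidance for future college English reading instruction.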

11.

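A Discussion of Design Patterns for Self-Adaptive Software

金建刚; 包晓安 《乐山师范学院学报》2014,(5):28-32

Based on the requirements of mobile terminal applications and set against the demands of self-adaptive software development, this paper compares the research and development results of many scholars on self-adaptive software, analyzes their advantages and disadvantages, surveys the development trend of design patterns for self-adaptive software, and proposes a new, feasible design method for self-adaptive software development based on combining design patterns.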

12.

Procedures for Selecting Items for Computerized Adaptive Tests

《教育实用测度》2013,26(4):359-375

Many procedures have been developed for selecting the "best" items for a computerized adaptive test. There is a trend toward the use of adaptive testing in applied settings such as licensure tests, program entrance tests, and educational tests. It is useful to consider procedures for item selection and the special needs of applied testing settings to facilitate test design. The current study reviews several classical approaches and alternative approaches to item selection and discusses their relative merit. This study also describes procedures for constrained computerized adaptive testing (C-CAT) that may be added to classical item selection approaches to allow them to be used for applied testing, while maintaining the high measurement precision and short test length that made adaptive testing attractive to practitioners initially.
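
As a rough illustration of what a classical item selection rule can look like, the sketch below picks the unused item with the largest Fisher information at the current ability estimate. The abstract does not say which procedures the study reviews, so the Rasch model, the function names, and the example item pool are all assumptions made here for illustration.

```python
import math


def rasch_prob(theta, b):
    """Probability of a correct response under the Rasch model (assumed here)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_information(theta, b):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)


def select_next_item(theta, difficulties, administered):
    """Return the index of the not-yet-administered item with maximum information.

    `difficulties` is a hypothetical item pool (one difficulty per item);
    `administered` is the set of item indices already given to the examinee.
    """
    candidates = [i for i in range(len(difficulties)) if i not in administered]
    if not candidates:
        raise ValueError("item pool exhausted")
    return max(candidates, key=lambda i: item_information(theta, difficulties[i]))


# Hypothetical pool of five item difficulties; item 2 has already been used.
pool = [-2.0, -0.5, 0.1, 1.2, 2.5]
next_item = select_next_item(theta=0.0, difficulties=pool, administered={2})
print(next_item)
```

Under the Rasch model this rule reduces to choosing the remaining item whose difficulty is closest to the current ability estimate. A constrained procedure would additionally filter the candidate list, for example by content area or exposure rate, before taking the maximum.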

13.

A Comparative Study of Item Exposure Control Methods in Computerized Adaptive Testing

Shun-Wen Chang Timothy N. Ansley 《Journal of Educational Measurement》2003,40(1):71-103

This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but no one possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed.

14.

Efficiency of Targeted Multistage Calibration Designs Under Practical Constraints: A Simulation Study

Stéphanie Berger Angela J. Verschoor Theo J. H. M. Eggen Urs Moser 《Journal of Educational Measurement》2019,56(1):121-146

Calibration of an item bank for computer adaptive testing requires substantial resources. In this study, we investigated whether the efficiency of calibration under the Rasch model could be enhanced by improving the match between item difficulty and student ability. We introduced targeted multistage calibration designs, a design type that considers ability‐related background variables and performance for assigning students to suitable items. Furthermore, we investigated whether uncertainty about item difficulty could impair the assembling of efficient designs. The results indicated that targeted multistage calibration designs were more efficient than ordinary targeted designs under optimal conditions. Limited knowledge about item difficulty reduced the efficiency of one of the two investigated targeted multistage calibration designs, whereas targeted designs were more robust.

15.

APPLICATION OF COMPUTERIZED ADAPTIVE TESTING TO EDUCATIONAL PROBLEMS Total citations: 1, self-citations: 0, citations by others: 1

DAVID J. WEISS G. GAGE KINGSBURY 《Journal of Educational Measurement》1984,21(4):361-375

Three applications of computerized adaptive testing (CAT) to help solve problems encountered in educational settings are described and discussed. Each of these applications makes use of item response theory to select test questions from an item pool to estimate a student's achievement level and its precision. These estimates may then be used in conjunction with certain testing strategies to facilitate certain educational decisions. The three applications considered are (a) adaptive mastery testing for determining whether or not a student has mastered a particular content area, (b) adaptive grading for assigning grades to students, and (c) adaptive self-referenced testing for estimating change in a student's achievement level. Differences between currently used classroom procedures and these CAT procedures are discussed. For the adaptive mastery testing procedure, evidence from a series of studies comparing conventional and adaptive testing procedures is presented showing that the adaptive procedure results in more accurate mastery classifications than do conventional mastery tests, while using fewer test questions.

16.

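Improvement of Friction Coefficient Test Fixtures on the MTS810

YANG Qi-quan ZOU Ding-qiang TIAN Chang-hai 《实验室研究与探索》2007,(10)

Four fixtures for measuring the friction coefficient on the MTS810 are briefly introduced, and their respective advantages and disadvantages are compared and analyzed.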

17.

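Some Thoughts on the Stack and the Heap in Java

黄珍 刘涛 《九江职业技术学院学报》2009,(1)

The stack and the heap are both places where Java stores data in memory. Unlike C++, Java manages the stack and the heap automatically. Each has its own strengths and weaknesses. How can these strengths and weaknesses be distinguished, and how can the strengths of each be put to good use in programming practice? These questions deserve consideration.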

18.

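Reflections on Normal and Skewed Distributions of Test Scores

丁梦扬 蒋波 《常熟理工学院学报》2008,22(6):85-87

In educational evaluation, the shape of the score distribution is determined by factors such as the purpose, function, and grading standards of the test; students' abilities and knowledge base; the initiative of teachers and students; the size of the student group; and students' study time and efficiency. The advantages and disadvantages of normal and skewed distributions must be evaluated objectively and comprehensively, and the two effectively integrated on that basis, in order to improve the authenticity and practicality of tests, bring about a joint effort by all teachers and students, and promote the common development of all students.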

19.

Critical Problems in Computer-Based Psychological Measurement

《教育实用测度》2013,26(3):223-231

Emerging areas of research, related to computer-based testing, are identified. Computer-based adaptive testing (CAT) poses many problems, including calibrating adaptive tests with their conventional counterparts, content-balancing in item selection, and accommodating multidimensional items. Using the computer to administer tests provides freedom to use many new item types, including tests of short-term memory, spatial memory, perceptual speed and accuracy, and movement judgment. Information processing theory offers a new way of conceptualizing abilities that is not easily reconciled with traditional measurement models.

20.

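Structural Equivalence of the Original and Customized Versions of the Stanford Achievement Reading Comprehension Test

王蜀东 焦红 《考试研究》2006,(4)

This is the first article to explore the structural similarity between the original Stanford Achievement Reading Test (10th edition) and its customized version. The analyses were conducted across grades on several kinds of observed variables (individual items, testlets, and item parcels). The methods mainly comprised linear and nonlinear exploratory and confirmatory factor analyses. The results show that the items within every passage exhibit testlet effects to varying degrees. Among all the models, those with individual items as observed variables fit worst, those with testlets fit second best, and those with item parcels fit best. Of the three levels of structural equivalence (congeneric, tau-equivalent, and parallel), the structures of the original Stanford Achievement Reading Test and its customized version are congenerically equivalent.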