期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Constructing Rotating Item Pools for Constrained Adaptive Testing

Adelaide Ariel Bernard P. Veldkamp Wim J. van der Linden 《Journal of Educational Measurement》2004,41(4):345-359

Preventing items in adaptive testing from being over- or underexposed is one of the main problems in computerized adaptive testing. Though the problem of overexposed items can be solved using a probabilistic item-exposure control method, such methods are unable to deal with the problem of underexposed items. Using a system of rotating item pools, on the other hand, is a method that potentially solves both problems. In this method, a master pool is divided into (possibly overlapping) smaller item pools, which are required to have similar distributions of content and statistical attributes. These pools are rotated among the testing sites to realize desirable exposure rates for the items. A test assembly model, motivated by Gulliksen's matched random subtests method, was explored to help solve the problem of dividing a master pool into a set of smaller pools. Different methods to solve the model are proposed. An item pool from the Law School Admission Test was used to evaluate the performances of computerized adaptive tests from systems of rotating item pools constructed using these methods. 相似文献

2.

A Placement Study in Secondary School Economics

F. G. Macomber 《Journal of Experimental Education》2013,81(4):353-358

Quite often in data reduction, it is more meaningful and economical to select a subset of variables instead of reducing the dimensionality of the variable space with principal components analysis. The authors present a neglected method for variable selection called the BI-method (R. P. Bhargava &; T. Ishizuka, 1981). It is a direct, simple method that uses the same criterion—trace information—used in ordinary regression analysis. The authors begin by discussing the nature and properties of the BI-method and then show how it is different from other existing variable selection methods. Because the BI-method originally was applied to small datasets that had little or no relevance to psychology or education, the authors apply it to large datasets with relevance to the psychological and educational literature. Of particular interest was the application of the BI-method to select a subset of items from a large item pool. Two practical psychometric examples with 49 and 108 items, respectively, showed that item subsets selected with the BI-method reflected the underlying structure of the whole item pool and that the scales based on those item subsets showed good reliability and predictive validity. The appropriateness of this item selection method within the context of the domain-sampling model is discussed. 相似文献

3.

Item Writing and Item Banking by Microcomputer: An Update

Gale H. Roid 《Educational Measurement》1989,8(3):17-20

Computer packages that assist the test developer in writing items, generating tests, and building item banks are critically examined. There appears to be a lack of fully integrated software packages for item writing. Although there are many test generators, they do not really assist the test developer in checking the wording of items. Packages are available, however, for building quality item banks. 相似文献

4.

Development of the human interaction dimension of the Self‐Regulated Learning Questionnaire in asynchronous online learning environments

Moon‐Heum Cho David Jonassen 《教育心理学》2009,29(1):117-138

Two studies focusing on the development and validation of the Online Self‐Regulated Learning Inventory (OSRLI) were conducted. The OSRLI is a self‐report instrument assessing the human interaction dimension of online self‐regulated learning. It consists of an affect/motivation scale and an interaction strategies scale. In Study 1, exploratory factor analysis of an initial affect/motivation item pool yielded four factors: enjoyment of human interaction, self‐efficacy for interaction with instructors, concern for interaction with students, and self‐efficacy for contributing to the online community. Exploratory factor analysis of an initial learning strategies item pool revealed three factors: writing strategies, responding strategies, and reflection strategies. In Study 2, confirmatory factor analysis was conducted in order to evaluate the stability of multidimensional factor structures. These exploratory and confirmatory factor analyses showed the OSRLI to be statistically moderate in terms of reliability and validity. 相似文献

5.

The Stability of IRT b Values

Robert C. Sykes Anne R. Fitzpatrick 《Journal of Educational Measurement》1992,29(3):201-211

This study investigated possible explanations for an observed change in Rasch item parameters (b values) obtained from consecutive administrations of a professional licensure examination. Considered in this investigation were variables related to item position, item type, item content, and elapsed time between administrations of the item. An analysis of covariance methodology was used to assess the relations between these variables and change in item b values, with the elapsed time index serving to control for differences that could be attributed to average or pool changes in b values over time. A series of analysis of covariance models were fitted to the data in an attempt to identify item characteristics that were significantly related to the change in b values after the time elapsed between item administrations had been controlled. The findings indicated that the change in item b values was not related either to item position or to item type. A small, positive relationship between this change and elapsed time indicated that the pool b values were increasing over time. A test of simple effects suggested the presence of greater change for one of the content categories analyzed. These findings are interpreted, and suggestions for future research are provided. 相似文献

6.

Building Algebra Testlets: A Comparison of Hierarchical and Linear Structures

Howard Wainer Charles Lewis Bruce Kaplan James Braswell 《Journal of Educational Measurement》1991,28(4):311-323

Earlier (Wainer & Lewis, 1990), we reported the initial development of a testlet-based algebra test. In this account, we provide the details of this excursion into the use of testlets. A pretest of two 15–item algebra tests was carried out in which examinees' performance on a 4-item subset of each test (a 4–item testlet) was used to predict performance on the entire test. Two models for constructing the testlets were considered: hierarchical (adaptive) and linear (fixed format). These models are compared with each other. It was found on cross–validation that, although an adaptive testlet is superior to a fixed format testlet, this superiority is modest, whereas the potential cost of that superiority is considerable. It was concluded that in circumstances similar to those we report a fixed format testlet that uses the best items in a pool can do almost as well as the optimal adaptive testlet of equal length from that same pool. 相似文献

7.

Evaluating Comparability in Computerized Adaptive Testing: Issues, Criteria and an Example

Tianyou Wang Michael J. Kolen 《Journal of Educational Measurement》2001,38(1):19-49

When a computerized adaptive testing (CAT) version of a test co-exists with its paper-and-pencil (P&P) version, it is important for scores from the CAT version to be comparable to scores from its P&P version. The CAT version may require multiple item pools for test security reasons, and CAT scores based on alternate pools also need to be comparable to each other. In this paper, we review research literature on CAT comparability issues and synthesize issues specific to these two settings. A framework of criteria for evaluating comparability was developed that contains the following three categories of criteria: validity criterion, psychometric property/reliability criterion, and statistical assumption/test administration condition criterion. Methods for evaluating comparability under these criteria as well as various algorithms for improving comparability are described and discussed. Focusing on the psychometric property/reliability criterion, an example using an item pool of ACT Assessment Mathematics items is provided to demonstrate a process for developing comparable CAT versions and for evaluating comparability. This example illustrates how simulations can be used to improve comparability at the early stages of the development of a CAT. The effects of different specifications of practical constraints, such as content balancing and item exposure rate control, and the effects of using alternate item pools are examined. One interesting finding from this study is that a large part of incomparability may be due to the change from number-correct score-based scoring to IRT ability estimation-based scoring. In addition, changes in components of a CAT, such as exposure rate control, content balancing, test length, and item pool size were found to result in different levels of comparability in test scores. 相似文献

8.

Gauging Item Alignment Through Online Systems While Controlling for Rater Effects

下载免费PDF全文

Daniel Anderson Shawn Irvin Julie Alonzo Gerald A. Tindal 《Educational Measurement》2015,34(1):22-33

The alignment of test items to content standards is critical to the validity of decisions made from standards‐based tests. Generally, alignment is determined based on judgments made by a panel of content experts with either ratings averaged or via a consensus reached through discussion. When the pool of items to be reviewed is large, or the content‐matter experts are broadly distributed geographically, panel methods present significant challenges. This article illustrates the use of an online methodology for gauging item alignment that does not require that raters convene in person, reduces the overall cost of the study, increases time flexibility, and offers an efficient means for reviewing large item banks. Latent trait methods are applied to the data to control for between‐rater severity, evaluate intrarater consistency, and provide item‐level diagnostic statistics. Use of this methodology is illustrated with a large pool (1,345) of interim‐formative mathematics test items. Implications for the field and limitations of this approach are discussed. 相似文献

9.

Assessing cohesion in children’s writing: Development of a checklist

Lynda Struthers Judith C. Lapadat Peter D. MacMillan 《Assessing Writing》2013,18(3):187-201

Cohesion in writing is achieved through the use of linguistic devices that tie ideas together across a text, and is an important element in the development of coherent writing. Research shows that inter- and intra-developmental differences may appear in how children learn to use these devices, but cohesion is commonly overlooked in the evaluation and instruction of writing. In this study, we developed a checklist to assess cohesion in the writing of children in Grades 4–7, with the purpose of informing instructional practices. Following the procedure outlined by Crocker and Algina (1986), we developed and evaluated a checklist designed to assess the types of cohesive devices present in the writing of children. The checklist items showed fair to good discrimination between high and low scoring writers as demonstrated by a classical item analysis. We also found good interrater reliability, and evidence for discriminative validity. As internal consistency was weak, however, further research is needed to refine the instrument. Implications for the assessment of cohesion and future research are discussed. 相似文献

10.

关于教务办公自动化系统的探讨

李阳明杨铁林《教学研究(河北)》1999,(2)

以燕山大学教务管理的实践为背景,探讨了从题库、阅卷、学籍管理、教学计划、评价等环节采用办公自动化系统将产生的客观效果。相似文献

11.

Multidimensional CAT Item Selection Methods for Domain Scores and Composite Scores With Item Exposure Control and Content Constraints

Lihua Yao 《Journal of Educational Measurement》2014,51(1):18-38

The intent of this research was to find an item selection procedure in the multidimensional computer adaptive testing (CAT) framework that yielded higher precision for both the domain and composite abilities, had a higher usage of the item pool, and controlled the exposure rate. Five multidimensional CAT item selection procedures (minimum angle; volume; minimum error variance of the linear combination; minimum error variance of the composite score with optimized weight; and Kullback‐Leibler information) were studied and compared with two methods for item exposure control (the Sympson‐Hetter procedure and the fixed‐rate procedure, the latter simply refers to putting a limit on the item exposure rate) using simulated data. The maximum priority index method was used for the content constraints. Results showed that the Sympson‐Hetter procedure yielded better precision than the fixed‐rate procedure but had much lower item pool usage and took more time. The five item selection procedures performed similarly under Sympson‐Hetter. For the fixed‐rate procedure, there was a trade‐off between the precision of the ability estimates and the item pool usage: the five procedures had different patterns. It was found that (1) Kullback‐Leibler had better precision but lower item pool usage; (2) minimum angle and volume had balanced precision and item pool usage; and (3) the two methods minimizing the error variance had the best item pool usage and comparable overall score recovery but less precision for certain domains. The priority index for content constraints and item exposure was implemented successfully. 相似文献

12.

TECHNICAL GUIDELINES FOR ASSESSING COMPUTERIZED ADAPTIVE TESTS

BERT F. GREEN R. DARRELL BOCK LLOYD G. HUMPHREYS ROBERT L. LINN MARK D. RECKASE 《Journal of Educational Measurement》1984,21(4):347-360

Guidelines are proposed for evaluating a computerized adaptive test. Topics include dimensionality, measurement error, validity, estimation of item parameters, item pool characteristics and human factors. Equating CAT and conventional tests is considered and matters of equity are addressed. 相似文献

13.

Application of IRT Fixed Parameter Calibration to Multiple-Group Test Data

Seonghoon Kim Michael J. Kolen 《教育实用测度》2013,26(4):310-324

ABSTRACT

In applications of item response theory (IRT), fixed parameter calibration (FPC) has been used to estimate the item parameters of a new test form on the existing ability scale of an item pool. The present paper presents an application of FPC to multiple examinee groups test data that are linked to the item pool via anchor items, and investigates the performance of FPC relative to an alternative approach, namely independent 0–1 calibration and scale linking. Two designs for linking to the pool are proposed that involve multiple groups and test forms, for which multiple-group FPC can be effectively used. A real-data study shows that the multiple-group FPC method performs similarly to the alternative method in estimating ability distributions and new item parameters on the scale of the item pool. In addition, a simulation study shows that the multiple-group FPC method performs nearly equally to or better than the alternative method in recovering the underlying ability distributions and the new item parameters. 相似文献

14.

A Study of the Relation of General Semantics and Creativity

Sally True 《Journal of Experimental Education》2013,81(3):34-40

A form which students can use to assess their class experiences was presented. A factor analysis based on evaluations filled out by 1,648 students revealed four factors which measured (a) the quality of the instructors’ presentations, (b) the evaluation process and the student-instructor interactions, (c) the degree to which the students were stimulated and motivated by the instructors, and (d) the clarity of the tests. A further analysis indicated that subscale scores which reflected the factor scores could be developed from the total item pool. 相似文献

15.

英语语料库及相关软件在高考英语命题中的运用

李建平陶百强《考试研究》2014,(2):44-48

本文介绍了常见的几个英语语料库及相关软件,指出使用英语语料库及相关软件对于提高高考英语命题的质量和效率、减少或避免命题过程中出现的科学性错误、保证高考英语命题对中学英语教学的正确引导具有重要意义。相似文献

16.

The Relationship Between Item Exposure and Test Overlap in Computerized Adaptive Testing 总被引：1，自引：0，他引：1

Shu-Ying Chen Robert D. Ankenmann Judith A. Spray 《Journal of Educational Measurement》2003,40(2):129-145

The purpose of this article is to present an analytical derivation for the mathematical form of an average between-test overlap index as a function of the item exposure index, for fixed-length computerized adaptive tests (CATs). This algebraic relationship is used to investigate the simultaneous control of item exposure at both the item and test levels. The results indicate that, in fixed-length CATs, control of the average between-test overlap is achieved via the mean and variance of the item exposure rates of the items that constitute the CAT item pool. The mean of the item exposure rates is easily manipulated. Control over the variance of the item exposure rates can be achieved via the maximum item exposure rate (r_max). Therefore, item exposure control methods which implement a specification of r_max (e.g., Sympson & Hetter, 1985) provide the most direct control at both the item and test levels. 相似文献

17.

TEM8写作中存在的问题及教学策略

刘洁陈宏《江西教育学院学报》2006,27(2):59-62

英语专业八级考试写作项目是测试英语专业高年级学生实际写作水平的考项。本文旨在讨论英语学生在TEM8中出现的常见错误,从统一性、连贯性、语言的规范性和全文结构的整体性进行分析,以期在校学生能够有针对性地进行加强练习,在TEM8中取得良好成绩。相似文献

18.

Experiences in the Application of Item Response Theory in Test Construction

《教育实用测度》2013,26(4):297-312

Certain potential benefits of using item response theory in test construction are discussed and evaluated using the experience and evidence accumulated during 9 years of using a three-parameter model in the construction of major achievement batteries. We also discuss several cautions and limitations in realizing these benefits as well as issues in need of further research. The potential benefits considered are those of getting "sample-free" item calibrations and "item-free" person measurement, automatically equating various tests, decreasing the standard errors of scores without increasing the number of items used by using item pattern scoring, assessing item bias (or differential item functioning) independently of difficulty in a manner consistent with item selection, being able to determine just how adequate a tryout pool of items may be, setting up computer-generated "ideal" tests drawn from pools as targets for test developers, and controlling the standard error of a selected test at any desired set of score levels. 相似文献

19.

Person‐Fit Statistics for Joint Models for Accuracy and Speed

下载免费PDF全文

Jean‐Paul Fox Sukaesi Marianti 《Journal of Educational Measurement》2017,54(2):243-262

Response accuracy and response time data can be analyzed with a joint model to measure ability and speed of working, while accounting for relationships between item and person characteristics. In this study, person‐fit statistics are proposed for joint models to detect aberrant response accuracy and/or response time patterns. The person‐fit tests take the correlation between ability and speed into account, as well as the correlation between item characteristics. They are posited as Bayesian significance tests, which have the advantage that the extremeness of a test statistic value is quantified by a posterior probability. The person‐fit tests can be computed as by‐products of a Markov chain Monte Carlo algorithm. Simulation studies were conducted in order to evaluate their performance. For all person‐fit tests, the simulation studies showed good detection rates in identifying aberrant patterns. A real data example is given to illustrate the person‐fit statistics for the evaluation of the joint model. 相似文献

20.

Name writing ability not length of name is predictive of future academic attainment

Lee T. Copping Sarah Gott Helen Gray Peter Tymms 《Educational research; a review for teachers and all concerned with progress in education》2016,58(3):237-246

Background: The Performance Indicators in Primary Schools On Entry Baseline assessment for pupils starting school includes an item which aims to assess how well a pupil writes his or her own name. There is some debate regarding the utility of this measure, on the grounds that name length may constitute bias.

Purpose, method and design: The predictive validity of this item and its link to name length was investigated with a view to using this item in further assessments. Previous modest scale work from the USA, suggests that name writing ability is a robust indicator which correlates substantively with other known indicators of later reading whilst remaining independent of name length. This paper greatly expanded the sample size and geographical coverage and, rather than concurrent measures, the predictive validity of the item is assessed. The sample includes children from England, Scotland and Australia (N = 14932), assessed between 2011 and 2013. Potential confounding factors that are analysed include age, geographical region and ethnicity.

Findings and conclusions: The evidence suggests that the name writing item is a robust measure, with good predictive validity to future academic outcomes in early reading, phonological awareness and mathematics. The length was not related to the ability to write one’s own name nor was it predictive of future outcomes. 相似文献