Similar Articles
20 similar articles found.
1.
A new approach for partitioning test items into dimensionally distinct item clusters is introduced. The core of the approach is a new item-pair conditional-covariance-based proximity measure that can be used with hierarchical cluster analysis. An extensive simulation study designed to test the limits of the approach indicates that when approximate simple structure holds, the procedure can correctly partition the test into dimensionally homogeneous item clusters even for very high correlations between the latent dimensions. In particular, the procedure can correctly classify (on average) over 90% of the items for correlations as high as .9. The cooperative role that the procedure can play when used in conjunction with other dimensionality assessment procedures is discussed.
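The article's specific proximity measure is not reproduced above, but the general recipe can be illustrated. The sketch below is a minimal, hedged version: for each item pair, the covariance is computed within strata of the rest score (the total score on the remaining items), averaged across strata, converted to a distance, and passed to agglomerative clustering. The function names and the averaging scheme are illustrative assumptions, not the article's exact estimator.

```python
# Hedged sketch: conditional-covariance proximity + hierarchical clustering.
# Not the article's exact measure; conditioning on the rest score is one
# common way to estimate item-pair conditional covariances.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def conditional_covariance(X, i, j):
    """Weighted mean within-stratum covariance of items i and j,
    conditioning on the rest score over all other items."""
    rest = X.sum(axis=1) - X[:, i] - X[:, j]
    covs, weights = [], []
    for s in np.unique(rest):
        grp = X[rest == s]
        if len(grp) > 1:                        # need 2+ examinees per stratum
            covs.append(np.cov(grp[:, i], grp[:, j])[0, 1])
            weights.append(len(grp))
    return np.average(covs, weights=weights)

def cluster_items(X, n_clusters=2):
    """X: persons x items 0/1 matrix; returns a cluster label per item."""
    n = X.shape[1]
    ccov = np.array([[conditional_covariance(X, i, j) if i != j else 0.0
                      for j in range(n)] for i in range(n)])
    # Large positive conditional covariance = likely same dimension,
    # so flip it into a distance for the clustering routine.
    dist = ccov.max() - ccov
    np.fill_diagonal(dist, 0.0)
    condensed = dist[np.triu_indices(n, k=1)]
    return fcluster(linkage(condensed, method="average"),
                    t=n_clusters, criterion="maxclust")
```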

2.
3.
An important part of test development is ensuring alignment between test forms and content standards. One common way of measuring alignment is the Webb (1997, 2007) alignment procedure. This article investigates (a) how well item writers understand components of the definition of Depth of Knowledge (DOK) from the Webb alignment procedure and (b) how consistent their DOK ratings are with ratings provided by other committees of educators across grade levels, content areas, and alternate assessment levels in a Midwestern state alternate assessment system. Results indicate that many item writers understand key features of DOK. However, some item writers struggled to articulate what DOK means and had some misconceptions. Additional analyses suggested some lack of consistency between the item writer DOK ratings and the committee DOK ratings. Some notable differences were found across alternate assessment levels and content areas. Implications for future item writing training and alignment studies are provided.

4.
The purpose of this article is to present logistic discriminant function analysis as a means of differential item functioning (DIF) identification for items that are polytomously scored. The procedure is presented with examples of a DIF analysis using items from a 27-item mathematics test that includes six open-ended response items scored polytomously. The results show that the logistic discriminant function procedure is ideally suited for DIF identification on nondichotomously scored test items. It is simpler and more practical than polytomous extensions of the logistic regression DIF procedure and appears to be more powerful than a generalized Mantel-Haenszel procedure.
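As a hedged illustration of the logic (not the article's exact implementation), logistic discriminant function DIF screening can be set up by regressing group membership on the total score, then on the total score plus the polytomous item score, and testing whether the item score adds predictive power. The variable names and the single-degree-of-freedom test below are illustrative choices.

```python
# Hedged sketch of logistic discriminant function DIF for a polytomous item:
# DIF is flagged when the item score predicts group membership beyond the
# total score (likelihood-ratio test of the added term).
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def ldf_dif_test(group, total, item):
    """group: 0 = reference / 1 = focal; total: test score; item: item score."""
    base = sm.Logit(group, sm.add_constant(np.column_stack([total]))).fit(disp=0)
    full = sm.Logit(group, sm.add_constant(np.column_stack([total, item]))).fit(disp=0)
    lr = 2 * (full.llf - base.llf)              # likelihood-ratio statistic
    return lr, chi2.sf(lr, df=1)                # df = 1 for the one added term
```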

5.
DIMTEST is a nonparametric statistical test procedure for assessing unidimensionality of binary item response data. The development of Stout's statistic, T, used in the DIMTEST procedure, does not require the assumption of a particular parametric form for the ability distributions or the item response functions. The purpose of the present study was to empirically investigate the performance of the statistic T with respect to different shapes of ability distributions. Several nonnormal distributions, both symmetric and nonsymmetric, were considered for this purpose. Other factors varied in the study were test length, sample size, and the level of correlation between abilities. The results of Type I error and power studies showed that the test statistic T exhibited consistently similar performance for all different shapes of ability distributions investigated in the study, which confirmed the nonparametric nature of the statistic T.
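DIMTEST's statistic T is not reimplemented here, but the data-generating side of such a study is easy to sketch: binary 2PL responses under ability distributions that differ only in shape. The distributions and parameter ranges below are illustrative stand-ins for the study's conditions.

```python
# Hedged sketch: generate 2PL response data under normal, skewed, and
# bimodal ability distributions, each standardized so only shape varies.
import numpy as np

rng = np.random.default_rng(7)

def simulate_2pl(theta, a, b):
    """Binary responses: P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    z = a[None, :] * (theta[:, None] - b[None, :])
    p = 1.0 / (1.0 + np.exp(-z))
    return (rng.random(p.shape) < p).astype(int)

n_items, n_persons = 40, 1500
a = rng.uniform(0.8, 2.0, n_items)
b = rng.normal(0.0, 1.0, n_items)

shapes = {
    "normal": rng.normal(0, 1, n_persons),
    "skewed": rng.chisquare(4, n_persons),        # nonsymmetric
    "bimodal": np.concatenate([rng.normal(-1.2, 0.6, n_persons // 2),
                               rng.normal(1.2, 0.6, n_persons - n_persons // 2)]),
}
data = {}
for name, theta in shapes.items():
    theta = (theta - theta.mean()) / theta.std()  # equate mean and variance
    data[name] = simulate_2pl(theta, a, b)        # feed each set to DIMTEST
```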

6.
Using a technique that controlled item exposure, the investigator examined the effects of reordering items within a power test containing ten letter-series-completion items on the mean test score, item difficulty indices, and reliability and validity coefficients. The results suggest that effects on test statistics from item rearrangement are, generally, minimal. The implication of these findings for test designs involving an item sampling procedure is that performance on an item is minimally influenced by the context in which it occurs.

7.
A graphic procedure for studying differential item functioning (DIF) designed to provide diagnostic information for psychometricians and educators is presented. Items from a certifying examination in a subspecialty of internal medicine are used as examples. The procedure provides a “signature” of each test item that may be used in conjunction with a summary statistic to flag items showing DIF. Advantages and limitations of the procedure are noted, as are additional areas for investigation.

8.
In test development, item response theory (IRT) is a method to determine the amount of information that each item (i.e., item information function) and combination of items (i.e., test information function) provide in the estimation of an examinee's ability. Studies investigating the effects of item parameter estimation errors over a range of ability have demonstrated an overestimation of information when the most discriminating items are selected (i.e., item selection based on maximum information). In the present study, the authors examined the influence of item parameter estimation errors across 3 item selection methods—maximum no target, maximum target, and theta maximum—using the 2- and 3-parameter logistic IRT models. Tests created with the maximum no target and maximum target item selection procedures consistently overestimated the test information function. Conversely, tests created using the theta maximum item selection procedure yielded more consistent estimates of the test information function and, at times, underestimated the test information function. Implications for test development are discussed.
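For concreteness, the quantities involved can be sketched directly: under the 3PL model the item information at ability θ is a²·((1−p)/p)·((p−c)/(1−c))², and maximum-information selection simply ranks items on that quantity at a target θ. The pool below is simulated, so the parameter ranges are illustrative only; when the ranking is done on estimated rather than true parameters, it is exactly where the overestimation described above creeps in.

```python
# Hedged sketch: 3PL item/test information and maximum-information selection.
import numpy as np

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

def info_3pl(theta, a, b, c):
    """3PL item information: a^2 * ((1-p)/p) * ((p-c)/(1-c))^2."""
    p = p3pl(theta, a, b, c)
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 2.0, 100)                 # illustrative pool of 100 items
b = rng.normal(0.0, 1.0, 100)
c = rng.uniform(0.1, 0.25, 100)

info = info_3pl(0.0, a, b, c)                  # information at theta = 0
chosen = np.argsort(info)[::-1][:20]           # 20 most informative items
print(f"Test information at theta=0: {info[chosen].sum():.2f}")
```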

9.
Biased test items were intentionally embedded within a set of test items, and the resulting instrument was administered to large samples of blacks and whites. Three popular item bias detection procedures were then applied to the data: (1) the three-parameter item characteristic curve procedure, (2) the chi-square method, and (3) the transformed item difficulty approach. The three-parameter item characteristic curve procedure proved most effective at detecting the intentionally biased test items, and the chi-square method was viewed as the best alternative. The transformed item difficulty approach has certain limitations yet represents a practical alternative if sample size, lack of computer facilities, or the like preclude the use of the other two procedures.
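Of the three procedures, the transformed item difficulty (delta-plot) approach is the simplest to sketch: group p-values are mapped to the ETS delta scale, a major-axis line is fitted to the two groups' deltas, and items far from the line are flagged. The 1.5-delta threshold below is an illustrative convention, not a value taken from this study.

```python
# Hedged sketch of the transformed item difficulty (delta-plot) approach.
import numpy as np
from scipy.stats import norm

def to_delta(p):
    """ETS delta scale: 13 + 4z, where z is the normal deviate of the
    proportion answering incorrectly (harder items get larger deltas)."""
    return 13.0 + 4.0 * norm.ppf(1.0 - np.asarray(p, dtype=float))

def delta_plot(p_ref, p_focal, threshold=1.5):
    x, y = to_delta(p_ref), to_delta(p_focal)
    sxx, syy = x.var(ddof=1), y.var(ddof=1)
    sxy = np.cov(x, y)[0, 1]
    # Major-axis (principal-axis) slope and intercept.
    slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy**2)) / (2 * sxy)
    intercept = y.mean() - slope * x.mean()
    # Signed perpendicular distance of each item from the major axis.
    d = (slope * x - y + intercept) / np.sqrt(slope**2 + 1)
    flagged = np.where(np.abs(d) > threshold)[0]
    return flagged, d
```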

10.
Educational Assessment, 2013, 18(4), 333–356
Alignment has taken on increased importance given the current high-stakes nature of assessment. To make well-informed decisions about student learning on the basis of test results, assessment items need to be well aligned with standards. Project 2061 of the American Association for the Advancement of Science (AAAS) has developed a procedure for analyzing the content and quality of assessment items. The authors of this study used this alignment procedure to closely examine 2 mathematics assessment items. Student work on these 2 items was analyzed to determine whether the conclusions reached through the use of the alignment procedure could be validated. It was found that the Project 2061 alignment procedure was effective in providing a tool for in-depth analysis of the mathematical content of an item against a set of standards and in identifying 1 particular content standard that was most closely aligned with the item. Through analyzing student work samples and student interviews, it was also found that students' thinking may not correspond to the standard identified as best aligned with the learning goals of the item. This finding highlights the potential usefulness of analyzing student work to clarify any additional deficiencies of an assessment item not revealed by an alignment procedure.

11.
Permitting item review benefits examinees, who typically increase their test scores when allowed to revise answers. However, testing companies disfavor item review because it does not follow the logic on which adaptive tests are based and because it is prone to cheating strategies. Consequently, item review is not permitted in many adaptive tests. This study attempts to provide a solution that would allow examinees to revise their answers without jeopardizing the quality and efficiency of the test. The purpose of this study is to test the efficiency of a “rearrangement procedure” that rearranges and skips certain items in order to better estimate the examinees' abilities, without allowing them to cheat on the test. This was examined through a simulation study. The results show that the rearrangement procedure is effective in reducing the standard error of the Bayesian ability estimates and in increasing the reliability of those estimates.

12.
This paper describes an item response model for multiple-choice items and illustrates its application in item analysis. The model provides parametric and graphical summaries of the performance of each alternative associated with a multiple-choice item; the summaries describe each alternative's relationship to the proficiency being measured. The interpretation of the parameters of the multiple-choice model and the use of the model in item analysis are illustrated using data obtained from a pilot test of mathematics achievement items. The use of such item analysis for the detection of flawed items, for item design and development, and for test construction is discussed.
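The article's multiple-choice model is not reproduced here, but the kind of per-alternative summary it provides can be sketched with nominal-response-style trace lines, where each alternative k has a slope and intercept and the probabilities sum to one. The parameter values below are invented for illustration; the keyed alternative gets the positive slope, so its curve rises with proficiency while the distractor curves fall.

```python
# Hedged sketch: per-alternative trace lines, P_k proportional to
# exp(a_k * theta + c_k), plotted across the proficiency range.
import numpy as np
import matplotlib.pyplot as plt

def trace_lines(theta, a, c):
    z = np.outer(theta, a) + c                 # persons x alternatives
    z -= z.max(axis=1, keepdims=True)          # numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

theta = np.linspace(-3, 3, 121)
a = np.array([1.2, -0.4, -0.3, -0.5])          # keyed alternative first
c = np.array([0.5, 0.2, -0.1, -0.6])
P = trace_lines(theta, a, c)
for k in range(P.shape[1]):
    plt.plot(theta, P[:, k], label=f"alternative {k + 1}")
plt.xlabel("proficiency"); plt.ylabel("P(choose alternative)")
plt.legend(); plt.show()
```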

13.
In this paper we present a new methodology for detecting differential item functioning (DIF). We introduce a DIF model, called the random item mixture (RIM), that is based on a Rasch model with random item difficulties (besides the common random person abilities). In addition, a mixture model is assumed for the item difficulties such that the items may belong to one of two classes: a DIF or a non-DIF class. The crucial difference between the DIF class and the non-DIF class is that the item difficulties in the DIF class may differ across the observed person groups, while they are equal across the person groups for the items from the non-DIF class. Statistical inference for the RIM is carried out in a Bayesian framework. The performance of the RIM is evaluated using a simulation study in which it is compared with traditional procedures such as the likelihood ratio test, the Mantel-Haenszel procedure, and the standardized p-DIF procedure. In this comparison, the RIM performs better than the other methods. Finally, the usefulness of the model is also demonstrated on a real-life data set.
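The RIM itself requires a Bayesian mixture sampler and is not reproduced here; as a hedged point of reference, the sketch below implements the Mantel-Haenszel procedure the article compares against: examinees are stratified by total score and a common odds ratio is pooled across the score strata.

```python
# Hedged sketch of the Mantel-Haenszel DIF procedure (a comparison method
# in the article, not the RIM itself).
import numpy as np

def mantel_haenszel_or(item, group, total):
    """item: 0/1 response; group: 0 = reference, 1 = focal; total: raw score."""
    num = den = 0.0
    for s in np.unique(total):
        m = total == s
        A = np.sum((group[m] == 0) & (item[m] == 1))   # reference correct
        B = np.sum((group[m] == 0) & (item[m] == 0))   # reference incorrect
        C = np.sum((group[m] == 1) & (item[m] == 1))   # focal correct
        D = np.sum((group[m] == 1) & (item[m] == 0))   # focal incorrect
        T = A + B + C + D
        if T > 0:
            num += A * D / T
            den += B * C / T
    return num / den if den > 0 else np.nan

# Common effect-size convention: MH D-DIF = -2.35 * ln(alpha_MH),
# with values near 0 indicating little or no DIF.
```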

14.
This study compared the properties of five methods of item exposure control within the purview of estimating examinees' abilities in a computerized adaptive testing (CAT) context. Each exposure control algorithm was incorporated into the item selection procedure and the adaptive testing progressed based on the CAT design established for this study. The merits and shortcomings of these strategies were considered under different item pool sizes and different desired maximum exposure rates and were evaluated in light of the observed maximum exposure rates, the test overlap rates, and the conditional standard errors of measurement. Each method had its advantages and disadvantages, but none possessed all of the desired characteristics. There was a clear and logical trade-off between item exposure control and measurement precision. The Stocking and Lewis conditional multinomial procedure and, to a slightly lesser extent, the Davey and Parshall method seemed to be the most promising considering all of the factors that this study addressed.
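None of the five methods is reproduced exactly here. As a hedged sketch of the shared idea behind probabilistic exposure control (in the spirit of Sympson-Hetter-type procedures), each candidate item must pass an exposure lottery with probability k_i before it can be administered; otherwise the next-most-informative item is tried. The function and parameter names are illustrative.

```python
# Hedged sketch: maximum-information selection filtered through
# per-item exposure-control probabilities.
import numpy as np

rng = np.random.default_rng(3)

def select_item(info, k, administered):
    """info: item information at the current theta estimate;
    k: exposure-control probabilities in [0, 1];
    administered: boolean mask of items already given to this examinee."""
    for j in np.argsort(info)[::-1]:           # most informative first
        if administered[j]:
            continue
        if rng.random() < k[j]:                # pass the exposure lottery
            return j
    return -1                                  # pool exhausted
```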

15.
Variation in test performance among examinees from different regions or national jurisdictions is often partially attributed to differences in the degree of content correspondence between local school or training program curricula, and the test of interest. This posited relationship between test-curriculum correspondence, or “alignment,” and test performance is usually inferred from highly distal evidence, rather than directly examined. Utilizing mathematics standards content analysis data and achievement test item data from ten U.S. states, we examine the relationship between topic-specific alignment and test item performance. When a particular item’s content type is emphasized by the standards, we find evidence of a positive relationship between the alignment measure and proportion-correct test item difficulty, although this effect is not consistent across samples. Implications of the results for curricular achievement test development and score interpretation are discussed.
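As a hedged sketch of the analysis described (with simulated data standing in for the state samples), the relationship can be examined by regressing item proportion-correct on a topic-specific alignment (emphasis) measure; a positive slope would indicate that items whose content the standards emphasize tend to be answered correctly more often.

```python
# Hedged sketch: OLS regression of proportion-correct difficulty on a
# topic-emphasis alignment measure; the data below are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
alignment = rng.uniform(0, 1, 200)             # emphasis of the item's topic
p_correct = np.clip(0.4 + 0.2 * alignment + rng.normal(0, 0.1, 200), 0, 1)

fit = sm.OLS(p_correct, sm.add_constant(alignment)).fit()
print(fit.params)                              # intercept and alignment slope
```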

16.
In this study we evaluated and compared three item selection procedures: the maximum Fisher information procedure (F), the a-stratified multistage computerized adaptive testing (CAT) procedure (STR), and a refined stratification procedure that allows more items to be selected from the high a strata and fewer items from the low a strata (USTR), along with completely random item selection (RAN). The comparisons were made with respect to error variances, reliability of ability estimates, and item usage, through CATs simulated under nine test conditions of varying practical constraints and item selection space. The results showed that F had an apparent precision advantage over STR and USTR under unconstrained item selection, but with very poor item usage. USTR reduced error variances relative to STR under various conditions, with small compromises in item usage. Compared to F, USTR enhanced item usage while achieving comparable precision in ability estimates; it achieved a precision level similar to F with improved item usage when items were selected under exposure control and with limited item selection space. The results provide implications for choosing an appropriate item selection procedure in applied settings.
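A minimal sketch of a-stratified selection (the STR design; USTR's unbalanced stratum sizes are omitted) is shown below: the pool is sorted into strata by ascending discrimination a, and at stage s the unused item whose difficulty b is closest to the current θ estimate is drawn from stratum s, saving the high-a items for late in the test. The names and the equal-sized split are illustrative assumptions.

```python
# Hedged sketch of a-stratified CAT item selection (STR-style).
import numpy as np

def a_stratified_select(a, b, theta, stage, n_strata, used):
    """Pick from stratum `stage` (0 = lowest-a) the unused item whose
    difficulty is closest to the current ability estimate."""
    strata = np.array_split(np.argsort(a), n_strata)         # ascending a
    candidates = [j for j in strata[stage] if not used[j]]
    return min(candidates, key=lambda j: abs(b[j] - theta))  # assumes non-empty
```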

17.
A directly applicable latent variable modeling procedure for classical item analysis is outlined. The method allows one to point and interval estimate item difficulty, item correlations, and item-total correlations for composites consisting of categorical items. The approach is readily employed in empirical research and as a by-product permits examining the latent structure of tentative versions of multiple-component measuring instruments. The discussed procedure is straightforwardly utilized with the increasingly popular latent variable modeling software Mplus, and is illustrated on a numerical example.
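The article's interval estimates come from the latent variable model fitted in Mplus and are not reproduced here; as a hedged companion, the classical point estimates the procedure targets are straightforward to compute directly.

```python
# Hedged sketch: classical point estimates of item difficulty, inter-item
# correlations, and corrected item-total correlations.
import numpy as np

def classical_item_analysis(X):
    """X: persons x items matrix of 0/1 (or ordinal) item scores."""
    difficulty = X.mean(axis=0)                        # proportion correct
    inter_item = np.corrcoef(X, rowvar=False)          # item x item matrix
    total = X.sum(axis=1)
    item_total = np.array([
        np.corrcoef(X[:, j], total - X[:, j])[0, 1]    # corrected: item j excluded
        for j in range(X.shape[1])
    ])
    return difficulty, inter_item, item_total
```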

18.
In Experiment 1, short-term memory for lists of visual stimuli was studied in four squirrel monkeys (Saimiri sciureus). A delayed-matching procedure was used in which a subject was presented with lists containing one, three, or six stimulus patterns, and memory for serial positions was probed by requiring the subject to choose between a list item and a nonlist item. The rate of item presentation was varied, as was the delay between the final item on a list and the retention test. In Experiment 2, the same procedures were used to compare recognition memory in four monkeys and four humans. Although differences in the levels and shapes of the serial-position curves appeared between species, both monkeys and humans showed primacy and recency effects. The presentation time of stimuli had a negligible effect on performance in both monkeys and humans, whereas delay significantly affected human retention but not monkey retention.

19.
This paper describes a procedure for automated test forms assembly based on Classical Test Theory (CTT). The procedure uses stratified random content sampling and test form pre-equating to ensure both content and psychometric equivalence in generating virtually unlimited parallel forms. The procedure extends the usefulness of CTT in automated test form construction, yielding classical item statistics based on representative sample distributions and pre-equated test forms with known psychometric characteristics. A rationale for the procedure is presented followed by an example application and discussion of psychometric considerations related to its use.
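The pre-equating side of the procedure is not sketched here, but the stratified random content sampling step is simple to illustrate: the pool is stratified by content area and every form draws the same blueprint count from each stratum. The content labels, counts, and names below are illustrative; each stratum is assumed to hold at least the blueprint count.

```python
# Hedged sketch: stratified random content sampling for parallel forms.
import numpy as np

rng = np.random.default_rng(5)

def assemble_form(pool_ids, content, blueprint):
    """blueprint: dict mapping content area -> items to draw from that area."""
    form = []
    for area, n in blueprint.items():
        stratum = pool_ids[content == area]
        form.extend(rng.choice(stratum, size=n, replace=False))
    return np.array(form)

pool_ids = np.arange(300)
content = rng.choice(np.array(["algebra", "geometry", "data"]), size=300)
blueprint = {"algebra": 20, "geometry": 15, "data": 10}
form_a = assemble_form(pool_ids, content, blueprint)   # one parallel form
form_b = assemble_form(pool_ids, content, blueprint)   # another
```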

20.
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drifts in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is the item response theory–based true score equating, whose goal is to generate a conversion table to relate number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method to detect item parameter drift iteratively based on test characteristic curves without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items, and at the same time retains a large number of linking items.
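The article's exact algorithm is not reproduced here; the sketch below is one hedged reading of the stepwise idea: at each step, drop the linking item whose removal most shrinks the gap between the two administrations' test characteristic curves, stopping once the per-item gap is tolerably small. The quadrature grid, tolerance, and stopping rule are illustrative choices, not the article's.

```python
# Hedged sketch: stepwise removal of drifted items based on the gap
# between two administrations' test characteristic curves (3PL).
import numpy as np

def p3pl(theta, a, b, c):
    return c + (1 - c) / (1 + np.exp(-a * np.subtract.outer(theta, b)))

def tcc_gap(theta, old, new, keep):
    """Mean absolute per-item TCC gap over the kept linking items."""
    t_old = p3pl(theta, old["a"][keep], old["b"][keep], old["c"][keep]).sum(axis=1)
    t_new = p3pl(theta, new["a"][keep], new["b"][keep], new["c"][keep]).sum(axis=1)
    return np.mean(np.abs(t_old - t_new)) / len(keep)

def stepwise_tcc(old, new, tol=0.005):
    """old/new: dicts of 3PL parameter arrays 'a', 'b', 'c' per administration."""
    theta = np.linspace(-4, 4, 81)
    keep = np.arange(len(old["a"]))
    while tcc_gap(theta, old, new, keep) > tol and len(keep) > 2:
        gaps = [tcc_gap(theta, old, new, np.delete(keep, i))
                for i in range(len(keep))]
        keep = np.delete(keep, int(np.argmin(gaps)))   # drop the worst item
    return keep                                        # treated as non-drifted
```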
