Similar Articles
20 similar articles found (search time: 703 ms)
1.
In cognitive diagnostic models (CDMs), a set of fine-grained attributes is required to characterize complex problem solving and provide detailed diagnostic information about an examinee. However, it is challenging to ensure reliable estimation and control computational complexity when the test aims to identify the examinee's attribute profile in a large-scale map of attributes. To address this problem, this study proposes cognitive diagnostic multistage testing by partitioning hierarchically structured attributes (CD-MST-PH), a multistage testing framework for CDMs. In CD-MST-PH, multiple testlets can be constructed from separate attribute groups before testing occurs, which retains the advantages of multistage testing over fully adaptive testing or on-the-fly approaches. Moreover, testlets are offered sequentially and adaptively, thus improving test accuracy and efficiency. An item information measure is proposed to compute the discrimination power of an item for each attribute, and a module assembly method is presented to construct modules anchored at each separate attribute group. Several module selection indices for CD-MST-PH are also proposed by modifying the item selection indices used in cognitive diagnostic computerized adaptive testing. The results of a simulation study show that CD-MST-PH improves test accuracy and efficiency relative to a conventional test without adaptive stages.
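The abstract does not give the item information measure in closed form; one plausible choice, offered purely as an illustrative sketch, is the Kullback-Leibler divergence between the correct-response distributions of masters and non-masters of a single attribute (the function name and the slip/guess values below are assumptions, not the paper's):

```python
import numpy as np

def attribute_kl_info(p_master, p_nonmaster):
    """KL divergence between the correct-response distributions of masters
    and non-masters of one attribute: one plausible item-level measure of
    attribute discrimination (the paper's exact index may differ)."""
    kl = 0.0
    for p, q in ((p_master, p_nonmaster), (1 - p_master, 1 - p_nonmaster)):
        kl += p * np.log(p / q)
    return kl

# Example: a DINA-like item with slip .10 and guess .20.
print(attribute_kl_info(0.90, 0.20))  # larger values = more discriminating
```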

2.
Cognitive diagnosis models (CDMs) have been developed to evaluate the mastery status of individuals with respect to a set of defined attributes or skills that are measured through testing. When individuals are repeatedly administered a cognitive diagnosis test, a new class of multilevel CDMs is required to assess the changes in their attributes and simultaneously estimate the model parameters from the different measurements. In this study, the most general CDM, the generalized deterministic inputs, noisy “and” gate (G-DINA) model, was extended to a multilevel higher-order CDM by embedding a multilevel structure into the higher-order latent traits. A series of simulations based on diverse factors was conducted to assess the quality of the parameter estimation. The results demonstrate that the model parameters can be recovered fairly well and attribute mastery can be precisely estimated if the sample size is large and the test is sufficiently long. The range of the location parameters had opposing effects on the recovery of the item and person parameters. Ignoring the multilevel structure in the data by fitting a single-level G-DINA model decreased the attribute classification accuracy and the precision of latent trait estimation. The number of measurement occasions had a substantial impact on latent trait estimation. Satisfactory model and person parameter recovery could be achieved even when assumptions of the measurement invariance of the model parameters over time were violated. A longitudinal basic ability assessment is outlined to demonstrate the application of the new models.
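A sketch of the higher-order layer being extended here: in higher-order CDMs, mastery of attribute $k$ by person $i$ is commonly linked to a continuous latent trait $\theta$ through a logistic model,

$$P(\alpha_{ik}=1\mid\theta_i)=\frac{\exp(\lambda_{0k}+\lambda_{1k}\theta_i)}{1+\exp(\lambda_{0k}+\lambda_{1k}\theta_i)},$$

and the multilevel extension lets the trait vary across measurement occasions, e.g. $\theta_{it}$ for person $i$ at occasion $t$. This is the standard higher-order formulation; the paper's exact multilevel parameterization may differ.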

3.
Cognitive diagnosis models (CDMs) continue to generate interest among researchers and practitioners because they can provide diagnostic information relevant to classroom instruction and student learning. However, the field's modeling component has outpaced its complementary component: test construction. Thus, most applications of cognitive diagnosis modeling involve retrofitting CDMs to assessments constructed using classical test theory (CTT) or item response theory (IRT). This study explores the relationship between the item statistics used in the CTT, IRT, and CDM frameworks using such an assessment, specifically a large-scale mathematics assessment. Furthermore, by highlighting differences between tests with varying levels of diagnosticity using a measure of item discrimination from a CDM approach, this study empirically uncovers some important CTT and IRT item characteristics. These results can be used to formulate practical guidelines for using IRT- or CTT-constructed assessments for cognitive diagnosis purposes.
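To make the comparison concrete, here is a small sketch of the two kinds of discrimination statistics involved; the CDM index shown (the masters/non-masters gap in correct-response probability) is one common choice and is an assumption, not necessarily the measure used in the study:

```python
import numpy as np

def ctt_discrimination(item_scores, total_scores):
    """CTT item discrimination: corrected item-total correlation."""
    rest = total_scores - item_scores
    return np.corrcoef(item_scores, rest)[0, 1]

def cdm_discrimination(p_master, p_nonmaster):
    """A simple CDM discrimination index: the gap between the correct-
    response probabilities of masters and non-masters of the required
    attributes (one common choice; the study's measure may differ)."""
    return p_master - p_nonmaster
```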

4.
The present study compared the performance of six cognitive diagnostic models (CDMs) to explore inter-skill relationships in a reading comprehension test. To this end, the item responses of 21,642 test-takers to a high-stakes reading comprehension test were analyzed. The models were compared in terms of model fit at both the test and item levels, classification consistency and accuracy, and the proportion of skill mastery profiles. The results showed that the G-DINA performed best, and the C-RUM, NC-RUM, and ACDM showed the closest affinity to the G-DINA. On some criteria, the DINA showed performance comparable to the G-DINA. The test-level results were corroborated by the item-level model comparison, where the DINA, DINO, and ACDM variously fit some of the items. The results suggest that the relationships among the subskills of reading comprehension may be a combination of compensatory and non-compensatory processes. It is therefore suggested that the choice of CDM be carried out at the item level rather than the test level.

5.
As with any psychometric model, the validity of inferences from cognitive diagnosis models (CDMs) determines the extent to which these models can be useful. For inferences from CDMs to be valid, it is crucial that the fit of the model to the data be ascertained. Based on a simulation study, this study investigated the sensitivity of various fit statistics for absolute or relative fit under different CDM settings. The investigation covered various types of model-data misfit that can occur with misspecifications of the Q-matrix, the CDM, or both. Six fit statistics were considered: -2 log-likelihood (-2LL), Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and residuals based on the proportion correct of individual items (p), the correlations (r), and the log-odds ratio of item pairs (l). An empirical example involving real data illustrates how the different fit statistics can be employed in conjunction with each other to identify different types of misspecification. With these statistics and the saturated model serving as the basis, relative and absolute fit evaluation can be integrated to detect misspecification efficiently.
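The three relative-fit statistics are simple functions of the maximized log-likelihood; a minimal sketch (the numeric values in the example are made up for illustration):

```python
import numpy as np

def info_criteria(loglik, n_params, n_obs):
    """-2LL, AIC, and BIC from a fitted model's log-likelihood."""
    neg2ll = -2.0 * loglik
    aic = neg2ll + 2.0 * n_params
    bic = neg2ll + n_params * np.log(n_obs)
    return neg2ll, aic, bic

# Two hypothetical CDMs fit to the same 1,000 examinees:
print(info_criteria(loglik=-10250.3, n_params=62, n_obs=1000))
print(info_criteria(loglik=-10244.1, n_params=90, n_obs=1000))  # more complex model
```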

6.
A Note on the Invariance of the DINA Model Parameters
Cognitive diagnosis models (CDMs), as alternative approaches to unidimensional item response models, have received increasing attention in recent years. CDMs are developed for the purpose of identifying the mastery or nonmastery of multiple fine-grained attributes or skills required for solving problems in a domain. For CDMs to receive wider use, researchers and practitioners need to understand the basic properties of these models. The article focuses on one CDM, the deterministic inputs, noisy "and" gate (DINA) model, and the invariance property of its parameters. Using simulated data involving different attribute distributions, the article demonstrates that the DINA model parameters are absolutely invariant when the model perfectly fits the data. An additional example involving different ability groups illustrates how noise in real data can contribute to the lack of invariance in these parameters. Some practical implications of these findings are discussed.
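For reference, the standard DINA item response function whose slip ($s_j$) and guess ($g_j$) parameters are at issue is

$$P(X_{ij}=1\mid\boldsymbol{\alpha}_i)=(1-s_j)^{\eta_{ij}}\,g_j^{\,1-\eta_{ij}},\qquad \eta_{ij}=\prod_{k=1}^{K}\alpha_{ik}^{q_{jk}},$$

where $\eta_{ij}$ indicates whether examinee $i$ has mastered every attribute the Q-matrix requires for item $j$.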

7.
This article used the Wald test to evaluate the item-level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G-DINA model. Results show that when the sample size is small and a larger number of attributes is required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A-CDM is closer to the nominal significance levels. With larger sample sizes, however, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined, even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings.
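In its usual form for this setting, the Wald statistic tests whether the linear constraints $\mathbf{R}\boldsymbol{\beta}_j=\mathbf{0}$ that reduce the saturated model to a DINA, DINO, or A-CDM hold for item $j$:

$$W=(\mathbf{R}\hat{\boldsymbol{\beta}}_j)^{\top}\,(\mathbf{R}\,\hat{\boldsymbol{\Sigma}}_j\,\mathbf{R}^{\top})^{-1}\,(\mathbf{R}\hat{\boldsymbol{\beta}}_j)\;\sim\;\chi^2_{r},$$

where $\hat{\boldsymbol{\beta}}_j$ are the item's G-DINA parameter estimates, $\hat{\boldsymbol{\Sigma}}_j$ their estimated covariance matrix, and $r$ the number of constraints.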

8.
Cognitive diagnosis models (CDMs) typically assume skill attributes with discrete (often binary) levels of skill mastery, making the existence of skill continuity an anticipated form of model misspecification. In this article, misspecification due to skill continuity is argued to be of particular concern for several CDM applications because of the lack of invariance it yields in CDM skill attribute metrics, or what are viewed here as the “thresholds” applied to continuous attributes in distinguishing masters from nonmasters. Using the deterministic inputs, noisy “and” gate (DINA) model as an illustration, the effects observed in real data are found to be systematic, with higher thresholds for mastery tending to emerge in higher-ability populations. The results have significant implications for applications of CDMs that rely heavily upon the parameter invariance properties of the models, including, for example, applications to the measurement of growth and differential item functioning analyses.

9.
Using a complex simulation study, we investigated parameter recovery, classification accuracy, and the performance of two item-fit statistics for correct and misspecified diagnostic classification models within a log-linear modeling framework. The manipulated test design factors included the number of respondents (1,000 vs. 10,000), attributes (3 vs. 5), and items (25 vs. 50), as well as different attribute correlations (.50 vs. .80) and marginal attribute difficulties (equal vs. different). We investigated misspecifications of interaction effect parameters under correct Q-matrix specification and two types of Q-matrix misspecification. While the misspecification of interaction effects had little impact on classification accuracy, invalid Q-matrix specifications led to notably decreased classification accuracy. The two proposed item-fit indexes were more sensitive to overspecification of Q-matrix entries for items than to underspecification. The information-based fit indexes AIC and BIC were sensitive to both over- and underspecification.

10.
Analyzing examinees’ responses using cognitive diagnostic models (CDMs) has the advantage of providing diagnostic information. To ensure the validity of the results from these models, differential item functioning (DIF) in CDMs needs to be investigated. In this article, the Wald test is proposed to examine DIF in the context of CDMs. This study explored the effectiveness of the Wald test in detecting both uniform and nonuniform DIF in the DINA model through a simulation study. Results of this study suggest that for relatively discriminating items, the Wald test had Type I error rates close to the nominal level. Moreover, its viability was underscored by the medium to high power rates for most investigated DIF types when DIF size was large. Furthermore, the performance of the Wald test in detecting uniform DIF was compared to that of the traditional Mantel-Haenszel (MH) and SIBTEST procedures. The results of the comparison study showed that the Wald test was comparable to or outperformed the MH and SIBTEST procedures. Finally, the strengths and limitations of the proposed method and suggestions for future studies are discussed.

11.
The assessment of differential item functioning (DIF) is routinely conducted to ensure test fairness and validity. Although many DIF assessment methods have been developed in the context of classical test theory and item response theory, they are not applicable to cognitive diagnosis models (CDMs), as the underlying latent attributes of CDMs are multidimensional and binary. This study proposes a very general DIF assessment method in the CDM framework that is applicable to various CDMs, more than two groups of examinees, and multiple grouping variables that are categorical, continuous, observed, or latent. The parameters can be estimated with Markov chain Monte Carlo algorithms implemented in the freeware WinBUGS. Simulation results demonstrated good parameter recovery and advantages of the new method over the Wald method in DIF assessment.

12.
The purpose of this study is to apply the attribute hierarchy method (AHM) to a subset of SAT critical reading items and to illustrate how the method can be used to support cognitive diagnostic inferences. The AHM is a psychometric procedure for classifying examinees’ test item responses into a set of attribute mastery patterns associated with different components of a cognitive model. The study was conducted in two steps. In step 1, three cognitive models were developed by reviewing selected literature in reading comprehension as well as research related to SAT Critical Reading. The cognitive models were then validated by having a sample of students think aloud as they solved each item. In step 2, psychometric analyses were conducted on the SAT critical reading cognitive models by evaluating the model-data fit between the expected and observed response patterns produced from two random samples of 2,000 examinees who responded to the items. The model that provided the best model-data fit was then used to calculate attribute probabilities for 15 examinees to illustrate the diagnostic testing procedure.

13.
Most existing classification accuracy indices for attribute patterns lose their effectiveness when response data are absent in diagnostic testing. To handle this issue, this article proposes new indices that predict the correct classification rate of a diagnostic test before the test is administered, under the deterministic inputs, noisy “and” gate (DINA) model. The new indices include an item-level expected classification accuracy (ECA) for attributes and test-level ECAs for attributes and attribute patterns, all of which are calculated solely from the known item parameters and the Q-matrix. Theoretical analysis showed that the item-level ECA can be regarded as a measure of the correct classification rate of an attribute contributed by an item. The article also illustrates how to apply the item-level ECA for attributes to estimate the correct classification rate of attribute patterns at the test level. Simulation results showed that two test-level ECA indices, ECA_I_W (an index based on the independence assumption and the weighted sum of the item-level ECAs) and ECA_C_M (an index based on a Gaussian copula function that incorporates the dependence structure of the attribute classification events and the simple average of the item-level ECAs), could accurately predict the correct classification rates of attribute patterns.
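Offered only as a heavily hedged sketch of how such indices could aggregate, based on the description above (the weights $w_{jk}$ and the item-level definition are assumptions, not the paper's formulas): a per-attribute accuracy might be formed as a weighted sum of item-level ECAs over the items measuring that attribute, and, under the independence assumption behind ECA_I_W, a pattern-level accuracy as the product over attributes:

$$\mathrm{ECA}(\alpha_k)=\sum_{j:\,q_{jk}=1}w_{jk}\,\mathrm{ECA}_{jk},\qquad \mathrm{ECA}(\boldsymbol{\alpha})\approx\prod_{k=1}^{K}\mathrm{ECA}(\alpha_k).$$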

14.
Applications of item response theory (IRT) models assume local item independence and that examinees are independent of each other. When a representative sample for psychometric analysis is selected using a cluster sampling method in a testlet-based assessment, both local item dependence and local person dependence are likely to be induced. This study proposed a four-level IRT model to simultaneously account for dual local dependence due to item clustering and person clustering. Model parameter estimation was explored using the Markov chain Monte Carlo method. Model parameter recovery was evaluated in a simulation study in comparison with three other related models: the Rasch model, the Rasch testlet model, and the three-level Rasch model for person clustering. In general, the proposed model recovered the item difficulty and person ability parameters with the least total error. The bias in both item and person parameter estimation was not affected, but the standard errors (SEs) were. In some simulation conditions, the difference in classification accuracy between models could be as large as 11%. The illustration using real data generally supported the model performance observed in the simulation study.
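A plausible sketch of how the two layers of dependence could enter a Rasch-type model (the exact parameterization in the paper may differ):

$$\operatorname{logit}P(X_{ij}=1)=\theta_i+\gamma_{i\,d(j)}-b_j,\qquad \theta_i=\zeta_{c(i)}+\varepsilon_i,$$

where $\gamma_{i\,d(j)}$ is person $i$'s random effect for the testlet $d(j)$ containing item $j$ (local item dependence) and $\zeta_{c(i)}$ is the random effect of the cluster $c(i)$ to which person $i$ belongs (local person dependence).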

15.
The back-propagation (BP) neural network is one of the most widely used artificial neural network models. Because it performs well in classification and recognition tasks, researchers have applied it to cognitive diagnostic assessment to classify examinees diagnostically. Through a simulation study, we examined how five factors (number of attributes, attribute hierarchy, test length, item quality, and test sample size) affect the classification accuracy of BP neural networks in cognitive diagnosis. The results show that: 1) the classification accuracy of BP-based cognitive diagnosis does not depend on the test sample size; 2) item quality and test length have significant positive effects on the diagnostic accuracy of BP neural networks; 3) the number of attributes has a negative effect on classification accuracy; and 4) item quality affects, to some extent, the classification accuracy of the BP approach across different attribute hierarchies.
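A minimal, self-contained sketch in the spirit of the study above: simulate DINA responses and train a BP (back-propagation) network to classify examinees into attribute profiles. All settings (network size, slip/guess rates, test design) are illustrative assumptions, not the paper's design:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
K, J, N = 3, 24, 1000                    # attributes, items, examinees
Q = rng.integers(0, 2, size=(J, K))
Q[Q.sum(axis=1) == 0, 0] = 1             # every item must measure >= 1 attribute
alpha = rng.integers(0, 2, size=(N, K))  # true attribute profiles
slip = guess = 0.1                       # high item quality (assumed)
eta = (alpha @ Q.T == Q.sum(axis=1)).astype(int)  # DINA ideal responses
X = rng.binomial(1, (1 - slip) * eta + guess * (1 - eta))

y = alpha @ (2 ** np.arange(K))          # encode each profile as one class label
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
clf.fit(X[:800], y[:800])
print("pattern classification accuracy:", clf.score(X[800:], y[800:]))
```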

16.
To diagnose the English as a Foreign Language (EFL) reading ability of Chinese high-school students, this study explored how an educational theory, the revised taxonomy of educational objectives, could be used to create the attribute list. Q-matrices were proposed and refined both qualitatively and quantitatively. The final Q-matrix specified the relationship between 53 reading items and 9 cognitive attributes. Thereafter, 978 examinees’ responses were calibrated with cognitive diagnosis models (CDMs) to explore their strengths and weaknesses in EFL reading. The results revealed strengths and weaknesses on the 9 attributes for the sampled population as a whole, for examinees at three proficiency levels, and for individual learners. A diagnostic score report was also developed to communicate multi-layered information to various stakeholders. The goodness of fit of the selected CDM was evaluated using multiple measures. The results provide empirical evidence for the utility of educational theories in cognitive diagnosis and for the feasibility of retrofitting non-diagnostic tests for diagnostic purposes in language testing. In addition, the study demonstrates procedures for model selection and a post-hoc approach to model verification in language diagnosis.

17.
Computer-based tests (CBTs) often use random ordering of items in order to minimize item exposure and reduce the potential for answer copying. Little research has been done, however, to examine item position effects for these tests. In this study, different versions of a Rasch model and different response time models were examined and applied to data from a CBT administration of a medical licensure examination. Specifically, the models were used to investigate whether item position affected item difficulty and item intensity estimates. Results indicated that the position effect was negligible.

18.
We report a multidimensional test that examines middle grades teachers’ understanding of fraction arithmetic, especially multiplication and division. The test is based on four attributes identified through an analysis of the extensive mathematics education research literature on teachers’ and students’ reasoning in this content area. We administered the test to a national sample of 990 in-service middle grades teachers and analyzed the item responses using the log-linear cognitive diagnosis model. We report the diagnostic quality of the test at the item level, mastery classifications for teachers, and attribute relationships. Our results demonstrate that, when a test is grounded in research on cognition and is designed to be multidimensional from the outset, it is possible to use diagnostic classification models to detect distinct patterns of attribute mastery.

19.
This paper proposes two new item selection methods for cognitive diagnostic computerized adaptive testing: the restrictive progressive method and the restrictive threshold method. They are built upon the posterior-weighted Kullback-Leibler (KL) information index but include additional stochastic components either in the item selection index or in the item selection procedure. Simulation studies show that both methods are successful at simultaneously suppressing overexposed items and increasing the usage of underexposed items. Compared to item selection based upon (1) pure KL information and (2) the Sympson-Hetter method, the two new methods strike a better balance between item exposure control and measurement accuracy. The two new methods are also compared with Barrada et al.'s (2008) progressive method and proportional method.
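For context, the posterior-weighted KL index on which both methods build is commonly written (following Cheng, 2009) as

$$\mathrm{PWKL}_j=\sum_{c=1}^{2^K}\pi(\boldsymbol{\alpha}_c\mid\mathbf{x})\sum_{x=0}^{1}P_j(x\mid\hat{\boldsymbol{\alpha}})\log\frac{P_j(x\mid\hat{\boldsymbol{\alpha}})}{P_j(x\mid\boldsymbol{\alpha}_c)},$$

where $\pi(\boldsymbol{\alpha}_c\mid\mathbf{x})$ is the current posterior over the $2^K$ attribute patterns and $\hat{\boldsymbol{\alpha}}$ is the current pattern estimate; the new methods add stochastic components to this index or to the selection procedure itself.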

20.
Studies have shown that item difficulty can vary significantly based on the context of an item within a test form. In particular, item position may be associated with practice and fatigue effects that influence item parameter estimation. The purpose of this research was to examine the relevance of item position specifically for assessments used in early education, an area of testing that has received relatively limited psychometric attention. In an initial study, multilevel item response models fit to data from an early literacy measure revealed statistically significant increases in difficulty for items appearing later in a 20-item form. The estimated linear change in logits for an increase of 1 in position was .024, implying a predicted change of roughly .46 logits (19 position steps × .024 ≈ .46) for a shift from the beginning to the end of the form. A subsequent simulation study examined the impact of item position effects on person ability estimation within computerized adaptive testing. Implications and recommendations for practice are discussed.
