Similar articles
20 similar articles found (search time: 46 ms)
1.
In educational environments, monitoring persons' progress over time may help teachers to evaluate the effectiveness of their teaching procedures. Electronic learning environments are increasingly being used as part of formal education, and the resulting datasets can be used to understand and to improve the environment. This study presents longitudinal models based on item response theory (IRT) for measuring persons' ability within and between study sessions in data from web-based learning environments. Two empirical examples are used to illustrate the presented models. Results show that by incorporating time spent within and between study sessions into an IRT model, one is able to track changes in the ability of a population of persons, or of groups of persons, at any time during the learning process.
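Most of the abstracts in this list build on logistic IRT models such as the two-parameter logistic (2PL) model. As background, here is a minimal sketch of its item response function; the function and parameter names are illustrative and not taken from any of the cited papers:

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the probability is exactly 0.5, and it rises
# monotonically with ability; a controls the steepness at b.
p_low, p_mid, p_high = (p_correct(t, 1.2, 0.0) for t in (-2.0, 0.0, 2.0))
```

The longitudinal models described above extend this basic form by letting ability vary with time within and between study sessions.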

2.
An approach called generalizability in item response modeling (GIRM) is introduced in this article. The GIRM approach essentially incorporates the sampling model of generalizability theory (GT) into the scaling model of item response theory (IRT) by making distributional assumptions about the relevant measurement facets. By specifying a random effects measurement model, and taking advantage of the flexibility of Markov Chain Monte Carlo (MCMC) estimation methods, it becomes possible to estimate GT variance components simultaneously with traditional IRT parameters. It is shown how GT and IRT can be linked together, in the context of a single-facet measurement design with binary items. Using both simulated and empirical data with the software WinBUGS, the GIRM approach is shown to produce results comparable to those from a standard GT analysis, while also producing results from a random effects IRT model.

3.
This study proposes a structured constructs model (SCM) to examine measurement in the context of a multidimensional learning progression (LP). The LP is assumed to have features that go beyond a typical multidimensional IRT model, in that there are hypothesized to be certain cross-dimensional linkages that correspond to requirements between the levels of the different dimensions. The new model builds on multidimensional item response theory models and change-point analysis to add cut-score and discontinuity parameters that embody these substantive requirements. This modeling strategy allows us to place the examinees in the appropriate LP level and simultaneously to model the hypothesized requirement relations. Results from a simulation study indicate that the proposed change-point SCM recovers the generating parameters well. When the hypothesized requirement relations are ignored, the model fit tends to become worse, and the model parameters appear to be more biased. Moreover, the proposed model can be used to find validity evidence to support or disprove initial theoretical hypothesized links in the LP through empirical data. We illustrate the technique with data from an assessment system designed to measure student progress in a middle-school statistics and modeling curriculum.

4.
Functional form misfit is frequently a concern in item response theory (IRT), although the practical implications of misfit are often difficult to evaluate. In this article, we illustrate how seemingly negligible amounts of functional form misfit, when systematic, can be associated with significant distortions of the score metric in vertical scaling contexts. Our analysis uses two- and three-parameter versions of Samejima's logistic positive exponent model (LPE) as a data generating model. Consistent with prior work, we find LPEs generally provide a better comparative fit to real item response data than traditional IRT models (2PL, 3PL). Further, our simulation results illustrate how 2PL- or 3PL-based vertical scaling in the presence of LPE-induced misspecification leads to an artificial growth deceleration across grades, consistent with that commonly seen in vertical scaling studies. The results raise further concerns about the use of standard IRT models in measuring growth, even apart from the frequently cited concerns of construct shift/multidimensionality across grades.

5.
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article is to compare two different formulations of an alternative item response theory (IRT) model developed to parameterize unidimensional projections of multidimensional test items: Analytical and Empirical formulations. Two real data applications are provided to illustrate how the projection IRT model can be used in practice, as well as to further examine how ability estimates from the projection IRT model compare to external examinee measures. The results suggest that collateral information extracted by a projection IRT model can be used to improve the reliability and validity of subscale scores, which in turn can be used to provide diagnostic information about strengths and weaknesses of examinees, helping stakeholders to link instruction or curriculum to assessment results.

6.
This study adopted a common-item anchor-test design, using the two-parameter IRT model in the R package ltm to estimate item and person parameters for cognitive diagnostic tests of primary school students' mathematical competence at each grade level, and the equateIRT package to equate the test parameters across grades. The results show that, after equating, both item difficulty and students' mathematical competence increased steadily with grade level, and the patterns of difference in mathematical competence development across schools, ethnic groups, and genders were all consistent with the theoretical hypotheses. The study demonstrates the feasibility of using IRT vertical equating to construct a cross-grade vertical scale of primary school students' mathematical competence development, providing the empirical evidence needed for designing systematic remedial instruction and building adaptive item banks.

7.
Linear factor analysis (FA) models can be reliably tested using test statistics based on residual covariances. We show that the same statistics can be used to reliably test the fit of item response theory (IRT) models for ordinal data (under some conditions). Hence, the fit of an FA model and of an IRT model to the same data set can now be compared. When applied to a binary data set, our experience suggests that IRT and FA models yield similar fits. However, when the data are polytomous ordinal, IRT models yield a better fit because they involve a higher number of parameters. But when fit is assessed using the root mean square error of approximation (RMSEA), similar fits are obtained again. We explain why. These test statistics have little power to distinguish between FA and IRT models; they are unable to detect that linear FA is misspecified when applied to ordinal data generated under an IRT model.

8.
This article presents an experimental study of the assessment made by university students of their level of digital competence in the use of mobile devices such as smartphones, laptops and tablets. The study was part of an investigation into ubiquitous learning with mobile devices and is based on the analysis of responses from a sample of 203 university students at eleven European and Latin American universities. Participants were asked questions about their performance on a set of digital activities that tested various components of digital competence. The analysis methodology was based on Item Response Theory (IRT). The survey data was analysed by applying a statistical model to represent the probability of obtaining an affirmative answer to each activity proposed. This enabled us to identify the difficulty and discrimination parameters of each activity. As an outcome of the study, measures on latent digital competence in individual participants were articulated. The results allowed us to describe how a number of devices and activities interacted. Understanding these types of interactions is necessary for a continued development of the evaluation of digital competence in students.

9.
A mixed-effects item response theory (IRT) model is presented as a logical extension of the generalized linear mixed-effects modeling approach to formulating explanatory IRT models. Fixed and random coefficients in the extended model are estimated using a Metropolis-Hastings Robbins-Monro (MH-RM) stochastic imputation algorithm to accommodate the increased dimensionality due to modeling multiple design- and trait-based random effects. As a consequence of using this algorithm, more flexible explanatory IRT models, such as the multidimensional four-parameter logistic model, are easily organized and efficiently estimated for unidimensional and multidimensional tests. Rasch versions of the linear latent trait and latent regression model, along with their extensions, are presented and discussed; Monte Carlo simulations are conducted to determine the efficiency of parameter recovery of the MH-RM algorithm; and an empirical example using the extended mixed-effects IRT model is presented.

10.
In this article we present a general approach, not relying on item response theory models (non-IRT), to detect differential item functioning (DIF) in dichotomous items in the presence of guessing. The proposed nonlinear regression (NLR) procedure for DIF detection is an extension of a method based on logistic regression. As a non-IRT approach, NLR can be seen as a proxy for detection based on the three-parameter IRT model, which is a standard tool in the field. Hence, NLR fills a logical gap in DIF detection methodology and as such is important for educational purposes. Moreover, the advantages of the NLR procedure, as well as a comparison to other commonly used methods, are demonstrated in a simulation study. A real data analysis is offered to demonstrate practical use of the method.
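The abstract does not give the exact parameterization, but a nonlinear regression DIF model of the kind described can be sketched as a logistic curve in the matching score with a lower asymptote for guessing; all parameter values and names below are illustrative assumptions, not the authors' specification:

```python
import math

def nlr_prob(score, group, c, b0, b1, b2, b3):
    """Nonlinear regression DIF model: a logistic curve in the matching
    score with a lower asymptote c (pseudo-guessing). The group main
    effect b2 captures uniform DIF; the interaction b3, non-uniform DIF."""
    logit = b0 + b1 * score + b2 * group + b3 * score * group
    return c + (1.0 - c) / (1.0 + math.exp(-logit))

# Reference group (group=0) vs. focal group (group=1) with a uniform
# DIF shift of -0.8 on the logit scale; illustrative parameter values.
scores = [s / 2.0 for s in range(-6, 7)]   # matching scores -3.0 ... 3.0
ref = [nlr_prob(s, 0, 0.2, 0.0, 1.0, -0.8, 0.0) for s in scores]
foc = [nlr_prob(s, 1, 0.2, 0.0, 1.0, -0.8, 0.0) for s in scores]
```

With b2 = b3 = 0 the two groups share one curve (no DIF); in practice the parameters would be estimated by nonlinear least squares and the DIF terms tested for significance.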

11.
An Extension of Four IRT Linking Methods for Mixed-Format Tests
Under item response theory (IRT), linking proficiency scales from separate calibrations of multiple forms of a test to achieve a common scale is required in many applications. Four IRT linking methods including the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods have been presented for use with single-format tests. This study extends the four linking methods to a mixture of unidimensional IRT models for mixed-format tests. Each linking method extended is intended to handle mixed-format tests using any mixture of the following five IRT models: the three-parameter logistic, graded response, generalized partial credit, nominal response (NR), and multiple-choice (MC) models. A simulation study is conducted to investigate the performance of the four linking methods extended to mixed-format tests. Overall, the Haebara and Stocking-Lord methods yield more accurate linking results than the mean/mean and mean/sigma methods. When the NR model or the MC model is used to analyze data from mixed-format tests, limitations of the mean/mean, mean/sigma, and Stocking-Lord methods are described.
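Of the four methods named above, mean/sigma is the simplest to sketch: it derives the linking slope and intercept from the difficulty estimates of the common items on the two scales. A minimal sketch (variable names illustrative):

```python
import statistics

def mean_sigma(b_new, b_ref):
    """Mean/sigma linking: slope A and intercept B that place the new
    form's theta scale onto the reference scale, computed from the
    difficulty estimates of the common items on each scale."""
    A = statistics.pstdev(b_ref) / statistics.pstdev(b_new)
    B = statistics.mean(b_ref) - A * statistics.mean(b_new)
    return A, B

def transform(theta, A, B):
    """Carry an ability estimate from the new scale to the reference scale."""
    return A * theta + B

# Common-item difficulties on the two scales (illustrative values):
b_new = [-1.0, 0.0, 1.0]
b_ref = [-0.5, 0.5, 1.5]   # same spread, shifted by +0.5
A, B = mean_sigma(b_new, b_ref)
```

The Haebara and Stocking-Lord methods instead minimize a loss over the whole response functions rather than matching only the moments of the difficulty parameters, which is why they tend to be more accurate.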

12.
Testing the goodness of fit of item response theory (IRT) models is relevant to validating IRT models, and new procedures have been proposed. These alternatives compare observed and expected response frequencies conditional on observed total scores, and use posterior probabilities for responses across θ levels rather than cross-classifying examinees using point estimates of θ and score responses. This research compared these alternatives with regard to their methods, properties (Type I error rates and empirical power), available research, and practical issues (computational demands, treatment of missing data, effects of sample size and sparse data, and available computer programs). Different advantages and disadvantages related to these characteristics are discussed. A simulation study provided additional information about empirical power and Type I error rates.

13.
Standard 3.9 of the Standards for Educational and Psychological Testing (1999) demands evidence of model fit when item response theory (IRT) models are fitted to test data. Hambleton and Han (2005) and Sinharay (2005) recommended assessing the practical significance of misfit of IRT models, but few examples of such assessment can be found in the literature on IRT model fit. In this article, the practical significance of misfit of IRT models was assessed using data from several tests that employ IRT models to report scores. The IRT model did not fit any data set considered in this article. However, the extent of practical significance of misfit varied over the data sets.

14.
《教育实用测度》2013,26(2):199-210
When an item response theory (IRT) model uses marginal maximum likelihood estimation, person parameters are usually treated as random parameters following a certain distribution, which serves as a prior for estimating the structural parameters in the model. For example, both PARSCALE (Muraki & Bock, 1999) and BILOG 3 (Mislevy & Bock, 1990) use a standard normal distribution as the default person prior. When the fixed-item linking method is used with an IRT program having a fixed person prior distribution, it biases person ability growth downward or upward, depending on the direction of the growth, due to the misspecification of the prior. This study demonstrated by simulation how much biasing impact the use of the fixed prior distribution in fixed-item linking has on person ability growth for mixed-format test data. In addition, the study demonstrated how to recover growth through an iterative prior-update calibration procedure. This shows that fixed-item linking is still a viable linking method for a fixed-person-prior IRT calibration.

15.
In educational and psychological measurement, a person-fit statistic (PFS) is designed to identify aberrant response patterns. For parametric PFSs, valid inference depends on several assumptions, one of which is that the item response theory (IRT) model is correctly specified. Previous studies have used empirical data sets to explore the effects of model misspecification on PFSs. We further this line of research by using a simulation study, which allows us to explore issues that may be of interest to practitioners. Results show that, depending on the generating and analysis item models, Type I error rates at fixed values of the latent variable may be greatly inflated, even when the aggregate rates are relatively accurate. Results also show that misspecification is most likely to affect PFSs for examinees with extreme latent variable scores. Two empirical data analyses are used to illustrate the importance of model specification.
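A widely used parametric person-fit statistic of the kind discussed here is the standardized log-likelihood statistic l_z; a minimal sketch (the abstract does not name a specific PFS, so this choice is illustrative):

```python
import math

def lz_statistic(responses, probs):
    """Standardized log-likelihood person-fit statistic l_z: values far
    below 0 flag response patterns that are unlikely under the model."""
    l0 = sum(u * math.log(p) + (1 - u) * math.log(1 - p)
             for u, p in zip(responses, probs))
    expected = sum(p * math.log(p) + (1 - p) * math.log(1 - p)
                   for p in probs)
    variance = sum(p * (1 - p) * math.log(p / (1 - p)) ** 2
                   for p in probs)
    return (l0 - expected) / math.sqrt(variance)

# Model-implied success probabilities for one examinee over six items,
# ordered easy to hard (illustrative values):
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
typical = [1, 1, 1, 1, 0, 0]   # consistent with the model
aberrant = [0, 0, 0, 0, 1, 1]  # misses easy items, answers hard ones
```

The study's point is that the null distribution of such statistics depends on the IRT model being correctly specified: if `probs` comes from a misspecified model, nominal cutoffs for l_z no longer hold, especially at extreme ability levels.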

16.
Data walls, used to display student data within schools, are a growing phenomenon internationally. Drawing on a conceptual model of data use, this systematic review examined 32 empirical sources from compulsory education settings to evaluate claims about the impact of data walls on teaching and learning. The review found details regarding goals for data walls and the displayed information, but there was limited evidence of how data analysis was used to guide improvement action and evaluation of impact. The review concluded that there is currently insufficient empirical research evidence to substantiate or refute claims about their educational effects.

17.
A polytomous item is one for which the responses are scored according to three or more categories. Given the increasing use of polytomous items in assessment practices, item response theory (IRT) models specialized for polytomous items are becoming increasingly common. The purpose of this ITEMS module is to provide an accessible overview of polytomous IRT models. The module presents commonly encountered polytomous IRT models, describes their properties, and contrasts their defining principles and assumptions. After completing this module, the reader should have a sound understanding of what a polytomous IRT model is, the manner in which the equations of the models are generated from the model's underlying step functions, how widely used polytomous IRT models differ with respect to their definitional properties, and how to interpret the parameters of polytomous IRT models.
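The "step function" construction mentioned above can be illustrated with the graded response model, one of the commonly encountered polytomous IRT models: cumulative step curves are computed first, and adjacent differences give the category probabilities. A minimal sketch (parameter values illustrative):

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Graded response model: cumulative step curves
    P*_k = sigmoid(a * (theta - b_k)) for ordered thresholds
    b_1 < ... < b_m; adjacent differences of the steps (with
    boundary values 1 and 0) give the category probabilities."""
    star = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b)))
                    for b in thresholds] + [0.0]
    return [star[k] - star[k + 1] for k in range(len(star) - 1)]

# A four-category item (three ordered thresholds):
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
```

Other polytomous models covered in such modules (e.g., the generalized partial credit model) differ precisely in how these underlying step functions are defined and combined.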

18.
In judgmental standard setting procedures (e.g., the Angoff procedure), expert raters establish minimum pass levels (MPLs) for test items, and these MPLs are then combined to generate a passing score for the test. As suggested by Van der Linden (1982), item response theory (IRT) models may be useful in analyzing the results of judgmental standard setting studies. This paper examines three issues relevant to the use of IRT models in analyzing the results of such studies. First, a statistic for examining the fit of MPLs, based on judges' ratings, to an IRT model is suggested. Second, three methods for setting the passing score on a test based on item MPLs are analyzed; these analyses, based on theoretical models rather than empirical comparisons among the three methods, suggest that the traditional approach (i.e., setting the passing score on the test equal to the sum of the item MPLs) does not provide the best results. Third, a simple procedure, based on generalizability theory, for examining the sources of error in estimates of the passing score is discussed.
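The traditional approach critiqued above, and one IRT-based alternative, can be sketched as follows: sum the item MPLs to get a raw cut score, then (under a fitted IRT model) invert the test characteristic curve to find the ability level implied by that cut. The item parameters and MPL values below are illustrative assumptions, and the inversion shown is one possible IRT-based variant, not necessarily any of the paper's three methods:

```python
import math

def tcc(theta, items):
    """Test characteristic curve: expected raw score under the 2PL,
    summing each item's probability of a correct response."""
    return sum(1.0 / (1.0 + math.exp(-a * (theta - b))) for a, b in items)

def theta_at_score(target, items, lo=-6.0, hi=6.0):
    """Invert the (monotone) TCC by bisection: the ability level at
    which the expected raw score equals the target cut score."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if tcc(mid, items) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

items = [(1.2, -0.5), (0.8, 0.0), (1.5, 0.7)]  # (a, b) per item, illustrative
mpls = [0.8, 0.6, 0.5]                          # judges' item MPLs
cut_raw = sum(mpls)                             # traditional passing score
theta_cut = theta_at_score(cut_raw, items)      # implied ability cut
```

Working on the ability metric rather than simply summing MPLs is one way IRT analysis can refine a judgmental cut score.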

19.
Drawing valid inferences from item response theory (IRT) models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. This instructional module provides an overview of methods used for evaluating the fit of IRT models. Upon completing this module, the reader will have an understanding of traditional and Bayesian approaches for evaluating model-data fit of IRT models, the relative advantages of each approach, and the software available to implement each method.

20.
Both structural equation modeling (SEM) and item response theory (IRT) can be used for factor analysis of dichotomous item responses. In this case, the measurement models of both approaches are formally equivalent. They were refined within and across different disciplines, and make complementary contributions to central measurement problems encountered in almost all empirical social science research fields. In this article (a) fundamental formal similarities between IRT and SEM models are pointed out. It will be demonstrated how both types of models can be used in combination to analyze (b) the dimensional structure and (c) the measurement invariance of survey item responses. All analyses are conducted with Mplus, which allows an integrated application of both approaches in a unified, general latent variable modeling framework. The aim is to promote a diffusion of useful measurement techniques and skills from different disciplines into empirical social research.
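One concrete consequence of the formal equivalence described above is the standard conversion between the two parameterizations for a dichotomous item: a standardized factor loading λ and threshold τ from a normal-ogive factor model map to IRT discrimination a = λ/√(1−λ²) and difficulty b = τ/λ (normal metric). A minimal sketch, with illustrative values:

```python
import math

def loading_threshold_to_irt(lam, tau):
    """Convert a standardized factor loading (lam) and threshold (tau)
    from a normal-ogive factor model for a dichotomous item into IRT
    discrimination a and difficulty b on the normal metric."""
    a = lam / math.sqrt(1.0 - lam ** 2)
    b = tau / lam
    return a, b

a, b = loading_threshold_to_irt(lam=0.6, tau=0.3)
```

This mapping is what lets software such as Mplus report the same measurement model in either metric within one latent variable framework.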
