Similar Literature
20 similar documents found.
1.
Studies of differential item functioning (DIF) under item response theory require that item parameter estimates be placed on the same metric before comparisons can be made. The present study compared the effects of three methods for linking metrics, namely a weighted mean and sigma method (WMS), the test characteristic curve method (TCC), and the minimum chi-square method (MCS), on the detection of differential item functioning. Both iterative and noniterative linking procedures were compared for each method. Results indicated that, when the sample size was small, DIF detection following linking via the test characteristic curve method was the most accurate. When the sample size was large, results for the three linking methods were essentially the same. Iterative linking improved the detection of differentially functioning items over noniterative linking, particularly at the .05 alpha level. The weighted mean and sigma method showed greater improvement with iterative linking than either the test characteristic curve or the minimum chi-square method.
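The mean and sigma idea can be sketched in a few lines of Python. This is a minimal sketch, not the study's implementation: it assumes dichotomous-item a and b estimates, computes the slope A and intercept B from the weighted means and standard deviations of the common items' difficulties, and rescales items as a/A and A*b + B. The `weights` argument is a hypothetical stand-in (for example, inverse error variances), because the abstract does not state the weighting scheme; the iterative variant, which re-links after setting aside flagged items, is not shown.

```python
import math

def weighted_mean_sigma(b_source, b_target, weights=None):
    """Linking constants (A, B) that map the source metric onto the target metric.

    b_source, b_target: difficulty estimates of the same common items on each metric.
    weights: optional nonnegative item weights (hypothetical; e.g. inverse error variances).
    """
    if len(b_source) != len(b_target) or len(b_source) < 2:
        raise ValueError("need at least two common items estimated on both metrics")
    if weights is None:
        weights = [1.0] * len(b_source)
    total = float(sum(weights))

    def wmean(xs):
        return sum(w * x for w, x in zip(weights, xs)) / total

    def wsd(xs):
        m = wmean(xs)
        return math.sqrt(sum(w * (x - m) ** 2 for w, x in zip(weights, xs)) / total)

    A = wsd(b_target) / wsd(b_source)          # slope: ratio of weighted SDs
    B = wmean(b_target) - A * wmean(b_source)  # intercept: aligns weighted means
    return A, B

def to_target_metric(a, b, A, B):
    """Rescale one item's discrimination and difficulty onto the target metric."""
    return a / A, A * b + B

# Usage: common items whose target difficulties are exactly 2 * b + 1.
A, B = weighted_mean_sigma([-1.0, 0.0, 1.5], [-1.0, 1.0, 4.0], weights=[1.0, 2.0, 0.5])
print(round(A, 6), round(B, 6))  # 2.0 1.0
```

Because the target difficulties are an exact linear function of the source difficulties, any choice of weights recovers the same constants; with real estimates the weights change which items dominate the fit.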

2.
Linking item parameters to a base scale
This paper compares three methods of item calibration that are frequently used for linking item parameters to a base scale: concurrent calibration, separate calibration with linking, and fixed item parameter calibration. Concurrent and separate calibrations were implemented using BILOG-MG. The Stocking and Lord (1983, Applied Psychological Measurement, 7, 201-210) characteristic curve method of parameter linking was used in conjunction with separate calibration. The fixed item parameter calibration (FIPC) method was implemented using both BILOG-MG and PARSCALE because the two programs carry out the method differently. Both programs use multiple EM cycles, but BILOG-MG does not update the prior ability distribution during FIPC calibration, whereas PARSCALE updates it multiple times. The methods were compared using simulations based on actual testing program data, and results were evaluated in terms of recovery of the underlying ability distributions, the item characteristic curves, and the test characteristic curves. Factors manipulated in the simulations were sample size, ability distributions, and number of common (or fixed) items. The results for concurrent calibration and separate calibration with linking were comparable, and both methods showed good recovery in all conditions. Of the two fixed item parameter calibration procedures, only the appropriate use of PARSCALE consistently produced item parameter linking results similar to those of the other two methods.
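The Stocking-Lord criterion used with separate calibration can be sketched as follows: choose A and B so that the test characteristic curve of the common items, after moving the source estimates onto the base metric, matches the base TCC over a grid of ability values. This is a minimal Python sketch under stated assumptions: 3PL items with scaling constant D = 1.7, an unweighted theta grid from -4 to 4, and a simple compass (pattern) search chosen only to keep the example dependency-free; it is not how BILOG-MG or any particular linking program implements the method.

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)

def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    z = max(-500.0, min(500.0, D * a * (theta - b)))  # clip to avoid overflow in exp
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score on the items at theta."""
    return sum(p3pl(theta, a, b, c) for a, b, c in items)

def stocking_lord_loss(A, B, source_items, target_items, thetas):
    """Sum of squared TCC differences after moving source items onto the target metric."""
    if A <= 0:
        return float("inf")
    moved = [(a / A, A * b + B, c) for a, b, c in source_items]
    return sum((tcc(t, target_items) - tcc(t, moved)) ** 2 for t in thetas)

def compass_search(f, x0, step=0.5, tol=1e-8, max_iter=20000):
    """Derivative-free minimiser: try +/- step on each coordinate, halve step when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2.0
    return x, fx

def stocking_lord(source_items, target_items, thetas=None):
    """Return (A, B) minimising the Stocking-Lord criterion over a theta grid."""
    if thetas is None:
        thetas = [-4.0 + 0.2 * k for k in range(41)]  # equally spaced, unweighted
    loss = lambda x: stocking_lord_loss(x[0], x[1], source_items, target_items, thetas)
    (A, B), _ = compass_search(loss, [1.0, 0.0])
    return A, B

# Usage: hypothetical common items; target items are exact transformations with A = 1.2, B = -0.3.
src = [(1.0, -1.5, 0.15), (1.4, -0.5, 0.15), (0.8, 0.0, 0.15), (1.2, 0.5, 0.15), (1.6, 1.5, 0.15)]
tgt = [(a / 1.2, 1.2 * b - 0.3, c) for a, b, c in src]
print(stocking_lord(src, tgt))  # approximately (1.2, -0.3)
```

Because the target items here are exact transformations of the source items, the criterion reaches zero at the generating constants; with real estimates the minimum is positive.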

3.
The increasing use of item pools in large-scale educational assessments calls for an appropriate scaling procedure to place field-tested items on a common metric. The present study examines scaling procedures for developing a new item pool under a spiraled block linking design. Three scaling procedures are considered: (a) concurrent calibration, (b) separate calibration with one linking, and (c) separate calibration with three sequential linkings. Evaluation across varying sample sizes and item pool sizes suggests that calibrating the item pool simultaneously yields the most stable scaling. The separate calibration procedures produced larger scaling errors as the number of linking steps increased. Haebara's item characteristic curve linking performed better than the test characteristic curve (TCC) linking method. The present article provides an analytic illustration that the TCC method may fail to find global solutions for polytomous items. Finally, a comparison of single- and mixed-format item pools suggests that using polytomous items as anchors can improve the overall scaling accuracy of the item pools.
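Haebara's criterion differs from the TCC method in where the squaring happens: it sums squared differences of individual item characteristic curves, so errors for different items cannot cancel, whereas the TCC method squares the difference of the summed curves. The sketch below assumes dichotomous 3PL/2PL items with D = 1.7 and a one-directional form of the criterion (differences evaluated on the target metric only); the polytomous case analyzed in the article is not covered, and the compass search is an illustrative optimizer choice, not the article's.

```python
import math

D = 1.7  # logistic scaling constant (assumed convention)

def p3pl(theta, a, b, c):
    """3PL probability of a correct response (c = 0 gives the 2PL)."""
    z = max(-500.0, min(500.0, D * a * (theta - b)))  # clip to avoid overflow in exp
    return c + (1.0 - c) / (1.0 + math.exp(-z))

def haebara_loss(A, B, source_items, target_items, thetas):
    """Sum over common items and theta points of squared ICC differences."""
    if A <= 0:
        return float("inf")
    total = 0.0
    for (a_s, b_s, c_s), (a_t, b_t, c_t) in zip(source_items, target_items):
        for t in thetas:
            diff = p3pl(t, a_t, b_t, c_t) - p3pl(t, a_s / A, A * b_s + B, c_s)
            total += diff * diff  # squared per item, so item errors cannot cancel
    return total

def compass_search(f, x0, step=0.5, tol=1e-8, max_iter=20000):
    """Derivative-free minimiser: try +/- step on each coordinate, halve step when stuck."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = list(x)
                cand[i] += delta
                fc = f(cand)
                if fc < fx:
                    x, fx, improved = cand, fc, True
                    break
        if not improved:
            step /= 2.0
    return x, fx

def haebara(source_items, target_items, thetas=None):
    """Return (A, B) minimising the Haebara criterion over an equally spaced theta grid."""
    if thetas is None:
        thetas = [-4.0 + 0.2 * k for k in range(41)]
    loss = lambda x: haebara_loss(x[0], x[1], source_items, target_items, thetas)
    (A, B), _ = compass_search(loss, [1.0, 0.0])
    return A, B

# Usage: hypothetical 2PL common items; target = exact transformation with A = 0.9, B = 0.4.
src = [(0.9, -1.0, 0.0), (1.3, -0.2, 0.0), (1.1, 0.6, 0.0), (0.7, 1.4, 0.0)]
tgt = [(a / 0.9, 0.9 * b + 0.4, c) for a, b, c in src]
print(haebara(src, tgt))  # approximately (0.9, 0.4)
```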

4.
Based on concerns about the item response theory (IRT) linking approach used in the Programme for International Student Assessment (PISA) until 2012 as well as the desire to include new, more complex, interactive items with the introduction of computer-based assessments, alternative IRT linking methods were implemented in the 2015 PISA round. The new linking method represents a concurrent calibration using all available data, enabling us to find item parameters that maximize fit across all groups and allowing us to investigate measurement invariance across groups. Apart from the Rasch model that historically has been used in PISA operational analyses, we compared our method against more general IRT models that can incorporate item-by-country interactions. The results suggest that our proposed method holds promise not only to provide a strong linkage across countries and cycles but also to serve as a tool for investigating measurement invariance.

5.
One major aim of international large-scale assessments (ILSAs) is to monitor changes in student performance over time. To accomplish this task, a set of common items is repeatedly administered in each assessment, and linking methods are used to align the results from the different assessments on a common scale. The present article introduces a framework for discussing linking errors in ILSAs, in which different components of linking errors are distinguished (country-by-item interaction, assessment-by-item interaction, and country-by-assessment-by-item interaction). Furthermore, these components are used to analytically derive standard errors for national trend estimates. In a simulation study, the proposed standard error formula outperformed the method used in PISA. In addition, the PISA 2006 and 2009 reading data are used to illustrate how the interpretation of national trend estimates can change when different procedures are applied to calculate standard errors.
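For orientation, a simplified item-sampling linking error of the kind historically reported in PISA treats the common items as a random sample and uses the spread of their difficulty shifts between two assessments; the standard error of a national trend then combines the two sampling errors and the linking error in quadrature. This Python sketch assumes one difficulty estimate per common item per assessment on a common metric. It ignores the country-level components distinguished in the article and is not the article's proposed formula.

```python
import math

def linking_error(b_old, b_new):
    """Item-sampling linking error: SD of per-item difficulty shifts / sqrt(number of items)."""
    if len(b_old) != len(b_new) or len(b_old) < 2:
        raise ValueError("need at least two common items")
    shifts = [new - old for old, new in zip(b_old, b_new)]
    m = sum(shifts) / len(shifts)
    var = sum((d - m) ** 2 for d in shifts) / (len(shifts) - 1)  # sample variance of shifts
    return math.sqrt(var / len(shifts))

def trend_se(se_old, se_new, link_err):
    """Standard error of a trend (difference between two cycles), assuming independent errors."""
    return math.sqrt(se_old ** 2 + se_new ** 2 + link_err ** 2)

# Usage with hypothetical difficulty shifts of +/-0.1 logits on four common items.
print(round(linking_error([0.0] * 4, [0.1, -0.1, 0.1, -0.1]), 4))  # 0.0577
```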

6.
Various applications of item response theory (IRT) often require linking to place item parameter estimates obtained from different groups on a common scale. This article used a simulation to examine the relative performance of four IRT linking procedures in a random groups equating design: concurrent calibration with multiple groups, separate calibration with the Stocking-Lord method, separate calibration with the Haebara method, and proficiency transformation. The simulation conditions included three sampling designs, two levels of sample size, and two levels of the number of items. In general, the separate calibration procedures performed better than the concurrent calibration and proficiency transformation procedures, although the results were not fully consistent across simulation conditions. Some advantages and disadvantages of the linking procedures are discussed.

7.
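To address the data sparsity and computational complexity of traditional collaborative filtering recommendation algorithms in big-data environments, a bidirectional clustering collaborative filtering recommendation algorithm is proposed. The algorithm first performs attribute clustering separately along the user dimension and the item dimension, then applies collaborative filtering with an improved similarity measure within the clusters containing the target user and the target item, and finally combines the predicted ratings through a balance factor to form the final recommendation list. Experiments on the public MovieLens dataset show that, compared with traditional collaborative filtering (TCF), user-clustering-based collaborative filtering (UCF), and item-clustering-based collaborative filtering (ICF), the proposed algorithm (DCF) reduces the mean absolute error by 16%, 8.1%, and 7.5%, respectively, effectively improving recommendation accuracy.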
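The prediction stage described above can be sketched as follows, assuming the user and item clusters have already been formed (the attribute-clustering step is not shown). Plain cosine similarity stands in for the paper's improved similarity measure, which the abstract does not specify, and `lam` plays the role of the balance factor; the function names and the toy data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity over the co-rated keys of two rating dicts (plain, not the paper's measure)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    nu = math.sqrt(sum(u[k] ** 2 for k in common))
    nv = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (nu * nv) if nu and nv else 0.0

def user_based(ratings, user, item, user_cluster):
    """Mean-centred prediction from neighbours in the target user's cluster."""
    mu = sum(ratings[user].values()) / len(ratings[user])
    num = den = 0.0
    for other in user_cluster:
        if other == user or item not in ratings[other]:
            continue
        s = cosine(ratings[user], ratings[other])
        mo = sum(ratings[other].values()) / len(ratings[other])
        num += s * (ratings[other][item] - mo)
        den += abs(s)
    return mu + num / den if den else mu

def item_based(ratings, user, item, item_cluster):
    """Similarity-weighted average of the user's ratings on items in the target item's cluster."""
    by_item = {}
    for u, r in ratings.items():
        for i, x in r.items():
            by_item.setdefault(i, {})[u] = x
    num = den = 0.0
    for other in item_cluster:
        if other == item or other not in ratings[user]:
            continue
        s = cosine(by_item.get(item, {}), by_item.get(other, {}))
        num += s * ratings[user][other]
        den += abs(s)
    mu = sum(ratings[user].values()) / len(ratings[user])
    return num / den if den else mu

def blended(ratings, user, item, user_cluster, item_cluster, lam=0.5):
    """Combine both predictions with a balance factor lam in [0, 1]."""
    return (lam * user_based(ratings, user, item, user_cluster)
            + (1 - lam) * item_based(ratings, user, item, item_cluster))

def mae(pred, actual):
    """Mean absolute error of predicted vs. actual ratings."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

# Usage with a toy rating table: predict u1's rating of i3.
ratings = {"u1": {"i1": 5, "i2": 3},
           "u2": {"i1": 5, "i2": 3, "i3": 4},
           "u3": {"i1": 4, "i2": 2, "i3": 3}}
print(round(blended(ratings, "u1", "i3", ["u1", "u2", "u3"], ["i1", "i2", "i3"]), 2))  # 4.0
```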

8.
Despite embracing a bio-psycho-social perspective, the World Health Organization's International Classification of Functioning, Disability and Health (ICF) assessment framework has had limited application to date with children who have special educational needs (SEN). This study examines its utility for educational psychologists' work with children who have Autism Spectrum Disorders (ASD). Mothers of 40 children with ASD aged 8 to 12 years were interviewed using a structured protocol based on the ICF framework. The Diagnostic Interview for Social and Communication Disorder (DISCO) was completed with a subset of 19 mothers. Internal consistency and inter-rater reliability of the interview assessments were found to be acceptable, and there was evidence for concurrent and discriminant validity. Despite some limitations, initial support for the utility of the ICF model suggests its potential value across educational, health and care fields. Further consideration of its relevance to educational psychologists in new areas of multi-agency working is warranted.

9.
Previous research indicates that relative fit indices in structural equation modeling may vary across estimation methods. Sugawara and MacCallum (1993) attributed the discrepancy to differences in the null-model function values but gave no further derivation. In this study, we derive explicit solutions for the parameters of the null model. The null model specifies the variances of the observed variables as model parameters and fixes all covariances to zero. Three estimation methods are considered: maximum likelihood (ML), ordinary least squares (OLS), and generalized least squares (GLS). Results indicate that ML and OLS yield an identical estimator, which differs from the GLS estimator. The function values and associated chi-square statistics of the null model vary across estimation methods. Consequently, relative fit indices that use the null model as the reference point may yield different results depending on the estimation method chosen. An illustrative example is given, and implications of this study are discussed.
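The explicit null-model solutions can be worked out directly. Assuming the standard textbook forms of the three discrepancy functions (the article's notation may differ), with S the p x p sample covariance matrix, W = S^{-1}, diag(W) the vector of diagonal elements of W, the elementwise product written as a circle, and the null model Sigma = diag(theta_1, ..., theta_p), each function reduces to an expression in the theta_j alone:

```latex
\begin{aligned}
F_{\mathrm{ML}} &= \log|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \log|S| - p
  = \sum_{j=1}^{p}\Bigl(\log\theta_j + \frac{s_{jj}}{\theta_j}\Bigr) - \log|S| - p
  &&\Longrightarrow\quad \hat\theta_j = s_{jj},\\
F_{\mathrm{OLS}} &= \tfrac12\operatorname{tr}\bigl[(S-\Sigma)^2\bigr]
  = \tfrac12\sum_{j\neq k} s_{jk}^2 + \tfrac12\sum_{j=1}^{p}(s_{jj}-\theta_j)^2
  &&\Longrightarrow\quad \hat\theta_j = s_{jj},\\
F_{\mathrm{GLS}} &= \tfrac12\operatorname{tr}\bigl[\bigl((S-\Sigma)W\bigr)^2\bigr]
  = \tfrac12\Bigl[p - 2\sum_{j} w_{jj}\theta_j + \sum_{j,k} w_{jk}^2\,\theta_j\theta_k\Bigr]
  &&\Longrightarrow\quad \hat{\boldsymbol\theta} = (W\circ W)^{-1}\operatorname{diag}(W).
\end{aligned}
```

The ML and OLS solutions coincide at the sample variances, while the GLS solution coincides with them when S is diagonal but generally differs otherwise; because the fitted null models differ, so do the minimized function values and the relative fit indices built on them.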

10.
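Methods for determining the consolidation yield stress and overconsolidation ratio (OCR) of clay from CPTU test results are reviewed, showing that there is no unique correlation between the yield stress or OCR of a soil and the CPTU test parameters; a correlation is valid only for a specific region. Based on CPTU data from lagoonal deposits in the Lixiahe area of northern Jiangsu, three empirical methods are compared. The validity of existing empirical relations between the yield stress and CPTU parameters is evaluated, and a prediction method suitable for the lagoonal deposits of the Lixiahe area is identified. The results show that the correlation based on net cone resistance is more accurate than the other empirical methods and can effectively predict the yield stress and OCR of the lagoonal deposits in this area.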
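A correlation based on net cone resistance is often written in the linear form sigma'_p = k * (q_t - sigma_v0), with OCR = sigma'_p / sigma'_v0. The sketch below assumes that form; the coefficient k is a hypothetical input that, as the abstract stresses, has to be calibrated for the local deposit rather than carried over from another region.

```python
def yield_stress_from_cptu(q_t, sigma_v0, k):
    """Estimate consolidation yield stress from net cone resistance: k * (q_t - sigma_v0).

    q_t: corrected cone resistance; sigma_v0: total vertical overburden stress.
    k: site-specific coefficient (hypothetical input; must be calibrated locally).
    All stresses in the same unit, e.g. kPa.
    """
    q_net = q_t - sigma_v0
    if q_net <= 0:
        raise ValueError("corrected cone resistance must exceed total overburden stress")
    return k * q_net

def ocr(sigma_p, sigma_v0_eff):
    """Overconsolidation ratio: yield stress over current vertical effective stress."""
    return sigma_p / sigma_v0_eff

# Usage with hypothetical values in kPa.
sp = yield_stress_from_cptu(1000.0, 100.0, 0.3)
print(round(sp, 1), round(ocr(sp, 60.0), 2))  # 270.0 4.5
```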

11.
12.
This study evaluated the significant contents and concepts of the Biopsychosocial Assessment Method (MAB) as they relate to the International Classification of Functioning, Disability and Health (ICF), and the connection between the Geriatric Core Set (GCS) and the different areas of the MAB. We linked the 56 items of the MAB to ICF and GCS categories according to published linking rules. The most significant concepts included in the MAB enabled the connection of 83 items to ICF categories. Connections were established with all components of the ICF except the Body Structures component. Of the 123 categories in the GCS, about 30% had no connection with MAB items. The results of this study show that, much like the ICF, the MAB is a tool based on the biopsychosocial model, allowing a comprehensive and integrated assessment of the different components of functioning. The MAB is currently the most widely used tool for evaluating the geriatric population in Portugal, so it is important to analyze its results in order to enhance its capabilities. It can then contribute to the creation of a shortened Core Set by the World Health Organization (WHO).

13.
The error associated with a proposed linking method for tests consisting of both constructed-response and multiple-choice items was investigated in a simulation study. Study factors that were varied included the relative proportion of constructed-response items in the test, the size of the year-to-year change in the ability metric, the number of anchor items, the number of linking papers to be reassessed, and the presence of guessing. The results supported the use of the proposed linking method. In addition, simulations were used to illustrate possible linking bias resulting from (a) the use of the traditional linking method and (b) the use of only multiple-choice anchor items in the presence of test multidimensionality.

14.
This paper reports the results of a national two-year project, commissioned by the Portuguese Ministry of Education, to investigate the implementation of the International Classification of Functioning, Disability and Health (ICF) under Decree-Law 3/2008. The Decree-Law also introduced the principle that documented student functioning profiles, rather than a diagnosis, should be the basis for eligibility decision-making. Of specific interest was the implementation of the ICF in the assessment, eligibility and intervention processes for students in need of specialised supports. To that end, the study was based on a document analysis of the case records of 214 students. The analysis of functioning profiles showed that the use of the ICF promoted a functional approach in students' assessment. In addition, the use of the ICF contributed to differentiating eligible from non-eligible students on the basis of their functioning profiles and addressed the most suitable educational interventions within the Individualised Education Plans.

15.
Propensity score (PS) adjustments have become popular methods for improving estimates of treatment effects in quasi-experiments. Although researchers continue to develop PS methods, other procedures can also be effective in reducing selection bias. One of these uses clustering to create balanced groups. However, the value of this newer method depends on how well it performs compared with existing methods. Therefore, this comparative study used experimental and nonexperimental data to examine bias reduction, case retention, and covariate balance for the clustering method, PS subclassification, and PS weighting. In general, results suggest that the cluster-based methods reduced at least as much bias as the PS methods. Under certain conditions the PS methods reduced more bias than the cluster-based method, and under other conditions the cluster-based methods were more advantageous. Although all methods were equally effective in retaining cases and balancing covariates, other data-specific conditions may favor the use of a cluster-based approach.
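For comparison, the PS weighting estimator can be sketched as an inverse-probability-weighted difference in means. This minimal Python sketch assumes the propensity scores have already been estimated (for example, by logistic regression) and uses normalized weights 1/e for treated cases and 1/(1 - e) for controls, which targets the average treatment effect; the cluster-based method and PS subclassification from the study are not shown.

```python
def ipw_ate(y, treated, ps):
    """Normalised (Hajek) inverse-probability-weighted estimate of the average treatment effect.

    y: outcomes; treated: 0/1 indicators; ps: estimated propensity scores in (0, 1).
    """
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, ei in zip(y, treated, ps):
        if not 0.0 < ei < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if ti:
            w = 1.0 / ei          # treated cases weighted by 1 / e
            num1 += w * yi
            den1 += w
        else:
            w = 1.0 / (1.0 - ei)  # controls weighted by 1 / (1 - e)
            num0 += w * yi
            den0 += w
    return num1 / den1 - num0 / den0

# Usage with hypothetical outcomes and propensity scores.
print(round(ipw_ate([10.0, 12.0, 4.0, 6.0], [1, 1, 0, 0], [0.8, 0.4, 0.5, 0.5]), 4))  # 6.3333
```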

16.
This study investigated the effectiveness of three different methods of presenting new words to children who are beginning to recognize words. Three groups of kindergarteners (twenty-five girls and boys each) were taught four words by either a word, word-picture, or word-object method. Analysis of variance showed no significant differences among the three methods of presentation for the kindergarten girls, but significant differences (p < .05) for the boys. A further analysis using the Newman-Keuls method revealed that the word-object method differed significantly from the word method at the .05 level. No differences were found between the word and word-picture methods or between the word-picture and word-object methods.

17.
An Extension of Four IRT Linking Methods for Mixed-Format Tests
Under item response theory (IRT), many applications require linking proficiency scales from separate calibrations of multiple test forms to achieve a common scale. Four IRT linking methods, the mean/mean, mean/sigma, Haebara, and Stocking-Lord methods, have been presented for use with single-format tests. This study extends the four linking methods to a mixture of unidimensional IRT models for mixed-format tests. Each extended linking method is intended to handle mixed-format tests using any mixture of the following five IRT models: the three-parameter logistic, graded response, generalized partial credit, nominal response (NR), and multiple-choice (MC) models. A simulation study is conducted to investigate the performance of the four linking methods extended to mixed-format tests. Overall, the Haebara and Stocking-Lord methods yield more accurate linking results than the mean/mean and mean/sigma methods. Limitations of the mean/mean, mean/sigma, and Stocking-Lord methods are described for cases in which the NR model or the MC model is used to analyze data from mixed-format tests.
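For the dichotomous case, the two moment methods differ only in how the slope A is obtained: mean/mean uses the ratio of mean discriminations, while mean/sigma uses the ratio of difficulty standard deviations; both then set B so that the mean difficulties agree. The sketch below assumes a and b estimates for the same common items on both metrics and does not cover the polytomous or NR/MC extensions that are the subject of the study.

```python
import statistics

def mean_mean(a_source, b_source, a_target, b_target):
    """Mean/mean: slope from mean discriminations, intercept from mean difficulties."""
    A = statistics.mean(a_source) / statistics.mean(a_target)
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B

def mean_sigma(b_source, b_target):
    """Mean/sigma: slope from the ratio of difficulty standard deviations."""
    A = statistics.pstdev(b_target) / statistics.pstdev(b_source)  # n cancels, so pstdev vs stdev is immaterial
    B = statistics.mean(b_target) - A * statistics.mean(b_source)
    return A, B

# Usage: hypothetical common items related by A = 1.25, B = 0.5 (a_t = a_s / A, b_t = A * b_s + B).
a_s, b_s = [1.0, 1.5, 2.0], [-1.0, 0.0, 1.0]
a_t, b_t = [0.8, 1.2, 1.6], [-0.75, 0.5, 1.75]
print([round(v, 4) for v in mean_mean(a_s, b_s, a_t, b_t)],
      [round(v, 4) for v in mean_sigma(b_s, b_t)])  # [1.25, 0.5] [1.25, 0.5]
```

With error-free estimates both methods agree; with real estimates they generally do not, because the discriminations and the difficulty spread carry different sampling error.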

18.
Item parameter drift (IPD) occurs when item parameter values change from their original values over time. IPD may pose a serious threat to the fairness and validity of test score interpretations, especially when the goal of the assessment is to measure growth or improvement. In this study, we examined the effect of multidirectional IPD (i.e., some items become harder while other items become easier) on the linking procedure and on rescaled proficiency estimates. The impact of different combinations of linking items with various multidirectional IPD on the test equating procedure was investigated for three scaling methods (the mean-mean, mean-sigma, and TCC methods) via a series of simulation studies. Multidirectional IPD had a substantive effect on examinees' scores and achievement level classifications under some of the studied conditions. Both the choice of linking method and the pattern of IPD had a direct effect on the results.
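A small numerical illustration of why multidirectional drift matters for moment-based linking: if some common items become easier and others harder by the same amount, the mean difficulty is unchanged but the spread grows, so the mean/sigma slope is pulled away from its true value even though the two underlying metrics are identical. The difficulties below are hypothetical and chosen for easy arithmetic.

```python
import statistics

def mean_sigma(b_old, b_new):
    """Constants (A, B) placing the new calibration on the old metric via mean/sigma."""
    A = statistics.pstdev(b_old) / statistics.pstdev(b_new)
    B = statistics.mean(b_old) - A * statistics.mean(b_new)
    return A, B

# Hypothetical common-item difficulties; without drift the true constants are A = 1, B = 0.
b_old = [-1.0, 0.0, 1.0]
# Multidirectional drift: the first item becomes 0.5 easier, the last item 0.5 harder.
b_new_drift = [-1.5, 0.0, 1.5]

A, B = mean_sigma(b_old, b_new_drift)
print(round(A, 4), round(B, 4))  # 0.6667 0.0
```

A slope of about 0.67 instead of 1 would compress rescaled proficiency estimates toward the mean, which is one route by which balanced drift can affect scores and classifications even though the intercept looks correct.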

19.
In this study we compared five item selection procedures using three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures used were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler information, and maximum expected posterior weighted Kullback-Leibler information procedures. The ability estimation methods investigated were maximum likelihood estimation (MLE), weighted likelihood estimation (WLE), and expected a posteriori (EAP). Results suggested that all item selection procedures, regardless of the information functions on which they were based, performed equally well across ability estimation methods. The principal conclusions drawn about the ability estimation methods are that MLE is a practical choice and WLE should be considered when there is a mismatch between pool information and the population ability distribution. EAP can serve as a viable alternative when an appropriate prior ability distribution is specified. Several implications of the findings for applied measurement are discussed.
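The core computation behind Fisher-information-based item selection can be sketched for the generalized partial credit model: category probabilities, item information (the squared slope times the conditional variance of the category score), and a greedy pick of the most informative remaining item at the current ability estimate. This is plain maximum Fisher information at a point estimate, a simpler baseline than the posterior-weighted, expected-information, and Kullback-Leibler procedures compared in the study; the scaling constant D = 1 and the pool format are assumptions.

```python
import math

def gpcm_probs(theta, a, steps, D=1.0):
    """Category probabilities for the generalized partial credit model.

    steps: step difficulties b_1..b_m; category 0 corresponds to an empty sum.
    """
    z, cum = [0.0], 0.0
    for b in steps:
        cum += D * a * (theta - b)
        z.append(cum)
    mx = max(z)  # subtract the max for numerical stability
    ez = [math.exp(v - mx) for v in z]
    total = sum(ez)
    return [e / total for e in ez]

def gpcm_info(theta, a, steps, D=1.0):
    """Fisher information: (D * a)^2 times the conditional variance of the category score."""
    p = gpcm_probs(theta, a, steps, D)
    mean = sum(k * pk for k, pk in enumerate(p))
    var = sum(k * k * pk for k, pk in enumerate(p)) - mean ** 2
    return (D * a) ** 2 * var

def select_max_info(theta_hat, pool, administered):
    """Index of the not-yet-administered item with the most information at theta_hat."""
    best, best_info = None, -1.0
    for idx, (a, steps) in enumerate(pool):
        if idx in administered:
            continue
        info = gpcm_info(theta_hat, a, steps)
        if info > best_info:
            best, best_info = idx, info
    return best

# Usage: a hypothetical pool of (a, step difficulties); item 1 has already been given.
pool = [(1.0, [2.0]), (1.5, [0.0, 0.5]), (0.8, [-0.3])]
print(select_max_info(0.0, pool, {1}))  # 2
```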

20.
This study addressed the sampling error and linking bias that occur with small samples in a nonequivalent groups anchor test design. We proposed a linking method called the synthetic function, which is a weighted average of the identity function and a traditional equating function (in this case, the chained linear equating function). Specifically, we compared the synthetic, identity, and chained linear functions for various‐sized samples from two types of national assessments. One design used a highly reliable test and an external anchor, and the other used a relatively low‐reliability test and an internal anchor. The results from each of these methods were compared to the criterion equating function derived from the total samples with respect to linking bias and error. The study indicated that the synthetic functions might be a better choice than the chained linear equating method when samples are not large and, as a result, unrepresentative.
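The synthetic function is simple to compute once the chained linear function is in place. This sketch assumes the usual chained linear form (X linked to the anchor V in the new-form population P, then V linked to Y in the old-form population Q) and treats the weight w on the equating function as a hypothetical input, since the abstract does not say how the weight was chosen.

```python
def chained_linear(x, mu_x_p, sd_x_p, mu_v_p, sd_v_p, mu_v_q, sd_v_q, mu_y_q, sd_y_q):
    """Chained linear equating: X -> anchor V in population P, then V -> Y in population Q."""
    v = mu_v_p + (sd_v_p / sd_x_p) * (x - mu_x_p)     # linear link X to V on P
    return mu_y_q + (sd_y_q / sd_v_q) * (v - mu_v_q)  # linear link V to Y on Q

def synthetic(x, w, **moments):
    """Weighted average of the chained linear and identity functions.

    w is the weight on the equating function: w = 0 gives the identity, w = 1 chained linear.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * chained_linear(x, **moments) + (1.0 - w) * x

# Usage with hypothetical summary statistics.
moments = dict(mu_x_p=20.0, sd_x_p=5.0, mu_v_p=10.0, sd_v_p=2.5,
               mu_v_q=11.0, sd_v_q=2.5, mu_y_q=22.0, sd_y_q=6.0)
print(round(chained_linear(25.0, **moments), 2), round(synthetic(25.0, 0.5, **moments), 2))  # 25.6 25.3
```

Shrinking toward the identity trades some bias (when the forms truly differ) for lower sampling variance, which is why the approach is aimed at small samples.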

Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP registration No. 09084417