期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Logistic Regression Procedure Using Penalized Maximum Likelihood Estimation for Differential Item Functioning

Sunbok Lee 《Journal of Educational Measurement》2020,57(3):443-457

In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from the finite-sample bias. Furthermore, ML estimation for LR can be substantially biased in the presence of rare event data. The bias of ML estimation due to small samples and rare event data can degrade the performance of the LR procedure, especially when testing the DIF of difficult items in small samples. Penalized ML (PML) estimation was originally developed to reduce the finite-sample bias of conventional ML estimation and also was known to reduce the bias in the estimation of LR for the rare events data. The goal of this study is to compare the performances of the LR procedures based on the ML and PML estimation in terms of the statistical power and Type I error. In a simulation study, Swaminathan and Rogers's Wald test based on PML estimation (PSR) showed the highest statistical power in most of the simulation conditions, and LRT based on conventional PML estimation (PLRT) showed the most robust and stable Type I error. The discussion about the trade-off between bias and variance is presented in the discussion section. 相似文献

2.

Using Johnson's Transformation With Approximate Test Statistics for the Simple Regression Slope Homogeneity

Wei-Ming Luh Jiin-Huarng Guo 《Journal of Experimental Education》2013,81(1):69-81

The authors used Johnson's transformation with approximate test statistics to test the homogeneity of simple linear regression slopes when both x_ij and x_ij may have nonnormal distributions and there is Type I heteroscedasticity, Type II heteroscedasticity, or complete heteroscedasticity. The test statistic t was first transformed by Johnson's method for each group to correct the nonnormality and to correct the heteroscedasticity; also an approximate test, such as the Welch test or the DeShon-Alexander test, was applied to test the homogeneity of the regression slopes. Computer simulations showed that the proposed technique can control Type I error rate under various circumstances. Finally, the authors provide an example to demonstrate the calculation. 相似文献

3.

Maximum Likelihood Estimation of Structural Equation Models for Continuous Data: Standard Errors and Goodness of Fit

Alberto Maydeu-Olivares 《Structural equation modeling》2017,24(3):383-394

Classical accounts of maximum likelihood (ML) estimation of structural equation models for continuous outcomes involve normality assumptions: standard errors (SEs) are obtained using the expected information matrix and the goodness of fit of the model is tested using the likelihood ratio (LR) statistic. Satorra and Bentler (1994) introduced SEs and mean adjustments or mean and variance adjustments to the LR statistic (involving also the expected information matrix) that are robust to nonnormality. However, in recent years, SEs obtained using the observed information matrix and alternative test statistics have become available. We investigate what choice of SE and test statistic yields better results using an extensive simulation study. We found that robust SEs computed using the expected information matrix coupled with a mean- and variance-adjusted LR test statistic (i.e., MLMV) is the optimal choice, even with normally distributed data, as it yielded the best combination of accurate SEs and Type I errors. 相似文献

4.

Lord's Wald Test for Detecting DIF in Multidimensional IRT Models: A Comparison of Two Estimation Approaches

下载免费PDF全文

Soo Lee Youngsuk Suh 《Journal of Educational Measurement》2018,55(2):328-353

Lord's Wald test for differential item functioning (DIF) has not been studied extensively in the context of the multidimensional item response theory (MIRT) framework. In this article, Lord's Wald test was implemented using two estimation approaches, marginal maximum likelihood estimation and Bayesian Markov chain Monte Carlo estimation, to detect uniform and nonuniform DIF under MIRT models. The Type I error and power rates for Lord's Wald test were investigated under various simulation conditions, including different DIF types and magnitudes, different means and correlations of two ability parameters, and different sample sizes. Furthermore, English usage data were analyzed to illustrate the use of Lord's Wald test with the two estimation approaches. 相似文献

5.

Performance of Bootstrapping Approaches to Model Test Statistics and Parameter Standard Error Estimation in Structural Equation Modeling

《Structural equation modeling》2013,20(3):353-377

Though the common default maximum likelihood estimator used in structural equation modeling is predicated on the assumption of multivariate normality, applied researchers often find themselves with data clearly violating this assumption and without sufficient sample size to utilize distribution-free estimation methods. Fortunately, promising alternatives are being integrated into popular software packages. Bootstrap resampling, which is offered in AMOS (Arbuckle, 1997), is one potential solution for estimating model test statistic p values and parameter standard errors under nonnormal data conditions. This study is an evaluation of the bootstrap method under varied conditions of nonnormality, sample size, model specification, and number of bootstrap samples drawn from the resampling space. Accuracy of the test statistic p values is evaluated in terms of model rejection rates, whereas accuracy of bootstrap standard error estimates takes the form of bias and variability of the standard error estimates themselves. 相似文献

6.

Is Parceling Really Necessary? A Comparison of Results From Item Parceling and Categorical Variable Methodology

Deborah L. Bandalos 《Structural equation modeling》2013,20(2):211-240

This study examined the efficacy of 4 different parceling methods for modeling categorical data with 2, 3, and 4 categories and with normal, moderately nonnormal, and severely nonnormal distributions. The parceling methods investigated were isolated parceling in which items were parceled with other items sharing the same source of variance, and distributed parceling in which items were parceled with items influenced by different factors. These parceling strategies were crossed with strategies in which items were either parceled with similarly distributed or differently distributed items, to create 4 different parceling methods. Overall, parceling together items influenced by different factors and with different distributions resulted in better model fit, but high levels of parameter estimate bias. Across all parceling methods, parameter estimate bias ranged from 20% to over 130%. Parceling strategies were contrasted with use of the WLSMV estimator for categorical, unparceled data. Results based on this estimator are encouraging, although some bias was found when high levels of nonnormality were present. Values of the chi-square and root mean squared error of approximation based on WLSMV also resulted in Type II error rates for misspecified models when data were severely nonnormally distributed. 相似文献

7.

Evaluating the Power of Latent Growth Curve Models to Detect Individual Differences in Change

Christopher Hertzog Timo von Oertzen Paolo Ghisletta Ulman Lindenberger 《Structural equation modeling》2013,20(4):541-563

We evaluated the statistical power of single-indicator latent growth curve models to detect individual differences in change (variances of latent slopes) as a function of sample size, number of longitudinal measurement occasions, and growth curve reliability. We recommend the 2 degree-of-freedom generalized test assessing loss of fit when both slope-related random effects, the slope variance and intercept-slope covariance, are fixed to 0. Statistical power to detect individual differences in change is low to moderate unless the residual error variance is low, sample size is large, and there are more than four measurement occasions. The generalized test has greater power than a specific test isolating the hypothesis of zero slope variance, except when the true slope variance is close to 0, and has uniformly superior power to a Wald test based on the estimated slope variance. 相似文献

8.

Evaluation of a New Mean Scaled and Moment Adjusted Test Statistic for SEM

Xiaoxiao Tong Peter M. Bentler 《Structural equation modeling》2013,20(1):148-156

Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and 2 well-known robust test statistics. A modification to the Satorra–Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the 4 test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies 7 sample sizes and 3 distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra–Bentler scaled test statistic performed best overall, whereas the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions. 相似文献

9.

Effects of sample size and nonnormality on the estimation of mediated effects in latent variable models

John F. Finch Stephen G. West David P. MacKinnon 《Structural equation modeling》2013,20(2):87-107

A Monte Carlo approach was used to examine bias in the estimation of indirect effects and their associated standard errors. In the simulation design, (a) sample size, (b) the level of nonnormality characterizing the data, (c) the population values of the model parameters, and (d) the type of estimator were systematically varied. Estimates of model parameters were generally unaffected by either nonnormality or small sample size. Under severely nonnormal conditions, normal theory maximum likelihood estimates of the standard error of the mediated effect exhibited less bias (approximately 10% to 20% too small) compared to the standard errors of the structural regression coefficients (20% to 45% too small). Asymptotically distribution free standard errors of both the mediated effect and the structural parameters were substantially affected by sample size, but not nonnormality. Robust standard errors consistently yielded the most accurate estimates of sampling variability. 相似文献

10.

Comparing Methods of Assessing Differential Item Functioning in a Computerized Adaptive Testing Environment

Pui-Wa Lei Shu-Ying Chen Lan Yu 《Journal of Educational Measurement》2006,43(3):245-264

Mantel-Haenszel and SIBTEST, which have known difficulty in detecting non-unidirectional differential item functioning (DIF), have been adapted with some success for computerized adaptive testing (CAT). This study adapts logistic regression (LR) and the item-response-theory-likelihood-ratio test (IRT-LRT), capable of detecting both unidirectional and non-unidirectional DIF, to the CAT environment in which pretest items are assumed to be seeded in CATs but not used for trait estimation. The proposed adaptation methods were evaluated with simulated data under different sample size ratios and impact conditions in terms of Type I error, power, and specificity in identifying the form of DIF. The adapted LR and IRT-LRT procedures are more powerful than the CAT version of SIBTEST for non-unidirectional DIF detection. The good Type I error control provided by IRT-LRT under extremely unequal sample sizes and large impact is encouraging. Implications of these and other findings are discussed. 相似文献

11.

A Comparison of Diagonal Weighted Least Squares Robust Estimation Techniques for Ordinal Data

Christine DiStefano Grant B. Morgan 《Structural equation modeling》2013,20(3):425-438

This study compared diagonal weighted least squares robust estimation techniques available in 2 popular statistical programs: diagonal weighted least squares (DWLS; LISREL version 8.80) and weighted least squares–mean (WLSM) and weighted least squares—mean and variance adjusted (WLSMV; Mplus version 6.11). A 20-item confirmatory factor analysis was estimated using item-level ordered categorical data. Three different nonnormality conditions were applied to 2- to 7-category data with sample sizes of 200, 400, and 800. Convergence problems were seen with nonnormal data when DWLS was used with few categories. Both DWLS and WLSMV produced accurate parameter estimates; however, bias in standard errors of parameter estimates was extreme for select conditions when nonnormal data were present. The robust estimators generally reported acceptable model–data fit, unless few categories were used with nonnormal data at smaller sample sizes; WLSMV yielded better fit than WLSM for most indices. 相似文献

12.

Effects of Missing Data Methods in Structural Equation Modeling With Nonnormal Longitudinal Data

Tacksoo Shin Mark L. Davison Jeffrey D. Long 《Structural equation modeling》2013,20(1):70-98

The purpose of this study is to investigate the effects of missing data techniques in longitudinal studies under diverse conditions. A Monte Carlo simulation examined the performance of 3 missing data methods in latent growth modeling: listwise deletion (LD), maximum likelihood estimation using the expectation and maximization algorithm with a nonnormality correction (robust ML), and the pairwise asymptotically distribution-free method (pairwise ADF). The effects of 3 independent variables (sample size, missing data mechanism, and distribution shape) were investigated on convergence rate, parameter and standard error estimation, and model fit. The results favored robust ML over LD and pairwise ADF in almost all respects. The exceptions included convergence rates under the most severe nonnormality in the missing not at random (MNAR) condition and recovery of standard error estimates across sample sizes. The results also indicate that nonnormality, small sample size, MNAR, and multicollinearity might adversely affect convergence rate and the validity of statistical inferences concerning parameter estimates and model fit statistics. 相似文献

13.

Mediated Effects with the Parallel Process Latent Growth Model: An Evaluation of Methods for Testing Mediation in the Presence of Nonnormal Data

Namwook Koo James Algina 《Structural equation modeling》2016,23(1):32-44

This Monte Carlo simulation study investigated the impact of nonnormality on estimating and testing mediated effects with the parallel process latent growth model and 3 popular methods for testing the mediated effect (i.e., Sobel’s test, the asymmetric confidence limits, and the bias-corrected bootstrap). It was found that nonnormality had little effect on the estimates of the mediated effect, standard errors, empirical Type I error, and power rates in most conditions. In terms of empirical Type I error and power rates, the bias-corrected bootstrap performed best. Sobel’s test produced very conservative Type I error rates when the estimated mediated effect and standard error had a relationship, but when the relationship was weak or did not exist, the Type I error was closer to the nominal .05 value. 相似文献

14.

A Study Of The Relationship Between Grade And Age And Variability

Fei Tsao 《Journal of Experimental Education》2013,81(3):187-200

Power and stability of Type I error rates are investigated for the Box-Scheffé test of homogeneity of variance with varying subsample sizes under conditions of normality and nonnormality. The test is shown to be robust to violation of the normality assumption when sampling is from a leptokurtic population. Subsample sizes which produce maximum power are given for small, intermediate, and large sample situations. Suggestions for selecting subsample sizes which will produce maximum power for a given n are provided. A formula for estimating power in the equal n case is shown to give results agreeing with empirical results. 相似文献

15.

Evaluating the Wald Test for Item‐Level Comparison of Saturated and Reduced Models in Cognitive Diagnosis

Jimmy de la Torre Young‐Sun Lee 《Journal of Educational Measurement》2013,50(4):355-373

This article used the Wald test to evaluate the item‐level fit of a saturated cognitive diagnosis model (CDM) relative to the fits of the reduced models it subsumes. A simulation study was carried out to examine the Type I error and power of the Wald test in the context of the G‐DINA model. Results show that when the sample size is small and a larger number of attributes are required, the Type I error rate of the Wald test for the DINA and DINO models can be higher than the nominal significance levels, while the Type I error rate of the A‐CDM is closer to the nominal significance levels. However, with larger sample sizes, the Type I error rates for the three models are closer to the nominal significance levels. In addition, the Wald test has excellent statistical power to detect when the true underlying model is none of the reduced models examined even for relatively small sample sizes. The performance of the Wald test was also examined with real data. With an increasing number of CDMs from which to choose, this article provides an important contribution toward advancing the use of CDMs in practical educational settings. 相似文献

16.

An Investigation of the Power of the Likelihood Ratio Goodness-of-Fit Statistic in Detecting Differential Item Functioning

Robert D. Ankenmann Elizabeth A. Witt Stephen B. Dunbar 《Journal of Educational Measurement》1999,36(4):277-300

The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was utilized in which DIF was modeled in simulated data sets which were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. In addition, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data, for comparative purposes. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison group ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison group ability distributions were identical. The choice and subsequent use of any procedure requires a thorough understanding of the power and Type I error rates of the procedure under varying conditions of DIF pattern, comparison group ability distributions.–or as a surrogate, observed score distributions–and item characteristics. 相似文献

17.

The Place of the Fable in the Character Training of Children

Sadie Goldsmith 《Journal of Experimental Education》2013,81(4):343-345

Type I error rate and power for the t test, Wilcoxon-Mann-Whitney (U) test, van der Waerden Normal Scores (NS) test, and Welch-Aspin-Satterthwaite (W) test were compared for two independent random samples drawn from nonnormal distributions. Data with varying degrees of skewness (S) and kurtosis (K) were generated using Fleishman's (1978) power function. Five sample size combinations were used with both equal and unequal variances. For nonnormal data with equal variances, the power of the U test exceeded the power of the t test regardless of sample size. When the sample sizes were equal but the variances were unequal, the t test proved to be the most powerful test. When variances and sample sizes were unequal, the W test became the test of choice because it was the only test that maintained its nominal Type I error rate. 相似文献

18.

Behavior of descriptive fit indexes in confirmatory factor analysis using ordered categorical data

Susan R. Hutchinson Antonio Olmos 《Structural equation modeling》2013,20(4):344-364

The purpose of this study was to examine the behavior of 8 measures of fit used to evaluate confirmatory factor analysis models. This study employed Monte Carlo simulation to determine to what extent sample size, model size, estimation procedure, and level of nonnormality affected fit when polytomous data were analyzed. The 3 indexes least affected by the design conditions were the comparative fit index, incremental fit index, and nonnormed fit index, which were affected only by level of nonnormality. The measure of centrality was most affected by the design variables, with values of n2>. 10 for sample size, model size, and level of nonnormality and interaction effects for Model Size x Level of Nonnormality and Estimation x Level of Nonnormality. Findings from this study should alert applied researchers to exercise caution when evaluating model fit with nonnormal, polytomous data. 相似文献

19.

Using Monte Carlo Normal Distributions to Evaluate Structural Models With Nonnormal Data

Siavash Jalal Peter M. Bentler 《Structural equation modeling》2018,25(4):541-557

Statistical theories of goodness-of-fit tests in structural equation modeling are based on asymptotic distributions of test statistics. When the model includes a large number of variables or the population is not from a multivariate normal distribution, the asymptotic distributions do not approximate the distribution of the test statistics very well at small sample sizes. A variety of methods have been developed to improve the accuracy of hypothesis testing at small sample sizes. However, all these methods have their limitations, specially for nonnormal distributed data. We propose a Monte Carlo test that is able to control Type I error with more accuracy compared to existing approaches in both normal and nonnormally distributed data at small sample sizes. Extensive simulation studies show that the suggested Monte Carlo test has a more accurate observed significance level as compared to other tests with a reasonable power to reject misspecified models. 相似文献

20.

Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis Under Conditions of Multivariate Nonnormality

《Structural equation modeling》2013,20(3):356-410

Questions of whether hypothesized structure models are appropriate representations of the pattern of association among a group of variables can be addressed using a wide variety of statistical procedures. These procedures include covariance structure analysis techniques and correlation structure analysis techniques, in which covariance structure procedures are based on distribution theory for covariances, and correlation structure procedures are based on distribution theory for correlations. The present article provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and also details Monte Carlo simulation results on the Type I and Type II error control as a function of structure model type, number of variables in the model, sample size, and distributional nonnormality. The present Monte Carlo simulation demonstrates clearly that the robustness and nonrobustness of structure analysis techniques vary as a function of the structure of the model and the data conditions. Implications of these results for users of structure analysis techniques are considered in the context of current software availability. 相似文献