Similar Articles
 20 similar articles found
1.
We examine the accuracy of p values obtained using the asymptotic mean and variance (MV) correction to the distribution of the sample standardized root mean squared residual (SRMR) proposed by Maydeu-Olivares to assess the exact fit of SEM models. In a simulation study, we found that under normality, the MV-corrected SRMR statistic provides reasonably accurate Type I errors even in small samples and for large models, clearly outperforming the current standard, that is, the likelihood ratio (LR) test. When data show excess kurtosis, MV-corrected SRMR p values are accurate only in small models (p = 10), or in medium-sized models (p = 30) if no skewness is present and sample sizes are at least 500. Overall, when data are not normal, the MV-corrected LR test seems to outperform the MV-corrected SRMR. We elaborate on these findings by showing that the asymptotic approximation to the mean of the SRMR sampling distribution is quite accurate, whereas the asymptotic approximation to the standard deviation is not.
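
For reference, the sample SRMR is commonly defined as the square root of the average squared standardized residual between the sample covariances s_ij and the fitted covariances σ̂_ij (a standard definition, not quoted from the article):

```latex
\mathrm{SRMR} = \sqrt{ \frac{2}{p(p+1)} \sum_{i \le j}
  \left( \frac{ s_{ij} - \hat{\sigma}_{ij} }{ \sqrt{ s_{ii}\, s_{jj} } } \right)^{2} } ,
```

where p is the number of observed variables.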

2.
The analytically derived asymptotic standard errors (SEs) of maximum likelihood (ML) item estimates can be approximated by a mathematical function that does not require examinees' responses to test items, whereas the empirically determined SEs of marginal maximum likelihood estimation (MMLE)/Bayesian item estimates are obtained by repeatedly estimating the same set of items from simulated (or resampled) test data. The latter method yields rather stable and accurate SE estimates as the number of replications increases, but requires cumbersome and time-consuming calculations. Instead of relying on the empirical method, this study examined the adequacy of the analytic method in predicting the SEs of item parameter estimates by comparing results produced from both approaches. The results indicated that the SEs yielded by the two approaches were, in most cases, very similar, especially when applied to a generalized partial credit model. This finding encourages test practitioners and researchers to apply the analytically derived asymptotic SEs of item estimates in item-linking studies, as well as in quantifying the SEs of equated scores under the item response theory (IRT) true-score method. A three-dimensional graphical presentation of the analytical SEs of item estimates as a bivariate function of item difficulty and item discrimination is also provided for a better understanding of several frequently used IRT models.
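
A minimal sketch of the analytic approach for a single two-parameter logistic (2PL) item; the 2PL choice, the quadrature grid, and all parameter values are assumptions for illustration, not the article's setup. The expected Fisher information for (a, b) is accumulated over the ability distribution, and the asymptotic SEs are the square roots of the diagonal of its inverse:

```python
import numpy as np

def asymptotic_se_2pl(a, b, n_examinees, n_quad=81):
    """Asymptotic SEs of the 2PL item parameters (a, b), computed from
    the expected Fisher information integrated over a standard normal
    ability distribution (simple grid quadrature; no response data)."""
    theta = np.linspace(-4.0, 4.0, n_quad)            # ability grid
    w = np.exp(-0.5 * theta ** 2)                     # N(0, 1) density, unnormalized
    w /= w.sum()

    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))        # P(correct | theta)
    info = np.zeros((2, 2))
    for t, pt, wt in zip(theta, p, w):
        g = np.array([t - b, -a])                     # gradient of the logit wrt (a, b)
        info += wt * pt * (1.0 - pt) * np.outer(g, g)

    info *= n_examinees                               # total expected information
    return np.sqrt(np.diag(np.linalg.inv(info)))      # SE(a_hat), SE(b_hat)

# Hypothetical item: moderate discrimination, average difficulty
se_a, se_b = asymptotic_se_2pl(a=1.2, b=0.0, n_examinees=1000)
print(f"SE(a) = {se_a:.4f}, SE(b) = {se_b:.4f}")
```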

3.
A 2-stage robust procedure, along with an R package, rsem, was recently developed by Yuan and Zhang (2012) for structural equation modeling with nonnormal missing data. Several test statistics that have been used for complete-data analysis are employed to evaluate model fit in the 2-stage robust method, but their properties under robust procedures for incomplete nonnormal data have never been studied. This study systematically evaluates and compares 5 test statistics: a statistic derived from normal-distribution-based maximum likelihood, a rescaled chi-square statistic, an adjusted chi-square statistic, a corrected residual-based asymptotically distribution-free chi-square statistic, and a residual-based F statistic. These statistics are evaluated under a linear growth curve model by varying 8 factors: population distribution, missing data mechanism, missing data rate, sample size, number of measurement occasions, covariance between the latent intercept and slope, variance of measurement errors, and downweighting rate of the 2-stage robust method. The performance of the test statistics varies, and the one derived from the 2-stage normal-distribution-based maximum likelihood performs much worse than the other four. Application of the 2-stage robust method and of the test statistics is illustrated through a growth curve analysis of mathematical ability development, using data on the Peabody Individual Achievement Test mathematics assessment from the National Longitudinal Survey of Youth 1997 Cohort.
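
For orientation, standard forms of the rescaled and adjusted statistics (generic definitions, not details quoted from this article): with T the normal-theory statistic, d the model degrees of freedom, U the residual weight matrix, and Γ the asymptotic covariance matrix of the sample moments,

```latex
T_{\mathrm{rescaled}} = \frac{d}{\operatorname{tr}(U\Gamma)}\, T ,
\qquad
T_{\mathrm{adjusted}} = \frac{d^{*}}{\operatorname{tr}(U\Gamma)}\, T ,
\qquad
d^{*} = \frac{\left[\operatorname{tr}(U\Gamma)\right]^{2}}{\operatorname{tr}\!\left[(U\Gamma)^{2}\right]} ,
```

with the adjusted statistic referred to a chi-square distribution with d* degrees of freedom.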

4.
The purpose of this study was to investigate the power and Type I error rate of the likelihood ratio goodness-of-fit (LR) statistic in detecting differential item functioning (DIF) under Samejima's (1969, 1972) graded response model. A multiple-replication Monte Carlo study was utilized in which DIF was modeled in simulated data sets that were then calibrated with MULTILOG (Thissen, 1991) using hierarchically nested item response models. In addition, the power and Type I error rate of the Mantel (1963) approach for detecting DIF in ordered response categories were investigated using the same simulated data, for comparative purposes. The power of both the Mantel and LR procedures was affected by sample size, as expected. The LR procedure lacked the power to consistently detect DIF when it existed in reference/focal groups with sample sizes as small as 500/500. The Mantel procedure maintained control of its Type I error rate and was more powerful than the LR procedure when the comparison group ability distributions were identical and there was a constant DIF pattern. On the other hand, the Mantel procedure lost control of its Type I error rate, whereas the LR procedure did not, when the comparison groups differed in mean ability; and the LR procedure demonstrated a profound power advantage over the Mantel procedure under conditions of balanced DIF in which the comparison group ability distributions were identical. The choice and subsequent use of either procedure requires a thorough understanding of its power and Type I error rates under varying conditions of DIF pattern, comparison group ability distributions (or, as a surrogate, observed score distributions), and item characteristics.
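
As a generic statement of the LR DIF test (not a detail specific to this study): the studied item is calibrated twice, once with its parameters constrained equal across groups (compact model C) and once with them freed (augmented model A), and

```latex
G^{2} = -2\left(\log L_{C} - \log L_{A}\right) \;\overset{H_{0}}{\sim}\; \chi^{2}_{\Delta q} ,
```

where Δq is the number of item parameters freed in the augmented model.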

5.
Recently, a new mean-scaled and skewness-adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal-theory maximum likelihood and 2 well-known robust test statistics. A modification to the Satorra–Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the 4 test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies 7 sample sizes and 3 distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performed poorly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra–Bentler scaled test statistic performed best overall, whereas the mean-scaled and variance-adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

6.
A well-known ad hoc approach to conducting structural equation modeling with missing data is to obtain a saturated maximum likelihood (ML) estimate of the population covariance matrix and then to use this estimate in the complete-data ML fitting function to obtain parameter estimates. This 2-stage (TS) approach is appealing because it minimizes a familiar function while being only marginally less efficient than the full information ML (FIML) approach. Additional advantages of the TS approach are that it allows easy incorporation of auxiliary variables and that it is more stable in smaller samples. The main disadvantage is that the standard errors and test statistics provided by the complete-data routine will not be correct. Empirical approaches to finding the right corrections for the TS approach have failed to provide unequivocal solutions. In this article, correct standard errors and test statistics for the TS approach with missing completely at random and missing at random normally distributed data are developed and studied. The new TS approach performs well in all conditions, is only marginally less efficient than the FIML approach (and is sometimes more efficient), and has good coverage. Additionally, the residual-based TS statistic outperforms the FIML test statistic in smaller samples. The TS method is thus a viable alternative to FIML, especially in small samples, and its further study is encouraged.
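
Concretely, writing Σ̃ for the stage-1 saturated ML estimate of the population covariance matrix, stage 2 minimizes the familiar complete-data ML discrepancy (a standard expression, shown here for the covariance structure only; the mean structure is handled analogously):

```latex
F_{\mathrm{ML}}(\theta) = \log\left\lvert \Sigma(\theta) \right\rvert
  + \operatorname{tr}\!\left[ \tilde{\Sigma}\, \Sigma(\theta)^{-1} \right]
  - \log\left\lvert \tilde{\Sigma} \right\rvert - p .
```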

7.
In this study we compared four item selection procedures and three ability estimation methods in the context of a mixed-format adaptive test based on the generalized partial credit model. The item selection procedures were maximum posterior weighted information, maximum expected information, maximum posterior weighted Kullback-Leibler information, and maximum expected posterior weighted Kullback-Leibler information. The ability estimation methods investigated were maximum likelihood estimation (MLE), weighted likelihood estimation (WLE), and expected a posteriori (EAP). Results suggested that all item selection procedures, regardless of the information functions on which they were based, performed equally well across ability estimation methods. The principal conclusions drawn about the ability estimation methods are that MLE is a practical choice, that WLE should be considered when there is a mismatch between pool information and the population ability distribution, and that EAP can serve as a viable alternative when an appropriate prior ability distribution is specified. Several implications of the findings for applied measurement are discussed.
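
A minimal sketch of EAP scoring under the generalized partial credit model (GPCM) with a standard normal prior; the item parameters and response pattern below are invented for illustration, not taken from the study.

```python
import numpy as np

def gpcm_probs(theta, a, b):
    """GPCM category probabilities at abilities `theta` for one item
    with discrimination `a` and step parameters b_1..b_m (categories 0..m)."""
    z = np.cumsum(a * (theta[:, None] - b[None, :]), axis=1)  # cumulative logits
    z = np.column_stack([np.zeros_like(theta), z])            # category 0 has logit 0
    z -= z.max(axis=1, keepdims=True)                         # numerical stability
    ez = np.exp(z)
    return ez / ez.sum(axis=1, keepdims=True)

def eap(responses, items, n_quad=81):
    """EAP ability estimate and posterior SD for one examinee;
    `items` is a list of (a, steps) pairs, `responses` the chosen categories."""
    theta = np.linspace(-4.0, 4.0, n_quad)
    post = np.exp(-0.5 * theta ** 2)                  # standard normal prior
    for x, (a, b) in zip(responses, items):
        post *= gpcm_probs(theta, a, np.asarray(b, float))[:, x]
    post /= post.sum()
    mean = (theta * post).sum()                       # posterior mean = EAP estimate
    sd = np.sqrt(((theta - mean) ** 2 * post).sum())  # posterior SD, reported as SE
    return mean, sd

# Hypothetical mixed-format pool: two polytomous items and one dichotomous item
items = [(1.0, [-0.5, 0.5]), (1.3, [-1.0, 0.0, 1.0]), (0.8, [0.2])]
print(eap(responses=[2, 1, 1], items=items))
```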

8.
This article proposes 2 classes of ridge generalized least squares (GLS) procedures for structural equation modeling (SEM) with unknown population distributions. The weight matrix for the first class of ridge GLS is obtained by combining the sample fourth-order moment matrix with the identity matrix; the weight matrix for the second class is obtained by combining the sample fourth-order moment matrix with its diagonal matrix. Empirical results indicate that, with data from an unknown population distribution, parameter estimates by ridge GLS can be much more accurate than those by either GLS or normal-distribution-based maximum likelihood, and the standard errors of the parameter estimates also become more accurate in predicting the empirical ones. Rescaled and adjusted statistics are proposed for overall model evaluation, and they also perform much better than the default statistic that follows from the GLS method. The use of the ridge GLS procedures is illustrated with a real data set.
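
One plausible form of the two weight matrices (our notation and an assumed convex combination; the article's exact tuning scheme may differ): with Γ̂ the sample fourth-order moment matrix and a ∈ [0, 1],

```latex
\hat{\Gamma}^{(1)}_{a} = (1-a)\,\hat{\Gamma} + a\,I ,
\qquad
\hat{\Gamma}^{(2)}_{a} = (1-a)\,\hat{\Gamma} + a\,\operatorname{diag}(\hat{\Gamma}) ,
```

with the GLS fit function F(θ) = [s − σ(θ)]ᵀ(Γ̂ₐ)⁻¹[s − σ(θ)]. The ridge term stabilizes the inverse when Γ̂ is noisy or near-singular.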

9.
Contemporary educational accountability systems, including state-level systems prescribed under No Child Left Behind as well as those envisioned under the "Race to the Top" comprehensive assessment competition, rely on school-level summaries of student test scores. The precision of these score summaries is almost always evaluated using models that ignore the classroom-level clustering of students within schools. This paper reports balanced and unbalanced generalizability analyses investigating the consequences of ignoring variation at the level of classrooms within schools when analyzing the reliability of such school-level accountability measures. Results show that the reliability of school means cannot be determined accurately when classroom-level effects are ignored. Failure to take between-classroom variance into account biases generalizability (G) coefficient estimates downward and standard errors (SEs) upward if classroom-level effects are regarded as fixed, and biases G-coefficient estimates upward and SEs downward if they are regarded as random. These biases become more severe as the difference between the school-level intraclass correlation (ICC) and the class-level ICC increases. School-accountability systems should be designed so that classroom (or teacher) level variation can be taken into consideration when quantifying the precision of school rankings, and statistical models for school mean score reliability should incorporate this information.
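
To make the clustering argument concrete, here is a hedged sketch in our notation, not the paper's: for c classrooms of n students per school, the generalizability of a school mean takes the form

```latex
E\hat{\rho}^{2} =
\frac{\sigma^{2}_{\mathrm{school}}}
     {\sigma^{2}_{\mathrm{school}}
      + \sigma^{2}_{\mathrm{class:school}} / c
      + \sigma^{2}_{\mathrm{student:class}} / (cn)} ,
```

so a model that omits the class component must fold σ²_class:school into either the school term or the residual, which is the source of the opposite biases described above.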

10.
When the multivariate normality assumption is violated in structural equation modeling, a leading remedy involves estimation via normal-theory maximum likelihood with robust corrections to standard errors. We propose that this approach might not be best for forming confidence intervals for quantities with sampling distributions that are slow to approach normality, or for functions of model parameters. We implement and study a robust analog to likelihood-based confidence intervals based on inverting the robust chi-square difference test of Satorra (2000). We compare robust standard errors and the robust likelihood-based approach with resampling methods in confirmatory factor analysis (Studies 1 and 2) and mediation analysis models (Study 3), for both single parameters and functions of model parameters, and under a variety of nonnormal data generation conditions. The percentile bootstrap emerged as the method with the best calibrated coverage rates and should be preferred if resampling is possible, followed by the robust likelihood-based approach.
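
A minimal sketch of the percentile bootstrap for a function of model parameters, here the indirect effect ab in a simple mediation model estimated by OLS on simulated data (a stand-in example, not the article's models):

```python
import numpy as np

rng = np.random.default_rng(2024)

# Simulated mediation data (hypothetical): X -> M -> Y with skewed errors
n = 300
x = rng.normal(size=n)
m = 0.4 * x + rng.exponential(1.0, size=n) - 1.0
y = 0.5 * m + 0.1 * x + rng.exponential(1.0, size=n) - 1.0

def indirect(x, m, y):
    """Indirect effect a*b: a from M-on-X, b from Y-on-(M, X)."""
    a = np.polyfit(x, m, 1)[0]
    design = np.column_stack([np.ones_like(m), m, x])
    b = np.linalg.lstsq(design, y, rcond=None)[0][1]
    return a * b

boot = np.empty(2000)
for r in range(boot.size):
    idx = rng.integers(0, n, size=n)          # resample cases with replacement
    boot[r] = indirect(x[idx], m[idx], y[idx])

lo, hi = np.percentile(boot, [2.5, 97.5])     # percentile 95% CI
print(f"ab = {indirect(x, m, y):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

The percentile endpoints require no normality of the sampling distribution of ab, which is one reason the method can stay calibrated where Wald intervals do not.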

11.
The accuracy of structural model parameter estimates in latent variable mixture modeling was explored with a 3 (sample size) × 3 (exogenous latent mean difference) × 3 (endogenous latent mean difference) × 3 (correlation between factors) × 3 (mixture proportions) factorial design. In addition, the efficacy of several likelihood-based statistics (Akaike's Information Criterion [AIC], Bayesian Information Criterion [BIC], the sample-size adjusted BIC [ssBIC], the consistent AIC [CAIC], and the Vuong-Lo-Mendell-Rubin adjusted likelihood ratio test [aVLMR]), classification-based statistics (the classification likelihood information criterion [CLC], the integrated classification likelihood BIC [ICL-BIC], the normalized entropy criterion [NEC], and entropy), and distributional statistics (multivariate skew and kurtosis tests) was examined to determine which statistics best recover the correct number of components. Results indicate that the structural parameters were recovered, but the model fit statistics were not exceedingly accurate. The ssBIC was the most accurate statistic, and the CLC, ICL-BIC, and aVLMR showed limited utility. However, none of these statistics were accurate for small samples (n = 500).
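
For reference, the sample-size adjusted BIC mentioned here is usually defined by replacing n in the BIC penalty with (n + 2)/24 (a standard definition, with q the number of free parameters):

```latex
\mathrm{BIC} = -2\log L + q \log n ,
\qquad
\mathrm{ssBIC} = -2\log L + q \log\!\left(\frac{n+2}{24}\right).
```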

12.
The precision of estimates in many statistical models can be expressed by a confidence interval (CI). CIs based on standard errors (SEs) are common in practice, but likelihood-based CIs are worth consideration. Compared with SEs, likelihood-based CIs are typically more difficult to estimate but are more robust to model (re)parameterization. In latent variable models, some parameters might take on values outside of their interpretable range; it is therefore desirable to place a bound to keep the parameter interpretable. For likelihood-based CIs, a correction is needed when a parameter is bounded. The correction is known (Wu & Neale, 2012) but is difficult to implement in practice. A novel automatic implementation that is simple for an applied researcher to use is introduced. A simulation study demonstrates the accuracy of the correction using a latent growth curve model, and the method is illustrated with a multilevel confirmatory factor analysis.
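
A minimal sketch of a likelihood-based CI found by root finding, using a single variance parameter (bounded below at 0) as a stand-in model rather than the article's automated implementation: the endpoints are where the log-likelihood drops 3.84/2 = 1.92 units below its maximum.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.normal(loc=0.0, scale=1.5, size=40)   # hypothetical sample

n = x.size
ss = np.sum((x - x.mean()) ** 2)
s2_hat = ss / n                               # ML estimate of the variance

def loglik(s2):
    # profile log-likelihood of sigma^2 (the mean is profiled out)
    return -0.5 * n * np.log(s2) - ss / (2.0 * s2)

def drop(s2):
    # crosses zero at the 95% likelihood-based CI endpoints
    return loglik(s2) - (loglik(s2_hat) - 1.92)

lower = brentq(drop, 1e-6, s2_hat)            # parameter is bounded below by 0
upper = brentq(drop, s2_hat, 50.0)
print(f"sigma^2 = {s2_hat:.3f}, 95% likelihood CI [{lower:.3f}, {upper:.3f}]")
```

Note how the lower search bracket stops at the boundary; the Wu and Neale correction concerns exactly what to do when the interval runs into such a bound.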

13.
Simulations of computerized adaptive tests (CATs) were used to evaluate results yielded by four commonly used ability estimation methods: maximum likelihood estimation (MLE) and three Bayesian approaches—Owen's method, expected a posteriori (EAP), and maximum a posteriori. In line with the theoretical nature of the ability estimates and previous empirical research, the results showed clear distinctions between MLE and the Bayesian methods, with MLE yielding lower bias, higher standard errors, higher root mean square errors, lower fidelity, and lower administrative efficiency. Standard errors for MLE based on test information underestimated actual standard errors, whereas standard errors for the Bayesian methods based on posterior distribution standard deviations accurately estimated actual standard errors. Among the Bayesian methods, Owen's provided the worst overall results, and EAP provided the best. Using a variable starting rule in which examinees were initially classified into three broad ability groups greatly reduced the bias for the Bayesian methods, but had little effect on the results for MLE. On the basis of these results, guidelines are offered for selecting appropriate CAT ability estimation methods in different decision contexts.
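
The SE contrast in these results mirrors two standard definitions: MLE standard errors come from the test information evaluated at the ability estimate, whereas the Bayesian SEs are posterior standard deviations,

```latex
SE\!\left(\hat{\theta}_{\mathrm{MLE}}\right) \approx \frac{1}{\sqrt{I(\hat{\theta})}} ,
\qquad
SE\!\left(\hat{\theta}_{\mathrm{EAP}}\right) = \sqrt{\operatorname{Var}\!\left(\theta \mid \mathbf{u}\right)} ,
```

where I(θ) is the sum of the administered items' information functions and u is the response pattern.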

14.
In the logistic regression (LR) procedure for differential item functioning (DIF), the parameters of LR have often been estimated using maximum likelihood (ML) estimation. However, ML estimation suffers from finite-sample bias, and ML estimates for LR can be substantially biased in the presence of rare-event data. The bias of ML estimation due to small samples and rare-event data can degrade the performance of the LR procedure, especially when testing the DIF of difficult items in small samples. Penalized ML (PML) estimation was originally developed to reduce the finite-sample bias of conventional ML estimation and is also known to reduce the bias of LR estimates for rare-event data. The goal of this study is to compare the LR procedures based on ML and PML estimation in terms of statistical power and Type I error. In a simulation study, Swaminathan and Rogers's Wald test based on PML estimation (PSR) showed the highest statistical power in most of the simulation conditions, and the likelihood ratio test based on PML estimation (PLRT) showed the most robust and stable Type I error. The trade-off between bias and variance is considered in the discussion.
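
If the PML estimator studied here is Firth's (1993) bias-reducing penalty, which we are inferring from the description of finite-sample and rare-event bias reduction, the penalized log-likelihood adds half the log-determinant of the Fisher information:

```latex
\ell^{*}(\beta) = \ell(\beta) + \tfrac{1}{2}\,\log\left\lvert I(\beta) \right\rvert .
```

The penalty shrinks coefficients away from the infinite estimates that plain ML can produce under separation or very rare responses.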

15.
In practice, models always have misfit, and it is not well known in which situations methods that provide point estimates, standard errors (SEs), or confidence intervals (CIs) of standardized structural equation modeling (SEM) parameters are trustworthy. In this article we carried out simulations to evaluate the empirical performance of currently available methods. We studied maximum likelihood point estimates, as well as SE estimators based on the delta method, the nonparametric bootstrap (NP-B), and the semiparametric bootstrap (SP-B). For CIs we studied Wald CIs based on the delta method, and percentile and BCa intervals based on NP-B and SP-B. Depending on (a) whether the point estimate, SE, or CI is of interest; (b) the amount of model misfit; (c) sample size; and (d) model complexity, different methods perform best. Based on the simulation results, we discuss how to choose proper methods in practice.
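
For context, the delta-method SE examined here follows the generic first-order form for a differentiable function g of the model parameters (a textbook expression, not one quoted from the article):

```latex
\operatorname{Var}\!\left[ g(\hat{\theta}) \right] \approx
\nabla g(\hat{\theta})^{\top}\,
\widehat{\operatorname{Cov}}\!\left(\hat{\theta}\right)\,
\nabla g(\hat{\theta}) .
```

A standardized loading is such a g, since it divides a raw loading by a model-implied standard deviation.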

16.
The problem of testing whether a normal covariance matrix equals a specified matrix is considered. A new chi-square test statistic is derived for the multivariate normal population. Unlike the likelihood ratio test, the new test is exact.
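
For comparison, the classical likelihood ratio statistic for H₀: Σ = Σ₀ with unknown mean, which is only asymptotically chi-square, is (up to the usual n versus n − 1 convention)

```latex
-2\log\lambda = n\left[ \operatorname{tr}\!\left(\Sigma_{0}^{-1} S\right)
  - \log\left\lvert \Sigma_{0}^{-1} S \right\rvert - p \right] ,
```

with S the ML estimate of the covariance matrix and p its dimension.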

17.
This study examined and compared various statistical methods for detecting individual differences in change. Considering 3 issues, namely test form (specific vs. generalized), estimation procedure (constrained vs. unconstrained), and nonnormality, we evaluated 4 variance tests, including the specific Wald variance test, the generalized Wald variance test, the specific likelihood ratio (LR) variance test, and the generalized LR variance test, under both constrained and unconstrained estimation for both normal and nonnormal data. For the constrained estimation procedure, both the mixture-distribution approach and the alpha-correction approach were evaluated for their performance in dealing with the boundary problem. To deal with nonnormality, we used the sandwich standard error (SE) estimator for the Wald tests and the Satorra–Bentler scaling correction for the LR tests. Simulation results revealed that testing a variance parameter together with the associated covariances (generalized) had higher power than testing the variance alone (specific), unless the true covariances were zero. In addition, the variance tests under constrained estimation outperformed those under unconstrained estimation in terms of higher empirical power and better control of Type I error rates. Among all the studied tests, for both normal and nonnormal data, the robust generalized LR and Wald variance tests with the constrained estimation procedure were generally more powerful and had better Type I error rates for testing variance components than the other tests. Results from the comparisons between specific and generalized variance tests and between constrained and unconstrained estimation are discussed.
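
For the specific variance test, the boundary problem arises because σ² = 0 sits on the edge of the parameter space, so under constrained estimation the LR statistic T is referred to an equal mixture of χ²₀ and χ²₁; since the χ²₀ component puts all its mass at zero, the p value reduces to

```latex
p = \tfrac{1}{2}\, P\!\left( \chi^{2}_{1} > T \right)
```

(the generalized test, which also involves covariances, leads to mixtures with more components).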

18.
The asymptotic performance of structural equation modeling tests and standard errors is influenced by two factors: the model and the asymptotic covariance matrix Γ of the sample covariances. Although most simulation studies clearly specify model conditions, specification of Γ is usually limited to values of univariate skewness and kurtosis. We illustrate that marginal skewness and kurtosis are not sufficient to adequately specify a nonnormal simulation condition by showing that asymptotic standard errors and test statistics vary substantially among distributions with identical skewness and kurtosis. We therefore argue that Γ should be reported when presenting the design of simulation studies. We show how Γ can be exactly calculated under the widely used Vale–Maurelli transform. We suggest plotting the elements of Γ and reporting the eigenvalues associated with the test statistic. R code is provided.
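
A minimal sketch of how Γ can be estimated from a generated sample via the standard ADF-style estimator, i.e., the covariance matrix of the per-case cross-products; the Vale–Maurelli generation step itself is replaced here by a generic nonnormal sample, and the article's exact calculation is not reproduced.

```python
import numpy as np

def gamma_hat(X):
    """ADF-style estimate of Gamma, the asymptotic covariance matrix of
    sqrt(n) * vech(S), computed as the covariance of per-case cross-products."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)                    # center each variable
    i, j = np.triu_indices(p)                  # vech index pairs (i <= j)
    d = Xc[:, i] * Xc[:, j]                    # one cross-product vector per case
    return np.cov(d, rowvar=False)             # (p(p+1)/2) x (p(p+1)/2)

# Generic skewed, heavy-tailed sample (stand-in for Vale-Maurelli output)
rng = np.random.default_rng(11)
X = rng.exponential(1.0, size=(1000, 3))
G = gamma_hat(X)
print(G.shape)                                 # (6, 6)
print(np.linalg.eigvalsh(G))                   # eigenvalues, as the abstract suggests reporting
```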

19.
Ordinal variables are common in many empirical investigations in the social and behavioral sciences. Researchers often apply the maximum likelihood method to fit structural equation models to ordinal data. This assumes that the observed measures have normal distributions, which is not the case when the variables are ordinal. A better approach is to use polychoric correlations and fit the models using methods such as unweighted least squares (ULS), maximum likelihood (ML), weighted least squares (WLS), or diagonally weighted least squares (DWLS). In this simulation evaluation we study the behavior of these methods in combination with polychoric correlations when the models are misspecified. We also study the effect of model size and number of categories on the parameter estimates, their standard errors, and the common chi-square measures of fit when the models are both correct and misspecified. When used routinely, these methods give consistent parameter estimates, but ULS, ML, and DWLS give incorrect standard errors. Correct standard errors can be obtained for these methods by robustification using an estimate of the asymptotic covariance matrix W of the polychoric correlations; when used in this way, the methods are here called RULS, RML, and RDWLS.
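
The robustification has the usual sandwich form (a standard expression, stated in our notation rather than quoted from the article): with Δ the matrix of derivatives of the model-implied correlations with respect to θ, V the weight matrix of the fit function (identity for ULS, diagonal for DWLS), and W as defined above,

```latex
\operatorname{acov}\!\left(\hat{\theta}\right) = \frac{1}{n}
\left( \Delta^{\top} V \Delta \right)^{-1}
\Delta^{\top} V\, W\, V\, \Delta
\left( \Delta^{\top} V \Delta \right)^{-1} .
```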

20.
Fitting a large structural equation modeling (SEM) model with moderate to small sample sizes results in an inflated Type I error rate for the likelihood ratio test statistic under the chi-square reference distribution, a phenomenon known as the model size effect. In this article, we show that the number of observed variables (p) and the number of free parameters (q) have unique effects on the Type I error rate of the likelihood ratio test statistic, and that the effects of p and q cannot be fully explained by degrees of freedom (df). We also evaluated the performance of 4 correction methods for the model size effect: Bartlett's (1950), Swain's (1975), and Yuan's (2005) corrected statistics, and Yuan, Tian, and Yanagihara's (2015) empirically corrected statistic. We found that Yuan et al.'s (2015) empirically corrected statistic generally yields the best performance in controlling the Type I error rate when fitting large SEM models.
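
As one concrete example (a standard statement of the correction, not quoted from this article), Bartlett's correction for an m-factor model replaces the usual multiplier n − 1 of the ML discrepancy with a smaller one:

```latex
T_{B} = \left[ n - 1 - \frac{2p + 5}{6} - \frac{2m}{3} \right] F_{\mathrm{ML}} .
```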
