1.

Reliability,Dimensionality, and Internal Consistency as Defined by Cronbach: Distinct Albeit Related Concepts

Ernest C. Davenport, Mark L. Davison, Pey‐Yan Liou, Quintin U. Love 《Educational Measurement》2015,34(4):4-9

This article uses definitions provided by Cronbach in his seminal paper for coefficient α to show the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency and dimensionality are then discussed as defined by Cronbach. Next, functional relationships are given that relate reliability, internal consistency, and dimensionality. The article ends with a demonstration of the utility of these concepts as defined. It is recommended that reliability, internal consistency, and dimensionality each be quantified with separate indices, but that their interrelatedness be recognized. High levels of unidimensionality and internal consistency are not necessary for reliability as measured by α nor, more importantly, for interpretability of test scores.

2.

Coefficient alpha: A basic introduction from the perspectives of classical test theory and structural equation modeling

Michael B. Miller 《Structural equation modeling》2013,20(3):255-273

This article is a pedagogical piece on coefficient alpha (α) and its uses. The classical approach to test reliability is explained. Test‐retest, alternative‐forms, and internal‐consistency methods of approximating test reliability are described, equations are derived for each method, and α is shown to be a lower‐bound internal‐consistency approximation to test reliability. Emphasis is placed on the effects of violations of model assumptions on reliability estimation. The classical models are conceptualized as structural equation models and are displayed in path diagrams. Special emphasis is placed on the failure of α to meet certain basic criteria as an index of test homogeneity.
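As a rough illustration of the internal-consistency estimate discussed in this abstract, the sketch below computes coefficient α from a persons-by-items score matrix using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). The function name `cronbach_alpha` and the data layout (a list of rows, one per examinee) are assumptions made for this example and do not come from the article.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score matrix.

    scores: list of rows, one row per examinee, each row holding k item scores.
    (Hypothetical helper; the layout is an assumption of this sketch.)
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across examinees.
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Usage: three examinees, three items.
alpha = cronbach_alpha([[2, 3, 3], [4, 4, 5], [1, 2, 1]])
print(round(alpha, 3))
```

The same variance convention (sample variance here) has to be used for the items and the total score; mixing conventions changes the result.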

3.

Easier Said Than Done: Rejoinder on Sijtsma and on Green and Yang

Ernest C. Davenport, Mark L. Davison, Pey‐Yan Liou, Quintin U. Love 《Educational Measurement》2016,35(1):6-10

The main points of Sijtsma and Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct and that Cronbach's alpha may be problematic. Neither of these assertions is at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors in the testing community mention these terms not only together, but sometimes as if they are synonymous. Moreover, Cronbach's coefficient alpha is very popular as an index of reliability. Thus, articles discussing alpha are not only appropriate, but necessary. Our concerns are the same as those that formed the genesis of prior (2009) articles by these same authors, Sijtsma and Green and Yang. This rejoinder also makes comments about item parcels when tests are multidimensional and about factor analytic approaches to assessing reliability.

4.

Evaluation of Dimensionality in the Assessment of Internal Consistency Reliability: Coefficient Alpha and Omega Coefficients

Samuel B. Green, Yanyun Yang 《Educational Measurement》2015,34(4):14-20

In the lead article, Davenport, Davison, Liou, & Love demonstrate the relationship among homogeneity, internal consistency, and coefficient alpha, and also distinguish among them. These distinctions are important because too often coefficient alpha, a reliability coefficient, is interpreted as an index of homogeneity or internal consistency. We argue that factor analysis should be conducted before calculating internal consistency estimates of reliability. If factor analysis indicates the assumptions underlying coefficient alpha are met, then it can be reported as a reliability coefficient. However, to the extent that items are multidimensional, alternative internal consistency reliability coefficients should be computed based on the parameter estimates of the factor model. Assuming a bifactor model evidenced good fit, and the measure was designed to assess a single construct, omega hierarchical (the proportion of variance of the total scores due to the general factor) should be presented. Omega (the proportion of variance of the total scores due to all factors) also should be reported in that it represents a more traditional view of reliability, although it is computed within a factor analytic framework. By presenting both these coefficients and potentially other omega coefficients, the reliability results are less likely to be misinterpreted.
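The two omega coefficients described in this abstract can be sketched from bifactor parameter estimates as follows. This is a minimal illustration that assumes orthogonal general and group factors and uncorrelated unique terms, so that the total-score variance is the squared sum of general loadings, plus the squared sum of loadings on each group factor, plus the sum of unique variances. The function name, argument layout, and example loadings are hypothetical and are not taken from the article.

```python
def omega_coefficients(general, groups, uniquenesses):
    """Return (omega, omega_hierarchical) for an orthogonal bifactor model.

    general:      general-factor loading for each item
    groups:       one list per group factor, with a loading for every item
                  (0.0 where an item does not load on that factor)
    uniquenesses: unique (error) variance for each item
    All names and the layout are assumptions of this sketch.
    """
    general_part = sum(general) ** 2                    # variance due to the general factor
    group_part = sum(sum(f) ** 2 for f in groups)       # variance due to the group factors
    total = general_part + group_part + sum(uniquenesses)
    omega = (general_part + group_part) / total         # all common factors
    omega_h = general_part / total                      # general factor only
    return omega, omega_h


# Usage with made-up standardized loadings for four items and two group factors.
omega, omega_h = omega_coefficients(
    general=[0.6, 0.6, 0.6, 0.6],
    groups=[[0.4, 0.4, 0.0, 0.0], [0.0, 0.0, 0.4, 0.4]],
    uniquenesses=[0.48, 0.48, 0.48, 0.48],
)
print(round(omega, 3), round(omega_h, 3))
```

Reporting both values side by side shows how much of the reliable variance is attributable to the general factor rather than to the group factors.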

5.

A Perceptual Measure of the Degree of Development of Proprietary Equipment

《Structural equation modeling》2013,20(4):579-598

In this article we evaluate the psychometric properties of a scale for a perceptual measure of the extent to which manufacturing organizations develop proprietary equipment. We use a confirmatory factor analysis (CFA) approach to assess unidimensionality and reliability as well as convergent, discriminant and concurrent validity. Convergent and discriminant validity are assessed using CFA of the multitrait-multimethod (MTMM) matrix. In addition, we assess the scale's factorial invariance across industries. Results suggest that although method effects are present, the scale demonstrates internal consistency and validity. Implications of this study in the field of operations strategy and general strategy are discussed.

6.

Reliability of Scores From Teacher-Made Tests

David A. Frisbie 《Educational Measurement》1988,7(1):25-35

Reliability is the property of a set of test scores that indicates the amount of measurement error associated with the scores. Teachers need to know about reliability so that they can use test scores to make appropriate decisions about their students. The level of consistency of a set of scores can be estimated by using the methods of internal analysis to compute a reliability coefficient. This coefficient, which can range between 0.0 and +1.0, usually has values around 0.50 for teacher-made tests and around 0.90 for commercially prepared standardized tests. Its magnitude can be affected by such factors as test length, test-item difficulty and discrimination, time limits, and certain characteristics of the group: the extent of their testwiseness, level of student motivation, and homogeneity in the ability measured by the test.

7.

Higher Validity in the Face of Lower Reliability: Another Look

《Applied Measurement in Education》2013,26(3):249-253

A test segment that lacks content validity with respect to a criterion may be deleted for that reason. At issue is the effect on reliability and validity as measured by the coefficients arising from classical test theory. Assuming that the predictor test has some reasonable degree of internal consistency, deleting a segment of meaningful size is certain to reduce reliability. However, Feldt (1997) showed that a concomitant rise in the validity coefficient may occur under certain limited conditions. The present research further characterizes the circumstances under which validity changes may occur as a result of deletion of a predictor test segment. Specifically, for a positive outcome, one seeks a relatively large correlation between the scores from the deleted segment and the remaining items coupled with a relatively low correlation between scores from the deleted segment and the criterion.

8.

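Methods for Estimating Reliability When a Test Paper Contains a Single High-Scoring Subjective Item

杨志明 (Yang Zhiming), 丁港 (Ding Gang), 王雯 (Wang Wen) 《教育测量与评价(理论版)》 (Educational Measurement and Evaluation, Theory Edition) 2021,(1):44-48

Test reliability is one of the core indicators of examination quality, but conventional reliability estimation methods are not appropriate for a test paper that contains a single high-scoring subjective item, because such an item has too large an influence on the variance of the total test score. One solution is to first estimate the reliability of the single high-scoring subjective item and then apply the stratified α coefficient formula to estimate the reliability of the whole test. The reliability of the single high-scoring subjective item can be estimated in two ways: with the test-retest reliability method, or with a method based on the fact that the correlation between two random variables is attenuated by random error.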
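The stratified α coefficient named in this abstract is commonly written as α_strat = 1 − Σ σ_i²(1 − ρ_i) / σ_X², where σ_i² and ρ_i are the score variance and reliability of stratum i and σ_X² is the total-score variance. The sketch below applies that textbook formula, treating the single high-scoring item as one stratum with a separately estimated reliability. The function name and the example numbers are invented for illustration and are not taken from the article.

```python
def stratified_alpha(stratum_variances, stratum_reliabilities, total_variance):
    """Stratified alpha from per-stratum variances and reliabilities.

    stratum_variances:     score variance of each stratum (e.g. objective section, essay item)
    stratum_reliabilities: reliability estimate of each stratum, in the same order
    total_variance:        variance of the total test score
    (Hypothetical helper written for this sketch.)
    """
    if len(stratum_variances) != len(stratum_reliabilities):
        raise ValueError("need one reliability per stratum")
    # Error variance contributed by each stratum is variance * (1 - reliability).
    error_variance = sum(
        v * (1 - r) for v, r in zip(stratum_variances, stratum_reliabilities)
    )
    return 1 - error_variance / total_variance


# Usage with made-up values: an objective section and one essay item.
print(round(stratified_alpha([16.0, 25.0], [0.8, 0.6], 61.0), 3))
```

Because the item's reliability enters as a separate input, it can come from either of the two estimation routes the abstract mentions.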

9.

Traditional Dimensionality Versus Essential Dimensionality

Ratna Nandakumar 《Journal of Educational Measurement》1991,28(2):99-117

This article addresses testing the hypothesis of one versus more than one dominant (essential) dimension in the possible presence of minor dimensions. The method used is Stout's statistical test of essential unidimensionality, which is based on the theory of essential unidimensionality. Differences between the traditional definition of dimensionality provided by item response theory, which counts all dimensions present, and essential dimensionality, which counts only dominant dimensions, are discussed. As Monte Carlo studies demonstrate, Stout's test of essential unidimensionality tends to indicate essential unidimensionality in the presence of one dominant dimension and one or more minor dimensions that have a relatively small influence on item scores. As the influence of the minor dimensions increases, Stout's test is more likely to reject the hypothesis of essential unidimensionality. To assist in interpreting these studies, a rough index of the deviation from essential unidimensionality is proposed.

10.

Long-term stability of students' evaluations: A note on Feldman's “consistency and variability among college students in rating their teachers and courses”

Herbert W. Marsh, Dr. J. U. Overall 《Research in higher education》1979,10(2):139-147

Feldman (1977), reviewing research about the reliability of student evaluations, reported that while class average responses were quite reliable (.80s and .90s), single rater reliabilities were typically low (.20s). However, studies he reviewed determined single rater reliability with internal consistency measures which assumed that differences among students in the same class (within-class variance) were completely random, an assumption which Feldman seriously questioned. In the present study, this assumption was tested by collecting evaluations from the same students at the end of each class and again one year after graduation. Single rater reliability based upon an internal consistency approach (agreement among different students in the same class) was similar to that reported by Feldman. However, single rater reliability based upon a stability approach (agreement between end-of-term and follow-up ratings by the same student) was much higher (median r = .59). These results indicate that individual student evaluations were remarkably stable over time and more reliable than previously assumed. Most important, there was systematic information in individual student ratings, beyond that implied by the class average response, that internal consistency approaches have ignored or assumed to be nonexistent.

11.

Psychometric Characteristics of the California Preschool Social Competence Scale in a Spanish Population Sample

Jordi Julvez, Maria Forns, Núria Ribas-Fitó, Carlos Mazon, Maties Torrent, Raquel Garcia-Esteban 《Early education and development》2013,24(5):795-815

Research Findings: Few rating scales measure social competence in very young Spanish or Catalan children. We aimed to analyze the psychometric characteristics of the California Preschool Social Competence Scale (CPSCS) when applied to a Spanish- and Catalan-speaking population. Children were rated by their respective teachers within 6 months following their 4th birthday in two population-based birth cohorts in Spain (N = 378). A confirmatory factor analysis (CFA) was used to compare the underlying structure of the Spanish–Catalan version with that of the original version. Cronbach's alpha coefficient was used to determine the internal consistency of each of the confirmed factors. Cohen's kappa formula was used to calculate the test–retest reliability in a small subset of children who were rated again one month later. Five correlated factors (Considerateness, Task Orientation, Extraversion, Verbal Facility, and Response to Unfamiliar) were optimally confirmed as a result of CFA. The first three factors had robust internal consistency. The kappa coefficient was satisfactory in 29 items out of 30. Children's cognitive abilities as assessed by the McCarthy Scales, children's gender, maternal social class and level of education were related to the social competence scores as indicators of criterion-related factors. Practice or Policy: The bilingual version of the CPSCS has good psychometric properties allowing it to be used in further studies in either Spanish or Catalan populations.

12.

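Revision of the State Self-Forgiveness Scale in a College Student Population

汤舒俊 (Tang Shujun), 喻峰 (Yu Feng) 《荆州师范学院学报》 (Journal of Jingzhou Teachers College) 2009,(6):68-70

Objective: To revise the State Self-Forgiveness Scale (SSFS) in a college student population and examine its psychometric properties. Methods: A total of 392 college students were surveyed, and exploratory and confirmatory factor analyses, among other methods, were used to examine the reliability and validity of the scale. Results: The revised scale has two dimensions, and confirmatory factor analysis showed that the two-factor structure of the SSFS fit well. The internal consistency reliability of the SSFS was 0.750 and its test-retest reliability was 0.623. The criterion-related validity of the SSFS was good. Conclusion: The revised SSFS has good psychometric properties and can be used as an instrument for measuring state self-forgiveness in college students.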

13.

A Flexible Latent Class Approach to Estimating Test‐Score Reliability

Daniël W. van der Palm, L. Andries van der Ark, Klaas Sijtsma 《Journal of Educational Measurement》2014,51(4):339-357

The latent class reliability coefficient (LCRC) is improved by using the divisive latent class model instead of the unrestricted latent class model. This results in the divisive latent class reliability coefficient (DLCRC), which unlike LCRC avoids making subjective decisions about the best solution and thus avoids judgment error. A computational study using large numbers of items shows that DLCRC also is faster than LCRC and fast enough for practical purposes. Speed and objectivity render DLCRC superior to LCRC. A decisive feature of DLCRC is that it aims at closely approximating the multivariate distribution of item scores, which might render the method suited when test data are multidimensional. A simulation study focusing on multidimensionality shows that DLCRC in general has little bias relative to the true reliability and is relatively accurate compared to LCRC and the classical lower-bound methods (coefficients α and λ2 and the greatest lower bound).

14.

Strong Convergence of the Coefficient Alpha Estimator for Reliability of Multiple-Component Measuring Instruments

Tenko Raykov 《Structural equation modeling》2019,26(3):430-436

It is shown that in general the popular coefficient alpha estimator for reliability of multi-component measuring instruments converges almost surely to a quantity that is not equal to the population reliability coefficient. This convergence with probability 1 is a stronger statement than convergence in probability (consistency) and convergence in distribution for the alpha estimator, which have been studied in the past. In the special case of congeneric measures with uncorrelated errors and equal loadings on the common true score, the alpha estimator converges almost surely to the population reliability coefficient that equals population alpha, which implies also its consistency as a reliability estimator. When the loadings are unequal but sufficiently high and similar, the alpha estimator converges almost surely to population alpha that is essentially indistinguishable from the population reliability coefficient, which implies alpha’s approximate consistency then. For the general case, the results entail that the alpha estimator is not a consistent estimator of reliability. The findings add to the critical literature on coefficient alpha in the general case, as well as to the justification of its use as a dependable measuring instrument reliability estimator in special cases and settings resulting under appropriate restrictive conditions, and are illustrated using a numerical example.

15.

In search of the reliability of a Flemish version of the Knowledge Monitoring Assessment Test

Geraldine Clarebout, Jan Elen, Patrick Onghena 《Metacognition and Learning》2006,1(2):137-147

Metacognitive skills are widely recognized as an important moderating variable for learning. Many studies have shown that these skills affect students’ learning results. Tobias and Everson (2000) argue that metacognitive skills cannot be effectively applied in the absence of accurate knowledge monitoring. Consequently, they constructed a knowledge monitoring assessment test, which is claimed to be a valid test to measure students' knowledge monitoring capacity. In this contribution the reliability of a Flemish version of the KMA test is studied. Two studies are reported on, one with secondary education students and one with freshmen university students. In both studies the split-half method and Kuder-Richardson 20 were used to calculate the internal consistency as a measurement of reliability. Because none of the results showed a good reliability, it is suggested that additional efforts are needed to elaborate a reliable instrument.
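For readers unfamiliar with the two internal-consistency methods named in this abstract, the sketch below shows the standard Kuder-Richardson 20 formula for dichotomous (0/1) items, KR-20 = k/(k−1) · (1 − Σ p_j q_j / σ_X²), and the Spearman-Brown step-up that is usually applied to a split-half correlation. The function names and the tiny response matrix are assumptions for illustration, and nothing here reproduces the authors' data or software.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson 20 for a persons-by-items matrix of 0/1 responses.

    Uses population variances throughout, so that p*q is the item variance.
    (Hypothetical helper written for this sketch.)
    """
    n = len(responses)
    k = len(responses[0])
    # Proportion of correct answers per item.
    p = [sum(row[j] for row in responses) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_pq / total_var)


def spearman_brown(r_half):
    """Step a half-test correlation up to a full-length reliability estimate."""
    return 2 * r_half / (1 + r_half)


# Usage: four examinees answering three right/wrong items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data), spearman_brown(0.5))
```

KR-20 is the special case of coefficient α for dichotomously scored items, which is why both methods are treated as internal-consistency estimates.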

16.

Counterintuitive Dynamics Test

Nuri Balta, Ali Eryılmaz 《International Journal of Science and Mathematics Education》2017,15(3):411-431

One way to fascinate, engage, arouse curiosity, motivate, and stimulate intellectual development in learning scientific concepts is to use counterintuitive questions. These questions make students aware of the inadequacies of their own thinking by exposing them to situations whose outcomes are inconsistent with what they would expect. In this study, a counterintuitive dynamics test (CIDT) is developed and administered to high school students along with the force concept inventory (FCI). After expert reviews, the initial version of the test consisting of 39 questions was administered to 87 students as a pilot study. After item analysis, a final version of 30 questions was developed; its internal consistency reliability coefficient was calculated as 0.826. The CIDT and FCI were administered to 229 students from 9 different high schools in Turkey. The results indicated that students were mostly affected by everyday experiences in the FCI, and by carelessness and a superficial approach in the CIDT. Average scores for both tests were roughly equal and low. The results showed that the CIDT is a new test that measures another dimension of dynamic concepts and should be used along with the FCI.

17.

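Report on the Revision of the Learning Disability Evaluation Scale

曾守锤 (Zeng Shouchui) 《中国特殊教育》 (Chinese Journal of Special Education) 2006,(6):92-96

The Learning Disability Evaluation Scale (school version), as revised by Stephen B. McCamey in 1996, was revised for Chinese use. The Chinese version of the scale has 85 items in 7 subscales: listening, thinking, speaking, reading, writing, spelling, and mathematical calculation. Measurement of 416 primary school students in grades two to five showed that: (1) the response patterns of the items were reasonable; (2) the scale had high internal consistency and test-retest reliability coefficients; (3) the scale had good construct validity, criterion-related validity, and content validity.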

18.

Measuring ocean literacy of high school students: psychometric properties of a Chinese version of the ocean literacy scale

Liang-Ting Tsai 《Environmental Education Research》2019,25(2):264-279

This study established a Chinese scale for measuring high school students’ ocean literacy. This included testing its reliability, validity, and differential item functioning (DIF) with the aim of compensating for the lack of DIF tests focusing on current scales. The construct validity and reliability were verified and tested by analyzing the established scale’s items using the Rasch model, and a gender DIF test was conducted to ensure the test results’ fairness when distinct groups were compared simultaneously. The results indicated that the scale established in this study is unidimensional and possesses favorable internal consistency and construct validity. The gender DIF test results indicated that several items were difficult for either female or male students to correctly answer; however, the experts and scholars discussed these items individually and suggested retaining them. The final Chinese version of the ocean literacy scale developed here comprises 48 items that can reflect high school students’ understanding of ocean literacy, which helps students understand the topics of marine science encountered in real life.

19.

Development and validity of a Dutch version of the Remote Associates Task: An item-response theory approach

Soghra Akbari Chermahini, Marian Hickendorff, Bernhard Hommel 《Thinking Skills and Creativity》2012,7(3):177-186

The Remote Associates Test (RAT) developed by Mednick and Mednick (1967) is known as a valid measure of creative convergent thinking. We developed a 30-item version of the RAT in Dutch with high internal consistency (Cronbach's alpha = 0.85) and applied both Classical Test Theory and Item Response Theory (IRT) to provide measures of item difficulty and discriminability, construct validity, and reliability. IRT was further used to construct a shorter version of the RAT, which comprises 22 items but still shows good reliability and validity, as revealed by its relation to Raven's Advanced Progressive Matrices test, another insight-problem test, and Guilford's Alternative Uses Test.

20.

THE ROLE OF RELIABILITY IN CRITERION-REFERENCED TESTS

MICHAEL T. KANE 《Journal of Educational Measurement》1986,23(3):221-224

In discussion of the properties of criterion-referenced tests, it is often assumed that traditional reliability indices, particularly those based on internal consistency, are not relevant. However, if the measurement errors involved in using an individual's observed score on a criterion-referenced test to estimate his or her universe scores on a domain of items are compared to errors of an a priori procedure that assigns the same universe score (the mean observed test score) to all persons, the test-based procedure is found to improve the accuracy of universe score estimates only if the test reliability is above 0.5. This suggests that criterion-referenced tests with low reliabilities generally will have limited use in estimating universe scores on domains of items.
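The 0.5 threshold in this abstract can be motivated with a short classical-test-theory sketch. The notation is an assumption of this note rather than the article's own: observed score X = T + E with uncorrelated true (universe) score T and error E, and reliability ρ = σ_T² / σ_X². Using each person's observed score as the estimate leaves error E, while assigning everyone the mean leaves error T − μ.

```latex
\begin{align*}
\mathrm{MSE}_{\text{test}} &= \sigma_E^2 = (1-\rho)\,\sigma_X^2, \\
\mathrm{MSE}_{\text{mean}} &= \sigma_T^2 = \rho\,\sigma_X^2, \\
\mathrm{MSE}_{\text{test}} < \mathrm{MSE}_{\text{mean}}
  &\iff 1-\rho < \rho
  \iff \rho > 0.5 .
\end{align*}
```

Under these assumptions the observed score is the more accurate estimate only when more than half of the observed-score variance is true-score variance.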