Similar literature
1.
Lord (1959) has shown that the standard error of measurement of a test is, for all practical purposes, directly proportional to the square root of the number of items on the test. More specifically, Lord found empirically that the standard error of a test was equal to .     if the reliability of the test was computed by the Kuder-Richardson (KR) 20 formula. If the KR-21 formula was used, the standard error was equal to .     . The present paper sets out to show how these relationships may be derived from the defining formulas of reliability and standard error of measurement, if certain simple assumptions about values of test statistics are made.
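
The defining formulas this abstract refers to can be sketched as follows. This is a minimal illustration of the textbook definitions of KR-20, KR-21 and the standard error of measurement, not the paper's derivation. The function names and the `responses` layout (one row per examinee, one 0/1 entry per item) are assumptions made for the example, and the population variance (divisor N) is assumed throughout.

```python
from math import sqrt


def variance(values):
    """Population variance (divisor N); an assumption of this sketch."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def kr20(responses):
    """KR-20 from a 0/1 matrix: rows are examinees, columns are items."""
    n_items = len(responses[0])
    n_people = len(responses)
    total_var = variance([sum(row) for row in responses])
    # Sum of item variances p_i * q_i, where p_i is the proportion correct.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return n_items / (n_items - 1) * (1 - sum_pq / total_var)


def kr21(responses):
    """KR-21: like KR-20, but uses only the mean and variance of total scores."""
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    mean = sum(totals) / len(totals)
    total_var = variance(totals)
    return n_items / (n_items - 1) * (
        1 - mean * (n_items - mean) / (n_items * total_var)
    )


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: score standard deviation times sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)
```

Because KR-21 never exceeds KR-20 on the same data, the standard error computed from KR-21 is never smaller than the one computed from KR-20.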

2.
Kuder-Richardson reliability formulas 20 and 21 should not be applied to tests containing items either answered correctly or incorrectly by all examinees. The extent to which KR-20 is attenuated by zero-variance items is derived.
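
One way to see the attenuation, offered as a sketch from the standard KR-20 formula rather than as the paper's own derivation: suppose $k$ of the $n$ items have zero variance. Those items contribute nothing to $\sum p_i q_i$ and add the same constant to every total score, so $s_x^2$ is unchanged. Only the leading factor differs. The symbols $n$, $k$, $p_i$, $q_i$ and $s_x^2$ are notation assumed here.

```latex
\begin{align*}
r_{20}^{(n)}   &= \frac{n}{n-1}\left(1 - \frac{\sum_i p_i q_i}{s_x^2}\right), &
r_{20}^{(n-k)} &= \frac{n-k}{n-k-1}\left(1 - \frac{\sum_i p_i q_i}{s_x^2}\right), \\[4pt]
\frac{r_{20}^{(n)}}{r_{20}^{(n-k)}} &= \frac{n\,(n-k-1)}{(n-1)(n-k)} < 1
\quad \text{for } 0 < k < n-1 .
\end{align*}
```

For example, with $n = 20$ and $k = 5$ the ratio is $280/285 \approx 0.982$, so a positive KR-20 is slightly understated when the zero-variance items are left in.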

3.
From concepts which refer only to observed scores and which allow the parameters of score distribution over repeated measurements on a given person to differ from person to person, necessary and sufficient conditions under which coefficient alpha equals test reliability are derived. The result clarifies the relation of this quantity to the Kuder-Richardson formula 20, to the KR-21, to the Spearman-Brown formula, and to Lord's item-sampling model.

4.
The stepped-up reliability coefficient does not have the same standard error as an ordinary correlation coefficient. Fisher's z-transformation should not be applied to it. Appropriate procedures are suggested.

5.
An occupational inventory was developed by the Pennsylvania Department of Education for consideration as part of the state assessment project. This instrument measures 7th grade student knowledge of the world of work. The summary phase of field testing involving 255 students revealed a reliability of .828 and a standard error of measurement of 2.60.
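
As a worked check on the two reported figures, and assuming they are linked by the classical formula for the standard error of measurement (the abstract does not say which formula was used), the implied standard deviation of the scores can be recovered. The symbols $s_x$ and $r_{xx}$ are notation assumed here.

```latex
\[
\mathrm{SEM} = s_x \sqrt{1 - r_{xx}}
\quad\Longrightarrow\quad
s_x = \frac{\mathrm{SEM}}{\sqrt{1 - r_{xx}}}
    = \frac{2.60}{\sqrt{1 - .828}}
    \approx \frac{2.60}{0.4147}
    \approx 6.27 .
\]
```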

6.
What is the extent of error likely with each of several approximations for the standard deviation, internal consistency reliability, and the standard error of measurement? To help answer this question, approximations were compared with exact statistics obtained on 85 different classroom tests constructed and administered by professors in a variety of fields; means and standard deviations of the resulting differences supported the use of approximations in practical situations. Results of this analysis (1) suggest a greater number of alternative formulas that might be employed, and (2) provide additional information concerning the accuracy of approximations with non-normal distributions.

7.
Although there is considerable evidence that the Law School Admission Test (LSAT) and the undergraduate grade-point average (UGPA) have a useful degree of predictive validity, there is also a large variation in the magnitude of the coefficients across schools. Understanding this variation has important implications for the use and interpretation of results of a validity study conducted at an individual school. A meta-analysis of the validity results and data on applicants to 154 law schools was conducted in an effort to better understand this observed variation. The standard deviation (SD) on the LSAT and the correlation between the LSAT and UGPA for accepted students at each law school accounted for 58.5% of the between-school variance in the multiple correlations of these two predictors with first-year average grade in law school. Sampling error accounted for an additional 12% of the variance. Hence, only a small fraction of the between-school variability in validities remains to be explained by other statistical artifacts or situational specificity factors. Mean validities and 90% credibility values for four adjustment procedures are reported, as are the mean observed validities for different combinations of predictors.

8.
It is widely recognized that the reliability of a difference score depends on the reliabilities of the constituent scores and their intercorrelation. Authors often use a well-known identity to express the reliability of a difference as a function of the reliabilities of the components, assuming that the intercorrelation remains constant. This approach is misleading, because the familiar formula is a composite function in which the correlation between components is a function of reliability. An alternative formula, containing the correlation between true scores instead of the correlation between observed scores, provides more useful information and yields values that are not quite as anomalous as the ones usually obtained.
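
The contrast drawn in this abstract can be sketched with the classical formula for the reliability of a difference score and the standard attenuation relation between observed and true-score correlations. This is an illustration under classical test theory assumptions (uncorrelated errors), not necessarily the exact alternative formula the paper proposes; the function and parameter names are hypothetical.

```python
from math import sqrt


def difference_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of D = X - Y from the observed correlation r_xy."""
    cov = r_xy * sd_x * sd_y
    true_var = sd_x ** 2 * r_xx + sd_y ** 2 * r_yy - 2 * cov
    observed_var = sd_x ** 2 + sd_y ** 2 - 2 * cov
    return true_var / observed_var


def difference_reliability_from_true_corr(r_xx, r_yy, rho_true,
                                          sd_x=1.0, sd_y=1.0):
    """Same quantity, parameterised by the true-score correlation rho_true.

    The observed correlation is not held fixed: it follows from the
    attenuation relation r_xy = rho_true * sqrt(r_xx * r_yy).
    """
    r_xy = rho_true * sqrt(r_xx * r_yy)
    return difference_reliability(r_xx, r_yy, r_xy, sd_x, sd_y)


if __name__ == "__main__":
    # Raise the component reliabilities while the true-score correlation stays put.
    for rel in (0.6, 0.7, 0.8, 0.9):
        print(rel, difference_reliability_from_true_corr(rel, rel, 0.625))
```

Parameterising by the true-score correlation makes explicit that the observed correlation moves whenever the component reliabilities move, which is the dependence the abstract says the familiar identity hides.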

9.
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this regard is a measure produced by dividing the standard error of measurement by the test's ‘reliability length’, the latter defined as the maximum possible score minus the most probable score obtainable by blind guessing alone. This, however, can be unsatisfactory with negative marking (formula scoring), as shown by data on 13 negatively marked true/false tests. In these the examinees displayed considerable misinformation, which correlated negatively with correct knowledge. Negative marking can improve test reliability by penalizing such misinformation as well as by discouraging guessing. Reliability measures can be based on idealized theoretical models instead of on test data. These do not reflect the qualities of the test items, but can be focused on specific test objectives (e.g. in relation to cut‐off scores) and can be expressed as easily communicated statements even before tests are written.

10.
Where two sets of measurements can each be grouped into below average, average and above average classifications with an equal number assigned to each of the below average and above average classifications, a 3 by 3 table can then be tabulated with frequency counts. The exact value of the product moment coefficient of correlation can then be calculated very simply by means of the formula, r = (DIFF)/2m, where DIFF is the difference between the sum of the corner numbers on the positive diagonal and the sum of the corner numbers on the negative diagonal, and m equals the number in each of the below average and above average classifications for each variable. The formula for r is applicable to negative as well as positive correlation.
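
The shortcut formula above can be checked against a direct product-moment computation. The sketch below assumes the three classifications are coded -1, 0 and +1, and that the table is laid out with rows for the first variable and columns for the second, both ordered below average, average, above average. The function names and the example table are made up for illustration.

```python
from math import sqrt


def r_from_3x3(table):
    """Shortcut r = DIFF / (2m) for a 3 by 3 frequency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    m = row_sums[0]
    # The formula requires the same count m in all four extreme classifications.
    if not (row_sums[2] == m and col_sums[0] == m and col_sums[2] == m):
        raise ValueError("below/above average counts must all equal m")
    positive_corners = table[0][0] + table[2][2]
    negative_corners = table[0][2] + table[2][0]
    return (positive_corners - negative_corners) / (2 * m)


def pearson_from_3x3(table):
    """Ordinary product-moment r with categories coded -1, 0, +1."""
    codes = (-1, 0, 1)
    n = sx = sy = sxx = syy = sxy = 0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            x, y = codes[i], codes[j]
            n += count
            sx += count * x
            sy += count * y
            sxx += count * x * x
            syy += count * y * y
            sxy += count * x * y
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))


if __name__ == "__main__":
    table = [[3, 1, 0],
             [1, 4, 1],
             [0, 1, 3]]
    print(r_from_3x3(table), pearson_from_3x3(table))
```

With the example table, m = 4 and DIFF = 6, so both functions give 0.75; reversing the column order flips the sign, consistent with the statement that the formula also covers negative correlation.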

11.
Violations of four selected principles of writing multiple choice items were introduced into an undergraduate political science examination. Three of the four poor practices had no overall effect on test difficulty. A significant (α = .05) interaction effect between the poor practices and course achievement occurred for one of the four practices, with the poorer students generally gaining most from the poorly written items. KR-20 values were significantly lower for sets of items with the same flaws than for "good" versions of the items in three of four comparisons. The reductions in reliability were equivalent to those expected to result from shortening the test by 13 to 56 percent. Concurrent validity (correlation of experimental test scores with final examination scores) was significantly lower in two of four cases. The reductions in validity were equivalent to those expected to result from shortening the test by 56 to 83 percent.
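
Expressing a drop in reliability as an equivalent shortening of the test is usually done with the Spearman-Brown formula. The sketch below assumes that formula was the basis for the reliability percentages; the abstract does not say so explicitly, and the validity figures would need a different relation. The function names are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def equivalent_length_factor(r_original, r_reduced):
    """Length factor k at which Spearman-Brown turns r_original into r_reduced."""
    return r_reduced * (1 - r_original) / (r_original * (1 - r_reduced))


def percent_shortening(r_original, r_reduced):
    """Reliability loss expressed as a percentage reduction in test length."""
    return 100 * (1 - equivalent_length_factor(r_original, r_reduced))
```

For instance, a fall in reliability from 0.80 to about 0.667 corresponds to a length factor of 0.5, that is, to halving the test.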

12.
Although it has been known for over a half-century that the standard error of measurement is in many respects superior to the reliability coefficient for purposes of evaluating the fallibility of a psychological test, current textbooks and journal literature in tests and measurements still devote far more attention to test reliability than to the standard error. The present paper provides a list of ten salient features of the standard error, contrasting it to the reliability coefficient, and concludes that the standard error of measurement should be regarded as a primary characteristic of a mental test.

13.
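In experimental and measurement work, systematic errors are unavoidable. If they are not effectively eliminated, the measurement results are distorted, their correctness cannot be guaranteed, and assessing the precision of the results according to random-error theory becomes meaningless. Therefore, in any experiment or specific measurement, the first task is to find ways to reduce and eliminate, as far as possible, every systematic error that may be present.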

14.
The hypothesis that some students, when tested under formula directions, omit items about which they have useful partial knowledge implies that such directions are not as fair as rights directions, especially to those students who are less inclined to guess. This hypothesis may be called the differential effects hypothesis. An alternative hypothesis states that examinees would perform no better than chance expectation on items that they would omit under formula directions but would answer under rights directions. This may be called the invariance hypothesis. Experimental data on this question were obtained by conducting special test administrations of College Board SAT-verbal and Chemistry tests and by including experimental tests in a Graduate Management Admission Test administration. The data provide a basis for evaluating the two hypotheses and for assessing the effects of directions on the reliability and parallelism of scores for sophisticated examinees taking professionally developed tests. Results support the invariance hypothesis rather than the differential effects hypothesis.

15.
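A new method for measuring gravitational acceleration with the multi-tube falling-ball technique is introduced. The experiment is designed around extrapolation, which realizes the condition required by Stokes' formula that the medium be "infinitely extended" both laterally and longitudinally. The measured gravitational acceleration has a relative error of 2.3% with respect to the accepted value.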

16.
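Using the hypervirial theorem and the ladder-operator method, this paper derives several recurrence relations for computing transition matrix elements and expectation values of the hydrogen atom. These recurrence relations avoid the computer rounding errors caused by the alternating double summation in the direct formulas, and they allow the matrix elements of radial operators of arbitrary power to be obtained from a few low-order matrix elements. They thus supply the high-order perturbation formulas needed for modern studies of the excited-state structure of Rydberg atoms.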

17.
One of the most widely used methods for equating multiple parallel forms of a test is to incorporate a common set of anchor items in all its operational forms. Under appropriate assumptions it is possible to derive a linear equation for converting raw scores from one operational form to the others. The present note points out that the single most important determinant of the efficiency of the equating process is the magnitude of the correlation between the anchor test and the unique components of each form. It is suggested to use some monotonic function of this correlation as a measure of the equating efficiency, and a simple model relating the relative length of the anchor test and the test reliability to this measure of efficiency is presented.

18.
A reliability coefficient for criterion-referenced tests is developed from the assumptions of classical test theory. This coefficient is based on deviations of scores from the criterion score, rather than from the mean. The coefficient is shown to have several of the important properties of the conventional norm-referenced reliability coefficient, including its interpretation as a ratio of variances and as a correlation between parallel forms, its relationship to test length, its estimation from a single form of a test, and its use in correcting for attenuation due to measurement error. Norm-referenced measurement is considered as a special case of criterion-referenced measurement.
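
The coefficient described here is commonly written in the following form, given as a sketch in notation assumed for this note: $\rho_{XX'}$ is the conventional reliability, $\sigma_X^2$ the observed-score variance, $\mu_X$ the mean and $C$ the criterion score.

```latex
\[
K^2(X, T) \;=\;
\frac{\rho_{XX'}\,\sigma_X^2 + (\mu_X - C)^2}
     {\sigma_X^2 + (\mu_X - C)^2}
\]
```

Setting $C = \mu_X$ reduces the expression to $\rho_{XX'}$, which is the sense in which norm-referenced measurement appears as a special case. Since $\rho_{XX'} \le 1$, moving the criterion away from the mean can only raise the coefficient.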

19.
Dyer et al. (1967) have proposed a model for school system evaluation. The usefulness of the indices obtained from this model depends on the reliability or stability of the indices. This study presents evidence related to the stability of these indices when pupils and factors related to time are considered as sources of error.

20.
An alternative interpretation of Livingston's reliability coefficient is based on the notion of the relation of the size of the reliability coefficient to the range of talent. It is shown that the (generally) larger Livingston coefficient does not imply a smaller standard error of measurement and consequently does not imply a more dependable determination of whether or not a true score falls below (or exceeds) a given criterion value.
