Similar articles
1.
Where were you in 1963, when this article was first published? What are the two kinds of information you can get from achievement tests? What is a “criterion level”?

2.
What are the implications for teacher certification tests as courts begin to interpret the need for using a job analysis as a basis for testing? What is the character of recent court cases related to teacher certification tests?

3.
Are teacher certification tests subject to the same concerns as other licensure examinations? How should standards for such examinations be established?

4.
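Black-box testing and white-box testing are both important software testing methods. Black-box testers focus more on business requirements, while white-box testers focus more on implementation; black-box testing emphasizes the system as a whole, while white-box testing emphasizes its parts. The two approaches are complementary.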

5.
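Drawing on conceptual theories of analysis and synthesis and on the psychological characteristics of children at different ages, an "Analysis Test" and a "Synthesis Test" were constructed using figures as material. Test results show that both tests have good quality parameters and can serve as instruments for measuring the analytic and synthetic abilities of children aged 6 to 12.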

6.
States are increasingly requiring that public school teachers pass one or more tests as a condition for permanent employment. As a result of a recent federal court decision, these tests must now satisfy the same legal standards as other employment tests. Moreover, some of the measures used to assess teacher competence no longer rely on multiple-choice items. They now utilize various types of open-ended performance assessments. This article discusses how these developments may affect the adverse impact, reliability, validity, and pass-fail standards of teacher certification tests. The article concludes by recommending that such tests combine multiple-choice questions with open-ended tasks that focus on the common or critical situations that are likely to arise across the full range of practice settings for which the teacher is being certified or licensed.

8.
Music has an important place in theatre productions and in instructional audio‐visual material. Why is this, what functions does the music serve, and what are its effects? Literature on music in audio‐visual material and on its design is very scarce. The concept of music and its characteristics are described in some detail, indicating that music per se has no referential meaning. The theory is presented that music has an emotional effect that can be manipulated by the developer of audio‐visual material. On the other hand, music offers resources for commenting on and structuring audio‐visual material.

9.
Abstract: Rørvik, H. 1980. A Comparison of Piaget's and Kohlberg's Theories and Tests for Moral Judgment. Scandinavian Journal of Educational Research 25, 99‐124. Piaget's and Kohlberg's theories for moral judgment are compared. On the basis of this comparison, hypotheses are formulated regarding expected relationships between the tests constructed on the basis of the two theories. The empirical testing of these hypotheses indicates that there are marked similarities between Piaget's and Kohlberg's tests as to characteristics measured, power of discrimination between age levels, and in the stage placement of subjects.

The main differences between the tests seem to be that Piaget's test is most influenced by the personal relationship to other persons. Contrary to the impression given by the theorist himself, Kohlberg's test seems to a larger extent to measure the subjects’ norms and emotional reactions connected to internalization of norms. Moral behavior is more closely related to Kohlberg's measure.

10.
The choice of anchor tests is crucial in applications of the nonequivalent groups with anchor test design of equating. Sinharay and Holland (2006, 2007) suggested “miditests,” which are anchor tests that are content‐representative and have the same mean item difficulty as the total test but have a smaller spread of item difficulties. Sinharay and Holland (2006, 2007), Cho, Wall, Lee, and Harris (2010), Fitzpatrick and Skorupski (2016), Liu, Sinharay, Holland, Curley, and Feigenbaum (2011a), Liu, Sinharay, Holland, Feigenbaum, and Curley (2011b), and Yi (2009) found the miditests to lead to better equating than minitests, which are representative of the total test with respect to content and difficulty. However, these findings recently came into question as Trierweiler, Lewis, and Smith (2016) concluded, based on a comparison of correlation coefficients of miditests and minitests with the total test, that making an anchor test a miditest does not generally increase the anchor to total score correlation and recommended the continuation of the practice of using minitests over miditests. Their recommendation raises the question, “Should miditests continue to be considered in practice?” This note defends the miditests by citing literature that favors miditests and then by showing that miditests perform as well as the minitests in most realistic situations considered in Trierweiler et al. (2016), which implies that miditests should continue to be seriously considered by equating practitioners.

11.
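Through a correlational study of computer-assisted oral testing and face-to-face interviews, this study explores whether the computer-assisted oral test can substitute for the interview. A questionnaire survey was also used to examine students' attitudes toward computer-assisted oral testing and the factors that affect oral test performance. The results show that scores on the computer-assisted oral test and the interview are significantly correlated and that the two formats are interchangeable; that most students accept the computer-assisted format; and that the main factor affecting oral test performance is familiarity with the topics and materials.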

12.
What is differential bundle functioning and how is this different from differential item functioning? Can test specifications be used to identify and aid in the interpretation of differential bundle functioning? How can differential bundle functioning lead to an improved understanding of why groups perform differently on achievement tests?

13.
《教育实用测度》 (Applied Measurement in Education), 2013, 26(4): 393-406
Two models are presented in this article for estimating the proportion of students who would pass all of three or more content area tests given that none have actually been tested in more than two of the content areas. The first model allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study in which no student took more than two of the tests; the second model (which requires an outside estimate of the correlations between the different content area tests) allows one to estimate the proportion of students who would pass all of three or more content area tests from the test results of a study (or field test results) in which students took only one content area test. The models were tested on the Texas End-of-Course test battery (which consists of four content area tests) results of students who took all four content area tests prior to or in the spring of 2001, with at least one of the end-of-course content area tests taken in the spring of 2001. The model test results may have particular application to state assessment programs that must perform standard setting on high-stakes exams before the first live administration of the exams.

14.
How should researchers choose between competing scales in predicting a criterion variable? This article proposes the use of nonnested tests for the 2SLS estimator of latent variable models to discriminate between scales. The finite sample performance of these tests is compared to structural equation modeling information-based criteria such as root mean squared error of approximation (RMSEA) and Akaike's Information Criterion (AIC). The Cox and encompassing tests and augmented versions of these tests are compared to the inconsistent ordinary least squares (OLS) J test. An augmented version of the encompassing test performs best for sample sizes of 100 or more and can be recommended for use on scales with high reliability (0.9) and sample sizes of 200 or more, under varying regressor and error distributions. The OLS J test performs best for small samples of N = 50 and can be recommended for use in small samples when scales have high reliability (0.9). Relative to the nonnested tests, the information-based criteria perform poorly.

15.
How does the fact that two tests should not be equated manifest itself? This paper addresses this question through the study of the degree to which equating functions fail to exhibit population invariance across subpopulations. Equating functions are supposed to be population invariant by definition. But, when two tests are not equatable, it is possible that the linking functions, used to connect the scores of one to the scores of the other, are not invariant across different populations of examinees. While no acceptable equating function is ever completely population invariant, in the situations where equating is usually performed we believe that the dependence of the equating function on the population used to compute it is usually small enough to be ignored. We introduce two root‐mean‐square difference measures of the degree to which the functions used to link two tests computed on different subpopulations differ from the linking function computed for the whole population. We also introduce the system of “parallel‐linear” linking functions for multiple subpopulations and show that, for this system, our measure of population invariance can be computed easily from the standardized mean differences between the scores of the subpopulations on the two tests. For the parallel‐linear case, we develop a correlation‐based upper bound on our measure that holds for all systems of subpopulations. We illustrate these ideas using data from the SAT I and from a concordance study of several combinations of ACT and SAT I scores. In the appendices, we give some theoretical results bearing on the other equating “requirements” of “same construct,” “same reliability” and one aspect of Lord's concept of equity.

16.
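Although objective test items have clear strengths in testing students' language ability, their negative effects have become increasingly evident in practice. Subjective items measure students' expressive and communicative abilities more effectively. Subjective items are relatively easy to write but difficult to score. The causes of their low reliability should be analyzed properly and solutions proposed.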

17.
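Based on an analysis of a large number of recent Chinese-language examination papers, this article summarizes the general trends in Chinese-language examinations: the share of subjective items is growing while objective items are sharply reduced; open-ended items are gradually increasing, with emphasis on assessing students' thinking ability, especially creative thinking, and on encouraging inquiry and creativity; in-class and out-of-class content are combined, with emphasis on the accumulation of language and culture; comprehensive knowledge and abilities are assessed, with attention to integration across subjects; items are designed along three dimensions (emotional attitudes and values, processes and methods, and knowledge and abilities), and papers increasingly reflect humanistic care for students; listening, speaking, reading, and writing are all assessed; composition tasks keep pace with the times and face real life, aiming to lead students to attend to real life and write with genuine feeling; items are grounded in everyday life, attend to current social issues, and stress applying what is learned; and item design is increasingly novel, with more varied examination formats.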

18.
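The cloze test, grounded in Gestalt psychology, has received great attention from the language testing community since its creation and has been widely used in large-scale tests. Later research, however, showed that cloze testing has some puzzling aspects. This article examines the cloze test through an analysis of Bachman's model. The analysis shows that cloze results do not necessarily reflect test takers' underlying competence; they are also affected by method factors. The results suggest that item writers who use the cloze format should not only define the measurement target more clearly but also consider what potential difficulties exist in the test and minimize their influence on the results.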

19.
Research on the Effects of Administering Tests via Computers (cited by: 1; self-citations: 0; citations by others: 1)
Studies on the effects of computerized testing show that computer-based tests are not equivalent to their conventional counterparts, the results of item feedback are inconclusive, and computerized-adaptive tests can be as reliable as conventional tests with fewer items administered. The effects of the ordering of items, the ability estimation procedures, and the context of item administration in the estimation of parameters are some areas that require further investigation.

20.
Abstract

In an attempt to identify some of the causes of answer changing behavior, the effects of four test- and item-specific variables were evaluated. Three samples of New Zealand school children of different ages were administered tests of study skills. The number of answer changes per item was compared with the position of each item in a group of items, the position of each item in the test, the discrimination index and the difficulty index of each item. It is shown that answer changes were more likely to be made on items occurring early in a group of items and toward the end of a test. There was also a tendency for difficult items and items with poor discrimination to be changed more frequently. Some implications of answer changing in the design of tests are discussed.
