Similar literature
 20 similar articles found (search time: 31 ms)
1.
Measurement experts often advise against giving students the option of choosing which items on a test to answer. A review of the related literature revealed that this caveat refers primarily to essay examinations. The purpose of this research was to investigate the effects of allowing students to select a proportion of the items on an objective test to be excluded from the calculation of their grade. Findings regarding the effects of this practice on test reliability, test validity, and the students' relative test performance ranks suggest that it may not be as deleterious as we had been led to believe.

2.
To investigate the effect of two item-writing practices on test characteristics, examinations were chosen for study in two undergraduate courses (N = 71 and 210). About one-fourth of the items on each examination included a practice generally regarded as undesirable in measurement textbooks and alleged to make test items more difficult. Alternate forms that eliminated the undesirable practice were developed and administered at the same time as the original form. Rewriting item stems so that they formed a complete sentence or question resulted in about 6 percent more students answering items correctly. Eliminating unnecessary material in item stems, however, had little effect on difficulty. KR20 values were not appreciably different for the two versions of either test. Neither flaw was found to affect item discrimination indices noticeably. The absence of any substantial practice-by-achievement level interactions suggested little effect of the practices on the validity of the tests.

3.
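Cloze tests are widely used in foreign-language teaching and testing because they are objective and convenient to write, administer, score, and analyze. However, the great majority of cloze tests currently flooding the market have low validity, mainly because the points they test are at a low level. A cloze test was designed according to Li Xiaoju's theory of cloze testing-point levels and trialled with students at a university. The analysis focused on correct-response rates and the reasons for lost marks, and yielded empirically based methods for raising the validity of cloze tests by raising the level of their testing points. Emphasis should be placed on developing students' abilities at high-level testing points, so as to improve learners' overall English proficiency.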

4.
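The geography paper of the 2004 National College Entrance Examination (Shanghai version) consists of two parts: multiple-choice questions and comprehensive analysis questions. The multiple-choice part has 20 questions worth 2 points each, 40 points in total. The comprehensive analysis part has 8 major questions comprising 34 sub-questions and 110 scoring points. The examination is evaluated from several angles: classical item analysis, the reliability of the results, content-based and structure-based evidence of validity, and the impact of the examination on teaching and learning. The conclusions are as follows: the ability objectives of the geography examination are set according to the curriculum standards, and the items are written on that basis; the difficulty is slightly on the easy side, with a certain degree of discrimination; the number of items is appropriate; and the ratio of multiple-choice to non-multiple-choice questions is appropriate, which gives good guidance to teaching in schools. However, the comprehensive analysis part requires a large amount of reading of text and graphics but little written response, which makes it hard to assess candidates' independent geographical thinking in any systematic way and is unfavourable as guidance for teaching.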

5.
High-stakes testing, a phenomenon born out of intense accountability across the United States, produces instructional settings that marginalize both curriculum and instruction. Teachers and other school personnel have reduced instruction to drill and practice in an effort to raise standardized and criterion-referenced test scores. This study presents an alternative to current practice that engages students in learning and increases their awareness of the internal aspects of standardized tests. The Test Item Construction Model (TICM) guides students through the process of studying test item stems and subsequently creating items, using a 12-week process that progresses from understanding test items to creating them. Students grew in their understanding of test item stems and of how these are generated. An ANOVA did not yield significant differences between random groups of trained and untrained test writers. However, students in the experimental group demonstrated gains in understanding of test items.

6.
Using a technique that controlled exposure of items, the investigator examined the effect on mean test score, item difficulty index, and reliability and validity coefficients of the reordering of items within a power test containing ten letter-series-completion items. The results suggest that effects on test statistics from item rearrangement are, generally, minimal. The implication of these findings for test designs involving an item sampling procedure is that performance on an item is minimally influenced by the context in which it occurs.

7.
Speededness refers to the situation where the time limits on a standardized test do not allow substantial numbers of examinees to fully consider all test items. When tests are not intended to measure speed of responding, speededness introduces a severe threat to the validity of interpretations based on test scores. In this article, we describe test speededness, its potential threats to validity, and traditional and modern methods that can be used to assess the presence of speededness. We argue that more attention must be paid to this issue and that more research must be done to set appropriate time limits on power tests so that speed of responding does not interfere with the construct measured.

8.
This article reviews ten predictive validity studies of the Swedish Scholastic Assessment Test (SweSAT). A primary result is that the predictive validity of the SweSAT seems to be highly dependent upon the study programme being examined; that is, the predictive validity is better for some programmes than for others. When compared with the upper-secondary school grade point average, the predictive validity of the SweSAT seems to be fairly good, but there are major differences between study programmes in this case as well. However, it is suggested that the validity of the results is to some extent threatened by methodological issues. A general conclusion is, therefore, that there is room for improving the test itself, as well as the way that predictive validity studies are carried out.

9.
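A Study of Methods for Estimating the Difficulty of Physics Items in the Academic Proficiency Examination   Total citations: 1 (self-citations: 1, citations by others: 0)
At present, the Shanghai General Senior High School Academic Proficiency Examination has no system of pre-test trialling, so item difficulty is estimated mainly from the item writers' experience, and no quantitative method exists. Drawing on research experience in China and abroad, this study starts from three aspects of an item (its physics concepts, its design, and its mathematical operations) and, by analysing the measured item-difficulty data from the 2011 Shanghai General Senior High School physics academic proficiency examination, constructs a quantitative method for estimating item difficulty. The accuracy of the method is then checked against the measured item-difficulty data from the 2012 examination, in the hope of providing a basis for future research on estimating the difficulty of physics items.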

10.
Verbal reports of examinees' thinking on multiple-choice critical thinking test items can provide useful validation data only if the verbal reporting does not change the course of examinees' thinking and performance. Using a completely randomized factorial design, 343 senior high school students were divided into five groups. In four of the groups, different procedures were used to elicit students' thinking as they worked through Part A of a critical thinking test of observation appraisal (Norris & King, 1983). In the control group, students took the same test in paper-and-pencil format. There were no significant differences in test performance among the five groups, nor in the quality of thinking among the four groups from whom verbal reports of thinking were elicited. These results are evidence that verbal reports of thinking can meet one of the necessary conditions of useful validation data, namely that collecting the data does not alter examinees' thinking and performance. Some analyses found significant interviewer main effects and sex-by-interviewer and elicitation-level-by-interviewer-by-sex-by-grade interaction effects. Analysis of these interactions suggested that the role of the interviewer might limit the generality of the technique.

11.
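College English testing consists only of written examinations with too few subjective items, so it cannot measure students' actual communicative competence, especially their oral communicative competence. Because today's society urgently needs people with communicative skills, every effort should be made to promote oral testing in College English. The existing oral tests and the Putonghua Proficiency Test show that the basic conditions for holding larger-scale College English oral tests are in place; the use of computers makes such testing more practicable, and a favourable social climate and steadily strengthening basic English education add to its feasibility.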

12.
Educational tests used for accountability purposes must represent the content domains they purport to measure. When such tests are used to monitor progress over time, the consistency of the test content across years is important for ensuring that observed changes in test scores are due to student achievement rather than to changes in what the test is measuring. In this study, expert science teachers evaluated the content and cognitive characteristics of the items from 2 consecutive annual administrations of a 10th-grade science assessment. The results indicated the content area representation was fairly consistent across years and the proportion of items measuring the different cognitive skill areas was also consistent. However, the experts identified important cognitive distinctions among the test items that were not captured in the test specifications. The implications of this research for the design of science assessments and for appraising the content validity of state-mandated assessments are discussed.

13.
A sample of college-bound juniors from 275 high schools took a test consisting of 70 math questions from the SAT. A random half of the sample was allowed to use calculators on the test. Both genders and three ethnic groups (White, African American, and Asian American) benefitted about equally from being allowed to use calculators; Latinos benefitted slightly more than the other groups. Students who routinely used calculators on classroom mathematics tests were relatively advantaged on the calculator test. Test speededness was about the same whether or not students used calculators. Calculator effects on individual items ranged from positive through neutral to negative and could either increase or decrease the validity of an item as a measure of mathematical reasoning skills. Calculator effects could be either present or absent in both difficult and easy items.

14.
This study evaluated the connection between gender differences in examinees' familiarity, interest, and negative emotional reactions to items on the Advanced Placement Psychology Examination and the items' gender differential item functioning (DIF). Gender DIF and gender differences in interest varied appreciably with the content of the items. Gender differences in the three variables were substantially related to the items' gender DIF (e.g., R = .50). Much of the gender DIF on this test may be attributable to gender differences in these variables.

15.
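Is the student satisfaction scale for university logistics services, developed on the basis of a literature search and expert consultation, trustworthy? The reliability of the scale can be tested by computing Cronbach's α and split-half reliability coefficients for 494 samples, and its validity can be tested by factor analysis. The results show that all 19 items in the 5 dimensions of the scale of students' subjective satisfaction with university logistics services have good measurement properties.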

16.
Many innovative item formats have been proposed over the past decade, but little empirical research has been conducted on their measurement properties. This study examines the reliability, efficiency, and construct validity of two innovative item formats, the figural response (FR) and constructed response (CR) formats, used in a K–12 computerized science test. The item response theory (IRT) information function and confirmatory factor analysis (CFA) were employed to address the research questions. It was found that the FR items were similar to the multiple-choice (MC) items in providing information and efficiency, whereas the CR items provided noticeably more information than the MC items but tended to provide less information per minute. The CFA suggested that the innovative formats and the MC format measure similar constructs. Innovations in computerized item formats are reviewed, and the merits as well as challenges of implementing the innovative formats are discussed.

17.
A practical concern for many existing tests is that subscore test lengths are too short to provide reliable and meaningful measurement. A possible method of improving subscale reliability and validity would be to make use of collateral information provided by items from other subscales of the same test. To this end, the purpose of this article is to compare two different formulations, Analytical and Empirical, of an alternative Item Response Theory (IRT) model developed to parameterize unidimensional projections of multidimensional test items. Two real data applications are provided to illustrate how the projection IRT model can be used in practice, as well as to further examine how ability estimates from the projection IRT model compare to external examinee measures. The results suggest that collateral information extracted by a projection IRT model can be used to improve the reliability and validity of subscale scores, which in turn can be used to provide diagnostic information about the strengths and weaknesses of examinees, helping stakeholders to link instruction or curriculum to assessment results.

18.
The top-down approach to designing a multistage test is relatively understudied in the literature and underused in research and practice. This study introduced a route-based top-down design approach that directly sets design parameters at the test level and utilizes an advanced automated test assembly algorithm seeking global optimality. The design process in this approach consists of five sub-processes: (1) route mapping, (2) setting objectives, (3) setting constraints, (4) routing error control, and (5) test assembly. Results from a simulation study confirmed that the assembly, measurement, and routing results of the top-down design eclipsed those of the bottom-up design. Additionally, the top-down design approach provided unique insights into design decisions that could be used to refine the test. Despite these advantages, applying both top-down and bottom-up approaches in a complementary manner is recommended in practice.

19.
The present study investigated the utility of 52 items, selected from a readily available item pool developed for instructional purposes, when the items are used to measure critical thinking abilities of biology students. The items yield scores that have reasonable internal consistency reliability. Furthermore, analyses involving ACT, Watson-Glaser Critical Thinking Appraisal, and Group Embedded Figures Test scores also suggest that the critical thinking test items have good concurrent validity. Thus, the measure may be useful in both science instruction and future research regarding critical thinking phenomena.

20.
This study was conducted to determine if a norm-referenced test designed to assess instructional design competency could be statistically validated (i.e., confirmed statistically to discriminate between known masters and known nonmasters of instructional design). The test was composed of items written to assess verified competencies required of instructional design professionals. A total of 257 respondents participated in the study over the course of three stages: initial item bank construction, item analysis to determine those items with discrimination power, and the concurrent validity calculation, including determination of the mastery cut-off score. Mean scores of five groups of respondents were analyzed in the final stage. Statistically significant differences were found among the Professional Masters, Education Graduate Students and Undergraduates, Noneducation Graduate Students, and Noneducation Undergraduates. The article concludes with a discussion of the role of such an instrument in conducting research in the field.
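
Several of the abstracts above report classical test statistics without defining them: the item difficulty index, KR20, Cronbach's α, and split-half reliability. As a reader's aid, here is a minimal Python sketch of the textbook formulas, assuming dichotomously scored (0/1) items and population variances. The function names and the toy `responses` matrix are illustrative assumptions and are not taken from any of the listed studies.

```python
from statistics import pvariance

# Toy data (hypothetical): rows are examinees, columns are items, 1 = correct.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]


def item_difficulty(rows):
    """Difficulty index per item: proportion of examinees answering correctly."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def cronbach_alpha(rows):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    k = len(rows[0])
    item_vars = [pvariance([row[j] for row in rows]) for j in range(k)]
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def kr20(rows):
    """KR-20: the special case of alpha for 0/1 items, using p*(1-p) as item variance."""
    k = len(rows[0])
    pq = sum(p * (1 - p) for p in item_difficulty(rows))
    total_var = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - pq / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def split_half(rows):
    """Odd-even split-half reliability with the Spearman-Brown correction 2r/(1+r)."""
    first = [sum(row[0::2]) for row in rows]   # items 1, 3, ...
    second = [sum(row[1::2]) for row in rows]  # items 2, 4, ...
    r = pearson_r(first, second)
    return 2 * r / (1 + r)


print("difficulty:", item_difficulty(responses))
print("KR-20:", round(kr20(responses), 3))
print("alpha:", round(cronbach_alpha(responses), 3))
print("split-half:", round(split_half(responses), 3))
```

For 0/1 items scored this way, KR-20 and Cronbach's α coincide, because the population variance of a dichotomous item equals p(1 − p). The odd-even split is only one of several possible ways to halve a test, and the studies above may have used other splits or sample-based variances.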
