Similar literature
 20 similar documents found; search took 31 ms
1.
This study analyzed the relationship between benchmark scores from two curriculum‐based measurement probes in mathematics (M‐CBM) and student performance on a state‐mandated high‐stakes test. Participants were 298 students enrolled in grades 7 and 8 in a rural southeastern school. Specifically, we calculated the criterion‐related and predictive validity of benchmark scores from CBM probes measuring math computation and math reasoning skills. Results of this study suggest that math reasoning probes have strong concurrent and predictive validity. The study also provides evidence that calculation skills, while important, do not have strong predictive strength at the secondary level when a state math assessment is the criterion. When reading comprehension skill is taken into account, math reasoning scores explained the greatest amount of variance in the criterion measure. Computation scores explained less than 5% of the variance in the high‐stakes test, suggesting that computation probes may have limitations as a universal screening measure for secondary students.

2.
Grades and Test Scores: Accounting for Observed Differences   (cited 1 time in total: 0 self-citations, 1 citation by others)
Why do grades and test scores often differ? A framework of possible differences is proposed in this article. An approximation of the framework was tested with data on 8,454 high school seniors from the National Education Longitudinal Study. Individual and group differences in grade versus test performance were substantially reduced by focusing the two measures on similar academic subjects, correcting for grading variations and unreliability, and adding teacher ratings and other information about students. Concurrent prediction of high school average was thus increased from 0.62 to 0.90; differential prediction in eight subgroups was reduced to 0.02 letter‐grades. Grading variation was a major source of discrepancy between grades and test scores. Other major sources were teacher ratings and Scholastic Engagement, a promising organizing principle for understanding student achievement. Engagement was defined by three types of observable behavior: employing school skills, demonstrating initiative, and avoiding competing activities. While groups varied in average achievement, group performance was generally similar on grades and tests. Major factors in achievement were similarly constituted and similarly related from group to group. Differences between grades and tests give these measures complementary strengths in high‐stakes assessment. If artifactual differences between the two measures are not corrected, common statistical estimates of validity and fairness are unduly conservative.

3.
Brennan noted that users of test scores often want (indeed, demand) that subscores be reported, along with total test scores, for diagnostic purposes. Haberman suggested a method based on classical test theory (CTT) to determine if subscores have added value over the total score. One way to interpret the method is that a subscore has added value only if it agrees better than the total score does with the corresponding subscore on a parallel form. The focus of this article is on classification of the examinees into “pass” and “fail” (or master and nonmaster) categories based on subscores. A new CTT‐based method is suggested to assess whether classification based on a subscore agrees better than classification based on the total score does with classification based on the corresponding subscore on a parallel form. The method can be considered as an assessment of the added value of subscores with respect to classification. The suggested method is applied to data from several operational tests. The added value of subscores with respect to classification is found to be very similar, except at extreme cutscores, to their added value from a value‐added analysis of Haberman.

4.
As states evaluate whether they should continue with their current assessment program or adopt next-generation college readiness assessments, it is important to ascertain the degree to which current high school assessments can be used for college readiness interpretations. In this study, we examined the ability of a state assessment to serve as an indicator of college readiness. Empirical evidence is presented summarizing relationships between performance on the standards-based high school assessment and performance in college. Benchmarks were set on the Reading, Mathematics, and Science tests by linking assessment scores directly to grades in college courses. The accuracy of the benchmarks was similar to that of a traditional college admission test. Students who met the college readiness benchmarks earned higher grades in general education college courses and had higher first-year college grade point averages. Implications for states and other stakeholders are discussed.

5.
Many educational testing programs report examinee performance at more than two levels of proficiency. Whether these assessments have the capacity to support these multiple inferences, though, is a topic that has not been widely discussed. This study proposes a method for evaluating the minimum number of measurement opportunities for reporting students' performance at multiple achievement levels and describes an application of the method for reading and mathematics assessments that are used by some school districts in Nebraska. Analyses were based on judgments collected from 110 teachers about characteristics of items and tasks from multiple assessments in reading and mathematics at grades 4 and 8, and in high school. Results suggested that there were generally enough items on the mathematics assessments to classify students into two or three performance levels, but rarely enough to make the four classifications that the state reported. Items on the reading assessments were generally distributed across the proficiency levels and tended to allow reporting for all four classification levels. These findings have implications for both practitioners and policymakers in how scores are interpreted.

6.
This study evaluated the classification accuracy of a second grade oral reading fluency curriculum‐based measure (R‐CBM) in predicting third grade state test performance. It also compared the long‐term classification accuracy of local and publisher‐recommended R‐CBM cut scores. Participants were 266 students who were divided into a calibration sample (n = 170) and two cross‐validation samples (n = 46; n = 50). Using calibration sample data, local fall, winter, and spring R‐CBM cut scores for predicting students’ state test performance were developed using three methods: discriminant analysis (DA), logistic regression (LR), and receiver operating characteristic curve analysis (ROC). The classification accuracy of local and publisher‐recommended cut scores was evaluated across subsamples. Only DA and ROC produced cut scores that maintained adequate sensitivity (≥.70) across cohorts; however, LR and publisher‐recommended scores had higher levels of specificity and overall correct classification. Implications for developing local cut scores are discussed.
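
The three accuracy measures named in this abstract (sensitivity, specificity, and overall correct classification) can be computed from a single cut score as in the minimal Python sketch below. It is an illustration only, not the study's procedure: the function name, the convention that a score below the cut flags a student as at risk, and the ten data points are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class ClassificationSummary:
    sensitivity: float      # share of students who failed and were flagged
    specificity: float      # share of students who passed and were not flagged
    overall_correct: float  # share of all students classified correctly


def summarize_cut_score(scores, failed_test, cut):
    """Flag scores below `cut` as at risk and compare flags with test outcomes.

    Assumption for this sketch: "positive" means flagged as at risk, and the
    true condition is failing the later state test.
    """
    tp = fp = tn = fn = 0
    for score, failed in zip(scores, failed_test):
        flagged = score < cut
        if flagged and failed:
            tp += 1
        elif flagged and not failed:
            fp += 1
        elif not flagged and failed:
            fn += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one failing and one passing student")
    return ClassificationSummary(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        overall_correct=(tp + tn) / (tp + fp + tn + fn),
    )


# Hypothetical fluency scores and state test outcomes, for illustration only.
scores = [30, 42, 55, 61, 70, 78, 85, 90, 96, 104]
failed_test = [True, True, True, False, True, False, False, False, False, False]

summary = summarize_cut_score(scores, failed_test, cut=72)
print(summary)
```

Under this convention, a cut score meets the abstract's adequacy threshold when `summary.sensitivity` is at least 0.70. Raising the cut generally increases sensitivity at the cost of specificity, which is the trade-off the abstract reports between the DA/ROC and LR/publisher cut scores.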

7.
Test scores and high school grades were correlated with end-of-second-year college grades for 2,707 students in 12 curricular groups at 27 2-year colleges. Optimally weighted combinations of the aptitude/achievement predictors were found to predict grades for occupational curricula with about the same accuracy as they predict grades in academic criteria. For women, however, the aptitude test scores correlated much less with grades in occupational curricula than with grades in academic curricula.

8.
The mathematics achievement of a cohort of 955 students in 42 classes in six schools in London was followed over a 4‐year period, until they took their General Certificate of Secondary Education examinations (GCSEs) in the summer of 2000. All six schools were regarded by the Office for Standards in Education (Ofsted) as providing a good standard of education, and all were involved in teacher training partnerships with universities. Matched data on Key Stage 3 test scores and GCSE grades were available for 709 students, and these data were analysed in terms of the progress from Key Stage 3 test scores to GCSE grades. Although there were wide differences between schools in terms of overall GCSE grades, the average progress made by students was similar in all six schools. However, within each school, the progress made during Key Stage 4 varied greatly from set to set. Comparing students with the same Key Stage 3 scores, students placed in top sets averaged nearly half a GCSE grade higher than those in the other upper sets, who in turn averaged a third of a grade higher than those in lower sets, who in turn averaged around a third of a grade higher than those students placed in bottom sets. In the four schools that used formal whole‐class teaching, the difference in GCSE grades between top and bottom sets, taking Key Stage 3 scores into account, ranged from just over one grade at GCSE to nearly three grades. At the schools using small‐group and individualized teaching, the differences in value‐added between sets were not significant. In two of the schools, a significant proportion of working‐class students were placed into lower sets than would be indicated by their Key Stage 3 test scores.

9.
In this study we examined the benefits of computer programs designed to supplement regular reading instruction in an urban public school system. The programs provide systematic exercises for mastering word‐attack strategies. Our findings indicate that first graders who participated in the programs made significant reading gains over the school year. Their post‐test scores were slightly (but not significantly) greater than the post‐test scores of control children who received regular reading instruction without the programs. When analyses were restricted to low‐performing children eligible for Title I services, significantly higher post‐test scores were obtained by the treatment group compared to the control group. At post‐test, Title I children in the treatment group performed at levels similar to non‐Title I students.

10.
We examined summary indices of high school performance (coursework, grades, and test scores) based on the graded response model (GRM). The indices varied by inclusion of ACT test scores and whether high school courses were constrained to have the same difficulty and discrimination across groups of schools. The indices were examined with respect to skewness, incremental prediction of college degree attainment, and differences across racial/ethnic and socioeconomic subgroups. The most difficult high school courses to earn an “A” grade included calculus, chemistry, trigonometry, other advanced math, physics, algebra 2, and geometry. The GRM‐based indices were less skewed than simple high school grade point average (HSGPA) and had higher correlations with ACT Composite score. The index that included ACT test scores and allowed item parameters to vary by school group was most predictive of college degree attainment, but had larger subgroup differences. Implications for implementing multiple measure models for college readiness are discussed.

11.
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern‐level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet been constructed. This study puts forward a simple approach to estimating the indices at both the attribute and the pattern level through one single test administration. Detailed elaboration is made on how the upper and lower bounds for the attribute‐level accuracy can be derived from the variance of error of the attribute mastery probability estimate. In addition, based on Cui's pattern‐level indices, an alternative approach to estimating the attribute‐level indices is also proposed. Comparative analysis of simulation results indicates that the new indices are very desirable for evaluating test‐retest consistency and correct classification rate.

12.
Performance assessments, scenario‐based tasks, and other groups of items carry a risk of violating the local item independence assumption made by unidimensional item response theory (IRT) models. Previous studies have identified negative impacts of ignoring such violations, most notably inflated reliability estimates. Still, the influence of this violation on examinee ability estimates has been comparatively neglected. It is known that such item dependencies cause low‐ability examinees to have their scores overestimated and high‐ability examinees' scores underestimated. However, the impact of these biases on examinee classification decisions has been little examined. In addition, because the influence of these dependencies varies along the underlying ability continuum, whether or not the location of the cut‐point is important in regard to correct classifications remains unanswered. This simulation study demonstrates that the strength of item dependencies and the location of an examination system’s cut‐points both influence the accuracy (i.e., the sensitivity and specificity) of examinee classifications. Practical implications of these results are discussed in terms of false positive and false negative classifications of test takers.

13.
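The back-propagation (BP) neural network is one of the most widely used artificial neural network models. Because it performs well in classification and recognition, researchers have applied it to cognitive diagnostic assessment to classify examinees diagnostically. Through a simulation study, this paper examines how five factors (number of attributes, attribute hierarchy, test length, item quality, and test sample size) affect the classification accuracy of BP neural networks in cognitive diagnosis. The results show that: (1) the classification accuracy of BP-network-based cognitive diagnosis does not depend on test sample size; (2) item quality and test length have significant positive effects on the diagnostic accuracy of the BP neural network; (3) the number of attributes has a negative effect on its classification accuracy; and (4) item quality affects, to some extent, the classification accuracy of the BP diagnostic method across different attribute hierarchy structures.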

14.
The purpose of this study was to examine whether using a multiple‐measure framework yielded better classification accuracy than oral reading fluency (ORF) or maze alone in predicting pass/fail rates for middle‐school students on a large‐scale reading assessment. Participants were 178 students in Grades 7 and 8 from a Midwestern school district. The multiple‐measure framework yielded classification accuracy rates that were either similar to, or better than, the individual predictors. Specificity was improved using a combined measure of ORF and maze versus individual predictors alone. Educational implications for identifying students in need of reading intervention are discussed.

15.
WISC and WISC-R test results were correlated with achievement test scores and school grades of 36 children who had completed two years of school. Global intelligence estimates from both scales correlated at significant levels with all achievement test measures. Individual subtests from the two scales were unevenly correlated with grades in specific school subjects over both school years. Data suggest that while the two scales may be grossly equivalent as global predictors of school achievement, the individual subtests from the two scales may not correlate equivalently with specific external criteria such as school grades.

16.
Many U.S. students must pass a standards-based exit exam to earn a high school diploma. The degree to which exit exams and state standards properly signal to students their preparedness for postsecondary schooling has been questioned. The alignment of test scores with college grades for students at the University of Arizona (n = 2,667) who took the Arizona high school exams was ascertained in this study. The pass/fail signal accuracy of test scores varied depending on subject: The writing cut score was well aligned with collegiate performance, the reading cut score was below expectations, and the mathematics cut score was set quite rigorously. High school content and performance standards might not be as diluted as prior research has suggested.

17.
This article considers the relationship between gender and self‐efficacy in teacher trainees engaged in an electricity‐related design and construction task. Quantitative data (examination scores, task assessment, and questionnaire) and qualitative data (interviews and written student reflections) were collected. There is a gender bias in student teachers entering the University with more male than female students having done Science to grade 12 level. In addition, the continuing differential in standards of education in South African schools necessitated distinguishing those who had attended educationally advantaged from those who had attended educationally disadvantaged schools. In the examination, a test of theoretical knowledge, male students in each group outperformed female students. This we explain in terms of school background, gender responses from family members who regarded Science as a male domain, and the resulting lower self‐efficacy of female students. However, female students achieved as well as male students in the design and construction task. We argue that although males had better self‐efficacy levels than females at the outset, the hands‐on, individual nature of a task in a domain usually constructed as male led to female students developing increased levels of self‐efficacy, which ensured task performance matching that of the more knowledgeable male students.

18.
The academic efficiency and social justice of entry procedures at Oxford and Cambridge Universities are examined over the past quarter of a century. For each major subject the mean A‐level scores of males and females entering from state and independent schools are compared with mean final examination scores in the major subjects. In any comparison of state and independent cohorts of the same gender, within the bounds of normal statistical fluctuation, the difference in A‐level score is a good predictor of the difference in finals score. For example, when between state men and independent men the difference in A‐level score is zero, the difference between mean finals score is zero also. The origin of female under‐achievement is examined. In most subjects there is pronounced gender inequality due to the following chain of circumstances: (1) to break even in finals women require at entry better grades at Advanced Level than men; (2) women used to have much the better A‐levels and so, in finals a quarter of a century ago, they matched and even, in some subjects, surpassed the men; (3) the A‐levels of women entering Oxford and Cambridge Universities fell off during the 1970s; (4) today female A‐level scores are slightly worse than male A‐level scores, and so female finals scores are much worse, in most subjects, than male finals scores. The concept of an ideal subject is defined; this is a subject in which zero difference in A‐level score between male and female yields zero difference in finals score. Law at Cambridge and chemistry at Oxford are ideal subjects. Ideal subjects are rare at Oxbridge: most subjects exhibit a significant male lead in finals when male and female have equal A‐level scores.
The most non‐ideal subject at Oxford is mathematics, in which zero difference in A‐level score between males and females yields a male lead in finals score of 13%: at Oxford the other non‐ideal subjects are physics (male lead at equal A‐levels 11%), philosophy, politics and economics (9%), history (8%), modern languages (8%) and English (5%). An ideal subject is a paradigm which requires even‐handedness between male and female cohorts in the following parameters: (1) efficiency of course selection from school; (2) efficiency of teaching; (3) efficiency of finals assessment; (4) latent ability. A pronounced relative decline in the A‐level scores of girls educated in state maintained schools entering English and Welsh universities occurred in the 1970s; it is attributed to the reform of the state school system, particularly the growth in mixed‐sex comprehensive schools and the decline in the number of female single‐sex grammar schools. A peculiar aspect of the admissions filters at both Oxford and Cambridge ensures that state‐school educated men gaining entry do so with A‐level scores markedly superior to those of the other three cohorts.
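
The "ideal subject" definition in this abstract can be read as a statement about the intercept of a straight line relating the finals-score difference to the A‐level-score difference. The Python sketch below illustrates that reading. It is an assumption for illustration: the abstract does not say how its figures were estimated, and the function name and the five data points are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical cohort differences (male minus female), for illustration only.
a_level_diff = [-2.0, -1.0, 0.0, 1.0, 2.0]
finals_diff = [-1.0, 1.0, 3.0, 5.0, 7.0]

# The intercept is the finals-score difference expected at equal A-level scores.
intercept, slope = fit_line(a_level_diff, finals_diff)
print(f"lead at equal A-levels: {intercept:.1f}, slope: {slope:.1f}")
```

On this reading, an intercept close to zero corresponds to an ideal subject, while a positive intercept corresponds to a male lead in finals at equal A‐level scores.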

19.
Ensuring postsecondary readiness is a goal of K‐12 education, but it is unclear whether high school students should get different messages about the required levels of academic preparation depending on their postsecondary trajectories. This study estimated readiness benchmark scores on a college admissions test predictive of earning good grades in majors associated with middle‐skills occupations at 2‐year postsecondary institutions. Results generally indicated similarity between those scores, the corresponding scores for students preparing for high‐skills jobs requiring a bachelor's degree, and established readiness benchmarks for the general college‐going population. Subsequent analyses revealed small variation between readiness benchmarks for different college majors. Overall, results suggest that high school graduates need a strong academic foundation regardless of the postsecondary path they choose.

20.
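Guided by national macro-level policy and the goals of the new curriculum reform, the reform of Fujian Province's senior high school entrance examination system ran for ten years, from 2001 to 2010. Over that decade the reform was marked by a gradually increasing number of examination subjects, gradually rising marks, increasingly diverse test formats, and growing emphasis on the physical education examination. It was also accompanied by controversy over admissions corruption and by many problems that were treated at the level of symptoms rather than root causes, such as grade-band classification and bias in comprehensive quality evaluation. Actively exploring a third-party body to supervise admissions, increasing admissions transparency, and borrowing methods such as the college entrance examination's "principal's real-name recommendation system" would therefore be of some value in reforming the senior high school entrance examination system.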

