Similar Documents
 20 similar documents found
1.
ABSTRACT

This study investigates the role of automated scoring and feedback in supporting students’ construction of written scientific arguments while learning about factors that affect climate change in the classroom. The automated scoring and feedback technology was integrated into an online module. Students’ written scientific argumentation occurred when they responded to structured argumentation prompts. After submitting the open-ended responses, students received scores generated by a scoring engine and written feedback associated with the scores in real time. Using the log data that recorded argumentation scores as well as argument submission and revision activities, we answer three research questions: first, how students behaved after receiving the feedback; second, whether and how students’ revisions improved their argumentation scores; and third, whether item difficulties shifted with the availability of the automated feedback. Results showed that the majority of students (77%) made revisions after receiving the feedback, and students with higher initial scores were more likely to revise their responses. Students who revised had significantly higher final scores than those who did not, and each revision was associated with an average increase of 0.55 in the final score. Analysis of item difficulty shifts showed that written scientific argumentation became easier after students used the automated feedback.
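A minimal sketch (not the authors' analysis) of how a per-revision score gain of the kind reported above could be estimated from submission logs; the column names, data, and regression form are illustrative assumptions only.

```python
# Illustrative sketch: estimating the average score gain per revision from
# hypothetical submission-log data. Column names and model form are assumed.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical log: one row per student, with the initial score, the number
# of revisions made after feedback, and the final argumentation score.
log = pd.DataFrame({
    "initial_score": [2, 3, 1, 4, 2, 3],
    "n_revisions":   [1, 2, 0, 3, 1, 0],
    "final_score":   [3, 4, 1, 6, 3, 3],
})

# Regress the final score on revision count, controlling for the initial
# score; the n_revisions coefficient plays the role of the gain per revision.
model = smf.ols("final_score ~ n_revisions + initial_score", data=log).fit()
print(model.params["n_revisions"])
```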

2.
ABSTRACT

In the current study, two pools of 250 essays, all written in response to the same prompt, were rated by two groups of raters (14 or 15 raters per group), thereby providing an approximation to each essay’s true score. An automated essay scoring (AES) system was trained on the datasets and then scored the essays using a cross-validation scheme. By eliminating one, two, or three raters at a time and estimating the true scores from the remaining raters, an independent criterion was produced against which to judge the validity of the human raters and of the AES system, as well as the interrater reliability. The results of the study indicated that the automated scores correlate with human scores to the same degree as human raters correlate with each other. However, the findings regarding the validity of the ratings support a claim that the reliability and validity of AES diverge: although the AES scoring is, naturally, more consistent than the human ratings, it is less valid.
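A hedged sketch of the general rater-elimination idea described above: hold out a rater, approximate each essay's true score from the remaining raters, and compare how well the held-out human rater and an AES score recover that criterion. The data here are simulated; this is not the study's code or dataset.

```python
# Illustrative sketch: leave-one-rater-out "true score" criterion and
# validity correlations for a held-out human rater versus an AES score.
import numpy as np

rng = np.random.default_rng(0)
n_essays, n_raters = 250, 15
human = rng.normal(3.0, 1.0, size=(n_essays, n_raters))       # simulated ratings
aes = human.mean(axis=1) + rng.normal(0, 0.3, size=n_essays)   # simulated AES scores

held_out = 0
criterion = np.delete(human, held_out, axis=1).mean(axis=1)    # remaining raters

r_human = np.corrcoef(human[:, held_out], criterion)[0, 1]
r_aes = np.corrcoef(aes, criterion)[0, 1]
print(f"held-out rater vs criterion: {r_human:.2f}, AES vs criterion: {r_aes:.2f}")
```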

3.
Cindy L. James. Assessing Writing, 2006, 11(3): 167–178
How do scores from writing samples generated by computerized essay scorers compare to those generated by “untrained” human scorers, and what combination of scores, if any, is more accurate at placing students in composition courses? This study endeavored to answer this two-part question by evaluating the correspondence between writing sample scores generated by the IntelliMetric™ automated scoring system and scores generated by University Preparation English faculty, as well as examining the predictive validity of both the automated and human scores. The results revealed significant correlations between the faculty scores and the IntelliMetric™ scores of the ACCUPLACER OnLine WritePlacer Plus test. Moreover, logistic regression models that utilized both the IntelliMetric™ scores and the average faculty scores were more accurate at placing students (77% overall correct placement rate) than were models incorporating only the average faculty score or only the IntelliMetric™ scores.
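A minimal sketch of a placement model of the kind described above, combining an automated score with an average faculty score to predict a binary placement decision. The features, data, and threshold are hypothetical; this is not the study's actual model.

```python
# Illustrative sketch: logistic regression placement model using an
# automated score plus an average faculty score. Data are made up.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: [automated score, average faculty score] per student,
# with a binary placement outcome (1 = place into the standard course).
X = np.array([[6, 3.5], [4, 2.0], [8, 4.5], [5, 2.5], [7, 4.0], [3, 1.5]])
y = np.array([1, 0, 1, 0, 1, 0])

model = LogisticRegression().fit(X, y)
accuracy = model.score(X, y)   # proportion of correct placements
print(accuracy)
```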

4.
Abstract

Students in a college course were given written criteria, divided into teams, and asked to score their own essay examination. Their pooled ratings correlated .922 with the instructor's ratings. Agreement between the ratings of students and instructor was not related to grade point or total test score. However, grade point and test scores were related negatively to the ambiguity of the students’ answers on the examination. The results support the generalization that subjective scoring standards are readily communicable. Theoretical and practical implications are examined.

5.
Assessing Writing, 2008, 13(2): 80–92
The scoring of student essays by computer has generated much debate and subsequent research. The majority of the research thus far has focused on validating the automated scoring tools by comparing the electronic scores to human scores of writing or other measures of writing skills, and on exploring the predictive validity of the automated scores. However, very little research has investigated possible effects of the essay prompts. This study endeavoured to do so by exploring test scores for three different prompts for the ACCUPLACER® WritePlacer® Plus test, which is scored by the IntelliMetric® automated scoring system. The results indicated that there was no significant difference among the prompts overall, among males, between males and females, by native language, or in comparison to scores generated by human raters. However, there was a significant difference in mean scores by topic for females.
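A small illustration of the kind of prompt-effect comparison described above, using a one-way ANOVA across prompts. The abstract does not specify the statistical test used, and the data below are invented for demonstration.

```python
# Illustrative sketch: testing for a prompt effect on essay scores with a
# one-way ANOVA. Scores are hypothetical.
from scipy import stats

prompt_a = [7, 6, 8, 7, 6]
prompt_b = [6, 7, 7, 8, 6]
prompt_c = [8, 7, 6, 7, 7]

f_stat, p_value = stats.f_oneway(prompt_a, prompt_b, prompt_c)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```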

6.
This study investigated the relationship between middle school students’ scores for a written assignment (N = 162) and a process that involved students in generating criteria and self‐assessing with a rubric. Gender, time spent writing, grade level, prior rubric use, and previous achievement in English were also examined. The treatment involved using a model essay to scaffold the process of generating a list of criteria for an effective essay, reviewing a written rubric, and using the rubric to self‐assess first drafts. The comparison condition involved generating a list of criteria and reviewing first drafts. Findings include a main effect of treatment, gender, grade level, writing time, and previous achievement on total essay scores, as well as main effects on scores for every criterion on the scoring rubric. The results suggested that reading a model, generating criteria, and using a rubric to self‐assess can help middle school students produce more effective writing.

7.
8.
ABSTRACT

As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation. The purpose of this study was to assess the validity, labor costs, and efficiency of comparative judgment as a potential substitute for rubric scoring. An analysis of two essay prompts revealed that comparative judgment measures agreed with rubric scores at a level similar to that expected of two professional scorers. The comparative judgment measures also correlated slightly more strongly than rubric scores with a multiple-choice writing test. Score reliability exceeding .80 was achieved with approximately nine judgments per response. The average judgment time was 94 seconds, which compared favorably to 119 seconds per rubric score. Practical challenges to future implementation are discussed.
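One common way to aggregate pairwise "which essay is better" judgments into scale values is a Bradley-Terry model. The abstract does not name the specific model used in the study, so the sketch below is a generic illustration with invented judgment counts.

```python
# Illustrative sketch: Bradley-Terry strengths from pairwise judgments,
# fitted with the standard MM (Zermelo) iteration. Data are made up.
import numpy as np

n_essays = 4
# wins[i, j] = number of judgments preferring essay i over essay j
wins = np.array([[0, 3, 2, 4],
                 [1, 0, 2, 3],
                 [2, 2, 0, 3],
                 [0, 1, 1, 0]], dtype=float)

strength = np.ones(n_essays)
for _ in range(100):                       # simple MM iterations
    for i in range(n_essays):
        num = wins[i].sum()                # total wins for essay i
        denom = sum((wins[i, j] + wins[j, i]) / (strength[i] + strength[j])
                    for j in range(n_essays) if j != i)
        strength[i] = num / denom
    strength /= strength.sum()             # fix the scale

print(np.log(strength))                    # logit-scale essay measures
```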

9.
This article presents a novel method, the Complex Dynamics Essay Scorer (CDES), for automated essay scoring using complex network features. Texts produced by college students in China were represented as scale‐free networks (e.g., a word adjacency model) from which typical network features, such as in‐/out‐degrees, the clustering coefficient (CC), and dynamic network measures, were obtained. The CDES integrates the classical concepts of network feature representation and essay score series variation. Several experiments indicated that the network features distinguish essays of different quality and that complex networks can be developed for autoscoring tasks. The average agreement between the CDES and human rater scores was 86.5%, and the average Pearson correlation was .77. The results indicate that the CDES produced functional complex systems and autoscored Chinese essays in a manner consistent with human raters. Our research suggests potential applications in other areas of educational assessment.
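A minimal sketch of building a word-adjacency network from a text and extracting features of the kind named above (in-/out-degree, clustering coefficient). The whitespace tokenization and the toy English sentence are simplifications; the CDES itself and its handling of Chinese text are not reproduced here.

```python
# Illustrative sketch: word-adjacency network features with networkx.
import networkx as nx

text = "the model builds a network and the network encodes the text"
tokens = text.split()

g = nx.DiGraph()
for a, b in zip(tokens, tokens[1:]):   # one directed edge per adjacent word pair
    g.add_edge(a, b)

in_degrees = dict(g.in_degree())
out_degrees = dict(g.out_degree())
clustering = nx.average_clustering(g.to_undirected())
print(in_degrees, out_degrees, clustering)
```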

10.
11.
A framework for the evaluation and use of automated scoring of constructed‐response tasks is provided that entails both evaluation of automated scoring and guidelines for implementation and maintenance in the context of constantly evolving technologies. Validity issues and challenges associated with automated scoring are discussed within the framework. The fit between the scoring capability and the assessment purpose, the agreement between human and automated scores, the consideration of associations with independent measures, the generalizability of automated scores as implemented in operational practice across different tasks and test forms, and the impact and consequences for the population and subgroups are proffered as integral evidence supporting use of automated scoring. Specific evaluation guidelines are provided for using automated scoring to complement human scoring for tests used for high‐stakes purposes. These guidelines are intended to generalize to new automated scoring systems and to existing systems as they change over time.
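The human-automated agreement evidence mentioned above is often summarized with a quadratically weighted kappa, among other statistics; the framework itself does not prescribe a specific statistic, so the following is only an illustration with made-up score vectors.

```python
# Illustrative sketch: quadratically weighted kappa between human and
# engine scores as one agreement statistic. Scores are hypothetical.
from sklearn.metrics import cohen_kappa_score

human_scores = [3, 2, 4, 3, 1, 4, 2, 3]
engine_scores = [3, 2, 3, 3, 2, 4, 2, 4]

qwk = cohen_kappa_score(human_scores, engine_scores, weights="quadratic")
print(f"quadratic weighted kappa = {qwk:.2f}")
```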

12.
'Mental models' used by automated scoring for the simulation divisions of the computerized Architect Registration Examination are contrasted with those used by experienced human graders. Candidate solutions (N = 3613) received both automated and human holistic scores. Quantitative analyses suggest high correspondence between automated and human scores, implying that similar mental models are implemented. Solutions with discrepancies between automated and human scores were selected for qualitative analysis. The human graders were reconvened to review the human scores and to investigate the source of score discrepancies in light of rationales provided by the automated scoring process. After review, slightly more than half of the score discrepancies were reduced or eliminated. Six sources of discrepancy between original human scores and automated scores were identified: subjective criteria, objective criteria, tolerances/weighting, details, examinee task interpretation, and unjustified discrepancies. The tendency of the human graders to be persuaded by automated score rationales varied with the nature of the original score discrepancy. We determine that, while the automated scores are based on a mental model consistent with that of expert graders, there remain some important differences, both intentional and incidental, which distinguish human from automated scoring. We conclude that automated scoring has the potential to enhance the validity evidence of scores in addition to improving efficiency.

13.
By far, the most frequently used method of validating (the interpretation and use of) automated essay scores has been to compare them with scores awarded by human raters. Although this practice is questionable, human-machine agreement is still often regarded as the “gold standard.” Our objective was to refine this model and apply it to data from a major testing program and one system of automated essay scoring. The refinement capitalizes on the fact that essay raters differ in numerous ways (e.g., training and experience), any of which may affect the quality of ratings. We found that automated scores exhibited different correlations with scores awarded by experienced raters (a more compelling criterion) than with those awarded by untrained raters (a less compelling criterion). The results suggest potential for a refined machine-human agreement model that differentiates raters with respect to experience, expertise, and possibly even more salient characteristics.

14.
Scientific argumentation is one of the core practices for teachers to implement in science classrooms. We developed a computer-based formative assessment to support students’ construction and revision of scientific arguments. The assessment is built upon automated scoring of students’ arguments and provides feedback to students and teachers. Preliminary validity evidence was collected in this study to support the use of automated scoring in this formative assessment. The results showed satisfactory psychometric properties for the formative assessment. The automated scores showed satisfactory agreement with human scores, although small discrepancies remained. Automated scores and feedback encouraged students to revise their answers, and students’ scientific argumentation skills improved during the revision process. These findings provide preliminary evidence supporting the use of automated scoring in the formative assessment to diagnose and enhance students’ argumentation skills in the context of climate change in secondary school science classrooms.

15.
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows scores to be returned faster at lower cost. In the module, they discuss automated scoring from a number of perspectives. First, they discuss benefits and weaknesses of automated scoring, and what psychometricians should know about automated scoring. Next, they describe the overall process of automated scoring, moving from data collection to engine training to operational scoring. Then, they describe how automated scoring systems work, including the basic functions around score prediction as well as other flagging methods. Finally, they conclude with a discussion of the specific validity demands around automated scoring and how they align with the larger validity demands around test scores. Two data activities are provided. The first is an interactive activity that allows the user to train and evaluate a simple automated scoring engine. The second is a worked example that examines the impact of rater error on test scores. The digital module contains a link to an interactive web application as well as its R-Shiny code, diagnostic quiz questions, activities, curated resources, and a glossary.
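A toy sketch of the "engine training" step described in the module: regress human scores on bag-of-words features and predict a score for a new response. This is not the module's interactive engine or its R-Shiny code; the responses, scores, and model choice below are assumptions for illustration.

```python
# Illustrative sketch: a minimal scoring engine trained on human-scored
# responses (TF-IDF features + ridge regression). Data are made up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

responses = [
    "the greenhouse effect traps heat in the atmosphere",
    "heat is trapped by gases so the planet warms",
    "the planet is big",
    "climate changes because greenhouse gases trap outgoing heat",
]
human_scores = [3, 3, 1, 4]

engine = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
engine.fit(responses, human_scores)                      # "engine training"
print(engine.predict(["gases trap heat and warm the planet"]))
```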

16.
This study investigated the use of automated essay scoring (AES) to identify at-risk students enrolled in a first-year university writing course. An application of AES, the Criterion® Online Writing Evaluation Service was evaluated through a methodology focusing on construct modelling, response processes, disaggregation, extrapolation, generalization, and consequence. Based on the results of our two-year study with students (N = 1,482) at a public technological research university in the United States, we found that Criterion offered a defined writing construct congruent with established models, achieved acceptance among students and instructors, showed no statistically significant differences between ethnicity groups of sufficient sample size, correlated at acceptable levels with other writing measures, performed in a stable fashion, and enabled instructors to identify at-risk students to increase their course success.

17.
The technical strengths of automated essay scoring systems provide a good platform for innovative reform of English writing instruction. This study designed and put into practice an English writing instruction model based on an automated essay scoring system, covering the design of a pre-writing stage, a first-draft and peer-review stage, a revision and automated-scoring stage, and an in-class commentary and final-draft stage. A one-year writing instruction experiment showed that the new model prompts students to write and to maintain a regular writing frequency, stimulates their interest in writing, cultivates autonomous writing ability, and improves their English writing proficiency.

18.

This study examines the internal consistency of Novak and Gowin's scoring scheme and its effect on the predictive validity of concept mapping as an alternative science classroom achievement assessment. Data were collected in three typical situations: very limited concept mapping experience with free‐style concept mapping; some concept mapping experience with questions provided; and extensive concept mapping experience with a list of concepts provided. It was found that Novak's scoring scheme was not internally consistent, and therefore there was generally no significant correlation between students’ scores on concept mapping and their scores on conventional classroom achievement assessments. The need for a new scoring scheme when concept mapping is used as an alternative science assessment is discussed.

19.
Abstract

This study was designed to ascertain the prevalence of written output deficits in young gifted children, to delineate the relationship between written output performance and reading performance, and to identify possible mechanisms for specific written output deficits in such children. Data from a sample of children scoring >120 on at least one IQ or achievement subscale indicated: (1) that there was a significant incidence of discrepancies between written spelling scores and reading (decoding) scores, as compared to the population; (2) that performance on spelling tasks was more subject to a maturational timetable than decoding was; (3) that performance on spelling tasks is less amenable than performance on decoding tasks to compensatory enhancement by higher level processing, and involves a sequential processing module that is shared with calculation but not with decoding; and (4) that strengths in visual‐spatial tasks may interact with relative weaknesses in both decoding and calculation tasks to predict even poorer performance on written spelling tasks.

20.
Abstract

Assessment feedback from teachers gains consistently low satisfaction scores in national surveys of student satisfaction, with concern surrounding its timeliness, quality and effectiveness. Equally, there has been heightened interest in the responsibility of learners in engaging with feedback and in how student assessment literacy might be increased. We present results from a five-year longitudinal mixed-methods enquiry, thematically analysing semi-structured interviews and focus groups with undergraduate students who experienced dialogic feed-forward on a course in a British university. We use inferential statistics to compare performance before and after the assessment intervention. The assessment consisted of submitting a draft coursework essay, which was discussed and evaluated face-to-face with the course teacher before a self-reflective piece was written about the assessment process and a final essay was submitted for summative grading. We evidence that this process exerted a positive influence on the student learning experience in a number of inter-related cognitive and affective ways, impacting positively upon learning behaviour, supporting student achievement and raising student satisfaction with feedback. We advocate a cyclic and iterative approach to dialogic feed-forward, which facilitates learners’ longitudinal development. Programme teams should offer systematic opportunities across curricula for students to understand the rationale for feedback and to develop feedback literacy.
