Similar literature
 Found 20 similar documents; search took 640 ms
1.
《Assessing Writing》2008,13(2):80-92
The scoring of student essays by computer has generated much debate and subsequent research. The majority of the research thus far has focused on validating the automated scoring tools by comparing the electronic scores to human scores of writing or other measures of writing skills, and exploring the predictive validity of the automated scores. However, very little research has investigated possible effects of the essay prompts. This study endeavoured to do so by exploring test scores for three different prompts for the ACCUPLACER® WritePlacer® Plus test, which is scored by the IntelliMetric® automated scoring system. The results indicated that there was no significant difference among the prompts overall, among males, between males and females, by native language, or in comparison to scores generated by human raters. However, there was a significant difference in mean scores by topic for females.

2.
This study investigated the use of automated essay scoring (AES) to identify at-risk students enrolled in a first-year university writing course. An application of AES, the Criterion® Online Writing Evaluation Service, was evaluated through a methodology focusing on construct modelling, response processes, disaggregation, extrapolation, generalization, and consequence. Based on the results of our two-year study with students (N = 1,482) at a public technological research university in the United States, we found that Criterion offered a defined writing construct congruent with established models, achieved acceptance among students and instructors, showed no statistically significant differences between ethnicity groups of sufficient sample size, correlated at acceptable levels with other writing measures, performed in a stable fashion, and enabled instructors to identify at-risk students to increase their course success.

3.
This study investigates the role of automated scoring and feedback in supporting students’ construction of written scientific arguments while learning about factors that affect climate change in the classroom. The automated scoring and feedback technology was integrated into an online module. Students’ written scientific argumentation occurred when they responded to structured argumentation prompts. After submitting the open-ended responses, students received scores generated by a scoring engine and written feedback associated with the scores in real time. Using the log data that recorded argumentation scores as well as argument submission and revision activities, we answer three research questions: first, how students behaved after receiving the feedback; second, whether and how students’ revisions improved their argumentation scores; and third, whether item difficulties shifted with the availability of the automated feedback. Results showed that the majority of students (77%) made revisions after receiving the feedback, and students with higher initial scores were more likely to revise their responses. Students who revised had significantly higher final scores than those who did not, and each revision was associated with an average increase of 0.55 on the final scores. Analysis of item difficulty shifts showed that written scientific argumentation became easier after students used the automated feedback.

4.
Using generalizability (G-) theory, this study examined the accuracy and validity of the writing scores assigned to secondary school ESL students in the provincial English examinations in Canada. The major research question that guided this study was: Are there any differences between the accuracy and construct validity of the analytic scores assigned to ESL students and to NE students for the provincial English writing examination across three years? A series of G-studies and decision (D-) studies for three years were conducted to examine accuracy and validity issues. Results showed that differences in score accuracy did exist between ESL and NE students when initial (pre-adjudication) scores were used. The observed G-coefficients for ESL students were significantly lower than those for NE students in all three years, indicating less accuracy and more error in the writing scores assigned to ESL students. Further, there was significantly less convergent validity in one year and less discriminant validity in all three years in the writing scores assigned to ESL students than in those assigned to NE students. These findings raise a potential question about the presence of bias in the assessment of ESL students’ writing if initial scores were used.

5.
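Evil in its sharpest form finds expression in literature. The novels Lolita and Dark Places both use first-person narration to communicate candidly a strict moral understanding of evil. The protagonists of both are involved in father-daughter incest and rape, and both novels explore a still-unfamiliar aesthetics of evil within eccentric, ugly spaces of perversion and pathology, while expressing their authors' different creative themes. Nabokov explores the complexity of human nature in the world of ethics and morality through his depiction of the deformed love affair between Humbert and Lolita; Grenville's creation of the evil figure of Albion demonstrates the physical and psychological harm that the hegemony of patriarchal discourse inflicts on both men and women.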

6.
Promoting young children’s interpersonal safety knowledge, intentions, confidence, and skills is the goal of many child maltreatment prevention programs; however, evaluation of their effectiveness has been limited. In this study, a randomized controlled trial was conducted examining the effectiveness of the Australian protective behaviors program, Learn to be safe with Emmy and friends™, compared to a waitlist condition. In total, 611 Australian children in Grade 1 (5–7 years; 50% male) participated, with assessments at Pre-intervention, Post-intervention and a 6-month follow-up. This study also included a novel assessment of interpersonal safety skills through the Observed Protective Behaviors Test (OPBT). Analyses showed participating in Learn to be safe with Emmy and friends™ was effective post-program in improving interpersonal safety knowledge (child and parent-rated) and parent-rated interpersonal safety skills. These benefits were retained at the 6-month follow-up, with participating children also reporting increased disclosure confidence. However, Learn to be safe with Emmy and friends™ participation did not significantly impact children’s disclosure intentions, safety identification skills, or interpersonal safety skills as measured by the OPBT. Future research may seek to evaluate the effect of further parent and teacher integration into training methods and increased use of behavioral rehearsal and modelling to more effectively target specific disclosure intentions and skills.

7.
The effect of changing item responses on scores of elementary school children on a standardized achievement test was studied. Previous research, primarily involving non-standardized instruments and adult samples, indicates that changed responses are more likely to be correct than not. Subjects were 165 third grade students using the Metropolitan Reading Tests. Students received no special instructions regarding changing responses. Changes were identified visually and were independently verified. While frequency of response changes was low, such changes generally improved scores. Sex differences in number and success of changes were non-significant. The relationship between frequency of response change and test score was minimal. Responses to difficult items were changed more frequently with less success than changes on easy items. High scorers made more successful changes than did low scorers. Within the limits of the methodology, results clearly indicated that response changes of elementary students on multiple-choice items tend to improve test scores.

8.
Accountability in higher education has increased, with more institutions requiring standardized tests. These tests are high stakes for institutions but low stakes for students, who seldom experience consequences for their performance. This study describes how one psychology department improved students' scores on the Psychology Area Concentration Achievement Test. Results were compared between three motivation conditions: no incentive, a monetary incentive, and a motivational Microsoft PowerPoint presentation. The presentation gave students information about the assessment, encouraged them to do well, and informed them that faculty would discuss scores while evaluating the psychology program. Results showed that test scores were significantly higher and correlated significantly with grade point average for students exposed to the motivational presentation. The motivational PowerPoint presentation seemed to have reduced the number of underachieving students and provided more accurate assessment data, with minimal investment in time and effort on the part of faculty.

9.
In this paper, I describe the design and evaluation of automated essay scoring (AES) models for an institution's writing placement program. Information was gathered on admitted student writing performance at a science and technology research university in the northeastern United States. Under timed conditions, first-year students (N = 879) were assigned to write essays on two persuasive prompts within the Criterion® Online Writing Evaluation Service at the beginning of the semester. AES models were built and evaluated for a total of four prompts. AES models meeting recommended performance criteria were then compared to standardized admissions measures and locally developed writing measures. Results suggest that there is evidence to support the use of Criterion as part of the placement process at the institution.

10.
Content‐based automated scoring has been applied in a variety of science domains. However, many prior applications involved simplified scoring rubrics without considering rubrics representing multiple levels of understanding. This study tested a concept‐based scoring tool for content‐based scoring, c‐rater™, for four science items with rubrics aiming to differentiate among multiple levels of understanding. The items showed moderate to good agreement with human scores. The findings suggest that automated scoring has the potential to score constructed‐response items with complex scoring rubrics, but in its current design cannot replace human raters. This article discusses sources of disagreement and factors that could potentially improve the accuracy of concept‐based automated scoring.

11.
Grading practices can send a powerful message to students about course expectations. A study by Henderson et al. (American Journal of Physics 72:164–169, 2004) in physics education has identified a misalignment between what college instructors say they value and their actual scoring of quantitative student solutions. This work identified three values that guide grading decisions: (1) a desire to see students’ reasoning, (2) a readiness to deduct points from solutions with obvious errors and a reluctance to deduct points from solutions that might be correct, and (3) a tendency to assume correct reasoning when solutions are ambiguous. These authors propose that when values are in conflict, the conflict is resolved by placing the burden of proof on either the instructor or the student. Here, we extend the results of the physics study to earth science (n = 7) and chemistry (n = 10) instructors in a think-aloud interview study. Our results suggest that both the previously identified three values and the misalignment between values and grading practices exist among science faculty more generally. Furthermore, we identified a fourth value not previously recognized. Although all of the faculty across both studies stated that they valued seeing student reasoning, the combined effect suggests that only 49% of faculty across the three disciplines graded work in such a way that would actually encourage students to show their reasoning, and 34% of instructors could be viewed as penalizing students for showing their work. This research may contribute toward a better alignment between values and practice in faculty development.

12.
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K–12 large-scale assessment. In this study, human and automated scores on essays written by college students with and without learning disabilities and/or attention deficit hyperactivity disorder were compared, using a nationwide (U.S.) sample of prospective graduate students taking the revised Graduate Record Examination. The findings are that, on average, human raters and the automated scoring engine assigned similar essay scores for all groups, despite average differences among groups with respect to essay length and spelling errors.

13.
We present some key findings of a four-year, two-phase writing assessment project at Central Michigan University: Phase One (2002), a survey of faculty members (n = 115) and subsequent focus groups (n = 14) and Phase Two (2005), an evaluation of two samples of student writing (n = 635 and 632). Major findings of Phase One reported here include the amounts and types of writing assigned by faculty members and their perceptions about the quality of their students’ writing. Phase Two revealed some surprising results about our students’ critical reading and writing abilities, confirmed the limitations of a timed-writing assessment methodology, and exposed an intriguing artifact of the data set. We reflect on the process of developing and conducting the assessment project, examine its strengths and weaknesses, and share our thoughts about the next phase of our assessment odyssey.

14.
《Assessing Writing》1998,5(1):39-70
The Maryland School Performance Assessment Program (MSPAP) tests include an expressive writing task in which students at grades 3, 5, and 8 can choose to write about any topic they wish in the form of either a story, poem, or play. This test design feature provided the opportunity to investigate what factors contribute to students' choice of genre, how scorers apply a single expressive writing rubric to a range of genres, and whether these genres constitute equivalent tasks for measurement and reporting purposes. Our study, which combined analysis of statewide score data, 300 randomly selected student texts, questionnaires given to teacher-scorers, and interviews with students, argues strongly for the validity of this choice task as a measure of expressive writing and demonstrates that choice of genre both increases writers' engagement and enhances the fairness of the assessment by giving all students the best opportunity to demonstrate proficiency in this learning outcome. By highlighting several features of student texts that complicate scoring, the study also suggests that accuracy and consistency might be improved by
  1) providing additional sample papers during training,
  2) attending to scorers' assumptions regarding several key concepts, especially “originality,” and
  3) adjusting the ways that training for focused holistic scoring generally takes place.
The study concludes that the perceptions of students, scorers, and classroom teachers are critical to the ongoing development of writing assessments that offer students increasing control and choice.

15.
The present study examined growth in writing quality associated with feedback provided by an automated essay evaluation system called PEG Writing. Equal numbers of students with disabilities (SWD) and typically developing students (TD) matched on prior writing achievement were sampled (n = 1196 total). Data from a subsample of students (n = 655) was used to investigate evidence of transfer to improved first-draft performance on a follow-up writing prompt. Three-level hierarchical linear modeling was used. Findings indicated that SWD produced first drafts of lesser quality than TD students, but grew at a faster rate and were able to close the gap in writing quality after five revisions. However, these effects were moderated by school quality and the availability of internet-connected devices in schools. There was no evidence of transfer for either group of students. Results document a positive association between the use of PEG Writing and growth in writing quality for SWD, and underscore the importance of having sufficient technology resources for maximizing this growth.

16.
The authors explored the credibility of using informal reading inventories and writing samples for 138 students (K–4) to evaluate the effectiveness of a summer literacy program. Running Records (a measure of a child's reading level) and teacher experience during daily reading instruction were used to estimate the reliability of the more formal Developmental Reading Assessment scores. Training of scorers was used to increase the reliability of writing scores; a second scoring was used to estimate the reliability of the scores. The results suggested that with minimal modifications to administration and scoring procedures, scores from both reading inventories and writing samples can be a dependable source of data for teachers, administrators, and policy makers. This result is significant because it suggests that formative literacy assessments can be reliably used instead of standardized multiple-choice tests to make more credible summative decisions without taking time away from instruction, and can truly match curriculum, instruction, and assessment.

17.
As an alternative to rubric scoring, comparative judgment generates essay scores by aggregating decisions about the relative quality of the essays. Comparative judgment eliminates certain scorer biases and potentially reduces training requirements, thereby allowing a large number of judges, including teachers, to participate in essay evaluation. The purpose of this study was to assess the validity, labor costs, and efficiency of comparative judgments as a potential substitute for rubric scoring. An analysis of two essay prompts revealed that comparative judgment measures were comparable to rubric scores at a level similar to that expected of two professional scorers. The comparative judgment measures correlated slightly higher than rubric scores with a multiple-choice writing test. Score reliability exceeding .80 was achieved with approximately nine judgments per response. The average judgment time was 94 seconds, which compared favorably to 119 seconds per rubric score. Practical challenges to future implementation are discussed.

18.
Writing performance of 279 seventh- and eighth-grade students in four urban charter schools was evaluated in a comparison-group pretest/posttest quasi-experimental study. Thirty-three students, identified by cut scores on a standardized fluency measure, received supplemental one-to-one Self-Regulated Strategy Development (SRSD) instruction for persuasive quick writing. Fifty-one students with scores below the cut participated as an eligible non-treatment comparison; 195 students with scores above the cut participated as a non-eligible comparison group. All students’ written responses were evaluated before and after the intervention. Results of repeated measures analysis indicated that students in treatment (additional instruction time + SRSD + planned practice-testing) significantly improved quick writing performance after instruction when compared to pretest performance, and when compared to eligible comparison, with large effect sizes for number of persuasive elements and organizational quality and medium effects for persuasive quality. When compared to non-eligible comparison, students in treatment had significantly higher scores for organizational quality (large effects) and persuasive quality (small effects).

19.
Social Networking Sites (SNSs) such as Facebook are one of the latest examples of communications technologies that have been widely adopted by students and, consequently, have the potential to become a valuable resource to support their educational communications and collaborations with faculty. However, faculty members have a track record of prohibiting classroom uses of technologies that are frequently used by students. To determine how likely higher education faculty are to use Facebook for either personal or educational purposes, higher education faculty (n = 62) and students (n = 120) at a mid-sized southern university were surveyed on their use of Facebook and email technologies. A comparison of faculty and student responses indicates that students are much more likely than faculty to use Facebook and are significantly more open to the possibility of using Facebook and similar technologies to support classroom work. Faculty members are more likely to use more “traditional” technologies such as email.

20.
Statistical interactions between Conceptual Levels Test (CLT) scores and deductive vs. inductive teaching methods were examined among 275 sixth grade pupils. The purposes of the study were to determine whether the two methods are most effective among different students, and whether CLT scores predict which students should receive each kind of instruction. Subjects were randomly assigned to deductive and inductive groups for instruction in critical thinking. Repeated measures of achievements and attitudes provided four sets of criterion scores. The regression of criterion scores on CLT scores yielded one significant disordinal interaction and four confidence intervals within which deductive teaching was significantly more effective than inductive instruction. Regions in which inductive teaching was significantly superior were not observed. While deductive instruction was advantageous for some learners, neither high, medium nor low CLT scorers benefited consistently from inductive teaching.
