Similar Documents
Found 20 similar documents (search time: 616 ms)
1.
We investigate students’ negative perceptions about an online peer assessment system for undergraduate writing across the disciplines. Specifically, we consider the nature of students’ resistance to peer assessment; what factors influence that resistance; and how students’ perceptions impact their revision work. We do this work by first examining findings from an end-of-course survey administered to 250 students in ten courses across six universities using an online peer assessment system called SWoRD for their writing assignments. Those findings indicate that students have the most positive perceptions of SWoRD in those courses where an instructor graded their work in addition to peers (as opposed to peer-only grading). We then move to an in-depth examination of perceptions and revision work among 84 students using SWoRD and no instructor grading for assessment of writing in one university class. Findings from that study indicate that students sometimes regard peer assessment as unfair and often believe that peers are unqualified to review and assess students’ work. Furthermore, students’ perceptions about the fairness of peer assessment drop significantly following students’ experience in doing peer assessment. Students’ fairness perceptions—and drops in those perceptions—are most significantly associated with their perceptions about the extent to which peers’ feedback is useful and positive. However, students’ perceptions appear to be unrelated to the extent of their revision work. This research fills a considerable gap in the literature regarding the origin of students’ negative perceptions about peer assessment, as well as how perceptions influence performance.

2.
Number of raters is theoretically central to peer assessment reliability and validity, yet rarely studied. Further, requiring each student to assess more peers’ documents increases both the number of evaluations per document and assessor workload, which can degrade performance. Moreover, task complexity is likely a moderating factor, influencing both workload and validity. This study examined whether changing the number of required peer assessments per student / number of raters per document affected peer assessment reliability and validity for tasks at different levels of task complexity. 181 students completed and provided peer assessments for tasks at three levels of task complexity: low complexity (dictation), medium complexity (oral imitation), and high complexity (writing). Adequate validity of peer assessments was observed for all three task complexities at low reviewing loads. However, the impacts of increasing reviewing load varied by reliability vs. validity outcomes and by task complexity.
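The trade-off this abstract studies empirically, how reliability scales with the number of raters, is often projected analytically with the Spearman-Brown prophecy formula. A minimal sketch follows; the formula is standard psychometrics rather than taken from the study, and the single-rater reliability of 0.4 is a hypothetical value.

```python
def spearman_brown(r_single: float, k: int) -> float:
    """Projected reliability of the mean of k raters, given the
    reliability of a single rater (Spearman-Brown prophecy formula)."""
    return k * r_single / (1 + (k - 1) * r_single)

# If one peer rater has reliability 0.4, the mean of k raters:
for k in (1, 2, 4, 6):
    print(k, round(spearman_brown(0.4, k), 3))
```

Note that the formula assumes exchangeable raters of constant quality; the abstract's point is precisely that raising reviewing load can violate that assumption by degrading each rater's performance.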

3.
4.
This paper advances a model describing how peer assessment supports self-assessment. Although prior research demonstrates that peer assessment promotes self-assessment, the connection between these two activities is underspecified. This model, the assessment cycle, draws from theories of self-assessment to elaborate how learning takes place through peer assessment. The model is applied to three activity structures described in the literature to analyse their potential to support learning by promoting self-assessment. Broadly speaking, the model can be used to understand learning that takes place in a variety of peer assessment activities: marking/grading, analysis, feedback, conferencing and revision. This approach contrasts with most studies on peer assessment, which have focused on calibration of instructor and peer grades rather than learning opportunities.

5.
Growth in the use of testing to determine student eligibility for community college courses has prompted debate and litigation over the equity, access, and legal implications of these practices. In California, this has resulted in state regulations requiring that community colleges provide predictive validity evidence of test-score-based inferences and course prerequisites. In addition, companion measures that supplement placement test scores must be used for placement purposes. However, for both theoretical and technical reasons the predictive validity coefficients between placement test scores and final grades or retention in a course generally demonstrate a weak relationship. The study discussed in this article examined the predictive validity of placement test scores with course grade and retention in English and mathematics courses. The investigation produced a model to explain variance in course outcomes using test scores, student background data, and instructor differences in grading practices. The model suggests that student dispositional characteristics explain a high proportion of variance in the dependent variables. Including instructor grading practices in the model adds significantly to the explanatory power and suggests that grading variations make accurate placement more problematic. This investigation underscores the importance of academic standards as something imposed on students by an institution and not something determined by the entering abilities of students.

6.
The peer rating system used here advances the quantitative literacy goals outlined in the social sciences. We instituted a mid-semester intervention to teach rating skills and used an index to track longitudinal changes of skill mastery over the course of the semester. Seventy-four students in five advanced research classes followed the procedure of the existing peer rating system by completing reading assignments, writing reflections online, engaging in class discussions, rating their peers’ reflections and receiving feedback on their group effort. Peer ratings were then compared with each other and also with the instructor ratings to derive individualised indices of reliability and validity. These technical indicators enabled two rounds of assessment before and after a class-wide intervention. An omnibus test across the five classes showed a significant improvement in rating quality due to the intervention. Our courses not only met a quantitative learning outcome but also promoted vocational competence.

7.
Although the rubric has emerged as one of the most popular assessment tools in progressive educational programs, there is an unfortunate dearth of information in the literature quantifying the actual effectiveness of the rubric as an assessment tool in the hands of the students. This study focuses on the validity and reliability of the rubric as an assessment tool for student peer-group evaluation in an effort to further explore the use and effectiveness of the rubric. A total of 1577 peer-group ratings using a rubric for an oral presentation were used in this 3-year study involving 107 college biology students. A quantitative analysis of the rubric used in this study shows that it is used consistently by both students and the instructor across the study years. Moreover, the rubric appears to be ‘gender neutral’ and the students’ academic strength has no significant bearing on the way that they employ the rubric. A significant, one-to-one relationship (slope = 1.0) between the instructor’s assessment and the students’ rating is seen across all years using the rubric. A generalizability study yields estimates of inter-rater reliability of moderate values across all years and allows for the estimation of variance components. Taken together, these data indicate that the general form and evaluative criteria of the rubric are clear and that the rubric is a useful assessment tool for peer-group (and self-) assessment by students. To our knowledge, these data provide the first statistical documentation of the validity and reliability of the rubric for student peer-group assessment.
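The one-to-one relationship reported here (slope = 1.0) is an ordinary least-squares slope between instructor scores and student ratings. A minimal sketch of that check, using invented rubric scores purely for illustration:

```python
import numpy as np

# Hypothetical mean peer-group rubric scores and instructor scores
# for six presentations (values invented for illustration)
peer = np.array([3.2, 3.8, 2.5, 4.0, 3.0, 3.6])
instructor = np.array([3.1, 3.9, 2.6, 4.0, 2.9, 3.6])

slope, intercept = np.polyfit(peer, instructor, 1)
r = np.corrcoef(peer, instructor)[0, 1]
print(f"slope={slope:.2f}, r={r:.2f}")  # slope near 1 with high r
```

A slope near 1 with a small intercept indicates students neither systematically inflate nor deflate relative to the instructor across the score range.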

8.
Although peer review is a widely-used pedagogical technique, its value depends upon the quality of the reviews that students produce, and much research remains to be done to systematically study the nature, causes, and consequences of variation in peer review quality. We propose a new framework that conceptualizes five larger dimensions of peer review quality and then present a study that investigated three specific peer review quality constructs in a large dataset and further explored how these constructs change through different types of self-regulation peer reviewing experiences. Peer review data across multiple assignments were analyzed from 2,092 undergraduate students enrolled in one of three offerings of a biology course at a large public research university in the United States. Peer review quality was measured in terms of comment amount, comment accuracy, and rating accuracy; the measures of reviewing experience focused upon self-regulated learning factors such as practice, feedback, others’ modeling, and relative performance. Meta-correlation (for testing reliability, separability, and stability) and meta-regression (as a time-series analysis for testing the relationship of changes across assignments in reviewing quality with experiences as reviewer and reviewee) are used to establish the robustness of effects and meaningful variation of effects across course offerings and assignments. Results showed that there were three meaningful review quality constructs (i.e., they were measured reliably, separable, and semi-stable over time). Further, all three showed changes in response to previous reviewer and reviewee experiences, but only feedback helpfulness showed effects of all four examined types of self-regulation experience (practice, feedback, others’ modeling, and relative performance). The findings suggest that instructors can improve review quality by providing comment prompt scaffolds that lead to longer comments, as well as by matching authors with similarly performing reviewers.

9.
This study examines the impact of an assessment training module on student assessment skills and task performance in a technology-facilitated peer assessment. Seventy-eight undergraduate students participated in the study. The participants completed an assessment training exercise, prior to engaging in peer-assessment activities. During the training, students reviewed learning concepts, discussed marking criteria, graded example projects and compared their evaluations with the instructor’s evaluation. Data were collected in the form of initial and final versions of students’ projects, students’ scoring of example projects before and after the assessment training, and written feedback that students provided on peer projects. Results of data analysis indicate that the assessment training led to a significant decrease in the discrepancy between student ratings and instructor rating of example projects. In addition, the degree of student vs. instructor discrepancy was highly predictive of the quality of feedback that students provided to their peers and the effectiveness of revisions that they made to their own projects upon receiving peer feedback. Smaller discrepancies in ratings were associated with provision of higher quality peer feedback during peer assessment, as well as better revision of initial projects after peer assessment.
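The rating discrepancy tracked in this study can be operationalised as a mean absolute difference between a student's scores and the instructor's scores on the same example projects. A minimal sketch, with invented scores standing in for the study's data:

```python
import numpy as np

def discrepancy(student_scores, instructor_scores):
    """Mean absolute rating discrepancy between one student and the
    instructor across a set of example projects."""
    s = np.asarray(student_scores, dtype=float)
    i = np.asarray(instructor_scores, dtype=float)
    return float(np.mean(np.abs(s - i)))

# Hypothetical scores (1-10 scale) for four example projects
instructor = [7, 5, 9, 6]
before_training = [9, 7, 6, 8]   # student ratings before the module
after_training = [8, 5, 8, 6]    # student ratings after the module
print(discrepancy(before_training, instructor))  # 2.25
print(discrepancy(after_training, instructor))   # 0.5
```

A drop in this statistic after training is the "significant decrease in discrepancy" the abstract reports; the per-student value can then serve as a predictor of feedback quality and revision effectiveness.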

10.

Papers in a large second-year science class were assessed using an anonymous peer review system modelled on that used for professional journals. Three students and one paid marker (outside reviewer) reviewed each paper, so each student received four reviews. A paid marker served as 'editor' and determined marks based on the four reviews, with reference to the paper as necessary. Students were asked to rank the four reviews for helpfulness and for completeness and accuracy. Consistency of reviews was analysed. On average, peer reviewers gave higher marks than paid markers, and on average students found peer reviewers to be more helpful but marginally less complete and accurate than paid markers. The differences among paid markers, however, were larger than the difference between the average peer reviewer and the average paid marker. The consistency among the four sets of marks given was not impressive. Students responded to the range of reviews they received. It can be shown statistically that the expected range for four reviews is much greater than that expected for two reviews; thus the multiplicity of reviews received exacerbated a widespread perception that marks were arbitrary. The net outcome was a moral dilemma. Giving the same paper to multiple assessors reveals the extent to which assessment rests on arbitrary factors. This may be good preparation for the real world; however, it is not an exercise to be taken lightly, and not recommended without prior preparation of context.
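The statistical claim above, that the expected range of four marks exceeds that of two, is easy to check by simulation. A minimal Monte-Carlo sketch, assuming for illustration that marks are independent and uniform on [0, 100] (in that case the theoretical expected range for k marks is 100*(k-1)/(k+1)):

```python
import random

def expected_range(k: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte-Carlo estimate of the expected range (max - min) of
    k independent marks drawn uniformly from [0, 100]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        marks = [rng.uniform(0, 100) for _ in range(k)]
        total += max(marks) - min(marks)
    return total / trials

print(expected_range(2))  # theory: 100 * 1/3, about 33.3
print(expected_range(4))  # theory: 100 * 3/5 = 60.0
```

Under these assumptions a student seeing four reviews faces nearly twice the spread a student seeing two would, which is the mechanism behind the perceived arbitrariness the abstract describes.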

11.
As the number of participants in online distance learning courses increases, peer assessment is becoming a popular strategy for evaluating open assignments and for breaking the social isolation surrounding distance education. Yet, the quality and characteristics of peer assessment in massive online courses have received little attention. Hence, this study was set to examine peer feedback quality and grading accuracy in a project-based course. The study applied a sequential exploratory mixed methods design. It included 339 participants who studied the same engineering course, but in three different modes: on-campus (n = 77), small private online course (n = 110), and massive open online course (MOOC) (n = 152). Content analysis of feedback comments identified four categories: reinforcement, statement, verification and elaboration, arranged in an ascending scale of cognitive ability. The findings indicated that the MOOC participants provided more feedback comments and volunteered to assess more projects than their counterparts did. However, the on-campus students provided higher quality feedback and their peer grading was better correlated with the grades assigned by the teaching assistants.

12.
This research centers on the psychometric examination of the structure of an instrument, known as the 5E Lesson Plan (5E ILPv2) rubric for inquiry-based teaching. The instrument is intended to measure an individual’s skill in developing written 5E lesson plans for inquiry teaching. In stage one of the instrument’s development, an exploratory factor analysis on a fifteen-item 5E ILP instrument revealed only three factor loadings instead of the expected five factors, which led to its subsequent revision. Modifications in the original instrument led to a revised 5E ILPv2 instrument comprising twenty-one items. This instrument, like its precursor, has a scoring scale that ranges from zero to four points per item. Content validity of the 5E ILPv2 was determined through the expertise of a panel of science educators. Over the course of five semesters, three elementary science methods instructors in three different universities collected post lesson plan data from 224 pre-service teachers enrolled in their courses. Each instructor scored their students’ post 5E inquiry lesson plans using the 5E ILPv2 instrument, recording a score for each item on the instrument. A factor analysis with maximum likelihood extraction and promax oblique rotation provided evidence of construct validity for five factors and explained 85.5% of the variability in the total instrument. All items loaded with their theoretical factors, exhibiting high ordinal alpha reliability estimates of .94, .99, .96, .97, and .95 for the engage, explore, explain, elaborate, and evaluate subscales respectively. The total instrument reliability estimate was 0.98, indicating strong evidence of total scale reliability.
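The subscale reliabilities reported here are alpha coefficients (ordinal alpha in the study, computed from a polychoric correlation matrix of the 0-4 item scores). A minimal sketch of the simpler, closely related Cronbach's alpha on raw item scores; the score matrix below is invented for illustration:

```python
import numpy as np

def cronbach_alpha(scores) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical 0-4 rubric scores for one subscale (4 raters x 3 items)
subscale = np.array([[4, 3, 4],
                     [2, 2, 3],
                     [1, 1, 1],
                     [3, 3, 4]])
print(round(cronbach_alpha(subscale), 2))
```

Ordinal alpha applies the same formula to a polychoric rather than Pearson covariance structure, which is more appropriate for ordered 0-4 ratings; Cronbach's alpha on raw scores is the common lower-effort approximation.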

13.
Little information is available regarding the psychometric properties of the most commonly used autism identification measures in school settings with traditionally racially and ethnically minoritized (REM) groups. This analysis of autism identification measures is particularly important due to the demographic increase in the United States among most REM populations in recent decades. In addition, most REM groups are inequitably identified for autism, and these measures may contribute to disproportionate identification based on problematic psychometric factors. This study systematically compiles the recommended psychometric properties pertaining to validity and reliability of the common autism identification measures among REM groups that are traditionally underrepresented (i.e., Black and Latinx populations) for autism identification. Conclusions suggest that several of the most common autism identification measures lack sufficient psychometric analyses to evaluate appropriate utilization with REM populations, specifically those who are Black and Latinx. The findings from this study may inform school psychologists' utilization and knowledge of limitations of these measures, as well as assist in determining the appropriateness of these measures for use with REM populations.

14.
Peer assessment exercises yield varied reliability and validity. To maximise reliability and validity, the literature recommends adopting various design principles including the use of explicit assessment criteria. Counter to this literature, we report a peer assessment exercise in which criteria were deliberately avoided yet acceptable reliability and validity were achieved. Based on this finding, we make two arguments. First, the comparative judgement approach adopted can be applied successfully in different contexts, including higher education and secondary school. Second, the success was due to this approach; an alternative technique based on absolute judgement yielded poor reliability and validity. We conclude that sound outcomes are achievable without assessment criteria, but success depends on how the peer assessment activity is designed.
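Comparative judgement of the kind used here is typically scored by fitting a Bradley-Terry-style model to the pairwise decisions, so each script gets a strength from which a rank order follows. The abstract does not specify the authors' scaling procedure; the sketch below uses the standard MM update for Bradley-Terry, with invented judgement data:

```python
from collections import defaultdict

def bradley_terry(judgements, iters=200):
    """Fit Bradley-Terry strengths from (winner, loser) pairs using
    the standard MM update; returns {item: strength}, mean strength 1."""
    items = {i for pair in judgements for i in pair}
    wins = defaultdict(int)
    n_matches = defaultdict(int)
    for winner, loser in judgements:
        wins[winner] += 1
        n_matches[frozenset((winner, loser))] += 1
    p = {i: 1.0 for i in items}
    for _ in range(iters):
        new = {}
        for i in items:
            denom = sum(n / (p[i] + p[j])
                        for key, n in n_matches.items() if i in key
                        for j in key - {i})
            new[i] = wins[i] / denom if denom else p[i]
        scale = len(items) / sum(new.values())
        p = {i: v * scale for i, v in new.items()}
    return p

# Hypothetical judgements: script A preferred over B twice, etc.
scores = bradley_terry([("A", "B"), ("A", "B"), ("A", "C"), ("B", "C")])
print(sorted(scores, key=scores.get, reverse=True))  # best to worst
```

Reliability in comparative judgement is then usually reported as the separation of these strengths relative to their estimation error, and validity as the correlation of the resulting rank order with an external measure.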

15.
The student evaluation of teaching process is generally thought to produce reliable results. Consistency is found in class and instructor averages, while considerable inconsistency exists in individual student responses. This paper reviews these issues along with a detailed examination of common measures of reliability that are utilised with the instruments. While inter-item consistency of the evaluations has been shown to be high, the agreement between students was shown to be no better than what would be expected by chance, indicating that students do not agree on what they are being asked to evaluate. The reliability measures generated by the student evaluations of teaching are an insufficient foundation for establishing validity. Further, the pattern of reliability indicates that the instruments are generally providing information about students, not instructors.

16.
Use of in-class concept questions with clickers can transform an instructor-centered "transmissionist" environment to a more learner-centered constructivist classroom. To compare the effectiveness of three different approaches using clickers, pairs of similar questions were used to monitor student understanding in majors' and nonmajors' genetics courses. After answering the first question individually, students participated in peer discussion only, listened to an instructor explanation only, or engaged in peer discussion followed by instructor explanation, before answering a second question individually. Our results show that the combination of peer discussion followed by instructor explanation improved average student performance substantially when compared with either alone. When gains in learning were analyzed for three ability groups of students (weak, medium, and strong, based on overall clicker performance), all groups benefited most from the combination approach, suggesting that peer discussion and instructor explanation are synergistic in helping students. However, this analysis also revealed that, for the nonmajors, the gains of weak performers using the combination approach were only slightly better than their gains using instructor explanation alone. In contrast, the strong performers in both courses were not helped by the instructor-only approach, emphasizing the importance of peer discussion, even among top-performing students.

17.
The purpose of this study was to compare the effects of two peer assessment methods on university students' academic writing performance and their satisfaction with peer assessment. This study also examined the validity and reliability of student generated assessment scores. Two hundred and thirty-two predominantly undergraduate students were selected by convenience sampling during the fall semester of 2007. The results indicate that students in the experimental group demonstrated greater improvement in their writing than those in the comparison group, and the findings reveal that students in the experimental group exhibited higher levels of satisfaction with the peer assessment method, both in peer assessment structure and peer feedback, than those in the comparison group. Additionally, the findings indicate that the validity and reliability of student generated rating scores were extremely high. Using Wiki interactive software and providing an online collaborative learning environment to facilitate peer assessment added value to the process.

18.
We proposed an extended form of the Govindarajulu and Barnett margin of error (MOE) equation and used it with an analysis of variance experimental design to examine the effects of aggregating student evaluations of teaching (SET) ratings on the MOE statistic. The interpretative validity of SET ratings can be questioned when the number of students enrolled in a course is low or when the response rate is low. A possible method of improving interpretative validity is to aggregate SET ratings data from two or more courses taught by the same instructor. Based on non-parametric comparisons of the generated MOE, we found that aggregating course evaluation data from two courses reduced the MOE in most cases. However, significant improvement was only achieved when combining course evaluation data for the same instructor for the same course. Significance did not hold when combining data from different courses. We discuss the implications of our findings and provide recommendations for practice.
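The Govindarajulu and Barnett MOE is, in essence, a normal-approximation margin of error for a class-mean rating with a finite-population correction for the fraction of enrolled students who responded. A minimal sketch of that standard form (the paper's proposed extension is not reproduced here, and the numbers below are hypothetical):

```python
import math

def moe(sd: float, n: int, N: int, z: float = 1.96) -> float:
    """Margin of error for a mean SET rating from n respondents out of
    N enrolled students, with finite-population correction (fpc)."""
    fpc = math.sqrt((N - n) / (N - 1))
    return z * sd / math.sqrt(n) * fpc

# Aggregating two similar sections doubles n (and N): MOE shrinks
print(round(moe(1.0, 10, 30), 3))
print(round(moe(1.0, 20, 60), 3))
```

This shows why aggregation helps mechanically: pooling raises n, shrinking the standard-error term. The paper's caveat is that the gain is only interpretively valid when the pooled sections measure the same thing, i.e. the same instructor teaching the same course.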

19.
This study was undertaken to examine the social perceptual skill deficit theory in explaining the low peer acceptance of children with learning disabilities. The quality of tests measuring social perception was also examined. Thirty 9- to 12-year-old children with learning disabilities and a matched control group were given two measures of social perception: a laboratory task and a behavior rating scale. The behavior rating scale was completed by the children's teachers. In addition, the Peer Acceptance Scale (Bruininks, Rynders, & Gross, 1974) was administered to assess peer status. Results showed that the children with learning disabilities differed significantly from their nondisabled peers on each of the three measures: the children with learning disabilities obtained lower social perception and peer acceptance scores. However, the relationships between sociometric status and social perception varied as a function of task. A small but significant correlation was found between the behavior rating scale and peer status. The laboratory task was not correlated with either the behavior rating scale or peer status. Results are discussed in terms of the psychometric properties of laboratory versus naturalistic measures of social perception and the importance of establishing the external validity of social skill measures by correlating them with outcome measures such as peer status.

20.
This narrative synthesis reviews the psychometric properties of commercially and publicly available retell instruments used to assess the reading comprehension of students in grades K–12. Eleven instruments met selection criteria and were systematically coded for data related to the administration procedures, scoring procedures, and technical adequacy of the retell component. High variability was evident in the prompting conditions and the use of quantitative and qualitative scoring mechanisms. Because no two instruments shared the same features, their retell scores are likely not equitable. None of the measures provided sufficient information to substantiate their reliability and validity. Many were lacking data on critical psychometric aspects, such as passage equivalency and construct validity, and nearly all had insufficient or ill-defined norming samples.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号