期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Beyond item analysis: Connecting student behaviour and performance using e-assessment logs

Hatim Lahza Tammy G. Smith Hassan Khosravi 《British journal of educational technology : journal of the Council for Educational Technology》2023,54(1):335-354

Traditional item analyses such as classical test theory (CTT) use exam-taker responses to assessment items to approximate their difficulty and discrimination. The increased adoption by educational institutions of electronic assessment platforms (EAPs) provides new avenues for assessment analytics by capturing detailed logs of an exam-taker's journey through their exam. This paper explores how logs created by EAPs can be employed alongside exam-taker responses and CTT to gain deeper insights into exam items. In particular, we propose an approach for deriving features from exam logs for approximating item difficulty and discrimination based on exam-taker behaviour during an exam. Items for which difficulty and discrimination differ significantly between CTT analysis and our approach are flagged through outlier detection for independent academic review. We demonstrate our approach by analysing de-identified exam logs and responses to assessment items of 463 medical students enrolled in a first-year biomedical sciences course. The analysis shows that the number of times an exam-taker visits an item before selecting a final response is a strong indicator of an item's difficulty and discrimination. Scrutiny by the course instructor of the seven items identified as outliers suggests our log-based analysis can provide insights beyond what is captured by traditional item analyses.

Practitioner notes

What is already known about this topic

Traditional item analysis is based on exam-taker responses to the items using mathematical and statistical models from classical test theory (CTT). The difficulty and discrimination indices thus calculated can be used to determine the effectiveness of each item and consequently the reliability of the entire exam.

What this paper adds

Data extracted from exam logs can be used to identify exam-taker behaviours which complement classical test theory in approximating the difficulty and discrimination of an item and identifying items that may require instructor review.

Implications for practice and/or policy

Identifying the behaviours of successful exam-takers may allow us to develop effective exam-taking strategies and personal recommendations for students.
Analysing exam logs may also provide an additional tool for identifying struggling students and items in need of revision.

相似文献

2.

A Process for Reviewing and Evaluating Generated Test Items

Mark J. Gierl Hollis Lai 《Educational Measurement》2016,35(4):6-20

Testing organization needs large numbers of high‐quality items due to the proliferation of alternative test administration methods and modern test designs. But the current demand for items far exceeds the supply. Test items, as they are currently written, evoke a process that is both time‐consuming and expensive because each item is written, edited, and reviewed by a subject‐matter expert. One promising approach that may address this challenge is with automatic item generation. Automatic item generation combines cognitive and psychometric modeling practices to guide the production of items that are generated with the aid of computer technology. The purpose of this study is to describe and illustrate a process that can be used to review and evaluate the quality of the generated item by focusing on the content and logic specified within the item generation procedure. We illustrate our process using an item development example from mathematics drawn from the Common Core State Standards and from surgical education drawn from the health sciences domain. 相似文献

3.

Psychometric defensibility of the Intervention Selection Profile‐Social Skills (ISP‐SS) with students at risk for behavioral concerns

Stephen P. Kilgus Katie Eklund Nathaniel P. von der Embse 《Psychology in the schools》2019,56(4):526-538

The purpose of the current study was to examine the validity and diagnostic accuracy of the Intervention Selection Profile—Social Skills (ISP‐SS), a brief social skills assessment tool intended for use with students in need of Tier 2 intervention. Participants included 160 elementary and middle school students who had been identified through universal screening as at risk for behavioral concerns. Teacher participants ( n = 71) rated each of these students using both the ISP‐SS and the Social Skills Improvement System—Rating Scales (SSiS‐RS), with the latter measure serving as the criterion within validity and diagnostic accuracy analyses. Confirmatory factor analysis supported ISP‐SS structural validity, indicating ISP‐SS items broadly conformed to a single “Social Skills” factor. Follow‐up analyses suggested ISP‐SS broad scale scores demonstrated adequate internal consistency reliability, with hierarchical omega coefficient equal to 0.86. Correlational analyses supported the concurrent validity of ISP‐SS items, finding each ISP‐SS item to be moderately or highly related to its corresponding SSiS‐RS subscale. Finally, analyses indicated three of the seven ISP‐SS items that demonstrated sufficient diagnostic accuracy; however, findings suggest additional revisions are needed if the ISP‐SS is to be appropriate for use in schools. Implications for practice and future research are discussed. 相似文献

4.

International assessment: A Rasch model and teachers' evaluation of TIMSS science achievement items

Shawn M. Glynn 《科学教学研究杂志》2012,49(10):1321-1344

The Trends in International Mathematics and Science Study (TIMSS) is a comparative assessment of the achievement of students in many countries. In the present study, a rigorous independent evaluation was conducted of a representative sample of TIMSS science test items because item quality influences the validity of the scores used to inform educational policy in those countries. The items had been administered internationally to 16,009 students in their eighth year of formal schooling. The evaluation had three components. First, the Rasch model, which emphasizes high quality items, was used to evaluate the items psychometrically. Second, readability and vocabulary analyses were used to evaluate the wording of the items to ensure they were comprehensible to the students. And third, item development guidelines were used by a focus group of science teachers to evaluate the items in light of the TIMSS assessment framework, which specified the format, content, and cognitive domains of the items. The evaluation components indicated that the majority of the items were of high quality, thereby contributing to the validity of TIMSS scores. These items had good psychometric characteristics, readability, vocabulary, and compliance with the assessment framework. Overall, the items tended to be difficult: constructed response items assessing reasoning or application were the most difficult, and multiple choice items assessing knowledge or application were less difficult. The teachers revised some of the sampled items to improve their clarity of content, conciseness of wording, and fit with format specifications. For TIMSS, the findings imply that some of the non‐sampled items may need revision, too. For researchers and teachers, the findings imply that the TIMSS science items and the Rasch model are valuable resources for assessing the achievement of students. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 1321–1344, 2012 相似文献

5.

A Polytomous Scoring Approach to Handle Not-Reached Items in Low-Stakes Assessments

Guher Gorgun Okan Bulut 《Educational and psychological measurement》2021,81(5):847

In low-stakes assessments, some students may not reach the end of the test and leave some items unanswered due to various reasons (e.g., lack of test-taking motivation, poor time management, and test speededness). Not-reached items are often treated as incorrect or not-administered in the scoring process. However, when the proportion of not-reached items is high, these traditional approaches may yield biased scores and thereby threatening the validity of test results. In this study, we propose a polytomous scoring approach for handling not-reached items and compare its performance with those of the traditional scoring approaches. Real data from a low-stakes math assessment administered to second and third graders were used. The assessment consisted of 40 short-answer items focusing on addition and subtraction. The students were instructed to answer as many items as possible within 5 minutes. Using the traditional scoring approaches, students’ responses for not-reached items were treated as either not-administered or incorrect in the scoring process. With the proposed scoring approach, students’ nonmissing responses were scored polytomously based on how accurately and rapidly they responded to the items to reduce the impact of not-reached items on ability estimation. The traditional and polytomous scoring approaches were compared based on several evaluation criteria, such as model fit indices, test information function, and bias. The results indicated that the polytomous scoring approaches outperformed the traditional approaches. The complete case simulation corroborated our empirical findings that the scoring approach in which nonmissing items were scored polytomously and not-reached items were considered not-administered performed the best. Implications of the polytomous scoring approach for low-stakes assessments were discussed. 相似文献

6.

How does student peer review influence perceptions,engagement and academic outcomes? A case study

Raoul Mulder Chi Baik Ryan Naylor Jon Pearce 《Assessment & Evaluation in Higher Education》2014,39(6):657-677

Involving students in peer review has many pedagogical benefits, but few studies have explicitly investigated relationships between the content of peer reviews, student perceptions and assessment outcomes. We conducted a case study of peer review within a third-year undergraduate subject at a research-intensive Australian university, in which we examined: (1) students’ perceptions of the peer review process before and after peer review, (2) content of the peer reviews and what kinds of feedback were adopted and (3) the effect of participation in peer review on performance (grades) in the assessment task. Students overwhelmingly perceived peer review to be beneficial, and the opportunity to participate in peer review resulted in a significant improvement in the quality of work submitted for assessment. Students who benefited most from peer review were those of below-median performance, and the magnitude of benefit was related to the degree to which students engaged with the peer review process. Our study confirms that participation in peer review can lead to important improvements in performance and learning outcomes. 相似文献

7.

Advantages of authentic assessment for improving the learning experience and employability skills of higher education students: A systematic literature review

《Studies in Educational Evaluation》2021

This study is a systematic review of literature that investigates the advantages of implementing authentic assessment for improving two major categories that are relevant to academic and professional success of higher education students: learning experience and employability skills. Authentic assessment involves students in challenging tasks that closely resemble those of the workplace settings. 26 papers from 2010 to 2019 that were relevant to the topic of this paper were selected for the review process. Findings of this review indicate that authentic assessment can play a role in improving the learning experience of higher education students through enhancing their engagement in learning and improving their satisfaction as well as positively influencing their efforts to achieve educational goals. We also discuss the benefits of authentic assessment for equipping students with essential skills for their future professional life, such as communication skills, collaboration skills, critical-thinking and problem-solving skills, self-awareness, and self-confidence. 相似文献

8.

Pedagogy of Science Teaching Tests: Formative assessments of science teaching orientations

William W. Cobern David Schuster Betty Adams Brandy Ann Skjold Ebru Zeynep Muğaloğlu Amy Bentz 《International Journal of Science Education》2013,35(13):2265-2288

A critical aspect of teacher education is gaining pedagogical content knowledge of how to teach science for conceptual understanding. Given the time limitations of college methods courses, it is difficult to touch on more than a fraction of the science topics potentially taught across grades K-8, particularly in the context of relevant pedagogies. This research and development work centers on constructing a formative assessment resource to help expose pre-service teachers to a greater number of science topics within teaching episodes using various modes of instruction. To this end, 100 problem-based, science pedagogy assessment items were developed via expert group discussions and pilot testing. Each item contains a classroom vignette followed by response choices carefully crafted to include four basic pedagogies (didactic direct, active direct, guided inquiry, and open inquiry). The brief but numerous items allow a substantial increase in the number of science topics that pre-service students may consider. The intention is that students and teachers will be able to share and discuss particular responses to individual items, or else record their responses to collections of items and thereby create a snapshot profile of their teaching orientations. Subsets of items were piloted with students in pre-service science methods courses, and the quantitative results of student responses were spread sufficiently to suggest that the items can be effective for their intended purpose. 相似文献

9.

浅谈基于网络资源提高学生自主学习能力

古丽巴哈尔·卡吾力常占瑛高晓黎谢湘云郭伟《教育教学论坛》2020,(13):68-69

目的:文章对目前的教学过程和学习效果进行了调查分析,考查网络资源与学生自主学习之间的联系。方法:通过调查法、德尔菲法选择36条目进行调查,揭示网络资源在学生自主学过程中的作用。结果:网络资源能有效地提高教学质量,但需要与教师传统教学有机结合。结论:通过考查学生利用网络资源自主学习情况,为更好提高教学水平、增强学生解决实际问题能力、培养信息素养和增强学习的有效性提供依据。相似文献

10.

Using teacher inquiry to support technology-enhanced formative assessment: a review of the literature to inform a new method

Rosemary Luckin Wilma Clark Katerina Avramides Jade Hunter Martin Oliver 《Interactive Learning Environments》2017,25(1):85-97

In this paper we review the literature on teacher inquiry (TI) to explore the possibility that this process can equip teachers to investigate students’ learning as a step towards the process of formative assessment. We draw a distinction between formative assessment and summative forms of assessment [CRELL. (2009). The transition to computer-based assessment: New approaches to skills assessment and implications for large-scale testing. In F. Scheuermann & J. Björnsson (Eds.), JRC Scientific and technical reports. Ispra: Author; Webb, M. (2010). Beginning teacher education and collaborative formative e-assessment. Assessment & Evaluation in Higher Education, 35, 597–618; EACEA. (2009). National testing of pupils in Europe: Objectives, organisation and use of results. Brussels: Eurydice; OECD. (2010b). Assessing the effects of ICT in education (F. Scheuermann & E. Pedró, Eds.). Paris: JRC, OECD]. Our review of TI is combined with a review of the research concerning the way that practices with technology can support the assessment process. We conclude with a comparison of TI and teacher design research from which we extract the characteristics for a method of TI that can be used to develop technology-enhanced formative assessment: teacher inquiry into student learning. In this review, our primary focus is upon enabling teachers to use technology effectively to inquire about their students’ learning progress. 相似文献

11.

The quality of peer assessment in a wiki-based online context: a qualitative study

Maria João Loureiro Lúcia Pombo António Moreira 《Educational Media International》2013,50(2):139-149

Peer assessment (PA) provides opportunities for authentic assessment, autonomy and collaboration. Several authors advocate that students can benefit from PA and put forward the effects of PA on the students’ learning outcomes. Questions concerning the validity and reliability of PA and PA competences are also addressed by different researchers. This qualitative study is part of a wider project that seeks to develop and test evaluation and assessment strategies in online contexts. In a doctoral module, PA was used for summative and formative purposes. Formative PA aimed to give feedback about the ongoing group work, but also to increase online interaction between the different groups of students. The main module task was to write a literature review, about a selected topic, using a wiki. Criteria and indicators to assess the literature review were negotiated with the students. Different criteria were used to assess the quality of PA, such as, the use of the negotiated criteria, the adequacy of the chosen vocabulary or the provision of constructive feedback. The results show that overall the quality of PA can be improved. Groups did not provide sufficient criticism, questions and suggestions for improvement. 相似文献

12.

The road to self-assessment: exemplar marking before peer review develops first-year students’ capacity to judge the quality of a scientific report

Robyn Yucel Fiona L. Bird Jodie Young Tania Blanksby 《Assessment & Evaluation in Higher Education》2014,39(8):971-986

Lack of clarity about assessment criteria and standards is a source of anxiety for many first-year university students. The Developing Understanding of Assessment for Learning (DUAL) programme was designed as a staged approach to gradually familiarise students with expectations, and to provide opportunities for the development of the skills required to successfully complete assessment tasks. This paper investigated the students’ perceptions of the first two components of the DUAL programme, which assist first-year biology students to engage with stated assessment criteria and standards in order to develop their capacity to make judgements about scientific report exemplars, their peers’ scientific reports and ultimately their own. The study found strong evidence (96% of responses) that the marking and discussion of exemplar reports with peers and demonstrators clarified expectations of scientific report writing. A key feature of this element of DUAL was the opportunity for structured discussion about assessment criteria and standards between peers and markers (demonstrators). During these discussions, students can clarify explicit statements and develop a tacit knowledge base to enhance their ability to judge the quality of others’ work and their own. The peer review exercise (the second element of DUAL) was not rated as highly, with 65% of students finding the process helpful for improving their report. The negative reactions by a sizeable minority of students highlight the need to clearly communicate the expectations and benefits of peer review, with a focus on how the process of giving feedback to peers might benefit a student as much as receiving feedback on their own report. 相似文献

13.

Assessment for learning: using programmatic assessment requirements as an opportunity to develop information literacy and data skills in undergraduate students

Emily Stark Sedona Kintz Chloey Pestorious Akorede Teriba 《Assessment & Evaluation in Higher Education》2018,43(7):1061-1068

Departments and programmes in higher education are required to participate in an increasing number of programme, course and student assessments. These assessment requirements are also opportunities to develop student skills related to scientific literacy and research, if students are included in the process of developing, administering and interpreting these assessments. This paper describes a course designed to build student research skills through incorporating undergraduate students into the process of programme review for a psychology department, a comprehensive assessment required of this department every five years. This course proved to be an effective way to engage students, as students developed and administered assessment surveys, analysed and interpreted results, and prepared both a professional report for the department as well as a research presentation. The paper discusses recommended course activities, and shows how this opportunity can benefit students and faculty in a myriad of ways. 相似文献

14.

Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the difficulty levels of assessment items

Gerard van de Watering Janine van der Rijt 《Educational Research Review》2006,1(2):133-147

In today's higher education, high quality assessments play an important role. Little is known, however, about the degree to which assessments are correctly aimed at the students’ levels of competence in relation to the defined learning goals. This article reviews previous research into teachers’ and students’ perceptions of item difficulty. It focuses on the item difficulty of assessments and students’ and teachers’ abilities to estimate item difficulty correctly. The review indicates that teachers tend to overestimate the difficulty of easy items and underestimate the difficulty of difficult items. Students seem to be better estimators of item difficulty. The accuracy of the estimates can be improved by: the information the estimators or teachers have about the target group and their earlier assessment results; defining the target group before the estimation process; the possibility of having discussions about the defined target group of students and their corresponding standards during the estimation process; and by the amount of training in item construction and estimating. In the subsequent study, the ability and accuracy of teachers and students to estimate the difficulty levels of assessment items was examined. In higher education, results show that teachers are able to estimate the difficulty levels correctly for only a small proportion of the assessment items. They overestimate the difficulty level of most of the assessment items. Students, on the other hand, underestimate their own performances. In addition, the relationships between the students’ perceptions of the difficulty levels of the assessment items and their performances on the assessments were investigated. Results provide evidence that the students who performed best on the assessments underestimated their performances the most. Several explanations are discussed and suggestions for additional research are offered. 相似文献

15.

Using automated analysis to assess middle school students' competence with scientific argumentation

Christopher D. Wilson Kevin C. Haudek Jonathan F. Osborne Zoë E. Buck Bracey Tina Cheuk Brian M. Donovan Molly A. M. Stuhlsatz Marisol M. Santiago Xiaoming Zhai 《科学教学研究杂志》2024,61(1):38-69

Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource-intensive to score. In this study we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed-response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we have been able to develop computer scoring models that can achieve substantial to almost perfect agreement between human-assigned and computer-predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher-level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or providing sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. However, this analytical approach was found to be differentially biased when scoring responses from English learners (EL) students as compared to responses from non-EL students on some items. Differences in the severity between human and computer scores for EL between these approaches are explored, and potential sources of bias in automated scoring are discussed. 相似文献

16.

Item and testlet position effects in computer-based alternate assessments for students with disabilities

Okan Bulut Ming Lei Qi Guo 《International Journal of Research & Method in Education》2018,41(2):169-183

Item positions in educational assessments are often randomized across students to prevent cheating. However, if altering item positions results in any significant impact on students’ performance, it may threaten the validity of test scores. Two widely used approaches for detecting position effects – logistic regression and hierarchical generalized linear modelling – are often inconvenient for researchers and practitioners due to some technical and practical limitations. Therefore, this study introduced a structural equation modeling (SEM) approach for examining item and testlet position effects. The SEM approach was demonstrated using data from a computer-based alternate assessment designed for students with cognitive disabilities from three grade bands (3–5, 6–8, and high school). Item and testlet position effects were investigated in the field-test (FT) items that were received by each student at different positions. Results indicated that the difficulty of some FT items in grade bands 3–5 and 6–8 differed depending on the positions of the items on the test. Also, the overall difficulty of the field-test task in grade bands 6–8 increased as students responded to the field-test task in later positions. The SEM approach provides a flexible method for examining different types of position effects. 相似文献

17.

How do pre-service teacher education students respond to assessment feedback?

Peter Grainger 《Assessment & Evaluation in Higher Education》2020,45(7):913-925

相似文献

18.

Development of an Instrument Designed to Investigate Elements of Science Students’ Metacognition,Self‐Efficacy and Learning Processes: The SEMLI‐S

Gregory Thomas David Anderson Samson Nashon 《International Journal of Science Education》2013,35(13):1701-1724

The development and evaluation of science students’ metacognition, learning processes and self‐efficacy are important for improving science education. This paper reports on the development of an empirical self‐report instrument for providing a measure of students’ metacognition, self‐efficacy and constructivist science learning processes. A review of the range of literature related to metacognition, self‐regulation and constructivist learning processes resulted in the development of an initial bilingual (English and traditional Chinese) instrument composed of 72 items. This instrument was completed by 465 Hong Kong high school students. The data collected were subjected to exploratory factor analysis and Rasch analysis. The subsequent refinement process resulted in a final version of the Self‐Efficacy and Metacognition Learning Inventory—Science (SEMLI‐S) consisting of 30 items that can be used for either analysing and focusing on any or all of its dimensions or for assigning scores to individuals that enable comparison between them in relation to their metacognitive science learning orientations. 相似文献

19.

过程控制在大学课程考核中的新探索

李果姬永生杜健民《煤炭高等教育》2014,(6):89-92

传统的大学课程考核总是在课程结束后采取"一考定终身"的方式,存在着许多弊端,而过程控制能充分考虑学生在课程学习过程中的表现,能更真实地反映学生的学习效果,从而成为当前课程考核的主流。结合土木工程的专业课程,探讨在课堂、课后和实验、实践等教学环节,通过对学生的出勤、提问、测验、课后作业、论文专题和实验、实践等方面实施过程控制的探索,在此基础上给出平时成绩,并结合课程的特点按照一定的权重将平时成绩与期终考核成绩结合起来,以综合评定学生的课程成绩,从而起到更加真实、全面反映学生学习效果的目的。相似文献

20.

Measuring knowledge integration: Validation of four‐year assessments

Ou Lydia Liu Hee‐Sun Lee Marcia C. Linn 《科学教学研究杂志》2011,48(9):1079-1107

Science education needs valid, authentic, and efficient assessments. Many typical science assessments primarily measure recall of isolated information. This paper reports on the validation of assessments that measure knowledge integration ability among middle school and high school students. The assessments were administered to 18,729 students in five states. Rasch analyses of the assessments demonstrated satisfactory item fit, item difficulty, test reliability, and person reliability. The study showed that, when appropriately designed, knowledge integration assessments can be balanced between validity and reliability, authenticity and generalizability, and instructional sensitivity and technical quality. Results also showed that, when paired with multiple‐choice items and scored with an effective scoring rubric, constructed‐response items can achieve high reliabilities. Analyses showed that English language learner status and computer use significantly impacted students' science knowledge integration abilities. Students who took the assessment online, which matched the format of content delivery, performed significantly better than students who took the paper‐and‐pencil version. Implications and future directions of research are noted, including refining curriculum materials to meet the needs of diverse students and expanding the range of topics measured by knowledge integration assessments. © 2011 Wiley Periodicals, Inc. J Res Sci Teach 48: 1079–1107, 2011 相似文献