Similar Literature
20 similar documents found (search time: 31 ms)
1.
In this digital ITEMS module, Dr. Jue Wang and Dr. George Engelhard Jr. describe the Rasch measurement framework for the construction and evaluation of new measures and scales. From a theoretical standpoint, they discuss historical and philosophical perspectives on measurement, with a focus on Rasch's concepts of specific objectivity and invariant measurement. Specifically, they introduce the origins of Rasch measurement theory, the development of model-data fit indices, and commonly used Rasch measurement models. From an applied perspective, they discuss best practices in constructing, estimating, evaluating, and interpreting a Rasch scale using empirical examples. They provide an overview of a specialized Rasch software program (Winsteps) and an R program embedded within Shiny (Shiny_ERMA) for conducting Rasch model analyses. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as psychology, sociology, education, business, health, and other social sciences. It contains audio-narrated slides, sample data, syntax files, access to the Shiny_ERMA program, diagnostic quiz questions, data-based activities, curated resources, and a glossary.
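To make the model concrete: the dichotomous Rasch model gives the probability of a correct response as a logistic function of the difference between person ability theta and item difficulty b. The following is a minimal, illustrative Python sketch of that response function and a simulated response matrix; it is not the module's Winsteps or Shiny_ERMA code, and the sample sizes and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def rasch_probability(theta, b):
    """P(X = 1 | theta, b) for the dichotomous Rasch model:
    exp(theta - b) / (1 + exp(theta - b))."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

rng = np.random.default_rng(42)
n_persons, n_items = 500, 20
theta = rng.normal(0.0, 1.0, n_persons)     # person abilities
b = np.linspace(-2.0, 2.0, n_items)         # item difficulties

# Simulate dichotomous responses from the model
p = rasch_probability(theta[:, None], b[None, :])
responses = (rng.uniform(size=p.shape) < p).astype(int)

# Sanity check: easier items (lower b) should have higher proportions correct
print(np.round(responses.mean(axis=0), 2))
```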

2.
Giving students a choice of assessment methods is one approach to developing an inclusive curriculum. However, both staff and students raise concerns about its fairness, often described as its equity. This study investigates their perceptions of the fairness of the procedures and outcomes of this approach to assessment in nine modules in a university setting. Using a tool validated as part of the study, students' views on procedural fairness were gathered (n = 370 students). In addition, seven module co-ordinators were interviewed. A seven-step approach was used to design the assessment choice. The results demonstrated that students were satisfied that their assessment choices were fair in terms of levels of support, feedback, and information and, to a lesser extent, student workload and examples of assessment methods. In exploring fairness of the outcomes, the students' grades were not significantly different between the two sets of choices. However, based on staff interviews, the overall grades were higher than those of previous cohorts and higher than average for current student cohorts in the institution. The discussion highlights some of the complex issues surrounding fairness (equity) in assessment choice, and the paper refers to some practical tools for its implementation.

3.
In this ITEMS module, we introduce the generalized deterministic inputs, noisy "and" gate (G-DINA) model, which is a general framework for specifying, estimating, and evaluating a wide variety of cognitive diagnosis models. The module contains a nontechnical introduction to diagnostic measurement, an introductory overview of the G-DINA model, as well as common special cases, and a review of model-data fit evaluation practices within this framework. We use the flexible GDINA R package, which is available for free within the R environment and provides a user-friendly graphical interface in addition to the code-driven layer. The digital module also contains videos of worked examples, solutions to data activity questions, curated resources, a glossary, and quizzes with diagnostic feedback.
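As a concrete illustration of one common special case, the DINA model treats an examinee as either possessing all attributes an item requires or not, with guessing and slip parameters governing the response probability: P(X = 1) = g + (1 - s - g) * eta, where eta = 1 only if every required attribute is mastered. The sketch below is a minimal Python illustration under assumed toy parameters; it does not reproduce the GDINA R package, and all names and values are hypothetical.

```python
import numpy as np

def dina_probability(alpha, q, guess, slip):
    """DINA item response probabilities.
    alpha: (N, K) binary attribute-mastery profiles
    q:     (J, K) binary Q-matrix (item-attribute requirements)
    guess, slip: (J,) guessing and slip parameters
    """
    # eta[i, j] = 1 iff person i has mastered all attributes item j requires
    eta = np.all(alpha[:, None, :] >= q[None, :, :], axis=2).astype(float)
    return guess[None, :] + (1.0 - slip - guess)[None, :] * eta

alpha = np.array([[1, 1], [1, 0], [0, 0]])   # three examinees, two attributes
q = np.array([[1, 0], [1, 1]])               # item 1 needs A1; item 2 needs both
guess = np.array([0.2, 0.1])
slip = np.array([0.1, 0.15])
print(np.round(dina_probability(alpha, q, guess, slip), 3))
```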

4.
In this digital ITEMS module, Dr. Sue Lottridge, Amy Burkhardt, and Dr. Michelle Boyer provide an overview of automated scoring. Automated scoring is the use of computer algorithms to score unconstrained open-ended test items by mimicking human scoring. The use of automated scoring is increasing in educational assessment programs because it allows scores to be returned more quickly and at lower cost. In the module, they discuss automated scoring from a number of perspectives. First, they discuss the benefits and weaknesses of automated scoring and what psychometricians should know about it. Next, they describe the overall process of automated scoring, moving from data collection to engine training to operational scoring. Then, they describe how automated scoring systems work, including the basic functions around score prediction as well as other flagging methods. Finally, they conclude with a discussion of the specific validity demands around automated scoring and how they align with the larger validity demands around test scores. Two data activities are provided. The first is an interactive activity that allows the user to train and evaluate a simple automated scoring engine. The second is a worked example that examines the impact of rater error on test scores. The digital module contains a link to an interactive web application as well as its R-Shiny code, diagnostic quiz questions, activities, curated resources, and a glossary.
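The pipeline the authors describe (collect human-scored responses, train an engine, then score operationally) can be illustrated with a deliberately simple sketch. The Python example below uses TF-IDF features with ridge regression as a stand-in engine and quadratic weighted kappa for human-machine agreement; it is not the module's interactive engine, the toy responses and 0-3 score range are assumptions, and evaluating on the training data is done only to keep the example self-contained.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.metrics import cohen_kappa_score

# Toy human-scored responses (operationally: thousands of double-scored essays)
texts = [
    "photosynthesis converts light energy into chemical energy in plants",
    "plants use sunlight water and carbon dioxide to make glucose and oxygen",
    "the plant eats the sun",
    "i dont know",
    "light reactions capture energy and the calvin cycle fixes carbon",
    "plants grow in soil",
]
human_scores = np.array([3, 3, 1, 0, 3, 1])

# "Engine training": featurize text, fit a model to mimic human raters
features = TfidfVectorizer().fit_transform(texts)
engine = Ridge(alpha=1.0).fit(features, human_scores)

# "Operational scoring": predict, then round/clip into the rubric's range
predicted = np.clip(np.rint(engine.predict(features)), 0, 3).astype(int)

# Human-machine agreement via quadratic weighted kappa
print(cohen_kappa_score(human_scores, predicted, weights="quadratic"))
```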

5.
Standardizing aspects of assessments has long been recognized as a tactic to help make evaluations of examinees fair. It reduces variation in irrelevant aspects of testing procedures that could advantage some examinees and disadvantage others. However, recent attention to making assessment accessible to a more diverse population of students highlights situations in which making tests identical for all examinees can make a testing procedure less fair: Equivalent surface conditions may not provide equivalent evidence about examinees. Although testing accommodations are by now standard practice in most large-scale testing programmes, for the most part these practices lie outside formal educational measurement theory. This article builds on recent research in universal design for learning (UDL), assessment design, and psychometrics to lay out the rationale for inference that is conditional on matching examinees with principled variations of an assessment so as to reduce construct-irrelevant demands. The present focus is assessment for special populations, but it is argued that the principles apply more broadly.

6.
This paper reviews the literature about peer and self-assessment in university courses from the point of view of their use, and the suitability of their use, in the first year of university study. The paper is divided into three parts. The first part argues that although first-year students are involved in many of the studies that report on the use of peer and self-assessment in higher education, the proportion of these studies that do so is somewhat less than in other year levels. In addition, relatively little of this work directly and explicitly discusses the suitability of peer and self-assessment for students and courses at this year level. The second part of the paper provides an introductory exploration of the relationship between peer and self-assessment, and specific features of first-year assessment, learning and teaching. Three issues relating directly to the suitability of peer and self-assessment in the first year are explored. In the third part, the paper briefly discusses the desirability of implementing peer and self-assessment, in general, before seeking to extend this specifically to the first year. The paper concludes by recommending that greater use can and should be made of peer and self-assessment in the first year of university study.

7.
In this commentary, we summarize some of the main themes of the NRC report and note ways in which the papers by Mislevy and Haertel, Gorin, and Abedi and Gándara address the Panel's recommendations. We then briefly review and offer reflections on each paper. We see much to applaud here and also in the broader effort to build bridges between the cognitive and measurement sciences. However, much work remains, not only in building bridges, but also in educating the next generation of psychometricians about cognition and the next generation of teachers about psychometrics. Until that day, even the best work here will have little impact on the business of large-scale assessment.

8.
Item analysis is an integral part of operational test development and is typically conducted within two popular statistical frameworks: classical test theory (CTT) and item response theory (IRT). In this digital ITEMS module, Hanwook Yoo and Ronald K. Hambleton provide an accessible overview of operational item analysis approaches within these frameworks. They review the different stages of test development and associated item analyses to identify poorly performing items and effective item selection. Moreover, they walk through the computational and interpretational steps for CTT- and IRT-based evaluation statistics using simulated data examples and review various graphical displays such as distractor response curves, item characteristic curves, and item information curves. The digital module contains sample data, Excel sheets with various templates and examples, diagnostic quiz questions, data-based activities, curated resources, and a glossary.
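On the CTT side, the two workhorse statistics in such item analyses are the item p-value (difficulty) and the corrected item-total, or point-biserial, correlation (discrimination). Here is a minimal Python sketch of both computations on simulated responses; it is an illustrative analogue, not the module's Excel templates, and the simulated test and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_persons, n_items = 200, 10
theta = rng.normal(size=(n_persons, 1))        # latent ability
b = np.linspace(-1.5, 1.5, n_items)            # item difficulties
# Simulate scored (0/1) responses from a simple logistic model
X = (rng.uniform(size=(n_persons, n_items))
     < 1.0 / (1.0 + np.exp(-(theta - b)))).astype(int)

p_values = X.mean(axis=0)                      # CTT difficulty (p-values)
total = X.sum(axis=1)
for j in range(n_items):
    rest = total - X[:, j]                     # corrected item-total score
    r = np.corrcoef(X[:, j], rest)[0, 1]       # point-biserial discrimination
    print(f"Item {j + 1:2d}: p = {p_values[j]:.2f}, r = {r:.2f}")
```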

9.
In this digital ITEMS module, Dr. Brian Leventhal and Dr. Allison Ames provide an overview of Monte Carlo simulation studies (MCSS) in item response theory (IRT). MCSS are utilized for a variety of reasons, one of the most compelling being that they can be used when analytic solutions are impractical or nonexistent because they allow researchers to specify and manipulate an array of parameter values and experimental conditions (e.g., sample size, test length, and test characteristics). Dr. Leventhal and Dr. Ames review the conceptual foundation of MCSS in IRT and walk through the processes of simulating total scores as well as item responses using the two-parameter logistic, graded response, and bifactor models. They provide guidance for how to implement MCSS using other item response models and best practices for efficient syntax and executing an MCSS. The digital module contains sample SAS code, diagnostic quiz questions, activities, curated resources, and a glossary.
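The module's worked examples use SAS; as an illustration of the same MCSS logic (specify conditions, replicate within each cell, summarize an outcome), below is a hypothetical Python sketch that simulates two-parameter logistic (2PL) responses under two sample-size conditions and summarizes coefficient alpha per cell. The generating distributions, cell sizes, and choice of outcome are assumptions made for illustration.

```python
import numpy as np

def simulate_2pl(n_persons, a, b, rng):
    """One replication of a Monte Carlo cell: simulate 2PL item responses."""
    theta = rng.normal(0.0, 1.0, n_persons)
    p = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - b[None, :])))
    return (rng.uniform(size=p.shape) < p).astype(int)

rng = np.random.default_rng(2024)
a = rng.lognormal(0.0, 0.3, 30)      # item discriminations
b = rng.normal(0.0, 1.0, 30)         # item difficulties

for n in (250, 1000):                # manipulated condition: sample size
    alphas = []
    for _ in range(100):             # replications per cell
        X = simulate_2pl(n, a, b, rng)
        # Outcome of interest here: coefficient alpha of the simulated test
        k = X.shape[1]
        item_var = X.var(axis=0, ddof=1).sum()
        total_var = X.sum(axis=1).var(ddof=1)
        alphas.append(k / (k - 1) * (1 - item_var / total_var))
    print(f"n = {n:4d}: mean alpha = {np.mean(alphas):.3f}")
```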

10.
Drawing valid inferences from modern measurement models is contingent upon a good fit of the data to the model. Violations of model-data fit have numerous consequences, limiting the usefulness and applicability of the model. As Bayesian estimation is becoming more common, understanding Bayesian approaches for evaluating model-data fit is critical. In this instructional module, Allison Ames and Aaron Myers provide an overview of Posterior Predictive Model Checking (PPMC), the most common Bayesian model-data fit approach. Specifically, they review the conceptual foundation of Bayesian inference as well as PPMC and walk through the computational steps of PPMC using real-life data examples from simple linear regression and item response theory analysis. They provide guidance for how to interpret PPMC results and discuss how to implement PPMC for other models and data. The digital module contains sample data, SAS code, diagnostic quiz questions, data-based activities, curated resources, and a glossary.
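The logic of PPMC (draw parameters from the posterior, generate replicated data, and compare a discrepancy measure on the observed data against its posterior predictive distribution) can be sketched for the simple linear regression case. The Python example below uses the standard noninformative-prior posterior for the regression coefficients and error variance and a max-absolute-residual discrepancy that is sensitive to heavy tails; it is an illustrative analogue under those assumptions, not the module's SAS code.

```python
import numpy as np

rng = np.random.default_rng(7)

# Observed data for a simple linear regression, with heavy-tailed errors
# so that the normal-error model should show misfit
n = 100
x = rng.uniform(-2, 2, n)
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - 2)

def discrepancy(yy, mu):
    """Test quantity sensitive to heavy tails: max absolute residual."""
    return np.max(np.abs(yy - mu))

# PPMC loop: draw (beta, sigma^2) from the posterior (flat prior),
# simulate replicated data, compare realized vs. replicated discrepancies
exceed, n_draws = 0, 2000
for _ in range(n_draws):
    sigma2 = (n - 2) * s2 / rng.chisquare(n - 2)      # scaled inverse-chi^2
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    mu = X @ beta
    y_rep = mu + rng.normal(0.0, np.sqrt(sigma2), n)
    exceed += discrepancy(y_rep, mu) >= discrepancy(y, mu)

# Values near 0 or 1 flag model-data misfit for this discrepancy
print(f"posterior predictive p-value: {exceed / n_draws:.3f}")
```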

11.
This paper is about fairness (equity) in large-scale assessment systems within multicultural societies. It makes the key assumptions that fairness is fundamentally a sociocultural, rather than a technical, issue and that fair assessment cannot be considered in isolation from both the curriculum and the educational opportunities of the students. Equity is defined as a qualitative concern for what is just. This involves, but is not the same as, equality of opportunity and of outcome. In relation to large-scale assessment, four topics are addressed: the nature of the assessment system; recognizing experiences of different groups; cultural diversity; and monitoring group performance. The conclusion is that, while we can never achieve fair assessment, we can make it fairer. At the heart of this improvement process is openness about design, constructs and scoring which brings out into the open the values and biases of the test design process.

12.
In this digital ITEMS module, Dr. Roy Levy describes Bayesian approaches to psychometric modeling. He discusses how Bayesian inference is a mechanism for reasoning in a probability-modeling framework and is well-suited to core problems in educational measurement: reasoning from student performances on an assessment to make inferences about their capabilities more broadly conceived, as well as fitting models to characterize the psychometric properties of tasks. The approach is first developed in the context of estimating a mean and variance of a normal distribution before turning to the context of unidimensional item response theory (IRT) models for dichotomously scored data. Dr. Levy illustrates the process of fitting Bayesian models using the JAGS software facilitated through the R statistical environment. The module is designed to be relevant for students, researchers, and data scientists in various disciplines such as education, psychology, sociology, political science, business, health, and other social sciences. It contains audio-narrated slides, diagnostic quiz questions, and data-based activities with video solutions as well as curated resources and a glossary.
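The module's first example, estimating the mean and variance of a normal distribution, is fit with JAGS through R; as an illustrative analogue in Python, the sketch below implements the same model with a two-step Gibbs sampler under semi-conjugate priors. The prior settings, simulated data, and burn-in length are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
y = rng.normal(5.0, 2.0, 50)          # observed data
n, ybar = len(y), y.mean()

# Semi-conjugate priors: mu ~ Normal(m0, v0), sigma^2 ~ Inverse-Gamma(a0, b0)
m0, v0, a0, b0 = 0.0, 100.0, 0.01, 0.01

mu, sigma2 = ybar, y.var()            # initial values
draws = []
for it in range(6000):
    # Full conditional: mu | sigma^2, y ~ Normal(m_n, v_n)
    v_n = 1.0 / (1.0 / v0 + n / sigma2)
    m_n = v_n * (m0 / v0 + n * ybar / sigma2)
    mu = rng.normal(m_n, np.sqrt(v_n))
    # Full conditional: sigma^2 | mu, y ~ Inverse-Gamma(a_n, b_n)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * np.sum((y - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)
    if it >= 1000:                    # discard burn-in
        draws.append((mu, sigma2))

mu_s, s2_s = np.array(draws).T
print(f"posterior mean of mu: {mu_s.mean():.2f}, of sigma^2: {s2_s.mean():.2f}")
```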

13.
An assessment-oriented design-based research model was applied to existing inquiry-oriented multimedia programs in astronomy, biology, and ecology. Building on emerging situative theories of assessment, the model extends prevailing views of formative assessment for learning by embedding "discursive" formative assessment more directly into the curriculum. Three twenty-hour curricula were designed and aligned to content standards, and three levels of assessments were developed and used to assess and enhance learning for each curriculum. These assessments included three or four informal "activity-oriented" quizzes and discursive formative feedback rubrics supporting collective discourse, a "curriculum-oriented" examination of individual conceptual understanding, and a "standards-oriented" test measuring aggregated achievement of targeted standards. After two design-research cycles, worthwhile scientific argumentation and statistically significant gains were attained for two of the three packages on the exam and test. Achievement gains were comparable to or larger than those of students in comparison classrooms. Many existing innovations could be enhanced and evaluated in this fashion; designing these strategies directly into innovations could have an even greater impact on discourse, understanding, and achievement. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 1240–1270, 2012

14.
This paper explores the views of a group of students who took an oral performance assessment in a first-year mathematics module. Such assessments are unusual for most subjects in the UK, but particularly within the generally homogeneous assessment diet of undergraduate mathematics. The evidence presented here resonates with some, but not all, of the existing literature on oral assessment and suggests that, despite concerns about anxiety and fairness, students see oral assessments as encouraging a focus on understanding, being relatively authentic and reactive to their needs. We argue that, suitably implemented, oral assessment may be a viable assessment method for straddling the 'assessment for' and 'assessment of' learning divide in higher education.

15.
This article reports on the collaboration of six states to study how simulation-based science assessments can become transformative components of multi-level, balanced state science assessment systems. The project studied the psychometric quality, feasibility, and utility of simulation-based science assessments designed to serve formative purposes during a unit and to provide summative evidence of end-of-unit proficiencies. The frameworks of evidence-centered assessment design and model-based learning shaped the specifications for the assessments. The simulations provided the three most common forms of accommodations in state testing programs: audio recording of text, screen magnification, and support for extended time. The SimScientists program at WestEd developed simulation-based, curriculum-embedded, and unit benchmark assessments for two middle school topics, Ecosystems and Force & Motion. These were field-tested in three states. Data included student characteristics, responses to the assessments, cognitive labs, classroom observations, and teacher surveys and interviews. UCLA CRESST conducted an evaluation of the implementation. Feasibility and utility were examined in classroom observations, teacher surveys and interviews, and by the six-state Design Panel. Technical quality data included AAAS reviews of the items' alignment with standards and quality of the science, cognitive labs, and assessment data. Student data were analyzed using multidimensional Item Response Theory (IRT) methods. IRT analyses demonstrated the high psychometric quality (reliability and validity) of the assessments and their discrimination between content knowledge and inquiry practices. Students performed better on the interactive, simulation-based assessments than on the static, conventional items in the posttest. Importantly, gaps between performance of the general population and English language learners and students with disabilities were considerably smaller on the simulation-based assessments than on the posttests. The Design Panel participated in development of two models for integrating science simulations into a balanced state science assessment system. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 363–393, 2012

16.
17.
The purpose of this article is to support members of a student's multidisciplinary team to identify complex factors involved in providing valid classroom-based assessment data, including issues surrounding technology-based assessment for students who are deaf or hard of hearing (DHH). The diversity of this population creates unique challenges in creating guidelines for assessment. An overview of the diversity of DHH students is used to provide a framework for examining current assessment practices, including both effective and ineffective practices. Cognitive and linguistic learning differences and diverse language learning experiences in the population are discussed as they relate to assessment protocols. Paralleling technology-based learning experiences with comparable technology-based assessment experiences is also presented. Recommendations for planning for accessible and meaningful assessment include the use of innovative technologies to align instruction and assessment.

18.
Peer-assessment was used within a negotiated curriculum in a module on training and development at ECTS level 3. The students on the programme were exclusively day-release and all had a major responsibility for the management and delivery of work-based training programmes. Analysis of student evaluations, supplemented by those of university assessors and the external examiner, showed that the learning on the programme was transformative in that it changed the way students perceived their own abilities and their potential to make improvements in their work both as students and trainers. Peer-assessment encouraged critical reflection, helped develop skills of autonomous learning and provided feedback that took account of individual learning needs. Weaker students benefited from the talents of their more able peers. Students planned, monitored and assessed their learning activities in a way that significantly increased motivation and raised academic standards. The learning achieved led to a transformation of personal and professional perspective on the part of the students, leading to greater personal autonomy.

19.
Students with disabilities often take tests under different conditions than their peers do. Testing accommodations, which involve changes to test administration that maintain test content, include extending time limits, presenting written text through auditory means, and taking a test in a private room with fewer distractions. For some students with disabilities, accommodations such as these are necessary for fair assessment; without accommodations, invalid interpretations would be made on the basis of these students' scores. However, when misapplied, accommodations can also diminish fairness, introduce new sources of construct-irrelevant variance, and lead to invalid interpretations of test scores. This module provides a psychometric framework for thinking about accommodations, and then explicates an accommodations decision-making framework that includes a variety of considerations. Problems with current accommodations practices are discussed, along with potential solutions and future directions. The module is accompanied by exercises allowing participants to apply their understanding.

20.
School-based practitioners are often called upon to provide assessment and recommendations for struggling students. These assessments often open doors to specialised services or interventions and provide opportunities for students to build competencies in areas of need. However, these assessments often fail to highlight the abilities of these students and instead focus on areas in need of remediation. The use of a more positive, or strengths-based, approach to working with students is needed. Although strengths-based assessment (SBA) is not a new concept, it is not routinely incorporated into school-based assessment services. This article provides an overview of SBA and its benefits, along with empirically-driven models that support the implementation of SBA in schools, and calls for a renewed focus on understanding students from a strengths-based model. Examples of SBA measures and techniques are included, along with implications for practice for both students and psychologists.
