1.

Validity issues in standard-setting studies

Hans A. Pant André A. Rupp Simon P. Tiffin-Richards Olaf Köller 《Studies in Educational Evaluation》2009,35(2-3):95-101

Standard-setting procedures are a key component within many large-scale educational assessment systems. They are consensual approaches in which committees of experts set cut-scores on continuous proficiency scales, which facilitate communication of proficiency distributions of students to a wide variety of stakeholders. This communicative function makes standard-setting studies a key gateway for validity concerns at the intersection of evidentiary and consequential aspects of score interpretations. This short review paper describes the conceptual and empirical basis of validity arguments for standard-setting procedures in light of recent research on validity theory. It specifically demonstrates how procedural and internal evidence for the validity of standard-setting procedures can be collected to form part of the consequential basis of validity evidence for test use.

2.

Assessment of prior learning in higher education: a review from a validity perspective

Tova Stenlund 《Assessment & Evaluation in Higher Education》2010,35(7):783-797

The process of giving official acknowledgment to formal, informal and non‐formal prior learning is commonly labelled as assessment, accreditation or recognition of prior learning (APL), representing a practice that is expanding in higher education in many countries. This paper focuses specifically on the assessment part of APL, which undoubtedly is central to the whole process, through a review of research in this area and an analysis of the reviewed studies from a validity perspective. The research reviewed (published 1990–2007) is categorised into empirical as well as more theoretically oriented publications, with a quantitative dominance of the latter. According to the validity analysis, a majority of the studies conducted in this area relate to the evidential basis of test interpretation and use, primarily providing theoretical rationales and theories for a variety of practices. The consequential basis of test interpretation and use has not been studied to any great extent, resulting in a lack of both theoretical and empirical studies dealing with this aspect of validity.

3.

Quality standards for new modes of assessment. An exploratory study of the consequential validity of the OverAll Test

Mien Segers Sabine Dierick Filip Dochy 《European Journal of Psychology of Education - EJPE》2001,16(4):569-588

During the past decade, due to societal developments, methods of instruction as well as the assessment of students’ performances have changed to a considerable extent. Two of the elements of this change are the accents on cognitive competencies such as problem solving and on learning in an authentic context. In conjunction with the development of such learning methods, new modes of assessment were implemented. It was expected that this change would have positive feedback effects on learning and teaching. These feedback effects are the central issue of this article. They are discussed in terms of the experiences of the Maastricht School of Economics and Business Administration. This school places the analysis of authentic problems at the core of the curriculum, including the learning process as well as the assessment system. The OverAll Test, a case-based assessment instrument aiming to assess problem solving skills, was implemented as part of this. Different quality issues related to the OverAll Test have been evaluated. This article presents the results of one of the four validity studies conducted: an exploratory study of the consequential validity of the OverAll Test. It starts with an outline of the main features of the new modes of assessment and the OverAll Test as an example. There is then a discussion of how effectively the OverAll Test fits these features as well as the goals and characteristics of problem-based learning. The study of the consequential validity of the OverAll Test is then described in depth. The results of the survey, as well as the results of the semi-structured interviews with staff and students, indicate a friction between the intended characteristics of the learning and assessment environment and the practice of instruction and assessment.

4.

Les tests d’apprentissage comme alternative ou complément aux tests d’intelligence: Un bilan de leur evolution

Jürgen Guthke 《European Journal of Psychology of Education - EJPE》1990,5(2):117-133

For a number of reasons the learning test approach (also known under the expressions «dynamic assessment» or «assessment of learning potential») can be considered as a promising alternative and a complement to conventional procedures of intelligence testing. But learning test procedures have also been criticized for their lack of psychometric standardization, their time-consuming application and for their only modest increase in predictive validity compared to conventional intelligence tests with regard to normal populations. The author traces the history of the learning test concept and critically discusses its main theoretical problems and practical implications. He further presents an overview of the different learning test procedures that have been developed in his research group since the early seventies. One of the main goals of this research is to combine the learning test approach with psychometric standardization, i.e., to maximize individualization of the assessment and pursue the goal of a more qualitative evaluation without losing the necessary objectivity or the possibility to quantify results in order to make interindividual comparisons. Other major concerns are to enhance content validity of learning tests by using better analyzed items and to invest more into research regarding construct validity.

5.

Strategies for Examining the Consequences of Assessment and Accountability Programs

Suzanne Lane Clement A. Stone 《Educational Measurement》2002,21(1):23-30

This article addresses issues in evaluating the consequences of assessment programs that are developed for the purpose of holding schools accountable to state standards. After providing a brief review of research examining consequential evidence, a validation study to obtain consequential evidence for state assessment and accountability programs is proposed. The proposal includes a validity argument, a set of propositions that follow from the validity argument, a delineation of the consequential evidence needed, and a way to model the relationship between performance gains and school, principal, teacher, and student variables.

6.

Testing Injustice: Examining the Consequential Validity of edTPA

Nadia Behizadeh Adrian Neely 《Equity & Excellence in Education》2013,46(3-4):242-264

In this case study, we examine the consequential validity of using edTPA in a social justice-oriented, urban teacher preparation program. According to the developers of edTPA, a primary purpose is to support teacher candidate learning, yet our analysis suggests that edTPA does not support learning when used during student teaching. Our 16 participants, who are primarily teacher candidates of color and many first-generation college students, and who all passed edTPA, unanimously indicated that edTPA increased their mental and financial stress, which they linked to design elements including high stakes, standardization, and external scoring. Participants also critiqued the construct of teaching represented in edTPA, arguing that dispositions and a social justice orientation are missing and that edTPA is more about following procedures than supporting candidate learning. Moreover, edTPA encouraged inequitable practices, including focusing on high-achieving classes and selecting curricula based on scoring procedures instead of student need. Overall, our analysis indicates that there is not strong consequential validity evidence to support the use of edTPA as an assessment during student teaching, particularly in social justice-oriented programs, yet suggests edTPA could be a useful tool if stakes and proceduralism are reduced and scoring is conducted locally.

7.

Using Focus Groups to Examine the Consequential Aspect of Validity

Naomi Chudowsky Peter Behuniak 《Educational Measurement》1998,17(4):28-38

How can focus groups be used to examine issues of consequential validity in large-scale assessment? In relation to a new large-scale assessment, what are teacher concerns, and how do these concerns differ by type of school district? What are the strengths and weaknesses in this approach to looking at consequential validity?

8.

Assessment of teacher competence using video portfolios: Reliability, construct validity, and consequential validity

Wilfried Admiraal Mark Hoeksma Marie-Thérèse van de Kamp Gee van Duin 《Teaching and Teacher Education》2011,27(6):1019-1028

The richness and complexity of video portfolios endanger both the reliability and validity of the assessment of teacher competencies. In a post-graduate teacher education program, the assessment of video portfolios was evaluated for its reliability, construct validity, and consequential validity. Although video portfolios facilitated a reliable and valid assessment of teacher competencies, procedures to improve assessment quality were also revealed and are therefore discussed: more explicit grounding of assessment results in the data, peer debriefing, prolonged engagement with the assessment data, and cross-checking to find confirmatory or counter examples.

9.

Dynamic Assessment of Proficiency for Solving Procedural Knowledge Tasks

《Educational Psychologist》2013,48(3):365-384

Computer-coached practice environments can serve two purposes: to instruct and to assess. Sherlock I is a computer-coached practice environment for teaching avionics troubleshooting skills. Instruction is based on the dynamic assessment of the learner in the context of a troubleshooting problem. A cognitive task analysis of troubleshooting proficiency was used to develop Sherlock's instructional and assessment goals. As a learner works through a problem, Sherlock assesses the quality of his or her decisions and uses that information to provide the level of hint explicitness necessary at particular decision points in the problem. Specific competency building is situated within the troubleshooting context and is sharpened to the extent that satisfies each individual's needs. When a learning impasse is reached, Sherlock generates the appropriate level of feedback to the trainee based on his or her prior performance. Sherlock's hinting structure challenges trainees with learning opportunities that would be just beyond their reach without coaching. This form of dynamic assessment is based on patterns of student performance that reflect the processes of learning rather than simply the products of learning. Sherlock provides an example of how intelligent tutoring systems can change the nature of assessment by addressing dimensions of proficiency.

10.

Comparing two forms of dynamic assessment and traditional assessment of preschool phonological awareness

Thatcher Kantor P Wagner RK Torgesen JK Rashotte CA 《Journal of learning disabilities》2011,44(4):313-321

The goal of the current study was to compare two forms of dynamic assessment and standard assessment of preschool children's phonological awareness. The first form of dynamic assessment was a form of scaffolding in which item formats were modified in response to an error so as to make the task easier or more explicit. The second form of dynamic assessment was direct instruction of the phonological awareness tasks. The results indicate that preschool children's phonological awareness can be assessed using standard assessment procedures, provided the items require processing units larger than the individual phoneme. No advantage was found in reliability or validity for either dynamic assessment condition relative to the standard assessment condition. Dynamic assessment does not appear to improve reliability or validity of phonological awareness assessments when preschool children are given tasks that they can perform using standard administration procedures.

11.

Education as a mode of existence: A Latourian inquiry into assessment validity in higher education

Jonathan Tummons 《Educational Philosophy and Theory》2013,45(1):45-54

Within professional higher education, the construct of assessment validity is used to make assumptions about the extent to which students are able to replicate in professional practice what they have learned during their studies through the provision of authentic simulated opportunities to practice. Drawing on the work of Bruno Latour, this article argues that the conceptualisation as well as use of the idea of assessment validity in theorising the assessment of simulation-based learning in professional courses, in order to predict the future performance of the student, constitutes a category mistake that consequently makes claims for assessment validity which are unfounded. The article goes on to explore ways by which ethnographers of education might use other elements of Latour’s work in order to generate rich, problematising accounts of educational practice.

12.

Assessing Learning in a Technology-Supported Genetics Environment: Evidential and Systemic Validity Issues

《Educational Assessment》2013,18(3):155-196

To evaluate student learning in a computer-supported environment known as GenScope(tm), we developed a system for assessing students' reasoning proficiency in introductory genetics. A critical aspect of the development effort concerned the validity of this assessment system. We used a variety of methods to address traditional evidential validity concerns as well as more contemporary concerns with consequential and systemic validity. Specifically, we examined whether or not our assessment system helped students develop the understanding it was designed to assess. Our inquiry revealed strong evidential validity but only limited consequential validity. In response, we developed a set of formative assessments designed to scaffold student assessment performance without compromising the evidential validity of the assessment system. In addition to documenting and enhancing the validity of the system, these efforts demonstrate the utility of newer interpretive models of validity inquiry and the value of Rasch measurement tools for conducting such inquiry.

13.

From Perception to Practice: The Impact of Teachers' Scoring Experience on Performance-Based Instruction and Classroom Assessment

《Educational Assessment》2013,18(4):257-290

Proponents of performance assessment often cite its potential to drive instructional reform. This has been the case with the Maryland School Performance Assessment Program, a battery of performance tasks administered annually to students in Grades 3, 5, and 8. Particularly because student responses to these tasks are scored in-state each summer by nearly 700 teachers, the scoring experience is regarded as a critical opportunity for professional development in support of such reform. As one means of examining the consequential validity of performance assessment, this study investigates the impact of the scoring experience on teachers' instructional and classroom assessment practice. Although teachers almost unanimously endorse the value of scoring, our analysis of interview data, questionnaires, classroom observation, and classroom artifacts demonstrates that their appropriation of performance-based instruction may be superficial and incomplete. From this study of a cadre of teacher-scorers comes a tempering of claims for the scoring experience as well as insights into the kinds of support needed to engender real and sustained changes in teaching and learning.

14.

Educational and Employment Testing: Changing Concepts in Measurement and Policy

Wayne J. Camara Dianne C. Brown 《Educational Measurement》1995,14(1):5-11

How will the expansion of the concept of construct validity affect validation practice in employment testing? How does the need for consequential validity differ in educational and employment testing? How do the research bases differ for performance assessment in these settings? Are there parallel trends in policies for test use in education and industry?

15.

The Application of Dynamic Assessment in People Communicating at a Prelinguistic Level: A descriptive review of the literature

Erika Boers Marleen J. Janssen Alexander E. M. G. Minnaert Wied A. J. J. M. Ruijssenaars 《International Journal of Disability, Development & Education》2013,60(2):119-145

Many people with severe disabilities face difficulties communicating with their communication partners and rely primarily on prelinguistic communication. It is accepted that dynamic assessment can play an important role in improving communication and in measuring a person’s ability to learn new communicative skills. Less is known, however, about the application of dynamic assessment in the case of those who communicate at a prelinguistic level. The present article reviewed dynamic assessment procedures that addressed communication abilities in people communicating at a prelinguistic level and young children who communicate using speech, with the aim of identifying key elements of dynamic assessment for persons communicating at a prelinguistic level. The results indicated the need for the identification of contextual variables that support communicative competence, teaching communication partners new skills, and a procedure that is highly individualised. Further research on the validity and reliability of these dynamic assessments is strongly recommended.

16.

Developing Procedures for Implementing Peer Assessment in Large Classes Using an Action Research Process

Roy Ballantyne Karen Hughes Aliisa Mylonas 《Assessment & Evaluation in Higher Education》2002,27(5):427-441

Peer assessment has been used successfully in higher education, with important benefits reported in terms of student learning. However, most of the literature has focused on its use with small groups of students taught by staff who are committed to the peer assessment process. This paper reports the development of peer assessment procedures for use in large classes, using a cyclical process of action, reflection and refined action. The project was carried out in three phases and after each phase changes were made to the procedures in response to student and staff feedback. The development of procedures is discussed in relation to assessment tasks, assessment criteria, anonymity, procedural guidelines, distribution systems, marking procedures and tutor remarking. Although there are specific difficulties associated with the use of peer assessment in large classes, this study suggests that these are outweighed by the learning benefits for students. Based on the findings of this study, recommendations are made for ways in which peer assessment might be successfully applied in large classes.

17.

Developing and assessing beginning teacher effectiveness: the potential of performance assessments

Linda Darling-Hammond Stephen P. Newton Ruth Chung Wei 《Educational Assessment, Evaluation and Accountability》2013,25(3):179-204

The Performance Assessment for California Teachers (PACT) is an authentic tool for evaluating prospective teachers by examining their abilities to plan, teach, assess, and reflect on instruction in actual classroom practice. The PACT seeks both to measure and develop teacher effectiveness, and this study of its predictive and consequential validity provides information on how well it achieves these goals. The research finds that teacher candidates’ PACT scores are significant predictors of their later teaching effectiveness as measured by their students’ achievement gains in both English language arts (ELA) and mathematics. Several subscales of the PACT are also influential in predicting later effectiveness: These include planning, assessment, and academic language development in ELA, and assessment and reflection in mathematics. In addition, large majorities of PACT candidates report that they acquired additional knowledge and skills for teaching by virtue of completing the assessment. Candidates’ feelings that they learned from the assessment were the strongest when they also felt well-supported by their program in learning to teach and in completing the assessment process.

18.

Linking assessment to undergraduate student capabilities through portfolio examination

Anthony J. O'Sullivan Peter Harris Chris S. Hughes Susan M. Toohey Chinthaka Balasooriya Gary Velan 《Assessment & Evaluation in Higher Education》2012,37(3):379-391

Portfolios are an established method of assessment, although concerns do exist around their validity for capabilities such as reflection and self‐direction. This article describes an e‐portfolio which closely aligns learning and reflection to graduate capabilities, incorporating features that address concerns about portfolios. Students are required to complete assessments linked to graduate capabilities. In Year 3, a portfolio review occurs (205–248 students per year), focusing on students' grades and feedback from assessments and a reflective essay is submitted. In the essay, students reflect on their progress, identify areas of weakness and detail plans for improvement. Progress in each capability is summatively graded against specific criteria and feedback is provided. Students progressively accumulate evidence of learning linked to the graduate capabilities. The provision of sufficient structure prevents evasion of areas of weakness. Importantly, the equal weighting given to all graduate capabilities emphasises that competence in all areas is required. The requirement for a degree of self‐direction and reflection in all assessments promotes regular review of progress. This e‐portfolio explicitly links graduate outcomes with assessment in order to drive learning. Further research is required to evaluate acceptability to students, as well as the efficacy of portfolios in developing reflective practice and self‐directed learning.

19.

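An Empirical Study of the Validity of Formative Assessment in College English

沈梅英 (Shen Meiying) 《湖北广播电视大学学报》 (Journal of Hubei Radio and Television University) 2010,30(6):128-129

Classroom formative assessment refers to assessment that promotes the healthy development of language teaching through timely and effective feedback during the teaching process. Whether classroom formative assessment can serve as an effective complement to summative standardized testing depends largely on the validity of the assessment. An analysis of the content validity, construct validity, and concurrent validity of classroom formative assessment in college English shows that it can provide more comprehensive and more meaningful information for describing students' learning behavior, ability development, and progress in achievement, thereby helping to achieve teaching objectives.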

20.

Learning analytics application to examine validity and generalizability of game-based assessment for spatial reasoning

Yoon Jeon Kim Mariah A. Knowles Jennifer Scianna Grace Lin José A. Ruipérez-Valiente 《British journal of educational technology : journal of the Council for Educational Technology》2023,54(1):355-372

Game-based assessment (GBA), a specific application of games for learning, has been recognized as an alternative form of assessment. While there is a substantive body of literature that supports the educational benefits of GBA, limited work investigates the validity and generalizability of such systems. In this paper, we describe applications of learning analytics methods to provide evidence for psychometric qualities of a digital GBA called Shadowspect, particularly to what extent Shadowspect is a robust assessment tool for middle school students' spatial reasoning skills. Our findings indicate that Shadowspect is a valid assessment for spatial reasoning skills, and it has comparable precision for both male and female students. In addition, students' enjoyment of the game is positively related to their overall competency as measured by the game regardless of the level of their existing spatial reasoning skills.

Practitioner notes

What is already known about this topic:

Digital games can be a powerful context to support and assess student learning.
Games as assessments need to meet certain psychometric qualities such as validity and generalizability.
Learning analytics provide useful ways to establish assessment models for educational games, as well as to investigate their psychometric qualities.

What this paper adds:

How a digital game can be coupled with learning analytics practices to assess spatial reasoning skills.
How to evaluate psychometric qualities of game-based assessment using learning analytics techniques.
Investigation of validity and generalizability of game-based assessment for spatial reasoning skills and the interplay of the game-based assessment with enjoyment.

Implications for practice and/or policy:

Game-based assessments that incorporate learning analytics can be used as an alternative to pencil-and-paper tests to measure cognitive skills such as spatial reasoning.
More training and assessment of spatial reasoning embedded in games can motivate students who might not be on the STEM tracks, thus broadening participation in STEM.
Game-based learning and assessment researchers should consider possible factors that affect how certain populations of students enjoy educational games, so it does not further marginalize specific student populations.