The World Bank’s Human Capital Index (HCI) aims to provide new information regarding future productivity of each country’s workforce, by synchronizing available International Large-Scale Assessment (ILSA) and regional test program results. Drawing on the literature on ILSA participation, this study questions the problematic nature of this approach and revisits the comparability issue of ILSA results. We find that a score penalty is imposed on education systems depending on which ILSA or regional test program they choose to take part in. In particular, our results show that (i) test-overlap systems used in the score synchronization procedure are systematically different from systems that only choose to participate in one ILSA or exclusively in regional tests, (ii) the inter-test score exchange rate is volatile due to sampling design and cohort effects, (iii) test participation type alone accounts for about 57.8 percent of the variation in synchronized scores, and the score penalty is especially salient for systems that exclusively participate in regional test programs, the majority of which are low-income and lower-middle-income countries. Findings in this study show how various intra- and extrapolations to compensate for missing data in effect introduce large score penalties for systems that either did not participate or only partially participated in ILSAs. Finally, this study contributes to research on reasons for participation in ILSAs and the global rise of test-based accountability reform, under which the World Bank’s new HCI may be seen as a tool to incentivize participation in ILSAs by penalizing those governments that have chosen alternative, non-standardized paths for measuring learning outcomes of students.

One of the overarching goals of international large-scale assessments (ILSA) is to inform public discourse about the quality of education in different countries. To fulfil this function, the Organisation for Economic Co-operation and Development (OECD), for example, raises awareness of the Program for International Student Assessment (PISA) results through different forms of traditional and social media (e.g. press releases and other activities under the slogan PISA Day). Scholars have responded to the rapid growth of ILSA by examining public discourse through newspaper articles, policy documents, and other outlets. However, we know very little about whether and to what extent the general public is actually affected by PISA results. In order to address this gap, this study uses data regarding public trust in education from the 2011 wave of the International Social Survey Program (ISSP). Drawing on survey data from 30 countries and Hierarchical Linear Models (HLM), the study shows that PISA rankings have a significant effect on public perceptions. We find that in high performing countries the general public expresses higher levels of confidence in the education system. We discuss these patterns in the context of growing politicisation of education policy making and the use of ILSA as evidence.

The present paper aims to discuss how data from international large-scale assessments (ILSAs) can be utilized and combined, even with other existing data sources, in order to monitor educational outcomes and study the effectiveness of educational systems. We consider different purposes of linking data, namely, extending outcomes measures, analyzing differences over time or across cohorts, and supplementing context information. These linking strategies are illustrated by a non-exhaustive selection of studies that exploited ILSAs to investigate a wide range of educational topics. We conclude that the main contribution of ILSA to educational research lies in the ways they facilitate analyses of educational policy and policy-related issues at the institutional level by means of cross-country analyses. However, the scope of these studies also covers high-quality data on lower levels of the educational system.

In traditional feeling-of-knowing procedures, participants make judgments on unrecalled items only (e.g. Hart 1965). However, many researchers elicit feeling-of-knowing judgments (FOKs) on all items. When FOKs are made on all items, participants may use recall as a basis for judgments, leading to higher magnitude judgments for recalled items, but causing a relative floor effect for judgments for unrecalled items. We suspected that resolution (relative accuracy) would be better when FOKs are made on all items than when they are made on unrecalled items only. We examined the issue by comparing across studies, reanalyzing data from another experiment, and by conducting an original experiment. In the literature review, we included 83 conditions across 52 studies. We found that feeling-of-knowing judgments made on all items showed higher resolution than feeling-of-knowing judgments made on unrecalled items. This was replicated in the reanalysis of existing data of a single study that used both methods. In the original experiment, we collected feeling-of-knowing judgments for general-information questions. The experiment confirmed that resolution for predicting recognition was higher when feeling-of-knowing judgments were made on all items than when they were made only on unrecalled items. We discuss both methodological and theoretical implications of these data.

This paper examines whether, to what extent, and how international large-scale assessments (ILSAs) have influenced education policy-making at the national level. Based on an exploratory review of the research and policy literature on ILSAs and two surveys administered to educational policy experts, researchers, policymakers, and educators, our research found that ILSAs, with their multiple and ambiguous uses, increasingly function as solutions in search of the right problem – that is, they appear to be used as tools to legitimize educational reforms. The survey results pointed to a growing perception among stakeholders that ILSAs are having an effect on national educational policies, with 38% of respondents stating that ILSAs were generally misused in national policy contexts. However, while the ILSA literature indicates that these assessments are having some influence, there is little evidence that any positive or negative causal relationship exists between ILSA participation and the implementation of education reforms. Perhaps the most significant change associated with the use of ILSAs in the literature reviewed is the way in which new conditions for educational comparison have been made possible at the national, regional, and global levels.

International large-scale student assessments (ILSAs) in education represent a valuable source of information for policy-makers, not only on student achievements, but also on their relationship with different contextual factors. The results are partly described in the official studies’ reports; more can be derived from the publicly released data sets. However, league tables are often the only evidence used in policy debates and decisions on education. Indeed, the comparison of student achievement across the participating educational systems is a legitimate proxy for estimating countries’ development and productivity, but the use of league tables more often turns into ‘horse-ranking’, ignoring the contexts of teaching and learning. This is often supported by the media, turning the use of results into their abuse. The purpose of this paper is to discuss the use and misuse of league tables in reporting ILSA results, vs. the use of data for in-depth analysis in order to make informed decisions.

The media analysis is situated in the larger body of studies that explore the varied reasons why different policy actors advocate for international large-scale student assessments (ILSAs) and adds to the research on the fast advance of the global education industry. The analysis of The Economist, Financial Times, and Wall Street Journal covers publications on ‘PISA’, ‘TIMSS’, and related search items over the period 1996–2016. The three media outlets vary in terms of ILSA reporting. The Economist and Financial Times tend to focus on PISA, whereas the Wall Street Journal pays greater attention to TIMSS than PISA. The content analysis of 59 articles yields interesting results about how the business-oriented readership of the three media outlets frames public education and why it sees education as a profitable business opportunity. The three most common narratives, reflecting the business logic, are the following: (i) public education is in crisis; (ii) there is no correlation between spending and education outcome; and (iii) school accountability, teacher performance, and decentralisation represent the most effective policies to improve the quality of education. Drawing on these three common narratives, the financial media outlets present a particular vision of how to improve education; a vision in which the private sector is supposed to play a major role.

Conclusion: In three-quarters of the student-item pairs, the multiple-choice items correctly identified the use of adequate or inadequate strategies, so on a general level the items might be thought to have performed satisfactorily. However, as diagnostic tools they were generally inadequate, and this fact points to the desirability of using the student-oriented procedure for constructing such items. In this procedure, student understanding of the domain of knowledge is probed through interviews, and distractors are designed to reflect the dominant types of misconceptions. The importance of the school-related context was something which emerged during the interviews and, in the light of the discrepancy between student understanding as revealed through school tests and interview studies, this is an area to which more attention needs to be given.

Increasingly, tests are being translated and adapted into different languages. Differential item functioning (DIF) analyses are often used to identify non-equivalent items across language groups. However, few studies have focused on understanding why some translated items produce DIF. The purpose of the current study is to identify sources of differential item and bundle functioning on translated achievement tests using substantive and statistical analyses. A substantive analysis of existing DIF items was conducted by an 11-member committee of testing specialists. In their review, four sources of translation DIF were identified. Two certified translators used these four sources to categorize a new set of DIF items from Grade 6 and 9 Mathematics and Social Studies Achievement Tests. Each item was associated with a specific source of translation DIF and each item was anticipated to favor a specific group of examinees. Then, a statistical analysis was conducted on the items in each category using SIBTEST. The translators sorted the mathematics DIF items into three sources, and they correctly predicted the group that would be favored for seven of the eight items or bundles of items across two grade levels. The translators sorted the social studies DIF items into four sources, and they correctly predicted the group that would be favored for eight of the 13 items or bundles of items across two grade levels. The majority of items in mathematics and social studies were associated with differences in the words, expressions, or sentence structure of items that are not inherent to the language and/or culture. By combining substantive and statistical DIF analyses, researchers can study the sources of DIF and create a body of confirmed DIF hypotheses that may be used to develop guidelines and test construction principles for reducing DIF on translated tests.

A questionnaire used in student evaluations of interdisciplinary courses during six semesters contained two Likert items stated in a direct negative mode which were embedded in a questionnaire (14–18 items) in which the remaining items were phrased in a direct positive mode. In the seventh semester and thereafter, the two negative items were restated as direct positive stems. Item‐analysis demonstrated that in the direct negative mode, the two items had low item‐to‐total correlations and that the internal consistency reliability of the sum score could be improved by eliminating the two negatively phrased items. Also, the two negatively worded items defined a separate factor. After they were reworded into a direct positive mode, these two items showed markedly improved item‐to‐total correlations. Moreover, the unique factor disappeared, which suggests that it was a methodological artefact probably attributable to respondent carelessness. Including a few negative items in an otherwise positively stated questionnaire leads to ambiguity of results rather than controlling for response sets. We therefore recommend against the practice.

Sjödahl, L. 1974. Number of Judges when Scaling Attitude Items. Scand. J. educ. Res. 18, 183‐197. When constructing attitude scales for measuring patient‐centering among student nurses the author has carried out a series of studies of the method of attitude scale formation. The article presents one of several methodological studies published by the same author in Ped. psyk. problem (nr. 184). Scale values, interquartile range, interval position and the selection of items for the final scale are studied as dependent variables when varying the size of the judging group. The results show that correlations between series of scale values from judging groups of varying sizes can be very high but that we can still get quite a different selection of items for the final scale, depending upon which judging group we use in the scaling procedure. The stability of the interval placements of the statements is shown to vary with the intervals along the scale.

This study was designed to examine the following central Vygotskian hypotheses about the functions of preschool children's private speech: (1) that private speech facilitates the transition from collaborative to independent task performance, and (2) that children's use of private speech is conducive to task success. Age-related changes in children's use of private speech were also examined. Forty preschoolers, ranging in age from three to five, completed a selective attention task with scaffolded assistance given from an experimenter when needed. In an effort to overcome several methodological limitations found in previous research, a new microgenetic method of analyzing speech-performance relations based on assigning task items to discrete categories reflecting six possible co-occurrences between private speech (item-relevant speech, item-irrelevant speech, silence) and performance (success, failure) was introduced. Results were that (1) item-relevant speech was used more often during successful than during failed items while the opposite was true for item-irrelevant speech; (2) children were more likely to use private speech on successful items after scaffolding than they were on similar items not following scaffolding; (3) after scaffolding, children were more likely to succeed on the next item if they talked to themselves than if they were silent; and (4) hypothesized curvilinear, age-related patterns in children's item-relevant private speech and silence were found, but only when analyzing speech during successful items. Implications of this research for preschool teachers and parents are discussed.

A manipulation of the instructions students received prior to completing the 7-item Endeavor Instructional Rating card differentially affected their ratings on two types of items. Specifically, when students were led to believe their ratings would have a strong impact on the instructor's career, they tended to be more lenient on items measuring rapport (i.e., the affective domain); this same effect was not observed for items measuring pedagogical skill (i.e., the cognitive domain). The different items on our instructional rating instrument appear to be measuring different things. One implication of this observation is that the inconsistent findings reported in past research on student ratings of instruction may be due to the differential mix of items from one instrument to another. When instructors are compared on ratings given them by students, unbiased interpretation requires that the multidimensional nature of teaching (and of the rating instrument) be considered.

Ji Liu. Oxford Review of Education, 2019, 45(3): 315-332.
This study explores the multidimensionality of engagements with international large-scale standardised assessments (ILSAs). The objective is to understand how different policy actors—government, media, and citizens—rationalise, report, and perceive China’s PISA participation. First, government archive analysis traces a decade of documents (2005–2015), and the findings show that Shanghai’s initial participation in PISA was rationalised as a policy experiment for learning Western ideas of education governance. Second, media content analysis of two major news outlets indicates that media framing of PISA participation was strategic on timing, intensity, and tone. Third, a public opinion survey yields results which show that low public knowledge of Shanghai’s PISA participation in 2012 is prevalent. Drawing on these findings, this study investigates how the ILSA movement, exemplified by PISA, engages different levels of stakeholders in China.

This study analyzed children’s use of mental computation strategies and the standard algorithm on multi-digit subtractions. Fifty-eight Flemish 4th graders of varying mathematical achievement level were individually offered subtractions that either stimulated the use of mental computation strategies or the standard algorithm in one choice and two no-choice conditions. In the choice condition, children could apply their preferential strategy on each item; in the no-choice conditions, they had to solve all items with mental computation and the standard algorithm, respectively. Results revealed that children of all achievement levels applied the standard algorithm remarkably frequently and efficiently, even on subtractions that were intended to evoke mental computation. Moreover, children did not fit their strategy choices to the numerical characteristics of the items, but high and above-average achieving children based their strategy choices on their individual mastery of the different strategies. We discuss the theoretical, methodological, and instructional implications of these results.

The Arnett Caregiver Interaction Scale (CIS) has been widely used in research studies to measure the quality of caregiver–child interactions. The scale was modeled on a well-established theory of parenting, but there are few psychometric studies of its validity. We applied factor analyses and item response theory methods to assess the psychometric properties of the Arnett CIS in a national sample of toddlers in home-based care and preschoolers in center-based care from the Early Childhood Longitudinal Study-Birth Cohort. We found that a bifactor structure (one common factor and a second set of specific factors) best fits the data. In the Arnett CIS, the bifactor model distinguishes a common substantive dimension from two methodological dimensions (for positively and negatively oriented items). Despite the good fit of this model, the items are skewed (most teachers/caregivers display positive interactions with children) and, as a result, the Arnett CIS is not well suited to distinguish between caregivers who are “highly” versus “moderately” positive in their interactions with children, according to the items on the scale. Regression-adjusted associations between the Arnett CIS and child outcomes are small, especially for preschoolers in centers. We encourage future scale development work on measures of child care quality by early childhood scholars.

This paper examines the role of the microgenetic method in science education. The microgenetic method is a technique for exploring the progression of learning in detail through repeated, high-frequency observations of a learner’s ‘performance’ in some activity. Existing microgenetic studies in science education are analysed. This leads to an examination of five significant methodological issues in microgenetic research. Firstly, qualitative and/or quantitative approaches to data collection and analysis within the microgenetic approach are considered and a case is made for the appropriateness of qualitative microgenetic research. Secondly, it is argued that researchers may define static intervals, periods within which (for methodological purposes) change is assumed not to occur, when reporting microgenetic studies. Thirdly, researchers should consider providing justifications for their choice of sampling rate with reference to the rate of change of the phenomenon they are studying. Fourthly, the difficulty of distinguishing conceptual change from the existence of multiple understandings is highlighted. Finally, the nature of sequences of repeated measures in microgenetic studies is considered. It is argued that different methodological approaches are suitable for microgenetic studies of different phenomena and a list of guidelines for the use of the microgenetic method in small-scale, qualitatively analysed studies in science education is proposed.

In this article we draw on data from a completed project entitled Why Do Women’s Studies? involving five English universities. However, the data reported here focus on a single institution. The data were collected through questionnaires which combined quantitative and qualitative questions, and we have the views of three distinct groups of students: students taking women’s studies as a degree; students taking other degrees but including women’s studies modules; and students with no experience of women’s studies. After detailing our method and reflecting on some methodological issues, we present and debate our data, which show that although many of the conventional stereotypes regarding women’s studies remain in common discourses, they seem to be agreed with less than they are reported to have been heard. Yet the power of these discourses remains a danger to women’s studies, as evidenced by its demise as an undergraduate course in many English institutions.

In recent years, more and more international comparative research has been conducted in internationally and geographically spread project teams and international research networks, and comparative research has become a fundamentally collaborative effort. Accordingly, research in such projects has to cope with a higher level of methodological complexity than non‐comparative research as well as with a particular sociocultural complexity. This complexity can have an influence on the research process and therefore on the quality and validity of the results, an issue that has so far not been discussed much, either in Higher Education research or beyond. Thus, this article refers to studies that provide empirical insights into comparative collaborative research teams and illuminates why international collaboration in comparative research projects is both a source of better solutions and of amplified complications and how they are interrelated. On this basis it provides a conceptual reflection and delineates dimensions of task‐related, methodological complexity and team diversity. While comparative research has specific methodological challenges that can be alleviated by international team collaboration, collaborative research has particular social challenges that can be aggravated in comparative research. The conclusion makes propositions for further analyses, discusses lessons for comparative Higher Education research and sets out implications for its institutional development.

This study explored methodological challenges related to eliciting and assessing teachers’ beliefs about teaching cancer education. We aimed to develop reliable belief scales, a methodological innovation in the context of the theory of planned behaviour (TPB), as single belief items are typically used as predictors for direct measures. The expectancy-value product proved useful for identifying categories of teachers’ beliefs about cancer education. Six reliable belief scales emerged, for example, control beliefs about external emotional and non-emotional inhibitors and internal facilitators. We also discuss methodological issues related to eliciting beliefs and forming reliable belief scales.
