期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Validity and reliability testing of the Chinese (mainland) version of the 39-item Parkinson’s Disease Questionnaire (PDQ-39)

Wei Luo Xiao-hong Gui Bo Wang Wen-ying Zhang Zhi-yuan Ouyang Yi Guo Bao-rong Zhang Mei-ping Ding 《Journal of Zhejiang University. Science. B》2010,11(7):531-538

The 39-item Parkinson’s Disease Questionnaire (PDQ-39) has been tested in many languages, but not in Chinese mainland. We aimed to assess the Chinese (mainland) version of the PDQ-39. Seventy-one subjects with Parkinson’s disease (PD) completed the PDQ-39 and the Medical Outcomes Study 36-item Short Form Health Survey (SF-36). All subjects were retested with the PDQ-39 a week later. The united Parkinson’s disease rating scale (UPDRS) and the Hoehn and Yahr (H &; Y) scale were also used to evaluate the subjects. Reliability was assessed by Cronbach’s α and intra-class correlation coefficient (ICC). Validity was examined in terms of agreement with SF-36, UPDRS, and H &; Y scales. The Chinese (mainland) version of the PDQ-39 demonstrated acceptable reliability (Cronbach’s α: 0.84–0.88; ICC: 0.56–0.82). The item-total correlations (0.33–0.88) and scaling success rates (77.56%) indicated satisfactory convergent and discriminant validity of the PDQ-39 items. The correlations between related constructs of the PDQ-39 and UPDRS (r=0.44–0.68) and between those of the PDQ-39 and SF-36 (r=(−0.46)-(−0.69)) were all statistically significant (P<0.01). Except for stigma, cognitions, and bodily discomfort, all other dimensions of the PDQ-39 significantly discriminated patients at different H &; Y stages indicated by the H &; Y scale. Although our observations indicate that some problematic subscales of this version of the PDQ-39 could be improved upon, this study suggests acceptable reliability and validity of the Chinese (mainland) version of the PDQ-39. 相似文献

2.

Development of Items for a Pedagogical Content Knowledge Test Based on Empirical Analysis of Pupils' Errors

Melanie Jüttner Birgit J. Neuhaus 《International Journal of Science Education》2013,35(7):1125-1143

In view of the lack of instruments for measuring biology teachers' pedagogical content knowledge (PCK), this article reports on a study about the development of PCK items for measuring teachers' knowledge of pupils' errors and ways for dealing with them. This study investigated 9th and 10th grade German pupils' (n?=?461) drawings in an achievement test about the knee-jerk in biology, which were analysed by using the inductive qualitative analysis of their content. The empirical data were used for the development of the items in the PCK test. The validation of the items was determined with think-aloud interviews of German secondary school teachers (n?=?5). If the item was determined, the reliability was tested by the results of German secondary school biology teachers (n?=?65) who took the PCK test. The results indicated that these items are satisfactorily reliable (Cronbach's alpha values ranged from 0.60 to 0.65). We suggest a larger sample size and American biology teachers be used in our further studies. The findings of this study about teachers' professional knowledge from the PCK test could provide new information about the influence of teachers' knowledge on their pupils' understanding of biology and their possible errors in learning biology. 相似文献

3.

Digital ITEMS Module 1: Reliability in Classical Test Theory

下载免费PDF全文

Charlie Lewis Michael Chajewski André A. Rupp 《Educational Measurement》2018,37(2):71-72

In this ITEMS module, we provide a two‐part introduction to the topic of reliability from the perspective of classical test theory (CTT). In the first part, which is directed primarily at beginning learners, we review and build on the content presented in the original didactic ITEMS article by Traub and Rowley (1991). Specifically, we discuss the notion of reliability as an intuitive everyday concept to lay the foundation for its formalization as a reliability coefficient via the basic CTT model. We then walk through the step‐by‐step computation of key reliability indices and discuss the data collection conditions under which each is most suitable. In the second part, which is directed primarily at intermediary learners, we present a distribution‐centered perspective on the same content. We discuss the associated assumptions of various CTT models ranging from parallel to congeneric, and review how these affect the choice of reliability statistics. Throughout the module, we use a customized Excel workbook with sample data and basic data manipulation functionalities to illustrate the computation of individual statistics and to allow for structured independent exploration. In addition, we provide quiz questions with diagnostic feedback as well as short videos that walk through sample exercises within the workbook. 相似文献

4.

How Well Does the Sum Score Summarize the Test? Summability as a Measure of Internal Consistency

下载免费PDF全文

J. J. Goeman N. H. De Jong 《Educational Measurement》2018,37(2):54-63

Many researchers use Cronbach's alpha to demonstrate internal consistency, even though it has been shown numerous times that Cronbach's alpha is not suitable for this. Because the intention of questionnaire and test constructers is to summarize the test by its overall sum score, we advocate summability, which we define as the proportion of total test variation that is explained by the sum score. This measure is closely related to Loevinger's H. The mathematical derivation of summability as a measure of explained variation is given for both scale and dichotomously scored items. Using computer simulations, we show that summability performs adequately and we apply it to an existing productive vocabulary test. An open‐source tool to easily calculate summability is provided online ( https://sites.google.com/view/summability ). 相似文献

5.

A Measure of Counselor Competency

Karen Eriksen Garrett McAuliffe 《Counselor Education & Supervision》2003,43(2):120-133

Counselor educators need to be able to demonstrate their effectiveness in training new counselors; however, currently few valid or reliable measures exist for assessing educators' impact. The authors describe the development of such an instrument, the Counseling Skills Scale. They began by revising an existing scale and then they solicited feedback from experts and a focus group. They used the instrument to compare beginning counselors‐in‐training with those who had completed a counseling skills course. Finally, they conducted an item analysis. A paired t test showed significant improvements in counseling skills (t = 4.51, p < .000) from pretest to posttest. Cronbach's alpha showed internal consistency to be .90. 相似文献

6.

Easier Said Than Done: Rejoinder on Sijtsma and on Green and Yang

Ernest C. Davenport Mark L. Davison Pey‐Yan Liou Quintin U. Love 《Educational Measurement》2016,35(1):6-10

The main points of Sijtsma and Green and Yang in Educational Measurement: Issues and Practice (34, 4) are that reliability, internal consistency, and unidimensionality are distinct and that Cronbach's alpha may be problematic. Neither of these assertions are at odds with Davenport, Davison, Liou, and Love in the same issue. However, many authors in the testing community mention these terms not only together, but sometimes as if they are synonymous. Moreover, Cronbach's coefficient alpha is very popular as an index of reliability. Thus, articles discussing alpha are not only appropriate, but necessary. Our concerns are the same as formed the genesis of prior (2009) articles by these same authors, Sijtsma and Green and Yang. This rejoinder also makes comments about item parcels when tests are multidimensional and about factor analytic approaches to assessing reliability. 相似文献

7.

Reliability,Dimensionality, and Internal Consistency as Defined by Cronbach: Distinct Albeit Related Concepts

下载免费PDF全文

Ernest C. Davenport Mark L. Davison Pey‐Yan Liou Quintin U. Love 《Educational Measurement》2015,34(4):4-9

This article uses definitions provided by Cronbach in his seminal paper for coefficient α to show the concepts of reliability, dimensionality, and internal consistency are distinct but interrelated. The article begins with a critique of the definition of reliability and then explores mathematical properties of Cronbach's α. Internal consistency and dimensionality are then discussed as defined by Cronbach. Next, functional relationships are given that relate reliability, internal consistency, and dimensionality. The article ends with a demonstration of the utility of these concepts as defined. It is recommended that reliability, internal consistency, and dimensionality each be quantified with separate indices, but that their interrelatedness be recognized. High levels of unidimensionality and internal consistency are not necessary for reliability as measured by α nor, more importantly, for interpretability of test scores. 相似文献

8.

Reliability and Validity Issues for Two Common Measures of Medical Students' Attitudes toward Older Adults

T. J. Stewart E. Roberts P. Eleazer R. Boland D. Wieland 《Educational gerontology》2013,39(6):409-421

Results are reported from 2 common measures of medical student attitudes toward older adults: Maxwell-Sullivan Attitude Survey (MSAS); and UCLA Geriatrics Attitude Survey (GAS), with students entering the University of South Carolina School of Medicine (USCSM) in the period 2000–2005. A reliability analysis incorporating item means, Cronbach's alpha, item correlation matrix, and, Spearman-Brown prediction for positively and negatively worded items was conducted. Internal consistency results were unacceptable, revealing reliability and validity problems in this sample of medical students. Reconsideration of the use of these common measures, and a reframing of attitudes of medical students toward older adults seem appropriate. 相似文献

9.

The Dominance Concept Inventory: A Tool for Assessing Undergraduate Student Alternative Conceptions about Dominance in Mendelian and Population Genetics

Joel K. Abraham Kathryn E. Perez Rebecca M. Price 《CBE life sciences education》2014,13(2):349-358

Despite the impact of genetics on daily life, biology undergraduates understand some key genetics concepts poorly. One concept requiring attention is dominance, which many students understand as a fixed property of an allele or trait and regularly conflate with frequency in a population or selective advantage. We present the Dominance Concept Inventory (DCI), an instrument to gather data on selected alternative conceptions about dominance. During development of the 16-item test, we used expert surveys (n = 12), student interviews (n = 42), and field tests (n = 1763) from introductory and advanced biology undergraduates at public and private, majority- and minority-serving, 2- and 4-yr institutions in the United States. In the final field test across all subject populations (n = 709), item difficulty ranged from 0.08 to 0.84 (0.51 ± 0.049 SEM), while item discrimination ranged from 0.11 to 0.82 (0.50 ± 0.048 SEM). Internal reliability (Cronbach''s alpha) was 0.77, while test–retest reliability values were 0.74 (product moment correlation) and 0.77 (intraclass correlation). The prevalence of alternative conceptions in the field tests shows that introductory and advanced students retain confusion about dominance after instruction. All measures support the DCI as a useful instrument for measuring undergraduate biology student understanding and alternative conceptions about dominance. 相似文献

10.

Development and validity of a Dutch version of the Remote Associates Task: An item-response theory approach

Soghra Akbari Chermahini Marian Hickendorff Bernhard Hommel 《Thinking Skills and Creativity》2012,7(3):177-186

The Remote Associates Test (RAT) developed by Mednick and Mednick (1967) is known as a valid measure of creative convergent thinking. We developed a 30-item version of the RAT in Dutch with high internal consistency (Cronbach's alpha = 0.85) and applied both Classical Test Theory and Item Response Theory (IRT) to provide measures of item difficulty and discriminability, construct validity, and reliability. IRT was further used to construct a shorter version of the RAT, which comprises of 22 items but still shows good reliability and validity—as revealed by its relation to Raven's Advanced Progressive Matrices test, another insight-problem test, and Guilford's Alternative Uses Test. 相似文献

11.

Assessment of the quality and generalizability of the revised UCLA loneliness scale in Chinese and Korean community-dwelling elderly populations using item response theory (IRT)-Rasch modeling and hybrid IRT-logistic regression

In H. Park Arif Rachmatullah In-Sook Park 《Educational gerontology》2013,39(10):581-599

ABSTRACT

Objectives: This study aims to test the dimensionality, reliability, and item quality of the revised UCLA loneliness scale as well as to investigate the differential item functioning (DIF) of the three dimensions of the revised UCLA loneliness scale in community-dwelling Chinese and Korean elderly individuals.

Method: Data from 493 elderly individuals (287 Chinese and 206 Korean) were used to examine the revised UCLA loneliness scale. The Research model based on item response theory (IRT) was used to test dimensionality, reliability, and item fit. The hybrid ordinal logistic regression-IRT test was used to evaluate DIF.

Results: Item separation reliability, person reliability, and Cronbach’s alpha met the benchmarks. The quality of the items in the three-dimension model met the benchmark. Eight items were detected as significant DIF items (at α < .01). The loneliness level of Chinese elderly individuals was significantly higher than that of Koreans in Dimensions 1 and 2, while Korean elderly participants showed significantly higher loneliness levels than Chinese participants in Dimension 3. Several collected demographic characteristics and loneliness levels were more highly correlated in Korean elderly individuals than in Chinese elderly individuals.

Conclusion: Analysis using the three dimensions is reasonable for the revised UCLA loneliness scale. Good item quality and the items of this measure suggest that the revised UCLA loneliness can be used to assess the preferred latent traits. Finally, the differences between the levels of loneliness in Chinese and Korean elderly individuals are associated with the factors of loneliness. 相似文献

12.

A psychometric measurement model for adult English language learners: Pearson Test of English Academic

Hye K. Pae 《Educational Research and Evaluation》2013,19(3):211-229

The aim of this study was to apply Rasch modeling to an examination of the psychometric properties of the Pearson Test of English Academic (PTE Academic). Analyzed were 140 test-takers' scores derived from the PTE Academic database. The mean age of the participants was 26.45 (SD = 5.82), ranging from 17 to 46. Conformity of the participants' performance on the 86 items of PTE Academic Form 1 of the field test was evaluated using the partial credit model. The person reliability coefficient was .96, and item reliability was .99. The results showed that no significant differential item functioning was found across subgroups of gender and spoken-language context, indicating that the item data approximated the Rasch model. The findings of this study validated the test stability of PTE Academic as a useful measurement tool for English language learners' academic English assessment. 相似文献

13.

Developing and evaluating instructionally sensitive assessments in science

Maria Araceli Ruiz‐Primo Min Li Kellie Wills Michael Giamellaro Ming‐Chih Lan Hillary Mason Deanna Sands 《科学教学研究杂志》2012,49(6):691-712

The purpose of this article is to address a major gap in the instructional sensitivity literature on how to develop instructionally sensitive assessments. We propose an approach to developing and evaluating instructionally sensitive assessments in science and test this approach with one elementary life‐science module. The assessment we developed was administered to 125 students in seven classrooms. The development approach considered three dimensions of instructional sensitivity; that is, assessment items should: represent the curriculum content, reflect the quality of instruction, and have formative value for teaching. Focusing solely on the first dimension, representation of the curriculum content, this study was guided by the following research questions: (1) What science module characteristics can be systematically manipulated to develop items that prove to be instructionally sensitive? and (2) Are the instructionally sensitive assessments developed sufficiently valid to make inferences about the impact of instruction on students' performance? In this article, we describe our item development approach and provide empirical evidence to support validity arguments about the developed instructionally sensitive items. Results indicated that: (1) manipulations of the items at different proximities to vary their sensitivity were aligned with the rules for item development and also corresponded with pre‐to‐post gains; and (2) the items developed at different distances from the science module showed a pattern of pre‐to‐post gain consistent with their instructional sensitivity, that is, the closer the items were to the science module, the larger the observed gains and effect sizes. © 2012 Wiley Periodicals, Inc. J Res Sci Teach 49: 691–712, 2012 相似文献

14.

Measuring Mathematical Competences of Engineering Students at the Beginning of Their Studies

Irene Neumann Bettina Rösken-Winter Malte Lehmann Christoph Duchhardt Aiso Heinze Reinhold Nickolaus 《Peabody Journal of Education》2013,88(4):465-476

This article reports about our efforts to determine engineering students' competence in mathematics. Our research is embedded in a larger project, KoM@ING–Modeling and developing competence: Integrated IRT based and qualitative studies with a focus on mathematics and its usage in engineering studies, within the program Modeling and Measuring Competencies in Higher Education (KoKoHS). KoKoHS provides the umbrella organization of several research projects addressing the modeling and measuring of competences at the college level. KoM@ING aims to model the role of engineering students' mathematical competences for their studies from both a quantitative and a qualitative perspective.

Here, we report the development of a large-scale instrument assessing engineering freshmen's competence in mathematics by applying Rasch analysis to determine measures for item difficulties and student abilities. Several analyses were performed to provide insights into the measures' reliability and validity. In particular, to examine cognitive validity, we scrutinized students' think-aloud protocols when solving the items to investigate their problem solving abilities as a proxy for item difficulty. Overall, we found first evidence that our instrument is suitable to assess engineering freshmen's competence in mathematics. This instrument may be helpful to conduct further research and to inform those concerned with college organization and policy. 相似文献

15.

Relation between examinees’ true knowledge and examination scores: systematic review and exemplary calculations on Multiple-True-False items

《Educational Research Review》2021

相似文献

16.

IRT‐Estimated Reliability for Tests Containing Mixed Item Formats

Lianghua Shu Richard D. Schwarz 《Journal of Educational Measurement》2014,51(2):163-177

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt‐Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test‐part similarity are discussed. A detailed computational example is presented for the targeted coefficients. A comparison of the IRT model‐derived coefficients is made and the impact of varying ability distributions is evaluated. The advantages of IRT‐derived reliability coefficients for problems such as automated test form assembly and vertical scaling are discussed. 相似文献

17.

An empirical examination of IRT information for school climate surveys

Lun Mo Fang Yang Xiangen Hu 《Educational Research and Evaluation》2013,19(1):33-45

School climate surveys are widely applied in school districts across the nation to collect information about teacher efficacy, principal leadership, school safety, students' activities, and so forth. They enable school administrators to understand and address many issues on campus when used in conjunction with other student and staff data. However, these days each district develops the questionnaire according to its own needs and rarely provides supporting evidence for the reliability of items in the scale, that is, whether an individual item contributes significant information to the questionnaire. The Item Response Theory (IRT) is a useful tool that helps examine how much information each item and the whole scale can provide. Our study applied IRT to examine individual items in a school climate survey and assessed the efficiency of the survey after the removal of items that contributed little to the scale. The purpose of this study is to show how IRT can be applied to empirically validate school climate surveys. 相似文献

18.

Detecting Local Item Dependence in Polytomous Adaptive Data

Jessica L. Mislevy André A. Rupp Jeffrey R. Harring 《Journal of Educational Measurement》2012,49(2):127-147

A rapidly expanding arena for item response theory (IRT) is in attitudinal and health‐outcomes survey applications, often with polytomous items. In particular, there is interest in computer adaptive testing (CAT). Meeting model assumptions is necessary to realize the benefits of IRT in this setting, however. Although initial investigations of local item dependence have been studied both for polytomous items in fixed‐form settings and for dichotomous items in CAT settings, there have been no publications applying local item dependence detection methodology to polytomous items in CAT despite its central importance to these applications. The current research uses a simulation study to investigate the extension of widely used pairwise statistics, Yen's Q₃ Statistic and Pearson's Statistic X², in this context. The simulation design and results are contextualized throughout with a real item bank of this type from the Patient‐Reported Outcomes Measurement Information System (PROMIS). 相似文献

19.

Delimiting Coefficient α from Internal Consistency and Unidimensionality

下载免费PDF全文

Klaas Sijtsma 《Educational Measurement》2015,34(4):10-13

I discuss the contribution by Davenport, Davison, Liou, & Love (2015) in which they relate reliability represented by coefficient α to formal definitions of internal consistency and unidimensionality, both proposed by Cronbach (1951). I argue that coefficient α is a lower bound to reliability and that concepts of internal consistency and unidimensionality, however defined, belong to the realm of validity, viz. the issue of what the test measures. Internal consistency and unidimensionality may play a role in the construction of tests when the theory of the attribute for which the test is constructed implies that the items be internally consistent or unidimensional. I also offer examples of attributes that do not imply internal consistency or unidimensionality, thus limiting these concepts' usefulness in practical applications. 相似文献

20.

Regional Environmental Education Centers

Karen E. Diamond 《The Journal of environmental education》2013,44(2):23-25

Abstract

The development of the Children's Attitudes Toward the Environment Scale-Preschool Version (CATES-PV) is reported. The scale was administered to 42 preschool children. Their parents (34 mothers, 30 fathers) completed 2 environmental attitude scales, an environmental knowledge scale, and a questionnaire concerning environmentally related home practices. The scale has acceptable reliability, with a Cronbach's alpha of .68. Construct validity of the scale was suggested by the pattern of relationships found between child and parent measures. Specifically, children's attitudes were not correlated with verbal ability, but with the degree to which children participated in environmentally relevant activities in the home. The implications of those results for preschool curricula and practices are discussed. 相似文献