1.

Negative Keying Effects in the Factor Structure of TIMSS 2011 Motivation Scales and Associations with Reading Achievement

Michalis P. Michaelides 《Applied Measurement in Education》2013,26(4):365-378

ABSTRACT

The Student Background survey administered along with achievement tests in studies of the International Association for the Evaluation of Educational Achievement includes scales of student motivation, competence, and attitudes toward mathematics and science. The scales consist of positively and negatively keyed items. The current research examined the factorial structure of the 18-item motivational scales in fourth-grade mathematics in the 2011 Trends in International Mathematics and Science Study (TIMSS). Survey data from six European countries were analyzed. In comparisons of alternative models, the fit was adequate when three correlated factors were specified and negative keying was taken into account as a latent factor, or with correlated uniquenesses among negatively keyed items. Participants' reading achievement scores correlated systematically with negative keying, with coefficients ranging from .254 to .395 in the six samples. Unlike their higher-scoring peers, fourth-graders with lower reading achievement responded differentially to similar items depending on the direction of item keying, in such a way that their motivation scores were biased downward. Implications about the use of reverse keying in surveys for young students are discussed.

2.

A Comparison of Two Response Scale Formats Used in Teaching Evaluation Questionnaires

Malcolm G. Eley Erica J. Stecher 《Assessment & Evaluation in Higher Education》1997,22(1):65-79

Three studies compared the common Likert agree/disagree question form to a behavioural observation form in which students report recalled frequencies of described teaching or learning events. The agree/disagree form seemed to prompt global, impressionistic approaches to responding, while the behavioural observation form seemed to prompt more objective approaches. Between‐student response consistency was greater for the behavioural observation form than for the agree/disagree form. Across separate samples of teaching, mean overall ratings derived from behavioural observation form questionnaires spread more broadly than did those from agree/disagree forms. Across separate elements within an individual's teaching, the ratings from the behavioural observation form spread more than those from the agree/disagree form. The conclusions drawn were that using behavioural observation form questions rather than agree/disagree questions in teaching evaluation questionnaires can yield measurable improvements in inter‐rater reliability and in the capability to distinguish amongst levels of teaching quality.

3.

Development and validation of an instrument for assessing attitudes of high school students about recycling

Ilker Ugulu 《Environmental Education Research》2015,21(6):916-942

Recycling and its applications are growing significantly due to the great potential for solving a range of environmental problems in society. Nevertheless, there are currently very few instruments that can provide valid and reliable data on students’ attitudes toward recycling. In this regard, this article focuses on the development and validation of the Recycling Attitude Scale (RAS). The items in the RAS were developed initially from the responses to three open-ended items by 53 tenth and eleventh grade students and a literature review on recycling attitude. This initial form was pilot tested with 356 tenth and eleventh grade students and then subjected to exploratory factor analysis. Subsequently, the revised version of the scale was administered to 694 tenth grade students, and the results were subjected to confirmatory factor analysis and reliability analysis. The RAS consists of 21 items in three subscales, with responses recorded on a four-point Likert scale, options ranging from strongly agree to strongly disagree. Cronbach’s alpha reliability coefficient (α) of the scale was found to be .87. The results indicate that the RAS is a potentially valuable tool for both instructors and researchers in Turkey for the assessment of the attitudes toward recycling held by students in secondary education.

4.

Effect of the number of scale points on chi‐square fit indices in confirmatory factor analysis

Samuel B. Green Theresa M. Akey Kandace K. Fleming Scott L. Hershberger Janet G. Marquis 《Structural equation modeling》2013,20(2):108-120

This article investigates the effect of the number of item response categories on chi‐square statistics for confirmatory factor analysis to assess whether a greater number of categories increases the likelihood of identifying spurious factors, as previous research had concluded. Four types of continuous single‐factor data were simulated for a 20‐item test: (a) uniform for all items, (b) symmetric unimodal for all items, (c) negatively skewed for all items, or (d) negatively skewed for 10 items and positively skewed for 10 items. For each of the 4 types of distributions, item responses were divided to yield item scores with 2, 4, or 6 categories. The results indicated that the chi‐square statistic for evaluating a single‐factor model was most inflated (suggesting spurious factors) for 2‐category responses and became less inflated as the number of categories increased. However, the Satorra‐Bentler scaled chi‐square tended not to be inflated even for 2‐category responses, except if the continuous item data had both negatively and positively skewed distributions.

5.

The Effects of the Number of Scale Points and Non-Normality on the Generalizability Coefficient: A Monte Carlo Study

Steven R. Shumate James Surles Robert L. Johnson Jim Penny 《Applied Measurement in Education》2013,26(4):357-376

Increasingly, assessment practitioners use generalizability coefficients to estimate the reliability of scores from performance tasks. Little research, however, examines the relation between the estimation of generalizability coefficients and the number of rubric scale points and score distributions. The purpose of the present research is to inform assessment practitioners of (a) the optimum number of scale points necessary to achieve the best estimates of generalizability coefficients and (b) the possible biases of generalizability coefficients when the distribution of scores is non-normal. Results from this study indicate that the number of scale points substantially affects the generalizability estimates. Generalizability estimates increase as scale points increase, with little bias after scales reach 12 points. Score distributions had little effect on generalizability estimates.

6.

Student evaluation of teaching: the use of best–worst scaling

Twan Huybers 《Assessment & Evaluation in Higher Education》2014,39(4):496-513

An important purpose of student evaluation of teaching is to inform an educator’s reflection about the strengths and weaknesses of their teaching approaches. Quantitative instruments are one way of obtaining student responses. They have traditionally taken the form of surveys in which students provide their responses to various statements using item-by-item agree/disagree ratings. Previous research has identified shortcomings of such rating scales, including response bias and the associated lack of discrimination amongst the items evaluated. In this paper, best–worst scaling is proposed as a novel method for quantitative teaching evaluation. The way in which best–worst scaling can be used in this context is illustrated in three different applications. Two applications demonstrate how it can be used for evaluations in a small-size classroom environment. The third application is a broader evaluation of university courses on a larger scale. In comparison with conventional rating scales, the best–worst scaling approach enables better highlighting of the differences between evaluation items. In doing so, it can provide enhanced guidance to educators in their reflection about their teaching. Moreover, implementation and analysis of a best–worst scaling evaluation is relatively straightforward, which establishes it as a feasible method for teaching practitioners and researchers.

7.

Effect of number of response options on the psychometric properties of Likert-type scales used with children

《Studies in Educational Evaluation》2020

Although Likert-type rating scales are used in a great number of early childhood studies, knowledge of how the number of response options affects the psychometric properties of scales used with children is limited. The purpose of this study is to contribute to this knowledge. Data were collected from second- and third-grade students: 1,092 second- and third-graders completed a 2-point, 3-point, and 4-point version of the School Attachment Scale for Children and Adolescents. Participants came from 11 schools that differed in socioeconomic status. The children received the versions approximately three weeks apart. Results revealed that as the number of response options increased, the means tended to decrease and the distributions tended to be normal. For the 2-point version, most items were below the cut-off point in terms of discrimination indexes. Compared to the 2-point version, there was a significant increase in discrimination indexes for the 3- and 4-point versions, and the items’ discrimination indexes were high. It was concluded that the reliability coefficient increased with an increasing number of response options for all subdimensions of the scale. When the validity estimations of the three subdimensions were examined for the three versions of the scale, it was found that the validity of the 3- and 4-point versions was adequate and that the validity of the 2-point version was weak. It was observed that using 2-point Likert-type scales with children negatively affected the psychometric properties and that these properties improved with an increased number of response options.

8.

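Reliability and Validity Analysis of the Machiavellian Personality Scale

Guo Yuanbing Huang Chaoyun Guo Xiaoan 《Journal of Ningbo University (Educational Science Edition)》2012,(4):68-70

To examine the applicability of the Machiavellian Personality Scale (MPS) among college students, 481 college students in Wuhan served as participants, and the MPS was subjected to factor analysis and to validity and reliability testing. The results showed that: (1) 16 items were retained after item analysis; exploratory factor analysis extracted 4 factors, the factors and item assignments were consistent with the original questionnaire, and the 4 factors explained 52.064% of the total variance. (2) Confirmatory factor analysis showed that the 4-factor model fit well; the MPS total score and the factor scores correlated significantly with personality traits and manipulation strategies. (3) Internal consistency reliability ranged from 0.727 to 0.831, and test-retest reliability ranged from 0.715 to 0.837. It was therefore concluded that the Chinese version of the MPS has good reliability and validity.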

9.

Shifting gears: consequences of including two negatively worded items in the middle of a positively worded questionnaire

Michael J. Roszkowski Margot Soven 《Assessment & Evaluation in Higher Education》2010,35(1):113-130

A questionnaire used in student evaluations of interdisciplinary courses during six semesters contained two Likert items stated in a direct negative mode which were embedded in a questionnaire (14–18 items) in which the remaining items were phrased in a direct positive mode. In the seventh semester and thereafter, the two negative items were restated as direct positive stems. Item‐analysis demonstrated that in the direct negative mode, the two items had low item‐to‐total correlations and that the internal consistency reliability of the sum score could be improved by eliminating the two negatively phrased items. Also, the two negatively worded items defined a separate factor. After they were reworded into a direct positive mode, these two items showed markedly improved item‐to‐total correlations. Moreover, the unique factor disappeared, which suggests that it was a methodological artefact probably attributable to respondent carelessness. Including a few negative items in an otherwise positively stated questionnaire leads to ambiguity of results rather than controlling for response sets. We therefore recommend against the practice.

10.

Affordances of Item Formats and Their Effects on Test‐Taker Cognition under Uncertainty

Jung Aa Moon Madeleine Keehner Irvin R. Katz 《Educational Measurement》2019,38(1):54-62

The current study investigated how item formats and their inherent affordances influence test‐takers’ cognition under uncertainty. Adult participants solved content‐equivalent math items in multiple‐selection multiple‐choice and four alternative grid formats. The results indicated that participants’ affirmative response tendency (i.e., judge the given information as True) was affected by the presence of a grid, type of grid options, and their visual layouts. The item formats further affected the test scores obtained from the alternatives keyed True and the alternatives keyed False, and their psychometric properties. The current results suggest that the affordances rendered by item design can lead to markedly different test‐taker behaviors and can potentially influence test outcomes. They emphasize that a better understanding of the cognitive implications of item formats could potentially facilitate item design decisions for large‐scale educational assessments.

11.

Using Focus Groups, Expert Advice, and Cognitive Interviews to Establish the Validity of a College Student Survey

Ouimet Judith A. Bunnage JoAnne C. Carini Robert M. Kuh George D. Kennedy John 《Research in higher education》2004,45(3):233-250

This study focused on how the design of a national student survey instrument was informed and improved through the combined use of student focus groups, cognitive interviews, and expert survey design advice. We were specifically interested in determining (a) how students interpret the items and response options, (b) the frequency of behaviors or activities associated with the response options, (c) if the items are clearly worded and specific enough to produce reliable and valid results, and (d) if the items and response categories accurately represent students' behaviors and perceptions. We collected focus group data from 8 colleges and universities as part of a nationally funded research project on student engagement. The findings provide additional insight into the importance of using focus groups and cognitive interviews to learn how students interpret various items and what different responses really mean.

12.

THE COMPARABILITY OF THE WAIS, WISC, AND WBII

M. Y. QUERESHI JEFFREY M. MILLER 《Journal of Educational Measurement》1970,7(2):105-111

Three Wechsler scales (the Wechsler Adult Intelligence Scale, Wechsler Intelligence Scale for Children, and Wechsler-Bellevue II) were administered in a counterbalanced design to 72 randomly selected 17-year-old high school Ss in order to investigate their comparability by testing the equality of (a) means, (b) variances, (c) reliability coefficients, and (d) validity coefficients based on scaled scores and IQs. Results indicated that the subtest scores and IQs for the given three scales were not equivalent. The present findings conform with most of the previous results regarding the comparability of Wechsler scales. Although the three scales investigated all evidence high similarity of item content and format, they clearly fail to meet the statistical criteria of equivalence for 17-year-old subjects.

13.

Instructional practices of teachers enrolled in educational technology and general education programs

Marcie J. Bober Howard J. Sullivan Deborah L. Lowther Patrick Harrison 《Educational technology research and development : ETR & D》1998,46(3):81-97

This study investigated classroom practices of 38 teachers enrolled in university master's degree programs in educational technology and in other areas of education. The classroom practices related to five key concepts associated with educational technology: (a) learner-centered instruction, (b) instructional design, (c) media and technology, (d) assessment, and (e) instructional alignment. Teachers rated their frequency of use of desirable practices in these five areas on a 30-item Likert-type survey. In addition, one class of students per teacher rated its own teacher's frequency of use of the practices on 20 items parallel to items on the teacher survey. The mean overall rating across all teachers for the classroom practice items was very close to Often, or 4.0, on the 5-point scale. There were few reported differences between the teachers enrolled in educational technology programs and those enrolled in other education programs. Student ratings indicated less frequent teacher use of the desirable practices on 16 of the 20 common items, with significantly lower student ratings on 8 of these items. However, there was strong teacher-student agreement on several other comparisons. The study reported in this article was conducted as a doctoral dissertation at Arizona State University.

14.

STAFF AND ELDERLY KNOWLEDGE AND ATTITUDES TOWARD ELDERLY SEXUALITY

Bonnie L. Walker Nancy J. Osgood James P. Richardson Paul H. Ephross 《Educational gerontology》2013,39(5):471-489

This study compared staff and elderly knowledge, attitudes, and practices related to sexual expression by elderly persons in a long‐term care setting. Volunteers (N = 194) responded agree or disagree to 159 items. Significant differences were observed between the staff and elderly responses on 36 items. Areas of greatest differences involved knowledge and attitudes about consensual sex and sexual abuse, issues related to family attitudes toward remarriage and sexual expression, and age‐related changes and health problems related to sexuality. Items related to masturbation received the greatest percentage of no response. The staff had significantly higher total scores than the elderly, reflecting more knowledge, positive attitudes, and support for more proactive responses toward elderly sexuality. Findings have major implications for staff training in long‐term care settings.

15.

A systematic procedure for constructing a valid microcomputer attitude scale

Samiha Abdel-Gaid Cecil R. Trueblood Robert L. Shrigley 《Journal of Research in Science Teaching》1986,23(9):823-839

The purpose of this study was twofold: (1) to design a system for constructing Likert attitude scales as supported by the sociopsychological and measurement literature, and (2) to use the design to assemble a microcomputer attitude scale for inservice and preservice teachers (n = 281). The results of the study were (1) a 15-step flow chart for designing reliable and valid attitude scales, and (2) a 23-item microcomputer Likert attitude scale with the following characteristics: (a) coefficient alpha 0.89, (b) range of adjusted item-total correlations from 0.29 to 0.62, (c) range of interitem correlations from 0.04 to 0.60, (d) correlation of 0.20 with a mathematics attitude scale and 0.02 with a reading attitude scale, and (e) favorable factor analysis and emotional intensity data.

16.

Methodological issues in developing a multi-dimensional coding procedure for small-group chat communication

《Learning and Instruction》2007,17(4):394-404

In CSCL research, collaboration through chat has primarily been studied in dyadic settings. This article discusses three issues that emerged during the development of a multi-dimensional coding procedure for small-group chat communication: (a) the unit of analysis and unit fragmentation, (b) the reconstruction of the response structure and (c) determining reliability without overestimation. Threading, i.e. connections between analysis units, proved essential to handle unit fragmentation, to reconstruct the response structure and for reliability of coding. In addition, a risk for reliability overestimation was identified. Implications for analysis methodology in CSCL are discussed.

17.

A Method of Self-Evaluation for Counselor Education Utilizing the Measurement of Facilitative Condition

Donald G. Martin George M. Gazda 《Counselor Education & Supervision》1970,9(2):87-92

A pretest-posttest control group design was used to test the value of employing four psychotherapeutic interaction scales for self-evaluation. The counselor-offered conditions of empathy, non-possessive warmth, genuineness, and intensity of interpersonal contact were self-evaluated by 44 counselors following their counseling interviews. These evaluations were compared with supervisors' evaluations of the tape recorded sessions. Findings showed that (a) the gain in offered therapeutic conditions was significant on all scales for the experimental group but on only two scales for the control group; (b) the amount of gain for the experimental group was significantly higher than that of the control group on only one scale (Empathy); (c) counselor/supervisor evaluations showed highly significant concurrent validity; and (d) basic counselor personality orientations such as self-concept strength and defensiveness generally showed no correlation with accuracy of self-evaluation on the scales.

18.

Weighting Constructed-Response Items in IRT-Based Exams

《Applied Measurement in Education》2013,26(4):257-275

Weighting responses to Constructed-Response (CR) items has been proposed as a way to increase the contribution these items make to the test score when there is insufficient testing time to administer additional CR items. The effect of various types of weighting items of an IRT-based mixed-format writing examination was investigated. Constructed-response items were weighted by increasing their representation according to the test blueprint, by increasing their contribution to the test characteristic curve, by summing the ratings of multiple raters, and by applying optimal weights utilized in IRT pattern scoring. Total score and standard errors of the weighted composite forms of CR and Multiple-Choice (MC) items were compared against each other and against a form containing additional rather than weighted items. Weighting resulted in a slight reduction of test reliability but reduced standard error in portions of the ability scale.

19.

IRT‐Estimated Reliability for Tests Containing Mixed Item Formats

Lianghua Shu Richard D. Schwarz 《Journal of Educational Measurement》2014,51(2):163-177

As a global measure of precision, item response theory (IRT) estimated reliability is derived for four coefficients (Cronbach's α, Feldt‐Raju, stratified α, and marginal reliability). Models with different underlying assumptions concerning test‐part similarity are discussed. A detailed computational example is presented for the targeted coefficients. A comparison of the IRT model‐derived coefficients is made and the impact of varying ability distributions is evaluated. The advantages of IRT‐derived reliability coefficients for problems such as automated test form assembly and vertical scaling are discussed.

20.

ELEMENTARY SCHOOL TEACHERS’ KNOWLEDGE OF MODEL FUNCTIONS AND MODELING PROCESSES: A COMPARISON OF SCIENCE AND NON-SCIENCE MAJORS

Jing-Wen Lin 《International Journal of Science and Mathematics Education》2014,12(5):1197-1220

This study aimed to: (a) understand practicing teachers’ knowledge of model functions and modeling processes, (b) compare the similarities and differences between the knowledge of science and non-science major teachers, and (c) explore the possible reasons for the similarities and differences between the knowledge of these 2 groups. A 4-point Likert scale questionnaire was developed and used to measure the knowledge of 187 practicing elementary school teachers (94 science majors and 93 non-science majors) on model functions and modeling processes. The author selected 10 target teachers to conduct think-aloud interviews and to explore their ranking. One month after completing the questionnaire, 28 volunteer teachers were selected for a follow-up interview to better understand the reasons for their responses. The results show that these teachers tend to agree or strongly agree with the items about model functions and modeling processes. The only significant difference between science and non-science majors was for the item “generating new ideas.” Qualitative analyses of the follow-up interviews and think-aloud results showed that teacher education and professional development did not focus on understanding and using models. Science-major teachers tended to formulate their responses with reference to specific models, while the non-science major teachers’ responses contained acquiescence bias. Finally, implications for science education are discussed.