Similar Documents
20 similar documents found (search time: 343 ms)
1.
Croatian 1st-year and 3rd-year high-school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear measures: (a) an item-difficulty measure based upon all responses, (b) an item-confidence measure based upon correct student answers, and (c) an item-confidence measure based upon incorrect student answers. Comparisons were made with regard to item difficulty and item confidence. The results suggest that students hold stronger alternative conceptions in Newtonian dynamics than in DC circuits, a topic characterized by much lower student confidence in both correct and incorrect answers. A systematic and significant difference between mean student confidence on Newtonian dynamics and DC circuits items was found in both student groups. Findings suggest some steps for physics instruction in Croatia as well as areas of further research for those in science education interested in additional techniques for exploring alternative conceptions. © 2005 Wiley Periodicals, Inc. J Res Sci Teach 43: 150–171, 2006
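The linear measures described above come from Rasch analysis. A minimal sketch (not from the study; the numeric values are illustrative) of the dichotomous Rasch model that places persons and items on a common logit scale:

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model,
    where theta is person ability and b is item difficulty (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the success probability is exactly 0.5.
p_equal = rasch_p(theta=1.2, b=1.2)

# A harder item (larger b) lowers the success probability for the same person.
p_hard = rasch_p(theta=0.0, b=2.0)
```

Because the model is a function of the difference theta − b, person and item estimates can be compared on the same interval scale, which is what makes the item-difficulty and item-confidence measures above directly comparable.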

2.
The use of content validity as the primary assurance of the measurement accuracy for science assessment examinations is questioned. An alternative accuracy measure, item validity, is proposed. Item validity is based on research using qualitative comparisons between (a) student answers to objective items on the examination, (b) clinical interviews with examinees designed to ascertain their knowledge and understanding of the objective examination items, and (c) student answers to essay examination items prepared as an equivalent to the objective examination items. Calculations of item validity are used to show that selected objective items from the science assessment examination overestimated the actual student understanding of science content. Overestimation occurs when a student correctly answers an examination item, but for a reason other than that needed for an understanding of the content in question. There was little evidence that students incorrectly answered the items studied for the wrong reason, resulting in underestimation of the students' knowledge. The equivalent essay items were found to limit the amount of mismeasurement of the students' knowledge. Specific examples are cited and general suggestions are made on how to improve the measurement accuracy of objective examinations.

3.
In order to investigate the effect of two item-writing practices on test characteristics, examinations were chosen for study in two undergraduate courses (N = 71 and 210). About one-fourth of the items on each examination included a practice generally regarded as undesirable in measurement textbooks and alleged to make test items more difficult. Alternate forms which eliminated the undesirable practice were developed and administered at the same time as the original form. Rewriting item stems so that they formed a complete sentence or question resulted in about 6 percent more students answering items correctly. Eliminating unnecessary material in item stems, however, had little effect on difficulty. KR20 values were not appreciably different for the two versions of either test. Neither flaw was found to affect item discrimination indices noticeably. The absence of any substantial practice-by-achievement-level interactions suggested little effect of the practices on the validity of the tests.

4.
Creating a sense of community in online classes contributes to student retention and to their overall satisfaction with the course itself. This study aimed to develop a scale of sense of community of students attending online university courses. A series of ordinal exploratory factor analyses were conducted on data obtained from 839 students enrolled in Italian universities. Using an item analysis method, we were able to select the 36 most valid items from an original set of 60 items we had previously defined. These items are distributed across three related factors measuring membership, influence, and fulfillment of needs. This factorial structure replicates McMillan and Chavis's model of sense of community, on the basis of which this scale was developed. The three factors presented good ordinal alpha and adequate convergent/divergent validity coefficients. The scale represents an efficient tool for the design, monitoring, and evaluation of online courses.

5.
We describe the development and validation of a three-tiered diagnostic test of the water cycle (DTWC) and use it to evaluate the impact of prior learning experiences on undergraduates’ misconceptions. While most approaches to instrument validation take a positivist perspective using singular criteria such as reliability and fit with a measurement model, we extend this to a multi-tiered approach which supports multiple interpretations. Using a sample of 130 undergraduate students from two colleges, we utilize the Rasch model to place students and items along traditional one-, two-, and three-tiered scales as well as a misconceptions scale. In the three-tiered and misconceptions scales, high confidence was indicative of mastery. In the latter scale, a ‘misconception’ was defined as mastery of an incorrect concept. We found that integrating confidence into mastery did little to change item functioning; however, three-tiered usage resulted in higher reliability and lower student ability estimates than two-tiered usage. The misconceptions scale showed high efficacy in predicting items on which particular students were likely to express misconceptions, and revealed several tenacious misconceptions that all students were likely to express regardless of ability. Previous coursework on the water cycle did little to change the prevalence of undergraduates’ misconceptions.
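A three-tier item pairs a content answer with a reasoning answer and a confidence rating, and the abstract's definition of a 'misconception' as mastery of an incorrect concept can be sketched as a simple classification rule. This is an illustrative scheme, not the DTWC's actual scoring code:

```python
def classify_three_tier(answer_correct: bool, reason_correct: bool,
                        confident: bool) -> str:
    """Illustrative three-tier classification: mastery requires both the
    answer tier and the reasoning tier to be correct; a confidently held
    incorrect response is flagged as a likely misconception, while an
    unconfident incorrect response suggests a lack of knowledge instead."""
    if answer_correct and reason_correct:
        return "mastery" if confident else "correct-low-confidence"
    return "misconception" if confident else "lack-of-knowledge"

label = classify_three_tier(answer_correct=False, reason_correct=False,
                            confident=True)  # confidently wrong
```

The confidence tier is what separates a tenacious misconception from a mere guess, which is why the three-tiered scale can yield lower ability estimates than the two-tiered one.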

6.
Biology student mastery regarding the mechanisms of diffusion and osmosis is difficult to achieve. To monitor comprehension of these processes among students at a large public university, we developed and validated an 18-item Osmosis and Diffusion Conceptual Assessment (ODCA). This assessment includes two-tiered items, some adopted or modified from the previously published Diffusion and Osmosis Diagnostic Test (DODT) and some newly developed items. The ODCA, a validated instrument containing fewer items than the DODT and emphasizing different content areas within the realm of osmosis and diffusion, better aligns with our curriculum. Creation of the ODCA involved removal of six DODT item pairs, modification of another six DODT item pairs, and development of three new item pairs addressing basic osmosis and diffusion concepts. Responses to ODCA items testing the same concepts as the DODT were remarkably similar to responses to the DODT collected from students 15 yr earlier, suggesting that student mastery regarding the mechanisms of diffusion and osmosis remains elusive.
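Two-tiered instruments like the ODCA and DODT score items in pairs: a content question followed by a reasoning question. A common (though not necessarily the authors') scoring convention awards credit only when both tiers are answered correctly, which the following sketch illustrates:

```python
def score_two_tier(pairs):
    """Score a two-tier assessment. Each element of `pairs` is a tuple
    (content_correct, reason_correct); credit is awarded for an item pair
    only when both tiers are answered correctly."""
    return sum(1 for content_ok, reason_ok in pairs if content_ok and reason_ok)

# Three item pairs: only the first is correct on both tiers, so the score is 1
# even though the content tier of the second pair was answered correctly.
score = score_two_tier([(True, True), (True, False), (False, False)])
```

Scoring pairs jointly is what lets a two-tier instrument distinguish genuine understanding from a correct answer reached by the wrong reasoning.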

7.
The main issue addressed in this article is that there is much to learn about students’ knowledge and thinking in science from large-scale international quantitative studies beyond overall score measures. Response patterns on individual or groups of items can give valuable diagnostic insight into students’ conceptual understanding, but there is also a danger of drawing conclusions that may be too simple and invalid. We discuss how responses to multiple-choice items could be interpreted, and we also show how responses on constructed-response items can be systematised and analysed. Finally, we study, empirically, interactions between item characteristics and student responses. It is demonstrated that even small changes in the item wording and/or the item format may have a substantial influence on the response pattern. Therefore, we argue that interpretations of results from these kinds of studies should be based on a thorough analysis of the actual items used. We further argue that diagnostic information should be an integrated part of the international research aims of such large-scale studies. Examples of items and student responses presented are taken from the Third International Mathematics and Science Study (TIMSS).

8.
In this contribution we concentrate on the features of a particular item format: items having as the last option “none of the above” (NOTA items). There is considerable dispute about the advisability of using NOTA items in testing. Some authors conclude that NOTA items should be avoided, some reach neutral conclusions, while others argue that NOTA items are optimal test items. In this article, we provide evidence to this discussion by conducting protocol analysis on written statements of examinees while answering NOTA items. In our investigation, a test containing 30 multiple-choice items was administered to 169 university students. The results show that NOTA options appear to be more attractive than options with specified solutions in those cases where a problem solver fails. Also, a relationship is found between the quality of (incorrect) problem solving and the choice of NOTA items: the higher the quality of the incorrect problem-solving process, the more likely the student is to choose the NOTA option. Overall, our research supports the statement that ‘the more confidence an examinee has in his worked solution, which is inconsistent with one of the specified solutions, the more eager he seems to choose “none of the above”’.

9.
This study measured and explored the relationships among elementary mathematics teachers’ skill in (a) determining what an item measures, (b) analyzing student work, (c) providing targeted feedback, and (d) determining next instructional steps. Twenty-three elementary mathematics teachers were randomly assigned to one of three conditions: analyzing items and student responses without rubrics, analyzing items and student responses with rubrics, or analyzing items and student responses with rubrics after watching a professional development program on providing feedback to students. Findings show there is a moderate to strong relationship between teachers’ abilities to analyze student responses to infer what a student knows and can do and their abilities to take action based on that information through either providing the student feedback or making appropriate instructional adaptations. Findings show it was relatively more difficult for teachers to provide feedback that was likely to move students forward in their learning than it was for them to analyze a student's response or to determine next instructional steps. No teacher skill differences associated with the different treatment conditions were found.

10.
《Applied Measurement in Education》2013,26(2):123-136
College students use information about upcoming tests, including the item formats to be used, to guide their study strategies and allocation of effort, but little is known about how students perceive item formats. In this study, college students rated the dissimilarity of pairs of common item formats (true/false, multiple choice, essay, fill-in-the-blank, matching, short answer, analogy, and arrangement). A multidimensional scaling model with individual differences (INDSCAL) was fit to the data of 111 students and suggested that they were using two dimensions to distinguish among these formats. One dimension separated supply from selection items, and the formats' positions on the dimension were related to ratings of difficulty, review time allocated, objectivity, and recognition (as opposed to recall) required. The second dimension ordered item formats from those with few options from which to choose (e.g., true/false) or brief responses (e.g., fill-in-the-blank), to those with many options from which to choose (e.g., matching) or long responses (e.g., essay). These student perceptions are likely to mediate the impact of classroom evaluation on student study strategies and allocation of effort.

11.
Information about how success and gender affect students' views of ideal and actual classroom role behavior can help both researchers and teachers better understand classroom components such as achievement and curriculum. A 20-item double Q sort was used to measure differences in perceptions of high school science students according to letter grades and gender. Individual Q sort item rankings of 160 students were tested for significant differences according to letter grade received; item ratings were compared according to gender for 215 students. Differences in perception according to success were found for both ideal and actual behavior; 8 and 5 items, respectively, out of each 20-item sort were found to be significant at the p < 0.05 level. No such overall patterns of difference in view were found between boys and girls, although three ideal student items and one actual self-report item were found to be significantly different at the p < 0.05 level.

12.
This study was conducted with 330 Form 4 (grade 10) students (aged 15–16 years) who were involved in a course of instruction on electrolysis concepts. The main purposes of this study were (1) to assess high school chemistry students' understanding of 19 major principles of electrolysis using a recently developed 2-tier multiple-choice diagnostic instrument, the Electrolysis Diagnostic Instrument (EDI), and (2) to assess students' confidence levels in displaying their knowledge and understanding of these electrolysis concepts. Analysis of students' responses to the EDI showed that they displayed very limited understanding of the electrolytic processes involving molten compounds and aqueous solutions of compounds, with a mean score of 6.82 (out of a possible maximum of 17). Students were found to possess content knowledge about several electrolysis processes but did not provide suitable explanations for the changes that had occurred, with less than 45% of students displaying scientifically acceptable understandings about electrolysis. In addition, students displayed limited confidence about making the correct selections for the items; yet, in 16 of the 17 items, the percentage of students who were confident that they had selected the correct answer to an item was higher than the actual percentage of students who correctly answered the corresponding item. The findings suggest several implications for classroom instruction on the electrolysis topic that need to be addressed in order to facilitate better understanding by students of electrolysis concepts.

13.
Test items become easier when a representational picture visualizes the text item stem; this is referred to as the multimedia effect in testing. To uncover the processes underlying this effect and to understand how pictures affect students' item-solving behavior, we recorded the eye movements of sixty-two schoolchildren solving multiple-choice (MC) science items either with or without a representational picture. Results show that the time students spent fixating the picture was compensated for by less time spent reading the corresponding text. In text-picture items, students also spent less time fixating incorrect answer options; a behavior that was associated with better test scores in general. Detailed gaze likelihood analyses revealed that the picture received particular attention right after item onset and in the later phase of item solving. Hence, comparable to learning, pictures in tests seemingly boost students' performance because they may serve as mental scaffolds, supporting comprehension and decision making.

14.
Item nonresponses are prevalent in standardized testing. They happen either when students fail to reach the end of a test due to a time limit or quitting, or when students choose to omit some items strategically. Oftentimes, item nonresponses are nonrandom, and hence, the missing data mechanism needs to be properly modeled. In this paper, we proposed to use an innovative item response time model as a cohesive missing data model to account for the two most common item nonresponses: not-reached items and omitted items. In particular, the new model builds on a behavior process interpretation: a person chooses to skip an item if the required effort exceeds the implicit time the person allocates to the item (Lee & Ying, 2015; Wolf, Smith, & Birnbaum, 1995), whereas a person fails to reach the end of the test due to lack of time. This assumption was verified by analyzing the 2015 PISA computer-based mathematics data. Simulation studies were conducted to further evaluate the performance of the proposed Bayesian estimation algorithm for the new model and to compare the new model with a recently proposed “speed-accuracy + omission” model (Ulitzsch, von Davier, & Pohl, 2019). Results revealed that all model parameters could be recovered properly, and that inadequately accounting for missing data caused biased item and person parameter estimates.
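The behavior-process interpretation above distinguishes omitted items (required effort exceeds the implicitly allocated time) from not-reached items (the test clock runs out). A toy simulation of that decision rule, with invented time values and the simplifying assumption that an omitted item still consumes its allocated time:

```python
def classify_nonresponse(required_times, allocated_times, time_limit):
    """Toy sketch of the behaviour-process assumption: an item is 'omitted'
    when the effort it requires exceeds the time the person implicitly
    allocates to it, and all items encountered after the overall time limit
    is exhausted are 'not-reached'. Times are in arbitrary units."""
    statuses, clock = [], 0.0
    for required, allocated in zip(required_times, allocated_times):
        if clock >= time_limit:
            statuses.append("not-reached")
        elif required > allocated:
            statuses.append("omitted")
            clock += allocated  # assumption: the allocated time is still spent
        else:
            statuses.append("answered")
            clock += required
    return statuses

# Item 2 demands more time than the person allocates; item 4 falls past the limit.
pattern = classify_nonresponse(required_times=[1, 5, 1, 1],
                               allocated_times=[2, 2, 2, 2],
                               time_limit=4)
```

Simulating response patterns this way is also how one would generate data to check whether an estimation algorithm recovers the model parameters, as the study's simulation section does.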

15.
This research explored the measurement characteristics of two science examinations and the potential to use access arrangements data to investigate how students requiring reading support are affected by features of exam questions. For two science examinations, traditional and Rasch analyses provided estimates of difficulty and information on item functioning. For one examination, the performance of students eligible for support from a reader in exams was compared to a ‘norm’ group. For selected items, a sample of student responses was analysed. A number of factors potentially making questions easier or more difficult, or contributing to problems with item functioning, were identified. A number of features that may particularly influence those requiring reading support were also identified.

16.
In an exploratory study, education majors in a physical science course were given a set of tasks analogous to a given, solved prototype-task to see how transfer items were handled. Some students were given a conceptual model along with the solved prototype. Others were given a general procedure for applying the conceptual model to the transfer items. The procedure helped considerably for the transfer items least like the prototype item. The model alone was also effective for certain items. In the absence of both model and procedure, students' problem solving was usually incoherent or self-contradictory. Presenting additional solved items helped marginally on an exceptionally novel item. Students' main source of difficulty, given the model and procedure, was that they were distracted by prior, concrete experience and thus failed to follow the procedure. For most students, this difficulty could readily be overcome. A small proportion (10–15%) of students had more profound difficulties.

17.
Traditional item analyses such as classical test theory (CTT) use exam-taker responses to assessment items to approximate their difficulty and discrimination. The increased adoption by educational institutions of electronic assessment platforms (EAPs) provides new avenues for assessment analytics by capturing detailed logs of an exam-taker's journey through their exam. This paper explores how logs created by EAPs can be employed alongside exam-taker responses and CTT to gain deeper insights into exam items. In particular, we propose an approach for deriving features from exam logs for approximating item difficulty and discrimination based on exam-taker behaviour during an exam. Items for which difficulty and discrimination differ significantly between CTT analysis and our approach are flagged through outlier detection for independent academic review. We demonstrate our approach by analysing de-identified exam logs and responses to assessment items of 463 medical students enrolled in a first-year biomedical sciences course. The analysis shows that the number of times an exam-taker visits an item before selecting a final response is a strong indicator of an item's difficulty and discrimination. Scrutiny by the course instructor of the seven items identified as outliers suggests our log-based analysis can provide insights beyond what is captured by traditional item analyses.

Practitioner notes

What is already known about this topic
  • Traditional item analysis is based on exam-taker responses to the items using mathematical and statistical models from classical test theory (CTT). The difficulty and discrimination indices thus calculated can be used to determine the effectiveness of each item and consequently the reliability of the entire exam.
What this paper adds
  • Data extracted from exam logs can be used to identify exam-taker behaviours which complement classical test theory in approximating the difficulty and discrimination of an item and identifying items that may require instructor review.
Implications for practice and/or policy
  • Identifying the behaviours of successful exam-takers may allow us to develop effective exam-taking strategies and personal recommendations for students.
  • Analysing exam logs may also provide an additional tool for identifying struggling students and items in need of revision.
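The classical test theory indices that the log-based features are compared against can be computed directly from a 0/1 response matrix. A minimal sketch (illustrative data, not the study's): difficulty as the proportion correct, and discrimination as the point-biserial correlation between an item and the rest-score (total minus that item, to avoid part-whole inflation):

```python
def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def item_stats(responses):
    """CTT item statistics from a 0/1 response matrix: for each item,
    (difficulty = proportion correct, discrimination = point-biserial
    correlation between the item score and the rest-score)."""
    n_items = len(responses[0])
    out = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        out.append((sum(item) / len(item), pearson(item, rest)))
    return out

stats = item_stats([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
```

The approach described above flags an item for review when these response-based indices and the log-derived indices (such as revisit counts) disagree markedly.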

18.
This study explores measurement of a construct called knowledge integration in science using multiple-choice and explanation items. We use construct and instructional validity evidence to examine the role multiple-choice and explanation items play in measuring students' knowledge integration ability. For construct validity, we analyze item properties such as alignment, discrimination, and target range on the knowledge integration scale using a Rasch Partial Credit Model analysis. For instructional validity, we test the sensitivity of multiple-choice and explanation items to knowledge integration instruction using a cohort comparison design. Results show that (1) one third of correct multiple-choice responses are aligned with higher levels of knowledge integration while three quarters of incorrect multiple-choice responses are aligned with lower levels of knowledge integration, (2) explanation items discriminate between high and low knowledge integration ability students much more effectively than multiple-choice items, (3) explanation items measure a wider range of knowledge integration levels than multiple-choice items, and (4) explanation items are more sensitive to knowledge integration instruction than multiple-choice items.

19.
《Educational Assessment》2013,18(4):317-340
A number of methods for scoring tests with selected-response (SR) and constructed-response (CR) items are available. The selection of a method depends on the requirements of the program, the particular psychometric model and assumptions employed in the analysis of item and score data, and how scores are to be used. This article compares 3 methods: unweighted raw scores, Item Response Theory pattern scores, and weighted raw scores. Student score data from large-scale end-of-course high school tests in Biology and English were used in the comparisons. In the weighted raw score method evaluated in this study, the CR items were weighted so that SR and CR items contributed the same number of points toward the total score. The scoring methods were compared for the total group and for subgroups of students in terms of the resultant scaled score distributions, standard errors of measurement, and proficiency-level classifications. For most of the student ability distribution, the three scoring methods yielded similar results. Some differences in results are noted. Issues to be considered when selecting a scoring method are discussed.
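The weighted raw score method described above rescales CR points so that the CR section's maximum equals the SR section's maximum. A minimal sketch with invented point totals (the study does not publish its exact weights):

```python
def weighted_raw_score(sr_points, cr_points, sr_max_total, cr_max_total):
    """Weighted raw score in which constructed-response (CR) points are
    rescaled so the CR section contributes the same maximum number of
    points as the selected-response (SR) section."""
    weight = sr_max_total / cr_max_total
    return sr_points + weight * cr_points

# Hypothetical test: 40 SR points and 20 CR points possible, so each CR
# point is worth 2; a student with 30 SR and 15 CR points scores 60.
total = weighted_raw_score(sr_points=30, cr_points=15,
                           sr_max_total=40, cr_max_total=20)
```

Under this scheme the maximum possible score is twice the SR maximum, so the two sections contribute equally regardless of how many raw points each contains.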

20.
Mathematical word problems represent a common item format for assessing student competencies. Automatic item generation (AIG) is an effective way of constructing many items with predictable difficulties, based on a set of predefined task parameters. The current study presents a framework for the automatic generation of probability word problems based on templates that allow for the generation of word problems involving different topics from probability theory. It was tested in a pilot study with N = 146 German university students. The items show a good fit to the Rasch model. Item difficulties can be explained by the Linear Logistic Test Model (LLTM) and by the random-effects LLTM. The practical implications of these findings for future test development in the assessment of probability competencies are also discussed.
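The LLTM mentioned above decomposes each item's Rasch difficulty into a weighted sum of task parameters, b_i = Σ_k q_ik · η_k, where q_ik indicates whether cognitive operation k is required by item i and η_k is that operation's basic difficulty. A minimal sketch with hypothetical operations and weights (the labels and values are illustrative, not the study's estimates):

```python
def lltm_difficulty(q_row, eta):
    """Predicted LLTM item difficulty: the dot product of the item's row of
    the Q-matrix (which operations the item requires) with the vector of
    basic operation difficulties eta."""
    return sum(q * e for q, e in zip(q_row, eta))

# Hypothetical basic difficulties for two operations, e.g. 'conditional
# probability' and 'complement rule'.
eta = [0.8, 0.3]
b_both = lltm_difficulty([1, 1], eta)   # item requiring both operations
b_first = lltm_difficulty([1, 0], eta)  # item requiring only the first
```

This decomposition is what makes difficulty *predictable* for automatically generated items: once the η weights are estimated, the difficulty of a brand-new item follows from its template's Q-matrix row alone.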


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号