1.
Our study explored the prospects and limitations of using machine-learning software to score introductory biology students’ written explanations of evolutionary change. We investigated three research questions: 1) Do scoring models built using student responses at one university function effectively at another university? 2) How many human-scored student responses are needed to build scoring models suitable for cross-institutional application? 3) What factors limit computer-scoring efficacy, and how can these factors be mitigated? To answer these questions, two biology experts scored a corpus of 2556 short-answer explanations (from biology majors and nonmajors) at two universities for the presence or absence of five key concepts of evolution. Human- and computer-generated scores were compared using kappa agreement statistics. We found that machine-learning software was capable in most cases of accurately evaluating the degree of scientific sophistication in undergraduate majors’ and nonmajors’ written explanations of evolutionary change. In cases in which the software did not perform at the benchmark of “near-perfect” agreement (kappa > 0.80), we located the causes of poor performance and identified a series of strategies for their mitigation. Machine-learning software holds promise as an assessment tool for use in undergraduate biology education, but like most assessment tools, it is also characterized by limitations.
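The kappa agreement statistic used as the benchmark above can be illustrated with a minimal sketch of Cohen's kappa for binary concept scores (the scores below are invented for illustration, not drawn from the study's corpus):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Proportion of responses on which the two raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal label frequencies
    labels = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(l) / n) * (rater_b.count(l) / n)
                   for l in labels)
    return (observed - expected) / (1 - expected)

# 1 = concept present, 0 = absent; one disagreement out of ten responses
human = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
machine = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
print(round(cohens_kappa(human, machine), 2))  # → 0.78
```

Note that 90% raw agreement yields kappa of only 0.78 here, below the study's "near-perfect" threshold of 0.80: chance correction makes kappa a stricter criterion than percent agreement.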
2.
Harnessing technology to improve formative assessment of student conceptions in STEM: forging a national network
Haudek KC, Kaplan JJ, Knight J, Long T, Merrill J, Munn A, Nehm R, Smith M, Urban-Lurain M. CBE Life Sciences Education, 2011, 10(2): 149-155
Concept inventories, consisting of multiple-choice questions designed around common student misconceptions, are designed to reveal student thinking. However, students often have complex, heterogeneous ideas about scientific concepts. Constructed-response assessments, in which students must create their own answer, may better reveal students' thinking, but are time- and resource-intensive to evaluate. This report describes the initial meeting of a National Science Foundation-funded cross-institutional collaboration of interdisciplinary science, technology, engineering, and mathematics (STEM) education researchers interested in exploring the use of automated text analysis to evaluate constructed-response assessments. Participants at the meeting shared existing work on lexical analysis and concept inventories, participated in technology demonstrations and workshops, and discussed research goals. We are seeking interested collaborators to join our research community.
3.
4.
Transforming Biology Assessment with Machine Learning: Automated Scoring of Written Evolutionary Explanations
Ross H. Nehm, Minsu Ha, Elijah Mayfield. Journal of Science Education and Technology, 2012, 21(1): 183-196
This study explored the use of machine learning to automatically evaluate the accuracy of students’ written explanations of evolutionary change. Performance of the Summarization Integrated Development Environment (SIDE) program was compared to human expert scoring using a corpus of 2,260 evolutionary explanations written by 565 undergraduate students in response to two different evolution instruments (the EGALT-F and EGALT-P) that contained prompts that differed in various surface features (such as species and traits). We tested human-SIDE scoring correspondence under a series of different training and testing conditions, using Kappa inter-rater agreement values of greater than 0.80 as a performance benchmark. In addition, we examined the effects of response length on scoring success; that is, whether SIDE scoring models functioned with comparable success on short and long responses. We found that SIDE performance was most effective when scoring models were built and tested at the individual item level and that performance degraded when suites of items or entire instruments were used to build and test scoring models. Overall, SIDE was found to be a powerful and cost-effective tool for assessing student knowledge and performance in a complex science domain.
5.
Evaluating Instrument Quality in Science Education: Rasch‐based analyses of a Nature of Science test
Irene Neumann, Knut Neumann, Ross Nehm. International Journal of Science Education, 2013, 35(10): 1373-1405
Given the central importance of the Nature of Science (NOS) and Scientific Inquiry (SI) in national and international science standards and science learning, empirical support for the theoretical delineation of these constructs is of considerable significance. Furthermore, tests of the effects of varying magnitudes of NOS knowledge on domain-specific science understanding and belief require the application of instruments validated in accordance with AERA, APA, and NCME assessment standards. Our study explores three interrelated aspects of a recently developed NOS instrument: (1) validity and reliability; (2) instrument dimensionality; and (3) item scales, properties, and qualities within the context of Classical Test Theory and Item Response Theory (Rasch modeling). A construct analysis revealed that the instrument did not match published operationalizations of NOS concepts. Rasch analysis of the original instrument—as well as a reduced item set—indicated that a two-dimensional Rasch model fit significantly better than a one-dimensional model in both cases. Thus, our study revealed that NOS and SI are supported as two separate dimensions, corroborating theoretical distinctions in the literature. To identify items with unacceptable fit values, item quality analyses were used. A Wright Map revealed that few items sufficiently distinguished high performers in the sample and excessive numbers of items were present at the low end of the performance scale. Overall, our study outlines an approach for how Rasch modeling may be used to evaluate and improve Likert-type instruments in science education.
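The dichotomous Rasch model underlying these analyses is the standard formulation (the symbols below are conventional, not taken from the instrument itself): the probability that person $n$ answers item $i$ correctly is a logistic function of the difference between person ability $\theta_n$ and item difficulty $b_i$ on a shared logit scale:

```latex
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}
```

A two-dimensional model of the kind favored here gives each person two (possibly correlated) abilities, e.g. one for NOS items and one for SI items, rather than a single $\theta_n$ spanning both item sets.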
6.
7.
8.
Elizabeth P. Beggrow Minsu Ha Ross H. Nehm Dennis Pearl William J. Boone 《Journal of Science Education and Technology》2014,23(1):160-182
The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices—such as explanation, argumentation, and communication—in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students’ written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit >1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than the computer-scored explanation measures (r = 0.63). 
Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students’ normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
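The mean-square outfit statistic cited above has a standard Rasch definition (general formulation, not specific to this study): for item $i$ across $N$ persons, it is the average of squared standardized residuals,

```latex
\text{Outfit MNSQ}_i = \frac{1}{N} \sum_{n=1}^{N} \frac{(x_{ni} - E_{ni})^2}{\operatorname{Var}(x_{ni})}
```

where $x_{ni}$ is the observed response and $E_{ni}$ its expectation under the model. Values near 1.0 indicate good fit; the study's $>1.3$ criterion flags items with more noise than the model predicts, which is how the multiple-choice test was judged to fit poorly.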
9.
Automated computerized scoring systems (ACSSs) are being increasingly used to analyze text in many educational settings. Nevertheless, the impact of misspelled words (MSW) on scoring accuracy remains to be investigated in many domains, particularly jargon-rich disciplines such as the life sciences. Empirical studies confirm that MSW are a pervasive feature of human-generated text and that despite improvements, spell-check and auto-replace programs continue to be characterized by significant errors. Our study explored four research questions relating to MSW and text-based computer assessments: (1) Do English language learners (ELLs) produce equivalent magnitudes and types of spelling errors as non-ELLs? (2) To what degree do MSW impact concept-specific computer scoring rules? (3) What impact do MSW have on computer scoring accuracy? and (4) Are MSW more likely to impact false-positive or false-negative feedback to students? We found that although ELLs produced twice as many MSW as non-ELLs, MSW were relatively uncommon in our corpora. The MSW in the corpora were found to be important features of the computer scoring models. Although MSW did not significantly or meaningfully impact computer scoring efficacy across nine different computer scoring models, MSW had a greater impact on the scoring algorithms for naïve ideas than key concepts. Linguistic and concept redundancy in student responses explains the weak connection between MSW and scoring accuracy. Lastly, we found that MSW tend to have a greater impact on false-positive feedback. We discuss the implications of these findings for the development of next-generation science assessments.
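How misspellings interact with concept-specific scoring rules can be sketched with a toy example: an exact-match rule misses misspelled key terms, while fuzzy matching tolerates them. The key terms, rules, and response below are hypothetical illustrations, not the scoring models from the study:

```python
from difflib import get_close_matches

# Hypothetical rule: award the "heritable variation" concept if any
# key term appears in the student response (illustrative only).
KEY_TERMS = ["variation", "mutation", "heritable"]

def exact_match_score(response):
    """True if any key term appears verbatim in the response."""
    words = response.lower().split()
    return any(term in words for term in KEY_TERMS)

def fuzzy_match_score(response, cutoff=0.8):
    """True if any key term approximately matches a response word."""
    words = response.lower().split()
    return any(get_close_matches(term, words, n=1, cutoff=cutoff)
               for term in KEY_TERMS)

response = "the finches show heritible variaton in beak size"
print(exact_match_score(response))  # False: misspellings evade exact rules
print(fuzzy_match_score(response))  # True: fuzzy matching tolerates them
```

Note the trade-off the study's fourth question points at: loosening the match cutoff recovers misspelled key terms but risks crediting near-miss words, i.e. false-positive feedback.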
10.
This study investigated whether an increase in secondary science teacher knowledge about evolution and the nature of science gained from completing a graduate-level evolution course was associated with greater preference for the teaching of evolution in schools. Forty-four precertified secondary biology teachers participated in a 14-week intervention designed to address documented misconceptions identified by a precourse instrument. The course produced statistically significant gains in teacher knowledge of evolution and the nature of science and a significant decrease in misconceptions about evolution and natural selection. Nevertheless, teachers’ postcourse preference positions remained unchanged; the majority of science teachers still preferred that antievolutionary ideas be taught in school.