Similar Articles
20 similar articles found.
1.
In automated test assembly (ATA), the methodology of mixed‐integer programming is used to select test items from an item bank to meet the specifications for a desired test form and optimize its measurement accuracy. The same methodology can be used to automate the formatting of the set of selected items into the actual test form. Three different cases are discussed: (i) computerized test forms in which the items are presented on a screen one at a time and only their optimal order has to be determined; (ii) paper forms in which the items need to be ordered and paginated and the typical goal is to minimize paper use; and (iii) published test forms with the same requirements but a more sophisticated layout (e.g., double‐column print). For each case, a menu of possible test‐form specifications is identified, and it is shown how they can be modeled as linear constraints using 0–1 decision variables. The methodology is demonstrated using two empirical examples.
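For case (i), where only the order of the selected items has to be determined, one standard way to express ordering with 0–1 decision variables is an assignment formulation. The block below is an illustrative assumption, not necessarily the model used in the article: for n selected items, x_{ip} = 1 if item i is placed in position p and 0 otherwise. The last line shows how a hypothetical precedence requirement (item i must appear before item j) becomes a linear constraint, since the position of item i equals the sum of p x_{ip} over p.

```latex
\begin{aligned}
& x_{ip} \in \{0,1\}, && i = 1,\dots,n,\quad p = 1,\dots,n,\\
& \sum_{p=1}^{n} x_{ip} = 1, && i = 1,\dots,n \quad \text{(each item gets exactly one position)},\\
& \sum_{i=1}^{n} x_{ip} = 1, && p = 1,\dots,n \quad \text{(each position holds exactly one item)},\\
& \sum_{p=1}^{n} p\,x_{ip} + 1 \le \sum_{p=1}^{n} p\,x_{jp}, && \text{for a hypothetical pair } (i,j) \text{ in which } i \text{ must precede } j.
\end{aligned}
```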

2.
The central purposes of this study were to review the development and evolution of the Scientific Attitude Inventory (SAI) and then to reevaluate the psychometric properties of its revised form, the Scientific Attitude Inventory II (SAI‐II). The SAI‐II was administered to a convenience sample of 543 middle and high school students from the classes of five teachers in four schools in four school districts in San Antonio, Texas, at the beginning of the 2004–2005 school year. Confirmatory factor analysis on the full data set failed to support either a 12‐factor structure (as proposed by the scale developers) or a one‐factor structure. The data were then randomly divided into an exploratory validation set, analyzed with exploratory factor analysis (EFA), and a confirmatory cross‐validation set, analyzed with confirmatory factor analysis (CFA). The exploratory and confirmatory models yielded a three‐factor solution that did not fit the data well [χ²(321) = 646, p < .001; RMSEA = .061 (90% CI = .054–.068); CFI = .81]. The three factors were labeled "Science is About Understanding and Explaining" (13 items), "Science is Rigid" (6 items), and "I Want to Be a Scientist" (8 items). The α coefficients for these three factors ranged from 0.59 to 0.85. Whether these subscales are valid will require independent investigation. In this sample, and consistent with prior publications, the SAI‐II in its current form did not have satisfactory psychometric properties and cannot be recommended for further use. © 2008 Wiley Periodicals, Inc. J Res Sci Teach 45: 600–616, 2008

3.
The development of self‐regulation has been studied primarily in Western middle‐class contexts and has, therefore, neglected what is known about culturally varying self‐concepts and socialization strategies. The research reported here compared the self‐regulatory competencies of German middle‐class (N = 125) and rural Cameroonian Nso preschoolers (N = 76) using the Marshmallow test (Mischel, 2014). Study 1 revealed that 4‐year‐old Nso children showed better delay‐of‐gratification performance than their German peers. Study 2 revealed that culture‐specific maternal socialization goals and interaction behaviors were related to delay‐of‐gratification performance. Nso mothers' focus on hierarchical relational socialization goals and responsive control seems to support children's delay‐of‐gratification performance more than German middle‐class mothers' emphasis on psychological autonomous socialization goals and sensitive, child‐centered parenting.

4.
This paper illustrates that the psychometric properties of scores and scales that are used with mixed‐format educational tests can impact the use and interpretation of the scores that are reported to examinees. Psychometric properties that include reliability and conditional standard errors of measurement are considered in this paper. The focus is on mixed‐format tests in situations for which raw scores are integer‐weighted sums of item scores. Four associated real‐data examples include (a) effects of weights associated with each item type on reliability, (b) comparison of psychometric properties of different scale scores, (c) evaluation of the equity property of equating, and (d) comparison of the use of unidimensional and multidimensional procedures for evaluating psychometric properties. Throughout the paper, and especially in the conclusion section, the examples are related to issues associated with test interpretation and test use.

5.
Innovative educational strategies can provide variety and enhance student learning while addressing the complex logistical and financial issues facing modern anatomy education. Observe‐Reflect‐Draw‐Edit‐Repeat (ORDER), a novel cyclical artistic process, was designed based on cognitivist and constructivist learning theories and on processes of critical observation, reflection, and drawing in anatomy learning. ORDER was initially investigated in a compulsory first‐year surface anatomy practical (ORDER‐SAP) at a United Kingdom medical school, using a cross‐over trial with pre‐post anatomy knowledge testing and an assessment of student perceptions. Despite positive perceptions of ORDER‐SAP, medical students' (n = 154) pre‐post knowledge test scores were significantly greater (P < 0.001) with standard anatomy learning methods (3.26 ± 2.25) than with ORDER‐SAP (2.17 ± 2.30). Based on these findings, ORDER was modified and evaluated in an optional, self‐directed, online interactive gross anatomy tutorial (ORDER‐IT) for participating first‐year medical students (n = 55). Student performance was significantly greater (P < 0.001) with ORDER‐IT (2.71 ± 2.17) than with a control tutorial (1.31 ± 2.03). The performance of students with visual and artistic preferences when using ORDER did not differ significantly (P > 0.05) from that of students without these characteristics. These findings will be of value to anatomy instructors seeking to engage students from diverse learning backgrounds with a research‐led, innovative, time‐ and cost‐effective learning method in contrasting learning environments. Anat Sci Educ 10: 7–22. © 2016 American Association of Anatomists.

6.
Two methods of local linear observed‐score equating for use with anchor‐test and single‐group designs are introduced. In an empirical study, the two methods were compared with the current traditional linear methods for observed‐score equating. The criterion was the bias in the equated scores relative to true equating based on Lord's (1980) definition of equity. The local method for the anchor‐test design yielded minimum bias, even for considerable variation in the relative difficulties of the two test forms and in the length of the anchor test. Among the traditional methods, chain equating performed best. The local method for single‐group designs yielded equated scores with bias comparable to that of the traditional methods. This method, however, appears to be of theoretical interest because it forces us to rethink the relationship between score equating and regression.
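For reference, the traditional linear observed‐score equating function for a single‐group design matches the means and standard deviations of the two forms: l_Y(x) = μ_Y + (σ_Y/σ_X)(x − μ_X). The sketch below implements only that conventional baseline, not the local linear methods introduced in the article; the function name and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map a form-X raw score x onto the form-Y scale (traditional linear equating).

    Hypothetical helper: matches the means and (population) standard deviations
    of the two score distributions, as in a single-group design.
    """
    mu_x, mu_y = mean(scores_x), mean(scores_y)
    sd_x, sd_y = pstdev(scores_x), pstdev(scores_y)
    if sd_x == 0:
        raise ValueError("form X scores have zero variance")
    return mu_y + (sd_y / sd_x) * (x - mu_x)


# Hypothetical scores from the same examinees on forms X and Y.
form_x = [10, 12, 14, 16, 18]
form_y = [20, 24, 28, 32, 36]
equated = [linear_equate(s, form_x, form_y) for s in form_x]
```

Because the function is linear in x, the equated form‐X scores reproduce the mean and standard deviation of form Y exactly; the local methods in the article relax this single global line.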

7.
As access to and reliance on technology continue to increase, so does the use of computerized testing for admissions, licensure/certification, and accountability exams. Nonetheless, full computer‐based test (CBT) implementation can be difficult because of limited resources. As a result, some testing programs offer both CBT and paper‐based test (PBT) administration formats. In such situations, evidence must be gathered that scores obtained from the different formats are comparable. In this study, we illustrate how contemporary statistical methods can be used to provide evidence about the comparability of CBT and PBT scores at the total‐test‐score and item levels. Specifically, we examined the invariance of test structure and item functioning across administration modes and across subgroups of students defined by SES and sex. Multiple replications of both confirmatory factor analysis and Rasch differential item functioning analyses were used to assess invariance at the factorial and item levels. Results revealed a unidimensional construct, with moderate statistical support for strong factorial‐level invariance across SES subgroups and moderate support for invariance across sex. Issues involved in applying these analyses to future evaluations of the comparability of scores from different versions of a test are discussed.

8.
Increasing evidence indicates that individuals with intellectual disabilities (ID) might benefit from phonics‐based reading instruction. However, research and instruction in this field have predominantly focused on sight‐word reading. Models for complex interventions recommend that feasibility research be conducted before randomised studies that assess the efficacy of interventions (Thabane et al., 2010). The aim of the current paper is therefore to investigate feasibility questions relating to a full‐scale randomised controlled trial (RCT) evaluation of an online, phonics‐based reading programme (Headsprout® Early Reading; HER) with children with ID. Employing a randomised pre‐test post‐test group design, this study explores and trials important aspects of an RCT evaluation to inform a full‐scale RCT. We also found that HER had a significant effect on reading skills compared with ‘education as usual’, with large effect sizes on the main outcome measure. This indicates that further, more robust evaluations of HER with children with ID are worthwhile.

9.
Orlando and Thissen's S‐X² item fit index has performed better than traditional item fit statistics such as Yen's Q1 and McKinley and Mills's G² for dichotomous item response theory (IRT) models. This study extends S‐X² to polytomous IRT models, including the generalized partial credit model, the partial credit model, and the rating scale model. The performance of the generalized S‐X² in assessing item‐model fit was studied in terms of empirical Type I error rates and power and compared with that of G². The results suggest that the generalized S‐X² is promising for polytomous items in educational and psychological testing programs.

10.
In criterion‐referenced tests (CRTs), the traditional measures of reliability used in norm‐referenced tests (NRTs) have often proved problematic because of two NRT assumptions: a single underlying ability or competency, and variance in the distribution of scores. CRTs, by contrast, are likely to be created when mastery of the skill or knowledge by all or almost all test takers is expected, so little variation in scores is expected. A comprehensive CRT often measures a number of discrete tasks that may not represent a single unifying ability or competence. Hence, CRTs theoretically violate the two most essential assumptions of classical NRT reliability theory, and estimating their reliability has traditionally required multiple administrations to the same test takers, with the logistical problems that entails. A review of the literature categorizes approaches to CRT reliability into two classes: estimates sensitive to all sources of error and estimates of consistency in test outcome. For a single administration of a CRT, Livingston's k² is recommended for estimating all sources of error, and Sc is proposed for estimating consistency in test outcome. The two approaches are compared using data from a CRT exam, and recommendations for interpretation and use are proposed.
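Livingston's k² can be computed from the summary statistics of a single administration. A commonly cited form is k² = (ρσ² + (μ − C)²) / (σ² + (μ − C)²), where ρ is a classical reliability estimate (e.g., KR‐20), μ and σ² are the mean and variance of the scores, and C is the cut score. The sketch below assumes that form; the function and parameter names are hypothetical and are not taken from the reviewed article.

```python
def livingston_k2(reliability, variance, mean_score, cut_score):
    """Livingston's k-squared from single-administration summary statistics.

    Hypothetical helper assuming the form
        k2 = (rho * var + (mean - C)**2) / (var + (mean - C)**2),
    where rho is a classical reliability estimate (e.g., KR-20).
    """
    if variance <= 0:
        raise ValueError("score variance must be positive")
    # Squared distance between the mean score and the cut score.
    d2 = (mean_score - cut_score) ** 2
    return (reliability * variance + d2) / (variance + d2)
```

When the mean equals the cut score, k² reduces to the classical reliability estimate; the farther the mean lies from the cut score, the closer k² moves toward 1, which is why it tends to exceed the NRT coefficient for mastery‐oriented CRTs.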

11.
By 12 months, children grasp that a phonetic change to a word can change its identity (phonological distinctiveness). However, they must also grasp that some phonetic changes do not (phonological constancy). To test development of phonological constancy, sixteen 15‐month‐olds and sixteen 19‐month‐olds completed an eye‐tracking task that tracked their gaze to named versus unnamed images for familiar words spoken in their native (Australian) and an unfamiliar non‐native (Jamaican) regional accent of English. Both groups looked longer at named than unnamed images for Australian pronunciations, but only 19‐month‐olds did so for Jamaican pronunciations, indicating that phonological constancy emerges by 19 months. Vocabulary size predicted 15‐month‐olds' identifications for the Jamaican pronunciations, suggesting vocabulary growth is a viable predictor for phonological constancy development.

12.
In this article, linear item response theory (IRT) observed‐score equating is compared under a generalized kernel equating framework with Levine observed‐score equating for nonequivalent groups with anchor test design. Interestingly, these two equating methods are closely related despite being based on different methodologies. Specifically, when using data from IRT models, linear IRT observed‐score equating is virtually identical to Levine observed‐score equating. This leads to the conclusion that poststratification equating based on true anchor scores can be viewed as the curvilinear Levine observed‐score equating.

13.
Preparing tests and assessing students are time‐consuming tasks for instructors. We address both tasks in neuroanatomy education by employing a digital media application with a three‐dimensional (3D), interactive, fully segmented, and labeled brain atlas. The anatomical and vascular models in the atlas are linked to Terminologia Anatomica. Because the cerebral models are fully segmented and labeled, our approach enables automatic, random, atlas‐derived generation of questions that test the location and naming of cerebral structures. This is done in four steps: test individualization by the instructor, test taking by the students at their convenience, automatic assessment of students by the application, and communication of the individual assessments to the instructor. A computer‐based application with an interactive 3D atlas and a preliminary mobile‐based application were developed to realize this approach. The application works in two test modes: instructor and student. In instructor mode, the instructor customizes the test by setting the scope of testing and the student performance criteria, which takes a few seconds. In student mode, the student is tested and automatically assessed. Self‐testing is also possible at any time and pace. Our approach is automatic with respect to both test generation and student assessment. It is also objective, rapid, and customizable. We believe that this approach is novel from computer‐based, mobile‐based, and atlas‐assisted standpoints. Anat Sci Educ 2:244–252, 2009. © 2009 American Association of Anatomists.

14.
The Wechsler Intelligence Scale for Children–Third Edition (WISC‐III) and the Stanford‐Binet Intelligence Scale–Fourth Edition (SB‐IV) were administered to 20 gifted children and 20 non‐gifted children to examine the size of the difference between the IQ scores obtained on the two tests and whether order effects were present. Results show that the SB‐IV Composite Score was significantly higher than the WISC‐III Full Scale IQ for both groups. However, for the gifted group, unlike the non‐gifted group, this difference was significant only when the SB‐IV was administered first. When either IQ test was administered to the gifted students first, without the confound of a learning effect, there was no significant difference in mean scores. However, when both tests were administered, the SB‐IV was found to influence the WISC‐III Full Scale IQ in a downward direction, whereas the WISC‐III influenced the SB‐IV Composite Score in an upward direction. © 2002 Wiley Periodicals, Inc.

15.
Short‐ and long‐term effects of a treatment for dyslexia are evaluated. The treatment is based on psycholinguistic theory and assumes that dyslexia is due to poor lexico‐phonological processing of words. The treatment is computer‐based and focuses on learning to recognise and make use of the phonological and morphological structure of Dutch words. The treatment produced clear improvements in word reading, text reading, and spelling. Effect sizes of standardised treatment gains were large (Cohen's d > 0.80 for all variables). Following the treatment, participants attained an average level of text reading and spelling. The attained levels of word reading and text reading were stable over a four‐year follow‐up period. Spelling showed a slight decline one year after the treatment but remained stable thereafter. (A preliminary report of the data was presented at the World Congress on Dyslexia, September 1997, Thessaloniki, Greece.)

16.
This article shows that not all statistical software packages correctly calculate the p‐value for the classical F test comparing the variances of two independent Normal populations. This is illustrated with a simple example, and the reasons are discussed. Eight different software packages are considered.
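The abstract does not state which convention the article recommends. As a hedged illustration only, the sketch below computes F = s_x²/s_y² and a two‐sided p‐value under one common convention: double the smaller tail probability and cap the result at 1, which gives the same p‐value whichever sample is placed in the numerator. The F CDF is evaluated through the regularized incomplete beta function using a standard continued‐fraction (modified Lentz) method. All function names are hypothetical, and results should be checked against a vetted statistics library before any real use.

```python
import math
from statistics import variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even term of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd term of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_cdf(f, d1, d2):
    """P(F <= f) for an F distribution with (d1, d2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))


def f_test_two_sided(x, y):
    """Return (F, two-sided p) for H0: equal variances, doubling the smaller tail."""
    s2x, s2y = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if s2y == 0:
        raise ValueError("second sample has zero variance")
    f = s2x / s2y
    lower = f_cdf(f, len(x) - 1, len(y) - 1)
    p = 2.0 * min(lower, 1.0 - lower)
    return f, min(p, 1.0)
```

With this convention, `f_test_two_sided(x, y)` and `f_test_two_sided(y, x)` return reciprocal F statistics but the same p‐value; a routine that always doubles the upper tail would instead give a p‐value above 1 whenever F < 1.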

17.
The quality of healthcare delivery depends on collaboration between professional disciplines. Integrating opportunities for interprofessional learning into health science education programs prepares future clinicians to function as effective members of a multidisciplinary care team. This study aimed to create a modified team‐based learning (TBL) environment that used ultrasound technology during an interprofessional learning activity to enhance the musculoskeletal anatomy knowledge of first‐year medical (MD) and physical therapy (PT) students. An ultrasound demonstration of structures of the upper limb was incorporated into the gross anatomy courses for first‐year MD (n = 53) and PT (n = 28) students. Immediately before the learning experience, all students took an individual readiness assurance test (iRAT) based on clinical concepts from the assigned study material. Students observed as a physical medicine and rehabilitation physician demonstrated the use of ultrasound as a diagnostic and procedural tool for the shoulder and elbow. Following the demonstration, students worked in interprofessional teams (n = 14 teams, 5–6 students per team) to review the related anatomy on dissected specimens. At the end of the session, students worked in interprofessional teams to complete a collaborative, clinical case‐based multiple‐choice post‐test. Team scores were compared with the mean individual score within each team using the Wilcoxon signed‐rank test. Students scored higher on the collaborative post‐test (95.2 ± 10.2%) than on the iRAT (66.1 ± 13.9% for MD students and 76.2 ± 14.2% for PT students; P < 0.0001). Results suggest that this interprofessional team activity facilitated an improved understanding and clinical application of anatomy. Anat Sci Educ 11: 94–99. © 2017 American Association of Anatomists.

18.
Changes in medical school curricula often require educators to develop teaching strategies that decrease contact hours while maintaining effective pedagogical methods. When faced with this challenge, faculty at the University of Cincinnati College of Medicine converted the majority of in‐person histology laboratory sessions to self‐study modules that use multiple audiovisual modalities and a virtual microscope platform. Outcomes related to this shift were investigated through performance on in‐house examinations, results of the United States Medical Licensing Examination® (USMLE®) Step 1 Examination, and student feedback. Medical College Admission Test® (MCAT®) scores were used as a covariate when comparing in‐house examinations. Results revealed no significant change in performance on in‐house examinations when the content being assessed was controlled (F(2, 506) = 0.676, P = 0.51). A significant improvement in overall practical examination grade averages was associated with the self‐study modules (F(6, 1164) = 10.213, P < 0.01), but gradual changes in examination content may explain this finding. Performance on the histology and cell biology portion of the USMLE Step 1 Examination remained consistent throughout the period investigated. Student feedback regarding the self‐study modules was positive and suggested that features such as instructor‐narrated videos were an important component of the modules because they helped recreate the experience of in‐person laboratory sessions. Positive outcomes from the student perspective and no drop in examination performance suggest that self‐study modules for histology laboratory content may be an option for educators faced with the challenge of reducing contact hours without eliminating content. Anat Sci Educ 10: 276–285. © 2016 American Association of Anatomists.

19.
For open‐mindedness to be an Aristotelian personal virtue, its possession must make agents better off. Unfortunately, open‐mindedness does not currently pay. The reasons include (1) novelty glut — taking seriously even a tiny percentage of the worthwhile, available ideas would be overwhelming; and (2) deception campaigns — we lack the time, sophistication, and knowledge to uncover the truth ourselves. Our best coping strategy is closed‐mindedness, that is, to ignore whatever we encounter unless vouched for by trusted experts. However, as Jessica Gottlieb and Howard Curzer argue in this article, student learning demands open‐mindedness. Although open‐mindedness is a personal vice, it is a student‐role virtue. Thus, teachers must buffer their classrooms against those features of the contemporary world that make open‐mindedness counterproductive. Teachers can counter these threats by using core practices that are general (such as facilitating classroom discussions) and content‐specific (for example, engaging students in scientific investigations). Core practices enable teachers to craft environments and experiences that make open‐mindedness great again.

20.
Recent research has shown that infants are more likely to engage with in‐group than with out‐group members. However, it is not known whether infants' learning is influenced by a model's group membership. This study investigated whether 14‐month‐olds (N = 66) selectively imitate and adopt the preferences of in‐group versus out‐group members. Infants watched an adult tell a story either in their native language (in‐group) or in a foreign language (out‐group). The adult then demonstrated a novel action (imitation task) and chose 1 of 2 objects (preference task). Infants did not show selectivity in the preference task, but they imitated the in‐group model more faithfully than the out‐group model. This suggests that cultural learning is beginning to be truly cultural by 14 months of age.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP Registration No. 09084417)