Similar articles
 20 similar articles found (search time: 15 ms)
1.
This study examined the underlying structure of the Nonverbal Literacy Assessment (NVLA), an instrument designed to measure emergent literacy for K–fourth‐grade students with severe developmental disabilities. The NVLA was conceptualized as having six constructs that reflected emergent literacy skills: (a) phonemic awareness, (b) phonics, (c) comprehension, (d) vocabulary, (e) listening comprehension, and (f) text awareness. Confirmatory factor analysis using data from 207 student administrations was used to examine the six‐factor model and two alternative models. Results suggested that all three models fit the data, but the high correlation coefficients among the constructs suggested that a one‐factor model of emergent literacy was the best‐fitting model. Implications and limitations are discussed. © 2010 Wiley Periodicals, Inc.

2.
This article investigates the effect of the number of item response categories on chi‐square statistics for confirmatory factor analysis to assess whether a greater number of categories increases the likelihood of identifying spurious factors, as previous research had concluded. Four types of continuous single‐factor data were simulated for a 20‐item test: (a) uniform for all items, (b) symmetric unimodal for all items, (c) negatively skewed for all items, or (d) negatively skewed for 10 items and positively skewed for 10 items. For each of the 4 types of distributions, item responses were divided to yield item scores with 2, 4, or 6 categories. The results indicated that the chi‐square statistic for evaluating a single‐factor model was most inflated (suggesting spurious factors) for 2‐category responses and became less inflated as the number of categories increased. However, the Satorra‐Bentler scaled chi‐square tended not to be inflated even for 2‐category responses, except if the continuous item data had both negatively and positively skewed distributions.

3.
Item stem formats can alter the cognitive complexity as well as the type of abilities required for solving mathematics items. Consequently, it is possible that item stem formats can affect the dimensional structure of mathematics assessments. This empirical study investigated the relationship between item stem format and the dimensionality of mathematics assessments. A sample of 671 sixth-grade students was given two forms of a mathematics assessment in which mathematical expression (ME) items and word problems (WP) were used to measure the same content. The effects of mathematical language and reading abilities in responding to ME and WP items were explored using unidimensional and multidimensional item response theory models. The results showed that WP and ME items appear to differ with regard to the underlying abilities required to answer these items. Hence, the multidimensional model fit the response data better than the unidimensional model. For the accurate assessment of mathematics achievement, students’ reading and mathematical language abilities should also be considered when implementing mathematics assessments with ME and WP items.  相似文献   

4.
Incremental rehearsal (IR) is a highly effective intervention that uses high repetition and a high ratio of known to unknown items with linearly spaced known items between the new items. It has been hypothesized that narrowly spaced practice would result in quick learning, whereas items that are widely spaced would result in longer‐term retention. The current study examined the effect of spacing by teaching vocabulary words to 36 fourth‐grade students. Each student was randomly assigned to a widely spaced IR condition (i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, three known items, and an increase in the number of known items presented each time by one) or an IR condition in which spacing increased exponentially (IR‐Exp; i.e., one unknown item, one known item, one unknown item, two known items, one unknown item, four known items, and one unknown item, eight known items). The results indicated that the students in the study retained twice as much information with the widely spaced IR than with the IR‐Exp condition, but the latter required half as much time. IR and IR‐Exp were equally efficient, but IR continues to be superior to all other flashcard approaches in improving retention.  相似文献   
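
The two presentation schedules described in the abstract above can be sketched in a few lines of Python. This is only an illustration of the spacing pattern as the abstract states it (one, two, three, ... known items versus one, two, four, eight known items between presentations of the unknown item); the function names and the "U"/"K" labels are assumptions made here, not part of the study's materials.

```python
def known_counts(presentations, exponential=False):
    """Number of known items shown after each presentation of the unknown item."""
    if exponential:
        # IR-Exp: 1, 2, 4, 8, ... known items
        return [2 ** i for i in range(presentations)]
    # Linearly increasing spacing: 1, 2, 3, 4, ... known items
    return [i + 1 for i in range(presentations)]


def rehearsal_sequence(presentations, exponential=False):
    """Build the card order, using 'U' for the unknown item and 'K' for a known item."""
    sequence = []
    for count in known_counts(presentations, exponential):
        sequence.append("U")
        sequence.extend(["K"] * count)
    return sequence


linear = rehearsal_sequence(4)
exp = rehearsal_sequence(4, exponential=True)
```

With four presentations of the unknown item, the linear schedule shows 10 known cards in total and the exponential schedule shows 15, so the two sequences differ in length as well as in spacing.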

5.
This study describes the development of an instrument to investigate the extent to which student‐centered actions are occurring in science classrooms. The instrument was developed through the following five stages: (1) student action identification, (2) use of both national and international content experts to establish content validity, (3) refinement of the item pool based on reviewer comments, (4) pilot testing of the instrument, and (5) statistical reliability and item analysis leading to additional refinement and finalization of the instrument. In the field test, the instrument consisted of 26 items separated into four categories originally derived from student‐centered instruction literature and used by the authors to sort student actions in previous research. The SACS was administered across 22 Grade 6–8 classrooms by 22 groups of observers, with a total of 67 SACS ratings completed. The finalized instrument was found to be internally consistent, with acceptable estimates from inter‐rater intraclass correlation reliability coefficients at the p < 0.01 level. After the final stage of development, the SACS instrument consisted of 24 items separated into three categories, which aligned with the factor analysis clustering of the items. Additionally, concurrent validity of the SACS was established with the Reformed Teaching Observation Protocol. Based on the analyses completed, the SACS appears to be a useful instrument for inclusion in comprehensive assessment packages for illuminating the extent to which student‐centered actions are occurring in science classrooms.  相似文献   

6.
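Students' mathematical literacy has a multidimensional structure, so literacy‐oriented assessment of mathematics achievement needs to report examinees' performance on each dimension rather than only a single total score. Taking the PISA mathematical literacy framework as the theoretical model and multidimensional item response theory (MIRT) as the measurement model, this study used the MIRT package for R to process and analyze item data from a Grade 8 mathematical literacy assessment in one region, in order to examine methods for the multidimensional measurement of mathematical literacy. The results show that MIRT combines the advantages of unidimensional item response theory and factor analysis: it can be used to analyze the construct validity of a test and the quality of its items, and to carry out multidimensional cognitive diagnosis of examinees' abilities.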

7.
We propose a structural equation model, which reduces to a multidimensional latent class item response theory model, for the analysis of binary item responses with nonignorable missingness. The missingness mechanism is driven by 2 sets of latent variables: one describing the propensity to respond and the other referred to the abilities measured by the test items. These latent variables are assumed to have a discrete distribution, so as to reduce the number of parametric assumptions regarding the latent structure of the model. Individual covariates can also be included through a multinomial logistic parameterization for the distribution of the latent variables. Given the discrete nature of this distribution, the proposed model is efficiently estimated by the expectation–maximization algorithm. A simulation study is performed to evaluate the finite-sample properties of the parameter estimates. Moreover, an application is illustrated with data coming from a student entry test for the admission to some university courses.  相似文献   

8.
Michael Scriven has suggested that student rating forms, for the purpose of evaluating college teaching, be designed for multiple audiences (instructor, administrator, student), and with a single global item for summative functions (determination of merit, retention, or promotion). This study reviewed approaches to rating form construction, e.g., factor analytic strategies of Marsh, and recommended the multiple audience design of Scriven. An empirical test of the representativeness of the single global item was reported from an analysis of 1,378 forms collected in a university department of education. The global item correlated most satisfactorily with other items, a computed total of items, items that represented underlying factors, and various triplets of items selected to represent all possible combinations of items. It was concluded that a multiple audience rating form showed distinct advantages in design and that the single global item most fairly and highly represented the overall teaching performance, as judged by students, for decisions about retention, promotion, and merit made by administrators.  相似文献   

9.
Teachers' self‐efficacy (SE) in their classroom management capabilities is thought to be an important factor in teachers' overall judgements of their teaching SE. Low SE in classroom management has been linked to teacher attrition and burnout, and reduced student learning outcomes. This article provides the first comprehensive review of classroom management as a factor in the construct of teacher SE. Twenty‐five peer‐reviewed articles published from 1984 to 2009 that reported on the use of SE scales containing at least one novel classroom management self‐efficacy (CMSE) item were reviewed. The validity and reliability of CMSE scales and items were found to be very good, with classroom management items pertaining to maintaining order and control the most frequent category included. Approximately one in four items in the SE scales reviewed was a CMSE item, and, in general, CMSE items were not linked explicitly to classroom management research or contemporary psychological or philosophical approaches.

10.
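The geography paper of the 2004 National College Entrance Examination (Shanghai version) consists of two parts: multiple‐choice questions and comprehensive analysis questions. The multiple‐choice part has 20 questions worth 2 points each, for a total of 40 points. The comprehensive analysis part has eight major questions with 34 sub‐questions and 110 scoring points. The examination is evaluated from several perspectives: classical item analysis, the reliability of the examination results, content‐ and structure‐related evidence for the validity of the examination, and the examination's impact on education and teaching. The conclusions are as follows: the ability objectives of the geography examination were set according to the curriculum standards, and the items were written on the basis of those standards; the paper was slightly on the easy side, with a certain degree of discrimination; the number of items was moderate; and the ratio of multiple‐choice to non‐multiple‐choice items was appropriate, giving good guidance for school education and teaching. However, the comprehensive analysis part required reading a large amount of graphic and textual information while calling for relatively little written response, which makes it difficult to assess candidates' independent geographical thinking in a systematic way and is unfavorable as guidance for teaching.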

11.

This study investigated a set of variables – designated by marketing researchers as psychographic – and their relationship to non‐residential college students' concerns centering around commuting to class. A fourteen‐item instrument was developed and responded to by 425 students. A factor analysis revealed four factors, and the dominant factor comprised items such as car security, parking, physical security and traffic. Other factors such as education quality items (i.e., library, faculty) accounted for a small percentage of the variance. A discriminant function analysis of the psychographic variables compared with six student time‐distance groups did not find differences among the groups by various class attendance times and commuting distances. Research limitations and future research directions are offered.

12.
In typical differential item functioning (DIF) assessments, an item's DIF status is not influenced by its status in previous test administrations. An item that has shown DIF at multiple administrations may be treated the same way as an item that has shown DIF in only the most recent administration. Therefore, much useful information about the item's functioning is ignored. In earlier work, we developed the Bayesian updating (BU) DIF procedure for dichotomous items and showed how it could be used to formally aggregate DIF results over administrations. More recently, we extended the BU method to the case of polytomously scored items. We conducted an extensive simulation study that included four “administrations” of a test. For the single‐administration case, we compared the Bayesian approach to an existing polytomous‐DIF procedure. For the multiple‐administration case, we compared BU to two non‐Bayesian methods of aggregating the polytomous‐DIF results over administrations. We concluded that both the BU approach and a simple non‐Bayesian method show promise as methods of aggregating polytomous DIF results over administrations.  相似文献   

13.
Automatic item generation (AIG), a means of leveraging technology to create large quantities of items, requires a minimum number of items to offset the sizable upfront investment (i.e., model development and technology deployment) in order to achieve cost savings. In this cost–benefit analysis, we estimated the cost of each step of AIG and manual item writing and applied cost–benefit formulas to calculate the number of items that would have to be produced before the savings from AIG offset its upfront costs, in the context of K‐12 mathematics items. Results indicated that AIG is more cost‐effective than manual item writing when developing, at a minimum, 173 to 247 items within one fine‐grained content area (e.g., fourth‐ through seventh‐grade area of figures). The article concludes with a discussion of implications for test developers and the nonmonetary tradeoffs involved in AIG.
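
The break‐even idea in the abstract above can be illustrated with a simple linear cost model. The sketch below assumes that AIG has a one‐time upfront cost plus a per‐item cost and that manual writing has only a per‐item cost; this is a simplification chosen here for illustration, not the cost–benefit formulas used in the article. All dollar figures are hypothetical and were picked only so that the result falls in the reported 173 to 247 range.

```python
import math


def break_even_items(upfront_cost, aig_cost_per_item, manual_cost_per_item):
    """Smallest number of items at which AIG is no more expensive than manual writing.

    Assumed linear model (not taken from the article):
        AIG total    = upfront_cost + aig_cost_per_item * n
        manual total = manual_cost_per_item * n
    """
    saving_per_item = manual_cost_per_item - aig_cost_per_item
    if saving_per_item <= 0:
        raise ValueError("AIG never breaks even unless it is cheaper per item")
    return math.ceil(upfront_cost / saving_per_item)


# Hypothetical figures, for illustration only.
n_items = break_even_items(upfront_cost=10_000,
                           aig_cost_per_item=10,
                           manual_cost_per_item=60)
```

Under this model the break‐even count grows in proportion to the upfront investment and shrinks as the per‐item saving increases, which is why the threshold is reported per content area.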

14.
Examined in this study were the effects of reducing anchor test length on student proficiency rates for 12 multiple‐choice tests administered in an annual, large‐scale, high‐stakes assessment. The anchor tests contained 15 items, 10 items, or five items. Five content representative samples of items were drawn at each anchor test length from a small universe of items in order to investigate the stability of equating results over anchor test samples. The operational tests were calibrated using the one‐parameter model and equated using the mean b‐value method. The findings indicated that student proficiency rates could display important variability over anchor test samples when 15 anchor items were used. Notable increases in this variability were found for some tests when shorter anchor tests were used. For these tests, some of the anchor items had parameters that changed somewhat in relative difficulty from one year to the next. It is recommended that anchor sets with more than 15 items be used to mitigate the instability in equating results due to anchor item sampling. Also, the optimal allocation method of stratified sampling should be evaluated as one means of improving the stability and precision of equating results.  相似文献   
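
The mean b‐value method mentioned in the abstract above can be sketched as follows. Under a one‐parameter model, the new form is placed on the base scale by adding a constant equal to the difference between the anchor items' mean difficulties in the two calibrations. The function names and the difficulty values are hypothetical and serve only to illustrate the computation.

```python
from statistics import mean


def mean_b_shift(anchor_b_base, anchor_b_new):
    """Linking constant: mean anchor difficulty on the base scale minus the new-scale mean."""
    if len(anchor_b_base) != len(anchor_b_new):
        raise ValueError("anchor items must be paired across the two calibrations")
    return mean(anchor_b_base) - mean(anchor_b_new)


def to_base_scale(b_values, shift):
    """Place new-form difficulties on the base scale by adding the linking constant."""
    return [b + shift for b in b_values]


# Hypothetical difficulties for three anchor items in two calibrations.
base_anchors = [-0.5, 0.0, 0.5]
new_anchors = [-0.3, 0.2, 0.7]
shift = mean_b_shift(base_anchors, new_anchors)
linked = to_base_scale(new_anchors, shift)
```

Because the constant depends only on the anchor means, a short anchor set makes it sensitive to any single item whose relative difficulty changes between years, which is consistent with the instability the abstract reports.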

15.
16.
The reading data from the 1983–84 National Assessment of Educational Progress survey were scaled using a unidimensional item response theory model. To determine whether the responses to the reading items were consistent with unidimensionality, the full-information factor analysis method developed by Bock and associates (1985) and Rosenbaum's (1984) test of unidimensionality, conditional (local) independence, and monotonicity were applied. Full-information factor analysis involves the assumption of a particular item response function; the number of latent variables required to obtain a reasonable fit to the data is then determined. The Rosenbaum method provides a test of the more general hypothesis that the data can be represented by a model characterized by unidimensionality, conditional independence, and monotonicity. Results of both methods indicated that the reading items could be regarded as measures of a single dimension. Simulation studies were conducted to investigate the impact of balanced incomplete block (BIB) spiraling, used in NAEP to assign items to students, on methods of dimensionality assessment. In general, conclusions about dimensionality were the same for BIB-spiraled data as for complete data.  相似文献   

17.
Changes to the design and development of our educational assessments are resulting in the unprecedented demand for a large and continuous supply of content‐specific test items. One way to address this growing demand is with automatic item generation (AIG). AIG is the process of using item models to generate test items with the aid of computer technology. The purpose of this module is to describe and illustrate a template‐based method for generating test items. We outline a three‐step approach where test development specialists first create an item model. An item model is like a mould or rendering that highlights the features in an assessment task that must be manipulated to produce new items. Next, the content used for item generation is identified and structured. Finally, features in the item model are systematically manipulated with computer‐based algorithms to generate new items. Using this template‐based approach, hundreds or even thousands of new items can be generated with a single item model.  相似文献   
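
The three‐step, template‐based approach described in the abstract above can be sketched in Python as follows. The item model is represented here as a stem with named placeholders, the content as lists of values for each placeholder, and generation as a systematic combination of those values. The stem, the element names, and the function are hypothetical illustrations, not the module's own materials.

```python
from itertools import product


def generate_items(stem_template, elements):
    """Fill every combination of element values into the item model's stem."""
    names = list(elements)
    items = []
    for values in product(*(elements[name] for name in names)):
        items.append(stem_template.format(**dict(zip(names, values))))
    return items


# Step 1: a hypothetical item model (stem with manipulable features).
stem = "A rectangle is {length} cm long and {width} cm wide. What is its area?"
# Step 2: the content to vary, structured by feature.
content = {"length": [3, 5], "width": [2, 4]}
# Step 3: systematic manipulation of the features.
items = generate_items(stem, content)
```

The number of generated items is the product of the value counts per feature, so adding features or values increases the item count multiplicatively.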

18.
Croatian 1st‐year and 3rd‐year high‐school students (N = 170) completed a conceptual physics test. Students were evaluated with regard to two physics topics: Newtonian dynamics and simple DC circuits. Students answered test items and also indicated their confidence in each answer. Rasch analysis facilitated the calculation of three linear measures: (a) an item‐difficulty measure based upon all responses, (b) an item‐confidence measure based upon correct student answers, and (c) an item‐confidence measure based upon incorrect student answers. Comparisons were made with regard to item difficulty and item confidence. The results suggest that Newtonian dynamics is a topic with stronger students' alternative conceptions than the topic of DC circuits, which is characterized by much lower students' confidence on both correct and incorrect answers. A systematic and significant difference between mean student confidence on Newtonian dynamics and DC circuits items was found in both student groups. Findings suggest some steps for physics instruction in Croatia as well as areas of further research for those in science education interested in additional techniques of exploring alternative conceptions. © 2005 Wiley Periodicals, Inc. J Res Sci Teach 43: 150–171, 2006  相似文献   

19.
A questionnaire used in student evaluations of interdisciplinary courses during six semesters contained two Likert items stated in a direct negative mode which were embedded in a questionnaire (14–18 items) in which the remaining items were phrased in a direct positive mode. In the seventh semester and thereafter, the two negative items were restated as direct positive stems. Item‐analysis demonstrated that in the direct negative mode, the two items had low item‐to‐total correlations and that the internal consistency reliability of the sum score could be improved by eliminating the two negatively phrased items. Also, the two negatively worded items defined a separate factor. After they were reworded into a direct positive mode, these two items showed markedly improved item‐to‐total correlations. Moreover, the unique factor disappeared, which suggests that it was a methodological artefact probably attributable to respondent carelessness. Including a few negative items in an otherwise positively stated questionnaire leads to ambiguity of results rather than controlling for response sets. We therefore recommend against the practice.  相似文献   

20.
An important assumption of item response theory is item parameter invariance. Sometimes, however, item parameters are not invariant across different test administrations due to factors other than sampling error; this phenomenon is termed item parameter drift. Several methods have been developed to detect drifted items. However, most of the existing methods were designed to detect drifts in individual items, which may not be adequate for test characteristic curve–based linking or equating. One example is the item response theory–based true score equating, whose goal is to generate a conversion table to relate number‐correct scores on two forms based on their test characteristic curves. This article introduces a stepwise test characteristic curve method to detect item parameter drift iteratively based on test characteristic curves without needing to set any predetermined critical values. Comparisons are made between the proposed method and two existing methods under the three‐parameter logistic item response model through simulation and real data analysis. Results show that the proposed method produces a small difference in test characteristic curves between administrations, an accurate conversion table, and a good classification of drifted and nondrifted items and at the same time keeps a large amount of linking items.  相似文献   
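
For readers unfamiliar with the quantities involved, the sketch below computes a test characteristic curve under the three‐parameter logistic model and the largest gap between two such curves over a grid of ability values. It does not implement the stepwise detection method proposed in the article; the function names and parameter values are hypothetical, and the scaling constant D = 1.7 that some authors include is omitted as a simplifying assumption.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model (no D = 1.7 scaling)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at ability theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


def max_tcc_difference(items_1, items_2, thetas):
    """Largest absolute gap between two TCCs over a grid of ability values."""
    return max(abs(tcc(t, items_1) - tcc(t, items_2)) for t in thetas)


# Hypothetical (a, b, c) parameters for the same items at two administrations.
admin_1 = [(1.0, 0.0, 0.2), (1.2, -0.5, 0.25)]
admin_2 = [(1.0, 0.3, 0.2), (1.2, -0.5, 0.25)]  # first item drifted harder
grid = [x / 10 for x in range(-30, 31)]
gap = max_tcc_difference(admin_1, admin_2, grid)
```

A drift in one item's difficulty shifts the whole curve only slightly, which illustrates why item‐level and curve‐level criteria can lead to different decisions about which items to keep for linking.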
