Similar documents
 20 similar documents found (search time: 28 ms)
1.
What would you do if you were asked to design an “optimal” school testing program, but had doubts about the criteria to apply? What strategies might help us make more use of performance assessment and computer-adaptive testing in district-level testing programs? Can an NCME President be whimsical when delivering her presidential address?

2.
In a previous simulation study of methods for assessing differential item functioning (DIF) in computer-adaptive tests (Zwick, Thayer, & Wingersky, 1993, 1994), modified versions of the Mantel-Haenszel and standardization methods were found to perform well. In that study, data were generated using the 3-parameter logistic (3PL) model, and this same model was assumed in obtaining item parameter estimates. In the current study, the 3PL data were used but the Rasch model was assumed in obtaining the item parameter estimates, which determined the information table used for item selection. Although the obtained DIF statistics were highly correlated with the generating DIF values, they tended to be smaller in magnitude than in the 3PL analysis, resulting in a lower probability of DIF detection. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch-based computer-adaptive test tended to be biased downward, particularly for lower-ability examinees.

3.
Time limits on some computer-adaptive tests (CATs) are such that many examinees have difficulty finishing, and some examinees may be administered tests with more time-consuming items than others. Results from over 100,000 examinees suggested that about half of the examinees had to guess on the final six questions of the analytical section of the Graduate Record Examination in order to finish before time expired. At the higher-ability levels, even more guessing was required because the questions administered to higher-ability examinees were typically more time-consuming. Because the scoring model is not designed to cope with extended strings of guesses, substantial errors in ability estimates can be introduced when CATs have strict time limits. Furthermore, examinees who are administered tests with a disproportionate number of time-consuming items appear to get lower scores than examinees of comparable ability who are administered tests containing items that can be answered more quickly, though the issue is very complex because of the relationship of time and difficulty, and the multidimensionality of the test.

4.
This study focused on the effects of administration mode (computer-adaptive test [CAT] versus self-adaptive test [SAT]), item-by-item answer feedback (present versus absent), and test anxiety on results obtained from computerized vocabulary tests. Examinees were assigned at random to four testing conditions (CAT with feedback, CAT without feedback, SAT with feedback, SAT without feedback). Examinees completed the Test Anxiety Inventory (Spielberger, 1980) before taking their assigned computerized tests. Results showed that the CATs were more reliable and took less time to complete than the SATs. Administration time for both the CATs and SATs was shorter when feedback was provided than when it was not, and this difference was most pronounced for examinees at medium to high levels of test anxiety. These results replicate prior findings regarding the precision and administrative efficiency of CATs and SATs but point to new possible benefits of including answer feedback on such tests.

5.
Are there important aspects of human ability that we have not been measuring? What are the purposes and types of audio that are possible in computerized tests? Will the use of audio in computer‐based tests lead to more valid and reliable measurement?

6.
We developed an empirical Bayes (EB) enhancement to Mantel-Haenszel (MH) DIF analysis in which we assume that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. We use the posterior distribution of DIF parameters to make inferences about the item's true DIF status and the posterior predictive distribution to predict the item's future observed status. DIF status is expressed in terms of the probabilities associated with each of the five DIF levels defined by the ETS classification system: C–, B–, A, B+, and C+. The EB methods yield more stable DIF estimates than do conventional methods, especially in small samples, which is advantageous in computer-adaptive testing. The EB approach may also convey information about DIF stability in a more useful way by representing the state of knowledge about an item's DIF status probabilistically. (A minimal numerical sketch of this kind of normal-normal updating is given after item 20 below.)

7.
Standardized testing, which originated in the United States, has been adopted on a large scale around the world, yet it can be said that standardized testing has reached the present day amid constant criticism. Why do people continue to use it even while criticizing it so strongly? How, ultimately, should standardized testing be evaluated? After analyzing the current state of and problems with standardized testing in the United States, this paper examines standardized testing from educational and cultural perspectives and argues that standardized testing has a legitimate reason to exist, but that the design of test items, the use of test scores, and the weight given to tests among educational evaluation indicators should be handled scientifically and objectively.

8.
Computerized testing has created new challenges for the production and administration of test forms. Many testing organizations engaged in or considering computerized testing may find themselves changing from well-established procedures for handcrafting a small number of paper-and-pencil test forms to procedures for mass producing many computerized test forms. This paper describes an integrated approach to test development and administration called computer-adaptive sequential testing, or CAST. CAST is a structured approach to test construction that combines adaptive testing methods with automated test assembly, allowing test developers to maintain a greater degree of control over the production, quality assurance, and administration of different types of computerized tests. CAST retains much of the efficiency of traditional computer-adaptive testing (CAT) and can be modified for computer mastery testing (CMT) applications. The CAST framework is described in detail and several applications are demonstrated using a medical licensure example.

9.
Results obtained from computer-adaptive and self-adaptive tests were compared under conditions in which item review was permitted and not permitted. Comparisons of answers before and after review within the "review" condition showed that a small percentage of answers was changed (5.23%), that more answers were changed from wrong to right than from right to wrong (by a ratio of 2.92:1), that most examinees (66.5%) changed answers to at least some questions, that most examinees who changed answers improved their ability estimates by doing so (by a ratio of 2.55:1), and that review was particularly beneficial to examinees at high ability levels. Comparisons between the "review" and "no-review" conditions yielded no significant differences in ability estimates or in estimated measurement error and provided no trustworthy evidence that test anxiety moderated the effects of review on those indexes. Most examinees desired review, but permitting it increased testing time by 41%.

10.
11.
Why do school districts use standardized tests? What kinds of testing programs are most common? How much do school testing programs cost? Should publishers coordinate their test standardization efforts?

12.
How can the use of norm-referenced tests be changed to lead to more accurate reporting of results? What kinds of information should states and districts be using to contextualize their reports of test results?

13.
Now that the use of performance assessments for certification, graduation, and classroom use is burgeoning, what are the responsibilities of educators for preparing students for such assessments? Do the same guidelines apply for both multiple-choice tests and performance assessments?

14.
What promises has CRT kept or broken? Why do we need denser tests to make good, criterion-referenced interpretations? What wisdom has grown from experience with CRTs?

15.
How does the use of computerized adaptive testing affect the performance of students from different groups? How consistent were the results of computerized adaptive and “conventional” tests? What did the students think about the test experience? What advice do the authors have for test developers and users?

16.
Digital technologies have been used for measurement purposes, and whether the test medium influences the user is an important issue. The aim of this study is to investigate differences in students' performance and test duration between online and paper-and-pencil tests. An online testing tool was developed and administered in order to determine the differences between traditional paper-and-pencil tests and online tests with respect to students' performance and the time spent on the tests. The tool draws questions from an online database and supports multiple-choice items (with 4 or 5 options), true-false, matching, fill-in-the-blank, multiple-answer, short-answer, and long-answer items; it also allows tests to be assembled and converted to paper-and-pencil mode. The performance test was administered in both online and paper-and-pencil modes to junior students at a university in Turkey. In addition, the online testing tool developed within the context of the study was evaluated by instructors with respect to usability, fitness for purpose, and design. Instructor and student questionnaires were developed to gather opinions on the online testing tool and on online testing. Results showed no significant difference between performance on the online and paper-and-pencil tests. On the other hand, the time students spent on the online test was longer than the time they spent on the paper-and-pencil test. Students found the online testing tool easy to use and stated that the online test medium is more comfortable than paper-and-pencil testing, although they complained about external noise, tiredness, and difficulty concentrating in the online examination setting. In general, instructors also appreciated the design of the online testing tool and agreed that it serves its purpose.

17.
What criteria should be applied to the evaluation of performance measures? How consistent are the results from performance measures and norm-referenced achievement tests? How can we ensure fairness and credibility in performance measurement?

18.
Assessing Students' Opportunity to Learn: Teacher and Student Perspectives
How can we assess the opportunity that students have to learn the material they find on tests? How do students' perceptions of opportunity to learn differ from those of their teachers?

19.
What has been the impact of high-stakes achievement testing on curriculum and testing practice? How serious is the problem of cheating on standardized achievement tests? What steps can we take to improve the validity of test-based information for a variety of educational purposes?

20.
How detailed should we make the specifications for educational tests? What should be the role of sample or “illustrative” items? How does the nature of test specifications impact on the usefulness of that test?
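
A worked sketch of the normal-normal updating described in item 6 above may make the empirical Bayes idea concrete: treating the observed MH D-DIF statistic as normally distributed about an item's true DIF parameter and placing a normal prior on that parameter gives a closed-form posterior whose mean shrinks the observed value toward the prior mean, and posterior probabilities of the five DIF levels then follow from the normal CDF. This is only an illustrative Python sketch, not the authors' implementation: the prior mean and variance, the standard error, and the simplified cut points of 1.0 and 1.5 (which ignore the statistical-significance conditions of the full ETS A/B/C rules) are assumptions made here for demonstration.

    import math

    def eb_dif_posterior(mh_d_dif, se, prior_mean=0.0, prior_var=0.25):
        """Normal-normal empirical Bayes update for an observed MH D-DIF value.
        The prior parameters are illustrative, not taken from the paper."""
        w = prior_var / (prior_var + se ** 2)        # shrinkage weight on the data
        post_mean = w * mh_d_dif + (1.0 - w) * prior_mean
        post_var = w * se ** 2                        # = (1/prior_var + 1/se^2)^-1
        return post_mean, post_var

    def dif_level_probabilities(post_mean, post_var):
        """Posterior probability of each simplified DIF level (C-, B-, A, B+, C+),
        using |D| cut points of 1.0 and 1.5 only."""
        sd = math.sqrt(post_var)

        def cdf(x):
            return 0.5 * (1.0 + math.erf((x - post_mean) / (sd * math.sqrt(2.0))))

        return {
            "C-": cdf(-1.5),
            "B-": cdf(-1.0) - cdf(-1.5),
            "A":  cdf(1.0) - cdf(-1.0),
            "B+": cdf(1.5) - cdf(1.0),
            "C+": 1.0 - cdf(1.5),
        }

    # Example: an item observed at MH D-DIF = -1.2 with a large standard error
    # (a small-sample item) is shrunk toward the prior mean of 0.
    mean, var = eb_dif_posterior(-1.2, se=0.8)
    print("posterior mean %.3f, posterior sd %.3f" % (mean, math.sqrt(var)))
    for level, p in dif_level_probabilities(mean, var).items():
        print("%-2s %.3f" % (level, p))

With a large standard error (a small sample), the shrinkage weight is small, so the posterior stays near the prior and most of the probability mass falls on level A; as the sample grows and the standard error shrinks, the posterior tracks the observed statistic more closely, which is the stabilizing behavior the abstract attributes to the EB approach.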

