Similar literature
 Found 20 similar articles; search took 593 ms
1.
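In 1985, the national college entrance examination (gaokao) and the Guangdong provincial gaokao partly adopted standardized items, which drew a strong response nationwide. Many newspapers and magazines carried articles praising the move and affirming it as a correct direction for test reform. We too believe that standardized items have many advantages and deserve to be adopted and promoted. 1. Standardized items make marking and scoring faster, more efficient, and of higher quality. For university admissions, with its large numbers of candidates and tight schedules, adopting standardized items is undoubtedly a key measure, and it also lays the groundwork for computer marking in the future. Moreover, standardized items make scoring fairer. Items set in the past depended on the markers to a relatively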

2.
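When setting examination items, we often run into a contradiction that is hard to handle well: how to test students' abilities while also making scoring more objective. Handling this contradiction well is an important precondition for scientific, standardized item setting. As we know, test items can generally be divided into two broad classes according to whether they can be marked objectively: subjective items and objective items. Subjective items include essays and short-answer questions. Such items can be used to test higher-order learning abilities such as gathering material, organizing material, analyzing problems, solving problems, and evaluation and appreciation. Their greatest weakness, however, is that marking is very easily influenced by the teacher's subjective factors. For example, the teacher's personal preferences, professional competence, mood while marking, the order in which scripts are marked, and understanding of the marking criteria all affect the score a script receives. The marks given by different teachers can differ greatly, so scoring is far from objective.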

3.
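Examples and analysis of subjective-item design in physics test papers   Total citations: 1, self-citations: 0, citations by others: 1
A subjective item is an item format in which the answer is supplied entirely by the test taker, the item setter provides the key points of the answer and the marking criteria, and the marking teacher scores the response by hand according to his or her understanding of those criteria. Scoring of such items is strongly influenced by the marker's subjective factors. For objective items, by contrast, the answers are already explicitly given, and the test taker need only make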

4.
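There are many types of test items, generally divided into two classes: subjective items and objective items. Among traditional items, discussion questions, calculation questions, proofs, short-answer questions, practical experiment tasks, and compositions are subjective items, whereas true/false, multiple-choice, matching, fill-in-the-blank, and ordering questions are objective items. In terms of capacity, coverage, ease of item writing and paper assembly, and marking error, each class is strong where the other is weak, and the two complement each other. Therefore, although standardized tests make heavy use of objective items, subjective items should not be one-sidedly abolished. I. Objective items. These items have two advantages: first, candidates answer quickly, so more items can be set and coverage is broadened; second, marking is objective and easy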

5.
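A standardized test is an objective test for which objective, uniform standards are laid down, and in which every stage, from item setting through administration, marking, scoring, and score reporting, strives to reduce error so as to measure candidates' actual achievement accurately. Its main features are: first, items sample a wide range, are numerous, and cover a broad body of knowledge, so the test has relatively high reliability and validity; second, item difficulty is easily adjusted to a moderate level, which helps distinguish candidates' differing levels of Chinese proficiency and ability; third, answers are simple and unambiguous, which favors objective and accurate scoring; fourth, because every stage from item setting to marking tries to reduce subjective influence, students' scores are more dependable. In recent years, the main item types used in the standardized national gaokao Chinese test have been: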

6.
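Last year our city held a physics competition for secondary school students that used standardized item setting and objective multiple-choice items. After the test, papers were marked with the advanced "small-hole" method (standard counting method), and finally the data were analyzed statistically and inferences drawn. The competition was organized as follows. I. Item setting: 1. The scope and difficulty of the test were set according to the candidates' actual level of knowledge, to ensure appropriate difficulty. 2. The knowledge points to be tested were determined, and items were set at graded levels, using a two-way knowledge-ability table, to make the items more scientific. 3. All items were multiple choice, which eliminates the marker's subjective influence and greatly increases the objectivity of the items.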

7.
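Objective items (such as multiple-choice, true/false, and fill-in-the-blank questions) have only one or a few fixed, definite answers, and the marker's subjective views do not interfere with scoring. Now that Chinese-language testing is gradually moving toward standardization, objective items are entering the field in growing numbers. True/false items are also called right-or-wrong items. Their distinguishing feature is that they present a single, self-contained proposition and ask the candidate to judge whether it is right or wrong. Compared with multiple-choice items, true/false items need far less text in the stem, carry no distraction from several answer options, and are simple in meaning, so students can answer more quickly. Although a single true/false item covers somewhat less than a multiple-choice item, in terms of the paper as a whole,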

8.
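Marking of the gaokao mathematics paper adheres to the principles of fairness, impartiality, and openness. In a spirit of full responsibility toward candidates, detailed, rigorous, and reasonable marking rules are drawn up. Every item is marked a first and a second time; if the two marks do not match, it is marked a third time; and if the third mark still does not match, the chief item setter arbitrates.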
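The escalation rule in this abstract (first mark, second mark, third mark on a mismatch, then arbitration) can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say what counts as a "match" or how matching marks are combined, so the `tolerance` parameter, the averaging of matching marks, and the function name `resolve_score` are all hypothetical.

```python
from typing import Callable


def resolve_score(first: float,
                  second: float,
                  third: Callable[[], float],
                  arbitrate: Callable[[], float],
                  tolerance: float = 0.0) -> float:
    """Return a final mark for one item.

    Assumptions (not stated in the abstract): two marks "match" when
    they differ by at most `tolerance`, and matching marks are averaged.
    """
    def matches(a: float, b: float) -> bool:
        return abs(a - b) <= tolerance

    # First and second marking agree: no escalation needed.
    if matches(first, second):
        return (first + second) / 2

    # Mismatch: request a third marking only now (hence the callable).
    third_mark = third()
    for earlier in (first, second):
        if matches(third_mark, earlier):
            return (third_mark + earlier) / 2

    # The third mark matches neither earlier mark: escalate to arbitration.
    return arbitrate()


# Usage: the callables stand in for a third marker and the arbitrator.
final = resolve_score(8, 5, third=lambda: 5, arbitrate=lambda: 6)
```

Passing the third marking and the arbitration as callables keeps them lazy, so they are only invoked when the earlier marks actually disagree.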

9.
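In recent years objective items, multiple-choice items above all, have become popular in all kinds of examinations. Objective items have clear advantages: right and wrong are obvious at a glance, and marking is definite and precise. Multiple-choice items can also be marked by computer, stored in item banks, and used to test candidates' knowledge points and levels of understanding on a large scale. They are therefore favored by item setters. However, the "objectivity" of objective items applies only to marking; they do not objectively assess students' reasoning processes and methods in solving problems. First, multiple-choice items require students only to write the final answer, so one cannot see the student's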

10.
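The junior secondary mathematics academic examination (hereafter the zhongkao) requires uniformly formulated marking criteria, uniformly organized administration, and uniform marking and grading. The marking criteria are an important part of a test paper. Drawing them up is highly technical work and an important step in setting the zhongkao: it directly affects candidates' results and plays an important role in the validity of the examination. 1 Principles for drawing up marking criteria. Marking criteria generally comprise the standard answers or key scoring points, together with the allocation of marks to each step of a solution; they are the basis on which markers award marks. Compared with item writing, the drawing up of marking criteria has not received enough attention from the relevant authorities. Reducing arbitrariness in the allocation of marks, and letting students of different ability and attainment receive corresponding marks, for giving full play to examination-based evaluation in promoting student development and…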

11.
The standard error of measurement usefully provides confidence limits for scores in a given test, but is it possible to quantify the reliability of a test with just a single number that allows comparison of tests of different format? Reliability coefficients do not do this, being dependent on the spread of examinee attainment. Better in this regard is a measure produced by dividing the standard error of measurement by the test's ‘reliability length’, the latter defined as the maximum possible score minus the most probable score obtainable by blind guessing alone. This, however, can be unsatisfactory with negative marking (formula scoring), as shown by data on 13 negatively marked true/false tests. In these the examinees displayed considerable misinformation, which correlated negatively with correct knowledge. Negative marking can improve test reliability by penalizing such misinformation as well as by discouraging guessing. Reliability measures can be based on idealized theoretical models instead of on test data. These do not reflect the qualities of the test items, but can be focused on specific test objectives (e.g. in relation to cut‐off scores) and can be expressed as easily communicated statements even before tests are written.
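The measure described in this abstract, the standard error of measurement divided by the test's "reliability length", can be sketched as follows. The function names are hypothetical. The sketch assumes the standard relation SEM = SD × sqrt(1 − reliability) and, for a number-right test of equally weighted items, takes the most probable blind-guessing score to be the mode of a binomial distribution; the abstract itself gives neither formula.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical SEM from the score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - reliability)


def binomial_mode(n_items: int, p_correct: float) -> int:
    """Most probable number right under blind guessing.

    Assumes n_items independent one-mark items, each guessed correctly
    with probability p_correct (number-right scoring, no penalty).
    """
    return min(n_items, math.floor((n_items + 1) * p_correct))


def relative_sem(sem: float, max_score: float, guess_score: float) -> float:
    """SEM divided by the reliability length (max score minus guess score)."""
    length = max_score - guess_score
    if length <= 0:
        raise ValueError("reliability length must be positive")
    return sem / length


# Usage with illustrative numbers only: 100 true/false items, SD 10,
# reliability coefficient 0.84.
sem = standard_error_of_measurement(10.0, 0.84)
ratio = relative_sem(sem, max_score=100, guess_score=binomial_mode(100, 0.5))
```

For negatively marked tests the guessing score would have to be computed differently, which is the case the abstract flags as potentially unsatisfactory.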

12.
In higher education, the multiple-choice test is a widely known format for measuring students' knowledge. The debate about the two most commonly used scoring methods for multiple-choice assessment – number right scoring (NR) and negative marking (NM) – seems to be a never-ending story. Neither NR scoring nor NM seems to meet expectations. However, available research hardly offers alternative methods. Clearly, there is a growing need to explore these alternative scoring methods in order to inform and support test designers. This review aims to present an overview of (alternative) scoring methods for multiple-choice tests, in which the strengths and weaknesses of each method are provided.

13.
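Computer automated scoring (CAS), applied to translation tests in foreign-language courses of the self-taught higher education examinations, can effectively improve scoring efficiency and objectivity. This study ran a correlation analysis and a paired-samples t-test on the automated and human scores for the translation test responses of 72 self-study candidates, and compared the diagnostic results of the two scoring methods. It found that automated and human scores were highly correlated and that total translation test scores did not differ significantly between the two methods; overall, the automated scores for this translation test were reliable. However, automated and human scoring differed to some extent in their diagnoses of the candidates' translation ability structure.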

14.
Adaptive Comparative Judgement (ACJ) is a modification of Thurstone’s method of comparative judgement that exploits the power of adaptivity, but in scoring rather than testing. Professional judgement by teachers replaces the marking of tests; a judge is asked to compare the work of two students and simply to decide which of them is the better. From many such comparisons a measurement scale is created showing the relative quality of students’ work; this can then be referenced in familiar ways to generate test results. The judges are asked only to make a valid decision about quality, yet ACJ achieves extremely high levels of reliability, often considerably higher than practicable operational marking can achieve. It therefore offers a radical alternative to the pursuit of reliability through detailed marking schemes. ACJ is clearly appropriate for performances like writing or art, and for complex portfolios or reports, but may be useful in other contexts too. ACJ offers a new way to involve all teachers in summative as well as formative assessment. The model provides strong statistical control to ensure quality assessment for individual students. This paper describes the theoretical basis of ACJ, and illustrates it with outcomes from some of our trials.
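To make "from many such comparisons a measurement scale is created" concrete, here is a minimal sketch that fits a Bradley-Terry logistic model to win/loss judgements. This model is swapped in as a common stand-in; the abstract does not describe the paper's own estimation procedure, and the adaptive choice of which pairs to present is not implemented. The function name and the toy judgements are hypothetical.

```python
import math
from collections import defaultdict


def bradley_terry(comparisons, n_iter=200):
    """Estimate a log-strength scale from (winner, loser) judgements.

    Uses a standard iterative update for the Bradley-Terry model.
    Assumes every script wins at least once, otherwise its strength
    collapses to zero and no finite scale value exists.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
    if any(wins[i] == 0 for i in items):
        raise ValueError("every script must win at least one comparison")

    strength = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum over every opponent j that i was compared against.
            denom = sum(n / (strength[i] + strength[j])
                        for pair, n in pair_counts.items() if i in pair
                        for j in pair - {i})
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}

    # Report on a log scale centred at zero.
    logs = {i: math.log(s) for i, s in strength.items()}
    mean = sum(logs.values()) / len(logs)
    return {i: v - mean for i, v in logs.items()}


# Toy judgements (illustrative only): each pair judged four times.
judgements = ([("A", "B")] * 3 + [("B", "A")] +
              [("B", "C")] * 3 + [("C", "B")] +
              [("A", "C")] * 3 + [("C", "A")])
scale = bradley_terry(judgements)
```

The resulting values are relative: only differences between scripts are meaningful, which is why the scale is centred at zero before being returned.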

15.
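TEM4 dictation is scored with the fairly traditional method of counting errors and deducting marks. This is a negative-marking method, and it has some problems. We therefore propose an experimental scoring method: partial-credit scoring. There were two sets of experimental data, scored with the TEM4 dictation scheme and with the new scheme respectively. Comparison of the data, together with an analysis of the effectiveness of the experimental scale using the partial credit model (one of the Rasch models), for example model-data fit, person fit, and information functions, shows that the experimental scheme measures the dictation ability of most students fairly well.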

16.
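This article gives a comprehensive analysis of the three major types of objective items in English testing. Because of their objectivity, flexibility, and speed of scoring (especially valuable to busy English teachers), these three item types are widely used. The problems they carry, however, cannot be ignored. The article summarizes and analyzes these problems and proposes effective solutions.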

17.
Examiners seeking guidance on multiple‐choice and true/false tests are likely to encounter various faulty or questionable ideas. Twelve of these are discussed in detail, having to do mainly with the effects on test reliability of test length, guessing and scoring method (i.e. number‐right scoring or negative marking). Some misunderstandings could be based on evidence from tests that were badly written or administered, while others may have arisen through the misinterpretation of reliability coefficients. The usefulness of item response theory in the analysis of academic test items is briefly dismissed.

18.
The teaching and assessment of essay writing at primary schools throughout Vietnam is regulated by the Ministry of Education and Training. The analytical error-recognition method of assessment, however, does not facilitate direct interpretation of students’ writing competence. In this study, which involved samples of Grade 5 students in five provinces in Vietnam, a combination of traditional and partial credit scoring rubrics was developed to enable data analysis using the Rasch model. Based on such analysis, a continuum of writing ability at Grade 5 level was identified and a mastery level defined in terms of writing skills. The study has implications for possible changes in future assessment and marking schemes.

19.
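This study examines applications of artificial intelligence to online marking, including: intelligent scoring of English and mathematics fill-in-the-blank items, to support consistency checks on human marking; detection of similar scripts among Chinese and English essay responses, to support checks on the appropriateness of human scores; and intelligent scoring of Chinese and English essays and of politics and history short-answer items, to support checks on human marks with large discrepancies. The approach was verified experimentally and applied to online marking of a large-scale provincial examination in 2018-2020. The results show that AI scores agree closely with human scores, that quality checking is markedly effective, and that similar scripts can be detected through retrieval over massive amounts of information.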

20.
The impact of the Subset Selection Technique (SST) for administering and scoring multiple-choice items on certain properties of a test was compared with that of the two other commonly used methods, the Number Right (NR) and the Correction for Guessing Formula (CFG). Under SST, examinees are instructed to select any number of response alternatives, the objective being to include the correct answer in the chosen set. The effects of each scoring method on the psychometric properties of a test and on the performance of examinees with different achievement levels and/or risk-taking propensities were investigated. Results indicated that SST outperformed the other two methods, producing not only higher reliability and validity coefficients for the test, but doing so without favoring high risk takers. The superiority of SST may be attributed to two interrelated factors: the efficiency of the technique in controlling for guessing and the encouragement provided examinees to use their partial knowledge in responding.
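The two baseline methods named here can be sketched in a few lines. The sketch assumes the conventional correction-for-guessing formula, right minus wrong divided by (options − 1), with omitted items scoring zero; the abstract does not spell the formula out, and it does not give the SST scoring rule, so SST is left out. Function names and the sample responses are hypothetical.

```python
def number_right(responses, key):
    """Number Right (NR): one mark per correct answer, no penalty."""
    return sum(1 for given, correct in zip(responses, key) if given == correct)


def correction_for_guessing(responses, key, n_options):
    """Correction for Guessing (CFG): R - W / (k - 1).

    `None` in `responses` marks an omitted item, which counts as
    neither right nor wrong. Assumes every item has `n_options` choices.
    """
    right = number_right(responses, key)
    wrong = sum(1 for given, correct in zip(responses, key)
                if given is not None and given != correct)
    return right - wrong / (n_options - 1)


# Usage: five four-option items, one omitted, two answered wrongly.
key = ["A", "B", "C", "D", "A"]
responses = ["A", "B", None, "A", "C"]
nr = number_right(responses, key)
cfg = correction_for_guessing(responses, key, n_options=4)
```

Under CFG a blind guess on a four-option item has an expected value of zero, which is the sense in which the formula controls for guessing.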


Copyright©北京勤云科技发展有限公司  京ICP备09084417号