Similar Literature
 20 similar documents found; search took 218 ms
1.
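This article reviews the development of marking for paper-and-pencil examinations, focusing on intelligent marking technology for subjective items and on scan-based online marking technology, and, by integrating the two, designs an intelligent online marking system for paper-and-pencil examinations. Applied to online marking in large-scale educational examinations, the system has improved the quality and efficiency of marking and helps drive the intelligent upgrading of large-scale examination scoring systems. It also provides a reference for exploring how artificial intelligence can be integrated with the educational examination marking industry and for building an application model of AI-assisted online marking for large-scale educational examinations.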

2.
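This study experimentally explores the application of AI-based assessment technology to quality monitoring of human online marking, along with other related applications. The experimental data comprise 841,610 scripts in total from the Chinese and English essay tasks of the 2017 Anhui Province Gaokao. Machine scores produced by intelligent marking, the first and second human ratings produced by regular Gaokao online marking, and the reported scores were analyzed along several dimensions, including mean, standard deviation, correlation, and score agreement rate. Anomalous-response samples and large-score-discrepancy samples flagged by intelligent marking were fed back to subject expert panels for quality-check scoring. The results show that intelligent marking essentially reached a level comparable to that of the human markers; because it always applies a uniform scoring standard, it is more objective and impartial and can provide effective quality monitoring for human online marking.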
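The comparison dimensions named in this abstract (mean, standard deviation, correlation, score agreement rate, large-discrepancy samples) can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the sample scores, the function names, and the tolerance and threshold values are all assumptions made for the example, and the abstract does not say how agreement or "large discrepancy" was defined.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def agreement_rate(xs, ys, tolerance):
    """Share of scripts whose two scores differ by at most `tolerance` (assumed definition)."""
    within = sum(1 for x, y in zip(xs, ys) if abs(x - y) <= tolerance)
    return within / len(xs)


def large_gap_samples(xs, ys, threshold):
    """Indices of scripts whose two scores differ by more than `threshold`."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if abs(x - y) > threshold]


# Hypothetical essay scores for five scripts (not data from the study).
machine = [42, 50, 38, 55, 47]
human = [44, 49, 30, 54, 47]

summary = {
    "machine_mean": mean(machine),
    "human_mean": mean(human),
    "machine_sd": pstdev(machine),
    "human_sd": pstdev(human),
    "correlation": pearson(machine, human),
    "agreement_rate": agreement_rate(machine, human, tolerance=3),
    "flagged_for_review": large_gap_samples(machine, human, threshold=5),
}
print(summary)
```

In this sketch, scripts listed under `flagged_for_review` stand in for the large-discrepancy samples that the study passed to expert panels for a quality-check score.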

3.
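Under the new Gaokao system, more subjective item types have been added to the English subject, putting considerable pressure on the control of marking error. This paper analyzes the main problems in current online marking and discusses ways to control error. The study shows that integrating AI technology is feasible in three areas: improving scoring methods, building the marking team, and dynamic error control. Technology users need to maintain regular, institutionalized communication with technology developers so that the technology's potential can be tapped more precisely.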

4.
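Online marking is a new working model, underpinned by computer network technology and electronic scanning technology, whose ultimate aim is to control scoring error on subjective items and uphold the principle of examination fairness. It combines the rich experience accumulated over many years of manual marking with modern technology, with markers evaluating scripts with computer assistance. Compared with traditional marking, this model achieves fairness and impartiality in marking to the greatest possible extent. Further improvement is still needed in image quality, the handling of anomalies, quality monitoring of subjective-item marking, and cost.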

5.
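Online marking is a new working model, underpinned by computer network technology and electronic scanning technology, whose ultimate aim is to control scoring error on subjective items and uphold the principle of examination fairness. It combines the rich experience accumulated over many years of manual marking with modern technology, with markers evaluating scripts with computer assistance. Compared with traditional marking, this model achieves fairness and impartiality in marking to the greatest possible extent. Further improvement is still needed in image quality, the handling of anomalies, quality monitoring of subjective-item marking, and cost.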

6.
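In the marking of the 1999 national Gaokao, the Guangxi Zhuang Autonomous Region Admissions and Examinations Institute piloted online marking for the first time, for the English subject. I. The plan for online marking of English. Online marking is underpinned by computer network technology and electronic scanning technology; its ultimate aim is to control marking error on subjective items and uphold the principle of examination fairness. It combines the rich experience accumulated over many years of manual marking with modern technology. Instead of working directly on candidates' original answer sheets, teachers …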

7.
《河北自学考试》2011,(5):14-14
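The Provincial Admissions Committee has decided that, starting this year, online marking will be implemented for the Introduction to Coal Mining and Educational Theory subjects of the vocational-track entrance examination (对口招生). Online marking had already been fully implemented for the province's national unified examination papers and for the Chinese, mathematics, and English papers of the vocational-track examination, so all Gaokao subjects and all vocational-track subjects examined in the same period as the Gaokao are now marked online. Online marking minimizes marking error and ensures that marking is fair and impartial.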

8.
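In recent years, research on AI-based "machine marking" has deepened and practical applications have multiplied. Drawing on a leading domestic AI research team, the Beijing Education Examinations Authority has studied the application of AI to large-scale, high-stakes English listening and speaking tests. Since 2018, the results have been put to substantive use in marking the listening and speaking test of the citywide Zhongkao (senior high school entrance examination), involving more than 500,000 candidates, with good results. To resolve the technical difficulties of intelligent marking for English listening and speaking tests and to ensure fairness and impartiality, the Beijing Education Examinations Authority, together with iFLYTEK (科大讯飞), applied for a priority project under the Beijing Municipal Education Science 13th Five-Year Plan, "Research on the Application of AI in English Listening and Speaking Assessment for the Zhongkao and Gaokao". The aim is to apply the results to the computer-based Gaokao English listening and speaking test in the coming years and thereby support the reform of Beijing's education examination and enrollment system.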

9.
《教育与考试》2014,(4):F0002-F0002
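In 2014, a total of 886,000 Gaokao scripts were marked in Fujian Province, all of them online. Online marking consists of three main components: script scanning, script scoring, and score compilation. Two marking centers, Fujian Normal University and Fuzhou University, undertook the scoring.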

10.
容理诚 《广东教育》2002,(11):26-27
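This year's marking of Gaokao essays over the internet is a revolution in marking. An advanced system modeled on e-commerce lets 700 people mark simultaneously. Candidates' essays are first scanned and then randomly distributed to markers. Each marker logs in to the system with their own work floppy disk to score the essay, then clicks in sequence through the scoring band, development level, deduction for a missing title, deduction for miswritten characters, and submission of the result, and the job is done. Online marking achieves "second marking" in the true sense, making results fairer. In the manual marking of the past, the second marker could see the first marker's score and was inevitably influenced by it. Now scripts are distributed randomly, with no score for reference. Online marking also makes results more objective. In the manual marking of the past, markers received scripts bound by examination room…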

11.
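To verify and improve the marking reliability of constructed-response items in the PETS Level 5 listening test, this study randomly sampled candidates' answer sheets for trial marking, revised the answer key, collected and analyzed candidates' response data before and after the revision, and used questionnaires and other means to investigate how to improve marking reliability for constructed-response items. The results show that a round of trial marking should be added before large-scale marking, with the answer key supplemented and refined according to the specific issues that emerge. To improve scoring reliability for constructed-response items and reduce markers' subjective judgment, marker training must be strengthened, marking must strictly follow the revised marking requirements and answer key, and a re-marking system must be maintained.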

12.
Formula scoring is a procedure designed to reduce multiple-choice test score irregularities due to guessing. Typically, a formula score is obtained by subtracting a proportion of the number of wrong responses from the number correct. Examinees are instructed to omit items when their answers would be sheer guesses among all choices but otherwise to guess when unsure of an answer. Thus, formula scoring is not intended to discourage guessing when an examinee can rule out one or more of the options within a multiple-choice item. Examinees who, contrary to the instructions, do guess blindly among all choices are not penalized by formula scoring on the average; depending on luck, they may obtain better or worse scores than if they had refrained from this guessing. In contrast, examinees with partial information who refrain from answering tend to obtain lower formula scores than if they had guessed among the remaining choices. (Examinees with misinformation may be exceptions.) Formula scoring is viewed as inappropriate for most classroom testing but may be desirable for speeded tests and for difficult tests with low passing scores. Formula scores do not approximate scores from comparable fill-in-the-blank tests, nor can formula scoring preclude unrealistically high scores for examinees who are very lucky.
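A short sketch may make the scoring rule concrete. It assumes the common form of the correction, in which the proportion subtracted is 1/(k - 1) for items with k options; the abstract itself says only "a proportion", so this choice, the function name, and the 40-item scenario are assumptions for illustration.

```python
def formula_score(num_right, num_wrong, num_options):
    """Formula score R - W / (k - 1); omitted items neither add nor subtract."""
    if num_options < 2:
        raise ValueError("an item needs at least two options")
    return num_right - num_wrong / (num_options - 1)


# Hypothetical 40-item test with 4 options per item.
# The examinee knows 20 answers and is unsure about the other 20.
known = 20

# Strategy A: omit all 20 uncertain items.
omit_rest = formula_score(known, 0, 4)

# Strategy B: guess blindly on all 20 (expected 5 right, 15 wrong).
blind_guess = formula_score(known + 5, 15, 4)

# Strategy C: rule out two options on each uncertain item, then guess
# between the remaining two (expected 10 right, 10 wrong).
partial_info = formula_score(known + 10, 10, 4)

print(omit_rest, blind_guess, partial_info)
```

Under these assumptions, omitting and blind guessing give the same expected score, while guessing after eliminating options gives a higher expected score than omitting, which matches the behavior the abstract describes.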

13.
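Objective items offer considerable flexibility in item writing, broad knowledge coverage, and little element of chance in what is tested. Their scoring standards are uniform, objective, and accurate; marking is unaffected by markers' subjective factors and lends itself to computer marking, which speeds up marking and lowers examination costs. In terms of assessment effectiveness, however, beyond their inability to show candidates' ability to organize language, expressive ability, thought process, and writing ability, objective items have a major flaw: they cannot prevent candidates from opportunistically gaining marks by guessing. This opportunity appears equal and fair to every candidate, but in fact it is not. This paper analyzes the flaws of scoring methods for objective items and derives a conversion formula for obtaining the corresponding true score, with a view to improving item formats.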

14.
《教育实用测度》2013,26(2):123-136
College students use information about upcoming tests, including the item formats to be used, to guide their study strategies and allocation of effort, but little is known about how students perceive item formats. In this study, college students rated the dissimilarity of pairs of common item formats (true/false, multiple choice, essay, fill-in-the-blank, matching, short answer, analogy, and arrangement). A multidimensional scaling model with individual differences (INDSCAL) was fit to the data of 111 students and suggested that they were using two dimensions to distinguish among these formats. One dimension separated supply from selection items, and the formats' positions on the dimension were related to ratings of difficulty, review time allocated, objectivity, and recognition (as opposed to recall) required. The second dimension ordered item formats from those with few options from which to choose (e.g., true/false) or brief responses (e.g., fill-in-the-blank), to those with many options from which to choose (e.g., matching) or long responses (e.g., essay). These student perceptions are likely to mediate the impact of classroom evaluation on student study strategies and allocation of effort.

15.
Automated essay scoring is a developing technology that can provide efficient scoring of large numbers of written responses. Its use in higher education admissions testing provides an opportunity to collect validity and fairness evidence to support current uses and inform its emergence in other areas such as K–12 large-scale assessment. In this study, human and automated scores on essays written by college students with and without learning disabilities and/or attention deficit hyperactivity disorder were compared, using a nationwide (U.S.) sample of prospective graduate students taking the revised Graduate Record Examination. The findings are that, on average, human raters and the automated scoring engine assigned similar essay scores for all groups, despite average differences among groups with respect to essay length and spelling errors.

16.
In a re-marking exercise to investigate the reliability of marking in GCSE English examinations, the Principal Examiners' marks were adopted as 'true' marks for comparison with the actual or 'live' marks that had been awarded by the appointed examiners. A variety of approaches to investigating mark/re-mark reliability collectively led to the conclusion that marking reliability was particularly strong in the Higher tier of the specification that differentiates by outcome.

17.
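Design and Implementation of an Automatic Scoring System for VB Programming Questions   Total citations: 2, self-citations: 0, citations by others: 2
Traditional manual marking of VB programming questions lacks objectivity and fairness and has many shortcomings, such as a heavy marking workload. Addressing the implementation of an automatic marking system for general VB programming assessment questions, this paper gives different solutions for several different types of question, describes the system's design philosophy and design objectives in detail, and presents the key points of implementing the marking process. In practice, the system's user interface has proved friendly and easy to operate.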

18.
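As the new curriculum reform deepens, teaching philosophies are being updated and students' English proficiency is gradually rising, yet the scoring criteria for the Gaokao English writing task, in use for many years, have not kept pace and can no longer fully meet the demands of English teaching reform. The author argues that, measured against the Curriculum Standards (《课程标准》), the criteria set too low a bar for students' written expression; compared with the writing scoring criteria of tests such as TOEFL, their holistic scoring approach carries relatively high uncertainty and their band descriptors are not entirely sound. To address these problems, this paper proposes an "improved holistic scoring method".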

19.
Cindy L. James   《Assessing Writing》2006,11(3):167-178
How do scores from writing samples generated by computerized essay scorers compare to those generated by “untrained” human scorers and what combination of scores, if any, is more accurate at placing students in composition courses? This study endeavored to answer this two-part question by evaluating the correspondence between writing sample scores generated by the IntelliMetric™ automated scoring system and scores generated by University Preparation English faculty, as well as examining the predictive validity of both the automated and human scores. The results revealed significant correlations between the faculty scores and the IntelliMetric™ scores of the ACCUPLACER OnLine WritePlacer Plus test. Moreover, logistic regression models that utilized the IntelliMetric™ scores and average faculty scores were more accurate at placing students (77% overall correct placement rate) than were models incorporating only the average faculty score or the IntelliMetric™ scores.

20.
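By discussing the inconsistency in bias within negative yes/no questions in English and Chinese, examining answers to negative yes/no questions in actual communication from the standpoint of the questioner's psychological bias, and taking into account the differing characteristics of the English and Chinese yes/no question systems, this paper analyzes the similarities and differences between the two languages in how negative yes/no questions are answered.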
