Similar literature
19 similar documents found.
1.
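To study the validity of the Learning Styles Questionnaire as used in China, a nationwide sample of 304 respondents was tested. After the questionnaires were collected, the data were analyzed with SPSS 15.0 and VB 6.0 to examine the reliability and validity of the scale in China. The results were: internal consistency coefficient r = 0.766; parallel-forms reliabilities of 0.943, 0.543, 0.695, and 0.673; and a criterion-related validity coefficient of 0.626. The results indicate that the reliability and validity of the Learning Styles Questionnaire in China are acceptable but need further improvement.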

2.
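Zhao Shiming (赵世明). 《中国考试》, 2006, (10): 30-34
Classification consistency is an important topic in reliability research on criterion-referenced tests, yet it is rarely applied or reported in China's certification tests and proficiency examinations. This paper attempts to estimate classification consistency reliability indices with the split-half method. The results show that, on the key characteristics of these indices, the split-half estimates agree with those obtained from two administrations or from homogeneous parallel forms. The method is feasible for large-scale certification tests: it is easy to interpret and understand, and more convenient to carry out in practice.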
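As a rough illustration of what a classification consistency index measures, the sketch below classifies the same examinees on two half-tests and reports the raw agreement proportion and Cohen's kappa. It is not the estimation procedure of the paper above: the half-test scores, the cut score, and the function names are hypothetical, and the length correction that a split-half estimate normally needs is omitted.

```python
from collections import Counter


def classify(scores, cut):
    """Return a pass/fail decision (True = pass) for each score."""
    return [s >= cut for s in scores]


def classification_agreement(first, second):
    """Return (p0, kappa) for two decision lists on the same examinees.

    p0 is the proportion of examinees classified the same way twice;
    kappa corrects p0 for the agreement expected by chance.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty decision lists of equal length")
    n = len(first)
    p0 = sum(a == b for a, b in zip(first, second)) / n
    c1, c2 = Counter(first), Counter(second)
    # Chance agreement from the marginal proportions of each category.
    pc = sum(c1[c] * c2[c] for c in set(c1) | set(c2)) / (n * n)
    kappa = (p0 - pc) / (1 - pc) if pc < 1 else 1.0
    return p0, kappa


# Hypothetical half-test scores for six examinees, cut score 12 on each half.
half_a = [12, 18, 9, 15, 20, 7]
half_b = [13, 17, 12, 14, 19, 8]
p0, kappa = classification_agreement(classify(half_a, 12), classify(half_b, 12))
print(f"agreement = {p0:.3f}, kappa = {kappa:.3f}")
```

Reporting kappa alongside the raw agreement is a common choice because raw agreement can look high simply when most examinees pass (or fail) on both halves.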

3.
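Reliability and validity are important indicators of the quality of a psychological test, and increasing test length is one important way to improve both. Because scholars use different symbols to denote validity, the formulas describing the relationship between test length and test validity, and the results computed from them, are difficult to understand and interpret. Based on classical test theory, this paper analyzes and discusses the formulas that relate test length to test validity and proposes an expression for that relationship.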
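For orientation, the sketch below implements the standard classical test theory results for a test lengthened by a factor k with parallel items: the Spearman-Brown formula for reliability and the corresponding formula for criterion validity. These are textbook formulas, not necessarily the expression the paper proposes; the symbol names (r_xx for reliability, r_xy for validity) are assumptions chosen for this example.

```python
import math


def spearman_brown(r_xx, k):
    """Reliability of a test lengthened k times with parallel items."""
    return k * r_xx / (1 + (k - 1) * r_xx)


def lengthened_validity(r_xy, r_xx, k):
    """Criterion validity of a test lengthened k times.

    r_xy is the validity of the original test and r_xx its reliability;
    the criterion itself is assumed unchanged.
    """
    return r_xy * math.sqrt(k) / math.sqrt(1 + (k - 1) * r_xx)


# Doubling a test with reliability 0.50 and validity 0.40.
print(round(spearman_brown(0.50, 2), 3))
print(round(lengthened_validity(0.40, 0.50, 2), 3))
```

The two functions are linked: the lengthened validity equals the original validity multiplied by the square root of the ratio of new to old reliability, which is why validity grows more slowly than reliability as a test is lengthened.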

4.
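Drawing on cognitive diagnostic assessment theory and techniques, this study used literature analysis, expert discussion, and verbal protocols to construct a cognitive model of mathematical problem solving in the "number and operations" domain for children aged 5-6, administered a cognitive diagnostic assessment to 627 children aged 5-6, and used cognitive diagnostic model-data fit to check the quality of the cognitive model and the diagnostic test. The results showed that the cognitive attributes explained 0.879 of the variation in item difficulty, with an effect size of 0.834, and that the mean HCI was 0.749; the mean item discrimination of the cognitive diagnostic test was 0.707 (SD = 0.299); apart from two items whose S-X² item-fit statistics were significant at p < 0.01, all items fit well; test reliability was 0.95; and mean mastery probability increased monotonically with total test score. These results indicate that the cognitive model of children's mathematical problem solving constructed in this study is sound and that the resulting cognitive diagnostic test has satisfactory reliability and validity. The test can serve as an effective tool for assessing children's mathematical problem solving and can provide fine-grained diagnostic information, offering a scientific basis for related teaching and intervention research.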

5.
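The BP (back-propagation) neural network is one of the most widely used artificial neural network models and performs well in classification and recognition, so researchers have applied it in cognitive diagnostic assessment to classify examinees. A simulation study examined how five factors (number of attributes, attribute hierarchy, test length, item quality, and test sample size) affect the classification accuracy of BP neural networks in cognitive diagnosis. The results show that: 1) the classification accuracy of BP-neural-network-based cognitive diagnosis does not depend on test sample size; 2) item quality and test length have significant positive effects on diagnostic accuracy; 3) the number of attributes has a negative effect on classification accuracy; 4) item quality affects, to some extent, the classification accuracy of the BP diagnostic method across different attribute hierarchy structures.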

6.
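The aim of this study was to establish a vision screening test for children aged 2-4 that is suitable for the community health care system. The study examined the feasibility, reliability, and validity of the children's vision screening test we developed. Some items were selected from the ATYCAR vision test and others were designed by the authors. Toys, picture cards, and small objects served as optotypes; children were asked to match them, and their behavior was observed as the indicator of visual response. The results show that the test has high reliability, validity, and feasibility, and that it is also suitable for children who cannot cooperate with other vision tests. It can be used by health care systems in urban and rural areas of China to screen children's vision, so that children with visual impairment receive further diagnosis and intervention.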

7.
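A reliability generalization study was conducted on articles about the Minnesota Multiphasic Personality Inventory (MMPI) published in China from 1989 to 2008. The reporting of reliability coefficients, the reliability levels, and their variability were analyzed descriptively for the MMPI's 10 clinical scales and 3 validity scales; sample type, sample size, and other variables were used as predictors to explore the factors that affect the reliability of the MMPI scales. On this basis, the findings were compared with reliability generalization studies of the MMPI conducted abroad. The comparison shows both similarities and differences in reliability levels, in the variability of reliability coefficients, and in their predictors.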

8.
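This study revised a test of number processing and calculation ability. The revision comprised three parts: (1) translation of the test; (2) a pilot test; (3) a large-sample test. The split-half reliability of the test was 0.86 and Cronbach's α was 0.93, and students diagnosed by teachers as having number processing and calculation difficulties scored lower than typical students on almost all subtests. These results indicate that the test has high reliability and validity.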
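The two reliability coefficients reported above can be computed from an examinee-by-item score matrix. The following is a minimal sketch under stated assumptions: the score matrix is invented for illustration, the split is odd versus even items (the abstract does not say how the halves were formed), and the half-test correlation is stepped up with the Spearman-Brown formula.

```python
import math
from statistics import mean, variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of examinee rows (one score per item)."""
    k = len(rows[0])
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(rows):
    """Odd-even split-half reliability with Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)


# Hypothetical scores: 5 examinees x 4 items.
scores = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
]
print(f"alpha = {cronbach_alpha(scores):.3f}")
print(f"split-half = {split_half_reliability(scores):.3f}")
```

The split-half value depends on how the items are divided, whereas alpha does not, which is one reason the two coefficients can differ for the same test.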

9.
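Several models common in cognitive diagnosis research were selected: the general model G-DINA, the conjunctive constrained models NC-RRUM and DINA, and the compensatory constrained models C-RUM and DINO. They were compared from two perspectives, horizontal processing mechanisms and vertical hierarchical relationships, to examine how well different types of diagnostic models suit English reading tests. Likelihood ratio tests were used to compare the models on relative and absolute fit indices, and classification consistency and accuracy indices were used to examine diagnostic reliability and validity. The results show that: 1) the G-DINA and NC-RRUM models fit the reading test data well, significantly better than the other models; of the two, the general G-DINA model showed higher attribute classification consistency and the constrained NC-RRUM model showed the best attribute classification accuracy; 2) the goodness of fit between the diagnostic models and the test data increased as the attribute hierarchy was weakened, and the independent-structure model, which has the loosest structural relations, fit the data best, indicating that reading ability does not have a strict hierarchical structure. These results can inform researchers exploring intelligent reading diagnosis and serve as a reference for English teachers choosing models in reading diagnosis practice.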

10.
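The structural design of the follow-up survey questionnaires for open education graduates was analyzed for reliability and validity in terms of the internal consistency reliability coefficients of the questionnaire items, content validity, and construct validity. The analysis concludes that all three questionnaires have good reliability. The structural design of the 《毕业生质量评价》 (Graduate Quality Evaluation) and 《学习效果评价》 (Learning Outcome Evaluation) questionnaires is reasonable and effective, whereas the structural design of the 《电大人才培养方式》 (Radio and Television University Talent Training Approach) questionnaire is rather unusual and needs some improvement.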

11.
Classification consistency and accuracy are viewed as important indicators for evaluating the reliability and validity of classification results in cognitive diagnostic assessment (CDA). Pattern‐level classification consistency and accuracy indices were introduced by Cui, Gierl, and Chang. However, the indices at the attribute level have not yet been constructed. This study puts forward a simple approach to estimating the indices at both the attribute and the pattern level through one single test administration. Detailed elaboration is made on how the upper and lower bounds for the attribute‐level accuracy can be derived from the variance of error of the attribute mastery probability estimate. In addition, based on Cui's pattern‐level indices, an alternative approach to estimating the attribute‐level indices is also proposed. Comparative analysis of simulation results indicates that the new indices are very desirable for evaluating test‐retest consistency and correct classification rate.

12.
This article introduces procedures for the computation of, and asymptotic statistical inference for, classification consistency and accuracy indices specifically designed for cognitive diagnostic assessments. The new classification indices can be used as important indicators of the reliability and validity of classification results produced by cognitive diagnostic assessments. For tests with known or previously calibrated item parameters, the sampling distributions of the two new indices are shown to be asymptotically normal. To illustrate the computation of the new indices, we apply them to real diagnostic data from a fraction subtraction test (Tatsuoka). We also use simulated data to evaluate their performance and distributional properties.

13.
The purpose of this study was to evaluate the adequacy of three cognitive models, one developed by content experts and two generated from student verbal reports, for explaining examinee performance on a grade 3 diagnostic mathematics test. For this study, the items were developed to directly measure the attributes in the cognitive model. The performance of each cognitive model was evaluated by examining its fit to different data samples (verbal report, total, high-, moderate-, and low-ability) using the Hierarchy Consistency Index (Cui & Leighton, 2009), a model-data fit index. This study utilized cognitive diagnostic assessments developed under the framework of construct-centered test design and analyzed using the Attribute Hierarchy Method (Gierl, Wang, & Zhou, 2008; Leighton, Gierl, & Hunka, 2004). Both the expert-based and the student-based cognitive models provided excellent fit to the verbal report and high-ability samples, but moderate to poor fit to the total, moderate-ability, and low-ability samples. Implications for cognitive model development for cognitive diagnostic assessment are discussed.

14.
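This study aimed to develop a cognitive diagnostic test for primary school mathematics word problems based on a previously constructed theoretical model, and to verify its validity through cognitive diagnostic assessment. Following both qualitative and quantitative lines of research, and using cognitive analysis, think-aloud protocols, and testing, the study explored the process of cognitive diagnostic assessment from theoretical model construction through test development to validation. Regarding model construction and test development, the results show that combining cognitive analysis with think-aloud protocols can reasonably construct a cognitive model grounded in substantive psychology, and that top-down test design based on this cognitive model is consistent with the cognitive diagnostic assessment workflow. Analysis of the data obtained through cognitive diagnostic assessment shows that the test's construct validity, internal validity, and external validity all reached satisfactory levels. A cognitive diagnostic test developed from a previously constructed cognitive model can serve as an effective tool for cognitive diagnostic assessment and helps uncover and diagnose students' cognitive errors in solving mathematics word problems.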

15.
The number of raters is theoretically central to peer assessment reliability and validity, yet rarely studied. Further, requiring each student to assess more peers’ documents increases both the number of evaluations per document and assessor workload, which can degrade performance. Moreover, task complexity is likely a moderating factor, influencing both workload and validity. This study examined whether changing the number of required peer assessments per student / number of raters per document affected peer assessment reliability and validity for tasks at different levels of task complexity. A total of 181 students completed and provided peer assessments for tasks at three levels of task complexity: low complexity (dictation), medium complexity (oral imitation), and high complexity (writing). Adequate validity of peer assessments was observed for all three task complexities at low reviewing loads. However, the impacts of increasing reviewing load varied by reliability vs. validity outcomes and by task complexity.

16.
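Bai Juan (白娟). 《考试研究》, 2013, (1): 51-57
The comprehensive Traditional Chinese Medicine (TCM) paper of the national postgraduate entrance examination is a nationally unified, selective entrance examination subject set for universities and research institutes admitting master's students in Chinese medicine and pharmacy. This study used multivariate generalizability theory to evaluate the overall reliability of the 2012 comprehensive TCM examination, its paper structure, and the reasonableness of the proportions allocated to the secondary disciplines. The results show that: (1) in terms of subject content, measurement precision was high for formula studies, Chinese materia medica, acupuncture and moxibustion, TCM internal medicine, and TCM diagnostics, but relatively low for basic theory of TCM, and precision could be increased by appropriately raising the difficulty and discrimination of the items in that subject; (2) in terms of item types, measurement precision was high for every item type, and the distribution of weights across item types was fairly appropriate.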

17.
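Building on earlier research, and based on theoretical analysis and a semi-open questionnaire survey (n = 296), followed by a pilot administration (n = 833) and a formal administration (n = 2979), a college student psychological quality scale was developed consisting of three subscales: cognitive characteristics, personality, and adaptability. Exploratory factor analysis of the measurement data yielded the factor structure of the scale. Its test-retest reliability, homogeneity reliability, criterion-related validity, and construct validity were then examined, and the results show that the scale has good reliability and validity. Confirmatory factor analysis further showed that the structure of the scale is reasonable and that it is suitable for measuring the psychological quality of college students in China.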

18.
Compared to unidimensional item response models (IRMs), cognitive diagnostic models (CDMs) based on latent classes represent examinees' knowledge and item requirements using discrete structures. This study systematically examines the viability of retrofitting CDMs to IRM‐based data with a linear attribute structure. The study utilizes a procedure to make the IRM and CDM frameworks comparable and investigates how estimation accuracy is affected by test diagnosticity and the match between the true and fitted models. The study shows that comparable results can be obtained when highly diagnostic IRM data are retrofitted with CDMs, and vice versa; that retrofitting CDMs to IRM‐based data can, in some conditions, result in considerable examinee misclassification; and that model fit indices provide limited indication of the accuracy of item parameter estimation and attribute classification.

19.
This case-study investigates the predictive validity and reliability of Key Stage 2 test results, and teacher assessments, for target-setting and value-added assumptions at Key Stage 3. (In England, Key Stage 2 tests are taken in the core subjects of English, Mathematics and Science at the age of 11. Key Stage 3 tests are taken in the same subjects at the age of 14. Teacher assessments are also completed for these subjects at both key stages.) The study employed the type of linear regression analysis recommended in several government reports, to correlate Key Stage 2 test results, and teacher assessments, in core subjects, with Key Stage 3 test results, and teacher assessments, in both core and non-core subjects. Following government recommendations that the use of any other form of testing - such as the National Foundation for Educational Research (NFER) Cognitive Abilities Test (CAT) - was now no longer necessary to provide baseline data for value-added calculations, or to set targets, correlations were also investigated between results on the CAT, and test results and teacher assessments at Key Stage 3, for both core and non-core subjects, to see whether this recommendation was well founded. The results of the case-study suggest that Key Stage 2 data, both in the form of test results and teacher assessments, have little or no predictive validity, or reliability, for test results or teacher assessments at Key Stage 3. Indeed, the predictive validity for non-core subjects at Key Stage 3 was so low as to be negligible. However, the CAT average score correlated more highly with both teacher assessments and test results at Key Stage 3 in core subjects, although this relationship was not reflected in non-core subjects. These findings suggest that the predictive validity and reliability of Key Stage 2 data are seriously open to question as baseline data for either value-added, or target-setting procedures, at Key Stage 3.
It should be pointed out, however, that these findings are provisional, since they are based on data from two intake years, but preliminary analysis of data from a further three intake years appears to indicate that the concerns identified are well founded.
