Similar Documents
20 similar documents found.
1.
Although the use of multiple criteria and informants is one of the most universally agreed-upon practices in the identification of gifted children, few studies to date have examined the convergent validity of multiple informants and objective ability tests in gifted identification. In this study, we illustrate the use of the correlated traits–correlated (methods – 1), or CT–C(M – 1), model (Eid, Lischetzke, Nussbeck, & Trierweiler, 2003) to examine the convergent validity of self, parent, and teacher ratings relative to objective cognitive ability tests in a sample of 145 fourth- to sixth-graders. The CT–C(M – 1) analyses revealed that teacher ratings showed the highest convergence with the objective assessments, whereas self-ratings had the lowest reliabilities and insufficient validity. Parent ratings were more reliable and valid than self-reports but were outperformed by teacher ratings for most abilities. Overall, the convergent validity of the ratings relative to the objective test battery was highest for numerical abilities and lowest for creative abilities. Furthermore, whereas part of the shared variance between parent and teacher ratings reflected true convergent validity, agreement between parent ratings and self-reports was entirely due to shared rater variance. Our analyses demonstrate the usefulness and proper interpretation of the CT–C(M – 1) approach for examining convergent validity and method effects in multitrait–multimethod data.
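The variance logic behind the CT–C(M – 1) model can be illustrated with a small decomposition. This is a minimal sketch, not the authors' analysis: it assumes a single non-reference indicator (for example, a teacher rating, with the objective tests as the reference method), assumes the trait and method factors are uncorrelated, and uses hypothetical names (`IndicatorParams`, `ctc_m1_coefficients`) and made-up parameter values. Consistency is the share of variance explained by the reference-method trait factor, method specificity is the share explained by the rater-specific method factor, and their sum is the indicator's reliability.

```python
from dataclasses import dataclass


@dataclass
class IndicatorParams:
    """Hypothetical parameters for one non-reference indicator (e.g., a teacher rating)."""
    trait_loading: float    # loading on the reference-method trait factor
    trait_variance: float   # variance of that trait factor
    method_loading: float   # loading on the method (rater-specific) factor
    method_variance: float  # variance of the method factor
    error_variance: float   # residual (unreliable) variance


def ctc_m1_coefficients(p: IndicatorParams) -> dict:
    """Split an indicator's variance into consistency, method specificity, and reliability.

    Assumes trait and method factors are uncorrelated, so the variance components add.
    """
    trait_part = p.trait_loading ** 2 * p.trait_variance
    method_part = p.method_loading ** 2 * p.method_variance
    total = trait_part + method_part + p.error_variance
    consistency = trait_part / total          # convergence with the reference method
    method_specificity = method_part / total  # variance unique to this rater type
    return {
        "consistency": consistency,
        "method_specificity": method_specificity,
        "reliability": consistency + method_specificity,
    }


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(ctc_m1_coefficients(IndicatorParams(1.0, 0.5, 1.0, 0.3, 0.2)))
```

With these hypothetical values, half of the rating's variance converges with the reference method and 30% is rater-specific, giving a reliability of 0.8; a rating with high reliability but low consistency would be measuring something stable that the objective tests do not capture.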

2.
Background: The Childhood Trauma Questionnaire – Short Form (CTQ-SF) is a widely used self-report instrument for assessing and characterizing childhood trauma. Yet research on the instrument’s psychometric properties in clinical samples is sparse, and the Danish version of the CTQ-SF has not previously been evaluated in clinical samples.
Objectives: To examine the structural validity, internal consistency reliability, and multi-method convergent validity of the CTQ-SF in a heterogeneous clinical sample from Denmark.
Participants and setting: The study was based on data from four Danish clinical samples (N = 393): (1) outpatients diagnosed with personality disorders; (2) patients commencing psychiatric treatment for non-affective first-episode psychosis; (3) patients diagnosed with first-episode or prolonged depression, recruited from general practitioners and an outpatient mood disorder clinic; and (4) detained delinquent boys.
Methods: Confirmatory factor analysis was used to examine structural validity. We also calculated internal consistency and multi-method convergent validity with interview-based ratings of adverse parenting.
Results: Confirmatory factor analyses indicated that the five-factor structure described in the CTQ-SF manual, with correlated errors among three items, fitted the data best compared with various other models. Coefficients of congruence also supported factorial similarity across countries (i.e., a US sample of substance abusers and a mixed Brazilian sample). Internal consistency reliability was acceptable and comparable to previously published estimates. Multi-method convergent validity associations further corroborated the validity of the CTQ-SF.
Conclusion: These findings support the reliability and validity of the Danish version of the CTQ-SF in clinical samples.
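The abstract reports coefficients of congruence as evidence of factorial similarity across countries. A common index for this is Tucker's coefficient of congruence, sketched below under the assumption that it is the coefficient meant; the function name and the loading values are hypothetical. Unlike a Pearson correlation, it does not center the loadings, so it is sensitive to both the pattern and the overall level of two loading vectors.

```python
import math
from typing import Sequence


def tucker_congruence(x: Sequence[float], y: Sequence[float]) -> float:
    """Tucker's coefficient of congruence between two factor-loading vectors.

    phi = sum(x_i * y_i) / sqrt(sum(x_i^2) * sum(y_i^2)); loadings are not centered.
    """
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical loadings of the same five items on one factor in two samples.
    danish = [0.72, 0.65, 0.81, 0.58, 0.69]
    other = [0.70, 0.61, 0.77, 0.63, 0.66]
    print(round(tucker_congruence(danish, other), 3))
```

A frequently cited rule of thumb treats values of about .95 or higher as indicating that two factors can be considered essentially equal.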

3.
An important issue in national assessment efforts is how best to measure the outcomes of college. While initial discussions about a national collegiate assessment focused on the reliability, validity, and feasibility of using achievement tests to measure student learning, subsequent discussions have raised the possibility of using students' self-reports of academic development as proxies for achievement test scores. The present study examines the stability of the relationships between self-reports and test scores across samples of two- and four-year colleges and universities. Multitrait-multimethod analyses indicated that self-reports and test scores developed from the same set of test specifications do measure the same constructs, although scores from one type of measurement may not be substitutable for scores from the other. In addition, the analyses produced ambiguous results concerning the stability of these relationships across different types of institutions.
Paper presented at the annual meeting of the Association for Institutional Research, Boston, May 29, 1995.

4.
ABSTRACT

A student’s perception of effective teacher communication influences the learning atmosphere, and measuring this perception indicates how students view the quality of learning. Because few studies have developed an appropriate tool for measuring students’ perceptions of learning to read the Qur’an, this study aims to develop one. Conceptual analysis and a survey using open-ended questions yielded the dimensions and items. Six experts and three instructors evaluated content validity. The questionnaire was then administered to 421 participants: 201 for exploratory factor analysis (EFA) and 220 for confirmatory factor analysis (CFA). The results identified four dimensions that explained 64.6% of the variance. The final instrument consisted of 13 items with satisfactory reliability and validity. The dimension with the highest path coefficient was ‘understanding and friendliness’ (UF), while ‘learning media’ (LM) had the lowest. Gender had a significant influence on UF as well as on verbal (V) and non-verbal (NV) communication, but there were no gender differences on LM. The questionnaire can be used effectively to measure dimensions of students’ perceptions of effective teacher communication in Qur’an learning.

5.
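The National College Entrance Examination (gaokao) is large in scale and has far-reaching social impact, so its importance cannot be overlooked. It is therefore necessary to validate the test scientifically in order to earn public trust. Drawing on English language testing theory, this paper analyzes the scoring reliability, validity, and marking of the writing section of the 2008 Ningxia gaokao English test, in an attempt to verify whether the test's validity meets the required standards.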

6.
Numerous researchers have proposed methods for evaluating the quality of rater‐mediated assessments using nonparametric methods (e.g., kappa coefficients) and parametric methods (e.g., the many‐facet Rasch model). Generally speaking, popular nonparametric methods for evaluating rating quality are not based on a particular measurement theory. On the other hand, popular parametric methods for evaluating rating quality are often based on measurement theories such as invariant measurement. However, these methods are based on assumptions and transformations that may not be appropriate for ordinal ratings. In this study, I show how researchers can use Mokken scale analysis (MSA), which is a nonparametric approach to item response theory, to evaluate rating quality within the framework of invariant measurement without the use of potentially inappropriate parametric techniques. I use an illustrative analysis of data from a rater‐mediated writing assessment to demonstrate how one can use numeric and graphical indicators from MSA to gather evidence of validity, reliability, and fairness. The results from the analyses suggest that MSA provides a useful framework within which to evaluate rater‐mediated assessments for evidence of validity, reliability, and fairness that can supplement existing popular methods for evaluating ratings.
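A central numeric indicator in Mokken scale analysis is Loevinger's scalability coefficient H: the sum of inter-item covariances divided by the sum of the maximum covariances attainable given the item marginals. The sketch below is a simplified illustration, not the study's procedure: it assumes dichotomous (0/1) scores, whereas the ratings analyzed in the abstract are polytomous, and the function name `loevinger_h` is hypothetical. In a rater-mediated setting, the columns could, for example, be raters rather than test items.

```python
from itertools import combinations
from typing import Sequence


def loevinger_h(data: Sequence[Sequence[int]]) -> float:
    """Scale-level Loevinger H for dichotomous (0/1) items.

    Rows are persons and columns are items. H = sum of pairwise covariances
    divided by the sum of their maxima given the item proportions.
    Assumes at least one item pair has a nonzero maximum covariance.
    """
    n = len(data)
    k = len(data[0])
    p = [sum(row[j] for row in data) / n for j in range(k)]  # item proportions
    cov_sum = 0.0
    cov_max_sum = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(row[i] * row[j] for row in data) / n  # joint proportion of 1s
        cov_sum += p_ij - p[i] * p[j]
        cov_max_sum += min(p[i], p[j]) - p[i] * p[j]     # maximum possible covariance
    return cov_sum / cov_max_sum


if __name__ == "__main__":
    # A perfect Guttman pattern: no Guttman errors, so H = 1.
    guttman = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
    print(loevinger_h(guttman))
```

Conventional Mokken benchmarks treat H of at least .3 as a weak scale, at least .4 as medium, and at least .5 as strong.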

7.
ABSTRACT

A Bayesian IRT-model approach was used to investigate the validity and reliability of student perceptions of teaching quality. Furthermore, the student perceptions were compared with ratings of teaching quality by external observers. Grade 4 students (n = 675) filled out a questionnaire that was used to measure their opinions about the lessons of their teachers. Three lessons of 39 teachers were recorded and rated by four raters. The analyses showed that the student perception and lesson observation scales fit best in an 11-dimensional model, which was an indication of construct validity and discriminant validity. Student perception scales were reliable, although not all items contributed to the scales to the same extent. Student ratings and lesson observation scores generally correlated moderately (ranging from r = .18 to r = .50). Higher correlations were found for scales with similar content; however, no clear pattern was apparent. Suggestions for future research are presented.

8.
A number of mental-test theorists have called attention to the fact that increasing test reliability beyond an optimal point can actually decrease the validity of that test with respect to a criterion. Loevinger referred to this non-monotonic relation between reliability and validity as the “attenuation paradox,” because Spearman’s correction for attenuation leads one to expect that increasing reliability will always increase validity. In this paper, a mathematical link between test reliability and test validity is derived that takes into account the correlation between error scores on a test and error scores on the criterion measure the test is designed to predict. It is proved that when the correlation between these two sets of error scores is positive, the non-monotonic relation between test reliability and test validity that has been viewed as a paradox occurs universally.
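A standard classical-test-theory decomposition shows how correlated errors can produce this effect. It is offered as an illustration under stated assumptions, not as the paper's own derivation: write X = T_X + E_X and Y = T_Y + E_Y, assume true scores are uncorrelated with errors, and allow the two error scores to correlate. Then the observed validity is rho_XY = rho_T * sqrt(rel_X * rel_Y) + rho_E * sqrt((1 - rel_X) * (1 - rel_Y)). With rho_E = 0 this is the relation behind Spearman's correction, which rises monotonically with rel_X; a positive rho_E adds a term that shrinks as rel_X approaches 1. The function name and all parameter values below are hypothetical.

```python
import math


def observed_validity(rho_true: float, rel_x: float, rel_y: float, rho_err: float) -> float:
    """Observed test-criterion correlation when test and criterion errors may correlate.

    rho_true: correlation between true scores T_X and T_Y
    rel_x, rel_y: reliabilities of test X and criterion Y
    rho_err: correlation between error scores E_X and E_Y
    Assumes true scores are uncorrelated with all error scores.
    """
    true_part = rho_true * math.sqrt(rel_x * rel_y)                   # Spearman's attenuated term
    error_part = rho_err * math.sqrt((1 - rel_x) * (1 - rel_y))       # shared-error term
    return true_part + error_part


if __name__ == "__main__":
    # Hypothetical setting: rho_true = 0.5, criterion reliability 0.5, error correlation 0.5.
    for rel in (0.3, 0.5, 0.7, 0.9, 0.99):
        print(rel, round(observed_validity(0.5, rel, 0.5, 0.5), 3))
```

In this hypothetical setting the observed validity peaks at 0.5 when rel_X = 0.5 and falls to about 0.39 as rel_X approaches 0.99, which is the non-monotonic pattern the abstract describes.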

9.
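Reliability and validity are the two quality characteristics of academic tests, and how to handle the relationship between them is a fundamental issue in testing. After introducing the definitions of reliability and validity and the relationship between them, this paper analyzes reliability and validity in academic achievement testing and explains how to balance the two. It ultimately argues that academic testing is an effective means of measurement that will improve the quality of teaching.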

10.
ABSTRACT

This empirical investigation aimed to conceptualize, develop, and validate a scale for measuring the quality of higher degrees by research (HDR-QUAL). To that end, the study measured the perceptions of higher degree by research (HDR) students about the constituents of HDR quality in Pakistani tertiary education institutions. Following a seven-step scale development process, three studies were conducted to develop an initial pool of scale items, establish the proposed scale's validity and reliability, and assess its nomological behavior. Principal component analysis with Varimax rotation yielded a three-factor solution, from which a 15-item scale was proposed. Fit indices for the measurement model and the higher-order model indicated a satisfactory fit to the data. Finally, the three resulting factors (financial assistance, supervisory expertise, and infrastructural support) converged into a unidimensional HDR-QUAL scale that was positively associated with student satisfaction, thus also confirming nomological validity. Important policy measures and directions for future research are proposed at the end.

11.
High-quality measures of instructional practice are essential for research on and evaluation of innovative instructional policies and programs. However, existing measures have generally proven inadequate because of cost and validity issues. This paper addresses two potential drawbacks of survey self-report measures: variation in teachers’ interpretation of response scales and in their interpretation of survey questions. To address these drawbacks, the researchers tested the use of “anchoring vignettes” in teacher surveys to capture information about teaching practice, and they gathered validity evidence on their use as a tool for adjusting teachers’ survey self-reports about their instructional practices for research purposes, or potentially to inform professional development. Data from 65 teachers in grades 4-9 responding to the survey suggested that vignette adjustments were more reliable and valid for some instructional practices than for others. For some instructional practices, the researchers found high, significant correlations between teachers’ vignette-adjusted survey self-ratings and previous observation ratings of the teachers’ instruction, including ratings from several widely used observation rubrics. These results suggest that anchoring vignettes may provide an efficient, cost-effective method for gathering data on teachers’ instruction.
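The abstract does not say which adjustment method was used. One well-known option is the nonparametric recode of King, Murray, Salomon, and Tandon (2004), assumed here for illustration: each respondent's self-rating is re-expressed relative to that same respondent's ratings of J vignettes ordered from least to most intensive, producing a code from 1 to 2J + 1 (odd codes fall below, between, or above the vignettes; even codes tie a vignette). The function name is hypothetical, and this sketch simply rejects respondents whose vignette ratings are tied or out of order, which the full method instead handles with interval-valued codes.

```python
from typing import Sequence


def vignette_adjusted_rating(self_rating: float, vignette_ratings: Sequence[float]) -> int:
    """Recode a self-rating relative to the same respondent's vignette ratings.

    vignette_ratings must be listed from the least to the most intensive vignette
    and be strictly increasing. Returns a code in 1..2J+1.
    """
    if any(a >= b for a, b in zip(vignette_ratings, vignette_ratings[1:])):
        raise ValueError("vignette ratings must be strictly increasing for this simple recode")
    code = 1
    for z in vignette_ratings:
        if self_rating < z:
            return code          # below this vignette
        if self_rating == z:
            return code + 1      # tied with this vignette
        code += 2                # above this vignette; move to the next interval
    return code                  # above every vignette


if __name__ == "__main__":
    # Hypothetical teacher who rated two vignettes 2 and 4 on a 1-5 scale.
    for y in range(1, 6):
        print(y, vignette_adjusted_rating(y, [2, 4]))
```

Because the code depends only on where a self-rating falls relative to the respondent's own vignette ratings, two teachers who use the response scale differently but describe the same practice receive the same adjusted value.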

12.
The National Assessment Program – Literacy and Numeracy (NAPLAN) in Australia is a series of literacy and numeracy tests that are used for school comparison. This paper argues that a key question for this use is whether it is a reasonable, or valid, use of the test data. Using Kane’s argumentative approach to validity, the paper argues that the comparisons of the quality of student achievement made available on the My School website have low validity because they disregard rates of participation in schools. By bringing together the literature on the ‘new governance’ of education through testing with an approach to validity that addresses both the technical aspects of test score interpretation and the ethics of how test scores are used and applied, this study identifies validity as an important consideration in comparative analyses of student achievement data. Identifying, through the argumentative approach to validity, the need to consider participation in such comparisons highlights this article’s contribution not only to the testing field but also to the critical policy literature.

13.
Including vulnerable groups of students, such as students with learning disabilities, in mainstream school research requires ethical consideration and questionnaire adaptation. These students are often excluded because of low comprehension or because methodologies generate inadequate data. Students with disabilities need to be studied as a separate group and provided with accessible questionnaires. This pilot study aims to develop and evaluate student self-report measures rating aspects of students’ experiences of school-based Physical Education (PE). Instrument design, reliability, and validity were examined in Swedish secondary school students aged 13 (n = 47), including students with intellectual disability (n = 5) and students without impairment, and 28 of these students were retested (test–retest). Psychometric results from the small pilot sample were confirmed in analyses based on replies from the first wave of data collection in the main study (n = 450). Results show adequate internal consistency, factor structure, and relations between measures. In conclusion, reliability and validity were satisfactory for the scales measuring self-efficacy in general, self-efficacy in PE, and aptitude to participate. Adapting proxy ratings of functioning into self-reports proved problematic. The adequacy of the adjustments made was confirmed, and a dichotomous scale for typical/atypical function is suggested for further analyses.

14.
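Validity and reliability are the most important quality indicators of large-scale educational examinations. Because collecting validity evidence is difficult and validity lacks a precise statistical measure, item writers analyzing post-test data tend to focus on reliability and neglect validity. To address this, the paper proposes a complete, systematic set of methods for evaluating item quality within the widely used framework of classical test theory. A case study shows that the method meets the practical needs of item writing and is easy to apply.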

15.
In this article, we address the measurement of individualized instruction in the context of regular classroom instruction. Our study assessed instructional practices geared towards individualization in German third-grade reading lessons by combining self-report data from 621 students and their teachers (n = 57) with live observations. We then investigated the reliability of these different approaches to measuring individualization as well as the agreement between them. All three approaches yielded reliable indicators of individualized practices, but not all of them corresponded with each other. We found considerable agreement between students and observers, but neither agreed with teachers' self-reports. On closer examination, we found that students’ ratings correlated only with teacher ratings that were provided close to the time point of interest. This correlation increased when teacher measures were corrected for response tendencies. We conclude with some recommendations for future studies that aim to measure individualized instruction in the classroom.

16.
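Testing is an important part of language teaching, and reliability and validity are two basic concepts in the field of language testing. This paper introduces the definitions, measurement methods, and influencing factors of reliability and validity, reviews how the two concepts have developed through an analysis of English language testing methods, and concludes that a language test can serve as an effective means of measurement only when its quality is assured.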

17.
ABSTRACT

In the past decade, there has been interest in the assessment of cognitive and affective processes and products for the purposes of meaningful learning. Meaningful measurement (MM), which accords with a humanistic constructivist information-processing perspective, has been proposed. Students’ responses to the assessment tasks are now evaluated according to an item response measurement model, together with a hypothesized model detailing the progressive forms of knowing/competence under examination. Student errors and alternative frameworks can potentially be incorporated into these evaluation procedures. Meaningful measurement leads us to examine the composite concepts of “ability” and “difficulty”. Under the rubric of meaningful measurement, validity assessment (i.e., the internal and external components of construct validity) is essentially the same as an inquiry into the meanings afforded by the measurements. Concepts of reliability expressed as a group statistic, applied in the same way to all examinees in the sample, can be dispensed with when the precision of the trait estimates from item response measurement models can be determined at each trait level. Reliability, measured in terms of the standard errors of the estimates, needs to be within acceptable limits if internal validity is to be secured. Further evidence of validity may be provided by in-depth analyses of how “epistemic subjects” of different levels of competence and proficiency engage in different types of assessment tasks, in which affective and metacognitive behaviours may be examined as well. These ways of undertaking MM can be codified by proposing a three-level conceptualization of MM. It is within this conceptualization and the MM enquiry paradigm that the validity and reliability of test measures are discussed in this paper.

18.
The Ford score     
Abstract

We combined data from the Office for Standards in Education with those from a large national survey of child and adolescent mental health and developed a simple score that schools or LEAs could use to predict the level of emotional and behavioural difficulties that they are likely to encounter. The final Ford score is based on the rates of free school meals, exclusions, unauthorized absence and children with special educational needs. These data are collected routinely, so the Ford score could easily be calculated to provide estimates of the level of emotional and behavioural problems in mainstream schools without the use of additional resources. It needs further reliability and validity testing but could provide a means of allocating resources.

19.
Objective: We conducted a comprehensive assessment of the reliability and validity of the Interview for Traumatic Events in Childhood (ITEC; Lobbestael, Arntz, Kremers, & Sieswerda, 2006), a retrospective, semi-structured interview for childhood maltreatment. The ITEC aims to yield dimensional scores for the severity of experiences across different dimensions of childhood maltreatment.
Methods: Initial psychometric properties were tested with the pilot version of the ITEC in 362 participants. A second study assessed the revised ITEC in 217 participants, both patients and non-patients.
Results: Factor analyses produced the best fit for a five-factor model (sexual, physical, and emotional abuse; physical and emotional neglect). The scales had good internal consistency, except for the physical neglect subscale, and excellent inter-rater reliability. The scales were highly associated with the equivalent scales of the Childhood Trauma Questionnaire (i.e., good convergent validity) and showed good correspondence with patient file information (i.e., good criterion validity).
Conclusion: These results support the reliability and validity of the ITEC, making it a potentially useful tool for assessing a broad range of traumatic events in childhood.
Practice implication: The first step in therapy for dealing with childhood maltreatment is to map abusive experiences and assess their severity and impact. Because maltreatment is a sensitive topic that is not easily reported, trauma interviews are promising assessment instruments, since they provide the opportunity to probe and clarify. Few well-validated trauma interviews are available that assess the extent of maltreatment inside and outside the family across various dimensions. The current study tries to fill this gap by presenting a new trauma interview: the Interview for Traumatic Events in Childhood.
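The abstract reports excellent inter-rater reliability without naming the index. For dimensional interview scores, an intraclass correlation (ICC) is a common choice; the sketch below assumes the one-way random-effects, single-rater form, ICC(1,1) = (MS_between - MS_within) / (MS_between + (k - 1) * MS_within). The study may well have used a different ICC form, and the function name and ratings are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def icc_oneway(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(1,1): one-way random-effects, single-rater intraclass correlation.

    ratings[i][j] is rater j's score for target i; every target has k >= 2 ratings.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    # Between-target mean square: spread of target means around the grand mean.
    ms_between = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    # Within-target mean square: disagreement among raters for the same target.
    ms_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


if __name__ == "__main__":
    # Hypothetical severity scores from two raters for three interviews.
    print(round(icc_oneway([[1, 2], [3, 4], [5, 6]]), 3))
```

The one-way model treats raters as interchangeable for each target; if the same raters score every interview, a two-way model that separates systematic rater effects may be more appropriate.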

20.
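To develop an instrument that higher vocational colleges can use for students' evaluation of teachers' classroom teaching quality, and in response to problems with existing student evaluations of teaching quality at these colleges, we built on the University Teachers' Teaching Effectiveness Evaluation Questionnaire (student version) to design a new questionnaire for students' evaluation of classroom teaching quality in higher vocational colleges (VSEEQ). The VSEEQ was developed to meet the standards of educational measurement and is supported by modern theories of teaching and learning; it was administered, and reliability and validity data were collected. The results show that the VSEEQ has a reasonable dimensional structure and good internal-consistency reliability, test-retest reliability, content validity, and construct validity.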
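Internal-consistency reliability for a questionnaire such as the VSEEQ is usually reported as Cronbach's alpha, assumed here since the abstract does not name the coefficient: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). This is a minimal sketch with a hypothetical function name and toy data; population variances are used throughout, and the choice of denominator cancels out of the ratio.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; rows are respondents, columns are questionnaire items.

    Requires at least two items and nonzero variance in the total scores.
    """
    k = len(items[0])
    item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy responses from three students to two items.
    print(round(cronbach_alpha([[1, 2], [2, 3], [3, 3]]), 3))
```

Test-retest reliability, also reported in the abstract, is a separate check: it is typically the correlation between total scores from two administrations of the same questionnaire to the same respondents.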


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP filing No. 09084417