Search results: 8 records found.
1.
This study examines the study-abroad experiences of pre-service teacher candidates at the Faculty of Education, York University, using transformative learning theory. Disorienting experiences are a crucial first step for perspective transformation; students reported facing racial dynamics, “outsider” status, risk-taking behavior and power relations. Students utilized a variety of reflection opportunities; however, the critical self-reflection that is imperative for transformation requires greater awareness of one’s frame of reference. Future studies need to examine how students’ specificities shape the realization of study-abroad goals for pre-service teachers and their ability to develop global consciousness and to work towards an equitable and just society.
2.
Cut scores, estimated using the Angoff procedure, are routinely used to make high-stakes classification decisions based on examinee scores. Precision is necessary in estimation of cut scores because of the importance of these decisions. Although much has been written about how these procedures should be implemented, there is relatively little literature providing empirical support for specific approaches to providing training and feedback to standard-setting judges. This article presents a multivariate generalizability analysis designed to examine the impact of training and feedback on various sources of error in estimation of cut scores for a standard-setting procedure in which multiple independent groups completed the judgments. The results indicate that after training, there was little improvement in the ability of judges to rank order items by difficulty but there was a substantial improvement in inter-judge consistency in centering ratings. The results also show a substantial group effect. Consistent with this result, the direction of change for the estimated cut score was shown to be group dependent.
3.
At the recent Association of University Administrators (AUA) Annual Conference in Nottingham, UK, the authors presented a session exploring the role that Lesbian, Gay, Bisexual and Trans* (LGBT*) Staff Networks have to play in higher education institutions and explored the best practice to support their successful launch and longevity. This accompanying article looks to build upon some of the points made in their session while exploring some of the wider issues that LGBT* staff face within higher education in the current policy context, most notably in the UK.
4.
Ethnicity-targeted hate speech has been widely shown to influence on-the-ground inter-ethnic conflict and violence, especially in such multi-ethnic societies as Russia. Therefore, ethnicity-targeted hate speech detection in user texts is becoming an important task. However, it faces a number of unresolved problems: difficulties of reliable mark-up, informal and indirect ways of expressing negativity in user texts (such as irony, false generalization and attribution of unfavored actions to targeted groups), users’ inclination to express opposite attitudes to different ethnic groups in the same text and, finally, lack of research on languages other than English. In this work we address several of these problems in the task of ethnicity-targeted hate speech detection in Russian-language social media texts. This approach allows us to differentiate between attitudes towards different ethnic groups mentioned in the same text – a task that has never been addressed before. We use a dataset of over 2.6M user messages mentioning ethnic groups to construct a representative sample of 12K instances (ethnic group, text) that are further thoroughly annotated via a special procedure. In contrast to many previous collections, which usually comprise extreme cases of toxic speech, the representativeness of our sample secures a realistic and, therefore, much higher proportion of subtle negativity, which additionally complicates its automatic detection. We then experiment with four types of machine learning models, from traditional classifiers such as SVM to deep learning approaches, notably the recently introduced BERT architecture, and interpret their predictions in terms of various linguistic phenomena. In addition to hate speech detection with a text-level two-class approach (hate, no hate), we also justify and implement a unique instance-based three-class approach (positive, neutral, negative attitude, the latter implying hate speech).
Our best results are achieved by using fine-tuned and pre-trained RuBERT combined with linguistic features, with F1-hate=0.760, F1-macro=0.833 on the text-level two-class problem, comparable to previous studies, and F1-hate=0.813, F1-macro=0.824 on our unique instance-based three-class hate speech detection task. Finally, we perform error analysis, which reveals that further improvement could be achieved by accounting for complex and creative language issues more accurately, i.e., by detecting irony and unconventional forms of obscene lexicon.
5.
The present study examined the long-term usefulness of estimated parameters used to adjust the scores from a performance assessment to account for differences in rater stringency. Ratings from four components of the USMLE® Step 2 Clinical Skills Examination were analyzed. A generalizability-theory framework was used to examine the extent to which rater-related sources of error could be eliminated through statistical adjustment. Particular attention was given to the stability of these estimated parameters over time. The results suggest that rater stringency estimates obtained at a point in time and then used to adjust ratings over a period of months may substantially decrease in usefulness. In some cases, over several months, the use of these adjustments may become counterproductive. Additionally, it is hypothesized that the rate of deterioration in the usefulness of estimated parameters may be a function of the characteristics of the scale.
6.
Test administrators are appropriately concerned about the potential for time constraints to impact the validity of score interpretations; psychometric efforts to evaluate the impact of speededness date back more than half a century. The widespread move to computerized test delivery has led to the development of new approaches to evaluating how examinees use testing time and to new metrics designed to provide evidence about the extent to which time limits impact performance. Much of the existing research is based on these types of observational metrics; relatively few studies use randomized experiments to evaluate the impact of time limits on scores. Of those studies that do report on randomized experiments, none directly compare the experimental results to evidence from observational metrics to evaluate the extent to which these metrics are able to sensitively identify conditions in which time constraints actually impact scores. The present study provides such evidence based on data from a medical licensing examination. The results indicate that these observational metrics are useful but provide an imprecise evaluation of the impact of time constraints on test performance.
7.
When performance assessments are delivered and scored by computer, the costs of scoring may be substantially lower than those of scoring the same assessment based on expert review of the individual performances. Computerized scoring algorithms also ensure that the scoring rules are implemented precisely and uniformly. Such computerized algorithms represent an effort to encode the scoring policies of experts. This raises the question: would a different group of experts have produced a meaningfully different algorithm? The research reported in this paper uses generalizability theory to assess the impact of using independent, randomly equivalent groups of experts to develop the scoring algorithms for a set of computer-simulation tasks designed to measure physicians’ patient management skills. The results suggest that the impact of this “expert group” effect may be significant but that it can be controlled with appropriate test development strategies. The appendix presents a multivariate generalizability analysis to examine the stability of the assessed proficiency across scores representing the scoring policies of different groups of experts.
8.
Although multivariate generalizability theory was developed more than 30 years ago, little published research utilizing this framework exists and most of what does exist examines tests built from tables of specifications. In this context, it is assumed that the universe scores from levels of the fixed multivariate facet will be correlated, but the error terms will be uncorrelated because subscores result from mutually exclusive sets of test items. This paper reports on an application in which multiple subscores are derived from each task completed by the examinee. In this context, both universe scores and errors may be correlated across levels of the fixed multivariate facet. The data described come from the United States Medical Licensing Examination® Step 2 Clinical Skills Examination. In this test, each examinee interacts with a series of standardized patients and each interaction results in four component scores. The paper focuses on the application of multivariate generalizability theory in this context and on the practical interpretation of the resulting estimated variance and covariance components.