Similar Articles
1.
This study examines the prevalence, contexts, and demographic correlates of monotonic response patterns (MRPs) in online student evaluations. Results of two-level hierarchical generalized linear models show evidence of careless monotonic responses to a survey administered to students enrolled in a university-level foreign language course in the Republic of Korea. All else being equal, freshmen and students in classes with fewer survey participants were more likely to choose monotonic response patterns in course evaluations. Possible factors at work in generating MRPs are identified and discussed. The severity of the MRP problem in online ratings underscores the importance of administrators considering possible validity threats in student evaluations before using them to inform instructional and administrative decisions. It is also important to design course evaluation surveys so as to minimize careless responses, and to identify ways of inducing more thoughtful responses from college students.
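
Before any multilevel modelling, each evaluation has to be classified as monotonic or not. A minimal sketch, assuming each evaluation is a list of Likert ratings and that "monotonic" means every answered item received the same rating; the function names and the three-item minimum are illustrative assumptions, not the study's operational definition.

```python
from typing import Optional, Sequence


def is_monotonic_response(ratings: Sequence[Optional[int]], min_items: int = 3) -> bool:
    """Flag a rating vector in which every answered item has the same value.

    Assumed rule (not the study's definition): at least `min_items` items
    must be answered, and all answered items must share one rating.
    """
    answered = [r for r in ratings if r is not None]  # skipped items are ignored
    return len(answered) >= min_items and len(set(answered)) == 1


def mrp_rate(evaluations: Sequence[Sequence[Optional[int]]]) -> float:
    """Proportion of a class's evaluations flagged as monotonic."""
    if not evaluations:
        return 0.0
    return sum(is_monotonic_response(e) for e in evaluations) / len(evaluations)


class_evals = [
    [5, 5, 5, 5, 5],  # straight-lined
    [4, 5, 3, 4, 4],
    [1, 1, 1, 1, 1],  # straight-lined
    [3, 4, 4, 2, 5],
]
print(mrp_rate(class_evals))  # 0.5
```

The per-evaluation flag is the kind of binary outcome a hierarchical generalized linear model would take, with students nested in classes.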

2.
Researchers have developed indices to identify persons whose test results ‘misfit’ and are considered statistically ‘aberrant’ or ‘unexpected’, and whose measures are consequently potentially invalid, which draws the test’s validity into question. This study draws on interviews with pupils and their teachers, using a sample of 31 ten‐year‐olds who were flagged as the most ‘aberrant’ on a standardised mathematics test. The children’s and their teachers’ explanations were analysed and attributed (i) to item, person (self/other) and classroom levels, and (ii) according to causal dimensions. Children’s and teachers’ explanations mostly agreed with regard to unexpected negative results, and they included references to frequently cited sources of construct‐irrelevant variance (e.g. ineffective test‐taking strategies, careless mistakes) as well as construct‐relevant variance (e.g. misconceptions, weaknesses in particular topics). The findings of this exploratory study are discussed from test validity and attribution theory perspectives: we conclude that this approach offers grounds for multi‐level explanations of person misfit and that this qualitative approach to unexpected responses deserves more attention.
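
The abstract does not name the person‐fit index used to flag the pupils. As an illustrative stand‐in (an assumption, not the study's index), the number of Guttman errors is one of the simplest person‐fit statistics: after ordering items from easiest to hardest, it counts pairs in which a person misses an easier item but solves a harder one. The helper names below are hypothetical.

```python
from typing import List, Sequence


def item_p_values(responses: Sequence[Sequence[int]]) -> List[float]:
    """Proportion of examinees answering each 0/1-scored item correctly."""
    n = len(responses)
    return [sum(row[j] for row in responses) / n for j in range(len(responses[0]))]


def guttman_errors(person: Sequence[int], p_values: Sequence[float]) -> int:
    """Count item pairs where an easier item is wrong but a harder item is right."""
    # Order items from easiest (highest p-value) to hardest; ties keep input order.
    order = sorted(range(len(p_values)), key=lambda j: -p_values[j])
    errors = 0
    misses_so_far = 0  # easier items this person got wrong
    for j in order:
        if person[j] == 1:
            errors += misses_so_far  # each earlier miss forms one error pair
        else:
            misses_so_far += 1
    return errors


data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],  # misses the easy items, solves the hard ones
]
p = item_p_values(data)
print([guttman_errors(row, p) for row in data])  # [0, 0, 0, 4]
```

The raw count depends on the person's total score, so in practice a normed variant would usually be preferred when comparing pupils.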

3.
Disengaged responding is a phenomenon that often biases observed scores from achievement tests and surveys in practically and statistically significant ways. This problem has led to the development of methods to detect and correct for disengaged responses on both achievement test and survey scores. One major disadvantage when trying to detect disengaged responses on surveys is that, unlike on achievement tests, there are no correct answers. As a result, validating decision rules for detection methods is problematic. In this study, we condition results from a variety of detection methods used to identify disengaged survey responses on response times. We then show how this conditional approach may be useful in identifying where to set response time thresholds for survey items, as well as in avoiding misclassification when using other detection methods.
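
A minimal sketch of the conditioning idea: compute how often each respondent answered faster than an item‐level response‐time threshold, then compare that rate between respondents who were and were not flagged by some other detector. The threshold rule (10% of the item's mean response time) and the "long‐string" flags are assumptions for illustration, not the study's settings.

```python
from statistics import mean
from typing import Dict, List, Sequence


def rt_thresholds(rts: Sequence[Sequence[float]], fraction: float = 0.10) -> List[float]:
    """Per-item threshold: `fraction` of the item's mean response time (assumed rule)."""
    n_items = len(rts[0])
    return [fraction * mean(row[j] for row in rts) for j in range(n_items)]


def rapid_share(rts: Sequence[Sequence[float]], thresholds: Sequence[float]) -> List[float]:
    """For each respondent, the share of items answered faster than the threshold."""
    return [sum(t < thr for t, thr in zip(row, thresholds)) / len(row) for row in rts]


def condition_on_rt(shares: Sequence[float], flags: Sequence[bool]) -> Dict[str, float]:
    """Mean rapid-response share among respondents flagged vs. not flagged by another method."""
    flagged = [s for s, f in zip(shares, flags) if f]
    unflagged = [s for s, f in zip(shares, flags) if not f]
    return {
        "flagged": mean(flagged) if flagged else float("nan"),
        "unflagged": mean(unflagged) if unflagged else float("nan"),
    }


# Response times in seconds: 3 respondents x 3 items.
rts = [[10.0, 12.0, 8.0], [0.5, 0.4, 9.0], [11.0, 10.0, 10.0]]
long_string_flags = [False, True, False]  # hypothetical output of another detector
shares = rapid_share(rts, rt_thresholds(rts))
print(condition_on_rt(shares, long_string_flags))
```

Sweeping `fraction` and watching how the flagged/unflagged gap changes is one way to explore where a threshold might sit.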

4.
In this article, we systematize the factors influencing performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses and that this variance is indirectly influenced by other factors such as target population or input modality. Extending previous work, we distinguish conceptual, realization, and nonconformity variance, which are differentially impacted by the various factors. While conceptual variance relates to different concepts embedded in the text responses, realization variance refers to their diverse manifestation through natural language. Nonconformity variance is added by aberrant response behavior. Furthermore, besides its performance, the feasibility of using an automatic scoring system depends on external factors, such as ethical or computational constraints, which influence whether a system with a given performance is accepted by stakeholders. Our work provides (i) a framework for assessment practitioners to decide a priori whether automatic content scoring can be successfully applied in a given setup as well as (ii) new empirical findings and the integration of empirical findings from the literature on factors that influence automatic systems' performance.

5.
This paper proposes a set of methods and a framework for evaluating, modeling, and predicting group interactions in computer‐mediated communication. The method of sequential analysis is described along with specific software tools and techniques to facilitate the analysis of message–response sequences. In addition, the Dialogic Theory and its assumptions are presented to establish a theoretical framework and guide for using sequential analysis in computer‐mediated communication research. Step‐by‐step instructions illustrate how sequential analysis can be used to measure how latent variables (e.g., message function, response latency, communication style) and exogenous variables (e.g., gender, discourse rules, context) affect how likely a message is to elicit a response, the types of responses it elicits, and whether the elicited sequence of responses (e.g., claim → challenge → explain) mirrors the processes that support group decision‐making, problem‐solving, and learning.
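
At the core of a message–response sequential analysis is a table of how often each message type elicits each response type. A minimal sketch, assuming the discussion has already been coded into (message function, response function) pairs; the category labels reuse the abstract's example, and the function is hypothetical rather than one of the software tools the paper describes.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def transition_probabilities(
    pairs: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, float]]:
    """Estimate P(response type | message type) from coded message-response pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for message, response in pairs:
        counts[message][response] += 1
    return {
        message: {resp: n / sum(c.values()) for resp, n in c.items()}
        for message, c in counts.items()
    }


coded = [
    ("claim", "challenge"),
    ("claim", "challenge"),
    ("claim", "support"),
    ("challenge", "explain"),
]
probs = transition_probabilities(coded)
print(probs["claim"]["challenge"])
```

A full analysis would also test whether each observed transition occurs more often than expected by chance; that step is not shown here.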

6.
The hypothesis that perceived failure experiences at school would increase the likelihood of aversive parent-child interactions after school was supported in a study of 167 fourth, fifth, and sixth graders. Children completed measures of mood, school events, and parent-child interaction three times each day for two consecutive days. Reports of social and academic failure experiences at school (e.g., peer problems and difficulty with schoolwork) were associated with increases in child self-reports of demanding and aversive behavior toward parents that evening. There was no evidence of the reverse effect: aversive child behavior did not predict an increase in reports of negative events the next day. When children reported more academic failure events at school, they also described their parents as more disapproving and punishing after school. However, this effect was only partially mediated by increases in the child's aversive behavior. It is argued that the findings cannot be explained solely by a response bias caused by the child's general mood or frame of mind that day. First, school-to-home mood spillover effects were controlled in the analyses. Second, reports of problems at school were not associated with other aspects of parent-child interaction (e.g., the parent's positive behavioral and emotional involvement with the child). In addition to its substantive findings, the study illustrates the use of an unbiased method for assessing child responses to daily stressors.

7.
While it is easy to assume that university students who wait until the last minute to complete surveys for their class research requirements provide low-quality data, this issue has not been empirically examined. The goal of the present study was to examine the relation between student research procrastination and two important data quality issues—careless responding and measurement noninvariance. Data gathered from university students across two semesters tentatively indicated that procrastination is related to low-quality survey data. Procrastination was slightly more problematic for certain data quality issues (measurement noninvariance) than others (careless responding). These relations, however, were small and contingent on how procrastination and careless responding were measured. Accordingly, it seems more beneficial for researchers to select a sampling window that supports their research goals and statistical power requirements rather than select a sampling window that attempts to minimize careless survey responding or other measurement issues.

8.
The current study investigates the impact of a criminal justice education on student knowledge about wrongful conviction. Past research has identified the fallibility of hard evidence (e.g., eyewitness misidentification), police and lawyer behaviors (e.g., tunnel vision), and social group discrimination as underlying causes of wrongful conviction. We developed a survey to investigate student knowledge of these underlying causes, comparing participants in different years and programs of study. The findings suggest that criminal justice majors were at times more aware of the underlying causes of wrongful conviction than noncriminal justice majors, specifically with regard to the fallibility of hard evidence and social group discrimination. Criminal justice majors were not more knowledgeable in the areas of police and lawyer behavior. The implications of these findings are discussed in terms of the scope of criminal justice education and future careers in criminal justice.

9.
Statistical tools found in the service quality assessment literature, namely the T² statistic combined with factor analysis, can enhance the feedback instructors receive from student ratings. T² examines variability across multiple sets of ratings to isolate individual respondents with aberrant response patterns (i.e., outliers). Analyzing student responses that fall outside the “normal” range can identify aspects of the course that cause pockets of students to be dissatisfied. This fresh insight into sources of student dissatisfaction is particularly valuable for instructors willing to make tactical classroom changes that accommodate individual students, rather than following the traditional approach of using student ratings to develop systemwide changes in course delivery. A case study is presented to demonstrate how the recommended procedure minimizes data overload, allows for valid schoolwide and longitudinal comparisons of correlated survey responses, and helps instructors identify priority areas for instructional improvement.
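
A minimal sketch of the T² step, computing Hotelling's T² = (x − x̄)ᵀ S⁻¹ (x − x̄) for each student's rating vector against the class mean and sample covariance. It omits the factor‐analysis step and the control limit the case study would apply, and it uses a small Gaussian‐elimination solver (an implementation choice, not the article's) so that it needs only the standard library.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def hotelling_t2(ratings: Sequence[Sequence[float]]) -> List[float]:
    """T^2 of each row relative to the sample mean and sample covariance (n - 1)."""
    n, p = len(ratings), len(ratings[0])
    means = [sum(row[j] for row in ratings) / n for j in range(p)]
    cov = [[sum((row[i] - means[i]) * (row[j] - means[j]) for row in ratings) / (n - 1)
            for j in range(p)] for i in range(p)]
    scores = []
    for row in ratings:
        d = [row[j] - means[j] for j in range(p)]
        y = _solve(cov, d)  # y = S^-1 d without forming the inverse explicitly
        scores.append(sum(di * yi for di, yi in zip(d, y)))
    return scores


ratings = [[4, 5], [5, 4], [4, 4], [5, 5], [1, 5]]  # rows = students, columns = rating items
scores = hotelling_t2(ratings)
print(max(range(len(scores)), key=scores.__getitem__))  # 4: the [1, 5] respondent
```

With the sample covariance, the T² values always sum to (n − 1)·p, which is a handy sanity check on an implementation.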

10.
This study investigated how 4‐month‐old infants represent sequences: Do they track the statistical relations among specific sequence elements (e.g., AB, BC) or do they encode abstract ordinal positions (i.e., B is second)? Infants were habituated to sequences of four moving and sounding elements, in which three of the elements varied in their ordinal position while the position of one target element remained invariant (e.g., ABCD, CBDA, where B is the target), and were then tested for detection of changes in the target’s position. Infants detected an ordinal change only when it disrupted the statistical co‐occurrence of elements, not when statistical information was controlled. It is concluded that 4‐month‐olds learn the order of sequence elements by tracking their statistical associations but not their invariant ordinal position.

11.
Superordinate categorization via association with a common response was studied in pigeons. Original training paired disparate classes (e.g., people + chairs and cars + flowers) with a common response (Responses 1 and 2, respectively). Reassignment training taught new responses (Responses 3 and 4, respectively) to one component class from each pair (e.g., people and cars). Superordinate categorization was documented in testing when the pigeons made the same responses to the stimuli that were withheld in reassignment training (e.g., chairs and flowers) as they did to the reassigned stimuli themselves (e.g., people and cars) and when the birds transferred these discriminative responses to novel stimuli from all four component classes. Reassignment training with novel stimuli produced effects that were similar to those of reassignment training with familiar stimuli. Superordinate categorization via association with a common response is thus a robust effect that generalizes to novel stimuli from each of the component classes.

12.
When a response pattern does not fit a selected measurement model, one may resort to robust ability estimation. Two popular robust methods are biweight and Huber weight. So far, research on these methods has been quite limited. This article proposes the maximum a posteriori biweight (BMAP) and Huber weight (HMAP) estimation methods. These methods use the Bayesian prior distribution to compensate for information lost due to aberrant responses. They may also be more resistant to the detrimental effects of downweighting the nonaberrant responses. The effectiveness of BMAP and HMAP was evaluated through a Monte Carlo simulation. Results show that both methods, especially BMAP, are more effective than the original biweight and Huber weight in correcting mild forms of aberrant behavior.
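
A minimal sketch of the idea behind BMAP/HMAP, not the article's estimator: each item's contribution to the 2PL score equation is multiplied by a Huber or biweight weight of a residual, and the derivative of a N(0, 1) log‐prior is added, so the prior keeps supplying information when aberrant items are downweighted. The residual a·(θ − b), the tuning constants (h = 1, b = 4), and the bisection solver are all assumptions made for illustration.

```python
import math
from typing import Callable, Sequence


def huber_weight(r: float, h: float = 1.0) -> float:
    """Huber weight: full weight inside [-h, h], shrinking as h/|r| outside."""
    return 1.0 if abs(r) <= h else h / abs(r)


def biweight(r: float, b: float = 4.0) -> float:
    """Tukey biweight: smooth downweighting that reaches zero beyond |r| = b."""
    return (1.0 - (r / b) ** 2) ** 2 if abs(r) <= b else 0.0


def weighted_map_score(theta: float, x: Sequence[int], a: Sequence[float],
                       b_items: Sequence[float], weight: Callable[[float], float]) -> float:
    """Weighted 2PL score plus the derivative of a N(0, 1) log-prior."""
    total = 0.0
    for xj, aj, bj in zip(x, a, b_items):
        r = aj * (theta - bj)  # assumed residual used to set the item weight
        p = 1.0 / (1.0 + math.exp(-r))
        total += weight(r) * aj * (xj - p)
    return total - theta  # d/dtheta log N(0, 1) = -theta


def robust_map(x, a, b_items, weight, lo=-4.0, hi=4.0, tol=1e-8) -> float:
    """Bisection on the weighted score; assumes one sign change in [lo, hi]."""
    f_lo = weighted_map_score(lo, x, a, b_items, weight)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = weighted_map_score(mid, x, a, b_items, weight)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return 0.5 * (lo + hi)


# Symmetric two-item example: right on an easy item, wrong on a hard one.
print(robust_map([1, 0], [1.0, 1.0], [-1.0, 1.0], biweight))
```

Passing `huber_weight` instead of `biweight` gives the HMAP‐style variant of the same sketch.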

13.
Three runway experiments tested a stage model of extinction which postulated an orderly succession of three qualitatively different stages: habit, trial and error, and resolution. The model predicted that Stage 1 should be characterized by perseveration of habitual routes (i.e., response persistence) and the absence of competing responses; Stage 2, by an increase in investigatory behavior (response variation and hole exploration) and biting behavior; Stage 3, by a decrease in the competing responses of Stage 2 and a continued increase in goal avoidance and substitution behavior (e.g., sand-digging). These predictions were largely confirmed. Further, Experiments 1 and 2 showed that, as expected by the model, continuous reinforcement (CRF) resulted in more practice of habitual routes in acquisition and greater response persistence, while partial reinforcement (PRF) resulted in more route variation and hole exploration in acquisition and greater goal persistence, which was attributable to prior reinforcement of a trial-and-error coping strategy. Results of Experiment 3, which combined training trials and reward magnitudes orthogonally, supported the prediction that response persistence was positively related to training trials, and goal persistence negatively related to reward magnitudes. All three experiments demonstrated an inverted-U function in investigatory and biting behavior, as predicted by the stage model.

14.
The statistical analysis of answer changes (ACs) has uncovered multiple testing irregularities on large‐scale assessments and is now routinely performed at testing organizations. However, AC data are subject to uncertainty caused by technological or human factors. Existing statistics used to detect examinees with aberrant ACs (e.g., the number of wrong‐to‐right ACs) therefore capitalize on this uncertainty, which may inflate the Type I error rate. In this article, the information about ACs is used only to partition the administered items into two disjoint subtests: items where ACs did not occur and items where ACs did occur. A new statistic is based on the difference in performance between these subtests (measured as the Kullback–Leibler divergence between the corresponding posteriors of the latent trait), where, to avoid the uncertainty, only final responses are used. One of the subtests can be filtered so that the asymptotic distribution of the statistic is chi‐square with one degree of freedom. In computer simulations, the presented statistic was highly robust to the uncertainty and achieved higher detection rates than two popular statistics based on wrong‐to‐right ACs.
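
A minimal sketch of the partition‐then‐compare logic: AC information only decides which subtest an item belongs to, each subtest is scored from final responses into a discrete posterior for θ, and the two posteriors are compared with a Kullback–Leibler divergence. The Rasch model, N(0, 1) prior, quadrature grid, and KL direction are assumptions; the subtest filtering that yields the chi‐square reference distribution is not reproduced.

```python
import math
from typing import List, Sequence

# Quadrature grid for theta (assumed): -4.0 to 4.0 in steps of 0.1.
GRID = [-4.0 + 0.1 * k for k in range(81)]


def rasch_posterior(x: Sequence[int], b_items: Sequence[float]) -> List[float]:
    """Discrete posterior of theta under a Rasch model with a N(0, 1) prior."""
    log_w = []
    for t in GRID:
        lw = -0.5 * t * t  # log of the N(0, 1) kernel
        for xj, bj in zip(x, b_items):
            p = 1.0 / (1.0 + math.exp(-(t - bj)))
            lw += math.log(p if xj == 1 else 1.0 - p)
        log_w.append(lw)
    top = max(log_w)  # subtract the max before exponentiating, for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two distributions on the same grid."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ac_split_statistic(final: Sequence[int], b_items: Sequence[float],
                       had_ac: Sequence[bool]) -> float:
    """Use answer changes only to split items; score both subtests on final responses."""
    with_ac = [j for j, f in enumerate(had_ac) if f]
    no_ac = [j for j, f in enumerate(had_ac) if not f]
    post_ac = rasch_posterior([final[j] for j in with_ac], [b_items[j] for j in with_ac])
    post_no = rasch_posterior([final[j] for j in no_ac], [b_items[j] for j in no_ac])
    return kl_divergence(post_ac, post_no)


# All changed items end up correct, all unchanged items wrong: a large divergence.
print(ac_split_statistic([1, 1, 1, 0, 0, 0], [0.0] * 6, [True] * 3 + [False] * 3))
```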

15.
The problems of response bias in longitudinal studies of college students are examined. An extensive follow-up questionnaire was sent to 1,253 college seniors who had participated in a similar survey as freshmen four years earlier. Careful records were kept of student responsiveness in relation to various techniques designed to increase the proportion of responders (e.g., postcard, telephone contact). The less responsive groups differed significantly from their more responsive counterparts on nearly a dozen variables representing a wide variety of content areas, including academic achievement, self-concept, alcohol consumption, social deviance, and preferences regarding choice of major. Controlling for sex and socioeconomic status reduced, but did not eliminate, these biases. Overall, the results indicate that researchers cannot account for follow-up nonresponse bias by making statistical adjustments based on data available at initial testing. The results are discussed in light of identifying the reasons for nonresponse and developing categories of nonresponders who may be motivated to cooperate by different types of follow-up techniques.

16.
The psychometric literature provides little empirical evaluation of examinee test data to assess essential psychometric properties of innovative items. In this study, examinee responses to conventional (e.g., multiple-choice) and innovative item formats in a computer-based testing program were analyzed for IRT information with the three-parameter and graded response models. The innovative item types considered in this study provided more information across all levels of ability than multiple-choice items. In addition, accurate timing data captured via computer administration were analyzed to consider the relative efficiency of the multiple-choice and innovative item types. Consistent with previous research, multiple-choice items provided more information per unit time. Implications for balancing policy, psychometric, and pragmatic factors in selecting item formats are also discussed.
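
The comparison of formats rests on item information and on information per unit of testing time. A minimal sketch for the three‐parameter logistic case, using the standard 3PL information function I(θ) = (Da)² (Q/P) ((P − c)/(1 − c))²; the scaling constant D = 1.7 default and the idea of dividing by an item's mean response time are assumptions about how "per unit time" might be operationalized. The graded response model is not covered here.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """3PL probability of a correct response (D = 1.7 scaling is an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def info_3pl(theta: float, a: float, b: float, c: float, D: float = 1.7) -> float:
    """Item information for the 3PL model at ability theta."""
    p = p_3pl(theta, a, b, c, D)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def info_per_second(theta: float, a: float, b: float, c: float,
                    mean_seconds: float, D: float = 1.7) -> float:
    """Information divided by the item's mean response time in seconds."""
    return info_3pl(theta, a, b, c, D) / mean_seconds


print(info_3pl(0.0, 1.0, 0.0, 0.0, D=1.0))  # 0.25
```

A nonzero guessing parameter c lowers information near θ = b, which is one reason multiple‐choice items can be less informative per item yet still efficient per second if they are answered quickly.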

17.
Item response theory (IRT) models can be subsumed under the larger class of statistical models with latent variables. IRT models are increasingly used to scale responses from standardized assessments of competencies. The paper summarizes the strengths of IRT in contrast to more traditional techniques as well as to alternative latent variable models (e.g., structural equation modeling). Subsequently, specific limitations of IRT and cases where other methods might be preferable are outlined.

18.
This study examined the effects of conversational language (e.g., asking questions, inviting replies, acknowledgments, referencing others by name, closing signatures, ‘I agree, but’, greetings, etc.) on the frequency and types of responses posted in reply to given types of messages (e.g., argument, evidence, critique, explanation), and how the resulting response patterns support and inhibit collaborative argumentation in asynchronous online discussions. Using event sequence analysis to analyze message-response exchanges in eight online group debates, this study found that (a) arguments elicited 41% more challenges when presented with more conversational language (effect size .32), (b) challenges with more conversational language elicited three to eight times more explanations (effect size .12 to .31), and (c) the amount of supporting evidence elicited by challenges did not differ significantly between challenges with more versus less conversational language. Overall, these and other findings from exploratory post-hoc tests show that conversational language can help to produce patterns of interaction that foster high levels of critical discourse, and that some forms of conversational language are more effective in eliciting responses than others.

19.
As item response theory has been more widely applied, investigating the fit of a parametric model has become an important part of the measurement process. However, promising solutions for detecting model misfit in IRT are lacking. Douglas and Cohen introduced a general nonparametric approach, RISE (Root Integrated Squared Error), for detecting model misfit. The purpose of this study was to extend RISE to more general and comprehensive applications by manipulating a variety of factors (e.g., test length, sample size, IRT model, ability distribution). The results of the simulation study demonstrated that RISE outperformed G² and S‐X² in that it controlled Type I error rates and provided adequate power under the studied conditions. In the empirical study, RISE detected reasonable numbers of misfitting items compared with G² and S‐X², and it gave a much clearer picture of the location and magnitude of misfit for each misfitting item. In addition, replacing the misfitting items detected by the three fit statistics had no practical consequences for classification.
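
RISE compares a nonparametric item characteristic curve with the fitted parametric one, integrating their squared difference over the ability distribution. A minimal sketch, assuming a Gaussian‐kernel (Nadaraya–Watson) smoother for the nonparametric curve, a bandwidth of 0.3, an evaluation grid from −3 to 3, and standard normal ability weights; the abstract does not specify these choices, and the significance test for RISE is not shown.

```python
import math
from typing import List, Sequence

GRID = [-3.0 + 0.1 * k for k in range(61)]  # evaluation points for theta (assumed)


def kernel_icc(theta_hat: Sequence[float], item_scores: Sequence[int],
               grid: Sequence[float] = GRID, bandwidth: float = 0.3) -> List[float]:
    """Nadaraya-Watson (Gaussian kernel) estimate of P(correct | theta) at each grid point."""
    curve = []
    for t in grid:
        k = [math.exp(-0.5 * ((t - th) / bandwidth) ** 2) for th in theta_hat]
        curve.append(sum(ki * yi for ki, yi in zip(k, item_scores)) / sum(k))
    return curve


def rise(np_curve: Sequence[float], param_curve: Sequence[float],
         grid: Sequence[float] = GRID) -> float:
    """Root integrated squared error, weighting grid points by a N(0, 1) density."""
    dens = [math.exp(-0.5 * t * t) for t in grid]
    total = sum(dens)
    sq = sum(d * (u - v) ** 2 for d, u, v in zip(dens, np_curve, param_curve))
    return math.sqrt(sq / total)


# Hypothetical data: ability estimates and 0/1 scores for one item.
theta_hat = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
scores = [0, 0, 1, 0, 1, 1, 1]
np_curve = kernel_icc(theta_hat, scores)
param_curve = [1.0 / (1.0 + math.exp(-t)) for t in GRID]  # assumed fitted 2PL (a=1, b=0)
print(round(rise(np_curve, param_curve), 3))
```

The kernel sums can underflow if grid points lie far from every ability estimate, so the grid should stay within the observed range.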

20.
Sometimes test‐takers may not attempt all items to the best of their ability (with full effort) because of personal factors (e.g., low motivation) or testing conditions (e.g., time limits), resulting in poor performance on certain items, especially those located toward the end of a test. Standard item response theory (IRT) models do not account for such testing behavior. In this study, a new class of mixture IRT models was developed to account for such behavior in dichotomous and polytomous items by assuming that test‐takers belong to multiple latent classes and by adding a decrement parameter to each latent class to describe performance decline. Parameter recovery, the effect of model misspecification, and the robustness of the linearity assumption in performance decline were evaluated using simulations. The parameters of the new models were recovered fairly well using the freeware WinBUGS; failing to account for such behavior by fitting standard IRT models resulted in overestimation of the difficulty parameters of items located toward the end of the test and overestimation of test reliability; and the linearity assumption in performance decline was rather robust. An empirical example is provided to illustrate the applications and implications of the new class of models.
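
A minimal sketch of how a class‐specific decrement can enter the response probability and how class membership follows from Bayes' rule. The parametrization is hypothetical: a Rasch model in which effective ability drops linearly with item position, from no penalty on the first item to the class's full decrement on the last, evaluated at a fixed θ rather than integrating over θ as a full mixture model would. It does not reproduce the article's models or its WinBUGS estimation.

```python
import math
from typing import List, Sequence


def p_correct(theta: float, b: float, position: int, n_items: int, decrement: float) -> float:
    """Rasch probability with a hypothetical linear decrement by item position."""
    # Penalty grows linearly from 0 at the first item to `decrement` at the last item.
    penalty = decrement * position / (n_items - 1) if n_items > 1 else 0.0
    return 1.0 / (1.0 + math.exp(-(theta - penalty - b)))


def pattern_likelihood(x: Sequence[int], b_items: Sequence[float],
                       theta: float, decrement: float) -> float:
    """Likelihood of a 0/1 response pattern for one latent class at a fixed theta."""
    n = len(x)
    like = 1.0
    for pos, (xj, bj) in enumerate(zip(x, b_items)):
        p = p_correct(theta, bj, pos, n, decrement)
        like *= p if xj == 1 else 1.0 - p
    return like


def class_posteriors(x: Sequence[int], b_items: Sequence[float], theta: float,
                     decrements: Sequence[float], mix_weights: Sequence[float]) -> List[float]:
    """Posterior probability of each latent class (Bayes rule over the mixture)."""
    joint = [w * pattern_likelihood(x, b_items, theta, d)
             for d, w in zip(decrements, mix_weights)]
    total = sum(joint)
    return [j / total for j in joint]


items = [0.0] * 6
# Class 0: no decline; class 1: decline of 3 logits by the last item (both assumed values).
print(class_posteriors([1, 1, 1, 0, 0, 0], items, 0.0, [0.0, 3.0], [0.5, 0.5]))
```

A pattern that fails mostly at the end of the test shifts posterior mass toward the declining class, while the reverse pattern favors the no‐decline class.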


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Registration No. 09084417