1.
2.
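Research on a Feature Selection Algorithm for Text Classification Based on Statistical Frequency* Total citations: 1, self-citations: 0, citations by others: 1
By analyzing the shortcomings of the χ² statistic (Chi-square, CHI), namely that it is unreliable for feature terms with low document frequency and cannot indicate the correlation between a term and a category, this paper improves on it and proposes the Statistical Frequency (SF) algorithm. Experimental results show that the SF algorithm compensates for these shortcomings and achieves good performance in text classification.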
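For context, the classic χ² score that this abstract criticizes can be computed from a 2×2 term/category table. The sketch below uses the textbook formula only; it is not the paper's SF algorithm, which the abstract does not specify. The function name `chi_square_term` and all counts are illustrative assumptions.

```python
def chi_square_term(a: int, b: int, c: int, d: int) -> float:
    """Textbook chi-square score for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that do not contain the term
    d: documents outside the category that do not contain the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A zero marginal means the table carries no usable signal.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


# Hypothetical counts: a term that mostly appears inside the category ...
positive = chi_square_term(a=40, b=10, c=10, d=40)
# ... and the mirror-image term that mostly appears outside it.
negative = chi_square_term(a=10, b=40, c=40, d=10)
print(positive, negative)
```

Both calls return the same score because the `(a * d - c * b)` term is squared, so the statistic alone does not say whether a term is positively or negatively associated with the category. This is one possible reading of the second shortcoming named in the abstract.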
3.
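A Quantitative Study of the Quality of University Examination Questions Total citations: 1, self-citations: 1, citations by others: 0
Examination question quality refers to the degree to which the sample selected by the test designers measures the attributes of the test subjects; concretely, it is reflected in the difficulty, reliability and validity, and discrimination of the questions written by teachers. This study randomly selected 16 examination courses from 8 departments of Hexi University in the second semester of the 2007-2008 academic year as the research sample and analyzed the examination results statistically. The results show: (1) final-exam scores and overall-assessment scores are mostly normally distributed, while some score distributions are asymmetric and leptokurtic; (2) for final-exam grade levels, science courses have higher proportions of excellent and good grades than liberal arts courses at the same levels, and science courses also have a higher proportion of failing grades. For overall-assessment grade levels, science courses again have higher proportions of excellent and good grades, but a lower proportion of failing grades than liberal arts courses, which reflects the arbitrary influence of regular coursework marks on overall-assessment scores; (3) the difficulty coefficients of the final-exam questions are high, meaning the questions are relatively easy, and the chi-square test between discipline (liberal arts versus science) and difficulty-coefficient level is not statistically significant; (4) the discrimination of the final-exam questions is on the low side, and the chi-square test between discipline and discrimination-coefficient level is not statistically significant.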
4.
Mary L. McHugh 《Biochemia medica : časopis Hrvatskoga društva medicinskih biokemičara / HDMB》2013,23(2):143-149
The Chi-square statistic is a non-parametric (distribution-free) tool designed to analyze group differences when the dependent variable is measured at a nominal level. Like all non-parametric statistics, the Chi-square is robust with respect to the distribution of the data. Specifically, it does not require equality of variances among the study groups or homoscedasticity in the data. It permits evaluation of both dichotomous independent variables and multiple-group studies. Unlike many other non-parametric and some parametric statistics, the calculations needed to compute the Chi-square provide considerable information about how each of the groups performed in the study. This richness of detail allows the researcher to understand the results and thus to derive more detailed information from this statistic than from many others. The Chi-square is a significance statistic and should be followed with a strength statistic. Cramér's V is the most common strength test used when a significant Chi-square result has been obtained. Advantages of the Chi-square include its robustness with respect to the distribution of the data, its ease of computation, the detailed information that can be derived from the test, its use in studies for which parametric assumptions cannot be met, and its flexibility in handling data from both two-group and multiple-group studies. Limitations include its sample size requirements, difficulty of interpretation when there are large numbers of categories (20 or more) in the independent or dependent variables, and the tendency of Cramér's V to produce relatively low correlation measures, even for highly significant results.
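The pairing of a significance statistic with a strength statistic described above can be sketched as follows. This is a minimal illustration using the standard Pearson χ² and Cramér's V formulas without continuity correction; the function name `chi_square_and_cramers_v` and the 2×2 counts are assumptions for the example, not data from the article.

```python
import math


def chi_square_and_cramers_v(table):
    """Pearson chi-square and Cramer's V for an r x c table of counts.

    Assumes every row total and column total is non-zero.
    """
    rows, cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Cramer's V rescales chi-square by sample size and table shape.
    v = math.sqrt(chi2 / (n * (min(rows, cols) - 1)))
    return chi2, v


# Hypothetical 2 x 2 table: two groups (rows) by two outcomes (columns).
chi2, v = chi_square_and_cramers_v([[30, 10], [20, 40]])
print(round(chi2, 3), round(v, 3))
```

For this table χ² is about 16.67 on 1 degree of freedom, well above the 3.84 critical value at the 0.05 level, while Cramér's V is only about 0.41. The per-cell terms inside the loop are the "detailed information" the abstract refers to: each one shows how far a group's observed count departs from its expected count.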
5.
6.
7.
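Information hiding and detection techniques are becoming increasingly important and widely applied; the goal of steganalysis is to detect information hidden in media. Focusing on hidden-information detection for JPEG image files, this paper introduces the principles and weaknesses of several typical steganographic methods. Through study and analysis of the relevant hiding algorithms, it discusses methods for detecting hidden information based on statistical properties, and the conclusion presents the related experimental results.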
8.
Previous researchers have attempted to detect significant topics in news stories and blogs through the use of word frequency-based methods applied to RSS feeds. In this paper, three statistical feature selection methods, χ², Mutual Information (MI), and Information Gain (I), are proposed as alternative approaches for ranking term significance in an evolving RSS feed corpus. The extent to which the three methods agree with each other in determining the degree of significance of a term on a certain date is investigated, as well as the assumption that larger values tend to indicate more significant terms. An experimental evaluation was carried out with 39 different levels of data reduction to evaluate the three methods for differing degrees of significance. The three methods showed a significant degree of disagreement for a number of terms assigned an extremely large value. Hence, the assumption that the larger a value, the higher the degree of significance of a term should be treated cautiously. Moreover, MI and I show significant disagreement. This suggests that MI differs in the way it ranks significant terms, as MI does not take the absence of a term into account, although I does. I, however, has a higher degree of term reduction than MI and χ². This can result in losing some significant terms. In summary, χ² seems to be the best method to determine term significance for RSS feeds, as χ² identifies both types of significant behavior. The χ² method, however, is far from perfect, as an extremely high value can be assigned to relatively insignificant terms.
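To make the contrast between MI and I concrete, the sketch below computes both from the same 2×2 term/category counts. It uses common textbook definitions (pointwise mutual information of "term present" with the category, and information gain over term presence and absence), which may differ in detail from the formulas used in the paper. The function names and counts are illustrative assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pointwise_mi(a, b, c, d):
    """log2 P(term, cat) / (P(term) P(cat)); uses only the 'term present' cell.

    a, b: term present inside / outside the category (a must be > 0)
    c, d: term absent inside / outside the category
    """
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


def information_gain(a, b, c, d):
    """Reduction in category entropy from knowing whether the term occurs.

    Assumes the term is present in some documents and absent in others.
    """
    n = a + b + c + d
    h_class = entropy([(a + c) / n, (b + d) / n])
    h_present = entropy([a / (a + b), b / (a + b)])
    h_absent = entropy([c / (c + d), d / (c + d)])
    return h_class - ((a + b) / n * h_present + (c + d) / n * h_absent)


# Hypothetical counts: a term concentrated inside the category ...
mi_in, ig_in = pointwise_mi(40, 10, 10, 40), information_gain(40, 10, 10, 40)
# ... and the mirror-image term concentrated outside it.
mi_out, ig_out = pointwise_mi(10, 40, 40, 10), information_gain(10, 40, 40, 10)
print(mi_in, ig_in, mi_out, ig_out)
```

With these counts MI changes sign between the two terms (about 0.68 versus about -1.32 bits), while information gain is the same for both (about 0.28 bits), because it averages over both the presence and the absence of the term. Under these definitions the two scores can therefore rank the same terms differently.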
9.
10.
Although Chi-square analysis of contingency tables is almost always taught in introductory courses, students often show little understanding of it. The authors here introduce a method to remedy this.