18 similar documents found (search time: 187 ms)
1.
2.
3.
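By analyzing the characteristics of speech feature parameters and the basic methods of speaker recognition, a voice-controlled lock was implemented on a Sunplus microcontroller, using linear prediction cepstral coefficients (LPCC) for feature extraction and a hidden Markov model (HMM) for modeling. Experimental results show that the system is stable and achieves good recognition performance.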
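As a rough illustration of the LPCC feature named in this abstract, the sketch below converts LPC coefficients into cepstral coefficients with the standard recursion. It assumes the all-pole convention H(z) = 1 / (1 - Σ a_k z^-k); the function name `lpc_to_cepstrum` and its arguments are hypothetical, and nothing here is taken from the paper's own implementation.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a[0..p-1] (a_1..a_p) to n_ceps cepstral coefficients.

    Assumes the model H(z) = 1 / (1 - sum_k a_k z^-k); the gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term)
    for n in range(1, n_ceps + 1):
        # Direct contribution a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive contribution: sum over k of (k/n) * c_k * a_{n-k}.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


if __name__ == "__main__":
    # For a first-order model the cepstrum has the closed form c_n = a^n / n.
    print(lpc_to_cepstrum([0.5], 3))
```

For a single-pole model the recursion can be checked against the closed form c_n = a^n / n, which is a convenient sanity test before feeding real LPC frames.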
4.
5.
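A Speaker Recognition Method Based on Improved Cepstral Analysis of LPC Parameters (cited by: 2; self-citations: 0; citations by others: 2)
Linear prediction cepstral coefficients (LPCC) are widely used in speaker recognition. This paper applies a Mel transform to the LPCC to obtain a new feature parameter, LPMCC, and uses it as the feature parameter of a speaker recognition system. The recognition stage combines VQ and HMM for modeling and recognition. Experiments show that the method improves the system's recognition rate.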
6.
7.
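This paper introduces the basic principles of speech recognition, along with principles and methods for implementing speech recognition algorithms on the DSK6713, and describes the techniques for implementing speech recognition on a DSP. The system uses Mel-frequency cepstral coefficients (MFCC) as feature parameters and matches speech parameters with dynamic time warping (DTW), an algorithm that is relatively simple and computationally inexpensive. The DTW algorithm was first simulated in MATLAB, and the speech recognition technique was then ported to the DSP. Experimental results show fairly good recognition performance for speaker-dependent, small-vocabulary, isolated-word speech.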
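The DTW matching step mentioned in this abstract can be sketched as follows. This is a minimal textbook version, assuming Euclidean frame distance and the basic three-way step pattern with no path constraints; the names `dtw_distance` and `recognize` are hypothetical and do not come from the paper, which used MATLAB and a DSP.

```python
import math


def dtw_distance(x, y):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(x), len(y)
    inf = float("inf")
    # D[i][j] = best accumulated cost aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def recognize(templates, utterance):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(templates[label], utterance))


if __name__ == "__main__":
    # Toy one-dimensional "feature frames" standing in for MFCC vectors.
    templates = {"open": [[1.0], [2.0], [3.0]], "close": [[3.0], [2.0], [1.0]]}
    print(recognize(templates, [[1.0], [2.0], [2.0], [3.0]]))
```

Isolated-word recognition then reduces to keeping one or more templates per word and picking the label with the smallest accumulated cost, which suits small vocabularies on constrained hardware.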
8.
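To support speech recognition in strong-noise environments, a recognition algorithm based on Mel cepstral coefficient features and hidden Markov models was studied. Preliminary work covered extraction of linear prediction coefficients from the speech signal, endpoint detection, speech feature parameter extraction, and the recognition algorithm flow, and a speaker recognition system was verified by simulation.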
9.
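Speech recognition technology has made encouraging progress, and many relatively mature speech recognition products have reached the market. Most speech recognition systems, however, remain limited to specific environments and are still far from truly practical. This work carries out experiments and research aimed at improving the robustness of speech recognition systems.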
10.
11.
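In noise-robust speech recognition research, models obtained with parallel model combination (PMC) can in theory approach the performance of models matched to the noise environment, which has made PMC an important research direction. This paper first proposes MFCC_FWD_BWD, a feature based on forward and backward difference dynamic parameters, which satisfies PMC's requirement that the feature construction matrix be invertible. On this basis, it proposes a new model for PMC, the parallel sub-state hidden Markov model (PSSHMM), in which each state contains sub-states arranged in parallel, with transitions between the sub-states. Experiments show that PSSHMM achieves good recognition results across a range of noise types and SNRs, and that its robustness is especially pronounced for non-stationary noise.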
12.
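With the development of speech recognition technology, isolated-word, small-vocabulary speech recognition systems have come into wide everyday use. This paper proposes a DSP-based real-time isolated-word speech recognition system whose recognition algorithm uses dynamic time warping. In line with the characteristics of building control systems and the BACnet network protocol, the system is designed as an embedded subsystem of a BACnet device, bringing speech recognition to building control. Combining fast hardware with an efficient algorithm, it enables more responsive and convenient control of buildings.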
13.
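Mel-frequency cepstral coefficients (MFCC), which reflect how humans perceive speech, are used as the speech feature parameters, and an MFCC-based VQ recognition method is studied. The recognition rate using MFCC alone is compared with that using MFCC combined with ΔMFCC. Experimental results show that, after cepstral liftering of the speaker's feature parameters, combining MFCC with ΔMFCC distinguishes different speakers better.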
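To make the ΔMFCC feature in this abstract concrete, the sketch below computes delta (first-order dynamic) coefficients with the commonly used regression formula over a window of ±N frames. The window size N = 2, the edge handling (repeating the first and last frame), and the function name `delta` are assumptions for illustration; the abstract does not say which variant the authors used.

```python
def delta(frames, N=2):
    """Delta coefficients for a list of feature frames (each a list of floats).

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with frame indices clamped at the sequence edges.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(T - 1, t + n)][d]  # clamp at the last frame
                prv = frames[max(0, t - n)][d]      # clamp at the first frame
                num += n * (nxt - prv)
            row.append(num / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    mfcc = [[0.0], [1.0], [2.0], [3.0], [4.0]]  # toy one-coefficient frames
    d = delta(mfcc)
    # Concatenate static and dynamic parts into one feature vector per frame.
    combined = [s + dd for s, dd in zip(mfcc, d)]
    print(combined[2])
```

Appending the delta vector to each static MFCC frame gives the combined MFCC + ΔMFCC feature that would then be passed to the VQ codebook.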
14.
In this paper we present novel ensemble classifier architectures and investigate their influence on offline cursive character recognition. Cursive characters are represented by feature sets that portray different aspects of character images for recognition purposes. The recognition accuracy can be improved by training an ensemble of classifiers on the feature sets. Given the feature sets and the base classifiers, we have developed multiple ensemble classifier compositions under four architectures. The first three architectures are based on the use of multiple feature sets, whereas the fourth architecture is based on the use of a unique feature set. Type-1 architecture is composed of homogeneous base classifiers and Type-2 architecture is constructed using heterogeneous base classifiers. Type-3 architecture is based on hierarchical fusion of decisions. In Type-4 architecture a unique feature set is learned by a set of homogeneous base classifiers with different learning parameters. The experimental results demonstrate that the recognition accuracy achieved using the proposed ensemble classifier (with the best composition of base classifiers and feature sets) is better than the existing recognition accuracies for offline cursive character recognition.
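The abstract does not state which rule fuses the base classifiers' decisions. Purely as an illustration of decision-level fusion, the sketch below uses plain majority voting, which is an assumption and not necessarily the paper's method; `majority_vote` is a hypothetical helper name.

```python
from collections import Counter


def majority_vote(predictions):
    """Fuse per-classifier label lists into one label list by majority vote.

    predictions[i][j] is the label classifier i assigns to sample j.
    Ties go to the label seen first among the tied ones.
    """
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused


if __name__ == "__main__":
    # Three hypothetical base classifiers, each trained on a different feature set.
    preds = [
        ["a", "b", "c"],
        ["a", "b", "b"],
        ["x", "b", "c"],
    ]
    print(majority_vote(preds))
```

A hierarchical scheme could apply the same fusion in stages, first within groups of classifiers and then across the group decisions.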
15.
Using an acoustic vector sensor (AVS), an efficient method has been presented recently for direction of arrival (DOA) estimation of multiple speech sources via the clustering of the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of the AVS-DOA method is largely dependent on the reliable extraction of the target speech dominated time–frequency points (TD-TFPs), which, however, may be degraded as the level of additive noise and room reverberation in the background increases. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners, namely a deep neural network (DNN) and a DNN cascaded with a support vector machine (DNN-SVM), for multi-source DOA estimation, where a novel feature, namely the tandem local spectrogram block (TLSB), is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate the weighted centers of the ISDR clusters for better DOA estimation as compared to the original center used in our previously proposed AVS-ISDR. Extensive experiments on simulated and recorded data are presented to show the improved performance of our proposed methods over two baseline AVS-DOA methods in the presence of noise and reverberation.
16.
Language modeling (LM), providing a principled mechanism to assign quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploring either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models both in terms of perplexity and recognition error rate reductions.
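For readers unfamiliar with the baseline and the metric named in this abstract, the following sketch trains a bigram model and computes perplexity on a token sequence. It assumes add-one (Laplace) smoothing and a toy corpus, both chosen only for illustration; the function names are hypothetical, and the relevance-based framework of the paper is not reproduced here.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Collect bigram counts, context counts, and the vocabulary."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])
    vocab = set(tokens)
    return bigrams, contexts, vocab


def bigram_prob(model, prev, word):
    """P(word | prev) with add-one smoothing (an illustrative assumption)."""
    bigrams, contexts, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def perplexity(model, tokens):
    """Perplexity = exp of the negative mean log-probability per predicted word."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(bigram_prob(model, p, w)) for p, w in pairs)
    return math.exp(-log_sum / len(pairs))


if __name__ == "__main__":
    model = train_bigram(["a", "b", "a", "b"])
    print(perplexity(model, ["a", "b"]))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why it is commonly reported alongside recognition error rate.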
17.
18.
Emotional expression and understanding are normal instincts of human beings, but automatic emotion recognition from speech without referring to any language or linguistic information remains an open problem. The limited size of existing emotional data samples and their relatively high dimensionality have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on data preprocessing techniques that aim to extract the most effective acoustic features to improve the performance of emotion recognition. A novel algorithm is presented that can be applied to a small data set with a large number of features. The presented algorithm integrates the advantages of a decision tree method and the random forest ensemble. Experimental results on a series of Chinese emotional speech data sets indicate that the presented algorithm achieves improved emotion recognition results and outperforms the commonly used Principal Component Analysis (PCA) and Multi-Dimensional Scaling (MDS) methods, as well as the more recently developed ISOMap dimensionality reduction method.