
1.

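于震 王朝立 刘伶俐 《中国科技信息》 2013, (1): 49+56

Through comparative experiments, this paper improves the feature parameters used for speech recognition. Its contribution is a modified set of Mel-frequency cepstral coefficients (MFCC): the 12th-order MFCC vector is reduced to 11 orders, and experiments show that the modified parameters raise the recognition rate. The experiments delete individual feature components to study how much each MFCC order contributes to speaker-independent recognition of specific utterances. Extensive repeated experiments confirm that different parameter choices contribute differently to recognition, and that the contributions also differ across speech models.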
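
The component-deletion experiment described above can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the abstract does not say which order was removed, so the index used here, the helper name `drop_coefficient`, and the placeholder feature values are all assumptions.

```python
def drop_coefficient(frames, index):
    """Return a copy of each feature frame with the coefficient at `index` removed."""
    return [frame[:index] + frame[index + 1:] for frame in frames]


# Two hypothetical 12-dimensional MFCC frames (placeholder values, not real features).
frames = [
    [float(i) for i in range(12)],
    [float(i) * 0.5 for i in range(12)],
]

# Dropping the last order is an arbitrary choice made for illustration only.
reduced = drop_coefficient(frames, 11)
print(len(frames[0]), len(reduced[0]))
```

Repeating the recognition experiment once per deleted index would give a per-order contribution estimate of the kind the abstract mentions.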

2.

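Research on an MFCC-Based Speaker Recognition System

《黑龙江科技信息》 2015, (27)

Speaker recognition is one of the current research hotspots in speech recognition. This paper studies a speaker recognition system that extracts Mel-frequency cepstral coefficients (MFCC), which reflect how humans perceive speech, as its feature parameters. It also analyzes the probabilistic neural network (PNN), a neural network that performs well at classification. Experimental results show that the PNN classifies the training speech samples with high accuracy.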
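
As a rough illustration of the PNN idea, the sketch below scores each class by the average Gaussian kernel response of its training vectors and returns the best-scoring label. The function name `pnn_classify`, the smoothing width `sigma`, and the two-dimensional toy vectors are assumptions made for this example; the paper's actual network configuration is not given in the abstract.

```python
import math


def pnn_classify(x, training, sigma=0.5):
    """Classify x with a Parzen-window (PNN-style) decision rule.

    training maps each label to a list of feature vectors.
    """
    best_label, best_score = None, -1.0
    for label, samples in training.items():
        score = 0.0
        for s in samples:
            # Squared Euclidean distance between the input and one pattern unit.
            d2 = sum((a - b) ** 2 for a, b in zip(x, s))
            score += math.exp(-d2 / (2.0 * sigma ** 2))
        score /= len(samples)  # summation layer: average kernel response per class
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical two-dimensional features standing in for MFCC vectors.
training = {
    "speaker_a": [[0.0, 0.0], [0.2, 0.1]],
    "speaker_b": [[2.0, 2.0], [1.8, 2.1]],
}
print(pnn_classify([0.1, 0.0], training))
```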

3.

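Design of an Embedded Voice-Controlled Door Lock Based on a Sunplus (凌阳) Microcontroller

徐春辉 《科技广场》 2007, (5): 208-210

Based on an analysis of the characteristics of speech feature parameters and the basic methods of speaker recognition, the system uses linear prediction cepstral coefficients for feature extraction and a hidden Markov model for modeling, with a Sunplus microcontroller as the hardware platform, to implement voice control of a door lock. Experimental results show that the system is stable and recognizes well.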

4.

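Application of Perceptual Weighting Filtering to Feature Extraction for Amdo Tibetan

《科技通报》 2016, (8)

Feature extraction is one of the key steps in speech recognition, and its accuracy determines the recognition rate. Different speech signals call for different feature extraction methods. Targeting the phonetic characteristics of Amdo Tibetan, this paper performs linear prediction analysis, passes the linear prediction residual through a perceptual weighting filter, and then re-extracts the features, which yields higher accuracy and better robustness.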

5.

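A Speaker Recognition Method Based on Improved LPC Cepstral Analysis (total citations: 2; self-citations: 0; citations by others: 2)

王婧 朱黎 《大众科技》 2008, (8): 28-29

Linear prediction cepstral coefficients (LPCC) are widely used in speaker recognition. This article applies a Mel transform to the LPCC to obtain a new feature parameter, LPMCC, and uses it as the feature parameter of a speaker recognition system. The recognition stage combines VQ and HMM for modeling and recognition. Experiments show that the method improves the system's recognition rate.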
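
For reference, the LPCC step that precedes the Mel transform can be sketched with the standard LPC-to-cepstrum recursion. The sketch assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k), ignores the gain term, and omits the Mel warping that produces LPMCC; the function name `lpc_to_cepstrum` is chosen for this example and is not taken from the article.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a = [a_1, ..., a_p] to n_ceps cepstral coefficients.

    Assumes H(z) = 1 / (1 - sum_k a_k z^-k); the gain term c_0 is not computed.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # Direct term a_n exists only while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k / n) * c_k * a_(n-k), with 1 <= n-k <= p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


# First-order check: for a single pole a_1, the cepstrum is c_n = a_1**n / n.
print(lpc_to_cepstrum([0.5], 3))
```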

6.

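Research on Tibetan Feature Extraction for Voiceprint Recognition

《西藏科技》 2016, (11)

Tibetan is the main language of the Tibetan people, so voiceprint recognition technology built on it has considerable research significance. In voiceprint recognition, the choice and accuracy of the speech feature parameters directly affect recognition accuracy. To meet the needs of Tibetan voiceprint recognition, this article selects MFCC as the feature parameter and studies and implements feature extraction for Tibetan speech.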

7.

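Research on a DSP-Based Speech Recognition System

翟片富 景新幸 《大众科技》 2013, (12): 16-18

This article introduces the basic principles of speech recognition and some principles and methods for implementing speech recognition algorithms on the DSK6713, and describes the techniques for implementing speech recognition on a DSP. The system uses Mel-frequency cepstral coefficients (MFCC) as feature parameters and matches speech parameters with dynamic time warping (DTW), an algorithm that is relatively simple and computationally cheap. The DTW algorithm is first simulated in MATLAB, and the recognizer is then ported to the DSP. Experimental results show good recognition performance for speaker-dependent, small-vocabulary, isolated-word speech.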
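
The DTW matching step mentioned above can be sketched as a textbook dynamic program. This is only an illustration written in Python rather than the MATLAB or DSP code the article describes; the function name `dtw_distance`, the Euclidean local distance, the unconstrained step pattern, and the one-dimensional toy frames are all assumptions.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] is the best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.sqrt(
                sum((x - y) ** 2 for x, y in zip(seq_a[i - 1], seq_b[j - 1]))
            )
            # Allowed predecessors: insertion, deletion, or a diagonal match.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


# Hypothetical one-dimensional "features": a template and a time-stretched copy.
template = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [2.0], [2.0]]
other = [[5.0], [5.0], [5.0]]
print(dtw_distance(template, stretched), dtw_distance(template, other))
```

In an isolated-word recognizer of this kind, the input utterance would be compared against one stored template per vocabulary word and the word with the smallest accumulated cost would be selected.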

8.

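Research on a Speaker Recognition System Based on Mel Cepstral Coefficients and Hidden Markov Models

夏晶 《黑龙江科技信息》 2012, (30): 4-6

To support speech recognition in strongly noisy environments, this work studies a recognition algorithm based on Mel cepstral coefficient features and hidden Markov models. It presents a preliminary study of linear prediction coefficient extraction, endpoint detection, speech feature parameter extraction, and the recognition algorithm flow, and validates a speaker recognition system in simulation.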

9.

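Research on the Robustness of Neural-Network-Based Speech Recognition

朱海涛 《中国科技信息》 2008, 37(5): 276-277

Speech recognition technology has made encouraging progress, and many relatively mature speech recognition products have reached the market. Most speech recognition systems, however, are still confined to specific environments and remain far from truly practical. This paper aims to improve the robustness of speech recognition systems and reports related experiments and research.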

10.

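Design of a DSP-Based Speech Recognition System

李俊 周海滨 邱胜林 张艳珍 蔡晓燕 《科技广场》 2011, (7): 118-122

The system is built around the 16-bit digital signal processor TMS320VC5502 and uses the TLV320AIC23 audio codec chip to capture and encode the speech signal. Endpoint detection, feature parameter extraction, and the DTW algorithm are the key techniques used to achieve speaker-dependent, small-vocabulary, isolated-word recognition. The recognized digit, 0 to 9, is finally checked by the number of LED flashes.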
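
Endpoint detection is often done with a short-time energy threshold, and a minimal sketch of that idea follows. The abstract does not say which detector the system uses, so the energy criterion, the function name `detect_endpoints`, the frame length, and the threshold are all assumptions for illustration.

```python
def detect_endpoints(samples, frame_len, threshold):
    """Return (start, end) sample indices of the active region, or None if silent.

    A frame counts as active when its short-time energy exceeds `threshold`.
    """
    energies = [
        sum(s * s for s in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    # The end index is exclusive: one past the last sample of the last active frame.
    return active[0] * frame_len, (active[-1] + 1) * frame_len


# Synthetic signal: silence, a burst of alternating samples, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
print(detect_endpoints(signal, frame_len=4, threshold=0.5))
```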

11.

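Research on Robust Acoustic Models Based on the PMC Method

张明新 倪宏 张东滨 陈国平 《中国科学院研究生院学报》 2006, 23(5): 660-664

In noise-robust speech recognition, models obtained with parallel model combination (PMC) can in theory approach the performance of models matched to the noisy environment, which has made PMC an important research direction. This paper first proposes a feature based on forward and backward difference dynamic parameters, MFCC_FWD_BWD, which satisfies PMC's requirement that the feature construction matrix be invertible. Building on this, it proposes a new model for PMC, the parallel sub-state hidden Markov model (PSSHMM), in which each state contains parallel sub-states with transitions between them. Experiments show that PSSHMM achieves good recognition results across a range of noise types and SNRs, and that its robustness is especially pronounced for non-stationary noise.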

12.

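Application of Speech Recognition Technology in Building Automation Systems

郭莉莉 《科技广场》 2010, (1): 150-153

With the development of speech recognition technology, isolated-word, small-vocabulary recognition systems have come into wide everyday use. This paper proposes a DSP-based real-time isolated-word speech recognition system whose recognition algorithm uses dynamic time warping. Given the characteristics of building control systems and the BACnet network protocol, the system is designed as an embedded subsystem of a BACnet device, bringing speech recognition into building control. Combining fast hardware with an efficient algorithm, it makes building control more responsive and convenient.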

13.

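Research on VQ-Based Voiceprint Recognition

张旺俏 《中国科技信息》 2007, 28(7): 124-125, 127

Mel-frequency cepstral coefficients (MFCC), which reflect how humans perceive speech, are used as the speech feature parameters, and an MFCC-based VQ recognition method is studied. The recognition rate with MFCC alone is compared with that of MFCC combined with ΔMFCC. Experimental results show that, after cepstral liftering of the speaker feature parameters, the combination of MFCC and ΔMFCC distinguishes different speakers better.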
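
Combining static and dynamic features of this kind can be sketched as below. The example uses a simple central difference over neighboring frames with edge replication; the abstract does not specify how ΔMFCC was computed, so this formula, the function names `delta` and `append_delta`, and the toy one-dimensional frames are assumptions.

```python
def delta(frames):
    """First-order difference features: (next frame - previous frame) / 2.

    The first and last frames are replicated at the edges.
    """
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def append_delta(frames):
    """Concatenate each static frame with its delta to form the combined feature."""
    return [f + d for f, d in zip(frames, delta(frames))]


# Hypothetical one-dimensional static features for three frames.
static = [[0.0], [1.0], [4.0]]
print(append_delta(static))
```

The combined vectors would then be fed to the VQ codebook training in place of the static MFCC frames.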

14.

Effect of ensemble classifier composition on offline cursive character recognition

Ashfaqur Rahman, Brijesh Verma 《Information Processing & Management》 2013

In this paper we present novel ensemble classifier architectures and investigate their influence on offline cursive character recognition. Cursive characters are represented by feature sets that portray different aspects of character images for recognition purposes. The recognition accuracy can be improved by training an ensemble of classifiers on the feature sets. Given the feature sets and the base classifiers, we have developed multiple ensemble classifier compositions under four architectures. The first three architectures are based on the use of multiple feature sets, whereas the fourth architecture is based on the use of a unique feature set. Type-1 architecture is composed of homogeneous base classifiers and Type-2 architecture is constructed using heterogeneous base classifiers. Type-3 architecture is based on hierarchical fusion of decisions. In Type-4 architecture a unique feature set is learned by a set of homogeneous base classifiers with different learning parameters. The experimental results demonstrate that the recognition accuracy achieved using the proposed ensemble classifier (with the best composition of base classifiers and feature sets) is better than the existing recognition accuracies for offline cursive character recognition.

15.

Learning soft mask with DNN and DNN-SVM for multi-speaker DOA estimation using an acoustic vector sensor

Disong Wang, Yuexian Zou, Wenwu Wang 《Journal of The Franklin Institute》 2018, 355(4): 1692-1709

Using an acoustic vector sensor (AVS), an efficient method has been presented recently for direction of arrival (DOA) estimation of multiple speech sources via the clustering of the inter-sensor data ratio (AVS-ISDR). Through extensive experiments on simulated and recorded data, we observed that the performance of the AVS-DOA method is largely dependent on the reliable extraction of the target speech dominated time–frequency points (TD-TFPs), which, however, may be degraded as the level of additive noise and room reverberation in the background increases. In this paper, inspired by the great success of deep learning in speech recognition, we design two new soft mask learners, namely a deep neural network (DNN) and a DNN cascaded with a support vector machine (DNN-SVM), for multi-source DOA estimation, where a novel feature, namely the tandem local spectrogram block (TLSB), is used as the input to the system. Using our proposed soft mask learners, the TD-TFPs can be accurately extracted under different noisy and reverberant conditions. Additionally, the generated soft masks can be used to calculate the weighted centers of the ISDR-clusters for better DOA estimation as compared to the original center used in our previously proposed AVS-ISDR. Extensive experiments on simulated and recorded data have been presented to show the improved performance of our proposed methods over two baseline AVS-DOA methods in the presence of noise and reverberation.

16.

Leveraging relevance cues for language modeling in speech recognition

Berlin Chen, Kuan-Yu Chen 《Information Processing & Management》 2013

Language modeling (LM), providing a principled mechanism to associate quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploring either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models both in terms of perplexity and recognition error rate reductions.

17.

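Mandarin Chinese Prosody Synthesis Based on the TD-PSOLA Algorithm (total citations: 6; self-citations: 0; citations by others: 6)

张后旗 俞振利 张礼和 《科技通报》 2002, 18(1): 6-9, 13

Drawing on the prosodic features of Mandarin Chinese, the TD-PSOLA algorithm is used to implement Mandarin prosody synthesis, and the prosodic parameters of the synthesized and original speech are compared and analyzed. Experimental results show that the method controls the prosodic parameters of speech effectively and achieves fairly high-quality prosody synthesis.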

18.

Acoustic feature selection for automatic emotion recognition from speech (total citations: 1; self-citations: 0; citations by others: 1)

Jia Rong, Gang Li, Yi-Ping Phoebe Chen 《Information Processing & Management》 2009

Emotional expression and understanding are normal instincts of human beings, but automatic emotion recognition from speech without referring to any language or linguistic information remains an open problem. The limited size of existing emotional data samples, and the relatively high dimensionality, have outstripped many dimensionality reduction and feature selection algorithms. This paper focuses on the data preprocessing techniques which aim to extract the most effective acoustic features to improve the performance of emotion recognition. A novel algorithm is presented in this paper, which can be applied to a small data set with a high number of features. The presented algorithm integrates the advantages of a decision tree method and the random forest ensemble. Experimental results on a series of Chinese emotional speech data sets indicate that the presented algorithm can achieve improved results on emotion recognition, and outperform the commonly used Principal Component Analysis (PCA)/Multi-Dimensional Scaling (MDS) methods, and the more recently developed ISOMap dimensionality reduction method.