A Context Clustering Technique for Average Voice Models(1)

3. Shared Decision Tree Context Clustering

3.1 Training of Average Voice Model

  1. A block diagram of the training stage of the average voice model using the proposed technique is shown in Fig. 2. First, context-dependent models without context clustering are trained separately for the respective speakers in order to derive a decision tree for context clustering common to these speaker-dependent models. Then this decision tree, which we refer to as a shared decision tree, is constructed from the speaker-dependent models using the algorithm described in Sect. 3.3. Next, all speaker-dependent models are clustered using the shared decision tree, and a Gaussian pdf of the average voice model is obtained by combining all speakers' Gaussian pdfs at every node of the tree. After reestimating the parameters of the average voice model using the training data of all speakers, state duration distributions are obtained for each speaker. Finally, the state duration distributions of the average voice model are obtained by applying the same procedure.

    1. Fig. 2 shows the proposed technique used to train the average voice model
    2. Context-dependent models, without context clustering, are trained separately for each speaker
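The key combination step above, merging all speakers' Gaussian pdfs into one average-voice Gaussian at each tree node, can be sketched as an occupancy-weighted moment match. This is a minimal illustration, not the paper's code; the function name, the diagonal-covariance assumption, and the use of per-speaker occupancy counts as weights are my assumptions.

```python
import numpy as np

def combine_gaussians(means, variances, occupancies):
    """Combine per-speaker diagonal Gaussians (mu_s, var_s), weighted by
    state occupancy counts gamma_s, into a single average-voice Gaussian.
    Hypothetical helper sketching the per-node combination step."""
    means = np.asarray(means, dtype=float)              # shape (S, D)
    variances = np.asarray(variances, dtype=float)      # shape (S, D)
    gamma = np.asarray(occupancies, dtype=float)[:, None]  # shape (S, 1)
    total = gamma.sum()
    # combined mean: occupancy-weighted average of speaker means
    mu = (gamma * means).sum(axis=0) / total
    # combined variance: weighted second moment minus squared combined mean
    var = (gamma * (variances + means**2)).sum(axis=0) / total - mu**2
    return mu, var

# two speakers, 2-dimensional diagonal Gaussians, equal occupancy
mu, var = combine_gaussians(
    means=[[0.0, 1.0], [2.0, 3.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    occupancies=[1.0, 1.0],
)
# mu → [1.0, 2.0]; var → [2.0, 2.0]
```

Note that the combined variance exceeds each speaker's variance, since it also captures the spread between speaker means at that node.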
Posted: 2024-11-08 05:03:13
