What HTK Training Error Messages Mean

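While training HTK on production data, I ran into ERROR [+6550] LoadHTKLabels: Junk at end of HTK transcription. Searching online suggested that the cause was a blank line, but there were no blank lines at all. It turned out that the label (.lab) files contained spaces, and after conversion to a lab.mlf file each of those spaces ended up on a line of its own, which caused this minor headache.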

Once the problem was located, it was easy to fix:

sed -i 's/\s//g' *char.mlf
sed -i '/^$/d' *char.mlf
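The same cleanup can be sketched in Python, which also makes it easy to see what would be removed before touching the files. This is a minimal sketch that mirrors the two sed commands above; the function name `clean_mlf_lines` and the sample labels are made up for illustration. Like the sed version, it deletes every whitespace character, so it only suits MLFs with one bare label per line (no time stamps, no spaces in file names).

```python
import re

def clean_mlf_lines(lines):
    """Strip all whitespace from each line, then drop lines that end up empty."""
    cleaned = []
    for line in lines:
        stripped = re.sub(r"\s", "", line)   # same effect as sed 's/\s//g'
        if stripped:                         # same effect as sed '/^$/d'
            cleaned.append(stripped)
    return cleaned

# Hypothetical character-level MLF with a stray space line and an empty line.
sample = ['#!MLF!#', '"*/utt1.lab"', 'a', ' ', 'b', '', '.']
print(clean_mlf_lines(sample))
```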

Below is a list of error messages and their meanings that I found online, kept here for future reference.

UNDERSTANDING HTK ERROR MESSAGES

Various problems & solutions I've come across in using HTK for building a WSJ recognizer and for my dissertation work in Language Modeling. If you're here to find answers for your own project, consider posting your problems & solutions on your own website, for others to learn from, too.
PROBLEM SOLUTION
HLEd -d prondict -i monophone.mlf mkphones0.led words.mlf
Does nothing, only #!MLF!# is returned in the output.
There need to be double quotes around the lab filename in the words.mlf file: "*/xxx.LAB" instead of '*/xxx.lab'
HDMan -l hdman.log -w lists/all.wordlist lists/all.words.monophones.dict lists/cmudict.sort
ERROR [+1452] ReadDictProns: word A out of order in dict lists/cmudict.sort
FATAL ERROR - Terminating program HDMan
Unix sort doesn't seem to match the sort order HTK is looking for. Python's sort function seems to work. Numbers are sorted with '.' before 0, shorter before longer (1 < 1.0 < 10 < 100)
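A minimal sketch of that re-sort, assuming each dictionary line starts with the headword followed by whitespace. The function name `sort_dict_lines` is hypothetical. Python compares strings by code point, which gives the ordering described above ('.' before '0', shorter before longer); whether this matches HTK in every corner case is not something the source confirms.

```python
def sort_dict_lines(lines):
    """Sort dictionary entries by headword in plain code-point order.

    sorted() is stable, so several pronunciations of the same word
    keep their original relative order.
    """
    entries = [ln for ln in lines if ln.strip()]
    return sorted(entries, key=lambda ln: ln.split()[0])

# Hypothetical entries, deliberately out of order.
entries = ["100 w ah n", "10 t eh n", "1.0 w ah n", "1 w ah n"]
for entry in sort_dict_lines(entries):
    print(entry)
```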
HLEd -l '*' -d lists/allwords.prons.dict -i lists/all.phonemlf src/mkphones0.led lists/all.wordmlf
ERROR [+5013] ReadString: String too long
FATAL ERROR - Terminating program HLEd
Make changes to the pronunciation dictionary:
Replace all multiple spaces with single space;
Replace all tabs with single space;
Put a '\' before every double quote (");
Put a '\' before any dictionary entry beginning with single quote (')
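The four dictionary changes above can be sketched as one line-level function. This is an illustration under assumptions, not the original author's script: `normalize_dict_line` is a made-up name, and already-escaped double quotes are left alone so the function can be run twice safely.

```python
import re

def normalize_dict_line(line):
    """Apply the four pronunciation-dictionary fixes to a single line."""
    # Tabs and runs of spaces both collapse to a single space.
    line = re.sub(r"\s+", " ", line.strip())
    # Put a backslash before every double quote that is not already escaped.
    line = re.sub(r'(?<!\\)"', r'\\"', line)
    # Put a backslash before an entry that begins with a single quote.
    if line.startswith("'"):
        line = "\\" + line
    return line

# Hypothetical entries with a tab, a double space, and leading quotes.
print(normalize_dict_line("'EM\t\tah  m"))
print(normalize_dict_line('"QUOTE  k w ow t'))
```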
HLEd -l '*' -d lists/allwords.prons.dict.notabnospace -i lists/all.phonemlf src/mkphones0.led lists/all.wordmlf
ERROR [+1232] NumParts: Cannot find word ~ in dictionary
FATAL ERROR - Terminating program HLEd
Add that word to the dictionary, resort if necessary
ERROR [+1232] NumParts: Cannot find word MR.
STEINBERG in dictionary
FATAL ERROR - Terminating program HLEd
In the MLF file, the line "MR." ended with a slash, remove the slash from the MLF file.
HLEd -l '*' -d prondict -i train.monophone.mlf mkphones0.led train.rem.mlf
ERROR [+6550] LoadHTKList: Label Name Expected
FATAL ERROR - Terminating program HLEd
For all numbers in train.rem.mlf, precede them with \ so they don't look like a time.
HLEd -d train.prondict -i train.monophone.mlf mkphones0.led tdt4.arabicBN.mlf
ERROR [+1232] NumParts: Cannot find word #(tdAxl in dictionary
FATAL ERROR - Terminating program HLEd
some of these words ended in \) in the mlf, which was screwing with how it appears in the dictionary. I took out the \) in the mlf, now have to make sure everything has its correct entry in the prondict.
HLEd -d train.prondict -i train.monophone.mlf mkphones0.led tdt4.arabicBN.mlf
ERROR [+6550] LoadHTKLabels: Junk at end of HTK transcription
FATAL ERROR - Terminating program HLEd
Add -T 1 to the command line. Where it stops, look in the .mlf file for that transcription. There may be a blank line or something kooky in it. This will help you find a lot of the errors that HLEd comes up with.
HCopy -C configall -S wav2mfcc.scp
ERROR [+6270] OpenParmChannel: Cannot read parameterised WAV data
ERROR [+6313] OpenAsChannel: OpenParmChannel failed
ERROR [+6316] OpenBuffer: OpenAsChannel failed
ERROR [+1050] OpenParmFile: Config parameters invalid
FATAL ERROR - Terminating program HCopy
moved the HCopy configurations out of configall and into their own configuration file without the HCopy: prefixes
HCopy -C confighcopy -S wav2mfcc.arabicBN.scp -T 1
data2/20000610_0330_0430_voa_arb_spl0.wav -> data2/20000610_0330_0430_voa_arb_spl0.mfcc
ERROR [+6251] Input file is not in RIFF format
ERROR [+6213] OpenWaveInput: Get[format]HeaderInfo failed
ERROR [+6313] OpenAsChannel: OpenWaveInput failed
ERROR [+6316] OpenBuffer: OpenAsChannel failed
ERROR [+1050] OpenParmFile: Config parameters invalid
FATAL ERROR - Terminating program HCopy
seems to work if I put a single file on the command line
couldn't figure out the problem, but it worked when I used a different computer
maybe it's a 64-bit vs 32-bit problem?
HCompV -C src/ConfigHVite -f 0.01 -v 0.01 -m -S lists/train.plp.list -M hmm0 proto/hmm0/prototype_base
ERROR [+7032] FreezeOptions: vecSize not set
ERROR [+5105] AllocBlock: Cannot allocate block data of 4294967288 bytes
FATAL ERROR - Terminating program HCompV
Was using the wrong hmm0/prototype; make sure it has the appropriate lines at the top (how the MFCCs were defined, E_Z_A_D etc.), with means of zero and variances of one.
HCompV -C src/ConfigHVite -f 0.01 -v 0.01 -m -S lists/train.plp.list -M hmm0 proto/hmm0/prototype
ERROR [+7031] GetTransMat: Bad Trans Mat Sum in Row 3
HMM Def Error: GetTransMat failed at line 40/col 14/char 1028 in proto/hmm0/prototype
ERROR [+7050] HMError:
ERROR [+7032] LoadHMMSet: GetHMMDef failed
ERROR [+2028] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HCompV
In the prototype file, at the matrix, the copy and paste had split up the lines, so the rows did not add up to one. Make sure each row fits on a single line.
HCompV -C configall -T 1 -A -D -m -M hmm0 -f 0.01 -S train_mfcc.list hmm0/prototype
ERROR [+5050] ReadConfigFile: = expected line 1/col 8/char 7 in configall
ERROR [+5020] InitShell: ReadConfigFile failed on file configall
ERROR [+2000] HCompV: InitShell failed
FATAL ERROR - Terminating program HCompV
If the first column of the config file lists the program name (HVite, HCopy, etc), make sure there is a colon after the name.
HCopy: TARGETKIND=MFCC_0_D_A
Also make sure any '#' for comments come at the beginning of the line, not the second column.
HCompV -c ConfigHVite -T 1 -A -D -m -M hmm0 -f 0.01 -S train_mfcc.list hmm0/prototype
No HTK Configuration Parameters Set
HCompV: Computing side based cepstral mean .....
ERROR [+2039] HCompV: AccGenUtt: speaker pattern matching failure on file: hmm0/prototype
The -c needs to be -C, or else the config file isn't read.
HCompV -C ConfigHVite -T 1 -A -D -m -M hmm0 -f 0.01 -S train_mfcc.list hmm0/prototype
ERROR [+2050] CheckData: Parameterisation in ./20001001_10.mfcc is incompatible with hmm hmm0/prototype
In hmm0/prototype, change USER to MFCC_0_D_A (when HCopy is run with MFCC_0 as the TARGETKIND)
HCompV -C ConfigHVite -T 1 -A -D -m -M hmm0 -f 0.01 -S train_mfcc.list hmm0/prototype
ERROR [+2050] CheckData: Vector size in /data/data3/bromberg/fisher/segmented/fla_0069_122.mfcc[39] is incompatible with hmm hmm0/proto[13]
In the first line of hmm0/proto, which you need to create by hand in order to run HCompV, make sure the vecSize is the same as the size of the mfccs. Here it's saying that the mfcc has 39 dimensions but the proto only calls for 13. Here is a sample script for making the proto file.
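The sample script mentioned above is not included in this copy. As a stand-in, here is a minimal sketch (not the original author's script) that writes a left-to-right prototype in the text layout used by the HTK tutorial. The function name `make_proto` and the 0.6/0.4 transition values are assumptions chosen for illustration. Each transition-matrix row is written on a single line, and the vector size is passed in once so the header, means, and variances cannot disagree.

```python
def make_proto(vec_size, parm_kind, num_states=5, name="proto"):
    """Return the text of a prototype HMM with zero means and unit variances."""
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{parm_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    # States 2 .. num_states-1 are the emitting states.
    for state in range(2, num_states):
        lines += [f"<State> {state}", f"<Mean> {vec_size}", zeros,
                  f"<Variance> {vec_size}", ones]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0                        # entry state jumps to state 2
        elif i < num_states - 1:
            row[i], row[i + 1] = 0.6, 0.4       # self-loop or move right
        # The last row (exit state) stays all zeros.
        lines.append(" ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"

print(make_proto(39, "MFCC_0_D_A"))
```

Changing the first argument (for example to 13) is the only edit needed when the feature dimension changes.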
HCompV -A -T 1 -S trainsets/training-extfiles0 -l lineObservations -I labels.mlf -o lineObservations -m -M models/hmm0.0 hmmdefs/version1-hmm-top-23vec
Calculating Fixed Variance
HMM Prototype: hmmdefs/version1-hmm-top-23vec
Segment Label: lineObservations
Num Streams : 1
UpdatingMeans: Yes
Target Direct: models/hmm0.0
*** stack smashing detected ***:
HCompV terminated
HTK is a 32-bit program. Install GCC 3.4 to build it so that it runs on a 64-bit machine; otherwise some parts work and others hit a stack overflow.
HERest -C src/ConfigHVite -I lists/train.phonemlf -t 250.0 150.0 1000.0 -S train.mfcc.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
ERROR [-7324] StepBack: File ... bad data or over pruning
Possible problems include corrupt mfcc, non-matching or non-existent labels. In this case, I had to re-calculate the mean & variance for the prototype hmm using only 1/2 the data, and the problem went away. If every file is considered bad data, you may have derived the features wrong. Go back to HCopy and check the parameters (config file).
HERest -C src/ConfigHVite -I lists/train.phonemlf -t 250.0 150.0 1000.0 -S train.mfcc.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
Saving hmm's to dir hmm1
ERROR [+7031] PutTransMat: Row 4 of transition mat sum = 1.064684
FATAL ERROR - Terminating program HERest
Too much data. Use the -p option, splitting the input and processing over several machines, then doing a separate HERest pass with -p 0 to accumulate the accumulators. Or, as above, use a smaller portion of the data. Also, make sure that the file durations are spread evenly across lists. Don't put all the long files together, mix them up with short ones.
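One way to spread durations evenly across the per-machine lists is a greedy split: take files from longest to shortest and always add the next one to the list with the smallest running total. The sketch below assumes the durations are already known (for example measured in seconds beforehand); `split_balanced` and the file names are hypothetical.

```python
import heapq

def split_balanced(durations, n_lists):
    """Split files into n_lists groups with roughly equal total duration.

    durations: dict mapping file name -> duration (any consistent unit).
    """
    heap = [(0.0, i) for i in range(n_lists)]   # (running total, list index)
    heapq.heapify(heap)
    parts = [[] for _ in range(n_lists)]
    # Longest files first, each going to the currently lightest list.
    for name, dur in sorted(durations.items(), key=lambda kv: -kv[1]):
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + dur, idx))
    return parts

# Hypothetical durations in seconds.
print(split_balanced({"a.mfcc": 10, "b.mfcc": 9, "c.mfcc": 2, "d.mfcc": 1}, 2))
```

Each returned group can then be written out as one list file for a `-p N` run.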
HERest -C src/ConfigHVite -I lists/train.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
ERROR [+5010] InitSource: Cannot open source file hmm0/macros
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+5010] InitSource: Cannot open source file hmm0/hmmdefs
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
Need to make 'macros' file in hmm0 directory. Copy first few lines of the prototype into macros, then append to it the vFloors file.
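A minimal sketch of that step, assuming the "first few lines" are everything that precedes the `~h` line of the prototype (the global options). The function name `build_macros` and the sample strings are illustrative only.

```python
def build_macros(proto_text, vfloors_text):
    """Return macros-file text: prototype header lines followed by vFloors."""
    header = []
    for line in proto_text.splitlines():
        if line.lstrip().startswith("~h"):
            break                      # the HMM definition itself starts here
        header.append(line)
    return "\n".join(header) + "\n" + vfloors_text

# Hypothetical inputs; real ones would be read from hmm0/prototype and hmm0/vFloors.
proto = '~o <VecSize> 39 <MFCC_0_D_A>\n~h "proto"\n<BeginHMM>\n<EndHMM>\n'
vfloors = "~v varFloor1\n<Variance> 39\n"
print(build_macros(proto, vfloors))
```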
HERest -C src/ConfigHVite -I lists/train.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
ERROR [+5010] InitSource: Cannot open source file hmm0/hmmdefs
ERROR [+7010] LoadAllMacros: Can't open file
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
Need to manually create hmmdefs file. From htkbook: "...hmmdefs containing a copy for each of the required monophone HMMs is constructed by manually copying the prototype and relabeling it for each required monophone (including sil)." Use the build_hmmdefs.py script. Add another copy of the hmm at the bottom with the label 'sil'.
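The build_hmmdefs.py script itself is not reproduced here. The following is a sketch of what such a script might do, not the author's code: it copies everything after the prototype's `~h` line once per phone and relabels each copy. The function name `build_hmmdefs` and the phone list are assumptions.

```python
def build_hmmdefs(proto_text, phones):
    """Clone the prototype body once per phone, relabeling each copy."""
    lines = proto_text.splitlines()
    # Index of the ~h line that names the prototype.
    start = next(i for i, ln in enumerate(lines) if ln.lstrip().startswith("~h"))
    body = lines[start + 1:]           # <BeginHMM> ... <EndHMM>
    out = []
    for phone in phones:
        out.append(f'~h "{phone}"')
        out.extend(body)
    return "\n".join(out) + "\n"

# Hypothetical prototype and phone list; 'sil' goes last, as described above.
proto = '~o <VecSize> 39 <MFCC_0_D_A>\n~h "proto"\n<BeginHMM>\n<EndHMM>\n'
print(build_hmmdefs(proto, ["aa", "ae", "sil"]))
```

Because the label line is emitted before each copied body, every `~h` line comes ahead of its `<BeginHMM>` line.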
HERest -C src/ConfigHVite -I lists/all.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
Pruning-On[250.0 150.0 1000.0] ERROR [+6510] LOpen: Unable to open label file /scratch/ilana/wsj/data/WSJ0/SI_TR_S/01G/01GC020X.lab
FATAL ERROR - Terminating program HERest
The label file names in the all.phonemlf file were not in all caps. Changed the script that made the word-mlf file to have the filenames in all caps, then HLEd does the phone-mlf correctly.
HERest -A -C configall -p 3 -I train.monophone.mlf -S train.list3 -t 250.0 150.0 1000.0 -H hmm0/hmmdefs -M hmm1 monophones
ERROR [+6510] LOpen: Unable to open label file /data/data3/fisher/segmented/fla_0069_122.lab
FATAL ERROR - Terminating program HERest
In the mlf file, the filenames (utterance names) did not begin with */, so they couldn't be matched to the filenames in train.list3. Make sure filenames in the mlf begin with */ and are wrapped in quotation marks.
HERest -C src/ConfigHVite -I lists/all.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
Pruning-On[250.0 150.0 1000.0]
WARNING [-7325] LoadUtterance: No labels in file
/scratch/ilana/wsj/data/WSJ0/SI_TR_S/01G/01GO031F.lab in HERest
Segmentation fault
Rework prompts2mlf_word.py to not let a file begin with '.'; redo word and phone mlfs.
HERest -C src/ConfigHVite -I lists/all.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 lists/monophones1
Pruning-On[250.0 150.0 1000.0]
ERROR [+7011] SaveHMMSet: Cannot create MMF file hmm1/macros
mkdir hmm1
HERest -C ConfigHVite -I 20001001_1.monophone.mlf -t 250.0 150.0 1000.0 -S train_plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones
HMM Def Error: GetToken: Symbol expected at line 1/col 4/char 3 in hmm0/macros
ERROR [+7050] HMError:
HMM Def Error: GetOptions: GetToken failed at line 1/col 5/char 4 in hmm0/macros
ERROR [+7050] HMError:
HMM Def Error: LoadAllMacros: GetOptions Failed at line 1/col 0/char -1 in hmm0/macros
ERROR [+7050] HMError:
HMM Def Error: LoadAllMacros: Macro sym expected at line 1/col 0/char -1 in hmm0/hmmdefs
ERROR [+7050] HMError:
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
hmmdefs file is screwy, the line ~h 'aa' needs to come before the BEGINHMM line.
HERest -C ConfigHVite -I 20001001_1.monophone.mlf -t 250.0 150.0 1000.0 -S train_plp.list -H hmm0/macros -H hmm0/hmmdefs -M hmm1 monophones
Pruning-On[250.0 150.0 1000.0]
ERROR [+6510] LOpen: Unable to open label file 20001001_1.plp.lab
FATAL ERROR - Terminating program HERest
The filenames have to be matching within the mlf files and the individual names of the pfiles. xxx.lab and xxx.pf, no variations.
HERest -A -C configall -p 1 -I train.monophone.mlf -S train.list1 -t 250.0 150.0 1000.0 -H hmm0/hmmdefs -M hmm1 monophones
ERROR [+5105] AllocBlock: Cannot allocate block data of 5000000 bytes
FATAL ERROR - Terminating program HERest
One of the training files is too large for the system to process. Rerun the HERest command with -T 1, see what file it fails on, remove it from train.list1, and try again. (Alternatively split up that file and its transcript and replace the original file with its splits in the training list and the mlf.)
HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 sil.hed lists/monophones1
ERROR [+7030] GetHMMDef: Trans Mat Dimensions not 3 x 3
HMM Def Error: LoadAllMacros: GetHMMDef failed at char 188656 in hmm4/hmmdefs
ERROR [+7050] HMError:
ERROR [+7050] LoadHMMSet: Macro name expected
ERROR [+2628] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HHEd
Add sp to monophones1. To make hmmdefs4, use this script: sil2sp.pl, which I got from this htk tutorial website.
HHEd -H hmm4/macros -H hmm4/hmmdefs -M hmm5 src/sil.hed lists/monophones1
WARNING [-2631] EditTransMat: No trans mats to edit! in HHEd
Was using wrong monophones list; must use the one updated with 'sp'
HERest -A -C configall -I train.monophone.sp.mlf -S train.list -t 250.0 150.0 1000.0 -H hmm5/hmmdefs -M hmm5 monophones.sp
Pruning-On[250.0 150.0 1000.0]
ERROR [+7332] Create Insts: Cannot have Tee models at start or end of transcription
FATAL ERROR - Terminating Program HERest
There is an 'sp' short pause as the last symbol before '.' in the mlf. In a previous step there was an HLEd command with a set of commands in a file like 'mkphones1.led'. Make sure 'IS sil sil' is in that .led file, which puts the 'sil' at beginning and end of each utterance.
HERest -C src/ConfigHVite -I lists/all.sp.phonemlf -t 250.0 150.0 1000.0 -S lists/train.plp.list -H hmm5/macros -H hmm5/hmmdefs -M hmm6 lists/monophones1.sp
Pruning-On[250.0 150.0 1000.0]
ERROR [+7332] CreateInsts: Cannot have Tee models at start or end of transcription
FATAL ERROR - Terminating program HERest
Recreate phone mlf to have 'sil' before and after each utterance; use "IS sil sil" in the .led file for HLEd; if it still doesn't work, find by hand the utterances that end in 'sp' and add 'sil' before the period; or use the python command line to fix it.
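The Python fix is only mentioned in passing, so here is one hedged sketch of it: walk the phone MLF line by line and, whenever the label just before an utterance's closing period is 'sp', insert 'sil' before the period. It assumes one bare label per line (no time stamps); `add_final_sil` is a made-up name.

```python
def add_final_sil(lines):
    """Insert 'sil' before the closing '.' of any utterance that ends in 'sp'."""
    out = []
    for line in lines:
        if line.strip() == "." and out and out[-1].strip() == "sp":
            out.append("sil")
        out.append(line)
    return out

# Hypothetical phone MLF: the first utterance ends in sp, the second in sil.
mlf = ['#!MLF!#', '"*/a.lab"', 'sil', 'k', 'sp', '.',
       '"*/b.lab"', 'sil', 't', 'sil', '.']
print("\n".join(add_final_sil(mlf)))
```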
   
HVite -l '*' -o SWT -b SILENCE -C ConfigHVite -a -H hmm7/macros -H hmm7/hmmdefs -i 20001001_1.realigned.monophone.mlf -m -t 250.0 -y lab -I 20001001_1.mlf -S train_plp.list prondict.sort.sp monophones
ERROR [+6510] LOpen: Unable to open label file 20001001_1.lab
FATAL ERROR - Terminating program HVite
name of .lab file in 20001001_1.mlf (word mlf) was wrong
HVite -l '*' -o SWT -b SILENCE -C ConfigHVite -a -H hmm7/macros -H hmm7/hmmdefs -i 20001001_1.realigned.monophone.mlf -m -t 250.0 -y lab -I 20001001_1.mlf -S train_plp.list prondict.sort.sp monophones
nothing appears in new transcription
Trying to realign the transcription using the current hmms. This may be the fault of me not training with enough data. Try just copying the original monophone transcript to realigned.monophone.mlf and continue
HERest -C configall -I 20001001_1-10.realigned.monophone.mlf -t 250.0 150.0 1000.0 -S train_mfcc.list -H hmm7/macros -H hmm7/hmmdefs -M hmm8 monophones
Pruning-On[250.0 150.0 1000.0]
ERROR [+6510] LOpen: Unable to open label file 20001001_5.lab
FATAL ERROR - Terminating program HERest
The HVite process did not create a label file for every utterance; some had no tokens surviving, including file 5. Need to go back to HVite and change some parameters to make sure it can get through all utterances. For instance, change the beam searching parameters with the -t flag (htkbook pg 301)
HERest -C configall -I train.realigned.monophone.mlf -t 250.0 150.0 1000.0 -p 0 -H hmm7/macros -H hmm7/hmmdefs -M hmm8 hmm8/HER1.acc hmm8/HER2.acc hmm8/HER3.acc hmm8/HER4.acc hmm8/HER5.acc hmm8/HER6.acc
ERROR [+7060] InitHMMSet: Expected newline after 2‘th HMM
ERROR [+2321] Initialise: MakeHMMSet failed
FATAL ERROR - Terminating program HERest
forgot to put 'monophones' on the command line before the list of .acc files
HVite -T 1 -C configall -H hmm9/macros -H hmm9/hmmdefs -S train_mfcc.list -l '*' -i recog_mono2/monophones.mlf -o S -w wdnet -p 0.0 -s 5.0 prondict.sort.final monophones
Read 37 physical / 37 logical HMMs
WARNING [-8520] CreateSEIndex: No transitions to state 5 in HVite
WARNING [-8520] CreateSEIndex: No transitions to state 5 in HVite
Read lattice with 859 nodes / 1713 arcs
Created network with 6391 nodes / 7245 links
I'm doing recognition at the hmm9 stage as part of debugging. There are a few hmms in hmm8/hmmdefs and hmm9/hmmdefs that have no transition from state 4 to state 5, including 'O' and 'silst'. This is a problem. It started occurring after realignment. So for the two iterations of HERest after realignment, use -u mv
HHEd -B -H hmm9/macros -H hmm9/hmmdefs -M hmm10 src/mktri.hed lists/monophones1.sp
ERROR [+2635] FindBaseModel: Cannot Find HMM sl in Current List
FATAL ERROR - Terminating program HHEd
Found and removed 'sl' from lists/triphones1
HHEd -T 1 -H hmm9/hmmdefs -M hmm10 mktri.hed monophones
HHEd 34/34 Models Loaded [5 states max, 1 mixes max]
CL triphones
Cloning current hmms to produce new set
{(*-.
Error ) expected
ERROR [+7230] EdError: item list parse error
FATAL ERROR - Terminating program HHEd
Because '.' is in the monophones list, the first triphone code in mktri.hed is invalid. Remove it.
HERest -B -A -C configall -s stats -p 0 -I train.triphone.cw.mlf -t 250.0 150.0 1000.0 -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones hmm12/HER1.acc hmm12/HER2.acc
Pruning-On[250.0 150.0 1000.0]
ERROR [+7191] Infinite WtAcc!
(or) ERROR [+7191] Infinite MuAcc!
FATAL ERROR - Terminating program HERest
Comes up in the accumulation process when doing a split re-estimation. WtAcc due to a row sum error on the transition matrix.(?)
Tried: -u mv in the command, gave me MuAcc instead
Tried: using files less than a minute in length
Tried: splitting data up into more parallel sections (4)
Tried: remove -B from the combining step in HERest, making the resulting hmm text form rather than binary. This worked, but I don't know why...
HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones
ERROR [+2662] AssignStructure: cannot find tree for U-r+sil state 5
FATAL ERROR - Terminating program HHEd
I'm using 5 middle states but tree.hed only has TB lines for 3 middle states. Add more TB lines to tree.hed for states 5 & 6
HHEd -B -H hmm12/hmmdefs -M hmm13 tree.hed triphones
ERROR [+2662] AssignStructure: cannot find tree for t2-ay+D2 state 2
FATAL ERROR - Terminating program HHEd
One of the phonemes in the triphone listed has no indication of how to cluster it in tree.hed. Remove it from the prondict and start over (with a shortened monophone list), or remove it from fulllist and the HHEd command will run. If you have one like this you probably have a few, look carefully.
HHEd -B -H hmm12/macros -H hmm12/hmmdefs -M hmm13 src/tree.hed lists/triphones1 > log
ERROR [+2662] AssignStructure: cannot find tree for ax-sp+d state 2
FATAL ERROR - Terminating program HHEd
Recreate tree.hed using local monophone list, mkclscript from tutorial, then add to that QS part of tree.hed
HHEd -H hmm12/macros -H hmm12/hmmdefs -M hmm13 tree.hed triphones
ERROR [+2662] FindProtoModel: no proto for z-sp+A in hSet
FATAL ERROR - Terminating program HHEd
In the cross-word triphone models, the sp causes problems, so remove it from the monophone list, from the extra triphones, from tree.hed
HERest -B -C ConfigHVite -I 20001001_1.triphone.mlf -t 250.0 150.0 1000.0 -S train_plp.list -H hmm13/macros -H hmm13/hmmdefs -M hmm14 triphones
ERROR [+5010] InitSource: Cannot open source file Q-n+A
ERROR [+7010] LoadHMMSet: Can't find file
ERROR [+2321] Initialise: LoadHMMSet failed
FATAL ERROR - Terminating program HERest
The state-tying actually caused some states to be tied, meaning they get renamed. This is shown in the tiedlist created in the previous step with HHEd and CO "tiedlist" at the end of tree.hed. In the tiedlist output file, there are two columns in some places; the second column names the new label for the hmm in the first. It means those two are tied. So Q-n+A is tied to another triphone and thereby renamed. REPLACE TRIPHONES WITH TIEDLIST ON THE COMMAND LINE.
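To check which model a given triphone was tied to, the two-column layout described above can be read into a mapping. This is a minimal sketch under that description: a one-column line names a model that stands for itself, a two-column line maps the first name to the second. `parse_tiedlist` and the sample names are hypothetical.

```python
def parse_tiedlist(lines):
    """Map each logical model name to the physical model it is tied to."""
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        logical = fields[0]
        physical = fields[1] if len(fields) > 1 else logical
        mapping[logical] = physical
    return mapping

# Hypothetical tiedlist excerpt.
tied = parse_tiedlist(["a-b+c", "Q-n+A x-n+A", ""])
print(tied["Q-n+A"])          # the model that Q-n+A now shares
print("y-l+A" in tied)        # quick test for a triphone missing from the list
```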
HERest -B -C configall -I train.triphone.mlf -t 250.0 150.0 1000.0 -S train.realigned.list -H hmm13/macros -H hmm13/hmmdefs -M hmm14 tiedlist
ERROR [+7231] InitSource: Cannot open source file y-l-A
FATAL ERROR - Terminating program HERest
There is a triphone that HERest is trying to reestimate that does not appear in the tiedlist. Recreate the fulllist (all possible triphones) and redo the HHEd step for decision tree tying etc.
HERest -T 1 -D -A -C configall -p 1 -I train.triphone.mlf -S train.mfcc.norm.list -t 250.0 150.0 1000.0 -H hmm15/hmmdefs -M hmm16 tiedlist
HERest ML Updating: Transitions Means Variances
Parallel-Mode[1] System is SHARED
51987 Logical/15201 Physical Models Loaded, VecSize=39
1 MMF input files
Pruning-On[250.0 150.0 1000.0]
Processing Data: fla_0130_96.mfcc; Label fla_0130_96.lab
Utterance prob per frame = -5.125980e+01
Processing Data: fla_0530_22.mfcc; Label fla_0530_22.lab
ERROR [+7321] CreateInsts: Unknown label m+H
FATAL ERROR - Terminating program HERest
add a line to the mktri.led file that has 'NB sp' or 'NB garbage' or 'NB whatever' for whatever monophone for which you don't want the biphone context to be made. Go back and remake the triphone transcript, then try the re-estimation again.
HHEd -A -H mix_moreA/hmmdefs -M mix_moreA 10.hedscript tiedlist
WARNING [-2637] HeaviestMix: mix 4 in n2-O+sh2 has v.small gConst [-200000045056.000000] in HHEd
WARNING [-2637] HeaviestMix: mix 1 in n2-O+sh2 has v.small gConst [-170000023552.000000] in HHEd
WARNING [-2637] HeaviestMix: mix 3 in n2-O+sh2 has v.small gConst [-109999996928.000000] in HHEd
WARNING [-2637] HeaviestMix: mix 4 in n2-O+sh2 has v.small gConst [-140000002048.000000] in HHEd
ERROR [+2697] HeaviestMix: heaviest mix is defunct!
FATAL ERROR - Terminating program HHEd
Trying to increase the number of Gaussian mixtures for each hmm at the end of training, incrementing by 2 each time. From htkbook: "Defunct mixture components can be prevented by setting the -w option in HERest so that all mixture weights are floored to some level above MINMIX."
HVite -H hmm15/macros -H hmm15/hmmdefs -S lists/dt.list -l '*' -i recog/dt.out.mlf -w wdnet -p 0.0 -s 5.0 lists/allwords.prons.dict.final lists/tiedlist
ERROR [+8251] ReadLattice: Word worrisome not in dict
ERROR [+3210] DoAlignment: ReadLattice failed
FATAL ERROR - Terminating program HVite
Made sure vocab, wordnet were all uppercase; dictionary is all uppercase;
HVite -H hmm15/macros -H hmm15/hmmdefs -S lists/dt.list -l '*' -i recog/dt.out.mlf -w wdnet.upper -p 0.0 -s 5.0 lists/allwords.prons.dict.final lists/tiedlist
ERROR [+8251] ReadLattice: Word -PAU- not in dict
ERROR [+3210] DoAlignment: ReadLattice failed
FATAL ERROR - Terminating program HVite
add it to the pronunciation dictionary
HVite -H hmm15/macros -H hmm15/hmmdefs -S lists/dt.list -l '*' -i recog/dt.out.mlf -w wdnet.upper -p 0.0 -s 5.0 lists/allwords.prons.dict.addrecog lists/tiedlist
WARNING [-8221] InitPronHolders: Total of 77 duplicate pronunciations removed in HVite
ERROR [+8231] GetHCIModel: Cannot find hmm [???-]IY[+???]
FATAL ERROR - Terminating program HVite
Change dictionary and wdnet to all lowercase
HVite -T 1 -C src/ConfigHVite -H hmm15/hmmdefs -H hmm15/macros -S lists/dt.list -i recog/dt.out.mlf -o S -w wdnet.lower -p -10.0 -s 15.0 -t 450.0 250.0 40000.0 lists/allwords.rons.dict.addrecog.lower lists/tiedlist > recog.log
ERROR [+8250] ReadLattice: Premature end of lattice file before header
ERROR [+3210] DoAlignment: ReadLattice failed
FATAL ERROR - Terminating program HVite
Go back into wdnet.lower and uppercase the first line and the J,I,W,S,L etc
HVite -T 1 -C src/ConfigHVite -H hmm15/hmmdefs -H hmm15/macros -S lists/dt.list -i recog/dt.out.mlf -o S -w wdnet.lower -p -10.0 -s 15.0 -t 450.0 250.0 40000.0 lists/allwords.rons.dict.addrecog.lower lists/tiedlist > recog.log
ERROR [+8231] GetHCIModel: Cannot find hmm [l-]e[+sh]
FATAL ERROR - Terminating program HVite
Changed pronunciation of [inhalation] to l ey sh
There is a monophone somewhere in the dictionary, or in the monophone set, that is not represented as an hmm (try "cat hmm0/hmmdefs | grep '~h'" to see what is represented). You may need to either change pronunciations in the dictionary to eliminate barely-used monophones or retrain with all of the monophones intact. It's possible if you generated the monophone list from the monophone transcript that some monophones in the prondict were left out, b/c they never occurred in the first pronunciation of any word. Try regenerating the monophone list from the dictionary using shell scripting instead of HLEd.
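The note above suggests shell scripting; the same check can be sketched in Python to match the other helper scripts. This sketch assumes dictionary lines of the form `WORD [OUTSYM] phone phone ...` with no pronunciation probabilities, and HMM names written as `~h "name"` with double quotes. Both function names are made up.

```python
import re

def phones_in_dict(dict_lines):
    """Collect every phone used by any pronunciation in the dictionary."""
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        prons = fields[1:]
        if prons[0].startswith("["):   # optional output symbol, e.g. [ABLE]
            prons = prons[1:]
        phones.update(prons)
    return phones

def hmm_names(hmmdefs_text):
    """Collect the model names declared with ~h "name" in an hmmdefs file."""
    return set(re.findall(r'~h\s+"([^"]+)"', hmmdefs_text))

# Hypothetical inputs.
dictionary = ["A ah", "A ey", "ABLE [ABLE] ey b ax l"]
hmmdefs = '~h "ah"\n<BEGINHMM>\n<ENDHMM>\n~h "ey"\n<BEGINHMM>\n<ENDHMM>\n'
print(sorted(phones_in_dict(dictionary) - hmm_names(hmmdefs)))
```

Any name printed is a phone that the dictionary uses but that has no model.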
HVite -T 1 -C src/ConfigHVite -H hmm15/hmmdefs -H hmm15/macros -S lists/dt.list -i recog/dt.out.mlf -o S -w wdnet.lower -p -10.0 -s 15.0 -t 450.0 250.0 40000.0 lists/allwords.rons.dict.addrecog.lower lists/tiedlist > recog.log
ERROR [+6313] OpenParmChannel: cannot read HTK Header in File /u/drspeech/data/WSJ0/SI_DT_05/050/050A0503.nst
ERROR [+6313] OpenAsChannel: OpenParmChannel failed
ERROR [+6316] OpenBuffer: OpenAsChannel failed
ERROR [+3250] ProcessFile: Config parameters invalid
FATAL ERROR - Terminating program HVite
Changed dt.list to dt.plp.list
HVite -z lat -l $expname -C ../configall -t 150.0 -A -D -T 1 -w $expname.htk.lm -s 12.0 -p -10.0 -H ../hmmdefs.16 -S ../dev.mfcc0.list1 prondict.norm8.sort.sp ../tiedlist
ERROR [+8231] GetHCIModel: Cannot find hmm [u-]n[+???]
FATAL ERROR - Terminating program HVite
Haven't figured this one out. Can't find a pronunciation with fishy phonemes as mentioned. HDecode has no problem with all of the same inputs except for ARPA-based lm, and I don't see anything wrong with the htk-lattice-lm. So, I dunno.
HHEd -B -H hmm15/macros -H hmm15/hmmdefs -M hmm16 src/train_mix_inc_2.hed lists/train+cv.triphonemlf
ERROR [+7036] CreateHMM: multiple use of logical HMM name sp
ERROR [+7060] InitHMMSet: Error in CreateHMM
ERROR [+2628] Initialise: MakeHMMSet failed
FATAL ERROR - Terminating program HHEd
 
Reading dictionary from diss/lib/myprondict
ERROR [+8050] ReadDict: Probability malformed 2
ERROR [+8013] ReadDict: Dict format error
ERROR [+9999] Initialise: ReadDict failed
FATAL ERROR - Terminating program HDecode.long
problems in the pronunciation dictionary:
single quotes and double quotes need a backslash
brackets possibly need backslash
narrow down problem by reducing prondict to only a few lines and gradually adding until the error comes up
one of the last pronunciations has a non-existent phoneme (2), change it.
HDecode.long -z lat -l decodeLCA_nonums_wordLM -C configall -t 150.0 -A -D -T 1 -w lev.alltext.word.lm -s 12.0 -p -10.0 -H mix_moreA/hmmdefs -S lev.dev.mfcc.list4 levtrain.prondict.ver3 tiedlist
Reading dictionary from levtrain.prondict.ver3
Reading acoustic models...
Read 4163 physical / 230643 logical HMMs
ERROR [+9999] HLVNet: no model label for phone (uw-gar+gar)
FATAL ERROR - Terminating program HDecode.long
I have a 'gar' (garbage) model that, like sp, should not belong to any triphones. In HDecode, only the phonemes associated with start and/or endnode are allowed to be monophone-only. Go back and add gar triphones to full_list, remake the tiedlist. Might need to add some info for gar to tree.hed. Re-estimate from there forward.
HDecode.long -z lat -l * -C ConfigHVite -t 150 -A -D -T 1 -w 20001001_1.lm -s 12.0 -p -10.0 -H hmm15/hmmdefs -S train_plp.list prondict.sort tiedlist
ERROR [+4019] HDecode: beam width expected
FATAL ERROR - Terminating program /u/drspeech/opt/htk-3.4/i586-linux/bin/HDecode.long
The value after the -t flag must be a float. Change to 150.0
HDecode.long -z lat -l * -C ConfigHVite -t 150.0 -A -D -T 4 -w 20001001_1.lm -s 12.0 -p -10.0 -H hmm15/hmmdefs -S train_plp.list prondict.sort tiedlist
ERROR [+9999] HDecode: cannot find STARTWORD '<s>'
FATAL ERROR - Terminating program /u/drspeech/opt/htk-3.4/i586-linux/bin/HDecode.long
add <s> and </s> to the pronunciation dictionary with a pronunciation of sil
HDecode.long -z lat -l * -C ConfigHVite -t 150.0 -A -D -T 4 -w 20001001_1.lm -s 12.0 -p -10.0 -H hmm15/hmmdefs -S train_plp.list prondict.sort tiedlist
ERROR [+9999] HDecode: cannot find file 'sp'
FATAL ERROR - Terminating program /u/drspeech/opt/htk-3.4/i586-linux/bin/HDecode.long
add
sp sil
to the end of the prondict
HDecode.long -z lat -l decodeA -C ../configall -t 150.0 -A -D -T 1 -w mix2.unk.knd.lm -s 12.0 -p -39.0 -H ../hmmdefs.16 -S ../dev.mfcc0.list4 prondict.expand ../tiedlist
FATAL ERROR - Terminating program HDecode.long
ERROR [+5010] InitSource: Cannot open source file f-uw+x
ERROR [+7010] LoadHMMSet: Can't find file
ERROR [+4128] Initialise: LoadHMMSet failed
There is a mismatch between the hmms that are defined in the hmmdefs file and those that are listed in the tiedlist. One or the other needs to change, probably the tiedlist. This may involve going back far enough to re-create the hmms used in the last HHEd command, so to recreate the tiedlist.
HDecode.long -z lat -l * -C ConfigHVite -t 150.0 -A -D -T 4 -w 20001001_1.lm -s 12.0 -p -10.0 -H hmm15/hmmdefs -S train_plp.list prondict.sort tiedlist
WARNING [-9999] no token survived to sent end! in HDecode.long
Segmentation fault
This is the model I built on a single sound file, so maybe that's the right answer...
ERROR [+9999] HLVNet: no model label for phone (.-q+r)
FATAL ERROR - Terminating program /u/drspeech/opt/htk-3.4/i586-linux/bin/HDecode.long
Remove '. .' from the pronunciation dictionary
HDecode.long -z lat -l decodeA -C configall -t 150.0 -A -D -T 1 -w p.3grams.lm -s 12.0 -p -10.0 -H mix_moreA/hmmdefs.16 -S dev.mfcc0.list3 prondict.norm8.sort tiedlist
ERROR [+9999] HLVNet: no model label for phone (x-sil+S)
FATAL ERROR - Terminating program HDecode.long
It shouldn't be looking for a triphone with 'sil' in the middle. Search for 'sil' in the pronunciation dictionary; the only words it should serve as pronunciation for are <s> and </s>.
ERROR [+9999] HDecode: Incompatible parm kinds MFCC_0 vs. MFCC_D_A_0
FATAL ERROR - Terminating program /u/drspeech/opt/htk-3.4/i586-linux/bin/HDecode.long
Changed the format of the hmms to mfcc_d_a_0 even though the original files were made into MFCC_0. They've gotta be the same. Use MFCC_D_A_0 in the hcopy config file.
Reading dictionary from diss/lib/myprondict
Reading acoustic models...Read 26745 physical / 250049 logical HMMs
ERROR [+9999] HLVNet: no model label for phone (sil-}+w)
FATAL ERROR - Terminating program HDecode.long
There are still some labels in the pronunciation dictionary that do not have defined acoustic models (}). Change those labels, which may have come in through the pronunciation-building script.
ERROR [+8113] ReadARPAngram: failed reading lm prob at char 1283900 in diss/data/language_model/fsms.4grams.64Kvocab.lm
This error can be reproduced by having the wrong number of ngrams present in the lm file as compared to the number defined at the top of the file.
Make sure the LM and prondict have the same encoding.
Make sure all single quotes and double quotes have a backslash.
Make sure the LM and prondict contain <s> and </s>.
WARNING [-8100] ReadARPAngram: unseen word '???' in ngram in HDecode.long
Words in the lm are not present in the pronunciation dictionary. Write a script to find them and add them in, being sure to re-sort the pronunciation dictionary afterwards.
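A script of the kind mentioned here could look like the sketch below. It assumes an ARPA-format LM whose \1-grams: section lists one word per line after the log probability, and a dictionary whose first whitespace-separated field on each line is the word. Backslash escapes and quoting are compared literally, so both files need to use the same convention. All function names are hypothetical.

```python
def lm_unigram_words(lm_lines):
    """Collect the words listed in the \\1-grams: section of an ARPA LM."""
    words = set()
    in_unigrams = False
    for raw in lm_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
        elif line.startswith("\\"):
            # Any other marker (\data\, \2-grams:, \end\) ends the unigram section.
            in_unigrams = False
        elif in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                words.add(fields[1])  # fields[0] is the log probability
    return words


def dict_words(dict_lines):
    """Collect the head word (first field) of every non-blank dictionary line."""
    return {line.split()[0] for line in dict_lines if line.strip()}


def missing_from_dict(lm_lines, dict_lines):
    """Words that occur in the LM but have no dictionary entry, sorted."""
    return sorted(lm_unigram_words(lm_lines) - dict_words(dict_lines))
```

Each word it reports needs a pronunciation added by hand or by a pronunciation-building script; the sketch only finds the gaps.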
ReadNGrams: 1827th 2Grams out of order
The bigrams are not in alphabetical order.
HLRescore -n $lm -f -y crec -r 10.0 -t 150.0 -s 20.0 -p -42.0 -C configall -A -D -T 1 -S unconstrained.list $prondict.expand
Reading LM from mix1.unk.knd.lm
ERROR [+8150] ReadNGrams: 577308th 2Grams out of order
FATAL ERROR - Terminating program HLRescore
Nothing seems out of place in the LM, which was used exactly as generated, with no manual edits. The same LM worked for HDecode.
HLRescore -n $lm -f -y crec -t 150.0 -s 12.0 -p -10.0 -C configall -A -D -T 1 $prondict $lattice
WARNING [-9999] word 0 not in LM wordlist in HLRescore
HLRescore: HLat.c:415: LatTopSort: Assertion `time+1 == lat->nn' failed.
Got rid of the second error (LatTopSort) by determinizing, minimizing, and topologically sorting the fsm before converting to pfsg & htklat.
Make sure that NULL (or whatever you've substituted for NULL in the lattice) exists in both the pronunciation dictionary and the language model.
HLRescore -n $lm -f -y crec -t 150.0 -s 12.0 -p -10.0 -C configall -A -D -T 1 $prondict $lattice
ERROR [+8250] ReadLattice: Premature end of lattice file before header
ERROR [+4013] HLRescore: can't read lattice
FATAL ERROR - Terminating program HLRescore
In the lattice, change 'NODES' to 'N' and 'LINKS' to 'L'.
HLRescore -n $lm -f -y crec -t 150.0 -s 12.0 -p -10.0 -C configall -A -D -T 1 $prondict $lattice
ERROR [+8251] ReadLattice: Word bEd:bEd not in dict
ERROR [+4013] HLRescore: can't read lattice
FATAL ERROR - Terminating program HLRescore
Needed to use transducer=1 to get the pfsg to print correctly with fsm-to-pfsg, but now the format is messing up HLRescore. Either change to transducer=0 in fsm-to-pfsg, or, use sed to change each term:term to term.
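The term:term rewrite can also be done in Python instead of sed. This is a minimal sketch, assuming each doubled label is a whole field value such as W=bEd:bEd with identical halves. The function name collapse_pairs is hypothetical, and any other field that happens to look like x:x would be collapsed too, so check the output before rescoring.

```python
import re

# A token made of two identical halves joined by ':', starting at the beginning
# of the line, after whitespace or after '=', and ending at whitespace or end of line.
PAIR = re.compile(r"(?<![^\s=])([^\s:=]+):\1(?!\S)")


def collapse_pairs(line):
    """Rewrite every term:term in a lattice line as term."""
    return PAIR.sub(r"\1", line)
```

Labels whose two halves differ (input and output symbols that do not match) are left untouched, which makes them easy to spot afterwards.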
HResults -I reference.mlf /dev/null decoded.mlf
ERROR [+6550] LoadHTKList: Label Name Expected
FATAL ERROR - Terminating program HResults
In the reference.mlf file, there exists either a blank line, or a digit without backslash or quotes, or something else unpalatable to HResults. Use HResults -f to figure out which utterance it‘s in (the one _after_ the last one listed), and fix it. For instance, put quotations around a number or backslash a quote, etc.
HResults -I reference.mlf /dev/null hypothesis.mlf
ERROR [+6570] Get LabelList: n[1] > numLists[0]
FATAL ERROR - Terminating program HResults
Run the command again with -f to show full results. Look in the reference mlf file at the utterance _after_ the last one listed before the error shows up. It's empty. Put something there or remove it (must be removed from reference).
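Empty utterances can also be located directly, without running HResults first. The sketch below assumes the usual MLF layout: a #!MLF!# header, then entries that start with a quoted file pattern and end with a line containing only a period. The function name empty_mlf_entries is hypothetical.

```python
def empty_mlf_entries(lines):
    """Return the names of MLF entries that contain no label lines."""
    empty = []
    name = None   # name of the entry currently being read, None between entries
    n_labels = 0
    for raw in lines:
        line = raw.strip()
        if name is None:
            # Between entries, only a quoted pattern opens a new one; the
            # #!MLF!# header and stray blank lines are skipped.
            if line.startswith('"'):
                name, n_labels = line.strip('"'), 0
        elif line == ".":
            if n_labels == 0:
                empty.append(name)
            name = None
        elif line:
            n_labels += 1  # any non-blank line inside an entry counts as a label
    return empty
```

Quoted labels such as "1" are counted as labels, not as new entries, because a name is only looked for between entries.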
HResults -I reference.mlf /dev/null hypothesis.mlf
ERROR [+6510] LOpen: Unable to open label file NBCTV_MORNING_20070111.lab
FATAL ERROR - Terminating program HResults
One of the utterances in the hypothesis mlf does not have a corresponding utterance in the reference mlf. Either it's missing entirely or the names don't match; check the spelling and capitalization of the filenames in the two mlfs.
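The two mlfs can be compared by name to find the offending utterance. This is a minimal sketch, assuming entries are matched on the file name with the directory pattern and extension removed (for example .lab in the reference and .rec in the hypothesis); the function names are hypothetical.

```python
import posixpath


def mlf_stems(lines):
    """Return the set of entry names in an MLF, without path or extension."""
    stems = set()
    inside = False  # True while reading the labels of an entry
    for raw in lines:
        line = raw.strip()
        if not inside and line.startswith('"'):
            path = line.strip('"')
            stems.add(posixpath.splitext(posixpath.basename(path))[0])
            inside = True
        elif inside and line == ".":
            inside = False
    return stems


def unmatched_hypotheses(ref_lines, hyp_lines):
    """Hypothesis utterances with no reference entry of the same name."""
    return sorted(mlf_stems(hyp_lines) - mlf_stems(ref_lines))
```

The comparison is case-sensitive on purpose, so a name that differs only in capitalization shows up as unmatched.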
HERest -A -D -T 10 -C configall -C hmmadapt6-1/config_adapt -S adapt6.list -I train.triphone.new.mlf -H hmmadapt6-1/hmmdefs.16 -H hmmadapt6-1/glob -K hmmadapt6-2 mllr -u a tiedlist
ERROR [+999] Components missing from Base Class list (4630 74080)
ERROR [+999] BaseClass check failed
FATAL ERROR - Terminating program HERest
Trying to do adaptation. Built regression tree using instructions here. But because my hmm definitions were incremented to 16 mixes, I had to change the last line of the global file to: CLASS 1 {*.state[2-4].mix[1-16]}
HERest -A -D -T 10 -C configall -C hmmadapt6-1/config_adapt -S adapt6.list -I train.triphone.new.mlf -H hmmadapt6-1/hmmdefs.16 -H hmmadapt6-1/glob -K hmmadapt6-2 mllr -u a tiedlist
ERROR [+999] Output xform mask *.%%% does not match filename data2/20001220_1530_1600_NTV_ARB/20001220_1530_1600_NTV_ARB_3.mfcc
FATAL ERROR - Terminating program HERest
Added -h data2*_ARB_??.mfcc -H hmmadapt6-1/hmmdefs.16 -H hmmadapt6-1/glob -u a -K hmmadapt6-2 -M hmmadapt6-2 -d hmmadapt6-1 tiedlist
ERROR [+7060] InitHMMSet: Expected newline after 1'th HMM
ERROR [+2321] Initialise: MakeHMMSet failed
FATAL ERROR - Terminating program HERest
Time: 2024-10-25 02:27:04