After running the example scripts (see Kaldi tutorial), you may want to set up Kaldi to run with your own data. This section explains how to prepare the data. This page will assume that you are using the latest version of the example scripts (typically named "s5" in the example directories, e.g. egs/rm/s5). In addition to this page, you can refer to the data preparation scripts in those directories.

The top-level scripts (e.g. egs/rm/s5/run.sh) have a few commands at the top of them that relate to various phases of data preparation. The parts in the sub-directory named local/ are always specific to the database. For example, in the Resource Management (RM) setup it is local/rm_data_prep.sh. In the case of RM these commands are:

    local/rm_data_prep.sh /export/corpora5/LDC/LDC93S3A/rm_comp || exit 1
    utils/prepare_lang.sh data/local/dict '!SIL' data/local/lang data/lang || exit 1

In the WSJ case the commands are:

    wsj0=/export/corpora5/LDC/LDC93S6B
    local/wsj_data_prep.sh $wsj0/?-.? || exit 1
    utils/prepare_lang.sh data/local/dict "" data/local/lang_tmp data/lang || exit 1

There are more commands after these in the WSJ script that relate to training language models locally (rather than using the ones supplied by LDC), but the ones above are the most important ones.

The output of the data preparation stage consists of two sets of things. One relates to "the data" (directories like data/train/) and one relates to "the language" (directories like data/lang/). The "data" part relates to the specific recordings you have, and the "lang" part contains things that relate more to the language itself, such as the lexicon, the phone set, and various extra information about the phone set that Kaldi needs. If you want to prepare data which you will decode with an already existing system and an already existing language model, the "data" part is all you need to touch.

As an example of the "data" part of the data preparation, look at the directory "data/train" in one of the example directories (assuming you have already run the scripts there). Note: there is nothing special about the directory name "data/train". There are other directories such as "data/eval2000" (for a test set) that have essentially the same format ("essentially" because we may have an "stm" and "glm" file in the test directory, to enable sclite scoring). The specific example we'll look at is the Switchboard recipe in egs/swbd/s5.

    s5# ls data/train
    cmvn.scp feats.scp reco2file_and_channel segments spk2utt text utt2spk wav.scp

Not all of the files are equally important.
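To make the "data" part more concrete, here is a minimal sketch of building a toy directory by hand with a few of the files named in the data/train listing above. Everything in it is an assumption for illustration: the directory name, the utterance and speaker IDs, the audio paths and the transcripts are made up, and the one-entry-per-line layouts shown are the commonly used Kaldi conventions for the simple case where no segments file is involved. Real recipes generate these files in their local/ scripts and normally rely on Kaldi helpers such as utils/utt2spk_to_spk2utt.pl and utils/validate_data_dir.sh rather than the plain awk used here.

```shell
#!/bin/sh
# Toy sketch: hand-build a minimal data directory.
# All IDs, paths and transcripts are hypothetical.
set -eu

dir="$(mktemp -d)/data/train_toy"   # hypothetical directory name
mkdir -p "$dir"

# wav.scp: <utterance-id> <path to audio> (assumed layout, no segments file)
cat > "$dir/wav.scp" <<'EOF'
spk1_utt1 /hypothetical/audio/spk1_utt1.wav
spk1_utt2 /hypothetical/audio/spk1_utt2.wav
spk2_utt1 /hypothetical/audio/spk2_utt1.wav
EOF

# text: <utterance-id> <transcript>
cat > "$dir/text" <<'EOF'
spk1_utt1 HELLO WORLD
spk1_utt2 GOOD MORNING
spk2_utt1 THANK YOU
EOF

# utt2spk: <utterance-id> <speaker-id>
cat > "$dir/utt2spk" <<'EOF'
spk1_utt1 spk1
spk1_utt2 spk1
spk2_utt1 spk2
EOF

# spk2utt carries the same information as utt2spk, grouped by speaker,
# so it can be derived rather than written by hand.
awk '{
       if ($2 in utts) utts[$2] = utts[$2] " " $1
       else            utts[$2] = $1
     }
     END { for (s in utts) print s, utts[s] }' "$dir/utt2spk" \
  | LC_ALL=C sort > "$dir/spk2utt"

# Basic consistency check: the same utterance IDs, in the same order,
# should appear in text and utt2spk.
[ "$(cut -d' ' -f1 "$dir/text")" = "$(cut -d' ' -f1 "$dir/utt2spk")" ] \
  || { echo "utterance IDs differ between text and utt2spk" >&2; exit 1; }

cat "$dir/spk2utt"
```

Prefixing each utterance ID with its speaker ID, as done here, keeps the files consistently sorted by both utterance and speaker under a plain byte-order sort.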