Training Fastconformer-CTC on Kazakh Language
Hello, community. I just started working with NeMo, and I want to train an ASR model based on FastConformer-CTC using 100 hours of recorded calls in the Kazakh language. Which approach would be better for my situation: fine-tuning a pre-trained model or training from scratch? Also, can someone share information about data processing and the manifest format?
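For context on the manifest part of the question: NeMo ASR manifests are generally JSON-lines files, one object per utterance, with `audio_filepath`, `duration` (in seconds), and `text` fields. Below is a minimal sketch of building one, assuming the calls are already cut into mono PCM WAV files with matching transcripts. The helper names `wav_duration_seconds` and `write_manifest` are hypothetical and not part of NeMo, and any text normalization beyond trimming whitespace is left out.

```python
import json
import wave
from pathlib import Path


def wav_duration_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def write_manifest(samples, manifest_path):
    """Write one JSON object per line with audio_filepath, duration, text.

    `samples` is an iterable of (wav_path, transcript) pairs; this pairing
    is an assumption about how the data is organized.
    """
    with open(manifest_path, "w", encoding="utf-8") as manifest:
        for wav_path, transcript in samples:
            entry = {
                "audio_filepath": str(Path(wav_path).resolve()),
                "duration": round(wav_duration_seconds(wav_path), 3),
                "text": transcript.strip(),
            }
            # ensure_ascii=False keeps Kazakh Cyrillic text readable in the file
            manifest.write(json.dumps(entry, ensure_ascii=False) + "\n")
```

Calling `write_manifest([("call_0001.wav", "сәлем")], "train_manifest.json")` would produce a single-line manifest; the file names here are placeholders.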
Semantic chunking of audio
I need to perform semantic chunking of a video. So far, I have resampled the audio to 16,000 Hz and then used wav2vec2 to transcribe it.