I’m new to machine learning. My task is to transcribe audio from a microphone live and to produce word-level timestamps for that live transcription, rather than utterance-level ones.
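To make the goal concrete, here is a minimal sketch of the output I am after. The class names, field names, and timing values are made up for illustration and do not come from any of the projects listed below.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the stream (illustrative value)
    end: float


@dataclass
class Utterance:
    text: str
    start: float  # utterance-level: one start/end pair for the whole phrase
    end: float
    words: List[Word]  # word-level: one start/end pair per word


# Hypothetical example data, not real model output.
utterance = Utterance(
    text="hello world",
    start=0.0,
    end=1.2,
    words=[Word("hello", 0.0, 0.5), Word("world", 0.7, 1.2)],
)


def word_level(utt: Utterance) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) tuples, the granularity I need live."""
    return [(w.text, w.start, w.end) for w in utt.words]


print(word_level(utterance))
```

The utterance-level view only gives the `start` and `end` of the whole phrase, whereas I need the per-word entries while the audio is still streaming.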
I have looked at the following projects.
For live transcription, I found these on GitHub:
1: collabora/WhisperLive
2: ggerganov/whisper.cpp
3: ufal/whisper_streaming
All three of these projects produce timestamps at the utterance level.
For word-level timestamps:
1: linto-ai/whisper-timestamped
2: m-bain/whisperX
3: ggerganov/whisper.cpp
I think these projects do not support live transcription; they only transcribe pre-recorded audio files.
I am not sure whether any of the projects above can do both (live transcription and word-level timestamps) at the same time.
Is there any project or algorithm that does both?
Or is the Whisper model perhaps not well suited to this?