Live transcription with word-level timestamps

I'm new to machine learning. My task is to transcribe audio from a microphone live and to produce word-level timestamps for that live transcription, rather than utterance-level ones.

I have tried the following projects.

For live transcription, I found these projects on GitHub:
1: collabora/WhisperLive
2: ggerganov/whisper.cpp
3: ufal/whisper_streaming
All three of these projects produce timestamps at the utterance level.

For word-level timestamps:
1: linto-ai/whisper-timestamped
2: m-bain/whisperX
3: ggerganov/whisper.cpp
I think these projects do not support live transcription; they only transcribe pre-recorded audio files.

I am not sure whether any of the above projects can do both (live transcription and word-level timestamps) at the same time.

Is there any project or algorithm that does both?

Maybe the Whisper model is not a good fit for this?
