Live transcription with word-level timestamps

I'm new to machine learning. My task is to transcribe audio from a microphone live and to produce word-level timestamps for that live transcription, rather than utterance-level ones.

I tried the following projects.

For live transcription, I found these projects on GitHub:
1: collabora/WhisperLive
2: ggerganov/whisper.cpp
3: ufal/whisper_streaming
All three of these projects produce timestamps at the utterance level.

For word-level timestamps:
1: linto-ai/whisper-timestamped
2: m-bain/whisperX
3: ggerganov/whisper.cpp
As far as I can tell, these projects do not support live transcription. They only transcribe pre-recorded audio files.

I am not sure whether any of the above projects can do both (live transcription and word-level timestamps) at the same time.
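
To make clear what output I am after, here is a minimal sketch in plain Python. It assumes that some recognizer (not shown, and not tied to any of the projects above) returns words with timestamps relative to the start of each audio chunk, and that I know the offset of each chunk within the live stream. The names `Word` and `to_stream_time` are my own, purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    text: str
    start: float  # seconds since the live stream began
    end: float    # seconds since the live stream began


def to_stream_time(
    chunk_words: List[Tuple[str, float, float]],
    chunk_offset: float,
) -> List[Word]:
    """Shift chunk-relative word timestamps onto the live stream's timeline.

    chunk_words: (text, start, end) tuples, times relative to the chunk start.
    chunk_offset: where the chunk begins in the stream, in seconds.
    """
    return [
        Word(text, round(start + chunk_offset, 3), round(end + chunk_offset, 3))
        for text, start, end in chunk_words
    ]


# Hypothetical recognizer output for a chunk that starts 5 s into the stream.
words = to_stream_time([("hello", 0.2, 0.6), ("world", 0.7, 1.1)], chunk_offset=5.0)
```

So for every word spoken into the microphone I would like its own start and end time on the stream's timeline, not one time range for the whole utterance.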

Is there any project or algorithm that does both?

Or is the Whisper model perhaps not well suited to this?

