I hope you’re doing well. I’m working on a speech-to-text project using the CommonVoice dataset. I’d like to use seq2seq models for this, but I’m not sure how to implement them. Could you point me to a resource or share similar code? Any guidance on the necessary steps would be greatly appreciated. Many thanks in advance for your support!

