Segmental Contrastive Predictive Coding for speech segmentation
Does anyone know of an existing Python implementation (PyTorch or TensorFlow) of this paper, which uses contrastive learning to segment speech, that I could reference? I’m trying to train a model from scratch to segment audio into syllables. So far I have only found implementations of contrastive approaches in other domains (e.g. image segmentation).