Efficient Compression of Multitask Multilingual Speech Models
Master Dissertation, Télécom Paris, Institut Polytechnique de Paris
TL;DR: We find that Whisper's speaker- and model-related biases are worsened even by light compression (quantization), especially for low-resource languages and small models. To address this, we propose a distilled, modular variant that markedly improves performance on under-represented languages while preserving Whisper's multilingual, multitask strengths with minimal overhead.
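
To make the "light compression" concrete, here is a minimal sketch of post-training dynamic INT8 quantization applied to a Whisper checkpoint via PyTorch and Hugging Face Transformers. The checkpoint name (`openai/whisper-small`) and the choice of dynamic quantization over linear layers are illustrative assumptions, not necessarily the exact setup evaluated in the dissertation.

```python
import os
import tempfile

import torch
from transformers import WhisperForConditionalGeneration

# Load a small multilingual Whisper checkpoint (illustrative choice).
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.eval()

# "Light" compression: post-training dynamic quantization, converting the
# weights of all linear layers to INT8 while keeping activations in float.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def disk_size_mb(m: torch.nn.Module) -> float:
    # Serialize the state dict to a temp file to measure on-disk size.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        torch.save(m.state_dict(), f.name)
        size = os.path.getsize(f.name) / 1e6
    os.remove(f.name)
    return size

print(f"fp32 checkpoint: {disk_size_mb(model):.0f} MB")
print(f"int8 (dynamic):  {disk_size_mb(quantized):.0f} MB")
```

Dynamic quantization is among the cheapest compression schemes to apply (no retraining or calibration data), which is what makes the finding notable: even this mild intervention can disproportionately degrade accuracy for low-resource languages.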