Hi, I'm Thomas Ferraz, a PhD Track student at École Polytechnique / Télécom Paris (Institut Polytechnique de Paris), specializing in Natural Language Processing (NLP) with a focus on large language models, multilingual NLP, and low-resource languages. I hold a Master’s degree in Applied Math & AI (Master MVA) from ENS Paris-Saclay and an engineering degree from the Universidade de São Paulo, where I graduated top of my class. My research aims to advance methods for building robust, efficient, and inclusive NLP systems, particularly for underrepresented languages. I have gained industry experience through research internships at Meta, Amazon, Apple, and NAVER Labs, where I contributed to projects on efficient ML, multilingual NLP, multilingual ASR, and LLM instruction following.
A selection of papers that reflect my main research focus and contributions.
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints
Thomas Palmeira Ferraz, Kartik Mehta, Yu-Hsiang Lin, Haw-Shiuan Chang, Shereen Oraby, Sijia Liu, Vivek Subramanian, Tagyoung Chung, Mohit Bansal, Nanyun Peng
EMNLP, 2024 & Sys2Reasoning @ NeurIPS, 2024
TL;DR: Introduces RealInstruct, a benchmark for evaluating LLMs on real multi-constrained instructions, and DeCRIM, a self-correction method that improves instruction following by decomposing requests and refining responses, enabling open LLMs to outperform GPT-4 given strong feedback.
Multilingual DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts
Thomas Palmeira Ferraz, Marcely Zanon Boito, Caroline Brun, Vassilina Nikoulina
ICASSP, 2024
TL;DR: Proposes a lightweight adaptation method to bridge the gap between small and large models on under-represented languages. It leverages language-specific experts and knowledge distillation from the larger model, outperforming fine-tuning and LoRA while adding minimal overhead.
ZeroBERTo: Leveraging Zero-Shot Text Classification by Topic Modeling
Alexandre Alcoforado, Thomas Palmeira Ferraz, Rodrigo Gerber, Enzo Bustos, André Seidel Oliveira, Bruno Miguel Veloso, Fabio Levy Siqueira, Anna Helena Reali Costa
PROPOR, 2022
TL;DR: Presents ZeroBERTo, a hybrid zero-shot text classification model that combines topic modeling with language models, overcoming input-size limitations and reducing runtime; it achieves a 12% higher F1 score and 13x faster inference than XLM-R on a Portuguese benchmark.