My research on ML systems efficiency spans three main areas:
developing efficient inference methods, addressing resource constraints
in multilingual settings, and building practical applications for social impact.
ML Efficiency & Systems
Steven Kolawole*, Lucio Dery*, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
under review
Presents Bonsai, a forward-pass-only structured pruning method that outperforms gradient-based approaches
while using 3× less memory, making model compression accessible on everyday hardware.
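As a rough illustration of the forward-pass-only idea (not Bonsai's actual algorithm), here is a minimal sketch of scoring and pruning structural units with nothing but inference calls; the toy model and masking scheme are stand-ins:

```python
# Hypothetical illustration of forward-pass-only importance scoring for structured pruning.
# Not the Bonsai algorithm itself; just the core idea of ranking units without gradients.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
x, y = torch.randn(128, 16), torch.randint(0, 4, (128,))
loss_fn = nn.CrossEntropyLoss()

@torch.no_grad()
def loss_with_mask(mask):
    # mask: (64,) multiplier applied to the hidden layer's outputs
    h = torch.relu(model[0](x)) * mask
    return loss_fn(model[2](h), y).item()

base = loss_with_mask(torch.ones(64))

# Importance of unit i = loss increase when unit i alone is dropped (forward passes only).
scores = []
for i in range(64):
    mask = torch.ones(64)
    mask[i] = 0.0
    scores.append(loss_with_mask(mask) - base)

# Prune the 32 least important hidden units by zeroing their weights.
keep = set(torch.tensor(scores).argsort(descending=True)[:32].tolist())
drop = [i for i in range(64) if i not in keep]
model[0].weight.data[drop] = 0.0
model[0].bias.data[drop] = 0.0
model[2].weight.data[:, drop] = 0.0
print(f"pruned {len(drop)} units; loss {base:.3f} -> {loss_with_mask(torch.ones(64)):.3f}")
```

The appeal of this style of method is that importance estimates come from loss changes measured with inference alone, so no backward passes or optimizer state are needed.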
Nnaemeka Obiefuna, Samuel Oyeneye, Similoluwa Odunaiya, Iremide Oyelaja, Steven Kolawole
under review
Comprehensive benchmarking study examining the computational and energy overhead that privacy-preserving techniques
add to deep learning systems; conducted with independent student researchers at ML Collective.

Steven Kolawole, Keshav Santhanam, Virginia Smith, Pratiksha Thaker
NeurIPS 2025 Datasets & Benchmarks Track
Introduces PARALLELPROMPT, a benchmark revealing that 10% of natural user queries contain latent parallelism.
Demonstrates semantic decomposition methods achieving up to 5× speedups without hardware modifications.
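For a sense of what exploiting latent parallelism looks like, the hedged sketch below decomposes a list-style query into sub-prompts and issues them concurrently; the decomposition rule and the `call_llm` stub are illustrative assumptions, not the benchmark's pipeline:

```python
# Hypothetical sketch of exploiting latent parallelism in a prompt: split a
# "do X for each of A, B, C" query into independent sub-queries, run them
# concurrently, and merge. `call_llm` and `decompose` are stand-ins.
import asyncio

async def call_llm(prompt: str) -> str:
    await asyncio.sleep(1.0)          # stand-in for one model call's latency
    return f"<answer to: {prompt}>"

def decompose(query: str) -> list[str]:
    # Toy rule: "Task for each of: a; b; c" -> one sub-prompt per item.
    task, _, items = query.partition(":")
    return [f"{task.strip()}: {item.strip()}" for item in items.split(";")]

async def main():
    query = "Write a one-line summary for each of: paper A; paper B; paper C"
    sub_prompts = decompose(query)
    answers = await asyncio.gather(*(call_llm(p) for p in sub_prompts))  # parallel, not serial
    print("\n".join(answers))  # total latency ~1 call instead of ~3

asyncio.run(main())
```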
Duncan Soiffer, Steven Kolawole, Virginia Smith
EMNLP 2025 Industry Track
Extends agreement-based cascading to open-ended generation tasks, leveraging meaning-level consensus for cost-effective
routing of language model queries without requiring additional training data or model modifications.
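A minimal sketch of the meaning-level consensus idea, assuming a placeholder similarity function and stubbed models rather than the paper's actual components:

```python
# Hypothetical sketch of meaning-level agreement for cascade routing on generation:
# sample several answers from a cheap model, check whether they agree semantically,
# and escalate to an expensive model only when they do not. The similarity function
# and both model calls are placeholders, not the paper's implementation.
from itertools import combinations

def similarity(a: str, b: str) -> float:
    # Stand-in for a semantic similarity model (e.g., embedding cosine): word Jaccard.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def route(query: str, cheap_model, expensive_model, k: int = 3, threshold: float = 0.6) -> str:
    samples = [cheap_model(query) for _ in range(k)]
    pairwise = [similarity(a, b) for a, b in combinations(samples, 2)]
    consensus = sum(pairwise) / len(pairwise)
    if consensus >= threshold:              # samples agree -> trust the cheap model
        return samples[0]
    return expensive_model(query)           # samples disagree -> escalate

# Toy usage with stubbed models:
cheap = lambda q: "Paris is the capital of France."
expensive = lambda q: "The capital of France is Paris."
print(route("What is the capital of France?", cheap, expensive))
```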
Steven Kolawole*, Don Dennis*, Ameet Talwalkar, Virginia Smith
TMLR 2025
Develops a training-free cascading framework using ensemble agreement as a confidence signal for routing.
Achieves 2-25× cost reductions while maintaining or improving accuracy across diverse tasks.
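A minimal sketch of agreement-based routing for a classification-style task, with stubbed models standing in for the real cascade:

```python
# Hypothetical sketch of agreement-based cascading for classification: query an
# ensemble of small models first and fall back to a large model only when the
# ensemble disagrees. Model calls are stubs; the paper's routing rule may differ.
from collections import Counter

def cascade(x, small_models, large_model, min_agree: float = 1.0):
    votes = Counter(m(x) for m in small_models)
    label, count = votes.most_common(1)[0]
    if count / len(small_models) >= min_agree:   # unanimous -> return the cheap answer
        return label, "small-ensemble"
    return large_model(x), "large-model"         # disagreement -> pay for the big model

# Toy usage: three cheap classifiers and one expensive one.
smalls = [lambda x: "positive", lambda x: "positive", lambda x: "negative"]
large = lambda x: "positive"
print(cascade("great movie, would watch again", smalls, large))
```

The routing rule is training-free: disagreement among the cheap models is the only confidence signal, so no router needs to be learned.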
Resource-Constrained NLP
Mardiyyah Oduwole, Oluwatosin Olajide, Jamiu Suleiman, Faith Hunja, Busayo Awobade, Fatimo Adebanjo, Comfort Akanni, Chinonyelum Igwe, Peace Ododo, Promise Omoigui, Abraham Owodunni, Steven Kolawole
arXiv preprint, 2025
Comprehensive study on data augmentation techniques for African languages, showing that methods like back-translation and
token replacement can significantly improve translation quality in low-resource settings.
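As a hedged example of one such method, the sketch below shows back-translation generating synthetic parallel pairs from monolingual text; the language pair and the translation stub are assumptions for illustration, not the paper's models:

```python
# Hypothetical sketch of back-translation for a low-resource pair (English -> Yoruba
# used only as an example): a reverse model turns monolingual target-side text into
# synthetic source sentences, yielding extra (synthetic source, real target) pairs
# for training the forward model. The translation function is a placeholder.
def translate_yor_to_en(text: str) -> str:     # reverse model (stub)
    return f"<en back-translation of: {text}>"

def back_translate(monolingual_yoruba: list[str]) -> list[tuple[str, str]]:
    # Each real Yoruba sentence gets a synthetic English source via the reverse model.
    return [(translate_yor_to_en(y), y) for y in monolingual_yoruba]

synthetic_pairs = back_translate(["sentence 1 in Yoruba", "sentence 2 in Yoruba"])
for src, tgt in synthetic_pairs:
    print(src, "->", tgt)
```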
Busayo Awobade*, Mardiyyah Oduwole*, Steven Kolawole*
ICLR 2024 (AfricaNLP)
Investigates compression techniques on AfriBERTa, demonstrating that pruning, knowledge distillation,
and quantization remain effective in the "low-resource double-bind" of small-data language models.
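For readers unfamiliar with these techniques, here is a generic PyTorch sketch of magnitude pruning plus post-training dynamic quantization on a toy network; this is an illustration of the techniques in general, not the paper's AfriBERTa setup:

```python
# Hypothetical, generic illustration of two of the compression techniques studied
# (magnitude pruning and post-training dynamic quantization) on a stand-in model.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))

# Unstructured magnitude pruning: zero out the 50% smallest-magnitude weights.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")          # make the pruning permanent

# Post-training dynamic quantization: int8 weights for Linear layers at inference time.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 128)
print(quantized(x))
```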
Colin Leong, Herumb Shandilya, Bonaventure Dossou, Atnafu Tonja, Joel Mathew, Abdul-Hakeem Omotayo, Oreen Yousuf, Zainab Akinjobi, Chris Emezue, Shamsudeen Muhammad, Steven Kolawole, Younwoo Choi, Tosin Adewumi
ICLR 2023 (AfricaNLP)
Explores efficient training methods for African language processing under computational constraints,
addressing the challenge of limited data and limited compute resources simultaneously.
Applied ML & Social Good
Nahid Alam*, Steven Kolawole*, Simardeep Sethi*, Nishant Bansali, Karina Nguyen
arXiv preprint, 2023
Comprehensive survey examining how Vision Transformers can be optimized for mobile deployment,
analyzing architecture modifications and efficiency techniques for resource-constrained environments.
Steven Kolawole, Opeyemi Osakuade, Nayan Saxena, Babatunde Kazeem Olorisade
IJCAI 2022 AI for Social Good Track
Develops a sign-to-speech system for Nigerian Sign Language to bridge communication gaps, combining computer vision and NLP techniques to translate sign language videos into spoken language.
The work earned several awards, including the national AI Champion award from the Nigeria Computer Society, demonstrating practical AI for social impact.
* denotes equal contribution
This list includes peer-reviewed publications, workshop papers, and preprints.
For the most up-to-date list with citation counts, please visit my Google Scholar profile.