Steven Kolawole
PhD Student, Carnegie Mellon University
Email: skolawol[at]cs[dot]cmu[dot]edu

I'm a PhD student at CMU, grateful to be advised by Virginia Smith. I work on making large language models cheaper and faster to run, on everyday hardware and at production scale.

So far I've pruned LLMs with only forward passes (Bonsai), evicted from the KV cache without ever building the attention matrix (EpiKV), built cascades where cheap models handle most queries and only defer when they're unsure (ABC), and found the parallelism sitting in user prompts (ParallelPrompt). The general intuition behind my work so far is to notice and work with what's already there, instead of adding more machinery.

This summer I'm on AWS Bedrock's Science team, working with Nathan Pemberton and Kyle Ulrich on speculative decoding.

Selected Publications

One per layer of the inference stack. Full list →

Model · compression (Bonsai)
Steven Kolawole*, Lucio Dery*, Jean-François Kagy, Virginia Smith, Graham Neubig, Ameet Talwalkar
arXiv, 2024
what it does
Prunes LLMs using only forward passes, instead of the gradients backprop needs. It still matches gradient-based methods, and prunes 7-8B models on a single GPU.
Memory · KV cache (EpiKV)
Steven Kolawole, Virginia Smith
preprint, 2026
what it does
Decides what to evict from the KV cache by how much a token's representation changes, not by its attention weights. So it never builds the attention matrix, runs inside FlashAttention stacks, and fits several times more context.
Serving · adaptive routing (ABC)
Steven Kolawole*, Don Dennis*, Ameet Talwalkar, Virginia Smith
TMLR, 2025
what it does
Stacks models from cheap to expensive and only defers to a bigger one when the cheap ones disagree. No confidence scores, no trained router. It drops into an existing setup and cuts serving cost several-fold at the same accuracy.
Workload · parallelism (ParallelPrompt)
Steven Kolawole, Keshav Santhanam, Virginia Smith, Pratiksha Thaker
NeurIPS 2025, Datasets & Benchmarks Track
what it does
Lots of real prompts contain sub-tasks that don't depend on each other, but systems still run them one by one. I built the tooling (a C++ engine, schema extraction, vLLM/SGLang hooks) to run them in parallel: up to 7× faster while keeping >90% of output quality.
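
To make the deferral rule in the adaptive routing entry concrete, here is a minimal Python sketch of agreement-based deferral. It is an illustration under stated assumptions, not the paper's implementation: models are treated as plain callables from prompt to answer, each tier is a small group of such models, and `cascade_predict`, `min_agreement`, and the toy models are hypothetical names invented for this example.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical interface for this sketch: a model maps a prompt to an answer string.
Model = Callable[[str], str]


def cascade_predict(
    prompt: str,
    tiers: Sequence[Sequence[Model]],
    min_agreement: float = 1.0,
) -> Tuple[str, int]:
    """Return (answer, tier_index) from the first tier whose models agree enough.

    Tiers are ordered from cheapest to most expensive. The last tier always
    answers, since there is nothing left to defer to.
    """
    if not tiers or any(len(tier) == 0 for tier in tiers):
        raise ValueError("need at least one tier, and no tier may be empty")
    for i, tier in enumerate(tiers):
        # Majority vote within the tier; the vote share is the agreement signal.
        votes = Counter(model(prompt) for model in tier)
        answer, count = votes.most_common(1)[0]
        is_last = i == len(tiers) - 1
        if is_last or count / len(tier) >= min_agreement:
            return answer, i
    raise AssertionError("unreachable: the last tier always returns")


# Toy stand-ins for real models (assumed behavior, for illustration only).
cheap_a = lambda p: "4" if p == "2+2" else "7"
cheap_b = lambda p: "4" if p == "2+2" else "9"
big = lambda p: "4" if p == "2+2" else "8"
tiers = [[cheap_a, cheap_b], [big]]

print(cascade_predict("2+2", tiers))  # cheap tier agrees, answered at tier 0
print(cascade_predict("3+5", tiers))  # cheap tier disagrees, deferred to tier 1
```

The point of the sketch is that the deferral signal is just whether the cheap models agree, so no confidence scores and no trained router appear anywhere in it.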

Community

I lead ML Collective Africa, a pan-African sub-community of ML Collective, where we back under-served, early-career researchers with research focus groups, mentorship, training, and compute, so they can do good research wherever they're starting from. Plenty have gone on to top venues and strong grad programs, but that's the byproduct, not the point. Since 2022 I've organized annual fundraisers that send African students to the Deep Learning Indaba, and I occasionally contribute to Black in AI's ELAI program.

From 2023 to 2025, I mentored 20+ underrepresented graduate-school aspirants at STEM for Development. During undergrad, I helped run dozens of technical training programs reaching a "bunch" of students, and personally taught several hundred of them.

Outside research, I enjoy powerlifting, amateur boxing, watching LFC matches, and reading widely. I'm always up for conversations that connect technical work with broader social impact.

The Four (in practice, Six) Years of BSc: the long, human version, if you're curious
Speaking at PyCon Italia, one of the one million and one. :)

Before CMU, I did my BSc in Computer Science at the Federal University of Agriculture, Abeokuta (FUNAAB), Nigeria, advised by Adebayo Abayomi-Alli. Academic-union strikes and the COVID lockdown stretched a four-year degree into six. Fortunately, that gave me far more room than a typical undergrad to figure out what I actually cared about.

I used it: around 27 tech conferences across the globe, a pile of hackathons and internships, and, from early 2021, learning to do research independently with ML Collective (with cameos at Masakhane and Cohere For AI), mentored by Rosanne Liu and Jason Yosinski. My first finished project, on sign-language understanding, won the Nigeria Computer Society's National AI Champion award; later, friends and I built a real-time opinion-mining system for digital assets and won a ~$115K Algorand Foundation grant to ship it, which let me keep doing independent research without worrying much about rent. :)

I'm a "community-taught" ML practitioner, so a lot of those years went to giving back: Data Science Nigeria, Google Developer Student Clubs, She Code Africa, NACOS, and ML Collective. That unconventional path of building research capability outside formal structures is exactly what drives my work on AI efficiency and accessibility, and my community-building efforts.

For the human record: in the earliest years I directed choirs at local churches (vocals, drums, piano); my BSc was also roundly punctuated by real existential crises [1, 2]; and my final year saw me trade award-worthy social awkwardness for a minor reputation as a local clown, obsessively fine-tuning my Afro-pop dance moves. I did become a reveler, though.