Staff Machine Learning Engineer
I build large-scale ML systems that reach hundreds of millions of people. Over the past decade I've taken models from research notebooks to production infrastructure — training, serving, and the unglamorous reliability work in between. I care about systems simple enough to reason about and fast enough to disappear.
An inference runtime that cut p99 latency 3× for a 70B model. Open-sourced; now powers serving at three other labs.
A post-training quantization method that holds accuracy at 4-bit. Paper + library, 4k★ on GitHub.
An interactive tool for visualizing attention inside transformer layers.
Lead the training infrastructure for a 70B-parameter foundation model. Cut training cost 40% with a custom data pipeline and brought time-to-first-token down 3× in production serving.
Built the recommendation system powering the home feed for 200M+ daily users, lifting engagement 18%. Owned the online feature store end to end.
Shipped the company's first on-device vision model for real-time object detection on mobile, running at 30fps under 50MB.
Stanford University · Focus: Machine Learning
UC Berkeley
Bouldering, Generative art, Espresso