
I work at the intersection of machine intelligence and human understanding. My research centres on interpretability — making the systems we build legible to the people who use them. I write essays on technology and attention, and I believe the most consequential engineering decisions are, at heart, decisions about values.

Selected Work

The Legible Machine

Author — 2023

A book on making intelligent systems understandable to the people they affect. Translated into six languages.

Circuits

Lead — 2021

Open tooling for the mechanistic interpretation of transformer circuits — now standard equipment in the field.

On Attention

Essayist

An ongoing series of essays on technology, attention, and the ethics of automation.

Experience

Lead Research Engineer

2020–Present

The Aurora Institute

Lead the interpretability group. Published widely cited work on the mechanistic interpretation of transformer circuits, and built the open tooling the field now uses to inspect them.

Visiting Researcher

2018–2020

MIT Media Lab

Studied human–model interaction and how explanations change the way people trust automated systems.

Software Engineer

2015–2018

Wolfram Research

Worked on symbolic computation and technical documentation systems.

Education

Ph.D., Computer Science

University of Oxford · Thesis on interpretable representations · 2014–2018

B.A., Mathematics

University of Cambridge · 2011–2014

Capabilities

Python · PyTorch · Interpretability · Research · Technical Writing · Mathematics · Causal Inference · Statistics

Recognition

ICML Outstanding Paper · ICML, 2022
Author, “The Legible Machine” · 2023
Fellow, Royal Society of Arts · 2021

Beyond work

Poetry · Long-distance running · Classical piano