I work at the intersection of machine intelligence and human understanding. My research centres on interpretability — making the systems we build legible to the people who use them. I write essays on technology and attention, and I believe the most consequential engineering decisions are, at heart, decisions about values.

Selected Work

The Legible Machine

Author — 2023

A book on making intelligent systems understandable to the people they affect. Translated into six languages.

Circuits

Lead — 2021

Open tooling for the mechanistic interpretation of transformer circuits — now standard equipment in the field.

On Attention

Essayist

An ongoing series of essays on technology, attention, and the ethics of automation.

Experience

Lead Research Engineer

2020–Present

The Aurora Institute

Lead the interpretability group. Published widely cited work on the mechanistic interpretation of transformer circuits, and built the open tooling the field now uses to inspect them.

Visiting Researcher

2018–2020

MIT Media Lab

Studied human–model interaction and how explanations change the way people trust automated systems.

Software Engineer

2015–2018

Wolfram Research

Worked on symbolic computation and technical documentation systems.

Education

Ph.D., Computer Science

University of Oxford · Thesis on interpretable representations · 2014–2018

B.A., Mathematics

University of Cambridge · 2011–2014

Capabilities

Python · PyTorch · Interpretability · Research · Technical Writing · Mathematics · Causal Inference · Statistics

Recognition

  • ICML Outstanding Paper · ICML, 2022
  • Author, “The Legible Machine” · 2023
  • Fellow, Royal Society of Arts · 2021

Beyond work

Poetry · Long-distance running · Classical piano