Open source · ML compiler
Emmy
An open-source compiler that lowers PyTorch graphs to CUDA through six inspectable intermediate representations — about 5,000 lines of Python, built from scratch.
Read the code
# install
pip install deplodock

# compile a layer to CUDA
deplodock compile -c "nn.RMSNorm(2048)(torch.randn(1,32,2048))"
Installs as deplodock for now — the package is being renamed to Emmy.
The pipeline
A graph is lowered through six intermediate representations, each one printable. Scheduling comes from a search over Tile-IR rewrite rules rather than fixed heuristics.
Torch IR → Tensor IR → Loop IR → Tile IR → Kernel IR → CUDA
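To make "a search over rewrite rules rather than fixed heuristics" concrete, here is a minimal toy sketch of the idea. It is not Emmy's implementation or API: `Schedule`, `double_tile`, `double_vector`, `toy_cost`, and `search` are hypothetical names invented for illustration, and the cost function is made up rather than measured on hardware. Each rule rewrites one schedule into another, and the search explores every schedule reachable through the rules and keeps the cheapest.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Schedule:
    """Hypothetical stand-in for a tiled loop over `extent` elements."""
    extent: int
    tile: int = 1    # elements handled per thread block
    vector: int = 1  # elements moved per memory instruction


# A rewrite rule returns a new schedule, or None when it does not apply.
Rule = Callable[[Schedule], Optional[Schedule]]


def double_tile(s: Schedule) -> Optional[Schedule]:
    new_tile = s.tile * 2
    if new_tile <= 1024 and s.extent % new_tile == 0:
        return replace(s, tile=new_tile)
    return None


def double_vector(s: Schedule) -> Optional[Schedule]:
    new_vector = s.vector * 2
    if new_vector <= 4 and s.tile % new_vector == 0:
        return replace(s, vector=new_vector)
    return None


def toy_cost(s: Schedule) -> int:
    """Made-up cost: fewer blocks and wider loads are cheaper,
    but tiles larger than 256 elements pay a penalty."""
    blocks = s.extent // s.tile
    loads = s.extent // s.vector
    penalty = max(0, s.tile - 256) * 4
    return blocks * 10 + loads + penalty


def search(start: Schedule, rules: Sequence[Rule],
           cost: Callable[[Schedule], int]) -> Schedule:
    """Exhaustively explore schedules reachable via the rules; return the cheapest."""
    seen = {start}
    frontier = [start]
    best = start
    while frontier:
        current = frontier.pop()
        if cost(current) < cost(best):
            best = current
        for rule in rules:
            candidate = rule(current)
            if candidate is not None and candidate not in seen:
                seen.add(candidate)
                frontier.append(candidate)
    return best


best = search(Schedule(extent=2048), [double_tile, double_vector], toy_cost)
print(best, toy_cost(best))
```

The point of the design is that no tile size is hard-coded: the rules only define which rewrites are legal, and the cost function decides among them. A real compiler would typically replace the toy cost with a measured or modelled one and prune the search instead of enumerating every state.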
Benchmarks
Measured on consumer-class GPUs. Full methodology is in the blog series.
Built in the open, under Apache-2.0.