Open source · ML compiler

Emmy

An open-source compiler that lowers PyTorch graphs to CUDA through six inspectable intermediate representations — about 5,000 lines of Python, built from scratch.

Read the code
# install
pip install deplodock

# compile a layer to CUDA
deplodock compile -c "nn.RMSNorm(2048)(torch.randn(1,32,2048))"

The package installs as deplodock for now; it is being renamed to Emmy.

The pipeline

A graph is lowered through six intermediate representations, each of which can be printed and inspected. Scheduling comes from a search over Tile IR rewrite rules rather than fixed heuristics.
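
To make "search over rewrite rules" concrete, here is a minimal toy sketch, not Emmy's implementation. It assumes a schedule is just a (tile_rows, tile_cols) pair over a 32 x 2048 loop nest, that the only rewrite rules are "halve rows" and "halve columns", and that a made-up cost model scores each candidate. The rule names, the cost function, and the 1024-element tile budget are all hypothetical; Emmy's real rules, cost model, and search strategy are not described here.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

# Toy stand-in for a Tile IR schedule: (tile_rows, tile_cols).
Schedule = Tuple[int, int]

ROWS, COLS = 32, 2048      # loop extents, matching the RMSNorm example shape
MAX_TILE_ELEMS = 1024      # assumed per-tile budget (hypothetical)


def halve_rows(s: Schedule) -> Optional[Schedule]:
    """Hypothetical rewrite rule: split the row tile in two."""
    return (s[0] // 2, s[1]) if s[0] > 1 else None


def halve_cols(s: Schedule) -> Optional[Schedule]:
    """Hypothetical rewrite rule: split the column tile in two."""
    return (s[0], s[1] // 2) if s[1] > 1 else None


RULES: List[Callable[[Schedule], Optional[Schedule]]] = [halve_rows, halve_cols]


def cost(s: Schedule) -> int:
    """Made-up cost model: fewer tiles is better, wide tiles are preferred,
    and tiles over the element budget are heavily penalized."""
    tile_rows, tile_cols = s
    num_tiles = (ROWS // tile_rows) * (COLS // tile_cols)
    over_budget = 1_000_000 if tile_rows * tile_cols > MAX_TILE_ELEMS else 0
    return num_tiles + tile_rows + over_budget


def search(start: Schedule) -> Schedule:
    """Breadth-first search over every schedule reachable via RULES,
    returning the one with the lowest cost."""
    seen = {start}
    queue = deque([start])
    best = start
    while queue:
        state = queue.popleft()
        if cost(state) < cost(best):
            best = state
        for rule in RULES:
            nxt = rule(state)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return best


if __name__ == "__main__":
    best = search((ROWS, COLS))
    print(best, cost(best))  # (1, 1024) 65
```

The point of the sketch is the shape of the approach: the schedule is never chosen by a hand-written heuristic. Changing the rules or the cost model changes the result without touching the search loop.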

Torch IR → Tensor IR → Loop IR → Tile IR → Kernel IR → CUDA
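
The staged design above can be pictured with a small sketch. It is an illustration only: the stage names come from the pipeline, but `make_toy_pass` and `run_pipeline` are hypothetical helpers, and the "lowering" here merely tags a string rather than transforming a real program. What it shows is the structure that makes every stage inspectable: a snapshot is kept after each pass.

```python
from typing import Callable, List, Tuple

# Stage names from the pipeline above; everything else is a toy stand-in.
STAGES = ["Torch IR", "Tensor IR", "Loop IR", "Tile IR", "Kernel IR", "CUDA"]


def make_toy_pass(target: str) -> Callable[[str], str]:
    """Hypothetical pass: tags the program text with the stage it was lowered to."""
    def lower(text: str) -> str:
        return f"{target} <- [{text}]"
    return lower


def run_pipeline(source: str) -> List[Tuple[str, str]]:
    """Lower `source` stage by stage, keeping a printable snapshot of each IR."""
    snapshots = [(STAGES[0], source)]
    current = source
    for stage in STAGES[1:]:
        current = make_toy_pass(stage)(current)
        snapshots.append((stage, current))
    return snapshots


if __name__ == "__main__":
    # Print every intermediate form, from the input graph down to CUDA.
    for stage, text in run_pipeline("rms_norm(x)"):
        print(f"== {stage} ==")
        print(text)
```

Because each pass only consumes the previous stage's output, any single stage can be dumped, diffed, or tested in isolation.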

Built in the open, under Apache-2.0.