About

An applied inference optimization lab

RiftStack works on the software layer between models and hardware. We build Emmy, a compiler that generates fast kernels, and a serving stack that runs them efficiently, so the same model costs less and responds faster on the GPUs teams already have.

Team

Dmitry Trifonov

Founder

Slawomir Strumecki

Founding engineer

Dimitrios Verranos

Founding engineer

Heiko Polinski

Marketing / Design

Daksh Kaushik

Engineer

Yashasvi Gupta

Engineer

Natalia Trifonova

Engineer

Ivan Oleynikov

Engineer

Track record

A small team building low-level systems for performance-critical software, with backgrounds at Apple, Roblox, and Ubisoft.

50–60% faster than cuBLAS

Batched FP32 matrix multiplication (SGEMM) on the RTX 5090. For non-batched SGEMM, cuBLAS is still faster.

1,157 tok/s

Qwen3 Coder on a single consumer-class RTX 5090.

ML compiler from scratch

Emmy: tracing, fusion, scheduling, and CUDA code generation, written in Python.