About

Hi, I'm Andrey

I'm a software engineer specialising in production LLM inference, AI infrastructure and distributed systems. I enjoy turning complex systems into something straightforward, dependable and easy to understand.

At Severstal, I am responsible for the architecture and technical development of DaVinci, a shared GenAI platform that supports enterprise AI products and coding agents. Its inference foundation currently runs self-hosted open-weight models, served with vLLM on Kubernetes across 24 NVIDIA H200 GPUs.

My work covers model serving, traffic routing and admission control, performance and reliability, observability, capacity planning, safe model rollout and multi-data-centre resilience. I also designed the target topology and resilience strategy for a planned expansion to 48 H200 and 8 H100 GPUs across two data centres.

Background

Before specialising in AI infrastructure, I built and scaled production systems in SaaS, fintech, data privacy, payments and high-traffic consumer products.

My experience includes:

modernising a $3M+ ARR SaaS platform and improving critical backend paths by 2–10x, while achieving 99.998% availability;
designing and launching a content platform that reached over 20 million monthly active users;
building backend, cloud and distributed systems for start-ups and established companies in Germany and Russia;
working in senior individual contributor, technical leadership and CTO roles.

I have consistently taken responsibility for important, technically complicated systems that are expected to work reliably in production.

Writing and contact

On this site, I write about AI infrastructure, LLM inference, distributed systems and the practical lessons I have learned from operating production platforms.

You can find my projects on GitHub, connect with me on LinkedIn, or download my résumé.