Andrey Krisanov

Production LLM inference, AI infrastructure, and distributed systems

Now

What I'm working on and learning right now. Last updated: June 2026.

Building

I'm leading the architecture and evolution of DaVinci, Severstal's shared GenAI platform.

My current focus areas:

  • designing multi-data-center inference, degraded operation, failover, and recovery
  • establishing model lifecycle, release gates, safe rollout, and rollback
  • improving inference observability, SLOs, load testing, and GPU capacity planning
  • evolving the AI gateway, model routing, quotas, and fallback policies
  • improving the performance and reliability of production LLM serving

The platform currently runs on 24 NVIDIA H200 GPUs and is expanding to 48 H200 and 8 H100 GPUs across two data centers.

Learning

I'm currently going deeper into:

  • distributed systems and multi-data-center architecture
  • LLM inference performance and GPU serving
  • Kubernetes networking, scheduling, and reliability
  • Go for infrastructure and platform services
  • performance engineering and advanced systems design

Writing

I'm writing about production LLM inference, AI infrastructure, distributed systems, and practical lessons from operating real systems.

Looking ahead

My long-term focus is Staff-level roles in AI infrastructure, LLM inference, and distributed systems, where I can combine hands-on engineering with cross-team technical direction.