Now
What I'm working on and learning right now. Last updated: June 2026.
Building
I'm leading the architecture and evolution of DaVinci, Severstal's shared GenAI platform.
My current focus areas are:
- designing multi-data-center inference, degraded operation, failover, and recovery
- establishing model lifecycle, release gates, safe rollout, and rollback
- improving inference observability, SLOs, load testing, and GPU capacity planning
- evolving the AI gateway, model routing, quotas, and fallback policies
- improving the performance and reliability of production LLM serving
The platform currently runs on 24 NVIDIA H200 GPUs and is expanding to 48 H200 and 8 H100 GPUs across two data centers.
Learning
I'm currently going deeper into:
- distributed systems and multi-data-center architecture
- LLM inference performance and GPU serving
- Kubernetes networking, scheduling, and reliability
- Go for infrastructure and platform services
- performance engineering and advanced systems design
Writing
I'm writing about production LLM inference, AI infrastructure, distributed systems, and practical lessons from operating real systems.
Looking ahead
My long-term focus is Staff-level roles in AI infrastructure, LLM inference, and distributed systems, where I can combine hands-on engineering with cross-team technical direction.