Andrey Krisanov

Production LLM inference, AI infrastructure, and distributed systems

#llm

Monitoring vLLM in Production: Metrics, PromQL, Alerts, and Runbooks

28.01.2026

A production-oriented guide to monitoring vLLM 0.23.x with Prometheus and Grafana: latency, queueing, preemption, KV-cache pressure, throughput, alerting, and incident diagnosis.

Why vLLM Scales: Paging the KV-Cache for Faster LLM Inference

27.01.2026

Why traditional LLM serving wastes GPU memory – and how vLLM’s PagedAttention enables larger batches, higher throughput, and more predictable latency.