vLLM Metrics in Production
A hands-on guide to vLLM monitoring: the key Prometheus metrics (TTFT, TPOT, queueing, KV cache, swapping), Grafana panels, and alert rules that help you debug latency and plan capacity.
Why traditional LLM serving wastes GPU memory – and how vLLM’s PagedAttention model enables larger batches, higher throughput, and more predictable latency.
A concise cheat sheet for uv: managing Python versions, dependencies, virtual environments, scripts, and tools in one fast, cross-platform tool.
How to scan NuGet packages for security vulnerabilities using GitLab CI.
How to use FFmpeg to convert FLAC files to Apple Lossless without losing the original quality, and how to upload them to Apple Music.
Using GitHub Actions and pip-tools to compile a requirements.txt file from your dependencies.
Keycloak allows configuring a custom LDAP user filter for User Federation to select a subset of user entries in Active Directory.
Over the past few years, I've interviewed dozens of software engineers who didn't know how the services they developed run in production.
You're starting a new project with Apache Kafka. Before setting up broker parameters and writing producers and consumers, what questions should you ask yourself? To ensure a smooth start, I have prepared the following checklist/questionnaire.
A story about a DNS resolver, Linux VMs, an experienced infrastructure team, and me troubleshooting an incident that happened on a Sunday morning.