Lohith Bellad
distributed systems × networking · the glue layer · darkbuffers.com
About
I'm a distributed systems engineer who lives in the glue layer — the transport protocols, kernel paths, and edge fabric that quietly decide whether your bytes arrive on time. A decade of shipping code that sits between every service and every user: userland TCP/IP stacks, QUIC and HTTP/3, congestion control (Cubic vs. BBR vs. BBRv2), L4 proxies, and the telemetry pipelines that make any of it debuggable at planet scale.
Most of my work has been the kind that only shows up in graphs: shaving p99 across a fleet, A/B-ing protocols (HTTP/2 vs. HTTP/3 over lossy mobile links, gRPC vs. Thrift on east-west service meshes, QUIC vs. TCP on long-fat pipes), and pushing distributed tracing, eBPF-based flow telemetry, and structured-log pipelines deep enough into the stack that you can finally answer which hop ate the budget. I've spent more time staring at OpenTelemetry spans, Prometheus histograms, and per-CPU flame graphs than I'd like to admit — and I'd argue observability is the actual product when you operate at scale; the service is just what generates the signals.
Comfortable from the wire up: tcpdump in one terminal, kgdb in another, a flame graph open somewhere. I like problems that smell like tail latency, head-of-line blocking, mbuf leaks, congestion collapse, retransmit storms, clock skew across regions, or a node misbehaving at 3am, the kind where the bug is hiding three layers below where the alert fired.
USC MS (Computer Networks), 2015.
Lately I've been pulling that same instinct upstack, into LLM inference runtimes — KV-cache management, prefill/decode scheduling, GPU memory paths, and the tail-latency story of token streaming at scale. Same problem shape I've spent a decade on (queues, contention, head-of-line blocking, tail percentiles), just with attention kernels instead of TCP segments.
Off-hours: reading Building LLMs from Scratch and Kubernetes in Action, and running the homelab this site is served from.