ICPE 2026 Paper
Our paper, “Evaluating Kubernetes Performance for GenAI Inference: From Automatic Speech Recognition to LLM Summarization”, has been accepted at the 17th ACM/SPEC International Conference on Performance Engineering (ICPE 2026) in Florence, Italy. We demonstrate how emerging Kubernetes-native projects such as Kueue, Dynamic Accelerator Slicer, and the Gateway API Inference Extension can be combined to deliver scalable and resource-efficient container orchestration for complex GenAI workflows. Our evaluation on a multi-stage use case, automatic speech recognition followed by LLM summarization, shows significant improvements, including up to 90% better tail latency for time to first token under high load.