ICPE 2026 Paper

Our paper, “Evaluating Kubernetes Performance for GenAI Inference: From Automatic Speech Recognition to LLM Summarization”, has been accepted at the 17th ACM/SPEC International Conference on Performance Engineering (ICPE 2026) in Florence, Italy. We demonstrate how emerging Kubernetes-native projects such as Kueue, Dynamic Accelerator Slicer, and the Gateway API Inference Extension can be combined to deliver scalable, resource-efficient container orchestration for complex GenAI workflows. Our evaluation on a multi-stage use case, automatic speech recognition followed by LLM summarization, shows significant improvements, including up to a 90% improvement in tail latency for time to first token under high load.



