Best Practice for Interference Detection and Resource Isolation Enhancement on Kubernetes - Haogang Wang, Kuaishou
A Kubernetes-based container cloud platform runs a mix of latency-sensitive workloads and batch jobs. As pod deployment density increases, interference has become a major challenge to platform stability. It limits gains in resource utilization and forces the platform to add servers, at extra cost, to support workload deployment. At Kuaishou, we built an interference observation and diagnosis system that lets us identify and troubleshoot interference issues quickly. We also implemented fine-grained, per-service control over resources, including CPU and memory, which effectively mitigates the impact of batch jobs on latency-sensitive workloads. As a result, the platform can deploy more batch jobs while keeping latency-sensitive workloads stable, improving overall resource utilization.
CNCF Overview (slides)
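CNCF (Cloud Native Computing Foundation) was founded in December 2015 as a non-profit organization under the Linux Foundation.
CNCF is dedicated to fostering and sustaining a vendor-neutral open source ecosystem to promote cloud native technologies. We democratize state-of-the-art patterns to make these innovations accessible to everyone. Follow the CNCF WeChat official account.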