Balancing Cost and Quality in OpenTelemetry: An Evaluation of Sampling Policies - Zhu Jiekun, Quwan
Distributed tracing is crucial for troubleshooting in today's microservice architectures. In a sizable organization, over 100 million trace spans can be generated every second, requiring 200 TB of disk space per day to store them. Collecting the full volume of trace data can be expensive, and many traces that look identical add little to troubleshooting. OpenTelemetry offers various sampling policies. These policies help reduce resource usage, but each one needs a different amount of network I/O, memory, and storage space, and each delivers a different sampling quality. Finding the best combination of policies for a specific scenario is essential. We will share our experience evaluating different combinations of sampling policies by quantitatively analyzing their cost and sampling quality. We will also introduce some sampling policies that aim to cover more edge-case traces. You will learn about the specific costs and benefits of different sampling policies and how to tailor a sampling setup to your own business.
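To make the cost side of this trade-off concrete, here is a minimal Python sketch of a ratio-based head sampling decision, together with a rough storage estimate derived from the 200 TB per day figure above. It is an illustration only, not the speaker's evaluation method and not OpenTelemetry's actual sampler code: the function names, the use of SHA-256, and the assumption that storage scales linearly with the sampling ratio are all hypothetical choices made for this example.

```python
import hashlib

# Full-volume storage figure quoted in the abstract (TB per day).
FULL_VOLUME_TB_PER_DAY = 200


def keep_trace(trace_id: str, ratio: float) -> bool:
    """Decide whether to keep a trace, based only on its ID (hypothetical helper).

    The trace ID is hashed to a 64-bit integer and compared against a
    threshold proportional to `ratio`. The same ID always gets the same
    decision, and a trace kept at a low ratio is also kept at any higher one.
    """
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    threshold = int(ratio * 2**64)              # ratio 1.0 keeps everything
    return bucket < threshold


def estimated_storage_tb(ratio: float) -> float:
    """Rough daily storage, assuming it scales linearly with the ratio."""
    return FULL_VOLUME_TB_PER_DAY * ratio


if __name__ == "__main__":
    trace_ids = [f"trace-{i}" for i in range(10_000)]
    for ratio in (1.0, 0.1, 0.01):
        kept = sum(keep_trace(t, ratio) for t in trace_ids)
        print(
            f"ratio={ratio}: kept {kept} of {len(trace_ids)} traces, "
            f"about {estimated_storage_tb(ratio):.1f} TB/day"
        )
```

A decision like this is cheap because it needs no buffering, but it is blind to what happens later in the trace. Rare error or slow traces are dropped at the same rate as routine ones, which is the quality side of the trade-off the talk sets out to measure.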