GPU Monitoring
Get a complete performance view of your GPU infrastructure at a glance. Monitor GPU memory usage, power draw, SM occupancy, SM efficiency, NVLink throughput, and PCIe health across your entire cluster.
Start with metrics to form hypotheses, then click data points to dive into detailed GPU profiles.
From GPU Monitoring to CUDA Insights
GPU metrics alone stop short—they tell you WHAT is happening, not WHY. zymtrace combines monitoring with deep CUDA profiling to close the loop from detection to optimization.
Monitor Metrics
Track GPU utilization, memory usage, power consumption, and hardware health across your infrastructure. Identify performance trends and potential bottlenecks.
Form Hypotheses
Use process-level breakdowns to identify which applications consume the most resources. Spot anomalies in utilization patterns or memory pressure issues.
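To make the idea of a process-level breakdown concrete, here is a minimal sketch in plain Python. It is not zymtrace's implementation or API. The sample data, the process names, and the helpers `rank_by_memory` and `flag_outliers` are all hypothetical, and the z-score check is just one simple way to spot an unusual reading.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical samples: (process name, GPU memory in MiB, GPU utilization %).
# In practice these would come from your monitoring backend.
SAMPLES = [
    ("trainer", 30500, 92.0),
    ("trainer", 30800, 94.0),
    ("inference-api", 8200, 35.0),
    ("inference-api", 8300, 38.0),
    ("etl-job", 1200, 5.0),
    ("etl-job", 1250, 4.0),
]


def rank_by_memory(samples):
    """Return (process, mean memory in MiB) pairs, heaviest consumer first."""
    per_process = defaultdict(list)
    for name, mem_mib, _util in samples:
        per_process[name].append(mem_mib)
    ranked = [(name, mean(values)) for name, values in per_process.items()]
    return sorted(ranked, key=lambda item: item[1], reverse=True)


def flag_outliers(values, threshold=2.0):
    """Return indices of values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all readings identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    for name, avg_mib in rank_by_memory(SAMPLES):
        print(f"{name}: {avg_mib:.0f} MiB on average")
    # A utilization series with one spike at the end.
    print(flag_outliers([50, 52, 51, 49, 50, 95]))
```

A ranking like this answers "which process should I look at first?", while the outlier indices point at the specific data points worth opening as a profile.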
Deep Dive Profiles
Click on interesting data points to navigate to detailed GPU profiles. Analyze CUDA kernels, instruction-level execution, and optimization opportunities.
OpenTelemetry Compliant
zymtrace is OpenTelemetry compliant, including support for OTEL resource attributes.
The zymtrace team was part of the group that pioneered, open-sourced, and donated the eBPF profiler to OpenTelemetry.
With zymtrace, we’re extending that same low-level engineering excellence to GPU-bound workloads and building a highly scalable profiling platform purpose-built for today’s distributed, heterogeneous environments, spanning both general-purpose and AI-accelerated workloads.
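For readers new to OTEL resource attributes, the sketch below shows what they look like in practice. OpenTelemetry defines an `OTEL_RESOURCE_ATTRIBUTES` environment variable holding comma-separated `key=value` pairs with percent-encoded values. Whether and how zymtrace reads this variable is an assumption here, and `parse_resource_attributes` is a hypothetical helper written only for illustration.

```python
from urllib.parse import unquote


def parse_resource_attributes(raw):
    """Parse an OTEL_RESOURCE_ATTRIBUTES-style string into a dict.

    The format is comma-separated key=value pairs; values are
    percent-decoded. Malformed pairs without '=' are skipped.
    """
    attributes = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair or "=" not in pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = unquote(value.strip())
    return attributes


if __name__ == "__main__":
    # Example values are made up; the keys follow OTel semantic conventions.
    raw = "service.name=llm-inference,host.name=gpu-node-01,k8s.pod.name=worker-0"
    print(parse_resource_attributes(raw))
```

Attributes such as `service.name` and `host.name` are what let GPU metrics and profiles be grouped by service, host, or pod.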
FAQ
Ready to Monitor Your GPU Infrastructure?
Get unified visibility into GPU performance and hardware health across your cluster.