Zymtrace Agent Skills: Profile-Guided Optimization for AI Coding Agents

Israel Ogbole

Our users kept doing something we didn’t quite plan for: installing and upgrading Zymtrace from inside their coding agent, without ever opening a terminal. At first this worked because we had a handful of MCP tools and a lot of patience. Then the agents started reaching for tools that weren’t there yet.

So we leaned in. We went from 4 MCP tools to 22, and packaged the whole thing as Agent Skills: a lightweight, open format for extending AI agents with specialized knowledge and repeatable workflows.

Zymtrace now ships a set of Agent Skills for Claude Code, OpenAI Codex, and Cursor. The format is open, so the skills should work with other agentic systems too; those three are simply the ones we have tested. The set also includes the zymtrace-perf-engineer subagent, a specialized agent that runs the optimization workflow end to end on its own.

A single plugin gives AI coding agents the skills to install, operate, and upgrade the Zymtrace backend and profiler, and to investigate and optimize CPU- and GPU-bound workloads. The optimization skills go all the way to the fix, preparing the configuration or code change for your approval.

Getting started in Claude Code takes two commands:

claude plugin marketplace add zystem-io/zymtrace-skills
claude plugin install zymtrace@zymtrace-skills

Codex and Cursor are covered in Install below.

The skills are powered by 22 Zymtrace MCP tools. The skills carry the workflow and performance-engineering knowledge, while the MCP tools provide the evidence: metrics, profiles, and runtime and deployment context.

The optimization skills follow a consistent workflow: investigate the profiles, identify the bottleneck, apply a fix, benchmark, repeat.
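
As a rough illustration of that loop, here is a minimal Python sketch. The names (`optimize`, `investigate`, `apply_fix`, `benchmark`) and the stopping rule are hypothetical stand-ins, not Zymtrace APIs; in practice the skill drives these steps through the MCP tools and asks for approval before a change is applied.

```python
from typing import Callable, Optional

def optimize(
    investigate: Callable[[], Optional[str]],
    apply_fix: Callable[[str], None],
    benchmark: Callable[[], float],
    max_rounds: int = 5,
    min_gain: float = 0.02,
) -> list[tuple[str, float, float]]:
    """Run investigate -> fix -> benchmark until gains fall off.

    All three callables are hypothetical placeholders supplied by the caller.
    Returns one (bottleneck, throughput_before, throughput_after) per round.
    """
    history = []
    baseline = benchmark()
    for _ in range(max_rounds):
        bottleneck = investigate()
        if bottleneck is None:      # nothing left to fix
            break
        apply_fix(bottleneck)
        result = benchmark()
        history.append((bottleneck, baseline, result))
        if result < baseline * (1 + min_gain):
            break                   # gain too small to justify another round
        baseline = result
    return history

# Simulated workload: each fix removes one bottleneck and raises throughput.
pending = ["small batches", "host-side tokenization"]
throughput = [100.0]

def investigate() -> Optional[str]:
    return pending[0] if pending else None

def apply_fix(bottleneck: str) -> None:
    pending.remove(bottleneck)
    throughput[0] *= 1.25

def benchmark() -> float:
    return throughput[0]

rounds = optimize(investigate, apply_fix, benchmark)
```

Recording the before and after numbers for each round keeps every change tied to a measured result. A real workflow would also want to revert a fix that fails to help, which this sketch leaves out.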

We will walk through GPU optimization as the example. It can run interactively, with you in the loop, or hands-off via the zymtrace-perf-engineer subagent, which runs the workflow end to end.

The Zymtrace AI Agent Skills in action: a coding agent investigates a workload, pulls metrics and flamegraphs through the Zymtrace MCP, and works toward the fix.

Connecting GPU perf metrics to profiles

GPU and inference-engine metrics tell you what is happening, not why a workload is slow. A GPU can sit underutilized for very different reasons: poor batching, queue starvation, CPU orchestration, memory pressure, synchronization, inefficient kernels, routing imbalance, or inter-GPU communication. The metrics alone won’t tell them apart.

The profiles hold the evidence, but correlating it for every workload takes time. That is the work the optimize-gpu-workloads skill does. Using the Zymtrace MCP tools, it pulls together:

  • GPU metrics, from utilization to SM efficiency and Tensor Core activity
  • Inference-engine telemetry from vLLM, SGLang, and NVIDIA Dynamo-Triton
  • CPU and GPU flamegraphs showing where time goes
  • NVTX annotations, when enabled
  • Workload, container, and deployment context

This allows the agent to distinguish a serving configuration problem from a host-side bottleneck, a memory-bound workload, or a kernel-level issue.
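
To make the idea concrete, the toy Python sketch below shows two workloads with the same GPU utilization landing in different categories once profile evidence is added. The `Evidence` fields, the `classify` function, and every threshold are invented for illustration; they are not the skill's actual logic.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    gpu_util: float        # fraction of time the GPU was busy, 0..1
    queue_depth: float     # mean requests waiting in the serving engine
    host_cpu_share: float  # share of CPU samples in host-side orchestration, 0..1
    mem_bw_util: float     # fraction of peak memory bandwidth in use, 0..1

def classify(e: Evidence) -> str:
    """Toy triage with made-up thresholds, for illustration only."""
    if e.gpu_util < 0.5 and e.host_cpu_share > 0.4:
        return "host-side bottleneck"
    if e.gpu_util < 0.5 and e.queue_depth < 1.0:
        return "serving configuration"
    if e.mem_bw_util > 0.8:
        return "memory-bound"
    return "kernel-level"

# Same utilization, different causes once the profile is taken into account.
a = Evidence(gpu_util=0.35, queue_depth=12.0, host_cpu_share=0.6, mem_bw_util=0.3)
b = Evidence(gpu_util=0.35, queue_depth=0.2, host_cpu_share=0.1, mem_bw_util=0.3)
labels = (classify(a), classify(b))
```

Here the utilization figure is identical for both workloads; only the CPU profile share and the queue depth separate a host-side problem from a serving configuration problem.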

Two ways to run it

Interactive. Ask your coding agent to investigate a workload, and it will work through the skill step by step with you in the loop.

Hands-off. Hand the investigation to the zymtrace-perf-engineer subagent and let it run the workflow from end to end.

The subagent is built from what we have learned while optimizing GPU workloads with customers, together with the expertise of performance engineers on our team and across the field.

It uses the Zymtrace MCP tools to read the metrics, inspect the Zymtrace AI flamegraph, and cross-check the CPU profile using the same workload filters.

When NVTX is enabled, it can connect GPU kernels back to the model layer, inference phase, tensor shape, or custom operation that triggered them.
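
For readers who have not used NVTX, the sketch below shows what phase-level annotations might look like in a Python workload, assuming NVIDIA's `nvtx` package is installed. The `generate` function is a toy stand-in for an inference loop, and how Zymtrace is configured to collect these ranges is not covered here.

```python
from contextlib import contextmanager

try:
    import nvtx  # NVIDIA's Python NVTX bindings (pip install nvtx)
    annotate = nvtx.annotate
except ImportError:
    @contextmanager
    def annotate(message, **_):
        # No-op fallback so the sketch still runs without the nvtx package.
        yield

def generate(prompt_tokens: list[int], max_new_tokens: int) -> list[int]:
    """Toy stand-in for an inference loop, with one NVTX range per phase."""
    with annotate("prefill"):
        # A real engine would run the prompt through the model here.
        state = sum(prompt_tokens)
    out = []
    with annotate("decode"):
        for _ in range(max_new_tokens):
            with annotate("decode_step"):
                # Placeholder arithmetic in place of a real forward pass.
                state = (state * 31 + 7) % 50257
                out.append(state)
    return out

tokens = generate([1, 2, 3], max_new_tokens=2)
```

Kernels launched inside a range can then be attributed to that range's name, which is what makes a mapping from kernel back to inference phase possible.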

Once it identifies the bottleneck, it can prepare the relevant configuration or code changes, ask for approval, redeploy the workload, run the benchmark, and profile it again to verify the result.

Powered by the Model Context Protocol (MCP)

Zymtrace ships with a built-in MCP server. Every optimization skill runs on the MCP tools bundled in your self-hosted Zymtrace installation, so the same AI-driven debugging workflow runs across private-cloud, on-premises, and air-gapped environments.

The MCP server runs entirely on your infrastructure, so your data stays in your environment up to the point your coding agent reads it. What the agent then shares with its model provider is governed by the agent and model you choose.
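
For context on what travels between the agent and the server, MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method. The Python sketch below builds one such request; the tool name `get_gpu_metrics` and its arguments are hypothetical and are not taken from the Zymtrace tool list.

```python
import json

def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

# Hypothetical tool name and arguments, for illustration only.
message = tools_call(1, "get_gpu_metrics", {"workload": "vllm-serving", "window": "15m"})
```

The coding agent's MCP client normally builds and sends these messages itself, so you would rarely write one by hand.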

Install

See the full Zymtrace AI Agent Skills documentation for setup and usage instructions.

Claude Code

claude plugin marketplace add zystem-io/zymtrace-skills
claude plugin install zymtrace@zymtrace-skills

OpenAI Codex

codex plugin marketplace add zystem-io/zymtrace-skills

Then install Zymtrace from /plugins.

Cursor

Import the marketplace from Settings → Plugins, then paste this link: https://docs.zymtrace.com/ai-agent-skills

The Zymtrace plugin imported into Cursor.

Get started

Get started with Zymtrace at no cost.

Deploy Zymtrace, install the profiler, and give your coding agent the runtime context it needs to find bottlenecks, prepare fixes, and verify the result.