Career

Senior Software Engineer (zymtrace GPU/CPU profiler, Rust + Go)

Fully Remote (Global)

About zymtrace

Organizations spend billions on GPU infrastructure to power AI, yet roughly 60-65% of that investment is wasted on underutilized hardware, idle cycles, and inefficient workloads. The problem isn’t the teams running them — it’s that existing tools treat GPUs as black boxes, showing surface-level metrics without revealing where the waste actually lives.

zymtrace is a distributed AI infrastructure optimization platform that gives our customers deep, always-on visibility into GPU-accelerated workloads across their entire clusters. We profile from PyTorch and JAX code through CUDA kernels all the way down to individual GPU instructions and stall reasons, then correlate everything back to the CPU code that triggered it. Zero code changes. Zero guesswork.

Leading AI labs, Fortune 500 companies, and quantitative research firms trust zymtrace to debug and optimize their most demanding workloads, achieving results like 300% inference speedups and millions in annual infrastructure savings.

Our founders pioneered, open-sourced, and contributed the eBPF CPU profiler to OpenTelemetry, the same technology now adopted by Grafana, Datadog, Google, and others. We’re now applying that same low-level engineering depth to GPU-bound workloads as an NVIDIA Inception partner.

We’re a team of kernel hackers and systems programmers who operate at the deepest layers of the stack: GPUs, CUDA runtimes, eBPF, compilers, and instruction-level introspection. That depth isn’t a feature. It’s the product. By joining zymtrace, you’ll work at the bleeding edge of modern computing, helping organizations optimize AI training and inference workloads at massive scale. The problems we solve touch every layer of the stack, and the impact is measured in millions of GPU-hours reclaimed.

About the Role

We are seeking a highly skilled Senior Software Engineer to work on the zymtrace profiler. The profiler currently consists of two main parts: a CUDA/GPU profiler built entirely in Rust, and an extension of the OpenTelemetry (OTel) eBPF profiler developed in Go. Refer to the architecture.

The ideal candidate has strong expertise in Rust, is familiar with Go, and has a solid understanding of low-level systems programming, including eBPF and C.

In addition to maintaining the existing profilers, you will architect new profiler components to support emerging AI accelerators, keeping zymtrace extensible and future-proof across the AI hardware platforms it targets.

You will design, develop, and maintain low-level solutions that collect performance events from heterogeneous workloads. This role demands a focus on writing clean, efficient, and maintainable code, with a deep appreciation for system performance in resource-constrained environments.

Must-Have Qualifications

  • Strong experience with Rust and a deep understanding of memory safety, concurrency, and performance optimization
  • Comfortable working with Go
  • Familiarity with eBPF and C
  • Competence in architecting modular, extensible low-level software components
  • Experience in low-level programming and systems engineering on Linux
  • Willingness to dive into x86_64 and ARM64 assembly (reading and understanding)
  • Proficiency with debugging, profiling, and performance tuning of large-scale systems
  • Ability to work independently and communicate clearly in a distributed team environment

Nice-to-Have Qualifications

  • Knowledge of CUDA, GPU architectures, and emerging AI accelerators
  • Experience with OpenTelemetry or similar observability tooling

Why Join zymtrace?

  • Fully remote work environment with flexible hours
  • Work on a meaningful project with a world-class team
  • Use and contribute to cutting-edge technologies like Rust, eBPF, ClickHouse, Nix, and WASM
  • Competitive compensation and benefits package

Refs:

zymtrace launch blog post

About the team, our investors and advisors