Senior Software Engineer (Distributed Systems, Rust)
Fully Remote (Global)
About zymtrace
Organizations spend billions on GPU infrastructure to power AI, yet roughly 60-65% of that investment is wasted on underutilized hardware, idle cycles, and inefficient workloads. The problem isn’t the teams running that hardware; it’s that existing tools treat GPUs as black boxes, showing surface-level metrics without revealing where the waste actually lives.
zymtrace is a distributed AI infrastructure optimization platform that gives our customers deep, always-on visibility into GPU-accelerated workloads across their entire clusters. We profile from PyTorch and JAX code through CUDA kernels all the way down to individual GPU instructions and stall reasons, then correlate everything back to the CPU code that triggered it. Zero code changes. Zero guesswork.
Leading AI labs, Fortune 500 companies, and quantitative research firms trust zymtrace to debug and optimize their most demanding workloads, achieving results like 300% inference speedups and millions in annual infrastructure savings.
Our founders pioneered, open-sourced, and contributed the eBPF CPU profiler to OpenTelemetry, the same technology now adopted by Grafana, Datadog, Google, and others. We’re now applying that same low-level engineering depth to GPU-bound workloads as an NVIDIA Inception partner.
We’re a team of kernel hackers and systems programmers who operate at the deepest layers of the stack: GPUs, CUDA runtimes, eBPF, compilers, and instruction-level introspection. That depth isn’t a feature. It’s the product. By joining zymtrace, you’ll work at the bleeding edge of modern computing, helping organizations optimize AI training and inference workloads at massive scale. The problems we solve touch every layer of the stack, and the impact is measured in millions of GPU-hours reclaimed.
About the Role
We are seeking a highly skilled Senior Software Engineer to help build the core infrastructure of zymtrace’s platform. You’ll design and implement distributed systems that collect, process, and analyze performance events from CPU and GPU workloads at scale. This role requires deep expertise in distributed systems and strong proficiency in Rust.
You’ll work on our high-performance infrastructure stack, using cutting-edge technologies like eBPF and WASM. Our engineering team consists primarily of low-level engineers, and we value deep technical expertise in distributed systems along with familiarity with low-level systems programming.
This is a hands-on development role focused on building distributed systems, not an SRE position. You’ll spend most of your time writing production Rust code and working on complex distributed systems challenges.
Key Responsibilities
- Design, build, and optimize highly scalable distributed systems
- Develop and maintain software in async Rust, focusing on performance and reliability
- Work extensively with ClickHouse and Scylla, ensuring efficient data processing and storage
- Contribute to the development of low-latency, high-throughput systems using eBPF and WASM
- Collaborate with other engineers to define system architectures and improve existing infrastructure
- Troubleshoot and optimize distributed database performance
Must-Have Qualifications
- Strong experience with async Rust and a deep understanding of memory safety, concurrency, and performance optimization
- Expertise in distributed databases like ClickHouse, Scylla, or similar technologies
- Experience in low-level programming and systems engineering on Linux
- Proficiency with debugging, profiling, and performance tuning of large-scale systems
- Strong async communication skills and ability to work independently in a distributed team
Nice-to-Have Qualifications
- Knowledge of GPU architectures and CUDA programming
- Experience with the Go language
- Knowledge of recommendation systems
- Experience with using eBPF for performance monitoring and observability
Why Join zymtrace?
- Fully remote work environment with flexible hours
- Work on a meaningful project with a world-class team
- Use and contribute to cutting-edge technologies like Rust, eBPF, ClickHouse, Nix, and WASM
- Competitive compensation and benefits package