We are looking for an experienced AI/ML Performance Engineer to design and execute high-intensity stress workloads for next-generation AI platforms. This role focuses on identifying performance bottlenecks, improving system stability, and enabling scalable, production-ready AI infrastructure.
Key Responsibilities
Design and implement high-intensity stress workloads using PyTorch and Triton
Analyze system performance to identify bottlenecks, stability issues, and performance cliffs
Develop workloads targeting large GEMMs, attention mechanisms, MoE-like architectures, mixed precision, and long-running executions
Build custom Triton kernels to stress hardware execution units, memory hierarchies, and synchronization paths
Create test harnesses that scale across problem size, device count, and run duration
Integrate workloads with profiling, monitoring, and failure triage tools
Collaborate with platform, firmware, and SDK teams
Provide documentation and reproducible scripts for lab and CI environments
Required Skills
Strong experience in performance testing and analysis (analyzing test results and server statistics, identifying bottlenecks, tuning, and making recommendations)
Proficiency in Python
Scripting experience using Shell or PowerShell
Experience with PyTorch and/or Triton
Nice to Have
Experience with AI hardware platforms or simulators
Exposure to distributed systems and multi-device workloads
Seniority level
Mid-Senior level
Employment type
Contract
Job function
Engineering, Information Technology, and Research
Industries
IT Services and IT Consulting, Software Development, and IT System Custom Software Development