AI Products · Atlanta, GA · Full-Time

ML Engineer — Inference Optimisation

Make TAL models faster and cheaper to run — closing the gap between research model quality and production serving cost.

About This Role

A model that can't be served cheaply can't reach everyone. The Inference team reduces the latency and cost of serving TAL models so we can fulfill our universal access commitment.

Responsibilities
  • Profile and optimise LLM inference latency and throughput using vLLM, TensorRT-LLM, and custom CUDA kernels
  • Implement quantisation, speculative decoding, and KV cache optimisation techniques
  • Build serving infrastructure that scales from zero to millions of requests
  • Benchmark inference performance across hardware configurations
  • Collaborate with the research team to ensure optimised models maintain quality

Requirements
  • BS/MS in Computer Science or Electrical Engineering
  • 3+ years of ML engineering experience with a focus on inference or model serving
  • Deep understanding of GPU architecture, CUDA programming, and memory management
  • Experience with vLLM, TensorRT, ONNX, or similar inference frameworks
  • Proficiency in Python and C++

Nice to Have
  • Experience with FlashAttention, PagedAttention, or similar attention optimisations
  • Background in compiler design or hardware architecture
  • Familiarity with quantisation-aware training (QAT) and GPTQ/AWQ

TAL Corp is an equal opportunity employer. We believe the best team reflects the full diversity of humanity — because we are building for all of it.

Apply Through Training

At TAL Corp you don't just send a résumé; you prove yourself. Apply by joining our training program. Top performers who complete it are hired into this role.

  1. Register & start your 7-day program
  2. Train, build real skills, earn a credential
  3. Top performers → straight into hiring
