You Bring
• 3+ years of experience deploying and optimizing machine learning models in production, with 1+ years deploying deep learning models
• Experience deploying async inference APIs (FastAPI, gRPC, Ray Serve, etc.)
• Understanding of PyTorch internals and inference-time optimization
• Familiarity with LLM runtimes: vLLM, TGI, TensorRT-LLM, ONNX Runtime, etc.
• Familiarity with GPU profiling tools (Nsight, nvtop) and model quantization pipelines

ML Inference & Optimization Engineer
