Senior MLOps Engineer
Agency / HR resource
Mission Hire
(website not specified)
Account registered with email *@gmail.com
Work experience: 3–5 years
How You Will Benefit
- A salary that fully meets your expectations
- A full relocation package to Dubai, adapted to your specific needs
- Official employment in the UAE
We are seeking a Senior MLOps Engineer with proven experience in deploying and managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production. You will lead the design and operation of cost-efficient, high-availability, and high-performance serving stacks in a Kubernetes-based AWS environment.
How You Will Influence the Workflow
- You will architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
- You will own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
- You will design cost-effective, auto-scaling serving systems using tools like Triton Inference Server, vLLM, Ray Serve, or similar model-serving frameworks.
- You will build and maintain CI/CD pipelines integrating the ML model lifecycle (training → validation → packaging → deployment).
- You will optimize GPU resource utilization and implement job orchestration with frameworks like KServe, Kubeflow, or custom workloads on EKS.
- You will deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment and environment promotion.
- You will implement robust monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
- You will collaborate closely with ML Engineers and Software Engineers to ensure smooth integration, observability, and feedback loops.
What Will Help You Stand Out
- Experience with model-serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar.
- Experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
- Strong expertise in Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
- Hands-on experience with Python and the full ML model lifecycle in production environments.
- Willingness to explore new languages such as Go or Rust for backend or performance-critical systems.
Required Experience
- 2–3 years of experience with model serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar.
- 2–3 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
- 3–4 years of experience with Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
- 4–5 years of hands-on software engineering experience in Python, with production-grade experience in ML model lifecycle.
- Nice to have: familiarity with Go or Rust for backend or performance-critical systems.
- Fluent English
Nice to Have
- Experience with model quantization, distillation, and optimization.
- Familiarity with ML model registries like MLflow or DVC.
- Exposure to Kafka or event-driven data pipelines.
- Contributions to open-source MLOps tools or frameworks.