Senior MLOps Engineer

Agency / HR resource: Mission Hire (website not listed)
Account registered with email *@gmail.com
Dubai, UAE
Senior
Information Technology • DevOps • AWS • Google Cloud • ML/AI • Dev tools
May 1
On-site work
3–5 years of experience
Job Description

How you will benefit 

  • A salary that fully meets your expectations
  • A full relocation package to Dubai, adapted to your specific needs
  • Official employment in the UAE

We are seeking a Senior MLOps Engineer with proven experience in deploying and managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production. You will lead the design and operation of cost-efficient, high-availability, high-performance serving stacks in a Kubernetes-based AWS environment.

How You Will Influence the Workflow

  • You will architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
  • You will own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
  • You will design cost-effective, auto-scaling serving systems using tools like Triton Inference Server, vLLM, Ray Serve, or similar model-serving frameworks.
  • You will build and maintain CI/CD pipelines integrating the ML model lifecycle (training → validation → packaging → deployment).
  • You will optimize GPU resource utilization and implement job orchestration with frameworks like KServe, Kubeflow, or custom workloads on EKS.
  • You will deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment and environment promotion.
  • You will implement robust monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
  • You will collaborate closely with ML Engineers and Software Engineers to ensure smooth integration, observability, and feedback loops.

What Will Help You Stand Out

  • Experience with model-serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar will highlight your expertise in serving AI models.
  • Experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling will set you apart as an expert in handling resource-intensive models.
  • Strong expertise in Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm) will showcase your ability to build high-quality, efficient infrastructure.
  • Hands-on experience with Python and the full ML model lifecycle in production environments will demonstrate your real-world deployment skills.
  • A willingness to explore new languages such as Go or Rust for backend or performance-critical systems will give you a competitive edge.

Required Experience

  • 2–3 years of experience with model-serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar.
  • 2–3 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
  • 3–4 years of experience with Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
  • 4–5 years of hands-on software engineering experience in Python, with production-grade experience in ML model lifecycle.
  • Nice to have: familiarity with Go or Rust for backend or performance-critical systems.
  • Fluent English.

Nice to Have

  • Experience with model quantization, distillation, and optimization.
  • Familiarity with ML model registries like MLflow or DVC.
  • Exposure to Kafka or event-driven data pipelines.
  • Contributions to open-source MLOps tools or frameworks.

Specialization
Information Technology, DevOps, AWS, Google Cloud
Industry and application area
ML/AI, Dev tools
Position level
Senior