Senior MLOps Engineer

Agency / HR resource: Mission Hire (website not listed)
Account registered with email *@gmail.com
Dubai, UAE
Senior
Information Technology • DevOps • AWS • Google Cloud • ML/AI • Dev tools
May 1
On-site work
3–5 years of experience
Job Description

How you will benefit 

  • A salary that fully meets your expectations
  • A full relocation package to Dubai, adapted to your specific needs
  • Official employment in the UAE

We are seeking a Senior MLOps Engineer with proven experience in deploying and managing large-scale ML infrastructure for LLMs, TTS, STT, Stable Diffusion, and other GPU-intensive models in production. You will lead the design and operation of cost-efficient, high-availability, high-performance serving stacks in a Kubernetes-based AWS environment.

How You Will Influence the Workflow

  • You will architect, deploy, and maintain scalable ML infrastructure on AWS EKS using Terraform and Helm.
  • You will own end-to-end model deployment pipelines for LLMs, diffusion models (LDM/Stable Diffusion), and other generative/AI models requiring high GPU throughput.
  • You will design cost-effective, auto-scaling serving systems using tools like Triton Inference Server, vLLM, Ray Serve, or similar model-serving frameworks.
  • You will build and maintain CI/CD pipelines integrating the ML model lifecycle (training → validation → packaging → deployment).
  • You will optimize GPU resource utilization and implement job orchestration with frameworks like KServe, Kubeflow, or custom workloads on EKS.
  • You will deploy and manage FluxCD (or ArgoCD) for GitOps-based deployment and environment promotion.
  • You will implement robust monitoring, logging, and alerting for model health and infrastructure performance (Prometheus, Grafana, Loki).
  • You will collaborate closely with ML Engineers and Software Engineers to ensure smooth integration, observability, and feedback loops.

What Will Help You Stand Out

  • Experience with model-serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar will highlight your expertise in serving AI models.
  • Experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling will set you apart as an expert in handling resource-intensive models.
  • Strong expertise in Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm) will showcase your ability to build high-quality, efficient infrastructure.
  • Hands-on experience with Python and the full ML model lifecycle in production environments will demonstrate your real-world deployment skills.
  • A willingness to explore new languages such as Go or Rust for backend or performance-critical systems will give you a competitive edge.

Required Experience

  • 2–3 years of experience with model-serving frameworks such as Triton, vLLM, Ray Serve, TorchServe, or similar.
  • 2–3 years of experience deploying and optimizing LLMs and LDMs (e.g., Stable Diffusion) under high load with GPU-aware scaling.
  • 3–4 years of experience with Kubernetes (EKS) and infrastructure-as-code (Terraform, Helm).
  • 4–5 years of hands-on software engineering experience in Python, with production-grade experience in ML model lifecycle.
  • Nice to have: familiarity with Go or Rust for backend or performance-critical systems.
  • Fluent English.

Nice to Have

  • Experience with model quantization, distillation, and optimization.
  • Familiarity with ML model registries like MLflow or DVC.
  • Exposure to Kafka or event-driven data pipelines.
  • Contributions to open-source MLOps tools or frameworks.

Specialization
Information Technology, DevOps, AWS, Google Cloud
Industry and application area
ML/AI, Dev tools
Position level
Senior