Senior Data Engineer
Direct employer: WaveAccess ( waveaccess.ru )
Required experience: more than 5 years
We are seeking a skilled Senior Data Engineer to develop innovative solutions within a complex data architecture framework. In this role, you will work extensively with diverse data sources and integrate advanced AI capabilities into our data pipelines. You will manage large-scale datasets, ensuring efficiency, reliability, and performance in a dynamic, collaborative environment.
If you are passionate about building cutting-edge solutions for large-scale data systems and experimenting with AI in data pipelines, we would love to hear from you!
The ideal candidate should have a strong understanding of Change Data Capture (CDC), Slowly Changing Dimensions (SCD), and Dead Letter Queues (DLQ), as well as advanced techniques such as incremental processing, data partitioning, and schema evolution.
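To make these expectations concrete, here is a minimal sketch of an SCD Type 2 upsert in Python against PostgreSQL. The dim_customer table, its columns, and the tracked attributes are hypothetical, and a real pipeline would wrap this in batching and DLQ handling:

```python
# Illustrative SCD Type 2 upsert: expire the current row if the tracked
# attributes changed, then insert the new version. All table and column
# names (dim_customer, customer_id, valid_from, ...) are hypothetical.
import psycopg2

SCD2_UPSERT = """
UPDATE dim_customer
   SET valid_to = now(), is_current = FALSE
 WHERE customer_id = %(customer_id)s
   AND is_current
   AND (name, email) IS DISTINCT FROM (%(name)s, %(email)s);

INSERT INTO dim_customer (customer_id, name, email, valid_from, valid_to, is_current)
SELECT %(customer_id)s, %(name)s, %(email)s, now(), NULL, TRUE
 WHERE NOT EXISTS (
       SELECT 1 FROM dim_customer
        WHERE customer_id = %(customer_id)s AND is_current);
"""

def apply_scd2(conn, record: dict) -> None:
    """Apply one changed record from a CDC feed as an SCD Type 2 update."""
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(SCD2_UPSERT, record)
```

If the row's tracked attributes are unchanged, both statements are no-ops; if they changed, the current version is closed out and a new one is opened, preserving full history.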
Key Responsibilities:
- Design and implement data pipelines to collect, transform and clean data from different sources
- Develop high-performance solutions to manage structured, unstructured and vector data, while optimizing data transformation and retrieval
- Drive innovation by experimenting with AI-based solutions in areas with minimal established best practices
- Collaborate with the team to build backend services using Python and Apache Airflow (see the pipeline sketch after this list)
- Utilize AWS cloud infrastructure to build and manage reliable, scalable pipelines
- Work with PostgreSQL, AWS S3, MongoDB, and graph and vector databases to store and manage large amounts of data efficiently
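As a rough illustration of the stack above, here is a minimal TaskFlow DAG that moves a file from S3 into PostgreSQL, assuming a recent Airflow 2.x with the Amazon and Postgres provider packages installed. The DAG id, bucket, keys, target table, and connection ids are placeholders:

```python
# Hypothetical daily pipeline: extract a CSV from S3, clean it, load to Postgres.
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def s3_to_postgres():
    @task
    def extract() -> str:
        from airflow.providers.amazon.aws.hooks.s3 import S3Hook
        hook = S3Hook(aws_conn_id="aws_default")
        # Download the raw file to the worker's local filesystem.
        return hook.download_file(key="raw/events.csv", bucket_name="example-bucket")

    @task
    def transform(path: str) -> str:
        import pandas as pd
        df = pd.read_csv(path)
        # Basic cleaning: drop rows without a key, deduplicate on it.
        df = df.dropna(subset=["event_id"]).drop_duplicates("event_id")
        out = path + ".clean.csv"
        df.to_csv(out, index=False)
        return out

    @task
    def load(path: str) -> None:
        from airflow.providers.postgres.hooks.postgres import PostgresHook
        hook = PostgresHook(postgres_conn_id="postgres_default")
        # Bulk-load the cleaned file into a hypothetical events table.
        hook.copy_expert("COPY events FROM STDIN WITH CSV HEADER", path)

    load(transform(extract()))

s3_to_postgres()
```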
Mandatory Requirements:
- 4+ years of experience in data engineering, ETL processes, and managing complex data storage solutions
- Proven commercial experience building and deploying applications using Python and Apache Airflow
- Hands-on experience with AWS cloud infrastructure and managing data storage and data movement solutions in a cloud environment
- Advanced proficiency in SQL, particularly PostgreSQL, and a strong understanding of diverse data storage solutions, including both structured and unstructured data
- Eagerness to learn and experiment with AI agents and integrate AI-based logic into data pipeline architecture
- Excellent problem-solving skills, with the ability to navigate ambiguity and propose solutions in challenging situations
Nice-to-Have:
- Experience working with vector stores and graph databases
- Experience with cloud-based machine learning platforms
- Knowledge of efficient data file formats such as Parquet and Avro, particularly for batch data processing in data lakes or warehouses; experience with Apache Spark (sketched below)
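For context, a short PySpark sketch of the batch pattern the last item refers to: converting raw CSV landing data into date-partitioned Parquet. The bucket paths and the event_date partition column are made up:

```python
# Illustrative batch job: CSV landing zone -> partitioned Parquet curated zone.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

df = spark.read.csv("s3a://example-bucket/raw/events/",
                    header=True, inferSchema=True)

# Columnar Parquet plus partition pruning keeps downstream scans cheap.
(df.repartition("event_date")
   .write.mode("overwrite")
   .partitionBy("event_date")
   .parquet("s3a://example-bucket/curated/events/"))
```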
What We Offer:
- Work in a dynamic international team.
- Employment in accordance with labor law; 100% paid sick leave and vacation.
- Opportunity for cooperation through individual entrepreneurship/self-employment.
- Participation in foreign and Russian projects.
- Health insurance with dental coverage.
- Necessary equipment for work.
- Corporate training programs.
- Broad opportunities for self-realization, professional development, and career growth.
- Democratic approach to processes and a flexible start to the workday.