Job ID: JOB_ID_1937
Role Overview: AI Data Engineer
We are seeking a highly skilled and experienced AI Data Engineer to join our dynamic team. This role is critical for designing, building, and optimizing our data infrastructure to support advanced AI and machine learning initiatives. The ideal candidate will have a deep understanding of data engineering principles, combined with hands-on experience in modern AI technologies such as Generative AI and Large Language Models (LLMs). You will be responsible for the end-to-end data lifecycle, from ingestion to the delivery of high-quality data sets for AI consumption.
Key Responsibilities and Technical Requirements
- Advanced SQL and Database Management: You must have hands-on experience writing complex SQL queries, including proficiency with Joins, Self Joins, Views, Materialized Views, and Cursors. You should be comfortable with recursive queries, GROUP BY aggregation, and analytical (window) functions that use OVER (PARTITION BY ...). A significant portion of the role involves SQL performance tuning to ensure efficient data retrieval and processing across large-scale datasets (a minimal window-function example appears after this list).
- ETL and Dimensional Data Modeling: Expertise in ETL processes and Dimensional Data Modeling is essential, including a thorough understanding of Slowly Changing Dimensions (SCD Types 1, 2, and 3). You will design schemas, distinguish between fact and dimension tables, and make informed design decisions based on business requirements to ensure data scalability and integrity (an SCD Type 2 sketch follows this list).
- Python Programming and Automation: Proficiency in Python scripting is a must, specifically with libraries such as Pandas, PyParsing, and NumPy. You will use Apache Airflow for workflow orchestration and data pipeline automation. Experience setting up Python scripts on DataLab, scheduling processes, and connecting to data lakes (e.g., S3) is required to maintain robust data flows (see the Airflow DAG sketch after this list).
- Snowflake Architecture: A solid understanding of Snowflake architecture is required, including experience with SnowSQL, Snowpipe, and performance optimization techniques. You should be familiar with warehouse concepts such as scaling, clustering, micro-partitioning, and Time Travel to manage cloud data storage effectively (a connector-based sketch follows this list).
- AI and LLM Integration: This role places a heavy emphasis on AI integration. You should have a good understanding of Retrieval-Augmented Generation (RAG) and experience with Prompt and Context Engineering. You will manage vector databases and indices for semantic search. Familiarity with LLM orchestration frameworks such as LangChain and LlamaIndex is highly beneficial for building intelligent data applications (a minimal retrieval sketch appears after this list).
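To make the SQL expectations concrete, here is a minimal sketch of an analytical (window) query, run against SQLite's in-memory engine purely for portability; the sales table, its columns, and the values are illustrative, not part of any real schema.

```python
import sqlite3  # in-memory engine; window functions require SQLite 3.25+

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, rep TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 'Ann', 100), ('East', 'Bob', 250),
        ('West', 'Cai', 175), ('West', 'Dee', 125);
""")

# GROUP BY collapses rows into one aggregate per group; a window function with
# OVER (PARTITION BY ...) computes the same aggregate while keeping every row.
query = """
    SELECT region, rep, amount,
           SUM(amount) OVER (PARTITION BY region)                      AS region_total,
           RANK()      OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
"""
for row in conn.execute(query):
    print(row)
```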
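Likewise, a minimal SCD Type 2 sketch, using an in-memory list in place of a real dimension table; the customer_dim structure, its column names, and the apply_scd_type2 helper are hypothetical illustrations of the versioning pattern (Type 1 would overwrite in place; Type 3 would keep a single prior value in an extra column).

```python
from datetime import date

# Illustrative in-memory dimension table; keys and column names are hypothetical.
customer_dim = [
    {"customer_key": 1, "customer_id": "C001", "city": "Austin",
     "effective_from": date(2023, 1, 1), "effective_to": None, "is_current": True},
]

def apply_scd_type2(dim, customer_id, new_city, as_of):
    """SCD Type 2: expire the current row and insert a new versioned row,
    preserving full change history for the tracked attribute."""
    for row in dim:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["city"] == new_city:
                return  # no change, nothing to version
            row["effective_to"] = as_of   # close out the old version
            row["is_current"] = False
    dim.append({"customer_key": max(r["customer_key"] for r in dim) + 1,
                "customer_id": customer_id, "city": new_city,
                "effective_from": as_of, "effective_to": None, "is_current": True})

apply_scd_type2(customer_dim, "C001", "Dallas", date(2024, 6, 1))
print(customer_dim)
```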
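For orchestration, a minimal Airflow TaskFlow sketch follows, assuming Airflow 2.4+ (for the schedule argument) plus pandas with s3fs configured for S3 access; the bucket, key, and DAG name are placeholders.

```python
from datetime import datetime
import pandas as pd
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def s3_ingest_pipeline():
    @task
    def extract() -> list[dict]:
        # pandas reads straight from S3 when s3fs and credentials are configured;
        # the bucket and key here are placeholders, not a real location.
        return pd.read_csv("s3://example-bucket/raw/events.csv").to_dict("records")

    @task
    def transform(records: list[dict]) -> list[dict]:
        df = pd.DataFrame(records)
        return df.dropna().to_dict("records")  # minimal cleanup step for illustration

    transform(extract())

s3_ingest_pipeline()
```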
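On the Snowflake side, the sketch below uses the snowflake-connector-python package to illustrate Time Travel and a Snowpipe definition; the connection parameters, orders table, and orders_stage stage are placeholders that would come from your own account and secrets manager.

```python
import snowflake.connector  # requires the snowflake-connector-python package

# Connection parameters are placeholders; never hard-code real credentials.
conn = snowflake.connector.connect(
    user="USER", password="PASSWORD", account="ACCOUNT",
    warehouse="ANALYTICS_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()

# Time Travel: query a table as it existed an hour ago (within the retention window).
cur.execute("SELECT COUNT(*) FROM orders AT(OFFSET => -3600)")
print(cur.fetchone())

# Snowpipe: continuous ingestion from a stage; the stage and table are illustrative.
cur.execute("""
    CREATE PIPE IF NOT EXISTS orders_pipe AUTO_INGEST = TRUE AS
    COPY INTO orders FROM @orders_stage FILE_FORMAT = (TYPE = 'CSV')
""")
```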
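Finally, a toy RAG retrieval sketch: the embed function below is a deliberately crude bag-of-words hash standing in for a learned embedding model, and the documents and query are invented; a production system would use a real embedding model with a vector database index, typically wired together through a framework such as LangChain or LlamaIndex.

```python
import numpy as np

# Toy corpus; in practice these would be chunks drawn from your document store.
docs = ["Snowpipe loads files continuously from a stage.",
        "SCD Type 2 keeps full history with versioned rows.",
        "Airflow DAGs orchestrate scheduled data pipelines."]

def embed(text, dim=64):
    # Crude bag-of-words hashing embedding, normalized to unit length.
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    return vec / (np.linalg.norm(vec) or 1.0)

index = np.stack([embed(d) for d in docs])  # brute-force in-memory "vector index"

def retrieve(query, k=2):
    scores = index @ embed(query)           # cosine similarity on unit vectors
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# RAG prompt assembly: retrieved context is prepended to the user question.
question = "How do I load files into Snowflake?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```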
Professional Expectations and Collaboration
- Collaborate effectively with offshore teams to ensure seamless project delivery across different time zones.
- Actively participate in discussions with business stakeholders to understand requirements and provide suitable technical solutions that align with organizational goals.
- Maintain high standards of data quality and integrity across all pipelines, implementing rigorous testing and validation protocols.
- Stay updated with the latest trends in AI and data engineering to drive innovation and efficiency within the organization.
- Own project delivery from conception to production, ensuring all technical documentation is up to date.
This position requires a candidate who is not only technically proficient but also a proactive problem solver. If you have a passion for data and AI, and you meet the requirements listed above, we encourage you to apply for this challenging and rewarding role.
Special Requirements
Candidates must have valid work authorization and be local to Texas. C2C / C2H positions are available.
Compensation & Location
Salary: $145,000 – $195,000 per year (Estimated)
Location: Remote, TX
Recruiter / Company – Contact Information
Recruiter / Employer: Nvoids
Email: bethrachel963@gmail.com