Data Architect I (Junior)

Company:  Central Point Partners
Location: Columbus
Closing Date: 19/10/2024
Hours: Full Time
Type: Permanent
Job Requirements / Description
Data Architect

Hybrid: 3 days per week in office in Columbus, OH

3-Month Contract-to-Hire

As an MLOps Engineer, you will be responsible for the end-to-end productionization and deployment of machine learning models at scale. You will work closely with data scientists to refine models and ensure they are optimized for production. Additionally, you will be responsible for maintaining and improving our MLOps infrastructure, automating deployment pipelines, and ensuring compliance with IT and security standards. You will play a critical role in image management, vulnerability remediation, and the deployment of Client models using modern infrastructure-as-code practices.

## API Experience

The MLOps Engineer will be responsible for developing and maintaining APIs and data pipelines that facilitate the seamless integration of machine learning model outputs into our Kafka-based event hub platform. This role requires a strong background in Python, API development (batch and real-time), Kafka, FTP/SFTP automation, and familiarity with Linux operating systems. You will work closely with data scientists to understand and finalize the schema of model outputs, document schemas using Swagger, collaborate with event hub architects, and ensure that data is accurately and reliably published, whether through Kafka APIs, automated FTP processes, or custom-developed APIs for real-time integration.
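
The paragraph above covers both batch publishing into Kafka and schema-driven APIs. As a minimal sketch of the publishing side only, the snippet below pushes JSON-serialized model-output records to a topic with the confluent-kafka Python client; the broker address, topic name, and record fields are illustrative assumptions, not details from this posting.

```python
# Hypothetical sketch: publish a batch of model-output records to a Kafka topic.
# The broker address, topic name, and record schema are placeholders.
import json
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def delivery_report(err, msg):
    # Called once per message to confirm delivery or surface errors.
    if err is not None:
        print(f"Delivery failed for key={msg.key()}: {err}")

def publish_scores(records, topic="model-scores"):
    """Serialize each model-output record as JSON and publish it."""
    for record in records:
        producer.produce(
            topic,
            key=str(record["entity_id"]),
            value=json.dumps(record),
            callback=delivery_report,
        )
    producer.flush()  # block until all queued messages are delivered

if __name__ == "__main__":
    publish_scores([{"entity_id": 42, "score": 0.87, "model_version": "v1"}])
```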

## Key Responsibilities:

1) Vulnerability Remediation & Image Management:

  - Manage and update Docker images, ensuring they are secure and optimized.

  - Collaborate with data scientists to validate that models run effectively on updated images.

  - Address security vulnerabilities by updating and patching Docker images (a scripted image-rebuild sketch follows this responsibilities list).

2) AWS & Terraform Expertise:

  - Deploy, manage, and scale AWS services (SageMaker, S3, Lambda) using Terraform.

  - Automate the spin-up and spin-down of AWS infrastructure using Terraform scripts.

  - Monitor and optimize AWS resources to ensure cost-effectiveness and efficiency.

3) DevOps & CI/CD Pipeline Management:

  - Design, implement, and maintain CI/CD pipelines in Azure DevOps (ADO).

  - Integrate CI/CD practices with model deployment processes, ensuring smooth productionization of Client models.

  - Use Git for code versioning and collaboration.

4) Model Productionization:

  - Participate in the end-to-end process of productionizing machine learning models, from deployment through ongoing monitoring and performance maintenance.

  - Work with large language models, focusing on implementing near real-time and batch inference.

  - Address data drift and model drift in production environments (a lightweight drift-check sketch follows this responsibilities list).

5) Collaboration & Continuous Learning:

  - Work closely with data scientists, DevOps engineers, and other MLOps professionals to ensure seamless integration and deployment of Client models.

  - Stay updated on the latest trends and technologies in MLOps, especially related to AWS and Docker.

6) API-Related Responsibilities:

  - Schema Documentation: Collaborate with data scientists to refine and document model output schemas using Swagger for downstream API development (a schema-and-Swagger sketch follows this responsibilities list).

  - Data Transfer & API Development: Automate data transfers (data pipelines) to Kafka using FTP/SFTP or Kafka APIs. Develop and maintain batch and real-time APIs for model output integration.

  - Event Hub Integration: Work with Kafka engineers to ensure accurate data publishing and monitor for reliability.
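
For the image-management work in item 1, a task like rebuilding a serving image on freshly patched base layers can be scripted. The snippet below is a minimal sketch using the Docker SDK for Python; the build context, image name, and tag are assumptions for illustration, not details from this posting.

```python
# Hypothetical sketch: rebuild a model-serving image on freshly patched base
# layers using the Docker SDK for Python. Paths, names, and tags are placeholders.
import docker

client = docker.from_env()

def rebuild_patched_image(context_dir=".", tag="model-serving:patched"):
    """Rebuild the serving image, pulling newer base layers, and tag the result."""
    image, build_logs = client.images.build(
        path=context_dir,  # directory containing the Dockerfile
        tag=tag,
        pull=True,         # always attempt to pull newer (patched) base layers
        rm=True,           # remove intermediate containers after the build
    )
    for chunk in build_logs:
        if "stream" in chunk:
            print(chunk["stream"], end="")
    return image

if __name__ == "__main__":
    rebuild_patched_image()
```

In practice a rebuild like this would typically run inside the CI/CD pipeline described in item 3 rather than from a workstation.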
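
For the data-drift point in item 4, one lightweight approach is a per-feature two-sample Kolmogorov-Smirnov test comparing a training-time reference sample against recent production data. The sketch below assumes in-memory NumPy arrays and a 0.05 significance threshold; the feature names and data are placeholders.

```python
# Hypothetical sketch: flag per-feature data drift with a two-sample KS test.
# Feature names, the 0.05 threshold, and the data source are placeholders.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(reference, current, alpha=0.05):
    """Return features whose production distribution differs from the reference."""
    flagged = []
    for name, ref_values in reference.items():
        stat, p_value = ks_2samp(ref_values, current[name])
        if p_value < alpha:  # distributions differ beyond the chosen threshold
            flagged.append((name, stat, p_value))
    return flagged

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = {"age": rng.normal(40, 10, 5000)}
    current = {"age": rng.normal(45, 12, 5000)}  # simulated shifted feature
    print(drifted_features(reference, current))
```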
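
For the schema-documentation and real-time API points in item 6, one common pattern is to declare the model-output schema with pydantic and expose it through FastAPI, which serves Swagger (OpenAPI) documentation automatically at /docs. The field names, endpoint path, and dummy scoring logic below are illustrative assumptions, not this team's actual schema.

```python
# Hypothetical sketch: declare a model-output schema with pydantic and expose it
# via FastAPI, which auto-generates interactive Swagger docs at /docs.
# Field names, the endpoint path, and the scoring logic are placeholders.
from typing import Dict

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Model Output API")

class ScoreRequest(BaseModel):
    entity_id: int
    features: Dict[str, float]

class ScoreResponse(BaseModel):
    entity_id: int
    score: float
    model_version: str

@app.post("/scores", response_model=ScoreResponse)
def score(request: ScoreRequest) -> ScoreResponse:
    """Stand-in for real-time inference; a real service would call the model here."""
    return ScoreResponse(
        entity_id=request.entity_id,
        score=0.5,             # dummy score in place of a model call
        model_version="v1",
    )

# Run locally (assuming this file is main.py): uvicorn main:app --reload
# Swagger UI is then served at http://localhost:8000/docs
```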

## Required Skills & Qualifications:

  - Python: Deep expertise in Python for scripting and automation.

  - AWS: Strong experience with AWS services, particularly SageMaker, S3, and Lambda.

  - Terraform: Proficiency in using Terraform for infrastructure-as-code on AWS.

  - Docker: Extensive experience with Docker, including building, managing, and securing Docker images.

  - Linux: Strong command-line skills in Linux, especially for Docker and system management.

  - DevOps Experience (Azure DevOps/ADO): Significant experience setting up and managing CI/CD pipelines in ADO.

  - Git: Proficient in using Git for version control and collaboration.

  - API Development: Proven experience developing and managing both batch and real-time APIs, preferably in a Kafka-based event-driven architecture; exposure to API documentation tools such as Swagger; strong understanding of schema design and data serialization formats such as JSON.

  - Additional DevOps Tools: Experience with Jenkins or other CI/CD tools is a plus.

  - Experience & Education: 4 years of combined experience across MLOps, DevOps, and/or Data Engineering; Bachelor's degree in Computer Science, Engineering, or a related discipline.

## Preferred Qualifications:

- Experience with large language models and productionizing Client models in a cloud environment.

- Exposure to near real-time inference systems and batch processing in Client.

- Familiarity with data drift and model drift management.

