
Member of Technical Staff, Data Engineer

Odyssey · Santa Clara · Full-time
$120,000 per year

Job Description

Odyssey is pioneering AI video you can both watch and interact with in real time.

Odyssey was founded in late 2023 by Oliver Cameron (Cruise, Voyage) and Jeff Hawke (Wayve, Oxford AI PhD), two veterans of self-driving cars and AI. They’ve since recruited a world-class team of AI researchers from Waymo, Tesla, Meta, Cruise, Wayve, and Microsoft.

Odyssey has raised significant venture capital from GV, EQT Ventures, Air Street Capital, DCVC, Elad Gil, Garry Tan, Soleio, Jeff Dean, Kyle Vogt, Qasar Younis, Guillermo Rauch, Soumith Chintala, and researchers from OpenAI, DeepMind, Meta, and Midjourney. Ed Catmull, the founder of Pixar, serves on Odyssey’s board.

We're seeking a data engineer to own our ML/data platform. This role goes by different titles at different companies; here it spans infrastructure, tooling, and data pipelines that let our researchers work efficiently with multimodal data (images, video, 3D, and more), run experiments, and move models to production seamlessly. You'll have significant autonomy in technical decisions and the opportunity to grow into a technical leadership role as we scale.

A typical week

  • Design and implement scalable data pipelines for processing large-scale multimodal datasets.

  • Collaborate with ML researchers to optimize data preprocessing and training workflows.

  • Make key architectural decisions about our data platform infrastructure.

  • Improve our Kubernetes-based data processing, training, and serving infrastructure.

Your core responsibilities

  • Design and implement our Kubernetes-based ML data platform from the ground up.

  • Build scalable data pipelines that support both research experimentation and production deployment.

  • Create systems for dataset versioning, experiment tracking, and model lifecycle management.

  • Develop tools and interfaces that make it easy for researchers to find, enrich, and version multimodal data.

  • Establish best practices for reproducibility and production readiness.

  • Collaborate closely with ML researchers to understand and optimize their workflows.

Your technical scope

  • Work with large-scale (PB-scale) multimodal datasets, including imagery, video, and 3D.

  • Design and manage multi-node, multi-cloud Kubernetes clusters for distributed training.

  • Implement monitoring and observability for ML workflows.

  • Support real-time inference requirements for creative tools and experiences.

The required skills & experience

  • 5+ years of software engineering experience, with significant work in data platforms.

  • Strong Python development and system design expertise.

  • Deep experience with data pipeline development and ETL processes.

  • Production Kubernetes experience and container orchestration expertise.

  • Hands-on experience with data-oriented ML infrastructure tools (experiment tracking, feature stores, model registries).

  • Proficiency with cloud platforms (AWS/GCP/Azure).

  • Experience with data versioning and experiment tracking systems.

  • Understanding of ML workflows and researcher needs.

Your ideal qualities

  • Self-directed and comfortable with ambiguity.

  • Strong bias for action and pragmatic problem-solving.

  • Track record of extreme ownership in technical projects.

  • Excellent communication skills, especially with technical stakeholders.

  • Experience building systems from scratch in fast-paced environments.

  • Passion for enabling ML research and production excellence.

The nice-to-haves

  • Experience with video, image, or 3D data pipelines for ML/AI.

  • Experience with distributed computing frameworks (we use Ray) or workflow orchestration (we use Flyte); see the sketch after this list for a flavor of this work.

  • Familiarity with vector databases (e.g., LanceDB) and similarity search.

  • Experience at AI/ML companies or research labs.

  • Contributions to open-source data/ML tools.

  • Experience building researcher-facing tools.
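
To give a rough sense of the distributed-preprocessing work the Ray bullet above refers to, here is a minimal, hypothetical sketch that fans a per-clip step out across a cluster with Ray Data. The bucket paths and the extract_metadata step are illustrative placeholders, not Odyssey's actual pipeline.

```python
import ray

ray.init()  # connects to an existing cluster, or starts a local one

# Hypothetical manifest of video clips to preprocess.
clips = ray.data.from_items([
    {"uri": "s3://example-bucket/clips/0001.mp4"},
    {"uri": "s3://example-bucket/clips/0002.mp4"},
])

def extract_metadata(row: dict) -> dict:
    # Placeholder for the real work: decode the clip, sample frames,
    # compute embeddings, and write results back to object storage.
    row["status"] = "processed"
    return row

processed = clips.map(extract_metadata)  # runs in parallel across the cluster
print(processed.take_all())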

Growth opportunities

  • Help build our data platform engineering team as we scale.

  • Define our technical strategy for data platform and infrastructure.

  • Establish key partnerships with open-source data platform projects and vendors.

  • Shape our technical hiring strategy.

  • Deep engagement with the broader data and ML infrastructure community.

Company Information

Location: Boston, Massachusetts, United States

Type: Hybrid