Data Engineer - Scientific Data Ingestion Job at Mithrl, San Francisco, CA


Job Description

ABOUT MITHRL

We envision a world where novel drugs and therapies reach patients in months, not years, accelerating breakthroughs that save lives.

Mithrl is building the world’s first commercially available AI Co-Scientist—a discovery engine that empowers life science teams to go from messy biological data to novel insights in minutes. Scientists ask questions in natural language, and Mithrl answers with real analysis, novel targets, and patent-ready reports. No coding. No waiting. No bioinformatics bottlenecks.

We are the fastest-growing tech-bio startup in the Bay Area, with over 12X YoY revenue growth. Our platform is already used by teams at some of the largest biotech and pharma companies across three continents to accelerate breakthroughs, from target discovery to mechanism of action.

WHAT YOU WILL DO

Build and own an AI-powered ingestion & normalization pipeline to import data from a wide variety of sources — unprocessed Excel/CSV uploads, lab and instrument exports, as well as processed data from internal pipelines.

Develop robust schema mapping, coercion, and conversion logic (think: unit normalization, metadata standardization, variable-name harmonization, vendor-instrument quirks, plate-reader formats, reference-genome or annotation updates, batch-effect correction, etc.).

Use LLM-driven and classical data-engineering tools to structure “semi-structured” or messy tabular data — extracting metadata, inferring column roles/types, cleaning free-text headers, fixing inconsistencies, and preparing final clean datasets.

Ensure all transformations that should only happen once (normalization, coercion, batch-correction) execute during ingestion — so downstream analytics / the AI “Co-Scientist” always works with clean, canonical data.

Build validation, verification, and quality-control layers to catch ambiguous, inconsistent, or corrupt data before it enters the platform.

Collaborate with product teams, data science / bioinformatics colleagues, and infrastructure engineers to define and enforce data standards, and ensure pipeline outputs integrate cleanly into downstream analysis and storage systems.

WHAT YOU BRING

Must-have

  • 5+ years of experience in data engineering / data wrangling with real-world tabular or semi-structured data.
  • Strong fluency in Python and data processing tools (Pandas, Polars, PyArrow, or similar).
  • Extensive experience with messy Excel / CSV / spreadsheet-style data (inconsistent headers, multiple sheets, mixed formats, free-text fields) and normalizing it into clean structures.
  • Comfort designing and maintaining robust ETL/ELT pipelines, ideally for scientific or lab-derived data.
  • Ability to combine classical data engineering with LLM-powered data normalization / metadata extraction / cleaning.
  • Strong desire and ability to own the ingestion & normalization layer end-to-end, from raw upload to final clean dataset, with an eye for maintainability, reproducibility, and scalability.
  • Good communication skills; able to collaborate across teams (product, bioinformatics, infra) and translate real-world messy data problems into robust engineering solutions.

Nice-to-have

  • Familiarity with scientific data types and “modalities” (e.g. plate-readers, genomics metadata, time-series, batch-info, instrumentation outputs).
  • Experience with workflow orchestration tools (e.g. Nextflow, Prefect, Airflow, Dagster), or building pipeline abstractions.
  • Experience with cloud infrastructure and data storage (AWS S3, data lakes/warehouses, database schemas) to support multi-tenant ingestion.
  • Past exposure to LLM-based data transformation or cleansing agents — building or integrating tools that clean or structure messy data automatically.
  • Any background in computational biology / lab-data / bioinformatics is a bonus — though not required.

WHAT YOU WILL LOVE AT MITHRL

  • Mission-driven impact: you’ll be the gatekeeper of data quality — ensuring that all scientific data entering Mithrl becomes clean, consistent, and analysis-ready. You’ll have outsized influence over the reliability and trustworthiness of our entire data + AI stack.
  • High ownership & autonomy: this role is yours to shape. You decide how ingestion works, define the standards, build the pipelines. You’ll work closely with our product, data science, and infrastructure teams — shaping how data is ingested, stored, and exposed to end users or AI agents.
  • Team: Join a tight-knit, talent-dense team of engineers, scientists, and builders.
  • Culture: We value consistency, clarity, and hard work. We solve hard problems through focused daily execution.
  • Speed: We ship fast (2x/week) and improve continuously based on real user feedback.
  • Location: Beautiful SF office with a high-energy, in-person culture.
  • Benefits: Comprehensive PPO health coverage through Anthem (medical, dental, and vision) + 401(k) with top-tier plans.

Job Tags

Work at office
