RL Environments Engineer

16 000 - 25 000 USD net per month - B2B

BI & Data

San Carlos St 333, Warsaw +4 locations

Preference Model via XOR Inc.

Full-time

B2B

Senior

Fully remote

Job description

About the company

XOR is hiring exclusively on behalf of our partner Preference Model.

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks that we want to use these models for are outside of their training data distribution. Preference Model creates reinforcement learning environments that encapsulate real-world use cases, enabling AI systems to practice, adapt, and learn from feedback grounded in reality. We seek to bring the real world into distribution for the models.

Our founding team has previous experience on Anthropic’s data team building data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential.

The company has closed a large seed round from Tier-1 Silicon Valley VCs and is working with top AI labs, informing priorities and timelines.

XOR runs the end-to-end hiring process for this role (screening, take-home, and coordination with the Preference Model team). Please apply through this posting to be considered.

What you’ll do

You will design and build realistic engineering tasks and environments that train and evaluate LLMs. Depending on your strengths, you may focus more on production ML systems or more on performance and low-level optimization; both are valuable here.

Responsibilities

Design and build MLE/SWE environments and diverse tasks.
Calibrate tasks against a specified target language model so that they match the required difficulty distribution.
Deliver ~1 task per 8-10 hours once onboarded.
Edit tasks within 24 hours based on customer feedback.
Onboard quickly and start delivering on day one with minimal supervision.

Requirements

What we’re looking for (must-haves)

Strong Python (engineering-quality, not notebook-only).
Hands-on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
Strong product/engineering ownership: comfortable building, fixing, and scaling end-to-end pipelines.
Docker + production mindset (debugging, reliability, iteration speed).
≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
Ability to meet throughput expectations and respond quickly to feedback.

Strong signals (nice-to-have, big plus)

Experience designing environments/tasks for RL and/or evaluations.
Experience in high-stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety-critical systems).
ML systems experience: CI/CD, monitoring, evaluation harnesses, MLOps, scalable pipelines.
Systems depth: C++/Rust/Scala/Java, performance/infra optimization, distributed systems.
Exposure to RL / bandits / agentic systems (not required, but a strong signal).

Not a fit if

You’re primarily a prompt engineer without strong ML/engineering foundations.
You’re a research-only / academic-only profile with little or no shipping/production ownership.
You’ve only built in notebooks or rely heavily on managed AutoML tools.

Working conditions

Remote contractor, full-time 40 hours per week, flexible schedule.
Per-task bonuses for delivered tasks, in addition to base pay.
Potential path to FTE and relocation (subject to performance and mutual fit).

Compensation

$90-$130/hour base pay (roughly $15,000-$22,500 per month), depending on seniority and the quality of the take-home assignment.
Monthly performance bonuses in addition to the base pay.

Process

1) Apply via the job board

Please submit your CV and add a short note on which track fits you best (e.g., production ML systems, or performance and low-level optimization).

2) Short take-home assignment (form)

After you apply, XOR will share a short take-home assignment: a form containing a small task.
The Preference Model technical team will review your submission.
In parallel, you can schedule a short call with XOR to learn more about the role and the company and ask questions.

3) Team lead interview

If the take-home looks strong, we will schedule a technical interview with the Preference Model team.

4) Second take-home assignment (coding task)

The final decision is made after the second take-home assignment.

Note on take-home compensation

Time spent on the take-home can be compensated if you receive an offer.

Required skills

CUDA

Python

CI/CD

C++

AI

LLMs

Docker

Languages

English: C1

By applying, I consent to the processing of my personal data for the purpose of carrying out the recruitment process. Please be informed that the data controller is XOR Inc (hereinafter "controller"). You have the right to request access to your personal data...
