Foundational Data for Those Pushing the Frontier of AI

AI needs exceptional data. We partner with leading language model and embodied AI labs to deliver the data and evaluations that define the next frontier of capability.

High-Quality Data to Match Research Needs

Evaluations

Private benchmarks and human-in-the-loop evaluation pipelines to align model capabilities with the use case.

VERIFIED BENCHMARKS
HUMAN IN THE LOOP
EXPERT NETWORK

RL Environments

Realistic, high-rigor environments for software engineering (SWE), computer use, agent, and function-calling tasks.

PROCESS/OUTCOME REWARDS
CHALLENGING TASKS
LONG HORIZON PLANNING

SFT Data

Labeled input–output pairs from top-tier experts, used to teach models reasoning, instruction following, and safety.

INSTRUCTION FOLLOWING
REASONING
SAFETY & ALIGNMENT

Embodied AI

Diverse real-world data at the scale foundation models need.

TELEOP / UMI / ALOHA
SCENE AND TASK RANDOMIZATION
WORLD MODELS

Products

Computer/Browser Use

Enterprise and consumer web app environments that evaluate a model's ability to interpret page contents from screenshots or HTML and to take actions by clicking, scrolling, and typing.

Hi! I want to transfer $500 to my Chase account.

Thinking... (01s)
Search: How do I transfer money...

I can help with that! Let me check if you have that account linked.

get_linked_accounts

{
  "user_id": "u_8f92k",
  "accounts": [
    {"bank": "Chase", "last4": 7134}
  ]
}

Tool Calling Agents

Agentic task environments designed by domain experts, such as customer service, in which agents search documents and execute tool calls to complete economically valuable tasks.
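To make the tool-calling loop concrete, here is a minimal sketch of how an environment might dispatch a call such as `get_linked_accounts` from the sample conversation. Everything beyond the tool name and the sample payload is an assumption made for illustration: the `TOOLS` registry, the `execute_tool_call` helper, the `{"name": ..., "arguments": ...}` call format, and the in-memory account store are hypothetical and do not describe any actual product API.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical in-memory data standing in for a banking backend.
# The values mirror the sample payload shown in the conversation.
_LINKED_ACCOUNTS = {
    "u_8f92k": [{"bank": "Chase", "last4": 7134}],
}


def get_linked_accounts(user_id: str) -> Dict[str, Any]:
    """Return the accounts linked to a user (illustrative stub)."""
    return {"user_id": user_id, "accounts": _LINKED_ACCOUNTS.get(user_id, [])}


# Hypothetical registry mapping tool names to Python callables.
TOOLS: Dict[str, Callable[..., Dict[str, Any]]] = {
    "get_linked_accounts": get_linked_accounts,
}


def execute_tool_call(call_json: str) -> str:
    """Parse a JSON tool call, run the matching tool, return a JSON result."""
    call = json.loads(call_json)
    name = call.get("name")
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        result = TOOLS[name](**call.get("arguments", {}))
    except TypeError as exc:  # missing or unexpected arguments
        return json.dumps({"error": str(exc)})
    return json.dumps(result)


if __name__ == "__main__":
    request = '{"name": "get_linked_accounts", "arguments": {"user_id": "u_8f92k"}}'
    print(execute_tool_call(request))
```

Returning errors as JSON rather than raising lets the agent see a failed call as an observation and recover from it, which is one common design choice for this kind of environment.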

sequence_alignment.py

from typing import Tuple


MATCH_SCORE = 2
MISMATCH_SCORE = -1
GAP_PENALTY = -2


def nw_align(s: str, t: str) -> int:
    n, m = len(s), len(t)
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    for i in range(n + 1):
        dp[i][0] = i * GAP_PENALTY

Software Engineering

Software engineering environments with in-distribution tasks that simulate real-world workflows such as writing, debugging, and optimizing large-scale codebases.
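The `sequence_alignment.py` sample in this panel stops after initializing the first column of the table. As an illustration of the kind of task such an environment might pose, here is one possible completion: a standard Needleman-Wunsch global alignment score with a linear gap penalty. The constants and the `nw_align` signature come from the sample; the rest of the body is an assumption about what the truncated function was meant to do, not the original file.

```python
MATCH_SCORE = 2
MISMATCH_SCORE = -1
GAP_PENALTY = -2


def nw_align(s: str, t: str) -> int:
    """Return the optimal global alignment score of s and t."""
    n, m = len(s), len(t)
    # dp[i][j] holds the best score for aligning s[:i] with t[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]

    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(n + 1):
        dp[i][0] = i * GAP_PENALTY
    for j in range(m + 1):
        dp[0][j] = j * GAP_PENALTY

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = MATCH_SCORE if s[i - 1] == t[j - 1] else MISMATCH_SCORE
            dp[i][j] = max(
                dp[i - 1][j - 1] + sub,      # align s[i-1] with t[j-1]
                dp[i - 1][j] + GAP_PENALTY,  # gap in t
                dp[i][j - 1] + GAP_PENALTY,  # gap in s
            )
    return dp[n][m]


if __name__ == "__main__":
    print(nw_align("GATTACA", "GATTACA"))
```

This version returns only the score, matching the `-> int` annotation in the sample; recovering the alignment itself would require a traceback pass over `dp`.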

Let X, Y be real n×n matrices such that
X² + Y² - XY = 0.

Suppose that YX - XY is invertible. Compute the number of possible values for n such that n ≤ 2025.

Thinking... (13m 11s)

Consolidating the matrix problem and equation interpretation. We're solving for an equation involving real n×n matrices, X and Y, such that the sum...

Novel Math Post-Training Data

Advanced mathematical post-training data featuring novel question, answer, and rubric sets, written by PhDs and Olympiad medalists to target specific exams such as the IMO and the Putnam.
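For a sense of the reasoning such a problem demands, here is one possible solution sketch for the sample matrix problem above. It is an added derivation, not the dataset's reference answer or rubric, and the symbol $\omega$ and the explicit $3\times 3$ example are introduced here for illustration.

```latex
% Step 1: a necessary condition on n.
% Let \omega = e^{2\pi i/3}, so that \omega^2 + \omega + 1 = 0.
% Expanding and substituting X^2 + Y^2 = XY gives
\[
(X + \omega Y)(X + \omega^2 Y)
  = X^2 + Y^2 + \omega^2 XY + \omega YX
  = (1 + \omega^2)\,XY + \omega\,YX
  = \omega\,(YX - XY).
\]
% X and Y are real, so X + \omega^2 Y is the entrywise complex conjugate
% of X + \omega Y. Taking determinants:
\[
\bigl|\det(X + \omega Y)\bigr|^2 = \omega^{n}\,\det(YX - XY).
\]
% The left side is real, and \det(YX - XY) is real and nonzero,
% so \omega^n must be real, which forces 3 \mid n.

% Step 2: every multiple of 3 occurs.
% For n = 3 take
\[
X = \begin{pmatrix} 0 & 0 & -1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
Y = \begin{pmatrix} 0 & 0 & -1 \\ 0 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix},
\]
% for which a direct computation gives
\[
X^2 + Y^2 = XY = \begin{pmatrix} 0 & -1 & 0 \\ 0 & 0 & -1 \\ 0 & 0 & 0 \end{pmatrix},
\qquad
YX - XY = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\]
% and the second matrix is a permutation matrix, hence invertible.
% Block-diagonal copies of (X, Y) handle n = 3k for every k \ge 1.

% Step 3: count the multiples of 3 up to 2025.
\[
\#\{\, n \le 2025 : 3 \mid n \,\} = \frac{2025}{3} = 675 .
\]
```

Under this argument the count is 675. Both directions matter: the determinant step rules out every $n$ not divisible by 3, and the explicit example shows that no multiple of 3 is excluded.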

The grid layout breaks on mobile. Can you make it responsive?

Checking the CSS breakpoints... I see the issue in grid-cols. Fixing it now.

Human-Agent Interaction in Coding

Interactive coding trajectory data from long-horizon, end-to-end coding workflows in which expert developers collaborate with AI coding agents.

Custom Projects

Custom research projects designed with partner labs to develop tailored data, environments, or evaluations aligned with specific benchmarks and research goals.