Datasets πŸŽ“

Turn Engli's verified maps into a standardized Q&A training set, then fine-tune an open-source model on it β€” delegated to a RunPod GPU with your API key. The hard part of any instruction dataset (correct, inspectable answers) is already done; this synthesizes, refines, and ships it.

Build a custom dataset β€” blueprint + RLHF sandbox πŸ§ͺ

Design a blueprint β€” a master focus (subject), a specialized-agent persona, reference docs, and a deterministic verification harness β€” then generate Q&A with your key. The harness grades every pair in the loop (no model): the flagship πŸ”’ Mathematics preset auto-rejects any solution whose own arithmetic doesn't equal its stated answer, mixing Engli intent + program-aided (Python) checks. Keep/reject the rest, compare datasets by verified rate, export, and fire a RunPod fine-tune: blueprint β†’ generate β†’ verify β†’ curate β†’ train β†’ repeat.

Sign in (top-right) to design a verifiable dataset with a human-feedback loop.

Preview the built-in data

A sample of the generated set. Build the full thing with python3 -m engli.datasets.build.

Loading samples…

⬇ Download train.jsonl/data/engli-instruct.train.jsonl

Fine-tune on RunPod πŸ›°οΈ

Delegates an Unsloth LoRA SFT run to a RunPod GPU pod. The job downloads your dataset, trains the base model, and (optionally) pushes the adapter to the Hugging Face Hub. RunPod GPU time is billed to your RunPod account.

Credentials

Your key is sent to RunPod through this app's API route for this request only β€” never stored or logged.

Job

Build it yourself

# generate + refine + split, write Alpaca (and chat/qa/dpo) + a dataset card
python3 -m engli.datasets.build --formats alpaca,chat,qa,dpo --augment 2

# deterministically verify a dataset against a harness (the same checker the
# web sandbox runs in the loop) β€” e.g. score a math set's self-consistency:
python3 -m engli.datasets.harness out/alpaca/train.jsonl --kind numeric

# frontier pipeline: derive preference (DPO/KTO), verifiable-reward (RLVR), and
# chat sets from curated pairs β€” and decontaminate against your eval set:
python3 -m engli.datasets.pipeline pairs.jsonl --format dpo --decontaminate eval.jsonl
python3 -m engli.datasets.pipeline pairs.jsonl --format card    # provenance + stats

# also RUN each ```python``` block and confirm it prints the #### answer:
python3 -m engli.datasets.harness out/alpaca/train.jsonl --kind numeric --execute

# judge-free benchmark: grade your model's predictions against gold answers:
python3 -m engli.datasets.harness preds.jsonl --gold gsm8k_test.jsonl

# host out/alpaca/train.jsonl somewhere, then point the dataset field at its URL
# (or use any hf:owner/name dataset), pick a GPU, and Launch.