Unified Training Comparison
This smoke demo shows that DreamerV3 and TD-MPC2 can run through the same high-level training flow and the same quick verification flow.
It uses the same random ReplayBuffer source for both families, so treat it as
a contract demonstration rather than a benchmark or a real-environment
performance comparison.
Run
uv sync --extra dev --extra training
uv run python examples/compare_unified_training.py --quick
What It Demonstrates
- same random ReplayBuffer source for both families
- same TrainingConfig contract
- same quick verification flow for both families
- same artifact generation flow
- same visualization helper (write_reward_heatmap_ppm)
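For readers unfamiliar with the PPM format behind the heatmap artifacts, here is a minimal sketch of encoding a 2D grid of values as a binary PPM (P6) image. It is not the project's write_reward_heatmap_ppm helper, whose signature and color mapping are not shown here. The function name heatmap_to_ppm_bytes and the blue-to-red ramp are assumptions made for illustration.

```python
def heatmap_to_ppm_bytes(rows):
    """Encode a rectangular 2D list of floats as a binary PPM (P6) image.

    Hypothetical illustration only; not the project's helper.
    """
    height = len(rows)
    width = len(rows[0]) if height else 0
    flat = [value for row in rows for value in row]
    lo, hi = (min(flat), max(flat)) if flat else (0.0, 0.0)
    span = (hi - lo) or 1.0  # avoid division by zero for constant input

    pixels = bytearray()
    for value in flat:
        t = (value - lo) / span       # normalize to [0, 1]
        red = int(round(255 * t))     # low values are blue, high values are red
        pixels += bytes((red, 0, 255 - red))

    # P6 header: magic number, width and height, maximum channel value.
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)


if __name__ == "__main__":
    data = heatmap_to_ppm_bytes([[0.0, 0.5], [0.5, 1.0]])
    with open("example_heatmap.ppm", "wb") as handle:
        handle.write(data)
```

PPM needs no image library to write, which makes it a convenient choice for a smoke demo with minimal dependencies.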
The default command also runs quick verification for each family and writes the
structured result alongside the training outputs. Use --skip-verify only when
you intentionally want a faster local smoke run without verification artifacts.
Artifacts
- outputs/comparison/summary.json
- outputs/comparison/dreamer.ppm
- outputs/comparison/tdmpc2.ppm
- outputs/comparison/dreamerv3/quick_verify.json
- outputs/comparison/tdmpc2/quick_verify.json
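After a run, a small check like the following can confirm that the expected files were written. This is a sketch, not part of the repository: the helper name missing_artifacts is hypothetical, and it assumes the paths above are relative to the directory where the command was run.

```python
from pathlib import Path

# Relative paths copied from the Artifacts list above.
EXPECTED_ARTIFACTS = (
    "outputs/comparison/summary.json",
    "outputs/comparison/dreamer.ppm",
    "outputs/comparison/tdmpc2.ppm",
    "outputs/comparison/dreamerv3/quick_verify.json",
    "outputs/comparison/tdmpc2/quick_verify.json",
)


def missing_artifacts(repo_root="."):
    """Return the expected artifact paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("missing artifacts:")
        for rel in missing:
            print("  " + rel)
    else:
        print("all expected artifacts are present")
```

If the run used --skip-verify, the two quick_verify.json files would presumably be reported as missing, since that flag skips the verification artifacts.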
Interpretation
This demo shows that both families can be created, trained, rolled out, and checked through the same public workflow shape. It remains a smoke-only contract demonstration: it does not establish real-environment performance, benchmark superiority, or paper reproduction, and it should not be cited as public proof of any of these.