Benchmarks
WorldFlux provides three reproducible benchmark entrypoints with aligned CLI contracts.
Common CLI Contract
All benchmark scripts support:
- --quick (CI-safe short run)
- --full (longer run for manual/scheduled validation)
- --seed <int>
- --output-dir <path>
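The shared flags above could be declared roughly as follows. This is a minimal sketch, not the scripts' actual parser: the function name `build_parser`, the default values, and the choice to make `--quick` and `--full` mutually exclusive are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four documented flags."""
    parser = argparse.ArgumentParser(description="Benchmark entrypoint (sketch)")

    # Assumption: the two modes exclude each other. The docs do not say so.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--quick", action="store_true", help="CI-safe short run")
    mode.add_argument("--full", action="store_true",
                      help="longer run for manual/scheduled validation")

    # Assumption: the defaults below are placeholders, not documented values.
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--output-dir", type=str, default="outputs")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--quick", "--seed", "42", "--output-dir", "outputs/demo"]
)
print(args.quick, args.full, args.seed, args.output_dir)
```

Keeping the flag set identical across scripts means a CI job can call any benchmark with the same arguments.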
All runs emit:
- summary JSON (summary.json)
- visualization artifact (imagination.ppm)
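A run's output directory can be sanity-checked with a few lines of Python. The sketch below assumes both files sit directly inside the directory passed to --output-dir; `check_artifacts` is a hypothetical helper, not part of WorldFlux. It confirms only that summary.json parses as JSON and that imagination.ppm starts with a PPM magic number (`P3` for ASCII, `P6` for binary).

```python
import json
from pathlib import Path


def check_artifacts(output_dir: str) -> dict:
    """Return the parsed summary if both expected artifacts look valid."""
    out = Path(output_dir)

    # Raises FileNotFoundError or json.JSONDecodeError if the summary is
    # missing or malformed.
    with (out / "summary.json").open("r", encoding="utf-8") as fh:
        summary = json.load(fh)

    # A PPM file begins with the two-byte magic number P3 or P6.
    with (out / "imagination.ppm").open("rb") as fh:
        magic = fh.read(2)
    if magic not in (b"P3", b"P6"):
        raise ValueError(f"imagination.ppm has unexpected magic bytes: {magic!r}")

    return summary
```

A CI step could call `check_artifacts("outputs/demo")` after a quick run and fail the job if it raises.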
Benchmark 1: DreamerV3 (Atari-oriented)
Full-mode example:
Expected minimum result:
- finite losses
- imagination artifact generated
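The "finite losses" criterion can be checked mechanically against the summary JSON. The schema of summary.json is not specified here, so this sketch assumes nothing about key names: it walks any nested structure and reports the path of every float that is NaN or infinite. The helper names and the example keys (`losses`, `kl`, `steps`) are hypothetical.

```python
import math
from typing import Any, Iterator, List, Tuple


def iter_floats(obj: Any, path: str = "") -> Iterator[Tuple[str, float]]:
    """Yield (path, value) for every float in nested dicts and lists."""
    if isinstance(obj, float):
        yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from iter_floats(value, f"{path}.{key}" if path else str(key))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from iter_floats(value, f"{path}[{index}]")


def non_finite_entries(summary: Any) -> List[str]:
    """Return the paths of all NaN or infinite floats in the summary."""
    return [p for p, v in iter_floats(summary) if not math.isfinite(v)]


# Hypothetical summary content, used only to exercise the helper.
example = {"losses": {"total": 0.5, "kl": float("nan")}, "steps": [1, 2]}
print(non_finite_entries(example))  # ['losses.kl']
```

An empty list means every float in the summary is finite. The same check applies unchanged to the other two benchmarks, since it does not depend on model-specific keys.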
Benchmark 2: TD-MPC2 (MuJoCo-oriented)
Full-mode example:
Expected minimum result:
- finite losses
- imagination artifact generated
Benchmark 3: Diffusion Imagination
Full-mode example:
Expected minimum result:
- finite losses
- imagination artifact generated
Reproducibility Notes
- Keep seed fixed for comparisons.
- CPU is the default benchmark target in quick mode.
- Full mode is intended for workflow_dispatch or scheduled workflows.
- Runtime and artifacts depend on hardware and optional dependencies.
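The reason for fixing the seed can be illustrated with a toy stand-in for a benchmark run. The sketch below uses only Python's standard `random` module; `sample_run` is a hypothetical placeholder, and how the real benchmarks consume `--seed` (for example, whether they also seed NumPy or PyTorch) is not stated here.

```python
import random
from typing import List


def sample_run(seed: int, n: int = 5) -> List[float]:
    """Toy stand-in for a run: draw n pseudo-random values from a seeded RNG."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    return [rng.random() for _ in range(n)]


# Same seed: identical sequences, so differences between runs reflect code changes.
same = sample_run(42) == sample_run(42)

# Different seed: the sequences diverge, which would confound a comparison.
different = sample_run(42) != sample_run(43)

print(same, different)
```

Even with a fixed seed, results may still vary across hardware and dependency versions, as the last note above points out.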