OSS Quality Gates¶
This document defines baseline quality checks for WorldFlux.
CI Gates (Required on PR)¶
All PRs must pass:
- Lint:
uvx ruff check src/ tests/ examples/ benchmarks/ scripts/ - Format:
uvx ruff format --check src/ tests/ examples/ benchmarks/ scripts/ - Typecheck:
uv run mypy src/worldflux/ - Unit tests:
uv run pytest tests/ - Public contract freeze:
uv run pytest -q tests/test_public_contract_freeze.py - Public contract snapshot update (additive-only):
uv run python scripts/update_public_contract_snapshot.py --snapshot tests/fixtures/public_contract_snapshot.jsonbreakingclassification blocks PR; only additive changes are auto-updated.- Example smoke tests:
uv run python examples/quickstart_cpu_success.py --quickuv run python examples/compare_unified_training.py --quickuv run python examples/train_dreamer.py --testuv run python examples/train_tdmpc2.py --testuv run python examples/train_jepa.py --steps 5 --batch-size 4 --obs-dim 8uv run python examples/train_token_model.py --steps 5 --batch-size 4 --seq-len 8 --vocab-size 32uv run python examples/train_diffusion_model.py --steps 5 --batch-size 4 --obs-dim 4 --action-dim 2uv run python examples/plan_cem.py --horizon 3 --action-dim 2- Benchmark quick checks:
uv run python benchmarks/benchmark_dreamerv3_atari.py --quick --seed 42uv run python benchmarks/benchmark_tdmpc2_mujoco.py --quick --seed 42uv run python benchmarks/benchmark_diffusion_imagination.py --quick --seed 42- Docs build:
uv run mkdocs build --strict - Docs domain TLS gate:
uv run python scripts/check_docs_domain_tls.py --host worldflux.ai --url https://worldflux.ai/ --expected-san worldflux.ai- Critical coverage threshold:
uv run python scripts/check_critical_coverage.py --report coverage.xml - Planner boundary tests: verify planner/dynamics decoupling invariants
- Parity harness smoke:
uv run pytest -q tests/test_parity/ - Parity suite governance:
uv run python scripts/check_parity_suite_coverage.py --policy reports/parity/suite_policy.json --lock reports/parity/upstream_lock.json
Release-only Parity Gate (Required on Release)¶
- Fixed parity artifacts must validate before publish:
uv run python scripts/validate_parity_artifacts.py --run reports/parity/runs/dreamer_atari100k.json --run reports/parity/runs/tdmpc2_dmcontrol39.json --aggregate reports/parity/aggregate.json --lock reports/parity/upstream_lock.json --required-suite dreamer_atari100k --required-suite tdmpc2_dmcontrol39 --max-missing-pairs 0- Required-family suite policy must validate:
uv run python scripts/check_parity_suite_coverage.py --policy reports/parity/suite_policy.json --lock reports/parity/upstream_lock.json --aggregate reports/parity/aggregate.json --enforce-pass- Release stops when either DreamerV3 or TD-MPC2 fails non-inferiority.
Operational Reliability Checks¶
- Save/Load parity: outputs after
save_pretrainedand reload should remain within tolerance. - Numerical stability: no NaN/Inf during defined smoke-check budgets.
- Contract compliance: required payload and planner metadata fields are validated.
Family Validation Scope¶
- DreamerV3 / TD-MPC2: finite losses and stable train/eval smoke paths.
- Token / Diffusion / JEPA / V-JEPA2: finite outputs and smoke-train viability.
- Skeleton families (DiT/SSM/Renderer3D/Physics/GAN): contract validation and trainer smoke pass.
Planner Boundary Checks¶
- Planner returns
ActionPayloadwithextras["wf.planner.horizon"]. - Dynamics family swaps do not require planner code changes.
Recommended Commands¶
uvx ruff check src/ tests/ examples/ benchmarks/ scripts/
uvx ruff format --check src/ tests/ examples/ benchmarks/ scripts/
uv run mypy src/worldflux/
uv run pytest tests/
uv run pytest -q tests/test_public_contract_freeze.py
uv run pytest -q tests/test_parity/
uv run python scripts/check_parity_suite_coverage.py --policy reports/parity/suite_policy.json --lock reports/parity/upstream_lock.json
uv run python scripts/update_public_contract_snapshot.py --snapshot tests/fixtures/public_contract_snapshot.json
uv run python scripts/check_docs_domain_tls.py --host worldflux.ai --url https://worldflux.ai/ --expected-san worldflux.ai
uv run python scripts/check_critical_coverage.py --report coverage.xml
uv run mkdocs build --strict