Observation Shape and Action Dim
When worldflux init asks for these values, it needs the exact I/O contract of your environment.
Observation Shape
Observation shape is the shape of a single observation tensor at one time step, excluding the batch dimension.
- Atari/image tasks usually use channel-first shape like 3,64,64
- Vector/state tasks usually use one dimension like 39
- This value maps to architecture.obs_shape in worldflux.toml
Examples:
- 3,64,64: RGB image observation (channels, height, width)
- 1,84,84: grayscale frame stack with one channel
- 39: flat state vector with 39 features
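To make the comma-separated form above concrete, here is a minimal sketch that turns a string such as 3,64,64 into a tuple and describes it. The helper names parse_obs_shape and describe_obs_shape are hypothetical and are not part of WorldFlux; they only illustrate how the examples above can be read.

```python
from typing import Tuple


def parse_obs_shape(text: str) -> Tuple[int, ...]:
    """Parse a comma-separated shape such as "3,64,64" into a tuple of ints.

    Hypothetical helper for illustration; not a WorldFlux function.
    """
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        raise ValueError("obs_shape must contain at least one dimension")
    dims = tuple(int(p) for p in parts)
    if any(d <= 0 for d in dims):
        raise ValueError(f"all dimensions must be positive, got {dims}")
    return dims


def describe_obs_shape(shape: Tuple[int, ...]) -> str:
    """Give a rough, human-readable reading of an observation shape."""
    if len(shape) == 1:
        return f"flat vector with {shape[0]} features"
    if len(shape) == 3:
        channels, height, width = shape  # channel-first convention
        return f"channel-first image: {channels} channel(s), {height}x{width} pixels"
    return f"tensor with {len(shape)} dimensions"


print(parse_obs_shape("3,64,64"))
print(describe_obs_shape(parse_obs_shape("39")))
```

A single number parses to a one-element tuple, which matches the flat state vector case.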
Action Dim
Action dim is the action width expected by the world model.
- Discrete environments: use the number of discrete actions (for one-hot action vectors)
- Continuous environments: use the size of the continuous action vector
- This value maps to architecture.action_dim in worldflux.toml
Examples:
- Breakout-like discrete control with 6 actions: action_dim = 6
- MuJoCo HalfCheetah with 6-dim continuous action: action_dim = 6
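Both examples end up with the same width for different reasons. The sketch below shows why, assuming discrete actions are fed to the model as one-hot vectors as described above. The one_hot helper and the sample continuous action are illustrative only, not WorldFlux code.

```python
from typing import List


def one_hot(action_index: int, action_dim: int) -> List[float]:
    """Encode a discrete action index as a one-hot vector of length action_dim."""
    if not 0 <= action_index < action_dim:
        raise ValueError(f"action index {action_index} is outside [0, {action_dim})")
    vec = [0.0] * action_dim
    vec[action_index] = 1.0
    return vec


# Discrete case: 6 possible actions, so the one-hot vector has 6 entries.
discrete_action = one_hot(2, action_dim=6)

# Continuous case: the action itself is a 6-element vector (made-up values).
continuous_action = [0.1, -0.4, 0.0, 0.9, -1.0, 0.3]

print(discrete_action)
print(len(discrete_action), len(continuous_action))
```

In the discrete case the width counts the available choices; in the continuous case it counts the controlled dimensions.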
How To Pick Correct Values
- Check your environment spec first.
- Set obs_shape to the exact observation tensor shape used in training.
- Set action_dim to the exact action width emitted by your policy/planner.
- Keep these values consistent across train.py, inference.py, and datasets.
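One way to keep the values consistent is to run the same small check in every entry point. The following is a sketch under the assumption that you can obtain the shape of one sample observation and the length of one sample action from your own pipeline. The function check_io_contract is hypothetical and not provided by WorldFlux.

```python
from typing import List, Sequence


def check_io_contract(
    expected_obs_shape: Sequence[int],
    expected_action_dim: int,
    sample_obs_shape: Sequence[int],
    sample_action_width: int,
) -> List[str]:
    """Compare configured values against one real sample and list any mismatches."""
    problems = []
    if tuple(sample_obs_shape) != tuple(expected_obs_shape):
        problems.append(
            f"obs_shape mismatch: configured {tuple(expected_obs_shape)}, "
            f"sample has {tuple(sample_obs_shape)}"
        )
    if sample_action_width != expected_action_dim:
        problems.append(
            f"action_dim mismatch: configured {expected_action_dim}, "
            f"sample has {sample_action_width}"
        )
    return problems


# Example: the configuration says 3,64,64 and 6, but the dataset stores 64,64,3 frames.
issues = check_io_contract((3, 64, 64), 6, (64, 64, 3), 6)
for issue in issues:
    print(issue)
```

Calling the same function from training, inference, and dataset loading code surfaces a mismatch at startup instead of deep inside a forward pass.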
Common Mistakes
- Using 64,64,3 while pipeline expects channel-first 3,64,64
- Entering action_dim=1 for a discrete environment with multiple actions
- Changing obs_shape after checkpoints are created (can break load/inference)
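The first two mistakes can often be caught with a simple lint before training starts. The sketch below uses a heuristic (a last axis of 1, 3, or 4 suggests channel-last) that is an assumption of this example, not a rule taken from WorldFlux, and all function names are hypothetical.

```python
from typing import List, Tuple

# Channel counts that commonly appear in image observations (assumption).
COMMON_CHANNEL_COUNTS = {1, 3, 4}


def looks_channel_last(shape: Tuple[int, ...]) -> bool:
    """Heuristic: 3-D shape whose last axis looks like channels and whose first does not."""
    if len(shape) != 3:
        return False
    return shape[-1] in COMMON_CHANNEL_COUNTS and shape[0] not in COMMON_CHANNEL_COUNTS


def to_channel_first(shape: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Reorder a (height, width, channels) shape to (channels, height, width)."""
    height, width, channels = shape
    return (channels, height, width)


def lint_shapes(obs_shape: Tuple[int, ...], action_dim: int, action_type: str) -> List[str]:
    """Return warnings for the common mistakes listed above."""
    warnings = []
    if looks_channel_last(obs_shape):
        warnings.append(
            f"obs_shape {obs_shape} looks channel-last; "
            f"did you mean {to_channel_first(obs_shape)}?"
        )
    if action_type == "discrete" and action_dim == 1:
        warnings.append(
            "action_dim=1 with a discrete action type; "
            "use the number of discrete actions instead"
        )
    return warnings


for warning in lint_shapes((64, 64, 3), 1, "discrete"):
    print(warning)
```

Reordering the shape value does not reorder the pixel data; the observation arrays themselves must be transposed to match.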
Shape Reference Table
| Environment Type | obs_shape | action_dim | action_type | Notes |
|---|---|---|---|---|
| Atari (standard) | 3, 64, 64 | 18 | discrete | Channel-first RGB |
| Atari (grayscale stack) | 4, 84, 84 | 18 | discrete | 4-frame stack |
| MuJoCo HalfCheetah | 17 | 6 | continuous | Flat state vector |
| MuJoCo Humanoid | 376 | 17 | continuous | High-dim state |
| DMControl Walker | 24 | 6 | continuous | Proprioceptive |
| DMControl Pixels | 3, 84, 84 | varies | continuous | Visual observation |
| Custom image env | C, H, W | varies | varies | Channel-first required |
Multi-Modal Observations (v3 API)
WorldFlux v3 supports multi-modal observation specifications via observation_modalities:
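# Assumes create_world_model is exported from the worldflux package, like get_config.
from worldflux import create_world_model

model = create_world_model(
    "dreamerv3:size12m",
    observation_modalities={
        "image": {"shape": (3, 64, 64), "kind": "image"},
        "proprio": {"shape": (12,), "kind": "vector"},
    },
    action_dim=6,
)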
Each modality entry accepts the following fields:
- shape: Tensor shape per time step (excluding batch dimension)
- kind: One of "image", "vector", "video", "tokens", "text", "other"
- dtype (optional): "float32" (default), "float16", or "bfloat16"
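The field rules above can be checked locally before a model is created. This is a minimal sketch of such a pre-check, using only the allowed values listed in this section. The function validate_modalities is hypothetical, and WorldFlux may perform its own, different validation.

```python
from typing import Any, Dict

ALLOWED_KINDS = {"image", "vector", "video", "tokens", "text", "other"}
ALLOWED_DTYPES = {"float32", "float16", "bfloat16"}


def validate_modalities(modalities: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Check each modality entry and return a copy with the default dtype filled in."""
    normalized = {}
    for name, spec in modalities.items():
        if "shape" not in spec or "kind" not in spec:
            raise ValueError(f"modality '{name}' needs both 'shape' and 'kind'")
        shape = tuple(spec["shape"])
        if not shape or any(not isinstance(d, int) or d <= 0 for d in shape):
            raise ValueError(f"modality '{name}' has an invalid shape: {shape}")
        if spec["kind"] not in ALLOWED_KINDS:
            raise ValueError(f"modality '{name}' has an unknown kind: {spec['kind']!r}")
        dtype = spec.get("dtype", "float32")  # float32 is the documented default
        if dtype not in ALLOWED_DTYPES:
            raise ValueError(f"modality '{name}' has an unsupported dtype: {dtype!r}")
        normalized[name] = {"shape": shape, "kind": spec["kind"], "dtype": dtype}
    return normalized


checked = validate_modalities(
    {
        "image": {"shape": (3, 64, 64), "kind": "image"},
        "proprio": {"shape": (12,), "kind": "vector", "dtype": "float16"},
    }
)
print(checked["image"]["dtype"], checked["proprio"]["dtype"])
```

Running a check like this first turns a typo in kind or dtype into a readable error message.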
Action Specifications (v3 API)
For explicit action control, use action_spec:
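# Assumes create_world_model is exported from the worldflux package, like get_config.
from worldflux import create_world_model

model = create_world_model(
    "tdmpc2:5m",
    obs_shape=(39,),
    action_spec={"kind": "continuous", "dim": 6},
)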
Supported kind values: "continuous", "discrete", "token", "latent", "none".
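If action specs are built in several places, a small constructor keeps the kind strings from drifting. The sketch below only checks the kind against the values listed above and requires dim to be a non-negative integer. That dim rule and the helper name make_action_spec are assumptions of this example, not documented WorldFlux behavior.

```python
from typing import Any, Dict

ACTION_KINDS = {"continuous", "discrete", "token", "latent", "none"}


def make_action_spec(kind: str, dim: int) -> Dict[str, Any]:
    """Build an action_spec dict after checking the kind and the dimension."""
    if kind not in ACTION_KINDS:
        raise ValueError(f"unknown action kind {kind!r}; expected one of {sorted(ACTION_KINDS)}")
    # Reject bools explicitly, since bool is a subclass of int in Python.
    if isinstance(dim, bool) or not isinstance(dim, int) or dim < 0:
        raise ValueError(f"dim must be a non-negative integer, got {dim!r}")
    return {"kind": kind, "dim": dim}


spec = make_action_spec("continuous", 6)
print(spec)
```

The resulting dict has the same form as the action_spec argument shown above.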
Programmatic Inspection
from worldflux import get_config
config = get_config("dreamerv3:size12m")
print(f"obs_shape: {config.obs_shape}") # (3, 64, 64)
print(f"action_dim: {config.action_dim}") # 6
print(f"action_type: {config.action_type}") # continuous
print(f"modalities: {config.observation_modalities}")