Training Observability

A guide to monitoring training health and generating structured training reports.

Overview

WorldFlux automatically generates structured training reports with health signals, loss curve analysis, and actionable recommendations.

Training Reports

After training completes, WorldFlux saves a training_report.json file to the output directory:

from worldflux.training import Trainer

# model, config, and data are assumed to be defined earlier in your script
trainer = Trainer(model, config=config)
trainer.train(data)
# → outputs/training_report.json generated automatically
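
The report is plain JSON, so it can be inspected with the standard library. The following is a minimal sketch, not part of the WorldFlux API: the helper name load_training_report is hypothetical, and because the report schema is not described here, the example only lists top-level keys rather than assuming any field names.

```python
import json
from pathlib import Path


def load_training_report(output_dir):
    """Hypothetical helper: load training_report.json, or return None if absent."""
    report_path = Path(output_dir) / "training_report.json"
    if not report_path.exists():
        return None
    with report_path.open("r", encoding="utf-8") as f:
        return json.load(f)


# "outputs" matches the directory shown in the example above.
report = load_training_report("outputs")
if isinstance(report, dict):
    # The schema is not documented in this guide, so only list what is present.
    print(sorted(report.keys()))
```

Returning None for a missing file keeps the helper safe to call before the first training run has finished.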

Health Signals

| Signal | What It Monitors | Severity Levels |
| --- | --- | --- |
| Loss convergence | Slope of recent loss values | healthy / warning / critical |
| Gradient health | NaN/Inf gradient events | healthy / critical |
| Numerical stability | Non-finite values | healthy / warning / critical |
| Throughput stability | Steps/sec degradation | healthy / warning |
| Latent health | Latent collapse indicators | healthy / warning |
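
To illustrate what a slope-based loss convergence check could look like, here is a minimal sketch that fits a least-squares line to the most recent loss values and maps the slope to a severity level. The function names, the window size, and both thresholds are assumptions chosen for illustration; WorldFlux's actual computation and thresholds are not described in this guide.

```python
import math


def loss_slope(losses):
    """Least-squares slope of loss values against their step index."""
    n = len(losses)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(losses) / n
    cov = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(losses))
    var = sum((i - mean_x) ** 2 for i in range(n))
    return cov / var


def loss_convergence_severity(losses, window=100, warn_slope=0.0, critical_slope=0.01):
    """Classify recent loss behaviour. All thresholds are illustrative assumptions."""
    recent = losses[-window:]
    # Guard: a non-finite loss would make the slope meaningless.
    if any(not math.isfinite(v) for v in recent):
        return "critical"
    slope = loss_slope(recent)
    if slope > critical_slope:
        return "critical"  # loss is clearly rising
    if slope >= warn_slope:
        return "warning"   # loss is flat or drifting upward
    return "healthy"       # loss is still decreasing


print(loss_convergence_severity([1.0, 0.9, 0.8, 0.7]))
```

Using a fitted slope rather than the difference between the first and last values makes the check less sensitive to a single noisy step.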

Health Score

The overall health score (0.0–1.0) is a weighted average of all health signals. A score above 0.8 indicates healthy training.
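
The weighted average can be sketched as follows. The severity-to-score mapping, the signal keys, and the equal weights are all assumptions made for this example; the guide does not state the values WorldFlux uses.

```python
# Assumed mapping from severity level to a numeric score in [0.0, 1.0].
SEVERITY_SCORES = {"healthy": 1.0, "warning": 0.5, "critical": 0.0}


def health_score(signals, weights):
    """Weighted average of per-signal scores; the weights are hypothetical."""
    total_weight = sum(weights[name] for name in signals)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(
        weights[name] * SEVERITY_SCORES[severity]
        for name, severity in signals.items()
    )
    return weighted / total_weight


signals = {
    "loss_convergence": "healthy",
    "gradient_health": "healthy",
    "numerical_stability": "healthy",
    "throughput_stability": "warning",
    "latent_health": "healthy",
}
weights = {name: 1.0 for name in signals}  # equal weights, for illustration only

score = health_score(signals, weights)
print(score, "healthy" if score > 0.8 else "needs attention")
```

Under these assumed values, a single warning among five equally weighted signals still leaves the score above the 0.8 threshold, while a single critical signal brings it down to exactly 0.8, which no longer counts as above the threshold.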

WASR Integration

Training reports emit a run.summary event to WASR telemetry for centralized monitoring.