Observability
The observability stack provides real-time visibility into strategy latency, trade events, and container health.
Stack Components
| Service | Port | Image |
|---|---|---|
| Grafana Alloy | 12345 | grafana/alloy:v1.16.1 |
| Mimir | 9009 | grafana/mimir:3.0.6 |
| Loki | 3100 | grafana/loki:3.6.10 |
| Grafana | 3000 | grafana/grafana:13.0 |
| cAdvisor | 8080 | ghcr.io/google/cadvisor:0.56.2 |
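A quick way to confirm the services in the table are listening is a plain TCP probe of each port. The following is a minimal sketch, not part of the project: the `SERVICE_PORTS` mapping, `port_open`, and `stack_status` are hypothetical names, and it assumes each port is published on `localhost`.

```python
import socket

# Ports copied from the Stack Components table. That they are published
# on localhost is an assumption about the compose setup.
SERVICE_PORTS = {
    "alloy": 12345,
    "mimir": 9009,
    "loki": 3100,
    "grafana": 3000,
    "cadvisor": 8080,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def stack_status(host: str = "localhost") -> dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in SERVICE_PORTS.items()}
```

Calling `stack_status()` after the stack has started returns a dictionary such as `{"grafana": True, ...}`. An open port only shows that something is listening, not that the service is healthy.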
Starting the Stack
just obs-up # start (respects ENVIRONMENT in .env)
just obs-down # stop
just obs-logs # tail all logs
just obs-restart grafana # restart a single service
Grafana
Access at http://localhost:3000 (user: admin, password: admin).
Pre-provisioned resources (in infra/grafana/provisioning/):
- Datasource: Mimir (Prometheus-compatible remote read)
- Dashboard: NautilusTrader Latency (Mimir)
Alloy Pipeline
Configured in infra/alloy/config.alloy. The pipeline:
- Tails ./logs/trader.json for new log lines
- Parses the JSON payload from the message field
- Extracts metric_type, metric_name, metric_value, strategy_id
- Promotes extracted fields to Loki stream labels
- Emits Prometheus metrics:
  - {metric_type="gauge"} → loki_process_custom_nautilus_latency_us
  - {metric_type="counter"} → loki_process_custom_nautilus_metrics_this_run_total
- Remote-writes to Mimir at http://mimir:9009/api/v1/push
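The parse and extract steps above can be mimicked in a few lines of Python, which is handy for checking that a log line carries what the pipeline expects. This is a sketch under assumptions, not the Alloy configuration itself: it assumes each log line is a JSON object whose `message` field holds a JSON-encoded string, and `extract_metric` is a hypothetical helper name.

```python
import json
from typing import Optional

# The four fields the pipeline extracts from the message payload.
FIELDS = ("metric_type", "metric_name", "metric_value", "strategy_id")


def extract_metric(line: str) -> Optional[dict]:
    """Return the four metric fields from one log line, or None.

    Assumes the line is a JSON object and that its "message" value is
    itself a JSON-encoded object (an assumption about the log format).
    """
    try:
        record = json.loads(line)
        payload = json.loads(record["message"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(payload, dict) or not all(k in payload for k in FIELDS):
        return None
    return {k: payload[k] for k in FIELDS}
```

Lines that are not JSON, have no `message` field, or lack any of the four fields yield `None`. Whether the real pipeline drops such lines or forwards them unlabelled depends on the stages in config.alloy.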
Tracked Metrics
| Metric Name | Strategy | Description |
|---|---|---|
| strategy_latency_us | all | Signal-to-submit latency in µs |
| order_ack_rtt_us | all | Time to OrderAccepted in µs |
| fill_report_lag_us | all | Fill report delivery lag in µs |
| trade_pnl_usdt | all | Per-trade realized PnL in USDT |
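On the producing side, a strategy needs to log a message whose payload contains the four fields the pipeline extracts. The sketch below shows one way to build such a payload for strategy_latency_us. The helper names, the `"demo-strategy"` identifier, and the choice of `"gauge"` as the metric type are illustrative assumptions, not code from the project.

```python
import json
import time


def metric_message(metric_type: str, metric_name: str,
                   metric_value: float, strategy_id: str) -> str:
    """Serialize one metric as the JSON payload carried in the log message."""
    return json.dumps({
        "metric_type": metric_type,
        "metric_name": metric_name,
        "metric_value": metric_value,
        "strategy_id": strategy_id,
    })


def elapsed_us(start_ns: int, end_ns: int) -> float:
    """Convert a nanosecond interval to microseconds."""
    return (end_ns - start_ns) / 1_000


# Example: time a placeholder signal-to-submit path.
signal_ns = time.perf_counter_ns()
# ... build and submit the order here ...
submit_ns = time.perf_counter_ns()

# "gauge" and "demo-strategy" are assumed values for illustration.
msg = metric_message("gauge", "strategy_latency_us",
                     elapsed_us(signal_ns, submit_ns), "demo-strategy")
```

The resulting `msg` string would then be passed to whatever logger writes ./logs/trader.json, so that it ends up in the `message` field of the log line.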
Storage Backends
Controlled by ENVIRONMENT in .env:
| Environment | Mimir Storage | Loki Storage |
|---|---|---|
| dev | Local filesystem (/data/mimir) | Local filesystem (/loki) |
| prod | S3-compatible object store | S3-compatible object store |
PromQL Examples
# Average strategy latency over last 5 minutes
avg_over_time(loki_process_custom_nautilus_latency_us[5m])
# Windowed fill count (avoids counter-reset issues)
increase(loki_process_custom_nautilus_metrics_this_run_total[1h])
# Latency by strategy
avg by (strategy_id) (loki_process_custom_nautilus_latency_us)
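The same queries can be run outside Grafana through Mimir's Prometheus-compatible HTTP API. The sketch below uses only the standard library and rests on several assumptions: that Mimir is reachable on `localhost:9009`, that the API is served under Mimir's default `/prometheus` prefix, and that a tenant header is only needed when multi-tenancy is enabled. `instant_query_url` and `run_query` are hypothetical helper names.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed base URL: host port from the Stack Components table plus
# Mimir's default Prometheus API prefix.
MIMIR_BASE = "http://localhost:9009/prometheus"


def instant_query_url(promql: str, base: str = MIMIR_BASE) -> str:
    """Build the URL for a Prometheus-style instant query."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


def run_query(promql: str, tenant: Optional[str] = None,
              timeout: float = 5.0) -> list:
    """Run an instant query and return the list of result series.

    `tenant` is sent as X-Scope-OrgID only when given; whether it is
    required depends on how Mimir is configured.
    """
    headers = {"X-Scope-OrgID": tenant} if tenant else {}
    req = Request(instant_query_url(promql), headers=headers)
    with urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["data"]["result"]
```

For example, `run_query("avg by (strategy_id) (loki_process_custom_nautilus_latency_us)")` would return one entry per strategy once the stack is running and metrics have been written.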
Warning
Counters reset when Alloy restarts. Always use increase() rather than raw counter values.
For authoritative trade totals, query PostgreSQL directly.
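To see why raw counter values mislead after a restart, consider a simplified model of reset-aware counting. This sketch is not Prometheus's actual `increase()` implementation, which also extrapolates to the window boundaries. It illustrates only the reset handling, and `reset_aware_increase` is a hypothetical name.

```python
def reset_aware_increase(samples: list[float]) -> float:
    """Sum the growth of a counter across samples, tolerating resets.

    A drop between consecutive samples is treated as a restart from
    zero, so the post-reset value counts in full.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += (cur - prev) if cur >= prev else cur
    return total


# Counter climbs to 8, Alloy restarts, counter resumes from 1.
samples = [3, 5, 8, 1, 4]
naive = samples[-1] - samples[0]        # 1: the reset hides most events
actual = reset_aware_increase(samples)  # 9.0: 2 + 3 + 1 + 3
```

Subtracting the first raw value from the last reports 1 event, while the reset-aware sum recovers 9. Events that occurred between the last scrape before the restart and the restart itself are still lost, which is one reason to treat PostgreSQL as the authoritative source.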