Observability

The observability stack provides real-time visibility into strategy latency, trade events, and container health.

Stack Components

Service        Port   Image
Grafana Alloy  12345  grafana/alloy:v1.16.1
Mimir          9009   grafana/mimir:3.0.6
Loki           3100   grafana/loki:3.6.10
Grafana        3000   grafana/grafana:13.0
cAdvisor       8080   ghcr.io/google/cadvisor:0.56.2

Starting the Stack

just obs-up     # start (respects ENVIRONMENT in .env)
just obs-down   # stop
just obs-logs   # tail all logs
just obs-restart grafana  # restart a single service

Grafana

Open http://localhost:3000 and log in with the default credentials (user: admin, password: admin).

Pre-provisioned resources (in infra/grafana/provisioning/):

  • Datasource: Mimir (Prometheus-compatible remote read)
  • Dashboard: NautilusTrader Latency (Mimir)

Alloy Pipeline

Configured in infra/alloy/config.alloy. The pipeline:

  1. Tails ./logs/trader.json for new log lines
  2. Parses the JSON payload from the message field
  3. Extracts metric_type, metric_name, metric_value, strategy_id
  4. Promotes extracted fields to Loki stream labels
  5. Emits Prometheus metrics:
       • loki_process_custom_nautilus_latency_us{metric_type="gauge"}
       • loki_process_custom_nautilus_metrics_this_run_total{metric_type="counter"}
  6. Remote-writes to Mimir at http://mimir:9009/api/v1/push
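
The parsing and extraction steps above can be sketched in Python to show what the pipeline expects from each log line. This is only an illustration of the logic, not the Alloy configuration itself: the outer log record shape, the sample values, and the helper names `extract_metric` and `target_series` are assumptions, and only the four field names and the two series names come from this page.

```python
import json

# Hypothetical log line: the outer record shape and all values are assumptions.
SAMPLE_LINE = json.dumps({
    "timestamp": "2024-01-01T00:00:00Z",
    "level": "INFO",
    "message": json.dumps({
        "metric_type": "gauge",
        "metric_name": "strategy_latency_us",
        "metric_value": 412.0,
        "strategy_id": "demo-001",
    }),
})

FIELDS = ("metric_type", "metric_name", "metric_value", "strategy_id")


def extract_metric(line):
    """Return the four extracted fields, or None if the line carries no metric."""
    try:
        outer = json.loads(line)                  # parse the log record
        payload = json.loads(outer["message"])    # parse the JSON inside message
    except (json.JSONDecodeError, KeyError, TypeError):
        return None
    if not isinstance(payload, dict) or not all(k in payload for k in FIELDS):
        return None
    return {k: payload[k] for k in FIELDS}


def target_series(metric):
    """Map metric_type to the Prometheus series it feeds (None if unknown)."""
    return {
        "gauge": "loki_process_custom_nautilus_latency_us",
        "counter": "loki_process_custom_nautilus_metrics_this_run_total",
    }.get(metric["metric_type"])


metric = extract_metric(SAMPLE_LINE)
print(metric, "->", target_series(metric))
```

Lines whose message is not a JSON object with all four fields are skipped in this sketch; how the real pipeline treats such lines depends on config.alloy.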

Tracked Metrics

Metric Name          Strategy  Description
strategy_latency_us  all       Signal-to-submit latency in µs
order_ack_rtt_us     all       Time to OrderAccepted in µs
fill_report_lag_us   all       Fill report delivery lag in µs
trade_pnl_usdt       all       Per-trade realized PnL in USDT
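
As a rough sketch of how one of these values might be produced on the strategy side, the snippet below times an interval with a monotonic clock, converts it to microseconds, and serializes it with the field names the Alloy pipeline extracts. The helper names `metric_message` and `elapsed_us`, the strategy ID, and the choice of `gauge` as the default type are hypothetical; the actual emitting code is not shown on this page.

```python
import json
import time


def metric_message(metric_name, value, strategy_id, metric_type="gauge"):
    """Serialize one metric as the JSON string placed in a log record's message."""
    return json.dumps({
        "metric_type": metric_type,
        "metric_name": metric_name,
        "metric_value": value,
        "strategy_id": strategy_id,
    })


def elapsed_us(start_ns, end_ns):
    """Convert a nanosecond interval to microseconds."""
    return (end_ns - start_ns) / 1_000


start = time.perf_counter_ns()
# ... signal handling and order submission would happen here ...
end = time.perf_counter_ns()

# "demo-001" is a placeholder strategy ID.
print(metric_message("strategy_latency_us", elapsed_us(start, end), "demo-001"))
```

Using `time.perf_counter_ns()` rather than wall-clock time keeps the measured interval unaffected by system clock adjustments.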

Storage Backends

Controlled by ENVIRONMENT in .env:

Environment  Mimir Storage                   Loki Storage
dev          Local filesystem (/data/mimir)  Local filesystem (/loki)
prod         S3-compatible object store      S3-compatible object store

PromQL Examples

# Average strategy latency over last 5 minutes
avg_over_time(loki_process_custom_nautilus_latency_us[5m])

# Windowed fill count (avoids counter-reset issues)
increase(loki_process_custom_nautilus_metrics_this_run_total[1h])

# Latency by strategy
avg by (strategy_id) (loki_process_custom_nautilus_latency_us)
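
The queries above can also be issued outside Grafana. The following sketch only builds the request URL for an instant query; it assumes Mimir is reachable from the host on localhost:9009 and serves its Prometheus-compatible API under the default /prometheus prefix, neither of which is confirmed by this page. The helper name `instant_query_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: host-mapped port 9009 and Mimir's default /prometheus API prefix.
MIMIR_BASE = "http://localhost:9009/prometheus"


def instant_query_url(promql, base=MIMIR_BASE):
    """Build the URL for a Prometheus-style instant query, URL-encoding the PromQL."""
    return f"{base}/api/v1/query?{urlencode({'query': promql})}"


url = instant_query_url(
    "avg by (strategy_id) (loki_process_custom_nautilus_latency_us)"
)
print(url)
```

Depending on how Mimir is configured, the request may also need an `X-Scope-OrgID` tenant header.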

Warning

Counters reset when Alloy restarts. Always use increase() rather than raw counter values. For authoritative trade totals, query PostgreSQL directly.
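
To illustrate why raw counter values mislead after a restart, here is a simplified reset-aware calculation over made-up samples. It mirrors the idea behind `increase()` but is not PromQL's exact algorithm, which also extrapolates to the window boundaries; the function name and sample values are hypothetical.

```python
def reset_aware_increase(samples):
    """Sum a counter's growth across a window, treating any drop as a reset.

    Simplified: unlike PromQL's increase(), no extrapolation to window edges.
    """
    total = 0.0
    for prev, curr in zip(samples, samples[1:]):
        # After a reset the counter restarts from zero, so the new value is all growth.
        total += curr - prev if curr >= prev else curr
    return total


# Hypothetical scrape values: the counter reaches 7, Alloy restarts, then it climbs to 4.
samples = [3, 5, 7, 1, 4]

print(samples[-1] - samples[0])       # naive last-minus-first: 1
print(reset_aware_increase(samples))  # reset-aware growth: 8.0
```

The naive difference reports 1 event, while the reset-aware sum recovers the 8 events that actually occurred in the window.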