Monitoring

If you can't see what your deployment is doing, you're flying blind. And flying blind with a production system is... not great. This chapter covers the observability features built into Adama: health check endpoints, logging configuration, debugging techniques, and how to think about performance monitoring.

Health Check Endpoints

Adama exposes HTTP endpoints for health monitoring. These are meant for load balancers, orchestrators, and whatever monitoring system you're running.

Basic Health Check

The basic health check confirms the server is up and accepting connections:

GET /~health_check_lb

Response when healthy:

200 OK

It's lightweight -- suitable for frequent polling by load balancers.

Deep Health Check

The deep health check does internal verification:

GET /~deep_health_check_status_page

This endpoint verifies:

  • Core services are running
  • Document runtime is operational
  • Internal components are healthy

Use this for more thorough monitoring, but at lower frequency. It does actual work internally, so don't hammer it.
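The "lower frequency, don't hammer it" advice usually translates into a probe policy: poll less often and only flip an instance to unhealthy after several consecutive failures. A minimal sketch of that policy follows; the three-consecutive-failures threshold and the status-code supplier wiring are illustrative choices, not part of Adama.

```java
import java.util.function.IntSupplier;

// Sketch of a threshold-based health probe: an instance is only marked
// unhealthy after N consecutive non-200 responses. The supplier stands in
// for an actual HTTP GET against /~deep_health_check_status_page.
public class HealthPoller {
  private final IntSupplier statusCheck;   // returns the probe's HTTP status
  private final int failureThreshold;      // consecutive failures before unhealthy
  private int consecutiveFailures = 0;

  public HealthPoller(IntSupplier statusCheck, int failureThreshold) {
    this.statusCheck = statusCheck;
    this.failureThreshold = failureThreshold;
  }

  /** Runs one probe; returns true while the instance is still considered healthy. */
  public boolean probe() {
    if (statusCheck.getAsInt() == 200) {
      consecutiveFailures = 0;
    } else {
      consecutiveFailures++;
    }
    return consecutiveFailures < failureThreshold;
  }

  public static void main(String[] args) {
    // Simulated probe: two failures then recovery. With threshold 3, a
    // transient blip never marks the instance unhealthy.
    int[] statuses = {200, 503, 503, 200};
    int[] i = {0};
    HealthPoller poller = new HealthPoller(() -> statuses[i[0]++ % statuses.length], 3);
    for (int n = 0; n < 4; n++) System.out.println(poller.probe());
  }
}
```

The same debouncing shows up in the load balancer configurations below (HAProxy's check intervals, Kubernetes probe thresholds); the point is that one failed deep check shouldn't take an instance out of rotation.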

Custom Health Check Paths

You can configure custom paths in your web configuration:

{
  "http-health-check-path": "/health",
  "http-deep-health-check-path": "/health/deep"
}

Handy when integrating with existing monitoring infrastructure that expects specific paths.

Load Balancer Configuration

When using a load balancer, configure it to poll the health endpoint:

# Example HAProxy configuration
backend adama_servers
    option httpchk GET /~health_check_lb
    http-check expect status 200
    server adama1 10.0.0.1:8080 check inter 5s
    server adama2 10.0.0.2:8080 check inter 5s

For Kubernetes:

livenessProbe:
  httpGet:
    path: /~health_check_lb
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 5

readinessProbe:
  httpGet:
    path: /~deep_health_check_status_page
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10

Built-in Metrics

Adama's runtime collects metrics about document operations, connections, and resource usage. These feed into the platform's monitoring infrastructure.

Core Metrics

The runtime tracks:

Metric Category       | What It Measures
----------------------|-----------------------------------------------------
Document Operations   | Creates, loads, saves, deletes
Connection Metrics    | Active connections, connection rate, disconnections
Message Processing    | Messages received, processed, errors
State Machine Events  | State transitions, blocked states
Memory Usage          | Documents loaded, memory per document

Metering

Adama tracks resource consumption for billing and capacity planning:

  • CPU time: Processing time per document
  • Storage: Data size per document
  • Bandwidth: Network transfer per connection
  • Operations: API calls per document

Accessing Metrics

Metrics are available through the platform API. For self-hosted deployments, integrate with your monitoring stack using the metrics factory pattern:

// Custom metrics integration -- the class names follow Adama's metrics
// factory pattern; check your version's source for exact constructor arguments
MetricsFactory factory = new PrometheusMetricsFactory();
CoreMetrics coreMetrics = new CoreMetrics(factory);

Logging

Good logging is the difference between "I know what went wrong" and "I have no idea what went wrong." Adama supports configurable logging at multiple levels.

Log Levels

Level | Use Case
------|------------------------------------------
ERROR | Unrecoverable errors, system failures
WARN  | Recoverable issues, degraded performance
INFO  | Normal operations, significant events
DEBUG | Detailed operational information
TRACE | Very detailed debugging information

Logging Configuration

Configure logging through your Java logging framework. For Logback:

<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/adama.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
      <fileNamePattern>logs/adama.%d{yyyy-MM-dd}.log</fileNamePattern>
      <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
      <pattern>%d{ISO8601} [%thread] %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT" />
    <appender-ref ref="FILE" />
  </root>

  <!-- Verbose logging for Adama internals during debugging -->
  <logger name="ape" level="DEBUG" />
</configuration>

Structured Logging

For production, use structured logging so your log aggregation system can actually parse things:

<encoder class="net.logstash.logback.encoder.LogstashEncoder">
  <customFields>{"service":"adama","environment":"production"}</customFields>
</encoder>

This outputs JSON-formatted logs suitable for Elasticsearch, Splunk, CloudWatch, or whatever you're using.

Log Aggregation

Centralize your logs. Pick your approach:

  1. File-based: Use Filebeat or Fluentd to ship logs
  2. Network-based: Configure a network appender to send directly
  3. Container-based: Log to stdout and let the orchestrator handle aggregation

What to Log

Focus on actionable information. Logging everything is just as bad as logging nothing -- you end up drowning in noise.

Event Type             | Log Level | Information to Include
-----------------------|-----------|----------------------------------------
Document created       | INFO      | Space, key, creator
Connection established | INFO      | Client identifier, document
Connection failed      | WARN      | Reason, client identifier
Message processed      | DEBUG     | Channel, message type
State transition       | DEBUG     | From state, to state, trigger
Error in document      | ERROR     | Space, key, error details, stack trace

Debugging Techniques

When things go sideways, here's how to figure out why.

Document State Inspection

Examine the current state of a document:

java -jar adama.jar document read --space myapp --key doc-123

This shows the document's current data. Oftentimes the bug is obvious once you can actually see the state.

Connection Debugging

Monitor active connections and their state:

  1. Enable DEBUG logging for connection handlers
  2. Watch for connection lifecycle events
  3. Check for connection leaks (disconnects without cleanup)
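Step 3 is easiest when your connection handlers keep simple lifecycle counts. A sketch, assuming you can hook connection-established and connection-closed events; the method names and the leak heuristic are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of connection-lifecycle accounting for leak detection. Wire the
// two hooks into your actual connection handlers; names are illustrative.
public class ConnectionTracker {
  private final AtomicLong established = new AtomicLong();
  private final AtomicLong closed = new AtomicLong();

  public void onEstablished() { established.incrementAndGet(); }
  public void onClosed() { closed.incrementAndGet(); }

  /** Connections opened but never cleaned up; should hover near the live count. */
  public long outstanding() { return established.get() - closed.get(); }

  /** A growing gap under stable traffic suggests disconnects without cleanup. */
  public boolean looksLeaky(long expectedLive, long slack) {
    return outstanding() > expectedLive + slack;
  }

  public static void main(String[] args) {
    ConnectionTracker tracker = new ConnectionTracker();
    for (int i = 0; i < 10; i++) tracker.onEstablished();
    for (int i = 0; i < 7; i++) tracker.onClosed();
    System.out.println(tracker.outstanding());
  }
}
```

Graph `outstanding()` over time: under steady traffic it should be flat. A line that only goes up is your leak.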

Message Tracing

Trace message flow through the system:

channel myChannel(MyMessage msg) {
  // Log incoming messages for debugging
  // Remove or gate behind a flag in production
  @debug("Received message: " + msg.type);
  // ... handle message
}

The @debug directive outputs information during development but can be disabled in production.

State Machine Debugging

Track state machine transitions:

#waiting {
  @debug("Entered waiting state");
  // state logic
}

#processing {
  @debug("Entered processing state");
  // state logic
}

Testing in Development

Use the built-in testing framework to validate behavior before you ship:

public int count;

message MyMessage {
  int value;
}

channel myChannel(MyMessage msg) {
  count++;
}

test scenario {
  @send myChannel(@no_one, { value: 42 });
  assert count == 1;
}

Run tests before deployment. This sounds obvious, but I'm saying it anyway.

Performance Monitoring

Monitor performance to find bottlenecks and plan capacity.

Key Performance Indicators

Track these to understand system health:

Metric              | Healthy Range | Warning Signs
--------------------|---------------|----------------------
Message latency     | < 10ms p99    | Increasing latency
Connection rate     | Stable        | Sudden spikes
Document load time  | < 100ms       | Increasing over time
Memory per document | Stable        | Unbounded growth
Error rate          | < 0.1%        | Any increase

Latency Analysis

Measure end-to-end latency at multiple points:

  1. Client to server: Network latency
  2. Message processing: Document execution time
  3. Response delivery: Delta computation and transmission

High latency at each point means something different:

  • Client to server: Network or load balancer issues
  • Message processing: Complex document logic or resource contention
  • Response delivery: Large delta payloads or network congestion
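Whichever point you measure, summarize with percentiles rather than averages (the KPI table above targets p99, and averages hide tail latency). A minimal nearest-rank percentile over collected samples, with made-up sample values:

```java
import java.util.Arrays;

// Nearest-rank percentile over a batch of latency samples. Good enough for
// offline analysis; streaming systems typically use sketches instead.
public class Percentile {
  public static long p(long[] samplesMs, double pct) {
    long[] sorted = samplesMs.clone();
    Arrays.sort(sorted);
    int rank = (int) Math.ceil(pct / 100.0 * sorted.length); // nearest-rank method
    return sorted[Math.max(0, rank - 1)];
  }

  public static void main(String[] args) {
    long[] samples = new long[100];
    for (int i = 0; i < 100; i++) samples[i] = i + 1; // 1..100 ms
    System.out.println(p(samples, 99.0)); // prints 99
  }
}
```

Compute this separately for each measurement point (client-to-server, processing, delivery) so you can attribute a p99 regression to the right layer.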

Memory Analysis

Monitor memory usage patterns:

# JVM memory statistics
jstat -gc <pid> 1000

Watch for:

  • Steadily increasing heap usage (memory leak -- bad)
  • Frequent full garbage collections (insufficient heap)
  • Large survivor spaces (objects living too long)
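The first warning sign above is a trend, not a threshold, so it's worth automating. A sketch that flags a strictly rising window of used-heap samples (e.g. the used-heap column from periodic jstat output); the window size and strictly-increasing heuristic are illustrative choices.

```java
// Flags a "steadily increasing heap" pattern across periodic samples.
// Real monitoring would smooth over GC sawtooth; this is the bare idea.
public class HeapTrend {
  /** True if every sample in the window grew over its predecessor. */
  public static boolean steadilyIncreasing(long[] usedBytes) {
    if (usedBytes.length < 2) return false;
    for (int i = 1; i < usedBytes.length; i++) {
      if (usedBytes[i] <= usedBytes[i - 1]) return false;
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(steadilyIncreasing(new long[]{100, 150, 210, 300}));
  }
}
```

In practice, sample used heap right after full GCs: if even post-GC usage only rises, that's the leak signature.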

Profiling

For detailed performance analysis:

# CPU profiling
java -agentpath:/path/to/async-profiler/libasyncProfiler.so=start,file=profile.html -jar solo.jar ...

# Heap analysis
jmap -dump:format=b,file=heap.hprof <pid>

Analyze profiles to find:

  • Hot code paths consuming CPU
  • Memory allocation patterns
  • Lock contention

Capacity Planning

Use historical metrics to plan ahead:

  1. Track document count growth over time
  2. Monitor peak concurrent connections
  3. Measure message throughput during peak hours
  4. Project future needs based on user growth
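Step 4 is usually a compound-growth projection from the observed rate. The arithmetic is trivial but worth writing down; the numbers here are made up.

```java
// Compound-growth projection: current count, observed monthly growth rate,
// months ahead. Illustrative numbers, not a forecast model.
public class CapacityProjection {
  public static long project(long current, double monthlyGrowthRate, int months) {
    return Math.round(current * Math.pow(1.0 + monthlyGrowthRate, months));
  }

  public static void main(String[] args) {
    // 100k documents growing 10%/month, projected 6 months out -> 177,156
    System.out.println(project(100_000, 0.10, 6));
  }
}
```

Run the same projection for peak connections and message throughput, and provision against the largest of the three rather than the average.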

Alerting

Set up alerts for conditions that need immediate attention. The goal is to wake up for real problems, not false alarms.

Critical Alerts

Condition            | Action
---------------------|------------------------------------------
Health check failing | Investigate immediately, possible outage
Error rate > 1%      | Review logs for error patterns
Memory > 90%         | Scale up or investigate leak
Latency > 1s p99     | Investigate bottleneck

Warning Alerts

Condition             | Action
----------------------|--------------------------
Memory > 70%          | Plan scaling
Connection rate spike | Investigate source
Disk usage > 80%      | Clean up or expand
Certificate expiring  | Renew before expiration

Alert Configuration Example

Using Prometheus Alertmanager:

groups:
- name: adama
  rules:
  - alert: AdamaHealthCheckFailing
    expr: probe_success{job="adama_health"} == 0
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "Adama health check failing"

  - alert: AdamaHighErrorRate
    expr: rate(adama_errors_total[5m]) > 0.01
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "Elevated error rate in Adama"

Operational Runbooks

Have procedures ready for common scenarios. When things break at 2am, you don't want to be figuring this out from scratch.

High Memory Usage

  1. Check active document count
  2. Identify documents with large state
  3. Review for memory leaks in document code
  4. Consider scaling horizontally
  5. If critical, restart with larger heap

Elevated Error Rate

  1. Check recent deployments (it's almost always a recent deployment)
  2. Review error logs for patterns
  3. Identify affected documents or channels
  4. Rollback if deployment-related
  5. Fix and redeploy if code issue

Connection Storms

  1. Identify source of connections
  2. Check for client retry loops (this is the usual suspect)
  3. Enable rate limiting if available
  4. Scale capacity if legitimate traffic
  5. Block malicious sources
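Step 3's rate limiting is typically a token bucket per client or source IP. A minimal sketch; the capacity and refill rate are illustrative knobs, and rejecting when the bucket is empty is what stops a retry loop from amplifying itself.

```java
// Token-bucket rate limiter sketch for absorbing connection storms.
// One bucket per client/source; capacity bounds bursts, refill bounds
// sustained rate. Time is passed in explicitly to keep it testable.
public class TokenBucket {
  private final long capacity;
  private final double refillPerMs;
  private double tokens;
  private long lastMs;

  public TokenBucket(long capacity, double refillPerMs, long nowMs) {
    this.capacity = capacity;
    this.refillPerMs = refillPerMs;
    this.tokens = capacity;
    this.lastMs = nowMs;
  }

  /** Returns true if the connection attempt is allowed, false if it should be rejected. */
  public synchronized boolean tryAcquire(long nowMs) {
    tokens = Math.min(capacity, tokens + (nowMs - lastMs) * refillPerMs);
    lastMs = nowMs;
    if (tokens >= 1.0) {
      tokens -= 1.0;
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Burst of 2 allowed, then refill at 1 token/second
    TokenBucket bucket = new TokenBucket(2, 0.001, 0);
    System.out.println(bucket.tryAcquire(0));
    System.out.println(bucket.tryAcquire(0));
    System.out.println(bucket.tryAcquire(0));
  }
}
```

Pair this with step 5: legitimate clients back off and succeed on refill; sources that never slow down are the ones to block.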

Invest in monitoring before you need it. When something breaks, good observability is the difference between a five-minute fix and a five-hour outage.
