Graph Indexing
Adama has first-class graph indexing through three keywords: assoc, join, and traverse. These build precomputed, reactive indexes that map relationships between records across tables -- eliminating the need to scan entire tables when following relationships.
This is one of my favorite features in the language, because it solves a problem that's genuinely painful without it.
Why Graph Indexing
Consider a tagging system where people have tags via a junction table. Without graph indexing, finding all tags for a set of people requires iterating the junction table for each person, then looking up each tag -- O(people * edges) work per query. With graph indexing, the runtime maintains a precomputed edge map that updates incrementally as data changes. Traversal becomes a direct lookup.
The key use cases:
- Avoiding full table scans when following foreign key relationships
- Cross-table indexing including relationships through nested tables inside records
- Reactive formulas that efficiently recompute when edges change
- Many-to-many relationships that resolve in constant time per edge
Core Concepts
Graph indexing has three parts:
assoc-- Declares a named association between two record types (the "from" and "to" types)join-- Connects an edge table to an association, defining how edge records map to from/to IDstraverse-- A query operator that follows an association from a set of source records to destination records
That's it. Three concepts, and they compose nicely.
Basic Example: People and Tags
record Person {
public int id;
public string name;
}
record Tag {
public int id;
public string tag_name;
}
record PersonTag {
int person_id;
int tag_id;
}
table<Person> _people;
table<Tag> _tags;
table<PersonTag> _edges;
// 1. Declare the association: Person -> Tag
assoc<Person, Tag> tagged;
// 2. Join: the _edges table provides edges, mapping person_id -> tag_id
join tagged via _edges[e] from e.person_id to e.tag_id;
// 3. Traverse: get all tags reachable from all people
public formula all_tags = iterate _people traverse tagged;
When you write iterate _people traverse tagged, the runtime:
- Collects the IDs of all Person records from the iterate
- Looks up precomputed edges in the
taggedgraph - Returns the matching Tag records directly -- no scanning
The graph updates incrementally. Adding or removing a PersonTag edge record automatically updates the index. The all_tags formula reactively recomputes only the affected edges.
Filtered Traversal
You can filter the source set before traversing:
record Person {
public int id;
public string name;
public int group;
}
record Tag {
public int id;
public string val;
}
record Edge {
int from_id;
int to_id;
}
table<Person> _people;
table<Tag> _tags;
table<Edge> _edges;
assoc<Person, Tag> link;
join link via _edges[e] from e.from_id to e.to_id;
// Only tags reachable from group 1
public formula group1_tags = iterate _people where group == 1 traverse link;
// Only tags reachable from group 2
public formula group2_tags = iterate _people where group == 2 traverse link;
// All tags reachable from any person
public formula all_tags = iterate _people traverse link;
Each formula independently follows the graph from its filtered source set. The underlying graph index is shared -- you pay for it once.
Dynamic Edges
Edges can be added and removed at runtime. The graph index updates automatically:
record Person {
public int id;
public string name;
}
record Tag {
public int id;
public string tag_name;
}
record PersonTag {
int person_id;
int tag_id;
}
table<Person> _people;
table<Tag> _tags;
table<PersonTag> _edges;
assoc<Person, Tag> tagged;
join tagged via _edges[e] from e.person_id to e.tag_id;
public formula all_tags = iterate _people traverse tagged;
message AddEdge {
int person;
int tag;
}
message RemoveEdge {
int edge_id;
}
channel add_edge(AddEdge msg) {
_edges <- {person_id: msg.person, tag_id: msg.tag};
// all_tags automatically reflects the new edge
}
channel remove_edge(RemoveEdge msg) {
(iterate _edges where id == msg.edge_id).delete();
// all_tags automatically drops the removed edge
}
No manual cache invalidation needed. The differential edge tracker detects which edge records changed and incrementally updates the graph. This is the reactive model doing its thing.
Nested Table Joins (Cross-Table Indexing)
This is where it gets really interesting. You can join through nested tables inside records. This solves the problem of efficiently finding which top-level records contain specific nested data -- without scanning every top-level record and its nested table.
record User {
public int id;
public string name;
public principal account;
}
table<User> _users;
record Member {
int user_id;
}
// Association: User -> Group
assoc<User, Group> _users_to_groups;
record Group {
public int id;
public string name;
// Each group has a nested table of members
table<Member> _members;
// Join through the NESTED table
join _users_to_groups via _members[x] from x.user_id to id;
}
table<Group> _groups;
// For the current viewer, find their user record, then traverse to their groups
bubble my_groups = iterate _users where account == @who traverse _users_to_groups;
Each Group record contains a nested _members table. The join inside the Group record connects each member's user_id to the Group's own id. This builds a graph index across all groups' membership tables simultaneously.
Without graph indexing, finding a user's groups would require iterating every group and scanning its members table -- O(groups * members). With the graph index, it's a direct lookup. For applications with hundreds of groups, each with hundreds of members, the difference is night and day.
Conditional Edges with @maybe
Sometimes an edge should only be active under certain conditions. Use @maybe to make from/to expressions optional:
record Source {}
record Target {}
assoc<Source, Target> link;
record Edge {
bool enabled;
int from;
int to;
}
table<Edge> edges;
// Edge only counts when enabled is true
join link via edges[x]
from (x.enabled ? @maybe(x.from) : @maybe<int>)
to (x.enabled ? @maybe(x.to) : @maybe<int>);
When enabled is false, the expression returns an empty maybe and the edge is excluded from the graph. When enabled becomes true, the edge is added. This lets you implement soft-delete, feature flags, or approval workflows on edges without actually removing the edge records.
You can also make just one side conditional:
// Only the "to" side is conditional
join link via edges[x] from x.from to (x.enabled ? @maybe(x.to) : @maybe<int>);
// Only the "from" side is conditional
join link via edges[x] from (x.enabled ? @maybe(x.from) : @maybe<int>) to x.to;
Multiple Joins Per Association
An association can have multiple join statements. Each join adds edges from a different source table:
record User {}
record Resource {}
assoc<User, Resource> access;
record DirectAccess {
int user_id;
int resource_id;
}
record GroupAccess {
int user_id;
int resource_id;
}
table<DirectAccess> _direct;
table<GroupAccess> _group;
// Both tables contribute edges to the same association
join access via _direct[d] from d.user_id to d.resource_id;
join access via _group[g] from g.user_id to g.resource_id;
The runtime uses reference counting internally, so duplicate edges (same from/to pair from multiple sources) are handled correctly -- the edge exists as long as at least one join contributes it.
Association with Edge Type
You can declare an association with an optional edge type for documentation purposes:
record User {}
record Group {}
record Membership {}
// The third type parameter documents the edge record type
assoc<User, Group, Membership> member_of;
How It Works Internally
Understanding the internals helps you reason about performance:
assoccreates anRxAssocGraph-- a reactive data structure mapping from-IDs to sets of to-IDs with reference countingjoincreates aDifferentialEdgeTrackerthat monitors the edge table for changes. When an edge record is inserted, updated, or deleted, only that record is reprocessedtraversecallsRxAssocGraph.map()which collects destination IDs from the precomputed graph and resolves them to actual records from the registered target table- Graph recomputation happens during the document's commit cycle via
__computeGraphs(), after all mutations in the transaction have been applied
Performance characteristics:
- Edge insertion/deletion: O(1) per edge (hash map operations)
- Traversal: O(source records + result records) -- proportional to input/output, not total edges
- Memory: O(total edges) for the graph index
- Updates are differential -- only changed edge records are reprocessed each commit cycle
The good news is that the constant factors are small. The bad news is that the memory cost is proportional to total edges, so if you have millions of edges, that's real memory. Worth it for the query speed, but be aware.
Practical Patterns
User-Group Membership
record User {
public int id;
public string name;
public principal who;
}
record Group {
public int id;
public string name;
}
record Membership {
int user_id;
int group_id;
}
table<User> _users;
table<Group> _groups;
table<Membership> _memberships;
assoc<User, Group> in_group;
join in_group via _memberships[m] from m.user_id to m.group_id;
// Current user's groups
bubble my_groups = iterate _users where who == @who traverse in_group;
Content Tagging
record Article {
public int id;
public string title;
public string body;
public int category;
}
record Tag {
public int id;
public string label;
}
record ArticleTag {
int article_id;
int tag_id;
}
table<Article> _articles;
table<Tag> _tags;
table<ArticleTag> _tagging;
assoc<Article, Tag> has_tag;
join has_tag via _tagging[at] from at.article_id to at.tag_id;
// All tags used by articles in category 1
public formula category1_tags = iterate _articles where category == 1 traverse has_tag;
Bidirectional Relationships
For bidirectional traversal, define two associations -- one for each direction:
record Person {
public int id;
public string name;
}
record Knows {
int person_a;
int person_b;
}
table<Person> _people;
table<Knows> _relationships;
// Forward: person_a -> person_b
assoc<Person, Person> knows;
join knows via _relationships[r] from r.person_a to r.person_b;
// Reverse: person_b -> person_a
assoc<Person, Person> known_by;
join known_by via _relationships[r] from r.person_b to r.person_a;
Manual Relationship Patterns
For simpler cases where graph indexing is overkill -- small tables, infrequent queries -- you can use manual foreign key lookups. These scan tables on every evaluation, but for small data sets that's perfectly fine.
Foreign Key Lookup
record Author {
public int id;
public string name;
index id;
}
record Book {
public int id;
public string title;
public int authorId;
index authorId;
}
table<Author> authors;
table<Book> books;
// Find books by author — uses field index, scans matching records
public formula booksByAuthor1 = iterate books where authorId == 1;
Field-level index declarations make equality lookups fast for individual queries, but each query still evaluates independently. Graph indexing precomputes the full relationship map.
When to Use Manual vs Graph Indexing
| Scenario | Recommendation |
|---|---|
| Small tables (< 100 records) | Manual lookups with index are fine |
| Large junction tables | Graph indexing avoids repeated scans |
| Nested table relationships | Graph indexing is the only efficient option |
| Reactive formulas over relationships | Graph indexing for incremental updates |
| One-off procedural lookups | Manual lookups are simpler |
| Multiple queries over same relationship | Graph indexing amortizes the cost |
At core, graph indexing is about paying an upfront cost (memory for the index, incremental updates on writes) to make reads fast. If your reads are frequent and your relationships are large, it's a clear win. If your tables are small and your queries are rare, a plain iterate ... where ... is simpler and good enough.