// blog·2026-04-15

Schema inference for code graphs: the missing piece of AI code understanding

Grafema can tell you that ResourcesService.jsx calls POST /api/v2/resources and that the handler for that call is ResourcesController.create in resourcesController.js. That’s useful. It’s also incomplete.

What does ResourcesController.create return? What shape does the request body need to be? If the frontend passes { name, kind, options } and the backend expects { name, type, config }, where’s the mismatch?
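To make that last question concrete, here is a minimal sketch of the comparison an agent would like to run. The helper `diffKeys` and both key lists are illustrative assumptions; nothing like this exists in Grafema today.

```typescript
// Hypothetical helper: compares the keys a caller sends with the keys a handler expects.
type KeyDiff = { missing: string[]; unexpected: string[] };

function diffKeys(sent: string[], expected: string[]): KeyDiff {
  const sentSet = new Set(sent);
  const expectedSet = new Set(expected);
  return {
    // Keys the handler expects but the caller never sends.
    missing: expected.filter((key) => !sentSet.has(key)),
    // Keys the caller sends but the handler does not know about.
    unexpected: sent.filter((key) => !expectedSet.has(key)),
  };
}

const frontendPayloadKeys = ["name", "kind", "options"];
const backendExpectedKeys = ["name", "type", "config"];

const mismatch = diffKeys(frontendPayloadKeys, backendExpectedKeys);
console.log(mismatch);
```

The comparison itself is trivial. The hard part is getting the two key lists out of the code, which is what the rest of this post is about.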

A call graph without data schemas is like knowing who sends the letter but not what’s in it.

This post is about the next layer: schema inference. Where we are, what we’ve tried, and what’s hard about it.

The 30% problem

An AI agent using Grafema today can answer structural questions: which files, which functions, which calls connect to which handlers. This is roughly 30% of the context needed for a complete code change.

The other 70% is semantic: what data is flowing, what transformations happen to it, what contracts exist between caller and callee. Without this, the agent knows the architecture but not the substance. It can find the right handler function, but it can’t verify that it’s passing the right shape of data to it.

In practice, agents compensate by reading the handler function in detail and inferring the expected schema from the code. This works when the code is well-typed and clearly written. It fails when types are any, when the code is dynamically typed, or when the schema lives somewhere other than the handler itself.

Schema inference is about making that implicit information explicit in the graph.

Three sources of schema information in a codebase

Most codebases have three different places where the “shape of the data” is written down. They’re almost never connected to each other.

TypeScript interfaces and types

// types/resources.ts
interface Resource {
  id: string;
  name: string;
  kind: 'database' | 'api' | 'storage';
  config: Record<string, unknown>;
  createdAt: Date;
}

interface CreateResourceRequest {
  name: string;
  kind: Resource['kind'];
  config: Record<string, unknown>;
}

TypeScript interfaces are the most common form of schema documentation. They’re static, readable, and when kept up to date, highly accurate. The problem: they’re often not kept up to date, and the connection between an interface and the code that uses it is implicit (via TypeScript’s structural typing).
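A small sketch of what "implicit" means here. The handler `createResource` and the payload are made up for illustration; the point is that the payload never names the interface it has to satisfy.

```typescript
interface CreateResourceRequest {
  name: string;
  kind: "database" | "api" | "storage";
  config: Record<string, unknown>;
}

// Hypothetical handler, used only for illustration.
function createResource(body: CreateResourceRequest): string {
  return `${body.kind}:${body.name}`;
}

// This object never mentions CreateResourceRequest, yet it type-checks
// because its shape matches. No import or annotation ties it to the interface.
const payload = { name: "orders-db", kind: "database" as const, config: {} };

const label = createResource(payload);
console.log(label);
```

A text search for `CreateResourceRequest` would not find `payload`. Only the type checker knows the two are related, which is why the link has to be recovered by analysis rather than by grep.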

Zod/yup/joi validators

// validators/resources.ts
const createResourceSchema = z.object({
  name: z.string().min(1).max(255),
  kind: z.enum(['database', 'api', 'storage']),
  config: z.record(z.unknown()).default({})
});

Runtime validators are often more accurate than TypeScript types because they’re enforced at runtime: if the data doesn’t match, the request fails. They also encode constraints (minimum length, maximum value, string format) that interfaces can’t express.
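To show the gap without pulling in a library, here is a hand-written stand-in for the validator above. Every name in it is hypothetical, and a real project would use Zod, yup, or joi instead. It checks the same three fields, including the length limits that a plain `name: string` cannot carry.

```typescript
// Hand-written stand-in for a runtime validator; not the Zod API.
const KINDS = ["database", "api", "storage"] as const;
type Kind = (typeof KINDS)[number];

interface CreateResourceInput {
  name: string;
  kind: Kind;
  config: Record<string, unknown>;
}

type ValidationResult =
  | { ok: true; value: CreateResourceInput }
  | { ok: false; errors: string[] };

function validateCreateResource(input: unknown): ValidationResult {
  const errors: string[] = [];
  const body = (typeof input === "object" && input !== null ? input : {}) as Record<string, unknown>;

  const name = body.name;
  if (typeof name !== "string" || name.length < 1 || name.length > 255) {
    errors.push("name must be a string of 1 to 255 characters");
  }

  const kind = body.kind;
  if (typeof kind !== "string" || !KINDS.some((k) => k === kind)) {
    errors.push("kind must be one of: " + KINDS.join(", "));
  }

  // Mirrors the default in the schema above: a missing config becomes an empty object.
  const config = body.config === undefined ? {} : body.config;
  if (typeof config !== "object" || config === null || Array.isArray(config)) {
    errors.push("config must be an object");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: { name: name as string, kind: kind as Kind, config: config as Record<string, unknown> },
  };
}

// An empty name satisfies the type `string` but fails the runtime check.
const emptyName = validateCreateResource({ name: "", kind: "database" });
const valid = validateCreateResource({ name: "orders-db", kind: "api" });
console.log(emptyName.ok, valid.ok);
```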

The problem: validators are scattered. They might live next to the route handler, in a separate validation file, or be imported from a shared package. Connecting them to the routes they validate requires following import chains and call graphs.

ORM models

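// models/Resource.ts (TypeORM)
import { Column, Entity, PrimaryGeneratedColumn } from 'typeorm';

// Assumed to mirror the 'kind' union in types/resources.ts
enum ResourceKind {
  Database = 'database',
  Api = 'api',
  Storage = 'storage',
}
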
@Entity('resources')
export class Resource {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column({ length: 255, nullable: false })
  name: string;

  @Column({ type: 'enum', enum: ResourceKind })
  kind: ResourceKind;

  @Column({ type: 'jsonb', default: {} })
  config: object;
}

ORM models are often the ground truth for what can actually be stored. They encode database-level constraints that neither TypeScript interfaces nor validators necessarily reflect. But they live in a different layer of the stack — often in a separate package, often with a different naming convention.

All three sources exist in most production codebases. As far as we can tell, none of the major code analysis tools connect them today.

The inference challenge

Even when all three sources exist, connecting them is hard.

TypeScript types are often imprecise. any, unknown, Record<string, unknown>, object — these appear constantly in production code that “works” but doesn’t fully type its data. An agent reading the types gets a partial picture at best.

Validators aren’t always co-located with routes. A validator defined in src/validators/resources.ts and used in src/routes/resources.ts requires following the import chain to understand that this validator applies to this route. Import chains can be deep and indirect.
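A simplified sketch of what "following the import chain" involves. The module table below is a hand-built stand-in for what an analyzer would extract, and `src/validators/index.ts` and `src/shared/index.ts` are hypothetical barrel files added to make the chain longer.

```typescript
// For each file, where each exported name comes from.
// "local" means the symbol is defined in that file; otherwise it is re-exported.
type ExportOrigin = { kind: "local" } | { kind: "reexport"; from: string; name: string };
type ModuleTable = Record<string, Record<string, ExportOrigin>>;

const modules: ModuleTable = {
  "src/validators/resources.ts": {
    createResourceSchema: { kind: "local" },
  },
  "src/validators/index.ts": {
    createResourceSchema: { kind: "reexport", from: "src/validators/resources.ts", name: "createResourceSchema" },
  },
  "src/shared/index.ts": {
    createResourceSchema: { kind: "reexport", from: "src/validators/index.ts", name: "createResourceSchema" },
  },
};

// Follows re-exports until it reaches the defining file.
// Returns null on a broken chain or a cycle.
function resolveDefinition(
  table: ModuleTable,
  file: string,
  name: string
): { file: string; name: string } | null {
  const seen = new Set<string>();
  let current = { file, name };
  while (true) {
    const key = `${current.file}#${current.name}`;
    if (seen.has(key)) return null; // cycle between barrel files
    seen.add(key);
    const fileExports = table[current.file];
    const origin = fileExports ? fileExports[current.name] : undefined;
    if (origin === undefined) return null; // name is not exported from this file
    if (origin.kind === "local") return current;
    current = { file: origin.from, name: origin.name };
  }
}

const resolved = resolveDefinition(modules, "src/shared/index.ts", "createResourceSchema");
console.log(resolved);
```

Real code adds wildcard re-exports, renamed exports, and path aliases on top of this, which is where most of the difficulty comes from.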

ORM models often live in a different service. In a microservices architecture, the ORM models might live in a data-service package that’s imported by multiple backend services. Connecting a frontend API request to the ORM model it eventually writes to means tracing across several boundaries: from the frontend to the backend service, and from there into the shared package.

There’s also the question of which source to trust when they disagree. A TypeScript interface says a field is string | null. The Zod validator says it’s a required string. The ORM column is nullable. Three different answers. Which one reflects runtime reality?
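One way to at least surface the disagreement is to record each source's claim and apply an explicit policy. The sketch below uses an assumed policy, not a decision Grafema has made: the validator decides what a request may contain because it is the check that actually runs, and without a validator a field counts as nullable only if every source allows null. The field name `description` is hypothetical.

```typescript
type SourceKind = "interface" | "validator" | "orm";

interface FieldClaim {
  source: SourceKind;
  nullable: boolean;
}

interface FieldReport {
  field: string;
  agrees: boolean;            // true when every source says the same thing
  effectiveNullable: boolean; // result of the assumed policy
}

function reconcile(field: string, claims: FieldClaim[]): FieldReport {
  const distinctAnswers = new Set(claims.map((claim) => claim.nullable));
  const validatorClaim = claims.find((claim) => claim.source === "validator");
  const effectiveNullable = validatorClaim
    ? validatorClaim.nullable
    : claims.every((claim) => claim.nullable);
  return { field, agrees: distinctAnswers.size <= 1, effectiveNullable };
}

// The three-way disagreement described above.
const report = reconcile("description", [
  { source: "interface", nullable: true },  // string | null
  { source: "validator", nullable: false }, // required string
  { source: "orm", nullable: true },        // nullable column
]);
console.log(report);
```

Even if the policy is wrong for a given codebase, the `agrees` flag is useful on its own: it tells the agent where the sources diverge, so it knows which fields need a closer look.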

Grafema’s approach: schema nodes in the graph

We’re prototyping a new node type: schema:type. It represents the data schema associated with an endpoint or function boundary.

The idea is to make schemas first-class graph nodes, connected to the rest of the graph by edges:

http:route → [ACCEPTS_SCHEMA] → schema:type (the request body schema)
http:route → [RETURNS_SCHEMA] → schema:type (the response schema)
schema:type → [DEFINED_BY] → TypeScript interface node or Zod validator node
schema:type → [MAPS_TO] → ORM model node

This lets an agent ask: “what is the request body schema for POST /api/v2/resources?” and get an answer from the graph rather than from reading four files.

A Datalog query:

# prototype syntax — schema nodes are not in the public CLI yet
grafema query --raw 'violation(Source) :-
  node(Route, "http:route"),
  attr(Route, "fullPath", "/api/v2/resources"), attr(Route, "method", "POST"),
  edge(Route, Schema, "ACCEPTS_SCHEMA"),
  edge(Schema, Source, "DEFINED_BY").'

Result: a pointer to wherever the schema is defined, whether that’s a TypeScript interface, a Zod validator, or an inferred shape from the handler’s parameters. The rule binds only Source, so the schema node itself is an intermediate step in the traversal.
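For readers who don’t think in Datalog, here is a sketch of the same join over a tiny in-memory graph. The node ids, the `zod:validator` node type, and the `name` attribute on the definition node are illustrative assumptions, not Grafema’s storage format.

```typescript
interface GraphNode {
  id: string;
  type: string;
  attrs: Record<string, string>;
}

interface GraphEdge {
  from: string;
  to: string;
  label: string;
}

interface Graph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

const graph: Graph = {
  nodes: [
    { id: "route:1", type: "http:route", attrs: { fullPath: "/api/v2/resources", method: "POST" } },
    { id: "schema:1", type: "schema:type", attrs: {} },
    { id: "def:1", type: "zod:validator", attrs: { name: "createResourceSchema" } },
  ],
  edges: [
    { from: "route:1", to: "schema:1", label: "ACCEPTS_SCHEMA" },
    { from: "schema:1", to: "def:1", label: "DEFINED_BY" },
  ],
};

// Returns the nodes reached from `fromId` over edges with the given label.
function follow(g: Graph, fromId: string, label: string): GraphNode[] {
  const targets = g.edges
    .filter((edge) => edge.from === fromId && edge.label === label)
    .map((edge) => edge.to);
  return g.nodes.filter((node) => targets.indexOf(node.id) !== -1);
}

// Same join as the Datalog rule: route -> ACCEPTS_SCHEMA -> schema -> DEFINED_BY -> source.
function requestSchemaSources(g: Graph, method: string, fullPath: string): GraphNode[] {
  const sources: GraphNode[] = [];
  for (const route of g.nodes) {
    if (route.type !== "http:route") continue;
    if (route.attrs.fullPath !== fullPath || route.attrs.method !== method) continue;
    for (const schema of follow(g, route.id, "ACCEPTS_SCHEMA")) {
      sources.push(...follow(g, schema.id, "DEFINED_BY"));
    }
  }
  return sources;
}

const sources = requestSchemaSources(graph, "POST", "/api/v2/resources");
console.log(sources.map((node) => node.id));
```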

Why this matters more than it sounds

Consider an AI agent asked to “add a description field to data sources.” To do this correctly, it needs to:

1. Find the backend route that creates data sources
2. Understand the current request body schema
3. Add description to the TypeScript interface
4. Add description to the Zod/Joi validator
5. Add description to the ORM model migration
6. Update the frontend component’s form

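Without schema inference, the agent reads five or six files to reconstruct this mental model. With schema nodes in the graph, the first two steps become a single query, and the DEFINED_BY and MAPS_TO edges point at the files the next three steps need to change. The agent spends its context window on the actual work.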
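As a rough sketch of what the agent gets back, assume the edges for one route have already been resolved to file paths. The `SchemaLinks` shape is hypothetical, and the paths reuse the file names from the earlier examples.

```typescript
// Hypothetical summary of what the schema edges yield for one route.
interface SchemaLinks {
  definedBy: string[]; // files reached via DEFINED_BY
  mapsTo: string[];    // files reached via MAPS_TO
}

// Returns the files a schema change has to touch, without duplicates, in edge order.
function filesToEdit(links: SchemaLinks): string[] {
  const ordered = [...links.definedBy, ...links.mapsTo];
  return ordered.filter((file, index) => ordered.indexOf(file) === index);
}

const links: SchemaLinks = {
  definedBy: ["types/resources.ts", "validators/resources.ts"],
  mapsTo: ["models/Resource.ts"],
};

const plan = filesToEdit(links);
console.log(plan);
```

The frontend form is not covered by these edges; finding it still relies on the call graph from the route back to its callers.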

The broader point: AI agents writing code that calls APIs are currently forced to guess or infer the shape of data. Guesses produce type mismatches that fail at runtime. Schema inference reduces guesses.

Current status

Schema inference is in active development on a feature branch. It’s not in the public CLI yet. Current focus:

Better barrel file resolution for TypeScript interface detection
A Zod-specific analyzer that understands .object(), .string(), .enum() patterns
The ORM connection layer for TypeORM (Prisma is easier: the schema usually lives in a single schema.prisma file)

We expect a beta in a future release. If you want to test early or have specific schema patterns you need covered, open an issue on GitHub.

Cross-boundary tracing (the prerequisite) is in the current CLI: /docs/cross-service-tracing. Schema inference will be announced on the blog when it ships.