Schema inference for code graphs: the missing piece of AI code understanding
Grafema can tell you that ResourcesService.jsx calls POST /api/v2/resources and that the handler for that call is ResourcesController.create in resourcesController.js. That’s useful. It’s also incomplete.
What does ResourcesController.create return? What shape does the request body need to be? If the frontend passes { name, kind, options } and the backend expects { name, type, config }, where’s the mismatch?
A call graph without data schemas is like knowing who sends a letter but not what’s in it.
This post is about the next layer: schema inference. Where we are, what we’ve tried, and what’s hard about it.
The 30% problem
An AI agent using Grafema today can answer structural questions: which files, which functions, which calls connect to which handlers. By our rough estimate, that is about 30% of the context needed for a complete code change.
The other 70% is semantic: what data is flowing, what transformations happen to it, what contracts exist between caller and callee. Without this, the agent knows the architecture but not the substance. It can find the right handler function, but it can’t verify that it’s passing the right shape of data to it.
In practice, agents compensate by reading the handler function in detail and inferring the expected schema from the code. This works when the code is well-typed and clearly written. It fails when types are any, when the code is dynamically typed, or when the schema lives somewhere other than the handler itself.
Schema inference is about making that implicit information explicit in the graph.
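To make "explicit" concrete, here is a minimal sketch of the check an agent could run once both sides of a call have a known shape. It uses the field names from the example at the top of the post. The FieldSet type and diffShape function are hypothetical names invented for this illustration, not part of Grafema.

```typescript
// Hypothetical, simplified schema: just the expected top-level field names.
type FieldSet = readonly string[];

interface ShapeDiff {
  missing: string[];    // expected by the callee, absent from the payload
  unexpected: string[]; // sent by the caller, unknown to the callee
}

// Compare the keys a caller sends against the keys a handler expects.
function diffShape(sent: FieldSet, expected: FieldSet): ShapeDiff {
  const sentSet = new Set(sent);
  const expectedSet = new Set(expected);
  return {
    missing: expected.filter((f) => !sentSet.has(f)),
    unexpected: sent.filter((f) => !expectedSet.has(f)),
  };
}

// Field names taken from the opening example of this post.
const frontendPayload: FieldSet = ["name", "kind", "options"];
const backendExpects: FieldSet = ["name", "type", "config"];

const diff = diffShape(frontendPayload, backendExpects);
console.log(diff);
// { missing: [ 'type', 'config' ], unexpected: [ 'kind', 'options' ] }
```

The check itself is trivial. The hard part, and the subject of the rest of this post, is getting reliable field lists for both sides in the first place.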
Three sources of schema information in a codebase
Most codebases have three different places where the “shape of the data” is written down. They’re almost never connected to each other.
TypeScript interfaces and types
// types/resources.ts
interface Resource {
  id: string;
  name: string;
  kind: 'database' | 'api' | 'storage';
  config: Record<string, unknown>;
  createdAt: Date;
}

interface CreateResourceRequest {
  name: string;
  kind: Resource['kind'];
  config: Record<string, unknown>;
}
TypeScript interfaces are the most common form of schema documentation. They’re static, readable, and when kept up to date, highly accurate. The problem: they’re often not kept up to date, and the connection between an interface and the code that uses it is implicit (via TypeScript’s structural typing).
Zod/yup/joi validators
// validators/resources.ts
import { z } from 'zod';

const createResourceSchema = z.object({
  name: z.string().min(1).max(255),
  kind: z.enum(['database', 'api', 'storage']),
  config: z.record(z.unknown()).default({}),
});
Runtime validators are often more accurate than TypeScript types because they’re enforced at runtime: if the data doesn’t match, the request fails. They also encode constraints (minimum length, maximum value, default values) that interfaces can’t express.
The problem: validators are scattered. They might live next to the route handler, in a separate validation file, or be imported from a shared package. Connecting them to the routes they validate requires following import chains and call graphs.
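To illustrate the constraint gap, the sketch below uses a hand-rolled stand-in for the Zod schema above, so that it runs without dependencies. The validateCreateResource function is an assumption made for this example, and it only mirrors the name and kind rules. A payload with an empty name satisfies the interface but fails the runtime check.

```typescript
type ResourceKind = "database" | "api" | "storage";

interface CreateResourceRequest {
  name: string;
  kind: ResourceKind;
  config: Record<string, unknown>;
}

const KINDS: readonly string[] = ["database", "api", "storage"];

// Hand-rolled stand-in for the Zod schema (assumption: same rules for
// name length 1..255 and the kind enum). Returns a list of problems.
function validateCreateResource(input: unknown): string[] {
  if (typeof input !== "object" || input === null) {
    return ["body must be an object"];
  }
  const body = input as Record<string, unknown>;
  const name = body.name;
  const kind = body.kind;
  const errors: string[] = [];
  if (typeof name !== "string") {
    errors.push("name must be a string");
  } else if (name.length < 1 || name.length > 255) {
    errors.push("name must be 1-255 characters");
  }
  if (typeof kind !== "string" || !KINDS.includes(kind)) {
    errors.push("kind must be one of database, api, storage");
  }
  return errors;
}

// Type-checks fine: the interface only says `name` is a string.
const emptyName: CreateResourceRequest = { name: "", kind: "database", config: {} };
const ok: CreateResourceRequest = { name: "orders-db", kind: "database", config: {} };

console.log(validateCreateResource(emptyName)); // [ 'name must be 1-255 characters' ]
console.log(validateCreateResource(ok));        // []
```

This is why a schema node that only records the TypeScript interface would understate what the route actually accepts.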
ORM models
// models/Resource.ts (TypeORM)
import { Column, Entity, PrimaryGeneratedColumn } from 'typeorm';
// ResourceKind is assumed to be an enum defined elsewhere in the codebase.

@Entity('resources')
export class Resource {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @Column({ length: 255, nullable: false })
  name: string;

  @Column({ type: 'enum', enum: ResourceKind })
  kind: ResourceKind;

  @Column({ type: 'jsonb', default: {} })
  config: object;
}
ORM models are often the ground truth for what can actually be stored. They encode database-level constraints that neither TypeScript interfaces nor validators necessarily reflect. But they live in a different layer of the stack — often in a separate package, often with a different naming convention.
All three sources exist in most production codebases. As far as we know, none of the major code analysis tools connect them today.
The inference challenge
Even when all three sources exist, connecting them is hard.
TypeScript types are often imprecise. any, unknown, Record<string, unknown>, object — these appear constantly in production code that “works” but doesn’t fully type its data. An agent reading the types gets a partial picture at best.
Validators aren’t always co-located with routes. A validator defined in src/validators/resources.ts and used in src/routes/resources.ts requires following the import chain to understand that this validator applies to this route. Import chains can be deep and indirect.
ORM models often live in a different package or service. In a microservices architecture, the ORM models might live in a data-service package that’s imported by multiple backend services. Connecting a frontend API request to the ORM model it eventually writes to means tracing from the frontend, through the backend service, and into that shared package.
There’s also the question of which source to trust when they disagree. A TypeScript interface says a field is string | null. The Zod validator says it’s a required string. The ORM column is nullable. Three different answers. Which one reflects runtime reality?
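One modest first step is to surface the disagreement rather than resolve it. The sketch below records what each source claims about a field’s nullability and reports the fields where the claims differ. The FieldClaim type, the findNullabilityConflicts function, and the displayName field are all hypothetical, chosen only to mirror the three-way disagreement described above.

```typescript
// Hypothetical record of what one source claims about one field.
type SourceKind = "ts-interface" | "validator" | "orm-model";

interface FieldClaim {
  source: SourceKind;
  field: string;
  nullable: boolean;
}

// Group claims by field and return the fields whose sources disagree
// on nullability. This does not decide which source is right.
function findNullabilityConflicts(claims: FieldClaim[]): Map<string, FieldClaim[]> {
  const byField = new Map<string, FieldClaim[]>();
  for (const claim of claims) {
    const list = byField.get(claim.field) ?? [];
    list.push(claim);
    byField.set(claim.field, list);
  }
  const conflicts = new Map<string, FieldClaim[]>();
  byField.forEach((list, field) => {
    const distinctAnswers = new Set(list.map((c) => c.nullable));
    if (distinctAnswers.size > 1) conflicts.set(field, list);
  });
  return conflicts;
}

const claims: FieldClaim[] = [
  // The three-way disagreement from the paragraph above.
  { source: "ts-interface", field: "displayName", nullable: true },  // string | null
  { source: "validator", field: "displayName", nullable: false },    // required string
  { source: "orm-model", field: "displayName", nullable: true },     // nullable column
  // A field where all sources agree.
  { source: "ts-interface", field: "name", nullable: false },
  { source: "validator", field: "name", nullable: false },
  { source: "orm-model", field: "name", nullable: false },
];

const conflicts = findNullabilityConflicts(claims);
console.log(Array.from(conflicts.keys())); // [ 'displayName' ]
```

Keeping every claim attached to the conflict lets whoever reads the graph, human or agent, see which sources disagree and judge from there.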
Grafema’s approach: schema nodes in the graph
We’re prototyping a new node type: schema:type. It represents the data schema associated with an endpoint or function boundary.
The idea is to make schemas first-class graph nodes, connected to the rest of the graph by edges:
- http:route → [ACCEPTS_SCHEMA] → schema:type (the request body schema)
- http:route → [RETURNS_SCHEMA] → schema:type (the response schema)
- schema:type → [DEFINED_BY] → TypeScript interface node or Zod validator node
- schema:type → [MAPS_TO] → ORM model node
This lets an agent ask: “what is the request body schema for POST /api/v2/resources?” and get an answer from the graph rather than from reading four files.
A Datalog query:
npx @grafema/cli query --raw '
  type(Route, "http:route"),
  attr(Route, "fullPath", "/api/v2/resources"),
  attr(Route, "method", "POST"),
  edge(Route, Schema, "ACCEPTS_SCHEMA"),
  edge(Schema, Source, "DEFINED_BY")
'
Result: the schema node, plus a pointer to wherever the schema is defined — a TypeScript interface, a Zod validator, or an inferred shape from the handler’s parameters.
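For readers who think in code rather than Datalog, here is a rough in-memory equivalent of that query. Only the edge names and the http:route and schema:type node types come from this post. The Graph structure, the node ids, the attrs layout, and the ts:interface type name are assumptions made for the sketch and say nothing about how Grafema stores its graph.

```typescript
type EdgeType = "ACCEPTS_SCHEMA" | "RETURNS_SCHEMA" | "DEFINED_BY" | "MAPS_TO";

interface GraphNode {
  id: string;
  type: string; // e.g. "http:route", "schema:type"
  attrs: Record<string, string>;
}

interface GraphEdge {
  from: string;
  to: string;
  type: EdgeType;
}

interface Graph {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

// Follow all edges of one type out of a node and return the target nodes.
function follow(graph: Graph, fromId: string, type: EdgeType): GraphNode[] {
  const targetIds = graph.edges
    .filter((e) => e.from === fromId && e.type === type)
    .map((e) => e.to);
  return graph.nodes.filter((n) => targetIds.includes(n.id));
}

// Mirrors the query: route -> ACCEPTS_SCHEMA -> schema -> DEFINED_BY -> source.
function requestSchemaSources(graph: Graph, method: string, fullPath: string): GraphNode[] {
  const sources: GraphNode[] = [];
  for (const node of graph.nodes) {
    const isRoute = node.type === "http:route";
    if (!isRoute || node.attrs.method !== method || node.attrs.fullPath !== fullPath) continue;
    for (const schema of follow(graph, node.id, "ACCEPTS_SCHEMA")) {
      sources.push(...follow(graph, schema.id, "DEFINED_BY"));
    }
  }
  return sources;
}

// Tiny sample graph; ids and the "ts:interface" type name are invented.
const graph: Graph = {
  nodes: [
    { id: "r1", type: "http:route", attrs: { method: "POST", fullPath: "/api/v2/resources" } },
    { id: "s1", type: "schema:type", attrs: {} },
    { id: "i1", type: "ts:interface", attrs: { name: "CreateResourceRequest", file: "types/resources.ts" } },
  ],
  edges: [
    { from: "r1", to: "s1", type: "ACCEPTS_SCHEMA" },
    { from: "s1", to: "i1", type: "DEFINED_BY" },
  ],
};

const sources = requestSchemaSources(graph, "POST", "/api/v2/resources");
console.log(sources.map((n) => n.attrs.name)); // [ 'CreateResourceRequest' ]
```

A route with no ACCEPTS_SCHEMA edge simply returns an empty list, which matches how the prototype treats routes it could not infer a schema for.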
Prototype results on ToolJet
We ran an early prototype of schema inference on ToolJet. Results are preliminary and messy, but worth sharing.
TypeScript interface detection: We found and linked TypeScript interfaces to route handlers in 71% of cases where an interface existed. The remaining 29% failed because the interface was imported through a barrel file chain (three levels of re-export), and our import resolver stopped at the first barrel.
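The fix we have in mind for the barrel problem is to keep following re-exports until a real declaration turns up. The sketch below shows that idea over a simplified in-memory module table. It is not Grafema’s resolver: the ModuleInfo shape, the resolveDeclaration function, and the file paths are invented for illustration, and it only models export * style re-exports.

```typescript
// Hypothetical, simplified module table. A real resolver would read files and
// handle path mapping, default exports, and renamed re-exports.
interface ModuleInfo {
  declares: string[];      // names declared in this file
  reExportsFrom: string[]; // modules re-exported wholesale (export * from "...")
}

type ModuleTable = Record<string, ModuleInfo>;

// Return the module that actually declares `name`, following re-export
// chains of any depth. `seen` guards against circular barrels.
function resolveDeclaration(
  table: ModuleTable,
  modulePath: string,
  name: string,
  seen: Set<string> = new Set(),
): string | undefined {
  if (seen.has(modulePath)) return undefined;
  seen.add(modulePath);
  const mod = table[modulePath];
  if (!mod) return undefined;
  if (mod.declares.includes(name)) return modulePath;
  for (const next of mod.reExportsFrom) {
    const found = resolveDeclaration(table, next, name, seen);
    if (found) return found;
  }
  return undefined;
}

// Three levels of re-export before the declaration (paths are invented).
const table: ModuleTable = {
  "src/index.ts": { declares: [], reExportsFrom: ["src/types/index.ts"] },
  "src/types/index.ts": { declares: [], reExportsFrom: ["src/types/resources/index.ts"] },
  "src/types/resources/index.ts": { declares: [], reExportsFrom: ["src/types/resources/create.ts"] },
  "src/types/resources/create.ts": { declares: ["CreateResourceRequest"], reExportsFrom: [] },
};

const declaredIn = resolveDeclaration(table, "src/index.ts", "CreateResourceRequest");
console.log(declaredIn); // src/types/resources/create.ts
```

A resolver that stops at the first barrel would return nothing here, because src/index.ts declares no names of its own.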
Zod validator detection: Harder. ToolJet doesn’t use Zod consistently — some routes use Joi, some use class-validator with NestJS-style decorators, some use ad-hoc manual validation. Coverage was around 45% on routes that had any validator at all. Routes with no validator (there are several) got no schema node.
ORM model connection: We didn’t attempt this in the prototype. TypeORM entities are in a different package from the route handlers, and our cross-package analysis isn’t robust enough yet.
The headline number: 61% of ToolJet’s API routes got a schema:type node with at least one source of schema information. That’s enough to be useful for a large fraction of tasks. It’s not enough to rely on for correctness.
Why this matters more than it sounds
Consider an AI agent asked to “add a description field to data sources.” To do this correctly, it needs to:
- Find the backend route that creates data sources
- Understand the current request body schema
- Add description to the TypeScript interface
- Add description to the Zod/Joi validator
- Add description to the ORM model migration
- Update the frontend component’s form
Without schema inference, the agent reads five or six files to reconstruct this mental model. With schema nodes in the graph, finding the route, its request body schema, and the definitions that need to change is a query. The agent spends its context window on the actual work.
The broader point: AI agents writing code that calls APIs are currently forced to guess or infer the shape of data. Guesses produce type mismatches that fail at runtime. Schema inference reduces guesses.
Current status
Schema inference is in active development on a feature branch. It’s not in the public CLI yet. Current focus:
- Better barrel file resolution for TypeScript interface detection
- A Zod-specific analyzer that understands .object(), .string(), .enum() patterns
- The ORM connection layer for TypeORM (Prisma is easier: the schema is a single file)
We expect a beta in a future release. If you want to test early or have specific schema patterns you need covered, open an issue on GitHub.
Cross-boundary tracing (the prerequisite) is in the current CLI: /docs/cross-service-tracing. Schema inference will be announced on the blog when it ships.