Knowledge Graph Construction for AI Agents: Entities and Relationships
Build knowledge graphs that agents can query for context-aware decisions, with entity extraction, relationship mapping, and graph-based retrieval patterns.
TL;DR
Vector search retrieves semantically similar documents, but it struggles with relational queries such as "Which customers in fintech have used both Stripe and Plaid?" or "What companies did our partnerships team contact last quarter?" Knowledge graphs solve this by explicitly modeling entities and their relationships, enabling agents to reason across connections.
This guide covers building knowledge graphs from unstructured text, storing them efficiently, and querying them for agent context. It is based on Athenic's implementation, where we maintain 45,000+ entities and 120,000+ relationships across customer interactions, partnerships, and product usage.
Key takeaways
- Knowledge graphs complement vector search: use vectors for semantic similarity, graphs for relational queries.
- Extract entities with GPT-4/Claude using JSON schema constraints for consistency.
- Model relationships with confidence scores to handle uncertainty in extracted data.
- Query graphs with Cypher (Neo4j) or recursive SQL (PostgreSQL) depending on complexity.
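A knowledge graph represents information as entities (nodes), relationships (typed, directed edges between nodes), and properties attached to either.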
Example graph:
(Company: Acme Corp) -[USES_TECHNOLOGY]-> (Product: Stripe)
(Company: Acme Corp) -[IN_INDUSTRY]-> (Industry: Fintech)
(Person: Jane Smith) -[WORKS_FOR]-> (Company: Acme Corp)
(Person: Jane Smith) -[HAS_ROLE]-> (Role: CTO)
From this graph, an agent can answer: "Which CTOs work at fintech companies using Stripe?" by traversing relationships.
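To make the traversal concrete, here is a minimal in-memory sketch of the example graph and that question. The `Triple` type and the helper names are hypothetical, chosen only for illustration; a real deployment would run this against the database described later.

```typescript
// Hypothetical in-memory representation of the example graph above.
type Triple = { from: string; rel: string; to: string };

const triples: Triple[] = [
  { from: 'Acme Corp', rel: 'USES_TECHNOLOGY', to: 'Stripe' },
  { from: 'Acme Corp', rel: 'IN_INDUSTRY', to: 'Fintech' },
  { from: 'Jane Smith', rel: 'WORKS_FOR', to: 'Acme Corp' },
  { from: 'Jane Smith', rel: 'HAS_ROLE', to: 'CTO' },
];

// Returns the source of every edge of type `rel` that points at `to`.
function sourcesOf(graph: Triple[], rel: string, to: string): Set<string> {
  return new Set(graph.filter(t => t.rel === rel && t.to === to).map(t => t.from));
}

// "Which CTOs work at fintech companies using Stripe?"
function ctosAtFintechUsingStripe(graph: Triple[]): string[] {
  const fintech = sourcesOf(graph, 'IN_INDUSTRY', 'Fintech');
  const usesStripe = sourcesOf(graph, 'USES_TECHNOLOGY', 'Stripe');
  const ctos = sourcesOf(graph, 'HAS_ROLE', 'CTO');
  // Keep WORKS_FOR edges whose person is a CTO and whose company
  // satisfies both company-level constraints.
  return graph
    .filter(t => t.rel === 'WORKS_FOR' && ctos.has(t.from) && fintech.has(t.to) && usesStripe.has(t.to))
    .map(t => t.from);
}
```

Each constraint is a one-hop lookup, and the answer is the intersection of those lookups along the `WORKS_FOR` edge.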
| Query type | Best approach | Example |
|---|---|---|
| Semantic similarity | Vector search | "Find documents about API rate limiting" |
| Factual lookup | Key-value store | "What's the email for contact ID 12345?" |
| Relational | Knowledge graph | "Which partners in Series A raised funding last month?" |
| Multi-hop | Knowledge graph | "Find companies that hired employees from our customers" |
At Athenic, we use knowledge graphs for partnership discovery queries that require traversing company → industry → technology stack relationships.
| Dataset | Query | Vector search | Knowledge graph |
|---|---|---|---|
| 10K contacts | "Find CTOs" | 120ms, 78% precision | 15ms, 98% precision |
| 10K contacts | "Companies in fintech using Stripe" | 250ms, 54% precision (keyword match issues) | 25ms, 95% precision |
| 10K contacts | "2-hop: Contacts who work at companies funded by Sequoia" | Not possible | 60ms, 92% precision |
Graphs excel at structured, relational queries with precise results.
Extract entities from unstructured text (emails, documents, chat logs) using LLMs.
Use OpenAI's structured outputs or JSON schema to ensure consistent entity extraction.
import OpenAI from 'openai';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

interface Entity {
  // 'industry' is included so the type matches the JSON schema enum below
  type: 'person' | 'company' | 'product' | 'technology' | 'industry';
  name: string;
  properties: Record<string, string>;
}

interface ExtractionResult {
  entities: Entity[];
  relationships: Array<{
    from_entity: string;
    to_entity: string;
    relationship_type: string;
    confidence: number;
  }>;
}

async function extractEntities(text: string): Promise<ExtractionResult> {
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{
      role: 'system',
      content: `Extract entities (people, companies, products, technologies) and their relationships from the text.
Types of relationships:
- WORKS_FOR: person works at company
- USES_TECHNOLOGY: company uses product/tech
- IN_INDUSTRY: company operates in industry
- HAS_ROLE: person has job title
- FUNDED_BY: company funded by investor
- PARTNERED_WITH: company partners with company
Include confidence scores (0-1) for each relationship.`,
    }, {
      role: 'user',
      content: text,
    }],
    response_format: {
      type: 'json_schema',
      json_schema: {
        name: 'entity_extraction',
        schema: {
          type: 'object',
          properties: {
            entities: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  type: { type: 'string', enum: ['person', 'company', 'product', 'technology', 'industry'] },
                  name: { type: 'string' },
                  properties: { type: 'object', additionalProperties: { type: 'string' } },
                },
                required: ['type', 'name'],
              },
            },
            relationships: {
              type: 'array',
              items: {
                type: 'object',
                properties: {
                  from_entity: { type: 'string' },
                  to_entity: { type: 'string' },
                  relationship_type: { type: 'string' },
                  confidence: { type: 'number', minimum: 0, maximum: 1 },
                },
                required: ['from_entity', 'to_entity', 'relationship_type', 'confidence'],
              },
            },
          },
          required: ['entities', 'relationships'],
        },
      },
    },
  });

  // content is null when the model refuses or returns no text
  const content = response.choices[0].message.content;
  if (content === null) {
    throw new Error('Model returned no content for entity extraction');
  }
  return JSON.parse(content) as ExtractionResult;
}
// Example usage
const text = `
Jane Smith, CTO of Acme Corp, mentioned they're using Stripe for payments and recently raised Series A from Sequoia Capital. Acme operates in the fintech space.
`;
const result = await extractEntities(text);
console.log(result);
/*
{
entities: [
{ type: 'person', name: 'Jane Smith', properties: { role: 'CTO' } },
{ type: 'company', name: 'Acme Corp', properties: {} },
{ type: 'product', name: 'Stripe', properties: {} },
{ type: 'company', name: 'Sequoia Capital', properties: { type: 'investor' } },
{ type: 'industry', name: 'fintech', properties: {} },
],
relationships: [
{ from_entity: 'Jane Smith', to_entity: 'Acme Corp', relationship_type: 'WORKS_FOR', confidence: 0.95 },
{ from_entity: 'Jane Smith', to_entity: 'CTO', relationship_type: 'HAS_ROLE', confidence: 0.98 },
{ from_entity: 'Acme Corp', to_entity: 'Stripe', relationship_type: 'USES_TECHNOLOGY', confidence: 0.92 },
{ from_entity: 'Acme Corp', to_entity: 'Sequoia Capital', relationship_type: 'FUNDED_BY', confidence: 0.90 },
{ from_entity: 'Acme Corp', to_entity: 'fintech', relationship_type: 'IN_INDUSTRY', confidence: 0.94 },
],
}
*/
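Note that the sample output contains a `HAS_ROLE` edge pointing at 'CTO', a name that does not appear in `entities`. One way to catch this before storing anything is a small consistency pass over the parsed result. The sketch below is an assumption about how such a check could look, not part of the pipeline described here; `ExtractedRelationship`, `isExtractedRelationship`, and `keepConsistentRelationships` are hypothetical names.

```typescript
// Shape of one relationship as returned by the extraction call (hypothetical name).
interface ExtractedRelationship {
  from_entity: string;
  to_entity: string;
  relationship_type: string;
  confidence: number;
}

// Runtime check: JSON.parse gives no guarantees about the shape of its result.
function isExtractedRelationship(value: unknown): value is ExtractedRelationship {
  if (typeof value !== 'object' || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    typeof r.from_entity === 'string' &&
    typeof r.to_entity === 'string' &&
    typeof r.relationship_type === 'string' &&
    typeof r.confidence === 'number' &&
    r.confidence >= 0 &&
    r.confidence <= 1
  );
}

// Keeps relationships that are well-formed and whose endpoints were both extracted as entities.
function keepConsistentRelationships(
  entityNames: string[],
  relationships: unknown[],
): ExtractedRelationship[] {
  const known = new Set(entityNames);
  return relationships
    .filter(isExtractedRelationship)
    .filter(r => known.has(r.from_entity) && known.has(r.to_entity));
}
```

Whether to drop a dangling edge or create the missing entity instead is a design choice; dropping is the more conservative option.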
LLMs might extract "Acme Corp", "Acme Corporation", "acme corp" as separate entities. Deduplicate using fuzzy matching.
import Fuse from 'fuse.js';

function deduplicateEntities(entities: Entity[]): Entity[] {
  const deduplicated: Entity[] = [];
  for (const entity of entities) {
    // Only compare against entities of the same type, so a person and a
    // company that happen to share a name are never merged
    const candidates = deduplicated.filter(e => e.type === entity.type);
    const fuse = new Fuse(candidates, {
      keys: ['name'],
      // Fuse.js threshold: 0.0 requires an exact match, 1.0 matches anything
      threshold: 0.2,
    });
    const matches = fuse.search(entity.name);
    if (matches.length > 0) {
      // Merge properties into the best-scoring existing entity
      const existing = matches[0].item;
      existing.properties = { ...existing.properties, ...entity.properties };
    } else {
      deduplicated.push(entity);
    }
  }
  return deduplicated;
}
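Fuzzy matching tends to work better when obvious variation is removed first. The sketch below normalizes company names before comparison; the suffix list and the `normalizeCompanyName` helper are hypothetical and would need tuning for real data.

```typescript
// Hypothetical list of legal suffixes to ignore when comparing company names.
const LEGAL_SUFFIXES = ['corporation', 'corp', 'incorporated', 'inc', 'llc', 'ltd', 'limited', 'co'];

function normalizeCompanyName(raw: string): string {
  const words = raw
    .toLowerCase()
    .replace(/[.,]/g, '') // drop punctuation such as the dot in "Corp."
    .split(/\s+/)
    .filter(w => w.length > 0);
  // Drop trailing legal suffixes, but never the last remaining word
  while (words.length > 1 && LEGAL_SUFFIXES.indexOf(words[words.length - 1]) !== -1) {
    words.pop();
  }
  return words.join(' ');
}
```

With this helper, "Acme Corp", "Acme Corporation", and "acme corp" all normalize to the same key. The trade-off is that genuinely distinct companies such as "Acme Corp" and "Acme Ltd" would also collide, so the normalized form is better used as a matching key than as the stored name.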
Validate extracted entities against known ontologies to reduce hallucinations.
const knownCompanies = await db.companies.findAll({ select: ['name'] });
const knownTechnologies = ['Stripe', 'Plaid', 'AWS', 'OpenAI', /* ... */];
function validateEntity(entity: Entity): boolean {
if (entity.type === 'company') {
return knownCompanies.some(c => c.name.toLowerCase() === entity.name.toLowerCase());
}
if (entity.type === 'technology') {
// Case-insensitive, matching the company check above
return knownTechnologies.some(t => t.toLowerCase() === entity.name.toLowerCase());
}
return true; // Accept other types without validation
}
// Filter validated entities
const validatedEntities = result.entities.filter(validateEntity);
Relationships have types, directions, and properties.
interface Relationship {
id: string;
from_entity_id: string;
to_entity_id: string;
relationship_type: string;
confidence: number; // 0-1
source: string; // Document/email ID where extracted
created_at: Date;
properties: Record<string, any>;
}
Some relationships are bidirectional (PARTNERED_WITH), others directional (WORKS_FOR).
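import { v4 as uuidv4 } from 'uuid';

type Direction = 'directional' | 'bidirectional';

// Typed as a Record so it can be indexed by an arbitrary relationship_type string
const relationshipDirections: Record<string, Direction> = {
  WORKS_FOR: 'directional',
  USES_TECHNOLOGY: 'directional',
  PARTNERED_WITH: 'bidirectional',
  FUNDED_BY: 'directional',
  IN_INDUSTRY: 'directional',
  HAS_ROLE: 'directional',
};

// Assumes db.relationships.insert is asynchronous; awaiting a plain value is harmless
async function storeRelationship(rel: Relationship): Promise<void> {
  await db.relationships.insert(rel);
  if (relationshipDirections[rel.relationship_type] === 'bidirectional') {
    // Also store the reverse edge so lookups from either side find it
    await db.relationships.insert({
      ...rel,
      id: uuidv4(),
      from_entity_id: rel.to_entity_id,
      to_entity_id: rel.from_entity_id,
    });
  }
}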
Add timestamps to track when relationships formed or ended.
interface TemporalRelationship extends Relationship {
valid_from: Date;
valid_until: Date | null; // null = still valid
}
// Example: person changed companies
{
from_entity: 'Jane Smith',
to_entity: 'OldCorp',
relationship_type: 'WORKS_FOR',
valid_from: new Date('2020-01-01'),
valid_until: new Date('2023-06-30'),
}
{
from_entity: 'Jane Smith',
to_entity: 'Acme Corp',
relationship_type: 'WORKS_FOR',
valid_from: new Date('2023-07-01'),
valid_until: null,
}
Query current relationships with WHERE valid_until IS NULL OR valid_until > NOW().
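The same check can be applied in application code, for example when filtering relationships already loaded into memory. This is a minimal sketch; `ValidityWindow` and `isCurrent` are hypothetical names, and unlike the SQL condition above it also rejects relationships whose `valid_from` lies in the future.

```typescript
interface ValidityWindow {
  valid_from: Date;
  valid_until: Date | null; // null = still valid
}

// True when `now` falls inside the relationship's validity window.
function isCurrent(rel: ValidityWindow, now: Date = new Date()): boolean {
  const started = rel.valid_from.getTime() <= now.getTime();
  const notEnded = rel.valid_until === null || rel.valid_until.getTime() > now.getTime();
  return started && notEnded;
}
```

Passing an explicit `now` also allows point-in-time questions such as "where did Jane Smith work in 2022?".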
Choose between PostgreSQL (JSONB + recursive queries) and Neo4j (a dedicated graph database).
Store entities and relationships in traditional tables.
CREATE TABLE entities (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  type TEXT NOT NULL,
  name TEXT NOT NULL,
  properties JSONB,
  org_id TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE relationships (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  from_entity_id UUID REFERENCES entities(id),
  to_entity_id UUID REFERENCES entities(id),
  relationship_type TEXT NOT NULL,
  confidence NUMERIC(3, 2),
  properties JSONB,
  org_id TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

-- Indexes for traversal
CREATE INDEX idx_relationships_from ON relationships(from_entity_id);
CREATE INDEX idx_relationships_to ON relationships(to_entity_id);
CREATE INDEX idx_entities_type ON entities(type);
Pros: No new infrastructure, familiar SQL
Cons: Complex multi-hop queries require recursive CTEs
Store graph natively with Cypher query language.
// Create entities
CREATE (jane:Person {name: 'Jane Smith', role: 'CTO'})
CREATE (acme:Company {name: 'Acme Corp'})
CREATE (stripe:Product {name: 'Stripe'})
CREATE (fintech:Industry {name: 'fintech'})
// Create relationships
CREATE (jane)-[:WORKS_FOR]->(acme)
CREATE (jane)-[:HAS_ROLE]->(:Role {title: 'CTO'})
CREATE (acme)-[:USES_TECHNOLOGY]->(stripe)
CREATE (acme)-[:IN_INDUSTRY]->(fintech)
Pros: Fast multi-hop traversals, native graph operations
Cons: Additional infrastructure, learning curve for Cypher
At Athenic, we use PostgreSQL JSONB for simplicity. Our queries rarely exceed 2-hop depth, making recursive CTEs acceptable.
Query graphs to answer relational questions.
"Find all companies using Stripe"
SELECT DISTINCT e.name
FROM entities e
JOIN relationships r ON e.id = r.from_entity_id
JOIN entities tech ON r.to_entity_id = tech.id
WHERE
  e.type = 'company'
  AND r.relationship_type = 'USES_TECHNOLOGY'
  AND tech.name = 'Stripe';
"Find people who work at companies in fintech"
WITH RECURSIVE graph_traversal AS (
  -- Start: companies in fintech
  SELECT
    e.id AS entity_id,
    e.name AS entity_name,
    e.type AS entity_type,
    1 AS depth
  FROM entities e
  JOIN relationships r ON e.id = r.from_entity_id
  JOIN entities industry ON r.to_entity_id = industry.id
  WHERE
    e.type = 'company'
    AND r.relationship_type = 'IN_INDUSTRY'
    AND industry.name = 'fintech'

  UNION ALL

  -- Traverse: find people who work for those companies
  SELECT
    e.id,
    e.name,
    e.type,
    gt.depth + 1
  FROM graph_traversal gt
  JOIN relationships r ON gt.entity_id = r.to_entity_id
  JOIN entities e ON r.from_entity_id = e.id
  WHERE
    r.relationship_type = 'WORKS_FOR'
    AND e.type = 'person'
    AND gt.depth < 2
)
SELECT DISTINCT entity_name
FROM graph_traversal
WHERE entity_type = 'person';
"Find people who work at companies using Stripe"
MATCH (person:Person)-[:WORKS_FOR]->(company:Company)-[:USES_TECHNOLOGY]->(tech:Product {name: 'Stripe'})
RETURN person.name, company.name
Cypher is significantly more readable for multi-hop queries.
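Both the recursive CTE and the Cypher pattern are bounded traversals: start from a set of nodes and follow edges up to a fixed depth. The following in-memory sketch shows that idea as a depth-limited breadth-first search. The `DirectedEdge` type and `reachableWithin` function are hypothetical, and edges are followed in either direction because the CTE above walks `IN_INDUSTRY` and `WORKS_FOR` edges backwards.

```typescript
type DirectedEdge = { from: string; type: string; to: string };

// Returns every node reachable from `start` within `maxHops` edges, with its hop distance.
function reachableWithin(edges: DirectedEdge[], start: string, maxHops: number): Map<string, number> {
  const distance = new Map<string, number>();
  distance.set(start, 0);
  let frontier = [start];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const edge of edges) {
        // Follow the edge from whichever endpoint matches the current node
        const neighbor = edge.from === node ? edge.to : edge.to === node ? edge.from : null;
        if (neighbor !== null && !distance.has(neighbor)) {
          distance.set(neighbor, hop);
          next.push(neighbor);
        }
      }
    }
    frontier = next;
  }
  return distance;
}
```

The `distance` map doubles as a visited set, which keeps the search from looping on cycles. The `gt.depth < 2` guard bounds the depth of the CTE in the same way, but does not by itself prevent revisiting nodes.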
Expose graph queries as agent tools.
import { z } from 'zod';

const graphQueryParams = z.object({
  query_type: z.enum(['companies_using_tech', 'people_at_companies', 'companies_in_industry']),
  filters: z.object({
    technology: z.string().optional(),
    industry: z.string().optional(),
    role: z.string().optional(),
  }),
});

const graphQueryTool = {
  name: 'query_knowledge_graph',
  description: 'Query the knowledge graph for entities and relationships. Supports multi-hop queries.',
  parameters: graphQueryParams,
  execute: async ({ query_type, filters }: z.infer<typeof graphQueryParams>) => {
    if (query_type === 'companies_using_tech') {
      // The technology filter is optional in the schema but required for this query type
      if (!filters.technology) {
        throw new Error('filters.technology is required for companies_using_tech');
      }
      return await db.query(`
        SELECT DISTINCT e.name
        FROM entities e
        JOIN relationships r ON e.id = r.from_entity_id
        JOIN entities tech ON r.to_entity_id = tech.id
        WHERE
          e.type = 'company'
          AND r.relationship_type = 'USES_TECHNOLOGY'
          AND tech.name = $1
      `, [filters.technology]);
    }
    // Other query types...
  },
};
Agent invokes: query_knowledge_graph({ query_type: 'companies_using_tech', filters: { technology: 'Stripe' } })
We maintain a knowledge graph of 12,400 companies, 8,200 contacts, and 32,000 technologies with 120,000+ relationships.
Use cases:
Query performance:
| Query | Complexity | PostgreSQL time | Neo4j time (estimated) |
|---|---|---|---|
| 1-hop: Companies using Stripe | Simple | 18ms | 8ms |
| 2-hop: People at fintech companies | Moderate | 65ms | 22ms |
| 3-hop: Contacts at companies funded by Sequoia | Complex | 180ms | 45ms |
Extraction pipeline:
Results:
Clone our knowledge graph starter schema with entity/relationship tables and example queries.
If queries regularly exceed 2-3 hops or you need graph-specific algorithms (PageRank, shortest path), use Neo4j. For simpler graphs with occasional multi-hop queries, PostgreSQL is sufficient and reduces infrastructure complexity.
Store confidence scores and source metadata with every relationship. When sources conflict (for example, two sources claim different employers), either keep both claims with their timestamps and confidence, or use voting or recency to pick a winner.
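One possible reading of "voting or recency" is a confidence-weighted vote with recency as the tie-breaker. The sketch below implements that reading; the `Claim` shape and `pickWinner` function are hypothetical, and other policies (plain majority, newest-wins) are equally defensible.

```typescript
interface Claim {
  value: string;      // e.g. the employer a source asserts
  confidence: number; // 0-1, as stored on the relationship
  created_at: Date;
  source: string;     // document/email ID the claim came from
}

// Sums confidence per claimed value; ties go to the value with the most recent claim.
function pickWinner(claims: Claim[]): string | undefined {
  const tally = new Map<string, { score: number; latest: number }>();
  for (const claim of claims) {
    const entry = tally.get(claim.value) ?? { score: 0, latest: -Infinity };
    entry.score += claim.confidence;
    entry.latest = Math.max(entry.latest, claim.created_at.getTime());
    tally.set(claim.value, entry);
  }

  let winner: string | undefined = undefined;
  let bestScore = -Infinity;
  let bestLatest = -Infinity;
  for (const [value, entry] of Array.from(tally.entries())) {
    const wins = entry.score > bestScore || (entry.score === bestScore && entry.latest > bestLatest);
    if (wins) {
      winner = value;
      bestScore = entry.score;
      bestLatest = entry.latest;
    }
  }
  return winner;
}
```

Keeping the losing claims in storage, rather than deleting them, lets the decision be revisited when new evidence arrives.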
Knowledge graphs and vector search can be combined: use vector search to find relevant entities, then traverse the graph from those starting points. For example, vector search finds companies related to "payment processing", and graph traversal then finds their technologies and team members.
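A rough sketch of that two-stage pattern follows. The vector search is injected as a function so that any backend can be plugged in; `SeedSearch`, `LinkedEdge`, and `hybridRetrieve` are hypothetical names, and a real search call would normally be asynchronous.

```typescript
type LinkedEdge = { from: string; type: string; to: string };

// Stand-in for a vector search backend: returns entity names ranked by similarity.
type SeedSearch = (query: string, limit: number) => string[];

interface SeedContext {
  seed: string;
  neighbors: Array<{ type: string; entity: string }>;
}

// Stage 1: vector search picks seed entities. Stage 2: one-hop graph expansion around each seed.
function hybridRetrieve(query: string, search: SeedSearch, edges: LinkedEdge[], limit = 5): SeedContext[] {
  const seeds = search(query, limit);
  return seeds.map(seed => ({
    seed,
    neighbors: edges
      .filter(e => e.from === seed || e.to === seed)
      .map(e => ({ type: e.type, entity: e.from === seed ? e.to : e.from })),
  }));
}
```

The resulting `SeedContext` objects can be serialized into the agent's prompt, giving it both the semantically relevant entities and their explicit connections.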
Update the graph incrementally as new data (emails, documents) arrives. Run a full reprocessing pass monthly to catch missed entities and resolve duplicates.
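An incremental update boils down to an upsert: insert the entity if it is new, otherwise merge the new properties into the existing record. The sketch below uses an in-memory `Map` as a stand-in for the entities table; `StoredEntity`, `entityKey`, and `upsertEntity` are hypothetical names.

```typescript
interface StoredEntity {
  type: string;
  name: string;
  properties: Record<string, string>;
}

// Entities are keyed by type plus a case-insensitive, trimmed name.
function entityKey(type: string, entityName: string): string {
  return `${type}:${entityName.trim().toLowerCase()}`;
}

// Inserts a new entity or merges properties into the existing one; newer values win.
function upsertEntity(store: Map<string, StoredEntity>, incoming: StoredEntity): 'inserted' | 'merged' {
  const key = entityKey(incoming.type, incoming.name);
  const existing = store.get(key);
  if (existing === undefined) {
    // Copy so later merges do not mutate the caller's object
    store.set(key, { ...incoming, properties: { ...incoming.properties } });
    return 'inserted';
  }
  existing.properties = { ...existing.properties, ...incoming.properties };
  return 'merged';
}
```

Exact-key matching only catches trivial variants, which is one reason a periodic full pass with fuzzy deduplication remains useful.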
Use Neo4j Browser (if using Neo4j) or libraries like vis.js, D3.js, or Cytoscape.js for custom visualizations. Show the top-N entities with the highest relationship counts.
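Selecting those top-N entities is a degree count over the edge list. A minimal sketch, with the hypothetical names `CountedEdge` and `topEntitiesByDegree`, could look like this; the output can then be handed to whichever visualization library is in use.

```typescript
type CountedEdge = { from: string; to: string };

// Counts how many relationships touch each entity and returns the n most connected.
function topEntitiesByDegree(edges: CountedEdge[], n: number): Array<{ entity: string; degree: number }> {
  const degree = new Map<string, number>();
  for (const edge of edges) {
    degree.set(edge.from, (degree.get(edge.from) ?? 0) + 1);
    degree.set(edge.to, (degree.get(edge.to) ?? 0) + 1);
  }
  return Array.from(degree.entries())
    .map(([entity, count]) => ({ entity, degree: count }))
    // Highest degree first; names break ties so the order is stable
    .sort((a, b) => b.degree - a.degree || a.entity.localeCompare(b.entity))
    .slice(0, n);
}
```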
Knowledge graphs complement vector search by enabling relational and multi-hop queries. Extract entities with LLMs using structured outputs, deduplicate and validate against ontologies, store in PostgreSQL or Neo4j, and query with SQL CTEs or Cypher depending on complexity.