TL;DR
- Function calling lets LLMs execute actions beyond text generation (API calls, database queries, tool use).
- Works by: Define tools in JSON schema → LLM decides which to call → You execute → Return result to LLM.
- OpenAI: Best-in-class function calling, parallel calls supported, strict schema validation.
- Anthropic Claude: "Tool use" pattern, excellent reliability, supports parallel tools.
- Open-source: Llama 3.1+, Mistral support function calling but lower reliability (60-75% vs 95%+).
- Critical: Validate all LLM tool calls before execution (security), handle errors gracefully, use retry logic.
- Cost: Function calling adds 15-30% token overhead but unlocks 10× more agent capabilities.
Function Calling with LLMs: Complete Implementation Guide
Without function calling:
User: "What's the weather in London?"
LLM: "I don't have access to real-time weather data."
With function calling:
User: "What's the weather in London?"
LLM: [Calls get_weather(city="London")]
System: [Executes API call, returns: {"temp": 15, "condition": "Cloudy"}]
LLM: "It's currently 15°C and cloudy in London."
Function calling transforms LLMs from text generators into agents that interact with the real world.
What Is Function Calling?
Definition: LLM returns structured JSON describing which function to call with what parameters, instead of just generating text.
You provide:
- Tool definitions (function name, description, parameters)
- User query
LLM returns:
- Decision on which tool to call
- Parameters to pass
You execute:
- Run the actual function
- Return result to LLM
- LLM incorporates result into final response
Example Flow
1. Define Tool:
{
  "name": "get_weather",
  "description": "Get current weather for a city",
  "parameters": {
    "type": "object",
    "properties": {
      "city": {
        "type": "string",
        "description": "City name, e.g. 'London'"
      },
      "units": {
        "type": "string",
        "enum": ["celsius", "fahrenheit"],
        "description": "Temperature units"
      }
    },
    "required": ["city"]
  }
}
2. User Query:
"What's the weather like in Paris today?"
3. LLM Response:
{
  "tool_calls": [{
    "id": "call_abc123",
    "type": "function",
    "function": {
      "name": "get_weather",
      "arguments": "{\"city\": \"Paris\", \"units\": \"celsius\"}"
    }
  }]
}
4. Execute Function:
import requests

def get_weather(city, units="celsius"):
    # Call the weather API and return the parsed JSON payload
    response = requests.get(f"https://api.weather.com/current?city={city}&units={units}")
    return response.json()

result = get_weather("Paris", "celsius")
# {"temp": 18, "condition": "Partly cloudy", "humidity": 65}
5. Return to LLM:
{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"temp\": 18, \"condition\": \"Partly cloudy\", \"humidity\": 65}"
}
6. Final LLM Response:
"The weather in Paris today is partly cloudy with a temperature of 18°C and 65% humidity."
OpenAI Function Calling
Model support: GPT-4, GPT-4 Turbo, GPT-3.5 Turbo (June 2023+)
Basic Implementation
import json

from openai import OpenAI

client = OpenAI()

# Define tools
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search customer database by name or email",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "Search query (name or email)"
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Max results to return",
                        "default": 10
                    }
                },
                "required": ["query"]
            }
        }
    }
]

# Initial request
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "Find customer john.doe@example.com"}
    ],
    tools=tools,
    tool_choice="auto"  # Let model decide
)

# Check if tool was called
message = response.choices[0].message
if message.tool_calls:
    for tool_call in message.tool_calls:
        function_name = tool_call.function.name
        arguments = json.loads(tool_call.function.arguments)

        # Execute function
        if function_name == "search_database":
            result = search_database(**arguments)

        # Send result back to LLM
        messages = [
            {"role": "user", "content": "Find customer john.doe@example.com"},
            message,  # Assistant's tool call
            {
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            }
        ]

        final_response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=messages
        )
        print(final_response.choices[0].message.content)
Parallel Function Calling
OpenAI supports calling multiple functions in one turn.
tools = [
    {"type": "function", "function": {"name": "get_weather", ...}},
    {"type": "function", "function": {"name": "get_news", ...}},
    {"type": "function", "function": {"name": "get_stock_price", ...}}
]

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "What's the weather, top news, and AAPL stock price?"}],
    tools=tools
)

# LLM might return 3 tool calls at once
message = response.choices[0].message
results = []

for tool_call in message.tool_calls:
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)

    # Execute each function
    if function_name == "get_weather":
        result = get_weather(**arguments)
    elif function_name == "get_news":
        result = get_news(**arguments)
    elif function_name == "get_stock_price":
        result = get_stock_price(**arguments)

    results.append({
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    })

# Return all results together
final_response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "user", "content": "..."},
        message,
        *results  # All tool results
    ]
)
Performance: Parallel calling reduces latency by 2-3× for multi-tool queries.
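The latency gain comes from running the returned tool calls concurrently on your side rather than one at a time. A minimal sketch continuing the snippet above, using a thread pool (run_tool is an illustrative helper, not part of the OpenAI SDK):

from concurrent.futures import ThreadPoolExecutor
import json

def run_tool(tool_call):
    # Dispatch the call to the matching local implementation
    arguments = json.loads(tool_call.function.arguments)
    functions = {"get_weather": get_weather, "get_news": get_news, "get_stock_price": get_stock_price}
    result = functions[tool_call.function.name](**arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(result)
    }

# Execute all tool calls in parallel instead of sequentially
with ThreadPoolExecutor() as pool:
    results = list(pool.map(run_tool, message.tool_calls))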
Strict Schema Validation
Newer feature (2024): the strict flag enforces exact schema compliance.
tools = [{
    "type": "function",
    "function": {
        "name": "book_flight",
        "strict": True,  # Enforce schema
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string", "pattern": "^\\d{4}-\\d{2}-\\d{2}$"}
            },
            "required": ["origin", "destination", "date"],
            "additionalProperties": False
        }
    }
}]
Benefit: Guaranteed valid JSON, no parsing errors. Improves reliability from 98% to 99.9%+.
Anthropic Claude Tool Use
Model support: Claude 3 Opus, Sonnet, Haiku (all versions)
Implementation
import json

import anthropic

client = anthropic.Anthropic()

# Define tools
tools = [
    {
        "name": "get_customer_info",
        "description": "Retrieves customer information from database",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {
                    "type": "string",
                    "description": "Unique customer ID"
                }
            },
            "required": ["customer_id"]
        }
    }
]

# Initial request
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[
        {"role": "user", "content": "Get info for customer C12345"}
    ]
)

# Check for tool use
for content in response.content:
    if content.type == "tool_use":
        tool_name = content.name
        tool_input = content.input
        tool_use_id = content.id

        # Execute tool
        if tool_name == "get_customer_info":
            result = get_customer_info(**tool_input)

        # Return result to Claude
        follow_up = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            tools=tools,
            messages=[
                {"role": "user", "content": "Get info for customer C12345"},
                {"role": "assistant", "content": response.content},
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "tool_use_id": tool_use_id,
                            "content": json.dumps(result)
                        }
                    ]
                }
            ]
        )
        print(follow_up.content[0].text)
Multi-Tool Support
Claude can also call multiple tools in one response:
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=2048,
    tools=[
        {"name": "get_weather", ...},
        {"name": "search_flights", ...}
    ],
    messages=[
        {"role": "user", "content": "What's the weather in Tokyo and are there flights available tomorrow?"}
    ]
)

# Response may contain multiple tool_use blocks
tool_results = []
for content in response.content:
    if content.type == "tool_use":
        result = execute_tool(content.name, content.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": content.id,
            "content": json.dumps(result)
        })
Open-Source Model Function Calling
Supported models:
- Llama 3.1 (8B, 70B, 405B)
- Mistral Large, Mistral Medium
- Mixtral 8x7B (limited support)
Llama 3.1 Example
from transformers import AutoTokenizer, AutoModelForCausalLM
import json

model_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" shards the model across available GPUs (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Define tools
tools = [
    {
        "name": "calculate",
        "description": "Perform mathematical calculation",
        "parameters": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression to evaluate"}
            },
            "required": ["expression"]
        }
    }
]

# Format prompt with tools
messages = [
    {"role": "system", "content": f"You have access to these tools:\n{json.dumps(tools, indent=2)}"},
    {"role": "user", "content": "What's 127 * 89?"}
]

inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt
response = tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True)

# Parse the tool call from the generated text. With an ad-hoc system prompt like this the
# output format is not guaranteed; Llama 3.1's documented custom tool-call format is a JSON
# object such as {"name": "calculate", "parameters": {"expression": "127 * 89"}}
Reliability: Open-source models have 60-75% tool calling accuracy vs 95%+ for GPT-4/Claude.
When to use: Self-hosted requirements, or cost sensitivity (no per-token API fees, though you still pay for hosting and GPU compute).
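Because the miss rate is higher, it pays to parse open-model output defensively and re-prompt on failure. A rough sketch, assuming the JSON-object call format described above (parse_tool_call is illustrative, not a library function):

import json
import re

def parse_tool_call(text):
    # Naive extraction: grab the first-to-last brace span and validate its shape
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if "name" not in call or "parameters" not in call:
        return None
    return call

call = parse_tool_call(response)
if call is None:
    # Re-prompt with an explicit format reminder; open models often recover on a second try
    messages.append({"role": "user", "content": 'Respond with only a JSON object: {"name": ..., "parameters": ...}'})
    # ...regenerate and parse again
elif call["name"] == "calculate":
    result = calculate(**call["parameters"])  # dispatch to your local implementation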
Function Calling Patterns
Pattern 1: Single Tool, Simple Flow
Use case: One clear action (search, calculate, lookup).
def simple_tool_agent(user_query):
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": user_query}],
        tools=[search_tool],
        tool_choice="auto"
    )

    if response.choices[0].message.tool_calls:
        tool_call = response.choices[0].message.tool_calls[0]
        result = execute_tool(tool_call)

        # Return result to LLM for natural language response
        final = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=[
                {"role": "user", "content": user_query},
                response.choices[0].message,
                {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)}
            ]
        )
        return final.choices[0].message.content

    return response.choices[0].message.content
Pattern 2: Multi-Step Workflow
Use case: Chain multiple tool calls (retrieve data → process → store).
def multi_step_agent(user_query, max_iterations=5):
    messages = [{"role": "user", "content": user_query}]

    for _ in range(max_iterations):
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=messages,
            tools=all_tools
        )
        message = response.choices[0].message
        messages.append(message)

        # If no tool calls, agent is done
        if not message.tool_calls:
            return message.content

        # Execute all tool calls
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": json.dumps(result)
            })

    return "Max iterations reached"
Example:
User: "Find top 3 customers by revenue and email them a thank you note"
Iteration 1: [Calls get_top_customers(limit=3)]
Iteration 2: [Calls send_email(to=..., subject=..., body=...)] × 3
Iteration 3: [Returns "Sent thank you emails to top 3 customers"]
Pattern 3: Conditional Tool Selection
Use case: Different tools based on context.
tools = [
    {"type": "function", "function": {"name": "search_web", "description": "Search the internet", ...}},
    {"type": "function", "function": {"name": "search_database", "description": "Search internal database", ...}},
    {"type": "function", "function": {"name": "calculate", "description": "Perform calculations", ...}}
]

# LLM automatically selects appropriate tool based on query
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": user_query}],
    tools=tools,
    tool_choice="auto"  # Model decides which tool
)
Query: "What's our Q3 revenue?" → Calls search_database
Query: "What's the latest on AI regulation?" → Calls search_web
Query: "What's 15% of $2,400?" → Calls calculate
Error Handling & Security
Validation Before Execution
Critical: Never blindly execute tool calls. Validate first.
def execute_tool_safely(tool_call):
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)

    # 1. Whitelist check
    ALLOWED_FUNCTIONS = ["search_database", "get_weather", "calculate"]
    if function_name not in ALLOWED_FUNCTIONS:
        return {"error": "Unauthorized function"}

    # 2. Parameter validation
    if function_name == "search_database":
        # Prevent SQL injection
        if not isinstance(arguments.get("query"), str):
            return {"error": "Invalid query parameter"}
        # Limit query length
        if len(arguments["query"]) > 200:
            return {"error": "Query too long"}

    # 3. Rate limiting
    if is_rate_limited(function_name):
        return {"error": "Rate limit exceeded"}

    # 4. Execute with timeout
    try:
        result = timeout_execute(globals()[function_name], arguments, timeout=10)
        return result
    except TimeoutError:
        return {"error": "Function execution timeout"}
    except Exception as e:
        return {"error": f"Execution failed: {str(e)}"}
Retry Logic
import time

import openai

def call_with_retry(messages, tools, max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4-turbo",
                messages=messages,
                tools=tools,
                timeout=30
            )
            # Validate response has expected structure
            if response.choices[0].message:
                return response
        except json.JSONDecodeError:
            if attempt == max_retries - 1:
                raise
            # Invalid JSON in tool arguments, retry
            time.sleep(2 ** attempt)  # Exponential backoff
        except openai.APIError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

    raise Exception("Max retries exceeded")
Cost Analysis
Token overhead: Function definitions added to every request.
Example:
User query: 50 tokens
Function definitions (3 tools): 200 tokens
Total input: 250 tokens (vs 50 without functions)
Cost multiplier: 5× for this short query (the overhead is fixed, so with longer prompts it shrinks toward the 15-30% figure in the TL;DR)
But: Enables capabilities worth far more than 5× cost.
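To estimate the overhead for your own tool set, you can count the serialized definition tokens directly. A rough sketch with tiktoken, using the cl100k_base encoding as an approximation (the API's internal serialization differs slightly):

import json

import tiktoken

# Approximate token cost of shipping tool definitions with every request
encoding = tiktoken.get_encoding("cl100k_base")
tool_tokens = len(encoding.encode(json.dumps(tools)))
query_tokens = len(encoding.encode("Find customer john.doe@example.com"))

print(f"Tool definitions: ~{tool_tokens} tokens per request")
print(f"Overhead vs bare query: {tool_tokens / query_tokens:.1f}x")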
Cost Optimization
| Strategy | Token Reduction | Trade-off |
|---|---|---|
| Define only relevant tools per query | 60% | Requires query classification |
| Use shorter descriptions | 30% | Less LLM guidance |
| Lazy tool loading (add tools mid-conversation) | 50% | More complexity |
| Cache tool definitions (Claude) | 90% | Requires prompt caching |
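A minimal sketch of the first strategy, picking a per-query tool subset with a keyword-based router (TOOL_KEYWORDS and select_tools are illustrative; embedding similarity or a small classifier works better in practice):

# Hypothetical routing table: keywords that suggest which tools a query needs
TOOL_KEYWORDS = {
    "get_weather": ["weather", "temperature", "forecast"],
    "search_database": ["customer", "order", "revenue"],
    "calculate": ["calculate", "%", "sum", "average"],
}

def select_tools(user_query, all_tools):
    query = user_query.lower()
    selected = []
    for tool in all_tools:
        name = tool["function"]["name"]
        if any(keyword in query for keyword in TOOL_KEYWORDS.get(name, [])):
            selected.append(tool)
    # Fall back to the full set if nothing matches, so capability is never lost
    return selected or all_tools

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": user_query}],
    tools=select_tools(user_query, all_tools)
)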
Recommendation: Use Claude's prompt caching for tool definitions. Cuts cost by 90% for repeated tool use.
# Claude prompt caching
response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a helpful assistant.",
            "cache_control": {"type": "ephemeral"}
        },
        {
            "type": "text",
            "text": f"Available tools:\n{json.dumps(tools)}",
            "cache_control": {"type": "ephemeral"}  # Cache tools
        }
    ],
    messages=[{"role": "user", "content": user_query}]
)
Result: First call pays full cost, subsequent calls in same session pay 10% for tool definitions.
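To confirm the cache is actually being hit, inspect the usage block on the response; Anthropic reports cache writes and reads as separate token counts:

# After the first (cache-writing) call and a follow-up (cache-reading) call:
print(response.usage.cache_creation_input_tokens)  # tokens written to the cache
print(response.usage.cache_read_input_tokens)      # tokens served from the cache at ~10% of base price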
Production Checklist
Before deploying function calling to production:
- Whitelist callable functions and validate every argument before execution
- Use strict schema validation where available (OpenAI strict: true)
- Add per-function rate limits and execution timeouts
- Wrap API calls in retry logic with exponential backoff
- Return structured errors to the model instead of raising, so it can recover gracefully
- Cap agent loop iterations (see Pattern 2) to prevent runaway tool calls
- Monitor token overhead and cache tool definitions where possible (Claude prompt caching)
Frequently Asked Questions
Can the LLM call functions I didn't define?
No. The LLM can only call functions you explicitly provide in the tools array; it cannot invent or call arbitrary functions.
What if the LLM hallucinates function arguments?
Validate all arguments before execution. Use strict schema validation (OpenAI) or manual checks. Never trust LLM output blindly.
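For manual checks, validating the parsed arguments against the same JSON Schema you sent the model works well. A sketch using the jsonschema package, assuming the get_weather parameter schema from earlier and a tool_call object like the ones above:

import json

from jsonschema import ValidationError, validate

# The same parameter schema that was sent to the model
weather_schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["city"],
    "additionalProperties": False
}

arguments = json.loads(tool_call.function.arguments)
try:
    validate(instance=arguments, schema=weather_schema)
except ValidationError as e:
    # Return the validation error to the model so it can correct the call
    result = {"error": f"Invalid arguments: {e.message}"}
else:
    result = get_weather(**arguments)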
How do I prevent the LLM from calling expensive APIs repeatedly?
Implement rate limiting per function. Track calls per session, return error if limit exceeded.
call_counts = {}

def execute_tool(tool_call):
    function_name = tool_call.function.name
    arguments = json.loads(tool_call.function.arguments)

    call_counts[function_name] = call_counts.get(function_name, 0) + 1
    if call_counts[function_name] > 10:  # Max 10 calls per function
        return {"error": "Rate limit: Too many calls to this function"}

    return globals()[function_name](**arguments)
Should I use tool_choice="auto" or force a specific tool?
Auto: Let model decide (recommended for most cases)
Force: Use when you know exactly which tool should run (e.g., form submission must call submit_form)
# Force tool call
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "Submit this form"}],
    tools=[submit_form_tool],
    tool_choice={"type": "function", "function": {"name": "submit_form"}}
)
Bottom line: Function calling transforms LLMs into agents that interact with real systems. OpenAI and Claude have 95%+ reliability. Always validate before execution, implement retries, and monitor costs. The 15-30% token overhead is worth the 10× capability expansion.
Next: Read our Multi-Agent Systems guide for coordinating multiple function-calling agents.