In our previous tutorial, we built an AI agent capable of answering queries by browsing the web. However, when building agents for longer-running tasks, two important concepts come into play: persistence and streaming. Persistence allows you to save the state of an agent at any given point, so you can resume from that state in future interactions. This is crucial for long-running applications. Streaming, on the other hand, lets you emit real-time signals about what the agent is doing at any moment, providing transparency and control over its actions. In this tutorial, we’ll enhance our agent by adding these two features.
Setting Up the Agent
Let’s start by recreating our agent. We’ll load the necessary environment variables, install and import the required libraries, set up the Tavily search tool, define the agent state, and finally, build the agent.
pip install langgraph==0.2.53 langgraph-checkpoint==2.0.6 langgraph-sdk==0.1.36 langchain-groq langchain-community langgraph-checkpoint-sqlite==2.0.1
import os

# Fill in your own API keys before running
os.environ['TAVILY_API_KEY'] = ""
os.environ['GROQ_API_KEY'] = ""

from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, ToolMessage
from langchain_groq import ChatGroq
from langchain_community.tools.tavily_search import TavilySearchResults

# Search tool that returns at most two results per query
tool = TavilySearchResults(max_results=2)

class AgentState(TypedDict):
    # operator.add appends each node's returned messages to the existing list
    messages: Annotated[list[AnyMessage], operator.add]

class Agent:
    def __init__(self, model, tools, system=""):
        self.system = system
        graph = StateGraph(AgentState)
        graph.add_node("llm", self.call_openai)
        graph.add_node("action", self.take_action)
        # After the LLM node, run the tools if any were requested; otherwise stop
        graph.add_conditional_edges("llm", self.exists_action, {True: "action", False: END})
        graph.add_edge("action", "llm")
        graph.set_entry_point("llm")
        self.graph = graph.compile()
        self.tools = {t.name: t for t in tools}
        self.model = model.bind_tools(tools)

    def call_openai(self, state: AgentState):
        messages = state['messages']
        if self.system:
            messages = [SystemMessage(content=self.system)] + messages
        message = self.model.invoke(messages)
        return {'messages': [message]}

    def exists_action(self, state: AgentState):
        result = state['messages'][-1]
        return len(result.tool_calls) > 0

    def take_action(self, state: AgentState):
        tool_calls = state['messages'][-1].tool_calls
        results = []
        for t in tool_calls:
            print(f"Calling: {t}")
            result = self.tools[t['name']].invoke(t['args'])
            results.append(ToolMessage(tool_call_id=t['id'], name=t['name'], content=str(result)))
        print("Back to the model!")
        return {'messages': results}
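To make the control flow concrete, here is a minimal plain-Python sketch of the loop that the graph encodes, with no LangGraph or LLM involved. The names fake_llm, fake_search, and run_agent are hypothetical stand-ins, and messages are plain dicts rather than LangChain message objects. It only illustrates how operator.add appends each node’s update to the message list and how the conditional edge chooses between the action node and the end of the run.

```python
import operator

def fake_llm(messages):
    """Hypothetical stand-in for the model: request one tool call, then answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "ai", "content": "",
                "tool_calls": [{"id": "call_1", "name": "search",
                                "args": {"query": "weather"}}]}
    return {"role": "ai", "content": "It is sunny.", "tool_calls": []}

def fake_search(args):
    """Hypothetical stand-in for the search tool."""
    return f"results for {args['query']}"

TOOLS = {"search": fake_search}

def llm_node(state):
    return {"messages": [fake_llm(state["messages"])]}

def action_node(state):
    calls = state["messages"][-1]["tool_calls"]
    return {"messages": [
        {"role": "tool", "tool_call_id": c["id"], "name": c["name"],
         "content": str(TOOLS[c["name"]](c["args"]))}
        for c in calls
    ]}

def exists_action(state):
    return len(state["messages"][-1]["tool_calls"]) > 0

def run_agent(question):
    state = {"messages": [{"role": "human", "content": question}]}
    node = "llm"
    while node is not None:
        update = llm_node(state) if node == "llm" else action_node(state)
        # Reducer step: operator.add concatenates the old list and the update
        state = {"messages": operator.add(state["messages"], update["messages"])}
        if node == "llm":
            # None plays the role of END in this sketch
            node = "action" if exists_action(state) else None
        else:
            node = "llm"
    return state

final_state = run_agent("What is the weather?")
print([m["role"] for m in final_state["messages"]])
```

The run passes through the model, the tool, and the model again, which mirrors the llm → action → llm cycle defined by the edges in the Agent class.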
Adding Persistence
To add persistence, we’ll use LangGraph’s checkpointer feature. A checkpointer saves the state of the agent after and between every node. For this tutorial, we’ll use SqliteSaver, a simple checkpointer backed by SQLite, a database that Python supports through its standard library. The code below stores checkpoints in a local SQLite file for simplicity, but you can easily connect to an external database or use other checkpointers such as Redis or Postgres for more robust persistence.
from langgraph.checkpoint.sqlite import SqliteSaver
import sqlite3

# check_same_thread=False lets the connection be used from other threads
sqlite_conn = sqlite3.connect("checkpoints.sqlite", check_same_thread=False)
memory = SqliteSaver(sqlite_conn)
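As a rough illustration of the idea behind a checkpointer, the sketch below stores one JSON snapshot per step, keyed by thread ID, using only the standard library. ToyCheckpointer, its table layout, and its put and get_latest methods are hypothetical and much simpler than what SqliteSaver actually does; the point is only that saved state can be looked up again by thread.

```python
import json
import sqlite3

class ToyCheckpointer:
    """Hypothetical, simplified checkpoint store; not LangGraph's SqliteSaver."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id, state):
        """Save a snapshot as the next step for this thread and return the step."""
        row = self.conn.execute(
            "SELECT COALESCE(MAX(step), -1) FROM checkpoints WHERE thread_id = ?",
            (thread_id,),
        ).fetchone()
        step = row[0] + 1
        self.conn.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()
        return step

    def get_latest(self, thread_id):
        """Return the most recent snapshot for a thread, or None if there is none."""
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else None

# An in-memory database keeps the sketch free of side effects
conn = sqlite3.connect(":memory:")
saver = ToyCheckpointer(conn)
saver.put("1", {"messages": ["What is the weather in Texas?"]})
saver.put("1", {"messages": ["What is the weather in Texas?", "It is sunny."]})

print(saver.get_latest("1"))
print(saver.get_latest("2"))
```

Resuming a conversation then amounts to loading the latest snapshot for a thread before the next step runs, while an unknown thread simply has nothing to load.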
Next, we’ll modify our agent to accept a checkpointer:
class Agent:
    def __init__(self, model, tools, checkpointer, system=""):
        # Everything else stays the same as before
        self.graph = graph.compile(checkpointer=checkpointer)
        # Everything else after this stays the same
Now, we can create our agent with persistence enabled:
prompt = """You are a smart research assistant. Use the search engine to look up information.
You are allowed to make multiple calls (either together or in sequence).
Only look up information when you are sure of what you want.
If you need to look up some information before asking a follow-up question, you are allowed to do that!
"""
model = ChatGroq(model="Llama-3.3-70b-Specdec")
bot = Agent(model, [tool], system=prompt, checkpointer=memory)
Adding Streaming
Streaming is essential for real-time updates. There are two types of streaming we’ll focus on:
1. Streaming Messages: Emitting intermediate messages such as AI decisions and tool results.
2. Streaming Tokens: Streaming individual tokens from the LLM’s response.
Let’s start by streaming messages. We’ll create a human message and use the stream method to observe the agent’s actions in real time.
messages = [HumanMessage(content="What is the weather in Texas?")]
thread = {"configurable": {"thread_id": "1"}}
for event in bot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v['messages'])
Final output: The current weather in Texas is sunny with a temperature of 19.4°C (66.9°F) and a wind speed of 4.3 mph (6.8 kph)…
When you run this, you’ll see a stream of results: first, an AI message instructing the agent to call Tavily, then a tool message with the search results, and finally, an AI message answering the question.
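The loop above treats each streamed event as a dictionary whose values carry a messages list. As a hedged sketch of that shape, the generator below (toy_stream, a hypothetical stand-in with plain dicts instead of message objects) yields one update per node, keyed by the node names used in our graph, in the same order as the three results described above.

```python
def toy_stream():
    """Hypothetical generator mimicking the shape of per-node updates."""
    yield {"llm": {"messages": [
        {"type": "ai", "content": "", "tool_calls": ["search"]}]}}
    yield {"action": {"messages": [
        {"type": "tool", "content": "search results"}]}}
    yield {"llm": {"messages": [
        {"type": "ai", "content": "It is sunny in Texas.", "tool_calls": []}]}}

seen = []
for event in toy_stream():
    # Each event maps a node name to the update that node returned
    for node_name, update in event.items():
        for message in update["messages"]:
            seen.append((node_name, message["type"]))
            print(f"{node_name}: {message['type']} -> {message['content']!r}")
```

Iterating over event.items() instead of event.values() is a small variation that also tells you which node produced each message.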
Understanding Thread IDs
The thread_id is a key part of the thread configuration. It allows the agent to maintain separate conversations with different users or contexts. By assigning a unique thread_id to each conversation, the agent can keep track of multiple interactions simultaneously without mixing them up.
For example, let’s continue the conversation by asking, “What about in LA?” using the same thread_id:
messages = [HumanMessage(content="What about in LA?")]
thread = {"configurable": {"thread_id": "1"}}
for event in bot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)
Final output: The current weather in Los Angeles is sunny with a temperature of 17.2°C (63.0°F) and a wind speed of 2.2 mph (3.6 kph)…
The agent infers that we’re asking about the weather, thanks to persistence. To verify, let’s ask, “Which one is warmer?”:
messages = [HumanMessage(content="Which one is warmer?")]
thread = {"configurable": {"thread_id": "1"}}
for event in bot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)
Final output: Texas is warmer than Los Angeles. The current temperature in Texas is 19.4°C (66.9°F), while the current temperature in Los Angeles is 17.2°C (63.0°F)
The agent correctly compares the weather in Texas and LA. To test whether persistence keeps conversations separate, let’s ask the same question with a different thread_id:
messages = [HumanMessage(content="Which one is warmer?")]
thread = {"configurable": {"thread_id": "2"}}
for event in bot.graph.stream({"messages": messages}, thread):
    for v in event.values():
        print(v)
Output: I need more information to answer that question. Can you please provide more context or specify which two things you are comparing?
This time, the agent gets confused because it doesn’t have access to the previous conversation’s history.
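The isolation between threads can be pictured with a small standard-library sketch. The histories store and the send helper are hypothetical and ignore the model entirely; they only show that each thread_id maps to its own history, so a new thread starts without the earlier questions.

```python
from collections import defaultdict

# Hypothetical in-memory store: one message history per thread_id
histories = defaultdict(list)

def send(thread_id, text):
    """Append a message to one thread and return a copy of that thread's history."""
    histories[thread_id].append(text)
    return list(histories[thread_id])

send("1", "What is the weather in Texas?")
send("1", "What about in LA?")
thread_1 = send("1", "Which one is warmer?")
thread_2 = send("2", "Which one is warmer?")

print(len(thread_1))  # thread "1" carries the earlier questions as context
print(len(thread_2))  # thread "2" contains only the new question
```

With only the single question in its history, thread "2" has nothing to compare, which matches the agent’s request for more context above.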
Streaming Tokens
To stream tokens, we’ll use the astream_events method, which is asynchronous. We’ll also switch to an async checkpointer.
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

# Top-level "async with" works in a notebook; in a script, put this code
# inside an async function and run it with asyncio.run()
async with AsyncSqliteSaver.from_conn_string(":memory:") as checkpointer:
    abot = Agent(model, [tool], system=prompt, checkpointer=checkpointer)
    messages = [HumanMessage(content="What is the weather in SF?")]
    thread = {"configurable": {"thread_id": "4"}}
    async for event in abot.graph.astream_events({"messages": messages}, thread, version="v1"):
        kind = event["event"]
        if kind == "on_chat_model_stream":
            content = event["data"]["chunk"].content
            if content:
                # Empty content in the context of OpenAI means
                # that the model is asking for a tool to be invoked.
                # So we only print non-empty content
                print(content, end="|")
This will stream tokens in real time, giving you a live view of the agent’s response as it is generated.
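The filtering logic can be tried without a model or API key. In the sketch below, fake_token_stream is a hypothetical async generator standing in for the event stream, and its event dictionaries are simplified (the content sits directly under "data" rather than on a chunk object). It also shows one way to run such a loop from a regular script with asyncio.run.

```python
import asyncio

async def fake_token_stream():
    """Hypothetical stand-in for a stream of on_chat_model_stream events."""
    for content in ["", "The", " weather", "", " is", " sunny"]:
        await asyncio.sleep(0)  # yield control, as a real network stream would
        yield {"event": "on_chat_model_stream", "data": {"content": content}}
    yield {"event": "on_chain_end", "data": {}}

async def collect_tokens():
    tokens = []
    async for event in fake_token_stream():
        if event["event"] != "on_chat_model_stream":
            continue  # ignore every other kind of event
        content = event["data"]["content"]
        if content:  # skip empty chunks
            print(content, end="|")
            tokens.append(content)
    print()
    return tokens

tokens = asyncio.run(collect_tokens())
```

The "|" separator makes chunk boundaries visible, which is handy when checking how a provider splits its output.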
Conclusion
By adding persistence and streaming, we’ve significantly enhanced our AI agent’s capabilities. Persistence allows the agent to maintain context across interactions, while streaming provides real-time insight into its actions. These features are essential for building production-ready applications, especially those involving multiple users or human-in-the-loop interactions.
In the next tutorial, we’ll dive into human-in-the-loop interactions, where persistence plays a crucial role in enabling seamless collaboration between humans and AI agents. Stay tuned!
References:
- (DeepLearning.ai) https://learn.deeplearning.ai/courses/ai-agents-in-langgraph

Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS from the Indian Institute of Technology (IIT), Kanpur. He is a Machine Learning enthusiast. He is passionate about research and the latest advancements in Deep Learning, Computer Vision, and related fields.