Add Long-Term Memory to LLM Applications with Mem0

June 3, 2026

Khuyen Tran

Introduction
Vector Search vs Memory
Setup
Build a Stateless Support Bot
Test the Stateless Support Bot
Store Support Interactions in Mem0
Retrieve Memories Before Responding
Compare Stateless vs Memory-Aware Responses
Add a Status Update
Manage Stored Memories
Final Thoughts

Introduction

LLM applications are often stateless by default. Each prompt only knows what you include at that moment, so useful context from earlier interactions can disappear between sessions.

For simple questions, that is often enough. For returning users, it can lead to repeated explanations, missing preferences, and less helpful responses.

Mem0 helps by extracting important facts from conversations, storing them by user, and retrieving relevant memories when needed.

To see how Mem0 works, we will build a simple e-commerce support workflow, first without memory and then with Mem0, so you can compare how stored context changes the response.

💻 Get the Code: Open the notebook in Google Colab to run it in your browser, or grab the source from GitHub.

Vector Search vs Memory

A vector database is a good starting point for retrieval, but memory requires more than finding similar text. You still need a way to decide which details are worth saving for future interactions.

In this customer support example, a vector database may retrieve the entire message, including the coffee detail, even though that detail has nothing to do with the delayed order:

My order #4821 still has not arrived. I contacted support last week and got no update. Also, I was checking this while making coffee this morning.

A memory layer should keep only the support-relevant facts:

Order #4821 is delayed, and the customer contacted support last week without receiving an update.

To get from raw retrieval to useful memory, your application still needs logic for:

Deciding which details from the message should be remembered for future replies.
Making sure each memory is tied to the right user or conversation.
Deciding how to handle newer memories that conflict with or supersede earlier ones.
Formatting retrieved messages into prompt-ready context.
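
To make these responsibilities concrete, here is a toy sketch of a hand-rolled memory store. ToyMemoryStore, remember, and as_prompt_context are hypothetical names invented for this illustration, and this is not how Mem0 is implemented. The sketch covers user scoping, overwriting older facts, and prompt formatting, but it leaves the hardest part, deciding which facts to keep, to the caller:

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemoryStore:
    """Hypothetical in-process store for illustration; not Mem0's design."""

    # Maps user_id -> {topic key -> remembered fact}
    _facts: dict[str, dict[str, str]] = field(default_factory=dict)

    def remember(self, user_id: str, key: str, fact: str) -> None:
        # Scoping by user_id keeps one customer's facts away from another's.
        # Reusing a key overwrites the older fact, so newer information wins.
        self._facts.setdefault(user_id, {})[key] = fact

    def as_prompt_context(self, user_id: str) -> str:
        # Turn the stored facts into a bullet list ready for a prompt.
        facts = self._facts.get(user_id, {})
        if not facts:
            return "No relevant memories found."
        return "\n".join(f"- {fact}" for fact in facts.values())


store = ToyMemoryStore()
# The caller picks the facts by hand here; the coffee detail is simply not stored.
store.remember("cust_4821", "order_4821_status", "Order #4821 is delayed.")
store.remember(
    "cust_4821",
    "support_contact",
    "Customer contacted support last week without an update.",
)
# A newer status replaces the older one stored under the same key.
store.remember("cust_4821", "order_4821_status", "Order #4821 arrived.")

print(store.as_prompt_context("cust_4821"))
```

Even this small version needs a naming scheme for keys and a rule for conflicts, and it still has no way to extract facts from free-form text.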

You can implement this yourself, but the extra extraction and memory-management logic adds up. Mem0 packages that workflow behind a simpler memory API, which we will use in the next section.

If vector search is new to you, see our Pinecone vector database tutorial for a practical introduction to semantic search and retrieval.

Setup

Install the libraries used in this tutorial:

mem0ai: Adds long-term memory for storing and retrieving user context.
litellm: Provides a unified interface for calling LLM providers.
python-dotenv: Loads API keys and configuration from a .env file.

pip install mem0ai litellm python-dotenv

This article uses mem0ai v2.0.4, litellm v1.87.0, and python-dotenv v1.2.2.

Since this tutorial uses OpenAI’s GPT-4o-mini, store your OPENAI_API_KEY in a .env file in your project folder so Mem0 and LiteLLM can access it.

OPENAI_API_KEY=your-openai-api-key

Start by loading the environment variables and imports used throughout the tutorial:

from dotenv import load_dotenv
from litellm import completion
from mem0 import Memory

load_dotenv()
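
A missing key usually surfaces later as an authentication error from the provider, which can be harder to trace. As an optional safeguard, the sketch below fails fast with a clearer message. require_env is a hypothetical helper written for this tutorial, not part of python-dotenv, Mem0, or LiteLLM:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value


# Call this right after load_dotenv(), for example:
# require_env("OPENAI_API_KEY")
```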

Next, define a helper that sends a list of chat messages to GPT-4o-mini through LiteLLM and returns only the generated text:

def ask_model(messages: list[dict[str, str]]) -> str:
    response = completion(model="openai/gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

To use another model, replace "openai/gpt-4o-mini" with a model name from the LiteLLM provider docs.

Finally, configure Mem0 with an LLM for extracting memories from messages. This example uses GPT-4o-mini, but Mem0 and LiteLLM do not have to use the same model:

memory = Memory.from_config(
    {
        "llm": {
            "provider": "openai",
            "config": {"model": "gpt-4o-mini"},
        },
    }
)

Build a Stateless Support Bot

First, create a support bot with no memory. It only sees the current message.

SYSTEM_PROMPT = (
    "You are a helpful e-commerce customer support agent. "
    "Be polite, concise, and practical."
)


def ask_stateless_bot(question: str) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    return ask_model(messages)

The customer opens a support chat about a delayed order. The message also includes an irrelevant detail so we can later check whether Mem0 keeps only the useful support facts:

initial_message = (
    "My order #4821 still has not arrived. "
    "I contacted support last week and got no update. "
    "Also, I was checking this while making coffee this morning." # irrelevant detail
)

print(ask_stateless_bot(initial_message))

Output

I'm sorry to hear that your order #4821 hasn't arrived yet. I understand how frustrating this can be. Let me assist you by checking the status for you. Please give me a moment.

The reply is reasonable for a first message. The problem shows up when the customer returns later.

Test the Stateless Support Bot

Let’s see how the bot responds to a follow-up question without passing the earlier conversation:

follow_up = "Hi, any update on my delayed order?"

print(ask_stateless_bot(follow_up))

Output

I’d be happy to help with your order. Could you please provide me with your order number? This will help me look up the status for you.

The response is polite, but it loses the context that matters most: the order number, the prior support contact, and the customer’s frustration.
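
One manual workaround is to resend the earlier turns with every request. The sketch below shows what that might look like. build_messages_with_history is a hypothetical helper, and the history list is abbreviated sample data rather than the exact transcript above:

```python
SYSTEM_PROMPT = (
    "You are a helpful e-commerce customer support agent. "
    "Be polite, concise, and practical."
)


def build_messages_with_history(
    history: list[dict[str, str]], question: str
) -> list[dict[str, str]]:
    """Prepend earlier turns so the model sees them again on this request."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        *history,
        {"role": "user", "content": question},
    ]


# Earlier turns the application would have to keep and resend itself.
history = [
    {"role": "user", "content": "My order #4821 still has not arrived."},
    {"role": "assistant", "content": "I'm sorry to hear that. Let me check the status."},
]

messages = build_messages_with_history(history, "Hi, any update on my delayed order?")
print(len(messages))
```

This works only while the application still holds the transcript, and the prompt grows with every turn, including details like the coffee remark that do not help the next reply.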

Store Support Interactions in Mem0

To preserve the missing order context, add Mem0 as a memory layer. Store the customer message under a stable customer_id so it can be retrieved in later interactions.

customer_id = "cust_4821"
order_id = "4821"
issue_type = "shipping"
sentiment = "frustrated"

memory_extraction_prompt = (
    "Extract only useful customer-support facts. Ignore unimportant information."
)

add_result = memory.add(
    messages=[{"role": "user", "content": initial_message}],
    user_id=customer_id,
    metadata={
        "issue_type": issue_type,
        "sentiment": sentiment,
        "order_id": order_id,
    },
    prompt=memory_extraction_prompt,
)

print(add_result)

Output

{
    "results": [
        {
            "id": "8d4cf548-683c-49cf-8d68-44d431155741",
            "memory": (
                "User's order #4821 has not arrived as of June 3, 2026, "
                "and they contacted support last week but received no update."
            ),
            "event": "ADD",
        }
    ]
}

Note that the saved memory keeps the support-relevant details: the order number, the delayed delivery, and the prior support contact. It does not include the unrelated coffee detail from the original message.

Raw customer message
   |
   v
LLM-based extraction step
   |
   +--> Save: order #4821 has not arrived
   +--> Save: customer contacted support last week
   +--> Drop: customer was making coffee

This has a few practical benefits:

Cleaner retrieval: Future searches return useful support context instead of unrelated conversation details.
Lower API cost: Passing compact memories instead of noisy chat history can reduce the number of tokens sent to the model.
Better privacy boundaries: Incidental personal details are less likely to be retained when they are not needed.
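
To get a feel for the size difference, the sketch below compares the raw message with the extracted memory using a whitespace word count. Word counts are only a rough proxy for tokens, and rough_word_count is a hypothetical helper; a real estimate would use the tokenizer for your model:

```python
raw_message = (
    "My order #4821 still has not arrived. "
    "I contacted support last week and got no update. "
    "Also, I was checking this while making coffee this morning."
)
extracted_memory = (
    "User's order #4821 has not arrived and they contacted support "
    "last week but received no update."
)


def rough_word_count(text: str) -> int:
    """Count whitespace-separated words as a crude stand-in for tokens."""
    return len(text.split())


raw_words = rough_word_count(raw_message)
memory_words = rough_word_count(extracted_memory)

print(f"Raw message: {raw_words} words")
print(f"Extracted memory: {memory_words} words")
```

For a single short message the gap is small. It tends to matter more when the alternative is resending a long chat history on every request.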

If you do not want Mem0 to infer or rewrite memories, pass infer=False to memory.add(). In that mode, Mem0 stores the provided message as-is instead of extracting selected facts from it.

Retrieve Memories Before Responding

Before answering a new message, the bot needs the customer's stored context. Define a helper that searches Mem0 for memories related to the current question and restricts the search to the current customer_id:

def retrieve_customer_memories(question: str, customer_id: str) -> dict:
    return memory.search(
        query=question,
        filters={"user_id": customer_id},
        top_k=3,
    )


for item in retrieve_customer_memories(follow_up, customer_id)["results"]:
    print(item["memory"])

Output

User's order #4821 has not arrived and they contacted support last week but received no update.

Next, convert the dictionary returned by Mem0 into plain text so it can be inserted into the system prompt:

def format_memories(search_result: dict) -> str:
    memories = search_result["results"]
    if not memories:
        return "No relevant memories found."

    return "\n".join(f"- {item['memory']}" for item in memories)


print(format_memories(retrieve_customer_memories(follow_up, customer_id)))

Output

- User's order #4821 has not arrived and they contacted support last week but received no update.
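
You can check the formatter without calling Mem0 by passing it hand-built dictionaries. The sample inputs below are made up for this check and only mimic the "results" and "memory" keys that appear in the outputs above:

```python
def format_memories(search_result: dict) -> str:
    memories = search_result["results"]
    if not memories:
        return "No relevant memories found."

    return "\n".join(f"- {item['memory']}" for item in memories)


# Hand-built inputs shaped like the search output shown earlier.
empty_result = {"results": []}
two_memories = {
    "results": [
        {"memory": "Order #4821 has not arrived."},
        {"memory": "Customer contacted support last week."},
    ]
}

print(format_memories(empty_result))
print(format_memories(two_memories))
```

The empty case matters because a new customer has no stored memories, and the prompt should still read sensibly.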

Compare Stateless vs Memory-Aware Responses

Now we are ready to create a memory-aware support bot. The function below does three things:

Retrieves relevant customer memories from Mem0.
Adds those memories to the system prompt.
Sends the updated prompt to the model.

def ask_memory_aware_bot(question: str, customer_id: str) -> str:
    customer_memories = format_memories(
        retrieve_customer_memories(question, customer_id)
    )

    messages = [
        {
            "role": "system",
            "content": (
                f"{SYSTEM_PROMPT}\n\n"
                f"Relevant customer memories:\n{customer_memories}"
            ),
        },
        {"role": "user", "content": question},
    ]
    return ask_model(messages)

Ask the same follow-up again:

print(ask_memory_aware_bot(follow_up, customer_id))

Output

Thanks for checking back on order #4821. I can see it is still delayed and that you contacted us last week without an update. I am escalating this with shipping now and will email you a new delivery estimate today.

This response now uses the stored memory. It remembers the order number, the earlier support contact, and the unresolved shipping issue without asking the customer to repeat them.
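
When a memory-aware reply looks wrong, it helps to see the exact prompt the model received. The sketch below separates prompt assembly from the model call so you can print the messages first. build_memory_aware_messages is a hypothetical refactor of the function above, not a Mem0 or LiteLLM API:

```python
SYSTEM_PROMPT = (
    "You are a helpful e-commerce customer support agent. "
    "Be polite, concise, and practical."
)


def build_memory_aware_messages(
    question: str, customer_memories: str
) -> list[dict[str, str]]:
    """Assemble the chat messages without sending them to a model."""
    return [
        {
            "role": "system",
            "content": (
                f"{SYSTEM_PROMPT}\n\n"
                f"Relevant customer memories:\n{customer_memories}"
            ),
        },
        {"role": "user", "content": question},
    ]


messages = build_memory_aware_messages(
    "Hi, any update on my delayed order?",
    "- User's order #4821 has not arrived and they contacted support "
    "last week but received no update.",
)

# Inspect what the model would see before spending an API call.
print(messages[0]["content"])
```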

Add a Status Update

Next, add a new message that changes the order status. This lets us see how the memory-aware bot handles newer information:

status_update_result = memory.add(
    messages=[{"role": "user", "content": "Order #4821 arrived today."}],
    user_id=customer_id,
    metadata={"order_id": order_id, "issue_type": issue_type},
)

print(status_update_result)

Output

{
    "results": [
        {
            "id": "1fb682a4-9da2-4a7b-a174-9d77c015c1d1",
            "memory": (
                "User's order #4821 arrived on June 3, 2026, after "
                "previously not arriving and contacting support for an update."
            ),
            "event": "ADD",
        }
    ]
}

Mem0 stores this as a new memory (the event is "ADD") instead of overwriting the earlier one. Ask the same follow-up again:

print(ask_memory_aware_bot(follow_up, customer_id))

Output

Hello! I see that your order #4821 has actually arrived on June 3, 2026. If you need any further assistance or have questions about your order, feel free to let me know!

The answer changes because the model now receives both the earlier delay and the newer arrival status in the retrieved memories.
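
Because both memories reach the model, it has to work out which one is current. One way to make that easier is to list memories oldest first and show when each was recorded. The sketch below assumes each result carries a "created_at" field holding an ISO 8601 timestamp; check the results returned by your Mem0 version before relying on it. The sample data and format_memories_oldest_first are hypothetical:

```python
from datetime import datetime


def format_memories_oldest_first(search_result: dict) -> str:
    """Format memories in chronological order with their timestamps."""
    memories = sorted(
        search_result["results"],
        # Assumption: "created_at" is an ISO 8601 string on every result.
        key=lambda item: datetime.fromisoformat(item["created_at"]),
    )
    if not memories:
        return "No relevant memories found."

    return "\n".join(
        f"- ({item['created_at'][:16]}) {item['memory']}" for item in memories
    )


# Made-up results, deliberately listed newest first.
sample_result = {
    "results": [
        {
            "memory": "Order #4821 arrived.",
            "created_at": "2026-06-03T15:30:00+00:00",
        },
        {
            "memory": "Order #4821 has not arrived.",
            "created_at": "2026-06-03T09:00:00+00:00",
        },
    ]
}

print(format_memories_oldest_first(sample_result))
```

You could pair this with a line in the system prompt telling the model to prefer the most recent memory when two conflict.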

Manage Stored Memories

When memory affects model responses, you need a way to inspect what was stored. Use get_all() to see the memories Mem0 has saved for a customer:

all_memories = memory.get_all(filters={"user_id": customer_id})

for item in all_memories["results"]:
    print(item["id"], "-", item["memory"])

Output

8d4cf548-683c-49cf-8d68-44d431155741 - User's order #4821 has not arrived and they contacted support last week but received no update.
1fb682a4-9da2-4a7b-a174-9d77c015c1d1 - User's order #4821 arrived on June 3, 2026, after previously not arriving and contacting support for an update.

If a memory is wrong or a ticket is resolved, delete it by ID:

memory_id = all_memories["results"][0]["id"]
memory.delete(memory_id)

Output

{'message': 'Memory deleted successfully!'}

You can also delete all memories for a user at once using delete_all(). Be careful because this permanently removes every stored memory for that user:

# Irreversible: deletes all memories for this user
memory.delete_all(user_id=customer_id)

Output

{'message': 'Memories deleted successfully!'}
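
Since delete_all() cannot be undone, you may want a small guard around it in application code. The sketch below is one option. delete_all_for_user and its confirm flag are hypothetical, and FakeMemory is a stand-in that only imitates the delete_all(user_id=...) call used above so the example runs without Mem0:

```python
class FakeMemory:
    """Stand-in for a Mem0 Memory object so the sketch runs offline."""

    def __init__(self) -> None:
        self.deleted_users: list[str] = []

    def delete_all(self, user_id: str) -> dict:
        self.deleted_users.append(user_id)
        return {"message": "Memories deleted successfully!"}


def delete_all_for_user(memory, user_id: str, confirm: bool = False) -> dict:
    """Call memory.delete_all only when the caller explicitly confirms."""
    if not user_id:
        raise ValueError("user_id must be a non-empty string")
    if not confirm:
        raise RuntimeError("Refusing to delete: pass confirm=True to proceed")
    return memory.delete_all(user_id=user_id)


fake = FakeMemory()

# Without confirm=True, nothing is deleted.
try:
    delete_all_for_user(fake, "cust_4821")
except RuntimeError as error:
    print(error)

print(delete_all_for_user(fake, "cust_4821", confirm=True))
```

In real code you would pass the memory object created in Setup in place of FakeMemory.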

Final Thoughts

Stateless LLM applications are easy to build, but they only know what is included in the current prompt.

Mem0 adds a memory layer for applications that need continuity across sessions. The same pattern can support customer support bots, personal assistants, tutoring apps, healthcare intake tools, CRM workflows, and onboarding assistants.

With Mem0, applications can save important details from past interactions and retrieve the most relevant context for future responses.

Use memory when you don’t want returning users to have to repeat important details. Skip it when each request is self-contained or when storing user history adds unnecessary complexity.
