Introduction
Have you ever asked a language model about breaking news or stock prices and gotten outdated or misleading answers?
For example:
Prompt:
“What did the Federal Reserve announce about interest rates this morning?”
Possible Hallucinated Answer from a Traditional Model:
“This morning, the Federal Reserve announced a 0.5% increase in interest rates to combat inflation, citing strong job growth and rising consumer prices.”
This happens because traditional language models operate with static training data that becomes stale within months. They often sound confident, but there is no way to verify their claims. When you’re building applications that need accurate, up-to-date information with verifiable sources, these limitations aren’t just annoying—they’re unacceptable.
Diffbot LLM takes a different approach by connecting language generation to a massive, continuously updated knowledge graph containing over 1 trillion facts. Instead of relying solely on training data, it retrieves real-time information and provides citations for every claim it makes. Powered by GraphRAG, it keeps answers current and verifiable.
In this tutorial, you’ll learn how to set up Diffbot LLM, understand how its knowledge graph powers more reliable responses, and build applications that need trustworthy, cited AI interactions.
Understanding the Architecture and Performance
To understand how Diffbot LLM delivers those reliable, cited responses, we need to examine the architectural philosophy behind its design and the performance metrics that validate this approach.
The GraphRAG philosophy
Diffbot’s approach represents a fundamental shift in AI architecture: instead of building ever-larger models that memorize more facts, they built smaller models that excel at finding and using external knowledge. This design challenges the “bigger is better” mindset, showing that a 70B model with live data can outperform larger models on factual tasks.
This architecture recognizes that facts change constantly while reasoning abilities remain stable. Rather than spending compute on memorizing Wikipedia, Diffbot fine-tuned models to become expert users of APIs, search engines, and code interpreters.
For example, instead of memorizing the answer to a question like “Who is the CEO of Pfizer?”, Diffbot trains its model to query the Diffbot Knowledge Graph API in real time, using a query like `type:Organization name:"Pfizer"`.
Here’s a simplified version of the API’s response:
{
  "data": [
    {
      "ceo": {
        "summary": "Greek-American pharmaceutical executive",
        "image": "https://kg.diffbot.com/image/api/get?fetch=yes&url=g%3Cj7guSXzAoBWu.x0KwLZrUn.%5B%3CR0Aa4Hwygr9m6W%3Exl5G%60BkxmPAP1_w%7B%3C%3AVlDZ.Bv%3E",
        "types": ["Person"],
        "name": "Albert Bourla",
        "diffbotUri": "http://diffbot.com/entity/EHPJc2wuRMGGvvptCHJ8jyg",
        "targetDiffbotId": "EHPJc2wuRMGGvvptCHJ8jyg",
        "type": "Person"
      }
    }
  ]
}
The model reads the current CEO from the `data.ceo.name` field of the response (here, Albert Bourla) and includes a citation link to the original source.
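To make this concrete, here is a minimal sketch of such a lookup in Python (it additionally requires the requests package). The endpoint path and parameter names are assumptions based on Diffbot’s Knowledge Graph documentation, and the token is read from the DIFFBOT_API_TOKEN environment variable we set up later in this tutorial:
#| eval: false
import os
import requests

# Assumed DQL endpoint and parameters; consult Diffbot's KG docs for specifics
resp = requests.get(
    "https://kg.diffbot.com/kg/v3/dql",
    params={
        "token": os.environ["DIFFBOT_API_TOKEN"],
        "query": 'type:Organization name:"Pfizer"',
        "size": 1,  # return only the best match
    },
    timeout=30,
)

# Read the CEO's name from the (simplified) response shape shown above
print(resp.json()["data"][0]["ceo"]["name"])  # e.g., "Albert Bourla"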
The result is a system that stays current without expensive retraining cycles and provides transparency that pure neural approaches cannot match.
Core components
Diffbot’s system combines several components working together. At its core, the Knowledge Graph contains over 10 billion entities and their relationships, continuously ingesting new information from millions of web pages since 2016. This creates a living map of factual knowledge spanning organizations, people, products, and events.
The system’s architecture includes:
- Knowledge Graph – 10+ billion entities with 1+ trillion facts, updated every 4-5 days
- Fine-tuned models – 8B parameter and 70B parameter versions that can run on a single A100 or dual H100 GPUs, respectively
- Real-time web search – Direct access to current web pages
- Code interpreter – Executes JavaScript to compute answers directly, ensuring accuracy and transparency without relying on guesswork
- Multimodal capabilities – Understands and analyzes images through built-in visual reasoning tools
Rather than memorizing facts, Diffbot fine-tuned the model to become an expert tool user, providing accurate results with full transparency.
Performance validation
Diffbot LLM’s benchmark performance validates its architectural advantages:
- FreshQA – 81% accuracy on real-time questions (highest among tested systems)
- MMLU-Pro – 70.36% on academic knowledge (best open-source under 100B parameters)
- SimpleQA – Outperformed all tested models, including Perplexity Sonar Pro and Gemini-2.0-flash
- Citation accuracy – Every claim linked to verifiable sources
The improvement comes from tool use rather than larger weights, showing external knowledge retrieval can match larger traditional models.
Getting started with Diffbot LLM
With the architectural foundation in place, let’s walk through how to start using Diffbot LLM in practice.
Diffy – the web UI
The fastest way to experience Diffbot LLM is through Diffy.chat, the web interface that showcases the system’s capabilities without requiring any setup. You can immediately test real-time information retrieval, citations, and multimodal requests to understand what makes Diffbot LLM different.
Key features available through the web UI:
- Real-time web URL extraction – Summarize any webpage with proper attribution
- Knowledge graph querying – Access structured facts from Diffbot’s trillion-fact database
- Image analysis – Upload images for visual understanding and interpretation
- JavaScript code interpreter – Get precise calculations and data processing
- Citation tracking – See exactly where every fact comes from with clickable sources
The citation panel shows sources for each claim, giving you immediate insight into the transparency that sets Diffbot apart from traditional LLMs.
Installation and setup
Getting started with the Diffbot LLM API requires just a few steps. Sign up for a free developer account at app.diffbot.com/get-started to obtain your API token. The free tier provides sufficient credits for testing, with higher limits available for production use.
#| eval: false
pip install openai python-dotenv
touch .env # Create a .env file
echo "DIFFBOT_API_TOKEN=your-token-here" >> .env # Add your token to a .env file
The API follows OpenAI’s interface exactly, so you can integrate Diffbot LLM by simply changing the base URL and API key. This compatibility means no code restructuring – your current OpenAI integration will work immediately with Diffbot’s GraphRAG capabilities.
Your first query with citations
Here’s a simple example demonstrating how Diffbot LLM returns responses with full citations:
#| eval: false
import os
from openai import OpenAI
from dotenv import load_dotenv
load_dotenv()
DIFFBOT_API_TOKEN = os.getenv("DIFFBOT_API_TOKEN")
diffbot_client = OpenAI(
base_url="https://llm.diffbot.com/rag/v1",
api_key=DIFFBOT_API_TOKEN,
)
completion = diffbot_client.chat.completions.create(
model="diffbot-small-xl",
messages=[{"role": "user", "content": "What is GraphRAG?"}],
)
In this code, we:
- Initialize the OpenAI client with Diffbot’s base URL for the RAG endpoint.
- Send a chat-style request to the diffbot-small-xl model with the user message “What is GraphRAG?”
Here are the first few lines of the response:
#| eval: false
print(completion.choices[0].message.content[:1000])
**GraphRAG** stands for **Graph-based Retrieval Augmented Generation**, a method that integrates **knowledge graphs** with traditional **Retrieval Augmented Generation (RAG)** approaches to enhance the effectiveness and contextuality of AI responses.
### Understanding GraphRAG
GraphRAG improves upon traditional RAG models by organizing data into a structured knowledge graph, which allows for more precise and context-aware results. Traditional RAG methods rely on semantic searches of unstructured text snippets, whereas GraphRAG leverages a hierarchical and community-based structure within the knowledge graph to facilitate complex queries and relationships ([Analytics Vidhya](https://www.analyticsvidhya.com/blog/2024/11/graphrag/#h-what-is-graphrag)).
#### How Does GraphRAG Work?
The **GraphRAG indexing package** is a data pipeline and transformation suite designed to extract structured data from unorganized text using **Large Language Models (LLMs)**. The standard pipeline includes
The response follows OpenAI’s familiar structure while citing a credible source, Analytics Vidhya. This makes it easy to swap Diffbot into any existing OpenAI-based setup when your application requires real-time information and verifiable citations from a knowledge graph.
Key Features and Capabilities
To understand Diffbot’s capabilities, let’s compare its outputs with those of an OpenAI model (o4-mini). To do that, we’ll first define a few helper functions: one to query Diffbot, one to query OpenAI, and one to print a model’s response:
#| eval: false
def query_model(query_text, model, base_url=None, api_key=None):
    client_args = {}
    if base_url:
        client_args["base_url"] = base_url
    if api_key:
        client_args["api_key"] = api_key
    client = OpenAI(**client_args)
    return client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query_text}],
    )

def query_diffbot(query_text, model="diffbot-small-xl"):
    return query_model(
        query_text,
        model=model,
        base_url="https://llm.diffbot.com/rag/v1",
        api_key=DIFFBOT_API_TOKEN,
    )

def query_openai(query_text, model="o4-mini"):
    return query_model(query_text, model=model)

def print_response(response):
    print(response.choices[0].message.content)
In the following sections, we’ll use these helper functions to ask questions and compare how each model responds.
1. Real-time knowledge retrieval
Let’s ask each model about the weather in Tokyo.
OpenAI:
#| eval: false
openai_completion_1 = query_openai("What is the weather in Tokyo?")
print_response(openai_completion_1)
1. Visit a weather website or app
• Weather.com / The Weather Channel
• AccuWeather
• Japan Meteorological Agency (JMA): https://www.jma.go.jp
2. Ask a voice‐assistant (Siri, Google Assistant, Alexa, etc.)
...
The OpenAI model doesn’t provide the weather directly. Instead, it suggests where you can find it, reflecting its lack of real-time access.
Diffbot:
#| eval: false
diffbot_completion_1 = query_diffbot("What is the weather in Tokyo?")
print_response(diffbot_completion_1)
### 🌤️ Tokyo, Japan
*May 28-31, 2025*
Wed ☁️ | 62-75°F ███████████████ | 💨 5 mph | 💧 14%
Thu 🌧️ | 62-75°F ███████████████ | 💨 5 mph | 💧 25%
Fri 🌧️ | 61-58°F ████████████░░ | 💨 10 mph | 💧 94%
Sat ⛈️ | 72-61°F ███████████████ | 💨 8 mph | 💧 93%
*Last updated: 9:13 PM JST ([AccuWeather](https://www.accuweather.com/en/jp/tokyo/226396/weather-forecast/226396)) ([Weather.com](https://weather.com/weather/tenday/l/Minato+ku+Tokyo+Prefecture+Japan?canonicalCityId=89a9a327ec5fd290c4d12f51a20485cb))*
The Diffbot model returns a real-time weather forecast with detailed daily updates and citations. This indicates the model is capable of actively retrieving and integrating up-to-date, verifiable information into its responses.
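Because the citations arrive as inline Markdown links, you can also pull them out programmatically. Here’s a small sketch; it assumes the citations keep the standard [title](url) form shown above:
#| eval: false
import re

def extract_citations(markdown_text):
    """Return (title, url) pairs for every inline Markdown link."""
    return re.findall(r"\[([^\]]+)\]\((https?://[^)]+)\)", markdown_text)

# List the sources cited in the weather response
for title, url in extract_citations(diffbot_completion_1.choices[0].message.content):
    print(f"{title}: {url}")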
2. Code execution
Now, let’s ask a math question that a traditional model without access to a sandboxed coding environment would likely get wrong:
#| eval: false
completion = query_diffbot(
    "Find the square root of 12394890235",
)
print(completion.choices[0].message.content)
Output:
The square root of **12394890235** is approximately **111332.34137033139**.
<details>
<summary>Source</summary>
```javascript
// Input value
var x = 12394890235;
// Logging the input value
console.log("Input value (x):", x);
// Calculating the square root
var squareRoot = Math.sqrt(x);
// Logging the result
console.log("Square root of x:", squareRoot);
// Final answer
console.log("Answer:", squareRoot);
```
Output:
```text
Input value (x): 12394890235
Square root of x: 111332.34137033139
Answer: 111332.34137033139
```
</details>
The computed square root is correct, and the cited source is the exact JavaScript snippet that produced it, so you can verify the calculation yourself.
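You can also double-check the result locally with Python’s standard library, since both JavaScript’s Math.sqrt and Python’s math.sqrt operate on the same 64-bit floats:
#| eval: false
import math

# Same IEEE 754 double-precision result as the JavaScript snippet above
print(math.sqrt(12394890235))  # 111332.34137033139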
3. Image analysis
Now, let’s ask multimodal questions that involve images. We’ll ask each model to describe CodeCut’s banner, using a URL whose filename reveals little about the image’s content:
OpenAI:
#| eval: false
_image_url = "https://codecut.ai/wp-content/uploads/2025/05/codecut-home-image.png"
openai_completion_5 = query_openai(f"Describe this image to me: {_image_url}")
print(openai_completion_5.choices[0].message.content)
Without access to the URL, the OpenAI model produces a fabricated description, while Diffbot is accurate:
Diffbot:
#| eval: false
_image_url = "https://codecut.ai/wp-content/uploads/2025/05/codecut-home-image.png"
diffbot_completion_5 = query_diffbot(f"Describe this image to me: {_image_url}")
print(diffbot_completion_5.choices[0].message.content)
This image features a modern laptop computer prominently in the center. The laptop is open to a coding interface, indicating its use for programming or software development purposes. The background of the image is clean and uncluttered, emphasizing the focus on the laptop and its digital content. The image has a resolution of 1200x1000 pixels.
This example demonstrates Diffbot’s multimodal capabilities.
Self-hosting for privacy
If your use case involves high-stakes, sensitive information such as financial or medical data, you can get all the benefits of the serverless API locally by running a couple of Docker commands.
For the 8B model, which has a much smaller disk footprint:
docker run --runtime nvidia --gpus all -p 8001:8001 --ipc=host -e VLLM_OPTIONS="--model diffbot/Llama-3.1-Diffbot-Small-2412 --served-model-name diffbot-small --enable-prefix-caching" docker.io/diffbot/diffbot-llm-inference:latest
For the larger 70B model with full capabilities:
docker run --runtime nvidia --gpus all -p 8001:8001 --ipc=host -e VLLM_OPTIONS="--model diffbot/Llama-3.3-Diffbot-Small-XL-2412 --served-model-name diffbot-small-xl --enable-prefix-caching --quantization fp8 --tensor-parallel-size 2" docker.io/diffbot/diffbot-llm-inference:latest
Once the application starts up successfully, you will see a message like the following:
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
You can then run all the examples above by replacing the base URL with the local endpoint http://localhost:8001/rag/v1.
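For example, the client from the earlier examples only needs its base URL swapped; the API key below is a placeholder, on the assumption that a locally hosted server does not validate it:
#| eval: false
from openai import OpenAI

local_client = OpenAI(
    base_url="http://localhost:8001/rag/v1",
    api_key="not-needed-locally",  # placeholder; assumes the local server skips auth
)

completion = local_client.chat.completions.create(
    model="diffbot-small-xl",  # or "diffbot-small" for the 8B model
    messages=[{"role": "user", "content": "What is GraphRAG?"}],
)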
However, note that these models require high-end GPUs such as A100s and H100s to run at full precision. If you don’t have the hardware, consider RunPod.io, which costs roughly:
- $5.98/hr for dual H100 GPU setup (total 160 GB VRAM)
- $1.89/hr for a single A100 GPU setup (80 GB VRAM)
If you want another example of running LLMs privately, consider reading our article Run Private AI Workflows with LangChain and Ollama for a different approach.
Building Real-World Applications with Diffbot and LangChain
While the basic API integration shows Diffbot LLM’s capabilities, combining it with LangChain unlocks the full potential for building production-ready applications that require sophisticated workflows and real-time knowledge.
LangChain + Diffbot basics
Before building complex applications, you’ll need to install the required LangChain packages and understand how to integrate them with Diffbot’s API. Start by installing the necessary dependencies:
pip install langchain langchain-openai
LangChain provides a familiar interface for working with language models through its `ChatOpenAI` class. Since Diffbot LLM follows OpenAI’s API format, integration requires only changing the base URL and API key:
#| eval: false
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="diffbot-small-xl",
    temperature=0,
    max_tokens=None,
    timeout=None,
    base_url="https://llm.diffbot.com/rag/v1",
    api_key=DIFFBOT_API_TOKEN,
)
This setup gives you access to all of LangChain’s features while benefiting from Diffbot’s real-time knowledge retrieval and citation capabilities. You can use the standard message format for simple interactions:
#| eval: false
messages = [
    ("system", "You are a data scientist who writes efficient Python code."),
    ("human", "Given a DataFrame with columns 'product' and 'sales', calculate the total sales for each product."),
]
ai_msg = llm.invoke(messages)
print(ai_msg.content)
For more structured applications, LangChain’s `ChatPromptTemplate` allows you to create reusable prompt templates with variables:
#| eval: false
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are a data scientist who writes efficient {language} code",
        ),
        ("human", "{input}"),
    ]
)

chain = prompt | llm
_result = chain.invoke(
    {
        "language": "SQL",
        "input": "Given a table with columns 'product' and 'sales', calculate the total sales for each product.",
    }
)
print(_result.content)
The pipe operator (`|`) creates a chain that flows data from the prompt through the language model, making it easy to build complex workflows with multiple steps.
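If you prefer plain strings over message objects, you can append an output parser to the chain. Here’s a quick sketch reusing the prompt defined above:
#| eval: false
from langchain_core.output_parsers import StrOutputParser

# The parser extracts .content, so invoke() returns a plain string
str_chain = prompt | llm | StrOutputParser()
print(str_chain.invoke({"language": "Python", "input": "Reverse a list."}))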
If you want to see another example of combining an LLM with LangChain, read our previous article Build Smarter Data Science Workflows with DeepSeek and LangChain.
Building a RAG application with Diffbot and LangChain
Now let’s build a production-level research assistant that combines document analysis with Diffbot’s real-time knowledge. This application will analyze uploaded documents, extract topics, and provide current information about those topics with proper citations.
#| eval: false
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
import json
from typing import List, Dict

class ResearchAssistant:
    def __init__(self, diffbot_api_key: str):
        self.llm = ChatOpenAI(
            model="diffbot-small-xl",
            temperature=0.3,
            base_url="https://llm.diffbot.com/rag/v1",
            api_key=diffbot_api_key,
        )
        self.setup_chains()

    def setup_chains(self):
        # Chain for extracting topics from documents
        self.topic_extraction_prompt = ChatPromptTemplate.from_template("""
        Analyze the following document and extract 3-5 main topics or entities that would benefit
        from current information. Return as a JSON list of topics.

        Document: {document}

        Topics (JSON format):
        """)

        # Chain for researching each topic
        self.research_prompt = ChatPromptTemplate.from_template("""
        Provide comprehensive, current information about: {topic}

        Context from document: {context}

        Include:
        1. Current status and recent developments
        2. Key statistics or data points
        3. Recent news or updates
        4. Relevant industry trends

        Ensure all facts are cited with sources.
        """)

        # Chain for generating the final report
        self.report_prompt = ChatPromptTemplate.from_template("""
        Create a comprehensive research report based on the document analysis and current research.

        Original Document Summary: {document_summary}

        Research Findings: {research_findings}

        Generate a well-structured report that:
        1. Summarizes the original document's main points
        2. Provides current context for each major topic
        3. Identifies any outdated information in the document
        4. Suggests areas for further investigation

        Include proper citations throughout.
        """)
The `ResearchAssistant` class defines three specialized chains, one per stage of the analysis. Each chain has a specific role: topic extraction identifies what to research, research gathering collects current information, and report generation synthesizes everything into a cohesive analysis.
#| eval: false
    def extract_topics(self, document: str) -> List[str]:
        """Extract main topics from the document for research."""
        chain = self.topic_extraction_prompt | self.llm | StrOutputParser()
        try:
            result = chain.invoke({"document": document})
            # Parse the JSON response to get the topic list
            topics = json.loads(result.strip())
            return topics if isinstance(topics, list) else []
        except Exception as e:  # covers json.JSONDecodeError and API errors
            print(f"Error extracting topics: {e}")
            return []

    def research_topic(self, topic: str, context: str) -> str:
        """Research current information about a specific topic."""
        chain = self.research_prompt | self.llm | StrOutputParser()
        return chain.invoke({
            "topic": topic,
            "context": context,
        })

    def generate_report(self, document: str, research_findings: List[Dict]) -> str:
        """Generate a comprehensive report with current information."""
        # Create the document summary
        summary_prompt = ChatPromptTemplate.from_template(
            "Provide a concise summary of this document: {document}"
        )
        summary_chain = summary_prompt | self.llm | StrOutputParser()
        document_summary = summary_chain.invoke({"document": document})

        # Format the research findings
        findings_text = "\n\n".join([
            f"**{finding['topic']}:**\n{finding['research']}"
            for finding in research_findings
        ])

        # Generate the final report
        report_chain = self.report_prompt | self.llm | StrOutputParser()
        return report_chain.invoke({
            "document_summary": document_summary,
            "research_findings": findings_text,
        })
These methods handle the core workflow: `extract_topics` identifies research targets, `research_topic` gathers current information using Diffbot’s knowledge graph, and `generate_report` synthesizes everything into a comprehensive analysis.
#| eval: false
    def analyze_document(self, document: str) -> Dict:
        """Complete document analysis with current research."""
        print("Extracting topics from document...")
        topics = self.extract_topics(document)
        if not topics:
            return {"error": "Could not extract topics from document"}

        print(f"Researching {len(topics)} topics...")
        research_findings = []
        for topic in topics:
            print(f"  - Researching: {topic}")
            research = self.research_topic(topic, document)
            research_findings.append({
                "topic": topic,
                "research": research,
            })

        print("Generating comprehensive report...")
        final_report = self.generate_report(document, research_findings)

        return {
            "topics": topics,
            "research_findings": research_findings,
            "final_report": final_report,
            "status": "completed",
        }

# Usage example
assistant = ResearchAssistant(DIFFBOT_API_TOKEN)

sample_document = """
Artificial Intelligence has made significant progress in natural language processing.
Companies like OpenAI and Google have released powerful language models.
The field of machine learning continues to evolve with new architectures and techniques.
Investment in AI startups reached $25 billion in 2023.
"""

result = assistant.analyze_document(sample_document)
print(result["final_report"])
This workflow demonstrates how LangChain’s orchestration capabilities combine with Diffbot’s real-time knowledge to create applications that deliver current, cited information.
You can extend this pattern for competitive intelligence, academic research, market analysis, or any application that benefits from combining document analysis with real-time knowledge.
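As a minimal sketch of such an extension, you could run the same assistant over a folder of documents and collect the reports (the directory and file pattern here are hypothetical):
#| eval: false
from pathlib import Path

# Analyze every text file in a (hypothetical) reports/ directory
reports = {
    path.name: assistant.analyze_document(path.read_text())["final_report"]
    for path in Path("reports").glob("*.txt")
}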
Conclusion and Future Outlook
Diffbot LLM breaks from the “bigger is better” mindset by focusing on smarter, more efficient design. Instead of building massive models to store static facts, it empowers smaller models to:
- Find and use external knowledge
- Reduce compute costs
- Stay up to date without retraining
- Provide verifiable outputs
As more organizations require AI systems they can trust and audit, the GraphRAG approach developed by Diffbot LLM may well become the standard for production AI applications where accuracy matters more than creativity.