Newsletter #307: kotaemon: Self-Hosted Document QA with Citations in One Command

Grab your coffee. Here are this week’s highlights.


📅 Today’s Picks

kotaemon: Self-Hosted Document QA with Citations in One Command

Problem

Building a RAG app for document Q&A usually means assembling a parser, vector database, retrieval pipeline, and UI from scratch.

Each piece has its own setup, and getting everything to work together can take hours of debugging.

Solution

kotaemon packages the entire RAG stack into a single Docker image, letting you skip the setup and go straight to asking questions.

Key features:

  • Citations linked to exact PDF pages for verifiable answers
  • Question answering across multiple documents, including their figures and tables
  • Works with local models or cloud APIs like OpenAI, Azure, and Groq
  • Extensible Gradio-based UI with multi-user document management
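The post does not show kotaemon's internals. The toy sketch below only illustrates the idea behind page-level citations: every chunk keeps its source file and page number through retrieval, so an answer can point back to where its evidence came from. The `Chunk` class, the keyword-overlap scorer, and the sample text are hypothetical stand-ins. A full stack like kotaemon's puts a document parser, a vector database, and an LLM in their place.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # file the text came from (hypothetical sample data)
    page: int    # 1-based PDF page number, kept so answers can cite it
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Toy retriever: rank chunks by shared words, drop ones with no overlap."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokenize(c.text)), reverse=True)
    return [c for c in ranked[:k] if q & tokenize(c.text)]


def cite(chunks: list[Chunk]) -> list[str]:
    """Format each retrieved chunk as a file-and-page citation."""
    return [f"{c.source}, p. {c.page}" for c in chunks]


corpus = [
    Chunk("report.pdf", 3, "Revenue grew 12% in Q2 driven by cloud sales."),
    Chunk("report.pdf", 7, "Headcount stayed flat across all regions."),
    Chunk("memo.pdf", 1, "Cloud sales targets for Q3 were raised."),
]

hits = retrieve("How much did revenue grow in Q2?", corpus)
print(cite(hits))  # ['report.pdf, p. 3']
```

Only the page-3 chunk shares words with the question, so the single citation points to that page. Swapping the keyword scorer for embedding similarity would not change the citation bookkeeping.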

gws: Replace Bulky MCP Tools with 100+ Modular Skill Files

Problem

Connecting AI agents to Google Workspace through MCP often means injecting every tool definition into each request, even if only a couple are needed.

That overhead quickly eats into the token budget, leaving less room for reasoning and task execution.

Solution

gws solves this by replacing bulky tool definitions with 100+ modular SKILL.md files.

Agents load only the skills they need, keeping the context lean and efficient.

Key features:

  • Works with Claude Code, Cursor, Gemini CLI, and other AI agents out of the box
  • 100+ skill files covering Google Docs, Sheets, Drive, Calendar, and more
  • Agents load only relevant skills instead of full tool definitions
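The following sketch illustrates the load-on-demand idea in plain Python. It is not gws itself. Each skill keeps its full instructions in a body, standing in for a SKILL.md file. The agent matches the task against short keyword lists, and only matching skills enter the prompt. The skill names, keywords, and matching rule are hypothetical, and character counts stand in for tokens.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Skill:
    name: str
    keywords: set[str]  # hypothetical trigger words for this skill
    body: str           # full instructions, as a SKILL.md file would hold


# Hypothetical skills; real gws skill files and their contents differ.
SKILLS = [
    Skill("sheets-append", {"sheet", "sheets", "spreadsheet", "row"},
          "Append rows to a Google Sheet. " * 50),
    Skill("calendar-create-event", {"calendar", "event", "meeting"},
          "Create a Google Calendar event. " * 50),
    Skill("drive-search", {"drive", "file", "folder"},
          "Search files in Google Drive. " * 50),
    Skill("docs-create", {"doc", "docs", "document"},
          "Create a Google Doc. " * 50),
]


def select_skills(task: str, skills: list[Skill]) -> list[Skill]:
    """Keep only skills whose keywords appear in the task (naive word match)."""
    words = set(task.lower().replace(",", " ").split())
    return [s for s in skills if s.keywords & words]


def build_context(skills: list[Skill]) -> str:
    """Concatenate the full instructions of the given skills into one prompt."""
    return "\n\n".join(f"## {s.name}\n{s.body}" for s in skills)


task = "Schedule a meeting on my calendar for Friday"
selected = select_skills(task, SKILLS)
full = build_context(SKILLS)     # every definition injected up front
lean = build_context(selected)   # only what this task needs

print([s.name for s in selected])  # ['calendar-create-event']
print(f"all skills: {len(full)} chars, selected: {len(lean)} chars")
```

With the sample task only the calendar skill is loaded, so the prompt carries one body instead of four. A real agent would match on each skill's description rather than on a keyword set, but the saving comes from the same place: unused instructions never enter the context.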

☕️ Weekly Finds

unstructured [RAG] – Turn any document into clean, structured data ready for RAG pipelines and LLM applications.

json_repair [LLM] – Repair malformed JSON from LLMs, APIs, and logs. A drop-in replacement for json.loads() that auto-fixes broken output.
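The sketch below contrasts the standard library with json_repair on a trailing comma, a common slip in LLM output. It assumes the package is installed with `pip install json-repair` and uses its `json_repair.loads` function, which the project presents as a `json.loads` replacement. The import is guarded so the script still runs without the package.

```python
import json

# Typical LLM slip: a trailing comma, which strict JSON forbids.
broken = '{"name": "Alice", "skills": ["python", "sql"],}'


def strict_loads(text):
    """Return the parsed object, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


print(strict_loads(broken))  # None

try:
    import json_repair  # third-party package: pip install json-repair
except ImportError:
    json_repair = None

if json_repair is not None:
    # Same call shape as json.loads, but the input is repaired first.
    print(json_repair.loads(broken))
```

Because the call shape matches `json.loads`, switching an existing parser over is a one-line change.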

MindsDB [AI Agents] – Query AI models directly from your database using SQL. Connect 200+ data sources to LLMs, ML models, and vector operations.

Looking for a specific tool? Explore 70+ Python tools →

Work with Khuyen Tran
