Generic selectors
Exact matches only
Search in title
Search in content
Post Type Selectors
Filter by Categories
About Article
Analyze Data
Archive
Best Practices
Better Outputs
Blog
Code Optimization
Code Quality
Command Line
Course
Daily tips
Dashboard
Data Analysis & Manipulation
Data Engineer
Data Visualization
DataFrame
Delta Lake
DevOps
DuckDB
Environment Management
Feature Engineer
Git
Jupyter Notebook
LLM
LLM Tools
Machine Learning
Machine Learning & AI
Machine Learning Tools
Manage Data
MLOps
Natural Language Processing
Newsletter Archive
NumPy
Pandas
Polars
PySpark
Python Helpers
Python Tips
Python Utilities
Scrape Data
SQL
Testing
Time Series
Tools
Visualization
Visualization & Reporting
Workflow & Automation
Workflow Automation

Newsletter #240: Auto-Summarize Chat History with LangChain Middleware

Newsletter #240: Auto-Summarize Chat History with LangChain Middleware


๐Ÿ“… Today’s Picks

Auto-Summarize Chat History with LangChain Middleware

Code example: Auto-Summarize Chat History with LangChain Middleware

Problem

Long chat histories can quickly increase token usage, leading to higher API costs and slower responses.

Solution

LangChain v1.0 introduces SummarizationMiddleware that automatically condenses older messages when token thresholds are exceeded.

Key features:

  • Integrates into existing LangChain agents with minimal code changes
  • Automatic summarization when token limits are reached
  • Preserves recent context with configurable message retention
  • Uses efficient models for summarization (e.g., gpt-4o-mini)

Batch Process DataFrames with PySpark Pandas UDF Vectorization

Code example: Batch Process DataFrames with PySpark Pandas UDF Vectorization

Problem

Traditional UDFs (User-Defined Functions) run your custom Python function on each row individually, which can significantly slow down DataFrame operations.

Solution

Pandas UDFs solve this by batching data into chunks and applying vectorized pandas transformations across entire columns, rather than looping through rows.

As a result, they can be 10 to 100 times faster on large DataFrames.


โ˜•๏ธ Weekly Finds

lifelines [ML] – Survival analysis in Python with Kaplan Meier, Cox regression, and parametric models

nb-clean [Python Utils] – Clean Jupyter notebooks for version control by removing outputs, metadata, and execution counts

FuzzTypes [Python Utils] – Pydantic extension for autocorrecting field values using fuzzy string matching

Looking for a specific tool? Explore 70+ Python tools โ†’

Stay Current with CodeCut

Actionable Python tips, curated for busy data pros. Skim in under 2 minutes, three times a week.

Leave a Comment

Your email address will not be published. Required fields are marked *

0
    0
    Your Cart
    Your cart is empty
    Scroll to Top

    Work with Khuyen Tran

    Work with Khuyen Tran