📅 Today’s Picks
LangChain: Smart Text Chunking Without Breaking Context
Problem:
RAG (Retrieval-Augmented Generation) applications require splitting documents into smaller chunks for processing.
However, naive fixed-size splitting can cut sentences and paragraphs mid-thought, which makes your embeddings less effective for retrieval.
Solution:
LangChain’s RecursiveCharacterTextSplitter keeps paragraphs and sentences together where possible, so your document chunks retain more meaning and context for better RAG performance.
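It splits text recursively, trying each separator in order and falling back to the next one only when a piece is still larger than the chunk size. The default separators are:
- Double newlines (paragraphs)
- Single newlines
- Spaces
- Individual characters (as last resort)
Periods are not in the default list, but a sentence separator such as ". " can be added through the separators parameter.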
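The fallback order can be sketched in plain Python. This is a simplified illustration of the idea, not LangChain's actual implementation: the function name `recursive_split` and the `SEPARATORS` tuple are hypothetical, the sentence separator `". "` is an illustrative choice, and the sketch skips overlap and drops the separator wherever a chunk boundary falls.

```python
# Hypothetical separator order for this sketch, coarsest first.
SEPARATORS = ("\n\n", "\n", ". ", " ", "")


def recursive_split(text, chunk_size, separators=SEPARATORS):
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: cut into fixed-size character windows.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Keep packing pieces into the current chunk while they fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too large: retry with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = (
    "First paragraph here.\n\n"
    "Second paragraph is a bit longer than the first.\n\n"
    "Third."
)
chunks = recursive_split(text, chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

The short paragraphs stay whole, and only the oversized second paragraph is broken further, at a word boundary rather than mid-word.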
RecursiveCharacterTextSplitter also lets you tune the chunk size (chunk_size) and the overlap between consecutive chunks (chunk_overlap) for your use case.
Altair: Multi-Chart Filtering in Pure Python
Problem:
Static, standalone charts cannot show how different views of the same data relate to each other.
Traditional dashboards require complex backend infrastructure for interactive filtering.
Solution:
Altair’s linked plots enable interactive selections that dynamically filter multiple connected visualizations.
Other features of Altair:
- Declarative syntax that makes visualization intuitive
- Built-in data transformations and aggregations
- Seamless chart composition and layering
☕️ Weekly Finds
Boruta-Shap
ML
A tree-based feature selection tool that combines the Boruta algorithm with Shapley values for interpretable feature importance
py-roughviz
Data Viz
A Python visualization library for creating sketchy, hand-drawn-style charts that look fun and catchy compared to standard matplotlib graphs
prek
Python Utils
Better pre-commit re-engineered in Rust – automatically installs required Python versions and creates virtual environments with no hassle