Newsletter #291: Docling: Turn DOCX Reviewer Feedback into Structured Data

Grab your coffee. Here are this week’s highlights.


📅 Today’s Picks

Narwhals: One Decorator for pandas, Polars, and DuckDB

Problem

Writing a DataFrame function that supports multiple libraries usually means maintaining separate versions of the same logic for each one.

Any change to that logic then has to be applied to every version.

Solution

With the Narwhals @narwhalify decorator, you write the logic once using a unified API.

The function then works with whatever DataFrame type is passed in and returns the same type, reducing friction when switching tools.

How is this different from Ibis? Ibis is built for data scientists switching between SQL backends. Narwhals is built for library authors who need their code to work with any DataFrame type.


Docling: Turn DOCX Reviewer Feedback into Structured Data

Problem

Pulling comments from Word files turns informal feedback into data you can analyze, manage, and act on in code.

Traditionally, this requires parsing raw XML and manually mapping each comment back to its referenced text.
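To give a sense of that manual work, here is a rough sketch using only the Python standard library. It assumes the usual DOCX layout, where comment bodies live in word/comments.xml and the commented text in word/document.xml is wrapped by commentRangeStart and commentRangeEnd markers. The helper names and the tiny in-memory sample are hypothetical, and the sample is a simplified stand-in for a real Word file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_comments(docx_bytes: bytes) -> list[dict]:
    """Pair each comment with the text it is anchored to."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        document = ET.fromstring(archive.read("word/document.xml"))
        comments = ET.fromstring(archive.read("word/comments.xml"))

    anchored = {}      # comment id -> text inside its range markers
    open_ids = set()   # comment ranges currently open while walking the body
    for node in document.iter():  # document order
        if node.tag == f"{{{W}}}commentRangeStart":
            comment_id = node.get(f"{{{W}}}id")
            open_ids.add(comment_id)
            anchored[comment_id] = ""
        elif node.tag == f"{{{W}}}commentRangeEnd":
            open_ids.discard(node.get(f"{{{W}}}id"))
        elif node.tag == f"{{{W}}}t" and node.text:
            for comment_id in open_ids:
                anchored[comment_id] += node.text

    rows = []
    for comment in comments.iter(f"{{{W}}}comment"):
        comment_id = comment.get(f"{{{W}}}id")
        body = "".join(t.text or "" for t in comment.iter(f"{{{W}}}t"))
        rows.append({
            "id": comment_id,
            "author": comment.get(f"{{{W}}}author"),
            "comment": body,
            "anchor": anchored.get(comment_id, ""),
        })
    return rows


def make_sample_docx() -> bytes:
    """Build a stripped-down, DOCX-like archive for demonstration only."""
    document_xml = f'''<w:document xmlns:w="{W}"><w:body><w:p>
      <w:r><w:t>The results are </w:t></w:r>
      <w:commentRangeStart w:id="0"/>
      <w:r><w:t>statistically significant</w:t></w:r>
      <w:commentRangeEnd w:id="0"/>
      <w:r><w:t>.</w:t></w:r>
    </w:p></w:body></w:document>'''
    comments_xml = f'''<w:comments xmlns:w="{W}">
      <w:comment w:id="0" w:author="Reviewer A">
        <w:p><w:r><w:t>Please report the p-value.</w:t></w:r></w:p>
      </w:comment>
    </w:comments>'''
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
        archive.writestr("word/comments.xml", comments_xml)
    return buffer.getvalue()


rows = extract_comments(make_sample_docx())
print(rows)
```

Even this simplified version has to track open comment ranges by hand, and it ignores cases such as documents with no comments part, replies, and resolved status.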

Solution

Docling v2.71.0 simplifies this process. Converted documents now attach a comments field to every text item, making reviewer annotations accessible without manual XML handling.

This opens up workflows that were previously too tedious to automate:

  • Flag unresolved comments before merging document versions
  • Build dashboards tracking reviewer feedback across teams
  • Feed comment data into LLMs for sentiment analysis or summarization

📚 Latest Deep Dives

Portable DataFrames in Python: When to Use Ibis, Narwhals, or Fugue – Write your DataFrame logic once and run it on any backend. Compare Ibis, Narwhals, and Fugue to find the right portability strategy for your Python workflow.


โ˜•๏ธ Weekly Finds

pdfGPT [LLM] – Chat with the contents of your PDF files using GPT capabilities and semantic search with sentence embeddings

SandDance [Data Viz] – Microsoft Research data visualization tool that maps every data row to a visual mark for interactive exploration

trafilatura [Web Scraping] – Python package and CLI for web crawling, scraping, and text extraction with output as CSV, JSON, HTML, or XML

Looking for a specific tool? Explore 70+ Python tools →



Work with Khuyen Tran