Code Optimization

Ruff: The Fast All-in-One Python Code Quality Tool

Git hooks are useful for catching simple issues before you commit code. However, adding a separate hook for each tool to every project can be cumbersome.

Ruff is a Python linter written in Rust that can replace tools such as Flake8, isort, pydocstyle, yesqa, eradicate, pyupgrade, and autoflake.

Ruff also executes 10-100x faster than existing linters.
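As a rough illustration of the kinds of issues these tools target, the hypothetical module below contains three of them. The rule codes in the comments (F401, UP006, ERA001) are quoted from Ruff's rule set as I understand it and are worth checking against its documentation; the `mean` function is an invented example, and whether each rule fires depends on which rules and target Python version are configured.

```python
import os  # never used: Pyflakes-style unused import (F401 in Ruff)
from typing import List  # pyupgrade-style rules prefer the builtin `list` (UP006 in Ruff)


def mean(nums: List[float]) -> float:
    # The next comment is commented-out code, which eradicate-style rules flag (ERA001 in Ruff).
    # total = 0
    return sum(nums) / len(nums)


print(mean([1, 2, 3]))
```

The file still runs as written, which is the point: these are quality issues rather than errors, so they tend to survive until a linter reports them.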

heartrate: Real-time Code Visualization in Python

The Problem: Manual Debugging and Print Statements

Understanding how your Python code executes in real-time and identifying performance bottlenecks can be challenging. Traditional debugging methods and print statements often clutter code and provide an incomplete picture of program flow.

Let’s consider an example of a simple factorial function with manual debugging and print statements.

def factorial(x, depth=0):
    print(f"Calculating factorial({x})")

    if x == 1:
        print(f"Base case: factorial(1) = 1")
        return 1
    else:
        result = x * factorial(x-1, depth + 1)
        print(f"factorial({x}) = {x} * factorial({x-1}) = {result}")
        return result

if __name__ == "__main__":
    num = 5
    result = factorial(num)
    print(f"The factorial of {num} is {factorial(num)}")

Output:

Calculating factorial(5)
Calculating factorial(4)
Calculating factorial(3)
Calculating factorial(2)
Calculating factorial(1)
Base case: factorial(1) = 1
factorial(2) = 2 * factorial(1) = 2
factorial(3) = 3 * factorial(2) = 6
factorial(4) = 4 * factorial(3) = 24
factorial(5) = 5 * factorial(4) = 120
Calculating factorial(5)
Calculating factorial(4)
Calculating factorial(3)
Calculating factorial(2)
Calculating factorial(1)
Base case: factorial(1) = 1
factorial(2) = 2 * factorial(1) = 2
factorial(3) = 3 * factorial(2) = 6
factorial(4) = 4 * factorial(3) = 24
factorial(5) = 5 * factorial(4) = 120
The factorial of 5 is 120

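This approach clutters the code. It also runs the factorial function twice: the final print statement calls factorial(num) again instead of reusing result, so the whole trace is printed a second time.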

The Solution: Heartrate

Heartrate is a Python library that allows you to visualize your code execution in real time through a browser interface. It shows line execution counts, recent activity with color-coded bars, and a live stack trace without modifying your source code.

To use Heartrate, you only need to add two lines of code.

import heartrate
from time import sleep

heartrate.trace(browser=True)

def factorial(x):
    if x == 1:
        sleep(1)
        return 1
    else:
        sleep(1)
        return (x * factorial(x-1))

if __name__ == "__main__":
    num = 5
    print(f"The factorial of {num} is {factorial(num)}")

This will open a browser window displaying the visualization of the code execution.

The visualization consists of the following components:

Line hit counts on the left side

Visual bars showing recent line executions (longer = more hits, lighter = more recent)

Currently executing lines highlighted

By using heartrate, you can gain a deeper understanding of your code’s execution flow and identify performance bottlenecks without cluttering your code with print statements.
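To make the idea of line hit counts concrete, here is a minimal sketch that counts how often each line of a function runs using the standard library's `sys.settrace`. It is only an illustration of the concept, not heartrate's actual implementation; the `_tracer` helper and the `line_hits` counter are names invented for this example.

```python
import sys
from collections import Counter

# Maps "line offset inside factorial" -> number of times that line started executing
line_hits = Counter()


def _tracer(frame, event, arg):
    # Record only 'line' events that occur inside the function under study
    if event == "line" and frame.f_code.co_name == "factorial":
        offset = frame.f_lineno - frame.f_code.co_firstlineno
        line_hits[offset] += 1
    # Returning the tracer keeps line-level tracing active for this frame
    return _tracer


def factorial(x):
    if x == 1:
        return 1
    return x * factorial(x - 1)


sys.settrace(_tracer)
try:
    result = factorial(5)
finally:
    # Always switch tracing off again so later code runs at full speed
    sys.settrace(None)

print(f"factorial(5) = {result}")
for offset, hits in sorted(line_hits.items()):
    print(f"line +{offset}: {hits} hits")
```

For `factorial(5)`, the condition line is hit five times and the base-case return once. A tool like heartrate presents this kind of information live in the browser instead of printing it at the end.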

Link to heartrate.

eradicate: Remove Junk Comments from Python Files

Outdated or unused code left as comments in Python files can clutter codebases, making them harder to read and maintain.

Eradicate solves this by automatically identifying and removing commented-out code from Python files.

Let’s see eradicate in action:

Example Python file:

# from math import *

def mean(nums: list):
    # print(nums)
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

# nums = [0, 1]
nums = [1, 2, 3]
mean(nums)

Preview changes:

$ eradicate main.py

--- before/main.py
+++ after/main.py
@@ -1,11 +1,8 @@
-# from math import *

 def mean(nums: list):
-    # print(nums)
     # TODO: check if nums is empty
     # Return mean
     return sum(nums) / len(nums)

-# nums = [0, 1]
 nums = [1, 2, 3]
 mean(nums)

Apply changes:

$ eradicate main.py -i

Results:

def mean(nums: list):
    # TODO: check if nums is empty
    # Return mean
    return sum(nums) / len(nums)

nums = [1, 2, 3]
mean(nums)

In this example, eradicate removes:

The commented-out import statement # from math import *

The commented-out debug print statement # print(nums)

The commented-out variable assignment # nums = [0, 1]

However, it preserves the meaningful comments:

The TODO comment # TODO: check if nums is empty

The descriptive comment # Return mean

This cleanup improves the code’s readability by removing distracting, unused code comments while keeping important notes for developers.
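One way to picture how a tool can tell commented-out code from ordinary comments is to strip the leading `#` and check whether the remainder parses as Python. The sketch below does that with the standard `ast` module. It is a simplified heuristic written for this post, not eradicate's actual algorithm, and `looks_like_code` is a hypothetical helper name. It misses multi-line constructs such as a lone `# def foo():` line.

```python
import ast


def looks_like_code(comment: str) -> bool:
    """Return True if the comment body parses as a non-trivial Python statement."""
    text = comment.strip().lstrip("#").strip()
    if not text:
        return False
    try:
        tree = ast.parse(text)
    except SyntaxError:
        # Ordinary prose such as "Return mean" is not valid Python
        return False
    for node in tree.body:
        # A bare word or literal (e.g. "# Note") parses, but is probably prose
        if isinstance(node, ast.Expr) and isinstance(node.value, (ast.Name, ast.Constant)):
            continue
        return True
    return False


comments = [
    "# from math import *",
    "# print(nums)",
    "# TODO: check if nums is empty",
    "# Return mean",
    "# nums = [0, 1]",
]
flags = [looks_like_code(c) for c in comments]
for comment, flag in zip(comments, flags):
    print(f"{comment!r}: {'code' if flag else 'comment'}")
```

On the five comments from the example file, this heuristic marks the import, the print call, and the assignment as code, and leaves the TODO and the descriptive comment alone.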

You can use eradicate with pre-commit by adding the following to your .pre-commit-config.yaml file:

repos:
  - repo: https://github.com/wemake-services/eradicate/
    rev: v2.2.0
    hooks:
      - id: eradicate

Link to eradicate.

Quix Streams: Real-Time Data Processing in Python

Traditional batch processing techniques can be slow when handling large data sets that arrive continuously. In contrast, data streaming is a robust method that enables real-time processing of such data.

Quix Streams is a Python library that enables data streaming by leveraging Streaming DataFrames, which are similar to pandas DataFrames used for batch processing.

This familiar interface allows pandas users to easily build stream processing pipelines with minimal code.
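To show the difference in processing style, the sketch below handles records one at a time as they arrive, using plain Python generators. It deliberately does not use the Quix Streams API. The `sensor_stream` source, the field names, and the threshold are all hypothetical stand-ins for a real message stream.

```python
from typing import Iterable, Iterator


def sensor_stream() -> Iterator[dict]:
    """Hypothetical stand-in for records arriving continuously from a broker."""
    for temp in (21.5, 35.0, 29.9, 40.0):
        yield {"sensor": "s1", "temperature_c": temp}


def hot_readings(records: Iterable[dict], threshold_c: float = 30.0) -> Iterator[dict]:
    """Filter and enrich each record as soon as it arrives, without waiting for a full batch."""
    for record in records:
        if record["temperature_c"] > threshold_c:
            # Add a derived column, much like assigning a new DataFrame column
            yield {**record, "temperature_f": record["temperature_c"] * 9 / 5 + 32}


alerts = list(hot_readings(sensor_stream()))
for alert in alerts:
    print(alert)
```

A streaming DataFrame expresses the same filter-and-transform steps in DataFrame-style syntax, while the library takes care of reading from and writing to the message broker.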

Link to Quix Streams.