Python Tips

Lambda vs Named Functions: When to Use Each

Lambda functions are ideal when a function is used only once and does not require a name. They provide a concise way to define small, one-time-use functions.

For example:

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# use lambda function because it is used only once
even_numbers = filter(lambda num: num % 2 == 0, numbers)

In this example, the lambda function keeps only the even numbers from the list. Since it’s used just once, a lambda function is a suitable choice.

However, if you need to reuse a function in various parts of your code, it’s better to define a named function.

# use named function because it is used multiple times
def is_even(num: int):
    return num % 2 == 0

even_numbers = filter(is_even, numbers)
any(is_even(num) for num in numbers)

Output:

True

In this example, the is_even function is defined by a name and is used multiple times. This approach avoids repeating the same code and makes your code more maintainable.
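One caveat: PEP 8 explicitly discourages binding a lambda to a name — if the function is worth naming, use def. A quick illustration:

```python
# Discouraged: binds a lambda to a name (PEP 8, rule E731 in linters)
is_even = lambda num: num % 2 == 0

# Preferred: def gives the function a real name for tracebacks and docs
def is_even(num: int) -> bool:
    return num % 2 == 0

print(is_even(4))  # True
```

The def version also shows up by name in stack traces, which makes debugging easier.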

Simplify Custom Object Operations with Python Magic Methods

Manually defining arithmetic operations between custom objects can make code less readable and harder to understand.

Python’s special methods, such as __add__, enable natural arithmetic syntax between custom objects, making code more readable and intuitive, and allowing you to focus on program logic.

Let’s consider an example to illustrate the problem. Suppose we have a class Animal with attributes species and weight, and we want to calculate the total weight of two animals.

class Animal:
    def __init__(self, species: str, weight: float):
        self.species = species
        self.weight = weight

lion = Animal("Lion", 200)
tiger = Animal("Tiger", 180)

# Messy calculations
total_weight = lion.weight + tiger.weight
total_weight

Output:

380

In this example, we have to manually access the weight attribute of each Animal object and perform the addition. This approach is not only verbose but also error-prone, as we might accidentally access the wrong attribute or perform the wrong operation.

To enable natural arithmetic syntax, we can implement the __add__ method in the Animal class. This method will be called when we use the + operator between two Animal objects.

class Animal:
    def __init__(self, species: str, weight: float):
        self.species = species
        self.weight = weight

    def __add__(self, other):
        # Combine species names and weights into a new Animal
        return Animal(f"{self.species}+{other.species}", self.weight + other.weight)

lion = Animal("Lion", 200)
tiger = Animal("Tiger", 180)
combined = lion + tiger
combined.weight

Output:

380

By implementing the __add__ method, we can now use the + operator to combine the weights of two Animal objects. This approach is not only more readable but also more intuitive.
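The same pattern extends to other special methods. As a small sketch (the class here mirrors the example above), __eq__ and __lt__ let custom objects support comparison and therefore sorting:

```python
class Animal:
    def __init__(self, species: str, weight: float):
        self.species = species
        self.weight = weight

    # Two animals compare equal if their weights match
    def __eq__(self, other):
        return self.weight == other.weight

    # __lt__ enables <, and lets sorted() order Animal objects
    def __lt__(self, other):
        return self.weight < other.weight

animals = [Animal("Lion", 200), Animal("Tiger", 180), Animal("Bear", 300)]
print([a.species for a in sorted(animals)])  # ['Tiger', 'Lion', 'Bear']
```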

Writing Scalable Data Science Code with the Open-Closed Principle

Use the Open-Closed Principle to create easily extendable classes without modifying existing code. This approach reduces the risk of introducing bugs in existing, tested code and allows for easy addition of new features or functionalities.

Without adhering to the Open-Closed Principle, you might frequently modify existing classes to add new features. Consider the following example of a data visualization class:

class DataVisualizer:
    def visualize(self, data, chart_type):
        if chart_type == "bar":
            self.create_bar_chart(data)
        elif chart_type == "line":
            self.create_line_chart(data)
        elif chart_type == "scatter":
            self.create_scatter_plot(data)

    def create_bar_chart(self, data):
        print("Creating bar chart…")

    def create_line_chart(self, data):
        print("Creating line chart…")

    def create_scatter_plot(self, data):
        print("Creating scatter plot…")

In this example, every time you want to add a new chart type, you need to modify the visualize method and add a new method for the chart type. This violates the Open-Closed Principle and can lead to a growing, unwieldy class.

Instead, you can redesign the class to be open for extension but closed for modification:

from abc import ABC, abstractmethod

class Chart(ABC):
    @abstractmethod
    def create(self, data):
        pass

class BarChart(Chart):
    def create(self, data):
        print("Creating bar chart…")

class LineChart(Chart):
    def create(self, data):
        print("Creating line chart…")

class ScatterPlot(Chart):
    def create(self, data):
        print("Creating scatter plot…")

class DataVisualizer:
    def visualize(self, data, chart):
        chart.create(data)

# Usage
data = [1, 2, 3]  # sample data; any data your charts accept works here
visualizer = DataVisualizer()
visualizer.visualize(data, BarChart())
visualizer.visualize(data, LineChart())

In this improved version:

We define an abstract Chart class with a create method.

Each specific chart type (BarChart, LineChart, ScatterPlot) inherits from Chart and implements its own create method.

The DataVisualizer class now takes a Chart object as a parameter, making it closed for modification.

This design allows for adding new chart types by creating classes that inherit from Chart, without modifying the DataVisualizer class, making the code more modular and easier to test.
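For instance, supporting a hypothetical PieChart takes only a new subclass; the abstract base and DataVisualizer (reproduced here so the sketch is self-contained) stay untouched:

```python
from abc import ABC, abstractmethod

class Chart(ABC):
    @abstractmethod
    def create(self, data):
        pass

# New chart type: added by extension, not by modifying existing classes
class PieChart(Chart):
    def create(self, data):
        print("Creating pie chart…")

class DataVisualizer:
    def visualize(self, data, chart):
        chart.create(data)

# Existing visualizer code works with the new chart type unchanged
visualizer = DataVisualizer()
visualizer.visualize([1, 2, 3], PieChart())
```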

Python Data Handling: Lists or NumPy Arrays?

Python offers two popular data structures for storing collections: built-in lists and NumPy arrays. Understanding their differences is crucial for efficient programming.

Key Differences:

Data Types

Lists: Can mix types

mixed_list = [1, "hello", 3.14, True]

NumPy: Homogeneous

import numpy as np
homogeneous_array = np.array([1, 2, 3, 4])

Performance

Lists: Slower for numerical operations

list_data = list(range(1000000))
%time squared_list = [x**2 for x in list_data]
# Output: CPU times: user 26.4 ms, sys: 7.96 ms, total: 34.3 ms

NumPy: Optimized for numerical computations

np_data = np.arange(1000000)
%time squared_np = np_data**2
# Output: CPU times: user 9 ms, sys: 830 μs, total: 9.83 ms

Functionality

Lists: Basic operations

lst = [1, 2, 3]
lst.append(4)
lst.insert(0, 0)
print(lst) # Output: [0, 1, 2, 3, 4]

NumPy: Advanced mathematical operations and broadcasting

arr = np.array([1, 2, 3])
print(np.sin(arr)) # Output: [0.84147098 0.90929743 0.14112001]
print(arr + np.array([10, 20, 30])) # Output: [11 22 33]

Dimensionality

Lists: Nesting for multi-dimensions

nested_list = [[1, 2], [3, 4], [5, 6]]
print(nested_list[1][0]) # Output: 3

NumPy: Native support for multi-dimensional arrays

matrix = np.array([[1, 2], [3, 4], [5, 6]])
print(matrix.shape) # Output: (3, 2)
print(matrix[:, 1]) # Output: [2 4 6]

When to Use Python Lists:

Storing mixed data types

Frequently changing collection size

Working with small to medium-sized data

General-purpose programming

Example:

user_data = [
    {"name": "Alice", "age": 30, "active": True},
    {"name": "Bob", "age": 25, "active": False}
]

When to Use NumPy Arrays:

Large numerical datasets

Scientific computing and data analysis

Need for advanced mathematical operations

Working with multi-dimensional data

Example:

import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])
mean = np.mean(data)
std_dev = np.std(data)
print(f"Mean: {mean}, Standard Deviation: {std_dev}")

8 Powerful Python List Methods to Supercharge Your Code

Python lists come with a variety of built-in methods that make working with them efficient and convenient. Here’s an overview of some of the most useful list methods:

append(x)

Adds an element to the end of the list.

fruits = ['apple', 'banana']
fruits.append('cherry')
fruits
# ['apple', 'banana', 'cherry']

extend(iterable)

Adds all elements from an iterable to the end of the list.

fruits = ['apple', 'banana']
fruits.extend(['cherry', 'date'])
fruits
# ['apple', 'banana', 'cherry', 'date']

insert(i, x)

Inserts an element at a specified position.

fruits = ['apple', 'banana']
fruits.insert(1, 'cherry')
fruits
# ['apple', 'cherry', 'banana']

remove(x)

Removes the first occurrence of an element.

fruits = ['apple', 'banana', 'cherry', 'banana']
fruits.remove('banana')
fruits
# ['apple', 'cherry', 'banana']

pop([i])

Removes and returns the element at a given position. If no index is specified, it removes and returns the last item.

fruits = ['apple', 'banana', 'cherry']
last = fruits.pop() # last = 'cherry'
second = fruits.pop(1) # second = 'banana'
# fruits is now ['apple']

index(x[, start[, end]])

Returns the index of the first occurrence of an element. Can specify start and end positions for the search.

fruits = ['apple', 'banana', 'cherry', 'banana']
index = fruits.index('banana') # index = 1

count(x)

Returns the number of occurrences of an element in the list.

fruits = ['apple', 'banana', 'cherry', 'banana']
count = fruits.count('banana') # count = 2

reverse()

Reverses the elements of the list in place.

fruits = ['apple', 'banana', 'cherry']
fruits.reverse()
# fruits is now ['cherry', 'banana', 'apple']

Understanding and effectively using these methods can lead to more efficient and readable code.

From Novice to Pro: Transitioning from Lists to Sets in Python

Sets in Python are a powerful tool that can significantly enhance your code’s efficiency. Here’s why sets are so useful:

Uniqueness Guaranteed

Sets automatically eliminate duplicates, ensuring each element appears only once.

unique_numbers = set([1, 2, 2, 3, 3, 3, 4])
unique_numbers

{1, 2, 3, 4}

Lightning-Fast Lookups

Membership testing in sets is extremely fast (O(1) on average).

large_set = set(range(1000000))
999999 in large_set # Nearly instant!

True

Efficient Set Operations

Perform mathematical set operations with ease.

Union: Combine sets to get all unique elements

# Creating a set
fruits = {"apple", "banana", "cherry"}

more_fruits = {"mango", "grape"}
all_fruits = fruits.union(more_fruits)
all_fruits

{'apple', 'banana', 'cherry', 'grape', 'mango'}

Intersection: Find common elements between sets

tropical_fruits = {"banana", "mango", "pineapple"}
common_fruits = all_fruits.intersection(tropical_fruits)
common_fruits

{'banana', 'mango'}

Difference: Find elements in one set but not the other

non_tropical_fruits = all_fruits.difference(tropical_fruits)
non_tropical_fruits

{'apple', 'cherry', 'grape'}

Memory Efficiency

Sets can be more memory-efficient than lists for storing large amounts of unique data.
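A quick, illustrative check with sys.getsizeof: when the data contains many repeats, the deduplicated set’s container can be far smaller than the original list’s (note that getsizeof measures only the container, not the elements it references):

```python
import sys

# A list with heavy duplication: 300,000 elements, only 3 distinct values
data = [1, 2, 3] * 100_000
unique = set(data)

print(sys.getsizeof(data))    # container size of the full 300,000-element list
print(sys.getsizeof(unique))  # container size of the 3-element set
```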

Quick Deduplication

Convert a list to a set and back to quickly remove duplicates.

deduplicated = list(set([1, 2, 2, 3, 3, 3]))  # [1, 2, 3] (order not guaranteed)

Choosing the Right Data Structure in Python

Python offers several built-in data structures that are essential for efficient programming. In this guide, we’ll explore four fundamental structures: tuples, lists, sets, and dictionaries. We’ll discuss their characteristics, use cases, and how to work with them effectively.

Tuples: Immutable Sequences

Tuples are immutable sequences, meaning once created, they cannot be modified.

Key Characteristics:

Ordered

Immutable

Allow duplicates

Support indexing

When to Use Tuples:

When immutability is required (e.g., as dictionary keys)

To ensure data integrity

To return multiple values from a function

Example:

coordinates = (40.7128, -74.0060)
city_info = {
    coordinates: "New York City"
}
print(city_info[coordinates]) # Output: New York City
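Tuples also power Python’s idiom for returning multiple values from a function — the function really returns one tuple, which the caller unpacks. A small sketch (min_max is an illustrative helper):

```python
def min_max(values):
    # Returning two values actually returns a single tuple
    return min(values), max(values)

low, high = min_max([3, 1, 4, 1, 5])  # tuple unpacking
print(low, high)  # 1 5
```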

Lists: Versatile Mutable Sequences

Lists are ordered collections that can store various object types and be modified after creation.

Key Characteristics:

Ordered

Mutable

Allow duplicates

Support indexing

When to Use Lists:

When order matters

When you need to modify the collection

When duplicates are allowed

Example:

stock_prices = [100, 101, 102, 103, 100]
stock_prices.append(99)
print(stock_prices) # Output: [100, 101, 102, 103, 100, 99]

Sets: Unique, Unordered Collections

Sets are unordered collections of unique elements.

Key Characteristics:

Unordered

Mutable

No duplicates

No indexing

When to Use Sets:

To ensure uniqueness

For mathematical set operations

For efficient membership testing

Example:

fruits = {"apple", "banana", "cherry"}
fruits.add("date")
fruits.update(["elderberry", "fig"])
fruits.remove("banana")
print(fruits) # Output: {'apple', 'cherry', 'date', 'elderberry', 'fig'}

Dictionaries: Key-Value Pairs

Dictionaries are collections of key-value pairs.

Key Characteristics:

Ordered (as of Python 3.7+)

Mutable

No duplicate keys

Keys must be immutable

When to Use Dictionaries:

For efficient key-based lookups

To store related data as key-value pairs

Example:

car = {
    "make": "Toyota",
    "model": "Corolla",
    "year": 2020
}
car["color"] = "blue"
car.pop("year")
print(car) # Output: {'make': 'Toyota', 'model': 'Corolla', 'color': 'blue'}

Performance Considerations

Set membership testing is generally faster than list membership testing, especially for large collections.

Dictionary key lookups are very efficient.
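A rough benchmark with timeit illustrates the gap (exact numbers vary by machine; the element is deliberately absent, which forces the list into a full scan):

```python
import timeit

items = list(range(100_000))
as_set = set(items)

# Worst case for the list: -1 is not present, so every element is checked
list_time = timeit.timeit(lambda: -1 in items, number=100)
set_time = timeit.timeit(lambda: -1 in as_set, number=100)

print(f"list: {list_time:.4f}s, set: {set_time:.6f}s")
```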

Copying vs. Referencing

When assigning one variable to another, be aware of whether you’re creating a new reference or a copy:

# Referencing (both variables point to the same object)
list1 = [1, 2, 3]
list2 = list1

# Copying (creates a new object)
list3 = list1.copy()
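The practical difference shows up when you mutate the original:

```python
list1 = [1, 2, 3]
list2 = list1         # reference: both names point to the same object
list3 = list1.copy()  # shallow copy: a new, independent list

list1.append(4)
print(list2)  # [1, 2, 3, 4] — sees the change
print(list3)  # [1, 2, 3] — unaffected
```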

By understanding these data structures and their properties, you can choose the most appropriate one for your specific programming needs, leading to more efficient and readable code.

Embracing Duck Typing for Cleaner, More Adaptable Data Science Code

Duck typing comes from the phrase “If it walks like a duck and quacks like a duck, then it must be a duck.” This concept allows for writing flexible code that works with different object types, as long as they possess the required methods or attributes.

For data scientists, duck typing enables the creation of versatile functions that work seamlessly with various data structures without explicit type checking.

Let’s explore this with a simple example:

import numpy as np
import pandas as pd

class CustomDataFrame:
    def __init__(self, data):
        self.data = data

    def mean(self):
        return np.mean(self.data)

    def std(self):
        return np.std(self.data)

def analyze_data(data):
    print(f"Mean: {data.mean()}")
    print(f"Standard Deviation: {data.std()}")

# These all work, thanks to duck typing
numpy_array = np.array([1, 2, 3, 4, 5])
pandas_series = pd.Series([1, 2, 3, 4, 5])
custom_df = CustomDataFrame([1, 2, 3, 4, 5])

analyze_data(numpy_array)
analyze_data(pandas_series)
analyze_data(custom_df)

Output:

Mean: 3.0
Standard Deviation: 1.4142135623730951
Mean: 3.0
Standard Deviation: 1.5811388300841898
Mean: 3.0
Standard Deviation: 1.4142135623730951

In this example, analyze_data works with NumPy arrays, Pandas Series, and our custom CustomDataFrame class because they all have mean and std methods.

Benefits of Duck Typing in Data Science

Time-saving: You don’t need separate functions for different data types.

Code Cleanliness: You avoid numerous if statements for type checking.

Adaptability: Your code can easily handle new data types.

Consider how the code might look without duck typing:

def analyze_data(data):
    if isinstance(data, np.ndarray):
        mean = np.mean(data)
        std = np.std(data)
    elif isinstance(data, pd.Series):
        mean = data.mean()
        std = data.std()
    elif isinstance(data, CustomDataFrame):
        mean = data.mean()
        std = data.std()
    else:
        raise TypeError("Unsupported data type")

    print(f"Mean: {mean}")
    print(f"Standard Deviation: {std}")

This approach is less flexible and requires modification each time a new data type is introduced, making the code more complex and harder to maintain.
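If you still want a friendly error for unsupported inputs, an idiomatic middle ground is the EAFP style (“easier to ask forgiveness than permission”): attempt the calls and translate the AttributeError, rather than enumerating types. A sketch, without any isinstance checks:

```python
def analyze_data(data):
    try:
        mean, std = data.mean(), data.std()
    except AttributeError as e:
        # Translate the duck-typing failure into a clearer error
        raise TypeError(
            f"{type(data).__name__} must provide mean() and std() methods"
        ) from e
    print(f"Mean: {mean}")
    print(f"Standard Deviation: {std}")

try:
    analyze_data([1, 2, 3])  # plain lists lack mean()/std()
except TypeError as err:
    print(err)  # list must provide mean() and std() methods
```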