About Article

Setting Up Automated Model Training Workflows with AWS S3

Imagine you run an e-commerce platform and want to make your recommendations more personalized. Your data resides in S3.

To refine recommendations, you plan to retrain recommendation models using fresh customer interaction data whenever a new file is added to S3. But how exactly do you do this?

In this article, you will learn how to set up this workflow using Kestra.
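
The trigger idea described above (retrain whenever a new file appears) can be sketched in plain Python. This is not Kestra and not the article's workflow; it is a minimal polling sketch in which `list_keys` and `retrain` are hypothetical callables standing in for an S3 listing call and a training job, and the file names are made up for illustration.

```python
from typing import Callable, Iterable, List, Set


def find_new_keys(seen: Set[str], current: Iterable[str]) -> List[str]:
    """Return keys in `current` that are not yet in `seen`, in sorted order."""
    return sorted(set(current) - seen)


def check_and_retrain(
    seen: Set[str],
    list_keys: Callable[[], Iterable[str]],
    retrain: Callable[[str], None],
) -> List[str]:
    """Run one polling cycle: retrain once per new key, then remember it."""
    new_keys = find_new_keys(seen, list_keys())
    for key in new_keys:
        retrain(key)   # hypothetical hook: start model training for this file
        seen.add(key)  # remember the key so it is not processed twice
    return new_keys


# Usage with an in-memory list standing in for the bucket listing (assumption).
bucket = ["interactions/2024-01-01.csv"]
trained_on: List[str] = []
seen_keys: Set[str] = set()

check_and_retrain(seen_keys, lambda: bucket, trained_on.append)
bucket.append("interactions/2024-01-02.csv")  # a new file "arrives"
check_and_retrain(seen_keys, lambda: bucket, trained_on.append)
```

An orchestrator replaces the manual polling loop with a managed trigger, but the core contract is the same: each new file should start exactly one retraining run.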

5 Steps to Transform Messy Functions into Production-Ready Code

In a data science project, poorly designed functions make code harder to maintain and harder to read.

In this article, you will learn how to create functions that:

Perform a single, well-defined task

Can be extended without modifying the original code

Handle inputs with unexpected variations

By following these principles, you’ll be able to create functions that are not only effective but also easy to maintain and understand.
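
As a rough illustration of the three principles above, here is a minimal sketch. The function names (`to_float`, `clean_column`, `drop_missing`) and the sample data are hypothetical and are not taken from the article.

```python
from typing import Callable, Iterable, List, Optional

Step = Callable[[List[Optional[float]]], List[Optional[float]]]


def to_float(value: object) -> Optional[float]:
    """Single task: convert one raw value to a float, or None if it can't be."""
    try:
        # Tolerate surrounding whitespace and thousands separators.
        return float(str(value).strip().replace(",", ""))
    except ValueError:
        return None


def clean_column(values: Iterable[object], steps: Iterable[Step] = ()) -> List[Optional[float]]:
    """Convert every value, then apply any extra steps supplied by the caller.

    New behavior is added by passing another step, not by editing this function.
    """
    cleaned = [to_float(v) for v in values]
    for step in steps:
        cleaned = step(cleaned)
    return cleaned


def drop_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """An example extension step: remove values that could not be converted."""
    return [v for v in values if v is not None]


raw = ["1,200", " 3.5 ", "n/a", None, 7]
print(clean_column(raw))                        # [1200.0, 3.5, None, None, 7.0]
print(clean_column(raw, steps=[drop_missing]))  # [1200.0, 3.5, 7.0]
```

Passing steps as arguments is only one way to keep a function open to extension; the point is that `clean_column` itself never has to change.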

Link to the article.

Build Reliable Machine Learning Pipelines with Continuous Integration

Continuous integration (CI) is the practice of automatically testing and integrating code changes into a shared repository.

In a machine learning project, CI can be very useful for several reasons:

Catching errors early: CI facilitates the early identification of errors by automatically testing any code changes made.

Faster feedback and decision-making: By providing clear metrics and parameters, CI enables faster decision-making, freeing up reviewer time for more critical tasks.
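
To make "automatically testing any code changes" concrete, here is a sketch of the kind of check a CI job might run on every change. The `predict` stand-in, the `MIN_ACCURACY` threshold, and the tiny dataset are all hypothetical; a real project would load its own model and evaluation data.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def predict(features: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Hypothetical stand-in for a trained model: threshold a single score."""
    return [int(x >= threshold) for x in features]


MIN_ACCURACY = 0.75  # hypothetical quality bar agreed on by the team


def test_model_meets_accuracy_threshold() -> None:
    """Fails the build if the model drops below the agreed metric."""
    features = [0.1, 0.4, 0.6, 0.9]
    labels = [0, 0, 1, 1]
    assert accuracy(labels, predict(features)) >= MIN_ACCURACY


def test_accuracy_rejects_mismatched_lengths() -> None:
    """Catches a common bug early: labels and predictions out of sync."""
    try:
        accuracy([1, 0], [1])
    except ValueError:
        return
    raise AssertionError("expected ValueError for mismatched lengths")
```

Functions named `test_*` follow the naming convention that test runners such as pytest discover automatically, so a CI job only needs to invoke the runner.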

In my latest article and video, you will learn how to set up CI for an ML project.

Article | Video | Code

Create Observable and Reproducible Notebooks with Hex

Jupyter Notebook is not ideal for interpretability, reproducibility, and versioning for numerous reasons. Hex notebooks solve these issues with a graph-based execution model.

Hex links cells through their dependencies and executes only the cells whose dependencies change. The GIF above demonstrates this.
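
The dependency-driven execution described above can be sketched with a small graph walk. This is not Hex's implementation; it is a minimal illustration, assuming an acyclic graph, in which `cells_to_rerun` and the example notebook are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def cells_to_rerun(dependencies: Dict[str, Set[str]], changed: str) -> List[str]:
    """Return the changed cell plus every cell downstream of it, in run order.

    `dependencies` maps each cell to the set of cells it reads from.
    Assumes the graph has no cycles.
    """
    # Invert the mapping: for each cell, which cells depend on it?
    dependents: Dict[str, Set[str]] = {cell: set() for cell in dependencies}
    for cell, upstream in dependencies.items():
        for parent in upstream:
            dependents.setdefault(parent, set()).add(cell)

    # Breadth-first search to collect everything affected by the change.
    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, ()):
            if child not in affected:
                affected.add(child)
                queue.append(child)

    # Order the affected cells so each runs after its affected parents.
    remaining = {
        cell: {p for p in dependencies.get(cell, set()) if p in affected}
        for cell in affected
    }
    order: List[str] = []
    ready = sorted(cell for cell, parents in remaining.items() if not parents)
    while ready:
        cell = ready.pop(0)
        order.append(cell)
        for child in sorted(dependents.get(cell, ())):
            if child in remaining and cell in remaining[child]:
                remaining[child].discard(cell)
                if not remaining[child]:
                    ready.append(child)
    return order


notebook = {
    "load": set(),
    "clean": {"load"},
    "plot": {"clean"},
    "summary": {"clean"},
    "config": set(),
    "report": {"summary", "config"},
}
print(cells_to_rerun(notebook, "clean"))   # ['clean', 'plot', 'summary', 'report']
print(cells_to_rerun(notebook, "config"))  # ['config', 'report']
```

Editing `clean` leaves `load` and `config` untouched, which is the saving a graph-based model offers over re-running a notebook from top to bottom.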

In my latest article, you will learn some useful features of Hex and how to integrate Hex into your data pipeline with Prefect.

Link to the article.

PRegEx: Write Human-Readable Regular Expressions in Python

A regular expression (or RegEx) is a sequence of characters that defines a pattern for matching text. Complicated RegEx patterns are difficult to read and write. Is there a way to write more human-readable RegEx with ease?

That is when PRegEx comes in handy. PRegEx is a Python package that allows you to construct RegEx patterns in a more human-friendly way.
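
To make the readability problem concrete, here is a sketch that uses only the standard library `re` module, not PRegEx itself. It writes the same pattern twice: once in terse form and once with `re.VERBOSE` and named groups. The pattern is a deliberately simplified email matcher chosen for illustration, not a complete validator.

```python
import re

# Terse form: compact, but the structure is hard to see at a glance.
TERSE = re.compile(
    r"^([A-Za-z0-9._%+-]+)@([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)\.([A-Za-z]{2,})$"
)

# Same pattern with re.VERBOSE: whitespace is ignored and comments are allowed.
READABLE = re.compile(
    r"""
    ^
    (?P<user>[A-Za-z0-9._%+-]+)                     # local part before the @
    @
    (?P<domain>[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*)   # one or more domain labels
    \.
    (?P<tld>[A-Za-z]{2,})                           # top-level domain
    $
    """,
    re.VERBOSE,
)

match = READABLE.match("jane.doe@mail.example.com")
if match:
    print(match.group("user"), match.group("domain"), match.group("tld"))
    # jane.doe mail.example com
```

Verbose mode helps with documentation, but the pattern is still built from raw symbols; a builder-style package such as PRegEx aims to replace those symbols with named, composable pieces.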

In my latest article, you will learn how to use PRegEx for different use cases.

Article.

Source code.

    Work with Khuyen Tran
