Machine Learning

Distributed Machine Learning with MLlib

MLlib is Apache Spark's machine learning library, built for distributed machine learning on large-scale datasets.

Similar to scikit-learn, MLlib provides tools for:

🔹 ML Algorithms: Classification, regression, clustering, and collaborative filtering
🔹 Featurization: Feature extraction, transformation, dimensionality reduction, and selection
🔹 Pipelines: Construction, evaluation, and tuning of ML Pipelines

Efficient Experiment Management with DVC’s VSCode Extension

DVC is a version control system designed to manage data, machine learning models, and experiments.

With the DVC VSCode extension, you can:
🔹 Execute and visualize your machine-learning experiments
🔹 Share experiments
🔹 Revisit specific experiments to view associated code and data
🔹 Create dedicated branches for individual experiments

Automatic Model Evaluation and Explainability with MLflow Evaluate

After training your ML model, you can use mlflow.evaluate() to compute the relevant metrics automatically instead of coding each one by hand.

Additionally, you can gain insights into the factors influencing the model’s predictions using various techniques, such as feature importance analysis or SHAP.

Covalent: Accelerate Machine Learning with Local Model Iteration

Covalent is a Python library that allows you to:
🔹 Iterate quickly on large ML models in a local environment
🔹 Assign resource-intensive functions to advanced hardware

This capability is particularly valuable in scenarios like federated learning, which helps protect data privacy by training models on isolated data sources and sharing only model updates.

Covalent also suits other workloads, such as general machine learning, computer vision, and NLP.


Work with Khuyen Tran