Machine Learning Archives

Distributed Machine Learning with MLlib

Leave a Comment / Machine Learning Tools / Khuyen Tran

Use MLlib, a Spark-based library, for distributed machine learning tasks and large-scale datasets.

Similar to scikit-learn, MLlib provides the tools for:

🔹 ML Algorithms: Classification, regression, clustering, and collaborative filtering
🔹 Featurization: Feature extraction, transformation, dimensionality reduction, and selection
🔹 Pipelines: Construction, evaluation, and tuning of ML Pipelines

Efficient Experiment Management with DVC’s VSCode Extension

Leave a Comment / Machine Learning Tools / Khuyen Tran

DVC is a version control system designed to manage data, machine learning models, and experiments.

With the DVC VSCode extension, you can:
🔹 Execute and visualize your machine-learning experiments
🔹 Share experiments
🔹 Revisit specific experiments to view associated code and data
🔹 Create dedicated branches for individual experiments

Automatic Model Evaluation and Explainability with MLflow Evaluate

Leave a Comment / Machine Learning Tools / Khuyen Tran

After training your ML model, you can use mlflow.evaluate() to automatically compute relevant metrics instead of coding each metric by hand.

Additionally, you can gain insights into the factors influencing the model’s predictions using various techniques, such as feature importance analysis or SHAP.

Covalent: Accelerate Machine Learning with Local Model Iteration

Leave a Comment / Machine Learning Tools / Khuyen Tran

Covalent is a Python library that allows you to:
🔹 Iterate quickly on large ML models in a local environment
🔹 Assign resource-intensive functions to advanced hardware

This capability is particularly valuable in scenarios like federated learning, which ensures data privacy by training models on isolated data sources and sharing only model updates.

Covalent also suits other machine learning workloads, such as computer vision and NLP.

ManimML: Create Animations of Common ML Concepts in Python

Leave a Comment / Machine Learning Tools / Khuyen Tran

If you want to create animations and visualizations for common ML concepts in Python, try ManimML.

For example, ManimML can animate a Variational Autoencoder.

Create a Readable Machine Learning Pipeline in One Line of Code

Leave a Comment / Machine Learning Tools / Khuyen Tran

If you want to create a readable machine-learning pipeline in a single line of code, try the make_pipeline function in scikit-learn.

make_pipeline is especially useful when working with complex pipelines that involve many different transformers and estimators.
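
As a minimal sketch of the idea, the snippet below chains a scaler, a PCA step, and a classifier with make_pipeline. The iris dataset, the choice of steps, and the variable names are assumptions made for illustration, not taken from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative data only; any feature matrix and target would do
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One line builds the whole pipeline; step names are generated
# automatically from the lowercased class names
pipe = make_pipeline(StandardScaler(), PCA(n_components=2), LogisticRegression(max_iter=1000))

pipe.fit(X_train, y_train)
step_names = list(pipe.named_steps)
accuracy = pipe.score(X_test, y_test)

print(step_names)
print(f"Test accuracy: {accuracy:.3f}")
```

Compared with Pipeline, make_pipeline does not require you to name each step, which keeps the definition short when you do not need custom step names.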

PostgresML: Integrate Machine Learning with PostgreSQL

2 Comments / Machine Learning Tools, SQL / Khuyen Tran

If you want to seamlessly integrate machine learning models into your PostgreSQL database, use PostgresML.

Scikit-LLM: Supercharge Text Analysis with ChatGPT and scikit-learn Integration

Leave a Comment / LLM Tools, Machine Learning Tools / Khuyen Tran

To integrate advanced language models with scikit-learn for enhanced text analysis tasks, use Scikit-LLM.

Scikit-LLM’s ZeroShotGPTClassifier enables text classification on unseen classes without requiring re-training.

Pipeline + GridSearchCV: Prevent Data Leakage when Scaling the Data

Leave a Comment / Feature Engineer, Machine Learning Tools / Khuyen Tran

Scaling the data before using GridSearchCV can lead to data leakage because the scaler is fit on the entire dataset, so statistics from each validation fold (such as its mean and variance) leak into training.

To prevent this, assemble both the scaler and machine learning models in a pipeline and then use it as the estimator for GridSearchCV.
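
A rough sketch of this setup is shown below. The breast cancer dataset, the SVC model, the step names "scaler" and "svc", and the small parameter grid are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative data only
X, y = load_breast_cancer(return_X_y=True)

# The scaler lives inside the pipeline, so within each CV split it is
# fit on the training folds only and then applied to the validation fold
pipe = Pipeline([("scaler", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as <step name>__<parameter>
param_grid = {"svc__C": [0.1, 1, 10]}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

best_C = search.best_params_["svc__C"]
print(f"Best C: {best_C}, mean CV accuracy: {search.best_score_:.3f}")
```

Because GridSearchCV clones and refits the whole pipeline for every split, the validation fold never influences the scaling statistics.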

mlforecast: Scalable Machine Learning for Time Series

Leave a Comment / Machine Learning Tools, Time Series / Khuyen Tran

mlforecast is a Python library that allows you to:

🔹 Perform time series forecasting using machine learning models
🔹 Scale them to massive amounts of data
