Distributed Machine Learning with MLlib

Use MLlib, Apache Spark's machine learning library, for distributed machine learning tasks on large-scale datasets.

Similar to scikit-learn, MLlib provides the following tools:

  • ML Algorithms: Classification, regression, clustering, and collaborative filtering
  • Featurization: Feature extraction, transformation, dimensionality reduction, and selection
  • Pipelines: Construction, evaluation, and tuning of ML Pipelines

Link to MLlib.
