Authors: Subhabrata Banerjee, Pooja Ayanile, Divyank Garg, Ajit Patankar, Sabyasachi Mukhopadhyay

Introduction

In a previous blog post about initiating Ray clusters, we created a cluster of 5 VMs for developing parallel distributed AI/ML applications. Legacy Python classes and functions can then be converted to Ray-enabled code that runs on the distributed cluster. In addition, Ray provides AI/ML libraries built on top of commonly used libraries to improve performance and model accuracy. …


Authors: Divyank Garg, Ajit Patankar, Sabyasachi Mukhopadhyay, Subhabrata Banerjee, Pooja Ayanile

Introduction

As described in the previous posts, Ray can be used to distribute model training across a cluster of machines. The previous example, however, did not integrate sklearn with Ray. In this post we describe a tight integration of sklearn and Ray and show how it yields better performance without sacrificing accuracy. The integration is achieved by specifying Ray as the backend for sklearn, and we address this specific use case in this post.

Case studies

In this use case, we build a classification model using a data set of approximately…


Authors: Pooja Ayanile, Divyank Garg, Ajit Patankar, Sabyasachi Mukhopadhyay, Subhabrata Banerjee

Introduction

Model selection involves training multiple models on the same data set, comparing the resulting outputs, and then selecting the best model based on criteria such as execution time and performance. In general, the model selection process is the same for both machine learning and deep learning models.
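This comparison loop can be sketched in plain sklearn as follows (the candidate models and dataset are illustrative assumptions, not the ones used in the post):

```python
import time
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Train and evaluate each candidate, recording accuracy and execution time.
results = {}
for name, model in candidates.items():
    start = time.perf_counter()
    score = cross_val_score(model, X, y, cv=5).mean()
    results[name] = (score, time.perf_counter() - start)

# Select the candidate with the highest mean cross-validation accuracy.
best = max(results, key=lambda n: results[n][0])
print(best, results[best])
```

Each candidate trains independently, which is exactly what makes this loop a natural fit for distribution with Ray.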

Model selection is a very important stage in an ML pipeline for making better inferences and future strategies, but it is also one of the most time-consuming tasks. Due to the repetitive nature of model training and testing, we can take…


Authors: Sabyasachi Mukhopadhyay, Subhabrata Banerjee, Pooja Ayanile, Divyank Garg, Ajit Patankar

Although the installation of a Ray cluster is fairly straightforward, there is still a learning curve involved in arriving at a robust installation. The goal of this blog post is to facilitate that learning process.

While Ray is primarily intended to be used on a cluster of multiple machines, it can also be used on a single machine; in that case it will try to use all the available cores on that machine, and one can still get some performance improvement. …


Authors: Ajit Patankar, Sabyasachi Mukhopadhyay, Subhabrata Banerjee, Pooja Ayanile, Divyank Garg

Ray is a distributed computing framework primarily designed for AI/ML applications. It originated in the RISELab at UC Berkeley. We have used Ray extensively in our AI/ML development process and summarize our evaluation in this series of blog posts. Our team develops complex AI/ML models, primarily on large telemetry and NLP data sources. We started evaluating a new generation of distributed computing tools in response to the following issues:

  • Inability to support data and model pipelines on a single machine.
  • The frequent need to develop custom infrastructure tools.

Amongst the…

Juniper AI/ML Team

A team actively working on Ray.
