Ray enjoys growing popularity in the machine learning community. Getting it up and running under Windows can be tricky, however. This blog post tells you how.
Why you want to use Ray
Ray is a stunningly versatile framework for distributed computing. Ray does not use the Java Virtual Machine, and instead relies on efficient C++ code, together with an elegant and simple pythonic API. From a machine learning perspective, it has two main virtues. First of all, it can be used to tie popular tools like Spark and Tensorflow/Pytorch together into a single cluster, with elegant options for data transfer between the two. There is also brand new support for . So Ray lets you manage your distributed computing power whenever you need a lot of it for machine learning in general and deep learning in particular.
Ray's second virtue is its great support for reinforcement learning (also in conjunction with Tensorflow and/or Pytorch). See here for a gentle introduction to reinforcement learning, which has many applications, including recommender systems.
What makes Ray difficult to use under Windows
First of all, Windows support for Ray is in alpha and therefore not recommended for production use. Nevertheless, while getting to know Ray, some of you may still want to install it on your Windows laptop rather than set up a Linux-based installation. According to the official installation instructions, installing Ray is a breeze, not only on Linux, but also on Windows: update your Visual C++ Runtime, do a simple installation with pip, and there you go.
Except that nowadays, everybody tries to keep their Python environments neatly separated, which means that instead of plain pip, you usually use an environment or package manager like venv/virtualenv, pipenv, pew, or conda. And that's where the fun begins. Using conda with Ray is technically a working option. For some of us, however, this option doesn't work from a legal point of view. The problem is that Anaconda Inc. changed the terms of service for the package repository that is used with conda, and they now require a commercial license if you're in a business with 200 or more employees.
Since this change, many in the machine learning community have switched to other package managers. Unfortunately, basic tools like venv (which has been part of the Python standard library since version 3.3) and virtualenv don't work with Ray under Windows. What makes this worse is that pipenv also relies on virtualenv under the hood, so it doesn't work either. This excludes pretty much all of the most popular conda alternatives.
What's the best workaround?
Of course, we could just forget about dependency management and use pip to install Ray directly into the site-packages of our Python installation. Unfortunately, this kind of dirty hack will get you into dependency hell sooner or later, and that's a place we definitely want to avoid.
So we try to find the root cause of the problem, hoping we might discover a less clumsy workaround along the way. It turns out that a bug in the Windows version of venv causes the problem. So ironically, it's not the young framework Ray that is to blame here, but a much more mature piece of software. When you go through the details of the bug, you see that it was introduced in Python 3.7.3 (and seems complex enough not to be fixed soon). You also see that venv is positively confirmed to work correctly in Python 3.6. Hence, faced with the choice of either not using a package manager at all (if conda is legally off limits) or using an older version of Python, the latter will usually be the lesser evil. Since Python 3.6 is admittedly outdated, you may also want to reconsider an installation under Linux.
Hence, the solution for the Windows installation is the following (using pipenv; the procedure with venv is similar):
- Install some version of Python 3.6 from the official Python release download page. I'd recommend 3.6.8, as it's the last 3.6 release that comes with a Windows installer.
- Use Git Bash or the Windows command line to navigate to the directory of your Python project.
- Issue the command "pipenv --python 3.6.8" to create a pipenv environment that uses Python 3.6.8 (or whatever version of Python 3.6 you chose).
- Do a quick "pipenv install ray" to install Ray.
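If you'd like to double-check your setup before or after these steps, here is a minimal pre-flight sketch using only the standard library. The helper names (is_supported_python, in_virtual_env, pipenv_commands) are my own inventions for illustration and not part of Ray or pipenv. The version check simply encodes the assumption from this post that Python 3.6 is the version to use.

```python
import sys
from typing import List, Tuple

# Assumption taken from this post: on Windows, venv-based environments work
# with Ray under Python 3.6, so only that minor version is accepted here.
SUPPORTED_MAJOR_MINOR = (3, 6)


def is_supported_python(version: Tuple[int, ...]) -> bool:
    """Return True if the version tuple starts with the supported major.minor."""
    return tuple(version[:2]) == SUPPORTED_MAJOR_MINOR


def in_virtual_env() -> bool:
    """Return True if the interpreter runs inside a venv or virtualenv."""
    # venv and recent virtualenv releases point sys.base_prefix at the base
    # installation; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def pipenv_commands(python_version: str = "3.6.8") -> List[List[str]]:
    """Build the two pipenv commands from the list above (hypothetical helper)."""
    return [
        ["pipenv", "--python", python_version],
        ["pipenv", "install", "ray"],
    ]


if __name__ == "__main__":
    print("Python 3.6 interpreter:", is_supported_python(sys.version_info))
    print("Inside a virtual environment:", in_virtual_env())
    for command in pipenv_commands():
        print(" ".join(command))
```

Run it once with your freshly installed Python 3.6 and once via "pipenv run python" inside the project directory. The second run should report that it is inside a virtual environment.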
And now it works and you're ready to go! Take a look at the great tutorials on the Ray web site and launch your discovery tour into the world of Ray.
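To see for yourself that the installation works, you can try a small smoke test like the one below. This is only a sketch: it assumes that Ray is installed in the active pipenv environment, and the names square and run_smoke_test are made up for this example. The Ray calls used (ray.init, ray.remote, ray.get, ray.shutdown) belong to Ray's core API.

```python
from typing import List


def square(x: int) -> int:
    """Plain Python function that Ray will run as a remote task."""
    return x * x


def run_smoke_test(n: int = 4) -> List[int]:
    """Start Ray locally, square 0..n-1 as remote tasks, then shut Ray down."""
    import ray  # imported here so the file still loads where Ray is missing

    ray.init()  # starts Ray on the local machine
    try:
        # Calling ray.remote on a function has the same effect as decorating
        # the function with @ray.remote.
        remote_square = ray.remote(square)
        futures = [remote_square.remote(i) for i in range(n)]
        return ray.get(futures)  # blocks until all tasks have finished
    finally:
        ray.shutdown()


if __name__ == "__main__":
    try:
        print(run_smoke_test())
    except ImportError:
        print("Ray is not installed in this environment.")
```

Save it in your project directory and start it with "pipenv run python" followed by the file name. If Ray comes up and you get the list of squares back, your installation is in good shape.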
Any questions about the installation or the advantages of Ray under Windows? I’d love to exchange ideas with you!