Creating a Python package
Before we can make our Python code available as a package, we need to make sure that our Python module meets two requirements:
- The module should have at least one of the files setup.py, setup.cfg or pyproject.toml. These can be used individually or in combination to define how the Python package should be installed later. Prerequisites such as Python version >= 3.9 can be specified in this manner, for example.
- The code should follow the folder structure shown in the following snippet: a main folder contains all configuration files and a subfolder, and the subfolder contains the actual Python code.
structure of package:
├── setup.py # or setup.cfg or pyproject.toml
├── my_package
│ ├── __init__.py
│ └── example.py
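Both requirements can be checked quickly before building. The following is a minimal sketch, not part of any packaging tool; the function name check_package_layout is hypothetical, and the check only covers the two points listed above.

```python
from pathlib import Path

# File names taken from the requirements listed above.
CONFIG_FILES = ("setup.py", "setup.cfg", "pyproject.toml")


def check_package_layout(root):
    """Return a list of problems found in the project folder `root`."""
    root = Path(root)
    problems = []

    # Requirement 1: at least one packaging configuration file in the main folder.
    if not any((root / name).is_file() for name in CONFIG_FILES):
        problems.append("no setup.py, setup.cfg or pyproject.toml found")

    # Requirement 2: a subfolder that holds the code and contains __init__.py.
    packages = [
        path for path in root.iterdir()
        if path.is_dir() and (path / "__init__.py").is_file()
    ]
    if not packages:
        problems.append("no subfolder containing __init__.py found")

    return problems
```

Calling check_package_layout(".") from the main folder returns an empty list when both requirements are met.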
After ensuring that these requirements have been met, we can create package distributions from our module:
cd my-package
python3 -m pip install --upgrade build
python3 -m build
ls ./dist
These commands install build, the standard Python build frontend, and use it to create the Python distributions. The result in ./dist is a wheel file (.whl) as well as a source archive (.tar.gz).
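To confirm that both kinds of distribution were produced, the contents of the dist folder can be grouped by file extension. This is a small sketch under the assumption that wheels end in .whl and source archives in .tar.gz; list_distributions is a hypothetical helper name.

```python
from pathlib import Path


def list_distributions(dist_dir="dist"):
    """Group the files in `dist_dir` into wheels and source archives."""
    found = {"wheel": [], "sdist": []}
    for path in sorted(Path(dist_dir).iterdir()):
        if path.name.endswith(".whl"):
            found["wheel"].append(path.name)
        elif path.name.endswith(".tar.gz"):
            found["sdist"].append(path.name)
    return found
```

If either list comes back empty, the build step did not produce the expected files and the upload later on would be incomplete.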
Setting up Google's artifact registry
Google's artifact registry offers a complete solution for images and code libraries. We will use it to version and manage our Python packages. The registry can be easily created via the UI or with gcloud.
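For the gcloud route, the call can be scripted. The sketch below wraps the gcloud artifacts repositories create command in Python; the repository name and the location used in the usage note are placeholder assumptions, and running it requires an installed and authenticated gcloud CLI.

```python
import subprocess


def create_repository_command(repository, location):
    """Build the gcloud call that creates a Python-format repository."""
    return [
        "gcloud", "artifacts", "repositories", "create", repository,
        "--repository-format=python",
        f"--location={location}",
    ]


def create_repository(repository, location):
    # Runs gcloud and raises CalledProcessError if the command fails.
    subprocess.run(create_repository_command(repository, location), check=True)
```

A call such as create_repository("my-repository", "europe-west3") would create the registry; both values are only examples and have to match your own project setup.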
Integrating Google's artifact registry
Before being able to load our package into the registry, we have to make a few more preparations:
- First of all, a pypirc file is needed. This file contains specifications for uploading packages to private registries. Here we list our newly created artifact registry and specify its URL.
# ~/.pypirc
[distutils]
index-servers = my-repository
[my-repository]
repository =
- Next, we need to authenticate against Google's artifact registry. This is done via Python's keyring library. For this we also need Google's keyring backend, keyrings.google-artifactregistry-auth, which allows us to use our GCP credentials for login. After logging into gcloud and installing the library, we no longer have to handle credentials manually.
gcloud auth login
python3 -m pip install keyrings.google-artifactregistry-auth
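The pypirc file from the first step can also be generated instead of written by hand. The following sketch uses Python's configparser; render_pypirc is a hypothetical helper, and the example URL only illustrates the general shape of an artifact registry address (LOCATION-python.pkg.dev/PROJECT/REPOSITORY) with placeholder values, so it should be treated as an assumption and replaced with the URL of your own registry.

```python
import configparser
import io


def render_pypirc(server, repository_url):
    """Return the text of a .pypirc that registers one private index server."""
    config = configparser.ConfigParser()
    config["distutils"] = {"index-servers": server}
    config[server] = {"repository": repository_url}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Placeholder values only (assumption): location, project and repository are made up.
EXAMPLE_URL = "https://europe-west3-python.pkg.dev/my-project/my-repository/"
```

The returned text can then be written to ~/.pypirc. Keeping the server name and the section name identical is what lets upload tools find the repository entry.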
Uploading the package to Google's artifact registry
Our distributions have been built, our registry is ready and we are authorized for access. Now we can load our package into the registry. This is done with Python's standard tool Twine. After installing Twine, we can upload the package to the registry via the command line.
python3 -m pip install twine
twine upload --repository-url
./dist/*
Done! Our package is in the cloud. From now on, it is accessible to all our services.
Using the private package in a Docker container
Now we can use the package directly in a new service. The easiest way to do this is in container-based solutions such as Vertex AI training jobs with custom containers.
For this purpose, we list the package in the service's requirements.txt. Here we just have to make sure to specify our registry's URL. This tells tools like pip where to look for the listed dependencies.
Important! The URL requires the "/simple" suffix. Under this path the registry serves the simple repository API that dependency management tools such as pip use to find packages. For more details, refer to PEP 503.
# requirements.txt
--extra-index-url
/simple/
my-package
...
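Because a forgotten suffix is an easy mistake, the index line can be derived from the repository URL. This is a minimal sketch with hypothetical helper names; the URL in the usage note is a placeholder, not a real registry address.

```python
def simple_index_url(repository_url):
    """Append the PEP 503 'simple' path to a repository URL."""
    return repository_url.rstrip("/") + "/simple/"


def render_requirements(repository_url, packages):
    """Return requirements.txt text that points pip at the extra index."""
    lines = [f"--extra-index-url {simple_index_url(repository_url)}"]
    lines.extend(packages)
    return "\n".join(lines) + "\n"
```

For example, render_requirements("https://example.invalid/repo", ["my-package"]) yields an --extra-index-url line ending in /repo/simple/ followed by the package name, whether or not the input URL ends with a slash.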
In the Docker build process, it is then necessary to install Google's keyring library again, before the requirements themselves are installed. This allows pip inside the build to authenticate against the registry.
# Dockerfile
FROM python:3.9-slim
WORKDIR /app
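# Copy the dependency list and the training script into the image
COPY ./requirements.txt ./train.py ./
# Install the keyring library first so that pip can authenticate with the registry
RUN pip install keyrings.google-artifactregistry-auth
RUN pip install -r requirements.txt
CMD ["python", "./train.py"]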
Finished! The image can be built and pushed.
Conclusion
We have just seen how to make Python packages available with Google's cloud registry, and how they can be used from Vertex AI. Which functions do you often copy from one project to another? Putting that code into a package is a good starting point for cleaning up, and it makes the code reusable in your future projects.