Customize Your Environment#
Determined launches workloads using Docker containers. By default, workloads run inside a Determined-provided container that includes common deep learning libraries and frameworks.
If your model code has additional dependencies, the easiest way to install them is to specify a startup hook. For more complex dependencies, use a custom Docker image.
If you are using Determined on Kubernetes, review the Custom Pod Specs guide.
Environment Variables#
For both trial runners and commands, you can configure the environment variables inside the
container using the experiment or task environment.environment_variables
configuration field. The
format is a list of NAME=VALUE
strings. For example:
environment:
environment_variables:
- A=hello world
- B=$A
- C=${B}
Variables are set sequentially, which affect variables that depend on the expansion of other
variables. In the example, A
, B
, and C
each have the value hello_world
in the
container.
Proxy variables set in this way take precedence over variables set in the agent configuration.
You can also set variables for each accelerator type, separately:
environment:
environment_variables:
cpu:
- A=hello x86
gpu:
- A=hello nvidia
rocm:
- A=hello amd
Startup Hooks#
If a startup-hook.sh
file exists in the top level of your model definition directory (for
experiments), or context directory (for shells, notebooks, and TensorBoards), it is automatically
run with every Docker container startup before any Python interpreters are launched or deep learning
operations are performed. The startup hook can customize the container environment, install
additional dependencies, and download datasets, among other shell script commands.
Note
startup-hook.sh
does not apply to det cmd
. It applies to experiments, notebooks, shells,
and TensorBoards, but not commands.
For shells, notebooks, and TensorBoards, make sure to supply the context directory using the
--context
or -c
option. You can also use the --include
option, though it may require
more directory management.
Startup hooks are not cached and run before the start of every workload, so expensive or long-running operations in a startup hook can result in poor performance.
Example startup hook to install the wget
utility and the pandas
Python package:
apt-get update && apt-get install -y wget
python3 -m pip install pandas
This GPT Neox example
contains a TensorFlow Keras model that
uses a startup hook to install an additional Python dependency.
Container Images#
Officially supported, default Docker images are provided to launch containers for experiments, commands, and other workflows.
All trial runner containers are launched with additional Determined-specific harness code, which orchestrates model training and evaluation in the container. Trial runner containers are also loaded with the experiment’s model definition and hyperparameter values for the current trial.
GPU-specific versions of each library are automatically selected when running on agents with GPUs.
Default Images#
Environment |
File Name |
---|---|
CPUs |
|
NVIDIA GPUs |
|
AMD GPUs |
|
NGC Version#
By default, a suitable NGC container version is used in our images. You can select a different
version of NGC containers to build images from. Versions are listed on the NVIDIA Frameworks site. To build custom
images, cloning the MLDE environments repo,
modify the NGC_PYTORCH_VERSION
or NGC_TENSORFLOW_VERSION
variables in the MakeFile, and run
make build-pytorch-ngc or make build-tensorflow-ngc respectively.
Custom Images#
While the official images contain all the dependencies needed for basic deep learning workloads,
many workloads have additional dependencies. If the extra dependencies are quick to install, use a
startup hook. If installing dependencies using startup-hook.sh
takes too
long, build your own Docker image and publish it to a Docker registry, such as Docker Hub.
Warning
Do NOT install TensorFlow, PyTorch, Horovod, or Apex packages, which conflict with Determined-installed packages.
Use one of the official Determined images as a base image in the FROM
instruction.
Example Dockerfile that installs custom conda
-, pip
-, and apt
-based dependencies:
# Determined Image
FROM determinedai/tensorflow-ngc:0.38.0
# Custom Configuration
RUN apt-get update && \
DEBIAN_FRONTEND="noninteractive" apt-get -y install tzdata && \
apt-get install -y unzip python-opencv graphviz
COPY environment.yml /tmp/environment.yml
COPY pip_requirements.txt /tmp/pip_requirements.txt
RUN conda env update --name base --file /tmp/environment.yml
RUN conda clean --all --force-pkgs-dirs --yes
RUN eval "$(conda shell.bash hook)" && \
conda activate base && \
pip install --requirement /tmp/pip_requirements.txt
Assuming this image is published to a public repository on Docker Hub, configure an experiment, command, or notebook with:
environment:
image: "my-user-name/my-repo-name:my-tag"
where my-user-name
is your Docker Hub user, my-repo-name
is the name of the Docker Hub
repository, and my-tag
is the image tag to use, such as latest
.
For a private Docker Hub repository, specify the credentials:
environment:
image: "my-user-name/my-repo-name:my-tag"
registry_auth:
username: my-user-name
password: my-password
For a private Docker Registry, specify the registry path:
environment:
image: "myregistry.local:5000/my-user-name/my-repo-name:my-tag"
Images are fetched using HTTPS by default. An HTTPS proxy can be configured using the
https_proxy
field in the agent configuration.
Set the custom image and credentials as the defaults for all tasks launched in Determined using the
image
and registry_auth
fields in the master configuration.
Restart the master for these changes to take effect.
Virtual Environments#
Model developers commonly use virtual environments. The following example configures virtual environments using custom images:
# Determined Image
FROM determinedai/pytorch-ngc:0.38.0
# Create a virtual environment
RUN conda create -n myenv python=3.8
RUN eval "$(conda shell.bash hook)" && \
conda activate myenv && \
pip install scikit-learn
# Set the default virtual environment
RUN echo 'eval "$(conda shell.bash hook)" && conda activate myenv' >> ~/.bashrc
To ensure that a virtual environment is activated every time a new interactive terminal session is
created, in JupyterLab or using Determined Shell, update ~/.bashrc
with the scripts to activate
the virtual environment you want.
Example using a startup hook to switch to a virtual environment:
# Switch to the desired virtual environment
eval "$(conda shell.bash hook)"
conda activate myenv
# Do that for every new interactive terminal session
echo 'eval "$(conda shell.bash hook)" && conda activate myenv' >> ~/.bashrc
Note
startup-hook.sh
does not apply to det cmd
. It applies to experiments, notebooks, shells,
and TensorBoards, but not commands.