Install Determined Using det deploy
This user guide provides instructions for using the det deploy command-line tool to deploy Determined locally or in a production cluster. det deploy automates the process of starting Determined as a collection of Docker containers.
You can also use det deploy to install Determined on the cloud. For more information, see the AWS and GCP installation guides.
In a typical production setup, the master and agent nodes run on separate machines. The master and agent nodes can also run on a single machine, which is useful for local development. This user guide provides instructions for both scenarios.
Preliminary Setup
Note
To use det deploy for local installations, Docker must be installed. For Docker installation instructions, visit installation.
Install the determined Python package by running:
pip install determined
Note
When deploying locally, the system prompts you to set a strong password.
The command, pip install determined, installs the determined library, which includes the Determined command-line interface (CLI).
Configure and Start the Cluster
A configuration file is needed to set important values in the master, such as where to save model checkpoints. For information about how to create a configuration file, see Configuring the Cluster. There are also sample configuration files available.
Note
det deploy will use a default configuration file if you don’t provide one. It also transparently manages PostgreSQL along with the master, so the configuration options related to those services do not need to be set.
Deploy a Single-Node Cluster
For local development or small clusters (such as a GPU workstation), you may wish to install both a master and an agent on the same node. To do this, run one of the following commands:
# If the machine has GPUs:
det deploy local cluster-up
# If the machine doesn't have GPUs:
det deploy local cluster-up --no-gpu
This will start a master and an agent on that machine. To verify that the master is running, navigate to http://<master-hostname>:8080 in a browser, which should bring up the Determined WebUI. If you’re using your local machine, for example, navigate to http://localhost:8080.
In the WebUI, go to the Cluster page. You should now see slots available (either CPU or GPU, depending on what hardware is available on the machine).
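The browser check above can also be scripted. The following is a minimal sketch, assuming curl is installed and that the master answers plain HTTP requests at the WebUI address. The function name wait_for_master and its retry parameters are hypothetical and are not part of det deploy.

```shell
# Hypothetical helper, not part of det deploy.
# Polls the given URL until it answers or the attempts run out.
# Usage: wait_for_master <url> [attempts] [delay_seconds]
wait_for_master() {
  url="$1"
  attempts="${2:-10}"
  delay="${3:-3}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failures; -s: quiet; -L: follow redirects.
    if curl -fsL --max-time 5 -o /dev/null "$url"; then
      echo "master reachable at $url"
      return 0
    fi
    # Sleep only if another attempt remains.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "master not reachable at $url" >&2
  return 1
}
```

For a local single-node cluster, this could be called as wait_for_master http://localhost:8080. A successful response only shows that the master is serving requests; the Cluster page is still the place to confirm that slots have appeared.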
For single-agent clusters launched with:
det deploy local cluster-up --auto-work-dir <absolute directory path>
the cluster will automatically make the specified directory available to tasks on the cluster as ./shared_fs. If --auto-work-dir is not specified, the cluster will default to mounting your home directory. This lets the cluster’s notebooks, shells, and TensorBoard tasks access your local preferences and any relevant files stored in that directory. To disable this feature, use:
det deploy local cluster-up --no-auto-work-dir
For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det deploy, use:
det deploy local cluster-up --master-config-path <path to master.yaml>
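The single-node options above can be assembled in a small wrapper. The following is a sketch only: cluster_up_cmd is a hypothetical helper that prints the command rather than running it, and it assumes that --no-gpu and --master-config-path can be combined in one invocation, which this guide does not state explicitly.

```shell
# Hypothetical helper, not part of det deploy: prints (does not run) the
# cluster-up command for the chosen options.
# Usage: cluster_up_cmd <gpu|cpu> [path to master.yaml]
# Note: the path is appended unquoted, so paths with spaces are not handled.
cluster_up_cmd() {
  cmd="det deploy local cluster-up"
  # Machines without GPUs need --no-gpu.
  if [ "$1" = "cpu" ]; then
    cmd="$cmd --no-gpu"
  fi
  # Assumption: --master-config-path may be combined with --no-gpu.
  if [ -n "${2:-}" ]; then
    cmd="$cmd --master-config-path $2"
  fi
  echo "$cmd"
}

cluster_up_cmd gpu
cluster_up_cmd cpu ./master.yaml
```

Printing the command first makes it easy to review the flags before starting any containers.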
Stop a Single-Node Cluster
To stop a Determined cluster, run the following command on the machine where the cluster is running:
det deploy local cluster-down
Note
det deploy local cluster-down will not remove any agents created with det deploy local agent-up. To remove these agents, use det deploy local agent-down.
Deploy a Standalone Master
In many cases, your Determined cluster will consist of multiple nodes, so you will need to start the master and the agents separately. To start a standalone master, run:
det deploy local master-up
Note
For production deployments, you’ll want to use a cluster configuration file. To provide this configuration file to det deploy, use the flag --master-config-path <path to master.yaml>.
To stop a running master, run:
det deploy local master-down
Deploy Agents
To deploy a standalone agent on a machine, run one of the following commands:
# If the machine has GPUs:
det deploy local agent-up <master_hostname>
# If the machine doesn't have GPUs:
det deploy local agent-up --no-gpu <master_hostname>
This will create an agent on that machine. To verify whether it has successfully connected to the master, navigate to the WebUI and check whether slots have appeared on the Cluster page.
To launch the agent into a specific resource pool, use the --agent-resource-pool flag:
det deploy local agent-up --agent-resource-pool=<resource_pool> <master_hostname>
For more information about resource pools, see Resource Pools.
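When several machines need agents, the agent-up variants above can be generated from one place. This is a sketch under stated assumptions: agent_up_cmd is a hypothetical helper that only prints the command, and it assumes --no-gpu and --agent-resource-pool can be used together. The hostname and pool name in the example calls are placeholders.

```shell
# Hypothetical helper, not part of det deploy: prints (does not run) the
# agent-up command for one machine.
# Usage: agent_up_cmd <master_hostname> <gpu|cpu> [resource_pool]
agent_up_cmd() {
  cmd="det deploy local agent-up"
  # Machines without GPUs need --no-gpu.
  if [ "$2" = "cpu" ]; then
    cmd="$cmd --no-gpu"
  fi
  # Assumption: --agent-resource-pool may be combined with --no-gpu.
  if [ -n "${3:-}" ]; then
    cmd="$cmd --agent-resource-pool=$3"
  fi
  # The master hostname goes last, as in the commands above.
  echo "$cmd $1"
}

agent_up_cmd master.example.com gpu
agent_up_cmd master.example.com cpu cpu-pool
```

Each printed command is meant to be run on the machine that should host the agent.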
To stop a running agent, run:
det deploy local agent-down