Commands and Shells
In addition to structured model training workloads, which are handled using experiments, HPE Machine Learning Development Environment also supports free-form tasks using commands and shells. Commands and shells enable you to use an HPE Machine Learning Development Environment cluster and cluster GPUs without needing to write code that conforms to the trial APIs.
Commands execute a user-specified program on the cluster. Commands are useful for running existing code in batch mode.
Shells start SSH servers in the cluster and give you access to cluster resources through interactive SSH sessions.
This document describes the most common CLI and shell commands.
CLI commands start with det command, abbreviated as det cmd. The main subcommand is det cmd run, which runs a command in the cluster and streams its output. For example, the following CLI command uses nvidia-smi to display information about the GPUs available to tasks in the cluster:
det cmd run nvidia-smi
More complex commands, including shell constructs, can also be run, provided they are quoted to prevent interpretation by the local shell:
det cmd run 'for x in a b c; do echo $x; done'
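To see what the local shell does with each quoting style before anything would be sent to the cluster, the following plain shell sketch compares the two. It does not call det; x, single, and double are hypothetical local variables used only for illustration.

```shell
# x is a hypothetical variable that happens to be set in the local shell.
x=local

# Single quotes: the text is kept verbatim, so $x would be expanded later,
# by whichever shell finally runs the command.
single='echo $x'

# Double quotes: the local shell expands $x immediately.
double="echo $x"

printf '%s\n' "$single"
printf '%s\n' "$double"
```

With double quotes, the loop in the example above would have its $x replaced locally before det cmd run received it, which is why single quotes are used there.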
det cmd run streams output from the command until it finishes, but the command continues executing and occupying cluster resources even if the CLI is interrupted or killed, for example with Ctrl-C. To stop the command or view additional output, you need the command UUID, which you can get from the output of the original det cmd run or from det cmd list. After you have the UUID, use:
det cmd logs <UUID> to view a snapshot of logs.
det cmd logs -f <UUID> to view the current logs and continue streaming future output.
det cmd kill <UUID> to stop the command.
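The log and kill subcommands can be combined in a small wrapper. The function below is a minimal sketch: snapshot_and_kill is a hypothetical name, and it relies only on det cmd logs and det cmd kill as described above.

```shell
# Hypothetical helper: print a snapshot of a command's logs, then stop it.
# Assumes the det CLI is installed and configured.
snapshot_and_kill() {
    uuid="$1"
    if [ -z "$uuid" ]; then
        echo "usage: snapshot_and_kill <UUID>" >&2
        return 1
    fi
    # Only kill the command if the log snapshot succeeded.
    det cmd logs "$uuid" && det cmd kill "$uuid"
}
```

A call would look like snapshot_and_kill <UUID>, using a UUID taken from det cmd list.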
The CLI is distributed as a Python wheel package. Each user should install a copy of the CLI on their local development machine.
The CLI requires Python >= 3.7. It is recommended that you install the CLI into a virtualenv, although this is optional. To install the CLI into a virtualenv, activate the virtualenv before entering the following command.
Install the CLI using pip:
pip install determined
After the CLI has been installed, it should be configured to connect to the HPE Machine Learning Development Environment master at the appropriate IP address. This is done by setting the DET_MASTER environment variable:
export DET_MASTER=<master IP>
You might want to place this into the appropriate configuration file for your login shell.
After the wheel is installed, the CLI is invoked with the det command. Use det --help for more information about the individual CLI commands.
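Before calling the CLI from a script, it can help to confirm that det is on the PATH and to see which master address is configured. The function below is a sketch under stated assumptions: check_det_cli is a hypothetical name, and the fallback address it prints is the default shown in the det usage text.

```shell
# Hypothetical pre-flight check for scripts that call the det CLI.
check_det_cli() {
    if ! command -v det >/dev/null 2>&1; then
        echo "det not found on PATH" >&2
        return 1
    fi
    if [ -z "${DET_MASTER:-}" ]; then
        # Default taken from the usage text: "master address (default: localhost:8080)".
        echo "det found; DET_MASTER is not set (CLI default: localhost:8080)"
    else
        echo "det found; DET_MASTER=$DET_MASTER"
    fi
}
```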
CLI subcommands usually follow a <noun> <verb> form, similar to the paradigm of ip. Certain abbreviations are supported, and a missing verb is the same as list, when possible.
For example, the different commands within each of the blocks below all do the same thing:
# List all experiments.
$ det experiment list
$ det e list
$ det e

# List all agents.
$ det agent list
$ det a list
$ det a

# List all slots.
$ det slot list
$ det slot
$ det s
For a complete description of the available nouns and abbreviations, see the output of det help. Each noun also provides a help verb that describes the possible verbs for that noun. Or, you can provide the --help argument anywhere, which causes the CLI to exit after printing a help message for the object or action specified to that point.
DET_MASTER: The network address of the master of the HPE Machine Learning Development Environment installation. The value can be overridden using the -m flag.
DET_USER and DET_PASS: Specify the current HPE Machine Learning Development Environment user and password for use when non-interactive behavior is required, such as in scripts. det user login is preferred for normal usage. Both DET_USER and DET_PASS must be set together to take effect. These variables can be overridden by using the -u flag.
Show information about experiments in the cluster.
Show information about experiments in the cluster
at network address
Show the logs for trial 289 and continue showing new logs as they arrive.
Add the label
Display information about experiment 493, including full metrics, in CSV format.
Create an experiment with the configuration file
Ensure that experiment 85 does not use more than 4 slots in the cluster.
Create a new user named
Show detailed information about the CLI and master. This command does not take both an object and an action.
Shell-related CLI commands start with det shell. To start a persistent SSH server container in the HPE Machine Learning Development Environment cluster and connect an interactive session to it, use det shell start:
det shell start
After starting a server with det shell start, you can make another independent connection to the same server by running det shell open <UUID>. You can get the UUID from the output of the det shell start or det shell list command:
$ det shell list
 Id                                   | Owner      | Description                  | State   | Exit Status
--------------------------------------+------------+------------------------------+---------+---------------
 d75c3908-fb11-4fa5-852c-4c32ed30703b | determined | Shell (annually-alert-crane) | RUNNING | N/A

$ det shell open d75c3908-fb11-4fa5-852c-4c32ed30703b
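When scripting, the UUID can be pulled out of the listing instead of being copied by hand. The filter below is a minimal sketch: extract_uuids is a hypothetical name, and it assumes UUIDs appear in lowercase hexadecimal as in the sample output above. It uses grep -o, which GNU and BSD grep both support.

```shell
# Hypothetical filter: read a det listing on stdin and print each UUID found,
# one per line. Assumes lowercase hex UUIDs, as in the sample output.
extract_uuids() {
    grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}'
}
```

It would typically be used as det shell list | extract_uuids.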
Optionally, you can provide extra options to pass to the SSH client when using det shell start or det shell open by including them after --. For example, this command starts a new shell and forwards a port from the local machine to the container:
det shell start -- -L8080:localhost:8080
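The forwarding option follows the SSH client's -L local_port:host:remote_port form, where host is resolved on the container side. The helper below is a small sketch for building that option; forward_flag is a hypothetical name and is not part of the det CLI.

```shell
# Hypothetical helper: build an SSH -L option that forwards a local port to a
# port on the container. The container port defaults to the local port.
forward_flag() {
    local_port="$1"
    container_port="${2:-$1}"
    printf '%s\n' "-L${local_port}:localhost:${container_port}"
}
```

For example, det shell start -- "$(forward_flag 8080)" would pass the same option as the command above.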
To stop the SSH server container and free cluster resources, run det shell kill <UUID>.
Command-line Interface (CLI) Reference
usage: det [-h] [-u username] [-m address] [-v] command ...

Determined command-line client

positional arguments:
  command
    help                show help for this command
    auth                manage auth
    agent (a)           manage agents
    command (cmd)       manage commands
    checkpoint (c)      manage checkpoints
    deploy (d)          manage deployments
    experiment (e)      manage experiments
    job (j)             manage job
    master (m)          manage master
    model (m)           manage models
    notebook            manage notebooks
    oauth               manage OAuth
    preview-search      preview search
    resources (res)     query historical resource allocation
    shell               manage shells
    slot (s)            manage slots
    task                manage tasks (commands, experiments, notebooks, shells, tensorboards)
    template (tpl)      manage config templates
    tensorboard         manage TensorBoard instances
    trial (t)           manage trials
    user (u)            manage users
    version             show version information

optional arguments:
  -h, --help            show this help message and exit
  -u username, --user username
                        run as the given user (default: None)
  -m address, --master address
                        master address (default: localhost:8080)
  -v, --version         print CLI version and exit