Deploy on Kubernetes

This document describes how Determined runs on Kubernetes. For instructions on installing Determined on Kubernetes, see the installation guide.

This guide covers:

  1. How Determined works on Kubernetes.

  2. Limitations of Determined on Kubernetes.

  3. Useful Helm and kubectl commands.

How Determined Works on Kubernetes

Installing Determined on Kubernetes deploys an instance of the Determined master and a Postgres database in the Kubernetes cluster.

[Figure: Determined AI system architecture diagram showing how the master node works on Kubernetes]

Once the master is running, you can launch experiments, notebooks, TensorBoards, commands, and shells. When new workloads are submitted to the Determined master, the master launches jobs and config maps on the Kubernetes cluster to execute those workloads. Users do not need to interact with Kubernetes directly after installation, as Determined handles all the necessary interaction with the Kubernetes cluster. Kubernetes creates and cleans up pods for all jobs requested by Determined.

Note

When running Determined on Kubernetes, a higher priority value means a higher priority (e.g., a priority 50 task will run before a priority 40 task). This is different from non-Kubernetes deployments, where lower priority values mean higher priority (e.g., a priority 40 task will run before a priority 50 task).
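To make the two orderings concrete, here is a minimal shell sketch that sorts a hypothetical two-task list in both directions. The task names are invented for illustration, and the sketch only shows the direction of the comparison; it is not how Determined's scheduler is implemented.

```shell
# Hypothetical tasks, one per line, as "<task name> <priority>".
TASKS='task-a 40
task-b 50'

# On Kubernetes, the task with the highest priority value runs first.
first_on_kubernetes() {
  printf '%s\n' "$TASKS" | sort -k2,2nr | head -n 1
}

# On non-Kubernetes deployments, the task with the lowest priority value runs first.
first_elsewhere() {
  printf '%s\n' "$TASKS" | sort -k2,2n | head -n 1
}

first_on_kubernetes   # prints: task-b 50
first_elsewhere       # prints: task-a 40
```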

Limitations on Kubernetes

Scheduling

By default, the Kubernetes scheduler does not support gang scheduling or preemption. This can be problematic for distributed deep learning workloads that require multiple pods to be scheduled before execution starts.

Determined includes built-in support for the lightweight coscheduling plugin, which extends the default Kubernetes scheduler to support gang scheduling. Determined also supports priority-based preemption scheduling. Neither feature is enabled by default. For more details and instructions on how to enable the coscheduling plugin, refer to Gang Scheduling and Priority Scheduling with Preemption.

Dynamic Agents

Determined cannot autoscale your cluster. However, equivalent functionality is available by using the Kubernetes Cluster Autoscaler, which is supported on GKE and EKS.

Pod Security

By default, Determined runs task containers as root. However, a Determined user can be associated with a Unix user and group, provided that the Unix user and group already exist. Tasks initiated by that Determined user then run as the linked Unix user rather than as root. For more information, see Run Tasks as Specific Agent Users.

Useful Helm and kubectl Commands

kubectl is a command-line tool for interacting with a Kubernetes cluster. Helm is used to install and upgrade Determined on Kubernetes. This section covers kubectl and helm commands that are useful when running Determined on Kubernetes.

For all the commands listed below, include -n <kubernetes namespace name> if running Determined in a non-default namespace.
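If Determined runs in a non-default namespace, small wrapper functions can save retyping the flag. The sketch below is one way to do that; the function names `kdet` and `hdet` and the namespace `determined-system` are hypothetical placeholders, not part of Determined.

```shell
# Hypothetical namespace; replace with the namespace Determined is installed in.
DET_NAMESPACE="determined-system"

# Run kubectl against the Determined namespace.
kdet() {
  kubectl -n "$DET_NAMESPACE" "$@"
}

# Run helm against the Determined namespace.
hdet() {
  helm -n "$DET_NAMESPACE" "$@"
}
```

With these defined, `hdet list` is equivalent to `helm list -n determined-system`, and `kdet get services` is equivalent to `kubectl get services -n determined-system`.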

List Installations of Determined

To list the current installations of Determined on the Kubernetes cluster:

# To list in the current namespace.
helm list

# To list in all namespaces.
helm list -A

It is recommended to have just one instance of Determined per Kubernetes cluster.

Get the IP Address of the Determined Master

To get the IP address and port of the Determined master:

# Get all services.
kubectl get services

# Get the master service. The exact name of the master service depends on
# the name given to your Helm deployment, which can be looked up by running
# `helm list`.
kubectl get service determined-master-service-<helm deployment name>
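The lookup above can be scripted. The following is a sketch under stated assumptions: the release name `my-determined` and the helper names are hypothetical, and `master_address` assumes the master is exposed through a LoadBalancer service that has already been assigned an external IP. Some cloud providers report a hostname instead of an IP, in which case the `ip` field is empty.

```shell
# Hypothetical Helm release name; look up the real one with `helm list`.
RELEASE="my-determined"

# Build the master service name from the Helm release name.
master_service_name() {
  printf 'determined-master-service-%s\n' "$1"
}

# Print "<external IP>:<port>" for the master service.
# Assumes a LoadBalancer service whose external IP has already been assigned.
master_address() {
  kubectl get service "$(master_service_name "$1")" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}:{.spec.ports[0].port}'
}
```

To use it, run `master_address "$RELEASE"`, adding `-n <kubernetes namespace name>` to the `kubectl` call if Determined is not in the default namespace.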

Check the Status of the Determined Master

Logs for the Determined master are available via the CLI and WebUI. The kubectl commands below are useful for diagnosing issues that arise during installation.

# Get all deployments.
kubectl get deployments

# Describe the current state of the Determined master deployment. The exact
# name of the master deployment depends on the name given to your Helm
# deployment, which can be looked up by running `helm list`.
kubectl describe deployment determined-master-deployment-<helm deployment name>

# Get all pods associated with the Determined master deployment. Note this
# will only include pods that are running the Determined master, not pods
# that are running tasks associated with Determined workloads.
kubectl get pods -l=app=determined-master-<helm deployment name>

# Get logs for the pod running the Determined master.
kubectl logs <determined-master-pod-name>
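These checks can be combined into one helper so that the pod name does not have to be copied by hand. This is only a sketch: the release name `my-determined`, the helper names, the 120-second timeout, and the 50-line log tail are all illustrative choices. It uses `kubectl rollout status` to wait for the deployment and selects the master pod by the same label used above.

```shell
# Hypothetical Helm release name; look up the real one with `helm list`.
RELEASE="my-determined"

# Build the label selector that matches the master pod for a release.
master_selector() {
  printf 'app=determined-master-%s\n' "$1"
}

# Wait for the master deployment to finish rolling out, then show recent logs.
# Logs are only printed if the rollout completes within the timeout.
check_master() {
  kubectl rollout status "deployment/determined-master-deployment-$1" --timeout=120s &&
    kubectl logs -l "$(master_selector "$1")" --tail=50
}
```

Run it as `check_master "$RELEASE"`. If the rollout does not complete, `kubectl describe deployment` as shown above usually gives more detail on why.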