Customize a Pod#

In a Determined cluster running on Kubernetes, tasks (e.g., experiments, notebooks) are executed by launching a Kubernetes job. These jobs launch one or more Kubernetes pods. You can customize these pods by providing custom pod specs. Common use cases include assigning pods to specific nodes, specifying additional volume mounts, and attaching permissions. Configuring pod specs is not required to use Determined on Kubernetes.

In this topic guide, we will cover:

  1. How Determined uses pod specs.

  2. The different ways to configure custom pod specs.

  3. Supported pod spec fields.

  4. How to configure default pod specs.

  5. How to configure per-task pod specs.

How Determined Uses Pod Specs#

All Determined tasks are launched as pods. Determined pods consist of an initContainer named determined-init-container and a container named determined-container, which executes the workload. When you provide a pod spec, Determined inserts the determined-init-container and determined-container into the provided pod spec. You may also configure some of the fields for the determined-container, as described below.
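
The outline below is a rough sketch of how a launched pod might be laid out when the provided pod spec contains one extra container. It is illustrative only and was not captured from a running cluster: the sidecar-container name and its image are assumptions, and the images, commands, and other fields that Determined fills in are omitted.

```yaml
apiVersion: v1
kind: Pod
spec:
  initContainers:
    # Inserted by Determined.
    - name: determined-init-container
  containers:
    # Inserted by Determined; executes the workload.
    - name: determined-container
    # Hypothetical sidecar taken from the user-provided pod spec.
    - name: sidecar-container
      image: alpine:latest
```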

Ways to Configure Pod Specs#

Determined provides two ways to configure pod specs. When Determined is installed, the system administrator can configure pod specs that are used by default for all GPU and CPU tasks. In addition, you can specify a custom pod spec for individual tasks (e.g., for an experiment by specifying environment.pod_spec in the experiment configuration). If a custom pod spec is specified for a task, it overrides the default pod spec (if any).

Supported Pod Spec Fields#

This section describes which fields can and cannot be configured when specifying custom pod specs.

Not Supported#

Determined does not support configuring the following fields:

  • Pod Name - Determined automatically assigns a name for every pod that is created.

  • Pod Namespace - Determined automatically sets the pod namespace based on the resource pool the task belongs to. The mapping between resource pools and namespaces can be configured in the resourcePools section of the Helm values.yaml.

  • Host Networking - This must be configured via the master configuration.

  • Restart Policy - This is always set to Never.

Supported#

As part of your pod spec, you can specify initContainers and containers. Additionally, you can configure the determined-container that executes the task (e.g., training) by setting the container name in the pod spec to determined-container. For the determined-container, Determined supports configuring the following fields:

  • Resource requests and limits (except GPU resources).

  • Volume mounts and volumes.

  • All securityContext fields within the pod spec of the determined-container container except for RunAsUser and RunAsGroup.

    For those fields, use det user link-with-agent-user instead.

    Example of configuring a Pachyderm notebook plugin to run in det notebook:

    environment:
      pod_spec:
        apiVersion: v1
        kind: Pod
        spec:
          containers:
            - name: determined-container
              securityContext:
                privileged: true
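
Resource requests and limits and volume mounts for the determined-container can be set in the same way. The following is a minimal sketch, assuming a per-task pod spec; the CPU and memory quantities, the volume name scratch, and the mount path /scratch are placeholder values chosen for illustration, and GPU resources are deliberately left out because they cannot be configured here.

```yaml
environment:
  pod_spec:
    apiVersion: v1
    kind: Pod
    spec:
      containers:
        - name: determined-container
          # CPU and memory only; GPU resources are not configurable here.
          resources:
            requests:
              cpu: "2"
              memory: 8Gi
            limits:
              memory: 16Gi
          volumeMounts:
            # Hypothetical scratch volume, defined under volumes below.
            - name: scratch
              mountPath: /scratch
      volumes:
        - name: scratch
          emptyDir: {}
```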
    

Default Pod Specs#

Default pod specs must be configured when installing or upgrading Determined. The default pod specs are configured in values.yaml of the Helm Chart Configuration Reference under taskContainerDefaults.cpuPodSpec and taskContainerDefaults.gpuPodSpec. The gpuPodSpec is applied to all tasks that use GPUs (e.g., experiments, notebooks). cpuPodSpec is applied to all tasks that only use CPUs (e.g., TensorBoards, CPU-only notebooks). Fields that are not specified will remain at their default Determined values.

Example of configuring default pod specs in values.yaml:

taskContainerDefaults:
  cpuPodSpec:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        customLabel: cpu-label
    spec:
      containers:
        # Will be applied to the container executing the task.
        - name: determined-container
          volumeMounts:
            - name: example-volume
              mountPath: /example-data
        # Custom sidecar container.
        - name: sidecar-container
          image: alpine:latest
      volumes:
        - name: example-volume
          hostPath:
            path: /data
  gpuPodSpec:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        customLabel: gpu-label
    spec:
      containers:
        - name: determined-container
          volumeMounts:
            - name: example-volume
              mountPath: /example-data
      volumes:
        - name: example-volume
          hostPath:
            path: /data

The default pod specs can also be configured at the resource pool level. Cluster administrators can define pools in terms of node selectors and/or node affinities here. GPU jobs submitted to the resource pool will have the pool's GPU pod spec applied. If a job is submitted to a resource pool that has a matching CPU or GPU pod spec, the top-level taskContainerDefaults.gpuPodSpec or taskContainerDefaults.cpuPodSpec will not be applied.

Example of configuring resource pool default pod spec in values.yaml:

resourcePools:
  - pool_name: prod_pool
    kubernetes_namespace: default
    task_container_defaults:
      gpu_pod_spec:
        apiVersion: v1
        kind: Pod
        spec:
          # Define an example node selector label.
          nodeSelector:
            kubernetes.io/hostname: foo
          # Define an example node affinity.
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                  - matchExpressions:
                      - key: topology.kubernetes.io/zone
                        operator: In
                        values:
                          - antarctica-west1
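
A pool-level default for CPU-only tasks would presumably follow the same pattern. The sketch below assumes that the field is named cpu_pod_spec, by symmetry with gpu_pod_spec above; the pool name cpu_pool and the node-type label are hypothetical.

```yaml
resourcePools:
  - pool_name: cpu_pool  # hypothetical pool name
    kubernetes_namespace: default
    task_container_defaults:
      # Assumed field name, mirroring gpu_pod_spec.
      cpu_pod_spec:
        apiVersion: v1
        kind: Pod
        spec:
          # Hypothetical label identifying CPU-only nodes.
          nodeSelector:
            node-type: cpu-only
```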

Per-task Pod Specs#

In addition to default pod specs, it is also possible to configure custom pod specs for individual tasks. Pod specs for individual tasks can be configured under the environment field in the experiment config (for experiments) or the task configuration (for other tasks).

Example of configuring a pod spec for an individual task:

environment:
  pod_spec:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        customLabel: task-specific-label
    spec:
      # Specify a pull secret for task container image.
      imagePullSecrets:
        - name: regcred
      # Specify a service account that allows writing checkpoints to S3 (for EKS).
      serviceAccountName: <checkpoint-storage-s3-bucket>
      # Specify tolerations for scheduling on tainted nodes.
      tolerations:
        - key: "tainted-nodegroup-name"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
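
For tasks other than experiments, the same environment.pod_spec block goes in the task configuration. The following is a minimal sketch of a notebook configuration, for instance one supplied when launching det notebook; the file name, the label value, and the node label are assumptions made for illustration.

```yaml
# notebook.yaml (hypothetical file name)
environment:
  pod_spec:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        customLabel: notebook-label  # hypothetical label value
    spec:
      # Schedule the notebook pod on nodes carrying this hypothetical label.
      nodeSelector:
        node-type: notebook
```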

When a custom pod spec is provided for a task, it is merged with the default pod spec (either resourcePools.task_container_defaults, or the top-level task_container_defaults if resourcePools.task_container_defaults is not specified) according to the Kubernetes strategic merge patch rules. Determined does not support setting the strategic merge patch strategy, so the approach in the section titled “Use strategic merge patch to update a Deployment using the retainKeys strategy” in the linked Kubernetes docs will not work.

Some fields in pod specs are merged by the values of items in lists. Volumes, for example, are merged by volume name. To remove a volume mount specified in the default task container, you need to override it with an empty volume mounted at the same path.

Example values.yaml:

resourcePools:
  - pool_name: prod_pool
    kubernetes_namespace: default
    task_container_defaults:
      gpu_pod_spec:
        apiVersion: v1
        kind: Pod
        spec:
          volumes:
            - name: secret-volume
              secret:
                secretName: prod-test-secret
          containers:
            - name: determined-container
              volumeMounts:
                - name: secret-volume
                  mountPath: /etc/secret-volume

Example expconf.yaml:

environment:
  pod_spec:
    apiVersion: v1
    kind: Pod
    spec:
      volumes:
        - name: empty-dir-override
          emptyDir:
            sizeLimit: 100Mi
      containers:
        - name: determined-container
          volumeMounts:
            - name: empty-dir-override
              mountPath: /etc/secret-volume
resources:
  resource_pool: prod_pool
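
Under these two files, the merged pod spec for the experiment would be expected to look roughly like the sketch below. This assumes that volumes are merged by name and that volume mounts are merged by mountPath, as in Kubernetes strategic merge patch; it is not output captured from a cluster, the order of list items may differ, and fields filled in by Determined are omitted.

```yaml
apiVersion: v1
kind: Pod
spec:
  volumes:
    # From the pool default: still defined, but no longer mounted.
    - name: secret-volume
      secret:
        secretName: prod-test-secret
    # Added by the experiment config.
    - name: empty-dir-override
      emptyDir:
        sizeLimit: 100Mi
  containers:
    - name: determined-container
      volumeMounts:
        # Same mountPath as the default mount, so it takes its place.
        - name: empty-dir-override
          mountPath: /etc/secret-volume
```

The secret is therefore hidden from the task container by the empty directory mounted over the same path, rather than by deleting the secret-volume entry itself.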

Custom CheckpointGC Pod Specs#

Determined also provides a way to configure pod specs for CheckpointGC tasks, using the taskContainerDefaults.checkpointGcPodSpec field in your values.yaml file. By default, CheckpointGC uses the experiment’s pod spec. If you provide a custom CheckpointGC pod spec, it overrides the experiment’s pod spec settings, so you can tailor the pod to the specific needs of garbage collection.

Example of configuring custom CheckpointGC pod specs in values.yaml:

taskContainerDefaults:
  checkpointGcPodSpec:
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        customLabel: checkpointgc-label
    spec:
      containers:
        - name: determined-container
          volumeMounts:
            - name: example-volume
              mountPath: /example-data
        - name: example-container
          image: alpine:latest
      volumes:
        - name: example-volume
          hostPath:
            path: /data