Training APIs

You can train almost any deep learning model using the Determined Training APIs. The Training API guides describe how to take your existing model code and train it in Determined. Each API guide links to its corresponding API reference.

Core API

The Core API is a low-level, flexible API that lets you train models in any deep learning framework. With the Core API, you can plug in your existing training code. You then use an experiment configuration to tell Determined how to train the model, for example on multiple GPUs or as part of a hyperparameter search.
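To make "plug in your existing training code" more concrete, here is a minimal, framework-free sketch. It is not Determined's actual API. It shows an ordinary training loop whose only change is that it hands metrics to a reporting callback. In a real Core API script, Determined fills that role (for example, through `core_context.train.report_training_metrics(...)` inside `with det.core.init() as core_context:`). Check the Core API reference for exact signatures. The names `train`, `Reporter`, and `ListReporter` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical reporting hook. In a real Core API script, this role would be
# played by Determined's metric-reporting call (see the Core API reference).
Reporter = Callable[[int, Dict[str, float]], None]


def train(data: List[Tuple[float, float]], lr: float, epochs: int, report: Reporter) -> float:
    """Existing training loop: fit y = w * x with plain gradient descent."""
    w = 0.0
    steps_completed = 0
    loss = float("nan")
    for _ in range(epochs):
        for x, y in data:
            pred = w * x
            loss = (pred - y) ** 2
            grad = 2 * (pred - y) * x  # d(loss)/dw
            w -= lr * grad
            steps_completed += 1
        # The only addition to the original loop: hand metrics to the platform.
        report(steps_completed, {"loss": loss})
    return w


class ListReporter:
    """Stand-in reporter that records what would have been sent."""

    def __init__(self) -> None:
        self.records: List[Tuple[int, Dict[str, float]]] = []

    def __call__(self, steps_completed: int, metrics: Dict[str, float]) -> None:
        self.records.append((steps_completed, metrics))


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    reporter = ListReporter()
    w = train(data, lr=0.05, epochs=20, report=reporter)
    print(f"learned w = {w:.3f}, reports = {len(reporter.records)}")
```

Passing the reporter in as a parameter keeps the training logic independent of where metrics end up. This mirrors the low-level, "bring your own loop" spirit of the Core API.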

High-Level APIs

The Trial APIs offer higher-level integrations with popular deep learning frameworks. With the Trial APIs, you first convert your existing training code by subclassing a Trial class and implementing methods that define each component of training, such as the model architecture, data loaders, optimizer, learning rate scheduler, and callbacks. This is called the Trial definition. With the code structured this way, Determined can run the training loop itself and provide advanced training and model management capabilities.
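The inversion of control behind the Trial APIs can be sketched in plain Python. The `Trial` base class, `run_trial` loop, and `LinearTrial` subclass below are hypothetical stand-ins, not Determined classes. Determined's real Trial classes (such as `PyTorchTrial`) define their own method names and signatures, which are documented in the API reference. The point is the division of labor: the subclass defines the components, and the platform-owned loop decides when to call them.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List, Tuple

Batch = Tuple[float, float]


class Trial(ABC):
    """Hypothetical minimal base class: the user supplies each component."""

    @abstractmethod
    def build_training_data(self) -> Iterable[Batch]: ...

    @abstractmethod
    def train_batch(self, batch: Batch) -> Dict[str, float]: ...


def run_trial(trial: Trial, epochs: int) -> List[Dict[str, float]]:
    """The 'platform' owns the loop and only calls the user's methods."""
    history: List[Dict[str, float]] = []
    for _ in range(epochs):
        metrics: Dict[str, float] = {}
        for batch in trial.build_training_data():
            metrics = trial.train_batch(batch)
        history.append(metrics)  # last batch's metrics for this epoch
    return history


class LinearTrial(Trial):
    """User code: a one-weight model fit with gradient descent."""

    def __init__(self, lr: float) -> None:
        self.lr = lr
        self.w = 0.0  # model "architecture": a single weight

    def build_training_data(self) -> Iterable[Batch]:
        return [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

    def train_batch(self, batch: Batch) -> Dict[str, float]:
        x, y = batch
        err = self.w * x - y
        self.w -= self.lr * 2 * err * x  # optimizer step on d(err^2)/dw
        return {"loss": err ** 2}


if __name__ == "__main__":
    trial = LinearTrial(lr=0.05)
    history = run_trial(trial, epochs=20)
    print(f"learned w = {trial.w:.3f}, final loss = {history[-1]['loss']:.2e}")
```

Because the loop lives outside user code, the platform is free to add behavior around it without touching the Trial definition.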

Once you have converted your code, you can use an experiment configuration to tell Determined how to train the model, for example on multiple GPUs or as part of a hyperparameter search.
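As an illustration of what an experiment configuration expresses, the sketch below builds one as a Python dict and prints it as JSON. YAML 1.2 is a superset of JSON, so the output can be saved as a YAML config file. The field names (`resources.slots_per_trial`, `hyperparameters`, `searcher`) follow Determined's experiment-configuration schema as we understand it. The experiment name, entrypoint, metric name, and value ranges are hypothetical. The searcher section is deliberately abbreviated, so verify every field against the experiment configuration reference for your Determined version.

```python
import json
from typing import Any, Dict


def build_experiment_config(slots_per_trial: int, max_trials: int) -> Dict[str, Any]:
    """Assemble an illustrative (not authoritative) experiment configuration."""
    return {
        "name": "example-experiment",  # hypothetical experiment name
        # Hypothetical module:Class entrypoint for a Trial definition;
        # Core API scripts use a command such as "python3 train.py" instead.
        "entrypoint": "model_def:MyTrial",
        # Multi-GPU: how many slots (GPUs) each trial trains on.
        "resources": {"slots_per_trial": slots_per_trial},
        # Hyperparameter search space: a continuous range for the learning rate.
        "hyperparameters": {
            "learning_rate": {"type": "double", "minval": 1e-4, "maxval": 1e-1},
        },
        # Abbreviated searcher section; a real config needs more fields
        # (e.g. a training-length setting whose name depends on the version).
        "searcher": {
            "name": "random",
            "metric": "validation_loss",
            "smaller_is_better": True,
            "max_trials": max_trials,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_experiment_config(slots_per_trial=2, max_trials=8), indent=2))
```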

Looking for a Basic Tutorial?

If you’d like to review how to apply the Determined APIs to simple models, visit our Tutorials.

Prefer to use an Example Model?

If you’d like to build on an existing model that already runs on Determined, visit our Examples to see whether the model you want to train is already available.

AMD ROCm Support

Determined has experimental support for AMD GPUs through ROCm. It provides a prebuilt Docker image that includes ROCm 5.0, PyTorch 1.10, and TensorFlow 2.7:

  • determinedai/environments:rocm-5.0-pytorch-1.10-tf-2.7-rocm-0.26.4

Known limitations:

  • Only agent-based deployments are available; Kubernetes is not yet supported.

  • GPU profiling is not yet supported.
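Inside the ROCm image, one quick way to confirm that PyTorch is the ROCm build is to inspect `torch.version.hip`. It holds a version string on ROCm builds and is `None` otherwise. ROCm builds of PyTorch also expose AMD GPUs through the `torch.cuda` namespace. The sketch below assumes it runs inside the container. `rocm_status` is a hypothetical helper name, and it returns a result even when PyTorch is not installed.

```python
import importlib.util
from typing import Any, Dict


def rocm_status() -> Dict[str, Any]:
    """Report whether the installed PyTorch build targets ROCm (HIP)."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "hip_version": None, "gpu_available": False}
    import torch

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    return {
        "torch_installed": True,
        "hip_version": hip_version,
        # ROCm builds of PyTorch expose AMD GPUs through torch.cuda.
        "gpu_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    print(rocm_status())
```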