TensorFlow Keras Fashion MNIST Tutorial
This tutorial describes how to port an existing
tf.keras model to HPE Machine Learning
Development Environment. We will port a simple image classification model for the Fashion MNIST
dataset. This tutorial is based on the official TensorFlow Basic Image Classification Tutorial.
To use a TensorFlow model in HPE Machine Learning Development Environment, you need to port the model to HPE Machine Learning Development Environment’s API. For most models, this porting process is straightforward, and once the model has been ported, all of the features of HPE Machine Learning Development Environment will then be available: for example, you can do distributed training or hyperparameter search without changing your model code, and HPE Machine Learning Development Environment will store and visualize your model metrics automatically.
When training a
tf.keras model, HPE Machine Learning Development Environment provides a built-in
training loop that feeds batches of data into your model, performs backpropagation, and computes
training metrics. HPE Machine Learning Development Environment also handles evaluating your model on
the validation set, as well as other details like checkpointing, log management, and device
initialization. To plug your model code into the HPE Machine Learning Development Environment
training loop, you define methods to perform the following tasks:
build the model graph
load the training dataset
load the validation dataset
The HPE Machine Learning Development Environment training loop will then invoke these functions
automatically. These methods should be organized into a trial class, which is a user-defined
Python class that inherits from
determined.keras.TFKerasTrial. The following sections walk
through how to write your first trial class and then how to run a training job with HPE Machine
Learning Development Environment.
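To make this division of labor concrete, here is a minimal, framework-free sketch of how a training harness might call the three methods. `ToyTrial` and `run_toy_harness` are hypothetical names used only for illustration; the real training loop is supplied by the platform and does far more (backpropagation, metrics, checkpointing).

```python
class ToyTrial:
    """Hypothetical stand-in for a trial class; not the real TFKerasTrial."""

    def build_model(self):
        # A real trial would return a compiled tf.keras model here.
        return "model"

    def build_training_data_loader(self):
        # Toy training data as (features, labels).
        return [0.0, 0.5, 1.0], [0, 1, 1]

    def build_validation_data_loader(self):
        # Toy validation data as (features, labels).
        return [0.25, 0.75], [0, 1]


def run_toy_harness(trial):
    """Call the trial's methods and record the order of the calls."""
    calls = []
    model = trial.build_model()
    calls.append("build_model")
    train_x, train_y = trial.build_training_data_loader()
    calls.append("build_training_data_loader")
    val_x, val_y = trial.build_validation_data_loader()
    calls.append("build_validation_data_loader")
    # A real harness would now train `model` on the training data and
    # evaluate it on the validation data.
    return calls, len(train_x), len(val_x)


calls, n_train, n_val = run_toy_harness(ToyTrial())
print(calls, n_train, n_val)
```

The point of the sketch is only that your code supplies the pieces while the harness decides when to call them.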
The complete code for this tutorial can be downloaded here:
fashion_mnist_tf_keras.tgz. After downloading this file, open a terminal window,
extract the file, and cd into the fashion_mnist_tf_keras directory:
    tar xzvf fashion_mnist_tf_keras.tgz
    cd fashion_mnist_tf_keras
We suggest you follow along with the code as you read through this tutorial.
To follow this tutorial, you need access to an HPE Machine Learning Development Environment cluster. If you have not yet installed HPE Machine Learning Development Environment, refer to Install and Set Up Determined.
The HPE Machine Learning Development Environment CLI should be installed on your local machine. For installation instructions, see here. After installing the CLI, configure it to connect to your HPE Machine Learning Development Environment cluster by setting the DET_MASTER environment variable to the hostname or IP address where HPE Machine Learning Development Environment is running.
Build a Trial Class
Here is what the skeleton of our trial class looks like:
    import keras
    from determined.keras import TFKerasTrial, TFKerasTrialContext


    class FashionMNISTTrial(TFKerasTrial):
        def __init__(self, context: TFKerasTrialContext):
            # Initialize the trial class.
            pass

        def build_model(self):
            # Define and compile model graph.
            pass

        def build_training_data_loader(self):
            # Create the training data loader. This should return a keras.Sequence,
            # a tf.data.Dataset, or NumPy arrays.
            pass

        def build_validation_data_loader(self):
            # Create the validation data loader. This should return a keras.Sequence,
            # a tf.data.Dataset, or NumPy arrays.
            pass
We now discuss how to implement each of these methods in more detail.
As with any Python class, the __init__ method is invoked to construct our trial class. HPE
Machine Learning Development Environment passes this method a single parameter, an instance of
TrialContext (a TFKerasTrialContext for tf.keras trials). The trial context contains information about the trial, such as
the values of the hyperparameters to use for training. For the time being, we don’t need to access
any properties from the trial context object, but we assign it to an instance variable so that we
can use it later:
    def __init__(self, context: TFKerasTrialContext):
        # Store trial context for later use.
        self.context = context
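Storing the context pays off as soon as a method needs a hyperparameter. The sketch below uses a hypothetical `StubContext` in place of the real trial context so that it runs without a cluster; `SketchTrial` is likewise illustrative, and only the `get_hparam` call mirrors what the tutorial code uses.

```python
class StubContext:
    """Hypothetical stand-in for the trial context, for illustration only."""

    def __init__(self, hparams):
        self._hparams = dict(hparams)

    def get_hparam(self, name):
        # Look up a hyperparameter value by name.
        return self._hparams[name]


class SketchTrial:
    """Not a real trial class; shows only the store-then-use pattern."""

    def __init__(self, context):
        # Store the context so that other methods can use it later.
        self.context = context

    def dense_units(self):
        # Read a hyperparameter through the stored context.
        return self.context.get_hparam("dense1")


trial = SketchTrial(StubContext({"dense1": 128, "global_batch_size": 32}))
print(trial.dense_units())
```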
Build the Model
The build_model() method returns a compiled tf.keras.Model
object. The Fashion MNIST model code uses the Keras Sequential API and we can continue to use that
API in our implementation of build_model. The only differences are that the model needs to
be wrapped by calling self.context.wrap_model() before it is compiled, and the optimizer needs to
be wrapped by calling self.context.wrap_optimizer():
    def build_model(self):
        model = keras.Sequential(
            [
                keras.layers.Flatten(input_shape=(28, 28)),
                keras.layers.Dense(self.context.get_hparam("dense1"), activation="relu"),
                keras.layers.Dense(10),
            ]
        )

        # Wrap the model.
        model = self.context.wrap_model(model)

        # Create and wrap optimizer.
        optimizer = tf.keras.optimizers.Adam()
        optimizer = self.context.wrap_optimizer(optimizer)

        model.compile(
            optimizer=optimizer,
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
            metrics=[tf.keras.metrics.SparseCategoricalAccuracy(name="accuracy")],
        )
        return model
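As a quick sanity check on this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes `dense1` is 128, the value set in this tutorial's configuration file; `dense_params` is a helper introduced here for illustration.

```python
def dense_params(inputs, units):
    """Weights plus biases for one fully connected layer."""
    return inputs * units + units


flattened = 28 * 28  # Flatten turns each 28x28 image into 784 values
dense1 = 128         # assumed hyperparameter value

hidden = dense_params(flattened, dense1)  # first Dense layer
output = dense_params(dense1, 10)         # final Dense layer, one unit per class
total = hidden + output

print(hidden, output, total)
```

Changing `dense1` changes the model size almost linearly, because the first dense layer dominates the parameter count.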
The last two methods we need to define are build_training_data_loader() and
build_validation_data_loader(). HPE Machine Learning
Development Environment uses these methods to load the training and validation datasets, respectively.
HPE Machine Learning Development Environment supports three ways of loading data into a
model: as a tf.keras.utils.Sequence, as a tf.data.Dataset, or as a pair of NumPy arrays.
Because the dataset is small, the Fashion MNIST model represents the data using NumPy arrays.
    def build_training_data_loader(self):
        train_images, train_labels = data.load_training_data()
        train_images = train_images / 255.0

        return train_images, train_labels
The implementation of
build_validation_data_loader is similar:
    def build_validation_data_loader(self):
        test_images, test_labels = data.load_validation_data()
        test_images = test_images / 255.0

        return test_images, test_labels
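Dividing by 255.0 rescales 8-bit pixel values to the range [0, 1]. A minimal sketch with synthetic data follows; the random array merely stands in for the real images, which the tutorial loads through its `data` module, and it assumes the images are stored as unsigned 8-bit integers.

```python
import numpy as np

# Synthetic stand-in for a batch of four 28x28 grayscale images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# True division promotes the uint8 array to floating point.
scaled = images / 255.0

print(images.dtype, scaled.dtype, scaled.shape)
```

The same scaling must be applied to both the training and validation data so that the model sees inputs on the same scale in both phases.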
Train the Model
Now that we have ported our model code to the trial API, we can use HPE Machine Learning Development Environment to train a single instance of the model or to do a hyperparameter search. In HPE Machine Learning Development Environment, a trial is a training task that consists of a dataset, a deep learning model, and values for all of the model’s hyperparameters. An experiment is a collection of one or more trials: an experiment can either train a single model (with a single trial), or it can perform a search over a user-defined hyperparameter space.
To create an experiment, we start by writing a configuration file which defines the kind of experiment we want to run. In this case, we want to train a single model for five epochs, using fixed values for the model’s hyperparameters:
    name: fashion_mnist_keras_const
    hyperparameters:
      global_batch_size: 32
      dense1: 128
    records_per_epoch: 50000
    searcher:
      name: single
      metric: val_accuracy
      max_length:
        epochs: 5
    entrypoint: model_def:FashionMNISTTrial
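The numbers in this configuration imply a rough training budget. The arithmetic below is only a back-of-the-envelope sketch: how the platform handles a partial final batch is not covered here, so the batch counts should be read as approximate.

```python
records_per_epoch = 50000
global_batch_size = 32
epochs = 5

total_records = records_per_epoch * epochs

# Whole batches only; leftover records are ignored in this estimate.
batches_per_epoch = records_per_epoch // global_batch_size
total_batches = total_records // global_batch_size

print(total_records, batches_per_epoch, total_batches)
```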
For this model, we have chosen two hyperparameters: the size of the Dense layer and the batch
size. Training the model for five epochs should reach about 85% accuracy on the validation set,
which matches the original tf.keras model.
The entrypoint specifies the name of the trial class to use. This is useful if the model code
contains more than one trial class. In this case, we use an entrypoint of
model_def:FashionMNISTTrial because our trial class is named FashionMNISTTrial and it is
defined in a Python file named model_def.py.
For more information on experiment configuration, see the experiment configuration reference.
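An entrypoint of the form `module:ClassName` can be read as "import this module, then look up this class". The sketch below shows one way such a string could be resolved; `resolve_entrypoint` is a hypothetical helper, not the platform's actual loader, and the demonstration uses a standard-library class so that it runs anywhere.

```python
import importlib


def resolve_entrypoint(entrypoint):
    """Split 'module:ClassName' and return the named attribute."""
    module_name, sep, class_name = entrypoint.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {entrypoint!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# With the tutorial's files on the import path, the string
# "model_def:FashionMNISTTrial" could be resolved the same way.
cls = resolve_entrypoint("collections:OrderedDict")
print(cls.__name__)
```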
Run an Experiment
The HPE Machine Learning Development Environment CLI can be used to create a new experiment, which will immediately start running on the cluster. To do this, we run:
det experiment create const.yaml .
Here, the first argument (
const.yaml) is the name of the experiment configuration file and the
second argument (
.) is the location of the directory that contains our model definition files.
You may need to configure the CLI with the network address where the HPE Machine Learning
Development Environment master is running, via the -m flag or the DET_MASTER environment variable.
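If you script experiment submission, the command line can be assembled programmatically. This is a sketch under assumptions: `build_create_command` is a hypothetical helper, the placement of `-m` before the subcommand is assumed rather than taken from this tutorial, and the master address shown is a placeholder.

```python
def build_create_command(config="const.yaml", context_dir=".", master=None):
    """Return the argument list for creating an experiment."""
    cmd = ["det"]
    if master:
        # Explicit address: pass it with the -m flag (placement assumed).
        cmd += ["-m", master]
    cmd += ["experiment", "create", config, context_dir]
    return cmd


# When DET_MASTER is set in the environment, no -m flag is needed.
print(build_create_command())

# Otherwise, pass a (placeholder) address explicitly.
print(build_create_command(master="203.0.113.10"))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where the CLI is installed.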
Once the experiment is started, you will see a notification:
    Preparing files (../fashion_mnist_tf_keras) to send to master... 2.5KB and 4 files
    Created experiment xxx
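The experiment ID in this notification is what you later use to find the experiment. A small sketch for pulling it out of captured CLI output follows; it assumes the ID is numeric and that the line has the "Created experiment <id>" shape shown above. `parse_experiment_id` is a hypothetical helper, and the ID 42 in the sample is made up.

```python
import re


def parse_experiment_id(output):
    """Return the experiment ID from CLI output, or None if absent."""
    match = re.search(r"Created experiment (\d+)", output)
    return int(match.group(1)) if match else None


# Sample output with an invented ID, for illustration only.
sample = (
    "Preparing files (../fashion_mnist_tf_keras) to send to master... "
    "2.5KB and 4 files\n"
    "Created experiment 42\n"
)
print(parse_experiment_id(sample))
```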
Evaluate the Model
Model evaluation is done automatically for you by HPE Machine Learning Development Environment. To access information on both training and validation performance, simply go to the WebUI by entering the address of the HPE Machine Learning Development Environment master in your web browser.
Once you are on the HPE Machine Learning Development Environment landing page, you can find your experiment either via the experiment ID (xxx) or via its description.