Python SDK Client Module Reference
The client module exposes many of the same capabilities as the det CLI tool directly to Python code with an object-oriented interface.
Client
As a simple example, let’s walk through the most basic workflow for creating an experiment, waiting for it to complete, and finding the top-performing checkpoint.
The first step is to import the client module and possibly to call login():
from determined.experimental import client
# If you have called `det user login`, environment variables will have been set such that
# logging in with `login` is unnecessary:
# client.login(master=..., user=..., password=...)
The next step is to call create_experiment():
# config can be a path to a config file or a python dict of the config.
exp = client.create_experiment(config="my_config.yaml", model_dir=".")
print(f"started experiment {exp.id}")
The returned object will be an Experiment, which has methods for controlling the lifetime of the experiment running on the cluster.
In this example, we will just wait for the experiment to complete.
exit_status = exp.wait()
print(f"experiment completed with status {exit_status}")
Now that the experiment has completed, you can grab the top-performing checkpoint from training:
best_checkpoint = exp.list_checkpoints()[0]
print(f"best checkpoint was {best_checkpoint.uuid}")
See Checkpoints for more ideas on what to do next.
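The steps above can be combined into one helper. This is a minimal sketch, not part of the SDK: `run_and_report` is a hypothetical name, and `client_module` is assumed to be `determined.experimental.client` or any object that offers `create_experiment()` as documented on this page. It also guards against an empty checkpoint list, which the walkthrough does not cover.

```python
from typing import Any, Optional, Tuple


def run_and_report(client_module: Any, config: Any, model_dir: str) -> Tuple[Any, Optional[str]]:
    """Create an experiment, wait for it, and return (exit status, first checkpoint UUID).

    `client_module` is assumed to expose create_experiment() as documented here.
    """
    exp = client_module.create_experiment(config=config, model_dir=model_dir)
    exit_status = exp.wait()
    checkpoints = exp.list_checkpoints()
    # Indexing [0] on an empty list would raise IndexError, so check first.
    best_uuid = checkpoints[0].uuid if checkpoints else None
    return exit_status, best_uuid
```

With a configured environment, this could be called as `run_and_report(client, "my_config.yaml", ".")` after `from determined.experimental import client`.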
- determined.experimental.client.login(master: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, cert_path: Optional[str] = None, cert_name: Optional[str] = None, noverify: bool = False) None #
login() will configure the default Determined() singleton used by all of the other functions in the client module.
It is often unnecessary to call login(). If you have configured your environment so that the Determined CLI works without any extra arguments or environment variables, you should not have to call login() at all.
If you do need to call login(), it must be called before calling any other functions from this module; otherwise it will fail.
If you have reason to connect to multiple masters, you should use explicit Determined objects instead. Each explicit Determined object accepts the same parameters as login() and offers the same functions as this module.
Note
Try to avoid having your password in your Python code. If you are running on your local machine, you should always be able to use det user login on the CLI, and login() will not need either a user or a password. If you have run det user login with multiple users (and you have not run det user logout), then you should be able to run login(user=...) for any of those users without putting your password in your code.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.
user (string, optional) – The Determined username used for authentication. (default: determined)
password (string, optional) – The password associated with the user.
cert_path (string, optional) – A path to a custom PEM-encoded certificate, against which to validate the master. (default: None)
cert_name (string, optional) – The name of the master hostname to use during certificate validation. Normally this is taken from the master URL, but there may be cases where the master is exposed on multiple networks and this value needs to be overridden. (default: None)
noverify (boolean, optional) – Disable all TLS verification entirely. (default: False)
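The lookup order described for the master parameter can be illustrated with a small standalone function. This is only a sketch of the documented order (explicit argument, then DET_MASTER, then DET_MASTER_ADDR). `resolve_master` is a hypothetical helper, not an SDK function, and the library's real resolution may apply further defaults.

```python
import os
from typing import Mapping, Optional


def resolve_master(master: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the explicit master URL, else DET_MASTER, else DET_MASTER_ADDR."""
    if env is None:
        env = os.environ
    if master is not None:
        return master
    # Environment variables are checked in the documented order.
    for name in ("DET_MASTER", "DET_MASTER_ADDR"):
        value = env.get(name)
        if value:
            return value
    return None
```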
- determined.experimental.client.create_experiment(config: Union[str, pathlib.Path, Dict], model_dir: Optional[str] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None, project_id: Optional[int] = None, template: Optional[str] = None) determined.common.experimental.experiment.Experiment #
Create an experiment with config parameters and model directory.
- Parameters
config – Experiment config filename (.yaml) or a dict.
model_dir – Directory containing model definition.
includes – Additional files or directories to include in the model definition.
project_id – The id of the project this experiment should belong to.
template – The name of the template for the experiment. See Configuration Templates for more details.
- Returns
An Experiment of the created experiment.
- determined.experimental.client.get_experiment(experiment_id: int) determined.common.experimental.experiment.Experiment #
Get the Experiment representing the experiment with the provided experiment ID.
- Parameters
experiment_id (int) – The experiment ID.
- Returns
The fetched Experiment.
- determined.experimental.client.list_experiments(sort_by: Optional[determined.common.experimental.experiment.ExperimentSortBy] = None, order_by: Optional[determined.common.experimental._util.OrderBy] = None, experiment_ids: Optional[List[int]] = None, labels: Optional[List[str]] = None, users: Optional[List[str]] = None, states: Optional[List[determined.common.experimental.experiment.ExperimentState]] = None, name: Optional[str] = None, project_id: Optional[int] = None) List[determined.common.experimental.experiment.Experiment] #
Get a list of experiments (Experiment).
- Parameters
sort_by – Which field to sort by. See ExperimentSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
name – If this parameter is set, experiments will be filtered to only include those with names matching this parameter.
experiment_ids – Only return experiments with these IDs.
labels – Only return experiments with a label in this list.
users – Only return experiments belonging to these users. Defaults to all users.
states – Only return experiments that are in these states.
project_id – Only return experiments associated with this project ID.
- Returns
A list of experiments.
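As an illustration of working with the returned list, the sketch below groups experiment IDs by state. It relies only on the `id` and `state` attributes documented for Experiment. `group_ids_by_state` is a hypothetical helper name, and the state is converted with `str()` because its exact type is not assumed here.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_ids_by_state(experiments: Iterable[Any]) -> Dict[str, List[int]]:
    """Map each state (as a string) to the IDs of the experiments in that state."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for exp in experiments:
        grouped[str(exp.state)].append(exp.id)
    return dict(grouped)
```

A call might look like `group_ids_by_state(client.list_experiments(labels=["baseline"]))`, where the label value is a placeholder.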
- determined.experimental.client.create_user(username: str, admin: bool, password: Optional[str] = None, remote: bool = False) determined.common.experimental.user.User #
Creates a user.
The user’s credentials may be managed by a remote service (Enterprise edition only), in which case the remote argument should be set to true, and then SSO should be configured for the user. A remote user has no password and cannot log in except via SSO. Otherwise, a password must be set that meets complexity requirements.
- The complexity requirements are:
Must be at least 8 characters long.
Must contain at least one upper-case letter.
Must contain at least one lower-case letter.
Must contain at least one number.
- Parameters
username – Username of the user.
admin – Indicates whether the user is an admin.
password – Password of the user.
remote – Indicates whether the user is managed by a remote service.
- Returns
A User of the created user.
- Raises
ValueError – An error describing why the password does not meet complexity requirements.
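The complexity requirements listed above can be checked locally before calling create_user(). This is a minimal sketch of those four rules only. `password_problems` is a hypothetical helper, and the server's own validation (for example, how it classifies non-ASCII characters) may differ.

```python
from typing import List


def password_problems(password: str) -> List[str]:
    """Return the complexity requirements that `password` fails (empty if none)."""
    problems = []
    if len(password) < 8:
        problems.append("must be at least 8 characters long")
    if not any(c.isupper() for c in password):
        problems.append("must contain at least one upper-case letter")
    if not any(c.islower() for c in password):
        problems.append("must contain at least one lower-case letter")
    if not any(c.isdigit() for c in password):
        problems.append("must contain at least one number")
    return problems
```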
- determined.experimental.client.get_user_by_id(user_id: int) determined.common.experimental.user.User #
Get the User with the provided user id.
- determined.experimental.client.get_user_by_name(user_name: str) determined.common.experimental.user.User #
Get the User with the provided username.
- determined.experimental.client.get_session_username() str #
Get the username of the currently signed in user.
- determined.experimental.client.whoami() determined.common.experimental.user.User #
Get the current User.
- determined.experimental.client.logout() None #
Log out of the current session.
- determined.experimental.client.list_users(active: Optional[bool] = None) List[determined.common.experimental.user.User] #
Get a list of all Users.
- Parameters
active – If this parameter is set to True, filter for active users only. When False, filter for inactive users. Return all users otherwise.
- Returns
A list of User objects.
- determined.experimental.client.get_trial(trial_id: int) determined.common.experimental.trial.Trial #
Get the Trial representing the trial with the provided ID.
- Parameters
trial_id – The trial ID.
- Returns
The fetched Trial.
- determined.experimental.client.get_checkpoint(uuid: str) determined.common.experimental.checkpoint._checkpoint.Checkpoint #
Get the Checkpoint representing the checkpoint with the provided UUID.
- Parameters
uuid – The checkpoint UUID.
- Returns
The fetched Checkpoint.
- determined.experimental.client.get_workspace(name: str) determined.common.experimental.workspace.Workspace #
Get the Workspace with the provided name.
- Parameters
name – The workspace name.
- Returns
The fetched Workspace.
- determined.experimental.client.list_workspaces() List[determined.common.experimental.workspace.Workspace] #
Get the list of all Workspaces.
- determined.experimental.client.create_workspace(name: str) determined.common.experimental.workspace.Workspace #
Create a new workspace with the provided name.
- Parameters
name – The name of the workspace to create.
- Returns
The newly-created Workspace.
- Raises
errors.APIException – If a workspace with the provided name already exists.
- determined.experimental.client.delete_workspace(name: str) None #
Delete the workspace with the provided name.
- Parameters
name – The name of the workspace to delete.
- Raises
errors.NotFoundException – If no workspace with the provided name exists.
- determined.experimental.client.create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None) determined.common.experimental.model.Model #
Add a model to the model registry.
- Parameters
name – The name of the model. This name must be unique.
description – A description of the model.
metadata – Dictionary of metadata to add to the model.
- Returns
A Model of the created model.
- determined.experimental.client.get_model(identifier: Union[str, int]) determined.common.experimental.model.Model #
Get the model from the model registry with the provided identifier, which is either a string-type name or an integer-type model ID.
If no corresponding model is found in the registry, an exception is raised.
- Parameters
identifier – The unique name or numeric ID of the model.
- Returns
The fetched Model.
- determined.experimental.client.get_model_by_id(model_id: int) determined.common.experimental.model.Model #
Get the model from the model registry with the provided numeric id.
If no model with that id is found in the registry, an exception is raised.
- Parameters
model_id – The unique id of the model.
- Returns
The fetched Model.
Warning
client.get_model_by_id() has been deprecated and will be removed in a future version. Please call client.get_model() with either a string-type name or an integer-type model ID.
- determined.experimental.client.get_models(sort_by: determined.common.experimental.model.ModelSortBy = ModelSortBy.NAME, order_by: determined.common.experimental._util.OrderBy = OrderBy.ASCENDING, name: str = '', description: str = '') List[determined.common.experimental.model.Model] #
Get a list of all models in the model registry.
- Parameters
sort_by – Which field to sort by. See ModelSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
name – If this parameter is set, models will be filtered to only include models with names matching this parameter.
description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.
- Returns
A list of Model objects matching any passed filters.
- determined.experimental.client.list_models(sort_by: determined.common.experimental.model.ModelSortBy = ModelSortBy.NAME, order_by: determined.common.experimental._util.OrderBy = OrderBy.ASCENDING, name: Optional[str] = None, description: Optional[str] = None, model_id: Optional[int] = None, workspace_names: Optional[List[str]] = None, workspace_ids: Optional[List[int]] = None) List[determined.common.experimental.model.Model] #
Get a list of all models in the model registry.
- Parameters
sort_by – Which field to sort by. See ModelSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
name – If this parameter is set, models will be filtered to only include models with names matching this parameter.
description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.
model_id – If this parameter is set, models will be filtered to only include the model with this unique numeric id.
workspace_names – Only return models with workspace names in this list.
workspace_ids – Only return models with workspace IDs in this list.
- Returns
A list of Model objects matching any passed filters.
- determined.experimental.client.get_model_labels() List[str] #
Get a list of labels used on any models in the model registry.
- Returns
A list of model labels sorted from most-popular to least-popular.
- determined.experimental.client.list_oauth_clients() Sequence[determined.common.experimental.oauth2_scim_client.Oauth2ScimClient] #
Get a list of Oauth2 Scim clients.
- determined.experimental.client.add_oauth_client(domain: str, name: str) determined.common.experimental.oauth2_scim_client.Oauth2ScimClient #
Add an oauth client.
- Parameters
domain – Domain of OAuth client.
name – Name of OAuth client.
- Returns
An Oauth2ScimClient of the created client.
- determined.experimental.client.remove_oauth_client(client_id: str) None #
Remove an oauth client.
- determined.experimental.client.iter_trials_metrics(trial_ids: List[int], group: str) Iterable[determined.common.experimental.metrics.TrialMetrics] #
Iterate over the metrics for one or more trials.
This function opens up a persistent connection to the Determined master to receive trial metrics. For as long as the connection remains open, the generator it returns yields the TrialMetrics it receives.
- Parameters
trial_ids – The trial IDs to iterate over metrics for.
group – The metric group to iterate over. Common values are “validation” and “training”, but group can be any value passed to master when reporting metrics during training (usually via a context’s report_metrics).
- Returns
A generator of TrialMetrics objects.
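Because the returned generator may keep yielding for as long as the connection stays open, a caller may want to bound how much it consumes. The sketch below uses `itertools.islice` for that. `take_metrics` is a hypothetical helper name, and it makes no assumption about the fields of TrialMetrics.

```python
from itertools import islice
from typing import Any, Iterable, List


def take_metrics(stream: Iterable[Any], n: int) -> List[Any]:
    """Collect at most `n` items from a metrics generator, then stop consuming it."""
    return list(islice(stream, n))
```

For example, `take_metrics(client.iter_trials_metrics([trial_id], group="validation"), 10)` would read up to ten reports, with `trial_id` supplied by the caller.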
OrderBy
- class determined.experimental.client.OrderBy(value)#
Specifies whether a sorted list of objects should be in ascending or descending order.
Checkpoint
- class determined.experimental.client.Checkpoint(session: determined.common.api._session.Session, uuid: str)#
A class representing a Checkpoint instance of a trained model.
A Checkpoint object is usually obtained from determined.experimental.client.get_checkpoint(). This class provides helper functionality for downloading checkpoints to local storage and loading checkpoints into memory.
The Trial class contains methods that return instances of this class.
- session#
HTTP request session.
- uuid#
UUID of checkpoint in storage.
- task_id#
(Mutable, Optional[str]) ID of associated task.
- allocation_id#
(Mutable, Optional[str]) ID of associated allocation.
- report_time#
(Mutable, Optional[str]) Timestamp checkpoint reported.
- resources#
(Mutable, Optional[Dict]) Dictionary of file paths to file sizes in bytes of all files in the checkpoint.
- metadata#
(Mutable, Optional[Dict]) User-defined metadata associated with the checkpoint.
- state#
(Mutable, Optional[CheckpointState]) State of the checkpoint.
- training#
(Mutable, Optional[CheckpointTrainingMetadata]) Training-related metadata for the checkpoint.
Note
All attributes are cached by default.
Some attributes are mutable and may be changed by methods that update these values, either automatically (e.g. add_metadata()) or explicitly with reload().
- download(path: Optional[str] = None, mode: determined.common.experimental.checkpoint._checkpoint.DownloadMode = DownloadMode.AUTO) str #
Download checkpoint to local storage.
- Parameters
path (string, optional) – Top level directory to place the checkpoint under. If this parameter is not set, the checkpoint will be downloaded to checkpoints/<checkpoint_uuid> relative to the current working directory.
mode (DownloadMode, optional) – Governs how a checkpoint is downloaded. Defaults to AUTO.
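The default location described for the path parameter can be written out as a small function. This is a sketch of the documented rule only. `default_download_dir` is a hypothetical helper, and it assumes that an explicit path is used as given.

```python
import os
from typing import Optional


def default_download_dir(checkpoint_uuid: str, path: Optional[str] = None) -> str:
    """Return `path` if given, else checkpoints/<checkpoint_uuid> under the current directory."""
    if path is not None:
        return path
    return os.path.join(os.getcwd(), "checkpoints", checkpoint_uuid)
```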
- write_metadata_file(path: str) None #
Write a file with this Checkpoint’s metadata inside of it.
This is normally executed as part of Checkpoint.download(). However, in the special case where you are accessing the checkpoint files directly (not via Checkpoint.download) you may use this method directly to obtain the latest metadata.
- add_metadata(metadata: Dict[str, Any]) None #
Adds user-defined metadata to the checkpoint. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the checkpoint metadata, the corresponding dictionary entries in the checkpoint are replaced by the passed-in dictionary values.
Warning: this metadata change is not propagated to the checkpoint storage.
- Parameters
metadata (dict) – Dictionary of metadata to add to the checkpoint.
- remove_metadata(keys: List[str]) None #
Removes user-defined metadata from the checkpoint. Any top-level keys that appear in the keys list are removed from the checkpoint.
Warning: this metadata change is not propagated to the checkpoint storage.
- Parameters
keys (List[string]) – Top-level keys to remove from the checkpoint metadata.
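The effect of add_metadata() and remove_metadata() on the metadata dictionary can be modeled with plain dictionaries. This is a local sketch of the documented semantics, not the SDK's implementation. `merged_metadata` and `pruned_metadata` are hypothetical names. Note that replacement happens at the top level, so a nested dictionary is replaced whole rather than merged.

```python
from typing import Any, Dict, List


def merged_metadata(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Top-level keys in `new` replace those in `current`; other keys are kept."""
    return {**current, **new}


def pruned_metadata(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Drop the listed top-level keys; listed keys that are absent have no effect."""
    return {k: v for k, v in current.items() if k not in keys}
```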
- delete() None #
Notifies the master of a checkpoint deletion request, which will be handled asynchronously. Master will delete checkpoint and all associated data in the checkpoint storage.
- remove_files(globs: List[str]) None #
Removes any files from the checkpoint in checkpoint storage that match one or more of the provided globs. The checkpoint resources and state will be updated in master asynchronously to reflect checkpoint storage. If globs is the empty list then no files will be deleted and the resources and state will only be refreshed in master.
- Parameters
globs (List[string]) – Globs to match checkpoint files against.
- get_metrics(group: Optional[str] = None) Iterable[determined.common.experimental.metrics.TrialMetrics] #
Gets all metrics for a given metric group associated with this checkpoint. The checkpoint can be originally associated by calling core_context.experimental.report_task_using_checkpoint(<CHECKPOINT>) from within a task.
- Parameters
group (str, optional) – Group name for the metrics (example: “training”, “validation”). All metrics will be returned when querying by “”.
- get_pachyderm_commit() str #
Return the Pachyderm commit ID associated with this checkpoint.
- reload() None #
Explicit refresh of cached properties.
Determined
- class determined.experimental.client.Determined(master: Optional[str] = None, user: Optional[str] = None, password: Optional[str] = None, cert_path: Optional[str] = None, cert_name: Optional[str] = None, noverify: bool = False)#
Determined gives access to Determined API objects.
- Parameters
master (string, optional) – The URL of the Determined master. If this argument is not specified, the environment variables DET_MASTER and DET_MASTER_ADDR will be checked for the master URL in that order.
user (string, optional) – The Determined username used for authentication. (default: determined)
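When several masters are involved, explicit Determined objects can be kept in a mapping and queried in turn. The following sketch assumes each value offers list_experiments() as documented for this class. `count_experiments` and the mapping labels are hypothetical.

```python
from typing import Any, Dict, Mapping


def count_experiments(clients: Mapping[str, Any]) -> Dict[str, int]:
    """Return the number of experiments visible through each Determined object."""
    return {label: len(d.list_experiments()) for label, d in clients.items()}
```

A call might look like `count_experiments({"staging": client.Determined(master=staging_url), "prod": client.Determined(master=prod_url)})`, where both URLs are placeholders supplied by the caller.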
- create_user(username: str, admin: bool, password: Optional[str] = None, remote: bool = False) determined.common.experimental.user.User #
Creates a user.
The user’s credentials may be managed by a remote service (Enterprise edition only), in which case the remote argument should be set to true, and then SSO should be configured for the user. A remote user has no password and cannot log in except via SSO. Otherwise, a password must be set that meets complexity requirements.
- The complexity requirements are:
Must be at least 8 characters long.
Must contain at least one upper-case letter.
Must contain at least one lower-case letter.
Must contain at least one number.
- Parameters
username – Username of the user.
admin – Indicates whether the user is an admin.
password – Password of the user.
remote – Indicates whether the user is managed by a remote service.
- Returns
A User of the created user.
- Raises
ValueError – An error describing why the password does not meet complexity requirements.
- logout() None #
Log out of the current session.
This results in dropping any cached credentials and sending a request to master to invalidate the session’s token.
- create_experiment(config: Union[str, pathlib.Path, Dict], model_dir: Optional[Union[str, pathlib.Path]] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None, project_id: Optional[int] = None, template: Optional[str] = None) determined.common.experimental.experiment.Experiment #
Create an experiment with config parameters and model directory. The function returns an Experiment.
- Parameters
config (string, pathlib.Path, dictionary) – Experiment config filename (.yaml) or a dict.
model_dir (string, optional) – Directory containing model definition. (default: None)
includes (Iterable[Union[str, pathlib.Path]], optional) – Additional files or directories to include in the model definition. (default: None)
project_id (int, optional) – The id of the project this experiment should belong to. (default: None)
template (string, optional) – The name of the template for the experiment. See Configuration Templates for more details. (default: None)
- get_experiment(experiment_id: int) determined.common.experimental.experiment.Experiment #
Get an experiment (Experiment) by experiment ID.
- list_experiments(sort_by: Optional[determined.common.experimental.experiment.ExperimentSortBy] = None, order_by: Optional[determined.common.experimental._util.OrderBy] = None, experiment_ids: Optional[List[int]] = None, labels: Optional[List[str]] = None, users: Optional[List[str]] = None, states: Optional[List[determined.common.experimental.experiment.ExperimentState]] = None, name: Optional[str] = None, project_id: Optional[int] = None) List[determined.common.experimental.experiment.Experiment] #
Get a list of experiments (Experiment).
- Parameters
sort_by – Which field to sort by. See ExperimentSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
name – If this parameter is set, experiments will be filtered to only include those with names matching this parameter.
experiment_ids – Only return experiments with these IDs.
labels – Only return experiments with a label in this list.
users – Only return experiments belonging to these users. Defaults to all users.
states – Only return experiments that are in these states.
project_id – Only return experiments associated with this project ID.
- Returns
A list of experiments.
- get_trial(trial_id: int) determined.common.experimental.trial.Trial #
Get the Trial representing the trial with the provided trial ID.
- get_checkpoint(uuid: str) determined.common.experimental.checkpoint._checkpoint.Checkpoint #
Get the Checkpoint representing the checkpoint with the provided UUID.
- create_workspace(name: str) determined.common.experimental.workspace.Workspace #
Create a new workspace with the provided name.
- Parameters
name – The name of the workspace to create.
- Returns
The newly-created Workspace.
- Raises
errors.APIException – If a workspace with the provided name already exists.
- delete_workspace(name: str) None #
Delete the workspace with the provided name.
- Parameters
name – The name of the workspace to delete.
- Raises
errors.NotFoundException – If no workspace with the provided name exists.
- create_model(name: str, description: Optional[str] = '', metadata: Optional[Dict[str, Any]] = None, labels: Optional[List[str]] = None, workspace_name: Optional[str] = None) determined.common.experimental.model.Model #
Add a model to the model registry.
- Parameters
name (string) – The name of the model. This name must be unique.
description (string, optional) – A description of the model.
metadata (dict, optional) – Dictionary of metadata to add to the model.
- get_model(identifier: Union[str, int]) determined.common.experimental.model.Model #
Get the Model from the model registry with the provided identifier, which is either a string-type name or an integer-type model ID. If no corresponding model is found in the registry, an exception is raised.
- Parameters
identifier (string, int) – The unique name or ID of the model.
- get_model_by_id(model_id: int) determined.common.experimental.model.Model #
Get the Model from the model registry with the provided id. If no model with that id is found in the registry, an exception is raised.
Warning
Determined.get_model_by_id() has been deprecated and will be removed in a future version. Please call Determined.get_model() with either a string-type name or an integer-type model ID.
- list_models(sort_by: determined.common.experimental.model.ModelSortBy = ModelSortBy.NAME, order_by: determined.common.experimental._util.OrderBy = OrderBy.ASCENDING, name: Optional[str] = None, description: Optional[str] = None, model_id: Optional[int] = None, workspace_names: Optional[List[str]] = None, workspace_ids: Optional[List[int]] = None) List[determined.common.experimental.model.Model] #
Get a list of all models in the model registry.
- Parameters
sort_by – Which field to sort by. See ModelSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
name – If this parameter is set, models will be filtered to only include models with names matching this parameter.
description – If this parameter is set, models will be filtered to only include models with descriptions matching this parameter.
model_id – If this parameter is set, models will be filtered to only include the model with this unique numeric id.
workspace_names – Only return models with workspace names in this list.
workspace_ids – Only return models with workspace IDs in this list.
- Returns
A list of models.
- get_model_labels() List[str] #
Get a list of labels used on any models, sorted from most-popular to least-popular.
- iter_trials_metrics(trial_ids: List[int], group: str) Iterable[determined.common.experimental.metrics.TrialMetrics] #
Generate an iterator of metrics for the passed trials.
This function opens up a persistent connection to the Determined master to receive trial metrics. For as long as the connection remains open, the generator it returns yields the TrialMetrics it receives.
- Parameters
trial_ids – The trial IDs to iterate over metrics for.
group – The metric group to iterate over. Common values are “validation” and “training”, but group can be any value passed to master when reporting metrics during training (usually via a context’s report_metrics).
- Returns
An iterable of TrialMetrics objects.
Experiment
- class determined.experimental.client.Experiment(experiment_id: int, session: determined.common.api._session.Session)#
A class representing an Experiment object.
An Experiment object is usually obtained from determined.experimental.client.create_experiment() or determined.experimental.client.get_experiment() and contains helper methods that support querying the set of checkpoints associated with an experiment.
- id#
ID of experiment object in database.
- session#
HTTP request session.
- config#
(Mutable, Optional[Dict]) Experiment config for the experiment.
- state#
(Mutable, Optional[experimentv1State]) State of the experiment.
- archived#
(Mutable, bool) True if experiment is archived, else false.
- name#
(Mutable, str) Human-friendly name of the experiment.
- progress#
(Mutable, float) Completion progress of experiment in range (0, 1.0) where 1.0 is 100% completion.
- description#
(Mutable, string) Description of the experiment.
- notes#
(Mutable, str) Notes for the experiment.
- labels#
(Mutable, Optional[List]) Labels associated with the experiment.
- project_id#
(Mutable, int) The ID of the project associated with the experiment.
- workspace_id#
(Mutable, int) The ID of the workspace associated with the experiment.
Note
All attributes are cached by default.
Some attributes are mutable and may be changed by methods that update these values, either automatically (e.g. wait()) or explicitly with reload().
- reload() None #
Explicit refresh of cached properties.
- set_name(name: str) None #
Set (overwrite if existing) name on the experiment.
- set_description(description: str) None #
Set (overwrite if existing) description on the experiment.
- set_notes(notes: str) None #
Set (overwrite if existing) notes on the experiment.
- add_label(label: str) None #
Add a label to the experiment.
Makes a PUT request to the master and sets self.labels to the server’s updated labels.
- Parameters
label – A string label to add to the experiment. If the label already exists, the method call will be a no-op.
- remove_label(label: str) None #
Removes a label from the experiment.
Makes a DELETE request to the master and sets self.labels to the server’s updated labels.
- Parameters
label – A string label to remove from the experiment. If the specified label does not exist on the experiment, this method call will be a no-op.
- set_labels(labels: Set[str]) None #
Sets experiment labels to the given set.
This method makes a PATCH request to the master and sets self.labels to the server’s response. This will overwrite any existing labels on the experiment with the specified labels.
- Parameters
labels – A set of string labels to set on the experiment.
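The three label methods differ in granularity: add_label() and remove_label() change one label per request, while set_labels() overwrites the whole set at once. The sketch below reaches a desired label set using only the per-label calls. `sync_labels` is a hypothetical helper, and `exp` is assumed to behave like an Experiment with the `labels` attribute documented above.

```python
from typing import Any, Iterable, Set


def sync_labels(exp: Any, desired: Iterable[str]) -> Set[str]:
    """Add and remove labels one at a time until `exp` carries exactly `desired`.

    Returns the set of labels that were changed (added or removed).
    """
    desired_set = set(desired)
    # `labels` is documented as Optional, so treat None as "no labels".
    current = set(exp.labels or [])
    for label in sorted(desired_set - current):
        exp.add_label(label)
    for label in sorted(current - desired_set):
        exp.remove_label(label)
    return desired_set ^ current
```

When no per-label bookkeeping is needed, a single `exp.set_labels(set(desired))` achieves the same end state with one request.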
- delete() None #
Delete an experiment and all its artifacts from persistent storage.
Note: You must be authenticated as admin to delete an experiment.
- download_code(output_dir: Optional[str] = None) str #
Downloads a zipped tarball (*.tar.gz) of the experiment’s submitted code locally.
Saves a file named exp-{ID}_model_def.tar.gz to a local output directory. If a file with the same name already exists in the output directory, overwrites the file.
- Parameters
output_dir (string, optional) – The local directory path to save downloaded archive to, creating directory if it does not exist. If unspecified, will save to current working directory.
- Returns
Filepath of downloaded code archive.
- list_trials(sort_by: determined.common.experimental.trial.TrialSortBy = TrialSortBy.ID, order_by: determined.common.experimental._util.OrderBy = OrderBy.ASCENDING) List[determined.common.experimental.trial.Trial] #
Fetch all trials of an experiment.
- Parameters
sort_by – Which field to sort by. See TrialSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
- iter_trials(sort_by: determined.common.experimental.trial.TrialSortBy = TrialSortBy.ID, order_by: determined.common.experimental._util.OrderBy = OrderBy.ASCENDING, limit: Optional[int] = None) Iterator[determined.common.experimental.trial.Trial] #
Generate an iterator of trials of an experiment.
- Parameters
sort_by – Which field to sort by. See TrialSortBy.
order_by – Whether to sort in ascending or descending order. See OrderBy.
limit – Optional field that sets maximum page size of the response from the server. When there are many trials to return, a lower page size can result in shorter latency at the expense of more HTTP requests to the server. Defaults to no maximum.
- Returns
This method returns an Iterable of Trial instances that lazily fetches response objects.
- await_first_trial(interval: float = 0.1) determined.common.experimental.trial.Trial #
Wait for the first trial to be started for this experiment.
- Parameters
interval – An interval time in seconds before checking next experiment state.
- Returns
The first trial of the experiment.
- Raises
RuntimeError – If the experiment terminates before a trial starts.
- move_to_project(workspace_name: str, project_name: str) None #
Move an experiment to a different project.
Updates both the local object and the master database with the new project and workspace.
- Parameters
project_name – The name of the destination project for the experiment.
workspace_name – The name of the workspace containing the project.
- wait(interval: float = 5.0) determined.common.experimental.experiment.ExperimentState #
Wait for the experiment to reach a complete or terminal state.
- Parameters
interval – An interval time in seconds before checking next experiment state.
- Returns
The terminal state the experiment is in after waiting
- top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) determined.common.experimental.checkpoint._checkpoint.Checkpoint #
Return the Checkpoint for this experiment that has the best validation metric, as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is not specified, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
- list_checkpoints(sort_by: Optional[Union[str, determined.common.experimental.checkpoint._checkpoint.CheckpointSortBy]] = None, order_by: Optional[determined.common.experimental._util.OrderBy] = None, max_results: Optional[int] = None) List[determined.common.experimental.checkpoint._checkpoint.Checkpoint] #
Returns a list of sorted Checkpoint instances.
Requires either both sort_by and order_by to be defined, or neither. If neither are specified, will default to sorting by the experiment’s configured searcher metric, and ordering by smaller_is_better.
Only checkpoints in a COMPLETED state with a matching COMPLETED validation are considered.
- Parameters
sort_by – (Optional) Parameter to sort checkpoints by. Accepts either checkpoint.CheckpointSortBy or a string representing a validation metric name.
order_by – (Optional) Order of sorted checkpoints (ascending or descending).
max_results – (Optional) Maximum number of results to return. Defaults to no maximum.
- Returns
A list of sorted and ordered checkpoints.
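The "both or neither" rule for sort_by and order_by can be checked on the client side before making a request. This is only a sketch: the wrapper list_checkpoints_checked is hypothetical, exp is assumed to be an Experiment, and how the SDK itself reports a violation is not shown here.

```python
from typing import Any, List, Optional


def list_checkpoints_checked(
    exp: Any,
    sort_by: Optional[Any] = None,
    order_by: Optional[Any] = None,
    max_results: Optional[int] = None,
) -> List[Any]:
    """Call exp.list_checkpoints() after validating the sort arguments (hypothetical helper)."""
    # Exactly one of the two being set violates the documented requirement.
    if (sort_by is None) != (order_by is None):
        raise ValueError("sort_by and order_by must be given together or both omitted")
    return exp.list_checkpoints(sort_by=sort_by, order_by=order_by, max_results=max_results)
```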
- top_n_checkpoints(limit: int, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) List[determined.common.experimental.checkpoint._checkpoint.Checkpoint] #
Return the N Checkpoint instances with the best validation metrics, as defined by the sort_by and smaller_is_better arguments. This method will return the best checkpoint from the top N best-performing distinct trials of the experiment. Only checkpoints in a COMPLETED state with a matching COMPLETED validation are considered.
- Parameters
limit (int) – The maximum number of checkpoints to return.
sort_by (string, optional) – The name of the validation metric to use for sorting checkpoints. If this parameter is unset, the metric defined in the experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Specifies whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
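The selection rule described above (one best checkpoint per distinct trial, then the top N of those) can be modelled on plain data. The sketch below reflects one reading of that description and is not the SDK's implementation. The function name and the (trial_id, uuid, metric) tuple layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def top_n_across_trials(
    checkpoints: List[Tuple[int, str, float]],
    limit: int,
    smaller_is_better: bool = True,
) -> List[str]:
    """Pick the best checkpoint per trial, then return the UUIDs of the best `limit` of those."""
    best_per_trial: Dict[int, Tuple[str, float]] = {}
    for trial_id, uuid, metric in checkpoints:
        current = best_per_trial.get(trial_id)
        if current is None:
            best_per_trial[trial_id] = (uuid, metric)
        elif (metric < current[1]) if smaller_is_better else (metric > current[1]):
            best_per_trial[trial_id] = (uuid, metric)
    # Rank the per-trial winners; reverse the order when larger metrics are better.
    ranked = sorted(best_per_trial.values(), key=lambda item: item[1], reverse=not smaller_is_better)
    return [uuid for uuid, _ in ranked[:limit]]
```

Under this reading, two checkpoints from the same trial never both appear in the result, even if both outperform every checkpoint of other trials.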
- delete_tensorboard_files() None #
Delete tensorboard files for this experiment.
This will remove the directory /<root>/tensorboard/experiment/<id> from /<root>/tensorboard/experiment, where <id> is the ID of this experiment.
- get_pachyderm_config() Dict[str, Any] #
Return the Pachyderm configuration for this experiment.
Pachyderm configs are defined in integrations.pachyderm in the experiment config.
DownloadMode
#
Model
#
- class determined.experimental.client.Model(session: determined.common.api._session.Session, name: str)#
Class representing a model in the model registry.
A Model object is usually obtained from determined.experimental.client.create_model() or determined.experimental.client.get_model(). It contains methods for model versions and metadata.
- get_version(version: int = -1) Optional[determined.common.experimental.model.ModelVersion] #
Retrieve the checkpoint corresponding to the specified id of the model version. If the specified version of the model does not exist, an exception is raised.
If no version is specified, the latest version of the model is returned. In this case, if there are no registered versions of the model, None is returned.
- Parameters
version (int, optional) – The model version ID requested.
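Because get_version() returns None when no versions are registered, and the checkpoint attribute of a ModelVersion is itself optional, code that reads the latest checkpoint should guard both cases. A minimal sketch, assuming model is a Model obtained from create_model() or get_model(); the helper name is hypothetical:

```python
from typing import Any, Optional


def latest_checkpoint_uuid(model: Any) -> Optional[str]:
    """Return the UUID of the checkpoint behind the latest model version, if any."""
    # With no argument, get_version() returns the latest version, or None if none exist.
    version = model.get_version()
    if version is None or version.checkpoint is None:
        return None
    return version.checkpoint.uuid
```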
- list_versions(order_by: determined.common.experimental._util.OrderBy = OrderBy.DESCENDING) List[determined.common.experimental.model.ModelVersion] #
Get a list of ModelVersions with checkpoints of this model.
The model versions are sorted by model version ID and are returned in descending order by default.
- Parameters
order_by (enum) – A member of the
OrderBy
enum.
- register_version(checkpoint_uuid: str) determined.common.experimental.model.ModelVersion #
Creates a new model version and returns the ModelVersion corresponding to the version.
- Parameters
checkpoint_uuid – The UUID of the checkpoint to register.
- add_metadata(metadata: Dict[str, Any]) None #
Adds user-defined metadata to the model. The metadata argument must be a JSON-serializable dictionary. If any keys from this dictionary already appear in the model’s metadata, the previous dictionary entries are replaced.
- Parameters
metadata (dict) – Dictionary of metadata to add to the model.
- remove_metadata(keys: List[str]) None #
Removes user-defined metadata from the model. Any top-level keys that appear in the keys list are removed from the model.
- Parameters
keys (List[str]) – Top-level keys to remove from the model metadata.
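The metadata semantics described for add_metadata() and remove_metadata() can be modelled locally with plain dictionaries. This sketch reflects one reading of the descriptions above (top-level entries are replaced or removed whole). It makes no server calls, and the function names are hypothetical.

```python
import json
from typing import Any, Dict, List


def with_added_metadata(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Model of add_metadata(): existing top-level keys are replaced by the new entries."""
    # json.dumps raises TypeError if `new` contains values that are not JSON-serializable.
    json.dumps(new)
    merged = dict(current)
    merged.update(new)
    return merged


def with_removed_metadata(current: Dict[str, Any], keys: List[str]) -> Dict[str, Any]:
    """Model of remove_metadata(): drop the listed top-level keys, leave the rest untouched."""
    return {key: value for key, value in current.items() if key not in keys}
```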
- set_labels(labels: List[str]) None #
Sets user-defined labels for the model. The labels argument must be an array of strings. If the model previously had labels, they are replaced.
- Parameters
labels (List[str]) – All labels to set on the model.
- archive() None #
Sets the model’s state to archived
- unarchive() None #
Removes the model’s archived state
- delete() None #
Deletes the model in the registry
- reload() None #
Explicit refresh of cached properties.
ModelSortBy
#
ModelVersion
#
- class determined.experimental.model.ModelVersion(session: determined.common.api._session.Session, model_version: int, model_name: str)#
A class representing a combination of Model and Checkpoint.
This class can be fetched using the model.get_version() method. Once a model has been added to the registry, checkpoints can be added to it. These registered checkpoints are ModelVersions.
- session#
HTTP request session.
- model_version#
(int) Version number assigned by the registry, starting from 1 and incrementing each time a new model version is registered.
- model_name#
(str) Name of the parent model.
- checkpoint#
(Mutable, Optional[checkpoint.Checkpoint]) Checkpoint associated with this model version.
- model_id#
(Mutable, Optional[int]) ID of the parent model.
- metadata#
(Mutable, Optional[Dict]) Metadata of this model version.
- name#
(Mutable, Optional[str]) Human-friendly name of this model version.
Note
All attributes are cached by default.
Mutable properties may be changed by methods that update these values either automatically (e.g. set_name(), set_notes()) or explicitly with reload().
- set_name(name: str) None #
Sets the human-friendly name for this model version
- Parameters
name (string) – New name for model version
- set_notes(notes: str) None #
Sets the human-friendly notes / readme for this model version
- Parameters
notes (string) – Replaces notes for model version in registry
- delete() None #
Deletes the model version in the registry
- iter_metrics(group: Optional[str] = None) Iterable[determined.common.experimental.metrics.TrialMetrics] #
Gets all metrics for a given metric group associated with this model version. The checkpoint can be originally associated by calling core_context.experimental.report_task_using_model_version(<MODEL_VERSION>) from within a task.
- Parameters
group (str, optional) – Group name for the metrics (example: “training”, “validation”). All metrics will be returned when querying by None.
Project
#
- class determined.experimental.client.Project(session: determined.common.api._session.Session, project_id: int)#
A class representing a Project object.
- id#
(int) The ID of the project.
- key#
(Mutable, str) The key of the project.
- archived#
(Mutable, bool) True if the project is archived, else false.
- description#
(Mutable, str) The description of the project.
- n_active_experiments#
(int) The number of active experiments in the project.
- n_experiments#
(Mutable, int) The number of experiments in the project.
- name#
(Mutable, str) Human-friendly name of the project.
- notes#
(Mutable, List[Dict[str, str]]) Notes about the project. As determined upstream, each note is a dict with exactly the keys “name” and “contents”.
- username#
(Mutable, str) The username of the project owner.
- workspace_id#
(int) The ID of the workspace this project belongs to.
- set_description(description: str) None #
Set the project’s description locally and on master.
- Parameters
description – The new description to set.
- set_key(key: str) None #
Set the project’s key locally and on master.
- Parameters
key – The new key to set.
- set_name(name: str) None #
Set the project’s name locally and on master.
- Parameters
name – The new name to set.
- archive() None #
Set the project to archived (archived = True) locally and on the master.
- unarchive() None #
Set the project to unarchived (archived = False) locally and on the master.
- add_note(name: str, contents: str) None #
Add a note to the project.
Because there is not yet functionality on the backend to add a single note, this method: 1. fetches current notes for this project from the master. 2. adds the new note to the list of notes. 3. sends the updated list of notes to the master.
WARNING: On exit, the object’s notes attribute matches the updated master’s notes, possibly reflecting changes to the project that have happened since the project was last hydrated from master.
- Parameters
name – The name of the note.
contents – The contents of the note.
- remove_note(name: str) None #
Remove a note from the project.
Because there is not yet functionality on the backend to remove a single note, this method: 1. fetches current notes for this project from the master. 2. removes the note with the passed name from the list of notes. 3. sends the updated list of notes to the master.
WARNING: On exit, the object’s notes attribute matches the updated master’s notes, possibly reflecting changes to the project that have happened since the project was last hydrated from master.
- Parameters
name – The name of the note to remove. Note names are not necessarily unique within a project. This function can only remove notes with unique names. If you need to remove a note whose name isn’t unique to this project, you must use the web UI.
- Raises
ValueError – If no note with the passed name is found, or if multiple notes with the passed name are found.
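The fetch, modify, and send sequence used by remove_note(), together with its uniqueness requirement, can be illustrated on a plain list of notes. This is a local model of the documented behaviour, not the SDK's code. The function name is hypothetical, and each note is assumed to be a dict with the keys "name" and "contents", as described for the notes attribute.

```python
from typing import Dict, List


def without_note(notes: List[Dict[str, str]], name: str) -> List[Dict[str, str]]:
    """Return a copy of `notes` with the uniquely named note removed."""
    matches = [note for note in notes if note["name"] == name]
    if not matches:
        raise ValueError(f"no note named {name!r} found")
    if len(matches) > 1:
        # Ambiguous names cannot be removed this way; the docs point to the web UI instead.
        raise ValueError(f"multiple notes named {name!r} found")
    return [note for note in notes if note["name"] != name]
```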
ResourcePool
#
- class determined.experimental.client.ResourcePool(session: determined.common.api._session.Session, name: str = '')#
A class representing a resource pool object.
- name#
(str) The name of the resource pool.
- add_bindings(workspace_names: List[str]) None #
Binds a resource pool to one or more workspaces.
A resource pool with bindings can only be used by workspaces bound to it. Attempting to add a binding that already exists, or to bind workspaces or resource pools that do not exist, results in errors.
- remove_bindings(workspace_names: List[str]) None #
Unbinds a resource pool from one or more workspaces.
A resource pool with bindings can only be used by workspaces bound to it. Attempting to remove a binding that does not exist results in a no-op.
- list_workspaces() List[Optional[str]] #
Lists the workspaces bound to a specified resource pool.
A resource pool with bindings can only be used by workspaces bound to it.
- replace_bindings(workspace_names: List[str]) None #
Replaces all the workspaces bound to a resource pool with those specified.
If no bindings exist, new bindings will be added. Binding the same workspace more than once results in an SQL error. Binding workspaces or resource pools that do not exist results in Not Found errors.
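Since binding the same workspace twice is reported as an error, it can help to de-duplicate the list before calling replace_bindings(). A minimal sketch, assuming pool is a ResourcePool; the wrapper name is hypothetical:

```python
from typing import Any, List


def replace_bindings_deduped(pool: Any, workspace_names: List[str]) -> List[str]:
    """Replace the pool's bindings with the unique workspace names, preserving order."""
    # dict.fromkeys keeps the first occurrence of each name and drops repeats.
    unique = list(dict.fromkeys(workspace_names))
    pool.replace_bindings(unique)
    return unique
```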
Trial
#
- class determined.experimental.client.Trial(trial_id: int, session: determined.common.api._session.Session)#
A class representing a Trial object.
A Trial object is usually obtained from determined.experimental.client.get_trial(). Trial reference class used for querying relevant Checkpoint instances.
- trial_id#
ID of trial.
- session#
HTTP request session.
- experiment_id#
(Mutable, Optional[int]) ID of the experiment this trial belongs to.
- hparams#
(Mutable, Optional[Dict]) Dict[name, value] of the trial’s hyperparameters. This is an instance of the hyperparameter space defined by the experiment.
- state#
(Mutable, Optional[TrialState]) Trial state (ex: ACTIVE, PAUSED, COMPLETED).
- summary_metrics#
(Mutable, Optional[Dict]) Summary metrics for the trial. Includes aggregated metrics for training and validation steps for each reported metric name. Example:
{ "avg_metrics": { "loss": { "count": 100, "last": 0.2, "max": 0.4, "min": 0.2, "sum": 1.2, "type": "number" } } }
Note
All attributes are cached by default.
The hparams and summary_metrics attributes are mutable and may be changed by methods that update these values, either automatically or explicitly with reload().
- iter_logs(follow: bool = False, *, head: Optional[int] = None, tail: Optional[int] = None, container_ids: Optional[List[str]] = None, rank_ids: Optional[List[int]] = None, stdtypes: Optional[List[str]] = None, min_level: Optional[determined.common.experimental.trial.LogLevel] = None, timestamp_before: Optional[Union[str, int]] = None, timestamp_after: Optional[Union[str, int]] = None, sources: Optional[List[str]] = None, search_text: Optional[str] = None) Iterable[str] #
Return an iterable of log lines from this trial meeting the specified criteria.
- Parameters
follow (bool, optional) – If the iterable should block waiting for new logs to arrive. Mutually exclusive with head and tail. Defaults to False.
head (int, optional) – When set, only fetches the first head lines. Mutually exclusive with follow and tail. Defaults to None.
tail (int, optional) – When set, only fetches the last tail lines. Mutually exclusive with follow and head. Defaults to None.
container_ids (List[str], optional) – When set, only fetch log lines from specific containers. Defaults to None.
rank_ids (List[int], optional) – When set, only fetch log lines from specific ranks. Defaults to None.
stdtypes (List[str], optional) – When set, only fetch log lines from the given stdio outputs. Defaults to None (same as ["stdout", "stderr"]).
min_level (LogLevel, optional) – When set, defines the minimum log priority for lines that will be returned. Defaults to None (all logs returned).
timestamp_before (Union[str, int], optional) – Specifies a timestamp that returns only logs before a certain time. Accepts either a string in RFC 3339 format (e.g. 2021-10-26T23:17:12Z) or an int representing the epoch second.
timestamp_after (Union[str, int], optional) – Specifies a timestamp that returns only logs after a certain time. Accepts either a string in RFC 3339 format (e.g. 2021-10-26T23:17:12Z) or an int representing the epoch second.
sources (List[str], optional) – When set, returns only logs originating from specified node name(s) (e.g. master or agent).
search_text (str, optional) – Filters individual logs to only return logs containing the specified string.
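The filters above can be combined in one call. The sketch below builds an RFC 3339 timestamp from a datetime and asks only for stderr lines after that time. It assumes trial is a Trial, and the helper names rfc3339 and recent_stderr are hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, List, Optional


def rfc3339(ts: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 UTC string such as 2021-10-26T23:17:12Z."""
    # Naive datetimes would be interpreted as local time by astimezone(); pass aware values.
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def recent_stderr(trial: Any, since: datetime, search_text: Optional[str] = None) -> List[str]:
    """Collect stderr log lines emitted after `since`, optionally filtered by substring."""
    # follow, head and tail are left at their defaults because they are mutually exclusive.
    lines = trial.iter_logs(
        stdtypes=["stderr"],
        timestamp_after=rfc3339(since),
        search_text=search_text,
    )
    return list(lines)
```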
- logs(follow: bool = False, *, head: Optional[int] = None, tail: Optional[int] = None, container_ids: Optional[List[str]] = None, rank_ids: Optional[List[int]] = None, stdtypes: Optional[List[str]] = None, min_level: Optional[determined.common.experimental.trial.LogLevel] = None, timestamp_before: Optional[Union[str, int]] = None, timestamp_after: Optional[Union[str, int]] = None, sources: Optional[List[str]] = None, search_text: Optional[str] = None) Iterable[str] #
DEPRECATED: Use iter_logs instead.
- list_checkpoints(sort_by: Optional[Union[str, determined.common.experimental.checkpoint._checkpoint.CheckpointSortBy]] = None, order_by: Optional[determined.common.experimental._util.OrderBy] = None, max_results: Optional[int] = None) List[determined.common.experimental.checkpoint._checkpoint.Checkpoint] #
Returns a list of sorted Checkpoint instances.
Requires either both sort_by and order_by to be defined, or neither. If neither are specified, will default to sorting by the experiment’s configured searcher metric, and ordering by smaller_is_better.
Only checkpoints in a COMPLETED state with a matching COMPLETED validation are considered.
- Parameters
sort_by – (Optional) Parameter to sort checkpoints by. Accepts either checkpoint.CheckpointSortBy or a string representing a validation metric name.
order_by – (Optional) Order of sorted checkpoints (ascending or descending).
max_results – (Optional) Maximum number of results to return. Defaults to no maximum.
- Returns
A list of sorted and ordered checkpoints.
- top_checkpoint(sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) determined.common.experimental.checkpoint._checkpoint.Checkpoint #
Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
- Parameters
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
- select_checkpoint(latest: bool = False, best: bool = False, uuid: Optional[str] = None, sort_by: Optional[str] = None, smaller_is_better: Optional[bool] = None) determined.common.experimental.checkpoint._checkpoint.Checkpoint #
Return the Checkpoint instance with the best validation metric as defined by the sort_by and smaller_is_better arguments.
Exactly one of the best, latest, or uuid parameters must be set.
- Parameters
latest (bool, optional) – Return the most recent checkpoint.
best (bool, optional) – Return the checkpoint with the best validation metric as defined by the sort_by and smaller_is_better arguments. If sort_by and smaller_is_better are not specified, the values from the associated experiment configuration will be used.
uuid (string, optional) – Return the checkpoint for the specified UUID.
sort_by (string, optional) – The name of the validation metric to order checkpoints by. If this parameter is unset the metric defined in the related experiment configuration searcher field will be used.
smaller_is_better (bool, optional) – Whether to sort the metric above in ascending or descending order. If sort_by is unset, this parameter is ignored. By default, the value of smaller_is_better from the experiment’s configuration is used.
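The requirement that exactly one of best, latest, or uuid is set can be enforced before calling select_checkpoint(). A sketch under the assumption that trial is a Trial; the wrapper select_one is hypothetical, and the SDK's own behaviour on invalid combinations is not shown here.

```python
from typing import Any, Optional


def select_one(
    trial: Any,
    latest: bool = False,
    best: bool = False,
    uuid: Optional[str] = None,
) -> Any:
    """Call trial.select_checkpoint() only when exactly one selector is provided."""
    chosen = sum([bool(latest), bool(best), uuid is not None])
    if chosen != 1:
        raise ValueError("exactly one of latest, best, or uuid must be set")
    return trial.select_checkpoint(latest=latest, best=best, uuid=uuid)
```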
- get_checkpoints(sort_by: Optional[Union[str, determined.common.experimental.checkpoint._checkpoint.CheckpointSortBy]] = None, order_by: Optional[determined.common.experimental._util.OrderBy] = None) List[determined.common.experimental.checkpoint._checkpoint.Checkpoint] #
Return a list of Checkpoint instances for the current trial.
Either sort_by and order_by are both specified or neither are.
- Parameters
sort_by (string, CheckpointSortBy) – Which field to sort by. Strings are assumed to be validation metric names.
order_by (OrderBy) – Whether to sort in ascending or descending order.
- iter_metrics(group: str) Iterable[determined.common.experimental.metrics.TrialMetrics] #
Generate an iterator of metrics for this trial.
- Parameters
group – The metric group to iterate over. Common values are “validation” and “training”, but group can be any value passed to master when reporting metrics during training (usually via a context’s report_metrics).
- Returns
An iterable of
TrialMetrics
objects.
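A common use of iter_metrics() is to extract one metric as a series over training progress. The sketch below relies only on the steps_completed and metrics attributes documented for TrialMetrics. The helper name metric_series is hypothetical, and trial is assumed to be a Trial.

```python
from typing import Any, List, Tuple


def metric_series(trial: Any, name: str, group: str = "validation") -> List[Tuple[int, Any]]:
    """Return (steps_completed, value) pairs for one metric in the given group."""
    series = []
    for report in trial.iter_metrics(group):
        # Skip reports that did not include this metric name.
        if name in report.metrics:
            series.append((report.steps_completed, report.metrics[name]))
    return series
```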
- reload() None #
Explicit refresh of cached properties.
- get_experiment() determined.common.experimental.experiment.Experiment #
Return the parent
Experiment
for this trial.
TrialMetrics
#
- class determined.experimental.client.TrialMetrics(trial_id: int, trial_run_id: int, steps_completed: int, end_time: datetime.datetime, metrics: Dict[str, Any], group: str, batch_metrics: Optional[List[Dict[str, Any]]] = None)#
Specifies collection of metrics that the trial reported.
- trial_id#
The ID of the trial that reported the metric.
- Type
int
- trial_run_id#
The ID of the trial run that reported the metric.
- Type
int
- steps_completed#
The number of steps that the trial had completed when the metric was reported. Most generally, the value passed to a call to report_metrics as “steps_completed.”
- Type
int
- end_time#
The time when the metric was reported.
- Type
datetime.datetime
- metrics#
A dict of metrics that the trial reported.
- Type
Dict[str, Any]
- group#
The group that the metric was reported under. Usually either “validation” or “training”, but this can be any value passed to master when reporting metrics during training (usually via a context’s report_metrics).
- Type
str
- batch_metrics#
<do not use>
- Type
Optional[List[Dict[str, Any]]]
User
#
- class determined.experimental.client.User(user_id: int, session: determined.common.api._session.Session)#
A User object represents an individual account on a Determined installation.
It can be obtained from determined.experimental.client.list_users() or determined.experimental.client.get_user_by_name().
- session#
HTTP request session.
- user_id#
(int) Unique ID for the user in the Determined database.
- username#
(Mutable, Optional[str]) Username of the user in the Determined cluster.
- admin#
(Mutable, Optional[bool]) Whether the user has admin privileges.
- remote#
(Mutable, Optional[bool]) When true, prevents password sign-on and requires the user to sign on using an external IdP.
- agent_uid#
(Mutable, Optional[int]) UID on the agent this user is linked to.
- agent_gid#
(Mutable, Optional[int]) GID on the agent this user is linked to.
- agent_user#
(Mutable, Optional[str]) Unix user on the agent this user is linked to.
- agent_group#
(Mutable, Optional[str]) Unix group on the agent this user is linked to.
- display_name#
(Mutable, Optional[str]) Human-friendly name of the user.
Note
All attributes are cached by default.
Mutable properties may be changed by methods that update these values either automatically (e.g. rename, change_display_name) or explicitly with reload().
- change_password(new_password: str) None #
Changes this user’s password.
- Parameters
new_password – The password to set.
- Raises
ValueError – an error describing why the password does not meet complexity requirements.
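Because change_password() raises ValueError when the new password fails the complexity requirements, a caller can surface the reason instead of letting the exception propagate. A minimal sketch, assuming user is a User; the wrapper name is hypothetical:

```python
from typing import Any, Optional


def try_change_password(user: Any, new_password: str) -> Optional[str]:
    """Attempt the change; return None on success or the rejection message on failure."""
    try:
        user.change_password(new_password)
    except ValueError as err:
        # The error text describes which complexity requirement was not met.
        return str(err)
    return None
```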
Workspace
#
- class determined.experimental.client.Workspace(session: determined.common.api._session.Session, workspace_id: Optional[int] = None, workspace_name: Optional[str] = None)#
A class representing a Workspace object.
- id#
(int) The ID of the workspace.
- name#
(Mutable, str) The name of the workspace.
- list_pools() List[determined.common.experimental.resource_pool.ResourcePool] #
Lists the resource pools that the workspace has access to. Tasks submitted to this workspace can only use the resource pools listed here.
- create_project(name: str, description: Optional[str] = None) determined.common.experimental.project.Project #
Creates a new project in this workspace with the provided name.
- Parameters
name – The name of the project to create.
description – Optional description to give the new project.
- Returns
The newly-created Project.
- Raises
errors.APIException – If the project with the passed name already exists.
- delete_project(name: str) None #
Deletes a project from this workspace.
- Parameters
name – The name of the project to delete.
- Raises
errors.NotFoundException – If the project with the passed name is not found.
- get_project(project_name: str) determined.common.experimental.project.Project #
Gets a project that is a part of this workspace.
- Parameters
project_name – The name of the project to get.
- Raises
errors.NotFoundException – If the project with the passed name is not found.
- list_projects() List[determined.common.experimental.project.Project] #
Lists all projects that are a part of this workspace.
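The project methods above can be combined into a get-or-create pattern that avoids the exception raised by create_project() for an existing name. This sketch uses list_projects() and the name attribute of Project. The helper name is hypothetical, workspace is assumed to be a Workspace, and another client creating the same project between the check and the create call would still cause an error.

```python
from typing import Any, Optional


def get_or_create_project(workspace: Any, name: str, description: Optional[str] = None) -> Any:
    """Return the project called `name` in this workspace, creating it if it is absent."""
    for project in workspace.list_projects():
        if project.name == name:
            return project
    return workspace.create_project(name, description=description)
```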