Custom Searcher Reference#

determined.searcher.LocalSearchRunner#

class determined.searcher.LocalSearchRunner(search_method: determined.searcher._search_method.SearchMethod, searcher_dir: Optional[pathlib.Path] = None)#

LocalSearchRunner performs a search for optimal hyperparameter values, applying the provided SearchMethod. It runs locally and interacts with a Determined cluster, where it starts a multi-trial experiment. It then reacts to event notifications from the running experiment by forwarding them to the event handler methods in your SearchMethod implementation and sending the returned operations back to the experiment.

run(exp_config: Union[Dict[str, Any], str], model_dir: Optional[str] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None) int#

Run custom search.

Parameters
  • exp_config (Union[Dict[str, Any], str]) – Experiment configuration, given either as a dict or as the filename of a .yaml file.

  • model_dir (string, optional) – Directory containing the model definition. (default: None)

  • includes (Iterable[Union[str, pathlib.Path]], optional) – Additional files or directories to include in the model definition. (default: None)
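A minimal sketch of how LocalSearchRunner might be wired up, using only the constructor and run() signatures shown above. The helper name launch_local_search, the default model_dir of "." and the default searcher_dir of "searcher_state" are illustrative assumptions, and the call only succeeds on a machine that has Determined installed and can reach a cluster.

```python
from pathlib import Path
from typing import Any, Dict, Union


def launch_local_search(
    search_method: Any,
    exp_config: Union[Dict[str, Any], str],
    model_dir: str = ".",
    searcher_dir: Path = Path("searcher_state"),
) -> int:
    """Run a custom search from the local machine (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Determined installed.
    from determined import searcher

    # search_method is expected to be an instance of your SearchMethod subclass.
    runner = searcher.LocalSearchRunner(search_method, searcher_dir=searcher_dir)
    # run() is documented above as returning an int.
    return runner.run(exp_config, model_dir=model_dir)
```

A call such as launch_local_search(my_method, "experiment.yaml") would then block while the runner forwards events to my_method; both names are placeholders.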

determined.searcher.RemoteSearchRunner#

class determined.searcher.RemoteSearchRunner(search_method: determined.searcher._search_method.SearchMethod, context: determined.core._context.Context)#

RemoteSearchRunner performs a search for optimal hyperparameter values, applying the provided SearchMethod (you will subclass SearchMethod and provide an instance of the derived class). RemoteSearchRunner executes on-cluster: it runs a meta-experiment using Core API.

run(exp_config: Union[Dict[str, Any], str], model_dir: Optional[str] = None, includes: Optional[Iterable[Union[str, pathlib.Path]]] = None) int#

Run custom search as a Core API experiment (on-cluster).

Parameters
  • exp_config (Union[Dict[str, Any], str]) – Experiment configuration, given either as a dict or as the filename of a .yaml file.

  • model_dir (string, optional) – Directory containing the model definition. (default: None)

  • includes (Iterable[Union[str, pathlib.Path]], optional) – Additional files or directories to include in the model definition. (default: None)
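The on-cluster variant can be sketched in the same way. This assumes the script runs inside a Determined task where determined.core.init() can build the Context that the constructor requires. The helper name run_remote_search is hypothetical.

```python
from typing import Any, Dict, Optional, Union


def run_remote_search(
    search_method: Any,
    exp_config: Union[Dict[str, Any], str],
    model_dir: Optional[str] = None,
) -> int:
    """Run a custom search on-cluster via the Core API (hypothetical helper)."""
    # Imported lazily so this module can be loaded without Determined installed.
    from determined import core, searcher

    # Assumption: this code executes inside a Determined experiment, so
    # core.init() is able to connect to the master.
    with core.init() as core_context:
        runner = searcher.RemoteSearchRunner(search_method, context=core_context)
        return runner.run(exp_config, model_dir=model_dir)
```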

determined.searcher.SearchMethod#

class determined.searcher.SearchMethod#

The implementation of a custom hyperparameter tuning algorithm.

To implement your specific hyperparameter tuning approach, subclass SearchMethod and override the event handler methods.

Each event handler, except progress(), returns a list of operations (List[Operation]) that will be submitted to the master for processing.

Currently, the following operations are supported:

  • Create - starts a new trial with a unique trial id and a set of hyperparameter values.

  • ValidateAfter - sets the number of steps (i.e., batches or epochs) after which validation is run for the trial with the given id.

  • Progress - reports the progress of the multi-trial experiment to the master.

  • Close - closes a trial with a given id.

  • Shutdown - closes the experiment.

Note

Do not modify searcher_state passed into event handlers.
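To show how the event handlers and operations fit together, here is a sketch of a simple random search. It runs without Determined because the operation and state classes are local stand-ins that mirror the signatures documented on this page; a real implementation would subclass determined.searcher.SearchMethod and use the determined.searcher classes instead. The class name RandomSearch, its constructor arguments and the learning_rate hyperparameter are illustrative assumptions.

```python
import random
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set


# Stand-ins mirroring the documented signatures (not the real classes).
@dataclass
class Create:
    request_id: uuid.UUID
    hparams: Dict[str, Any]
    checkpoint: Any = None


@dataclass
class ValidateAfter:
    request_id: uuid.UUID
    length: int


@dataclass
class Close:
    request_id: uuid.UUID


@dataclass
class Shutdown:
    cancel: bool = False
    failure: bool = False


@dataclass
class SearcherState:
    trials_created: Set[uuid.UUID] = field(default_factory=set)
    trials_closed: Set[uuid.UUID] = field(default_factory=set)
    failures: Set[uuid.UUID] = field(default_factory=set)


class RandomSearch:
    """Hypothetical random search: max_trials trials, max_concurrent at a time."""

    def __init__(self, max_trials: int, max_concurrent: int, max_length: int, seed: int = 0):
        self.max_trials = max_trials
        self.max_concurrent = max_concurrent
        self.max_length = max_length
        self.rng = random.Random(seed)

    def _create(self) -> Create:
        # Sample a learning rate log-uniformly from [1e-4, 1e-1].
        hparams = {"learning_rate": 10 ** self.rng.uniform(-4, -1)}
        return Create(request_id=uuid.uuid4(), hparams=hparams, checkpoint=None)

    def initial_operations(self, state: SearcherState) -> List[Any]:
        return [self._create() for _ in range(min(self.max_concurrent, self.max_trials))]

    def on_trial_created(self, state: SearcherState, request_id: uuid.UUID) -> List[Any]:
        # Train each trial to its full length before validating once.
        return [ValidateAfter(request_id=request_id, length=self.max_length)]

    def on_validation_completed(self, state, request_id, metric, train_length) -> List[Any]:
        return [Close(request_id=request_id)]

    def on_trial_closed(self, state: SearcherState, request_id: uuid.UUID) -> List[Any]:
        # The state attributes are sets of request IDs, so compare their sizes.
        if len(state.trials_created) < self.max_trials:
            return [self._create()]
        if len(state.trials_closed) >= self.max_trials:
            return [Shutdown()]
        return []

    def on_trial_exited_early(self, state, request_id, exited_reason) -> List[Any]:
        # Simplest possible policy: treat any early exit as fatal.
        return [Shutdown(failure=True)]

    def progress(self, state: SearcherState) -> float:
        return len(state.trials_closed) / self.max_trials
```

The handlers only read the state and keep their own configuration on self, which follows the rule in the note above.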

abstract initial_operations(searcher_state: determined.searcher._search_method.SearcherState) List[determined.searcher._search_method.Operation]#

Returns a list of initial operations that the custom hyperparameter search should perform. This is called by the custom searcher's SearchRunner to initialize the trials.

Example:

def initial_operations(self, _: searcher.SearcherState) -> List[searcher.Operation]:
    ops: List[searcher.Operation] = []
    N = 100
    hparams = {
        # ...
    }
    for _ in range(0, N):
        create = searcher.Create(
            request_id=uuid.uuid4(),
            hparams=hparams,
            checkpoint=None,
        )
        ops.append(create)
    return ops

Parameters

searcher_state (SearcherState) – Read-only current searcher state

Returns

Initial list of Operation to start the Hyperparameter search

Return type

List[Operation]

load(path: pathlib.Path) Tuple[determined.searcher._search_method.SearcherState, int]#

Loads searcher state and method-specific state.

load_method_state(path: pathlib.Path) None#

Loads method-specific search state.

abstract on_trial_closed(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID) List[determined.searcher._search_method.Operation]#

Informs the searcher that a trial has been closed as a result of a Close operation.

Example:

def on_trial_closed(
    self, searcher_state: SearcherState, request_id: uuid.UUID
) -> List[Operation]:
    # trials_created and trials_closed are sets of request IDs, so compare their sizes.
    if len(searcher_state.trials_created) < self.max_num_trials:
        hparams = {
            # ...
        }
        return [
            searcher.Create(
                request_id=uuid.uuid4(),
                hparams=hparams,
                checkpoint=None,
            )
        ]
    if len(searcher_state.trials_closed) >= self.max_num_trials:
        return [searcher.Shutdown()]
    return []

Parameters
  • searcher_state (SearcherState) – Read-only current searcher state

  • request_id (uuid.UUID) – Request UUID of the Trial that was closed

Returns

List of Operation to run after closing the given trial

Return type

List[Operation]

abstract on_trial_created(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID) List[determined.searcher._search_method.Operation]#

Informs the searcher that a trial has been created as a result of a Create operation.

Example:

def on_trial_created(
    self, _: SearcherState, request_id: uuid.UUID
) -> List[Operation]:
    return [
        searcher.ValidateAfter(
            request_id=request_id,
            length=1,  # Run for one unit of time (epoch, etc.)
        )
    ]

In this example, every trial is told to train for exactly one unit of time before its first validation.

Parameters
  • searcher_state (SearcherState) – Read-only current searcher state

  • request_id (uuid.UUID) – Request UUID of the Trial that was created

Returns

List of Operation to run upon creation of the given trial

Return type

List[Operation]

abstract on_trial_exited_early(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID, exited_reason: determined.searcher._search_method.ExitedReason) List[determined.searcher._search_method.Operation]#

Informs the searcher that a trial has exited earlier than expected.

Example:

def on_trial_exited_early(
    self,
    searcher_state: SearcherState,
    request_id: uuid.UUID,
    exited_reason: ExitedReason,
) -> List[Operation]:
    if exited_reason == searcher.ExitedReason.USER_CANCELED:
        return [searcher.Shutdown(cancel=True)]
    if exited_reason == searcher.ExitedReason.INVALID_HP:
        return [searcher.Shutdown(failure=True)]
    if len(searcher_state.failures) >= self.max_failures:
        return [searcher.Shutdown(failure=True)]
    return []

Note

The trial has already been closed internally when this callback is run. You do not need to explicitly issue a Close operation.

Parameters
  • searcher_state (SearcherState) – Read-only current searcher state

  • request_id (uuid.UUID) – Request UUID of the Trial that exited early

  • exited_reason (ExitedReason) – The reason that the trial exited early

Returns

List of Operation to run in response to the given trial exiting early

Return type

List[Operation]

abstract on_validation_completed(searcher_state: determined.searcher._search_method.SearcherState, request_id: uuid.UUID, metric: Any, train_length: int) List[determined.searcher._search_method.Operation]#

Informs the searcher that the validation workload has completed after training for train_length units. It returns any new operations that result from this workload completing.

Example:

def on_validation_completed(
    self,
    searcher_state: SearcherState,
    request_id: uuid.UUID,
    metric: Any,
    train_length: int
) -> List[Operation]:
    if train_length < self.max_train_length:
        return [
            searcher.ValidateAfter(
                request_id=request_id,
                length=train_length + 1,  # Run an additional unit of time
            )
        ]
    return [searcher.Close(request_id=request_id)]

Parameters
  • searcher_state (SearcherState) – Read-only current searcher state

  • request_id (uuid.UUID) – Request UUID of the Trial that was trained

  • metric (Any) – Metric data returned by the trial

  • train_length (int) – The cumulative units of time that the trial has finished training for (epochs, etc.)

Returns

List of Operation to run upon completion of training for the given trial

Return type

List[Operation]

abstract progress(searcher_state: determined.searcher._search_method.SearcherState) float#

Returns experiment progress as a float between 0 and 1.

Example:

def progress(self, searcher_state: SearcherState) -> float:
    # trials_closed is a set of request IDs, so use its size.
    return len(searcher_state.trials_closed) / float(self.max_num_trials)

Parameters

searcher_state (SearcherState) – Read-only current searcher state

Returns

Experiment progress as a float between 0 and 1.

Return type

float

save(searcher_state: determined.searcher._search_method.SearcherState, path: pathlib.Path, *, experiment_id: int) None#

Saves the searcher state and the search method state. It is called by the SearchRunner after receiving operations from the SearchMethod.

save_method_state(path: pathlib.Path) None#

Saves method-specific search state.
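One way to implement the save_method_state() and load_method_state() pair is to pickle whatever extra state your algorithm keeps. The sketch below uses only the standard library. It assumes that path is an existing directory handed over by the search runner. The class name, the file name method_state.pkl and the two state fields are hypothetical.

```python
import pickle
from pathlib import Path
from typing import Dict


class PickledMethodState:
    """Hypothetical holder for the method-specific state of a custom search."""

    STATE_FILE = "method_state.pkl"  # arbitrary file name chosen for this sketch

    def __init__(self) -> None:
        self.best_metric_by_trial: Dict[str, float] = {}
        self.trials_launched = 0

    def save_method_state(self, path: Path) -> None:
        # Assumption: `path` is a directory that already exists.
        payload = {
            "best_metric_by_trial": self.best_metric_by_trial,
            "trials_launched": self.trials_launched,
        }
        with (path / self.STATE_FILE).open("wb") as f:
            pickle.dump(payload, f)

    def load_method_state(self, path: Path) -> None:
        # Restore exactly the fields written by save_method_state().
        with (path / self.STATE_FILE).open("rb") as f:
            payload = pickle.load(f)
        self.best_metric_by_trial = payload["best_metric_by_trial"]
        self.trials_launched = payload["trials_launched"]
```

In a real SearchMethod subclass these two methods would sit next to the event handlers, so that a restarted runner can resume the search with the same algorithm state.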

determined.searcher.SearcherState#

class determined.searcher.SearcherState#

Custom Searcher State.

Search runners maintain this state, which a SearchMethod can use to inform event handling. In other words, this state can be taken into account when deciding which operations to return from your event handler. Do not modify SearcherState in your SearchMethod. If your hyperparameter tuning algorithm needs additional state variables, add those variables to your SearchMethod implementation.

failures#

set of failed trials

Type

Set[uuid.UUID]

trial_progress#

progress of each trial as a number between 0.0 and 1.0

Type

Dict[uuid.UUID, float]

trials_closed#

set of completed trials

Type

Set[uuid.UUID]

trials_created#

set of created trials

Type

Set[uuid.UUID]
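Because failures, trials_closed and trials_created are sets of request IDs rather than counters, handlers usually take len() of them. The sketch below reads a state object without modifying it. The SearcherState dataclass is a local stand-in with the four documented attributes, and summarize is a hypothetical helper.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class SearcherState:
    """Local stand-in exposing the attributes documented above."""

    failures: Set[uuid.UUID] = field(default_factory=set)
    trial_progress: Dict[uuid.UUID, float] = field(default_factory=dict)
    trials_closed: Set[uuid.UUID] = field(default_factory=set)
    trials_created: Set[uuid.UUID] = field(default_factory=set)


def summarize(state: SearcherState) -> Dict[str, float]:
    """Derive counts and the mean per-trial progress; the state is only read."""
    progress_values = list(state.trial_progress.values())
    mean_progress = sum(progress_values) / len(progress_values) if progress_values else 0.0
    return {
        "created": len(state.trials_created),
        "closed": len(state.trials_closed),
        "failed": len(state.failures),
        "mean_progress": mean_progress,
    }
```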

determined.searcher.Operation#

class determined.searcher.Operation#

Abstract base class for all Operations

determined.searcher.Close#

class determined.searcher.Close(request_id: uuid.UUID)#

Operation for closing the specified trial

determined.searcher.Progress#

class determined.searcher.Progress(progress: float)#

Operation for signalling the relative progress of the hyperparameter search between 0 and 1

determined.searcher.Create#

class determined.searcher.Create(request_id: uuid.UUID, hparams: Dict[str, Any], checkpoint: Optional[determined.common.experimental.checkpoint._checkpoint.Checkpoint])#

Operation for creating a trial with a specified combination of hyperparameter values

determined.searcher.ValidateAfter#

class determined.searcher.ValidateAfter(request_id: uuid.UUID, length: int)#

Operation signaling the trial to train until its total units trained equals the specified length, where the units (batches, epochs, etc.) are specified in the searcher section of the experiment configuration
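Since length is the total amount trained rather than an increment, successive ValidateAfter operations for one trial carry increasing values. The helper below is a hypothetical illustration that lists the cumulative lengths a handler would pass when it validates every step units up to max_length.

```python
from typing import List


def validation_schedule(step: int, max_length: int) -> List[int]:
    """Cumulative `length` values for successive ValidateAfter operations."""
    if step <= 0:
        raise ValueError("step must be positive")
    lengths: List[int] = []
    total = 0
    while total < max_length:
        # Each entry is the total trained so far, capped at max_length.
        total = min(total + step, max_length)
        lengths.append(total)
    return lengths


print(validation_schedule(step=2, max_length=5))  # [2, 4, 5]
```

An on_validation_completed handler would request the next entry after train_length and issue Close once the last one has been reached.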

determined.searcher.Shutdown#

class determined.searcher.Shutdown(cancel: bool = False, failure: bool = False)#

Operation for shutting the experiment down

determined.searcher.ExitedReason#

class determined.searcher.ExitedReason(value)#

The reason why a trial exited early

The following reasons are supported:

  • ERRORED: The Trial encountered an exception

  • USER_CANCELED: The Trial was manually closed by the user

  • INVALID_HP: The hyperparameters the trial was created with were invalid