Defining a Search Space

A search space consists of nested function expressions, including stochastic expressions. The stochastic expressions are the hyperparameters. Sampling from this nested stochastic program defines the random search algorithm. The hyperparameter optimization algorithms work by replacing normal "sampling" logic with adaptive exploration strategies, which make no attempt to actually sample from the distributions specified in the search space.

It's best to think of search spaces as stochastic argument-sampling programs. For example

from hyperopt import hp
space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

The result of running this code fragment is a variable space that refers to a graph of expression identifiers and their arguments. Nothing has actually been sampled; it is just a graph describing how to sample a point. The code for handling this sort of expression graph is in hyperopt.pyll, and I will refer to these graphs as pyll graphs or pyll programs.
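To see that space really is just a description, you can print it; pyll expression nodes render as a textual dump of the graph rather than a value (the exact format may vary between hyperopt versions).

# -- `space` is a pyll expression node (an Apply); printing it shows the
#    expression graph rather than a sampled value
print(space)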

If you like, you can draw a sample from the search space to see what kind of point it produces.

import hyperopt.pyll.stochastic
print(hyperopt.pyll.stochastic.sample(space))
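Each call to sample draws a fresh point whose structure mirrors the tuples used to build the space. The values in the comment below are purely illustrative, since every draw is random.

# -- each sample is a ('case 1', x) or ('case 2', x) tuple, matching the
#    tuples that were used to build the space
for _ in range(3):
    print(hyperopt.pyll.stochastic.sample(space))
# e.g.
#   ('case 1', 2.73...)
#   ('case 2', -4.18...)
#   ('case 1', 1.56...)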

The search space described by space has three parameters:

- 'a': selects which case is used,
- 'c1': a positive-valued parameter that is only used in 'case 1', and
- 'c2': a bounded real-valued parameter that is only used in 'case 2'.

One thing to notice here is that every optimizable stochastic expression has a label as the first argument. These labels are used to return parameter choices to the caller, and in various ways internally as well.
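For instance, when this space is used with hyperopt's fmin, the best point found is reported back to the caller as a dictionary keyed by these labels. The sketch below assumes an arbitrary objective chosen purely for illustration.

from hyperopt import fmin, tpe, hp

space = hp.choice('a',
    [
        ('case 1', 1 + hp.lognormal('c1', 0, 1)),
        ('case 2', hp.uniform('c2', -10, 10))
    ])

# An arbitrary objective for illustration: args is the sampled
# ('case ...', value) tuple produced by the space.
def objective(args):
    case, val = args
    return val ** 2

best = fmin(objective, space, algo=tpe.suggest, max_evals=50)
# `best` is a dict keyed by the labels above, e.g. {'a': 0, 'c1': ...}
# or {'a': 1, 'c2': ...}, depending on which case the best trial used.
print(best)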

A second thing to notice is that we used tuples in the middle of the graph (around each of 'case 1' and 'case 2'). Lists, dictionaries, and tuples are all upgraded to "deterministic function expressions" so that they can be part of the search space stochastic program.
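For instance (a minimal sketch), a plain dictionary containing hyperparameter nodes becomes a valid search space expression, and sampling it returns an ordinary dict. The as_apply call below makes the upgrade explicit; the values shown in the comment are illustrative only.

from hyperopt import hp
import hyperopt.pyll
import hyperopt.pyll.stochastic

# A dict whose values contain hyperparameter nodes; as_apply performs the
# upgrade to a deterministic pyll expression explicitly.
dict_space = hyperopt.pyll.as_apply({
    'learning_rate': hp.loguniform('learning_rate', -5, 0),
    'layer_sizes': [hp.quniform('units1', 16, 256, 16),
                    hp.quniform('units2', 16, 256, 16)],
})
print(hyperopt.pyll.stochastic.sample(dict_space))
# -> e.g. {'layer_sizes': (112.0, 64.0), 'learning_rate': 0.03...}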

A third thing to notice is the numeric expression 1 + hp.lognormal('c1', 0, 1), which is embedded into the description of the search space. As far as the optimization algorithms are concerned, there is no difference between adding the 1 directly in the search space and adding the 1 within the logic of the objective function itself. As the designer, you can choose where to put this sort of processing to achieve the kind of modularity you want. Note that the intermediate expression results within the search space can be arbitrary Python objects, even when optimizing in parallel using mongodb. It is easy to add new types of non-stochastic expressions to a search space description; see Adding Non-Stochastic Expressions with pyll below for how to do it.
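As a hedged sketch of that choice, the two formulations below pose the same problem to the optimizer; the only difference is whether the +1 lives in the space or in the objective (the loss function here is an arbitrary stand-in):

from hyperopt import hp

def loss(x):
    # An arbitrary stand-in objective for illustration.
    return (x - 3.0) ** 2

# Option A: do the arithmetic inside the search space.
space_a = 1 + hp.lognormal('c1', 0, 1)
def objective_a(x):
    return loss(x)          # x arrives with the +1 already applied

# Option B: keep the space raw and do the arithmetic in the objective.
space_b = hp.lognormal('c1', 0, 1)
def objective_b(x):
    return loss(1 + x)      # apply the +1 here instead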

A fourth thing to note is that 'c1' and 'c2' are examples of what we will call conditional parameters. Each of 'c1' and 'c2' only figures in the returned sample for a particular value of 'a'. If 'a' is 0, then 'c1' is used but not 'c2'. If 'a' is 1, then 'c2' is used but not 'c1'. Whenever it makes sense to do so, you should encode parameters as conditional ones this way, rather than simply ignoring parameters in the objective function. If you expose the fact that 'c1' sometimes has no effect on the objective function (because it has no effect on the argument to the objective function), then the search can be more efficient about credit assignment.
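As a hedged sketch of the contrast, compare the conditional encoding used above with a flat encoding in which every parameter is always sampled and the objective silently ignores whichever one does not apply:

from hyperopt import hp

# Conditional encoding (preferred): 'c1' and 'c2' each appear only under
# the branch of 'a' that actually uses them.
conditional_space = hp.choice('a', [
    ('case 1', 1 + hp.lognormal('c1', 0, 1)),
    ('case 2', hp.uniform('c2', -10, 10)),
])

# Flat encoding (discouraged): both 'c1' and 'c2' are sampled on every
# trial, so the optimizer wastes effort assigning credit or blame to a
# parameter that had no effect on the outcome.
flat_space = {
    'a': hp.choice('a', ['case 1', 'case 2']),
    'c1': 1 + hp.lognormal('c1', 0, 1),
    'c2': hp.uniform('c2', -10, 10),
}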

Parameter Expressions

The stochastic expressions currently recognized by hyperopt's optimization algorithms are:

- hp.choice(label, options): returns one of the options, which should be a list or tuple; the elements of options can themselves be (nested) stochastic expressions, and any hyperparameters that appear only inside some of the options become conditional parameters.
- hp.randint(label, upper): returns a random integer in the range [0, upper).
- hp.uniform(label, low, high): returns a value drawn uniformly between low and high.
- hp.quniform(label, low, high, q): returns a value like round(uniform(low, high) / q) * q, suitable for a discrete value with respect to which the objective is still somewhat smooth.
- hp.loguniform(label, low, high): returns a value drawn according to exp(uniform(low, high)), so that the logarithm of the return value is uniformly distributed.
- hp.qloguniform(label, low, high, q): returns a value like round(exp(uniform(low, high)) / q) * q.
- hp.normal(label, mu, sigma): returns a real value that is normally distributed with mean mu and standard deviation sigma (unbounded).
- hp.qnormal(label, mu, sigma, q): returns a value like round(normal(mu, sigma) / q) * q.
- hp.lognormal(label, mu, sigma): returns a value drawn according to exp(normal(mu, sigma)), so that the logarithm of the return value is normally distributed; the value is always positive.
- hp.qlognormal(label, mu, sigma, q): returns a value like round(exp(normal(mu, sigma)) / q) * q.
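For instance, a sketch that combines several of these expressions in one dictionary-shaped space (the labels and ranges here are arbitrary, illustrative choices):

from hyperopt import hp

example_space = {
    'lr': hp.loguniform('lr', -7, 0),                      # exp(uniform(-7, 0))
    'n_layers': hp.randint('n_layers', 5),                 # integer in [0, 5)
    'batch_size': hp.quniform('batch_size', 16, 256, 16),  # multiples of 16
    'momentum': hp.normal('momentum', 0.9, 0.05),          # unbounded real
    'activation': hp.choice('activation', ['relu', 'tanh']),
}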

A Search Space Example: scikit-learn

To see all these possibilities in action, let's look at how one might describe the space of hyperparameters of classification algorithms in scikit-learn. (This idea is being developed in hyperopt-sklearn.)

from hyperopt import hp
space = hp.choice('classifier_type', [
    {
        'type': 'naive_bayes',
    },
    {
        'type': 'svm',
        'C': hp.lognormal('svm_C', 0, 1),
        'kernel': hp.choice('svm_kernel', [
            {'ktype': 'linear'},
            {'ktype': 'RBF', 'width': hp.lognormal('svm_rbf_width', 0, 1)},
            ]),
    },
    {
        'type': 'dtree',
        'criterion': hp.choice('dtree_criterion', ['gini', 'entropy']),
        'max_depth': hp.choice('dtree_max_depth',
            [None, hp.qlognormal('dtree_max_depth_int', 3, 1, 1)]),
        'min_samples_split': hp.qlognormal('dtree_min_samples_split', 2, 1, 1),
    },
    ])
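As a hedged sketch of how a point sampled from this space might be consumed, the helper below (classifier_from_params is a hypothetical name, not part of hyperopt or hyperopt-sklearn) maps each sampled dict to a scikit-learn estimator. Treating 'width' as the RBF gamma parameter and clamping the integer-valued settings are assumptions made for illustration.

import hyperopt.pyll.stochastic
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def classifier_from_params(params):
    # `params` is one dict sampled from the `space` defined above.
    if params['type'] == 'naive_bayes':
        return GaussianNB()
    elif params['type'] == 'svm':
        kernel = params['kernel']
        if kernel['ktype'] == 'linear':
            return SVC(C=params['C'], kernel='linear')
        # Assumption: interpret 'width' as the RBF gamma parameter.
        return SVC(C=params['C'], kernel='rbf', gamma=kernel['width'])
    elif params['type'] == 'dtree':
        max_depth = params['max_depth']
        return DecisionTreeClassifier(
            criterion=params['criterion'],
            max_depth=None if max_depth is None else int(max_depth),
            # Clamp to scikit-learn's minimum of 2 (an assumption; the
            # qlognormal above can occasionally produce 0 or 1).
            min_samples_split=max(2, int(params['min_samples_split'])),
        )
    raise ValueError(params['type'])

params = hyperopt.pyll.stochastic.sample(space)
clf = classifier_from_params(params)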

Adding Non-Stochastic Expressions with pyll

You can use hyperparameter nodes like the ones above as arguments to your own pyll functions (see pyll). File a GitHub issue if you want to know more about this.

In a nutshell, you just have to decorate a top-level (i.e. pickle-friendly) function so that it can be used via the scope object.

import hyperopt.pyll
from hyperopt.pyll import scope

@scope.define
def foo(a, b=0):
    print('running foo', a, b)
    return a + b / 2

# -- foo is called as usual; this prints "running foo 0 0" and then 0.0
print(foo(0))

# In describing search spaces you can use `foo` as you
# would in normal Python. These two calls will not actually call foo;
# they just record that foo should be called later to evaluate the graph.

space1 = scope.foo(hp.uniform('a', 0, 10))
space2 = scope.foo(hp.uniform('a', 0, 10), hp.normal('b', 0, 1))

# -- this will print a pyll.Apply node
print(space1)

# -- this will draw a sample by running foo()
print(hyperopt.pyll.stochastic.sample(space1))
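Once defined, scope.foo composes with the rest of the search space language; for example (a minimal sketch) it can appear as one branch of an hp.choice, with hyperparameters as its arguments:

from hyperopt import hp

# scope.foo can appear anywhere an ordinary expression could, and its
# arguments can themselves be hyperparameters.
space3 = hp.choice('use_foo', [
    scope.foo(hp.uniform('a2', 0, 10), hp.normal('b2', 0, 1)),
    hp.uniform('plain', 0, 1),
])

# -- sampling runs foo() only when the first branch is chosen
print(hyperopt.pyll.stochastic.sample(space3))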

Adding New Kinds of Hyperparameter

Adding new kinds of stochastic expressions for describing parameter search spaces should be avoided if possible. In order for all search algorithms to work on all spaces, the search algorithms must agree on the kinds of hyperparameter that can describe a space. As the maintainer of the library, I am open to the possibility that some kinds of expressions should be added from time to time, but I would like to avoid it as much as possible: adding new kinds of stochastic expressions is not one of the ways hyperopt is meant to be extensible.