
BADS: Frequently Asked Questions

This FAQ is curated by Luigi Acerbi and is constantly expanding. If you have questions not covered here, please feel free to ask me at [email protected] (putting 'BADS' in the subject of the email).


General

  • Which kind of problems is BADS suited for?

    We recommend BADS for problems in which:

    • the objective function landscape is rough (nonsmooth), typically due to numerical approximations or noise;
    • the objective function is moderately expensive to compute (e.g., more than 0.1 s per function evaluation);
    • the gradient is unavailable;
    • the number of input parameters is up to about D = 15 (maybe 20).

    If your objective function is fully analytical, BADS is most likely not suited for your problem (see below).

  • The performance of BADS on the real model-fitting problems reported in the paper is remarkable. Did you cherry-pick the results?

    No, but we selected projects that we thought BADS would be suitable for.

  • What do I do if BADS is not suited for my problem?

    If the objective function is smooth and analytical, we would recommend using fmincon instead (possibly feeding it the analytically calculated gradient). If you can afford tens or even hundreds of thousands of function evaluations, cmaes (with active covariance adaptation) is also a valid alternative.
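
    For example, a minimal sketch of calling fmincon with an analytically supplied gradient (using a hypothetical quadratic objective as a stand-in for your model) could look like this:

    % Hypothetical smooth objective returning both the value and its gradient
    fun = @(x) deal(sum(x.^2), 2*x);
    x0 = [1,-2,3];                                          % starting point
    opts = optimoptions('fmincon','SpecifyObjectiveGradient',true);
    [x,fval] = fmincon(fun,x0,[],[],[],[], ...
        -10*ones(1,3),10*ones(1,3),[],opts);                % simple box constraints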

    In these cases, you may also consider computing the full posterior, instead of getting only a point estimate via optimization. To this end, you could use Markov Chain Monte Carlo, e.g. via Stan or PyMC3. Alternatively, we developed a method to compute approximate posterior distributions, Variational Bayesian Monte Carlo, which can be used in synergy with BADS.

Installation

  • Where can I download BADS?

    Here.

  • Which MATLAB toolboxes or external packages does BADS require?

    BADS does not require the separate installation of any external package. (BADS uses GPML v3.6, which is already included in the installation bundle.)

    BADS runs with a bare installation of MATLAB (no other toolboxes required). Having the MATLAB Optimization Toolbox™ speeds things up for the optimization of the Gaussian process hyperparameters.

  • I am having trouble installing BADS. Can you help?

    Sure. The BADS installation should be pretty straightforward, so please write to me describing in detail the problem you are having.

Input arguments

  • What is the objective function?

    The objective or target function is the function that you want BADS to minimize. It is specified by a function handle fun.

    For a typical model-fitting problem, fun is a function that computes the negative log likelihood of an input parameter vector x, for a given dataset and model.

  • Why the negative log likelihood?

    By mathematical convention, BADS minimizes the objective function, like most other optimization algorithms.

    In the typical model-fitting scenario we want to maximize the likelihood, which is the same as maximizing the log likelihood, which in turn is the same as minimizing minus the log likelihood, aka the negative log likelihood.
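
    As a concrete sketch, if loglikefun(x,data) is a hypothetical function that computes the log likelihood of parameter vector x for your dataset, the objective passed to BADS simply flips its sign:

    % Objective for BADS: the negative log likelihood (loglikefun is hypothetical)
    nllfun = @(x) -loglikefun(x,data);
    X = bads(nllfun,X0,LB,UB,PLB,PUB);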

  • My objective function requires additional data/inputs. How do I pass them to BADS?

    Suppose that your function takes two inputs, fun(x,data).

    The first solution consists of defining a new function handle

    funwdata = @(x) fun(x,data);

    where data has been defined before in the code. Now you can optimize funwdata, which takes a single input. Note that a copy of data is embedded in funwdata. If you subsequently make any change to the original data, you will need to redefine the function handle if you want the changes to be reflected in the optimization.

    Alternatively, bads can take additional inputs after OPTIONS (the eighth argument). Any additional input is passed to the objective function. In this case, you would write

    X = bads(fun,X0,LB,UB,PLB,PUB,NONBCON,OPTIONS,data)

  • How do I choose the starting point X0?

    First of all, keep in mind that you should restart BADS from different starting points. Probably a minimum of ten, ideally dozens, depending on your problem.

    We recommend choosing starting points mostly inside the plausible box bounded by PLB and PUB, for example

    nvars = numel(PLB);
    X0 = PLB + rand(1,nvars) .* (PUB - PLB);

    If you think that you would like to also draw points outside PLB and PUB, then by definition it means that your choice of PLB and PUB is too narrow (see also below).
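
    As a minimal sketch, a multi-start procedure (assuming an objective handle fun and bounds already defined) could look like this:

    Nstarts = 10;                                 % number of restarts (ideally more)
    nvars = numel(PLB);
    Xbest = []; Fbest = Inf;
    for iStart = 1:Nstarts
        % Draw a random starting point uniformly inside the plausible box
        X0 = PLB + rand(1,nvars) .* (PUB - PLB);
        [X,FVAL] = bads(fun,X0,LB,UB,PLB,PUB);
        if FVAL < Fbest                           % keep the best solution found so far
            Xbest = X; Fbest = FVAL;
        end
    end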

  • How do I choose LB and UB?

    LB and UB are the hard bounds of the optimization. In theory, you could set them to the mathematical limits of your variables. However, the mathematical limits of a variable are usually a poor choice for optimization. Instead, we recommend setting them no wider than their physical or experimental limits.

    For example, suppose that you have a parameter sigma that represents the standard deviation (SD) of the movement endpoint of a subject in a task in which people are asked to rapidly touch targets on a screen. Mathematically, sigma, being a SD, could go from 0 to Inf. However, in this case, it is physically unrealistic that people would have no motor noise. Instead, we set as LB our experimental lower bound, e.g., the resolution of our motion tracker device, or maybe one screen pixel. Similarly, it is physically impossible for people's pointing error to be larger than, say, the length of their forearms. In practice, we could set UB to the size of the screen.

    Importantly, do not set LB to 0 for variables that can only be positive. Choose a small, experimentally meaningful number. Do not pick extremely small numbers such as eps (which is about 1e-16), unless they are justified in the context of your problem.

  • Does BADS support (partially) unconstrained optimization?

    Yes and no. You can specify that a variable is (partially) unconstrained by setting its hard bounds to -Inf or Inf.

    However, we encourage users to always set finite, empirically meaningful hard bounds (see the previous question). Infinities are never empirically meaningful, unless perhaps you are in a black hole. For this reason, support for infinite bounds might be removed in future versions of BADS.

  • Can I set LB = UB for some variable to fix it to a given value?

    Yes, you can. Fixed variables are variables for which X0, LB, UB, PLB, and PUB are all equal. These variables will become constants, and BADS will internally run on an optimization problem with reduced dimensionality.
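
    For example, a minimal sketch fixing the third variable (index and value are hypothetical) to a given value:

    k = 3; fixedvalue = 0.5;      % hypothetical fixed variable and value
    X0(k) = fixedvalue;
    LB(k) = fixedvalue;   UB(k) = fixedvalue;
    PLB(k) = fixedvalue;  PUB(k) = fixedvalue;
    % BADS treats variable k as a constant and optimizes over the remaining variables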

  • How do I choose PLB and PUB?

    PLB and PUB are the plausible (or reasonable) bounds of the optimization. Set them by thinking of a plausible range in which you would expect to find almost all solutions. The plausible box defined by PLB and PUB naturally represents a good region where to randomly draw starting points for the optimization (see above). If you really have no idea about a plausible range, you can leave PLB or PUB empty, or set them equal to LB and UB, but this should not be the norm.

    In the example above (see previous question), the plausible bounds for sigma (pointing motor noise) could go from a few pixels (PLB) to several centimeters (PUB).
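
    As a hypothetical sketch, in units of screen pixels the full set of bounds for sigma might look like this (the specific numbers are purely illustrative):

    % Hypothetical bounds for sigma (motor noise SD), in screen pixels
    LB = 1;          % hard lower bound: one pixel (experimental resolution)
    UB = 1000;       % hard upper bound: roughly the size of the screen
    PLB = 2;         % plausible lower bound: a few pixels
    PUB = 100;       % plausible upper bound: several cm worth of pixels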

  • How do I prevent BADS from evaluating certain inputs or regions of input space?

    If these regions can be identified by coordinate-wise ranges, use LB and UB. Otherwise, use the barrier function NONBCON(X) (non-bound constraints). Set NONBCON to return values greater than 0 for non-allowed values of X.

    Do not have your objective function return Inf or NaN for invalid inputs; BADS would simply crash (see this question).

    Absolutely do NOT have your objective function return an arbitrarily large number to force BADS to avoid certain regions. This strategy may seem innocent enough, but in fact it completely cripples BADS by making its models of the objective function nonsensical.
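
    As a sketch, a barrier function excluding a hypothetical forbidden region (here, points outside the unit disk in the first two coordinates) would be:

    % NONBCON takes a matrix X (one point per row) and returns a column vector,
    % with values greater than 0 for non-allowed points
    NONBCON = @(X) (X(:,1).^2 + X(:,2).^2) > 1;
    X = bads(fun,X0,LB,UB,PLB,PUB,NONBCON);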

  • Does BADS support integer constraints?

    No, the current version of BADS does not support integer constraints (that is, variables forced to be integers). However, this feature might be included in future versions of BADS.

    In the meantime, as a simple fix, you could adopt the following hack if you have a single integer variable m:

    • make the parameter m continuous for the purpose of optimization;
    • for a given parameter vector, evaluate separately the log likelihood at floor(m) and ceil(m);
    • return the linearly interpolated value of the log likelihood;
    • at the end of the optimization, either return round(m), or evaluate your function (several times, if noisy) at both floor(m) and ceil(m) and pick the best.

    Note that this will double the cost of each evaluation, so it is worthwhile only if alternative solutions (such as looping over all values of m) would be computationally more expensive. Finally, this approach makes sense only if m is truly an integer (ordered and with an underlying metric), as opposed to merely a categorical variable.
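
    A minimal sketch of this hack, assuming a hypothetical log-likelihood function loglikefun(theta,m,data) in which m is the integer variable, could look like this:

    function nll = objfun(x,data)
        % Objective for BADS: treat m as continuous and linearly interpolate the
        % log likelihood between the two nearest integers (hypothetical sketch)
        theta = x(1:end-1);                  % continuous parameters
        m = x(end);                          % 'integer' parameter, made continuous
        m1 = floor(m); m2 = ceil(m);
        ll = loglikefun(theta,m1,data);      % log likelihood at the lower integer
        if m2 > m1
            w = m - m1;                      % linear interpolation weight
            ll = (1-w)*ll + w*loglikefun(theta,m2,data);
        end
        nll = -ll;                           % BADS minimizes the negative log likelihood
    end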

Output arguments

  • How is FVAL computed?

    FVAL is the (estimated) value of the objective function at X, the returned optimum.

    For a deterministic (not-noisy) objective, this is simply FUN(X), where FUN is the objective function handle. For a noisy objective, FVAL is an estimate obtained by averaging several evaluations of FUN(X), whose number is defined by OPTIONS.NoiseFinalSamples (default is OPTIONS.NoiseFinalSamples = 10).

  • Why do you estimate FVAL by averaging additional function evaluations? Can't you return the Gaussian process prediction at X?

    Glad that you asked. Yes, in theory we could use the Gaussian process (GP) mean prediction at X. However, the GP prediction can occasionally fail, sometimes subtly. While this mismatch is not a major problem during optimization, it could potentially introduce hard-to-detect but substantial biases in FVAL, which could have catastrophic effects for model selection. For this reason, we chose a more conservative approach for estimating FVAL.

  • How is overhead in OUTPUT defined?

    This field of OUTPUT contains the fractional overhead, defined as (total running time / total function time - 1). Typically, you would expect the overhead to be (much) smaller than 1 for normal runs of BADS. If the overhead is larger than, say, 0.75, your problem affords fast evaluations and it might benefit from algorithms other than BADS (such as fmincon or cmaes).

    For BADS test problems and examples, you will find that the reported overhead is astronomical, which is expected since for demonstration purposes we are using simple analytical functions.

  • What's in the undocumented outputs of BADS?

    BADS currently has two undocumented outputs, OPTIMSTATE and GPSTRUCT. The former contains a large number of fields related to the state of the optimization, whereas the latter contains information about the GP model. As of now, these structs are returned for debugging purposes, and we are not providing explicit support. Future versions of BADS might change the interface or the internal structure of these outputs.

Noisy objective function

  • What is a noisy objective function?

    A noisy (or stochastic) objective function is an objective that will return different results if evaluated twice at the same point x. A non-noisy objective function is deterministic.

    For model fitting, objective functions can be noisy if the log likelihood is evaluated through simulation (e.g., via Monte Carlo methods).

  • Why are noisy objective functions treated differently?

    For a deterministic objective, we assume that the goal is to minimize f(x). For a noisy objective, we assume that the goal is to minimize the expected value of f(x), also written as E[f(x)]. For this reason, BADS will not simply blindly trust whatever f(x) returns, but will do some internal computation to estimate E[f(x)] (effectively, smoothing the observed function values via a Gaussian process).

    Incidentally, this means that ideally the function that you provide (and that computes the negative log likelihood) should be an unbiased estimator of the negative log likelihood, but this is a story for another time.

  • Can I make a noisy objective function deterministic by fixing the noise process?

    Well, technically yes, you could make a noisy objective function deterministic by fixing the random seed (e.g., via rng) every time you call it. However, this fix does not really solve the problem, because you are not eliminating the noise in the function observations. In fact, if you do it naively, you might be adding unwanted bias to your fits.

    Thus, it is not recommended to 'fix' the noise this way (by fixing the random seed at each function call). Instead, let your function be stochastic, and let BADS deal with it.

    Note that this is different from setting the random seed once at the beginning of an optimization run, for the sake of reproducibility, which is recommended as good practice.

  • Should I tell BADS that my objective is noisy?

    Please do so. Set OPTIONS.UncertaintyHandling = 1 to tell BADS that the optimization is noisy.

    If you forget about it, BADS will determine at initialization whether the provided objective is noisy, by evaluating it twice at X0. Note that this test can occasionally fail (about once every ten billion times).

  • Can BADS handle any arbitrary amount of noise in the objective?

    No. BADS works best if the standard deviation of the objective function, when evaluated in the vicinity of the solution, is small with respect to changes in the objective function itself (that is, there is a good signal-to-noise ratio). In many cases, a standard deviation of order 1 or less should work (this is the default assumption). If you approximately know the magnitude of the noise in the vicinity of the solution, you can help BADS by specifying it in advance (set OPTIONS.NoiseSize = sigmaest, where sigmaest is your estimate of the standard deviation).

    If the noise around the solution is too large, BADS will perform poorly. In that case, we recommend increasing the precision of your computation of the objective (e.g., by drawing more Monte Carlo samples) such that sigmaest is of order 1 or even lower, as needed by your problem (see also this question). Note that the noise farther away from the solution can be larger, and this is usually okay.
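
    For example, a minimal setup for a noisy objective (sigmaest being your estimate of the noise standard deviation near the solution) would be:

    OPTIONS.UncertaintyHandling = 1;        % tell BADS that the objective is noisy
    OPTIONS.NoiseSize = sigmaest;           % estimated noise SD near the solution (default 1)
    [X,FVAL] = bads(fun,X0,LB,UB,PLB,PUB,[],OPTIONS);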

  • Does BADS assume that the noise is the same for all inputs?

    Yes and no. The Gaussian process (GP) model built by BADS is homoskedastic, that is, it assumes constant noise across the input space. However, the GP model is built using only a local set of points, so BADS will adapt to local characteristics of the objective function, including amounts of noise that depend on the location.

Display

  • What are the quantities displayed by BADS during optimization?

    If OPTIONS.Display is set to iter, BADS displays the traces of several optimization quantities:

    • the iteration number;
    • the number of objective function evaluations f-count;
    • the value of f(x) at the incumbent (current point);
    • the normalized MeshScale size (that is, the POLL size parameter normalized to the plausible box);
    • the current optimization stage and method (POLL or SEARCH, search type, and whether they were successful);
    • additional actions (such as re-training the Gaussian process).

    If the objective function is noisy, instead of f(x) BADS will report the expected value E[f(x)] and its standard deviation SD[f(x)] at the incumbent, both estimated via the current Gaussian process model.

  • For a noisy function, I noticed that the series of displayed E[f(x)] values is not monotonically decreasing. Should I worry?

    Well spotted, but nothing to worry about. BADS keeps updating the estimate of E[f(x)] at the incumbent, which means that occasionally this value will increase across iterations, and sometimes it will oscillate for a few iterations. Also, if the noise is large, the Gaussian process approximation might occasionally fail, leading to outlier estimates for E[f(x)] (which should then recover in the subsequent iterations). All of this is part of the normal functioning of BADS.

  • Sometimes as Actions during optimization I read Train (fail). What does it mean?

    It means that BADS was unable to refit the Gaussian process model to the current local training set, usually due to numerical issues. Occasional failures are not a reason for concern, in particular at the beginning or towards the end of the optimization. However, if a large number of training attempts are systematically failing, it might mean that BADS is having trouble. Sometimes this can be fixed by changing the problem parameterization, or perhaps there are other issues with the model.

Troubleshooting

  • BADS crashes saying that The returned function value must be a finite real-valued scalar. What do I do?

    This error means that your objective function has returned Inf, NaN, or a complex number. You should check your code and understand why it returned a non-finite or complex value.

    Infs and NaNs often arise because there are outcomes in your dataset (e.g., responses in a trial) to which the tested model assigns a probability of 0 (usually due to numerical truncation). log(0) yields -Inf, which is then propagated. In these cases, we recommend making the model more robust by forcing all outcomes (e.g., the likelihood associated with each trial) to have a minimum non-null probability, such as sqrt(eps) or some other small value. This should not be necessary if the model already includes a non-zero lapse rate.
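
    For example, a minimal sketch of this fix inside a (hypothetical) trial-based computation of the negative log likelihood:

    % p is a vector with the likelihood assigned by the model to each trial
    p = max(p, sqrt(eps));      % enforce a minimum non-null probability per trial
    nll = -sum(log(p));         % negative log likelihood, now guaranteed to be finite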

    Complex numbers usually arise because you are taking either a log or a sqrt of a negative number (of a quantity that should not be negative). You might be setting wrong bounds for your variables, or maybe there are indexing issues.

    Note that some MATLAB optimizers, such as fmincon, are robust to Infs and NaNs and just keep going, avoiding the problematic region. However, we believe this is dangerous as it might hide deeper issues with the model implementation.

  • During optimization I received a warning that The mesh attempted to expand above maximum size too many times. What does it mean?

    It means that probably your PLB or PUB bounds are too narrow; try widening them. If these are already as wide as LB and UB, it might be that your hard bounds are too narrow.

    If you do not think that this is the case, you can disable this warning by setting OPTIONS.MeshOverflowsWarning = Inf.

  • I am passing a NONBCON function handle to BADS, but I get an error that NONBCON should be a function handle that takes a matrix X as input and returns a column vector of bound violations. What am I doing wrong?

    Most likely, your NONBCON accepts only a vector input and returns a scalar, whereas it should take a matrix input (one point per row) and return a column vector of constraint violations.

    For example, suppose that your input variables need to be ordered, such that X(1) <= X(2) and X(2) <= X(3). Then you should set

    NONBCON = @(X) X(:,1) > X(:,2) | X(:,2) > X(:,3);

    whereas NONBCON = @(X) X(1) > X(2) | X(2) > X(3) would yield an error.

  • I have been running BADS with a deterministic objective function from the same starting point, but I get different results each time. Is something wrong?

    Nothing is wrong per se. BADS is a stochastic optimizer, so results may differ between different runs, even with the same initial condition X0. If the returned FVAL varies substantially across runs from the same starting point, it might be a sign that your function landscape is particularly difficult. If FVAL is similar across runs, but the returned optimum X varies substantially, it is a sign that your function has a plateau or ridge, with trade-offs between parameters.

    If you want to have reproducible results (and this advice applies beyond BADS), we recommend fixing MATLAB's random seed to some known quantity via rng (e.g., setting it to the run number).
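
    For example, a minimal sketch (iRun being a hypothetical run index):

    rng(iRun);                  % fix the random seed once per run, for reproducibility
    nvars = numel(PLB);
    X0 = PLB + rand(1,nvars) .* (PUB - PLB);      % reproducible random starting point
    [X,FVAL] = bads(fun,X0,LB,UB,PLB,PUB);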

  • I have been running BADS with a stochastic objective function from different starting points, and I get different results each time. What can I do?

    First of all, slightly different results are expected if your target function is noisy (be sure to have read and understood all the points under the noisy objective function section of the FAQ). So, if you are asking this question, I assume that it is because you find wildly different results.

    Generally, substantially different results suggest that BADS is getting stuck due to excess noise in the target function with respect to actual improvements of the function in the neighborhood of the current point. For example, even if the expected value of the target function has a non-zero gradient, it might be too hard for BADS to find a direction of improvement due to the low signal-to-noise ratio, and thus both the poll and search steps repeatedly fail. For this reason, it is common for BADS to get stuck at various points in nearly-flat regions of parameter space.

    The general solution to this problem, as also mentioned here, is to decrease the amount of noise in the target function (e.g., if you estimate the target function with Monte Carlo sampling, try increasing the number of samples).

Miscellanea

  • This is interesting, but shouldn't we ideally compute full posterior distributions?

    Yes, we should! However, given the typical class of model-fitting problems BADS is designed for (see here), obtaining the full posterior, or even an approximation thereof, can be a challenging task. We recently developed a novel method and related toolbox, Variational Bayesian Monte Carlo, which addresses exactly this problem. Check it out!

  • Can BADS return an approximate posterior, e.g. by computing the Hessian at the optimum?

    First, just to clarify: the Hessian is a matrix of second derivatives, which can be used to build a crude approximation of the posterior via Laplace's method. The answer to the question is no: BADS cannot return the Hessian or an approximate posterior. The reason is that, even though BADS builds a local Gaussian process approximation of the objective function, this approximation can fail, and we cannot trust it to represent a valid posterior.

    Instead, you should look into Variational Bayesian Monte Carlo, a method that we developed specifically to compute approximate posterior distributions, and that can be used in synergy with BADS.

  • Are you planning to port BADS to other languages?

    There is a plan to port BADS to Python, although there is no roadmap for that yet. It will depend on several factors, primarily whether I can find someone to help me along the way. See also here for updates.
