This FAQ is curated by Luigi Acerbi and is in constant expansion. If you have questions not covered here, please feel free to ask me at [email protected] (putting 'BADS' in the subject of the email).

- General
- Installation
- Input arguments
  - What is the objective function?
  - Why the negative log likelihood?
  - My objective function requires additional data/inputs. How do I pass them to BADS?
  - How do I choose the starting point `X0`?
  - How do I choose `LB` and `UB`?
  - Does BADS support (partially) unconstrained optimization?
  - Can I set `LB = UB` for some variable to fix it to a given value?
  - How do I choose `PLB` and `PUB`?
  - How do I prevent BADS from evaluating certain inputs or regions of input space?
  - Does BADS support integer constraints?
- Output arguments
- Noisy objective function
  - What is a noisy objective function?
  - Why are noisy objective functions treated differently?
  - Can I make a noisy objective function deterministic by fixing the noise process?
  - Should I tell BADS that my objective is noisy?
  - Can BADS handle any arbitrary amount of noise in the objective?
  - Does BADS assume that the noise is the same for all inputs?
- Display
- Troubleshooting
  - BADS crashes saying that `The returned function value must be a finite real-valued scalar`. What do I do?
  - During optimization I received a warning that `The mesh attempted to expand above maximum size too many times`. What does it mean?
  - I am passing a `NONBCON` function handle to BADS, but I get an error that `NONBCON should be a function handle that takes a matrix X as input and returns a column vector of bound violations`. What am I doing wrong?
  - I have been running BADS with a deterministic objective function from the same starting point, but I get different results each time. Is something wrong?
  - I have been running BADS with a stochastic objective function from different starting points and I get different results each time. What can I do?
- Miscellanea

We recommend BADS for problems in which:

- the objective function landscape is rough (nonsmooth), typically due to numerical approximations or noise;
- the objective function is moderately expensive to compute (e.g., more than 0.1 s per function evaluation);
- the gradient is unavailable;
- the number of input parameters is up to about `D = 15` (maybe `D = 20`).

If your objective function is fully analytical, BADS is most likely not suited for your problem (see below).

The performance of BADS on the real model-fitting problems reported in the paper is remarkable. Did you cherry-pick the results?

No, but we selected projects that we thought BADS would be suitable for.

If the objective function is smooth and analytical, we recommend using `fmincon` instead (possibly feeding it the analytically calculated gradient). If you can afford tens or even hundreds of thousands of function evaluations, `cmaes` (with active covariance adaptation) is also a valid alternative.

In these cases, you may also consider computing the full posterior, instead of getting only a point estimate via optimization. To this end, you could use Markov Chain Monte Carlo, e.g. via Stan or PyMC3. Alternatively, we developed a method to compute approximate posterior distributions, Variational Bayesian Monte Carlo, which can be used in synergy with BADS.

Here.

BADS does not require the separate installation of any external package. (BADS uses GPML v3.6, which is already included in the installation bundle.)

BADS runs with a bare installation of MATLAB (no other toolboxes required). Having the MATLAB Optimization Toolbox™ speeds things up for the optimization of the Gaussian process hyperparameters.

Sure. The BADS installation should be pretty straightforward, so write to me describing in detail the problem you are having.

What is the objective function?

The objective or target function is the function that you want BADS to minimize. It is specified by a function handle `fun`.

For a typical model-fitting problem, `fun` is a function that computes the negative log likelihood of an input parameter vector `x`, for a given dataset and model.

Why the negative log likelihood?

By mathematical convention, BADS minimizes the objective function, like most other optimization algorithms.

In the typical model-fitting scenario we want to maximize the likelihood, which is the same as maximizing the log likelihood, which in turn is the same as minimizing minus the log likelihood, also known as the negative log likelihood.

My objective function requires additional data/inputs. How do I pass them to BADS?

Suppose that your function takes two inputs, `fun(x,data)`.

The first solution consists of defining a new function handle

```matlab
funwdata = @(x) fun(x,data);
```

where `data` has been defined earlier in the code. Now you can optimize `funwdata`, which takes a single input. Note that a copy of `data` is embedded in `funwdata`: if you subsequently make any change to the original `data`, you will need to redefine the function handle for the changes to be reflected in the optimization.

Alternatively, `bads` can take additional inputs after `OPTIONS` (the eighth argument). Any additional input is passed to the objective function. In this case, you would write

```matlab
X = bads(fun,X0,LB,UB,PLB,PUB,NONBCON,OPTIONS,data);
```

How do I choose the starting point `X0`?

First of all, keep in mind that you should restart BADS from different starting points: probably a minimum of ten, ideally dozens, depending on your problem (see the sketch at the end of this answer).

We recommend choosing starting points mostly inside the plausible box bounded by `PLB` and `PUB`, for example

```matlab
nvars = numel(PLB);
X0 = PLB + rand(1,nvars) .* (PUB - PLB);
```

If you feel that you would also like to draw points outside `PLB` and `PUB`, then by definition your choice of `PLB` and `PUB` is too narrow (see also below).

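For example, a minimal multi-start sketch (assuming `fun`, `LB`, `UB`, `PLB`, and `PUB` have already been defined as row vectors for your problem) could look like this:

```matlab
% Minimal multi-start sketch: run BADS several times from random
% starting points inside the plausible box and keep the best result.
nruns = 10;                  % number of restarts (problem-dependent)
nvars = numel(PLB);
bestfval = Inf;
for irun = 1:nruns
    X0 = PLB + rand(1,nvars) .* (PUB - PLB);
    [X,FVAL] = bads(fun,X0,LB,UB,PLB,PUB);
    if FVAL < bestfval       % for noisy objectives FVAL is an estimate,
        bestfval = FVAL;     % so compare solutions with due care
        bestX = X;
    end
end
```
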
How do I choose `LB` and `UB`?

`LB` and `UB` are the hard bounds of the optimization. In theory, you could set them to the mathematical limits of your variables. However, using the mathematical limits of a variable is a bad choice for optimization. Instead, we recommend setting them no wider than the variable's physical or experimental limits.

For example, suppose that you have a parameter `sigma` that represents the standard deviation (SD) of the movement endpoint of a subject in a task in which people are asked to rapidly touch targets on a screen. Mathematically, `sigma`, being a SD, could go from `0` to `Inf`. However, in this case it is physically unrealistic that people would have no motor noise. Instead, we set as `LB` our experimental lower bound, e.g., the resolution of our motion tracker device, or maybe one screen pixel. Similarly, it is physically impossible for people's pointing error to be larger than, say, the length of their forearms. In fact, we could set as `UB` the size of the screen.

Importantly, do not set `LB` to `0` for variables that can only be positive. Choose a small, experimentally meaningful number. Do not pick extremely small numbers such as `eps` (which is about `2.2e-16`), unless they are justified in the context of your problem.

Does BADS support (partially) unconstrained optimization?

Yes and no. You can specify that a variable is (partially) unconstrained by setting its hard bounds to `-Inf` or `Inf`.

However, we encourage users to always set finite, empirically meaningful hard bounds (see the previous question). Infinities are never empirically meaningful, unless perhaps you are in a black hole. For this reason, support for infinite bounds might be removed in future versions of BADS.

Can I set `LB = UB` for some variable to fix it to a given value?

Yes, you can. Fixed variables are variables for which `X0`, `LB`, `UB`, `PLB`, and `PUB` are all equal. These variables become constants, and BADS internally runs on an optimization problem with reduced dimensionality.

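As a hypothetical three-variable example, this is how you would fix the second variable to `0.5`:

```matlab
% Fix the 2nd variable to 0.5 by making X0 and all bounds coincide there;
% BADS will internally optimize over the remaining two variables
LB  = [ 0, 0.5,  0];    UB  = [10, 0.5, 10];
PLB = [ 1, 0.5,  1];    PUB = [ 5, 0.5,  5];
X0  = [ 2, 0.5,  2];
X = bads(fun,X0,LB,UB,PLB,PUB);
```
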
How do I choose `PLB` and `PUB`?

`PLB` and `PUB` are the plausible (or reasonable) bounds of the optimization. Set them by thinking of a plausible range in which you would expect to find almost all solutions. The plausible box defined by `PLB` and `PUB` is also a natural region from which to randomly draw starting points for the optimization (see above). If you really have no idea about a plausible range, you can leave `PLB` or `PUB` empty, or set them equal to `LB` and `UB`, but this should not be the norm.

In the example above (see the previous question), the plausible bounds for `sigma` (pointing motor noise) could go from a few pixels for `PLB` to several cm for `PUB`.

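Putting this and the previous question together, a hypothetical setup for `sigma` (all numbers made up for illustration) might look like:

```matlab
% Hard bounds: experimental limits (e.g., tracker resolution, screen size)
LB = 0.05;    UB = 40;      % cm (hypothetical values)
% Plausible bounds: range where we expect to find almost all solutions
PLB = 0.5;    PUB = 10;     % cm (hypothetical values)
```
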
How do I prevent BADS from evaluating certain inputs or regions of input space?

If these regions can be identified by coordinate-wise ranges, use `LB` and `UB`. Otherwise, use the barrier function `NONBCON` (non-bound constraints). Set `NONBCON` to return values greater than `0` for non-allowed values of `X`.

Do not have `FUN(X)` return `Inf` or `NaN` for invalid inputs: BADS would simply crash (see this question).

Absolutely do NOT have `FUN(X)` return an arbitrarily large number to force BADS to avoid certain regions. This strategy may seem innocent enough, but in fact it completely cripples BADS by making its model of the objective function nonsensical.

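For instance, a hypothetical constraint that excludes all points outside the unit disk in the first two coordinates:

```matlab
% NONBCON takes a matrix X (one point per row) and returns a column
% vector with values greater than 0 where the constraint is violated
NONBCON = @(X) sum(X(:,1:2).^2, 2) > 1;    % true (1) outside the unit disk
X = bads(fun,X0,LB,UB,PLB,PUB,NONBCON);
```
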
Does BADS support integer constraints?

No, the current version of BADS does not support integer constraints (that is, variables forced to be integers). However, this feature might be included in future versions of BADS.

In the meantime, as a simple fix, you could adopt the following hack if you have a single integer variable `m`:

- make the parameter `m` continuous for the purpose of optimization;
- for a given parameter vector, evaluate the log likelihood separately at `floor(m)` and `ceil(m)`;
- return the linearly interpolated value of the log likelihood;
- at the end of the optimization, either return `round(m)`, or evaluate your function (several times, if noisy) at both `floor(m)` and `ceil(m)` and pick the best.

A minimal sketch of this wrapper is shown below. Note that this doubles the cost of each evaluation, so it is worth it only if alternative solutions (such as looping over all values of `m`) would be computationally more expensive. Finally, this approach makes sense only if `m` is truly an integer (ordered and with an underlying metric), as opposed to merely a categorical variable.

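For concreteness, here is a minimal sketch of the interpolation hack, assuming the integer variable is the `k`-th element of `x` and that a (hypothetical) function `nll` computes the negative log likelihood:

```matlab
% Treat x(k) as continuous by linearly interpolating the negative log
% likelihood between the two neighboring integer values of x(k)
function fval = nll_interp(x,k)
    lo = floor(x(k));  hi = ceil(x(k));
    xlo = x;  xlo(k) = lo;
    xhi = x;  xhi(k) = hi;
    if lo == hi
        fval = nll(xlo);                        % x(k) is already an integer
    else
        w = x(k) - lo;                          % interpolation weight in [0,1]
        fval = (1 - w)*nll(xlo) + w*nll(xhi);   % two evaluations per call
    end
end
```

You would then optimize, e.g., `fun = @(x) nll_interp(x,2)` if the integer variable is the second one.
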
`FVAL` is the (estimated) value of the objective function at `X`, the returned optimum.

For a deterministic (non-noisy) objective, this is simply `FUN(X)`, where `FUN` is the objective function handle. For a noisy objective, `FVAL` is an estimate obtained by averaging several evaluations of `FUN(X)`, whose number is defined by `OPTIONS.NoiseFinalSamples` (default `OPTIONS.NoiseFinalSamples = 10`).

Why do you estimate `FVAL` by averaging additional function evaluations? Can't you return the Gaussian process prediction at `X`?

Glad that you asked. Yes, in theory we could use the Gaussian process (GP) mean prediction at `X`. However, the GP prediction can occasionally fail, sometimes subtly. While this mismatch is not a major problem during optimization, it could potentially introduce hard-to-detect but substantial biases in `FVAL`, which could have catastrophic effects for model selection. For this reason, we chose a more conservative approach for estimating `FVAL`.

This field of `OUTPUT` contains the fractional overhead, defined as (total running time / total function time - 1). Typically, you would expect `overhead` to be (much) smaller than 1 for normal runs of BADS. If the overhead is larger than, say, 0.75, your problem affords fast evaluations and might benefit from algorithms other than BADS (such as `fmincon` or `cmaes`).

For BADS test problems and examples, you will find that the reported overhead is astronomical, which is expected, since for demonstration purposes we use simple analytical functions.

BADS currently has two undocumented outputs, `OPTIMSTATE` and `GPSTRUCT`. The former contains a large number of fields related to the state of the optimization, whereas the latter contains information about the GP model. As of now, these structs are returned for debugging purposes, and we do not provide explicit support for them. Future versions of BADS might change the interface or the internal structure of these outputs.

The fifth (unofficial) output of BADS is a struct `OPTIMSTATE`, which contains everything that happens during the optimization. In particular, `OPTIMSTATE.X` is the list of sampled points, and `OPTIMSTATE.Y` the corresponding observed function values.

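For example (keeping in mind that this output is unofficial and may change):

```matlab
% Request the unofficial fifth output and inspect the evaluated points
[X,FVAL,EXITFLAG,OUTPUT,OPTIMSTATE] = bads(fun,X0,LB,UB,PLB,PUB);
Xsampled = OPTIMSTATE.X;    % all sampled points (one per row)
Ysampled = OPTIMSTATE.Y;    % corresponding observed function values
```
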
What is a noisy objective function?

A noisy (or stochastic) objective function is an objective that returns different results if evaluated twice at the same point `x`. A non-noisy objective function is deterministic.

For model fitting, objective functions can be noisy if the log likelihood is evaluated through simulation (e.g., via Monte Carlo methods).

Why are noisy objective functions treated differently?

For a deterministic objective, we assume that the goal is to minimize f(x). For a noisy objective, we assume that the goal is to minimize the expected value of f(x), also written as E[f(x)]. For this reason, BADS will not blindly trust whatever f(x) returns, but will do some internal computation to estimate E[f(x)] (effectively, smoothing the observed function values via a Gaussian process).

Incidentally, this means that ideally the function that you provide (which computes the negative log likelihood) should be an unbiased estimator of the negative log likelihood, but this is a story for another time.

Can I make a noisy objective function deterministic by fixing the noise process?

Technically yes: you could make a noisy objective function deterministic by fixing the random seed (e.g., via `rng`) every time you call it. However, this fix does not really solve the problem, because you are not eliminating the noise in the function observations. In fact, if you do it naively, you might be adding unwanted bias to your fits.

Thus, we do not recommend 'fixing' the noise this way (by fixing the random seed at each function call). Instead, let your function be stochastic, and let BADS deal with it.

Note that this is different from setting the random seed once at the beginning of an optimization run, for the sake of reproducibility, which is recommended good practice.

Should I tell BADS that my objective is noisy?

Please do so. Set `OPTIONS.UncertaintyHandling = 1` to tell BADS that the objective is noisy.

If you forget, BADS will determine at initialization whether the provided objective is noisy, by evaluating it twice at `X0`. Note that this test can occasionally fail (about once every ten billion times).

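For example:

```matlab
OPTIONS.UncertaintyHandling = 1;    % declare the objective as noisy
% OPTIONS.NoiseFinalSamples = 20;   % optionally, average more evaluations for FVAL
X = bads(fun,X0,LB,UB,PLB,PUB,[],OPTIONS);    % [] = no NONBCON
```
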
Can BADS handle any arbitrary amount of noise in the objective?

No. BADS works best if the standard deviation of the objective function, when evaluated in the vicinity of the solution, is small with respect to changes in the objective function itself (that is, there is a good signal-to-noise ratio). In many cases, a standard deviation of order `1` or less should work (this is the default assumption). If you approximately know the magnitude of the noise in the vicinity of the solution, you can help BADS by specifying it in advance (set `OPTIONS.NoiseSize = sigmaest`, where `sigmaest` is your estimate of the standard deviation).

If the noise around the solution is too large, BADS will perform poorly. In that case, we recommend increasing the precision of your computation of the objective (e.g., by drawing more Monte Carlo samples) such that `sigmaest` is of order 1 or even lower, as needed by your problem (see also this related question). Note that the noise farther away from the solution can be larger, and this is usually okay.

Does BADS assume that the noise is the same for all inputs?

Yes and no. The Gaussian process (GP) model built by BADS is homoskedastic, that is, it assumes constant noise across the input space. However, the GP model is built using only a local set of points, so BADS will adapt to local characteristics of the objective function, including amounts of noise that depend on the location.

If `OPTIONS.Display` is set to `'iter'`, BADS displays the traces of several optimization quantities:

- the `iteration` number;
- the number of objective function evaluations `f-count`;
- the value of `f(x)` at the incumbent (current best point);
- the normalized `MeshScale` size (that is, the POLL size parameter normalized to the plausible box);
- the current optimization stage and method (POLL or SEARCH, search type, and whether they were successful);
- additional actions (such as re-training the Gaussian process).

If the objective function is noisy, instead of `f(x)` BADS reports the expected value `E[f(x)]` and its standard deviation `SD[f(x)]` at the incumbent, both estimated via the current Gaussian process model.

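For example:

```matlab
OPTIONS.Display = 'iter';    % print the optimization trace at each iteration
X = bads(fun,X0,LB,UB,PLB,PUB,[],OPTIONS);
```
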
For a noisy function, I noticed that the series of displayed `E[f(x)]` values is not monotonically decreasing. Should I worry?

Well spotted, but nothing to worry about. BADS keeps updating the estimate of `E[f(x)]` at the incumbent, which means that occasionally this value will increase across iterations, and sometimes it will oscillate for a few iterations. Also, if the noise is large, the Gaussian process approximation might occasionally fail, leading to outlier estimates for `E[f(x)]` (which should then recover in the subsequent iterations). All of this is part of the normal functioning of BADS.

For a noisy function, I noticed that the series of displayed `SD[f(x)]` values shows sudden jumps (e.g., from ~1 to ~4). Is that normal?

First, recall that `SD[f(x)]` is the estimated posterior standard deviation of the target function at the incumbent (current best point). This estimate is obtained via the Gaussian process model, which BADS rebuilds every few iterations. When the incumbent changes, or when BADS re-trains the Gaussian process model, the uncertainty about the value at the incumbent will also change, sometimes substantially. In most cases, such jumps are part of the normal behavior of the algorithm.

This warning means that BADS was unable to refit the Gaussian process model to the current local training set, usually due to numerical issues. Occasional failures are not a reason for concern, in particular at the beginning or towards the end of the optimization. However, if a large number of training attempts fail systematically, it might mean that BADS is having trouble. Sometimes this can be fixed by changing the problem parameterization, or perhaps there are other issues with the model.

BADS crashes saying that `The returned function value must be a finite real-valued scalar`. What do I do?

This error means that your objective function has returned `Inf`, `NaN`, or a complex number. You should check your code and understand why it returned a non-finite or complex value.

`Inf`s and `NaN`s often arise because there are outcomes in your dataset (e.g., responses in a trial) to which the tested model assigns a probability of `0` (usually due to numerical truncation). `log(0)` yields `-Inf`, which is then propagated. In these cases, we recommend making the model more robust by forcing all outcomes (e.g., the likelihood associated with each trial) to have a minimum non-null probability, such as `sqrt(eps)` or some other small value (see the sketch at the end of this answer). This should not be necessary if the model already includes a non-zero lapse rate.

Complex numbers usually arise because you are taking either a `log` or a `sqrt` of a negative number (of a quantity that should not be negative). You might be setting wrong bounds for your variables, or maybe there are indexing issues.

Note that some MATLAB optimizers, such as `fmincon`, are robust to `Inf`s and `NaN`s and just keep going, avoiding the problematic region. However, we believe this is dangerous, as it might hide deeper issues with the model implementation.

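As mentioned above, here is a minimal sketch of the clamping fix, assuming `p` is a (hypothetical) vector of per-trial likelihoods computed by your model:

```matlab
% Clamp per-trial likelihoods away from zero before taking the log,
% so that log(0) = -Inf cannot propagate to the returned value
p = max(p, sqrt(eps));    % enforce a minimum non-null probability
nll = -sum(log(p));       % negative log likelihood, now always finite
```
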
During optimization I received a warning that `The mesh attempted to expand above maximum size too many times`. What does it mean?

It probably means that your `PLB` or `PUB` bounds are too narrow; try widening them. If these are already as wide as `LB` and `UB`, it might be that your hard bounds are too narrow.

If you do not think that this is the case, you can disable this warning by setting `OPTIONS.MeshOverflowsWarning = Inf`.

I am passing a `NONBCON` function handle to BADS, but I get an error that `NONBCON should be a function handle that takes a matrix X as input and returns a column vector of bound violations`. What am I doing wrong?

Most likely, your `NONBCON` accepts only a vector input and returns a scalar, whereas it should take a matrix input (one point per row) and return a column vector of constraint violations.

For example, suppose that your input variables need to be ordered, such that `X(1) <= X(2)` and `X(2) <= X(3)`. Then you should set

```matlab
NONBCON = @(X) X(:,1) > X(:,2) | X(:,2) > X(:,3);
```

whereas

```matlab
NONBCON = @(X) X(1) > X(2) | X(2) > X(3);
```

would yield an error.

I have been running BADS with a deterministic objective function from the same starting point, but I get different results each time. Is something wrong?

Nothing is wrong per se. BADS is a stochastic optimizer, so results may differ between runs, even with the same initial condition `X0`. If the returned `FVAL` varies substantially across runs from the same starting point, it might be a sign that your function landscape is particularly difficult. If `FVAL` is similar across runs, but the returned optimum `X` varies substantially, it is a sign that your function has a plateau or ridge, with trade-offs between parameters.

If you want reproducible results (and this advice applies beyond BADS), we recommend fixing MATLAB's random seed to some known quantity via `rng` (e.g., setting it to the run number).

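For example:

```matlab
irun = 1;     % run number (or any other known quantity)
rng(irun);    % fix MATLAB's random seed for reproducibility
[X,FVAL] = bads(fun,X0,LB,UB,PLB,PUB);
```
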
I have been running BADS with a stochastic objective function from different starting points and I get different results each time. What can I do?

First of all, slightly different results are expected if your target function is noisy (be sure to have read and understood all the points under the noisy objective function section of this FAQ). So, if you are asking this question, it is presumably because you find wildly different results.

Generally, substantially different results suggest that BADS is getting stuck due to excess noise in the target function, relative to the actual improvements of the function in the neighborhood of the current point. For example, even if the expected value of the target function has a non-zero gradient, it might be too hard for BADS to find a direction of improvement due to a low signal-to-noise ratio. In particular, because of a slight conservative bias of the algorithm under uncertainty (needed to avoid chasing random fluctuations), both the poll and search steps may repeatedly fail to find a significant improvement, and thus the algorithm stops moving. For this reason, it is possible for BADS to get stuck at very different points in noisy, nearly-flat regions of the input space, with more scattered results for flatter and wider plateaus.

The general solution to this problem, as also mentioned in this question, is to decrease the amount of noise in the target function (e.g., if you estimate the target function via Monte Carlo sampling, try increasing the number of samples). While we generally found that a noise SD of 1 or less in the vicinity of the solution works for most problems, a particularly difficult (e.g., flat) objective function might need even lower amounts of noise for robust convergence, so YMMV.

Yes, we should! However, given the typical class of model-fitting problems BADS is designed for (see here), obtaining the full posterior, or even an approximation thereof, can be a challenging task. We recently developed a novel method and related toolbox, Variational Bayesian Monte Carlo, which addresses exactly this problem. Check it out!

First, just to clarify: the Hessian is the matrix of second derivatives of the objective, which can be used to build a crude approximation of the posterior via Laplace's method. The answer to the question is nope, BADS cannot return the Hessian or an approximate posterior. The reason is that, even though BADS builds a local Gaussian process approximation of the objective function, this approximation might fail, and we cannot trust it to represent a valid posterior.

Instead, you should look into Variational Bayesian Monte Carlo, a method that we developed specifically to compute approximate posterior distributions, and which can be used in synergy with BADS.

There is a plan to port BADS to Python, although there is no roadmap for that yet. It will depend on several factors, primarily whether I can find someone to help me along the way. See also here for updates.