Non-linearities: MIGRAD versus HESSE versus MINOS

In the theory of statistics, one can show that in the asymptotic limit, several methods of determining parameter errors become equivalent and give the same result. Let us for the moment call these methods MIGRAD, HESSE, and MINOS (SIMPLEX is a special case). It turns out that these methods yield exactly the same errors only under one of the following conditions:

1.
The model to be fitted (y or f) is exactly a linear function of the fit parameters a, or
2.
The amount of observed data is infinite.
It may happen that (1) is satisfied, in which case you don't really need Minuit: a smaller, simpler, and faster program would do, since a linear problem can be solved directly without iterations (see [5], pp. 163-165), for example with the CERN library program LSQQR. Nevertheless, it may be convenient to use Minuit, since non-linear terms can then be added later if desired, without major changes to the method. Condition (2) is of course never satisfied, although in practice it often happens that there is enough data to make the problem ``almost linear'': that is, there is so much data that the range of parameters allowed by the data becomes very small, and any physical function behaves linearly over a small enough region. The following sections explain the differences between the various parameter errors given by Minuit.
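To see why condition (1) removes the need for iteration, consider a least-squares fit whose model is linear in the parameters (a generic illustration, not a formula taken from this manual):

    \chi^2(a) = \sum_i \frac{1}{\sigma_i^2} \Bigl( y_i - \sum_j a_j f_j(x_i) \Bigr)^2 .

This chi-square is exactly quadratic in the parameters a_j, so the conditions \partial\chi^2/\partial a_k = 0 form a set of linear (``normal'') equations; the solution and its error matrix follow in a single step, with no iterations and no starting values needed.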

Errors printed by Minuit

The errors printed by Minuit at any given stage represent the best symmetric error estimates available at that stage, which may not be very good. For example, at the first entry to FCN, the user's step sizes are given, and these may bear no resemblance at all to proper parameter errors, although they are supposed to be order-of-magnitude estimates. After crude minimizers like SEEK or SIMPLEX, a revised error estimate may be given, but this too is only meant to be an order-of-magnitude estimate, and must certainly not be taken seriously as a physical result. Such numbers are mainly for the internal use of Minuit, which must after all assume a step size for future minimizations and derivative calculations, and uses these ``errors'' as a first guess to be modified on the basis of experience.
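As an illustration (a minimal sketch in the Fortran-callable mode; the toy FCN, the parameter names and all numerical values are invented for this example), the point can be made explicit: before any minimizer has run, the ``error'' Minuit associates with a parameter is nothing more than the step size given when the parameter was defined.

      PROGRAM STEPS
*     Sketch only: define two parameters and read back the current
*     "error" of the first one before any minimization has been done.
      CHARACTER*10 CHNAM
      DOUBLE PRECISION VAL,ERROR,BND1,BND2
      INTEGER IERFLG,IVARBL
      CALL MNINIT(5,6,7)
      CALL MNPARM(1,'alpha',1.0D0,0.1D0,0.0D0,0.0D0,IERFLG)
      CALL MNPARM(2,'beta ',0.5D0,0.05D0,0.0D0,0.0D0,IERFLG)
*     ERROR comes back as the 0.1 step size given above: an
*     order-of-magnitude guess, not a measurement of anything.
      CALL MNPOUT(1,CHNAM,VAL,ERROR,BND1,BND2,IVARBL)
      PRINT *, CHNAM, VAL, ERROR
      STOP
      END

      SUBROUTINE FCN(NPAR,GRAD,FVAL,XVAL,IFLAG,FUTIL)
*     Toy chi-square, quadratic in the two parameters; the later
*     sketches in this section assume this FCN and these parameters.
      INTEGER NPAR,IFLAG
      DOUBLE PRECISION GRAD(*),XVAL(*),FVAL,FUTIL
      FVAL = ((XVAL(1)-1.2D0)/0.3D0)**2 + ((XVAL(2)-0.8D0)/0.1D0)**2
      RETURN
      END

After a successful MIGRAD and HESSE, the same MNPOUT call would instead return the symmetric error derived from the error matrix.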

Errors after MIGRAD (or MINIMIZE)

The minimizing technique currently implemented in MIGRAD is a stable variation (the ``switching'' method) of the Davidon-Fletcher-Powell variable-metric algorithm. This algorithm converges to the correct error matrix as it converges to the function minimum.

This algorithm requires at each step a ``working approximation'' of the error matrix, and a rather good approximation to the gradient vector at the current best point. The starting approximation to the error matrix may be obtained in different ways, depending on the status of the error matrix before MIGRAD is called as well as on the value of STRATEGY. It is usually found advantageous to evaluate the error matrix rather carefully at the start point in order to avoid premature convergence, but in principle even the unit matrix can be used as a starting approximation. The Minuit default is usually to start by computing the full error matrix, that is, by calculating all the second derivatives and inverting the resulting matrix. If the user wants to make sure this is done, he can call HESSE before MIGRAD.

If a unit matrix is taken as the starting approximation, the first step will be in the steepest-descent direction, which is not bad, but the estimate of EDM, needed to judge convergence, will be poor. At each successive step, the information gathered from the change of gradient is used to improve the approximation to the error matrix, without the need to calculate any second derivatives or invert any matrices. The algorithm used for this updating is supposed to be the best known, but if there are many highly correlated parameters, it may take many steps before the off-diagonal elements of the error matrix approach the correct values.
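For orientation, the generic rank-two variable-metric update has the form (this is the textbook formula, not a transcription of the exact expression coded in MIGRAD)

    V' = V + \frac{\delta\,\delta^T}{\delta^T \gamma} - \frac{V \gamma\,\gamma^T V}{\gamma^T V \gamma} ,

where \delta is the step just taken in parameter space, \gamma is the corresponding change in the gradient vector, and V is the current approximation to the error matrix. Each step therefore refines V using first derivatives only, which is why many steps may be needed before strongly correlated off-diagonal elements settle to their correct values.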

In practice, MIGRAD usually yields good estimates of the error matrix, but it is not absolutely reliable for two reasons:

1.
Convergence to the minimum may occur ``too fast'' for MIGRAD to have a good estimate of the error matrix. In the most flagrant of such cases, MIGRAD realizes this and automatically introduces an additional call to HESSE (described below), informing the user that the covariance matrix is being recalculated. Since, for n variable parameters, there are n(n + 1)/2 elements in the error matrix, the number of FCN calls from MIGRAD must be large compared with n^2 in order for the MIGRAD error matrix calculation to be reliable.
2.
MIGRAD gathers information about the error matrix as it proceeds, based on function values calculated away from the minimum and assuming that the error matrix is nearly constant as a function of the parameters, as it would be if the problem were nearly linear. If the problem is highly non-linear, the error matrix will depend strongly on the parameters, MIGRAD will converge more slowly, and the resulting error matrix will at best represent some average over the last part of the trajectory in parameter-space traversed by MIGRAD.
If MIGRAD errors are wrong because of (1), HESSE should be commanded after MIGRAD and will give the correct errors. If MIGRAD errors are wrong because of (2), HESSE will help, but only in an academic sense, since in this case the error matrix is not the whole story and for proper error calculation MINOS must be used. As a general rule, anyone seriously interested in the parameter errors should always put at least a HESSE command after each MIGRAD (or MINIMIZE) command.
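In the Fortran-callable mode this rule is a two-line idiom (a sketch, assuming the toy FCN and the parameter definitions from the earlier example):

      EXTERNAL FCN
      INTEGER ICONDN
*     Minimize, then recompute the error matrix at the minimum so that
*     the printed errors do not rely on MIGRAD's running approximation.
      CALL MNCOMD(FCN,'MIGRAD',ICONDN,0)
      CALL MNCOMD(FCN,'HESSE',ICONDN,0)

ICONDN is returned as zero when the command was executed normally.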

Errors after HESSE

HESSE simply calculates the full second-derivative matrix by finite differences and inverts it. It therefore calculates the error matrix at the point where it happens to be when it is called. If the error matrix is not positive-definite, diagnostics are printed, and an attempt is made to form a positive-definite approximation. The error matrix must be positive-definite at the solution (minimum) for any real physical problem. It may well not be positive-definite away from the minimum, but most algorithms, including the MIGRAD algorithm, require a positive-definite ``working matrix''. The error matrix produced by HESSE is used to calculate what Minuit prints as the parameter errors, which therefore contain the effects due to parameter correlations. The extent of the two-by-two correlations can be seen from the correlation coefficients printed by Minuit, and the global correlations (see [5], p. 23) are also printed. All of these correlation coefficients must be less than one in absolute value. If any of them are very close to one or minus one, this indicates an ill-posed problem with more free parameters than can be determined by the model and the data.
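The error matrix left behind by HESSE, and the correlation coefficients derived from it, can also be inspected from the calling program (a sketch, again assuming the two-parameter toy FCN defined earlier; MNEMAT copies the current error matrix into a user-declared array):

      EXTERNAL FCN
      INTEGER ICONDN,I,J
      DOUBLE PRECISION EMAT(2,2),RHO
      CALL MNCOMD(FCN,'MIGRAD',ICONDN,0)
      CALL MNCOMD(FCN,'HESSE',ICONDN,0)
*     Fetch the current 2 x 2 error (covariance) matrix.
      CALL MNEMAT(EMAT,2)
*     Correlation coefficients rho(i,j) = V(i,j)/sqrt(V(i,i)*V(j,j));
*     any value very close to +1 or -1 signals an ill-posed problem.
      DO 10 I = 1, 2
        DO 20 J = 1, 2
          RHO = EMAT(I,J)/SQRT(EMAT(I,I)*EMAT(J,J))
          PRINT *, I, J, RHO
   20   CONTINUE
   10 CONTINUE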

Errors by MINOS

MINOS is designed to calculate the correct errors in all cases, especially when there are non-linearities as described above. The theory behind the method is described in [5], pp. 204-205 (where ``non-parabolic likelihood'' should of course read ``non-parabolic log-likelihood'', which is equivalent to ``non-parabolic chi-square''). MINOS actually follows the function out from the minimum to find where it crosses the function value (minimum + UP), instead of using the curvature at the minimum and assuming a parabolic shape. This method not only yields errors which may be different from those of HESSE, but in general also yields different positive and negative errors (an asymmetric error interval). Indeed the most frequent result for physical problems is that the (symmetric) HESSE error lies between the positive and negative errors of MINOS. The difference between these three numbers is one measure of the non-linearity of the problem (or rather of its formulation). In practice, MINOS errors usually turn out to be close to, or somewhat larger than, errors derived from the error matrix, although in cases of very bad behaviour (very little data or an ill-posed model) anything can happen. In particular, it is often not true in MINOS that two-standard-deviation errors (UP=4) and three-standard-deviation errors (UP=9) are respectively two and three times as big as one-standard-deviation errors, as is true by definition for errors derived from the error matrix (MIGRAD or HESSE).
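The asymmetric MINOS errors can be retrieved with MNERRS, and a different UP can be set with the standard SET ERRordef command before running MINOS (a sketch, once more assuming the toy FCN and parameters defined earlier):

      EXTERNAL FCN
      INTEGER ICONDN,IERFLG
      DOUBLE PRECISION ARGLIS(10),EPLUS,EMINUS,EPARAB,GLOBCC
*     One-standard-deviation errors for a chi-square: UP = 1.
      ARGLIS(1) = 1.0D0
      CALL MNEXCM(FCN,'SET ERR',ARGLIS,1,IERFLG,0)
      CALL MNCOMD(FCN,'MIGRAD',ICONDN,0)
      CALL MNCOMD(FCN,'MINOS',ICONDN,0)
*     Asymmetric MINOS errors and symmetric (parabolic) error for
*     parameter 1.
      CALL MNERRS(1,EPLUS,EMINUS,EPARAB,GLOBCC)
      PRINT *, 'UP=1:',EPLUS,EMINUS,' parabolic:',EPARAB
*     Two-standard-deviation errors: UP = 4.  The new MINOS errors
*     need not be exactly twice the old ones if the problem is
*     non-linear; errors derived from the error matrix scale with
*     the square root of UP by construction.
      ARGLIS(1) = 4.0D0
      CALL MNEXCM(FCN,'SET ERR',ARGLIS,1,IERFLG,0)
      CALL MNCOMD(FCN,'MINOS',ICONDN,0)
      CALL MNERRS(1,EPLUS,EMINUS,EPARAB,GLOBCC)
      PRINT *, 'UP=4:',EPLUS,EMINUS,' parabolic:',EPARAB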

