  From: Hensley, C Preston <c_preston_hensley@groton.pfizer.com>
  To  : 'Peter Schuck', rasmb@alpha.bbri.org
  Date: Sat, 29 Apr 2000 22:31:34 -0400

RE: linear approximation for error estimates

Peter,

(Department of Beating a Dead Horse Department ...)

We have had a little experience in the area you are discussing.  In the same
book you quote (Schuster/Laue - we also have a Methods chapter ... maybe
Volume 240), we have a chapter on determining confidence intervals
empirically.  We compare the results to linear-assumption standard errors
and (at least in these cases) do not find a huge difference between the two
sets of numbers.

As you point out, this was much to our surprise.  I too have been cautioned
by MJ to abhor linear assumption statistics and expected to find hugely
asymmetric error spaces and confidence intervals.  We did not.  I hasten to
add, ... for these data sets.  We have looked at two systems with this rigor
and get the same result (a monomer-dimer, and a nested monomer,
monomer-dimer and a monomer-dimer-tetramer set of systems).  In these cases,
out to 99% confidence, we found as little as a two-fold difference between
empirical confidence intervals and standard errors.
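
The comparison described here (linear-assumption standard errors from the covariance matrix versus empirically determined confidence intervals) can be sketched in a few lines.  This is a hypothetical illustration only: the exponential-decay model, noise level, and parameter values below are stand-ins for an actual equilibrium model, and the Monte Carlo resampling is one of several ways to get empirical intervals.

```python
# Hypothetical sketch: asymptotic (linear-approximation) standard errors
# vs. Monte Carlo confidence intervals for a simple nonlinear fit.
# The model here is a stand-in, not an equilibrium model.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

def model(x, a, k):
    return a * np.exp(-k * x)

x = np.linspace(0, 5, 100)
true = (2.0, 0.8)          # assumed "true" parameters for the simulation
sigma = 0.05               # assumed Gaussian noise level
y = model(x, *true) + rng.normal(0, sigma, x.size)

# Best fit and asymptotic standard errors from the covariance matrix
popt, pcov = curve_fit(model, x, y, p0=(1, 1))
se = np.sqrt(np.diag(pcov))

# Empirical intervals: add synthetic noise to the best fit, refit, repeat
yfit = model(x, *popt)
draws = []
for _ in range(500):
    ysim = yfit + rng.normal(0, sigma, x.size)
    p, _ = curve_fit(model, x, ysim, p0=popt)
    draws.append(p)
draws = np.array(draws)
lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)

for name, p, s, l, h in zip(("a", "k"), popt, se, lo, hi):
    print(f"{name}: fit={p:.3f}  linear 95%=[{p-1.96*s:.3f}, {p+1.96*s:.3f}]"
          f"  Monte Carlo 95%=[{l:.3f}, {h:.3f}]")
```

For a well-behaved model and good data like this synthetic set, the two sets of intervals come out similar, which is consistent with the observation above; for harder models the empirical intervals can differ substantially.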

In these cases, the data was pretty good, with Gaussian error.  I can
certainly imagine cases (tougher models, less ideal data sets) where one
might not find the above to be the case.  I guess we were surprised to see
that it ever held.

I guess the lesson is, as you point out (I just re-read your comments), that
if you're concerned, you should ALWAYS determine confidence intervals
empirically.  However, in daily life, just slogging through data, standard
errors will get you started.

As someone once said, if you have to use statistics to prove your model,
you've done the wrong experiment (this, of course, assumes that you have
the luxury of being able to do the right experiment).

Preston

Preston Hensley
Manager, Protein and Peptide Chemistry
Bldg 118, Rm 311G, Bin3
Pfizer Pharmaceuticals
Eastern Point Road
Groton, CT 06340

email:c_preston_hensley@groton.pfizer.com
ph 860-715-2190
fx 860-441-4734



-----Original Message-----
From: Peter Schuck [mailto:pschuck@helix.nih.gov]
Sent: Thursday, April 27, 2000 5:01 PM
To: rasmb@alpha.bbri.org
Subject: linear approximation for error estimates


With regard to Yujita's comment about the validity of the "linear
approximation", I'm very much surprised that this should be the case.
Maybe there is a misunderstanding.  I can see that with large data sets and
with Gaussian noise the statistical assumptions underlying least-squares
optimization are valid, but not that the actual confidence limits are those
from "linear" least-squares.

The reason is that by simply mapping the error surface for an equilibrium
model, i.e. if you keep one parameter fixed at a non-optimal value,
optimize the others and observe the chi-square of the constrained fit, I
find that a symmetric parabolic minimum is really the exception (linear
least squares implies a symmetric parabolic minimum).  I think
Michael Johnson has worked a lot on this, and in his book chapter "Comments
on the Analysis of Sedimentation Equilibrium Experiments", he explicitly
says that "[the asymptotic standard errors from the covariance matrix]
almost always significantly underestimate the true confidence intervals of
the determined parameters" (Todd Schuster and Tom Laue's book, Birkhäuser,
1994, p. 51).
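
The error-surface mapping procedure described above can be sketched directly: hold one parameter fixed on a grid of non-optimal values, re-optimize the remaining parameters, and record the chi-square of each constrained fit.  The decay model, data, and 10% chi-square cut below are hypothetical stand-ins chosen for brevity, not a calibrated F-statistic.

```python
# Sketch of error-surface mapping by constrained fits: profile the
# chi-square along one parameter (k), re-fitting the other (the
# amplitude, which is linear and has a closed-form solution).
import numpy as np

rng = np.random.default_rng(1)

def decay(x, k):
    return np.exp(-k * x)

x = np.linspace(0, 4, 60)
y = 1.5 * decay(x, 0.7) + rng.normal(0, 0.03, x.size)  # synthetic data

def profile_chi2(k):
    # With k held fixed, the best amplitude is a linear least-squares
    # solution; the residual sum of squares is the constrained chi-square.
    e = decay(x, k)
    a = (y @ e) / (e @ e)
    r = y - a * e
    return r @ r

ks = np.linspace(0.3, 1.3, 201)
chi2 = np.array([profile_chi2(k) for k in ks])
kbest = ks[np.argmin(chi2)]
chimin = chi2.min()

# Illustrative confidence cut: where the constrained chi-square rises a
# fixed fraction above the minimum.  In general the left and right
# crossings are NOT symmetric about kbest.
thresh = chimin * 1.1
inside = ks[chi2 <= thresh]
print(f"best k ~ {kbest:.3f}, interval ~ [{inside.min():.3f}, {inside.max():.3f}]")
```

For this easy synthetic problem the interval comes out nearly symmetric; for binding constants in real equilibrium models the same procedure frequently yields the asymmetric, even one-sided, limits described below.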

My experience has been completely consistent with this.  If the parameters
you're looking for are binding constants, then these traces are frequently
fairly asymmetric, and sometimes I even get only a one-sided error limit.
I understand theoretically that a large number of data points does help,
but I usually work with about 10 long-column equilibrium scans, and there
the asymmetry can definitely be very pronounced, depending on the model.
In my hands, the comparison with the correlation coefficients shows that
they frequently seriously underestimate the errors, and I would caution
against using them for any quantitative interpretation of the errors of the
best-fit parameters.  They may be OK in some cases with good data and
well-behaved models, and they can certainly give you a feel for what to
expect, but I agree with Olin about the necessity of a rigorous error
analysis in the end.

In the case where the unknown parameters are the species concentrations
only, which is the simplest case, the model actually looks like a linear
least-squares model, because the parameters are linear.  However, even
there, the non-negativity of the concentrations can cause the error surface
to be asymmetric.  It has been shown (using algebraic methods for linear
least squares with inequality constraints) that even in this simple case,
the error surface is described by step-wise parabolic functions.  As a
consequence, even here you have to map the error surface, although in this
case the confidence intervals can actually get smaller than those predicted
from the linear approximation (this is described in Progr. Coll. Polym. Sci.
(1994) 94:1-13).
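
A minimal sketch of this linear-parameters case, assuming hypothetical basis functions standing in for species distributions: the concentrations enter linearly, but the non-negativity constraint can still truncate the error surface, so that a species present in trace amounts gets only an upper error limit.

```python
# Sketch: linear least squares with non-negativity constraints.
# The "species" basis functions below are hypothetical stand-ins.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 50)
f1 = np.exp(x)        # stand-in basis function for species 1
f2 = np.exp(2 * x)    # stand-in basis function for species 2
A = np.column_stack([f1, f2])
# Species 2 present only in trace amount, plus Gaussian noise
y = 1.0 * f1 + 0.02 * f2 + rng.normal(0, 0.05, x.size)

c, _ = nnls(A, y)     # constrained best fit, c >= 0 enforced

def profile(c2):
    # Hold c2 fixed (>= 0) and refit c1 under the same constraint;
    # return the chi-square of the constrained fit.
    r = y - c2 * f2
    c1 = max((r @ f1) / (f1 @ f1), 0.0)
    res = y - c1 * f1 - c2 * f2
    return res @ res

grid = np.linspace(0.0, 0.1, 101)
chi2 = np.array([profile(v) for v in grid])
# The profile simply stops at c2 = 0: below the constraint boundary
# there is nothing to map, so only an upper error limit exists there.
print(f"best fit c = {c}, chi2 at c2=0: {chi2[0]:.4f}")
```

Mapping the profile piecewise like this is what exposes the step-wise parabolic structure: each active-constraint set contributes its own parabolic segment.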





***********************************************************
Peter Schuck, PhD
Molecular Interactions Resource
Division of Bioengineering and Physical Science, ORS
National Institutes of Health
Bldg. 13 Rm. 3N17
13 South Drive 
Bethesda, MD 20892 - 5766
Tel: (301) 435-1950
Fax: (301) 496-6608
email: Peter_Schuck@nih.gov
***********************************************************
