From: JOHN PHILO <JOHN.PHILO@amgen.com>
To: rasmb <rasmb@bbri.eri.harvard.edu>
Date: 19 May 1995 12:50:06 -0800
Re: data depot; vision
Jeff Hansen has raised some interesting points.
First, let me apologize if I seemed too negative about the idea of a
raw data depository---I certainly do not think it is an entirely bad or
worthless idea.
The potential use of such data for teaching purposes is a good
point that I confess I had not thought about (lack of vision, I suppose).
I also partially agree that access to appropriate real experimental
data may help foster the development of new analytical methods.
However, my own experience is that there is much truth in the adage
that 'necessity is the mother of invention'. It seems to me people
are most likely to develop new methods when they encounter
the need for them in the course of their own work; i.e. after they
already have suitable data. If I understand correctly, Jeff
envisions future methods developers browsing through gigabytes
of data for complicated systems and deciding to develop new
methodologies to tackle them. That is not how I work, but perhaps
this does represent the future of science.
The main point I would like to make is that I think we need to
define more clearly what sort of data we are talking about, and
what other information in addition to raw data scans from the XL-A
would be needed in order to make such data really useful to
others.
Is a simple, unedited repository of XL-A scans really going
to be useful? Or are we talking about data selected for certain
purposes?
For teaching purposes in particular, I would only want the raw
data for samples which are well characterized as to the
materials, purity, etc., and which cover a range of problem
types (single species, monomer-dimer, protein-DNA, ...).
I think similar considerations apply to data suitable for testing or
developing new analytical methods.
Further, I think that along with the scans, one would need
a lot more information about the sample and the
experiment. Ideally, for a protein you would
want the amino acid composition, vbar, and extinction
coefficient. Plus maybe gels and HPLC data to assess
purity. You would also want the buffer
composition, density, viscosity, etc. If it's an
equilibrium experiment, you really can't evaluate
how good the data are unless you know the whole
experiment sequence: how long at each speed,
were fresh samples run at each speed or was
the same sample run sequentially at multiple
speeds, etc.
As you see, this easily gets into a whole lot of
supplementary information that is needed, but
for which there is no standard format.
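(Purely to illustrate the kind of record I have in mind, and
not as a proposal for any particular format, here is a rough
sketch of what the supplementary information for one
equilibrium run might look like if it were written down in a
simple machine-readable form. Every field name and value
below is invented for the example, and Python notation is
used only because it is a convenient way to show nested
information.)

    # Hypothetical supplementary-information record for one
    # XL-A experiment; all field names and values are made up
    # for illustration and nothing here is a standard.
    record = {
        "sample": {
            "name": "example protein",
            "amino_acid_composition": "see published sequence",
            "vbar_ml_per_g": 0.73,
            "extinction_coeff_280nm": 1.0,  # (mg/ml)^-1 cm^-1, assumed
            "purity_evidence": ["SDS-PAGE", "HPLC"],
        },
        "buffer": {
            "composition": "20 mM phosphate, 150 mM NaCl, pH 7.0",
            "density_g_per_ml": 1.005,
            "viscosity_cp": 1.02,
        },
        "experiment": {
            "type": "equilibrium",
            # (speed in rpm, hours at that speed), in run order
            "speed_schedule": [(10000, 20), (14000, 20), (18000, 20)],
            "fresh_sample_each_speed": False,
        },
        "scan_files": ["00001.ra1", "00002.ra1", "00003.ra1"],
    }

    # Printing the record is enough to show how much bookkeeping
    # even a minimal description involves.
    import pprint
    pprint.pprint(record)

Even a bare-bones record like this already runs to a dozen or
so fields, which is really my point about how much
supplementary information is involved.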
So I think we need to hear more discussion about
exactly what type of depository we are talking
about and what would be in it. I think Jeff has raised
some good reasons for having at least some teaching/
testing data available. But if it is going to be a
selected/edited depository then who will do the
selection? And how will the necessary supplementary
information be gathered, stored, and retrieved?
One (at least short-term) approach to providing some
raw data for teaching purposes along with the needed
supplementary information would simply be to deposit the
raw data corresponding to published manuscripts. This
would allow the manuscript to provide some of the many
important details about the experiments and other sample
characterization, as well as appropriate references about
the system. Such data might also serve for some test
and methods development purposes. However, a
limitation to published work would obviously exclude
data for the really tough and messy systems that we
can't analyze and therefore never publish, and thus such a
limited depository would probably not meet Jeff's
visionary goals.
Anyway, that is one view. Let's hear from more of you.
John Philo, Amgen