  From: JOHN PHILO <JOHN.PHILO@amgen.com>
  To  : rasmb <rasmb@bbri.eri.harvard.edu>
  Date: 19 May 1995 12:50:06 -0800

Re: data depot; vision

Jeff Hansen has raised some interesting points.  

First, let me apologize if I seemed too negative about the idea of a 
raw data depository---I certainly do not think it is an entirely bad or 
worthless idea.

The potential use of such data for teaching purposes is a good 
point that I confess I had not thought about (lack of vision, I suppose).

I also partially agree that access to appropriate real experimental 
data may help foster the development of new analytical methods.  
However, my own experience is that there is much truth in the adage 
that 'necessity is the mother of invention'.  It seems to me people 
are most likely to develop new methods when they encounter 
the need for them in the course of their own work; i.e. after they 
already have suitable data.  If I understand correctly, Jeff 
envisions future methods developers browsing through gigabytes 
of data for complicated systems and deciding to develop new 
methodologies to tackle them.  That is not how I work, but perhaps 
this does represent the future of science.

The main point I would like to make is that I think we need to 
define more clearly what sort of data we are talking about, and 
what other information in addition to raw data scans from the XL-A 
would be needed in order to make such data really useful to 
others.

Is a simple, unedited repository of XL-A scans really going 
to be useful?  Or are we talking about data selected for certain 
purposes?  

For teaching purposes in particular, I would only want the raw 
data for samples which are well characterized as to the 
materials, purity, etc., and which cover a range of types of 
problems (single species, monomer-dimer, protein-DNA, ...).  
I think similar considerations apply to data suitable for testing or 
developing new analytical methods.    

Further, I think that along with the scans, one would need 
a lot more information about the sample and the 
experiment.  Ideally, for a protein you would   
want the amino acid composition, vbar, and extinction 
coefficient.  Plus maybe gels and HPLC data to assess 
purity.  You would also want the buffer 
composition, density, viscosity, etc.  If it's an 
equilibrium experiment, you really can't evaluate 
how good the data are unless you know the whole 
experimental sequence: how long the sample was held 
at each speed, and whether fresh samples were run at 
each speed or the same sample was run sequentially 
at multiple speeds, etc.  

As you see, this easily gets into a whole lot of 
supplementary information that is needed, but 
for which there is no standard format.  
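To make the point concrete, here is a minimal sketch of what a structured record for that supplementary information might look like. This is purely my own hypothetical format for illustration; as noted above, no standard exists, and every field name and value here is invented.

```python
# Hypothetical supplementary-information record for a deposited XL-A
# equilibrium run.  All field names are invented for illustration;
# there is no standard format for this metadata.
from dataclasses import dataclass, field
from typing import List

@dataclass
class SpeedStep:
    rpm: int            # rotor speed for this step
    hours: float        # time the sample was held at this speed
    fresh_sample: bool  # fresh sample loaded, or carried over from prior speed?

@dataclass
class SampleRecord:
    sample_name: str
    vbar_ml_per_g: float         # partial specific volume
    extinction_coeff: float      # extinction coefficient at the scan wavelength
    buffer_composition: str
    buffer_density_g_per_ml: float
    buffer_viscosity_cp: float
    speed_schedule: List[SpeedStep] = field(default_factory=list)

    def total_run_hours(self) -> float:
        """Total time spent spinning, summed over all speed steps."""
        return sum(step.hours for step in self.speed_schedule)

# Example record for an imaginary monomer-dimer system:
rec = SampleRecord(
    sample_name="example protein",
    vbar_ml_per_g=0.73,
    extinction_coeff=43800.0,
    buffer_composition="PBS, pH 7.4",
    buffer_density_g_per_ml=1.005,
    buffer_viscosity_cp=1.02,
    speed_schedule=[
        SpeedStep(rpm=12000, hours=20.0, fresh_sample=True),
        SpeedStep(rpm=18000, hours=20.0, fresh_sample=False),
    ],
)
print(rec.total_run_hours())  # 40.0
```

Even a toy record like this shows how quickly the metadata grows beyond the scans themselves, and why deciding how to gather and store it is central to the depository question.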

So I think we need to hear more discussion about 
exactly what type of depository we are talking 
about and what would be in it.  I think Jeff has raised 
some good reasons for having at least some teaching/
testing data available.  But if it is going to be a 
selected/edited depository then who will do the 
selection?  And how will the necessary supplementary 
information be gathered, stored, and retrieved?

One (at least short term) approach to providing some 
raw data for teaching purposes along with needed 
supplementary information would simply be to deposit the 
raw data corresponding to published manuscripts.  This 
would allow the manuscript to provide some of the many 
important details about the experiments and other sample 
characterization, as well as appropriate references about 
the system.   Such data might also serve for some test 
and methods development purposes.  However, a 
restriction to published work would obviously exclude 
data for the really tough and messy systems that never 
get published because we can't analyze them, and thus 
such a limited depository would probably not meet Jeff's 
visionary goals. 

Anyway, that is one view.  Let's hear from more of you.

John Philo, Amgen
