  From: Tom Laue <tom.laue@unh.edu>
  To  : rasmb@bbri.harvard.edu
  Date: Thu, 10 Apr 1997 07:57:29 -0400

My plugged nickel

Dear Rasmb,

I would like to jump into the fray about binary versus ASCII files as an
advocate for the *option* of using a binary output file format from within
the program. There are actually a few issues that I raised in the earlier
message that need clarification.

Those who advocate for ASCII are right about the universality of ASCII
files, and I agree that they should be retained. For those of us working
out at the edge (which will become the norm in just a few years), it has
become obvious that some form of data compression will be needed. We can
easily generate 100 MBytes of data in a single day, and even with very
large disks, the problem of retaining and cataloging all of the files is
overwhelming. With the addition of new detectors, this problem will be
exacerbated. Thus, two issues emerge: compacting the data and cataloging
the data. They can be treated separately.

Binary files will compact the data significantly, and the problem of disk
segment size (every file, however small, occupies at least one full segment)
can be overcome. I'll raise another firestorm by suggesting that each binary
file contain more than one scan, so that each file becomes a moving picture
and can be treated as such. David
Yphantis and Jeff Lary have shown how that can work. The real beauty of
their approach is that the raw data are always available for validation and
later analysis. Pardon me while I get my asbestos suit out. The idea is
that since a new output file format is being discussed, it is worthwhile to
unshackle ourselves from the idea of one scan = one file, and that
subdirectories make a good cataloging system.
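One way to picture such a "moving picture" file is the Python sketch below. It is purely hypothetical: the layout (a `SCAN` signature, a scan count, then each scan's time, point count, and float32 radius and reading arrays) and the names `write_scans` and `read_scans` are illustrative assumptions, not a Beckman format and not the Yphantis and Lary implementation.

```python
import struct
from typing import List, Tuple

MAGIC = b"SCAN"  # hypothetical file signature, not a Beckman format
# One scan: (elapsed time in seconds, radii, readings)
Scan = Tuple[float, List[float], List[float]]


def write_scans(path: str, scans: List[Scan]) -> None:
    """Write a whole series of scans into a single binary file."""
    with open(path, "wb") as f:
        f.write(MAGIC)
        f.write(struct.pack("<I", len(scans)))  # number of scans (frames)
        for t, radii, values in scans:
            if len(radii) != len(values):
                raise ValueError("each radius needs exactly one reading")
            n = len(radii)
            f.write(struct.pack("<dI", t, n))  # scan time and point count
            f.write(struct.pack(f"<{n}f", *radii))  # radii as float32
            f.write(struct.pack(f"<{n}f", *values))  # readings as float32


def read_scans(path: str) -> List[Scan]:
    """Read every scan back, in the order it was written."""
    with open(path, "rb") as f:
        if f.read(4) != MAGIC:
            raise ValueError("not a multi-scan file")
        (count,) = struct.unpack("<I", f.read(4))
        scans: List[Scan] = []
        for _ in range(count):
            t, n = struct.unpack("<dI", f.read(12))  # 8-byte time + 4-byte count
            radii = list(struct.unpack(f"<{n}f", f.read(4 * n)))
            values = list(struct.unpack(f"<{n}f", f.read(4 * n)))
            scans.append((t, radii, values))
        return scans
```

In this layout a run of 10 scans of 500 points each occupies 8 + 10 * (12 + 500 * 8) = 40,128 bytes in one file, and 4-byte floats keep roughly seven significant digits per value.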

I believe that the idea of a database remains viable regardless of the
output file format, and the database could offer conversion between formats.
Perhaps a viable alternative would be to add a binary output file option to
the operating software. To start, the ASCII format would be the default;
those of us who needed or wanted the binary format could select it. If there
is no outcry, then ASCII would remain the default. Regardless, the ASCII
output would remain available.

In making the data files more compact, I raised the issue of eliminating
the radial positions and replacing them with the parameters needed to
regenerate them. Only the absorbance system uses a sensor to determine the
radial position. Beckman is honest in putting the converted output from
that sensor into the output files as the radial reading. The interference
and fluorescence systems calculate the radial position instead. For a binary
file, it is far more compact to save the parameters needed to reconstruct the
radii than to save each number, and there would be no loss in precision.
Though I could make a strong case that the same is true for the absorbance
system, I can fully appreciate the desire to keep the real data. A
different binary format might be used for the absorbance system. Since the
binary format is there for 'internal' use, it matters little whether all
file types are in the same format. Again, the ASCII output file format
would be the default. Those of us who wanted to take advantage of binary
format would have it available.
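To illustrate the point about computed radii, here is a minimal Python sketch. It assumes, purely for illustration, that the software generates radii on an evenly spaced grid, r_i = r_start + i * r_step; the actual formula used by the interference and fluorescence systems is not given here, and `RadialGrid` is a hypothetical name. If the stored parameters are fed through the same arithmetic that produced the radii, the regenerated values are bit-for-bit identical, which is why nothing is lost.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RadialGrid:
    """Hypothetical parameters that regenerate a computed radial axis.

    Assumes r_i = r_start + i * r_step for i = 0 .. count - 1 (an illustrative
    assumption, not a documented instrument formula).
    """
    r_start: float  # first radius, cm
    r_step: float   # spacing between points, cm
    count: int      # number of points in the scan

    def radii(self) -> List[float]:
        # Regenerate every radius with the same arithmetic that produced it.
        return [self.r_start + i * self.r_step for i in range(self.count)]

    def to_bytes(self) -> bytes:
        # 8 + 8 + 4 = 20 bytes, no matter how many points the scan has.
        return struct.pack("<ddI", self.r_start, self.r_step, self.count)

    @classmethod
    def from_bytes(cls, data: bytes) -> "RadialGrid":
        # Unpack the three parameters in the order they were packed.
        return cls(*struct.unpack("<ddI", data))
```

For a 1,000-point scan, storing each radius as an 8-byte double would take 8,000 bytes; under this assumption the three parameters take 20.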

I also raised the possibility of storing the intensities rather than the
absorbances for the absorbance system, and the utility of this idea was
questioned by some. My emphasis on intensities rather than absorbance
probably comes from my experience answering questions about the quality of
absorbance data. The vast majority of the questions can be cleared up by
looking at the intensities (a publication on this is available over the net
at http://www.beckman.com/biorsrch/prodinfo/xla/a_1821a.htm). Likewise,
obtaining high quality absorbance data is made a lot easier if the
intensities are examined before setting off to acquire a set of scans. This
isn't made easy with the present operating system, and the conversion from
intensity to absorbance is clunky (largely due to the ASCII files, by the
way). Those who questioned my desire to store the intensities are entirely
right that the vast majority of users want to get good absorbance scans and
couldn't care less about the intensities. What I want to ensure is that they
get good-quality scans. I imagine that storing the absorbance scans will
remain the default, and that it will fall to those of us working on the
software to make getting good-quality scans easy for its users.
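The intensity-to-absorbance step is simple once the intensities are kept. The Python sketch below assumes the usual double-sector relation A = log10(I_reference / I_sample); the function name `absorbance_from_intensities` and the `min_intensity` cutoff are hypothetical, chosen only to show how low-light points could be flagged for inspection rather than silently turned into noisy absorbances.

```python
import math
from typing import List, Optional


def absorbance_from_intensities(sample: List[float], reference: List[float],
                                min_intensity: float = 1.0) -> List[Optional[float]]:
    """Convert paired intensity readings to absorbance, A = log10(I_ref / I_sample).

    Points where either intensity is at or below min_intensity (a hypothetical
    cutoff) come back as None instead of a number, flagging them for a look.
    """
    if len(sample) != len(reference):
        raise ValueError("sample and reference scans must have the same length")
    result: List[Optional[float]] = []
    for i_s, i_r in zip(sample, reference):
        if i_s <= min_intensity or i_r <= min_intensity:
            result.append(None)  # too little light to give a trustworthy value
        else:
            result.append(math.log10(i_r / i_s))
    return result
```

Because the raw intensities stay on disk in this scheme, the absorbances can always be recomputed, for example with a different cutoff.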

We are at the start of this venture; hopefully group wisdom will show itself!

Best wishes,
Tom Laue


Tom Laue
Biochemistry and Molecular Biology
University of New Hampshire
Durham, NH 03824
Ph:  603-862-2459
FAX: 603-862-4013
