
  From: Philo, John <jphilo@amgen.com>
  To  : 'rasmb' <rasmb@bbri.harvard.edu>
  Date: Tue, 15 Apr 1997 09:55:25 -0700

files & naming clarifications

Sorry, it's me again, but I think the last couple of messages suggest a
need for some clarifications.

First, Jo Butler is correct, and the "consensus" in my last message was
indeed stated incorrectly.  I should have said binary files would be "a"
primary format, not "the" primary format, i.e. binary would be
optional. (By "primary" I mean a format generated directly during data
acquisition, not necessarily "preferred".) At least someone is still
paying attention!

Second, in discussions of file naming and organization, it is important
to be clear whether you are talking about the existing ASCII format
files, new binary files, or possibly also a revised ASCII format.
Further, if we are going to have both ASCII and binary formats, then we
need naming/organization conventions for BOTH (and if we want the ASCII
files to have maximum compatibility with existing programs, then we must
not change their naming convention, even if we don't like it).

Thus, for example, Jo Butler's suggestion of dropping the comment line
from all but the first scan really constitutes a new ASCII format, and
one which would crash most existing programs.  Bo Demeler's proposed
naming convention sounds good for binary files, but would not work for
the existing ASCII file format because there wouldn't be enough
characters left in the filenames to indicate the scan number in the
sequence.
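
To put rough numbers on the filename problem, here is a minimal sketch. It
ASSUMES DOS-style 8.3 names (at most 8 characters before the extension) and
decimal scan numbers; neither assumption is stated above, and the function
name and the limit are hypothetical, for illustration only.

```python
# Assumption: DOS-style 8.3 file names, so the base name holds at most
# 8 characters. This limit is illustrative, not taken from any proposal.
BASE_NAME_LIMIT = 8


def max_scans(descriptor_chars: int, limit: int = BASE_NAME_LIMIT) -> int:
    """Count the distinct decimal scan numbers that still fit in the base
    name after `descriptor_chars` characters are used to describe the run."""
    digits_left = limit - descriptor_chars
    if digits_left <= 0:
        return 0  # no room left for any scan number
    return 10 ** digits_left


if __name__ == "__main__":
    for used in (0, 3, 6, 8):
        print(f"{used} descriptive characters -> {max_scans(used)} scan numbers")
```

Under these assumptions every character spent on describing the run removes
one digit from the scan counter, which is why a descriptive convention suits
one-file-per-run binary data better than one-file-per-scan ASCII data.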

Lastly, I also think there is some confusion about Tom Laue's database
proposal.  Here we must distinguish between what is essentially a
database consisting of file names and locations and experiment
information, and a database that actually contains raw scan data.  Tom's
proposal is for a database that would not contain raw data, but it would
know the type of sample, rotor speed, temperature, date, scan type, etc.
for each experiment and then be able to retrieve the names and directory
locations of the corresponding data files.  The retrieved file names
could then be output to specify the data to be used by various analysis
programs.  Since the files might still reside in multiple directories,
it should also be possible to have the database program copy all the
selected data files to one temporary directory so that they could be
easily found and loaded by any analysis package.
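
A minimal sketch of such a catalog follows, to make the distinction concrete.
It is NOT Tom's design: the table layout, column names, function names, and
the choice of SQLite are all assumptions made for illustration. The catalog
stores only experiment information plus file locations, never the raw scan
data itself.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_catalog(conn: sqlite3.Connection) -> None:
    """Create the (hypothetical) catalog table: metadata and locations only."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scans (
               sample TEXT, rotor_speed_rpm INTEGER, temperature_c REAL,
               run_date TEXT, scan_type TEXT, directory TEXT, filename TEXT)"""
    )


def add_scan(conn, sample, rotor_speed_rpm, temperature_c, run_date,
             scan_type, path) -> None:
    """Register one existing data file; the file itself is left untouched."""
    path = Path(path)
    conn.execute(
        "INSERT INTO scans VALUES (?, ?, ?, ?, ?, ?, ?)",
        (sample, rotor_speed_rpm, temperature_c, run_date, scan_type,
         str(path.parent), path.name),
    )


def find_files(conn, sample, scan_type):
    """Return the full paths of all files matching the given criteria."""
    rows = conn.execute(
        "SELECT directory, filename FROM scans "
        "WHERE sample = ? AND scan_type = ? ORDER BY directory, filename",
        (sample, scan_type),
    )
    return [Path(directory) / filename for directory, filename in rows]


def gather(paths, dest=None) -> Path:
    """Copy the selected files into one (temporary) directory."""
    dest = Path(dest) if dest else Path(tempfile.mkdtemp(prefix="scans_"))
    dest.mkdir(parents=True, exist_ok=True)
    for src in paths:
        target = dest / src.name
        if target.exists():
            # Files from different directories can share a name; a real
            # tool would need a renaming rule instead of stopping here.
            raise FileExistsError(f"name collision: {target}")
        shutil.copy2(src, target)
    return dest
```

The list returned by `find_files` could be written out for an analysis
program, or passed to `gather` when everything is wanted in one place. Note
the collision check in `gather`: scans from different runs may well carry
identical file names, so copying into a single directory needs some rule for
that case.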

If taken to the extreme, with such a database the file names for the raw
data would no longer need to contain ANY information identifying the
nature of the data in them, e.g. they could be random characters
(although that would certainly be a bad idea!).

One aspect of this that is not yet clear is whether the intention is to
provide a simple means for putting our existing data into such a
database.  In my view that is essential.  

It is also possible to consider actually putting the raw data directly
into the database (i.e. the database's internal format would be the
"binary format").  With presently available database implementations I
believe that this would defeat most of the advantages of binary files,
and would raise a number of new issues regarding cross-platform
compatibility.  However, this situation could change rapidly, and this
might be the best alternative in the long run.

John Philo
