  From: Philo, John <jphilo@amgen.com>
  To  : 'rasmb' <rasmb@bbri.harvard.edu>
  Date: Tue, 8 Apr 1997 08:27:55 -0700

RE: binary file formats

RASMB,

I imagine most of you are puzzling over Tom Laue's last message about
data file formats and naming conventions.  Tom's message was in response
to a message I sent yesterday (copy below) to a sub-set of RASMB members
who write data analysis software and therefore need to be able to read
the scan files from the XL-A/I.  The purpose of that message was to
initiate a discussion about future changes in file formats etc.,
including possibly changing to a binary format instead of the present
ASCII one.

It is my belief that such a discussion will only be of interest to a
small number of people, and therefore should not include all RASMB
members.  Therefore my intent is to establish a smaller e-mail group for
this purpose.  Consequently I initially contacted a group of 16 people I
know have been involved in software development, and asked them to tell
me if they wish to participate, and to suggest other people that I might
have missed. 

Unfortunately it was apparently not clear to everyone I contacted that
my message did not go to the entire RASMB list.  Given that Tom's
response has gone out to everyone, I thought it best to post this
message to be sure everyone knows what he is talking about and what is
going on.  I apologize for the confusion, and encourage any of you who
wish to participate in this discussion to contact me so that you will be
on the distribution list.

John Philo
----------------------------------------------
(copy of message sent 4/7/97 to Behlke, Joachim; Coelfen, Helmut;
Demeler, Borries; Furst, Allen; Hensley, Preston; Holladay, Les;
Johnson, Michael; Lary, Jeff; Laue, Tom; Lewis, Marc; McRorie, Don;
Minton, Allen; Ralston, Greg; Stafford, Walt; Wu, Jia-Wen; Yphantis,
David)

Sedimentation software developers,

Walt Stafford's RASMB message earlier today about his new DCDT version
has prompted me to write to all of you to initiate a discussion and
information exchange about binary file formats for raw sedimentation
scans.

Walt has now defined a binary format on the Macintosh that can be used
by his DCDT software to speed up data loading.  In a somewhat similar
vein, Jeff Lary and Dave Yphantis have apparently defined two binary
formats for storing raw interference images for use with their MATCH
utility.

Certainly there are many advantages to binary data files, and many of us
may have contemplated doing something similar in our own software.
However, if each of us goes our own way, this could potentially lead to
a proliferation of different file formats and file naming conventions
that could cause trouble and confusion for those who use our software.
At the very least, we will certainly lose much of the advantage of
reduced disk storage if we need to maintain the same data in several
different binary formats.

Therefore the question on my mind is: Could we collectively define a
binary format for scan data that would be usable across all (or at least
many) platforms (PC, Mac, DEC, Unix?) and programs? If so, then each of
us could potentially support that format in future versions of our
analysis programs.  

If we could define such a thing and an appropriate naming convention, I
would be willing to undertake writing a Windows utility that could run
in the background on the data acquisition host and automatically convert
all newly acquired scans into that format.  It could also potentially
convert the other way for those daring enough to delete the original
ASCII files.
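
To make the idea of a two-way converter concrete, here is a minimal
sketch in Python (a language not mentioned in this message).  Everything
about the layout is a hypothetical assumption made for illustration: the
4-byte "SCAN" signature, the 32-bit point count, and the two-column
"radius value" text input are NOT the actual XL-A/I file format and NOT
an agreed standard.

```python
import struct

MAGIC = b"SCAN"  # hypothetical 4-byte signature, not an agreed standard


def ascii_to_binary(text):
    """Convert 'radius value' text lines into a hypothetical binary record.

    Assumed layout: MAGIC, a little-endian 32-bit point count, then one
    pair of little-endian IEEE 754 doubles per data point.
    """
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not two columns
        try:
            pairs.append((float(fields[0]), float(fields[1])))
        except ValueError:
            continue  # skip two-word lines that are not numeric
    header = MAGIC + struct.pack("<I", len(pairs))
    body = b"".join(struct.pack("<dd", r, v) for r, v in pairs)
    return header + body


def binary_to_ascii(data):
    """Convert the hypothetical binary record back into text lines."""
    if data[:4] != MAGIC:
        raise ValueError("not a recognized scan record")
    (count,) = struct.unpack_from("<I", data, 4)
    lines = []
    for i in range(count):
        r, v = struct.unpack_from("<dd", data, 8 + 16 * i)
        lines.append(f"{r:.4f} {v:.5f}")
    return "\n".join(lines)


sample = "5.9000 0.01234\n5.9010 0.01301\n"
blob = ascii_to_binary(sample)
restored = binary_to_ascii(blob)
```

The sketch stores doubles rather than single precision so that the
values written back out match the original text to the printed number
of digits.  A real format would also need to carry the scan header
information, which is omitted here.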

I also recall hearing that Beckman is planning to adopt a binary format
in future releases of the data acquisition software (which I suspect is
whatever binary format Origin uses).  I am therefore hoping that Don
McRorie and/or Allen Furst will reply to all of us to tell us their
plans in this regard and the anticipated time scale, since this might
potentially influence what, if anything, we choose to do at this time.  

Furthermore, I am hoping that we can establish a dialog with Beckman
about their future data formats so those of us who develop analysis
software can be better prepared for, and informed about, the details
than we were about the new formats etc. for the XL-I data.

With regard to the possibility of collectively defining a format that
would work across platforms, one immediate question is whether binary
floating point data can possibly be made cross-platform.  On Wintel
systems the 32 bit and 64 bit single and double precision floating point
formats stored in binary files are generally the ones used directly by
the floating point coprocessors, which I believe are the standard
formats defined by the IEEE.  I confess I am clueless about the floating
point formats used on Macs or DEC systems.  Another potential headache
is that even if the formats are nominally identical, the bytes may not
be stored in the same order (i.e. some hardware expects the most
significant byte first, some the least significant byte first).  So if
any of you know more specifics about that issue, please respond.
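
As a small illustration of the byte-order question, the sketch below
uses Python's standard struct module (an assumption of this example, not
something proposed in this message) to write the same values as IEEE 754
64-bit doubles in both byte orders.  The function names are hypothetical.
It suggests that a shared format could simply state one byte order and
have every reader and writer specify it explicitly.

```python
import struct


def pack_doubles(values, byte_order="<"):
    """Pack floats as IEEE 754 64-bit doubles.

    byte_order is "<" for little-endian (least significant byte first)
    or ">" for big-endian (most significant byte first).
    """
    return struct.pack(f"{byte_order}{len(values)}d", *values)


def unpack_doubles(data, byte_order="<"):
    """Unpack a byte string of 64-bit doubles written in the given order."""
    count = len(data) // 8
    return list(struct.unpack(f"{byte_order}{count}d", data))


values = [6.0125, 0.4375, -1.5]
little = pack_doubles(values, "<")
big = pack_doubles(values, ">")

# Reading with the matching byte order recovers the values exactly.
assert unpack_doubles(little, "<") == values
assert unpack_doubles(big, ">") == values

# The two encodings hold the same 8 bytes per value, in reverse order.
assert little[:8] == big[:8][::-1]
```

Reading a file with the wrong byte order does not raise an error; it
silently yields different numbers, so the byte order must be fixed by
the format definition or recorded in the file itself.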

So, how should we proceed from here?

First, I would ask each of you to reply to me to indicate whether you
wish to participate in the discussion.  Also, please let me know of
anyone else who should be contacted.  

In addition, I would like to hear your thoughts and comments about this
issue and also to hear about any experience you have with these
cross-platform issues. Since at this point I don't know which of you
wish to participate in such a discussion, for now I suggest that each of
you reply to me and I will then summarize or forward the messages to
those who wish to remain on the list.  
 
If it looks like this will indeed become an ongoing discussion among a
group of reasonable size then perhaps we could impose on Walt to set up
another discussion list like the large RASMB list for this specific
purpose.

Best,

John Philo
