From: Philo, John <jphilo@amgen.com>
To: 'rasmb' <rasmb@bbri.harvard.edu>
Date: Wed, 9 Apr 1997 15:24:35 -0700
Subject: binary files: the smaller size GOTCHA!
(I promise this is my last message --- for TODAY!)
"Lies, damn lies, and statistics" - Mark Twain
Besides data loading speed, the primary advantage of binary files is
their smaller size, and indeed a number of you have expressed concerns
about large amounts of interference data chewing up hard disk space.
However, it is important to point out that the advantage of binary files
is not necessarily NEARLY as great as the numbers seem to imply.
For example, the interference ASCII files are about 43 Kbytes each.
Walt Stafford's binary format reduces this information to 4257 bytes.
Therefore switching to this format should allow you to store 10 times
more data on your PC hard drive, right?
WRONG! Disk space on PCs is assigned in "allocation units", and even a
file only 1 byte long occupies at least one full allocation unit. On
drives over 1 GB capacity, such as the 2 GB drive supplied by Beckman
with the XL-I, the allocation unit is 32768 bytes. (The unit may be 1/2
or 1/4 this size on smaller hard drives.)
Thus changing from ASCII to Walt's (or any other) binary format can only
increase the storage capacity by the ratio of the allocation units
needed. On the XL-I computers this is 2 allocation units versus 1, so
the true gain is at most a factor of 2! For absorbance files, which are
at most ~26 KB (1 unit) each, using a binary format gives no real
savings whatsoever!
GOTCHA!!!
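The arithmetic above can be checked with a short Python sketch. It is only
an illustration: the function name units_needed is made up for this
example, 1 KB is assumed to be 1024 bytes, and the 32768-byte allocation
unit is the figure quoted above for the XL-I drive.

```python
import math

CLUSTER_BYTES = 32768  # allocation unit quoted above for the XL-I drive


def units_needed(file_bytes, cluster_bytes=CLUSTER_BYTES):
    """Whole allocation units occupied by a file of the given size."""
    return math.ceil(file_bytes / cluster_bytes)


# File sizes from the message (assumption: 1 KB = 1024 bytes).
ascii_interference = 43 * 1024   # ~43 KB ASCII interference scan
binary_interference = 4257       # the same scan in the binary format
ascii_absorbance = 26 * 1024     # largest absorbance scan, ~26 KB

# Apparent gain: ratio of the raw file sizes (roughly 10).
apparent_gain = ascii_interference / binary_interference

# True gain: ratio of the allocation units actually consumed.
true_gain = units_needed(ascii_interference) / units_needed(binary_interference)

print(f"apparent gain: {apparent_gain:.1f}x")
print(f"true gain:     {true_gain:.1f}x")   # 2 units vs 1 unit
print(f"absorbance units: {units_needed(ascii_absorbance)}")  # already 1 unit
```

Passing a different cluster_bytes (for example 16384 or 8192) shows how
the picture changes on smaller drives.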
Obviously this is a very PC hardware-specific thing, and eventually
Microsoft will fix this large allocation unit mess (Windows "97"), but
that IS the reality today. I don't know the details for other platforms
and to what extent they share this problem, but the main point is that
apparent advantages can be illusory.
While on this subject it is also worth pointing out that when you are
not actively working with your data files you can save a tremendous
amount of disk space by compressing all the data sets for an experiment
into a single archive using ZIP or other compression formats. (If you
followed the above you will realize there is almost no advantage to
compressing individual data files.) This also has the nice advantage of
keeping everything together in one place.
I use WinZip, which compresses interference files by ~76% and absorbance
files by ~87%. Thus if I compress 40 interference scans into one
archive the archive will be ~40 * 0.24 * 43 KB = 413 KB and will occupy
13 allocation units instead of the original 80 units, a 6-fold gain.
Similarly for 40 absorbance scans one ends up with one file in 5 units
instead of 40 files of 1 unit each, an 8-fold gain.
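The same estimate can be reproduced for any number of scans with a small
Python sketch. The function names are hypothetical, 1 KB is taken as 1024
bytes, the compression figures are the approximate WinZip percentages
quoted above, and the archive's own header overhead is ignored, so treat
the result as a rough estimate rather than an exact prediction.

```python
import math

CLUSTER_BYTES = 32768  # allocation unit assumed for the XL-I drive


def units_needed(file_bytes, cluster_bytes=CLUSTER_BYTES):
    """Whole allocation units occupied by a file of the given size."""
    return math.ceil(file_bytes / cluster_bytes)


def archive_units(n_files, file_kb, compression, cluster_bytes=CLUSTER_BYTES):
    """Return (units for separate files, units for one compressed archive).

    compression is the fraction of the size removed, e.g. 0.76 for ~76%.
    Archive header overhead is ignored (a simplifying assumption).
    """
    file_bytes = file_kb * 1024
    separate = n_files * units_needed(file_bytes, cluster_bytes)
    archive_bytes = n_files * file_bytes * (1.0 - compression)
    return separate, units_needed(archive_bytes, cluster_bytes)


for label, kb, comp in [("interference", 43, 0.76), ("absorbance", 26, 0.87)]:
    separate, archived = archive_units(40, kb, comp)
    print(f"{label}: {separate} units -> {archived} units, "
          f"{separate / archived:.1f}-fold gain")
```

For the 40-scan cases above this gives 80 versus 13 units for
interference data and 40 versus 5 units for absorbance data.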
John Philo