Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The case for using on-disk compression for your Notes data directory

By Andrew Pollack on 04/22/2005 at 02:21 PM EDT

I did some very extensive testing and reviews back in the '90s when Stacker and its ilk were on the market. Microsoft bought out much of that technology, and it's now part of the operating system. You can select a folder, a file, or a drive and elect to "compress" the data.

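For the sake of this discussion, let's assume a compressed file will be 50% smaller than a normal one. That's a lower compression ratio than you'd get by zipping an NSF file, but on-disk compression isn't quite as effective because of the way it works: it compresses the file in small, independent chunks so that any part of it can still be read or rewritten without processing the whole file.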

Taking just disk I/O into account, that means 50% less data traveling to and from the mechanical media itself -- the slowest part of the transfer. In a perfect world, that would double the speed. In practice, however, there is overhead: the processor has to spend time compressing and decompressing the data, and there is the overhead of the compression code itself and of the in-memory copying of data it requires.
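
The trade-off can be written as a back-of-envelope break-even condition. This is a simplified sketch, not the author's own model: it assumes the CPU work and the disk transfer happen one after the other rather than overlapping, and it ignores seek time and caching. Let B be the size of the uncompressed data, D the disk transfer rate, C the rate at which the processor can compress or decompress data, and r the compressed size as a fraction of the original (r = 0.5 in this discussion).

```latex
T_{\text{plain}} = \frac{B}{D}, \qquad
T_{\text{comp}} = \frac{rB}{D} + \frac{B}{C}

T_{\text{comp}} < T_{\text{plain}}
\iff \frac{B}{C} < \frac{(1-r)\,B}{D}
\iff C > \frac{D}{1-r}
```

With r = 0.5 this becomes C > 2D: under these assumptions, compression wins once the processor can handle data more than twice as fast as the disk can move it. The "perfect world" doubling corresponds to C growing without bound, where the compressed time approaches half the plain time.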

When I tested this 12 or so years ago, I found that with a 25 MHz 386 processor and an IDE hard disk operating at 33 MHz, you crossed the threshold where the processor time lost was smaller than the disk read/write time gained. In other words, compression was faster.

Today there are a lot of different options for drives, and today's drives transfer data about 7-10 times faster than those of 12 years ago. Processors, however, tend to run 20 or more times faster (it's not just about clock speed, but let's leave it at that), and on a multiprocessor or Hyper-Threading machine the gap is even larger. On top of that, processor capacity is the one resource on most servers that isn't being tapped out: most of the time we have 20-75% of processor cycles free.
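
Plugging those speed-ups into a simple model shows why the balance keeps tipping toward compression. The sketch below is a hypothetical back-of-envelope calculation, not a measurement. It assumes that reading B bytes costs the physical transfer of the compressed bytes (ratio times B at the disk rate) plus decompressing B bytes at the CPU rate, done one after the other. It also assumes the 386/IDE setup described earlier sat exactly at the break-even point, and it uses 20x for CPUs and 7x or 10x for disks from the figures in this paragraph. The function names are illustrative only.

```python
# Hypothetical back-of-envelope model, not a measurement.
# Assumption: reading B bytes costs (ratio * B) / disk_rate for the physical
# transfer plus B / cpu_rate for decompression, done one after the other.


def relative_time(ratio: float, disk_over_cpu: float) -> float:
    """Compressed read time as a fraction of uncompressed read time.

    ratio: compressed size / original size (0.5 means 50% smaller).
    disk_over_cpu: disk throughput divided by CPU decompression throughput.
    """
    return ratio + disk_over_cpu


def scaled_relative_time(ratio: float, disk_speedup: float, cpu_speedup: float) -> float:
    """Relative time after hardware upgrades, starting from break-even.

    Assumption: the baseline machine was exactly at break-even, where
    relative_time == 1, so disk_over_cpu == 1 - ratio. Upgrades multiply
    disk throughput by disk_speedup and CPU throughput by cpu_speedup, so
    disk_over_cpu scales by disk_speedup / cpu_speedup.
    """
    baseline_disk_over_cpu = 1.0 - ratio
    return relative_time(ratio, baseline_disk_over_cpu * disk_speedup / cpu_speedup)


if __name__ == "__main__":
    for disk_speedup in (7, 10):
        t = scaled_relative_time(0.5, disk_speedup, 20)
        print(f"disk x{disk_speedup}, cpu x20: {t:.3f} of uncompressed time")
```

Run as a script, it prints 0.675 and 0.750: under these assumptions, compressed reads would take roughly two-thirds to three-quarters of the uncompressed time, even before accounting for the idle processor capacity most servers have.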

My purely subjective testing thus far definitely supports my hypothesis that the increased processor use is well worth it if it lets you cut disk usage in half. I'm seeing performance increases that end users notice, using compressed NSF files on both servers and workstations. I have not, however, done any objective, empirical testing recently.

In the early days of Stacker, my big worry was reliability, but the technology proved itself and has been extremely reliable. Yes, it's true that a sector failure will wipe out twice as much data this way. Has that been an issue for you? Today's file systems are all virtualized anyway, so it's not as if you're intercepting hardware calls to the drive on interrupt 0x13 the way you were back then.



Very interesting...
By Ben Rose on 04/23/2005 at 03:15 PM EDT
It ties in nicely with my RAM drive testing this week.

I'll shortly be posting an update to my RAM drive blog explaining that in a
production system I just can't produce any measurable benefits, although
they're clearly visible on an old disk subsystem.

I'll enjoy testing the compressed file system this week.

I think some full compact tasks against a large DB, with and without compression,
will be a good test of your theory.

Works for me
By Chris Linfoot on 04/28/2005 at 08:25 AM EDT
On my local workstation, the Notes data folder is now 3/5 of its previous size.
Performance, as measured by the time to open an .nsf and start working on it,
has actually improved.

I may try this on a Domino server here, which, based on my workstation results,
may yield between 70 and 100 GB of extra usable space.

Why did I not think of this before?

But...
By Chris Linfoot on 04/29/2005 at 07:20 AM EDT
One of my readers points this out:

http://www-1.ibm.com/support/docview.wss?rs=463&context=SSKTMJ&context=SSKTWP&q1=Compression&uid=swg21103313&loc=en_US&cs=utf-8&lang=en

Check out your server CPU performance....
By Andrew Pollack on 04/29/2005 at 07:42 AM EDT
....Take a look at your machine. Watch CPU percentage vs. hard disk wait
time. If you've got a machine that's so busy it's actually running short of CPU
time, then I'd have to agree. However, I suspect the servers used in these
examples are hardly indicative of general-use machines, whose CPU power far
outstrips their I/O.

If you're using an old dual P3 with super-fast SCSI RAID drives, you may in
fact be worse off. If, however, you're running a Hyper-Threaded dual P4 or
Xeon machine, you're sitting on so much processor power that you rarely
see above 20% usage in total.

Pretty much where I was with this
By Chris Linfoot on 04/29/2005 at 10:55 AM EDT
Yup. I see plenty of idle CPU, so I'm ignoring the naysayers and just doing it.