Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The case for using on-disk compression for your Notes data directory

By Andrew Pollack on 04/22/2005 at 02:21 PM EDT

I did some very extensive testing and reviews back in the '90s, when Stacker and its ilk were on the market. Microsoft bought out much of that technology, and it's now part of the operating system. You can select a folder, a file, or a drive and elect to "compress" the data.

For the sake of this discussion, let's assume a compressed file will be 50% smaller than a normal one. That's a lower ratio than you would get by zipping an NSF file, but on-disk compression isn't quite as effective because of the way it works: it compresses small chunks of each file independently so that random reads and writes remain possible.

Taking just disk I/O into account, that means 50% less data traveling to and from the mechanical media itself, which is the slowest part of the transfer. In a perfect world, that means you double the speed. In practice, however, there is overhead. You have to spend processor time on the compression, and you have the overhead of the compression code itself plus the in-memory copying of data that must take place.

When I ran my tests 12 or so years ago, I found that a 25 MHz 386 with an IDE hard disk operating at 33 MHz had crossed the threshold where the processor time lost was smaller than the disk read/write time gained. In other words, compression was faster.
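
A rough back-of-the-envelope model makes that threshold concrete. This sketch assumes the drive first reads the compressed data and the CPU then decompresses it, one after the other (real systems can overlap the two, so this overstates the cost), and it treats decompression as a fixed throughput. The function names and all throughput figures are hypothetical illustrations, not measurements from the tests described above. Under these assumptions, compression wins once decompression throughput exceeds the disk transfer rate divided by (1 - ratio), which at 50% compression means twice the disk rate.

```python
# Hypothetical serial model: the drive reads the (smaller) compressed data,
# then the CPU decompresses it. Overlapping the two would only help compression.

def read_time_uncompressed(size_mb: float, disk_mb_s: float) -> float:
    """Seconds to read size_mb of raw data from the drive."""
    return size_mb / disk_mb_s


def read_time_compressed(size_mb: float, disk_mb_s: float,
                         cpu_mb_s: float, ratio: float) -> float:
    """Seconds to read the compressed form and then decompress it.

    ratio is compressed size / original size (0.5 means 50% smaller).
    cpu_mb_s is decompression throughput in MB of output per second.
    """
    disk_part = size_mb * ratio / disk_mb_s
    cpu_part = size_mb / cpu_mb_s
    return disk_part + cpu_part


def compression_wins(disk_mb_s: float, cpu_mb_s: float, ratio: float) -> bool:
    """True when the disk time saved exceeds the CPU time spent.

    size*ratio/disk + size/cpu < size/disk simplifies to
    cpu > disk / (1 - ratio).
    """
    return cpu_mb_s > disk_mb_s / (1.0 - ratio)


if __name__ == "__main__":
    disk = 50.0   # assumed sustained transfer rate, MB/s
    ratio = 0.5   # the 50% assumption from the post
    for cpu in (60.0, 100.0, 400.0):  # assumed decompression rates, MB/s
        raw = read_time_uncompressed(1000, disk)
        comp = read_time_compressed(1000, disk, cpu, ratio)
        print(f"cpu={cpu:>6} MB/s  raw={raw:.1f}s  compressed={comp:.1f}s  "
              f"wins={compression_wins(disk, cpu, ratio)}")
```

With these made-up numbers, a 60 MB/s decompressor loses (26.7 s vs 20.0 s), 100 MB/s is exactly break-even, and 400 MB/s wins (12.5 s).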

Today there are a lot of different options for drives. Today's drives transfer data about 7-10 times faster than those of 12 years ago. Processors, however, tend to run 20 or more times faster (it's not just about clock speed, but let's leave it at that), and even more on a multiprocessor or Hyper-Threading machine. Add to that the fact that processor capacity is the one resource on most servers that isn't being tapped out. Most of the time we have 20-75% of our processor cycles free.
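
To see what those speed-up estimates imply, here is a small sketch. It assumes the old 386 sat exactly at break-even (CPU time spent equal to disk time saved), which is conservative since the post says it had just crossed the threshold, and that CPU time and disk time each shrink in proportion to their speed-up. It ignores extra processors and Hyper-Threading. The function name is hypothetical.

```python
def cost_to_savings_ratio(cpu_speedup: float, disk_speedup: float,
                          old_ratio: float = 1.0) -> float:
    """CPU time spent per second of disk time saved, on newer hardware.

    old_ratio is the same quantity on the old machine; 1.0 means the old
    machine sat exactly at break-even. CPU time scales down by cpu_speedup
    and the disk time saved scales down by disk_speedup.
    """
    return old_ratio * disk_speedup / cpu_speedup


if __name__ == "__main__":
    for disk_speedup in (7.0, 10.0):
        r = cost_to_savings_ratio(20.0, disk_speedup)
        print(f"disk {disk_speedup:.0f}x, CPU 20x: CPU time is "
              f"{r:.0%} of the disk time saved")
```

Under those assumptions, the CPU now spends only about 35% to 50% of the disk time it saves, before counting any extra cores.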

My purely subjective testing thus far definitely supports my hypothesis that the increased processor use is well worth it if you can cut disk usage in half. I'm seeing performance increases noticeable to end users when using compressed NSF files on both servers and workstations. I have not, however, done any objective empirical testing recently.

In the early days of Stacker, my big worry was reliability, but the technology proved itself and has been extremely reliable. Yes, it's true that a bad sector will wipe out twice as much data this way. Has that been an issue for you? Today's file systems are all virtualized anyway, so it's not as if you're intercepting hardware calls to the drive on interrupt 0x13 the way you were back then.



Very interesting...
By Ben Rose on 04/23/2005 at 03:15 PM EDT
And it ties in nicely with my RAM drive testing this week.

I'll shortly be posting an update to my RAM drive blog entry explaining that on a production system I just can't produce any measurable benefit, although the benefits are clearly visible on an old disk subsystem.

I'll enjoy testing the compressed file system this week.

I think running some full compact tasks against a large database, with and without compression, will be a good test of your theory.

Works for me
By Chris Linfoot on 04/28/2005 at 08:25 AM EDT
On my local workstation, the Notes data folder is now 3/5 of its previous size. Performance, as measured by the time to open an .nsf and start working in it, has actually improved.

I may try this on a Domino server here, which, based on my workstation results, may yield between 70 and 100 GB of extra usable space.

Why did I not think of this before?

But...
By Chris Linfoot on 04/29/2005 at 07:20 AM EDT
One of my readers points this out:

http://www-1.ibm.com/support/docview.wss?rs=463&context=SSKTMJ&context=SSKTWP&q1=Compression&uid=swg21103313&loc=en_US&cs=utf-8&lang=en

Check out your server CPU performance...
By Andrew Pollack on 04/29/2005 at 07:42 AM EDT
Take a look at your machine. Watch CPU percentage vs. hard disk wait time. If you've got a machine that's so busy it's actually bogging down on CPU time, then I'd have to agree. However, I suspect the servers used in these examples are hardly representative of general-use machines, whose CPU power far outstrips their I/O.

If you're using an old dual P3 with super-fast SCSI RAID drives, you may in fact be worse off. If, however, you're running a Hyper-Threaded dual P4 or Xeon machine, you're sitting on so much processor power that you rarely see above 20% total usage.

Pretty much where I was with this
By Chris Linfoot on 04/29/2005 at 10:55 AM EDT
Yup. I see plenty of idle CPU, so I'm ignoring the naysayers and just doing it.

