Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

50 Percent Reduction in mail file size with no loss of data or functionality

By Andrew Pollack on 06/24/2008 at 01:48 PM EDT

This morning, I set up DAOS on my Domino 8.5 Beta server. It's a Win32 machine running in my office. My customer-facing servers are not upgraded yet. Since my mail file is replicated onto more than one clustered server, I saw no danger in giving DAOS a try today. So far, everyone I know who's been testing it is increasingly comfortable with its ability to handle problems as well as or better than data stored in the NSF natively.

DAOS pulls file attachments out of the NSF and stores them in an arcane file tree on disk maintained by the Domino server. It eliminates duplicates of the same file (based on hash values), and it is TOTALLY transparent to users. When you open documents, you see the attachments as normal. If you have local replicas, they're unaffected. You literally cannot tell unless you're in the Admin client and looking for the information. It's what the old "single copy object store" was supposed to do (at least the way we wanted it to), but that never worked out at all.
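
To make the "duplicates eliminated by hash value" idea concrete, here is a minimal Python sketch of a content-addressed store. It is not how DAOS is actually implemented: the SHA-256 hash, the two-character directory fan-out, and the names AttachmentStore, put, get, and demo are all assumptions made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


class AttachmentStore:
    """Toy content-addressed store: one file on disk per unique attachment."""

    def __init__(self, root: Path) -> None:
        self.root = root
        # digest -> number of documents that reference this attachment
        self.refs: dict[str, int] = {}

    def _path(self, digest: str) -> Path:
        # Hypothetical layout: fan out by the first two hex characters.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        """Write the bytes only if this digest is new; return the digest."""
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        self.refs[digest] = self.refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Read an attachment back by the digest that put() returned."""
        return self._path(digest).read_bytes()

    def files_on_disk(self) -> int:
        return sum(1 for p in self.root.rglob("*") if p.is_file())


def demo() -> tuple[int, int]:
    """Store the same attachment three times plus one different attachment."""
    with tempfile.TemporaryDirectory() as tmp:
        store = AttachmentStore(Path(tmp))
        deck = b"the same big PowerPoint, mailed to three people"
        tickets = [store.put(deck) for _ in range(3)]
        store.put(b"a different attachment")
        assert store.get(tickets[0]) == deck
        # (unique files on disk, total references held by documents)
        return store.files_on_disk(), sum(store.refs.values())
```

In this sketch, four stored attachments end up as two files on disk, because three of them hash to the same value. The document only needs to keep the small digest, which is the same reason the NSF shrinks once the attachment bodies live elsewhere.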

Setting it up means enabling it on the server document, setting a couple of options on the database advanced properties, and running a copy-style compact on the database. I knew I'd see some heavy benefit, but didn't expect to see a 50% reduction in the total disk space used. At the end of the run, I've gone from half a gigabyte to well under 250 megs. On top of that, nearly 200 megs of attachments are now stored in DAOS rather than in the NSF, which means that view updates and anything else requiring a database scan are going to be much faster.

The really big value comes when you look at using DAOS in places with many mail users sharing files. The savings from eliminating duplicates should be tremendous.



re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Chuck Hauble on 06/24/2008 at 02:07 PM EDT
Great news. Have you thought at all about how to architect the backups for
the DAOS attachments? I wonder if the Domino Backup APIs see them or if there
is some other process we will need to use.
backups...
By Andrew Pollack on 06/24/2008 at 02:13 PM EDT
One of the things that makes DAOS manageable is that it is more loosely
coupled to the NSF files. As long as your backups include the data tree
containing the attachments, you should be OK.

On top of this, deleting a document doesn't immediately delete the attachment
record (unless you want it to). The attachments are "pruned" periodically after
"n" days. This should also serve to make restores more trouble-free.
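
As a rough illustration of that deferred pruning, here is a small Python sketch. It only models the bookkeeping and is not DAOS's actual mechanism: the PruneTracker class, its method names, and the defer_days parameter are hypothetical.

```python
from datetime import datetime, timedelta


class PruneTracker:
    """Toy model: an attachment is pruned only after nothing has
    referenced it for at least defer_days days."""

    def __init__(self, defer_days: int) -> None:
        self.defer = timedelta(days=defer_days)
        self.refs: dict[str, int] = {}              # key -> live references
        self.orphaned_at: dict[str, datetime] = {}  # key -> when refs hit zero

    def add_ref(self, key: str) -> None:
        self.refs[key] = self.refs.get(key, 0) + 1
        # A restored or re-sent document rescues the attachment from pruning.
        self.orphaned_at.pop(key, None)

    def drop_ref(self, key: str, now: datetime) -> None:
        self.refs[key] -= 1
        if self.refs[key] == 0:
            self.orphaned_at[key] = now

    def prune(self, now: datetime) -> list[str]:
        """Return (and forget) the keys orphaned for defer_days or longer."""
        expired = sorted(
            key for key, since in self.orphaned_at.items()
            if now - since >= self.defer
        )
        for key in expired:
            del self.orphaned_at[key]
            del self.refs[key]
        return expired
```

With a 30-day window, an attachment whose last document was deleted on day 0 is still there on day 29, so restoring that document from backup inside the window finds its attachment intact.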

File names on the attachments are not 1:1 with the attachment names. The
on-disk schema is designed to be very robust and repairable.
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Paul Gagnon on 06/24/2008 at 03:57 PM EDT
That's awesome. We have some users approaching 10 gigs in size for their mail
files, and many of them have the same 15 MB PowerPoint and all its revisions.

What happens to the attachments when you need to do a hardware upgrade and load
Domino on the new box?
Depends
By Andrew Pollack on 06/24/2008 at 05:10 PM EDT
As long as you do your new box by making a new replica on the new machine, it
would be totally transparent.

If you're doing it by copying data directories, you need to make sure you also
copy the data tree DAOS uses to store its data. Since you reference the
location of the root of that DAOS tree on the server document, you want to make
sure that if it's not the same, it gets changed before you start the server up.

It's really just a file system with lots of oddly named files in it. As long as
you make sure the server knows where it is, you should be fine.

One key thing is that you can store it on a different spindle from your
databases. Maybe put it out on a SAN and keep your NSFs on a local RAID
array.

I think this is going to be a big driver for a lot of companies to move to 8.5.
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Dave Harris on 06/25/2008 at 08:00 AM EDT
Andrew, 50% compared to what? If it's against ND7, then, yeah well, nothing to
write home about really (I mean, it is, but you'll see where I'm going with
this).

If it's against 8.0.1 with document/design compression enabled across the
board, then yes it's truly impressive: I already managed to squeeze a 35%
reduction on mail with that enabled, so a further 50% would be truly
remarkable.
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Yancy Lent on 06/26/2008 at 03:01 PM EDT
Great post, great feature. I was mainly looking for one piece of detail, which
you answered: "based on hash values". There is always a need to know how files
are considered 'the same'.

Here is another great post about this: http://planetlotus.org/27c29b
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By John possi on 06/27/2008 at 02:08 PM EDT
What about backup? I have a file-system-based backup utility. So when I back
up the DAOS directory and then, a year later, need to restore one DB, how do I
know which DAOS files belong to which database?

Also, it sounds like if there are thousands of mails then there will be millions
of attachments in my file system. Not really good.

Somebody told me that if you open a DAOS cache file, sometimes it can be opened
and you can read the attachment content since it's not encrypted. They said that
this happens when you uncheck the compress checkbox in the file-attach dialog.

