Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

50 Percent Reduction in mail file size with no loss of data or functionality

By Andrew Pollack on 06/24/2008 at 01:48 PM EDT

This morning, I set up DAOS on my Domino 8.5 Beta server. It's a Win32 machine running in my office. My customer-facing servers are not upgraded yet. Since my mail file is replicated onto more than one clustered server, I saw no danger in giving DAOS a try today. So far, everyone I know who's been testing it is increasingly comfortable with its ability to handle problems as well as or better than data stored natively in the NSF.

DAOS pulls file attachments out of the NSF and stores them in an arcane file tree on disk maintained by the Domino server. It eliminates duplicates of the same file (based on hash values), and it is TOTALLY transparent to users. When you open documents, you see the attachments as normal. If you have local replicas, they're unaffected. You literally cannot tell unless you're in the Admin client looking for the information. It's what the old "single copy object store" was supposed to do (at least the way we wanted it to), but it never worked out at all.
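To make "based on hash values" concrete, here is a minimal Python sketch of hash-based (content-addressed) deduplication. It is not DAOS's actual implementation: the choice of SHA-256, the in-memory dicts standing in for the on-disk tree, and the `DedupStore` name are all assumptions for illustration. Each unique attachment is stored once, and every document that carries it just keeps the hash key.

```python
import hashlib


class DedupStore:
    """Toy content-addressed attachment store (hypothetical, not DAOS code).

    Each unique attachment is kept once, keyed by a hash of its bytes;
    documents keep only the key.
    """

    def __init__(self) -> None:
        self.blobs = {}      # hash -> the single stored copy of the bytes
        self.refcounts = {}  # hash -> number of documents referencing it

    def add(self, data: bytes) -> str:
        """Store an attachment (if new) and return the key a document keeps."""
        key = hashlib.sha256(data).hexdigest()  # assumption: SHA-256
        if key not in self.blobs:
            self.blobs[key] = data  # first copy: actually stored
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        """What a user sees when opening the document: the original bytes."""
        return self.blobs[key]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blobs.values())


store = DedupStore()
report = b"same quarterly report mailed to everyone" * 1000
keys = [store.add(report) for _ in range(3)]  # three mail files, one attachment
other = store.add(b"a different attachment")

print(len(store.blobs))                        # 2
print(store.stored_bytes() < 3 * len(report))  # True
```

The key point is that identical bytes always produce the same key, so the second and third copies cost only a reference-count increment rather than another full copy of the file.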

Setting it up means enabling it on the server document, setting a couple of options in the database's advanced properties, and running a copy-style compact on the database. I knew I'd see some heavy benefit, but didn't expect to see a 50% reduction in the total disk space used. At the end of the run, I've gone from half a gigabyte to well under 250 megs. On top of that, nearly 200 megs of attachments are now stored in DAOS, which means that view updates and anything else requiring a database scan are going to be much faster.

The really big value comes when you look at using DAOS in places where many mail users share the same files. The savings from eliminating duplicates should be tremendous.



re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Chuck Hauble on 06/24/2008 at 02:07 PM EDT
Great news. Have you thought at all about how to architect backups for the
DAOS attachments? I wonder if the Domino backup APIs see them or if there
is some other process we will need to use.
backups...
By Andrew Pollack on 06/24/2008 at 02:13 PM EDT
One of the things that makes DAOS manageable is that it is more loosely
coupled to the NSF files. As long as your backups include the data tree
containing the attachments, you should be OK.

On top of this, deleting a document doesn't immediately delete the attachment
record (unless you want it to). The attachments are "pruned" periodically after
"n" days. This should also serve to make restores more trouble-free.
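The deferred deletion described above can be sketched as reference counting plus a grace period. This minimal Python sketch assumes a per-attachment reference count and a fixed prune window measured in days; `PruningStore`, `release`, and `prune` are hypothetical names, not Domino APIs, and DAOS's real bookkeeping may differ. Keeping an orphaned attachment around until the window expires is presumably what lets a database restored within that window still find its attachments.

```python
import hashlib
from datetime import datetime, timedelta


class PruningStore:
    """Toy attachment store with deferred deletion (hypothetical, not DAOS code)."""

    def __init__(self, prune_after_days: int) -> None:
        self.prune_after = timedelta(days=prune_after_days)
        self.blobs = {}        # hash -> bytes
        self.refcounts = {}    # hash -> live references from documents
        self.orphaned_at = {}  # hash -> when its last reference went away

    def add(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)
        self.refcounts[key] = self.refcounts.get(key, 0) + 1
        self.orphaned_at.pop(key, None)  # referenced again: not a pruning candidate
        return key

    def release(self, key: str, now: datetime) -> None:
        """Called when a document referencing `key` is deleted."""
        self.refcounts[key] -= 1
        if self.refcounts[key] == 0:
            self.orphaned_at[key] = now  # keep the bytes for now; prune later

    def prune(self, now: datetime) -> list:
        """Delete blobs that have been unreferenced for longer than the window."""
        expired = [k for k, t in self.orphaned_at.items() if now - t > self.prune_after]
        for k in expired:
            del self.blobs[k]
            del self.refcounts[k]
            del self.orphaned_at[k]
        return expired


store = PruningStore(prune_after_days=30)
t0 = datetime(2008, 6, 24)
key = store.add(b"attachment bytes")
store.release(key, now=t0)                                # the document is deleted
print(store.prune(now=t0 + timedelta(days=10)))           # [] (still within the window)
print(store.prune(now=t0 + timedelta(days=31)) == [key])  # True
```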

The file names on disk are not 1:1 with the attachment names. The on-disk
schema is designed to be very robust and repairable.
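One common way to get on-disk names that don't track attachment names is to derive the path from the content hash. The Python sketch below is purely illustrative: the SHA-256 digest, the two-level subdirectory fan-out, the `.blob` extension, and the `blob_path` function are all assumptions, not the actual DAOS layout. It shows why two attachments with different names but identical bytes end up as one file.

```python
import hashlib
from pathlib import PurePosixPath


def blob_path(root: str, data: bytes, fanout: int = 2, levels: int = 2) -> PurePosixPath:
    """Map attachment content to an on-disk path derived from its hash.

    The user-visible attachment name plays no part, so identical bytes
    attached under different names map to the same file.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Spread files over nested subdirectories (e.g. root/ab/cd/abcd....blob)
    # so no single directory has to hold every file.
    parts = [digest[i * fanout:(i + 1) * fanout] for i in range(levels)]
    return PurePosixPath(root, *parts, digest + ".blob")


a = blob_path("/daos", b"identical bytes")  # attached as "report.pdf"
b = blob_path("/daos", b"identical bytes")  # attached as "Copy of report.pdf"
print(a == b)  # True
```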
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Paul Gagnon on 06/24/2008 at 03:57 PM EDT
That's awesome. We have some users approaching 10 gigs in size for their mail
files, and many of them have the same 15 MB PowerPoint and all its revisions.

What happens to the attachments when you need to do a hardware upgrade and load
Domino on the new box?
Depends
By Andrew Pollack on 06/24/2008 at 05:10 PM EDT
As long as you build your new box by making new replicas on it, it
would be totally transparent.

If you're doing it by copying data directories, you need to make sure you also
copy the data tree DAOS uses to store its data. Since the location of the root
of that DAOS tree is referenced in the server document, make sure that if it's
not the same on the new box, it gets changed before you start the server up.

It's really just a file system with lots of oddly named files in it. As long as
you make sure the server knows where it is, you should be fine.

One key thing is that you can store it on a different spindle from your
databases. Maybe put it out on a SAN and keep your local NSFs on a local RAID
array.

I think this is going to be a big driver for a lot of companies to move to 8.5.
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Dave Harris on 06/25/2008 at 08:00 AM EDT
Andrew, 50% compared to what? If it's against ND7, then, yeah well, nothing to
write home about really (I mean, it is, but you'll see where I'm going with
this).

If it's against 8.0.1 with document/design compression enabled across the
board, then yes it's truly impressive: I already managed to squeeze a 35%
reduction on mail with that enabled, so a further 50% would be truly
remarkable.
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By Yancy Lent on 06/26/2008 at 03:01 PM EDT
Great post, great feature. I was mainly looking for one piece of detail, which
you answered: "based on hash values". There is always a need to know how files
are considered 'the same'.

Here is another great post about this: http://planetlotus.org/27c29b
re: 50 Percent Reduction in mail file size with no loss of data or functionality
By John possi on 06/27/2008 at 02:08 PM EDT
What about backup? I have a file-system-based backup utility. So when I back up
the DAOS directory and then, a year later, need to restore one DB, how do I know
which DAOS files belong to each database?

Also, it sounds like if there are thousands of mails, there will be millions
of attachment files in my file system. Not really good.

Somebody told me that if you open a DAOS cache file, sometimes you can read
the attachment content since it's not encrypted. They said this happens when
you uncheck the Compress checkbox in the File Attach dialog.

