Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The Killer of Orphans (Orphan Documents)

By Andrew Pollack on 08/06/2015 at 10:40 AM EDT

Those damn orphans are harder to kill than you think.

Maybe you'll spot the error faster than I did -- or maybe this will help you.

I have a customer with a help desk application created in the mid-1990s. It started causing major issues, so I looked into it and found it had grown to over 20 GB in size. A check of the database properties showed a whopping 453,355 documents, and of course, many of those have screenshots. When I spoke to the client, she swore she'd deleted everything older than 1/1/2015 and could see only a few thousand documents.

Well, by now you know what happened: she had been deleting main documents and leaving all the responses as orphans. The application did not have any on-Delete code to clean that sort of thing up. You'd be surprised how many do not.

I decided to write some code that would look at every document in the database, make sure it was not a configuration record, see if it was a response, and if so, check whether the parent document existed. If not, kill the document. I did lots of fancy things with list elements to cache known and unknown UNIDs to speed it up, but basically that's what it did. It was also designed to make repeated passes through the database so that it could pick out response-to-response documents that became orphaned in the previous pass. Yes, I could have done this from within a view, or by following the chain of parent documents all the way up with each document to avoid repeating the loop, but that has its own issues as well and I didn't feel like writing a recursive function just for this "simple" task.

The thing is, it didn't work. It kept not finding orphans. It turns out an old and well known problem was manifesting in a whole new way.

To find a parent, I was using code like this (simplified by removing declarations and all the hash-based caching that avoids repeated document loads):

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUniversalID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    if parentDoc is nothing then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

Can you spot why it failed?

Sadly, it took me a long time to realize that if the parent formerly existed, there would be a deletion stub. The deletion stub still in the database meant that the "parentDoc" object was still set to a document object, just not a valid one. Testing it to see if it was "Nothing" wouldn't work. After way too many hours, I changed the code to look like this:

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUniversalID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    haveParentBoolean = true
    if parentDoc is nothing then
        haveParentBoolean = false ' lookup failed outright
    elseif not parentDoc.isValid then
        haveParentBoolean = false ' got a deletion stub, not a real document
    end if
    if haveParentBoolean = false then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

There are two different ways to fail looking up the parent. Either getting nothing at all, or getting an invalid (deleted) document handle. This is very much the same reason why you always have to check for .isvalid when looping through a collection. A deleted document handle is not "nothing", it's just not useable.
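For anyone who wants the whole shape in one place, here is a rough sketch of the two checks combined with the repeated-pass idea. It is not the code I actually ran: the names IsOrphan and RemoveOrphans are made up for illustration, the configuration-record test and the UNID caching are left out, and it assumes you really do want to delete every orphan it finds. Try it on a copy first.

```lotusscript
Option Declare

' Hypothetical helper: True when doc is a response whose parent cannot be
' loaded, either because the lookup failed or because it returned a deletion stub.
Function IsOrphan(db As NotesDatabase, doc As NotesDocument) As Boolean
    Dim parentDoc As NotesDocument
    IsOrphan = False
    If Not doc.IsResponse Then Exit Function

    On Error Resume Next    ' a bad UNID raises an error rather than returning Nothing
    Set parentDoc = db.GetDocumentByUNID(doc.ParentDocumentUNID)
    On Error GoTo 0         ' back to default error handling

    If parentDoc Is Nothing Then
        IsOrphan = True     ' failure 1: nothing came back at all
    ElseIf Not parentDoc.IsValid Then
        IsOrphan = True     ' failure 2: a handle came back, but it is a deletion stub
    End If
End Function

' Hypothetical driver: keeps sweeping the database until a pass removes nothing,
' so responses orphaned by an earlier removal get caught on a later pass.
Sub RemoveOrphans(db As NotesDatabase)
    Dim col As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Dim removed As Long

    Do
        removed = 0
        Set col = db.AllDocuments
        Set doc = col.GetFirstDocument()
        Do Until doc Is Nothing
            Set nextDoc = col.GetNextDocument(doc)  ' fetch before doc may be removed
            If doc.IsValid Then                     ' same check, collection flavor
                If IsOrphan(db, doc) Then
                    Call doc.Remove(True)
                    removed = removed + 1
                End If
            End If
            Set doc = nextDoc
        Loop
    Loop While removed > 0
End Sub
```

Calling RemoveOrphans with the current database from an agent would run the sweep. The outer loop is the lazy alternative to walking each response's parent chain to the top: it costs extra passes but needs no recursion.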

The result: the database size on disk is down from 20 GB to 261 MB, and from 453,355 documents to 8,110.


re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/07/2015 at 09:47 PM EDT
So if I understand the issue correctly, the moral of the story is to add an "If
doc.IsValid Then" before processing a doc retrieved via unid or, I'm guessing,
by extension by noteid.

But the other part of this I didn't realize was that not getting a hit using
unid doesn't simply result in doc = Nothing, that instead it throws an error.

But according to Notes documentation, not getting a hit using noteid doesn't
throw an error.

I'm not sure I really get the point of Notes doing that, but at least now I
know. Thanks.
re: The Killer of Orphans (Orphan Documents) By Andrew Pollack on 08/08/2015 at 08:01 AM EDT
Tim, also critical in a collection -- whether from a search, a view, or a
database.allDocuments. You always want to check for .isvalid. I just hadn't
thought of it in terms of NOT finding a document.
re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/08/2015 at 06:00 PM EDT
It makes sense. Since defensive coding is always a good idea, "If doc.IsValid
Then" probably should be a standard part of code, just like routing errors to
OpenLog, etc.

In 11 days I'll be in ATL for MWLUG. If you are there and see me, look me up.
I'll buy you a beer!
re: The Killer of Orphans (Orphan Documents) By Lars Berntrop-Bos on 08/22/2015 at 08:04 PM EDT
I use this function:
' Function isValidDoc
' Description: Returns if the supplied NotesDocument is a valid useable
Function isValidDoc(doc As NotesDocument) As Boolean
    isValidDoc = False
    If doc Is Nothing Then Exit Function
    If doc.Size = 0 Or doc.IsDeleted Or Not doc.IsValid Then Exit Function
    If doc.HasItem("$Conflict") Then Exit Function
    isValidDoc = True
End Function ' isValidDoc

Sometimes, just checking isValid is not enough. I've seen 'ghost' documents pop
into existence with size zero (hypothesis: to enable viewing threaded
discussions, where the original Main document has been deleted). Also, you may
want to evaluate if you want to treat a save-conflict as a valid parent or not.
I generally prefer to have responses to a normal Main document, and not to a
save conflict.
