Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The Killer of Orphans (Orphan Documents)

By Andrew Pollack on 08/06/2015 at 10:40 AM EDT

Those damn orphans are harder to kill than you think.

Maybe you'll spot the error faster than I did -- or maybe this will help you.

I have a customer with a help desk application created in the mid-1990s. It started causing major issues, so I looked into it and found it had grown to over 20 GB in size. A check of the database properties showed a whopping 453,355 documents, and of course, many of those have screenshots. When I spoke to the client, she swore she'd deleted everything older than 1/1/2015 and could see only a few thousand documents.

Well, by now you know what happened: she had been deleting main documents and leaving all the responses as orphans. The application did not have any on-delete code to clean that sort of thing up. You'd be surprised how many do not.

I decided to write some code that would look at every document in the database, make sure it was not a configuration record, see if it was a response, and if so, check whether the parent document existed. If not, kill the document. I did lots of fancy things with list elements to cache known and unknown UNIDs to speed it up, but basically that's what it did. It was also designed to make repeated passes through the database so that it could pick out response-to-response documents that became orphaned in the previous pass. Yes, I could have done this from within a view, or by following the chain of parent documents all the way up for each document to avoid repeating the loop, but that has its own issues as well, and I didn't feel like writing a recursive function just for this "simple" task.

The thing is, it didn't work. It kept not finding orphans. It turns out an old and well known problem was manifesting in a whole new way.

To find a parent, I was using code like this (simplified by removing the declarations and all the hash-based caching that avoids repeated document loads):

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUNID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    if parentDoc is nothing then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

Can you spot why it failed?

Sadly, it took me a long time to realize that if the parent formerly existed, there would be a deletion stub. The deletion stub still in the database meant that the "parentDoc" object was still set to a document object, just not a valid one. Testing it to see if it was "Nothing" wouldn't work. After way too many hours, I changed the code to look like this:

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUNID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    haveParentBoolean = true
    if parentDoc is nothing then
        haveParentBoolean = false ' the lookup returned no document at all
    elseif not parentDoc.isValid then
        haveParentBoolean = false ' the lookup returned a deletion stub
    end if
    if haveParentBoolean = false then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

There are two different ways to fail when looking up the parent: either getting nothing at all, or getting an invalid (deleted) document handle. This is very much the same reason why you always have to check for .isvalid when looping through a collection. A deleted document handle is not "nothing", it's just not usable.
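Putting the pieces together, the repeated-pass cleanup might look roughly like the sketch below. This is not the code I actually ran: the names HasLiveParent and RemoveOrphans are made up for illustration, the UNID caching is left out, and so is the check that skips configuration records. It deletes documents, so try anything like it on a copy of the database first.

```lotusscript
' Hypothetical helper: True only when the parent lookup yields a live document.
Function HasLiveParent(db As NotesDatabase, doc As NotesDocument) As Boolean
    Dim parentDoc As NotesDocument
    HasLiveParent = False
    On Error Resume Next    ' a bad UNID raises an error; treat that as "no parent"
    Set parentDoc = db.GetDocumentByUNID(doc.ParentDocumentUNID)
    On Error Goto 0         ' back to default error handling
    If parentDoc Is Nothing Then Exit Function    ' failure 1: nothing came back
    If Not parentDoc.IsValid Then Exit Function   ' failure 2: deletion stub
    HasLiveParent = True
End Function

' Hypothetical driver: keeps making passes until a pass removes nothing,
' so responses orphaned by an earlier removal are caught as well.
Sub RemoveOrphans(db As NotesDatabase)
    Dim dc As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Dim removedThisPass As Long

    Do
        removedThisPass = 0
        Set dc = db.AllDocuments
        Set doc = dc.GetFirstDocument()
        While Not (doc Is Nothing)
            ' Fetch the next document before this one is possibly removed
            Set nextDoc = dc.GetNextDocument(doc)
            If doc.IsValid Then    ' skip deleted handles inside the collection
                If doc.IsResponse Then
                    ' (a real run would also skip configuration records here)
                    If Not HasLiveParent(db, doc) Then
                        Call doc.Remove(True)
                        removedThisPass = removedThisPass + 1
                    End If
                End If
            End If
            Set doc = nextDoc
        Wend
    Loop While removedThisPass > 0
End Sub
```

Calling RemoveOrphans with the help desk database from an agent would do the whole sweep. Looping until a pass removes nothing is what avoids the recursive walk up the parent chain, at the cost of reading the whole database at least twice.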

The result: the database size on disk is down from 20 GB to 261 MB, and the document count from 453,355 to 8,110.


re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/07/2015 at 09:47 PM EDT
So if I understand the issue correctly, the moral of the story is to add an "If
doc.IsValid Then" before processing a doc retrieved via unid or, I'm guessing,
by extension by noteid.

But the other part of this I didn't realize was that not getting a hit using
unid doesn't simply result in doc = Nothing, that instead it throws an error.

But according to Notes documentation, not getting a hit using noteid doesn't
throw an error.

I'm not sure I really get the point of Notes doing that, but at least now I
know. Thanks.

re: The Killer of Orphans (Orphan Documents) By Andrew Pollack on 08/08/2015 at 08:01 AM EDT
Tim, also critical in a collection -- whether from a search, a view, or a
database.allDocuments. You always want to check for .isvalid. I just hadn't
thought of it in terms of NOT finding a document.

re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/08/2015 at 06:00 PM EDT
It makes sense. Since defensive coding is always a good idea, "If doc.IsValid
Then" probably should be a standard part of code, just like routing errors to
OpenLog, etc.

In 11 days I'll be in ATL for MWLUG. If you are there and see me, look me up.
I'll buy you a beer!

re: The Killer of Orphans (Orphan Documents) By Lars Berntrop-Bos on 08/22/2015 at 08:04 PM EDT
I use this function:

' Function isValidDoc
' Description: Returns whether the supplied NotesDocument is a valid, usable document
Function isValidDoc(doc As NotesDocument) As Boolean
    isValidDoc = False
    If doc Is Nothing Then Exit Function
    If doc.Size = 0 Or doc.IsDeleted Or Not doc.IsValid Then Exit Function
    If doc.HasItem("$Conflict") Then Exit Function
    isValidDoc = True
End Function ' isValidDoc

Sometimes, just checking isValid is not enough. I've seen 'ghost' documents pop
into existence with size zero (hypothesis: to enable viewing threaded
discussions, where the original Main document has been deleted). Also, you may
want to evaluate if you want to treat a save-conflict as a valid parent or not.
I generally prefer to have responses to a normal Main document, and not to a
save conflict.
