Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The Killer of Orphans (Orphan Documents)

By Andrew Pollack on 08/06/2015 at 10:40 AM EDT

Those damn orphans are harder to kill than you think.

Maybe you'll spot the error faster than I did -- or maybe this will help you.

I have a customer with a help desk application created in the mid-1990s. It started causing major issues, so I looked into it and found it had grown to over 20 GB in size. A check of the database properties showed a whopping 453,355 documents, and of course, many of those have screenshots. When I spoke to the client, she swore she'd deleted everything older than 1/1/2015 and could see only a few thousand documents.

Well, by now you know what happened: she had been deleting main documents and leaving all the responses as orphans. The application did not have any on-Delete code to clean that sort of thing up. You'd be surprised how many do not.

I decided to write some code that would look at every document in the database, make sure it was not a configuration record, see if it was a response, and if so, check whether the parent document existed. If not, kill the document. I did lots of fancy things with list elements to cache known and unknown UNIDs to speed it up, but basically that's what it did. It was also designed to make repeated passes through the database so that it could pick out response-to-response documents that became orphaned in the previous pass. Yes, I could have done this from within a view, or by following the chain of parent documents all the way up with each document to avoid repeating the loop, but that has its own issues as well and I didn't feel like writing a recursive function just for this "simple" task.

The thing is, it didn't work. It kept not finding orphans. It turns out an old and well known problem was manifesting in a whole new way.

To find a parent, I was using code like this (simplified by removing declarations and all the hash-based caching that avoids repeated document loads):

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUNID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    if parentDoc is nothing then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

Can you spot why it failed?

Sadly, it took me a long time to realize that if the parent formerly existed, there would be a deletion stub. The deletion stub still in the database meant that the "parentDoc" object was still set to a document object, just not a valid one. Testing it to see if it was "Nothing" wouldn't work. After way too many hours, I changed the code to look like this:

if doc.isresponse then
    set parentDoc = nothing ' make sure I don't have an old one still there
    on error resume next ' don't throw errors for bad UNIDs
    set parentDoc = thisdb.getDocumentByUNID( doc.parentdocumentunid )
    on error goto errorhandle ' re-establish my normal error handling
    haveParentBoolean = true
    if parentDoc is nothing then
        haveParentBoolean = false
    else
        if not parentDoc.isValid then haveParentBoolean = false
    end if
    if haveParentBoolean = false then
        ' *** Do whatever it is I do to an orphan document ***
    end if
end if

There are two different ways to fail looking up the parent: either you get nothing at all, or you get an invalid (deleted) document handle. This is very much the same reason why you always have to check for .isvalid when looping through a collection. A deleted document handle is not "nothing", it's just not usable.
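For anyone who wants the whole thing in one place, here is a rough sketch of how the repeated-pass cleanup could look with both checks in it. This is not my production code. The names ParentIsMissing, RemoveOrphans and missingUnids are made up for this example, the configuration-record test is left out, and the cache only remembers UNIDs known to be missing (a "known good" parent can stop being good once its own orphaned parent chain gets cleaned up). Note the nested If: LotusScript evaluates both sides of an Or, so you can't test "Is Nothing" and ".IsValid" in one expression.

```lotusscript
Option Declare

' Hypothetical cache: UNIDs already known to have no usable document behind them.
Dim missingUnids List As Boolean

' True when the response's parent is unusable: the lookup failed or returned
' Nothing, or it returned a deletion stub (a handle that is not valid).
Function ParentIsMissing(db As NotesDatabase, doc As NotesDocument) As Boolean
    Dim parentUnid As String
    Dim parentDoc As NotesDocument
    Dim missing As Boolean

    parentUnid = doc.ParentDocumentUNID
    If IsElement(missingUnids(parentUnid)) Then
        ParentIsMissing = True
        Exit Function
    End If

    On Error Resume Next      ' a bad UNID raises an error rather than returning Nothing
    Set parentDoc = db.GetDocumentByUNID(parentUnid)
    On Error GoTo 0           ' a real agent would restore its own handler here

    If parentDoc Is Nothing Then
        missing = True
    Else
        If Not parentDoc.IsValid Then missing = True   ' deletion stub
    End If

    If missing Then missingUnids(parentUnid) = True
    ParentIsMissing = missing
End Function

' Makes passes over the database until a pass removes nothing, so responses
' orphaned by an earlier pass are caught on a later one.
Sub RemoveOrphans(db As NotesDatabase)
    Dim col As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Dim removedThisPass As Long

    Do
        removedThisPass = 0
        Set col = db.AllDocuments
        Set doc = col.GetFirstDocument()
        While Not doc Is Nothing
            Set nextDoc = col.GetNextDocument(doc)   ' fetch before a possible Remove
            If doc.IsValid Then                      ' skip deleted handles in the collection
                If doc.IsResponse Then
                    If ParentIsMissing(db, doc) Then
                        missingUnids(doc.UniversalID) = True   ' its children are orphans now too
                        Call doc.Remove(True)
                        removedThisPass = removedThisPass + 1
                    End If
                End If
            End If
            Set doc = nextDoc
        Wend
    Loop While removedThisPass > 0
End Sub
```

You would call RemoveOrphans with the current database from an agent, and run it against a copy first. The removed documents leave deletion stubs of their own, which is exactly why the .IsValid test on the parent matters on the second and later passes.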

The result -- the database size on disk is down from 20 GB to 261 MB, and from 453,355 documents to 8,110.



re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/07/2015 at 09:47 PM EDT
So if I understand the issue correctly, the moral of the story is to add an "If
doc.IsValid Then" before processing a doc retrieved via unid or, I'm guessing,
by extension by noteid.

But the other part of this I didn't realize was that not getting a hit using
unid doesn't simply result in doc = Nothing, that instead it throws an error.

But according to Notes documentation, not getting a hit using noteid doesn't
throw an error.

I'm not sure I really get the point of Notes doing that, but at least now I
know. Thanks.
re: The Killer of Orphans (Orphan Documents) By Andrew Pollack on 08/08/2015 at 08:01 AM EDT
Tim, also critical in a collection -- whether from a search, a view, or a
database.allDocuments. You always want to check for .isvalid. I just hadn't
thought of it in terms of NOT finding a document.
re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/08/2015 at 06:00 PM EDT
It makes sense. Since defensive coding is always a good idea, "If doc.IsValid
Then" probably should be a standard part of code, just like routing errors to
OpenLog, etc.

In 11 days I'll be in ATL for MWLUG. If you are there and see me, look me up.
I'll buy you a beer!
re: The Killer of Orphans (Orphan Documents) By Lars Berntrop-Bos on 08/22/2015 at 08:04 PM EDT
I use this function:
%REM
Function isValidDoc
Description: Returns if the supplied NotesDocument is a valid useable
document
%END REM
Function isValidDoc(doc As NotesDocument) As Boolean
    isValidDoc = False
    If doc Is Nothing Then Exit Function
    If doc.Size = 0 Or doc.IsDeleted Or Not doc.IsValid Then Exit Function
    If doc.HasItem("$Conflict") Then Exit Function
    isValidDoc = True
End Function ' isValidDoc

Sometimes, just checking isValid is not enough. I've seen 'ghost' documents pop
into existence with size zero (hypothesis: to enable viewing threaded
discussions, where the original Main document has been deleted). Also, you may
want to evaluate if you want to treat a save-conflict as a valid parent or not.
I generally prefer to have responses to a normal Main document, and not to a
save conflict.

