Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

The Killer of Orphans (Orphan Documents)

By Andrew Pollack on 08/06/2015 at 10:40 AM EDT

Those damn orphans are harder to kill than you think.

Maybe you'll spot the error faster than I did -- or maybe this will help you.

I have a customer with a help desk application created in the mid-1990s. It started causing major issues, so I looked into it and found it had grown to over 20 GB in size. A check of the database properties showed a whopping 453,355 documents, and of course, many of those have screenshots. When I spoke to the client, she swore she'd deleted everything older than 1/1/2015 and could see only a few thousand documents.

Well, by now you know what happened: she had been deleting main documents and leaving all the responses as orphans. The application did not have any on-delete code to clean that sort of thing up. You'd be surprised how many do not.

I decided to write some code that would look at every document in the database, make sure it was not a configuration record, see if it was a response, and if so, check whether the parent document existed. If not, kill the document. I did lots of fancy things with list elements to cache known and unknown UNIDs to speed it up, but basically that's what it did. It was also designed to make repeated passes through the database so that it could pick out response-to-response documents that became orphaned in the previous pass. Yes, I could have done this from within a view, or by following the chain of parent documents all the way up with each document to avoid repeating the loop, but that has its own issues as well, and I didn't feel like writing a recursive function just for this "simple" task.
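The caching code itself isn't shown in this post, so the following is only a minimal sketch of how a LotusScript list can remember which parent UNIDs have already been checked. The names parentVerdict, LookupVerdict, and RememberVerdict are hypothetical, and the verdict is passed in by the caller, so nothing here depends on how the parent is actually looked up.

```lotusscript
' Sketch only: all names here are hypothetical, not taken from the original agent.
' Declare the list at module level, for example in (Declarations).
Dim parentVerdict List As Boolean   ' key: parent UNID, value: True if that parent is usable

' Returns True when this UNID has been checked before, and hands the
' stored answer back through the verdict parameter (passed by reference).
Function LookupVerdict(unid As String, verdict As Boolean) As Boolean
    Dim known As Boolean
    known = IsElement(parentVerdict(unid))
    If known Then verdict = parentVerdict(unid)
    LookupVerdict = known
End Function

' Stores the answer for a UNID so sibling responses can reuse it.
Sub RememberVerdict(unid As String, verdict As Boolean)
    parentVerdict(unid) = verdict
End Sub

' Between passes the answers go stale (a parent that was usable may now
' be deleted), so the whole list would be cleared with:
'     Erase parentVerdict
```

Because every response under the same parent carries the same parent UNID, the second and later siblings can skip the document load entirely.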

The thing is, it didn't work. It kept not finding orphans. It turns out an old and well-known problem was manifesting in a whole new way.

To find a parent, I was using code like this (simplified by removing declarations and all the hash-based caching that avoids repeated document loads):

If doc.IsResponse Then
    Set parentDoc = Nothing ' make sure I don't have an old one still there
    On Error Resume Next ' don't throw errors for bad UNIDs
    Set parentDoc = thisdb.GetDocumentByUNID( doc.ParentDocumentUNID )
    On Error GoTo errorhandle ' re-establish my normal error handling
    If parentDoc Is Nothing Then
        ' *** Do whatever it is I do to an orphan document ***
    End If
End If

Can you spot why it failed?

Sadly, it took me a long time to realize that if the parent formerly existed, there would be a deletion stub. The deletion stub still in the database meant that the "parentDoc" object was still set to a document object, just not a valid one. Testing it to see if it was "Nothing" wouldn't work. After way too many hours, I changed the code to look like this:

If doc.IsResponse Then
    Set parentDoc = Nothing ' make sure I don't have an old one still there
    On Error Resume Next ' don't throw errors for bad UNIDs
    Set parentDoc = thisdb.GetDocumentByUNID( doc.ParentDocumentUNID )
    On Error GoTo errorhandle ' re-establish my normal error handling
    haveParentBoolean = True
    If parentDoc Is Nothing Then
        haveParentBoolean = False ' the lookup found nothing at all
    Else
        ' a deletion stub comes back as a handle, but not a valid one
        If Not parentDoc.IsValid Then haveParentBoolean = False
    End If
    If Not haveParentBoolean Then
        ' *** Do whatever it is I do to an orphan document ***
    End If
End If

There are two different ways to fail when looking up the parent: either getting nothing at all, or getting an invalid (deleted) document handle. This is very much the same reason why you always have to check for .IsValid when looping through a collection. A deleted document handle is not "Nothing"; it's just not usable.
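Putting both checks together with the repeated passes described earlier, a sweep could look roughly like the sketch below. It is an illustration under stated assumptions, not the agent that was actually run: the function names are hypothetical, "Configuration" is a placeholder form name for whatever marks a configuration record, and the caching is left out for brevity.

```lotusscript
' Sketch only. ParentIsUsable and SweepOrphans are hypothetical names.
Function ParentIsUsable(db As NotesDatabase, doc As NotesDocument) As Boolean
    Dim parentDoc As NotesDocument    ' starts out as Nothing
    ParentIsUsable = False
    On Error Resume Next              ' a bad UNID raises an error rather than returning Nothing
    Set parentDoc = db.GetDocumentByUNID(doc.ParentDocumentUNID)
    On Error GoTo 0
    If parentDoc Is Nothing Then Exit Function      ' failure 1: nothing came back
    If Not parentDoc.IsValid Then Exit Function     ' failure 2: deletion stub
    ParentIsUsable = True
End Function

' Removes orphaned responses and returns how many were removed in total.
Function SweepOrphans(db As NotesDatabase) As Long
    Dim col As NotesDocumentCollection
    Dim doc As NotesDocument
    Dim nextDoc As NotesDocument
    Dim removedThisPass As Long
    Dim total As Long
    Do
        removedThisPass = 0
        Set col = db.AllDocuments
        Set doc = col.GetFirstDocument()
        Do Until doc Is Nothing
            Set nextDoc = col.GetNextDocument(doc)   ' fetch the next one before any Remove
            If doc.IsValid Then                      ' skip stubs in the collection itself
                ' "Configuration" is an assumed form name, not from the original application
                If doc.IsResponse And doc.GetItemValue("Form")(0) <> "Configuration" Then
                    If Not ParentIsUsable(db, doc) Then
                        Call doc.Remove(True)        ' assumption: "kill" means a forced delete
                        removedThisPass = removedThisPass + 1
                    End If
                End If
            End If
            Set doc = nextDoc
        Loop
        total = total + removedThisPass
    Loop While removedThisPass > 0    ' another pass catches responses orphaned by this one
    SweepOrphans = total
End Function
```

The outer loop stops after the first pass that removes nothing. A response-to-response document whose parent was removed earlier is caught because that parent now resolves to a deletion stub, which fails the IsValid test.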

The result -- the database size on disk is down from 20 GB to 261 MB, and from 453,355 documents to 8,110.



re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/07/2015 at 09:47 PM EDT
So if I understand the issue correctly, the moral of the story is to add an "If
doc.IsValid Then" before processing a doc retrieved via unid or, I'm guessing,
by extension by noteid.

But the other part of this I didn't realize was that not getting a hit using
unid doesn't simply result in doc = Nothing, that instead it throws an error.

But according to Notes documentation, not getting a hit using noteid doesn't
throw an error.

I'm not sure I really get the point of Notes doing that, but at least now I
know. Thanks.

re: The Killer of Orphans (Orphan Documents) By Andrew Pollack on 08/08/2015 at 08:01 AM EDT
Tim, also critical in a collection -- whether from a search, a view, or a
database.allDocuments. You always want to check for .isvalid. I just hadn't
thought of it in terms of NOT finding a document.

re: The Killer of Orphans (Orphan Documents) By Timothy Briley on 08/08/2015 at 06:00 PM EDT
It makes sense. Since defensive coding is always a good idea, "If doc.IsValid
Then" probably should be a standard part of code, just like routing errors to
OpenLog, etc.

In 11 days I'll be in ATL for MWLUG. If you are there and see me, look me up.
I'll buy you a beer!

re: The Killer of Orphans (Orphan Documents) By Lars Berntrop-Bos on 08/22/2015 at 08:04 PM EDT
I use this function:
%REM
Function isValidDoc
Description: Returns if the supplied NotesDocument is a valid useable document
%END REM
Function isValidDoc(doc As NotesDocument) As Boolean
    isValidDoc = False
    If doc Is Nothing Then Exit Function
    If doc.Size = 0 Or doc.IsDeleted Or Not doc.IsValid Then Exit Function
    If doc.HasItem("$Conflict") Then Exit Function
    isValidDoc = True
End Function ' isValidDoc

Sometimes, just checking isValid is not enough. I've seen 'ghost' documents pop
into existence with size zero (hypothesis: to enable viewing threaded
discussions, where the original Main document has been deleted). Also, you may
want to evaluate if you want to treat a save-conflict as a valid parent or not.
I generally prefer to have responses to a normal Main document, and not to a
save conflict.



