Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Building and managing a hot failover server for Domino

By Andrew Pollack on 06/05/2008 at 08:27 AM EDT

A pair of clustered servers behind a network dispatcher in your building is easy to manage. What about failover machines that aren’t located in the same city?

I may be a day late and a dollar short, as they say, but I’ve spent a good part of the last couple of days setting up a new server at a data center in Dallas. The new server will act as a hot standby. While Domino has built-in cluster replication to keep itself up to date, there are many other aspects of a complex system to think about. Here are some of the things I’ve done to manage a true failover environment.

IP Address Failover – There are a number of ways to do this, of course, but none are perfect.

You can spend a great deal of money and use something like BGP or another public network failover solution so that your actual IP addresses get re-routed to the other data center. That’s expensive but very effective.

A common approach is to use “Round Robin DNS”, but that has its own issues. It isn’t very predictable, because nothing requires a DNS client to try the first address in the list first, or to try the second one at all if the first fails. Some client software even sorts the returned list of entries by the number of network hops away and picks the closest.

You can use an address dispatching tool like Domino’s ICM. ICM is a good tool – and there are other products like it that are more advanced – but of course you have to worry about the ICM itself staying available and that gets into further redundancy issues.

Finally, there’s the simple expedient of changing the IP address associated with the DNS entry itself and using a very low TTL value so the client side has to re-request the address frequently. That’s effective, but it requires intervention to make the changeover happen. It also means that most of the time you’re generating excessive DNS requests and a slightly longer initial connection time, because you’ve effectively disabled any caching that DNS could do for you.

The best solution, of course, is to build failover directly into the application side. This won’t work so well for web servers, but for proprietary client software, like Lotus Notes, it works very well.
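A minimal Python sketch of what application-side failover can look like: the client holds an ordered list of addresses and uses the first one that accepts a connection. The function name, the host names, and the injectable `connect` parameter are all hypothetical and only there for illustration; this is not how Notes itself does it.

```python
import socket
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def first_reachable(
    candidates: Iterable[Address],
    connect: Callable[..., object] = socket.create_connection,
    timeout: float = 3.0,
) -> Optional[Tuple[Address, object]]:
    """Try each (host, port) in order and return the first that connects.

    Returns (address, connection), or None if every candidate failed.
    """
    for address in candidates:
        try:
            connection = connect(address, timeout=timeout)
        except OSError:
            # Refused, unreachable, timed out, or unresolvable:
            # fall through to the next address in the list.
            continue
        return address, connection
    return None
```

A client would call it with something like `first_reachable([("primary.example.com", 1352), ("standby.example.com", 1352)])` and close the returned connection when done. The order of the list is the failover policy, so no DNS change is needed for the client to find the standby.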

For now, I’m using the DNS change method with a very low TTL. I’m working first on an automatically scripted DNS updater that will take care of the update when the primary server fails. I’m also working on a longer-term solution, which is to build secondary connection addresses directly into the client-side software for Second Signal.
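A scripted updater along those lines might be sketched as follows. This is a rough Python sketch under assumptions, not the actual script: it assumes the zone accepts dynamic updates through BIND’s `nsupdate` tool (a different mechanism from hand-editing the zone file), and the server name, zone, record, TTL, and key file path are all placeholders.

```python
import socket
import subprocess


def primary_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the primary succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def build_nsupdate_script(dns_server: str, zone: str, record: str,
                          ttl: int, new_ip: str) -> str:
    """Build the command text that nsupdate reads on standard input."""
    lines = [
        f"server {dns_server}",
        f"zone {zone}",
        f"update delete {record} A",           # drop the old A record(s)
        f"update add {record} {ttl} A {new_ip}",  # point the name at the standby
        "send",
    ]
    return "\n".join(lines) + "\n"


def apply_update(script: str, key_file: str) -> None:
    """Feed the script to nsupdate, authenticating with a key file (assumed path)."""
    subprocess.run(["nsupdate", "-k", key_file], input=script, text=True, check=True)
```

The check and the update are kept separate so the update text can be reviewed or logged before anything is sent. A single failed connection is a weak signal, so a real updater would want several failed checks in a row before calling `apply_update`.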

Data Synchronization of Non-Domino (On Disk) Data

Keeping data in Domino’s NSF files up to date is easy with cluster replication. Keeping on-disk files, such as those in your …/domino/html/ directory, up to date is much trickier. I’m using a free tool called Unison to do this. Unison works a lot like Domino replication. It builds metadata about files in its own internal database, so that file comparisons are done with hash algorithms and updates are well managed in both directions. On Linux, I’ve built a script to handle this task. This script runs periodically as a cron job, and keeps the Domino html directory, a number of audio directories, and many configuration files in sync on the failover.
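The actual sync script isn’t shown here, so the following is only a sketch of what a cron-driven wrapper around Unison could look like in Python. The roots, host name, and path list are placeholders; `-batch` (ask no questions), `-times` (carry modification times across), and `-path` (limit the run to a sub-path of the roots) are standard Unison options.

```python
import subprocess
from typing import List, Sequence


def build_unison_command(local_root: str, remote_host: str, remote_root: str,
                         paths: Sequence[str]) -> List[str]:
    """Build a non-interactive Unison command line for the given sub-paths."""
    # In "ssh://host//abs/path" the second slash marks an absolute remote path,
    # so an absolute remote_root is appended after a single separator slash.
    remote = f"ssh://{remote_host}/{remote_root}"
    command = ["unison", local_root, remote, "-batch", "-times"]
    for path in paths:
        command += ["-path", path]
    return command


def run_sync(command: Sequence[str]) -> int:
    """Run Unison and hand its exit status back to the caller (e.g. cron)."""
    return subprocess.run(list(command), check=False).returncode
```

Building the command as a list rather than a shell string avoids quoting problems when a directory name contains spaces.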

In addition to the data itself – html data, sound files, and so on – I use Unison to keep many configuration files up to date on both machines. Even though the failover machine is set up as a slave DNS server, I keep the full set of zone definitions up to date on it. If I do have a long-term outage, I can just change the settings in named.conf to make the failover a master, and I haven’t lost any recent updates. This method also keeps my Asterisk configurations and scripts in sync automatically.

Machine Specific Configurations & Scripts

Some of the scripting I do – especially within Asterisk AGI scripts – has to make http or web services calls to the local Domino server for data. Since I want to keep the scripts in sync, I don’t want to hardcode the local server’s IP address. There are a number of ways to handle this. Traditional script development can grab the server’s HOSTNAME environment variable, but since I tie specific IP addresses on a multi-homed machine to different Domino partitions and sites, that’s not good enough. I can set a local environment variable in the shell on each machine, or I can use the /etc/hosts file. My scripted URLs can then use a generic name for the server portion of the URL.
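As a small illustration of the generic-name idea, here is a Python sketch of a URL helper that a synchronized script could share. The alias `domino-local`, the environment variable `DOMINO_LOCAL_HOST`, and the example path are made-up names; each machine would map the alias to its own address in /etc/hosts, or override it through the environment.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode, urlunsplit


def local_domino_url(path: str,
                     query: Optional[Mapping[str, str]] = None,
                     env: Mapping[str, str] = os.environ) -> str:
    """Build a URL for the local Domino server without hardcoding its address."""
    # Per-machine override first, then the generic /etc/hosts alias (both hypothetical).
    host = env.get("DOMINO_LOCAL_HOST", "domino-local")
    return urlunsplit(("http", host, path, urlencode(query or {}), ""))
```

The script itself stays byte-for-byte identical on both machines, which is what lets it live in the synchronized tree; only the hosts entry or the environment variable differs.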

Some configurations are machine specific, or differ depending on whether the machine is acting as the primary or the failover. For example, in Asterisk I use two different carriers for local telephone numbers. The better of the two will connect to my machine by its IP address, and will automatically fail over to a second IP address if the first doesn’t connect. That’s easy. The other provider requires that the server “register” and update its registration every 60 seconds, saying “hey, I’m over here”. If two machines are trying to register, an incoming call will go to whichever registered last. To manage this, I use an “Include” in the configuration files that hold those registrations in Asterisk (in this case, sip.conf and iax.conf). This way, the full configuration file can be synchronized, but the part with the registration configuration can be stored in a non-synchronized directory and thus be different on each machine.
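One way to manage that per-machine include file is to generate it from the machine’s current role. The Python sketch below assumes a two-role model (“primary” and “standby”), a `register => ...` line in the style sip.conf uses, and an include file living outside the synchronized tree; the function names and the file layout are illustrative guesses, not the real setup.

```python
import os
import tempfile
from typing import Sequence


def render_registration_include(role: str, register_lines: Sequence[str]) -> str:
    """Return the include-file text for this machine's role."""
    if role not in ("primary", "standby"):
        raise ValueError(f"unknown role: {role!r}")
    header = f"; role={role} (machine-specific, not synchronized)\n"
    if role == "primary":
        # Only the active machine registers, so calls are not stolen by the standby.
        return header + "".join(line + "\n" for line in register_lines)
    return header + "; standby: no provider registrations on this machine\n"


def write_atomically(path: str, content: str) -> None:
    """Write via a temp file and rename, so a half-written file is never read."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # mkstemp creates the file readable by its owner only; adjust ownership or
    # mode if Asterisk runs as a different user.
    os.replace(tmp_path, path)
```

Promoting the standby then comes down to rewriting one small file with role `primary` and asking Asterisk to reload its configuration.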

What’s left to do?

I’ve got the new machine in place and running. The data synchronization is up and active as well now. What remains is for me to test, test, and test – and then build the automatic scripts for detecting a failure on the primary and automatically cutting over to the secondary by performing the following actions:

1. Update the local configuration “include” files to change their state to the “primary” configuration.

2. Update the DNS zone file and notify the secondaries.

3. Update the VoIP provider to fail over to the secondary machine automatically.

4. Make sure that all the remote software clients are using DNS entries and not IP addresses or hosts files.
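The detect-then-cut-over flow behind the list above could be sketched in Python like this. The consecutive-failure threshold is an added assumption (so a single dropped check doesn’t trigger a cutover), and the steps are passed in as plain callables that stand in for the real actions.

```python
from typing import Callable, Sequence


class FailoverMonitor:
    """Trigger the cutover steps once, after N consecutive failed health checks."""

    def __init__(self, check: Callable[[], bool],
                 steps: Sequence[Callable[[], None]],
                 threshold: int = 3) -> None:
        self.check = check            # returns True while the primary is healthy
        self.steps = list(steps)      # cutover actions, run in order
        self.threshold = threshold    # assumed value; tune to taste
        self.consecutive_failures = 0
        self.failed_over = False

    def poll(self) -> bool:
        """Run one health check; return True if this poll triggered the cutover."""
        if self.failed_over:
            return False  # already cut over; failing back is a manual decision
        if self.check():
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.threshold:
            return False
        for step in self.steps:
            step()
        self.failed_over = True
        return True
```

Called from cron or a simple loop on the standby, `poll()` would be handed a health check for the primary and the cutover actions as `steps`. Keeping the steps as separate callables makes each one testable on its own before trusting the whole chain.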

Longer term, I hope to consolidate much of this. That means building auto-failover to a secondary DNS name into the client software, and creating a single “Localized” configuration directory on the servers that includes everything not automatically kept in sync, along with a “README” file with a checklist.

What about you? Is your failover plan up to date?


re: Building and managing a hot failover server for Domino
By Mike Sweeney on 06/11/2008 at 07:58 PM EDT
Hey Andy

Your a geek! :)

