Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Building and managing a hot failover server for Domino

By Andrew Pollack on 06/05/2008 at 08:27 AM EDT

A pair of clustered servers behind a network dispatcher in your building is easy to manage. What about failover machines that aren’t located in the same city?

I may be a day late and a dollar short, as they say, but I’ve spent a good bit of the last couple of days implementing a new server at a data center in Dallas. The new server will act as a hot standby. While Domino has built-in cluster replication to keep itself up to date, there are many other aspects of a complex system to think about. Here are some of the things I’ve done to manage a true failover environment.

IP Address Failover – There are a number of ways to do this, of course, but none are perfect.

You can spend a great deal of money and use something like BGP or another public network failover solution so that your actual IP addresses get re-routed to the other data center. That’s expensive but very effective.

A common approach is to use “Round Robin DNS”, but that has its own issues. It isn’t very predictable, because nothing requires a DNS client to try the first address in the list first, or to try the second one at all. Some client software even sorts the returned list of entries by the number of network hops away and picks the closest.

You can use an address dispatching tool like Domino’s ICM. ICM is a good tool – and there are other products like it that are more advanced – but of course you have to worry about the ICM itself staying available and that gets into further redundancy issues.

Finally, there’s the simple expedient of changing the IP address associated with the DNS entry itself and using a very low TTL value so the client side has to re-request the address frequently. That’s effective, but it requires intervention to make the changeover happen. It also means that most of the time you’re getting excessive DNS request hits and a slightly longer initial connection time, because you’ve effectively disabled any caching that DNS can do for you.

The best solution, of course, is to build failover directly into the application side. This won’t work so well for web servers, but for proprietary client software such as Lotus Notes, it works very well.

For now, I’m using the DNS change method with a very low TTL. I’m working first on an automatically scripted DNS updater that will make the change when the primary server fails, and also on a longer-term solution, which is to build secondary connection addresses directly into the client-side software for Second Signal.
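As a rough illustration of the detection half of such an updater, here is a minimal Python sketch. It is not the script described in this post. The class name, the threshold of three consecutive failures, and the host name in the usage comment are all assumptions made for the example. The idea is simply to avoid cutting over on a single dropped connection.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Call on_failover once, after `threshold` consecutive failed checks."""

    def __init__(self, check: Callable[[], bool],
                 on_failover: Callable[[], None], threshold: int = 3) -> None:
        self.check = check
        self.on_failover = on_failover
        self.threshold = threshold
        self.failures = 0
        self.failed_over = False

    def poll(self) -> None:
        """Run one health check; intended to be called on a timer or from cron."""
        if self.failed_over:
            return  # already cut over, nothing more to do
        if self.check():
            self.failures = 0  # any success resets the counter
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.failed_over = True
            self.on_failover()


# Hypothetical usage: watch the Notes port (1352) on the primary and run a
# DNS update function of your own when it stays down.
#   monitor = FailoverMonitor(
#       check=lambda: port_is_open("primary.example.com", 1352),
#       on_failover=update_dns_to_standby,   # hypothetical function
#   )
#   monitor.poll()
```

Requiring several failures in a row trades a slower cutover for fewer false alarms. The right threshold depends on how often the check runs.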

Data Synchronization of Non-Domino (On Disk) Data

Keeping data in Domino’s NSF files up to date is easy with cluster replication. Keeping on-disk files such as those in your …/domino/html/ directory up to date is much trickier. I’m using a free tool called Unison to do this. Unison works a lot like Domino replication. It builds metadata about files in its own internal database, so that file comparisons are done with hash algorithms and updates are well managed in both directions. On Linux, I’ve built a script to handle this task. The script runs periodically as a cron job, and keeps the Domino html directory, a bunch of audio directories, and many configuration files in sync on the failover.
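A cron-driven wrapper along these lines might look like the sketch below. It is written in Python rather than as the shell script mentioned above, and the root directories, host name, and path names are placeholders invented for the example. The Unison options used are `-batch` (ask no questions), `-silent` (print only errors), and `-path` (limit the sync to a path under the roots).

```python
import subprocess
from typing import List, Sequence


def build_unison_command(local_root: str, remote_root: str,
                         paths: Sequence[str]) -> List[str]:
    """Build a non-interactive Unison command line for the given roots.

    Each entry in `paths` is relative to both roots and is passed with -path,
    so only those subtrees are synchronized.
    """
    cmd = ["unison", local_root, remote_root, "-batch", "-silent"]
    for path in paths:
        cmd += ["-path", path]
    return cmd


def run_sync(local_root: str, remote_root: str, paths: Sequence[str]) -> int:
    """Run Unison and return its exit code (0 means a clean sync)."""
    cmd = build_unison_command(local_root, remote_root, paths)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Placeholder roots and paths; substitute your own directories.
    code = run_sync(
        "/srv/shared",
        "ssh://standby.example.com//srv/shared",
        ["html", "sounds", "config"],
    )
    raise SystemExit(code)
```

Returning the exit code lets cron mail you when a run does not finish cleanly, which matters because batch mode skips conflicting files rather than asking about them.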

In addition to the data itself – html data, sound files, and so on – I use Unison to keep many configuration files up to date on both machines. Even though the failover machine is set up as a slave DNS, I keep the full set of zone definitions fully up to date on it. If I do have a long-term outage, I can just change the settings in named.conf to make the failover a master and I haven’t lost any recent updates. This method also keeps my Asterisk configurations and scripts in sync automatically.

Machine Specific Configurations & Scripts

Some of the scripting I do – especially within Asterisk AGI scripts – has to make HTTP or web services calls to the local Domino server for data. Since I want to keep the scripts in sync, I don’t want to hardcode the local server’s IP address. There are a number of ways to handle this. Traditional script development can grab the server’s HOSTNAME environment variable, but since I tie specific IP addresses on a multi-homed machine to different Domino partitions and sites, that’s not good enough. I can set a local environment variable on the shell for each machine, or I can use the /etc/hosts file. My scripted URLs can then use a generic name for the server portion of the URL.
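To make that concrete, here is a small Python sketch of the generic-name idea. The alias `domino-local`, the environment variable `DOMINO_LOCAL_HOST`, and the database path in the usage comment are all invented for illustration. Each machine would map the alias to its own address in /etc/hosts, or override it through the environment variable.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlencode, urlunsplit

# Hypothetical alias; each machine's /etc/hosts would carry a line such as
#   192.0.2.10   domino-local
# pointing at that machine's own Domino address.
DEFAULT_ALIAS = "domino-local"


def local_domino_host() -> str:
    """Return the per-machine override if set, otherwise the generic alias."""
    return os.environ.get("DOMINO_LOCAL_HOST", DEFAULT_ALIAS)


def local_domino_url(path: str,
                     params: Optional[Mapping[str, str]] = None) -> str:
    """Build an http URL on the local Domino server without a hardcoded IP."""
    query = urlencode(params or {})
    return urlunsplit(("http", local_domino_host(), path, query, ""))


# Hypothetical usage inside a synchronized script:
#   url = local_domino_url("/app.nsf/lookup", {"id": "42"})
```

Because the script only ever sees the alias, the same file can be synchronized to both machines unchanged.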

Some configurations are machine specific, or differ depending on whether the machine is acting as the primary or the failover. For example, in Asterisk I use two different carriers for local telephone numbers. The better of the two will connect to my machine by its IP address, and will automatically fail over to a second IP address if the first doesn’t connect. That’s easy. The other provider requires that the server “register” and update its registration every 60 seconds, saying “hey, I’m over here”. If two machines are trying to register, incoming calls will go to whichever one registered last. To manage this, I use an “Include” in the configuration files that hold those registrations in Asterisk (in this case, sip.conf and iax.conf). This way, the full configuration file can be synchronized, but the part with the registration configuration can be stored in a non-synchronized directory and thus be different on each machine.
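One way to picture the split: the synchronized sip.conf carries an `#include` line pointing at a file in a local-only directory, and a small script writes that file according to the machine’s current role. The Python sketch below is an assumption about how that could be done, not the setup described here. The file location, role names, and account string are made up. The active `register => user:secret@host` form and the `;` comment marker are standard Asterisk configuration syntax.

```python
from pathlib import Path
from typing import Iterable

VALID_ROLES = ("primary", "standby")


def render_registration_include(role: str, registrations: Iterable[str]) -> str:
    """Return the text of the machine-local include file.

    On the primary the register lines are active. On the standby they are
    commented out with ';' so the machine never competes for incoming calls.
    """
    if role not in VALID_ROLES:
        raise ValueError(f"unknown role: {role!r}")
    prefix = "" if role == "primary" else ";"
    lines = [f"; role: {role} (machine-local file, never synchronized)"]
    lines += [f"{prefix}register => {reg}" for reg in registrations]
    return "\n".join(lines) + "\n"


def write_registration_include(path: Path, role: str,
                               registrations: Iterable[str]) -> None:
    """Write the include file; reload Asterisk afterwards to apply it."""
    path.write_text(render_registration_include(role, registrations))


# Hypothetical usage, with sip.conf containing a line like
#   #include "local/sip_register.conf"
#   write_registration_include(
#       Path("/etc/asterisk/local/sip_register.conf"),
#       "standby",
#       ["myaccount:secret@sip.example.net"],
#   )
```

Keeping the standby’s lines present but commented out makes the difference between the two machines easy to see at a glance.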

What’s left to do?

I’ve got the new machine in place and running. The data synchronization is up and active as well now. What remains is for me to test, test, and test – and then build the automatic scripts for detecting a failure on the primary and automatically cutting over to the secondary by performing the following actions:

1. Update the local configuration “include” files to change their state to the “primary” configuration.

2. Update the DNS zone file and notify the secondaries.

3. Update the VoIP provider to fail over to the secondary machine automatically.

4. Make sure that all the remote software clients are using DNS entries and not IP addresses or host files.
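Step 2 in the list above is the most mechanical, so here is a minimal Python sketch of it under some stated assumptions: the zone file marks its SOA serial with a `; serial` comment, the record to change is a plain A record on a single line, and the record name and addresses are placeholders. It only edits the text. Reloading the zone so the name server notifies the secondaries (for example with `rndc reload` on BIND) is left to the caller.

```python
import re


def bump_serial(zone_text: str) -> str:
    """Increment the SOA serial, assumed to be tagged with a '; serial' comment."""
    def add_one(match: "re.Match[str]") -> str:
        return f"{int(match.group(1)) + 1}{match.group(2)}"

    new_text, count = re.subn(r"(\d+)(\s*;\s*serial)", add_one, zone_text,
                              count=1, flags=re.IGNORECASE)
    if count != 1:
        raise ValueError("no '; serial' marker found in zone text")
    return new_text


def repoint_a_record(zone_text: str, name: str, new_ip: str) -> str:
    """Replace the address of every A record for `name` with new_ip."""
    pattern = re.compile(
        rf"^({re.escape(name)}\s+(?:\d+\s+)?(?:IN\s+)?A\s+)\S+",
        re.MULTILINE,
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + new_ip, zone_text)
    if count == 0:
        raise ValueError(f"no A record found for {name!r}")
    return new_text


def cut_over_zone(zone_text: str, name: str, standby_ip: str) -> str:
    """Point `name` at the standby and bump the serial so secondaries refresh."""
    return bump_serial(repoint_a_record(zone_text, name, standby_ip))
```

Bumping the serial matters as much as changing the address. Without a higher serial, the secondaries have no reason to pull the new zone.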

Longer term, I hope to consolidate much of this by building auto-failover to a secondary DNS name into the client software, and by creating a single “Localized” configuration directory on the servers that holds everything not automatically kept in sync, along with a “README” file with a checklist.

What about you? Is your failover plan up to date?



re: Building and managing a hot failover server for Domino
By Mike Sweeney on 06/11/2008 at 07:58 PM EDT
Hey Andy

Your a geek! :)

Mike

