Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Building and managing a hot failover server for Domino

By Andrew Pollack on 06/05/2008 at 08:27 AM EDT

A pair of clustered servers behind a network dispatcher in your building is easy to manage. What about failover machines that aren’t located in the same city?

I may be a day late and a dollar short, as they say, but I’ve spent a good bit of the last couple of days implementing a new server at a data center in Dallas. The new server will act as a hot standby. While Domino has built-in cluster replication to keep its own databases up to date, a complex system has many other parts to think about. Here are some of the things I’ve done to manage a true failover environment.

IP Address Failover – There are a number of ways to do this, of course, but none are perfect.

You can spend a great deal of money and use something like BGP or another public network failover solution so that your actual IP addresses get re-routed to the other data center. That’s expensive but very effective.

A common approach is to use “Round Robin DNS”, but that has its own issues. It isn’t very predictable, because nothing requires a DNS client to try the first address in the list first, or to try the second one at all. Some client software even sorts the returned list of entries by the number of network hops away and picks the closest.

You can use an address dispatching tool like Domino’s ICM. ICM is a good tool – and there are other products like it that are more advanced – but of course you have to worry about the ICM itself staying available and that gets into further redundancy issues.

Finally, there’s the simple expedient of changing the IP address associated with the DNS entry itself and using a very low TTL value so the client side has to re-request the address frequently. That’s effective, but it requires intervention to make the changeover happen. It also means that most of the time you’re taking excessive DNS request hits and a slightly longer initial connection time, because you’ve effectively disabled any caching that DNS can do for you.

The best solution, of course, is to build failover directly into the application side. This won’t work so well for web servers, but for proprietary client software like Lotus Notes it works very well.
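
As a rough illustration of application-side failover, here is a minimal Python sketch in which the client holds an ordered list of addresses and moves on to the next one when a connection attempt fails. The `connect_with_failover` helper and the host names are hypothetical; this is not how Notes or Second Signal actually implement it.

```python
import socket


def connect_with_failover(endpoints, timeout=3.0, connect=socket.create_connection):
    """Try each (host, port) in order; return (endpoint, connection) for the first success.

    `connect` is injectable so the logic can be exercised without a network.
    """
    last_error = None
    for endpoint in endpoints:
        try:
            return endpoint, connect(endpoint, timeout)
        except OSError as exc:
            # Remember why this endpoint failed, then fall through to the next one.
            last_error = exc
    raise ConnectionError(f"no endpoint reachable: {last_error}")


# Hypothetical usage: primary first, hot standby second.
# endpoint, conn = connect_with_failover(
#     [("primary.example.com", 1352), ("standby.example.com", 1352)]
# )
```

The order of the list is the failover policy, so the client never depends on DNS caching or TTL values to find the standby.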

For now, I’m using the DNS change method with a very low TTL. I’m working first on an automatically scripted DNS updater that will take care of the update when the primary server fails. The longer term solution is to build secondary connection addresses directly into the client side software for Second Signal.
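
One way such a scripted updater might work is sketched below. It assumes the zone accepts dynamic updates and that BIND's `nsupdate` tool is available, which is a different mechanism from editing the zone file by hand. The function name, server name, record, and addresses are placeholders for illustration only.

```python
def build_nsupdate_script(dns_server, zone, record, new_ip, ttl=60):
    """Return nsupdate commands that replace the A record for `record` with `new_ip`.

    A low TTL (60 seconds here, an arbitrary choice) keeps clients re-resolving often.
    """
    lines = [
        f"server {dns_server}",
        f"zone {zone}",
        f"update delete {record} A",            # drop the old address(es)
        f"update add {record} {ttl} A {new_ip}",  # publish the standby address
        "send",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: point the service name at the standby machine.
script = build_nsupdate_script(
    "ns1.example.com", "example.com", "app.example.com.", "198.51.100.20"
)
```

The resulting text could be fed to `nsupdate` on standard input, for example with a key file via `-k`, once a health check has decided the primary is really down.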

Data Synchronization of Non-Domino (On Disk) Data

Keeping data in Domino’s NSF files up to date is easy with cluster replication. Keeping on-disk files such as those in your …/domino/html/ directory up to date is much trickier. I’m using a free tool called Unison to do this. Unison works a lot like Domino replication. It keeps metadata about files in its own internal database, so file comparisons are done with hashes and updates are well managed in both directions. On Linux, I’ve built a script to handle this task. The script runs periodically as a cron job and keeps the Domino html directory, a bunch of audio directories, and many configuration files in sync on the failover.
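
The cron job itself is not shown here, so the following is only a sketch of what a periodic Unison run could look like when driven from Python. The root directory, host name, and sub-paths are hypothetical; `-batch` (no interactive prompts) and `-path` (limit the sync to a sub-path of the roots) are standard Unison options.

```python
import subprocess


def unison_command(local_root, remote_host, remote_root, paths=()):
    """Build a non-interactive Unison invocation for two roots.

    With an absolute remote_root such as "/srv/sync-root", the remote root
    becomes ssh://host//srv/sync-root (the double slash marks an absolute path).
    """
    cmd = ["unison", local_root, f"ssh://{remote_host}/{remote_root}", "-batch"]
    for path in paths:
        cmd += ["-path", path]  # each path is relative to both roots
    return cmd


def run_sync(cmd):
    """Run Unison and return its exit code; 0 indicates a successful run."""
    return subprocess.run(cmd, check=False).returncode


# Hypothetical roots and sub-paths; a cron entry would call run_sync(cmd).
cmd = unison_command("/srv/sync-root", "standby.example.com", "/srv/sync-root",
                     paths=["html", "sounds"])
```

Keeping the command construction separate from the run makes it easy to log exactly what cron executed when a sync goes wrong.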

In addition to the data itself – html data, sound files, and so on – I use Unison to keep many configuration files up to date on both machines. Even though the failover machine is set up as a slave DNS, I keep the full set of zone definitions fully up to date on it. If I do have a long term outage, I can just change the settings in named.conf to make the failover a master, and I haven’t lost any recent updates. This method also keeps my Asterisk configurations and scripts in sync automatically.

Machine Specific Configurations & Scripts

Some of the scripting I do – especially within Asterisk AGI scripts – has to make http or web services calls to the local Domino server for data. Since I want to keep the scripts in sync, I don’t want to hardcode the local server’s IP address. There are a number of ways to handle this. Traditional script development can grab the server’s HOSTNAME environment variable, but since I tie specific IP addresses on a multi-homed machine to different Domino partitions and sites, that’s not good enough. I can set a local environment variable in the shell on each machine, or I can use the /etc/hosts file. My scripted URLs can then use a generic name for the server portion of the URL.
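
To make the idea concrete, here is a small sketch of a script helper that builds its URL from a generic name. The name `domino-local`, the `DOMINO_LOCAL_HOST` variable, and the example path are invented for illustration; each machine would map the generic name to its own address in /etc/hosts.

```python
import os
from urllib.parse import urlencode

# Each machine's /etc/hosts would carry its own mapping, for example:
#   192.0.2.10   domino-local     (on the primary)
#   198.51.100.20 domino-local    (on the failover)


def local_domino_url(path, params=None, default_host="domino-local"):
    """Build a URL to the local Domino server without hardcoding an IP address.

    An environment variable (hypothetical name) overrides the generic host name.
    """
    host = os.environ.get("DOMINO_LOCAL_HOST", default_host)
    query = f"?{urlencode(params)}" if params else ""
    return f"http://{host}/{path.lstrip('/')}{query}"
```

Because the script text is identical on both machines, it can be synchronized freely while still talking to whichever server is local.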

Some configurations are machine specific, or differ depending on whether the machine is acting as a primary or a failover. For example, in Asterisk I use two different carriers for local telephone numbers. The better of the two will connect to my machine by its IP address, and will automatically fail over to a second IP address if the first doesn’t connect. That’s easy. The other provider requires that the server “register” and update its registration every 60 seconds, saying “hey, I’m over here”. If two machines are trying to register, an incoming call will go to whichever one registered last. To manage this, I use an “Include” in the configuration files for those registrations in Asterisk (in this case, sip.conf and iax.conf). This way, the full configuration file can be synchronized, but the part with the registration configuration can be stored in a non-synchronized directory and thus be different on each machine.
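
A sketch of how the machine-local include might be generated follows. It assumes the synchronized sip.conf pulls in a local file with Asterisk's `#include` directive and that the standby keeps its registration lines commented out with `;`. The file path, the credentials, and the `registration_include` helper are all hypothetical.

```python
# The synchronized sip.conf would contain a line such as (hypothetical path):
#   #include "local/sip_register.conf"
# The file it points to lives outside the synchronized directories.


def registration_include(role, register_lines):
    """Render the local include: live register lines on the primary, commented out on the standby."""
    if role not in ("primary", "standby"):
        raise ValueError(f"unknown role: {role}")
    prefix = "" if role == "primary" else ";"  # ';' starts a comment in Asterisk configs
    body = "\n".join(f"{prefix}{line}" for line in register_lines)
    return f"; generated for role={role}\n{body}\n"


# Placeholder credentials and provider host.
standby_text = registration_include(
    "standby", ["register => user:secret@sip.example.com"]
)
```

Promoting the failover then amounts to rewriting this one small file with the role set to primary and reloading the channel configuration.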

What’s left to do?

I’ve got the new machine in place and running. The data synchronization is up and active as well now. What remains is for me to test, test, and test – and then build the automatic scripts for detecting a failure of the primary and automatically cutting over to the secondary by performing the following actions:

1. Update the local configuration “include” files to change their state to the “primary” configuration.

2. Update the DNS zone file and notify the secondaries.

3. Update the VoIP provider to fail over to the secondary machine automatically.

4. Make sure that all the remote software clients are using DNS entries and not IP addresses or host files.
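
The detection and cutover logic could be sketched along these lines. The threshold of three consecutive failed checks and the step names are assumptions, chosen so that a single dropped health check does not trigger a cutover; the real actions would be the ones listed above.

```python
def should_cut_over(check_results, threshold=3):
    """True only when the most recent `threshold` health checks all failed.

    `check_results` is a list of booleans, oldest first (True = primary answered).
    """
    recent = check_results[-threshold:]
    return len(recent) == threshold and not any(recent)


def run_cutover(steps):
    """Run (name, action) steps in order; stop at the first action that returns False.

    Returns (completed step names, name of the failed step or None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Hypothetical wiring; each lambda stands in for a real script.
steps = [
    ("switch include files to primary", lambda: True),
    ("update DNS zone and notify secondaries", lambda: True),
    ("repoint VoIP provider", lambda: True),
]
```

Stopping at the first failed step and reporting what already ran gives you a clear starting point for finishing the cutover by hand.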

Longer term, I hope to consolidate much of this by building auto-failover to a secondary DNS name into the client software, and by creating a single “Localized” configuration directory on the servers that includes everything not automatically kept in sync, along with a “README” file with a checklist.

What about you? Is your failover plan up to date?



re: Building and managing a hot failover server for Domino
By Mike Sweeney on 06/11/2008 at 07:58 PM EDT
Hey Andy

Your a geek! :)

Mike

