Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Building and managing a hot failover server for Domino

By Andrew Pollack on 06/05/2008 at 08:27 AM EDT

A pair of clustered servers behind a network dispatcher in your building is easy to manage. What about failover machines that aren’t located in the same city?

I may be a day late and a dollar short, as they say, but I’ve spent a good bit of the last couple of days implementing a new server at a data center in Dallas. The new server will act as a hot standby. While Domino has built-in cluster replication to keep itself up to date, there are many other aspects of a complex system to think about. Here are some of the things I’ve done to manage a true failover environment.

IP Address Failover – There are a number of ways to do this, of course, but none are perfect.

You can spend a great deal of money and use something like BGP or another public network failover solution so that your actual IP addresses get re-routed to the other data center. That’s expensive but very effective.

A common approach is to use “Round Robin DNS,” but that has its own issues. It isn’t very reliable, because nothing requires a DNS client to try the top address first, or even to try the second one at all. Some client software even sorts the returned list of entries by the number of network hops away and picks the closest.
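A small simulation makes the ordering problem easy to see. This Python sketch assumes a DNS server that rotates its two A records on every answer. It compares two hypothetical client policies. The first always takes the top address. The second re-sorts the answer by its own preference, using a made-up hop count, as some client software does. The addresses come from the documentation ranges and stand in for real servers.

```python
from itertools import islice

# Hypothetical servers, using RFC 5737 documentation addresses.
RECORDS = ["192.0.2.10", "198.51.100.20"]

# Hypothetical hop counts as seen from one client's network.
HOPS = {"192.0.2.10": 4, "198.51.100.20": 9}


def round_robin_answers(records):
    """Yield successive DNS answers, rotating the record order each time."""
    order = list(records)
    while True:
        yield list(order)
        order = order[1:] + order[:1]


def first_address(answer):
    """Client policy 1: try the address at the top of the answer."""
    return answer[0]


def closest_address(answer):
    """Client policy 2: re-sort by hop count, ignoring the server's order."""
    return min(answer, key=lambda addr: HOPS[addr])


answers = list(islice(round_robin_answers(RECORDS), 4))
print([first_address(a) for a in answers])
print([closest_address(a) for a in answers])
```

The first client alternates between the two servers. The second picks 192.0.2.10 every time, so the server-side rotation has no effect on it.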

You can use an address dispatching tool like Domino’s ICM. ICM is a good tool – and there are other, more advanced products like it – but of course you then have to worry about the ICM itself staying available, and that leads to further redundancy issues.

Finally, there’s the simple expedient of changing the IP address associated with the DNS entry itself and using a very low TTL value, so that clients have to re-request the address frequently. That’s effective, but it requires intervention to make the changeover happen. It also means that most of the time you’re getting excessive DNS request hits and slightly longer initial connection times, because you’ve effectively disabled any caching that DNS could do for you.
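The low-TTL trade-off can be put into rough numbers. This sketch makes three assumptions. First, every caching resolver honors the TTL exactly, which not all of them do. Second, resolvers keep re-asking while clients are active. Third, the record is changed the moment the failure is detected. The function and its parameter names are hypothetical.

```python
def ttl_tradeoff(ttl_seconds: int, detection_seconds: int) -> dict:
    """Rough numbers for DNS failover with a low TTL (hypothetical helper)."""
    if ttl_seconds <= 0:
        raise ValueError("TTL must be positive")
    return {
        # Worst case: a resolver cached the old answer just before the change
        # and keeps handing it out for one full TTL after detection.
        "worst_case_cutover_s": detection_seconds + ttl_seconds,
        # Upper bound on lookups one busy resolver sends to your name servers.
        "max_queries_per_hour_per_resolver": 3600 / ttl_seconds,
    }


for ttl in (60, 300, 3600):
    print(ttl, ttl_tradeoff(ttl, detection_seconds=120))
```

Take a 60-second TTL and two minutes to detect the failure. Clients could keep reaching the dead server for up to three minutes, and each busy resolver could send up to 60 lookups per hour. A one-hour TTL cuts that to one lookup per hour, but it stretches the worst case past an hour.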

The best solution, of course, is to build failover directly into the application side. This won’t work so well for web servers, but for proprietary client software like Lotus Notes it works very well.
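Client-side failover can be as simple as giving the client an ordered list of server addresses and trying them in turn. Here is a minimal Python sketch of that idea. It is not how Notes itself does it, and the function name is hypothetical.

```python
import socket


def connect_with_failover(endpoints, timeout=5.0):
    """Try each (host, port) in order and return the first connected socket.

    `endpoints` is an ordered list such as [primary, secondary]. Raises
    ConnectionError, chained to the last underlying error, if all of them fail.
    """
    last_error = None
    for host, port in endpoints:
        try:
            # create_connection also tries every address the name resolves to.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_error = err  # remember why this endpoint failed, try the next
    raise ConnectionError("no endpoint accepted a connection") from last_error
```

A client might call `connect_with_failover([("domino.example.com", 1352), ("dallas.example.com", 1352)])`. Both host names are hypothetical. TCP 1352 is the port Notes/Domino uses for NRPC.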

For now, I’m using the DNS change method with a very low TTL. I’m also working on two things. The first is an automatically scripted DNS updater that will make the change when the primary server fails. The second is a longer-term solution: building secondary connection addresses directly into the client-side software for Second Signal.
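Here is one way the detection half of an automated DNS updater might look. The sketch treats the primary as down only after several consecutive failed TCP connection attempts, so one dropped packet doesn't trigger a cutover. The threshold, interval, and function names are assumptions. The step that runs after `wait_for_failure` returns, changing the DNS record, is not shown.

```python
import socket
import time


def port_is_up(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_failure(host, port, threshold=3, interval=20.0, check=port_is_up):
    """Poll until `threshold` consecutive checks fail; return how many checks ran.

    Any successful check resets the count, so brief blips don't cause a cutover.
    """
    failures = 0
    checks = 0
    while failures < threshold:
        checks += 1
        failures = 0 if check(host, port) else failures + 1
        if failures < threshold:
            time.sleep(interval)
    return checks
```

Running the monitor on the failover machine has a catch. It can't tell a dead primary from a broken network path between the two sites. Checking from a third location before cutting over is worth considering.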

Data Synchronization of Non-Domino (On Disk) Data

Keeping data in Domino’s NSF files up to date is easy with cluster replication. Keeping on-disk files, such as those in your …/domino/html/ directory, up to date is much trickier. I’m using a free tool called Unison to do this. Unison works a lot like Domino replication. It builds metadata about files in its own internal database, so file comparisons are done with hash algorithms and updates are well managed in both directions. On Linux, I’ve built a script to handle this task. It runs periodically as a cron job. It keeps the Domino html directory, a number of audio directories, and many configuration files in sync on the failover.
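A cron-driven Unison job along these lines could handle the sync. This Python sketch assumes Unison is installed on both machines and reachable over SSH. The host name, paths, and lock file are hypothetical placeholders. `-batch` stops Unison from prompting, which matters when no one is at the terminal. `-times` propagates modification times. The lock stops a slow run from overlapping the next cron run.

```python
import fcntl
import subprocess

# Hypothetical roots: each local directory paired with the same directory on
# the failover machine, reached over SSH ("//" marks an absolute remote path).
PAIRS = [
    ("/path/to/domino/html", "ssh://failover.example.com//path/to/domino/html"),
    ("/path/to/audio", "ssh://failover.example.com//path/to/audio"),
]
LOCK_FILE = "/tmp/unison-sync.lock"  # hypothetical lock location


def unison_command(local_root, remote_root):
    """Build a non-interactive Unison command line for one pair of roots."""
    # -batch: never prompt (needed under cron); -times: sync modification times.
    return ["unison", local_root, remote_root, "-batch", "-times"]


def main():
    """Sync every pair; return the worst Unison exit status (0 means clean)."""
    with open(LOCK_FILE, "w") as lock:
        try:
            # Non-blocking exclusive lock: skip this run if the last one is still going.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return 0
        status = 0
        for local_root, remote_root in PAIRS:
            result = subprocess.run(unison_command(local_root, remote_root))
            status = max(status, result.returncode)
        return status
```

Call `main()` from the script's entry point, for example with `sys.exit(main())`. Then schedule it with a crontab line such as `*/10 * * * * python3 /path/to/unison_sync.py`. The script path is hypothetical.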

In addition to the data itself – html data, sound files, and so on – I use Unison to keep many configuration files up to date on both machines. The failover machine is set up as a slave DNS server, but I still keep the full set of zone definitions up to date on it. If I do have a long-term outage, I can just change the settings in named.conf to make the failover a master, and I haven’t lost any recent updates. This method also keeps my Asterisk configurations and scripts in sync automatically.
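Promoting the failover to master can be scripted as a text edit of named.conf. This Python sketch assumes a conventional layout. The zone block contains `type slave;` and a one-line `masters { ... };` clause, and the block's closing `};` sits at the start of a line. It is not a general named.conf parser, and the function name is hypothetical. The synced master zone files may live at a different path than the slave copies. In that case, the `file` line would need changing too.

```python
import re


def promote_to_master(named_conf: str, zone: str) -> str:
    """Rewrite one zone block in named.conf text from slave to master.

    Assumes the block closes with "};" at the start of a line and that the
    masters clause sits on a single line; this is not a full named.conf parser.
    """
    block = re.compile(
        r'(zone\s+"' + re.escape(zone) + r'"[^{]*\{)(.*?)(\n\};)', re.DOTALL
    )

    def fix(match):
        body = re.sub(r"\btype\s+slave\s*;", "type master;", match.group(2))
        # A master zone has no upstream, so drop the masters { ... }; line.
        body = re.sub(r"\n[ \t]*masters\s*\{[^}]*\}\s*;", "", body)
        return match.group(1) + body + match.group(3)

    new_conf, count = block.subn(fix, named_conf, count=1)
    if count == 0:
        raise ValueError(f"zone {zone!r} not found")
    return new_conf
```

After writing the result back, named has to reload its configuration, for example with `rndc reload`.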

Machine Specific Configurations & Scripts

Some of my scripts, especially Asterisk AGI scripts, have to make HTTP or web services calls to the local Domino server for data. Since I want to keep the scripts in sync, I don’t want to hardcode the local server’s IP address. There are a number of ways to handle this. Traditional scripts can read the server’s HOSTNAME environment variable. That’s not good enough here, because I tie specific IP addresses on a multi-homed machine to different Domino partitions and sites. Instead, I can set a local environment variable in the shell on each machine, or I can use the /etc/hosts file. My scripted URLs can then use a generic name for the server portion of the URL.
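In a script, the generic-name approach could look like this Python sketch. The environment variable `LOCAL_DOMINO_HOST` and the fallback name `localdomino` are hypothetical. Each machine's /etc/hosts maps `localdomino` to the IP address of the right Domino partition, so the same script works unchanged on both servers.

```python
import os

# Hypothetical names. Each machine's /etc/hosts maps DEFAULT_HOST to the IP
# address of its own Domino partition, e.g. "192.0.2.5   localdomino".
ENV_VAR = "LOCAL_DOMINO_HOST"
DEFAULT_HOST = "localdomino"


def local_domino_url(path: str) -> str:
    """Build a URL for the local Domino server without hardcoding its address.

    A per-machine environment variable wins; otherwise the generic host name
    is used and resolved through /etc/hosts.
    """
    host = os.environ.get(ENV_VAR, DEFAULT_HOST)
    return f"http://{host}/{path.lstrip('/')}"
```

An AGI script could then call `local_domino_url("/calls.nsf/lookup?OpenAgent")`, where the database and agent names are hypothetical. It gets the same URL text on both machines, and each machine resolves it to its own server.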

Some configurations are machine specific, or differ depending on whether the machine is acting as the primary or the failover. For example, in Asterisk I use two different carriers for local telephone numbers. The better of the two connects to my machine by its IP address and automatically fails over to a second IP address if the first doesn’t connect. That’s easy. The other provider requires that the server “register” and renew its registration every 60 seconds, saying “hey, I’m over here”. If both machines are registering, incoming calls go to whichever machine registered last. To manage this, I use an “include” in the Asterisk configuration files that hold those registrations (in this case, sip.conf and iax.conf). This way, the full configuration file can be synchronized. The part with the registration settings is stored in a non-synchronized directory, so it can be different on each machine.
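Asterisk configuration files accept `#include "filename"` lines, so sip.conf can pull its registrations from a file that lives outside the synced directory. This Python sketch writes that machine-local file according to the machine's role. The registration string, file name, and role names are hypothetical. The sketch writes to a temporary file and then renames it, so Asterisk never reads a half-written file.

```python
import os
import tempfile

# Hypothetical registration for the provider that needs REGISTER; chan_sip
# syntax is "register => user:secret@host".
REGISTER_LINE = "register => myaccount:secret@sip.provider.example"


def include_contents(role: str) -> str:
    """Return the machine-local include file body for this machine's role."""
    if role == "primary":
        return "; primary: register so the provider sends calls here\n" + REGISTER_LINE + "\n"
    if role == "standby":
        return "; standby: no registration, so calls keep going to the primary\n"
    raise ValueError(f"unknown role: {role!r}")


def write_include(path: str, role: str) -> None:
    """Replace the include file atomically so Asterisk never reads a partial file."""
    contents = include_contents(role)  # validate the role before touching disk
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(contents)
    os.chmod(tmp_path, 0o644)  # mkstemp creates the file as 0600
    os.replace(tmp_path, path)
```

After switching a machine to `primary`, reload the channel driver so the registration starts, for example with `asterisk -rx "sip reload"`.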

What’s left to do?

I’ve got the new machine in place and running, and the data synchronization is now up and active as well. What remains is to test, test, and test – and then build the automatic scripts that detect a failure on the primary and cut over to the secondary by performing the following actions:

1. Update the local configuration “include” files to change their state to the “primary” configuration.

2. Update the DNS zone file and notify the secondaries.

3. Update the VoIP provider to fail over to the secondary machine automatically.

4. Make sure that all the remote software clients are using DNS entries and not IP addresses or hosts files.
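Step 2 depends on the zone's SOA serial, because secondaries only pull a new copy of the zone when the serial increases. This Python sketch computes the next serial under the common YYYYMMDDnn convention. The function name is hypothetical. The new serial still has to be written into the zone file before reloading it, for example with `rndc reload` followed by the zone name. With notify enabled, that reload also makes BIND notify the secondaries.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next SOA serial under the YYYYMMDDnn convention.

    Jump to today's first serial if the current one is older; otherwise just
    add one, which keeps the serial increasing even past nn = 99.
    """
    todays_first = int(today.strftime("%Y%m%d")) * 100
    return todays_first if current < todays_first else current + 1
```

For example, suppose a zone was last changed on 06/04/2008 with serial 2008060401. A change on 06/05/2008 would move it to 2008060500.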

Longer term, I hope to consolidate much of this. I want to build auto-failover to a secondary DNS name into the client software. I also want a single “Localized” configuration directory on the servers that holds everything not automatically kept in sync, along with a “README” file containing a checklist.

What about you? Is your failover plan up to date?



re: Building and managing a hot failover server for Domino
By Mike Sweeney on 06/11/2008 at 07:58 PM EDT
Hey Andy

You're a geek! :)

Mike



