Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Wow, I've been busy. Here's a meaty tech post about the intermittent time-out issue some of us have seen in Domino/Notes communication

By Andrew Pollack on 01/06/2005 at 08:29 AM EST

Sorry I haven't been posting. I'm working on some new things here, and have been REALLY busy on client work.

BTW: I am aware of at least one IBM person working on this problem. As a courtesy, I will not mention the person's name, as they are not a frequent poster on this forum and may not want a bunch of emails. I'll pass along anything you have to add which confirms or contradicts what's in this note.

PLEASE read this. If you can confirm the observations I've made, contradict any of them, or add thoughts on the theory or the resolution ideas, it would be of great value.

Definition of the issue:

A Lotus Notes client, Designer, Administrator, or another server attempts to use a connection session that was established earlier but has been idle for some period of time. The call to the other end across the connection goes unanswered, and a modal wait period is imposed. On the client, this is the lightning bolt display; on the server, it's simply that task which is unavailable during the period. At the end of the connection timeout period on the client side, an error is reported indicating a failure to connect. In some circumstances, the client side will then attempt to re-establish the connection, often successfully. Sometimes, hitting Ctrl-Break to interrupt the connection attempt early has the same effect, because it forces the client side to drop any existing connections. The timing is important to the issue, but it seems to vary based on some unknown variable. If the idle period is less than 5 minutes, I believe it's another issue. Remember, other kinds of connection problems exist, so not every connection failure will be this one.
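The stale-session pattern described above can be sketched in Python. This is a minimal illustration and not how the Notes client is actually implemented. `FakeConn`, `Session`, and `demo` are hypothetical names. A fake connection that never answers stands in for a real idle session that the far end has forgotten. The caller only learns the session is dead after paying the full timeout. It then drops the session and reconnects, much as Ctrl-Break forces the drop early.

```python
import socket
from typing import Callable, List, Optional


class FakeConn:
    """Hypothetical stand-in for a network connection; a 'stale' one never answers."""

    def __init__(self, stale: bool):
        self.stale = stale
        self.closed = False

    def request(self, payload: bytes, timeout: float) -> bytes:
        if self.stale:
            # A real socket would block for `timeout` seconds before raising this.
            raise socket.timeout(f"no reply within {timeout}s")
        return b"reply:" + payload

    def close(self) -> None:
        self.closed = True


class Session:
    """Reuses one cached connection; if it times out, drops it and reconnects once."""

    def __init__(self, connect: Callable[[], FakeConn], timeout: float):
        self._connect = connect
        self._timeout = timeout
        self._conn: Optional[FakeConn] = None
        self.reconnects = 0

    def call(self, payload: bytes) -> bytes:
        if self._conn is None:
            self._conn = self._connect()
        try:
            return self._conn.request(payload, self._timeout)
        except (socket.timeout, ConnectionError):
            # The idle session was dead but nobody told us, so the full timeout
            # has already been paid. Drop it and build a fresh connection.
            self._conn.close()
            self._conn = self._connect()
            self.reconnects += 1
            return self._conn.request(payload, self._timeout)


def demo():
    made: List[FakeConn] = []

    def connect() -> FakeConn:
        # Assumption: the first connection went stale while idle; later ones are healthy.
        conn = FakeConn(stale=not made)
        made.append(conn)
        return conn

    session = Session(connect, timeout=60.0)
    reply = session.call(b"open db")
    return reply, session.reconnects, made[0].closed


print(demo())
```

The retry is deliberately limited to a single attempt. If the fresh connection also fails, the error is a different problem and should surface to the user.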

Some further observations:

This has been more common on Windows-based servers. I've personally seen it when connecting to only one non-Windows server: over the last few days, a SuSE Linux-based 6.5.3 server. That was an extreme case, and it seems to have gone away after removing the SMB networking package (Samba) from that machine, though I cannot be sure that this was the cause or that this was the same issue. The behavior was very much the same.

I have seen this with much greater frequency (and with near, though unconfirmed, reproducibility) in cases where the Windows 2000-based Domino server had more than one network IP address AND the connection was being established to an address which was not the primary (first configured) IP address.
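One speculative way a second address could matter: a TCP endpoint accepts a reply as part of a session only if the reply's addresses and ports exactly mirror the outgoing packet. The sketch below uses hypothetical addresses (client 10.0.0.50, server primary 10.0.0.10, secondary 10.0.0.11) and the Notes port 1352. `Segment` and `belongs_to_session` are illustrative names. It shows that a reply leaving from the primary address would not match a session opened to the secondary one. Whether Windows 2000 actually sends replies that way is unconfirmed.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Addressing of one TCP segment (hypothetical helper type)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def belongs_to_session(outgoing: Segment, reply: Segment) -> bool:
    """A reply matches only if it is the exact mirror of the client's outgoing 4-tuple."""
    return (reply.src_ip, reply.src_port, reply.dst_ip, reply.dst_port) == (
        outgoing.dst_ip, outgoing.dst_port, outgoing.src_ip, outgoing.src_port)


# Client opened its session to the server's SECONDARY address.
outgoing = Segment("10.0.0.50", 51000, "10.0.0.11", 1352)

from_secondary = Segment("10.0.0.11", 1352, "10.0.0.50", 51000)
from_primary = Segment("10.0.0.10", 1352, "10.0.0.50", 51000)  # same host, wrong source IP

print(belongs_to_session(outgoing, from_secondary))
print(belongs_to_session(outgoing, from_primary))
```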

I have seen similar-looking but unrelated issues caused by bad firewall configurations (particularly when dealing with source-routed packets, IP tunnels, VPNs, and improper masquerade or NAT configurations). These usually manifest as sessions that start but don't complete, and they often leave multiple connection sessions visible on the server with the SH U (show users) command.

I have seen EXACTLY the same behavior in a connection from a Domino server to a Microsoft Active Directory LDAP port on another machine, where the Domino server was configured with Directory Assistance to use that LDAP server for HTTP login credentials. In this case, if the session has been idle for some time, the first attempt to use it must time out and fail; subsequent attempts then work. This was discovered because the timeout in the LDAP configuration was set to 60 seconds, and the first connection in the morning was taking 62 seconds to work. We are currently working around the issue by setting the LDAP timeout to 15 seconds, and now the first connection in the morning takes 17 seconds -- which the users find acceptable.
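The numbers in the LDAP case fit a simple model. The stale attempt always costs the full configured timeout, and the fresh connection then adds roughly 2 seconds (62 - 60 and 17 - 15). `first_request_delay` is a hypothetical helper for that arithmetic, not a Domino setting.

```python
def first_request_delay(timeout_s: float, reconnect_s: float = 2.0) -> float:
    """Wait seen by the first request that lands on a silently dropped idle session."""
    # The stale attempt consumes the whole timeout before it fails,
    # and only then is a fresh connection made (about 2 s in the case above).
    return timeout_s + reconnect_s


for timeout in (60, 15):
    print(f"LDAP timeout {timeout} s -> first login takes about {first_request_delay(timeout):.0f} s")
```

The workaround shrinks the penalty but cannot remove it. A shorter timeout also gives a genuinely slow LDAP server less time to answer before the lookup is abandoned.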

I don't believe I've seen this issue with partitioned Domino servers where the partitions are defined to use distinct IP addresses, as set in notes.ini. I have not yet confirmed this in a lab.

A hypothesis and some suggestions for resolution of symptoms

I believe that at some point around three years ago, someone tried to harden the Win32 TCP stack against what was then a common denial-of-service attack: opening thousands of connections and never dropping them. As part of that, a change was made to the length of time an idle connection is left open (or perhaps to the total number of open connections allowed on a port). A further change then made the dropped listener invisible to the client side. This may simply be a matter of not sending the client a packet announcing the drop, so that the client doesn't immediately respond by re-establishing the session.

Another theory is that the OS changed the way it reports or handles multiple IPs on the same NIC and the same subnet when "talking" with the software (in this case, Domino). This may cause the server to respond, but with the response going down the stack and out the NIC slightly differently, in a way that the client side does not recognize as part of the same session.

It's also possible that this is a change on the client side. The server-side behavior may always have been there, but the client side is now less accepting of packets outside what it expects to see. This would also be a defensive move.
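The first hypothesis turns on the difference between a connection dropped with an announcement and one that simply goes quiet. The loopback experiment below is a sketch that assumes a POSIX-style system, since the `struct.pack("ii", ...)` layout of `SO_LINGER` is the Linux/macOS one. `probe` is a hypothetical helper. With `SO_LINGER` set to zero seconds, `close()` sends a reset and the client learns of the drop almost immediately. When the far side just never answers, the client can only find out by waiting out its timeout.

```python
import socket
import struct
import time
from typing import Tuple


def probe(announce_drop: bool, timeout: float = 0.5) -> Tuple[str, float]:
    """Report how (and how fast) a client learns that its loopback peer is gone."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=timeout)
    server_side, _ = listener.accept()
    try:
        if announce_drop:
            # Linger on, zero seconds: close() aborts the connection with a RST,
            # so the drop is announced to the client.
            server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
            server_side.close()
        # Otherwise the server keeps the socket open but never replies, which to the
        # client looks the same as a peer that silently forgot the session.
        start = time.monotonic()
        try:
            client.sendall(b"ping")
            outcome = "closed" if client.recv(16) == b"" else "data"
        except socket.timeout:
            outcome = "timeout"
        except ConnectionError:
            outcome = "reset"
        return outcome, time.monotonic() - start
    finally:
        client.close()
        server_side.close()
        listener.close()


for announced in (True, False):
    print("announced" if announced else "silent", probe(announced))
```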

To resolve, try the following:

a) Make sure your Domino server's "official" IP address -- the one your clients connect to (the internal one if it's behind a firewall doing NAT) -- is the primary address on the only NIC in the server (if that's possible in your environment).

b) Configure even single-partition Domino servers as if they were one of several partitions on the server. Add the appropriate notes.ini parameters so that Domino uses one specific IP address rather than all the IP addresses reported by the operating system.
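Suggestion b) rests on a general sockets idea. A listener bound to the wildcard address `0.0.0.0` accepts connections on every IPv4 address the OS reports, while one bound to a specific address uses only that address. The sketch below shows the distinction with plain Python sockets. It is not Domino's notes.ini syntax, and `bound_address` is a hypothetical helper.

```python
import socket


def bound_address(bind_ip: str) -> str:
    """Open a TCP listener on bind_ip (port chosen by the OS) and report the address it bound to."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind((bind_ip, 0))
        listener.listen()
        return listener.getsockname()[0]


print(bound_address("0.0.0.0"))    # wildcard: would accept on every local IPv4 address
print(bound_address("127.0.0.1"))  # specific: would accept on the loopback address only
```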


SYN-ACK Attack
By Declan Lynch on 01/06/2005 at 09:32 AM EST
Sounds like an accidental attack via the SYN-ACK exploit. On certain Windows servers, if there are lots of unanswered sessions, TCP/IP memory gets eaten up and causes a major slowdown of the stack.

This 'fix' in Windows 2003 SP1 might explain it better than I can:

but then again, I may be totally wrong here...
I don't think it's an accidental attack, I think it's....
By Andrew Pollack on 01/06/2005 at 11:28 AM EST
...a result of the service pack (or a service pack) using a common technique of silently dropping a connection rather than announcing its end.
My own thoughts on this are...
By Amy B on 01/06/2005 at 10:21 AM EST
I use hostnames wherever possible rather than IP addresses. I have A records in my internal DNS so that the hostname maps to the Domino server's internal, non-routable IP address when I'm on the LAN. When I'm traveling, I use the same Connection documents, etc., but the hostname resolves through public DNS A records to the server's external, public IP address. Clean, simple, and easy to modify.

- A
Right, that's common and all -- but this issue isn't about name resolution.
By Andrew Pollack on 01/06/2005 at 11:30 AM EST
This is a validly resolved link that intermittently drops, but it does so in a way the client side doesn't know about. So the client side tries to use the link, but has to time out and fail, then re-establish one.
Other References
By Julian Robichaux on 01/06/2005 at 11:45 AM EST
Other references to this issue (I think it's the same one):
One Other Follow-Up On Ed's Site
By Julian Robichaux on 01/06/2005 at 11:47 AM EST
My own thoughts on this are...
By duanebear on 01/07/2005 at 08:03 AM EST
I have been working on this with IBM for the past two months. This issue, or one close to it, has been addressed in Domino 6.5.4. The SPR is RGET5Q5TJL. It appears to be a problem with the IOCP interface in Domino, which can "sleep" for 4 minutes. Check out the SPR and see if this matches what you are experiencing. We are seeing it while running Domino 6.5.1 and 6.5.3, specifically on AIX.
Duane
Info on some known issues
By Ted Stanton on 01/07/2005 at 10:04 AM EST
SPR# IDEA5VSS27 - Resolved/Fixed in 6.5.3 - Notes 6.5 client pages out after 60 seconds of inactivity

SPR# JEIN5XSLJP - Resolved/Fixed in 6.5.3 - Poor performance of in-memory design cache

SPR# SVRO63SNKW - Resolved/Fixed in 6.5.4 - 1MB in-memory design cache isn't big enough

SPR# BSPR65FJ2R - Resolved/Fixed in 6.5.3 FP1 - Chronos: Error full text indexing mail\xxxx.nsf: Message Queue is full

Above are some SPRs related to client-side performance.
