Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Linux ext3 file system performance weirdness

By Andrew Pollack on 09/02/2010 at 07:01 AM EDT

I've had trouble off and on with a couple of Domino servers on Linux. The server goes pear-shaped, and when I ssh in and look at the console, Domino is reporting drive errors. If you attempt to do anything on the OS at all, you quickly see that the whole file system has shifted into a "read-only" state. This is a bit like a car with a transmission problem shifting into "limp-home" mode. Needless to say, Domino doesn't like being unable to write to the disk.

It has happened to me specifically with the most recent updates of CentOS 5, but since that's the only distribution I use, I can't tell you whether it's specific to the distro. I don't think so, because I've seen reports of others with similar issues. I also know that it isn't Domino's fault; rather, Domino is so disk intensive that it tends to be one of the places where the problem comes up.

The problem manifests when the disk is so busy that at some point the driver just can't keep up. When this happens in the Windows server world, either Domino will crash or the entire OS will just halt. Usually people think this is a RAID controller problem and start replacing hardware. In fact, it's just the driver reporting an error state to the OS that it can't keep up and the OS reacting badly. On Linux, the ext3 file system (roughly equivalent to the NTFS file system in Windows) will react to any write fault based on an option stored in the superblock. The options are "continue", which will ignore the problem and just keep chugging along; "remount-ro", which will cause the file system to remount in a read-only state; and "panic", which will trigger a kernel panic, essentially crashing the OS (and rebooting it if the kernel is set to reboot on panic).

Generally speaking, the default mode on my servers, "remount-ro", is the best choice for most important servers. It is the most likely to have no ill effects on existing data. It will stop the server from doing anything new, however. The option to "panic" is never good. Rebooting the OS with a drive that's reporting problems is at best going to send it into a lengthy file system check, and if the problem is serious could mean the drive will never come back up at all. Since I have plenty of redundancy throughout the environment, I decided to give the "continue" option a try. You can alter the setting using "tune2fs" (e.g. $ sudo tune2fs -e continue /dev/sda1 ).
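Before changing the setting, it helps to see what the superblock currently says. Here is a minimal sketch, assuming the "Errors behavior" line that tune2fs -l prints; the helper name errors_behavior is made up for illustration and is not part of e2fsprogs, and /dev/sda1 is just the example device from above.

```shell
#!/bin/sh
# Sketch only: errors_behavior is a hypothetical helper, not an e2fsprogs tool.

# Read `tune2fs -l` (or `dumpe2fs -h`) output on stdin and print the value
# of the "Errors behavior" field, lowercased. Prints nothing if the field
# is not present in the input.
errors_behavior() {
    sed -n 's/^Errors behavior:[[:space:]]*//p' | tr '[:upper:]' '[:lower:]'
}

# Typical use (needs root and a real ext2/ext3/ext4 device, so it is left
# commented out here):
#   sudo tune2fs -l /dev/sda1 | errors_behavior
#   sudo tune2fs -e continue /dev/sda1
#   sudo tune2fs -l /dev/sda1 | errors_behavior
```

Running the check before and after the change confirms that the new value actually landed in the superblock. Note that an errors= mount option, if one is set for the file system, may take precedence over the superblock value, so it is worth checking the mount options as well.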

What's interesting, and purely anecdotal at this point, is that disk I/O on this machine is now performing far better, even without any errors. I'll be keeping an eye on this over the next few days and will let you know if that changes. It is strange though.



re: Linux ext3 file system performance weirdness
By Chad Scott on 09/02/2010 at 11:40 AM EDT
Are you mounting your partitions with the noatime option? If not, you
definitely want to give that a shot. Background info:

http://www.howtoforge.com/reducing-disk-io-by-mounting-partitions-with-noatime

re: Linux ext3 file system performance weirdness
By Andrew Pollack on 09/02/2010 at 01:25 PM EDT
You win! Definitely a new one on me, and it's been added now so we'll
see. In my case, that's going to make a HUGE difference with the Linux server
I use for Second Signal. That one spends a great deal of time reading
thousands of small files.

re: Linux ext3 file system performance weirdness
By Philip Storry on 09/02/2010 at 07:03 PM EDT
Second this - atime is a silly thing for a database server. For a mail server
that uses the filesystem to store files, it may be required. But for any
database server with indexing (not just Domino), it's usually not required.

You were probably mounting with relatime. Which is a decent option for most
purposes. That article doesn't really explain the mounting options for atime,
so here's a rundown:

1. noatime - don't maintain the last access time at all.
2. atime - maintain the last access time - so every time a file is read, a
write is generated to update the atime in the file's inode.
3. relatime - the compromise, wherein atime is only updated when the previous
atime is no later than the file's mtime or ctime, so the first read after a
change is still recorded without every read generating a write.

Some UNIX programs may depend on atime to know that an item (stored as a file)
has been read or handled, by comparing the ctime/mtime and atime, so relatime
is a cunning compromise. It's the default in just about every distribution
I've seen.

However, for a Domino server, I'd imagine atime is about as useful as a
non-alcoholic whisky, and about as welcome...
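
For anyone wanting to try the noatime suggestion from this thread, here is a rough sketch of checking whether a mount already has the option and remounting if not. The helper name has_mount_option is made up for illustration, and /local/notesdata is only an assumed mount point for the Domino data; substitute your own.

```shell
#!/bin/sh
# Sketch only: has_mount_option is a hypothetical helper name.

# Read /proc/mounts-style lines on stdin (device, mount point, fs type,
# comma-separated options, ...) and exit 0 if the given mount point lists
# the given option exactly, or exit 1 otherwise.
has_mount_option() {
    awk -v mp="$1" -v opt="$2" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == opt) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Typical use (commented out because it changes a live mount and needs root;
# /local/notesdata is an assumed mount point):
#   has_mount_option /local/notesdata noatime < /proc/mounts || \
#       sudo mount -o remount,noatime /local/notesdata
```

A remount like this only lasts until the next reboot. To make it stick, noatime also needs to go in the options field of that file system's entry in /etc/fstab.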

re: Linux ext3 file system performance weirdness
By Mark Myers on 09/02/2010 at 12:20 PM EDT
this kind of stuff is worth knowing, waiting with interest to see how it turns
out

re: Linux ext3 file system performance weirdness
By Victor Toal on 09/02/2010 at 02:25 PM EDT
I have seen some of this as well, mounting with the noatime option is something
I have used (I did not find it, another colleague did) but we are thinking of
moving all file systems to ext4 for new servers and *possibly* for existing
ones as well ... if we see the need.

re: Linux ext3 file system performance weirdness
By Chad Scott on 09/02/2010 at 04:20 PM EDT
@Victor: FYI, I use noatime with ext4 and did extended I/O stress testing
(Domino stuff...FTIs and such) with awesome results.

re: Linux ext3 file system performance weirdness
By Philip Storry on 09/02/2010 at 07:19 PM EDT
I've been using ext4 for almost a year now, on several machines, but still
don't feel totally comfortable with it being used for production servers.

Granted, I've only had one disk get thoroughly stuffed with it, but it was one
disk too many. And I can't even necessarily blame ext4 with certainty - but I
do note that the JFS volumes on the same machine were fine.

That said, ext4 has performed fine on my netbook and a couple of home servers,
and I moved my desktop to ext4 earlier in the year. But then, they're backed
up. ;-)

The real problem with filesystems is that the code can have nasty bugs in it
which are caused by applications doing things that the filesystem developers
never expected (see
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
for an example). On multi-tasking systems, those conditions may
not even be planned - they could just happen.
So I like to be cautious, and prefer to use only well-tested filesystems on my
data drives. ;-)

ext4 is doing well for me now though. No problems of late, and the fsck
speedup alone is well worth the move from an end-user perspective!



