Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Linux ext3 file system performance weirdness

By Andrew Pollack on 09/02/2010 at 07:01 AM EDT

I've had trouble off and on with a couple of Domino servers on Linux. The server goes pear-shaped, and when I ssh in and look at the console, Domino is reporting drive errors. If you attempt to do anything on the OS at all, you quickly see that the whole file system has shifted into a "read-only" state. This is a bit like a car with a transmission problem shifting into "limp-home" mode. Needless to say, Domino doesn't like being unable to write to the disk.

It has happened to me specifically with the most recent updates of CentOS 5, but since that's the only distribution I use, I can't tell you whether it's specifically related to the distro. I don't think so, because I've seen reports of others with similar issues. I also know that it isn't Domino's fault, but rather that Domino is so disk intensive that it tends to be one of the places where the problem comes up.

The problem manifests when the disk is so busy that at some point the driver just can't keep up. When this happens in the Windows server world, either Domino will crash or the entire OS will just halt. Usually people think this is a RAID controller problem and start replacing hardware. In fact, it's just the driver reporting an error state to the OS that it can't keep up, and the OS reacting badly. On Linux, the ext3 file system (roughly equivalent to the NTFS file system in Windows) will react to any write fault based on an option stored in the superblock. The options are "continue", which will ignore the problem and just keep chugging along; "remount-ro", which will cause the file system to remount in a read-only state; and "panic", which will essentially crash the OS and reboot.
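To see which of these three modes a given volume is currently set to, you can read it back out of the superblock. Here's a minimal sketch; it relies on the "Errors behavior:" line that tune2fs -l prints, and both the helper name errors_behavior and the device /dev/sda1 are placeholders I've made up for illustration.

```shell
# Print the on-error policy recorded in an ext2/ext3/ext4 superblock.
# Reads `tune2fs -l` output on stdin, so the parsing can be tried
# on saved output without root access.
errors_behavior() {
    sed -n 's/^Errors behavior:[[:space:]]*//p'
}

# Typical use (needs root; /dev/sda1 is only an example device):
#   sudo tune2fs -l /dev/sda1 | errors_behavior
```

The function only reads; it changes nothing on the disk.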

Generally speaking, the default mode is the best for most important servers. It is the most likely to have no ill effects on existing data. It will stop the server from doing anything new, however. The option to "panic" is never good. Rebooting the OS with a drive that's reporting problems is at best going to send it into a lengthy file system check, and if the problem is serious, it could mean the drive will never come back up at all. Since I have plenty of redundancy throughout the environment, I decided to give the "continue" option a try. You can alter the setting using "tune2fs" (e.g. $ sudo tune2fs -e continue /dev/sda1 ).
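Whichever mode you pick, it helps to notice quickly when a volume has dropped to read-only. The sketch below scans mount information in the /proc/mounts format and lists any ext file systems carrying the "ro" option. The function name readonly_mounts is hypothetical, and this is only one rough way to check, not a monitoring solution.

```shell
# Print the mount point of every ext2/ext3/ext4 file system that is
# currently mounted read-only.
# Expects /proc/mounts-style lines on stdin:
#   device  mountpoint  fstype  options  dump  pass
readonly_mounts() {
    while read -r dev dir fstype opts rest; do
        # Skip anything that is not an ext file system.
        case "$fstype" in
            ext2|ext3|ext4) ;;
            *) continue ;;
        esac
        # Wrap the option list in commas so "ro" matches only as a whole option.
        case ",$opts," in
            *,ro,*) echo "$dir" ;;
        esac
    done
}

# Typical use:
#   readonly_mounts < /proc/mounts
```

No output means nothing is mounted read-only. A file system that was deliberately mounted read-only will show up too, so treat the result as a prompt to look closer.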

What's interesting, and purely anecdotal at this point, is that disk I/O on this machine is now performing far better, even without any errors. I'll be keeping an eye on this over the next few days and will let you know if that changes. It is strange though.



re: Linux ext3 file system performance weirdness
By Chad Scott on 09/02/2010 at 11:40 AM EDT
Are you mounting your partitions with the noatime option? If not, you
definitely want to give that a shot. Background info:

http://www.howtoforge.com/reducing-disk-io-by-mounting-partitions-with-noatime

re: Linux ext3 file system performance weirdness
By Andrew Pollack on 09/02/2010 at 01:25 PM EDT
You win! Definitely a new one on me, and it's been added now so we'll
see. In my case, that's going to make a HUGE difference with the Linux server
I use for Second Signal. That one spends a great deal of time reading
thousands of small files.

re: Linux ext3 file system performance weirdness
By Philip Storry on 09/02/2010 at 07:03 PM EDT
Second this - atime is a silly thing for a database server. For a mail server
that uses the filesystem to store files, it may be required. But for any
database server with indexing (not just Domino), it's usually not required.

You were probably mounting with relatime. Which is a decent option for most
purposes. That article doesn't really explain the mounting options for atime,
so here's a rundown:

1. noatime - don't maintain the last access time at all.
2. atime - maintain the last access time - so every time a file is read, a
write is generated to update the atime in the file's inode.
3. relatime - the compromise, wherein a read only updates atime if the current
atime is no newer than the file's mtime or ctime (in other words, the file has
changed since it was last read).

Some UNIX programs may depend on atime to know that an item (stored as a file)
has been read or handled, by comparing the ctime/mtime and atime, so relatime
is a cunning compromise. It's the default in just about every distribution
I've seen.
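A quick way to see which of these modes a mount is actually using is to look at its option list in /proc/mounts. The sketch below is only illustrative: atime_mode is a made-up helper name, and a result of "default" simply means none of the explicit options was listed, in which case whatever the running kernel defaults to applies.

```shell
# Report which atime policy a mount-options string selects.
# Pass the 4th field of a /proc/mounts line, e.g. "rw,noatime,data=ordered".
atime_mode() {
    # Wrap the list in commas so each option matches only as a whole word.
    case ",$1," in
        *,noatime,*)     echo noatime ;;
        *,relatime,*)    echo relatime ;;
        *,strictatime,*) echo strictatime ;;
        *)               echo default ;;
    esac
}

# Typical use, printing the policy for every mounted file system:
#   while read -r dev dir fstype opts rest; do
#       printf '%s\t%s\n' "$dir" "$(atime_mode "$opts")"
#   done < /proc/mounts
```

To try noatime on a live file system without a reboot, something like "sudo mount -o remount,noatime /data" should do it (here /data is just an example mount point); adding noatime to the options column in /etc/fstab makes it stick.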

However, for a Domino server, I'd imagine atime is about as useful as a
non-alcoholic whisky, and about as welcome...

re: Linux ext3 file system performance weirdness
By mark Myers on 09/02/2010 at 12:20 PM EDT
this kind of stuff is worth knowing, waiting with interest to see how it turns
out

re: Linux ext3 file system performance weirdness
By Victor Toal on 09/02/2010 at 02:25 PM EDT
I have seen some of this as well, mounting with the noatime option is something
I have used (I did not find it, another colleague did) but we are thinking of
moving all file systems to ext4 for new servers and *possibly* for existing
ones as well ... if we see the need.

re: Linux ext3 file system performance weirdness
By Chad Scott on 09/02/2010 at 04:20 PM EDT
@Victor: FYI, I use noatime with ext4 and did extended I/O stress testing
(Domino stuff...FTIs and such) with awesome results.

re: Linux ext3 file system performance weirdness
By Philip Storry on 09/02/2010 at 07:19 PM EDT
I've been using ext4 for almost a year now, on several machines, but still
don't feel totally comfortable with it being used for production servers.

Granted, I've only had one disk get thoroughly stuffed with it, but it was one
disk too many. And I can't even necessarily blame ext4 with certainty - but I
do note that the JFS volumes on the same machine were fine.

That said, ext4 has performed fine on my netbook and a couple of home servers,
and I moved my desktop to ext4 earlier in the year. But then, they're backed
up. ;-)

The real problem with filesystems is that the code can have nasty bugs in it
which are caused by applications doing things that the filesystem developers
never expected (see
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
for an example). On multi-tasking systems, those conditions may
not even be planned - they could just happen.
So I like to be cautious, and prefer to use only well-tested filesystems on my
data drives. ;-)

ext4 is doing well for me now though. No problems of late, and the fsck
speedup alone is well worth the move from an end-user perspective!

