Andrew Pollack's Blog

Technology, Family, Entertainment, Politics, and Random Noise

Linux ext3 file system performance weirdness

By Andrew Pollack on 09/02/2010 at 07:01 AM EDT

I've had trouble off and on with a couple of Domino servers on Linux. The server goes pear-shaped, and when I ssh in and look at the console, Domino is reporting drive errors. If you attempt to do anything on the OS at all, you quickly see that the whole file system has shifted into a "read-only" state. This is a bit like a car with a transmission problem shifting into "limp-home" mode. Needless to say, Domino doesn't like being unable to write to the disk.

It has happened to me specifically with the most recent updates of CentOS 5, but since that's the only distribution I use, I can't tell you whether it's specific to the distro. I don't think so, because I've seen reports from others with similar issues. I also know that it isn't Domino's fault; rather, Domino is so disk intensive that it tends to be one of the places where the problem shows up.

The problem manifests when the disk is so busy that at some point the driver just can't keep up. When this happens in the Windows server world, either Domino will crash or the entire OS will just halt. Usually people think this is a RAID controller problem and start replacing hardware. In fact, it's just the driver reporting an error state to the OS that it can't keep up and the OS reacting badly. On Linux, the ext3 file system (roughly equivalent to the NTFS file system in Windows) will react to any write fault based on an option stored in the superblock. The options are "continue", which will ignore the problem and just keep chugging along; "remount-ro", which will cause the file system to remount in a read-only state; and "panic", which will essentially crash the OS and reboot.

Generally speaking, the default mode ("remount-ro" on my servers) is the best for most important servers. It is the most likely to have no ill effects on existing data. It will stop the server from doing anything new, however. The option to "panic" is never good. Rebooting the OS with a drive that's reporting problems is at best going to send it into a lengthy file system check, and if the problem is serious it could mean the drive will never come back up at all. Since I have plenty of redundancy throughout the environment, I decided to give the "continue" option a try. You can alter the setting using "tune2fs" (e.g. $ sudo tune2fs -e continue /dev/sda1 ).
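
Before changing anything, it helps to see what the superblock currently says. The following is a minimal sketch, assuming the e2fsprogs tools are installed; /dev/sda1 is used only because it is the example device above, and the helper name errors_behavior is made up for illustration. Note that an errors= mount option in /etc/fstab, if present, takes precedence over the value stored in the superblock.

```shell
#!/bin/sh
# Hypothetical helper: print the "Errors behavior" value found in
# `tune2fs -l` output that is read from stdin.
errors_behavior() {
    sed -n 's/^Errors behavior:[[:space:]]*//p'
}

# Typical use (needs root; /dev/sda1 is only the example device):
#   sudo tune2fs -l /dev/sda1 | errors_behavior
#   sudo tune2fs -e continue /dev/sda1      # store "continue" in the superblock
#   sudo tune2fs -e remount-ro /dev/sda1    # switch back to read-only remounts
```

Checking the value before and after the change makes it easy to confirm that the setting actually took, and to put it back if "continue" turns out to be a bad idea.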

What's interesting, and purely anecdotal at this point, is that disk I/O on this machine is now performing far better, even without any errors. I'll be keeping an eye on this over the next few days and will let you know if that changes. It is strange though.


re: Linux ext3 file system performance weirdness By Chad Scott on 09/02/2010 at 11:40 AM EDT

Are you mounting your partitions with the noatime option? If not, you
definitely want to give that a shot. Background info:

http://www.howtoforge.com/reducing-disk-io-by-mounting-partitions-with-noatime
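
For anyone wanting to try this, noatime can be applied to a live file system with a remount and made permanent in /etc/fstab. A rough sketch follows; the mount point /local/notesdata and the helper name fstab_with_noatime are assumptions for illustration, and the function only prints a modified copy for review rather than editing anything in place.

```shell
#!/bin/sh
# Hypothetical helper: read an fstab on stdin and print a copy in which
# "noatime" has been added to the options of the given mount point.
# The edited line is re-joined with single spaces; other lines are untouched.
fstab_with_noatime() {
    awk -v mp="$1" '
        # Ignore comment lines, match on the mount-point field, and skip
        # entries that already list noatime.
        $1 !~ /^#/ && NF >= 4 && $2 == mp && ("," $4 ",") !~ /,noatime,/ {
            $4 = $4 ",noatime"
        }
        { print }
    '
}

# Typical use (needs root; /local/notesdata is an assumed mount point):
#   sudo mount -o remount,noatime /local/notesdata   # takes effect immediately
#   fstab_with_noatime /local/notesdata < /etc/fstab  # review before saving
```

Printing the result instead of rewriting /etc/fstab directly is deliberate: a broken fstab can stop a server from booting, so the change is worth checking by eye first.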

re: Linux ext3 file system performance weirdness By Andrew Pollack on 09/02/2010 at 01:25 PM EDT

You win! Definitely a new one on me, and it's been added now, so we'll
see. In my case, that's going to make a HUGE difference with the Linux server
I use for Second Signal. That one spends a great deal of time reading
thousands of small files.

re: Linux ext3 file system performance weirdness By Philip Storry on 09/02/2010 at 07:03 PM EDT

Second this - atime is a silly thing for a database server. For a mail server
that uses the filesystem to store files, it may be required. But for any
database server with indexing (not just Domino), it's usually not required.

You were probably mounting with relatime. Which is a decent option for most
purposes. That article doesn't really explain the mounting options for atime,
so here's a rundown:

1. noatime - don't maintain the last access time at all.
2. atime - maintain the last access time - so every time a file is read, a
write is generated to update the atime in the file's inode.
3. relatime - the compromise, wherein a read only updates atime if the
previous atime is no later than the file's mtime or ctime (and, since kernel
2.6.30, if the previous atime is more than a day old).
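
To see which of these modes a running system is actually using, the option strings in /proc/mounts can be classified. This is a small sketch, assuming a Linux /proc/mounts in which an entry showing neither noatime nor relatime is doing full atime updates; the function names are illustrative only.

```shell
#!/bin/sh
# Classify a comma-separated option string (as shown in /proc/mounts)
# into the atime mode it implies.
atime_mode() {
    case ",$1," in
        *,noatime,*)  echo noatime ;;
        *,relatime,*) echo relatime ;;
        *)            echo atime ;;   # neither flag shown: full atime updates
    esac
}

# Print "mountpoint<TAB>mode" for every entry in a mounts file
# (defaults to /proc/mounts).
report_atime_modes() {
    while read -r dev mp fstype opts rest; do
        printf '%s\t%s\n' "$mp" "$(atime_mode "$opts")"
    done < "${1:-/proc/mounts}"
}

# Typical use:
#   report_atime_modes
```

Wrapping the option string in commas before matching keeps nodiratime from being mistaken for noatime.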

Some UNIX programs may depend on atime to know that an item (stored as a file)
has been read or handled, by comparing the ctime/mtime and atime, so relatime
is a cunning compromise. It's the default in just about every distribution
I've seen.

However, for a Domino server, I'd imagine atime is about as useful as a
non-alcoholic whisky, and about as welcome...

re: Linux ext3 file system performance weirdness By mark Myers on 09/02/2010 at 12:20 PM EDT

this kind of stuff is worth knowing, waiting with interest to see how it turns
out

re: Linux ext3 file system performance weirdness By Victor Toal on 09/02/2010 at 02:25 PM EDT

I have seen some of this as well, mounting with the noatime option is something
I have used (I did not find it, another colleague did) but we are thinking of
moving all file systems to ext4 for new servers and *possibly* for existing
ones as well ... if we see the need.

re: Linux ext3 file system performance weirdness By Chad Scott on 09/02/2010 at 04:20 PM EDT

@Victor: FYI, I use noatime with ext4 and did extended I/O stress testing
(Domino stuff...FTIs and such) with awesome results.

re: Linux ext3 file system performance weirdness By Philip Storry on 09/02/2010 at 07:19 PM EDT

I've been using ext4 for almost a year now, on several machines, but still
don't feel totally comfortable with it being used for production servers.

Granted, I've only had one disk get thoroughly stuffed with it, but it was one
disk too many. And I can't even necessarily blame ext4 with certainty - but I
do note that the JFS volumes on the same machine were fine.

That said, ext4 has performed fine on my netbook and a couple of home servers,
and I moved my desktop to ext4 earlier in the year. But then, they're backed
up. ;-)

The real problem with filesystems is that the code can have nasty bugs in it
which are caused by applications doing things that the filesystem developers
never expected (see
http://thunk.org/tytso/blog/2009/03/12/delayed-allocation-and-the-zero-length-file-problem/
for an example). On multi-tasking systems, those conditions may
not even be planned - they could just happen.
So I like to be cautious, and prefer to use only well-tested filesystems on my
data drives. ;-)

ext4 is doing well for me now though. No problems of late, and the fsck
speedup alone is well worth the move from an end-user perspective!
