Re: DriveReady SeekComplete Error


Subject: Re: DriveReady SeekComplete Error
From: Timothy A. Seufert (tas@mindspring.com)
Date: Thu Nov 08 2001 - 20:01:59 MST


At 11:36 AM -0500 11/8/01, Ken Schweigert wrote:
>I just installed YDL2.0 on a Blue&White G3-300MHz and it basically
>fell on. It's doing everything I need it to. The only thing is that
>I get a lot of dma_intr errors in my /var/log/messages. I can read
>files and write files to the disk and haven't noticed any problems
>except for these errors.

Based on the log snippet you posted, DO NOT TRUST the system to
store data reliably until you get this resolved. BadCRC means the
drive reported a CRC error on an UltraDMA transfer: data is being
corrupted somewhere between the computer and the drive.

>I did a Google search and some things were that the cable could be
>too long (not likely in a Mac), cable could be going bad, or the
>drive might be getting ready to fail.

The first thing to do is remove the cable and visually inspect it for
damage. Your cable may have a few millimeters cut out of one wire;
that's normal (it's part of the IDE "cable select" method of
automatically configuring a drive as master or slave). What you're
looking for is damage that might happen when opening and closing the
G3 case: shaved-off insulation, a dent where something poked it, etc.

If the cable looks fine, reseat it firmly and see whether the errors
go away. Sometimes cables just work loose.

The next thing to try is a replacement cable. You won't easily find
the extremely short cable Apple uses, but an 80-wire UltraDMA cable
from a PC dealer is fine. Get the shortest one available.

Next, if you haven't already, try a newer kernel. Dan Burcaw's
official 2.4 kernel for YDL 2.0 and 2.1 would be a good bet. The
driver for the CMD 646 has evolved a bit since 2.2.

If none of that works, it may be an interaction between the drive and
the IDE controller. Unfortunately, Apple used revision 5 of the CMD
646U2 on the first version of the B&W G3 motherboard; the second
version of the board uses revision 7 of the chip instead. The
difference is that revision 5 has corruption problems talking to
certain drive models, particularly IBMs, at UltraDMA speeds.
(Revision 5 also has problems with almost any master-slave
configuration, which is why Apple does not endorse adding a second
drive on the Rev. 1 B&W G3.)

Normally Western Digital drives aren't a problem, but you never know,
especially since for a while WD was forced to ship drives that were
licensed clones of IBM designs using the same parts. Is it the
factory original drive or an upgrade? (Apple qualifies drives
thoroughly and to the best of my knowledge never shipped one that
triggered this problem.)

If it's the controller-drive interaction, there's not much you can do
besides forcing the drive down to Multiword DMA mode 2 (16 MB/s),
since the corruption problem only happens in UltraDMA mode. Use the
following command (as root) to select MWDMA mode 2 for device hda:

hdparm -X34 /dev/hda

(-X66 instead of -X34 switches back to UltraDMA mode 2.)

Finally, I wrote a small corruption detection tool when trying to
diagnose this kind of corruption on my own Blue&White G3, and I've
attached the source code. With some drives you may get data
corruption even if the IDE CRC check doesn't get triggered, so the
only way to be absolutely sure you're safe is to run a tool like this
and let it transfer a huge amount of data with no errors. It's not a
fully realistic test (it doesn't simulate random access patterns at
all), but it helped me out. You can compile it with the following
command:

g++ -O5 dctest.cpp -o dctest

Run the tool from a directory on the drive you want to test. A test
of 30 iterations, with 10 read passes per iteration and a 2047 MB
file, should catch almost any corruption problem. A test that size
will take many hours to run.

-- 
Tim Seufert




This archive was generated by hypermail 2a24 : Thu Nov 08 2001 - 20:14:04 MST