[ltp] Bad hard drive sector

Bill Sheppard linux-thinkpad@linux-thinkpad.org
Sun, 15 Jan 2006 19:17:00 -0800



Hi,

Uwe Walter wrote:
> On Sun, 2006-01-15 at 10:16 -0800, Bill Sheppard wrote:
>   
>> So it appears there is a bad sector (or sectors) on the hard drive.
>> It's a fairly new Toshiba MK6026GAX 60GB. I understand the drives are
>> supposed to reallocate bad sectors automatically, but it apparently
>> isn't doing so, since the error persists.
>>     
> If the drive was unable to *read* the content of the sector, it will not
> reallocate it yet (that would lose the data), because it *might* be
> possible to read the sector again in the future.
> Instead, the sector is marked as pending reallocation. Some drives show
> this in their SMART values, e.g.:
>
> 197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       1
>   
Which is what I see.
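To keep an eye on that counter after each attempted fix, the raw value of attribute 197 can be pulled out of saved `smartctl -a` output. A minimal sketch, assuming the attribute-table layout printed by smartctl 5.33 in the attachment (other versions may lay out columns differently); `raw_attribute` is a hypothetical helper, not part of smartmontools:

```python
from typing import Optional


def raw_attribute(smartctl_text: str, attr_id: int) -> Optional[str]:
    """Return the RAW_VALUE text for SMART attribute attr_id, or None."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED
        # WHEN_FAILED RAW_VALUE. Requiring FLAG to start with "0x" keeps
        # numeric error-log register rows from matching by accident.
        if (
            len(fields) >= 10
            and fields[0].isdigit()
            and fields[2].startswith("0x")
            and int(fields[0]) == attr_id
        ):
            # RAW_VALUE may contain spaces, e.g. "31 (Lifetime Min/Max 18/60)".
            return " ".join(fields[9:])
    return None
```

Run against the attached output, `raw_attribute(text, 197)` returns `"1"`; a value of `"0"` after a rewrite would suggest the pending sector was cleared or reallocated.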
>> Any ideas how I can correct this?
>>     
> You must give up the content of this sector.
> Either low-level format the drive, or overwrite the affected file's bad
> block with zeros yourself:
>
> http://smartmontools.sourceforge.net/BadBlockHowTo.txt
>   
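Zeroing the right spot by hand means mapping the failing LBA to the filesystem block that contains it. A sketch of that arithmetic, assuming 512-byte sectors; `partition_start` (the partition's first sector, from the partition table) and `block_size` (from the filesystem superblock) are placeholders that must be looked up on the real system, and the function name is hypothetical:

```python
SECTOR_SIZE = 512  # bytes per LBA sector on this drive (assumed)


def lba_to_fs_block(lba: int, partition_start: int, block_size: int = 4096) -> int:
    """Map an absolute disk LBA to a block number inside one partition."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of this partition")
    # Byte offset of the failing sector from the start of the partition.
    offset_bytes = (lba - partition_start) * SECTOR_SIZE
    # Integer division: every sector inside a block maps to that block.
    return offset_bytes // block_size
```

For example, with a hypothetical partition starting at sector 63 and 4096-byte blocks, `lba_to_fs_block(31051297, 63)` gives block 3881404. On ext2/ext3, the `debugfs` commands `icheck` (block to inode) and `ncheck` (inode to pathname) can then identify the file that owns the block.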
I saw a reference to this error possibly being caused by an incomplete write
operation, and being fixed by rewriting the sector. I know which file is
affected (/bin/vim), so I reinstalled the vim package, which I presumed would
overwrite the file in place. I can now run vim fine, but a long self-test
with smartctl still shows an error in the same place, so perhaps vim got
rewritten elsewhere and the bad sector is now free space (still waiting
to be written). I suppose I could create a very large file to fill most
of the empty space to ensure it's all written, then delete it, and see
if that corrects the error. I'll try that.
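The fill-and-delete idea can be sketched in Python as below. This is an illustration under assumptions, not a tested recovery tool: the function name and the scratch file name `zero-fill.tmp` are hypothetical, and `max_bytes` exists only so the sketch can be tried safely. It can only rewrite free blocks: if the pending sector still belongs to a file or to filesystem metadata, it is not touched, and on ext2/ext3 the blocks reserved for root are only written when running as root.

```python
import errno
import os
from typing import Optional


def fill_free_space(directory: str, chunk_size: int = 1 << 20,
                    max_bytes: Optional[int] = None) -> int:
    """Write zeros into a scratch file until the filesystem reports ENOSPC
    (or max_bytes is reached), sync it, then delete it.
    Returns the number of bytes written."""
    path = os.path.join(directory, "zero-fill.tmp")  # hypothetical name
    zeros = memoryview(bytes(chunk_size))
    written = 0
    # O_EXCL: refuse to clobber an existing file with the same name.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        while max_bytes is None or written < max_bytes:
            want = chunk_size if max_bytes is None else min(chunk_size, max_bytes - written)
            try:
                written += os.write(fd, zeros[:want])
            except OSError as exc:
                if exc.errno == errno.ENOSPC:
                    break  # filesystem full: no free blocks left to write
                raise
        os.fsync(fd)  # ask the kernel to push the zeros out to the disk
    finally:
        os.close(fd)
        os.remove(path)  # give the space back
    return written
```

After it finishes, rerunning the long self-test shows whether the read failure at the same LBA is gone.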
>> I didn't find any other bad sectors in the testing, but might this mean
>> the drive is going to be unreliable?
>>     
> Possibly, if the sector really became corrupt during normal operation.
> There are other possibilities, e.g. the drive may have been powered down
> exactly halfway through writing the sector. In that case, the sector is
> logically corrupt (but not physically).
>
> I wouldn't trust the drive to hold my most precious data any more.
>
> Post the full output of smartctl -a, but even with that, nobody can tell
> you for sure.
>   
Attached. Thanks for checking whether I missed anything.

Bill

-- 
-------------------------------------------------------------------------
Bill Sheppard                                  Industry Marketing Manager
bill.sheppard@sun.com                   Consumer and Mobile Systems Group
(408) 404-1254 (x68154)                            Sun Microsystems, Inc.


[Attachment: smartctl.txt]

smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     TOSHIBA MK6026GAX
Serial Number:    556U0335T
Firmware Version: PA202G
User Capacity:    60,011,642,880 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jan 15 19:13:36 2006 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 112)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		 ( 239) seconds.
Offline data collection
capabilities: 			 (0x1b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  47) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1338
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       984
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       1212
 10 Spin_Retry_Count        0x0033   119   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       347
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always       -       72290
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Lifetime Min/Max 18/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       8341
222 Loaded_Hours            0x0032   099   099   000    Old_age   Always       -       692
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       233
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 67 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
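The 49.710-day wrap is consistent with a 32-bit millisecond counter (2^32 ms is about 49.71 days). A small sketch, assuming exactly that, which formats a raw millisecond count in the DDd+hh:mm:SS.sss form described above; the entries below print only hh:mm:SS.sss, presumably because the day count is zero. The names are hypothetical:

```python
MS_PER_DAY = 24 * 60 * 60 * 1000
WRAP_MS = 2 ** 32  # assumed: a 32-bit millisecond counter


def format_powered_up_time(ms: int) -> str:
    """Format milliseconds since power-on as DDd+hh:mm:SS.sss, applying the wrap."""
    ms %= WRAP_MS
    days, rem = divmod(ms, MS_PER_DAY)
    hours, rem = divmod(rem, 60 * 60 * 1000)
    minutes, rem = divmod(rem, 60 * 1000)
    seconds, millis = divmod(rem, 1000)
    return f"{days:02d}d+{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


print(f"wrap after {WRAP_MS / MS_PER_DAY:.3f} days")  # wrap after 49.710 days
```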

Error 67 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f ce d9 e1  Error: UNC at LBA = 0x01d9ce1f = 31051295

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 08 1f ce d9 e1 00      15:29:41.519  READ MULTIPLE
  c5 00 08 cf 16 04 e6 00      15:29:41.518  WRITE MULTIPLE
  e7 00 00 00 00 00 00 00      15:29:41.502  FLUSH CACHE
  c5 00 08 8f d8 31 e6 00      15:29:41.501  WRITE MULTIPLE
  e7 00 00 00 00 00 00 00      15:29:41.491  FLUSH CACHE

Error 66 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 1f ce d9 e1  Error: UNC at LBA = 0x01d9ce1f = 31051295

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 08 1f ce d9 e1 00      15:29:34.731  READ MULTIPLE
  c4 00 08 4f ce d9 e1 00      15:29:34.729  READ MULTIPLE
  c4 00 08 6f ce d9 e1 00      15:29:34.727  READ MULTIPLE
  c4 00 08 77 c8 d9 e1 00      15:29:34.707  READ MULTIPLE
  c4 00 10 87 be 9a e2 00      15:29:34.700  READ MULTIPLE

Error 65 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ce 21 ce d9 e1  Error: UNC 206 sectors at LBA = 0x01d9ce21 = 31051297

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 d0 1f ce d9 e1 00      15:21:39.165  READ DMA
  c8 00 d8 17 ce d9 e1 00      15:21:32.519  READ DMA
  c8 00 08 a7 83 99 e2 00      15:21:32.518  READ DMA
  c8 00 08 8f 83 99 e2 00      15:21:32.517  READ DMA
  c8 00 08 3f 83 99 e2 00      15:21:32.501  READ DMA

Error 64 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 ce 21 ce d9 e1  Error: UNC 206 sectors at LBA = 0x01d9ce21 = 31051297

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 d8 17 ce d9 e1 00      15:21:32.519  READ DMA
  c8 00 08 a7 83 99 e2 00      15:21:32.518  READ DMA
  c8 00 08 8f 83 99 e2 00      15:21:32.517  READ DMA
  c8 00 08 3f 83 99 e2 00      15:21:32.501  READ DMA
  c8 00 08 9f 8f d1 e4 00      15:21:32.486  READ DMA

Error 63 occurred at disk power-on lifetime: 1181 hours (49 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 1f ce d9 e1  Error: UNC at LBA = 0x01d9ce1f = 31051295

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c4 00 18 1f ce d9 e1 00      00:04:55.416  READ MULTIPLE
  c4 00 00 37 cd d9 e1 00      00:04:48.938  READ MULTIPLE
  c4 00 00 37 cc d9 e1 00      00:04:48.751  READ MULTIPLE
  c4 00 00 37 cb d9 e1 00      00:04:48.651  READ MULTIPLE
  c4 00 00 37 ca d9 e1 00      00:04:48.545  READ MULTIPLE
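The LBA that smartctl prints for each error can be rebuilt from the register dump. A sketch, assuming 28-bit LBA addressing (DH bit 6 set, as in 0xe1): the low four bits of DH, then CH, CL and SN, form the address. ER 0x40 is the UNC (uncorrectable data) bit, and in errors 64 and 65 the SC value 0xce is the 206-sector count smartctl reports. The function names are illustrative:

```python
from typing import Tuple


def decode_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH task-file registers."""
    if not dh & 0x40:
        raise ValueError("DH bit 6 is clear: registers hold a CHS address, not LBA")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def decode_error_row(row: str) -> Tuple[int, int, int]:
    """Parse an 'ER ST SC SN CL CH DH' row; return (ER, sector count, LBA)."""
    er, _st, sc, sn, cl, ch, dh = (int(x, 16) for x in row.split()[:7])
    return er, sc, decode_lba28(sn, cl, ch, dh)


print(decode_error_row("40 51 00 1f ce d9 e1"))  # (64, 0, 31051295)
```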

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%      1202         31051297
# 2  Extended offline    Completed: read failure       00%      1187         31051297
# 3  Extended offline    Completed without error       00%       169         -
# 4  Short offline       Completed without error       00%       168         -
# 5  Short offline       Completed without error       00%       127         -
# 6  Short offline       Completed without error       00%        97         -
# 7  Short offline       Completed without error       00%        44         -
# 8  Short offline       Completed without error       00%        24         -
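To check whether later extended self-tests still fail at the same spot, the self-test log can be scanned for rows that report a first-error LBA. A sketch that assumes the column layout printed by smartctl 5.33 above (other versions may differ); the regular expression and function name are hypothetical:

```python
import re
from typing import List, Tuple

# Num, description, status, remaining %, lifetime hours, LBA (or "-").
SELF_TEST_ROW = re.compile(
    r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$"
)


def failed_self_tests(log_text: str) -> List[Tuple[int, int, int]]:
    """Return (test number, lifetime hours, first-error LBA) for failed tests."""
    failures = []
    for line in log_text.splitlines():
        m = SELF_TEST_ROW.match(line.strip())
        # A "-" in the last column means the test found no error.
        if m and m.group(6) != "-":
            failures.append((int(m.group(1)), int(m.group(5)), int(m.group(6))))
    return failures
```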

Device does not support Selective Self Tests/Logging
