[ltp] Bad hard drive sector
Bill Sheppard
linux-thinkpad@linux-thinkpad.org
Sun, 15 Jan 2006 19:17:00 -0800
Hi,
Uwe Walter wrote:
> On So, 2006-01-15 at 10:16 -0800, Bill Sheppard wrote:
>
>> So it appears there is a bad sector (or sectors) on the hard drive.
>> It's a fairly new Toshiba MK6026GAX 60GB. I understand the drives are
>> supposed to reallocate bad sectors automatically, but it appears it
>> isn't since the error persists.
>>
> If the drive was unable to *read* the contents of the sector, it will not
> reallocate it yet (that would mean data loss), because the sector *might*
> become readable again in the future.
> Instead, it is marked as pending reallocation. Some drives show this in their
> SMART values, e.g.:
>
> 197 Current_Pending_Sector 0x0008 253 253 000 Old_age Offline - 1
>
Which is what I see.
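
For anyone who wants to check this attribute from a script, here is a minimal Python sketch. It is only an illustration and not part of smartmontools: the function names are made up, and it assumes the ten-column attribute layout that smartctl 5.33 prints in the attached output.

```python
def parse_attribute_line(line):
    """Split one row of the smartctl attribute table into named fields."""
    fields = line.split()
    # The first nine columns are single tokens; the raw value may contain
    # spaces (e.g. "31 (Lifetime Min/Max 18/60)"), so keep the rest together.
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "flag": fields[2],
        "value": int(fields[3]),
        "worst": int(fields[4]),
        "thresh": int(fields[5]),
        "type": fields[6],
        "updated": fields[7],
        "when_failed": fields[8],
        "raw": " ".join(fields[9:]),
    }


def pending_sectors(smartctl_output):
    """Return the raw Current_Pending_Sector count, or None if it is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute 197 is Current_Pending_Sector in the output shown here.
        if len(fields) >= 10 and fields[0] == "197":
            return int(parse_attribute_line(line)["raw"].split()[0])
    return None


# Row copied from the attached smartctl output.
sample = "197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 1"
print(pending_sectors(sample))
```

A non-zero result means the drive still has at least one sector waiting to be either rewritten or reallocated.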
>> Any ideas how I can correct this?
>>
> You have to give up the contents of this sector.
> Either low-level format the drive, or overwrite the damaged file with
> zeros yourself:
>
> http://smartmontools.sourceforge.net/BadBlockHowTo.txt
>
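
For reference, the howto finds the affected file by first converting the failing LBA into a file system block number, b = (L - S) * 512 / B, where L is the LBA, S is the partition's starting sector and B is the file system block size. A minimal Python sketch of that arithmetic follows. The partition start and block size used here are made-up placeholders, not values from this machine, and 512-byte sectors are assumed.

```python
SECTOR_SIZE = 512  # bytes per sector (assumed)


def lba_to_fs_block(lba, partition_start, block_size):
    """Map an absolute LBA to a block number inside a partition's file system."""
    if lba < partition_start:
        raise ValueError("LBA lies before the start of the partition")
    # Integer division: several sectors share one file system block.
    return (lba - partition_start) * SECTOR_SIZE // block_size


# 31051297 is the LBA reported by the extended self-test.
# partition_start=63 and block_size=4096 are HYPOTHETICAL; the real values
# would come from `fdisk -lu` and `tune2fs -l` on the disk in question.
block = lba_to_fs_block(31051297, partition_start=63, block_size=4096)
print(block)
```

The resulting block number is what the howto then feeds to debugfs to find out which file, if any, owns the block.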
I saw a reference suggesting this error can result from an incomplete write
operation and is fixed once the sector is rewritten. I know which file is
affected (/bin/vim), so I reinstalled the vim package, which I presumed would
overwrite the file in place. I can now run vim fine, but a long test with
smartctl still shows an error at the same LBA, so perhaps vim was rewritten
elsewhere and the bad sector is now free (or waiting to be written). I suppose
I could create a very large file to fill most of the free space so that every
free sector gets written, then delete it and see whether that clears the
error. I'll try that.
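
The fill-and-delete idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the function name, the size cap and the file name are hypothetical, and nothing guarantees that the file system will place the filler file on the pending sector.

```python
import errno
import os
import tempfile


def fill_and_remove(path, max_bytes, chunk_size=1024 * 1024):
    """Write zeros to `path` until `max_bytes` is reached or the disk is full,
    flush them to the device, then delete the file. Returns bytes written."""
    written = 0
    zeros = bytes(chunk_size)
    try:
        # Unbuffered, so write() reports what was actually handed to the OS.
        with open(path, "wb", buffering=0) as f:
            try:
                while written < max_bytes:
                    want = min(chunk_size, max_bytes - written)
                    written += f.write(zeros[:want])
            except OSError as e:
                # A full disk is the expected way for the loop to end.
                if e.errno != errno.ENOSPC:
                    raise
            # Ask the kernel to push the data to the drive before deleting.
            os.fsync(f.fileno())
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written


if __name__ == "__main__":
    # Small demonstration in a temporary directory; a real run would point
    # at the affected file system and use a much larger max_bytes.
    with tempfile.TemporaryDirectory() as d:
        print(fill_and_remove(os.path.join(d, "filler.bin"), 4 * 1024 * 1024))
```

Capping the size with max_bytes leaves some room for other processes on a running system; another extended self-test afterwards would show whether the pending sector was cleared.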
>> didn't find any other bad sectors in the testing, but might this mean
>> the drive is going to be unreliable?
>>
> Possibly, if the sector really became corrupt during normal operation.
> There are other possibilities, e.g. the drive being powered down
> halfway through writing the sector. In that case, the sector is
> logically corrupt (but not physically damaged).
>
> I wouldn't trust the drive to hold my most precious data any more.
>
> Post the full output of smartctl -a, but even with that, nobody can tell
> you for sure.
>
Attached, thanks for checking to see if I missed anything...
Bill
--
-------------------------------------------------------------------------
Bill Sheppard Industry Marketing Manager
bill.sheppard@sun.com Consumer and Mobile Systems Group
(408) 404-1254 (x68154) Sun Microsystems, Inc.
Attachment: smartctl.txt
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA MK6026GAX
Serial Number: 556U0335T
Firmware Version: PA202G
User Capacity: 60,011,642,880 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 6
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sun Jan 15 19:13:36 2006 PST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 112) The previous self-test completed having
                                        the read element of the test failed.
Total time to complete Offline
data collection:                 ( 239) seconds.
Offline data collection
capabilities:                    (0x1b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        No General Purpose Logging support.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  47) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       1338
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       984
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       1212
 10 Spin_Retry_Count        0x0033   119   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       347
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   093   093   000    Old_age   Always       -       72290
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       31 (Lifetime Min/Max 18/60)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       8341
222 Loaded_Hours            0x0032   099   099   000    Old_age   Always       -       692
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       233
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
SMART Error Log Version: 1
ATA Error Count: 67 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 67 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 1f ce d9 e1 Error: UNC at LBA = 0x01d9ce1f = 31051295
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 1f ce d9 e1 00 15:29:41.519 READ MULTIPLE
c5 00 08 cf 16 04 e6 00 15:29:41.518 WRITE MULTIPLE
e7 00 00 00 00 00 00 00 15:29:41.502 FLUSH CACHE
c5 00 08 8f d8 31 e6 00 15:29:41.501 WRITE MULTIPLE
e7 00 00 00 00 00 00 00 15:29:41.491 FLUSH CACHE
Error 66 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 1f ce d9 e1 Error: UNC at LBA = 0x01d9ce1f = 31051295
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 08 1f ce d9 e1 00 15:29:34.731 READ MULTIPLE
c4 00 08 4f ce d9 e1 00 15:29:34.729 READ MULTIPLE
c4 00 08 6f ce d9 e1 00 15:29:34.727 READ MULTIPLE
c4 00 08 77 c8 d9 e1 00 15:29:34.707 READ MULTIPLE
c4 00 10 87 be 9a e2 00 15:29:34.700 READ MULTIPLE
Error 65 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 ce 21 ce d9 e1 Error: UNC 206 sectors at LBA = 0x01d9ce21 = 31051297
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 d0 1f ce d9 e1 00 15:21:39.165 READ DMA
c8 00 d8 17 ce d9 e1 00 15:21:32.519 READ DMA
c8 00 08 a7 83 99 e2 00 15:21:32.518 READ DMA
c8 00 08 8f 83 99 e2 00 15:21:32.517 READ DMA
c8 00 08 3f 83 99 e2 00 15:21:32.501 READ DMA
Error 64 occurred at disk power-on lifetime: 1201 hours (50 days + 1 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 ce 21 ce d9 e1 Error: UNC 206 sectors at LBA = 0x01d9ce21 = 31051297
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 00 d8 17 ce d9 e1 00 15:21:32.519 READ DMA
c8 00 08 a7 83 99 e2 00 15:21:32.518 READ DMA
c8 00 08 8f 83 99 e2 00 15:21:32.517 READ DMA
c8 00 08 3f 83 99 e2 00 15:21:32.501 READ DMA
c8 00 08 9f 8f d1 e4 00 15:21:32.486 READ DMA
Error 63 occurred at disk power-on lifetime: 1181 hours (49 days + 5 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 08 1f ce d9 e1 Error: UNC at LBA = 0x01d9ce1f = 31051295
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c4 00 18 1f ce d9 e1 00 00:04:55.416 READ MULTIPLE
c4 00 00 37 cd d9 e1 00 00:04:48.938 READ MULTIPLE
c4 00 00 37 cc d9 e1 00 00:04:48.751 READ MULTIPLE
c4 00 00 37 cb d9 e1 00 00:04:48.651 READ MULTIPLE
c4 00 00 37 ca d9 e1 00 00:04:48.545 READ MULTIPLE
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       00%      1202         31051297
# 2  Extended offline    Completed: read failure       00%      1187         31051297
# 3  Extended offline    Completed without error       00%       169         -
# 4  Short offline       Completed without error       00%       168         -
# 5  Short offline       Completed without error       00%       127         -
# 6  Short offline       Completed without error       00%        97         -
# 7  Short offline       Completed without error       00%        24         -
Device does not support Selective Self Tests/Logging
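
As a cross-check on the error log above, the LBA that smartctl prints can be rebuilt from the register dump. The Python sketch below assumes 28-bit LBA addressing, with the low four bits of DH holding LBA bits 24-27, followed by CH, CL and SN. The function name is made up, and reading the SC register of Error 65 as "sectors not yet transferred" is an assumption rather than something the log states.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA task-file registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after Error 67: SN=1f CL=ce CH=d9 DH=e1
lba = lba28_from_registers(sn=0x1F, cl=0xCE, ch=0xD9, dh=0xE1)
print(hex(lba), lba)

# Error 65: READ DMA asked for 0xd0 sectors starting at the LBA above, and
# SC read 0xce afterwards. If SC counts the sectors still outstanding
# (ASSUMPTION), the failure is (0xd0 - 0xce) sectors past the start.
requested = 0xD0
remaining = 0xCE
failing = lba + (requested - remaining)
print(failing)
```

Under that reading, both numbers agree with the log: the first matches the LBA shown for Error 67, and the second matches the LBA_of_first_error reported by the two failed extended self-tests.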