DRDY hard disk error: is it a crash?


I'm using an IBM ThinkPad (1.7 GHz, 512 MB RAM) with Linux Mint 9 installed. I have two partitions in addition to root.

One of the partitions became read-only yesterday, after which I rebooted my system. It is now extremely slow and reports a DRDY error: has my hard drive crashed? This is the error log during boot:

Differences between boot sector and its backup.
failed command : READ DMA
BMDMA : stat 0X25
ata 1.00 : status : { DRDY ERR }
ata 1.00 : status :{ UNC }
Buffer I/O error on logical device, logical block 65467

smartctl output for the partition:

mint mint # smartctl -a /dev/sda1
smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Device Model:     TOSHIBA MK4026GAX RoHS
Serial Number:    X5LY1623T
Firmware Version: PA107E
User Capacity:    40,007,761,920 bytes
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Thu Feb 17 06:48:25 2011 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:          ( 153) seconds.
Offline data collection
capabilities:              (0x1b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    No Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    No General Purpose Logging support.
Short self-test routine 
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      (  30) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
  1 Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       310
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       3968
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       40
  7 Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
  9 Power_On_Hours          0x0032   082   082   000    Old_age   Always       -       7257
 10 Spin_Retry_Count        0x0033   179   100   030    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3484
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       489
193 Load_Cycle_Count        0x0032   064   064   000    Old_age   Always       -       367150
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       36 (Lifetime Min/Max 14/57)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       33
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       82
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
199 UDMA_CRC_Error_Count    0x0032   200   253   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       101
222 Loaded_Hours            0x0032   085   085   000    Old_age   Always       -       6146
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       227
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0

SMART Error Log Version: 1
ATA Error Count: 2371 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 2371 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 05 1a 1b 00 e0  Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 05 1a 1b 00 e0 00      00:03:10.061  READ DMA
  f8 00 00 00 00 00 e0 00      00:03:10.061  READ NATIVE MAX ADDRESS
  ec 00 00 00 00 00 a0 02      00:03:10.053  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:03:10.053  SET FEATURES [Set transfer mode]
  f8 00 00 00 00 00 e0 00      00:03:10.053  READ NATIVE MAX ADDRESS

Error 2370 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 05 1a 1b 00 e0  Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 05 1a 1b 00 e0 00      00:03:03.328  READ DMA
  f8 00 00 00 00 00 e0 00      00:03:03.327  READ NATIVE MAX ADDRESS
  ec 00 00 00 00 00 a0 02      00:03:03.320  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:03:03.319  SET FEATURES [Set transfer mode]
  f8 00 00 00 00 00 e0 00      00:03:03.319  READ NATIVE MAX ADDRESS

Error 2369 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 05 1a 1b 00 e0  Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 05 1a 1b 00 e0 00      00:02:56.582  READ DMA
  f8 00 00 00 00 00 e0 00      00:02:56.582  READ NATIVE MAX ADDRESS
  ec 00 00 00 00 00 a0 02      00:02:56.574  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:02:56.574  SET FEATURES [Set transfer mode]
  f8 00 00 00 00 00 e0 00      00:02:56.574  READ NATIVE MAX ADDRESS

Error 2368 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 05 1a 1b 00 e0  Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 05 1a 1b 00 e0 00      00:02:49.809  READ DMA
  f8 00 00 00 00 00 e0 00      00:02:49.809  READ NATIVE MAX ADDRESS
  ec 00 00 00 00 00 a0 02      00:02:49.801  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:02:49.801  SET FEATURES [Set transfer mode]
  f8 00 00 00 00 00 e0 00      00:02:49.801  READ NATIVE MAX ADDRESS

Error 2367 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  -- -- -- -- -- -- --
  40 51 05 1a 1b 00 e0  Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 05 1a 1b 00 e0 00      00:02:43.056  READ DMA
  f8 00 00 00 00 00 e0 00      00:02:43.056  READ NATIVE MAX ADDRESS
  ec 00 00 00 00 00 a0 02      00:02:43.048  IDENTIFY DEVICE
  ef 03 45 00 00 00 a0 02      00:02:43.048  SET FEATURES [Set transfer mode]
  f8 00 00 00 00 00 e0 00      00:02:43.047  READ NATIVE MAX ADDRESS

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Device does not support Selective Self Tests/Logging

Do I need to get a new hard drive for my PC?

In addition to the "ata 1.00: status" line, there should be one reading "ata 1.00: error ..." that tells you the exact error.



The SMART error log contains useful information:

Error: UNC 5 sectors at LBA = 0x00001b1a = 6938

UNC means an uncorrectable error. The last command was a READ DMA, so this is a read error. It looks like the five sectors starting at LBA 6938 (6938 through 6942) are not readable.
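The LBA in that log line can be reproduced from the register bytes smartctl prints. The following is a minimal sketch, assuming the seven bytes are in the order ER ST SC SN CL CH DH (as the legend in the smartctl output suggests) and that the drive uses 28-bit LBA addressing; the helper name `lba28_from_registers` is made up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine the ATA registers into a 28-bit LBA.

    SN holds bits 0-7, CL bits 8-15, CH bits 16-23, and the low
    nibble of DH holds bits 24-27.
    """
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Register bytes after the failed command, copied from the SMART error log.
# Assumed order: ER ST SC SN CL CH DH
registers = bytes.fromhex("40 51 05 1a 1b 00 e0")
er, st, sc, sn, cl, ch, dh = registers

first = lba28_from_registers(sn, cl, ch, dh)
last = first + sc - 1  # SC is the sector count reported with the error
print(f"LBA {first:#010x} = {first}, {sc} sectors: {first}..{last}")
```

A count of 5 starting at LBA 6938 covers LBAs 6938 through 6942.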

In addition, the SMART attributes show 40 sectors that were successfully reallocated, 82 sectors pending reallocation, and 1 uncorrectable error (probably the one from the log):

  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       40
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       82
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
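As a rough illustration, these three raw values can be pulled out of the attribute table automatically. This is only a sketch: it assumes the ten-column layout shown above (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw value), and the function name `nonzero_raw_values` is hypothetical.

```python
# The three attribute lines quoted above, pasted verbatim.
ATTRIBUTE_LINES = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   050    Pre-fail  Always       -       40
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       82
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       1
"""

# Attribute IDs discussed in the answer: reallocated, pending, uncorrectable.
WATCHED_IDS = {5, 197, 198}


def nonzero_raw_values(text):
    """Return {attribute name: raw value} for watched attributes above zero."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that is not a ten-column attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id = int(fields[0])
        raw = fields[9]
        if attr_id in WATCHED_IDS and raw.isdigit() and int(raw) > 0:
            found[fields[1]] = int(raw)
    return found


print(nonzero_raw_values(ATTRIBUTE_LINES))
```

On a healthy drive all three raw values would normally be 0, so any entry in the result is worth attention.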

Everything indicates that the drive is failing, so back up your data immediately. If you cannot copy the data because of the errors, use ddrescue to create an image of the partition while skipping the bad blocks. This tutorial is very helpful.
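To make "imaging while skipping bad blocks" concrete, here is a toy sketch of the idea only. It is not how ddrescue is implemented (the real tool keeps a map file, retries, and works in several passes) and it is no substitute for it. The 512-byte block size, the `copy_skipping_errors` helper, and the simulated device `fake_read` are all assumptions made for illustration.

```python
import io

BLOCK_SIZE = 512  # assumed sector size in bytes


def copy_skipping_errors(read_block, total_blocks, out):
    """Copy blocks to `out`, zero-filling any block that fails to read.

    `read_block(index)` is a hypothetical callable that returns BLOCK_SIZE
    bytes or raises OSError. Returns the list of unreadable block indexes.
    """
    bad = []
    for index in range(total_blocks):
        try:
            data = read_block(index)
        except OSError:
            # Keep the image aligned: write zeros in place of the bad block.
            data = bytes(BLOCK_SIZE)
            bad.append(index)
        out.write(data)
    return bad


def fake_read(index):
    """Simulated device: block 2 is unreadable, others hold their index."""
    if index == 2:
        raise OSError("simulated unreadable sector")
    return bytes([index]) * BLOCK_SIZE


image = io.BytesIO()
bad_blocks = copy_skipping_errors(fake_read, 4, image)
print(bad_blocks, len(image.getvalue()))
```

The point of the zero-fill is that every readable block ends up at its original offset in the image, so a file system check can later be run on the copy instead of on the failing drive.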

Aleix Mercader
You should treat DRDY ERRs as a fatal hardware failure.

I think 'DRDY' is the part of the error message that isn't so bad:
Daniel Beck
DRDY is just the device-ready flag. ERR, and especially UNC (uncorrectable), are the problem.
Aleix Mercader
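The distinction between the flags can be seen by decoding the two register bytes from the SMART error log, ST = 0x51 and ER = 0x40. This is a minimal sketch: the bit tables list only the flags the Linux kernel log prints, and the `decode` helper is a made-up name.

```python
# Status register bits (only the flags the kernel log prints by name).
STATUS_BITS = {0x80: "BUSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
# Error register bits (same caveat).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT", 0x01: "AMNF"}


def decode(value, names):
    """Return the names of all bits in `names` that are set in `value`."""
    return [name for bit, name in names.items() if value & bit]


print("status:", decode(0x51, STATUS_BITS))  # ST byte from the error log
print("error: ", decode(0x40, ERROR_BITS))   # ER byte from the error log
```

DRDY only says the device is ready to accept commands; ERR says the last command failed, and the error register then gives the reason, here UNC.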