Last modified: 2015-10-24 (Sat) 16:42:51


SMART error (CurrentPendingSector) detected on host (ZFS edition)

# zpool status
  pool: tank
 state: ONLINE
  scan: scrub canceled on Fri Aug 14 08:50:50 2015
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0

errors: No known data errors

  pool: zfspool
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        zfspool     ONLINE       0     0     0
          ada0p3    ONLINE       0     0     0

errors: No known data errors

A FreeBSD 9.x server with the pool layout above sent the following email:

SMART error (CurrentPendingSector) detected on host: blackcube.smb.net (August 13, 2015)

This email was generated by the smartd daemon running on:

  host name: blackcube.smb.net
  DNS domain: smb.net
  NIS domain: 

The following warning/error was logged by the smartd daemon:

Device: /dev/ada1, 2 Currently unreadable (pending) sectors


For details see host's SYSLOG.

You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.

The log looks like this:

root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors

The self-test log of /dev/ada1 looks like this:

# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     51597         -
# 2  Short offline       Interrupted (host reset)      00%     51573         -
# 3  Short offline       Completed without error       00%     51549         -
# 4  Short offline       Interrupted (host reset)      00%     51525         -
# 5  Short offline       Interrupted (host reset)      00%     51501         -
# 6  Short offline       Completed without error       00%     51477         -
# 7  Extended offline    Completed: read failure       90%     51457         51978742
# 8  Short offline       Interrupted (host reset)      00%     51453         -
# 9  Short offline       Interrupted (host reset)      00%     51429         -
#10  Short offline       Interrupted (host reset)      00%     51405         -
#11  Short offline       Completed without error       00%     51381         -
#12  Short offline       Completed without error       00%     51357         -
#13  Short offline       Completed without error       00%     51333         -
#14  Short offline       Completed without error       00%     51309         -
#15  Extended offline    Completed: read failure       90%     51296         51978742
#16  Extended offline    Completed: read failure       90%     51289         51978742
#17  Short offline       Interrupted (host reset)      00%     51285         -
#18  Short offline       Completed without error       00%     51261         -
#19  Short offline       Completed without error       00%     51237         -
#20  Short offline       Completed without error       00%     51213         -
#21  Short offline       Completed without error       00%     51173         -

So, using the following formula:

     b = (int)((L-S)*512/B)
      where:
      b = File System block number
      B = File system block size in bytes
      L = LBA of bad sector
      S = Starting sector of partition as shown by fdisk -lu
      and (int) denotes the integer part.

Substituting S = 0, B = 4096, L = 51978742 into the formula:

b = (int)(51978742 * 512 / 4096) = (int) 6497342.75 (fractional part truncated)
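The substitution can be checked with shell integer arithmetic, which truncates the fractional part in the same way as (int). This is a minimal sketch that assumes 512-byte logical sectors, as the formula does; `lba_to_block` is a hypothetical helper name and not part of smartmontools.

```sh
#!/bin/sh
# lba_to_block: hypothetical helper, not a smartmontools command.
# Usage: lba_to_block LBA START_SECTOR BLOCK_SIZE_BYTES
# Implements b = (int)((L - S) * 512 / B); integer division truncates.
lba_to_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# L = 51978742, S = 0, B = 4096
lba_to_block 51978742 0 4096    # prints 6497342
```

The same function works for other block sizes by changing the third argument.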

So:

# dd if=/dev/zero of=/dev/ada1 bs=4096 count=1 seek=6497334

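I tried this, but nothing changed. (At bs=4096, seek=6497334 covers LBAs 51978672 to 51978679, which does not include LBA 51978742.)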

So I tried the version of the formula with a 16k block size:

      b = (int)((L-S)*512/B)
      where:
      b = File System block number
      B = File system block size in bytes (dumpfs 16384)
      L = LBA of bad sector
      S = Starting sector of partition as shown by fdisk
      and (int) denotes the integer part.

Substituting S = 0, B = 16384, L = 51978742 into the formula:

b = (int)(51978742 * 512 / 16384) = (int) 1624335.6875 (fractional part truncated)
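Because the block number is truncated, it can help to confirm that the block written by dd still covers the bad LBA. The sketch below goes the other way, from a dd seek value back to the range of 512-byte LBAs it covers; `block_lba_range` is a hypothetical helper name, and the 512-byte sector size is an assumption carried over from the formula.

```sh
#!/bin/sh
# block_lba_range: hypothetical helper.
# Usage: block_lba_range SEEK BLOCK_SIZE_BYTES
# Prints the first and last 512-byte LBA covered by one dd block
# written at offset SEEK (counted in units of BLOCK_SIZE_BYTES).
block_lba_range() {
    sectors=$(( $2 / 512 ))
    first=$(( $1 * sectors ))
    last=$(( first + sectors - 1 ))
    echo "$first $last"
}

block_lba_range 1624335 16384   # prints 51978720 51978751
```

LBA 51978742 falls inside that range, so a single 16k write at seek=1624335 touches it.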

So:

# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
# dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=1624335
1+0 records in
1+0 records out
16384 bytes transferred in 0.000646 secs (25367101 bytes/sec)
# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
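The three-step sequence above (set the GEOM debug flag, overwrite one block, clear the flag) can be generated from an LBA to avoid transcription mistakes. The sketch below only prints the commands and does not touch any disk; `print_overwrite_cmds` is a hypothetical helper name, and it assumes a whole-disk device (S = 0) with 512-byte logical sectors.

```sh
#!/bin/sh
# print_overwrite_cmds: hypothetical helper. It prints commands only;
# nothing is written to the device.
# Usage: print_overwrite_cmds DEVICE LBA BLOCK_SIZE_BYTES
print_overwrite_cmds() {
    seek=$(( $2 * 512 / $3 ))   # S = 0 assumed (whole-disk device)
    echo "sysctl kern.geom.debugflags=0x10"
    echo "dd if=/dev/zero of=$1 bs=$3 count=1 seek=$seek"
    echo "sysctl kern.geom.debugflags=0"
}

print_overwrite_cmds /dev/ada1 51978742 16384
```

Reviewing the printed dd line before running it by hand keeps the destructive step explicit.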

After this, the log changed as follows:

root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors (changed -1)
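To track the pending-sector count without reading the whole log, the smartd lines can be filtered with awk. This is a sketch that assumes the syslog line format shown above; `latest_pending` is a hypothetical helper name.

```sh
#!/bin/sh
# latest_pending: hypothetical helper.
# Usage: latest_pending DEVICE < syslog-lines
# Prints the count from the last smartd
# "Currently unreadable (pending) sectors" line for DEVICE.
latest_pending() {
    awk -v dev="$1," '
        /smartd/ && /Currently unreadable \(pending\) sectors/ {
            # The count is the field right after "DEVICE,".
            for (i = 1; i < NF; i++)
                if ($i == dev) count = $(i + 1)
        }
        END { if (count != "") print count }
    '
}

latest_pending /dev/ada1 <<'EOF'
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)
EOF
```

On the host itself, the input would come from `grep -i "smartd" /var/log/messages` instead of the here-document.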

The error count has dropped from 2 to 1. Looking at the disk:

# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     51597         -
# 2  Short offline       Interrupted (host reset)      00%     51573         -
# 3  Short offline       Completed without error       00%     51549         -
# 4  Short offline       Interrupted (host reset)      00%     51525         -
# 5  Short offline       Interrupted (host reset)      00%     51501         -
# 6  Short offline       Completed without error       00%     51477         -
# 7  Extended offline    Completed: read failure       90%     51457         51978742
# 8  Short offline       Interrupted (host reset)      00%     51453         -
# 9  Short offline       Interrupted (host reset)      00%     51429         -
#10  Short offline       Interrupted (host reset)      00%     51405         -
#11  Short offline       Completed without error       00%     51381         -
#12  Short offline       Completed without error       00%     51357         -
#13  Short offline       Completed without error       00%     51333         -
#14  Short offline       Completed without error       00%     51309         -
#15  Extended offline    Completed: read failure       90%     51296         51978742
#16  Extended offline    Completed: read failure       90%     51289         51978742
#17  Short offline       Interrupted (host reset)      00%     51285         -
#18  Short offline       Completed without error       00%     51261         -
#19  Short offline       Completed without error       00%     51237         -
#20  Short offline       Completed without error       00%     51213         -
#21  Short offline       Completed without error       00%     51173         -

No change!?

So I ran the test again:

# smartctl --test=long /dev/ada1
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 116 minutes for test to complete.
Test will complete after Fri Aug 14 11:46:40 2015 

Use smartctl -X to abort test.

This time, a different location...?

# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     51603         101630056
# 2  Short offline       Completed without error       00%     51597         -
# 3  Short offline       Interrupted (host reset)      00%     51573         -
# 4  Short offline       Completed without error       00%     51549         -
# 5  Short offline       Interrupted (host reset)      00%     51525         -
# 6  Short offline       Interrupted (host reset)      00%     51501         -
# 7  Short offline       Completed without error       00%     51477         -
# 8  Extended offline    Completed: read failure       90%     51457         51978742
# 9  Short offline       Interrupted (host reset)      00%     51453         -
#10  Short offline       Interrupted (host reset)      00%     51429         -
#11  Short offline       Interrupted (host reset)      00%     51405         -
#12  Short offline       Completed without error       00%     51381         -
#13  Short offline       Completed without error       00%     51357         -
#14  Short offline       Completed without error       00%     51333         -
#15  Short offline       Completed without error       00%     51309         -
#16  Extended offline    Completed: read failure       90%     51296         51978742
#17  Extended offline    Completed: read failure       90%     51289         51978742
#18  Short offline       Interrupted (host reset)      00%     51285         -
#19  Short offline       Completed without error       00%     51261         -
#20  Short offline       Completed without error       00%     51237         -
#21  Short offline       Completed without error       00%     51213         -
# grep -i "smartd" /var/log/messages | tail
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, previous self-test completed with error (read test element)
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, Self-Test Log error count increased from 3 to 4
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
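The distinct failing LBAs can be pulled out of the self-test log instead of being read off by eye. This is a sketch that assumes the column layout shown above, where LBA_of_first_error is the last field of each row; `failed_lbas` is a hypothetical helper name.

```sh
#!/bin/sh
# failed_lbas: hypothetical helper.
# Reads `smartctl --log=selftest` output on stdin and prints each distinct
# LBA_of_first_error from "read failure" rows, in ascending order.
failed_lbas() {
    awk '/read failure/ && $NF ~ /^[0-9]+$/ { print $NF }' | sort -n -u
}

failed_lbas <<'EOF'
# 1  Extended offline    Completed: read failure       90%     51603         101630056
# 2  Short offline       Completed without error       00%     51597         -
# 8  Extended offline    Completed: read failure       90%     51457         51978742
#16  Extended offline    Completed: read failure       90%     51296         51978742
EOF
```

For this sample the output is 51978742 followed by 101630056, matching the two locations seen so far.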

Calculation:

b = (int)(101630056 * 512 / 16384) = (int) 3175939.25 (fractional part truncated)

# sysctl kern.geom.debugflags=0x10
# dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=3175939
# sysctl kern.geom.debugflags=0

FreeBSD 11-CURRENT BS 4096

During installation, the HDD block size was set to 4k (the default) in the zroot configuration.

root@blackhole:~ # smartctl --test=short /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Sep 22 11:15:30 2015

Use smartctl -X to abort test.
root@blackhole:~ # smartctl /dev/ada0 --log=selftest
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      1715         -
# 2  Short offline       Completed without error       00%      1707         -
# 3  Extended offline    Interrupted (host reset)      70%      1694         -
# 4  Short offline       Completed without error       00%      1692         -
# 5  Short offline       Completed without error       00%      1692         -
# 6  Short offline       Completed without error       00%      1692         -
# 7  Short offline       Completed: read failure       90%      1692         3480882840
# 8  Short offline       Completed without error       00%      1692         -
# 9  Short offline       Completed without error       00%      1692         -
#10  Short offline       Completed without error       00%      1691         -
#11  Short offline       Completed without error       00%      1691         -
#12  Short offline       Completed without error       00%      1691         -
#13  Short offline       Completed: read failure       90%      1691         3480882832
#14  Short offline       Completed without error       00%      1683         -
#15  Short captive       Completed: read failure       90%      1605         3480882828
#16  Short captive       Completed: read failure       90%      1605         3480882830
#17  Short captive       Completed: read failure       90%      1605         3480882824
#18  Short captive       Completed: read failure       90%      1605         3480882826
#19  Short captive       Completed: read failure       90%      1605         3480882828
#20  Short captive       Completed: read failure       90%      1605         3480882830
#21  Short offline       Completed: read failure       90%      1605         3480882824
     b = (int)((L-S)*512/B)
      where:
      b = File System block number
      B = File system block size in bytes
      L = LBA of bad sector
      S = Starting sector of partition as shown by fdisk -lu
      and (int) denotes the integer part.

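Substituting S = 0, B = 4096, L = 3480882840 into the formula:

b = (int)(3480882840 * 512 / 4096) = (int) 435110355
(The value originally recorded here, 424912454.8339844, equals 3480882830 * 500 / 4096; it is the seek value used in the commands below.)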
root@blackhole:~ # sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
root@blackhole:~ # dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=424912454
1+0 records in
1+0 records out
4096 bytes transferred in 0.000181 secs (22647226 bytes/sec)
root@blackhole:~ # sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
root@blackhole:~ #
  380  11:17   smartctl /dev/ada0 --log=selftest
  381  11:32   kern.geom.debugflags: 0 - > 16
  382  11:32   sysctl kern.geom.debugflags=0x10
  383  11:35   dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=424912454
  384  11:36   sysctl kern.geom.debugflags=0
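The self-test log on this disk reports several failing LBAs within a few sectors of each other, so it is worth checking how many distinct 4k blocks they fall into. The sketch below assumes 512-byte logical sectors and S = 0; `lbas_to_blocks` is a hypothetical helper name.

```sh
#!/bin/sh
# lbas_to_blocks: hypothetical helper.
# Usage: lbas_to_blocks BLOCK_SIZE_BYTES < one-LBA-per-line
# Prints the distinct block numbers b = (int)(L * 512 / B), ascending.
lbas_to_blocks() {
    while read -r lba; do
        echo $(( $lba * 512 / $1 ))
    done | sort -n -u
}

# LBAs taken from the self-test log above
printf '%s\n' 3480882824 3480882826 3480882828 3480882830 \
              3480882832 3480882840 | lbas_to_blocks 4096
```

For these LBAs the result is three blocks (435110353, 435110354, 435110355), so a single bs=4096 count=1 write would not cover all of them.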
