Last modified: 2015-10-24 (Sat) 16:42:51
SMART error (CurrentPendingSector) detected on host
# zpool status
  pool: tank
 state: ONLINE
  scan: scrub canceled on Fri Aug 14 08:50:50 2015
config:
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
errors: No known data errors

  pool: zfspool
 state: ONLINE
  scan: none requested
config:
        NAME        STATE     READ WRITE CKSUM
        zfspool     ONLINE       0     0     0
          ada0p3    ONLINE       0     0     0
errors: No known data errors
A FreeBSD 9.x server with the configuration above sent me the following email:
This email was generated by the smartd daemon running on:
   host name: blackcube.smb.net
  DNS domain: smb.net
  NIS domain:
The following warning/error was logged by the smartd daemon:
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
The log looks like this.
root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
The self-test log of /dev/ada1 looks like this:
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        51597            -
# 2  Short offline       Interrupted (host reset)      00%        51573            -
# 3  Short offline       Completed without error       00%        51549            -
# 4  Short offline       Interrupted (host reset)      00%        51525            -
# 5  Short offline       Interrupted (host reset)      00%        51501            -
# 6  Short offline       Completed without error       00%        51477            -
# 7  Extended offline    Completed: read failure       90%        51457            51978742
# 8  Short offline       Interrupted (host reset)      00%        51453            -
# 9  Short offline       Interrupted (host reset)      00%        51429            -
#10  Short offline       Interrupted (host reset)      00%        51405            -
#11  Short offline       Completed without error       00%        51381            -
#12  Short offline       Completed without error       00%        51357            -
#13  Short offline       Completed without error       00%        51333            -
#14  Short offline       Completed without error       00%        51309            -
#15  Extended offline    Completed: read failure       90%        51296            51978742
#16  Extended offline    Completed: read failure       90%        51289            51978742
#17  Short offline       Interrupted (host reset)      00%        51285            -
#18  Short offline       Completed without error       00%        51261            -
#19  Short offline       Completed without error       00%        51237            -
#20  Short offline       Completed without error       00%        51213            -
#21  Short offline       Completed without error       00%        51173            -
So, applying the formula for converting the bad LBA into a block number:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
Substituting S = 0, B = 4096, L = 51978742 into the formula:
b = (int)(51978742 * 512 / 4096) = (int) 6497342.75 = 6497342 (fractional part truncated)
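The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `fs_block_number` and the `sector_size` parameter are assumptions introduced here, not part of any tool used on this page.

```python
def fs_block_number(lba: int, start_sector: int, block_size: int,
                    sector_size: int = 512) -> int:
    """Return b = (int)((L - S) * 512 / B) using integer arithmetic.

    lba          -- L, the LBA of the bad sector reported by smartctl
    start_sector -- S, the starting sector of the partition (0 for the raw disk)
    block_size   -- B, the block size in bytes
    sector_size  -- bytes per logical sector (assumed to be 512 here)
    """
    # Floor division truncates the fractional part, like the (int) cast.
    return ((lba - start_sector) * sector_size) // block_size


if __name__ == "__main__":
    # Values from the text: S = 0, B = 4096, L = 51978742
    print(fs_block_number(51978742, 0, 4096))  # 6497342
```

Using integer floor division avoids any floating-point rounding for large LBAs.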
Then,
# dd if=/dev/zero of=/dev/ada1 bs=4096 count=1 seek=6497342
I tried this, but the situation did not change.
So I switched to the formula with a 16k block size:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes (dumpfs 16384)
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk
and (int) denotes the integer part.
Substituting S = 0, B = 16384, L = 51978742 into the formula:
b = (int)(51978742 * 512 / 16384) = (int) 1624335.6875 = 1624335 (fractional part truncated)
So,
# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
# dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=1624335
1+0 records in
1+0 records out
16384 bytes transferred in 0.000646 secs (25367101 bytes/sec)
# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
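As a sanity check on the dd parameters, the sketch below computes which LBAs a single dd block with a given `seek` and `bs` overwrites. It assumes 512-byte logical sectors and a write to the raw disk (S = 0); the helper name `dd_lba_range` is hypothetical.

```python
def dd_lba_range(seek: int, bs: int, sector_size: int = 512) -> tuple[int, int]:
    """Return (first_lba, last_lba) overwritten by `dd bs=<bs> count=1 seek=<seek>`."""
    if bs % sector_size != 0:
        raise ValueError("bs must be a multiple of the sector size")
    sectors_per_block = bs // sector_size
    first = seek * sectors_per_block
    return first, first + sectors_per_block - 1


if __name__ == "__main__":
    bad_lba = 51978742
    first, last = dd_lba_range(1624335, 16384)
    # One 16 KiB block spans 32 sectors, so neighbouring sectors are zeroed too.
    print(first, last, first <= bad_lba <= last)  # 51978720 51978751 True
```

Note that every sector in the range is overwritten with zeros, not only the one reported by the self-test.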
After doing this, the log changed as follows.
root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors (changed -1)
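To follow the counts over time without reading the log by eye, something like the following could be used. It is a minimal sketch that assumes the smartd message wording shown in the log above; the regular expression and the function name `sector_counts` are introduced here for illustration.

```python
import re

# Matches the two smartd messages that appear in the log above.
SMARTD_RE = re.compile(
    r"Device: (?P<dev>\S+), (?P<count>\d+) "
    r"(?P<kind>Currently unreadable \(pending\)|Offline uncorrectable) sectors"
)


def sector_counts(lines):
    """Return a list of (device, kind, count) tuples found in syslog lines."""
    found = []
    for line in lines:
        m = SMARTD_RE.search(line)
        if m:
            found.append((m["dev"], m["kind"], int(m["count"])))
    return found


if __name__ == "__main__":
    sample = [
        "Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors",
        "Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)",
    ]
    for entry in sector_counts(sample):
        print(entry)
```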
The error count has dropped from 2 to 1. Looking at the disk,
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        51597            -
# 2  Short offline       Interrupted (host reset)      00%        51573            -
# 3  Short offline       Completed without error       00%        51549            -
# 4  Short offline       Interrupted (host reset)      00%        51525            -
# 5  Short offline       Interrupted (host reset)      00%        51501            -
# 6  Short offline       Completed without error       00%        51477            -
# 7  Extended offline    Completed: read failure       90%        51457            51978742
# 8  Short offline       Interrupted (host reset)      00%        51453            -
# 9  Short offline       Interrupted (host reset)      00%        51429            -
#10  Short offline       Interrupted (host reset)      00%        51405            -
#11  Short offline       Completed without error       00%        51381            -
#12  Short offline       Completed without error       00%        51357            -
#13  Short offline       Completed without error       00%        51333            -
#14  Short offline       Completed without error       00%        51309            -
#15  Extended offline    Completed: read failure       90%        51296            51978742
#16  Extended offline    Completed: read failure       90%        51289            51978742
#17  Short offline       Interrupted (host reset)      00%        51285            -
#18  Short offline       Completed without error       00%        51261            -
#19  Short offline       Completed without error       00%        51237            -
#20  Short offline       Completed without error       00%        51213            -
#21  Short offline       Completed without error       00%        51173            -
No change!? (The self-test log only records past tests, so it stays the same until a new test is run.)
So I ran the extended test once more.
# smartctl --test=long /dev/ada1
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 116 minutes for test to complete.
Test will complete after Fri Aug 14 11:46:40 2015
Use smartctl -X to abort test.
This time a different location fails...?
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%        51603            101630056
# 2  Short offline       Completed without error       00%        51597            -
# 3  Short offline       Interrupted (host reset)      00%        51573            -
# 4  Short offline       Completed without error       00%        51549            -
# 5  Short offline       Interrupted (host reset)      00%        51525            -
# 6  Short offline       Interrupted (host reset)      00%        51501            -
# 7  Short offline       Completed without error       00%        51477            -
# 8  Extended offline    Completed: read failure       90%        51457            51978742
# 9  Short offline       Interrupted (host reset)      00%        51453            -
#10  Short offline       Interrupted (host reset)      00%        51429            -
#11  Short offline       Interrupted (host reset)      00%        51405            -
#12  Short offline       Completed without error       00%        51381            -
#13  Short offline       Completed without error       00%        51357            -
#14  Short offline       Completed without error       00%        51333            -
#15  Short offline       Completed without error       00%        51309            -
#16  Extended offline    Completed: read failure       90%        51296            51978742
#17  Extended offline    Completed: read failure       90%        51289            51978742
#18  Short offline       Interrupted (host reset)      00%        51285            -
#19  Short offline       Completed without error       00%        51261            -
#20  Short offline       Completed without error       00%        51237            -
#21  Short offline       Completed without error       00%        51213            -
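Picking the failing LBAs out of the self-test log can also be scripted. The sketch below assumes the row layout printed by `smartctl --log=selftest` as shown above (number, description, status, remaining percentage, lifetime hours, LBA); the regular expression and the function name `failing_lbas` are assumptions made for this example.

```python
import re

# "#NN  <description and status>  <remaining>%  <lifetime hours>  <LBA or ->"
ROW_RE = re.compile(r"^#\s*(\d+)\s+(.*?)\s+(\d+)%\s+(\d+)\s+(\d+|-)\s*$")


def failing_lbas(log_text: str) -> list[int]:
    """Return the distinct LBA_of_first_error values, newest first."""
    lbas = []
    for line in log_text.splitlines():
        m = ROW_RE.match(line.strip())
        if m and m.group(5) != "-":
            lba = int(m.group(5))
            if lba not in lbas:  # the same LBA is usually reported repeatedly
                lbas.append(lba)
    return lbas


if __name__ == "__main__":
    sample = """\
# 1  Extended offline    Completed: read failure       90%     51603         101630056
# 2  Short offline       Completed without error       00%     51597         -
# 8  Extended offline    Completed: read failure       90%     51457         51978742
#16  Extended offline    Completed: read failure       90%     51296         51978742
"""
    print(failing_lbas(sample))  # [101630056, 51978742]
```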
# grep -i "smartd" /var/log/messages | tail
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, previous self-test completed with error (read test element)
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, Self-Test Log error count increased from 3 to 4
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Calculation for the new LBA (S = 0, B = 16384, L = 101630056):
b = (int)(101630056 * 512 / 16384) = (int) 3175939.25 = 3175939 (fractional part truncated)
1017 11:57 sysctl kern.geom.debugflags=0x10
1018 11:57 dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=3175939
1019 11:58 sysctl kern.geom.debugflags=0
On the next machine, the HDD block size was set to 4k (the default) in the Zroot settings at install time.
root@blackhole:~ # smartctl --test=short /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Sep 22 11:15:30 2015
Use smartctl -X to abort test.
root@blackhole:~ # smartctl /dev/ada0 --log=selftest
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                        Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%        1715             -
# 2  Short offline       Completed without error       00%        1707             -
# 3  Extended offline    Interrupted (host reset)      70%        1694             -
# 4  Short offline       Completed without error       00%        1692             -
# 5  Short offline       Completed without error       00%        1692             -
# 6  Short offline       Completed without error       00%        1692             -
# 7  Short offline       Completed: read failure       90%        1692             3480882840
# 8  Short offline       Completed without error       00%        1692             -
# 9  Short offline       Completed without error       00%        1692             -
#10  Short offline       Completed without error       00%        1691             -
#11  Short offline       Completed without error       00%        1691             -
#12  Short offline       Completed without error       00%        1691             -
#13  Short offline       Completed: read failure       90%        1691             3480882832
#14  Short offline       Completed without error       00%        1683             -
#15  Short captive       Completed: read failure       90%        1605             3480882828
#16  Short captive       Completed: read failure       90%        1605             3480882830
#17  Short captive       Completed: read failure       90%        1605             3480882824
#18  Short captive       Completed: read failure       90%        1605             3480882826
#19  Short captive       Completed: read failure       90%        1605             3480882828
#20  Short captive       Completed: read failure       90%        1605             3480882830
#21  Short offline       Completed: read failure       90%        1605             3480882824
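Several different LBAs show up in this log, so it can help to see which 4 KiB blocks they fall into. The following is a minimal sketch, assuming 512-byte logical sectors and offsets counted from the start of the raw disk (S = 0); the function name `group_by_block` is hypothetical.

```python
from collections import defaultdict


def group_by_block(lbas, block_size: int = 4096, sector_size: int = 512):
    """Map block number -> sorted list of the given LBAs inside that block."""
    sectors_per_block = block_size // sector_size
    groups = defaultdict(set)
    for lba in lbas:
        groups[lba // sectors_per_block].add(lba)
    return {block: sorted(members) for block, members in sorted(groups.items())}


if __name__ == "__main__":
    # LBA_of_first_error values taken from the self-test log above
    reported = [3480882840, 3480882832, 3480882828, 3480882830,
                3480882824, 3480882826, 3480882828, 3480882830, 3480882824]
    for block, members in group_by_block(reported).items():
        print(block, members)
```

Under these assumptions the reported LBAs fall into three consecutive 4 KiB blocks: 435110353, 435110354 and 435110355.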
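b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
Substituting S = 0, B = 4096, L = 3480882840 into the formula:
b = (int)(3480882840 * 512 / 4096) = 435110355
Note: the calculation originally recorded here was b = (int)(1043624 * 512 / 4096) = (int) 424912454.8339844, which does not follow from these inputs. The dd command that was actually run used seek=424912454, so it did not write to the block containing LBA 3480882840.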
root@blackhole:~ # sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
root@blackhole:~ # dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=424912454
1+0 records in
1+0 records out
4096 bytes transferred in 0.000181 secs (22647226 bytes/sec)
root@blackhole:~ # sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
root@blackhole:~ #
380 11:17 smartctl /dev/ada0 --log=selftest
381 11:32 kern.geom.debugflags: 0 - > 16
382 11:32 sysctl kern.geom.debugflags=0x10
383 11:35 dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=424912454
384 11:36 sysctl kern.geom.debugflags=0
http://d.hatena.ne.jp/flying-foozy/20131122/1385105842
http://blogs.yahoo.co.jp/alpha3166/10334103.html