CONTENTS
#contents
----
Lastmodified &lastmod;
----
*SMART error (CurrentPendingSector) detected on host: ZFS edition [#d3de4893]
[[SMART error (CurrentPendingSector) detected on host]]
# zpool status
pool: tank
state: ONLINE
scan: scrub canceled on Fri Aug 14 08:50:50 2015
config:
NAME STATE READ WRITE CKSUM
tank ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ada1 ONLINE 0 0 0
ada2 ONLINE 0 0 0
ada3 ONLINE 0 0 0
errors: No known data errors
pool: zfspool
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
zfspool ONLINE 0 0 0
ada0p3 ONLINE 0 0 0
errors: No known data errors
A FreeBSD 9.x server with the pool layout above sent the following mail.
** SMART error (CurrentPendingSector) detected on host: blackcube.smb.net, August 13, 2015 [#l31ae9d9]
This email was generated by the smartd daemon running on:
host name: blackcube.smb.net
DNS domain: smb.net
NIS domain:
The following warning/error was logged by the smartd daemon:
Device: /dev/ada1, 2 Currently unreadable (pending) sectors
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
No additional email messages about this problem will be sent.
The log looks like this.
root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
The self-test log of /dev/ada1 looks like this.
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 51597 -
# 2 Short offline Interrupted (host reset) 00% 51573 -
# 3 Short offline Completed without error 00% 51549 -
# 4 Short offline Interrupted (host reset) 00% 51525 -
# 5 Short offline Interrupted (host reset) 00% 51501 -
# 6 Short offline Completed without error 00% 51477 -
# 7 Extended offline Completed: read failure 90% 51457 51978742
# 8 Short offline Interrupted (host reset) 00% 51453 -
# 9 Short offline Interrupted (host reset) 00% 51429 -
#10 Short offline Interrupted (host reset) 00% 51405 -
#11 Short offline Completed without error 00% 51381 -
#12 Short offline Completed without error 00% 51357 -
#13 Short offline Completed without error 00% 51333 -
#14 Short offline Completed without error 00% 51309 -
#15 Extended offline Completed: read failure 90% 51296 51978742
#16 Extended offline Completed: read failure 90% 51289 51978742
#17 Short offline Interrupted (host reset) 00% 51285 -
#18 Short offline Completed without error 00% 51261 -
#19 Short offline Completed without error 00% 51237 -
#20 Short offline Completed without error 00% 51213 -
#21 Short offline Completed without error 00% 51173 -
So, using the formula for converting the failing LBA to a block number:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
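The formula above can be checked with a few lines of Python. This is only a minimal sketch: the function name `lba_to_block` is made up for illustration, and S = 0 is assumed because the dd commands on this page write to the whole disk (/dev/ada1), not to a partition.

```python
SECTOR_SIZE = 512  # bytes per LBA, as assumed by the formula


def lba_to_block(lba, start_sector, block_size):
    """Return b = (int)((L - S) * 512 / B) using integer arithmetic."""
    return (lba - start_sector) * SECTOR_SIZE // block_size


# LBA_of_first_error reported by the extended self-test on /dev/ada1
bad_lba = 51978742

for block_size in (4096, 16384):
    print(block_size, lba_to_block(bad_lba, 0, block_size))
```

Integer division (`//`) gives the same result as taking the integer part here, because none of the values is negative.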
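Substituting S = 0, B = 4096, L = 51978742 into the formula:
b = (int)(51978742 * 512 / 4096) = (int) 6497342.75 = 6497342 (fraction truncated)
Then I tried
# dd if=/dev/zero of=/dev/ada1 bs=4096 count=1 seek=6497334
but nothing changed. (Note that the seek value used here, 6497334, is not the computed 6497342, so this write did not cover LBA 51978742.)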
So I tried the formula again with a block size of 16k:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes (dumpfs 16384)
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk
and (int) denotes the integer part.
Substituting S = 0, B = 16384, L = 51978742 into the formula:
b = (int)(51978742 * 512 / 16384) = (int) 1624335.6875 = 1624335 (fraction truncated)
So:
# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
# dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=1624335
1+0 records in
1+0 records out
16384 bytes transferred in 0.000646 secs (25367101 bytes/sec)
# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
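To double-check which sectors a dd write like the one above overwrites, the mapping can be run in the other direction. This is a rough sketch, assuming 512-byte sectors and a block size that is a multiple of 512; the helper names `sectors_covered` and `covers` are hypothetical.

```python
SECTOR_SIZE = 512  # bytes per LBA (assumption)


def sectors_covered(seek, block_size, count=1):
    """Return (first_lba, last_lba) written by dd with bs=block_size, seek=seek."""
    sectors_per_block = block_size // SECTOR_SIZE
    first = seek * sectors_per_block
    last = first + count * sectors_per_block - 1
    return first, last


def covers(lba, seek, block_size, count=1):
    """True if the dd write overwrites the given LBA."""
    first, last = sectors_covered(seek, block_size, count)
    return first <= lba <= last


# The write above: bs=16384 count=1 seek=1624335, target LBA 51978742
print(sectors_covered(1624335, 16384))
print(covers(51978742, 1624335, 16384))
```

With bs=16384, one block spans 32 sectors, so the write zeroes 31 neighbouring sectors along with the failing one.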
After this, the log changed as follows.
root@blackcube:/home/kuji # grep -i "smartd" /var/log/messages | tail
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 06:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:27:16 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 07:57:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors
Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)
Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors (changed -1)
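The pending-sector count can also be pulled out of these smartd messages with a small script instead of reading them by eye. A minimal sketch, assuming the message wording shown in the log above; the name `pending_counts` and the regular expression are my own, not part of smartmontools.

```python
import re

# Matches the smartd message format seen in /var/log/messages above.
PENDING_RE = re.compile(
    r"Device: (?P<dev>\S+), (?P<count>\d+) Currently unreadable \(pending\) sectors"
)


def pending_counts(lines):
    """Yield (device, count) for each smartd pending-sector message."""
    for line in lines:
        m = PENDING_RE.search(line)
        if m:
            yield m.group("dev"), int(m.group("count"))


SAMPLE = [
    "Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Currently unreadable (pending) sectors",
    "Aug 14 08:27:15 blackcube smartd[1066]: Device: /dev/ada1, 2 Offline uncorrectable sectors",
    "Aug 14 08:57:15 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors (changed -1)",
]

for dev, count in pending_counts(SAMPLE):
    print(dev, count)
```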
The error count has dropped from 2 to 1. Looking at the disk:
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 51597 -
# 2 Short offline Interrupted (host reset) 00% 51573 -
# 3 Short offline Completed without error 00% 51549 -
# 4 Short offline Interrupted (host reset) 00% 51525 -
# 5 Short offline Interrupted (host reset) 00% 51501 -
# 6 Short offline Completed without error 00% 51477 -
# 7 Extended offline Completed: read failure 90% 51457 51978742
# 8 Short offline Interrupted (host reset) 00% 51453 -
# 9 Short offline Interrupted (host reset) 00% 51429 -
#10 Short offline Interrupted (host reset) 00% 51405 -
#11 Short offline Completed without error 00% 51381 -
#12 Short offline Completed without error 00% 51357 -
#13 Short offline Completed without error 00% 51333 -
#14 Short offline Completed without error 00% 51309 -
#15 Extended offline Completed: read failure 90% 51296 51978742
#16 Extended offline Completed: read failure 90% 51289 51978742
#17 Short offline Interrupted (host reset) 00% 51285 -
#18 Short offline Completed without error 00% 51261 -
#19 Short offline Completed without error 00% 51237 -
#20 Short offline Completed without error 00% 51213 -
#21 Short offline Completed without error 00% 51173 -
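No change!? (The self-test log only records the results of past tests, so it does not change until a new test is run.)
So I ran the test once more.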
# smartctl --test=long /dev/ada1
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 116 minutes for test to complete.
Test will complete after Fri Aug 14 11:46:40 2015
Use smartctl -X to abort test.
This time a different location fails...?
# smartctl /dev/ada1 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 9.1-RELEASE-p22 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 51603 101630056
# 2 Short offline Completed without error 00% 51597 -
# 3 Short offline Interrupted (host reset) 00% 51573 -
# 4 Short offline Completed without error 00% 51549 -
# 5 Short offline Interrupted (host reset) 00% 51525 -
# 6 Short offline Interrupted (host reset) 00% 51501 -
# 7 Short offline Completed without error 00% 51477 -
# 8 Extended offline Completed: read failure 90% 51457 51978742
# 9 Short offline Interrupted (host reset) 00% 51453 -
#10 Short offline Interrupted (host reset) 00% 51429 -
#11 Short offline Interrupted (host reset) 00% 51405 -
#12 Short offline Completed without error 00% 51381 -
#13 Short offline Completed without error 00% 51357 -
#14 Short offline Completed without error 00% 51333 -
#15 Short offline Completed without error 00% 51309 -
#16 Extended offline Completed: read failure 90% 51296 51978742
#17 Extended offline Completed: read failure 90% 51289 51978742
#18 Short offline Interrupted (host reset) 00% 51285 -
#19 Short offline Completed without error 00% 51261 -
#20 Short offline Completed without error 00% 51237 -
#21 Short offline Completed without error 00% 51213 -
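Since the same LBA shows up several times in the self-test log, it can help to list only the distinct failing LBAs. This is a sketch that assumes the row layout printed above (number, description and status, remaining %, lifetime hours, LBA); `failing_lbas` and the regular expression are illustrative names, not a smartctl feature.

```python
import re

# Matches rows such as:
#   "# 1 Extended offline Completed: read failure 90% 51603 101630056"
ROW_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def failing_lbas(lines):
    """Return the distinct LBA_of_first_error values, in order of appearance."""
    seen = []
    for line in lines:
        m = ROW_RE.match(line.strip())
        if not m:
            continue  # header or unrelated line
        lba = m.group(5)
        if lba != "-" and int(lba) not in seen:
            seen.append(int(lba))
    return seen


SAMPLE = [
    "Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error",
    "# 1 Extended offline Completed: read failure 90% 51603 101630056",
    "# 2 Short offline Completed without error 00% 51597 -",
    "# 8 Extended offline Completed: read failure 90% 51457 51978742",
    "#16 Extended offline Completed: read failure 90% 51296 51978742",
]

print(failing_lbas(SAMPLE))
```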
# grep -i "smartd" /var/log/messages | tail
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, previous self-test completed with error (read test element)
Aug 14 10:13:12 blackcube smartd[1066]: Device: /dev/ada1, Self-Test Log error count increased from 3 to 4
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 10:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:13:13 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Currently unreadable (pending) sectors
Aug 14 11:43:12 blackcube smartd[1066]: Device: /dev/ada1, 1 Offline uncorrectable sectors
Calculation (S = 0, B = 16384, L = 101630056):
b = (int)(101630056 * 512 / 16384) = (int) 3175939.25 = 3175939 (fraction truncated)
Commands run at 11:57-11:58 (from the shell history):
# sysctl kern.geom.debugflags=0x10
# dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=3175939
# sysctl kern.geom.debugflags=0
* FreeBSD 11-CURRENT BS 4096 [#be0f60da]
At install time, the zroot setup was given an HDD block size of 4k (the default).
root@blackhole:~ # smartctl --test=short /dev/ada0
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Tue Sep 22 11:15:30 2015
Use smartctl -X to abort test.
root@blackhole:~ # smartctl /dev/ada0 --log=selftest
smartctl 6.4 2015-06-04 r4109 [FreeBSD 11.0-CURRENT amd64] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 1715 -
# 2 Short offline Completed without error 00% 1707 -
# 3 Extended offline Interrupted (host reset) 70% 1694 -
# 4 Short offline Completed without error 00% 1692 -
# 5 Short offline Completed without error 00% 1692 -
# 6 Short offline Completed without error 00% 1692 -
# 7 Short offline Completed: read failure 90% 1692 3480882840
# 8 Short offline Completed without error 00% 1692 -
# 9 Short offline Completed without error 00% 1692 -
#10 Short offline Completed without error 00% 1691 -
#11 Short offline Completed without error 00% 1691 -
#12 Short offline Completed without error 00% 1691 -
#13 Short offline Completed: read failure 90% 1691 3480882832
#14 Short offline Completed without error 00% 1683 -
#15 Short captive Completed: read failure 90% 1605 3480882828
#16 Short captive Completed: read failure 90% 1605 3480882830
#17 Short captive Completed: read failure 90% 1605 3480882824
#18 Short captive Completed: read failure 90% 1605 3480882826
#19 Short captive Completed: read failure 90% 1605 3480882828
#20 Short captive Completed: read failure 90% 1605 3480882830
#21 Short offline Completed: read failure 90% 1605 3480882824
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
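Substituting S = 0, B = 4096, L = 3480882840 into the formula:
b = (int)(3480882840 * 512 / 4096) = (int) 435110355 = 435110355
Note: the command below was run with seek=424912454, which does not match this result. With bs=4096 that write covers LBAs 3399299632 to 3399299639, not LBA 3480882840.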
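The self-test log above reports several different failing LBAs close to each other, so it is worth checking how many 4k blocks they fall into. A minimal sketch, assuming 512-byte LBAs, S = 0 and B = 4096; the function name `blocks_for` is hypothetical.

```python
SECTOR_SIZE = 512   # bytes per LBA (assumption)
BLOCK_SIZE = 4096   # block size used with dd in this section

# LBA_of_first_error values taken from the self-test log of /dev/ada0
failing = [
    3480882840,
    3480882832,
    3480882828,
    3480882830,
    3480882824,
    3480882826,
]


def blocks_for(lbas, block_size=BLOCK_SIZE):
    """Return the sorted set of block numbers that contain the given LBAs."""
    return sorted({lba * SECTOR_SIZE // block_size for lba in lbas})


print(blocks_for(failing))
```

For these LBAs the result is three blocks, 435110353, 435110354 and 435110355, so a single `count=1` write would not cover all of them.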
root@blackhole:~ # sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
root@blackhole:~ # dd if=/dev/zero of=/dev/ada0 bs=4096 count=1 seek=424912454
1+0 records in
1+0 records out
4096 bytes transferred in 0.000181 secs (22647226 bytes/sec)
root@blackhole:~ # sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
root@blackhole:~ #
----
Total access &counter(total);: today &counter(today);: yesterday &counter(yesterday);
#counter([total|today|yesterday]);