[[Guard]]
#contents
*Emergency response when smartd reports errors: command summary [#y75b9039]
*** # grep -i "smartd" /var/log/messages | tail [#q00a8915]
Jan 12 11:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors (changed -8)
Jan 12 11:39:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, previous self-test completed with error (read test element)
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 1 to 2
Jan 12 12:39:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors (changed -8)
Jan 12 13:09:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors
Jan 12 13:39:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors (changed -8)
Jan 12 13:39:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 2 to 3
Jan 12 14:09:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors
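The pending-sector count in the log above is easier to follow as a short table. The following is a minimal sketch that assumes the syslog line layout shown above; `pending_trend` is a hypothetical helper name, and the sample lines are copied from the log.

```shell
# Sample lines copied from the log above; on a live system this would be
# the output of: grep -i "smartd" /var/log/messages
log='Jan 12 11:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors (changed -8)
Jan 12 12:39:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors (changed -8)
Jan 12 13:39:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors (changed -8)
Jan 12 14:09:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors'

# Print "time device count" for every pending-sector report on stdin.
pending_trend() {
    awk '/Currently unreadable \(pending\) sectors/ {
        dev = $7; sub(/,$/, "", dev)   # field 7 is "/dev/ada0,"
        print $3, dev, $8              # field 8 is the sector count
    }'
}

printf '%s\n' "$log" | pending_trend
# 11:09:01 /dev/ada0 96
# 12:39:01 /dev/ada0 88
# 13:39:00 /dev/ada0 80
# 14:09:00 /dev/ada0 80
```

In this log the count falls by 8 sectors (4096 bytes) each time it changes.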
*** # smartctl /dev/ada0 --log=selftest [#pb39d335]
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 10% 363 975890088
# 2 Extended offline Completed: read failure 10% 362 975884072
# 3 Extended offline Completed: read failure 10% 361 975876336
# 4 Short offline Completed without error 00% 360 -
# 5 Extended offline Completed: read failure 40% 359 577493400
# 6 Short offline Completed without error 00% 352 -
# 7 Short offline Completed without error 00% 328 -
# 8 Short offline Completed without error 00% 311 -
#
With a block size of 32768:
b = (int)(975890088 * 512 / 32768) = (int) 15248282.625 = 15248282 (fraction truncated)
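The self-test log lists three different failing LBAs, and the conversion above can be applied to each of them. This is a sketch only: `failing_blocks` is a hypothetical helper, the rows are copied from the log above, and it assumes 512-byte sectors and a dd target of the whole disk (so no partition offset).

```shell
# Rows copied from the self-test log above.
selftest='# 1 Extended offline Completed: read failure 10% 363 975890088
# 2 Extended offline Completed: read failure 10% 362 975884072
# 3 Extended offline Completed: read failure 10% 361 975876336
# 4 Short offline Completed without error 00% 360 -'

# stdin: self-test log rows; $1: block size in bytes.
# For each "read failure" row, take the last field (LBA_of_first_error)
# and print the dd seek value LBA * 512 / blocksize, truncated.
failing_blocks() {
    awk '/read failure/ { print $NF }' |
    while read -r lba; do
        echo "LBA $lba -> seek=$(( lba * 512 / $1 ))"
    done
}

printf '%s\n' "$selftest" | failing_blocks 32768
# LBA 975890088 -> seek=15248282
# LBA 975884072 -> seek=15248188
# LBA 975876336 -> seek=15248067
```

Each failing LBA lands in a different 32768-byte block, so one dd write does not cover all of them.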
*** # sysctl kern.geom.debugflags=0x10 [#v8eb0d4f]
kern.geom.debugflags: 0 -> 16
*** # dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=15248282 [#paff7c7c]
1+0 records in
1+0 records out
32768 bytes transferred in 0.000234 secs (140105438 bytes/sec)
*** # sysctl kern.geom.debugflags=0 [#rc7012c8]
kern.geom.debugflags: 16 -> 0
*** # smartctl --test=long /dev/ada0 [#v22d083d]
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 62 minutes for test to complete.
Test will complete after Fri Jan 12 15:35:51 2018
Use smartctl -X to abort test.
----
http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html
http://www.wizard-limit.net/mt/pc/archives/2011_08.html
**Emergency response when smartd reports errors [#q2b4c761]
grep -i "smartd" /var/log/messages | tail
guard# grep -i "smartd" /var/log/messages | tail
Jan 11 12:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
Jan 11 13:09:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
Jan 11 13:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
**Attempting to repair the sector errors [#nb685c9b]
# smartctl /dev/ada0 --log=selftest
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 40% 359 577493400
# 2 Short offline Completed without error 00% 352 -
# 3 Short offline Completed without error 00% 328 -
# 4 Short offline Completed without error 00% 311 -
# 5 Short offline Completed without error 00% 304 -
# 6 Short offline Completed without error 00% 280 -
# 7 Short offline Completed without error 00% 256 -
# 8 Short offline Completed without error 00% 232 -
# 9 Extended offline Completed without error 00% 213 -
#10 Short offline Completed without error 00% 208 -
#11 Short offline Completed without error 00% 184 -
#12 Short offline Completed without error 00% 160 -
#13 Short offline Completed without error 00% 136 -
#14 Short offline Completed without error 00% 112 -
#15 Short offline Completed without error 00% 88 -
#16 Short offline Completed without error 00% 64 -
#17 Extended offline Completed without error 00% 45 -
#18 Short offline Completed without error 00% 40 -
guard#
# fdisk
******* Working on device /dev/ada0 *******
parameters extracted from in-core disklabel are:
cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)
Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)
Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 64, size 976773103 (476939 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 2;
end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
# disklabel -A /dev/ada0s1
# /dev/ada0s1:
type: unknown
disk:
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 969020
sectors/unit: 976773103
rpm: 3600
interleave: 0
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 968884224 0 4.2BSD 0 0 0
b: 7888878 968884224 swap
c: 976773103 0 unused 0 0 # "raw" part, don't edit
So bsize shows up as 0 here...?
**To find the file system block size, use the following command [#m5fe276c]
# dumpfs /some/filesystem | grep '^bsize'
So, following that:
guard# dumpfs /dev/ad4s1 | grep '^bsize'
bsize 16384 shift 14 mask 0xffffc000
On piano2nd, gpart list shows that ada0p2 appears to be the file system partition, so:
root@piano2nd:~ # dumpfs /dev/ada0p2 | grep '^bsize'
bsize 32768 shift 15 mask 0xffff8000
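If the block size is needed in a script, the bsize line can be picked apart as below. This is a rough sketch: `parse_bsize` is a hypothetical helper, the input line is copied from the dumpfs output above, and the check that bsize equals 2^shift is only a sanity test on the line format.

```shell
# Line copied from the dumpfs output above.
line='bsize 32768 shift 15 mask 0xffff8000'

# Print the bsize value, after confirming that it equals 2^shift.
parse_bsize() {
    set -- $1                         # split into: bsize 32768 shift 15 mask ...
    [ "$1" = bsize ] || return 1
    [ "$3" = shift ] || return 1
    [ "$2" -eq $(( 1 << $4 )) ] || return 1
    echo "$2"
}

B=$(parse_bsize "$line")
echo "B=$B"
# B=32768
```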
Confirm by running stat on a small file:
# stat .screenrc
114 5056131 -rw-r--r-- 1 root wheel 10102700 54 "Jan 12 10:43:37 2018" "Dec 29 06:40:50 2017" "Dec 29 06:40:58 2017" "Dec 29 06:40:50 2017" 32768 8 0 .screenrc
http://d.hatena.ne.jp/parasporospa/touch/searchdiary?word=*%5Bunix%5D&of=20
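According to the page above, st_blksize is the optimal block size for file system I/O operations: 16384 in the earlier case, 32768 here.
**Calculating the seek position from the following formula [#t7d208fe]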
http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html
http://see-take.blogspot.jp/2010/01/hddsmart.html
These pages give the formula below, but it does not apply as-is on FreeBSD.
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
Substituting S = 0, B = 4096, L = 577493400:
b = (int)(577493400 * 512 / 4096) = (int) 72186675 (fraction truncated)
On FreeBSD it becomes the following instead:
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes (dumpfs 32768)
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk
and (int) denotes the integer part.
Substituting S = 0, B = 32768, L = 577493400:
b = (int)(577493400 * 512 / 32768) = (int) 9023334.375 = 9023334 (fraction truncated)
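The formula can be checked with shell arithmetic. This is a minimal sketch: `fs_block` and `offset_in_block` are hypothetical helper names, and L, S and B follow the definitions above. The last call uses S = 64 (the partition start reported by fdisk) purely as an illustration of a non-zero S; the procedure here writes to the whole disk, where S = 0.

```shell
# b = (int)((L - S) * 512 / B); shell integer division truncates.
fs_block() {            # usage: fs_block L S B
    echo $(( ($1 - $2) * 512 / $3 ))
}

# Byte offset of the bad sector inside block b (0 means block-aligned).
offset_in_block() {     # usage: offset_in_block L S B
    echo $(( ($1 - $2) * 512 % $3 ))
}

fs_block 577493400 0 32768          # whole disk, S = 0       -> 9023334
offset_in_block 577493400 0 32768   # 24 sectors into block   -> 12288
fs_block 577493400 64 32768         # illustration, S = 64    -> 9023333
```

Because the write covers a whole 32768-byte block, it zeroes 64 sectors around the bad LBA, not just the bad sector itself.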
Zero-fill that block with dd:
guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
dd: /dev/ada0: Operation not permitted
The write is refused, so after some searching:
guard# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
Then:
guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
1+0 records in
1+0 records out
16384 bytes transferred in 0.000281 secs (58286240 bytes/sec)
The block is now overwritten. Next:
guard# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
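The three steps (raise kern.geom.debugflags, write the block, reset the flag) can be wrapped so that the flag is always reset. This is only a sketch under stated assumptions: `zero_block` and the `RUN` variable are hypothetical, and by default the function is a dry run that prints the commands instead of executing them.

```shell
# usage: zero_block DISK BLOCKSIZE BLOCKNUMBER
# RUN is the command prefix: unset means "echo" (dry run, nothing is written).
# Setting RUN="" as root would execute the commands for real.
zero_block() {
    run=${RUN-echo}
    $run sysctl kern.geom.debugflags=0x10 || return 1
    $run dd if=/dev/zero of="$1" bs="$2" count=1 seek="$3"
    rc=$?
    $run sysctl kern.geom.debugflags=0    # reset even if dd failed
    return "$rc"
}

zero_block /dev/ada0 32768 9023334
# sysctl kern.geom.debugflags=0x10
# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
# sysctl kern.geom.debugflags=0
```

Keeping the dry run as the default makes it possible to review the exact dd line before anything is overwritten.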
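That puts the flag back to its original value.
Then check whether things are back to normal! (The transcript below is from an earlier incident on /dev/ad4, in February 2013.)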
guard# smartctl --test=long /dev/ad4
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 40 minutes for test to complete.
Test will complete after Sat Feb 16 13:59:56 2013
Use smartctl -X to abort test.
guard#
... It is now 14:00, so:
guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 19437 -
# 2 Extended offline Completed: read failure 90% 19431 310658987
# 3 Short offline Completed: read failure 90% 19427 310658987
# 4 Short offline Completed: read failure 90% 19403 310658987
# 5 Short offline Completed: read failure 90% 19379 310658987
# 6 Short offline Completed: read failure 90% 19355 310658987
# 7 Short offline Completed: read failure 90% 19331 310658987
# 8 Short offline Completed: read failure 90% 19307 310658987
# 9 Short offline Completed: read failure 90% 19283 310658987
#10 Extended offline Completed: read failure 90% 19263 310658987
#11 Short offline Completed: read failure 90% 19259 310658987
#12 Short offline Completed: read failure 90% 19235 310658987
#13 Short offline Completed: read failure 90% 19211 310658987
#14 Short offline Completed: read failure 90% 19187 310658987
#15 Short offline Completed: read failure 90% 19163 310658987
#16 Short offline Completed: read failure 90% 19139 310658987
#17 Short offline Completed: read failure 90% 19115 310658987
#18 Extended offline Completed: read failure 90% 19095 310658987
#19 Short offline Completed: read failure 90% 19091 310658987
#20 Short offline Completed: read failure 90% 19067 310658987
#21 Short offline Completed: read failure 90% 19043 310658987
20 of 20 failed self-tests are outdated by newer successful extended offline self-test # 1
guard#
Looks like it worked! ( ´▽`)ノ
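The same check can be made without reading the table by eye: look only at entry # 1, the most recent self-test. This is a sketch that assumes the row layout shown in the logs above; `latest_selftest_ok` is a hypothetical helper, and the sample rows are copied from the two logs in this section.

```shell
# stdin: output of smartctl --log=selftest
# Exit status 0 only if entry "# 1" exists and completed without error.
latest_selftest_ok() {
    awk '/^# ?1 / { found = 1; ok = /Completed without error/ }
         END { exit !(found && ok) }'
}

# First rows of the log after the repair and of the log before it.
after='# 1 Extended offline Completed without error 00% 19437 -
# 2 Extended offline Completed: read failure 90% 19431 310658987'
before='# 1 Extended offline Completed: read failure 40% 359 577493400
# 2 Short offline Completed without error 00% 352 -'

for sample in "$after" "$before"; do
    if printf '%s\n' "$sample" | latest_selftest_ok; then
        echo "latest self-test passed"
    else
        echo "latest self-test failed"
    fi
done
# latest self-test passed
# latest self-test failed
```

On a live system the input would come from the smartctl command shown above rather than from a variable.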