[[Guard]]
#contents
http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html
http://www.wizard-limit.net/mt/pc/archives/2011_08.html
***Emergency response when smartd reports an error [#q2b4c761]
grep -i "smartd" /var/log/messages | tail
guard# grep -i "smartd" /var/log/messages | tail
Feb 14 05:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 06:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 06:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 07:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 07:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 08:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 08:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 09:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 09:57:57 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 10:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
guard#
***Attempting to repair the sector error [#nb685c9b]
guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 19379 310658987
# 2 Short offline Completed: read failure 90% 19355 310658987
# 3 Short offline Completed: read failure 90% 19331 310658987
# 4 Short offline Completed: read failure 90% 19307 310658987
# 5 Short offline Completed: read failure 90% 19283 310658987
# 6 Extended offline Completed: read failure 90% 19263 310658987
# 7 Short offline Completed: read failure 90% 19259 310658987
# 8 Short offline Completed: read failure 90% 19235 310658987
# 9 Short offline Completed: read failure 90% 19211 310658987
#10 Short offline Completed: read failure 90% 19187 310658987
#11 Short offline Completed: read failure 90% 19163 310658987
#12 Short offline Completed: read failure 90% 19139 310658987
#13 Short offline Completed: read failure 90% 19115 310658987
#14 Extended offline Completed: read failure 90% 19095 310658987
#15 Short offline Completed: read failure 90% 19091 310658987
#16 Short offline Completed: read failure 90% 19067 310658987
#17 Short offline Completed: read failure 90% 19043 310658987
#18 Short offline Completed: read failure 90% 19019 310658987
#19 Short offline Completed: read failure 90% 18995 310658987
#20 Short offline Completed: read failure 90% 18971 310658987
#21 Short offline Completed: read failure 90% 18947 310658987
guard#
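Every failed entry in the log above points at the same LBA. A minimal sketch for pulling the distinct failing LBAs out of a self-test log; the function name failed_lbas is made up for this note, and it assumes the "Completed: read failure" wording and the LBA in the last column, as in the output above.

```shell
# Print each distinct LBA_of_first_error reported by failed self-tests.
# Reads `smartctl --log=selftest` output on stdin.
# Assumption: the LBA is the last whitespace-separated field of each failed entry.
failed_lbas() {
    awk '/Completed: read failure/ { print $NF }' | sort -un
}
```

Usage would be something like `smartctl /dev/ad4 --log=selftest | failed_lbas`.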
***guard backup destinations [#a004a988]
30 2 * * * root /root/bin/backup_to_k222_all.sh
30 0 * * * root /root/bin/backup_to_BlackHole_all.sh
# smartctl /dev/sdz --log=selftest
smartctl version x.xx Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 10% 19935 1043624
# 2 Extended offline Completed without error 00% 18472 -
# 3 Short offline Completed without error 00% 18469 -
# 4 Extended offline Completed without error 00% 18447 -
# 5 Extended offline Completed without error 00% 16669 -
# 6 Short offline Completed without error 00% 16645 -
# 7 Extended offline Completed without error 00% 13278 -
# 8 Short offline Completed without error 00% 12081 -
guard# fdisk
******* Working on device /dev/ad4 *******
parameters extracted from in-core disklabel are:
cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)
Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)
Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
start 63, size 312496317 (152586 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 3/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
guard#
guard# disklabel -A /dev/ad4s1
# /dev/ad4s1:
type: ESDI
disk: ad4s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 310020
sectors/unit: 312500160
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 1048576 0 4.2BSD 0 0 0
b: 4092240 1048576 swap
c: 312496317 0 unused 0 0 # "raw" part, don't edit
d: 4143104 5140816 4.2BSD 0 0 0
e: 1048576 9283920 4.2BSD 0 0 0
f: 302163821 10332496 4.2BSD 0 0 0
disklabel: partition c doesn't cover the whole unit!
disklabel: An incorrect partition c may cause problems for standard system utilities
guard#
So bsize shows up as 0 here...?
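Separately from the bsize question, the fdisk and disklabel figures above are enough to work out which partition holds the failing LBA. A rough sketch follows; the function name partition_of_lba is hypothetical, and it assumes the disklabel offsets are relative to the start of the slice (partition a starting at offset 0 suggests this).

```shell
# Print the letter of the BSD partition that contains an absolute LBA.
# $1 = absolute LBA, $2 = slice start sector (the "start" value from fdisk)
# stdin = disklabel output, with lines of the form "letter: size offset ..."
# Assumption: disklabel offsets are relative to the slice start.
partition_of_lba() {
    awk -v lba="$1" -v start="$2" '
        /^[ \t]*[abd-h]:/ {                 # skip "c", the raw partition
            rel = lba - start               # sector number inside the slice
            if (rel >= $3 && rel < $3 + $2) { sub(":", "", $1); print $1 }
        }'
}
```

With the values on this page, `disklabel -A /dev/ad4s1 | partition_of_lba 310658987 63` should print f, since 310658987 - 63 = 310658924 falls between offset 10332496 and 10332496 + 302163821.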
***To find the file system block size, use the following command. [#m5fe276c]
# dumpfs /some/filesystem | grep '^bsize'
So, following that:
guard# dumpfs /dev/ad4s1 | grep '^bsize'
bsize 16384 shift 14 mask 0xffffc000
guard# stat
100728576 88 crw--w---- 1 root tty 88 0 "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Jan 1 08:59:59 1970" 4096 0 0 /dev/pts/1
guard# stat w-filter_1_02.sh
80 16587 -rwxr-xr-x 1 root wheel 68531 1776 "Jan 12 14:00:52 2012" "Apr 27 14:46:06 2007" "Nov 15 11:53:50 2010" "Apr 27 14:46:06 2007" 16384 4 0 w-filter_1_02.sh
http://d.hatena.ne.jp/parasporospa/touch/searchdiary?word=*%5Bunix%5D&of=20
According to the page above, st_blksize (the optimal block size for file system I/O operations) is 16384.
***Calculating the seek position from the following formula [#t7d208fe]
http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html
The page above gives the following formula, but it does not apply as-is on FreeBSD.
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
Substituting S = 0, B = 4096, L = 1043624 into the formula:
b = (int)(1043624 * 512 / 4096) = (int) 130453 (fractional part truncated)
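The formula maps directly onto shell integer arithmetic, which already truncates on division. A small sketch, with the function name fs_block invented for this note; it assumes a shell whose arithmetic is at least 64 bits wide, because (L - S) * 512 overflows 32 bits for large LBAs.

```shell
# b = (int)((L - S) * 512 / B); shell integer division truncates.
# $1 = L (LBA of bad sector)
# $2 = S (starting sector of the partition)
# $3 = B (file system block size in bytes)
# Assumption: the shell evaluates $(( )) with 64-bit integers.
fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

fs_block 1043624 0 4096    # prints 130453
```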
For this machine, it becomes the following.
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes (dumpfs 16384)
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk
and (int) denotes the integer part.
Substituting S = 0, B = 16384, L = 310658987 into the formula:
b = (int)(310658987 * 512 / 16384) = (int) 9708093.34375 = 9708093 (fractional part truncated)
b = (int)(310658987 * 512 / 32768) = (int) 4854046.671875
192384 * 512 / 32768 = 3006
So with B = 16384 the result is b = 9708093. Fill that location with zeros using dd.
guard# dd if=/dev/zero of=/dev/ad4 bs=16384 count=1 seek=9708093
dd: /dev/ad4: Operation not permitted
dd refuses, so after some googling,
guard# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16
set the flag as above, and then
guard# dd if=/dev/zero of=/dev/ad4 bs=16384 count=1 seek=9708093
1+0 records in
1+0 records out
16384 bytes transferred in 0.000281 secs (58286240 bytes/sec)
the block gets rewritten. After that,
guard# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
the flag is put back to its original value.
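As a cross-check that was not part of the original procedure, the sector range covered by one dd block can be computed from bs and seek, to confirm that the failing LBA really lies inside what was overwritten. The function name dd_sector_range is made up for this note, and it assumes 512-byte sectors and a dd target that is the raw disk (so sector numbers are absolute LBAs).

```shell
# Print the first and last 512-byte sector written by
# `dd bs=$1 count=1 seek=$2` on the raw disk.
# Assumption: 512-byte sectors, and bs is a multiple of 512.
dd_sector_range() {
    per_block=$(( $1 / 512 ))            # sectors per dd block
    first=$(( $2 * per_block ))          # first sector of the block
    echo "$first $(( first + per_block - 1 ))"
}

dd_sector_range 16384 9708093    # prints "310658976 310659007"
```

LBA 310658987 falls inside that range, so the single 16384-byte write covers the bad sector along with the 31 sectors that share its block.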
Then check whether things are back to normal!
guard# smartctl --test=long /dev/ad4
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 40 minutes for test to complete.
Test will complete after Sat Feb 16 13:59:56 2013
Use smartctl -X to abort test.
guard#
... It is now 14:00, so
guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 19437 -
# 2 Extended offline Completed: read failure 90% 19431 310658987
# 3 Short offline Completed: read failure 90% 19427 310658987
# 4 Short offline Completed: read failure 90% 19403 310658987
# 5 Short offline Completed: read failure 90% 19379 310658987
# 6 Short offline Completed: read failure 90% 19355 310658987
# 7 Short offline Completed: read failure 90% 19331 310658987
# 8 Short offline Completed: read failure 90% 19307 310658987
# 9 Short offline Completed: read failure 90% 19283 310658987
#10 Extended offline Completed: read failure 90% 19263 310658987
#11 Short offline Completed: read failure 90% 19259 310658987
#12 Short offline Completed: read failure 90% 19235 310658987
#13 Short offline Completed: read failure 90% 19211 310658987
#14 Short offline Completed: read failure 90% 19187 310658987
#15 Short offline Completed: read failure 90% 19163 310658987
#16 Short offline Completed: read failure 90% 19139 310658987
#17 Short offline Completed: read failure 90% 19115 310658987
#18 Extended offline Completed: read failure 90% 19095 310658987
#19 Short offline Completed: read failure 90% 19091 310658987
#20 Short offline Completed: read failure 90% 19067 310658987
#21 Short offline Completed: read failure 90% 19043 310658987
20 of 20 failed self-tests are outdated by newer successful extended offline self-test # 1
guard#
Looks like it worked! ( ´▽`)ノ
2013/02/16 14:07
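The check on the self-test log above was done by eye. A possible way to script it, as a sketch only: the function name latest_selftest_ok is hypothetical, and it assumes the newest entry is the one numbered "# 1" and that a passing test reads "Completed without error", as in the log above.

```shell
# Succeed (exit status 0) if the most recent self-test, entry "# 1",
# completed without error. Reads `smartctl --log=selftest` output on stdin.
# Assumption: the newest test is always listed as "# 1".
latest_selftest_ok() {
    awk '/^# *1 / { found = 1; ok = /Completed without error/ }
         END { exit !(found && ok) }'
}
```

Usage would be something like `smartctl /dev/ad4 --log=selftest | latest_selftest_ok && echo OK`.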
guard# df
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/ad4s1a 507630 341756 125264 73% /
devfs 1 1 0 100% /dev
/dev/ad4s1e 507630 16 467004 0% /tmp
/dev/ad4s1f 146328056 14727436 119894376 11% /usr
/dev/ad4s1d 2000622 198004 1642570 11% /var
guard#