Guard

http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://www.wizard-limit.net/mt/pc/archives/2011_08.html

First aid when smartd reports an error message

guard# grep -i "smartd" /var/log/messages | tail
Feb 14 05:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 06:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 06:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 07:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 07:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 08:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 08:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 09:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 09:57:57 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
Feb 14 10:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
guard#

Try to repair the sector error.

guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%     19379         310658987
# 2  Short offline       Completed: read failure       90%     19355         310658987
# 3  Short offline       Completed: read failure       90%     19331         310658987
# 4  Short offline       Completed: read failure       90%     19307         310658987
# 5  Short offline       Completed: read failure       90%     19283         310658987
# 6  Extended offline    Completed: read failure       90%     19263         310658987
# 7  Short offline       Completed: read failure       90%     19259         310658987
# 8  Short offline       Completed: read failure       90%     19235         310658987
# 9  Short offline       Completed: read failure       90%     19211         310658987
#10  Short offline       Completed: read failure       90%     19187         310658987
#11  Short offline       Completed: read failure       90%     19163         310658987
#12  Short offline       Completed: read failure       90%     19139         310658987
#13  Short offline       Completed: read failure       90%     19115         310658987
#14  Extended offline    Completed: read failure       90%     19095         310658987
#15  Short offline       Completed: read failure       90%     19091         310658987
#16  Short offline       Completed: read failure       90%     19067         310658987
#17  Short offline       Completed: read failure       90%     19043         310658987
#18  Short offline       Completed: read failure       90%     19019         310658987
#19  Short offline       Completed: read failure       90%     18995         310658987
#20  Short offline       Completed: read failure       90%     18971         310658987
#21  Short offline       Completed: read failure       90%     18947         310658987

guard#
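The LBA to work on can be pulled out of the self-test log mechanically. The following is a minimal sketch, assuming the column layout shown above (rows starting with "#", LBA in the last column); failed_lbas is a hypothetical helper name, not part of smartmontools.

```sh
#!/bin/sh
# failed_lbas: read "smartctl --log=selftest" output on stdin and print
# each distinct LBA_of_first_error of the rows marked "read failure".
failed_lbas() {
    # Self-test rows look like "# 1  Short offline ..." or "#10  Short offline ...".
    # The last column holds the LBA, or "-" when the test found no error.
    awk '/^# *[0-9]+ / && /read failure/ { print $NF }' | sort -u
}

# Usage (on the machine that has the disk):
#   smartctl /dev/ad4 --log=selftest | failed_lbas
```

For the log above, every failed row points at the same sector, so only one LBA would come out.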

Backup destinations for guard

30	2	*	*	*	root	/root/bin/backup_to_k222_all.sh
30	0	*	*	*	root	/root/bin/backup_to_BlackHole_all.sh
# smartctl /dev/sdz --log=selftest
smartctl version x.xx Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)   LBA_of_first_error
# 1  Short offline       Completed: read failure       10%     19935         1043624
# 2  Extended offline    Completed without error       00%     18472         -
# 3  Short offline       Completed without error       00%     18469         -
# 4  Extended offline    Completed without error       00%     18447         -
# 5  Extended offline    Completed without error       00%     16669         -
# 6  Short offline       Completed without error       00%     16645         -
# 7  Extended offline    Completed without error       00%     13278         -
# 8  Short offline       Completed without error       00%     12081         -
guard# fdisk
******* Working on device /dev/ad4 *******
parameters extracted from in-core disklabel are:
cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
   start 63, size 312496317 (152586 Meg), flag 80 (active)
       beg: cyl 0/ head 1/ sector 1;
       end: cyl 1023/ head 3/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
guard#
guard# disklabel  -A /dev/ad4s1
# /dev/ad4s1:
type: ESDI
disk: ad4s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 310020
sectors/unit: 312500160
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#        size   offset    fstype   [fsize bsize bps/cpg]
 a:  1048576        0    4.2BSD        0     0     0
 b:  4092240  1048576      swap
 c: 312496317        0    unused        0     0         # "raw" part, don't edit
 d:  4143104  5140816    4.2BSD        0     0     0
 e:  1048576  9283920    4.2BSD        0     0     0
 f: 302163821 10332496    4.2BSD        0     0     0
disklabel: partition c doesn't cover the whole unit!
disklabel: An incorrect partition c may cause problems for standard system utilities
guard#
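To see which file system the bad sector belongs to, the fdisk and disklabel numbers can be combined. This is only a sketch: it assumes the disklabel offsets are relative to the start of the slice (partition a has offset 0 here, which is consistent with that), and find_partition is a hypothetical helper name. The LBA is the one from the self-test log and 63 is the slice start reported by fdisk.

```sh
#!/bin/sh
# find_partition LBA SLICE_START
# Reads "name size offset" triples (taken from the disklabel table) on stdin
# and prints the name of each partition whose sector range contains the LBA.
find_partition() {
    awk -v lba="$1" -v start="$2" '
        {
            rel = lba - start              # sector number relative to the slice
            if (rel >= $3 && rel < $3 + $2)
                print $1
        }'
}

# Partition table of ad4s1 (the "c" raw partition is left out on purpose).
find_partition 310658987 63 <<'EOF'
a 1048576 0
b 4092240 1048576
d 4143104 5140816
e 1048576 9283920
f 302163821 10332496
EOF
```

With these values the sector falls inside partition f.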

So bsize ends up being 0 here...?

To find the block size of a file system, use the following command.

# dumpfs /some/filesystem | grep '^bsize'

So, accordingly:

guard# dumpfs /dev/ad4s1 | grep '^bsize'
bsize   16384   shift   14      mask    0xffffc000
guard# stat
100728576 88 crw--w---- 1 root tty 88 0 "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Jan  1 08:59:59 1970" 4096 0 0 /dev/pts/1
guard# stat w-filter_1_02.sh
80 16587 -rwxr-xr-x 1 root wheel 68531 1776 "Jan 12 14:00:52 2012" "Apr 27 14:46:06 2007" "Nov 15 11:53:50 2010" "Apr 27 14:46:06 2007" 16384 4 0 w-filter_1_02.sh

http://d.hatena.ne.jp/parasporospa/touch/searchdiary?word=*%5Bunix%5D&of=20

According to the page above, st_blksize (the optimal block size for file system I/O operations) is 16384.

Compute the seek position from the formula below.

http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://see-take.blogspot.jp/2010/01/hddsmart.html

These pages give the following formula, but it does not apply as-is on FreeBSD.

       b = (int)((L-S)*512/B)
       where:
       b = File System block number
       B = File system block size in bytes
       L = LBA of bad sector
       S = Starting sector of partition as shown by fdisk -lu
       and (int) denotes the integer part.

Substitute S = 0, B = 4096, L = 1043624 into the formula.

b = (int)(1043624 * 512 / 4096) = (int) 130453 (fractional part truncated)

On FreeBSD, it becomes the following.

       b = (int)((L-S)*512/B)
       where:
       b = File System block number
       B = File system block size in bytes (dumpfs 16384)
       L = LBA of bad sector
       S = Starting sector of partition as shown by fdisk
       and (int) denotes the integer part.

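Substitute S = 0, B = 16384, L = 310658987 into the formula (S = 0 because the dd below writes to the whole disk /dev/ad4 rather than to a partition).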

b = (int)(310658987 * 512 / 16384) = (int) 9708093.34375 = 9708093 (fractional part truncated)
b = (int)(310658987 * 512 / 32768) = (int) 4854046.671875 = 4854046
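The same arithmetic can be done in the shell. This is a minimal sketch, not taken from the referenced pages; lba_to_block is a hypothetical helper name, and B is assumed to be a multiple of 512.

```sh
#!/bin/sh
# lba_to_block L S B
#   L = LBA of the bad sector
#   S = starting sector of the device or partition that dd will write to
#   B = block size in bytes (assumed to be a multiple of 512)
# Prints b = (int)((L - S) * 512 / B).
lba_to_block() {
    # Dividing by (B / 512) gives the same integer result as multiplying by
    # 512 and then dividing by B, and keeps the intermediate value small
    # enough for shells with 32-bit arithmetic.
    echo $(( ($1 - $2) / ($3 / 512) ))
}

lba_to_block 1043624 0 4096       # values from the referenced example
lba_to_block 310658987 0 16384    # values for /dev/ad4 on guard
```

The two calls print 130453 and 9708093, matching the hand calculation.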

192384*512/32768=

That is the result. Fill the affected location with zeros using dd.

guard# dd if=/dev/zero of=/dev/ad4 bs=16384 count=1 seek=9708093
dd: /dev/ad4: Operation not permitted

That gets refused, so after some googling:

guard# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16

Once that is set,

guard# dd if=/dev/zero of=/dev/ad4 bs=16384 count=1 seek=9708093
1+0 records in
1+0 records out
16384 bytes transferred in 0.000281 secs (58286240 bytes/sec)

With the block rewritten,

guard# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0
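The three steps (set the flag, dd, clear the flag) could be wrapped so that the flag is always cleared again, even when dd fails. This is a sketch under assumptions: run and zero_block are hypothetical helper names, and the script defaults to a dry run that only prints the commands. Executing it for real overwrites the given block and destroys whatever data was stored there.

```sh
#!/bin/sh
# Default is a dry run that only prints the commands.
# Set DRY_RUN to an empty value (DRY_RUN=) to really execute them.
DRY_RUN=${DRY_RUN-1}

run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# zero_block DEVICE BLOCK_SIZE BLOCK_NUMBER
zero_block() {
    # Allow writes to the disk device, as done by hand above.
    run sysctl kern.geom.debugflags=0x10 || return 1
    run dd if=/dev/zero of="$1" bs="$2" count=1 seek="$3"
    status=$?
    # Always put the flag back, even when dd failed.
    run sysctl kern.geom.debugflags=0
    return $status
}

# Usage:
#   zero_block /dev/ad4 16384 9708093
```

In dry-run mode the usage line prints the same three commands that were typed by hand above.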

the flag is put back to its original value.

Then check whether things are back to normal!

guard# smartctl --test=long /dev/ad4
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 40 minutes for test to complete.
Test will complete after Sat Feb 16 13:59:56 2013

Use smartctl -X to abort test.
guard#

... It is now 14:00, so

guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     19437         -
# 2  Extended offline    Completed: read failure       90%     19431         310658987
# 3  Short offline       Completed: read failure       90%     19427         310658987
# 4  Short offline       Completed: read failure       90%     19403         310658987
# 5  Short offline       Completed: read failure       90%     19379         310658987
# 6  Short offline       Completed: read failure       90%     19355         310658987
# 7  Short offline       Completed: read failure       90%     19331         310658987
# 8  Short offline       Completed: read failure       90%     19307         310658987
# 9  Short offline       Completed: read failure       90%     19283         310658987
#10  Extended offline    Completed: read failure       90%     19263         310658987
#11  Short offline       Completed: read failure       90%     19259         310658987
#12  Short offline       Completed: read failure       90%     19235         310658987
#13  Short offline       Completed: read failure       90%     19211         310658987
#14  Short offline       Completed: read failure       90%     19187         310658987
#15  Short offline       Completed: read failure       90%     19163         310658987
#16  Short offline       Completed: read failure       90%     19139         310658987
#17  Short offline       Completed: read failure       90%     19115         310658987
#18  Extended offline    Completed: read failure       90%     19095         310658987
#19  Short offline       Completed: read failure       90%     19091         310658987
#20  Short offline       Completed: read failure       90%     19067         310658987
#21  Short offline       Completed: read failure       90%     19043         310658987
20 of 20 failed self-tests are outdated by newer successful extended offline self-test # 1

guard#
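Whether the newest test passed can also be checked without reading the table by eye. A minimal sketch, assuming the row numbered 1 is always the most recent test, as in the log above; latest_selftest_ok is a hypothetical helper name.

```sh
#!/bin/sh
# latest_selftest_ok: read "smartctl --log=selftest" output on stdin.
# Exit status 0 if row "# 1" says "Completed without error", 1 otherwise
# (including the case where no row "# 1" is found).
latest_selftest_ok() {
    awk '
        /^# *1 / { found = 1; ok = ($0 ~ /Completed without error/) }
        END      { exit !(found && ok) }'
}

# Usage:
#   smartctl /dev/ad4 --log=selftest | latest_selftest_ok && echo "latest test passed"
```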

Looks like it worked! ( ´▽`)ノ

2013/02/16 14:07

guard# df
Filesystem  1K-blocks     Used     Avail Capacity  Mounted on
/dev/ad4s1a    507630   341756    125264    73%    /
devfs               1        1         0   100%    /dev
/dev/ad4s1e    507630       16    467004     0%    /tmp
/dev/ad4s1f 146328056 14727436 119894376    11%    /usr
/dev/ad4s1d   2000622   198004   1642570    11%    /var
guard#
  268  6:54    smartctl --test=short /dev/ada1
  269  6:55    cat /var/log/console.log
  270  6:55    smartctl /dev/ada1 --log=selftest
  271  6:57    dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=6058
  272  6:57    smartctl --test=short /dev/ada1
  273  6:59    smartctl /dev/ada1 --log=selftest
  274  7:00    dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=6104
  275  7:00    smartctl --test=short /dev/ada1
  276  7:02    smartctl /dev/ada1 --log=selftest
  277  7:03    dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=6150
  278  7:03    smartctl --test=short /dev/ada1
  279  7:05    smartctl /dev/ada1 --log=selftest
  280  7:05    dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=6196
  281  7:06    smartctl --test=short /dev/ada1
  282  7:14    smartctl /dev/ada1 --log=selftest
  283  7:14    dd if=/dev/zero of=/dev/ada1 bs=16384 count=1 seek=6298
  284  7:14    smartctl --test=short /dev/ada1
  285  7:15    history
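Note that each dd with bs=16384 overwrites a whole 16 KiB block, not just the one bad sector. The sketch below converts a seek value back to the range of sectors it covers; it assumes 512-byte sectors, and block_to_lba_range is a hypothetical helper name.

```sh
#!/bin/sh
# block_to_lba_range BLOCK_NUMBER BLOCK_SIZE
# Prints the first and last LBA (512-byte sectors) covered by that block.
block_to_lba_range() {
    per=$(( $2 / 512 ))            # sectors per block
    first=$(( $1 * per ))
    echo "$first $(( first + per - 1 ))"
}

# Seek values from the history above.
for blk in 6058 6104 6150 6196 6298; do
    block_to_lba_range "$blk" 16384
done
```

Each printed range spans 32 sectors, so any data in the neighbouring sectors of the same block is zeroed as well.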
