Guard

Emergency response when smartd reports error messages: command summary

# grep -i "smartd" /var/log/messages | tail

Jan 12 11:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors (changed -8)
Jan 12 11:39:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, previous self-test completed with error (read test element)
Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 1 to 2
Jan 12 12:39:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors (changed -8)
Jan 12 13:09:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors
Jan 12 13:39:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors (changed -8)
Jan 12 13:39:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 2 to 3
Jan 12 14:09:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors
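The pending-sector count in the log above is easier to follow as a plain series. A minimal sketch, assuming the smartd message layout shown here (the count is the 8th whitespace-separated field); `pending_counts` is a hypothetical helper name, not part of smartmontools:

```sh
# Print "month day time count" for every pending-sector line read from stdin.
# Assumes the syslog/smartd layout shown above; other lines are ignored.
pending_counts() {
    awk '$9 == "Currently" && $10 == "unreadable" { print $1, $2, $3, $8 }'
}

# Usage (log path as used on this page):
# grep -i "smartd" /var/log/messages | pending_counts
```

Run against the log above, this would show the count going 96, 88, 80 as blocks are rewritten.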

# smartctl /dev/ada0 --log=selftest

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%       363         975890088
# 2  Extended offline    Completed: read failure       10%       362         975884072
# 3  Extended offline    Completed: read failure       10%       361         975876336
# 4  Short offline       Completed without error       00%       360         -
# 5  Extended offline    Completed: read failure       40%       359         577493400
# 6  Short offline       Completed without error       00%       352         -
# 7  Short offline       Completed without error       00%       328         -
# 8  Short offline       Completed without error       00%       311         -
#
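Only the LBA_of_first_error column is needed for the next step. A small sketch that pulls it out of the self-test log, assuming the table layout shown above (the LBA is the last field of each "read failure" row); `failed_lbas` is a hypothetical helper name:

```sh
# Print the LBA_of_first_error for each failed self-test read from stdin.
# Assumes the smartctl self-test table layout shown above.
failed_lbas() {
    awk '/Completed: read failure/ { print $NF }'
}

# Usage:
# smartctl /dev/ada0 --log=selftest | failed_lbas | sort -n | uniq
```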

With a block size of 32768, for the most recent LBA_of_first_error (975890088):

b = (int)(975890088 * 512 / 32768) = (int) 15248282.625 = 15248282 (fraction truncated)
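The same arithmetic can be left to the shell. A minimal sketch, assuming S = 0 because dd is pointed at the whole disk (/dev/ada0) rather than a partition; `lba_to_block` is a hypothetical helper name:

```sh
# b = (int)((L - S) * 512 / B). Shell integer division truncates,
# which matches (int) for the non-negative values used here.
lba_to_block() {
    lba=$1; start=$2; bsize=$3
    echo $(( (lba - start) * 512 / bsize ))
}

# The three most recent LBA_of_first_error values from the log above.
for lba in 975890088 975884072 975876336; do
    echo "LBA $lba -> seek=$(lba_to_block "$lba" 0 32768)"
done
```

The first value gives seek=15248282, the number used in the dd command on this page.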

# sysctl kern.geom.debugflags=0x10

kern.geom.debugflags: 0 -> 16

# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=15248282

1+0 records in
1+0 records out
32768 bytes transferred in 0.000234 secs (140105438 bytes/sec)

# sysctl kern.geom.debugflags=0

kern.geom.debugflags: 16 -> 0
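Because the dd write destroys whatever was in that block, it may help to print the whole sequence first and review it before running anything. A sketch that only echoes the commands and touches nothing; `zero_block_plan` is a hypothetical helper name, and the extra read step at the top is my own addition, on the assumption that reading a pending sector normally fails with an I/O error:

```sh
# Print (do not run) the command sequence used above for one block.
# Nothing here touches the disk; run the printed lines by hand as root after review.
zero_block_plan() {
    dev=$1; bsize=$2; block=$3
    # Read check: if this read succeeds, the block is probably not the bad one.
    echo "dd if=$dev of=/dev/null bs=$bsize count=1 skip=$block"
    echo "sysctl kern.geom.debugflags=0x10"
    echo "dd if=/dev/zero of=$dev bs=$bsize count=1 seek=$block"
    echo "sysctl kern.geom.debugflags=0"
}

zero_block_plan /dev/ada0 32768 15248282
```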

# smartctl --test=long /dev/ada0

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 62 minutes for test to complete.
Test will complete after Fri Jan 12 15:35:51 2018

Use smartctl -X to abort test.

http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://www.wizard-limit.net/mt/pc/archives/2011_08.html

Emergency response when smartd reports error messages

guard# grep -i "smartd" /var/log/messages | tail
Jan 11 12:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
Jan 11 13:09:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
Jan 11 13:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors

Try to repair the sector error.

# smartctl /dev/ada0 --log=selftest
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       40%       359         577493400
# 2  Short offline       Completed without error       00%       352         -
# 3  Short offline       Completed without error       00%       328         -
# 4  Short offline       Completed without error       00%       311         -
# 5  Short offline       Completed without error       00%       304         -
# 6  Short offline       Completed without error       00%       280         -
# 7  Short offline       Completed without error       00%       256         -
# 8  Short offline       Completed without error       00%       232         -
# 9  Extended offline    Completed without error       00%       213         -
#10  Short offline       Completed without error       00%       208         -
#11  Short offline       Completed without error       00%       184         -
#12  Short offline       Completed without error       00%       160         -
#13  Short offline       Completed without error       00%       136         -
#14  Short offline       Completed without error       00%       112         -
#15  Short offline       Completed without error       00%        88         -
#16  Short offline       Completed without error       00%        64         -
#17  Extended offline    Completed without error       00%        45         -
#18  Short offline       Completed without error       00%        40         -
guard#
# fdisk
******* Working on device /dev/ada0 *******
parameters extracted from in-core disklabel are:
cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 64, size 976773103 (476939 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 2;
        end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>
# disklabel -A /dev/ada0s1
# /dev/ada0s1:
type: unknown
disk:
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 16
sectors/cylinder: 1008
cylinders: 969020
sectors/unit: 976773103
rpm: 3600
interleave: 0
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#          size     offset    fstype   [fsize bsize bps/cpg]
  a:  968884224          0    4.2BSD        0     0     0
  b:    7888878  968884224      swap
  c:  976773103          0    unused        0     0     # "raw" part, don't edit

So bsize shows up as 0 here...?

To find the file system block size, use the following command.

# dumpfs /some/filesystem | grep '^bsize'

So:

guard# dumpfs /dev/ad4s1 | grep '^bsize'
bsize   16384   shift   14      mask    0xffffc000

On piano2nd, gpart list shows that ada0p2 appears to be the file system partition, so:

root@piano2nd:~ # dumpfs /dev/ada0p2  | grep '^bsize'
bsize   32768   shift   15      mask    0xffff8000
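To feed the block size into a calculation, the number alone is enough. A small sketch, assuming the dumpfs line layout shown above ("bsize" followed by the value); `fs_bsize` is a hypothetical helper name:

```sh
# Print only the bsize value from dumpfs output read on stdin.
# Assumes the "bsize <value> shift ..." line layout shown above.
fs_bsize() {
    awk '$1 == "bsize" { print $2; exit }'
}

# Usage:
# dumpfs /dev/ada0p2 | fs_bsize
```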

Confirm by looking at stat for a small file:

# stat .screenrc
114 5056131 -rw-r--r-- 1 root wheel 10102700 54 "Jan 12 10:43:37 2018" "Dec 29 06:40:50 2017" "Dec 29 06:40:58 2017" "Dec 29 06:40:50 2017" 32768 8 0 .screenrc

http://d.hatena.ne.jp/parasporospa/touch/searchdiary?word=*%5Bunix%5D&of=20

According to the page above, st_blksize is the optimal block size for file system I/O operations (16384 previously; 32768 here).

Compute the seek position from the formula below.

http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://see-take.blogspot.jp/2010/01/hddsmart.html

These pages give the following formula, but it does not apply as-is on FreeBSD.

       b = (int)((L-S)*512/B)
       where:
       b = File System block number
       B = File system block size in bytes
       L = LBA of bad sector
       S = Starting sector of partition as shown by fdisk -lu
       and (int) denotes the integer part.

Substituting S = 0, B = 4096, L = 1043624 into the formula:

b = (int)(1043624 * 512 / 4096) = (int) 130453 (fraction truncated)

So on FreeBSD it becomes the following.

       b = (int)((L-S)*512/B)
       where:
       b = File System block number
       B = File system block size in bytes (32768, from dumpfs)
       L = LBA of bad sector
       S = Starting sector of partition as shown by fdisk
       and (int) denotes the integer part.

Substituting S = 0, B = 32768, L = 577493400 into the formula:

b = (int)(577493400 * 512 / 32768) = (int) 9023334.375 = 9023334 (fraction truncated)
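As a sanity check on the truncated result, the block can be converted back to the range of 512-byte sectors that dd will overwrite. A minimal sketch, assuming S = 0 (writing to the whole-disk device); `block_lba_range` is a hypothetical helper name:

```sh
# Print the first and last 512-byte LBA covered by one block of size bsize
# at block index "block", counted from the start of the device.
block_lba_range() {
    block=$1; bsize=$2
    per_block=$(( bsize / 512 ))
    first=$(( block * per_block ))
    last=$(( first + per_block - 1 ))
    echo "$first $last"
}

block_lba_range 9023334 32768   # prints 577493376 577493439
```

LBA 577493400 falls inside that range, so the write covers the reported sector, along with the 63 neighbouring sectors in the same 32 KiB block.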

Zero-fill that location with dd.

guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
dd: /dev/ada0: Operation not permitted

dd refuses, so after some searching, run

guard# sysctl kern.geom.debugflags=0x10
kern.geom.debugflags: 0 -> 16

and then

guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
1+0 records in
1+0 records out
16384 bytes transferred in 0.000281 secs (58286240 bytes/sec)

With the block overwritten,

guard# sysctl kern.geom.debugflags=0
kern.geom.debugflags: 16 -> 0

the flag is put back to its original value.

Then check whether the disk is back to normal!

guard# smartctl --test=long /dev/ad4
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 40 minutes for test to complete.
Test will complete after Sat Feb 16 13:59:56 2013

Use smartctl -X to abort test.
guard#

... It is now 14:00, so:

guard# smartctl /dev/ad4 --log=selftest
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     19437         -
# 2  Extended offline    Completed: read failure       90%     19431         310658987
# 3  Short offline       Completed: read failure       90%     19427         310658987
# 4  Short offline       Completed: read failure       90%     19403         310658987
# 5  Short offline       Completed: read failure       90%     19379         310658987
# 6  Short offline       Completed: read failure       90%     19355         310658987
# 7  Short offline       Completed: read failure       90%     19331         310658987
# 8  Short offline       Completed: read failure       90%     19307         310658987
# 9  Short offline       Completed: read failure       90%     19283         310658987
#10  Extended offline    Completed: read failure       90%     19263         310658987
#11  Short offline       Completed: read failure       90%     19259         310658987
#12  Short offline       Completed: read failure       90%     19235         310658987
#13  Short offline       Completed: read failure       90%     19211         310658987
#14  Short offline       Completed: read failure       90%     19187         310658987
#15  Short offline       Completed: read failure       90%     19163         310658987
#16  Short offline       Completed: read failure       90%     19139         310658987
#17  Short offline       Completed: read failure       90%     19115         310658987
#18  Extended offline    Completed: read failure       90%     19095         310658987
#19  Short offline       Completed: read failure       90%     19091         310658987
#20  Short offline       Completed: read failure       90%     19067         310658987
#21  Short offline       Completed: read failure       90%     19043         310658987
20 of 20 failed self-tests are outdated by newer successful extended offline self-test # 1
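The check above comes down to whether entry # 1, the newest test, completed without error. A small sketch of that test, assuming the self-test table layout shown above; `latest_selftest_ok` is a hypothetical helper name:

```sh
# Exit 0 if self-test entry "# 1" (the newest) completed without error,
# non-zero otherwise. Reads the smartctl self-test log on stdin.
latest_selftest_ok() {
    awk '/^# *1 / { found = 1; ok = /Completed without error/ }
         END { exit !(found && ok) }'
}

# Usage:
# smartctl /dev/ad4 --log=selftest | latest_selftest_ok && echo "latest self-test passed"
```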

guard#

Looks like it worked! ( ´▽`)ノ


Last-modified: 2018-01-12 (Fri) 14:51:29