[[Guard]]
#contents


*Emergency response to smartd error messages: command summary [#y75b9039]


*** # grep -i "smartd" /var/log/messages | tail [#q00a8915]
 Jan 12 11:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors (changed -8)
 Jan 12 11:39:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
 Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, 96 Currently unreadable (pending) sectors
 Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, previous self-test completed with error (read test element)
 Jan 12 12:09:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 1 to 2
 Jan 12 12:39:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors (changed -8)
 Jan 12 13:09:01 guard smartd[596]: Device: /dev/ada0, 88 Currently unreadable (pending) sectors
 Jan 12 13:39:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors (changed -8)
 Jan 12 13:39:01 guard smartd[596]: Device: /dev/ada0, Self-Test Log error count increased from 2 to 3
 Jan 12 14:09:00 guard smartd[596]: Device: /dev/ada0, 80 Currently unreadable (pending) sectors
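
To follow how the pending-sector count changes over time, the log lines above can be reduced to a device name and a count. This is a minimal sketch assuming POSIX sh and sed; the function name pending_counts is made up for illustration, and the pattern is based only on the message layout shown in this excerpt.

```shell
# Print "device count" for each smartd pending-sector message on stdin.
# Lines that do not carry a pending-sector count are ignored.
pending_counts() {
    sed -n 's/.*Device: \([^,]*\), \([0-9][0-9]*\) Currently unreadable (pending) sectors.*/\1 \2/p'
}

# Example (log path as used on this page):
#   grep -i "smartd" /var/log/messages | pending_counts | tail -n 1
```

A falling count, as in the excerpt (96, 88, 80), suggests the drive is resolving sectors after they are rewritten.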

*** # smartctl /dev/ada0 --log=selftest [#pb39d335]
 smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
 Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF READ SMART DATA SECTION ===
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Extended offline    Completed: read failure       10%       363         975890088
 # 2  Extended offline    Completed: read failure       10%       362         975884072
 # 3  Extended offline    Completed: read failure       10%       361         975876336
 # 4  Short offline       Completed without error       00%       360         -
 # 5  Extended offline    Completed: read failure       40%       359         577493400
 # 6  Short offline       Completed without error       00%       352         -
 # 7  Short offline       Completed without error       00%       328         -
 # 8  Short offline       Completed without error       00%       311         -
 #
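
The LBA_of_first_error column is the input for the seek calculation, so it can help to pull it out of the self-test log mechanically. A small sketch, assuming POSIX sh and awk; first_error_lbas is a hypothetical helper name, and it relies on the LBA being the last column as in the output above.

```shell
# Print the LBA_of_first_error column for every self-test entry that
# ended in a read failure. Expects `smartctl --log=selftest` output on stdin.
first_error_lbas() {
    awk '/^ *#/ && /read failure/ { print $NF }'
}

# Example:
#   smartctl /dev/ada0 --log=selftest | first_error_lbas | head -n 1
```

Entry # 1 is the most recent test, so head -n 1 yields the latest failing LBA.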

When the block size is 32768:

 b = (int)(975890088 * 512 / 32768) = (int) 15248282.625 = 15248282 (fraction truncated)
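
The same arithmetic can be done in the shell, whose integer division already truncates. A minimal sketch assuming POSIX sh with 64-bit arithmetic (the intermediate product exceeds 32 bits); seek_block is a hypothetical name, and START is the partition start sector, which is 0 here because dd addresses the whole disk.

```shell
# seek_block LBA START BSIZE
# Integer part of (LBA - START) * 512 / BSIZE.
seek_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

seek_block 975890088 0 32768    # prints 15248282
```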



*** # sysctl kern.geom.debugflags=0x10 [#v8eb0d4f]
 kern.geom.debugflags: 0 -> 16

*** # dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=15248282 [#paff7c7c]
 1+0 records in
 1+0 records out
 32768 bytes transferred in 0.000234 secs (140105438 bytes/sec)

*** # sysctl kern.geom.debugflags=0 [#rc7012c8]
 kern.geom.debugflags: 16 -> 0

*** # smartctl --test=long /dev/ada0 [#v22d083d]
 smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
 Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
 Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
 Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
 Testing has begun.
 Please wait 62 minutes for test to complete.
 Test will complete after Fri Jan 12 15:35:51 2018
 
 Use smartctl -X to abort test.

----









http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://www.wizard-limit.net/mt/pc/archives/2011_08.html

**Emergency response when smartd reports error messages [#q2b4c761]
 grep -i "smartd" /var/log/messages | tail

 guard# grep -i "smartd" /var/log/messages | tail
 Feb 14 05:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 06:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 06:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 07:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 07:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 08:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 08:57:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 09:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 09:57:57 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 Feb 14 10:27:56 guard smartd[779]: Device: /dev/ad4, 1 Currently unreadable (pending) sectors
 guard#
 Jan 11 12:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
 Jan 11 13:09:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors
 Jan 11 13:39:00 guard smartd[596]: Device: /dev/ada0, 104 Currently unreadable (pending) sectors


**Attempting to repair the sector error [#nb685c9b]

 guard# smartctl /dev/ad4 --log=selftest
 smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
 Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 # smartctl /dev/ada0 --log=selftest
 smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-RELEASE-p4 amd64] (local build)
 Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
 
 === START OF READ SMART DATA SECTION ===
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Short offline       Completed: read failure       90%     19379         310658987
 # 2  Short offline       Completed: read failure       90%     19355         310658987
 # 3  Short offline       Completed: read failure       90%     19331         310658987
 # 4  Short offline       Completed: read failure       90%     19307         310658987
 # 5  Short offline       Completed: read failure       90%     19283         310658987
 # 6  Extended offline    Completed: read failure       90%     19263         310658987
 # 7  Short offline       Completed: read failure       90%     19259         310658987
 # 8  Short offline       Completed: read failure       90%     19235         310658987
 # 9  Short offline       Completed: read failure       90%     19211         310658987
 #10  Short offline       Completed: read failure       90%     19187         310658987
 #11  Short offline       Completed: read failure       90%     19163         310658987
 #12  Short offline       Completed: read failure       90%     19139         310658987
 #13  Short offline       Completed: read failure       90%     19115         310658987
 #14  Extended offline    Completed: read failure       90%     19095         310658987
 #15  Short offline       Completed: read failure       90%     19091         310658987
 #16  Short offline       Completed: read failure       90%     19067         310658987
 #17  Short offline       Completed: read failure       90%     19043         310658987
 #18  Short offline       Completed: read failure       90%     19019         310658987
 #19  Short offline       Completed: read failure       90%     18995         310658987
 #20  Short offline       Completed: read failure       90%     18971         310658987
 #21  Short offline       Completed: read failure       90%     18947         310658987
 
 # 1  Extended offline    Completed: read failure       40%       359         577493400
 # 2  Short offline       Completed without error       00%       352         -
 # 3  Short offline       Completed without error       00%       328         -
 # 4  Short offline       Completed without error       00%       311         -
 # 5  Short offline       Completed without error       00%       304         -
 # 6  Short offline       Completed without error       00%       280         -
 # 7  Short offline       Completed without error       00%       256         -
 # 8  Short offline       Completed without error       00%       232         -
 # 9  Extended offline    Completed without error       00%       213         -
 #10  Short offline       Completed without error       00%       208         -
 #11  Short offline       Completed without error       00%       184         -
 #12  Short offline       Completed without error       00%       160         -
 #13  Short offline       Completed without error       00%       136         -
 #14  Short offline       Completed without error       00%       112         -
 #15  Short offline       Completed without error       00%        88         -
 #16  Short offline       Completed without error       00%        64         -
 #17  Extended offline    Completed without error       00%        45         -
 #18  Short offline       Completed without error       00%        40         -
 guard#
***guard backup destinations [#a004a988]
 30	2	*	*	*	root	/root/bin/backup_to_k222_all.sh
 30	0	*	*	*	root	/root/bin/backup_to_BlackHole_all.sh


 # smartctl /dev/sdz --log=selftest
 smartctl version x.xx Copyright (C) 2002-8 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/
 
 === START OF READ SMART DATA SECTION ===
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)   LBA_of_first_error
 # 1  Short offline       Completed: read failure       10%     19935         1043624
 # 2  Extended offline    Completed without error       00%     18472         -
 # 3  Short offline       Completed without error       00%     18469         -
 # 4  Extended offline    Completed without error       00%     18447         -
 # 5  Extended offline    Completed without error       00%     16669         -
 # 6  Short offline       Completed without error       00%     16645         -
 # 7  Extended offline    Completed without error       00%     13278         -
 # 8  Short offline       Completed without error       00%     12081         -

 guard# fdisk
 ******* Working on device /dev/ad4 *******
 # fdisk
 ******* Working on device /dev/ada0 *******
 parameters extracted from in-core disklabel are:
 cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)
 cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)
 
 Figures below won't work with BIOS for partitions not in cyl 1
 parameters to be used for BIOS calculations are:
 cylinders=310020 heads=16 sectors/track=63 (1008 blks/cyl)
 cylinders=969021 heads=16 sectors/track=63 (1008 blks/cyl)
 
 Media sector size is 512
 Warning: BIOS sector numbering starts with sector 1
 Information from DOS bootblock is:
 The data for partition 1 is:
 sysid 165 (0xa5),(FreeBSD/NetBSD/386BSD)
    start 63, size 312496317 (152586 Meg), flag 80 (active)
        beg: cyl 0/ head 1/ sector 1;
        end: cyl 1023/ head 3/ sector 63
     start 64, size 976773103 (476939 Meg), flag 80 (active)
         beg: cyl 0/ head 1/ sector 2;
         end: cyl 1023/ head 255/ sector 63
 The data for partition 2 is:
 <UNUSED>
 The data for partition 3 is:
 <UNUSED>
 The data for partition 4 is:
 <UNUSED>
 guard#
 

 guard# disklabel  -A /dev/ad4s1
 # /dev/ad4s1:
 type: ESDI
 disk: ad4s1
 # disklabel -A /dev/ada0s1
 # /dev/ada0s1:
 type: unknown
 disk:
 label:
 flags:
 bytes/sector: 512
 sectors/track: 63
 tracks/cylinder: 16
 sectors/cylinder: 1008
 cylinders: 310020
 sectors/unit: 312500160
 cylinders: 969020
 sectors/unit: 976773103
 rpm: 3600
 interleave: 1
 interleave: 0
 trackskew: 0
 cylinderskew: 0
 headswitch: 0           # milliseconds
 track-to-track seek: 0  # milliseconds
 drivedata: 0
 
 8 partitions:
 #        size   offset    fstype   [fsize bsize bps/cpg]
  a:  1048576        0    4.2BSD        0     0     0
  b:  4092240  1048576      swap
  c: 312496317        0    unused        0     0         # "raw" part, don't edit
  d:  4143104  5140816    4.2BSD        0     0     0
  e:  1048576  9283920    4.2BSD        0     0     0
  f: 302163821 10332496    4.2BSD        0     0     0
 disklabel: partition c doesn't cover the whole unit!
 disklabel: An incorrect partition c may cause problems for standard system utilities
 guard#
 #          size     offset    fstype   [fsize bsize bps/cpg]
   a:  968884224          0    4.2BSD        0     0     0
   b:    7888878  968884224      swap
   c:  976773103          0    unused        0     0     # "raw" part, don't edit


So disklabel reports bsize as 0...?

**To find the file system block size, use the following command [#m5fe276c]

 # dumpfs /some/filesystem | grep '^bsize'
So, on guard:
 guard# dumpfs /dev/ad4s1 | grep '^bsize'
 bsize   16384   shift   14      mask    0xffffc000

On piano2nd, gpart list suggests that ada0p2 is the file system partition, so:

 root@piano2nd:~ # dumpfs /dev/ada0p2  | grep '^bsize'
 bsize   32768   shift   15      mask    0xffff8000

 guard# stat
 100728576 88 crw--w---- 1 root tty 88 0 "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Feb 14 12:47:16 2013" "Jan  1 08:59:59 1970" 4096 0 0 /dev/pts/1
 guard# stat w-filter_1_02.sh
 80 16587 -rwxr-xr-x 1 root wheel 68531 1776 "Jan 12 14:00:52 2012" "Apr 27 14:46:06 2007" "Nov 15 11:53:50 2010" "Apr 27 14:46:06 2007" 16384 4 0 w-filter_1_02.sh
As a cross-check, look at the stat output of a small file:

 # stat .screenrc
 114 5056131 -rw-r--r-- 1 root wheel 10102700 54 "Jan 12 10:43:37 2018" "Dec 29 06:40:50 2017" "Dec 29 06:40:58 2017" "Dec 29 06:40:50 2017" 32768 8 0 .screenrc

http://d.hatena.ne.jp/parasporospa/touch/searchdiary?word=*%5Bunix%5D&of=20

According to the page above, st_blksize is the optimal block size for file system I/O operations. It is 16384 in the earlier stat output and 32768 here.
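
Rather than picking st_blksize out of the full stat line by eye, stat can be asked for that field alone. This is a sketch assuming POSIX sh; fs_blksize is a hypothetical helper. GNU stat is tried first (-c %o) and BSD stat second (-f %k), because the two take different format options.

```shell
# fs_blksize PATH
# Prints st_blksize (optimal I/O block size) for PATH.
fs_blksize() {
    stat -c %o "$1" 2>/dev/null || stat -f %k "$1"
}

fs_blksize .
```

On FreeBSD the first call is expected to fail and the second to run, so the order only matters for portability to GNU systems.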

**Computing the seek position from the following formula [#t7d208fe]
http://hiro-system.blog.ocn.ne.jp/blog/2010/11/smartd_995c.html

http://see-take.blogspot.jp/2010/01/hddsmart.html

These pages give the following formula, but it does not apply as-is on FreeBSD.

        b = (int)((L-S)*512/B)
        where:
        b = File System block number
        B = File system block size in bytes
        L = LBA of bad sector
        S = Starting sector of partition as shown by fdisk -lu
        and (int) denotes the integer part.
 
 Substituting S = 0, B = 4096, L = 1043624 into the formula:
 Substituting S = 0, B = 4096, L = 577493400 into the formula:
 
 b = (int)(1043624 * 512 / 4096) = (int) 130453 (fraction truncated)

So the formula becomes the following, with B taken from dumpfs:
        b = (int)((L-S)*512/B)
        where:
        b = File System block number
        B = File system block size in bytes (dumpfs 16384)
        B = File system block size in bytes (dumpfs 32768)
        L = LBA of bad sector
        S = Starting sector of partition as shown by fdisk
        and (int) denotes the integer part.
 
 Substituting S = 0, B = 16384, L = 310658987 into the formula:
 Substituting S = 0, B = 32768, L = 577493400 into the formula:
 
 b = (int)(310658987 * 512 / 16384) = (int) 9708093.34375 (fraction truncated)
 b = (int)(577493400 * 512 / 32768) = (int) 9023334.375 (fraction truncated)
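
Because the fraction is truncated, it is worth confirming that the block chosen for dd really contains the failing LBA. The sketch below goes in the reverse direction, from block number back to the range of 512-byte sectors it covers. It assumes POSIX sh with 64-bit arithmetic; block_lba_range is a hypothetical name.

```shell
# block_lba_range BLOCK BSIZE
# Prints the first and last 512-byte LBA covered by block BLOCK.
block_lba_range() {
    per_block=$(( $2 / 512 ))
    echo "$(( $1 * per_block )) $(( $1 * per_block + per_block - 1 ))"
}

block_lba_range 9023334 32768    # prints: 577493376 577493439
```

LBA 577493400 lies inside that range, so zeroing block 9023334 overwrites the bad sector along with the other 63 sectors of the block.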




That gives the block number. Fill that block with zeros using dd.
 

 # dd if=/dev/zero of=/dev/sdz bs=4096 count=1 seek=130453 ⇒ dd if=/dev/zero of=/dev/ad4 bs=16384 count=1 seek=9708093
    1+0 records in
    1+0 records out
    4096 bytes (4.1 kB) copied, 0.000270448 s, 43.8 MB/s
 guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
 dd: /dev/ada0: Operation not permitted
dd reports this error, so after searching the web, first run:
 guard# sysctl kern.geom.debugflags=0x10
 kern.geom.debugflags: 0 -> 16
Then retry the write:
 guard# dd if=/dev/zero of=/dev/ada0 bs=32768 count=1 seek=9023334
 1+0 records in
 1+0 records out
 16384 bytes transferred in 0.000281 secs (58286240 bytes/sec)
The block is now overwritten. Afterwards:
 guard# sysctl kern.geom.debugflags=0
 kern.geom.debugflags: 16 -> 0
Set the flag back to its original value as soon as the write is done.
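
Since the write only works between the two sysctl calls (0x10 is, as far as I know, the GEOM debug flag that permits writes to a provider that is in use), it may be safer to generate the three commands together and review them before running anything. A sketch assuming POSIX sh; zero_block_plan is a hypothetical helper that only prints the commands shown on this page and executes nothing.

```shell
# zero_block_plan DEVICE BSIZE BLOCK
# Prints the enable / write / restore sequence for review.
zero_block_plan() {
    printf 'sysctl kern.geom.debugflags=0x10\n'
    printf 'dd if=/dev/zero of=%s bs=%s count=1 seek=%s\n' "$1" "$2" "$3"
    printf 'sysctl kern.geom.debugflags=0\n'
}

zero_block_plan /dev/ada0 32768 9023334
```

Printing instead of executing keeps a mistyped device or block number from destroying data, and makes it hard to forget the final restore step.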
 

Then check whether the disk is back to normal.
 

 # smartctl --test=long /dev/sdz
 guard# smartctl --test=long /dev/ad4
 smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
 Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 

...A few hours later.
 === START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
 Sending command: "Execute SMART Extended self-test routine immediately in off-line mode".
 Drive command "Execute SMART Extended self-test routine immediately in off-line mode" successful.
 Testing has begun.
 Please wait 40 minutes for test to complete.
 Test will complete after Sat Feb 16 13:59:56 2013
 
 Use smartctl -X to abort test.
 guard#

 # smartctl --log=selftest /dev/sdz
 smartctl version x.xx Copyright (C) 2002-8 Bruce Allen
 Home page is http://smartmontools.sourceforge.net/
...It is now 14:00, so:
 guard# smartctl /dev/ad4 --log=selftest
 smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.1-RELEASE-p13 i386] (local build)
 Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
 
 === START OF READ SMART DATA SECTION ===
 SMART Self-test log structure revision number 1
 Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
 # 1  Extended offline    Completed without error       00%     19937         -
 # 2  Short offline       Completed: read failure       10%     19935         1043624
 # 3  Extended offline    Completed without error       00%     18472         -
 # 4  Short offline       Completed without error       00%     18469         -
 # 5  Extended offline    Completed without error       00%     18447         -
 # 6  Extended offline    Completed without error       00%     16669         -
 # 7  Short offline       Completed without error       00%     16645         -
 # 8  Extended offline    Completed without error       00%     13278         -
 # 9  Short offline       Completed without error       00%     12081         -
 # 1  Extended offline    Completed without error       00%     19437         -
 # 2  Extended offline    Completed: read failure       90%     19431         310658987
 # 3  Short offline       Completed: read failure       90%     19427         310658987
 # 4  Short offline       Completed: read failure       90%     19403         310658987
 # 5  Short offline       Completed: read failure       90%     19379         310658987
 # 6  Short offline       Completed: read failure       90%     19355         310658987
 # 7  Short offline       Completed: read failure       90%     19331         310658987
 # 8  Short offline       Completed: read failure       90%     19307         310658987
 # 9  Short offline       Completed: read failure       90%     19283         310658987
 #10  Extended offline    Completed: read failure       90%     19263         310658987
 #11  Short offline       Completed: read failure       90%     19259         310658987
 #12  Short offline       Completed: read failure       90%     19235         310658987
 #13  Short offline       Completed: read failure       90%     19211         310658987
 #14  Short offline       Completed: read failure       90%     19187         310658987
 #15  Short offline       Completed: read failure       90%     19163         310658987
 #16  Short offline       Completed: read failure       90%     19139         310658987
 #17  Short offline       Completed: read failure       90%     19115         310658987
 #18  Extended offline    Completed: read failure       90%     19095         310658987
 #19  Short offline       Completed: read failure       90%     19091         310658987
 #20  Short offline       Completed: read failure       90%     19067         310658987
 #21  Short offline       Completed: read failure       90%     19043         310658987
 20 of 20 failed self-tests are outdated by newer successful extended offline self-test # 1
 
 guard#
Looks like it worked! ( ´▽`)ノ

That's all.

