Monitoring and rebuilding a hardware RAID (Linux)

Built-in hardware controllers

The root servers from 1&1 IONOS use two types of hardware RAID controller: 3ware and Areca.

You can check directly on the server which controller is installed:

# lspci|grep RAID
01:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)

or

# lspci|grep RAID
02:0e.0 RAID bus controller: Areca Technology Corp. ARC-1110 4-Port PCI-X to SATA RAID Controller
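
If you want to make this check from a script, the lspci output can be classified with a few lines of Python. This is only a minimal sketch: the function name detect_raid_vendor is hypothetical, and it assumes the vendor strings look like the two sample lines above.

```python
from typing import Optional


def detect_raid_vendor(lspci_output: str) -> Optional[str]:
    """Return '3ware', 'areca' or None, based on the text printed by lspci."""
    for line in lspci_output.splitlines():
        # Same filter as `lspci | grep RAID`.
        if "RAID" not in line:
            continue
        lowered = line.lower()
        if "3ware" in lowered:
            return "3ware"
        if "areca" in lowered:
            return "areca"
    return None


# Sample line copied from the lspci output shown above.
sample = "01:09.0 RAID bus controller: 3ware Inc 7xxx/8xxx-series PATA/SATA-RAID (rev 01)"
print(detect_raid_vendor(sample))
```

The result tells you which of the two sections below (3ware or Areca) applies to your server.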

3ware RAID

The following command shows details about the controller:

# dmesg|grep 3ware
3ware Storage Controller device driver for Linux v1.26.02.002.
scsi0 : 3ware Storage Controller
3w-xxxx: scsi0: Found a 3ware Storage Controller at 0xd800, IRQ: 18.
scsi 0:0:0:0: Direct-Access 3ware Logical Disk 0 1.2 PQ: 0 ANSI: 0
3ware 9000 Storage Controller device driver for Linux v2.26.02.010.

tw_cli

tw_cli opens a connection to the controller console.
The help command lists all available commands.

# tw_cli
//XXX> help

Copyright(c) 2004-2006 Applied Micro Circuits Corporation(AMCC). All rights reserved.

AMCC/3ware CLI (version 2.00.06.007)


Commands Description
-------------------------------------------------------------------
focus Changes from one object to another. For Interactive Mode Only!
show Displays information about controller(s), unit(s) and port(s).
flush Flush write cache data to units in the system.
rescan Rescan all empty ports for new unit(s) and disk(s).
update Update controller firmware from an image file.
commit Commit dirty DCB to storage on controller(s). (Windows only)
/cx Controller specific commands.
/cx/ux Unit specific commands.
/cx/px Port specific commands.
/cx/bbu BBU specific commands. (9000 only)
/ex Enclosure specific commands. (9KSX/SE only)
/ex/slotx Enclosure Slot specific commands.
/ex/fanx Enclosure Fan specific commands.
/ex/tempx Enclosure Temperature Sensor specific commands.

Certain commands are qualified with constraints of controller type/model support.
Please consult the twi_cli documentation for explanation of the controller-qualifiers.

The controller-qualifiers of the Enclosure commands (/ex) also apply to Enclosure
Element specific commands (e.g., /ex/elementx).

Type help <command> to get more details about a particular command.
For more detail information see twi_cli's documentation.

//XXX>

Query the RAID information and status:

//XXXX> info

Ctl Model Ports Drives Units NotOpt RRate VRate BBU
------------------------------------------------------------------------
c0 8006-2LP 2 2 1 0 2 - -

//XXXX> info c0

Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 OK - - - 232.885 ON -

Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 232.88 GB 488397168 4ND0XYFE
p1 OK u0 232.88 GB 488397168 4ND0YH77
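
For automated monitoring, the unit table printed by info c0 can be scanned for any status other than OK. The following is a sketch under assumptions: the helper name units_needing_attention is hypothetical, the text is expected to have been captured from tw_cli beforehand, and the column order is taken from the sample output above.

```python
from typing import List, Tuple


def units_needing_attention(info_output: str) -> List[Tuple[str, str]]:
    """Return (unit, status) pairs for every unit whose status is not OK."""
    problems = []
    for line in info_output.splitlines():
        fields = line.split()
        # Unit rows start with u<number>, e.g. "u0 RAID-1 OK ...".
        # Port rows (p0, p1) and header lines are skipped.
        if len(fields) >= 3 and fields[0][:1] == "u" and fields[0][1:].isdigit():
            unit, status = fields[0], fields[2]
            if status != "OK":
                problems.append((unit, status))
    return problems


# Unit row copied from the healthy sample above.
healthy = "u0 RAID-1 OK - - - 232.885 ON -"
print(units_needing_attention(healthy))
```

An empty list means that every unit reported OK. Any other result is a reason to look at the alarms.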

Display the alarms:

//XXXX> show alarms

Ctl Date Severity Alarm Message
------------------------------------------------------------------------------
c0 - INFO (0x0F:0x0007): Initialization complete: Unit #0
c0 - INFO (0x0F:0x000C): Initialization started: Unit #0

In the event of a fault, the output looks something like this:

//XXXX> show alarms

Ctl Date Severity Alarm Message
------------------------------------------------------------------------------
c0 - INFO (0x0F:0x000B): Rebuild started: Unit #0
c0 - ERROR (0x0F:0x0002): Unit degraded: Unit #0
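
The alarm list can be filtered for entries with severity ERROR in the same way. This sketch assumes the row layout of the samples above (controller, date, severity, message). The name error_alarms is hypothetical, and the WARNING keyword in the pattern is an assumption, because only INFO and ERROR appear in the output shown here.

```python
import re
from typing import List, Tuple

# Controller, date (may be "-"), severity keyword, then the message text.
# WARNING is an assumed keyword; only INFO and ERROR occur in the samples.
ALARM_ROW = re.compile(r"^(c\d+)\s+(.*?)\s+(INFO|WARNING|ERROR)\s+(.*)$")


def error_alarms(alarm_output: str) -> List[Tuple[str, str]]:
    """Return (controller, message) for every alarm with severity ERROR."""
    errors = []
    for line in alarm_output.splitlines():
        match = ALARM_ROW.match(line.strip())
        if match and match.group(3) == "ERROR":
            errors.append((match.group(1), match.group(4)))
    return errors


# Rows copied from the fault sample above.
sample = """c0 - INFO (0x0F:0x000B): Rebuild started: Unit #0
c0 - ERROR (0x0F:0x0002): Unit degraded: Unit #0"""
for controller, message in error_alarms(sample):
    print(controller, message)
```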

Remove the defective hard disk on the second port (p1) from the RAID:

//XXXX> maint remove c0 p1
Removing port /c0/p1 ... Done.

After the defective hard disk has been replaced, a rescan makes the new disk visible:

//XXXX> maint rescan
Rescanning controller /c0 for units and drives ...Done.
Found the following unit(s): [none].
Found the following drive(s): [/c0/p1].

The disk on the second port is then added back to the unit and rebuilt:

//XXXX> maint rebuild c0 u0 p1
Sending rebuild start request to /c0/u0 on 1 disk(s) [1] ... Done.

Display the rebuild status:

//XXXX> info c0

Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-1 REBUILDING 0 - - 232.885 ON -

Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 232.88 GB 488397168 4ND0XYFE
p1 DEGRADED u0 232.88 GB 488397168 4ND0YH77
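
To follow a rebuild from a script, you can read the %RCmpl column of the unit table. The sketch below assumes that this is the fourth column, as in the sample above, and that the value is a plain number. A trailing percent sign is stripped in case a different tw_cli version prints one, which is an assumption. The name rebuild_progress is hypothetical.

```python
from typing import Dict


def rebuild_progress(info_output: str) -> Dict[str, int]:
    """Map each rebuilding unit to its %RCmpl value."""
    progress = {}
    for line in info_output.splitlines():
        fields = line.split()
        # Only unit rows (u0, u1, ...) carry a %RCmpl value.
        if len(fields) >= 4 and fields[0][:1] == "u" and fields[0][1:].isdigit():
            if fields[2] == "REBUILDING":
                percent = fields[3].rstrip("%")
                if percent.isdigit():
                    progress[fields[0]] = int(percent)
    return progress


# Unit row copied from the rebuild sample above.
sample = "u0 RAID-1 REBUILDING 0 - - 232.885 ON -"
print(rebuild_progress(sample))
```

Once the rebuild has finished and the unit reports OK again, the function returns an empty dictionary.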

3dm2

For monitoring, 3ware offers the 3dm2 software, which you can download from http://www.3ware.com/support/downloadpage.asp. For more information on installation, configuration and use, see the 3ware documentation (http://www.3ware.com/support/userdocs.asp).

Areca RAID

Check whether an Areca controller is in use:

# lspci|grep RAID
02:0e.0 RAID bus controller: Areca Technology Corp. ARC-1110 4-Port PCI-X to SATA RAID Controller

# dmesg|grep -i areca
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
scsi0 : Areca SATA Host Adapter RAID Controller
scsi 0:0:0:0: Direct-Access Areca ARC-1110-VOL#00 R001 PQ: 0 ANSI: 5
scsi 0:0:16:0: Processor Areca RAID controller R001 PQ: 0 ANSI: 0

You can download the complete Areca CLI manual from http://areca.starline.de/RaidCards/Documents/Manual_Spec/Software/.

The following example lists some of the commands. The controller can be accessed from the Rescue System:

arcmsr_cli64
Copyright (c) 2004 Areca, Inc. All Rights Reserved.
Areca CLI, Version: 1.71.240( Linux )


Controllers List
----------------------------------------
Controller#01(PCI): ARC-1110
Current Controller: Controller#01
----------------------------------------

CMD Description
==========================================================
main Show Command Categories.
set General Settings.
rsf RaidSet Functions.
vsf VolumeSet Functions.
disk Physical Drive Functions.
sys System Functions.
net Ethernet Functions.
event Event Functions.
hw Hardware Monitor Information.
exit Exit CLI.
==========================================================
Command Format: <CMD> [Sub-Command] [Parameters].
Note: Use <CMD> -h or -help to get details.
CLI>

You can display system information with the command <cmd> info, for example the hardware monitor information (temperature):

CLI> hw info
The Hardware Monitor Information
===========================================
Fan#1 Speed (RPM) : 2673
HDD #1 Temp. : 48
HDD #2 Temp. : 47
HDD #3 Temp. : 51
HDD #4 Temp. : 0
===========================================
GuiErrMsg<0x00>: Success.

CLI>
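
The temperature readings can be checked against a limit once the hw info output has been captured as text. This is a sketch only: the function name hot_drives and the default limit of 55 are hypothetical values chosen for illustration, and the output above does not state the unit of the readings.

```python
import re
from typing import Dict

# Matches rows such as "HDD #3 Temp. : 51".
TEMP_ROW = re.compile(r"HDD\s*#(\d+)\s+Temp\.\s*:\s*(\d+)")


def hot_drives(hw_info_output: str, limit: int = 55) -> Dict[int, int]:
    """Return {drive number: reading} for every reading above the limit."""
    readings = {int(n): int(t) for n, t in TEMP_ROW.findall(hw_info_output)}
    return {drive: temp for drive, temp in readings.items() if temp > limit}


# Rows copied from the hw info sample above.
sample = """Fan#1 Speed (RPM) : 2673
HDD #1 Temp. : 48
HDD #2 Temp. : 47
HDD #3 Temp. : 51
HDD #4 Temp. : 0"""
print(hot_drives(sample, limit=50))
```

A reading of 0, as for HDD #4 in the sample, never exceeds the limit and is therefore ignored. It probably belongs to an unpopulated port, since disk info lists only three drives, but that is an interpretation rather than something the output states.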

You can get information about the hard disks with:

CLI> disk info
# ModelName Serial# FirmRev Capacity State
===============================================================================
1 ST3750640AS 5QD5G7Z1 3.AAK 750.2GB RaidSet Member(1)
2 ST3750640AS 5QD5G6JR 3.AAK 750.2GB RaidSet Member(1)
3 ST3750640AS 5QD5G7XQ 3.AAK 750.2GB RaidSet Member(1)
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>
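
When a disk has to be replaced, it helps to have the serial number of each slot at hand. The following sketch turns the disk info table into a list of records. It assumes the column order of the sample above and treats everything after the capacity as the state, because the state can contain spaces. The name parse_disk_info is hypothetical.

```python
from typing import Dict, List, Union


def parse_disk_info(disk_output: str) -> List[Dict[str, Union[int, str]]]:
    """Parse the rows of `disk info` into slot, model, serial and state."""
    disks = []
    for line in disk_output.splitlines():
        fields = line.split()
        # Data rows start with the slot number; header and separators do not.
        if len(fields) >= 6 and fields[0].isdigit():
            disks.append({
                "slot": int(fields[0]),
                "model": fields[1],
                "serial": fields[2],
                "state": " ".join(fields[5:]),
            })
    return disks


# Row copied from the disk info sample above.
sample = "1 ST3750640AS 5QD5G7Z1 3.AAK 750.2GB RaidSet Member(1)"
for disk in parse_disk_info(sample):
    print(disk["slot"], disk["serial"], disk["state"])
```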

Information about the controller itself:

CLI> sys info
The System Information
===========================================
Main Processor : 500MHz
CPU ICache Size : 32KB
CPU DCache Size : 32KB
System Memory : 256MB/333MHz
Firmware Version : V1.43 2007-4-17
BOOT ROM Version : V1.43 2007-4-17
Serial Number : Y813CAAAAR101890
Controller Name : ARC-1110
===========================================
GuiErrMsg<0x00>: Success.

CLI>

Display the current events:

CLI> event info
Date-Time Device Event Type
===============================================================================
2009-07-09 07:23:14 H/W MONITOR Raid Powered On
2008-09-29 08:06:24 H/W MONITOR Raid Powered On
2008-09-29 07:51:37 H/W MONITOR Raid Powered On
...

Query information about the current RAID set (three 750 GB disks are installed here):

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 123 Normal
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>

Information about the logical RAID volumes:

CLI> vsf info
# Name Raid# Level Capacity Ch/Id/Lun State
===============================================================================
1 ARC-1110-VOL#00 1 Raid5 1500.3GB 00/00/00 Normal
===============================================================================
GuiErrMsg<0x00>: Success.

CLI>

To make changes to a hardware RAID on an Areca controller, you need a password. The default password is "0000". Example command:

CLI> set password=0000

Defective RAID and rebuild

A defective RAID might look like this:

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 1x3 Degrade
2 Raid Set # 00 3 2250.5GB 2250.5GB x2x Incompleted
===============================================================================
GuiErrMsg<0x00>: Success.

The RAID set with the state Incompleted should be deleted:

CLI> rsf delete raid=2
GuiErrMsg<0x00>: Success.
CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 1x3 Degrade
===============================================================================
GuiErrMsg<0x00>: Success.

Then add the hard disk back as a hot spare:

CLI> rsf createhs drv=2
GuiErrMsg<0x00>: Success.

The Areca controller detects a new disk automatically, so you do not need to attach the disk and start the rebuild by hand.
The rebuild starts automatically and can be monitored:

CLI> rsf info
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
1 Raid Set # 00 3 2250.5GB 0.0GB 123 Rebuilding
===============================================================================

GuiErrMsg<0x00>: Success.
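
The state of each RAID set can also be read from captured rsf info output. The sketch below reads each row from the right, because the set name contains spaces. The name raid_set_states is hypothetical. Treating an x in the DiskChannels column as a missing channel is an assumption based on the degraded sample above, where channel 2 shows as x before drive 2 is added back.

```python
from typing import Dict, List, Union


def raid_set_states(rsf_output: str) -> List[Dict[str, Union[int, str, List[int]]]]:
    """Return number, state and apparently missing channels per RAID set."""
    rows = []
    for line in rsf_output.splitlines():
        fields = line.split()
        # Data rows start with the RAID set number; header and separators do not.
        if len(fields) >= 6 and fields[0].isdigit():
            channels, state = fields[-2], fields[-1]
            # Assumption: an "x" marks a channel position without a member disk.
            missing = [i + 1 for i, ch in enumerate(channels) if ch == "x"]
            rows.append({"raid": int(fields[0]), "state": state,
                         "missing_channels": missing})
    return rows


# Row copied from the degraded sample above.
sample = "1 Raid Set # 00 3 2250.5GB 0.0GB 1x3 Degrade"
for row in raid_set_states(sample):
    if row["state"] != "Normal":
        print(row["raid"], row["state"], row["missing_channels"])
```

A monitoring job could raise an alert for every state other than Normal, which covers Degrade, Incompleted and Rebuilding as shown in this article.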