dotplan

troubleshooting & performance analysis

Cannot see NetApp LUNs from Linux?

After some connectivity swap-a-roos in the lab, I could no longer see my LUNs from the Linux host attached to my filer.

In this case I am using a QLogic HBA – and I am not using any of the NetApp host side tools – just the sanlun tool.

Using the SANsurfer menu (/opt/QLogic_Corporation/SANsurferCLI) I can tell that this Linux host can see the filer's LUNs over FC. But there are no SCSI /dev/sdX devices for them, and so Linux cannot use them.

Here’s how I checked to see that there was FC connectivity – which also confirms that the FC protocol is working.

	SANsurfer FC/CNA HBA CLI

	v1.7.2 Build 7

    Main Menu

    1:	General Information  <---- Option 1
    2:	HBA Information
    3:	HBA Parameters
    4:	Target/LUN List
    5:	iiDMA Settings
...

    General Information Menu

    1:	Host Information
    2:	Host Topology
    3:	Report     <---- Option 3
    4:	Refresh
    5:	Return to Previous Menu

	Note: 0 to return to Main Menu
	Enter Selection: 3

   Report Menu

    HBA Model QLE2462
      1: Port   1: WWPN: 21-00-00-E0-8B-9B-C5-36 Online
      2: Port   2: WWPN: 21-01-00-E0-8B-BB-C5-36 Online
      3: All HBAs  <---- Option 3
      4: Return to Previous Menu

	Note: 0 to return to Main Menu
	Enter Selection: 3

I could see that there was connectivity from the Linux host to the filer:

---------------------------------------
LUN 1
---------------------------------------
Product Vendor                    : NETAPP
Product ID                        : LUN
Product Revision                  : 811a
LUN                               : 1
Size                              : 17.93 GB
Type                              : SBC-2 Direct access block device
			           (e.g., magnetic disk)
WWULN                             : 4E-45-54-41-50-50-20-20-20-4C-55-4E-20-32-46-68
			           72-53-3F-2D-68-4F-79-6C-33-00-00-00-00-00-00-00
OS LUN Name                       :
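
As an aside, the WWULN above is just ASCII in hex. Here is a minimal sketch that decodes it; the function name decode_wwuln is my own, and the idea that the trailing characters are the LUN serial number (which could be compared with what the filer reports for the LUN) is an assumption I have not verified against this filer.

```python
def decode_wwuln(hex_text: str) -> str:
    """Turn a dash-separated hex dump into ASCII, dropping the NUL padding."""
    # Accept the value as SANsurfer prints it: hex bytes joined by '-',
    # possibly wrapped over more than one line.
    tokens = hex_text.replace("-", " ").split()
    raw = bytes(int(token, 16) for token in tokens)
    return raw.rstrip(b"\x00").decode("ascii")


wwuln = """4E-45-54-41-50-50-20-20-20-4C-55-4E-20-32-46-68
72-53-3F-2D-68-4F-79-6C-33-00-00-00-00-00-00-00"""

print(decode_wwuln(wwuln))  # NETAPP   LUN 2FhrS?-hOyl3
```

The decoded string starts with the vendor and product fields shown in the report (NETAPP, LUN), followed by twelve more characters.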

From the filer side, I could see that the host's FC adapters had connected to the filer, and were in the right igroup:

filer1*> igroup show
    filer1 (FCP) (ostype: linux):
        21:00:00:e0:8b:9b:c5:36 (logged in on: 0a)
        21:01:00:e0:8b:bb:c5:36 (logged in on: 0b)
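
SANsurfer prints WWPNs in upper case with dashes, while the filer prints them in lower case with colons, which makes comparing them by eye error-prone. A small sketch of how the two could be compared; normalize_wwpn is a hypothetical helper name, and the values are the ones shown in the two listings above.

```python
def normalize_wwpn(wwpn: str) -> str:
    """Return a WWPN as lowercase, colon-separated hex pairs."""
    digits = "".join(ch for ch in wwpn.strip() if ch not in "-:").lower()
    if len(digits) != 16 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a 64-bit WWPN: {wwpn!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 16, 2))


# Ports as reported by SANsurfer on the Linux host.
hba_ports = ["21-00-00-E0-8B-9B-C5-36", "21-01-00-E0-8B-BB-C5-36"]
# Initiators as reported by 'igroup show' on the filer.
igroup_members = {"21:00:00:e0:8b:9b:c5:36", "21:01:00:e0:8b:bb:c5:36"}

missing = [port for port in hba_ports
           if normalize_wwpn(port) not in igroup_members]
print(missing)  # an empty list means every HBA port is in the igroup
```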

The only thing missing was the 'sd' devices: Linux had not created any for these LUNs.

The "sanlun" utility was not helpful and just told me that there were no LUNs mapped.

The solution was to issue this very odd-looking command:

linuxhost:[/sys/class/scsi_host] $ echo "- - -" > host0/scan

This caused the sd devices to be created, representing the NetApp LUNs which I knew could already be seen over FC. The three dashes are wildcards for channel, target and LUN, so the kernel scans everything reachable through that host adapter. Since I have both ports on the same HBA attached to the filer, host0/scan created my /dev/sdc* devices, and host1/scan created my /dev/sdd* devices.
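
If there are many host adapters, the same write can be repeated for each of them. This is only a sketch of that loop; rescan_scsi_hosts is a name I made up, and the sysfs_root parameter exists so the function can be tried against a scratch directory first. Against the real /sys tree it needs root and blocks in the same way as the echo.

```python
from pathlib import Path


def rescan_scsi_hosts(sysfs_root: str = "/sys/class/scsi_host") -> list:
    """Write the wildcard scan request to every hostN/scan file found."""
    scanned = []
    for scan_file in sorted(Path(sysfs_root).glob("host*/scan")):
        # "- - -" means: any channel, any target ID, any LUN.
        scan_file.write_text("- - -\n")
        scanned.append(scan_file.parent.name)
    return scanned


# Example (run as root on the Linux host):
#   rescan_scsi_hosts()
```

The function returns the names of the hosts it wrote to, for example host0 and host1 on the machine described here.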

The shell 'hung' for the duration of the command, and I would expect that Linux was off in kernel land for some time, so I would NOT recommend issuing the command on a production server.

I'm still puzzled why the Linux host did not see the LUNs even after a reboot, though.

Set up/change the default gateway in ONTAP without going through setup

To set up 10.199.50.1 as the default gateway, use the following (note the final '1' at the end of the command line; it is the route metric).

filer-3170> route add default 10.199.50.1 1

Converting a flexible volume to a traditional volume.

A simple way to convert a flexible volume to a traditional volume is to use NDMP to copy the data within the same filer.

First, enable NDMP and turn it on:
filer> options ndmpd.enable on
filer> ndmpd on

Next create the volume that you want to migrate the data to. In this case, I create a traditional volume with 5 disks, and use raid4 protection rather than raid-dp.

filer>vol create vol0_trad -t raid4 5

The output of the command should look similar to this:

Tue Jul  8 20:33:25 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /vol0_trad/plex0/rg0/0d.4 Shelf 0 Bay 4 [SEAGATE  ST3500630NSAH    AQMZ] S/N [9QG7A9J2] to volume vol0_trad has completed successfully
Tue Jul  8 20:33:25 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /vol0_trad/plex0/rg0/0d.0 Shelf 0 Bay 3 [SEAGATE  ST3500630NSAH    AQMZ] S/N [9QG7KTSY] to volume vol0_trad has completed successfully
Tue Jul  8 20:33:25 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /vol0_trad/plex0/rg0/0d.3 Shelf 0 Bay 2 [SEAGATE  ST3500630NSAH    AQMZ] S/N [9QG7LHPR] to volume vol0_trad has completed successfully
Tue Jul  8 20:33:25 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /vol0_trad/plex0/rg0/0c.6 Shelf 0 Bay 7 [SEAGATE  ST3500630NSAH    AQMZ] S/N [9QG7C91R] to volume vol0_trad has completed successfully
Tue Jul  8 20:33:25 GMT [raid.vol.disk.add.done:notice]: Addition of Disk /vol0_trad/plex0/rg0/0d.1 Shelf 0 Bay 1 [SEAGATE  ST3500630NSAH    AQMZ] S/N [9QG7LHVF] to volume vol0_trad has completed successfully
Creation of a volume with 5 disks has completed.
perfdevsrv015> Tue Jul  8 20:33:26 GMT [wafl.vol.add:notice]: Volume vol0_trad has been added to the system.

Next I use ndmpcopy to copy the files to the new volume. I use the -f switch since I am copying the root volume.

ndmpcopy -f /vol/vol0 /vol/vol0_trad

The output looks as below:

Ndmpcopy: Starting copy [ 0 ] ...
Ndmpcopy: perfdevsrv015: Notify: Connection established
Ndmpcopy: perfdevsrv015: Notify: Connection established
Ndmpcopy: perfdevsrv015: Connect: Authentication successful
Ndmpcopy: perfdevsrv015: Connect: Authentication successful
Ndmpcopy: perfdevsrv015: Log: DUMP: creating "/vol/vol0/../snapshot_for_backup.0" snapshot.
Ndmpcopy: perfdevsrv015: Log: DUMP: Using Full Volume Dump
Ndmpcopy: perfdevsrv015: Log: DUMP: Date of this level 0 dump: Tue Jul  8 20:35:51 2008.
Ndmpcopy: perfdevsrv015: Log: DUMP: Date of last level 0 dump: the epoch.
Ndmpcopy: perfdevsrv015: Log: DUMP: Dumping /vol/vol0 to NDMP connection
Ndmpcopy: perfdevsrv015: Log: DUMP: mapping (Pass I)[regular files]
Ndmpcopy: perfdevsrv015: Log: DUMP: mapping (Pass II)[directories]
Ndmpcopy: perfdevsrv015: Log: DUMP: estimated 235187 KB.
Ndmpcopy: perfdevsrv015: Log: DUMP: dumping (Pass III) [directories]
Ndmpcopy: perfdevsrv015: Log: RESTORE: Tue Jul  8 20:36:40 2008: Begin level 0 restore
Ndmpcopy: perfdevsrv015: Log: RESTORE: Tue Jul  8 20:36:40 2008: Reading directories from the backup
Ndmpcopy: perfdevsrv015: Log: DUMP: dumping (Pass IV) [regular files]
Ndmpcopy: perfdevsrv015: Log: RESTORE: Tue Jul  8 20:36:49 2008: Creating files and directories.
Ndmpcopy: perfdevsrv015: Log: RESTORE: Tue Jul  8 20:36:54 2008: Writing data to files.
Ndmpcopy: perfdevsrv015: Log: RESTORE: Tue Jul  8 20:37:30 2008: Restoring NT ACLs.
Ndmpcopy: perfdevsrv015: Log: DUMP: dumping (Pass V) [ACLs]
Ndmpcopy: perfdevsrv015: Log: DUMP: 238893 KB
Ndmpcopy: perfdevsrv015: Log: RESTORE: RESTORE IS DONE
Ndmpcopy: perfdevsrv015: Log: DUMP: DUMP IS DONE
Ndmpcopy: perfdevsrv015: Log: RESTORE: The destination path is /vol/vol0_trad/
Ndmpcopy: perfdevsrv015: Notify: restore successful
Ndmpcopy: perfdevsrv015: Log: DUMP: Deleting "/vol/vol0/../snapshot_for_backup.0" snapshot.
Ndmpcopy: perfdevsrv015: Notify: dump successful
Ndmpcopy: Transfer successful [ 1 minutes 44 seconds ]
Ndmpcopy: Done
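
The log gives enough to estimate how fast the copy ran, which is handy when sizing a larger migration. A rough sketch; the function name is hypothetical, and the elapsed-time pattern only covers the "N minutes M seconds" form seen in this log, since I have not checked how ndmpcopy prints longer or shorter runs.

```python
import re


def ndmpcopy_rate_kb_per_s(log_text: str) -> float:
    """Average KB/s from the final DUMP size and the elapsed-time line."""
    # Matches "DUMP: 238893 KB" but not "DUMP: estimated 235187 KB."
    size = re.search(r"DUMP: (\d+) KB", log_text)
    elapsed = re.search(
        r"Transfer successful \[ (\d+) minutes (\d+) seconds \]", log_text)
    if size is None or elapsed is None:
        raise ValueError("log does not contain size and elapsed time")
    seconds = int(elapsed.group(1)) * 60 + int(elapsed.group(2))
    return int(size.group(1)) / seconds


log = """Ndmpcopy: perfdevsrv015: Log: DUMP: 238893 KB
Ndmpcopy: Transfer successful [ 1 minutes 44 seconds ]"""

print(f"{ndmpcopy_rate_kb_per_s(log):.0f} KB/s")  # 2297 KB/s
```

That is 238893 KB in 104 seconds, a little over 2 MB/s for this small root volume.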

Now, since I want to boot from this volume, I set the root option:

vol options vol0_trad root

ONTAP tells me that this volume will become the root volume on next boot:

Volume 'vol0_trad' will become root at the next boot.

I can now reboot. I only need to do this step because I am messing with the boot/root volume:

filer>reboot

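After the reboot I can rename the volumes, so that the new traditional volume takes over the name vol0 (the later vol add command refers to it by that name):

filer>vol rename vol0 vol0_flex
filer>vol rename vol0_trad vol0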

And now I can safely remove the old volume and its aggregate so that I can use their disks in my new traditional volume:

filer> vol offline vol0_flex

Tue Jul  8 20:46:01 GMT [wafl.vvol.offline:info]: Volume 'vol0_flex' has been set temporarily offline
Volume 'vol0_flex' is now offline.

Now destroy the volume, so we can get rid of the aggregate

filer> vol destroy vol0_flex

Are you sure you want to destroy this volume? y

Volume 'vol0_flex' destroyed.

filer> aggr offline aggr0

Aggregate 'aggr0' is now offline.

filer> aggr destroy aggr0
Are you sure you want to destroy this aggregate? y

Tue Jul  8 20:46:36 GMT [raid.config.vol.destroyed:info]: Aggregate 'aggr0' destroyed.
Aggregate 'aggr0' destroyed.

Now that I have destroyed the volume and aggregate, I am free to reuse those disks. ONTAP gives a warning that I will have no spare disks, and that this is risky in case of a failure.

filer> vol add vol0 2

WARNING! Continuing with vol add will result in having
no spare disk available for one or more RAID groups.
Are you sure you want to continue with vol add? y

Addition of 2 disks to the volume has been initiated.  The disks need
to be zeroed before addition to the volume.  The process has been initiated
and you will be notified via the system log as disks are added.

Once the disks are zeroed, they all belong to the traditional volume.

Change disks used by an aggregate

Recently we needed to transfer an aggregate that was hosted on an external shelf to disks housed internally in our test FAS2020. The same trick can be used to move an aggregate from one shelf to another.

The trick uses the 'disk replace' command. It works serially, so it is a little time-consuming, BUT it retains exactly the same layout on disk, since it's a bit-for-bit copy.

In short, the command to use is 'disk replace start', giving the disk to be replaced first and the spare that takes over second, e.g.

 disk replace start 0b.19 0c.00.0

So here's what my disk layout looks like (I am using traditional volumes):

Volume vol1 (online, raid_dp) (block checksums)
  Plex /vol1/plex0 (online, normal, active)
    RAID group /vol1/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0c.00.6         0c    0   6   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      parity    0c.00.1         0c    0   1   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304
      data      0c.00.3         0c    0   3   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.11        0c    0   11  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.10        0c    0   10  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.9         0c    0   9   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.8         0c    0   8   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.7         0c    0   7   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.4         0c    0   4   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.29           0b    1   13  FC:B   -  ATA   7200 211377/432901760  211921/434014304
      data      0c.00.5         0c    0   5   SA:1   -  SATA  7200 211377/432901760  211977/434130816

Spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.16           0a    1   0   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.20           0a    1   4   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.22           0a    1   6   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.24           0a    1   8   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.26           0a    1   10  FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0b.17           0b    1   1   FC:B   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0b.25           0b    1   9   FC:B   -  ATA   7200 211377/432901760  211921/434014304
spare           0b.27           0b    1   11  FC:B   -  ATA   7200 211377/432901760  211921/434014304
spare           0c.00.0         0c    0   0   SA:1   -  SATA  7200 211377/432901760  211977/434130816 (not zeroed)

All the disks marked with a 'CHAN' type of FC:A or FC:B are actually SATA drives on the other side of a FC->SATA bridge; they are basically SATA disks in an external shelf. The disks marked SA:1 are SATA drives internal to the FAS2020. In the output above I am part-way through migrating the disks: some of the disks in my volume/aggregate are internal and some are external. One of the disks still on the external shelf is 0b.19. Using the command below I will transfer the data on that individual disk to an internal SATA disk (0c.00.0):

 disk replace start 0b.19 0c.00.0

Then I receive a warning message:

*** You are about to copy and replace the following file system disk ***
  Disk /vol1/plex0/rg0/0b.19

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      data      0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304
***
Really replace disk 0b.19 with 0c.00.0? y
disk replace: Disk 0b.19 was marked for replacing.

The output of sysconfig -r shows something like this

Volume vol1 (online, raid_dp) (block checksums)
  Plex /vol1/plex0 (online, normal, active)
    RAID group /vol1/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0c.00.6         0c    0   6   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      parity    0c.00.1         0c    0   1   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304 (replacing)
      data      0c.00.3         0c    0   3   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.11        0c    0   11  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.10        0c    0   10  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.9         0c    0   9   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.8         0c    0   8   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.7         0c    0   7   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.4         0c    0   4   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.29           0b    1   13  FC:B   -  ATA   7200 211377/432901760  211921/434014304
      data      0c.00.5         0c    0   5   SA:1   -  SATA  7200 211377/432901760  211977/434130816

Once the replacement gets under way, sysconfig -r shows the copy in progress:

Volume vol1 (online, raid_dp) (block checksums)
  Plex /vol1/plex0 (online, normal, active)
    RAID group /vol1/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0c.00.6         0c    0   6   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      parity    0c.00.1         0c    0   1   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304 (replacing, copy in progress)
      -> copy   0c.00.0         0c    0   0   SA:1   -  SATA  7200 211377/432901760  211977/434130816 (copy 1% completed)
      data      0c.00.3         0c    0   3   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.11        0c    0   11  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.10        0c    0   10  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.9         0c    0   9   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.8         0c    0   8   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.7         0c    0   7   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.4         0c    0   4   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.29           0b    1   13  FC:B   -  ATA   7200 211377/432901760  211921/434014304
      data      0c.00.5         0c    0   5   SA:1   -  SATA  7200 211377/432901760  211977/434130816
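
Since the copy takes a while, it can be useful to pull the percentage out of saved sysconfig -r output rather than reading the whole listing each time. A small sketch; copy_progress is a hypothetical helper, and the pattern is based only on the "-> copy ... (copy N% completed)" text in the capture above.

```python
import re


def copy_progress(sysconfig_r: str) -> dict:
    """Map each destination disk to its 'copy N% completed' percentage."""
    progress = {}
    for line in sysconfig_r.splitlines():
        match = re.search(
            r"->\s+copy\s+(\S+).*\(copy (\d+)% completed\)", line)
        if match:
            progress[match.group(1)] = int(match.group(2))
    return progress


sample = ("      -> copy   0c.00.0         0c    0   0   SA:1   -  SATA  7200 "
          "211377/432901760  211977/434130816 (copy 1% completed)")
print(copy_progress(sample))  # {'0c.00.0': 1}
```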

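And finally, when the copy is done, 0c.00.0 is part of the aggregate/volume and 0b.19 is a spare. (In the listing below 0b.29 has also been swapped, for 0c.00.2, so this capture appears to be from after a second replacement.)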

Volume vol1 (online, raid_dp) (block checksums)
  Plex /vol1/plex0 (online, normal, active)
    RAID group /vol1/plex0/rg0 (normal)

      RAID Disk Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
      --------- ------          ------------- ---- ---- ---- ----- --------------    --------------
      dparity   0c.00.6         0c    0   6   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      parity    0c.00.1         0c    0   1   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.0         0c    0   0   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.3         0c    0   3   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.11        0c    0   11  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.10        0c    0   10  SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.9         0c    0   9   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.8         0c    0   8   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.7         0c    0   7   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.4         0c    0   4   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.2         0c    0   2   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0c.00.5         0c    0   5   SA:1   -  SATA  7200 211377/432901760  211977/434130816

Spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM  Used (MB/blks)    Phys (MB/blks)
---------       ------          ------------- ---- ---- ---- ----- --------------    --------------
Spare disks for block or zoned checksum traditional volumes or aggregates
spare           0a.16           0a    1   0   FC:A   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0a.20           0a    1   4   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.22           0a    1   6   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.24           0a    1   8   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.26           0a    1   10  FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0b.17           0b    1   1   FC:B   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
spare           0b.25           0b    1   9   FC:B   -  ATA   7200 211377/432901760  211921/434014304
spare           0b.27           0b    1   11  FC:B   -  ATA   7200 211377/432901760  211921/434014304
spare           0b.29           0b    1   13  FC:B   -  ATA   7200 211377/432901760  211921/434014304 (not zeroed)
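
For a volume with more disks, the pairing of external disks with internal spares could be worked out from saved sysconfig -r output instead of by eye. This is a sketch under my own assumptions: plan_disk_replacements is a made-up name, "external" is taken to mean a CHAN value starting with FC: and "internal" one starting with SA:, as on this FAS2020, and the result is only a list of command strings to review and issue one at a time.

```python
def plan_disk_replacements(sysconfig_r: str) -> list:
    """Pair RAID-group disks on FC channels with spare disks on SA channels."""
    external_members = []   # disks in the RAID group still on the external shelf
    internal_spares = []    # spare disks inside the controller chassis
    for line in sysconfig_r.splitlines():
        fields = line.split()
        # Disk rows read: role, device, HA, shelf, bay, CHAN, ...
        if len(fields) < 6 or "(replacing" in line:
            continue
        role, device, chan = fields[0], fields[1], fields[5]
        if role in ("data", "parity", "dparity") and chan.startswith("FC:"):
            external_members.append(device)
        elif role == "spare" and chan.startswith("SA:"):
            internal_spares.append(device)
    # zip() stops at the shorter list, so disks without a partner are left alone.
    return [f"disk replace start {old} {new}"
            for old, new in zip(external_members, internal_spares)]


sample = """\
      data      0b.19           0b    1   3   FC:B   -  ATA   7200 211377/432901760  211921/434014304
      data      0c.00.3         0c    0   3   SA:1   -  SATA  7200 211377/432901760  211977/434130816
      data      0b.29           0b    1   13  FC:B   -  ATA   7200 211377/432901760  211921/434014304
spare           0a.20           0a    1   4   FC:A   -  ATA   7200 211377/432901760  211921/434014304
spare           0c.00.0         0c    0   0   SA:1   -  SATA  7200 211377/432901760  211977/434130816 (not zeroed)
"""

for command in plan_disk_replacements(sample):
    print(command)  # disk replace start 0b.19 0c.00.0
```

With only one internal spare in the sample, only 0b.19 gets a partner; 0b.29 has to wait until another internal disk is free.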

© 2009 dotplan. All Rights Reserved.
