Archive for the ‘System Administration’ Category

Resize the ZFS Root Pool

I originally installed OpenSolaris 2008.11 on my home media server, and 2009.06 has some features I’d like to take advantage of. While running pkg image-update, I ran my root pool out of space, since it’s located on a relatively small CompactFlash card.

I decided to grow the root pool by using an available external disk I have. The process involves attaching the new, larger disk to the root pool as a mirror, waiting for the resilver process to complete, installing the boot loader onto the new disk, then detaching the old, small device from the root pool. This information is documented at sun.com in the document How to Replace a Disk in the ZFS Root Pool.

Attempting to attach the new device to the pool with zpool attach, I ran into the error message "cannot label 'c3t0d0': EFI labeled devices are not supported on root pools." I tried wiping the EFI label, but kept running into the same error. I noticed other people talking about this issue:
Removing EFI (format -e not working?)
and
Please help need to remove EFI label: msg#00173

My problem was that I was not properly creating the root partition on the disk with an SMI label. I was properly using format -e, then executing “fdisk”, creating the VTOC on the entire disk, but I forgot the step where once the VTOC is created, you need to create partition 0, which will be used for the zpool vdev.

If you run into this error, make sure you use the “partition” option in format -e, which will allow you to then define slice 0. Label the slice “root” and give it as much space as you’d like. Make sure it does not overlap with the boot slice, which is automatically created when the VTOC is created.

Once slice 0 is present, use the slice name (c3t0d0s0) rather than the whole-disk name (c3t0d0) when you attach the new disk to the root pool. For example:

Correct:

zpool attach rpool c4t0d0s0 c3t0d0s0

Incorrect:

zpool attach rpool c4t0d0s0 c3t0d0

If you receive an error about overlapping partitions, just use zpool attach -f to force the attach.

Once the device is in the pool and resilvering, use installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0 to install the boot block.

After testing the new boot device, use zpool detach rpool c4t0d0s0 to remove the old device from the pool and complete the resize process.
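The whole sequence can be sketched as a small dry-run helper. This is only an illustration: the function name rpool_migration_plan is made up for this example, it prints the commands rather than running them, and the device names are the ones from this post. The slice check is a simple pattern match on the name, not a real label inspection.

```shell
#!/bin/sh
# Print (do not run) the commands for moving rpool from an old slice to a new one.
# rpool_migration_plan is a hypothetical helper name used only for this sketch.
rpool_migration_plan() {
    old_slice=$1
    new_slice=$2
    case $new_slice in
        *s[0-9]) ;;   # name ends in a slice suffix, e.g. c3t0d0s0
        *)
            echo "error: $new_slice is a whole disk, not a slice" >&2
            return 1
            ;;
    esac
    echo "zpool attach rpool $old_slice $new_slice"
    echo "zpool status rpool    # repeat until the resilver has completed"
    echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$new_slice"
    echo "zpool detach rpool $old_slice"
}

# Devices from this post: old CompactFlash slice, new external disk slice.
rpool_migration_plan c4t0d0s0 c3t0d0s0
```

Passing c3t0d0 as the second argument makes the helper refuse, which mirrors the mistake that triggers the EFI label error.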

Here’s my original partition layout:

Current partition table (original):
Total disk cylinders available: 3820 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 - 3818        7.46GB    (3818/0/0) 15638528
  1 unassigned    wm       0               0         (0/0/0)           0
  2     backup    wu       0 - 3819        7.46GB    (3820/0/0) 15646720
  3 unassigned    wm       0               0         (0/0/0)           0
  4 unassigned    wm       0               0         (0/0/0)           0
  5 unassigned    wm       0               0         (0/0/0)           0
  6 unassigned    wm       0               0         (0/0/0)           0
  7 unassigned    wm       0               0         (0/0/0)           0
  8       boot    wu       0 -    0        2.00MB    (1/0/0)        4096
  9 unassigned    wm       0               0         (0/0/0)           0

Here’s my new, larger disk layout:

Current partition table (original):
Total disk cylinders available: 60797 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       boot    wm       1 - 60700      929.97GB    (60700/0/0) 1950291000
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wu       0 - 60796      931.46GB    (60797/0/0) 1953407610
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7 unassigned    wm       0                0         (0/0/0)              0
  8       boot    wu       0 -     0       15.69MB    (1/0/0)          32130
  9 unassigned    wm       0                0         (0/0/0)              0

After detaching the original, small disk from the mirror, the root pool expands to the size of the remaining vdev:

jmccune@rain:~$ zpool list rpool
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool   930G  5.57G   924G     0%  ONLINE  -
 

Solaris 10 Root Shell Recovery


Contrary to recommendations from seasoned Unix admins, it’s perfectly acceptable to change the root shell from the Bourne shell to something like bash. The most common reason given for leaving the root shell alone goes something like, “you need a valid and statically linked shell defined in /etc/passwd to boot into single user mode if you need to recover your system.”

There’s a really nice list of Solaris root shell misconceptions published at http://www.roble.com/docs/sol_root_shell.html.

Fortunately for me, this isn’t the case in Solaris 10. While setting up a new Solaris 10 system today, I accidentally set root’s shell to /sbin/bash instead of /usr/bin/bash. /sbin/bash doesn’t exist, so I could no longer log into the system.

Luckily, this system has a Dell RAC card set up for remote console access. I logged into the RAC and issued a “graceful shutdown” power-off command, which Solaris responded to nicely by bringing the system entirely down. Once I powered the system back on, it was simply a matter of booting into single user mode by passing the -s flag to the kernel.

Solaris 10 is smart enough to fall back to /sbin/sh when booted into single user mode if it can’t invoke the shell defined in /etc/passwd. So long as you don’t horribly mangle /sbin/sh and the libraries it’s linked to, you’ll be fine changing the root shell to anything you like.

Here’s how it went:
[Console screenshots: 2009-04-01_1708, 2009-04-01_1709, 2009-04-01_1710, 2009-04-01_1711, 2009-04-01_1714]
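The typo that started all this (/sbin/bash instead of /usr/bin/bash) can be caught before editing /etc/passwd. Here is a minimal sketch of such a check; is_usable_shell is a name invented for this example, and it only tests that the file exists and is executable, not that it is a working shell.

```shell
#!/bin/sh
# Succeed only if the candidate path is a regular file and is executable.
# is_usable_shell is a hypothetical helper, not a Solaris command.
is_usable_shell() {
    [ -f "$1" ] && [ -x "$1" ]
}

# The path I meant to use; /sbin/bash would fail this check on my system.
candidate=/usr/bin/bash
if is_usable_shell "$candidate"; then
    echo "$candidate looks usable as a login shell"
else
    echo "$candidate is missing or not executable; leave root's shell alone" >&2
fi
```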

 

Solaris 10 Online LUN rescan in one step with cfgadm

Quick answer:

cfgadm -al

Searching the web for this information took a few more minutes than I expected it to, so I’m posting this article with as many relevant keywords as I can think of. Thanks to Pascal Gienger for the clear answer to this question.

The situation is pretty common for system administrators: you have a production server that’s running out of storage space, and you remedy the situation by allocating a new LUN on your back-end SAN.

In Linux, I’d typically echo "- - -" > /sys/class/scsi_host/host1/scan in order to issue a rescan, then run multipath -v2, then add the resulting /dev/mpath/foobar device to LVM.

In Solaris 10, this process has been greatly simplified. One command even re-populates your scsi_vhci multipath controller for you.

Consider before scanning:

[jmccune@otto ~]$ sudo format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c4t00D0B2202E001900d0 <DEFAULT cyl 8352 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g00d0b2202e001900
Specify disk (enter its number): ^D
[jmccune@otto ~]$

Now consider the online rescan of the Fibre Channel storage system:

[jmccune@otto ~]$ sudo cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             fc-fabric    connected    configured   unknown
c0::212000d0b202e201           disk         connected    configured   unknown
c1                             fc-fabric    connected    configured   unknown
c1::212000d0b202e201           disk         connected    configured   unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
usb1/1                         usb-device   connected    configured   ok
usb1/2                         usb-device   connected    configured   ok
usb2/1                         unknown      empty        unconfigured ok
usb2/2                         unknown      empty        unconfigured ok
usb3/1                         unknown      empty        unconfigured ok
usb3/2                         usb-device   connected    configured   ok
usb3/3                         unknown      empty        unconfigured ok
usb3/4                         unknown      empty        unconfigured ok
usb3/5                         unknown      empty        unconfigured ok
usb3/6                         unknown      empty        unconfigured ok
[jmccune@otto ~]$ sudo format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c4t00D0B2202E000900d0 <DEFAULT cyl 50127 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g00d0b2202e000900
       1. c4t00D0B2202E001900d0 <DEFAULT cyl 8352 alt 2 hd 255 sec 63>
          /scsi_vhci/disk@g00d0b2202e001900
Specify disk (enter its number): ^D

Creating a new ZFS pool based on this new LUN is easy. Note that the new disk is number 0 in the format listing, not number 1.

[jmccune@otto ~]$ sudo zpool create db1 c4t00D0B2202E000900d0
[jmccune@otto ~]$ zpool list
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
db1     382G   112K   382G     0%  ONLINE  -
rpool  63.5G  38.1G  25.4G    60%  ONLINE  -
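Since the new LUN does not necessarily show up last in the format listing, it can help to compare the disk names before and after the rescan instead of eyeballing them. The following is a rough sketch: disk_names and new_disks are helper names I made up, and the awk pattern assumes the listing layout shown above. Feeding format from echo so that it exits at the disk prompt is an assumption about how you capture the listing.

```shell
#!/bin/sh
# Extract device names from `format` output read on stdin.
# Matches lines such as "0. c4t00D0B2202E001900d0 <DEFAULT ...>".
disk_names() {
    awk '$1 ~ /^[0-9]+\.$/ { print $2 }'
}

# Print the names found in the "after" file but not in the "before" file.
new_disks() {
    before=$1
    after=$2
    grep -vxF -f "$before" "$after"
}

# Intended use on the Solaris host (left commented out; needs root):
#   echo | format | disk_names > /tmp/disks.before
#   cfgadm -al
#   echo | format | disk_names > /tmp/disks.after
#   new_disks /tmp/disks.before /tmp/disks.after
```

Run against the two listings above, this would print only c4t00D0B2202E000900d0, the device handed to zpool create.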

Keywords: solaris, sun MPxIO, scsi_vhci, san, lun, attach, online, resize, scan, rescan, luxadm.

 

Enable Screen Sharing from the Terminal in Leopard

After graduation and my last day at work, I took a road trip to visit the Bennetts in D.C. and was promptly chagrined while trying to show off Leopard’s screen sharing over OpenVPN.

Fortunately, it’s pretty easy to turn on Screen Sharing from an SSH session.

echo -n enabled > /Library/Preferences/com.apple.ScreenSharing.launchd

Launchd should automatically start the Screen Sharing service when this file is modified.

More information is available at Apple Remote Desktop: Configuring remotely via command line (kickstart)

 

Leopard VNC Server Serial Number Password

Digging around in a NetBoot-Install.dmg file created by NetRestore Helper, I found a nice little gem.

In Leopard, and perhaps earlier versions of Mac OS X, we’re able to start a VNC server with the machine serial number as a password. This is particularly interesting for a managed network or lab environment.

As an example, I’m starting a VNC server in my NetBoot-Install image with the following shell script:

# Credit to Mike Bombich for this snippet

VNC="/System/Library/CoreServices/RemoteManagement/AppleVNCServer.bundle/Contents/MacOS/AppleVNCServer"

if [ -x "$VNC" ]; then
    "$VNC" -noRegister -serialNumber &
fi

I’m then able to quickly connect using Connect to Server (Cmd+K) in the Finder.

If you’re scripting this, here’s a quick way to snag the serial number. I do this before I bless a client machine to netboot, so I have the serial number to connect back up once it’s in the NetRestore system.

system_profiler SPHardwareDataType | \
  grep -i 'serial number' | \
  perl -ple 's/.*:\s+(\w+).*?/$1/'
 

Excluding Directories with find

I’ve been using the find command for over a decade now, and I’m ashamed to say I never really learned how to properly exclude directories. Dealing with Subversion working copies that litter “.svn” folders everywhere, I finally sorted it all out this afternoon.

To exclude “.svn” folders and all contents:

$ find . '!' '(' -name '.svn' -prune ')'

This, combined with find -print0 and xargs -0 to execute arbitrary commands on every filesystem object found, is a wonderful tool to keep handy.
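Here is one way the combination might look. The wrapper name find_skip_svn is invented for this example, and it uses the equivalent -prune -o form so that -print0 applies only to the paths that were not pruned.

```shell
#!/bin/sh
# Run a command on every path under a directory, skipping .svn folders
# and everything inside them. find_skip_svn is a hypothetical wrapper name.
#   $1    directory to search
#   $2..  command (and arguments) that xargs should run on the results
find_skip_svn() {
    dir=$1
    shift
    # -prune stops descent into .svn; -o -print0 emits everything else,
    # NUL-separated so names containing spaces survive the trip to xargs.
    find "$dir" -name '.svn' -prune -o -print0 | xargs -0 "$@"
}
```

For example, find_skip_svn . ls -ld lists every file and directory in a working copy without the Subversion metadata.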

 

My Love of Puppet

Consider the following statement in a Puppet manifest (think of a manifest as a script).

node "subversion.math.ohio-state.edu" {
    subversion::server::webrepository {
        "support": path => "/var/svn/support";
        "test":    path => "/var/svn/test";
    }
}

Without describing the problem this puppet snippet addresses, one might guess that I need to configure two subversion repositories, available via HTTP on the host “subversion.math.ohio-state.edu”.

The reason I absolutely *love* Puppet is that the above code is all there is to this entire problem. Think about all the work that actually needs to happen to set up a subversion repository on an SSL-enabled web server:

  • Install apache
  • Set up SSL certificates
  • Install subversion and dependencies
  • Set up apache virtual host with mod_dav_svn
  • Set up apache htaccess for access control to the repository
  • Punch holes in the firewall (80, 443)
  • Create the blank repository with svnadmin
  • Ensure the repository is owned by apache
  • Ensure post-commit hooks are put in the right place and executable

Now, this is a lot of work, and I’ve already had the need to create new subversion repositories on other hosts. Because I’ve already modeled this problem in puppet, it’s trivial for me to bring up subversion servers on arbitrary hosts. I just re-use the block you see above.

Now, for the tricky part… Here are the modules that actually model the subversion repository in question.

Note that I’ve left out the classes which model other aspects of the host in question. For example, web::baseserver::ssl, firewall::input-port, and site-files::certificates (SSL Certs).

# Subversion Module.

class subversion::server inherits subversion {
    File {
        mode => 0640,
        owner => "apache",
        group => 0,
        require => [ User["apache"], Package["subversion"] ]
    }

    define webrepository ($path = false) {
        File {
            owner => "apache",
            group => "0",
            mode => 0660
        }
        $path_real = $path ? {
            false => "$name",
            default => "$path"
        }
        include subversion::server
        repository {
            "$name": path => "$path_real";
        }
        file {
            "$path_real":
                recurse => true,
                require => [ User["apache"], Repository["$name"] ];
            "$path_real/hooks":
                ensure => directory;
            "$path_real/hooks/bin":
                ensure => directory;
            "$path_real/hooks/bin/commit-email.pl":
                content => template("subversion/hooks/bin/commit-email.pl"),
                mode => 0770;
            "$path_real/hooks/post-commit":
                content => template("subversion/hooks/post-commit"),
                mode => 0770;
        }
    }

    include web::baseserver::ssl

    file {
        "/var/svn":
            ensure => directory;
        "/etc/httpd/htaccess/authz_svn.htaccess":
            content => template("subversion/htaccess/authz_svn.htaccess.erb");
        "/etc/httpd/htaccess/authz_svn.users":
            content => template("subversion/htaccess/htpasswd.mathsvn.erb");
    }
    web::vhost {
        "subversion":
            template => "subversion.conf.erb";
    }
    package {
        "subversion-perl":;
        "mod_dav_svn":;
    }
}

class subversion {
    $authz_svn_access_file = "/etc/httpd/auth_SVNAccessFile.math"
    $auth_svn_users_file = "/etc/httpd/auth_htpasswd.mathsvn"
    $svn_base_parent_repo = "/var/svn"
    Package {
        ensure => present
    }
    package {
        "subversion":;
    }

    define repository ($path = false) {
        $path_real = $path ? {
            false => "$name",
            default => "$path"
        }
        include subversion
        # Create a blank repository.
        exec {
            "svnadmin_create_$path_real":
                command => "/usr/bin/svnadmin create '$path_real'",
                require => [ Package["subversion"] ],
                creates => "$path_real";
        }
    }
}
 

LDAP Berkeley Database Recovery

We experienced a power outage today, caused by someone tripping the emergency power off relay to our server room. Unfortunately, emergency power off really means “power off” so our UPS did the right thing and completely cut power rather than fall back to battery backup.

It was a little bit stressful getting everything back up, but everything appears to be working fine now.

The one serious error message we ran into was the following, when bringing our OpenLDAP server back up:

[root@ldap ldap]# /etc/init.d/ldap restart
Stopping slapd:                                            [FAILED]
Checking configuration files for slapd:  bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
bdb(dc=math,dc=ohio-state,dc=edu): PANIC: fatal region error detected; run recovery
bdb_db_open: Database cannot be opened, err -30974. Restore from backup!
bdb(dc=math,dc=ohio-state,dc=edu): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (-30974)
slap_startup failed (test would succeed using the -u switch)
                                                           [FAILED]
stale lock files may be present in /var/lib/ldap           [WARNING]

Fortunately, the solution to this problem is easy enough. Just run slapd_db_recover -v in the Berkeley Database directory.

cd /var/lib/ldap
slapd_db_recover -v

Finding last valid log LSN: file: 4 offset 4818337
Recovery starting from [4][4815752]
Recovery complete at Wed Feb  6 15:33:42 2008
Maximum transaction ID 80000ba7 Recovery checkpoint [4][4818337]

After that, slapd should start up just fine.

[root@ldap lib]# /etc/init.d/ldap start
Checking configuration files for slapd:  bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
config file testing succeeded
                                                           [  OK  ]
Starting slapd:                                            [  OK  ]
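If you want to script the check, the panic line in the startup output is a reasonable thing to look for. The sketch below is only an illustration: needs_bdb_recovery and recover_bdb are names I made up, the pattern is taken from the error output above, and /var/lib/ldap is simply the database directory on this server.

```shell
#!/bin/sh
# Succeed if the slapd output on stdin contains the Berkeley DB panic shown above.
# needs_bdb_recovery is a hypothetical helper name.
needs_bdb_recovery() {
    grep 'fatal region error detected; run recovery' >/dev/null
}

# Run recovery inside the database directory (defaults to /var/lib/ldap).
# The subshell keeps the cd from affecting the calling shell.
recover_bdb() {
    db_dir=${1:-/var/lib/ldap}
    ( cd "$db_dir" && slapd_db_recover -v )
}
```

Usage would be something like saving the output of /etc/init.d/ldap start to a file, then running needs_bdb_recovery < that file && recover_bdb before starting slapd again.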
 

Macworld 2008 Puppet Slides

Nigel has posted slides from our Macworld 2008 presentation on Puppet.

Please see: Puppet Macworld 2008 Project

I’ll post additional information once I find out the details of distribution of any audio/video recordings taken during the presentation.

 

Macworld 2008

I haven’t posted in a while, mainly because I’ve been preoccupied with a relatively long and relaxing vacation over the winter break, during which I largely ignored all things technology.

I’ve been preparing for Macworld 2008, where Nigel Kersten and I will be presenting some demonstrations and technical details about our respective Puppet deployments at Google and Ohio State University.

If you’ll be attending Macworld, feel free to follow my Twitter feed. I don’t post much at the moment, though I believe it’ll really come in handy during the fast and furious pace of a week long conference like Macworld.

Some other links for gratuitous self promotion:

Please leave a comment if you’ll be attending Macworld this year.