Puppet Camp Europe 2010 Slides

Puppet Camp Europe 2010 has wrapped up, and the slides from my presentation, Auditing Change Management Policies with Puppet and Splunk, are available online at: http://bit.ly/puppetsplunkslides.

The code used in the demonstration is also available on GitHub at: http://github.com/jeffmccune/puppet-demotools/.

Please contact me on Twitter or via email with any questions. I’m traveling overseas at the moment and those are the best methods to reach me.

The conference was great fun and I enjoyed meeting many people doing interesting things with Puppet.

 

I’ve joined Puppet Labs

In April I resigned from my position as SaaS Manager at Netsmart Technologies and accepted an offer to join Puppet Labs in Portland, Oregon. The decision was not easy. Netsmart has been a great company to work for, with interesting and wonderful people. However, the prospect of joining a start-up with an open source product that I believe is changing the business of information technology for the better is not something I could pass up.

I started my new job at the beginning of May and have begun traveling the country providing training and consulting services to companies using Puppet to manage their systems. I’m excited about the travel and looking forward to meeting people who use Puppet in interesting and creative ways to solve complex problems.

In June I’ll be packing up my place in Columbus, Ohio and moving across the country to my new place in Portland, Oregon. Never having lived outside of Ohio, I’m really looking forward to this adventure and new phase of my life.

I’m hoping to attend technical conferences in the future as well. This June I will be speaking at Puppet Camp Europe in Ghent about a project I implemented at Netsmart: auditing change control procedures by integrating Puppet, Splunk and the git version control software. Drop me a line if you’ll be in the area or attending the conference. It’s going to be a really interesting conference and presentation.

As I travel, I’d love to meet people on the road, so please follow me on Twitter at 0xEFF and let me know if I’ll be in your area. Please also let me know if there’s any way I might be able to help you solve a problem you may be facing.

 

OpenSolaris milestone/xvm grub dom0_mem problem

I’ve recently been struggling to track down a problem with my OpenSolaris xVM system. I’m running xVM in OpenSolaris b133. The issue is that my manual configuration of dom0_mem in /rpool/boot/grub/menu.lst is overwritten on every reboot. This is a problem since I need dom0 to be clamped down to prevent Xen’s balloon feature from fighting with the ZFS ARC. In addition to this problem, there are bugs in builds b132 and b133 of OpenSolaris which require the config/dom0-min-mem SMF property to be set to match dom0_mem.

I’ve also been running into the dom0-min-mem issues documented at My South – Sun xVM 3.4.2 available, dom0_min_mem. Pascal also mentions setting the dom0-min-mem property, but doesn’t appear to be running into the issue I have with b132 and b133, where the property is consistently changed by the xvm-milestone service method script.

The problem is caused by the SMF xvm milestone (svc:/milestone/xvm) constantly rewriting these properties and the menu.lst file. The solution is to disable the xvm milestone and re-enable all of the xvm services manually. This allows you to make manual changes to the menu.lst file without the xvm milestone interfering.

OpenSolaris introduced the xvm milestone in b126 around October of 2009. Please see [xen-discuss] FYI: enable/disable the xVM hypervisor.

Here is the recipe to fix the problem. First, make a backup copy of your menu.lst file, then disable the xvm milestone, enable the other xvm SMF services, and finally restore your menu.lst file. We do this because disabling the xvm milestone disables all of xvm, where we really just want to prevent /lib/svc/method/xvm-milestone from executing.

This assumes you already have xVM enabled through the use of svcadm enable milestone/xvm.

cd /rpool/boot/grub
pfexec cp -p menu.lst menu.lst.milestone-xvm.enabled
pfexec svcadm disable milestone/xvm
pfexec svcadm enable -r svc:/system/xvm/domains:default
pfexec cp -p menu.lst menu.lst.milestone-xvm.disabled
pfexec cp -p menu.lst.milestone-xvm.enabled menu.lst

Before rebooting, ensure the dom0_mem setting is something reasonable. I find 1.5GB to be a good balance.

title os-133-xvm1
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/os-133-xvm1
kernel$ /boot/$ISADIR/xen.gz console=vga dom0_mem=1536M dom0_vcpus_pin=false watchdog=false
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive

Finally, ensure SMF properties match the dom0_mem value:

svccfg -s svc:/milestone/xvm listprop hypervisor/dom0_mem
svccfg -s xend listprop config/dom0-min-mem

If they don’t match, they may be set using:

pfexec svccfg -s svc:/system/xvm/xend setprop config/dom0-min-mem = 1536
pfexec /usr/sbin/svccfg -s svc:/milestone/xvm setprop hypervisor/dom0_mem = 1536
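
As a rough sketch of checking that the values agree, the helper below pulls the dom0_mem figure out of a menu.lst kernel$ line so it can be compared by eye with the SMF properties. The function name dom0_mem_from_menu is my own invention, and it assumes the value is written in megabytes with an M suffix, as in the entry above.

```shell
# Hypothetical helper: print the dom0_mem value (in MB) from the first
# xen.gz kernel line read on stdin. Assumes the "dom0_mem=<digits>M" form.
dom0_mem_from_menu() {
    sed -n 's/.*xen\.gz.*dom0_mem=\([0-9][0-9]*\)M.*/\1/p' | head -n 1
}

# Example (not run here):
#   dom0_mem_from_menu < /rpool/boot/grub/menu.lst
```

If the number printed differs from what the two svccfg listprop commands report, the setprop commands above are the place to correct it.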

I plan to diagnose just why the xvm-milestone service method script is misbehaving so much and file the appropriate bug reports. If anyone has any suggestions or ideas, please let me know.

 

Tomato and AT&T U-Verse Disconnects

I recently ran into an issue with my home network setup where my Linksys WRT54G router running Tomato 1.27 was disconnecting my long-running active TCP connections every 10 minutes or so. After further investigation, I found this is a common issue caused by Tomato’s DHCP client performing a unicast DHCP renewal which the firewall blocks or misroutes.

A number of people have published similar reports, but none of the suggested solutions appeared to work reliably for me, so I decided to diagnose, troubleshoot and resolve the issue myself. Here’s how I solved the problem. The notes I gathered while working on this are also located at 2WIRE & Tomato – Google Docs.

If you’d like to stop reading and skip right to the payoff, simply add the following two lines to the firewall script, which is located in the web-based user interface under Administration, Scripts, in the Firewall tab:

iptables -t nat -I PREROUTING -p udp -i vlan1 --dport 68 --sport 67 -j ACCEPT
iptables -I INPUT -p udp -i vlan1 --dport 68 --sport 67 -j ACCEPT

These firewall rules allow DHCP replies from the server (UDP source port 67, destination port 68) to reach the Linksys router, regardless of whether the traffic is broadcast or unicast. Please let me know if these rules are not optimal or could be improved.
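
On my WRT54G the WAN interface is vlan1, but that may not hold for other hardware. As a small sketch, the hypothetical function below prints the same two rules for whatever interface name you pass in, so the output can be pasted into the firewall script. The function name is an assumption of mine and it only prints text; it does not touch iptables.

```shell
# Hypothetical helper: print the two DHCP ACCEPT rules for a given
# WAN interface. This only prints the commands; it does not run them.
dhcp_allow_rules() {
    wan_if="$1"
    printf 'iptables -t nat -I PREROUTING -p udp -i %s --dport 68 --sport 67 -j ACCEPT\n' "$wan_if"
    printf 'iptables -I INPUT -p udp -i %s --dport 68 --sport 67 -j ACCEPT\n' "$wan_if"
}

# Example: dhcp_allow_rules vlan1
```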

Here are some references to other reports of this issue:

My troubleshooting process follows.

I can see in the logs that udhcpc attempts a renewal right up until the lease expires:

Feb 17 15:14:26 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:16:56 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:18:11 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:18:48 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:06 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:15 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:19 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:21 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Lease lost, entering init state
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_allmulti(master, 1)
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_promiscuity(master, -1)
Feb 17 15:19:22 tomato user.info kernel: device vlan1 left promiscuous mode
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending discover...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending select for 99.29.172.159...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Lease of 99.29.172.159 obtained, lease time 600
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_allmulti(master, -1)
Feb 17 15:19:22 tomato daemon.info dnsmasq[12612]: exiting on receipt of SIGTERM
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: started, version 2.51 cachesize 150
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: compile time options: no-IPv6 GNU-getopt no-RTC no-DBus no-I18N DHCP no-scripts no-TFTP
Feb 17 15:19:22 tomato daemon.info dnsmasq-dhcp[13007]: DHCP, IP range 192.168.3.100 -- 192.168.3.149, lease time 1d
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: reading /etc/resolv.dnsmasq
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 192.168.4.254#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 8.8.4.4#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 8.8.8.8#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: read /etc/hosts - 0 addresses
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: read /etc/hosts.dnsmasq - 16 addresses
Feb 17 15:19:25 tomato daemon.err miniupnpd[12649]: recv (state0): Connection reset by peer
Feb 17 15:19:27 tomato daemon.notice miniupnpd[12649]: received signal 15, good-bye
Feb 17 15:19:27 tomato daemon.notice miniupnpd[13043]: HTTP listening on port 5000
Feb 17 15:19:27 tomato daemon.notice miniupnpd[13043]: Listening for NAT-PMP traffic on port 5351
Feb 17 15:19:27 tomato user.info kernel: device br0 left promiscuous mode
Feb 17 15:19:27 tomato user.info kernel: vlan1: dev_set_allmulti(master, -1)
Feb 17 15:19:27 tomato user.info kernel: vlan1: del 01:00:5e:00:00:02 mcast address from master interface
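
To spot this pattern without reading the whole log, a quick tally of renew attempts versus lost leases helps. The sketch below is one way to do that with awk. The function name udhcpc_summary is hypothetical, and it assumes the log lines look like the ones above.

```shell
# Hypothetical helper: count udhcpc renew attempts and lost leases
# in a syslog stream read from stdin.
udhcpc_summary() {
    awk '
        /udhcpc\[[0-9]+\]: Sending renew/ { renews++ }
        /udhcpc\[[0-9]+\]: Lease lost/    { lost++ }
        END { printf "renews=%d lost=%d\n", renews, lost }
    '
}

# Example: udhcpc_summary < saved-router-log.txt
```

Many renews followed by a lost lease, as in the log above, suggests the replies are not reaching udhcpc.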

Working with the solution mentioned in the forums, I added a firewall rule to the INPUT chain to allow DHCP traffic into the router itself. This worked well up until I enabled DMZ mode for my Xbox 360. Once I enabled DMZ mode, the DHCP renewal issue cropped back up and I kept getting dropped. Luckily, I have experience with netfilter and iptables, so I know that DMZ is probably implemented in Tomato as a catch-all PREROUTING rule to perform NAT on all unknown connections to a specified address. I also know the PREROUTING chain is processed before the INPUT chain, so any catch-all rule there would trump my fix to allow DHCP in the INPUT chain.

This can be verified with tcpdump and Wireshark. Luckily, there are precompiled versions of tcpdump for the MIPS architecture located at http://ipkg.nslu2-linux.org/feeds/unslung/wl500g/.

In order to get the tcpdump binary onto the router, I had to unpack the ipkg file:

wget http://ipkg.nslu2-linux.org/feeds/unslung/wl500g/tcpdump_3.9.7-1_mipsel.ipk
gzip -dc tcpdump_3.9.7-1_mipsel.ipk | tar xvf -
tar xvzf data.tar.gz
scp opt/bin/tcpdump fw:/tmp

Finally, capturing the data is easy and since we're dealing with DHCP traffic, there's not much worry about filling up the small /tmp filesystem on the router:

/tmp/tcpdump -w /tmp/renew.cap -v -i vlan1 -s 1500 port 67 or port 68

I copied the cap files back to my desktop and fired them up in Wireshark. Not too surprisingly, it's clear as day the request packets are making it out, but the acknowledgement packets coming back from the DHCP server aren't making it to udhcpc.

Screen capture of wireshark displaying repeated attempts to renew the DHCP lease

After adding the explicit rule to the PREROUTING and INPUT chains, the conversation looks much less confusing:

The logs tell a similar tale. Note the lack of the full re-initialization of dnsmasq, upnpd, and the firewall script itself.

Feb 17 15:29:31 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:29:31 tomato daemon.info udhcpc[285]: Lease of 99.29.172.159 obtained, lease time 600

 

Resize the ZFS Root Pool

I originally installed OpenSolaris 2008.11 on my home media server, and 2009.06 has some features I’d like to take advantage of. While running pkg image-update, I ran my root pool out of space since it’s located on a relatively small compact flash card.

I decided to grow the root pool by using an available external disk I have. The process involves attaching the new, larger disk to the root pool as a mirror, waiting for the resilver process to complete, installing the boot loader onto the new disk, then detaching the old, small device from the root pool. This information is documented at sun.com in the document How to Replace a Disk in the ZFS Root Pool.

Attempting to attach the new device to the pool with zpool attach, I ran into the error message "cannot label 'c3t0d0': EFI labeled devices are not supported on root pools." I tried wiping the EFI label, but kept running into the same error. I noticed other people talking about this issue: Removing EFI (format -e not working?) and Please help need to remove EFI label: msg#00173.

My problem was that I was not properly creating the root partition on the disk with an SMI label. I was properly using format -e, then executing “fdisk”, creating the VTOC on the entire disk, but I forgot the step where once the VTOC is created, you need to create partition 0, which will be used for the zpool vdev.

If you run into this error, make sure you use the “partition” option in format -e, which will allow you to then define slice 0. Label the slice “root” and give it as much space as you’d like. Make sure it does not overlap with the boot slice, which is automatically created when the VTOC is created.

Once slice 0 is present, use the slice name (c3t0d0s0 in my case) rather than the whole-disk name (c3t0d0) when you attach the new disk to the root pool. For example:

Correct:
zpool attach rpool c4t0d0s0 c3t0d0s0

Incorrect:
zpool attach rpool c4t0d0s0 c3t0d0

If you receive an error about overlapping partitions, just use zpool attach -f to force the attach.
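
Since the only difference between the working and failing command is the trailing slice suffix, a tiny guard can catch the mistake before zpool does. This is a minimal sketch: the function name is_slice is hypothetical, and the pattern only assumes the usual cXtXdXsX style of device name.

```shell
# Hypothetical helper: succeed if the device name ends in a slice
# suffix (for example c3t0d0s0), fail for a whole-disk name (c3t0d0).
is_slice() {
    case "$1" in
        c*d[0-9]*s[0-9]*) return 0 ;;
        *)                return 1 ;;
    esac
}

# Example:
#   is_slice c3t0d0s0 && echo "ok to attach to the root pool"
```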

Once the device is in the pool and re-silvering, use installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t0d0s0 to install the boot block.

After testing the new boot device, use zpool detach rpool c4t0d0s0 to remove the old device from the pool and complete the resize process.

Here’s my original partition layout:

Current partition table (original):
Total disk cylinders available: 3820 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders        Size            Blocks
  0       root    wm       1 - 3818        7.46GB    (3818/0/0) 15638528
  1 unassigned    wm       0               0         (0/0/0)           0
  2     backup    wu       0 - 3819        7.46GB    (3820/0/0) 15646720
  3 unassigned    wm       0               0         (0/0/0)           0
  4 unassigned    wm       0               0         (0/0/0)           0
  5 unassigned    wm       0               0         (0/0/0)           0
  6 unassigned    wm       0               0         (0/0/0)           0
  7 unassigned    wm       0               0         (0/0/0)           0
  8       boot    wu       0 -    0        2.00MB    (1/0/0)        4096
  9 unassigned    wm       0               0         (0/0/0)           0

Here’s my new, larger disk layout:

Current partition table (original):
Total disk cylinders available: 60797 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       boot    wm       1 - 60700      929.97GB    (60700/0/0) 1950291000
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wu       0 - 60796      931.46GB    (60797/0/0) 1953407610
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7 unassigned    wm       0                0         (0/0/0)              0
  8       boot    wu       0 -     0       15.69MB    (1/0/0)          32130
  9 unassigned    wm       0                0         (0/0/0)              0

After detaching the original, small disk from the mirror, the root pool expands to the size of the remaining vdev:

jmccune@rain:~$ zpool list rpool
NAME    SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rpool   930G  5.57G   924G     0%  ONLINE  -
 

Podcasting the unix nerd way or Peapod for the win

I cooked dinner for myself today and sat down at the table looking forward to streaming The Daily Show or The Colbert Report on Hulu since there’s no way I’m paying $65 a month for cable TV. As it turns out, there haven’t been any new episodes in a while, and I like my fake news fresh off the wire, so I decided to catch up on my other fake news addiction: aggregated blog RSS feeds.

I came across the TED talk for today, which is by Michelle Obama. Great speech by the way, check it out at: http://www.ted.com/talks/michelle_obama.html. The streaming video quality left something to be desired, so I looked around and found the HD podcast URL at podcasters.tv.

This works well with iTunes, and MediaLink is able to copy the movie file from my MacBook Pro, but for some reason streaming the video usually quits partway through playback with an obscure error code.

I have my OpenSolaris, Intel Atom based file server running on a gigabit network connected up to the Playstation 3 and HDTV using MediaLink, so I decided to look for some unix tool to download the podcast which could easily be run from cron.

After some searching and research into different options, I downloaded Peapod, a wonderful Python command line application, and gave it a whirl.

To my complete satisfaction, peapod runs from my home directory without requiring any piece of itself to be installed into the system. The only missing dependency I ran into was urlgrabber for Python 2.4. Luckily, I have easy_install installed so it was a simple matter of:

pfexec /usr/bin/easy_install-2.4 urlgrabber

Once urlgrabber was installed, setup of the podcast client was a breeze:

jmccune@rain:~$ cd ~/bin
jmccune@rain:~/bin/$ ln -s ../apps/peapod/peapod.py peapod
jmccune@rain:~/bin/$ cd ~
jmccune@rain:~$ peapod
Creating user directory: /home/jmccune/.peapod
Created a default configuration file in :
/home/jmccune/.peapod/peapod.xml
Please edit this file to contain your feeds and options.

I commented out the sample podcast and added TED in HD.
(Note: I found the feed URL by doing a “Get Info”, or clicking on the little i next to the podcast title in the podcast section of iTunes.)

For the title I made it “TED Talks (HD)” and for the URL, I used http://feeds.feedburner.com/TedtalksHD.

Finally, running peapod simply works.

jmccune@rain:~$ peapod
...Spawning thread 0 for feed url http://feeds.feedburner.com/TedtalksHD
Fetching feed for TED Talks (HD)
Downloading TED Talks (HD) -- http://video.ted.com/talks/podcast/MichelleObama_2009P_480.mp4
Trackname MichelleObama_2009P_480.mp4
Savename /export/dozer/podcasts/jmccune/TED Talks (HD)/MichelleObama_2009P_480.mp4
Mime-type video/mp4

This will be added to cron to run every day a few hours before I get home from work, and MediaTomb should pick up the new content.
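
For reference, a daily crontab entry is just five time fields followed by the command. The sketch below prints such a line for a given hour. The function name daily_cron_line and the 15:00 run time in the example are assumptions of mine, not something I have settled on.

```shell
# Hypothetical helper: print a crontab entry that runs a command once
# a day, on the hour given as the first argument (0-23).
daily_cron_line() {
    hour="$1"
    shift
    printf '0 %s * * * %s\n' "$hour" "$*"
}

# Example: daily_cron_line 15 /home/jmccune/bin/peapod
```

The printed line can then be pasted into the crontab with crontab -e.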

And now to figure out how to manually kick off a MediaTomb scan of the folder once downloading is complete.

Here are some decent feeds I’ve found so far:

peapod --addnew=http://www.hbo.com/podcasts/billmaher/podcast.xml --title="Bill Maher"

Please post more video feed URLs, especially 720p and higher, in the comments if you have some good video podcasts worth watching on my TV.

 

Solaris Development

Using OpenSolaris 2008.11, it appears the most complete way to obtain a full-featured development tool chain is to install the ss-dev and gcc-dev package clusters.

pfexec pkg install gcc-dev ss-dev


 

PS3 Media Server for Solaris

I spent the better part of the evening attempting to get a reliable, responsive and otherwise unobtrusive DLNA media server running on my new OpenSolaris home file server. I finally stumbled upon PMS, which “just works” after using X11 forwarding over ssh once in order to get at the GUI configuration screen. I went ahead and tried the Linux tarball. None of the included binaries execute on Solaris, but the jar file appears to run great.

Once running, my PS3 sees the media server quickly and easily and streams my MP3s nicely.

I’m planning on cooking up an SMF manifest to keep this running as a service, and on figuring out the mplayer calls needed to stream my favorite web streams directly to the PS3.

Other DLNA media servers I tried were Coherence (no documentation, didn’t work out of the box), MediaTomb (I needed to hack the source to get it to run on Solaris, and when running it rarely showed up in the XMB), and fuppes (compile issues).

 

Solaris 10 Root Shell Recovery


Contrary to recommendations from seasoned Unix admins, it’s perfectly acceptable to change the root shell from the Bourne shell to something like bash. The most common reason given to leave the root shell alone usually goes something like, “you need a valid and statically linked shell defined in /etc/passwd to boot into single user mode if you need to recover your system.”

There’s a really nice list of Solaris root shell misconceptions published at http://www.roble.com/docs/sol_root_shell.html.

Fortunately for me, this isn’t the case in Solaris 10. While setting up a new Solaris 10 system today, I accidentally set root’s shell to /sbin/bash instead of /usr/bin/bash. /sbin/bash doesn’t exist, so I could no longer log into the system.

Luckily, this is a system with a Dell RAC card set up for remote console access. I logged into the RAC and issued a “graceful shutdown” power off command, which Solaris responded to nicely, bringing the system entirely down. Once I powered the system back on, it was simply a matter of booting into single user mode by passing the -s flag to the kernel.

Solaris 10 is smart enough to fall back to /sbin/sh when booted into single user mode if it can’t invoke the shell defined in /etc/passwd. So long as you don’t horribly mangle /sbin/sh and the libraries it’s linked to, you’ll be fine changing the root shell to anything you like.
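
The mistake is easy to catch before logging out. As a minimal sketch, the two hypothetical helpers below read a user’s shell out of a passwd-format file and check that it is an executable file. The function names are my own, and the passwd path defaults to /etc/passwd.

```shell
# Hypothetical helper: print the login shell (7th field) for a user
# from a passwd-format file. Defaults to /etc/passwd.
user_shell() {
    awk -F: -v u="$1" '$1 == u { print $7; exit }' "${2:-/etc/passwd}"
}

# Hypothetical helper: succeed only if the path is non-empty and executable.
shell_is_usable() {
    [ -n "$1" ] && [ -x "$1" ]
}

# Example:
#   shell_is_usable "$(user_shell root)" || echo "root shell is broken"
```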

Here’s how it went:
2009-04-01_1708
2009-04-01_1709
2009-04-01_1710
2009-04-01_1711
2009-04-01_1714

 

Solaris ZFS Windows Sharing

Today, I set up my new Atom 330 based OpenSolaris 2008.11 file server to share files using Windows file sharing. Windows file sharing uses the SMB/CIFS protocol and is commonly implemented using Samba on Unix. With OpenSolaris 2008.11 and ZFS, however, the SMB/CIFS protocol is implemented in the kernel itself by way of an SMB module.

Unfortunately, it’s not quite as simple as executing zfs set sharesmb=on dpool/export/dozer. The SUNWsmbs and SUNWsmbskr packages need to be installed, the system needs to be rebooted, PAM needs to be configured to create SMB password hashes, passwords need to be reset, and finally the smb SMF service needs to be enabled.

You may need to create new filesystems with the casesensitivity and nbmand ZFS properties set correctly and copy your data over to these filesystems.

Here is the transcript:

pfexec pkg install SUNWsmbs
pfexec pkg install SUNWsmbskr
pfexec reboot
pfexec svcadm enable -r smb/server
pfexec zfs create -o casesensitivity=mixed -o nbmand=on dpool1/export/dozer
pfexec zfs set sharesmb=on dpool1/export/dozer
pfexec bash -c "echo 'other password required pam_smb_passwd.so.1 nowarn' >> /etc/pam.conf"
pfexec passwd jmccune

You must reset your password to generate the new SMB hash value. All users that need SMB access will need to reset their password in this manner.
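
One caveat with the echo line in the transcript: running it twice appends the PAM entry twice. A small guard avoids that. This is only a sketch; the function name append_once is hypothetical, and it assumes a grep that supports the POSIX -q, -x and -F options (the GNU grep in /usr/gnu/bin does, while the older /usr/bin/grep may not).

```shell
# Hypothetical helper: append a line to a file only if an identical
# line is not already present. Assumes grep supports -q -x -F.
append_once() {
    line="$1"
    file="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run with sufficient privileges):
#   append_once 'other password required pam_smb_passwd.so.1 nowarn' /etc/pam.conf
```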

You may check the sharing status with sharemgr:

jmccune@rain:/export/dozer/isos$ pfexec sharemgr show -vp
default nfs=()
zfs
zfs/dpool1/export/dozer smb=()
dpool1_export_dozer=/export/dozer
dpool1_export_dozer_documents=/export/dozer/documents
dpool1_export_dozer_isos=/export/dozer/isos
dpool1_export_dozer_movies=/export/dozer/movies
dpool1_export_dozer_music=/export/dozer/music
dpool1_export_dozer_pictures=/export/dozer/pictures

Troubleshooting.
svcadm may speak up about a dependency on the physical network. This does not appear to be an error.

jmccune@rain:~$ pfexec svcadm enable -r smb/server
svcadm: svc:/milestone/network depends on svc:/network/physical, which has multiple instances.

You may receive an error that sharing failed. To resolve it, make sure you’ve done everything I listed above and rebooted the system.

cannot share 'pool/media': smb add share failed

References