Archive for the ‘Solution’ Category

OpenSolaris milestone/xvm grub dom0_mem problem

I’ve recently been struggling to track down a problem with my OpenSolaris xVM system, which runs xVM on OpenSolaris b133. The issue is that my manual configuration of dom0_mem in /rpool/boot/grub/menu.lst appears to be overwritten on every reboot. This is a problem because I need dom0 clamped down to prevent Xen’s balloon feature from fighting with the ZFS ARC. On top of that, bugs in b132 and b133 of OpenSolaris require the config/dom0-min-mem SMF property to be set to match dom0_mem.

I’ve also been running into the dom0-min-mem issues documented at My South – Sun xVM 3.4.2 available, dom0_min_mem. Pascal also mentions setting the dom0-min-mem property, but doesn’t appear to be running into the issue I have with b132 and b133, where the property is consistently changed by the xvm-milestone service method script.

The problem is caused by the SMF xvm milestone (svc:/milestone/xvm) constantly rewriting these properties and the menu.lst file. The solution is to disable the xvm milestone and re-enable all of the xvm services manually. This allows you to make manual changes to the menu.lst file without the xvm milestone interfering.

OpenSolaris introduced the xvm milestone in b126 around October of 2009. Please see [xen-discuss] FYI: enable/disable the xVM hypervisor.

Here is the recipe to fix the problem. First, make a backup copy of your menu.lst file, then disable the xvm milestone, enable the other xvm SMF services, and finally restore your menu.lst file. We do this because disabling the xvm milestone disables all of xvm, where we really just want to prevent /lib/svc/method/xvm-milestone from executing.

This assumes you already have xVM enabled through the use of svcadm enable milestone/xvm.

cd /rpool/boot/grub
pfexec cp -p menu.lst menu.lst.milestone-xvm.enabled
pfexec svcadm disable milestone/xvm
pfexec svcadm enable -r svc:/system/xvm/domains:default
pfexec cp -p menu.lst menu.lst.milestone-xvm.disabled
pfexec cp -p menu.lst.milestone-xvm.enabled menu.lst

Before rebooting, ensure the dom0_mem setting is something reasonable. I find 1.5GB to be a good balance.

title os-133-xvm1
findroot (pool_rpool,0,a)
bootfs rpool/ROOT/os-133-xvm1
kernel$ /boot/$ISADIR/xen.gz console=vga dom0_mem=1536M dom0_vcpus_pin=false watchdog=false
module$ /platform/i86xpv/kernel/$ISADIR/unix /platform/i86xpv/kernel/$ISADIR/unix -B $ZFS-BOOTFS
module$ /platform/i86pc/$ISADIR/boot_archive

Finally, ensure SMF properties match the dom0_mem value:

svccfg -s svc:/milestone/xvm listprop hypervisor/dom0_mem
svccfg -s xend listprop config/dom0-min-mem

If they don’t match, they may be set using:

pfexec svccfg -s svc:/system/xvm/xend setprop config/dom0-min-mem = 1536
pfexec /usr/sbin/svccfg -s svc:/milestone/xvm setprop hypervisor/dom0_mem = 1536
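
As a quick cross-check before rebooting, the value in menu.lst can be pulled out with a small shell function and compared by eye against the two SMF properties. This is only a sketch: the function name is made up for illustration, and it assumes dom0_mem is written in megabytes with an M suffix, as in the boot entry above.

```shell
# Hypothetical helper, not part of xVM: print the dom0_mem value (in MB)
# found on every xen.gz kernel line of the given menu.lst.
# Assumes the value carries an M suffix, e.g. dom0_mem=1536M.
dom0_mem_values() {
    grep 'xen\.gz' "$1" | sed -n 's/.*dom0_mem=\([0-9][0-9]*\)M.*/\1/p'
}
```

Running dom0_mem_values /rpool/boot/grub/menu.lst should print one number per xVM boot entry, and each number should equal what the two svccfg listprop commands report.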

I plan to diagnose just why the xvm-milestone service method script is misbehaving so much and file the appropriate bug reports. If anyone has any suggestions or ideas, please let me know.

 

Tomato and AT&T U-Verse Disconnects

I recently ran into an issue with my home network setup where my Linksys WRT54G router running Tomato 1.27 was disconnecting my long-running active TCP connections every 10 minutes or so. After further investigation, this turned out to be a common issue resulting from Tomato’s DHCP client performing a unicast DHCP renewal which the firewall blocks or misroutes.

A number of people have published similar reports, but none of the suggested solutions appeared to work reliably for me, so I decided to diagnose, troubleshoot and resolve the issue myself. Here’s how I solved the problem. The notes I gathered while working on this are also located at 2WIRE & Tomato – Google Docs.

If you’d like to stop reading and skip right to the payoff, simply add the following two lines to the firewall script, which is located in the web-based user interface under Administration, Scripts, in the Firewall tab:

iptables -t nat -I PREROUTING -p udp -i vlan1 --dport 68 --sport 67 -j ACCEPT
iptables -I INPUT -p udp -i vlan1 --dport 68 --sport 67 -j ACCEPT

These firewall rules allow DHCP replies from a server (source port 67) to reach the DHCP client on the Linksys router (destination port 68), regardless of whether the traffic is broadcast or unicast. Please let me know if these rules are not optimal or could be improved.

My troubleshooting process follows.

I can see in the logs that udhcpc attempts a renewal right up until the lease expires:

Feb 17 15:14:26 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:16:56 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:18:11 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:18:48 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:06 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:15 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:19 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:21 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Lease lost, entering init state
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_allmulti(master, 1)
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_promiscuity(master, -1)
Feb 17 15:19:22 tomato user.info kernel: device vlan1 left promiscuous mode
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending discover...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Sending select for 99.29.172.159...
Feb 17 15:19:22 tomato daemon.info udhcpc[285]: Lease of 99.29.172.159 obtained, lease time 600
Feb 17 15:19:22 tomato user.info kernel: vlan1: dev_set_allmulti(master, -1)
Feb 17 15:19:22 tomato daemon.info dnsmasq[12612]: exiting on receipt of SIGTERM
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: started, version 2.51 cachesize 150
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: compile time options: no-IPv6 GNU-getopt no-RTC no-DBus no-I18N DHCP no-scripts no-TFTP
Feb 17 15:19:22 tomato daemon.info dnsmasq-dhcp[13007]: DHCP, IP range 192.168.3.100 -- 192.168.3.149, lease time 1d
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: reading /etc/resolv.dnsmasq
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 192.168.4.254#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 8.8.4.4#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: using nameserver 8.8.8.8#53
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: read /etc/hosts - 0 addresses
Feb 17 15:19:22 tomato daemon.info dnsmasq[13007]: read /etc/hosts.dnsmasq - 16 addresses
Feb 17 15:19:25 tomato daemon.err miniupnpd[12649]: recv (state0): Connection reset by peer
Feb 17 15:19:27 tomato daemon.notice miniupnpd[12649]: received signal 15, good-bye
Feb 17 15:19:27 tomato daemon.notice miniupnpd[13043]: HTTP listening on port 5000
Feb 17 15:19:27 tomato daemon.notice miniupnpd[13043]: Listening for NAT-PMP traffic on port 5351
Feb 17 15:19:27 tomato user.info kernel: device br0 left promiscuous mode
Feb 17 15:19:27 tomato user.info kernel: vlan1: dev_set_allmulti(master, -1)
Feb 17 15:19:27 tomato user.info kernel: vlan1: del 01:00:5e:00:00:02 mcast address from master interface
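
To spot this pattern in a longer log, a small awk sketch can count the renew attempts that pile up before each lost lease. The function name is hypothetical, and the patterns assume the udhcpc message wording shown in the excerpt above.

```shell
# Hypothetical helper: for every "Lease lost" event in a syslog file,
# print how many "Sending renew..." attempts preceded it without an answer.
# A successful renewal ("Lease of ... obtained") resets the counter.
renews_before_loss() {
    awk '
        /udhcpc.*Sending renew/        { n++ }
        /udhcpc.*Lease lost/           { print n + 0; n = 0 }
        /udhcpc.*Lease of .* obtained/ { n = 0 }
    ' "$1"
}
```

Against the excerpt above this would print a single line, one count for the one lost lease. A healthy log, where every renew is answered, prints nothing.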

Working from the solution mentioned in the forums, I added a firewall rule to the INPUT chain to allow DHCP traffic into the router itself. This worked well up until I enabled DMZ mode for my Xbox 360. Once I enabled DMZ mode, the DHCP renewal issue cropped back up and I kept getting dropped. Luckily, I have experience with netfilter and iptables, so I know that DMZ is probably implemented in Tomato as a catch-all PREROUTING rule that performs NAT on all unknown connections to a specified address. I also know the PREROUTING chain is processed before the INPUT chain, so any catch-all rule there would trump my fix to allow DHCP in the INPUT chain.

This can be verified with tcpdump and Wireshark. Luckily, there are precompiled versions of tcpdump for the MIPS architecture located at http://ipkg.nslu2-linux.org/feeds/unslung/wl500g/.

In order to get the tcpdump binary onto the router, I had to unpack the ipkg file:

wget http://ipkg.nslu2-linux.org/feeds/unslung/wl500g/tcpdump_3.9.7-1_mipsel.ipk
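# Assumes the .ipk is a gzipped tar archive whose data.tar.gz holds the files
gzip -dc tcpdump_3.9.7-1_mipsel.ipk | tar xvf -
tar xvzf data.tar.gz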
scp opt/bin/tcpdump fw:/tmp

Finally, capturing the data is easy and since we're dealing with DHCP traffic, there's not much worry about filling up the small /tmp filesystem on the router:

/tmp/tcpdump -w /tmp/renew.cap -v -i vlan1 -s 1500 port 67 or port 68

I copied the cap files back to my desktop and fired them up in Wireshark. Not too surprisingly, it's clear as day that the request packets are making it out, but the acknowledgement packets coming back from the DHCP server aren't making it to udhcpc.

Screen capture of wireshark displaying repeated attempts to renew the DHCP lease

With the explicit rules added to the PREROUTING and INPUT chains, the conversation looks much less confusing.

The logs tell a similar tale. Note the lack of the full re-initialization of dnsmasq, miniupnpd, and the firewall script itself.

Feb 17 15:29:31 tomato daemon.info udhcpc[285]: Sending renew...
Feb 17 15:29:31 tomato daemon.info udhcpc[285]: Lease of 99.29.172.159 obtained, lease time 600

 

Podcasting the unix nerd way or Peapod for the win

I cooked dinner for myself today and sat down at the table looking forward to streaming The Daily Show or The Colbert Report on Hulu, since there’s no way I’m paying $65 a month for cable TV. As it turns out, there haven’t been any new episodes in a while, and I like my fake news fresh off the wire, so I decided to catch up on my other fake news addiction: aggregated blog RSS feeds.

I came across the TED talk for today, which is Michelle Obama. Great speech by the way, check it out at: http://www.ted.com/talks/michelle_obama.html. The streaming video quality left something to be desired, so I looked around and found the HD podcast URL at podcasters.tv.

This works well with iTunes, and MediaLink is able to copy the movie file from my MacBook Pro, but for some reason streaming the video usually quits partway through playback with an obscure error code.

I have my OpenSolaris, Intel Atom based file server running on a gigabit network connected up to the Playstation 3 and HDTV using MediaLink, so I decided to look for some unix tool to download the podcast which could easily be run from cron.

After some searching and research into different options, I downloaded Peapod, a wonderful python command line application, and gave it a whirl.

To my complete satisfaction, Peapod runs from my home directory without requiring any part of itself to be installed into the system. The only missing dependency I ran into was urlgrabber for Python 2.4. Luckily, I have easy_install installed, so it was a simple matter of:

pfexec /usr/bin/easy_install-2.4 urlgrabber

Once urlgrabber was installed, setup of the podcast client was a breeze:

jmccune@rain:~$ cd ~/bin
jmccune@rain:~/bin/$ ln -s ../apps/peapod/peapod.py peapod
jmccune@rain:~/bin/$ cd ~
jmccune@rain:~$ peapod
Creating user directory: /home/jmccune/.peapod
Created a default configuration file in :
/home/jmccune/.peapod/peapod.xml
Please edit this file to contain your feeds and options.

I commented out the sample podcast and added TED in HD.
(Note: I found the feed URL by doing a “Get Info”, or clicking on the little i next to the podcast title in the podcast section of iTunes.)
Podcast Get Info Image

For the title I made it “TED Talks (HD)” and for the URL, I used http://feeds.feedburner.com/TedtalksHD.

Finally, running peapod simply works.

jmccune@rain:~$ peapod
...Spawning thread 0 for feed url http://feeds.feedburner.com/TedtalksHD
Fetching feed for TED Talks (HD)
Downloading TED Talks (HD) -- http://video.ted.com/talks/podcast/MichelleObama_2009P_480.mp4
Trackname MichelleObama_2009P_480.mp4
Savename /export/dozer/podcasts/jmccune/TED Talks (HD)/MichelleObama_2009P_480.mp4
Mime-type video/mp4

This will be added to cron to run every day a few hours before I get home from work, and MediaTomb should pick up the new content.

And now to figure out how to manually kick off a MediaTomb scan of the folder once downloading is complete.

Here are some decent feeds I’ve found so far:

peapod --addnew=http://www.hbo.com/podcasts/billmaher/podcast.xml --title="Bill Maher"

Please post more video feed URLs, especially 720p and higher, in the comments if you have some good video podcasts worth watching on my TV.

 

Screenshot Highlights with the Gimp

Here’s my preferred method of drawing attention to screen elements in technical documentation.

Direct Link: Screen Shot Highlights

iPhone / iPod Direct Video Link

Procedure:

  1. Copy window to clipboard with ALT+PrintScreen
  2. Paste as a new image into the Gimp with CTRL+SHIFT+V
  3. Use the rectangular selection tool to select the regions you want to draw attention to.
  4. Feather the selection for effect.
  5. Create a drop shadow if desired.
  6. Insert a new, totally black layer named mask.
  7. Keeping the selection in place, select the mask layer and delete the black pixels, creating a “hole” through the layer to the underlying image of the window.
  8. Set the mask layer’s transparency appropriately.
  9. Save the image, flattening the layers.
  10. Insert the image into your word processor of choice.

The embedded screencast was recorded with CamStudio, and the resulting AVI was converted into an H.264 AVC MP4 file using the SUPER ffmpeg/x264 front end by eRightSoft. The embedded player is JW FLV Media Player. All tools are open source software.

 

LDAP Berkeley Database Recovery

We experienced a power outage today, caused by someone tripping the emergency power off relay to our server room. Unfortunately, emergency power off really means “power off” so our UPS did the right thing and completely cut power rather than falling back to battery backup.

It was a little bit stressful getting everything back up, but everything appears to be working fine now.

The one serious error message we ran into is the following, when bringing our OpenLDAP server back up:

[root@ldap ldap]# /etc/init.d/ldap restart
Stopping slapd:                                            [FAILED]
Checking configuration files for slapd:  bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
bdb(dc=math,dc=ohio-state,dc=edu): PANIC: fatal region error detected; run recovery
bdb_db_open: Database cannot be opened, err -30974. Restore from backup!
bdb(dc=math,dc=ohio-state,dc=edu): DB_ENV->lock_id_free interface requires an environment configured for the locking subsystem
backend_startup_one: bi_db_open failed! (-30974)
slap_startup failed (test would succeed using the -u switch)
                                                           [FAILED]
stale lock files may be present in /var/lib/ldap           [WARNING]

Fortunately, the solution to this problem is easy enough. Just run slapd_db_recover -v in the Berkeley Database directory.

cd /var/lib/ldap
slapd_db_recover -v

Finding last valid log LSN: file: 4 offset 4818337
Recovery starting from [4][4815752]
Recovery complete at Wed Feb  6 15:33:42 2008
Maximum transaction ID 80000ba7 Recovery checkpoint [4][4818337]
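
Because recovery operates on the database files in place, it may be worth archiving the directory before running slapd_db_recover, so there is something to fall back on if recovery makes matters worse. The following is a minimal sketch under that assumption. The function name and archive naming are invented for illustration, and slapd is assumed to be stopped.

```shell
# Hypothetical helper: archive a Berkeley DB directory before recovery.
#   $1  database directory, e.g. /var/lib/ldap
#   $2  existing directory that will receive the archive
# Prints the path of the archive on success.
backup_bdb_dir() {
    dbdir=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/ldap-bdb-$stamp.tar.gz"
    tar -czf "$archive" -C "$dbdir" . && echo "$archive"
}
```

For example, backup_bdb_dir /var/lib/ldap /root would leave a timestamped tarball in /root (a destination chosen here only as an example) before the recovery command is run.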

After that, slapd should start up just fine.

[root@ldap lib]# /etc/init.d/ldap start
Checking configuration files for slapd:  bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
config file testing succeeded
                                                           [  OK  ]
Starting slapd:                                            [  OK  ]
 

Nifty Work Around for File Size Limitations of FAT32

I picked up a 250 GB Western Digital Passport portable hard drive to keep a backup copy of my FileVault home directory, among other things, while I travel next week, in the somewhat likely event that something disastrous happens to my laptop.

I really like how small and portable the drive is, along with its USB bus-powered interface. There’s no futzing around with wall warts and power supplies; it truly is plug and play.

I also really like that my PS3 recognizes the device, since I’ve transferred my entire iTunes library over to it (Huzzah, Option-Starting iTunes to select a library!). All of my H.264 AVC movies play right off of the drive on my Playstation 3 as well, which is really nice and convenient.

While copying some rather large files, specifically a 7 GB ASR Golden Master image of my demonstration PowerBook Leopard OS and the actual Leopard ISO image itself, I ran into the file size limitation of FAT32. Of course, I knew FAT32 didn’t support large files, but I’ve just been spoiled in recent years by things like this “just working.”

I didn’t want to reformat the small drive, because that would surely mean my Playstation 3 would no longer recognize the file system, so instead I opted to create a sparsebundle HFS+ formatted disk image, exactly like I would do manually for Leopard File Vault images.

The end result is that each “band” in the sparse bundle image will satisfy the limitations of FAT32, while providing a nice, secure and robust HFS+J file system to store all of the “big files” I need to carry with me.
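
One way to convince yourself that nothing on the drive exceeds the FAT32 per-file ceiling (4 GiB minus one byte) is to ask find for anything larger. This is only a sketch. The function name is made up, and the optional second argument exists purely so the check can be tried out with a smaller limit.

```shell
# Hypothetical check: list regular files under a directory that are too
# large for FAT32. The limit defaults to 4294967295 bytes (4 GiB - 1).
#   $1  directory to scan, e.g. the mounted drive or the .sparsebundle itself
#   $2  optional size limit in bytes
fat32_oversized() {
    dir=$1
    limit=${2:-4294967295}
    find "$dir" -type f -size +"${limit}c"
}
```

Pointed at the sparse bundle directory on the FAT32 volume, this should print nothing, since every band file is far smaller than the limit regardless of how large the files stored inside the image are.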

Long live robust Disk Imaging Frameworks.

The only catch is that these files are only accessible on Mac OS X Leopard machines now, but that’s not a huge problem for me, especially while traveling to the Macworld conference.

 

TelePort NFS Home Directory

I usually compute with an n-tuple of Mac computers sitting in front of me. I have a strong aversion to clutter, despite the state of my apartment, and Teleport’s seamless, encrypted keyboard sharing, à la so-called “soft KVM” utilities, is a killer app for me.

Alas, I’ve found that Teleport does not work as expected when operating from an NFS Mounted Home Directory.

Trying to connect to my Laptop, nutburner (Yes, nutburner is the given name of my first generation MacBook Pro), I received the following error.

Teleport Keychain Access

UNKNOWN wants permission to sign using key “privateKey” in your keychain. Do you want to allow this?

On a working host, e.g. two machines with FileVault home folders, that “UNKNOWN” will actually display as “teleportd”. I suspect whatever logic Apple is using to verify the authenticity of program binaries doesn’t work as expected over NFS.

After clicking “Always Allow” twice, I get the following error:

Teleport Connection Error

I synchronize my login.keychain, so the private key and certificate are identical between these two hosts, leading me to believe a certificate algorithm mismatch is unlikely.

In any event, my solution was to simply redirect the teleport.prefPane to a local HFS+ volume using a symbolic link.

# /Scratch is a local HFS+ volume.
mkdir -p /Scratch/mccune/Library/PreferencePanes
mv ~/Library/PreferencePanes/teleport.prefPane \
  /Scratch/mccune/Library/PreferencePanes/
ln -s /Scratch/mccune/Library/PreferencePanes/teleport.prefPane \
  ~/Library/PreferencePanes/teleport.prefPane

Once teleport.prefPane resided on a local HFS volume, everything “just worked” perfectly.

As an alternative, you could deploy the prefPane to /Library/PreferencePanes to make teleport available to all users of the system.

 

Apache and strace /usr/sbin/httpd

Working with Apache today, I ran into an issue where the process would appear to start OK, returning a zero exit status, yet strace was showing a SIGCHLD being caught.

Needless to say, the server wasn’t actually running for any length of time, but I found the following strace command immensely helpful in figuring out the problem.

  strace -o /tmp/httpd.strace -ff /usr/sbin/httpd

Because Apache spawns a number of children, strace with -ff follows each child and records its system calls in /tmp/httpd.strace.$PID.

As it turns out, I was receiving the following error in the child processes:

    bind(5, {sa_family=AF_INET, sin_port=htons(443), sin_addr=inet_addr("0.0.0.0")}, 16) \
    = -1 EADDRINUSE (Address already in use)
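
With one trace file per process, finding the interesting failure can be tedious. A rough sketch for narrowing it down is to grep every per-PID file for system calls that returned -1. The function name is hypothetical, and the pattern assumes strace's usual "= -1 ERRNO" result format.

```shell
# Hypothetical helper: show every failed system call recorded by
# "strace -o PREFIX -ff", prefixed with the trace file it came from.
#   $1  the prefix passed to strace -o, e.g. /tmp/httpd.strace
failed_syscalls() {
    grep -H -- '= -1 E' "$1".*
}
```

Expect some noise: lookups that fail with ENOENT are common and usually harmless, so piping the result through grep -v ENOENT may make errors such as EADDRINUSE stand out.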
 

Simplify Media – Rockin’ on the Road

As a system administrator with a very large music collection, I’ve always been mildly irritated at the difficulty accessing my “master” music library while away from home.

Enter Simplify Media, a free, small application which does just as the name promises.

My iTunes library back home just shows up in my shared iTunes listing, regardless of where I am. No firewall hackery, nothing to configure, it just works, and works well.

Simplify Media

The iTunes integration is fantastic.

 

Large Backups with Bacula: /tmp Overfilling

I’ve run into several problems backing up our central file servers with Bacula, mostly centered around the sheer number of files (~6 million) a single job must process and store into the MySQL catalog.

I ran into the following error last night, attempting to back up the entire 6TB array as a single job:

  07-Nov 18:10 backup-dir JobId 3: Fatal error: sql_create.c:732 sql_create.c:732 insert INSERT INTO batch VALUES (1580771,3,'/Volumes/0/export/users/kodama/Desktop/GAP/gap4r4/small/small2/','sml800.z','OAAAD DkeW IGk B ih C+ A KZn BAA BY BHLtzL 1sNQO BFnqZZ A A C','0') failed:
  Incorrect key file for table '/tmp/#sql2459_94_0.MYI'; try to repair it

After doing a bit of research, I’ve concluded that the /tmp volume, which is only a 256M tmpfs partition, is filling to capacity before the job is able to complete.

Restarting the job this morning confirms MySQL is spooling data into /tmp.

  [root@backup tmp]# ls -l /tmp/
  total 332
  -rw-rw---- 1 mysql mysql 319276 Nov  8 09:48 #sql511e_3_0.MYD
  -rw-rw---- 1 mysql mysql   1024 Nov  8 09:48 #sql511e_3_0.MYI
  -rw-rw---- 1 mysql mysql   8722 Nov  8 09:48 #sql511e_3_0.frm

My solution for the time being is to reconfigure MySQL to use /var/tmp for its temporary storage, rather than /tmp. This places the data on a much larger file system.

# /etc/my.cnf
[mysqld]
tmpdir=/var/tmp
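
Before settling on a new tmpdir, it may be worth confirming that the target really has more room than the 256M tmpfs. A minimal sketch, using a made-up function name and the POSIX output format of df:

```shell
# Hypothetical helper: print the free space, in kilobytes, of the file
# system that holds the given directory. Assumes the file system name
# contains no spaces, so the "Available" column is field 4 of df -P.
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}
```

Comparing free_kb /tmp with free_kb /var/tmp gives a rough idea of how much larger a temporary table can grow before the same error returns.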

I’m also planning to split the job into smaller jobs, using regular expressions to include only pieces of the home directory tree at a time. This will keep the number of files each job needs to handle under a reasonable threshold.