MPICH2 Tight Integration  //  Tuesday, December 16, 2008

After getting SGE installed, I spent some time getting familiar with how it works. I'm a little concerned by how easily a queue can be put into the Error state, and by the fact that only an administrator can clear it. I'll be working more on this in the coming weeks.

The issue at hand is, of course, MPICH2 tight integration with SGE. I used the howto on the Sun SGE site as a reference, but still had to roll my own. Now that I've worked all the kinks out, MPICH2 starts successfully whenever a qsub script requests the 'mpich2' parallel environment.

For example:

# Shell to execute this job under
#$ -S /bin/bash
# Name of this job
#$ -N mpihw
# Account string for this job (I use my username)
#$ -A caf
# Specifying the mpich2 parallel environment
#$ -pe mpich2 8

PROCS=$((NHOSTS * 2)) # NHOSTS is set by SGE; this assumes two processors per host
/grid/mpich2/bin/mpiexec -n $PROCS ~caf/bin/mpihw

The above script runs an MPI Hello World program in parallel. Again, it concerns me that if this program were to hang, core dump, segfault, be killed, fart, or otherwise exit uncleanly, the queue in question would be put into an error state and remain unusable until a grid administrator clears it. Before I discovered the nifty qstat -f and qstat -j <job_id> commands, I spent many an hour scratching my head over why SGE kept complaining that there weren't enough available queues to run the above script.
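Because a queue stuck in the Error state has to be cleared by hand, a small helper for spotting such queues might save some of that head-scratching. The function below is only a sketch and not part of SGE: the name find_error_queues is hypothetical, and it assumes the SGE 6.x qstat -f layout, where each queue instance line reads queuename (queue@host), qtype, slot counts, load_avg, arch, and then an optional states column in which "E" marks the Error state.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SGE): read `qstat -f` output on stdin
# and print every queue instance whose states column contains "E" (Error).
# Assumed SGE 6.x layout per queue line:
#   queuename  qtype  used/tot  load_avg  arch  [states]
# The states column only appears when the queue has a state flag set, so a
# line needs at least 6 fields before its last field is treated as states.
# Job lines listed under a queue start with a numeric job ID (no "@"), and
# header/separator lines also lack "@", so both are skipped.
find_error_queues() {
  awk '$1 ~ /@/ && NF >= 6 && $NF ~ /E/ { print $1 }'
}
```

Run as an SGE manager or operator, something like qstat -f | find_error_queues | xargs -r qmod -cq should then clear the listed queues (on SGE 6.x, qmod -cq clears a queue's error state; xargs -r is a GNU extension). Running qstat -f -explain E beforehand shows why each queue was flagged, which is worth a look before clearing it.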

On to GridSphere. GridSphere is a JSR 168-compliant portlet container, and apparently the best one available. However, compiling it was the stuff of nightmares: every single time I tried, I hit yet another unescaped double quote in a JSP page. The root cause is that I like to keep things up to date, bleeding edge, etc. GridSphere was written at a time when unescaped double quotes were acceptable by Tomcat's standards, but as of Tomcat 5.5.26 they no longer are. Thank God Almighty for Google, because I scoured pages and pages of forums, mailing list archives, and release notes before stumbling across a single line, buried deep in the bowels of the internet, that described this issue.

With that figured out, I popped over to the Tomcat 5 archive page, pulled down 5.5.25, copied my tomcat-users.xml into the appropriate directory, and ln -s'd the 5.5.25 install to /usr/local/tomcat. I've never been so happy to watch old software start up.
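The Tomcat downgrade described above comes down to two file operations, sketched here as a shell function. The name switch_tomcat and its three arguments are hypothetical; the sketch assumes each Tomcat release is unpacked in its own directory with the stock conf/ layout, and that /usr/local/tomcat is a symlink pointing at whichever release is active.

```shell
#!/usr/bin/env bash
# Hypothetical helper: repoint the Tomcat symlink at a different unpacked
# release, carrying the user database over first.
#   $1 = currently active Tomcat directory (holds conf/tomcat-users.xml)
#   $2 = newly unpacked Tomcat directory (e.g. apache-tomcat-5.5.25)
#   $3 = symlink path (e.g. /usr/local/tomcat)
switch_tomcat() {
  local old_home=$1 new_home=$2 link=$3

  # Keep the existing users/roles so the manager login keeps working.
  cp -p "$old_home/conf/tomcat-users.xml" "$new_home/conf/tomcat-users.xml" || return 1

  # -s: symbolic link; -f: replace the existing link;
  # -n: treat an existing link to a directory as a plain file, so the link
  #     itself is replaced instead of a new link being created inside it.
  ln -sfn "$new_home" "$link"
}
```

On Linux, after unpacking the 5.5.25 tarball under /usr/local, a call such as switch_tomcat "$(readlink -f /usr/local/tomcat)" /usr/local/apache-tomcat-5.5.25 /usr/local/tomcat (paths assumed) would carry the users over and repoint the link; Tomcat then needs a restart from the new location. The -n flag is the important design choice: without it, ln follows the existing directory symlink and drops the new link inside the old install instead of replacing it.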

Now that I've gotten GridSphere installed, I hope to get GridPortlets installed too, but so far I haven't been able to download it, because the host site has been returning a "Bad Gateway" error for almost two weeks now. WTF?!

Note: If someone clicks that link and doesn't get a bad gateway error, kindly contact me and let me know.

That's all for now. I'm off to Ghana for two weeks in the morning, so more updates will need to wait until I get back on the 2nd.

Merry Christmas!


posted by Christian @ 8:02 AM

Site Design Copyright © 2008 Christian Funkhouser

Site used in accordance with the Elon University Web Policy.
