Materials for the Hadoop workshop at CASCON

This is the syllabus for the workshop I’m chairing at CASCON 2011 with @mariusbutuc and @bsteinfe. If you’re interested, you can also take the course at your own pace online at BigDataUniversity.


Attendees will be provided with access to machines running Hadoop in a cloud environment. The necessary SSH credentials will be provided in class.


Chairing a Hadoop workshop at CASCON 2011

I’ll be chairing the Crunching Big Data in the Cloud with Hadoop and BigInsights workshop at CASCON 2011 in Toronto on Wednesday, November 9th. @BSteinfe and @MariusButuc will be joining me as co-chairs.

The workshop will be an all-day, hands-on introduction to Hadoop, HDFS, MapReduce, Hive, and JAQL. The plan is to have Hadoop clusters ready and running in the cloud for the various exercises.

Hadoop is a parallelized data processing framework. It lends itself very nicely to running in cloud environments like Amazon EC2 and IBM SCE, as the core concept is to split sophisticated queries across clusters of commodity hardware. On a basic level it’s an implementation of MapReduce in Java, but a great many tools in its ecosystem make it easy to formulate and execute queries on the fly.
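To make the MapReduce idea concrete, here is a minimal single-machine sketch of the classic word count in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are my own, chosen only for illustration. On a real cluster, the map and reduce steps would run in parallel on different nodes and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all the values for one key into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data in the cloud", "Big clusters in the cloud"]))
```

Because each call to map_phase looks at only one line and each call to reduce_phase looks at only one key, the work can in principle be spread across as many machines as there are lines or keys, which is what makes the model a good fit for clusters of commodity hardware.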

The material will have some things in common with the free Hadoop Fundamentals course you can take on Big Data University today, though naturally adapted for the CASCON themes and with added hands-on instruction.


Worst Google Translation ever

I was writing a response to a forum post in Russian and thought to run it through Google Translate for verification. If nothing else, it would catch the sort of misspelling I tend to make.

It surprised me by completely reversing the meaning of what I wrote:

That’s the complete opposite meaning!

This is the first time I’ve been irritated enough to correct a Google translation. It’s a common word, so I’m not sure what could have led to the misapprehension on Google’s part.

Adopting a new WordPress theme

After a pointer from rc3, I read an interesting article earlier today:

In short, 9 of the top 10 Google search results for free WordPress themes provide themes full of malware and spammy links. The one site that doesn’t is the official site. Unfortunately, I have to say from experience that the free themes on the official site are consistently poor in quality.

You can verify that your current theme is free of malware by using the Theme-Check and Theme Authenticity Checker plugins.

The theme I was using before was clean, but the design quality was low. I began to consider buying a quality theme somewhere, but the article did point out two decent sites that have some quality free themes:

I can’t vouch for them, as all I have to go on is the word of that article. I did end up adopting the free TypeBased theme from the latter site, and I am very happy with it so far. It’s well-designed, polished, and it integrates nicely with WordPress 3.0.

Oddly, Theme-Check does flag TypeBased as using base64_encode() and base64_decode() functions, but from what I can tell it’s in the legitimate context of an FTP API.
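If you want to eyeball those calls yourself, a rough sketch like the one below lists every line in a theme’s PHP files that calls base64_encode() or base64_decode(). This is not how Theme-Check works internally; find_base64_calls is a hypothetical helper of my own, and the theme path in the usage line is only an assumed example. A hit is not proof of malware, just a line worth reading in context.

```python
import re
from pathlib import Path

# Matches calls such as base64_decode( or base64_encode (
BASE64_CALL = re.compile(r"\bbase64_(?:encode|decode)\s*\(")


def find_base64_calls(theme_dir):
    """Return (file, line number, source line) for each base64 call in *.php files."""
    hits = []
    for path in sorted(Path(theme_dir).rglob("*.php")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if BASE64_CALL.search(line):
                hits.append((str(path), number, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed location; adjust to wherever your theme actually lives.
    for file_name, number, source in find_base64_calls("wp-content/themes/typebased"):
        print(f"{file_name}:{number}: {source}")
```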

Blog dehacked

I apologize if you saw any strange malware warnings last Friday. Someone exploited a hole in my rather old installation of WordPress and added a nasty JavaScript scriptlet to every post. They also published every one of my drafts and truncated every post title to a single word.

Google Webmaster Tools alerted me to the situation with an email. They have since confirmed that all the malware has been removed.

The cleanup involved upgrading WordPress to 3.0 as well as going through every post to remove the payload and fix the title. I took the opportunity to purge many of the posts.

WordPress 3.x seems much nicer than my creaky old install of WordPress 2.x. The widgets are very handy, as is the integrated update functionality for both WordPress and WordPress plugins.

CASCON Deploying MediaWiki with DB2 in the Cloud workshop

I’m hosting a workshop today at CASCON. The resources for it are below.





I work on the free (as in share kegs of beer with your friends) DB2 Express-C at IBM. They have me writing demo applications and tutorials, which means I get to play with lots of neat stuff like Ruby, Python, PHP, JavaScript, Dojo, etc. I happen upon lots of technical stuff, tips, and ideas in my projects. This seems an excellent forum to disseminate such…

BTW, IBM (or any other company) is not in any way responsible/liable/at-fault for anything I say. What I say is my own opinion.