IBM’s Hadoop distribution

My work for the past couple of years has been to develop DB2 images and templates for various cloud platforms and to engage the DB2 community online. This is still the case, but increasingly I’m spending my time working with IBM InfoSphere BigInsights.

BigInsights is IBM’s distribution of Hadoop.

What’s Hadoop? It’s a great way to crunch through massive amounts of unstructured data like email archives, geographic data, economic measurements, and so on to find interesting patterns. It rests on Map-Reduce, the programming model Google developed to process data at scale, including building the index behind its search results. Much of Google’s success rests on Map-Reduce’s ability to scale out on commodity hardware.
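To make the Map-Reduce idea a bit more concrete, here is a rough sketch of the classic word-count example in plain Python. It runs in a single process and does not use Hadoop or BigInsights at all; the function names (map_phase, shuffle, reduce_phase, word_count) are my own, made up for illustration. On a real cluster the framework would run the map and reduce steps on many machines and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, standing in for what the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of documents (hypothetical helper)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big insights", "big clusters"])
print(counts)
```

The useful property is that each call to map_phase looks at only one document and each call to reduce_phase looks at only one key, so the work can be split across as many machines as you have.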

(Notably, the whole Cloud Computing thing is the flip side of using massive arrays of commodity hardware. Since you have so much of it, you need a way to automate and abstract the management as much as possible. Since you’ve automated and abstracted away management, you might as well sell it as a service.)

Hadoop itself is an Apache Software Foundation project nurtured by Yahoo among others. It’s gaining an increasing number of commercial distributions including Cloudera, IBM, and now Hortonworks.

You can quickly try out BigInsights Basic on the cloud or download it to your own machine.

I really do recommend the macro that