Archive by Author

Google Plus as a successor to LiveJournal

Google+ is Google’s new social network. It takes aim at both Facebook and Twitter, but I think its unique intersection of features also positions it to succeed LiveJournal in a way that neither Facebook nor Blogger ever could.

Let’s back up a bit. What is LiveJournal?

LiveJournal, or Zhivoi Zhurnal as its Russian userbase calls it, is a social network masquerading as a blogging platform. It started in 1999, 3 years before Friendster, and provided:

  • Fine-grained access control to individual posts
  • Ability to group your friends into circles
  • One-way friend relationships (so you could be a follower of someone without them friending you back)
  • An equivalent to Facebook’s News Feed years before FB filed a patent on it

Google+ has all of these features. Not just that, but it exposes them much better than inertia-laden LiveJournal ever did, and it is amenable to long posts.

LJ is a smaller fish than Facebook, with only 32,000,000 users, of whom only 2,000,000 are active, but targeting this community could give Google a sufficient core of users to take on the Facebook behemoth.

Will LiveJournal users migrate? I’m not sure. LiveJournal satisfies several rather different demographics. 48% of its active user base is Russian, exemplified by President Dmitry Medvedev and various photoblogs. There’s a big celebrity news community called Oh No They Didn’t. There’s also a significant scifi and comics fandom userbase, exemplified by Scans Daily, that is already moving to a clone site called Dreamwidth. Some of these groups may be more amenable to the lure of Google+ than others.

I can say that the circle of people I met back when I had a journal on Slashdot in the early 2000s migrated to G+ overnight, and LiveJournal fostered and fosters similar communities. They could very well follow and give Google the critical mass that it craves.

IBM’s Hadoop distribution

My work for the past couple years has been to develop DB2 images and templates for various cloud platforms and to engage the DB2 community online. This is still the case, but increasingly I’m spending my time working with IBM InfoSphere BigInsights.

BigInsights is IBM’s distribution of Hadoop.

What’s Hadoop? It’s a great way to crunch through massive amounts of unstructured data like email archives, geographic stuff, economic measurements, and so on to find interesting patterns. It rests on the Map-Reduce programming model, which Google popularized for large-scale jobs such as building its search index. Much of Google’s success rests on Map-Reduce’s ability to scale out on commodity hardware.
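To make the map, shuffle, and reduce steps a bit more concrete, here is a toy word count that runs on a single machine in plain Java. It is only a sketch of the idea: it does not use the Hadoop API, and the class and method names (ToyMapReduce, map, shuffle, reduce, wordCount) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy, single-machine illustration of the map -> shuffle -> reduce flow.
// This is NOT the Hadoop API; every name here is hypothetical.
final class ToyMapReduce {

    // Map phase: turn one input record into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the values for one key into a single result.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    // Driver: run the three phases in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, insights=1}
        System.out.println(wordCount(List.of("big data", "big insights")));
    }
}
```

The scaling trick is that map calls are independent of each other, and so are reduce calls for different keys, so a framework can spread both across as many machines as it has.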

(Notably, the whole Cloud Computing thing is the flip side of using massive arrays of commodity hardware. Since you have so much of it, you need a way to automate and abstract the management as much as possible. Since you’ve automated and abstracted away management, you might as well sell it as a service.)

Hadoop itself is an Apache Software Foundation project nurtured by Yahoo among others. It’s gaining an increasing number of commercial distributions including Cloudera, IBM, and now Hortonworks.

You can quickly try out BigInsights Basic on the cloud or download it to your own machine.

I really do recommend the macro that

Webinar on private clouds, including stuff I’ve been working on

There’s a DB2 Chat with the Lab on Wednesday next week which will cover, among other things, the DB2 images I’ve developed for IBM Workload Deployer over this past while. I recommend checking it out, as the speakers know their stuff and this is a pretty cool product.

Easily Deploy Database Workloads on Private Clouds

Date:                 Wednesday, June 29, 2011 (29.6.2011)
Time:                 12:30 PM – 2:00 PM Eastern Time (ET)
11:30 AM Central / 9:30 AM Pacific / 17:30hrs London / 18:30hrs Frankfurt, Paris / India 10 PM
Speakers:         Sal Vella, Leon Katsnelson, Rav Ahuja, Chris Gruber

As more and more businesses look for ways to reduce costs within IT, their research typically discovers cloud computing. However, public clouds are not an option for many enterprise workloads and applications with strict security and privacy requirements. Organizations with such needs can benefit from building private clouds within their own data centers. In this webcast we will look at a fast path to deploying private clouds for database workloads.

We will specifically look at instant provisioning of DB2 systems and databases in a private cloud infrastructure using the IBM Workload Deployer (IWD), previously called the WebSphere CloudBurst Appliance. We will also look at how IWD can be easily used for implementing complete web application patterns with web servers, application servers (WebSphere), and DB2 database servers with high availability options.

To learn more, please join experts from the IBM labs – Sal Vella, Leon Katsnelson, Rav Ahuja, and Chris Gruber.

Presentation charts will be available from ibm.com/db2/labchats just before the webcast starts.

Register

Worst Google Translation ever

I was writing a response to a forum post in Russian and thought to run it through Google Translate for verification. If nothing else, it would catch the sort of misspelling I tend to make.

It surprised me by completely reversing the meaning of what I wrote:


That’s the complete opposite meaning!

This is the first time I’ve been irritated enough to correct a Google translation. It’s a common word, so I’m not sure what could have led to the misapprehension on Google’s part.

At the #hackreduce Hadoop workshop

[10:05] At Hack/Reduce. It’s at a nice loft-style office space downtown.

[10:10] Installing git, msysGit, TortoiseGit, and also the git package in Cygwin on the off chance one of them proves useful.

[10:30] They have Bixi usage, ocean carbon measurements, a Wikipedia dump and other sample datasets.

Hey, wasn’t there an interesting theorem in an XKCD comic recently about Wikipedia…

Wikipedia trivia: if you take any article, click on the first link in the article text not in parentheses or italics, and then repeat, you will eventually end up at “Philosophy”. – XKCD

[10:45] Some guy: I want to work on calculating the Philosophy-distance for every article on Wikipedia.

[10:50] Me too. :P

[11:23] Installing Eclipse, either Gradle or Ant or both, and for the hell of it a Windows PATH editor.

[11:38] OK, OK, downloading Ubuntu. I have VMware Workstation so I’m going to skip VirtualBox.

[11:39] On further thought, I already have 2 Ubuntu VMs on this computer that I haven’t used in a year or two. I’ll just use one of them.

[12:31] I’m team 14 running stuff on cluster 4. Working solo, since that’s the way I roll. :P

[12:35] Contrary to the organizers’ expectations, I have it running on Windows with no problems. I’ll post instructions for getting it working on Windows as I go.

Running Hadoop and #HackReduce exercises on Windows

[13:26] A few notes for running it on Windows:

  • Obviously, have Cygwin installed with git, wget, OpenSSH and other packages
  • Add gradle, ant, and so on bin directories to your system PATH variable after installing them
  • Re-launch any open command prompt or cygwin windows after changing the PATH
  • In this command:
    • java -classpath ".:build/libs/HackReduce-0.2.jar:lib/*" org.hackreduce.examples.wikipedia.RecordCounter datasets/wikipedia /tmp/wikipedia_recordcounts
  • You need to:
    • Change both the colons : to semicolons ;
    • mkdir a tmp directory in your project folder and change the path accordingly: ./tmp/wikipedia_recordcounts
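The colon-to-semicolon change is needed because the JVM's classpath separator is platform-specific. As a small sketch of that, the hypothetical helper below (ClasspathDemo and joinClasspath are names invented here) builds the same classpath string with whatever separator the current platform uses, via the standard File.pathSeparator constant.

```java
import java.io.File;

// Hypothetical helper: joins classpath entries with the platform's own
// separator (":" on Unix-like systems, ";" on Windows).
final class ClasspathDemo {

    static String joinClasspath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Same three entries as in the command above.
        System.out.println(joinClasspath(".", "build/libs/HackReduce-0.2.jar", "lib/*"));
    }
}
```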

[13:34] Yum, free pizza

Parsing Mediawiki links using Java and Regex

[14:10] OH HAI GUYZ. I HEAR U LIEK REGEX. HEERS SUM REGEX.

A (likely incomplete) regular expression for finding out the target of an internal Wikipedia link:

  • \[\[([^\]\:\|]+)\|[^\]]*\]\]|\[\[([^\]\:\|]+)\]\]

Notes:

  • A Wikipedia link is of the form [[target|label]] or [[target]]
  • For the purposes of this exercise, we want to avoid capturing stuff like [[Category:name]] or [[Image:name]]
  • Not interested in {{templates}}

Let’s decompose the line noise:

  • \[\[ means we are looking for something that starts with [[
  • [^\]\:\|] means we are looking for a character that is not ], :, or |
  • [^\]\:\|]+ means the above, but occurring one or more times
  • ([^\]\:\|]+) means we want to capture this substring, which is the target
  • \| means the target must be followed by a literal |
  • [^\]]* means the label: any characters other than ], occurring zero or more times, and not captured
  • \]\] means we are looking for something that ends with ]]
  • | means that, alternatively, we will ignore all of the above and define a second pattern to match, this time for links that have no label

Oh, and since this is Java, we have to double up the escape slashes:

  • \\[\\[([^\\]\\:\\|]+)\\|[^\\]]*\\]\\]|\\[\\[([^\\]\\:\\|]+)\\]\\]
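Here is a minimal sketch of how a pattern like this might be used from Java with java.util.regex. It assumes MediaWiki's piped-link syntax is [[target|label]], so it captures the text before the pipe, and it skips anything with a colon in the target such as Category or Image links. The class and method names (WikiLinkTargets, targets) are made up for illustration and are not from the HackReduce code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for pulling internal link targets out of wikitext.
final class WikiLinkTargets {

    // Group 1: target of a piped link, [[target|label]]
    // Group 2: target of a plain link, [[target]]
    private static final Pattern LINK = Pattern.compile(
            "\\[\\[([^\\]\\:\\|]+)\\|[^\\]]*\\]\\]|\\[\\[([^\\]\\:\\|]+)\\]\\]");

    static List<String> targets(String wikitext) {
        List<String> found = new ArrayList<>();
        Matcher m = LINK.matcher(wikitext);
        while (m.find()) {
            // Exactly one of the two groups is non-null for each match.
            String target = m.group(1) != null ? m.group(1) : m.group(2);
            found.add(target.trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "[[Reason|rational]] study of [[Existence]]. [[Category:Philosophy]]";
        // prints [Reason, Existence]
        System.out.println(targets(sample));
    }
}
```

It remains a likely incomplete pattern: nested links inside image captions and links with section anchors are not treated specially.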

[14:50] After a long while of running my job on my local machine, I’m now running it on the cluster against the full dataset. Whee!

Algorithm for calculating Philosophy distance

[15:51] Continuing to work at it. Talked the idea over with one of the mentors, who suggested this approach:

  • Run Map-Reduce once to generate a list of articles linking directly to Philosophy
    • (Ancient philosophy, Philosophy)
    • (Mathematics, Philosophy)
    • Etc.
  • Run Map-Reduce a second time to generate a list of articles linking to articles in the first set
    • (Aristotle, Ancient philosophy, Philosophy)
    • (Democritus, Ancient philosophy, Philosophy)
    • (Euclid, Mathematics, Philosophy)
    • Etc.
  • Etc.
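The repeated passes above amount to a breadth-first search that works backwards from Philosophy. As a rough in-memory sketch, with each loop iteration standing in for one Map-Reduce run, it could look like the following. The tiny link graph and the names (PhilosophyDistance, distances) are hypothetical, and this version counts any link in an article, not only the first one.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the repeated Map-Reduce passes described above.
// The link graph and all names are hypothetical sample data.
final class PhilosophyDistance {

    // links: article -> articles it links to.
    // Returns article -> number of links needed to reach the goal.
    static Map<String, Integer> distances(Map<String, List<String>> links, String goal) {
        Map<String, Integer> distance = new HashMap<>();
        distance.put(goal, 0);
        Set<String> frontier = Set.of(goal);

        // Each loop iteration plays the role of one Map-Reduce run.
        for (int round = 1; !frontier.isEmpty(); round++) {
            Set<String> next = new HashSet<>();
            for (Map.Entry<String, List<String>> e : links.entrySet()) {
                String article = e.getKey();
                if (distance.containsKey(article)) {
                    continue; // already placed in an earlier set
                }
                for (String target : e.getValue()) {
                    if (frontier.contains(target)) {
                        next.add(article);
                        break;
                    }
                }
            }
            for (String article : next) {
                distance.put(article, round);
            }
            frontier = next;
        }
        return distance;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
                "Ancient philosophy", List.of("Philosophy"),
                "Mathematics", List.of("Philosophy"),
                "Aristotle", List.of("Ancient philosophy"),
                "Euclid", List.of("Mathematics"));
        // prints 2
        System.out.println(distances(links, "Philosophy").get("Euclid"));
    }
}
```

Articles that never reach the goal simply do not appear in the result, and the loop stops as soon as a round adds nothing new.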

[16:31] I think I should wrap this up soon.

1-Philosophy set

[16:34] Here’s a list of all Wikipedia articles that are 1 link away from Philosophy.

DB2 images for IBM Workload Deployer

IBM Workload Deployer 3.0 has just come out along with the DB2 images I developed. IWD is a major revision of what was formerly known as WebSphere CloudBurst Appliance. The idea behind it is that it lets you plug a box into your existing virtualization infrastructure (VMware, pSeries, etc.) and make it feel more like a proper private cloud with image templates, automation, reproducible deployments, monitoring, and such things.

I developed the three DB2 images and, along with Dustin, the script packages and topology patterns that make them integrate nicely with WebSphere Application Server. Here’s a screenshot of what one of those patterns looks like in the IWD pattern editor:

Highly available cluster pattern on IWD

In addition to the images made available on the appliance, we are making five additional DB2 image templates available for download. They add Red Hat Enterprise Linux as an OS option as well as bump up the DB2 version to V9.7 FP4 and enhance the High Availability enablement on the AIX-based DB2 Enterprise image.

When you hear WebSphere folks talking about “DB2 Hypervisor Edition”, these images are what they are talking about.

I learned a lot about the arcana of Linux and AIX administration developing these, though of course learning a lot about something always highlights how much more one has to learn. Regardless, I’ve harvested some of that for a couple blogposts this past winter, and I hope to post more about it in the coming weeks.

On a side note, if you are interested in more of a Database-as-a-Service rather than Infrastructure-as-a-Service approach, IWD 3.0 also comes with Workload Pattern for DB2. It abstracts things to a higher level by letting you provision databases directly, which can be a nice option to have.

How to diff Word documents

It’s fairly straightforward to diff or compare different revisions of an Office document on Windows. The approach below applies to Word, Excel, and PowerPoint files, as well as to ones created by Lotus Symphony, OpenOffice.org, or LibreOffice.

  1. Download and install WinMerge. This is a free, open source utility.
  2. Download the xdocdiff plugin. Unzip it somewhere.
  3. Copy xdoc2txt.exe and zlib.dll to C:\Program Files\WinMerge
  4. Copy amb_xdocdiffPlugin.dll to C:\Program Files\WinMerge\MergePlugins
  5. Start WinMerge.
  6. Go to Plugins > List and check [x] Enable plugins.
  7. Go to Plugins and set it to [x] Automatic unpacking
  8. Close or restart WinMerge

You should now be able to select any two documents that you want to compare, right-click on them, and choose WinMerge to get a meaningful comparison of the textual differences between them.

If you are seeing line noise in the comparison, you need to make sure you enable the settings mentioned in steps 6 and 7 above.

Triggers in DB2 Express-C 9.7.4

My team at IBM recently released DB2 Express-C 9.7.4, the latest and greatest version of our free database.

Raul wrote up a detailed article with the technical nitty-gritty of what’s new. There’s a bunch of different improvements, but one thing that’s caught my eye is the set of enhancements to triggers.

A trigger is something defined to fire automatically when you insert, update, or delete a row in a table. Starting with 9.7.4, you can basically inline a whole stored procedure in the trigger definition. This is nice because it lets you keep the code for all the different actions on a table together.

Let me quote Raul’s example:

CREATE TABLE COMPANY_STATS (NBEMP INTEGER)
!

CREATE TRIGGER HIRED
 AFTER INSERT OR DELETE OR UPDATE OF SALARY ON EMPLOYEE
 REFERENCING NEW AS N OLD AS O FOR EACH ROW
   BEGIN
         IF INSERTING
         THEN UPDATE COMPANY_STATS SET NBEMP = NBEMP + 1;
         END IF;

         IF DELETING
         THEN UPDATE COMPANY_STATS SET NBEMP = NBEMP - 1;
         END IF;

         IF (UPDATING AND (N.SALARY > 1.1 * O.SALARY))
         THEN SIGNAL SQLSTATE '75000' SET MESSAGE_TEXT='Salary increase>10%';
         END IF;
   END
!

Ignore that last part. All salary increases should be > 10%.

Reduce your stress by disabling notifiers, toasts, and every sort of popup

A month ago, I disabled email notification in my Gmail notifier before doing a presentation and neglected to turn it back on later.

It took me a long time to notice the lack of notifications. What I did notice was a reduced level of stress. I was able to effectively concentrate on a single task without unimportant, offtopic notices distracting me. This is enormously important in software development.

Let me quote from Eric S. Raymond’s classic Jargon File:

hack mode n.

a Zen-like state of total focus on The Problem that may be achieved when one is hacking (this is why every good hacker is part mystic). Ability to enter such concentration at will correlates strongly with wizardliness; it is one of the most important skills learned during larval stage. Sometimes amplified as deep hack mode.

Being yanked out of hack mode (see priority interrupt) may be experienced as a physical shock, and the sensation of being in hack mode is more than a little habituating. The intensity of this experience is probably by itself sufficient explanation for the existence of hackers, and explains why many resist being promoted out of positions where they can code. See also cyberspace (sense 3).

Some aspects of hacker etiquette will appear quite odd to an observer unaware of the high value placed on hack mode. For example, if someone appears at your door, it is perfectly okay to hold up a hand (without turning one’s eyes away from the screen) to avoid being interrupted. One may read, type, and interact with the computer for quite some time before further acknowledging the other’s presence (of course, he or she is reciprocally free to leave without a word). The understanding is that you might be in hack mode with a lot of delicate state (sense 2) in your head, and you dare not swap that context out until you have reached a good point to pause. See also juggling eggs.

Joel Spolsky wrote something similar in 2001. The basic idea is that multitasking is inherently wasteful because a context switch between one complicated task and another complicated task has costs. The more often you switch between tasks, the more often you incur the overhead of a context switch.

My advice to you is this: disable your email notifier, disable your Twitter notifier, disable every other sort of notifier you have. They are never as urgent as the task at hand. You’ll only be happier and more productive.

Find out public information about the people you email with Rapportive

Rapportive is a great browser enhancement for Gmail. It automatically looks up email addresses and populates a sidebar with that person’s profile photo (from Google Talk or Flickr), job title (from LinkedIn), tweets, as well as links to their profiles on Facebook, Skype, etc.

I find it especially useful when reading mailing list messages, as it lets me easily find the Twitter accounts of interesting people.

It’s also useful for making you think critically about the information you have exposed online. By looking at your own profile, you can find out if there’s any information that you are exposing without meaning to. In my own case, I was surprised to see my ancient Flickr account from circa 2003 on it, which I’ve since made private.

I think it ties in nicely with IBM’s recent study that 21% of email users would consider applications to complement email. Lotus is also cooking up a lot of neat things that integrate social media with email. A hat tip goes to Marius for the link.