Intro to #datascience and #spark at #ibminsight 2015

I’ll be teaching two hands-on labs at Insight 2015 in Las Vegas:

LCD-3459 Introduction to Data Science
Data science is a very popular job profile and in great demand in a wide variety of industries. You no longer need a Ph.D. in mathematics or statistics to become a data scientist. Any data professional can upgrade their skills and study data science. This lab introduces the basic concepts of data science and provides hands-on examples to help you apply these concepts.

Update: Do the Intro to Data Science Lab on Big Data University

LCD-3479 Fundamentals of Spark
Spark is one of the most important technologies for big data analytics, and Spark skills are in great demand. This lab session introduces Spark fundamentals and applies the concepts using hands-on examples with a Spark cluster in the cloud. You can also download a Docker image to your own laptop and run the lab projects there.

Update: Do the Fundamentals of Spark Lab on Big Data University

If you can’t make it, you can teach yourself data science at your own pace at

Setting up a new MacBook

I’ve just migrated to a new MacBook Pro as my primary work machine. As part of setting it up, I installed the following:

  • Caffeine to keep it from going to sleep when I don’t want it to
  • BetterTouchTool so that I can middle-click (three finger tap) to close tabs and paste in the terminal
  • f.lux to reduce eye-strain at night
  • Firefox as my web browser
  • Chrome so that I can run Authy for two-factor authentication on the desktop
  • Ditto for running Google Hangouts
  • iTerm2 as a superior, multi-pane terminal
  • Atom because it’s handy to have a text editor
  • Homebrew as an excellent package manager for installing Unix services and tools
  • Divvy for resizing windows to fractions of the screen


  • System Preferences > Trackpad > Tap to Click, so that I can click without mashing the trackpad
  • System Preferences > Accessibility > Cursor Size > Larger, so that the cursor is easy to find
  • System Preferences > Accessibility > “Reduce transparency” to get a solid menu bar
  • System Preferences > Sound > “Show volume in menu bar”, so that I can quickly adjust sound (and configure devices by option-clicking the icon)

    Option-clicking the volume icon
  • Keychain Access > Preferences > “Show keychain status in menu bar”, so that I can quickly lock my laptop

    Quickly lock the screen

Tuning wifi on Mac OS 10.10.3 Yosemite

My MacBook Air’s wifi performance has been getting worse over time, and being somewhere with a lot of network lag has made it even more noticeable. I did some research, and the following suggestions made the wifi faster and more responsive with Mac OS 10.10.3 Yosemite.

1. Disable Bluetooth. I don’t use any wireless mice, keyboards, or headsets, so Bluetooth doesn’t do anything for me.

2. Disable FaceTime. Command+Space, FaceTime. File > Preferences > [ ] Enable this account. I’ve never used FaceTime, but from what I understand it’s Apple’s clone of Skype that only works with other Apple users.

3. Disable Handoff. System Preferences > General > [ ] Allow Handoff between this Mac and your iCloud devices. Handoff is Apple’s recent attempt to increase platform lock-in for those unfortunate souls who use non-Macbook Apple devices. Seemingly, it negatively affects network performance.

These three steps noticeably improved my wifi performance.

If that doesn’t help, there are more involved things you can do to tune Yosemite wifi performance or work around other Yosemite bugs.

I, for one, look forward to the forthcoming release of Mac OS 10.11 El Capitan.


#DUTO2015 conference

I’m at the Data Unconference in Toronto today. Jarred Gaertner just gave a thought-provoking keynote on the ethics of big data, and I’m about to dig my hands into some open data sets in Richard Pietro’s hands-on session.

My colleague Polong Lin will be on a panel about IBM’s data science tools this afternoon, which should be interesting if only because it’s hard to keep track of everything that’s out there.

#DUTO2015 is sponsored by my friends at Big Data University

$10k! Spark hackathon in San Fran

I’ll be in San Francisco this weekend helping run the Apache Spark hackathon, and afterwards I’ll be at Spark Summit 2015.

If you’re curious at all about Spark, you should come out and hack with us. We’ll have some fun data sets and help you find a team.

You can take the free Spark Fundamentals course on Big Data University to brush up on your Spark skills. Spark is a framework for fast in-memory and batch analytics processing. It’s algorithmically smarter and so a lot faster than traditional Hadoop.

There’s $10k in prizes at the hackathon.

Encrypt Gmail, Facebook emails with OpenPGP and Mailvelope

Facebook just released a great use case for OpenPGP encryption in Gmail and other web email providers. You can now configure Facebook to encrypt all email it sends you with OpenPGP.

Whether or not you use Facebook, it’s surprisingly easy to use Mailvelope to integrate OpenPGP with Gmail and other email providers. Mailvelope is a browser extension for Chrome and Firefox that lets you encrypt messages that you write and decrypt messages that you receive.

The encryption can be done externally to the web email interface, so your email provider does not have access to the plain text of your email message.

OpenPGP is based on public key cryptography. You have two keys — a public key you can share with everyone, and a private key that you keep secret. Everyone can use your public key to encrypt messages they send you, but only you can decrypt them using your private key.
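To make the two-key idea concrete, here is a minimal sketch of textbook RSA with tiny primes. It is not how OpenPGP or Mailvelope work internally (real implementations use much larger keys, padding, and hybrid encryption), and the numbers and function names are assumptions chosen purely for illustration.

```python
# Toy RSA with tiny primes: for illustration only, never for real use.
# Requires Python 3.8+ for pow(e, -1, phi).

p, q = 61, 53                # two small secret primes (illustrative values)
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)      # used to derive the private exponent
e = 17                       # public exponent, coprime with phi
d = pow(e, -1, phi)          # private exponent: modular inverse of e

public_key = (e, n)          # safe to share with everyone
private_key = (d, n)         # must be kept secret


def encrypt(message, key):
    """Encrypt an integer smaller than n with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    """Recover the integer with the matching private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

Anyone holding `public_key` can produce `ciphertext`, but only the holder of `private_key` can turn it back into the original message.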

Why encrypt email? Email is generally transmitted in plain text across the internet, meaning a hostile party can intercept it. With the web (HTTP) moving to encrypted connections for everything, email is left as an insecure communication medium. You as a user have to take active steps to make it secure.

Here’s how you can transparently integrate OpenPGP encryption with Gmail and Facebook:

  1. Install Mailvelope
  2. Generate a new key in Mailvelope options
  3. Go to Display Keys, click on a key, go to Export, and copy the public key
  4. Open your Facebook profile, go to About > Contact and Basic Info, and paste the key into the PGP Public Key field
  5. Facebook will send you an encrypted notification. Mailvelope should turn your cursor into a golden key when you hover over the encrypted contents
  6. Tada!

You may also want to share your public key on keyservers like the MIT PGP Key Server or the PGP Global Directory. In principle, that will allow other people to send you encrypted email messages that only you can decrypt.

$7,500! Spark hackathon in Boston May 28-30

My team will be supporting the Spark hackathon in Boston on May 28-30.

This weekend’s a good chance to teach yourself Spark.

“Apache Spark is an open source processing engine built around speed, ease of use, and analytics. If you have large amounts of data that requires low latency processing that a typical Map Reduce program cannot provide, Spark is the alternative. Spark performs at speeds up to 100 times faster than Map Reduce for iterative algorithms or interactive data mining.”
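One way to picture why iterative algorithms benefit: a MapReduce-style job typically re-reads its input on every pass, while Spark can keep the data in memory between passes. The sketch below is plain Python, not Spark or Hadoop code; the `Dataset` class and both functions are hypothetical stand-ins that only count how often the data gets loaded.

```python
# Toy illustration only: plain Python, not Spark or Hadoop code.

class Dataset:
    """Pretend on-disk dataset that counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        # Stands in for an expensive read from disk.
        self.reads += 1
        return list(self._values)


def step(data, guess, rate=0.5):
    """One gradient-descent step toward the mean of the data."""
    gradient = sum(guess - x for x in data) / len(data)
    return guess - rate * gradient


def iterate_reloading(dataset, iterations):
    """Re-read the input on every pass, as a chain of batch jobs would."""
    guess = 0.0
    for _ in range(iterations):
        guess = step(dataset.load(), guess)
    return guess


def iterate_cached(dataset, iterations):
    """Read the input once and reuse the in-memory copy on every pass."""
    cached = dataset.load()
    guess = 0.0
    for _ in range(iterations):
        guess = step(cached, guess)
    return guess


reloading = Dataset([1.0, 2.0, 3.0, 4.0])
cached = Dataset([1.0, 2.0, 3.0, 4.0])
print(iterate_reloading(reloading, 10), reloading.reads)
print(iterate_cached(cached, 10), cached.reads)
```

Both versions compute the same estimate; the only difference is how many times the input is read, which is where the time goes when the data is large.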

Happy Memorial Day weekend!