Spark Summit East 2016

Next week I’ll be demoing Data Scientist Workbench at Spark Summit East (official site) in New York. Polong Lin will be there with me. Come by the expo floor next Wednesday and Thursday and chat with us.

Data Scientist Workbench is what my team builds. It hosts open source data science tools like Jupyter, OpenRefine, R Studio IDE, Zeppelin and others for you. There’s exciting stuff in the changelog every week.

I signed up in time to get into a training session at Spark Summit East, so I’ll be spending my Tuesday working with the Wikipedia data sets. In today’s industry jargon, I’m more of a data engineer than a data scientist, so I’m hoping my Spark skills are up to the level needed for the advanced course.

This week I’m at Datapalooza Seattle, which is a good opportunity to brush up and expand those same Spark skills. In fact, we just posted the Day 1 challenge for Datapalooza. If you’re following along at home, fire up your Data Scientist Workbench, open a Jupyter notebook, and give it a try.

Spark Summit East

 

Datapalooza Seattle on Feb 9-11

On February 9 through 11, I’ll be mentoring hackers and budding data scientists at Galvanize during Datapalooza Seattle. It should be a great conference covering topics like things like machine learning, natural language processing, and data engineering infrastructure.

Last year’s Datapalooza in San Francisco was a fantastic event with lots of in-depth sessions. I was impressed with the range of material on data science and data engineering. The upcoming Datapalooza Seattle looks equally as fascinating.

My team at work runs  Data Scientist Workbench which is free hosted suite of open source tools including Jupyter, Zeppelin, R Studio IDE, and OpenRefine. We also organize free data science education through Big Data University.

I’m expecting Antonio Cangiano, Polong Lin, and Leon Katsnelson to be at Datapalooza with me as fellow mentors.

Let me know if you’re in Seattle at the same time and we’ll connect.

Datapalooza Seattle

Adobe password breach as the world’s greatest crossword puzzle

Adobe was recently breached and 150,000,000 user accounts were stolen. Adobe was following the one of the worst practices of password storage — reversible encryption (rather than hashing with a salt using a good, slow algorithm like bcrypt). A very, very old throwaway password of mine was among those leaked.

XKCD has referred to this breach as The Greatest Crossword Puzzle in the History of the World!

It was bound to happen eventually. This data theft will enable almost limitless [xkcd.com/792]-style password reuse attacks in the coming weeks. There's only one group that comes out of this looking smart: Everyone who pirated Photoshop.

With the help of LastPass’ Has Adobe Leaked My Password, let me illustrate why:

The following hints have been used by other people that share your password. This information could be used to determine your password as well.

  • Life, Universe, Everything
  • life?
  • DA
  • h2g2
  • hitchiker’s guide to the galaxy
  • yes
  • meaningoflife
  • theusual
  • everything
  • hitchhiker
  • dolphins
  • gta
  • a4
  • answer
  • meaning?
  • life
  • the answer
  • the question of life
  • HGTTG
  • meaning of life
  • the usual
  • life..
  • life the universe and everything
  • a2lae
  • the ultimate
  • Hitchhiker
  • What’s the answer?
  • hitchhikers?
  • Life the Uni and Every
  • life meaning and flower
  • common
  • douglas adams
  • a?
  • maiden
  • lotr no #
  • Adams question
  • Hitchhiker’s Guide
  • answer?
  • question
  • Life Meaning
  • adams
  • life universe everything
  • HHGTTG
  • the number
  • towel
  • typical
  • The Usual
  • How many roads must a man walk down?
  • Life, the universe, and everything
  • What is the meaning of life, the universe and all?

Would you care to guess what password the naive, young me used for Adobe?

Next steps

IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

I’m teaching a hands-on lab at Information on Demand 2013. I will edit the post to include lab materials closer to the date.

Session: IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
Time: Thu, 7/Nov, 10:00 AM – 01:00 PM
Location: Mandalay Bay South Convention Center – Shorelines B Lab [Room 15]

First step

Please request a lab environment. We will use a Hadoop environment hosted in the cloud. Each attendee will be provided with a personal environment.

Lab materials

Posting more frequently

I’ve been really enjoying Rafe Colburn’s technical blog since he made his pledge to post more frequently. It makes a lot of sense for a technical blog to also have linkblogging with brief commentary within the same stream of content. I would argue that the appeal of sites like Reddit and Hacker News relates to people doing the same en masse.

Naturally, I’ve also been doing some techie linkblogging on my Twitter account.

CASCON 2012

On November 6, 2012, I’m teaching a hands-on lab at CASCON together with Bradley Steinfeld and Marius Butuc. The lab is called Crunching Big Data with Hadoop and BigInsights in the Cloud. The lab is based on the Hadoop Fundamentals course at Big Data University.

Morning

1.0 Welcome
1.1 What is Big Data?
1.2 Lab Setup
Setup Lab
Setup Lab (PDF Download)
1.3 What is Hadoop?
1.4 Hadoop Architecture – HDFS
HDFS Lab
Lab (PDF Download)
1.5 Hadoop Architecture – MapReduce
MapReduce Lab
Lab (PDF Download)

Afternoon

1.6 Pig, Hive, and Jaql
Pig, Hive, and Jaql Lab
Lab (PDF Download)
1.8 Working with BigInsights
Web Console Lab
Web Console Lab (PDF Download)
1.9
Data Discovery with BigSheets

Module 1.7 covers Flume. It’s available for free on Big Data University.