Adobe password breach as the world’s greatest crossword puzzle

Adobe was recently breached and 150,000,000 user accounts were stolen. Adobe was following one of the worst practices of password storage: reversible encryption (rather than hashing with a salt using a good, slow algorithm like bcrypt). A very, very old throwaway password of mine was among those leaked.
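
For contrast, salted, slow hashing is easy to get from standard tools. A small illustration, assuming the Apache htpasswd utility is installed (the username and password here are made up):

  # -n: print to stdout, -b: take the password from the command line, -B: use bcrypt
  htpasswd -nbB demo 'correct horse battery staple'
  # => demo:$2y$...   (the hash embeds a randomly generated salt and the bcrypt cost factor)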

XKCD has referred to this breach as The Greatest Crossword Puzzle in the History of the World!

It was bound to happen eventually. This data theft will enable almost limitless [xkcd.com/792]-style password reuse attacks in the coming weeks. There's only one group that comes out of this looking smart: Everyone who pirated Photoshop.

With the help of LastPass’s “Has Adobe Leaked My Password” tool, let me illustrate why:

The following hints have been used by other people that share your password. This information could be used to determine your password as well.

  • Life, Universe, Everything
  • life?
  • DA
  • h2g2
  • hitchiker’s guide to the galaxy
  • yes
  • meaningoflife
  • theusual
  • everything
  • hitchhiker
  • dolphins
  • gta
  • a4
  • answer
  • meaning?
  • life
  • the answer
  • the question of life
  • HGTTG
  • meaning of life
  • the usual
  • life..
  • life the universe and everything
  • a2lae
  • the ultimate
  • Hitchhiker
  • What’s the answer?
  • hitchhikers?
  • Life the Uni and Every
  • life meaning and flower
  • common
  • douglas adams
  • a?
  • maiden
  • lotr no #
  • Adams question
  • Hitchhiker’s Guide
  • answer?
  • question
  • Life Meaning
  • adams
  • life universe everything
  • HHGTTG
  • the number
  • towel
  • typical
  • The Usual
  • How many roads must a man walk down?
  • Life, the universe, and everything
  • What is the meaning of life, the universe and all?

Would you care to guess what password the naive, young me used for Adobe?

Next steps

The specified bucket is not S3 v2 safe

I ran into this error when running ec2-upload-bundle:

  The specified bucket is not S3 v2 safe (see S3 documentation for details)

This was due to uppercase letters or underscores. Later I also ran into an issue with periods in bucket names, which showed up as this error message:

  ERROR: Error talking to S3: Server.AccessDenied(403): Access Denied

Here is an easy command to sanitize the bucket names:
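
A minimal sketch using tr (the bucket name below is only an example):

  bucket="My_Bucket.Name"
  echo "$bucket" | tr '[:upper:]' '[:lower:]' | tr '[:punct:]' '-'
  # => my-bucket-name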

It will lowercase all letters and convert all punctuation to dashes.

IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

I’m teaching a hands-on lab at Information on Demand 2013. I will edit the post to include lab materials closer to the date.

Session: IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
Time: Thu, 7/Nov, 10:00 AM – 01:00 PM
Location: Mandalay Bay South Convention Center – Shorelines B Lab [Room 15]

First step

Please request a lab environment. We will use a Hadoop environment hosted in the cloud. Each attendee will be provided with a personal environment.

Lab materials

Machine learning with Mahout and Hadoop session

Tonight I attended a session about machine learning with Mahout at BNotions. The session was organized through the Toronto Hadoop User Group.
Quick Notes
  • BNotions uses Hadoop and Mahout for their Vu mobile app. Vu is a smart news reader that recommends articles based on article similarity to things you like as well as user similarity to you.
  • Graph theory and graph processing algos are helpful for this work.
  • Likes, dislikes, reads, and skips are the most important input for their machine learning. Also relevant: user preference for breadth of topics vs depth; recency; natural language processing to extract topic keywords and organize topics by similarity.
  • Redis is used for transient storage. It has some useful ops beyond just key-value. They use S3 as a data warehouse, but it could just as easily be HDFS.
  • They use Amazon EMR as the Hadoop cluster. EMR constrains technology choice. For example, it is harder to use HDFS, hence Redis instead. They are evaluating HBase as an alternative; performance differences are not relevant for their use case.
  • They don’t currently adjust for article length as a factor in recommendations.
  • They use a third-party API for NLP, not Hadoop specifically. It runs only once per article, so it is not a bottleneck yet. They are not happy with the NLP quality, though.
  • Cascalog/JCascalog to query the Hadoop data using Scala.
  • Scalability is limited by cost, not capability. They may switch from EMR to a dedicated cluster, etc., as cost grows.
  • Data science 10%, engineering 90%. Stock algos for rapid application development, tweak after. Deployment (my own specialty!) can be painful.
  • Service-oriented architecture (SOA) helps with deployment. It simplifies components, but adds a devops layer. Jenkins is used to automate builds.

Have bash warn you about uninitialized variables with set -u

By default, Bash treats uninitialized variables the same way as Perl — they are blank strings. If you want them treated more like Python, you can issue the following command in your bash script:
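
  set -u    # treat any expansion of an unset variable as an error
            # (long form: set -o nounset)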

You will then start seeing error messages like the following:
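
For example, referencing a hypothetical unset variable SOME_VAR in a script produces something like:

  ./myscript.sh: line 5: SOME_VAR: unbound variable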

Note that this means you can’t check for the non-existence of environment variables with a simple [[ -z "$ENVIRONMENT_VARIABLE" ]]. Instead, you could do something like the following:
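
For example, supply an empty default with the ${ENVIRONMENT_VARIABLE:-} expansion so the test itself doesn’t trip set -u:

  if [[ -z "${ENVIRONMENT_VARIABLE:-}" ]]; then
      echo "ENVIRONMENT_VARIABLE is not set (or is empty)" >&2
      exit 1
  fi

The :- form substitutes an empty string when the variable is unset, so the check behaves as before while set -u stays enabled.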


Set PuTTY defaults, permanently

PuTTY or one of its forks is a standard tool for administering Unix and Linux machines from Windows. It provides SSH connectivity for command line access, as well as keypair management for compatible programs like WinSCP.

Unfortunately, PuTTY has some terrible defaults. For example, it limits itself to 200 lines of scrollback by default, which guarantees that you’ll lose some history in most SSH sessions.

There’s a way to fix this and other defaults.

First, load the “Default Settings” saved session. Then, configure the defaults as you like. For example, I’m increasing my lines of scrollback from 200 to 20,000.

Finally, save the new default settings.


PuTTY will now have sensible defaults whenever you’re connecting to a random server.