Exploring Ruby and Python interactively

Both Ruby and Python offer great interactive shells, also known as REPLs (read-eval-print loops). These are handy for verifying snippets of code. You can start Python’s by running python, and Ruby’s by running irb, jirb (for JRuby), or rails c (for Rails).

Sometimes, however, one can be mystified as to what one can do with an object or module. Lately, I’ve been finding the Ruby API documentation especially frustrating.

Fortunately, both Python and Ruby let you see what’s available. In Python, you can call the dir() function, while Ruby has the handy .methods() method.


And Ruby:
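A minimal sketch of this kind of exploration, written as a plain script so it also runs outside irb. The helper name own_methods is hypothetical, introduced only for illustration:

```ruby
# Returns the methods an object responds to, minus the names that every
# Object already defines (inspect, dup, freeze, ...), sorted for readability.
def own_methods(obj)
  (obj.methods - Object.instance_methods).sort
end

# grep narrows the full list when you only half-remember a method name.
def methods_matching(obj, pattern)
  obj.methods.grep(pattern).sort
end

puts own_methods("hello").first(10).inspect
puts methods_matching("hello", /case/).inspect
```

In irb you would normally just type "hello".methods.sort and scan the result; subtracting Object.instance_methods is a common trick to cut the list down to what is interesting for that class.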


Circular dependency detected while autoloading constant

I recently ran into this frustrating and intermittent error in Ruby on Rails 4 (JRuby, actually):
Circular dependency detected while autoloading constant

Googling turned up several articles advising one to abide by the Rails conventions, but that was not the issue.

The application I’m writing uses background threads. The problem shows up when trying to instantiate a Rails controller in one of them. Rails searches for object definitions dynamically, so when multiple threads are trying to instantiate the same object, there’s a race condition.

The fix is to define a mutex for access control:

And then use it in the threads when instantiating the controller, model, or some Ruby class:
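Both steps can be sketched together as follows. This is not the original code: LOAD_MUTEX, with_load_lock, and ReportBuilder are hypothetical names, and ReportBuilder is an ordinary class standing in for a Rails controller or model so that the example runs without Rails.

```ruby
# One process-wide mutex shared by every background thread.
LOAD_MUTEX = Mutex.new

# Stand-in for a controller, model, or other autoloaded class.
class ReportBuilder
  def build
    "report"
  end
end

# Runs the block while holding the lock. The constant must be referenced
# inside the block, because the constant lookup is what triggers autoloading.
def with_load_lock
  LOAD_MUTEX.synchronize { yield }
end

threads = 4.times.map do
  Thread.new do
    builder = with_load_lock { ReportBuilder.new } # lookup happens under the lock
    builder.build                                  # real work runs outside the lock
  end
end

puts threads.map(&:value).inspect
```

Only the instantiation is serialized; the work done with the object afterwards still runs concurrently, so the lock should add little overhead.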

Adobe password breach as the world’s greatest crossword puzzle

Adobe was recently breached and 150,000,000 user accounts were stolen. Adobe was following one of the worst practices of password storage: reversible encryption, rather than hashing with a salt using a good, slow algorithm like bcrypt. A very, very old throwaway password of mine was among those leaked.
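For contrast, here is a rough sketch of salted, deliberately slow hashing. It uses PBKDF2 from Ruby's bundled OpenSSL as a stand-in for bcrypt so that it needs no gems; the helper names and the iteration count are illustrative assumptions, not a recommendation.

```ruby
require "openssl"
require "securerandom"

# Illustrative work factor; higher means slower guessing for an attacker.
ITERATIONS = 100_000

# Returns [salt, digest]. A fresh random salt per user means two users with
# the same password still end up with different stored values.
def hash_password(password, salt = SecureRandom.random_bytes(16))
  digest = OpenSSL::PKCS5.pbkdf2_hmac(
    password, salt, ITERATIONS, 32, OpenSSL::Digest.new("SHA256")
  )
  [salt, digest]
end

# Re-derives the digest with the stored salt and compares it to the stored one.
# (A production check would use a constant-time comparison.)
def password_matches?(password, salt, expected_digest)
  _, digest = hash_password(password, salt)
  digest == expected_digest
end

salt, stored = hash_password("correct horse")
puts password_matches?("correct horse", salt, stored)
```

Because nothing here can be decrypted, and identical passwords no longer produce identical stored values, one user's password hint stops being a clue to everyone else's password.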

XKCD has referred to this breach as The Greatest Crossword Puzzle in the History of the World!

It was bound to happen eventually. This data theft will enable almost limitless [xkcd.com/792]-style password reuse attacks in the coming weeks. There's only one group that comes out of this looking smart: Everyone who pirated Photoshop.

With the help of LastPass’ Has Adobe Leaked My Password, let me illustrate why:

The following hints have been used by other people that share your password. This information could be used to determine your password as well.

  • Life, Universe, Everything
  • life?
  • DA
  • h2g2
  • hitchiker’s guide to the galaxy
  • yes
  • meaningoflife
  • theusual
  • everything
  • hitchhiker
  • dolphins
  • gta
  • a4
  • answer
  • meaning?
  • life
  • the answer
  • the question of life
  • meaning of life
  • the usual
  • life..
  • life the universe and everything
  • a2lae
  • the ultimate
  • Hitchhiker
  • What’s the answer?
  • hitchhikers?
  • Life the Uni and Every
  • life meaning and flower
  • common
  • douglas adams
  • a?
  • maiden
  • lotr no #
  • Adams question
  • Hitchhiker’s Guide
  • answer?
  • question
  • Life Meaning
  • adams
  • life universe everything
  • the number
  • towel
  • typical
  • The Usual
  • How many roads must a man walk down?
  • Life, the universe, and everything
  • What is the meaning of life, the universe and all?

Would you care to guess what password the naive, young me used for Adobe?

Next steps

The specified bucket is not S3 v2 safe

I ran into this error when running ec2-upload-bundle:
The specified bucket is not S3 v2 safe (see S3 documentation for details)

This was due to uppercase letters or underscores. Later I also ran into an issue with periods in bucket names, which showed up as this error message:
ERROR: Error talking to S3: Server.AccessDenied(403): Access Denied

Here is an easy command to sanitize the bucket names:

It will lowercase all letters and convert all punctuation to dashes.
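The same transformation can be sketched in Ruby. This is an equivalent written for illustration, not the original command, and sanitize_bucket_name is a hypothetical helper name:

```ruby
# Lowercases the name, then turns every character that is not a lowercase
# letter or a digit (underscores, periods, other punctuation, spaces) into
# a dash. Existing dashes are left as they are.
def sanitize_bucket_name(name)
  name.downcase.tr("^a-z0-9", "-")
end

puts sanitize_bucket_name("My_Backup.Bucket") # prints "my-backup-bucket"
```

Note that this only covers the character problems described above; it does not check other S3 naming rules such as length limits.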

IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

I’m teaching a hands-on lab at Information on Demand 2013. I will edit the post to include lab materials closer to the date.

Session: IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
Time: Thu, 7/Nov, 10:00 AM – 01:00 PM
Location: Mandalay Bay South Convention Center – Shorelines B Lab [Room 15]

First step

Please request a lab environment. We will use a Hadoop environment hosted in the cloud. Each attendee will be provided with a personal environment.

Lab materials

Machine learning with Mahout and Hadoop session

Tonight I attended a session about machine learning with Mahout at BNotions. The session was organized through the Toronto Hadoop User Group.
Quick Notes
  • BNotions uses Hadoop and Mahout for their Vu mobile app. Vu is a smart news reader that recommends articles based on article similarity to things you like as well as user similarity to you.
  •  Graph theory and graph processing algos are helpful for this work.
  •  Likes, dislikes, reads, skips are the most important input for their machine learning. Also relevant: user preference for breadth of topics vs depth; recency; natural language processing to extract topic keyword and organize topics by similarity.
  •  Redis is used for transient storage. It has some useful ops above just key-value. They use S3 as a data warehouse, but it could just as easily be HDFS.
  •  They use Amazon EMR as the Hadoop cluster. EMR constrains technology choice. For example, harder to use HDFS, hence Redis instead. They are evaluating HBase as an alternative — performance differences not relevant for use case.
  •  They don’t currently adjust for article length as factor in recommendations.
  •  They use a third-party API for NLP, not Hadoop specifically. Only once per article, so not a bottleneck yet. Not happy with NLP quality, though.
  •  Cascalog/JCascalog to query the Hadoop data using Scala.
  •  Scalability is limited by cost, not capability. May switch from EMR to a dedicated cluster, etc. as cost grows.
  •  Data science 10%, engineering 90%. Stock algos for rapid application development, tweak after. Deployment (my own specialty!) can be painful.
  •  Service-oriented architecture (SOA) helps with deployment. Simplifies components, but adds a devops layer. Jenkins is used to automate builds.