Blog

  • Circular dependency detected while autoloading constant

    I recently ran into this frustrating and intermittent error in Ruby on Rails 4 (JRuby, actually):
    Circular dependency detected while autoloading constant

    Googling turned up several articles advising one to abide by the Rails conventions, but that was not the issue.

    The application I’m writing uses background threads. The problem shows up when trying to instantiate a Rails controller in one of them. Rails autoloads constants lazily the first time they are referenced, and that autoloading is not thread-safe, so when multiple threads try to load the same class at the same time, there’s a race condition.

    The fix is to define a mutex for access control:

    $thread_mutex = Mutex.new

    And then use it in the threads when instantiating the controller, model, or any other autoloaded Ruby class:

    mc = nil
    $thread_mutex.synchronize do
      mc = MyController.new
    end
  • Adobe password breach as the world’s greatest crossword puzzle

    Adobe was recently breached and 150,000,000 user accounts were stolen. Adobe was following one of the worst practices of password storage: reversible encryption (rather than hashing with a salt using a good, slow algorithm like bcrypt). A very, very old throwaway password of mine was among those leaked.

    XKCD has referred to this breach as The Greatest Crossword Puzzle in the History of the World!

    It was bound to happen eventually. This data theft will enable almost limitless [xkcd.com/792]-style password reuse attacks in the coming weeks. There's only one group that comes out of this looking smart: Everyone who pirated Photoshop.

    With the help of LastPass’ Has Adobe Leaked My Password, let me illustrate why:

    The following hints have been used by other people that share your password. This information could be used to determine your password as well.

    • Life, Universe, Everything
    • life?
    • DA
    • h2g2
    • hitchiker’s guide to the galaxy
    • yes
    • meaningoflife
    • theusual
    • everything
    • hitchhiker
    • dolphins
    • gta
    • a4
    • answer
    • meaning?
    • life
    • the answer
    • the question of life
    • HGTTG
    • meaning of life
    • the usual
    • life..
    • life the universe and everything
    • a2lae
    • the ultimate
    • Hitchhiker
    • What’s the answer?
    • hitchhikers?
    • Life the Uni and Every
    • life meaning and flower
    • common
    • douglas adams
    • a?
    • maiden
    • lotr no #
    • Adams question
    • Hitchhiker’s Guide
    • answer?
    • question
    • Life Meaning
    • adams
    • life universe everything
    • HHGTTG
    • the number
    • towel
    • typical
    • The Usual
    • How many roads must a man walk down?
    • Life, the universe, and everything
    • What is the meaning of life, the universe and all?

    Would you care to guess what password the naive, young me used for Adobe?

    Next steps

  • The specified bucket is not S3 v2 safe

    I ran into this error when running ec2-upload-bundle:

    The specified bucket is not S3 v2 safe (see S3 documentation for details)

    This was due to uppercase letters or underscores in the bucket name. Later I also ran into an issue with periods in bucket names, which showed up as this error message:

    ERROR: Error talking to S3: Server.AccessDenied(403): Access Denied

    Here is an easy command to sanitize the bucket names:

    sanitized_name=$( echo "$name" | tr '[:upper:]' '[:lower:]' | tr '[:punct:]' '-' )

    It will lowercase all letters and convert all punctuation to dashes.
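
    If you sanitize names in more than one place, the same pipeline can be wrapped in a small function. This is only a sketch: the function name sanitize_bucket_name and the sample bucket name are made up for illustration, and it does nothing beyond the lowercase and punctuation-to-dash conversion described above.

```shell
# Hypothetical helper: same tr pipeline as above, wrapped in a function.
# The character classes are quoted so the shell cannot glob-expand them.
sanitize_bucket_name() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr '[:punct:]' '-'
}

# Example with a made-up bucket name containing uppercase letters,
# an underscore, and a period.
sanitized_name=$(sanitize_bucket_name "My_Bucket.Name")
echo "$sanitized_name"   # prints: my-bucket-name
```

    Note that runs of punctuation become runs of dashes, so you may still want to eyeball the result before creating the bucket.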

  • CN Tower Climb for United Way

    On October 22, I’ll be climbing the CN Tower stairs for United Way. Any contribution is appreciated.


  • IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop

    I’m teaching a hands-on lab at Information on Demand 2013. I will edit the post to include lab materials closer to the date.

    Session: IBD-3475A Crunch Big Data in the Cloud with IBM BigInsights and Hadoop
    Time: Thu, 7/Nov, 10:00 AM – 01:00 PM
    Location: Mandalay Bay South Convention Center – Shorelines B Lab [Room 15]

    First step

    Please request a lab environment. We will use a Hadoop environment hosted in the cloud. Each attendee will be provided with a personal environment.

    Lab materials

    [SlideShare presentation, id 27862820]

    [SlideShare presentation, id 27862878]

  • Machine learning with Mahout and Hadoop session

    Tonight I attended a session about machine learning with Mahout at BNotions. The session was organized through the Toronto Hadoop User Group.

    Quick Notes

    • BNotions uses Hadoop and Mahout for their Vu mobile app. Vu is a smart news reader that recommends articles based on article similarity to things you like as well as user similarity to you.
    • Graph theory and graph processing algos are helpful for this work.
    • Likes, dislikes, reads, and skips are the most important input for their machine learning. Also relevant: user preference for breadth of topics vs depth; recency; natural language processing to extract topic keywords and organize topics by similarity.
    • Redis is used for transient storage. It has some useful ops above just key-value. They use S3 as a data warehouse, but it could just as easily be HDFS.
    • They use Amazon EMR as the Hadoop cluster. EMR constrains technology choice. For example, it is harder to use HDFS, hence Redis instead. They are evaluating HBase as an alternative; performance differences are not relevant for their use case.
    • They don’t currently adjust for article length as a factor in recommendations.
    • They use a third-party API for NLP, not Hadoop specifically. It is called only once per article, so not a bottleneck yet. Not happy with NLP quality, though.
    • Cascalog/JCascalog to query the Hadoop data using Scala.
    • Scalability is limited by cost, not capability. May switch from EMR to a dedicated cluster, etc. as cost grows.
    • Data science 10%, engineering 90%. Stock algos for rapid application development, tweak after. Deployment (my own specialty!) can be painful.
    • Service-oriented architecture (SOA) helps with deployment. Simplifies components, but adds a devops layer. Jenkins is used to automate builds.
  • Have bash warn you about uninitialized variables with set -u

    By default, Bash treats uninitialized variables the same way Perl does: they expand to empty strings. If you want referencing one to be an error, more like Python, you can issue the following command in your Bash script:

    set -u

    You will then start seeing error messages like the following, and the script will exit at that point:

    ./my_script.sh: line 419: FOO_BAR: unbound variable

    Note that this means you can’t check for the non-existence of an environment variable with a simple [[ -z "$ENVIRONMENT_VARIABLE" ]], because expanding the unset variable is itself an error. Instead, you could do something like the following:

    [[ $( set | grep -c "^ENVIRONMENT_VARIABLE=" ) -lt 1 ]]
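
    A lighter-weight alternative, offered as a sketch rather than the only way to do it, is to lean on the shell's own parameter expansion, which stays safe under set -u. ${VAR+x} expands to "x" if VAR is set (even to an empty string) and to nothing if it is unset, while ${VAR:-fallback} substitutes a default when VAR is unset or empty. DEMO_VAR below is a made-up variable name used only for illustration.

```shell
set -u

unset DEMO_VAR   # hypothetical variable; make sure it starts out unset

# ${DEMO_VAR+x} never triggers the "unbound variable" error.
if [ -z "${DEMO_VAR+x}" ]; then
  state_before="unset"
else
  state_before="set"
fi

# An empty string still counts as "set" for the +x form.
DEMO_VAR=""
if [ -z "${DEMO_VAR+x}" ]; then
  state_after="unset"
else
  state_after="set"
fi

# ${DEMO_VAR:-fallback} supplies a default when the variable is unset or empty.
value="${DEMO_VAR:-fallback}"

echo "$state_before $state_after $value"   # prints: unset set fallback
```

    Which form you want depends on whether an empty value should count as "missing" in your script.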

     

  • Set PuTTY defaults, permanently

    PuTTY or one of its forks is a standard tool for administering Unix and Linux machines from Windows. It provides SSH connectivity for command line access, as well as keypair management for compatible programs like WinSCP.

    Unfortunately, PuTTY has some terrible defaults. For example, it limits itself to 200 lines of scrollback by default, which guarantees that you’ll lose some history in most SSH sessions.

    There’s a way to fix this and other defaults.

    First, load the “Default Settings” saved session.

    Then, configure the defaults as you like. For example, I’m increasing my lines of scrollback from 200 to 20,000.

    Then, save the new default settings.

    PuTTY will now have sensible defaults whenever you connect to a new server.

  • Hardening WordPress against the ongoing brute-force attack

    There’s an ongoing brute-force attack against WordPress and Joomla sites. The attack tries to brute-force the admin password. (Reddit)

    I had to harden my WordPress some time ago. Here are the guides I followed when hardening my installation:

    Additional steps I’ve taken today:

  • Alternatives to Gmail?

    Now that I’ve moved from Google Reader to Fever, I’d like to reduce my reliance on other Google services. Switching from Google search to Bing is pretty easy, but I’m on much less sure ground when it comes to replacing Gmail.

    Requirements:

    • Paid service (If you aren’t paying, you are the product, not the customer)
    • Search-driven interface
    • Reasonable limits on message and mailbox size

    I’ve heard of HushMail. Is there anything else worthwhile?

    Edit: HushMail is a no-go. It doesn’t have a way to set up a filter or rule to automatically file incoming mail.