Python Library of the Day: retrying

I’ve learned through extensive experience that Bash is the wrong choice for anything longer than a few lines. I needed to write a command-line app, so I put one together in Python (Python 3, of course, as Python 2 is going away in 2020). In the process I discovered a new-to-me Python library called retrying.

If you want to learn Python, check out the Python for Data Science course on Cognitive Class.


I needed my Python code to repeat a bunch of operations until they succeeded. It’s easy to write a naive loop for that, but the logic gets convoluted and makes the actual operation ugly to look at. By the time you do something three times over, you should automate.
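To illustrate, here is a sketch of such a naive hand-rolled loop (the `unreliable_fetch` function is hypothetical, standing in for whatever operation needs retrying):

```python
import time

def unreliable_fetch(_attempts={"count": 0}):
    # Hypothetical operation that fails twice before succeeding
    _attempts["count"] += 1
    if _attempts["count"] < 3:
        raise IOError("transient failure")
    return "payload"

# The hand-rolled retry loop: the actual operation is buried in bookkeeping
result = None
for attempt in range(5):
    try:
        result = unreliable_fetch()
        break
    except IOError:
        time.sleep(0.01 * 2 ** attempt)  # crude exponential backoff
else:
    raise RuntimeError("gave up after 5 attempts")

print(result)  # payload
```

Every call site that needs retries ends up duplicating this scaffolding.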

You can of course write an abstraction yourself, but for this sort of common problem it is best to use an existing library.

XKCD comic on automation

The benefit of using an existing library is not just that someone else maintains it, but also that you gain the collective wisdom and experience of everyone else using it. Computing is full of strange edge cases and unexpected security holes, which are harder to avoid when rolling your own abstraction.

For my purpose, I found a Python library called retrying. It provides a simple decorator called @retry that you can apply to any function or method. The decorator takes optional parameters so you can configure all the timeouts, retry intervals, exponential backoff, and smart exception handling that you want.

Kudos to everyone working on the library. It’s a great little tool.

Kudos to Firefox team on Quantum release

The new Firefox Quantum release is incredibly fast. It feels faster than Chrome, faster than old Firefox, and faster than every other browser on my MacBook.

Impressively, despite Firefox ditching the old extension model, all my extensions continue to work. I did have to manually reinstall the indispensable Tree Style Tab, but it works, and Firefox remains remarkably speedy.

Kudos on the great effort!

My Firefox Extensions

Cryptocurrency and irreversible transactions

There’s a current news story about a wallet blunder freezing up $280,000,000 of Ether, a cryptocurrency. I try to avoid posting too much opinion on my blog, but I do have a view on this.


A cryptocurrency like Bitcoin or Ether is based on the idea of unbreakable contracts and irreversible transactions. This is great in many contexts, but somewhat scary to me as a consumer, should I ever choose to pay for something using a cryptocurrency.

If you want to know more about cryptocurrency and Blockchain, you should check out the Blockchain Essentials course on Cognitive Class.

Mostly Harmless

I think this Douglas Adams parable about the design problem of un-openable windows applies to many things in tech, including cryptocurrency:

…all the windows in the buildings were built sealed shut. This is true.

While the systems were being installed, a number of people who were going to work in the buildings found themselves having conversations with Breathe-o-Smart systems fitters which went something like this:

“But what if we want to have the windows open?”

“You won’t want to have the windows open with new Breathe-o-Smart.”

“Yes but supposing we just wanted to have them open for a little bit?”

“You won’t want to have them open even for a little bit. The new Breathe-o-Smart system will see to that.”


“Enjoy Breathe-o-Smart!”

“OK, so what if the Breathe-o-Smart breaks down or goes wrong or something?”

“Ah! One of the smartest features of the Breathe-o-Smart is that it cannot possibly go wrong. So. No worries on that score. Enjoy your breathing now, and have a nice day.”

It was, of course, as a result of the Great Ventilation and Telephone Riots of SrDt 3454, that all mechanical or electrical or quantum-mechanical or hydraulic or even wind, steam or piston-driven devices, are now required to have a certain legend emblazoned on them somewhere. It doesn’t matter how small the object is, the designers of the object have got to find a way of squeezing the legend in somewhere, because it is their attention which is being drawn to it rather than necessarily that of the user’s.

The legend is this:

“The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair.”

Integrate your Rails app with Open edX SSO and OAuth2

Earlier this year, I put together the omniauth-cognitiveclass gem for integrating Rails apps with Open edX SSO (single sign-on). Open edX is an open-source platform for running massive open online courses (MOOCs). Ruby on Rails is a web application framework. I develop the services and infrastructure for IBM Cognitive Class, which is partly based on Open edX.

Cognitive Class

Cognitive Class has a whole bunch of learning paths and courses covering topics like data science, deep learning, and machine learning. My role on the team is to architect and develop the hands-on labs environment, which I think is one of the best in the industry. We provision a full suite of industry tools on demand for any student looking to do data science exercises.

Open edX

I’m assuming that your Open edX online course system is set up as an OAuth2 authentication provider.

Ruby on Rails

Behind the scenes at Cognitive Class, we use a mix of micro-services and web applications built in Ruby, Python, and Node.js to manage the infrastructure. Rails is a great framework for creating web services or web applications.

The usual way to add authentication to Rails is using the devise and omniauth gems.

Open edX SSO

The omniauth-cognitiveclass gem is a plugin that extends omniauth to support Open edX SSO with OAuth2 as an authentication provider. I’ve deployed it in production with Cognitive Class, but it should work for any Open edX installation. Let me know if you run into any issues.



Brace expansion to match multiple files in Bash

Bash has handy brace expansion powers that I’ve belatedly discovered.

$ echo I love hippo{griffs,potamuses,dromes}
I love hippogriffs hippopotamuses hippodromes

For example, you can quickly diff a file with and without a suffix:

$ echo diff .env{,.example}
diff .env .env.example

Or tail multiple log files:

$ echo tail -f /var/log/{messages,secure}
tail -f /var/log/messages /var/log/secure

Bash brace expansion can do other things too, such as specifying a range with the .. operator.

Command line client for Sentry (Bash)

Sentry is a great error aggregation service. We use it for every service we deploy at work. It lets us monitor errors and troubleshoot incidents. It also integrates nicely with Slack, a messaging tool we use for everything.

It integrates nicely with JavaScript, Ruby, and Python stacks, among others, but as a RESTful service you can also access it directly from the command line.

Sometimes, you start writing a Bash shell script that grows so much that you need to start logging errors in a central error aggregation service. Frankly, that’s a sign that you should have picked a different language for the initial implementation, whether Python, Ruby, or something else more robust. However, once you have such a Bash script, porting it can be as much work as instrumenting it.

Bash client for Sentry

A quick Google search turns up more feature-complete attempts at a command line client for Sentry. You may want to use one of those instead.

Still, for posterity, here’s a reconstruction of something I used to instrument such a Bash script last year (the Sentry host and store URL are derived from the DSN; adjust them if your setup differs):

# Install dependencies on Alpine Linux
apk --no-cache add --virtual build-dependencies gcc python-dev musl-dev
pip2 install httpie

# Trim leading and trailing whitespace from a string
trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"
    echo -n "$var"
}

# Transform a Sentry DSN (https://key:secret@host/project_id)
# into its useful components
SENTRY_DSN=$(trim "${SENTRY_DSN:-}")
SENTRY_KEY="$(echo "$SENTRY_DSN" | sed -E "s@^.*//(.*):.*@\1@g")"
SENTRY_SECRET="$(echo "$SENTRY_DSN" | sed -E "s@^.*:(.*)\@.*@\1@g")"
SENTRY_HOST="$(echo "$SENTRY_DSN" | sed -E "s@^.*\@([^/]*)/.*@\1@g")"
SENTRY_PROJECT_ID="$(echo "$SENTRY_DSN" | sed -E "s@^.*/([0-9]*)@\1@g")"

# Bash function to report errors to Sentry
# Usage:
# report_error "${FUNCNAME[0]}:$LINENO" "Uh oh, spaghettios!"
report_error() {
  [[ -z "${SENTRY_DSN:-}" ]] && return

  declare culprit
  declare timestamp
  declare message
  declare x_sentry_auth
  declare referer
  declare body
  declare url
  declare content_type

  culprit="${1:?}"
  message="${2:?}"

  timestamp=$(date +%Y-%m-%dT%H:%M:%S)

  x_sentry_auth="X-Sentry-Auth:Sentry sentry_version=5, sentry_key=${SENTRY_KEY:?}, sentry_secret=${SENTRY_SECRET:?}"

  referer="Referer:https://${SENTRY_HOST:?}/"

  content_type="Content-Type: application/json"

  # Sentry's store endpoint, derived from the DSN
  url="https://${SENTRY_HOST:?}/api/${SENTRY_PROJECT_ID:?}/store/"

  body=$(cat <<BODY
{
  "culprit": "${culprit:?}",
  "timestamp": "${timestamp:?}",
  "message": "${message:?}",
  "tags": {},
  "exception": [{
    "type": "BackupError",
    "value": "${message:?}",
    "module": "${BASH_SOURCE[0]}"
  }]
}
BODY
)

  echo "$body" | http POST "${url:?}" "${x_sentry_auth:?}" "${referer:?}" "${content_type:?}"
}



Update OpenLDAP SSL certificate on CentOS 6

You may need to update your OpenLDAP SSL certificate, as well as the CA certificate and signing key on a regular basis. I ran into an issue that was ultimately resolved by doing that.

Connections to an OpenLDAP server I administer stopped working with this error:

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

The server itself was up and the relevant ports were accessible. In fact, unencrypted LDAP continued to work while LDAPS saw the error above.

I restarted slapd with the -d 255 flag (-d 8 is sufficient for this error) and started seeing this error:

TLS: error: could not initialize moznss security context - error -5925:The one-time function was previously called and failed. Its error code is no longer available TLS: can't create ssl handle.

At the start of the log, I saw several related errors including this one:

... is not valid - error -8181:Peer's Certificate has expired..

Ultimately this meant that I had to replace not just my certificate but also the CA certificate and the signing key in OpenLDAP’s moznss database. I believe my CA’s certificate had to be replaced because of the SHA1 retirement last year.

The steps I had to follow were surprisingly involved and undocumented:

  • Upload the new certificates to /etc/openldap/ssl
  • cd /etc/openldap/certs
  • List the existing certificates in the database:
certutil -L -d .
  • Remove the existing certs:
certutil -D -d . -n "OpenLDAP Server"

certutil -D -d . -n "My CA Certificate"
  • Load the new OpenLDAP SSL certificate and CA certificate:
certutil -A -n "OpenLDAP Server" -t CTu,u,u -d . -a -i ../ssl/my_certificate.bundle.crt

certutil -A -n "My CA Certificate" -t CT,C,c -d . -a -i ../ssl/my_CA_certificate.intermediate.crt
  • Verify:
certutil -L -d .
  • Convert the key to pkcs12 format:
openssl pkcs12 -export -out ../ssl/my_certificate.key.pkcs12 -inkey ../ssl/my_certificate.key -in ../ssl/my_certificate.bundle.crt -certfile ../ssl/my_CA_certificate.intermediate.crt
  • Import the signing key:
pk12util -i ../ssl/my_certificate.key.pkcs12 -d .

# Database password is in /etc/openldap/certs/password

# Key password is what you set above
  • Restart slapd:
service slapd restart

I hope that’s enough to help anyone facing the same problem on CentOS, RHEL, Fedora, and possibly other distros.

See Also

Fix full Ubuntu /boot partition

Linux kernel images are often stored on a separate partition mounted under /boot. This partition can fill up, at which point you can no longer install any software updates.

Ubuntu (and possibly Debian and Mint) has a command called purge-old-kernels that helps to prevent you from ever getting in that situation. Similarly, RHEL/CentOS/Fedora have a command called package-cleanup.

However, if your /boot partition is already full, purge-old-kernels won’t work. You will need to run something like the following:

dpkg --list 'linux-image*' | cut -d' ' -f3 | grep linux-image | grep -v "$(uname -r)" | grep "[0-9]" | xargs dpkg -r --force-depends

apt-get -fy install

purge-old-kernels -y



Enable better git diffs on the Mac

I am pretty excited about the release of git 2.9. It brings several new features that make reviewing changes easier and more sensible.  It has better change grouping, and it can highlight individual changed words. Everyone should set these configuration options to enable better git diffs.

Upgrade your git

Before you can enable the new settings, you have to upgrade your git installation.

If you already have git installed through homebrew, you can upgrade it as follows:

brew update && brew upgrade

If you do not have git installed through homebrew, you’ll want to override your ancient Mac git by installing it as follows:

brew install git

Enable better git diffs

Once you have upgraded your git, you can put the new configuration in place.

The first major change is an improvement to how git groups changes in a diff. When you add a new block of code, git is now likelier to show the whole block as one addition rather than misinterpreting it as an insertion that splits an existing block in two.

Bad change grouping in old git
Enable better git diffs by configuring git to group changes together

The second change is the addition of more places for you to hook in the diff-highlight utility.

diff-highlight post-processes your diffs to add more highlighting to the specific changes between two lines when you just change a few words in a line.

Enable better git diffs by integrating diff-highlight utility to highlight individual word changes

You can enable all of these by running the following commands in your terminal:

git config --global diff.compactionHeuristic 1

git config --global pager.log "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"
git config --global pager.show "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"
git config --global pager.diff "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"

git config --global interactive.diffFilter "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight"

The configuration will persist in a ~/.gitconfig file.

You should now have easier to read and compare git diffs. I certainly appreciate the changes.


656x Faster JSON Parsing in Python with ijson

I profiled my team’s Python code and identified a performance bottleneck in JSON parsing. At two points, the code used the ijson package in a naive way that slowed down terribly for larger JSON files. Much faster JSON parsing is possible with a one-line change.

Data Scientist Workbench

My team builds Data Scientist Workbench, a free set of tools for doing data science. It includes Jupyter and Zeppelin interactive notebooks as well as the RStudio IDE, all pre-configured to work with the Spark parallel data processing framework.

Behind the scenes, Data Scientist Workbench is composed of microservices. Some of them are built in Ruby, some in Node, and some in Python.

Faster JSON Parsing

JSON is a convenient format for serializing data. It derives from a subset of JavaScript syntax. Most languages have several libraries for reading and writing JSON.

ijson is a great library for working with JSON files in Python. Unfortunately, by default it uses a pure Python JSON parser as its backend. Much higher performance can be achieved by using a C backend.

These are the available backends:

  • yajl2_cffi: wrapper around YAJL 2.x using CFFI; this is the fastest.
  • yajl2: wrapper around YAJL 2.x using ctypes, for when you can’t use CFFI for some reason.
  • yajl: deprecated wrapper around YAJL 1.x using ctypes, for even older systems.
  • python: pure Python parser, good to use with PyPy.

Assuming you have yajl2 installed, switching from the slow, pure Python parser to a faster JSON parser written in C is a matter of changing this line:

import ijson

To this line:

import ijson.backends.yajl2 as ijson

All other code is the same.

Installation of yajl2

Before you can use yajl2 as a faster JSON parsing backend for ijson, you have to install it.

On Ubuntu, you can install it as follows:

apt-get -qq update
apt-get -y install libyajl2 libyajl-dev
pip install yajl-py==2.0.2

On the Mac, you can install it as follows:

brew install yajl
pip install yajl-py==2.0.2

Performance Micro-benchmark

Other people have benchmarked ijson before me.

I did see a huge performance improvement with a specific 4MB JSON file (a Jupyter notebook), so it makes sense to measure that specifically.

Here’s the very simple code that I will use to measure the performance of parsing JSON with ijson:


import ijson

# Parse the 4MB notebook ten times over
for i in range(0, 10):
    print("Starting parse #%i" % (i))
    json = ijson.parse(open('4MB.ipynb', 'r'))
    for prefix, event, value in json:
        pass  # consume every parsing event, discarding the values

The result of the first run:

$ time python
Starting parse #0
Starting parse #1
Starting parse #2
Starting parse #3
Starting parse #4
Starting parse #5
Starting parse #6
Starting parse #7
Starting parse #8
Starting parse #9

real    20m52.592s
user    14m37.860s
sys    6m6.768s

After changing to yajl2 as the parser:

$ time python
Starting parse #0
Starting parse #1
Starting parse #2
Starting parse #3
Starting parse #4
Starting parse #5
Starting parse #6
Starting parse #7
Starting parse #8
Starting parse #9

real    0m1.910s
user    0m1.784s
sys    0m0.085s

That’s about 656 times faster!

I should mention that the JSON I’m parsing contains 4MB of escaped JSON represented as a string within actual JSON, so it may be an unusually bad case for the pure Python parser.
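To make the shape of that pathological input concrete, here is a small sketch (using only the standard json module; the field names are made up) of JSON that carries escaped JSON as a single string value:

```python
import json

# Inner document: ordinary JSON
inner = json.dumps({"cells": list(range(5))})

# Outer document: the inner JSON embedded as one escaped string value,
# roughly the shape of the notebook I was parsing
outer = json.dumps({"source": inner})

# A parser must unescape the entire inner payload character by character
# just to hand back one long string
parsed = json.loads(outer)
print(parsed["source"])  # {"cells": [0, 1, 2, 3, 4]}
```

A streaming parser gains nothing from the inner structure, since the whole payload arrives as one string token.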