Update OpenLDAP SSL certificate on CentOS 6

You may need to update your OpenLDAP SSL certificate, as well as the CA certificate and signing key on a regular basis. I ran into an issue that was ultimately resolved by doing that.

Connections to an OpenLDAP server I administer stopped working with this error:

ldap_sasl_bind(SIMPLE): Can't contact LDAP server (-1)

The server itself was up and the relevant ports were accessible. In fact, unencrypted LDAP continued to work while LDAPS saw the error above.

I restarted slapd with the -d 255 flag (-d 8 is sufficient for this error) and started seeing this error:

TLS: error: could not initialize moznss security context - error -5925:The one-time function was previously called and failed. Its error code is no longer available TLS: can't create ssl handle.

At the start of the log, I saw several related errors including this one:

... is not valid - error -8181:Peer's Certificate has expired..

Ultimately this meant that I had to replace not just my certificate but also the CA certificate and the signing key in OpenLDAP’s moznss database. I believe my CA’s certificate had to be replaced because of the SHA1 retirement last year.
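Before replacing anything, it helps to confirm that the certificate really has expired. Here is a minimal sketch using Python's standard ssl module; the notAfter timestamp below is a hypothetical example, not my actual certificate's:

```python
import ssl
import time

def cert_expired(not_after):
    """True if an X.509 notAfter timestamp, in the format OpenSSL and
    Python's ssl module print it, is already in the past."""
    return ssl.cert_time_to_seconds(not_after) < time.time()

# hypothetical notAfter value from an expired certificate
print(cert_expired("Jun  1 12:00:00 2016 GMT"))  # True
```

You can get the notAfter value for a PEM certificate with openssl x509 -enddate -noout -in certificate.crt.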

The steps I had to follow were surprisingly involved and undocumented:

  • Upload the new certificates to /etc/openldap/ssl
  • cd /etc/openldap/certs
  • List the existing certificates in the database:
certutil -L -d .
  • Remove the existing certs:
certutil -D -d . -n "OpenLDAP Server"

certutil -D -d . -n "My CA Certificate"
  • Load the new OpenLDAP SSL certificate and CA certificate:
certutil -A -n "OpenLDAP Server" -t CTu,u,u -d . -a -i ../ssl/my_certificate.bundle.crt

certutil -A -n "My CA Certificate" -t CT,C,c -d . -a -i ../ssl/my_CA_certificate.intermediate.crt
  • Verify:
certutil -L -d .
  • Convert the key to pkcs12 format:
openssl pkcs12 -export -out ../ssl/my_certificate.key.pkcs12 -inkey ../ssl/my_certificate.key -in ../ssl/my_certificate.bundle.crt -certfile ../ssl/my_CA_certificate.intermediate.crt
  • Import the signing key:
pk12util -i ../ssl/my_certificate.key.pkcs12 -d .

# Database password is in /etc/openldap/certs/password

# Key password is what you set above
  • Restart slapd:
service slapd restart

I hope that’s enough to help anyone facing the same problem on CentOS, RHEL, Fedora, and possibly other distros.


Fix full Ubuntu /boot partition

Linux kernel images are stored on a separate partition mounted under /boot. This partition can fill up, at which point you can no longer install any software updates.

Ubuntu (and possibly Debian and Mint) has a command called purge-old-kernels that helps prevent you from ever getting into that situation. Similarly, RHEL/CentOS/Fedora have a command called package-cleanup.

However, if your /boot partition is already full, purge-old-kernels won’t work. You will need to run something like the following:

dpkg --list 'linux-image*' | cut -d' ' -f3 | grep linux-image | grep -v "$(uname -r)" | grep "[0-9]" | xargs dpkg -r --force-depends

apt-get -fy install

purge-old-kernels -y
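The filtering logic in that dpkg pipeline can be sketched in Python: keep only versioned linux-image packages that do not belong to the running kernel. The package names below are hypothetical examples:

```python
import re

def kernels_to_remove(installed, running):
    """Mimic the dpkg/grep pipeline above: select versioned linux-image
    packages that do not match the running kernel release."""
    return [pkg for pkg in installed
            if pkg.startswith("linux-image")
            and re.search(r"[0-9]", pkg)   # only versioned kernel packages
            and running not in pkg]        # never remove the running kernel

# hypothetical package list
installed = [
    "linux-image-4.4.0-21-generic",
    "linux-image-4.4.0-24-generic",
    "linux-image-generic",
]
print(kernels_to_remove(installed, "4.4.0-24-generic"))
# -> ['linux-image-4.4.0-21-generic']
```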



Enable better git diffs on the Mac

I am pretty excited about the release of git 2.9. It brings several new features that make reviewing changes easier: better change grouping, and highlighting of individual changed words. Everyone should set these configuration options to enable better git diffs.

Upgrade your git

Before you can enable the new settings, you have to upgrade your git installation.

If you already have git installed through homebrew, you can upgrade it as follows:

brew update && brew upgrade

If you do not have git installed through homebrew, you’ll want to override your ancient Mac git by installing it as follows:

brew install git

Enable better git diffs

Once you have upgraded your git, you can put the new configuration in place.

The first major change is an improvement to how git groups changes in a diff. When you add a new block of code, git is now more likely to show the whole block as a single addition, rather than misinterpreting it as an insertion that splits an existing block in two.

Bad change grouping in old git
Enable better git diffs by configuring git to group changes together

The second change is the addition of more places for you to hook in the diff-highlight utility.

diff-highlight post-processes your diffs to add more highlighting to the specific changes between two lines when you just change a few words in a line.

Enable better git diffs by integrating diff-highlight utility to highlight individual word changes

You can enable all of these by running the following commands in your terminal:

git config --global diff.compactionHeuristic 1

git config --global pager.log "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"
git config --global pager.show "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"
git config --global pager.diff "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight | less"

git config --global interactive.diffFilter "`brew --prefix`/share/git-core/contrib/diff-highlight/diff-highlight"

The configuration will persist in a ~/.gitconfig file.
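Assuming Homebrew's default prefix of /usr/local, the commands above produce a ~/.gitconfig fragment like this:

```
[diff]
    compactionHeuristic = 1
[pager]
    log = /usr/local/share/git-core/contrib/diff-highlight/diff-highlight | less
    show = /usr/local/share/git-core/contrib/diff-highlight/diff-highlight | less
    diff = /usr/local/share/git-core/contrib/diff-highlight/diff-highlight | less
[interactive]
    diffFilter = /usr/local/share/git-core/contrib/diff-highlight/diff-highlight
```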

You should now have git diffs that are easier to read and compare. I certainly appreciate the changes.


656x Faster JSON Parsing in Python with ijson

I profiled my team’s Python code and identified a performance bottleneck in JSON parsing. At two points, the code used the ijson package in a naive way that slowed down terribly for larger JSON files. It’s possible to achieve much faster JSON parsing by changing just one line of code.

Data Scientist Workbench

My team builds Data Scientist Workbench, which is a free set of tools for doing data science. It includes Jupyter and Zeppelin interactive notebooks as well as R Studio IDE, all pre-configured to work with the Spark parallel data processing framework.

Behind the scenes, Data Scientist Workbench is composed of microservices. Some of them are built in Ruby, some in Node, and some in Python.

Faster JSON Parsing

Faster JSON parsing is possible in Python

JSON (JavaScript Object Notation) is a convenient format for serializing data. It originated as a subset of JavaScript. Most languages have several libraries for reading and writing JSON.

ijson is a great library for working with JSON files in Python. Unfortunately, by default it uses a pure Python JSON parser as its backend. Much higher performance can be achieved by using a C backend.

These are the available backends:

  • yajl2_cffi: wrapper around YAJL 2.x using CFFI; this is the fastest.
  • yajl2: wrapper around YAJL 2.x using ctypes, for when you can’t use CFFI for some reason.
  • yajl: deprecated YAJL 1.x + ctypes wrapper, for even older systems.
  • python: pure Python parser, good to use with PyPy.

Assuming you have yajl2 installed, switching from the slow, pure Python parser to a faster JSON parser written in C is a matter of changing this line:

import ijson

To this line:

import ijson.backends.yajl2 as ijson

All other code is the same.
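If you want your code to degrade gracefully when the C backend isn’t installed, you can pick the fastest importable backend at runtime. This is a sketch using the standard importlib module; the only assumption is the list of ijson backend module paths given above:

```python
import importlib

def best_backend(candidates):
    """Return the first importable module from candidates,
    which should be ordered fastest to slowest."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # backend not installed; try the next one
    raise ImportError("no backend from %r is installed" % (candidates,))
```

For ijson, that would look like: ijson = best_backend(["ijson.backends.yajl2_cffi", "ijson.backends.yajl2", "ijson.backends.yajl", "ijson"]).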

Installation of yajl2

Before you can use yajl2 as a faster JSON parsing backend for ijson, you have to install it.

On Ubuntu, you can install it as follows:

apt-get -qq update
apt-get -y install libyajl2 libyajl-dev
pip install yajl-py==2.0.2

On the Mac, you can install it as follows:

brew install yajl
pip install yajl-py==2.0.2

Performance Micro-benchmark

Other people have benchmarked ijson before me.

I did see a huge performance improvement with a specific 4MB JSON file (a Jupyter notebook), so it makes sense to measure that specifically.

Here’s the very simple code that I will use to measure the performance of parsing JSON with ijson:


import ijson

# Do this 10 times
for i in range(0, 10):
    print "Starting parse #%i" % (i)
    json = ijson.parse(open('4MB.ipynb', 'r'))
    for prefix, event, value in json:
        pass  # consume every event; the parsing itself is the work

The result of the first run:

$ time python test.py
Starting parse #0
Starting parse #1
Starting parse #2
Starting parse #3
Starting parse #4
Starting parse #5
Starting parse #6
Starting parse #7
Starting parse #8
Starting parse #9

real    20m52.592s
user    14m37.860s
sys    6m6.768s

After changing to yajl2 as the parser:

$ time python test.py
Starting parse #0
Starting parse #1
Starting parse #2
Starting parse #3
Starting parse #4
Starting parse #5
Starting parse #6
Starting parse #7
Starting parse #8
Starting parse #9

real    0m1.910s
user    0m1.784s
sys    0m0.085s

That’s roughly 656x faster!

I should mention that the JSON I’m parsing contains 4MB of escaped JSON represented as a string within actual JSON, so it may be an unusually bad case for the pure Python parser.
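For illustration, here is a tiny sketch of that kind of JSON-in-JSON document, built with the standard json module. The structure is a made-up example, not my actual notebook:

```python
import json

inner = {"cells": [{"source": "print(1)"}]}
# store an entire document as an escaped string inside another JSON document
outer = json.dumps({"notebook": json.dumps(inner)})

print(outer)  # the inner document appears as one long escaped string

# a parser has to scan that whole string as a single token,
# then parse it a second time to get at the nested structure
recovered = json.loads(json.loads(outer)["notebook"])
assert recovered == inner
```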

Preserve bash history across multiple terminals and sessions

I use the bash command line on my Mac a lot. I typically have multiple tabs with multiple terminal panes open in iTerm2, often with multiple ssh sessions running. By default, the last terminal session to close trashes the bash history of all the other sessions. Is it possible to configure the terminal to preserve bash history?

Terminal windows with multiple panes to preserve bash history

Preserve bash history

It’s actually fairly straightforward to preserve the history.

Open up your ~/.bash_profile configuration file in an editor of your choice such as nano. Once you have it open, add these lines at the end:

# Maximum number of history lines in memory
export HISTSIZE=50000
# Maximum number of history lines on disk
export HISTFILESIZE=50000
# Ignore duplicate lines
export HISTCONTROL=ignoredups:erasedups
# When the shell exits, append to the history file 
#  instead of overwriting it
shopt -s histappend

# After each command, append to the history file 
#  and reread it
export PROMPT_COMMAND="${PROMPT_COMMAND:+$PROMPT_COMMAND$'\n'}history -a; history -c; history -r"

Save the file and exit. In order for your configuration change to take effect, you will need to reload the configuration in all your open terminal sessions:

source ~/.bash_profile

This configuration change has to be done per user per machine.

Backup your bash configuration

I use mackup together with Dropbox to keep my bash and other command line configuration files backed up. This makes it easy to transfer your command line configuration to a new primary machine.


Preserve bash history in iTerm2

iTerm2 is my terminal of choice on the Mac. It has great tab and pane management accessible via both keyboard and mouse, and some subtle quality-of-life features.

For example, if you ssh somewhere, it sets the tab title to the hostname of the remote machine, or the name of the local directory.


One drastic alternative would be to migrate from Bash to an alternate shell like Fish or, in the future, the Next Generation Shell (NGS).

No meta description has been specified

In the past year, I started checking my blog posts against the Yoast SEO plugin for WordPress. It provides great suggestions for improving readability and quality of writing, as well as making blog posts friendlier for search engines and social network sharing. It also has a cryptic suggestion about “no meta description”.

Suggestions including meta description warning

Here is the cryptic suggestion in full:

No meta description has been specified, search engines will display copy from the page instead.

The reason it’s cryptic is that setting the meta description for a post is neither part of WordPress core functionality, nor something Yoast visibly exposes in its own extended interface. There are WordPress plugins of questionable provenance for setting post meta descriptions, but installing small WordPress plugins only increases your blog’s attack surface for hackers.

Since I blog infrequently, I end up searching every time for what Yoast wants me to do to satisfy and dismiss the warning.

How to set meta description

It turns out that the ability to set a meta description is built into Yoast. It is simply hidden behind the seemingly unrelated “Edit Snippet” button. If you click on it, you can set the meta description and dismiss the stern warning.

Click on Edit Snippet to set the meta description


One of the reasons I started using Yoast SEO is that my colleague Antonio Cangiano has written a blog and a book about Technical Blogging. I read it some time ago on my Kindle and found it a pretty good read.

Book about Technical Blogging, including things like meta description


Datathon For Diabetes in Boston

This weekend Brandon and I are at the Datathon for Diabetes in Boston. It starts tonight at 5 and goes all day Saturday. The goal is to use publicly available data to generate an insightful and innovative analysis of diabetes in the United States and abroad.

Datathon for Diabetes

FitBit Charge HR prize at Datathon for Diabetes

We’re sponsoring a prize for the team that makes best use of Data Scientist Workbench in their solution. Novo Nordisk and Deloitte are also sponsoring a prize each.

Our prize consists of a FitBit Charge HR for each member of the winning team.

I think it’s worthwhile to learn and apply Spark as a tool to the problem of diabetes. Spark is an open source framework that lets you run your data analysis in parallel on multiple machines, for speed and the ability to work with large amounts of data.

Data Scientist Workbench has Spark ready to use with Python, Scala, and R in Jupyter, Zeppelin, and R Studio IDE.

If you run into trouble at the datathon, come up and ask me any question you like. I’ll be there for the duration as a mentor. As always, if you run into a Data Scientist Workbench issue, you should also open a support ticket.

Open a Data Scientist Workbench support ticket for any issues at Datathon for Diabetes

Other events

May 11-12 is Datapalooza Beijing and May 19 is Datapalooza Denver. Also, Big Data University is now posting events on its Facebook page.

Datathon for Diabetes, Boston

Ottawa: Data Day 3.0 at Carleton

On Tuesday March 29, I’ll be demoing Data Scientist Workbench (DSWB) at Data Day 3.0 for the Carleton University Institute of Data Science.

I’m in Ottawa the weekend before, so feel free to ping me and connect. I’m on Twitter as @leonsp.

Data Scientist Workbench

DataScientistWorkbench.com hosts open source data science tools for you for free. The tools include Jupyter and Zeppelin notebooks for developing and documenting your algorithms, R Studio IDE for focusing on your R code, and OpenRefine for cleaning your data.

Data Day 3.0 takes place at Carleton University

The event is organized by the Carleton University Institute of Data Science in Ottawa. It runs from 8am to 3:30pm in the River Building. You can find more details on their event page.


Spark Summit East 2016

Next week I’ll be demoing Data Scientist Workbench at Spark Summit East (official site) in New York. Polong Lin will be there with me. Come by the expo floor next Wednesday and Thursday and chat with us.

Data Scientist Workbench is what my team builds. It hosts open source data science tools like Jupyter, OpenRefine, R Studio IDE, Zeppelin and others for you. There’s exciting stuff in the changelog every week.

I signed up in time to get into a training session at Spark Summit East, so I’ll be spending my Tuesday working with the Wikipedia data sets. In today’s industry jargon, I’m more of a data engineer than a data scientist, so I’m hoping my Spark skills are up to the level needed for the advanced course.

This week I’m at Datapalooza Seattle, which is a good opportunity to brush up and expand those same Spark skills. In fact, we just posted the Day 1 challenge for Datapalooza. If you’re following along at home, fire up your Data Scientist Workbench, open a Jupyter notebook, and give it a try.

Spark Summit East


Datapalooza Seattle on Feb 9-11

On February 9 through 11, I’ll be mentoring hackers and budding data scientists at Galvanize during Datapalooza Seattle. It should be a great conference covering topics like machine learning, natural language processing, and data engineering infrastructure.

Last year’s Datapalooza in San Francisco was a fantastic event with lots of in-depth sessions. I was impressed with the range of material on data science and data engineering. The upcoming Datapalooza Seattle looks equally fascinating.

My team at work runs Data Scientist Workbench, which is a free hosted suite of open source tools including Jupyter, Zeppelin, R Studio IDE, and OpenRefine. We also organize free data science education through Big Data University.

I’m expecting Antonio Cangiano, Polong Lin, and Leon Katsnelson to be at Datapalooza with me as fellow mentors.

Let me know if you’re in Seattle at the same time and we’ll connect.

Datapalooza Seattle