
  • Join or union ranges from multiple tabs in Google Sheets

    A spreadsheet is the wrong tool for everything, but it’s also a very available tool. No results came up when I was searching for how to join data sets in different ranges in a Google Sheets or Excel worksheet, so here’s how you do it.

    Suppose you have these three data sets in a Google Sheets worksheet:

    Sheet 1     Sheet 2     Sheet 3
    --------    --------    --------
    Key         Key         Key
    Aardvark    Aardvark    Aaron
    Aardwolf    Koala       Abdullah
    Aaron       Zebra       Alice

    How do you join them and list the unique elements on a different sheet? You want something along the lines of this formula:

    =index(sort(unique({sheet1!A$2:A$4; sheet2!A$2:A$4; sheet3!A$2:A$4})),row()-1)

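    {range1; range2} stacks the contents of multiple ranges vertically into a single union.

    unique() eliminates duplicates, sort() orders the result, and index(range, i) returns the i-th element. Because the header occupies row 1, row()-1 evaluates to 1 in row 2, 2 in row 3, and so on, so filling the formula down lists one element per row.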

    The output should be something like this:

    Summary
    --------
    Key
    Aardvark
    Aardwolf
    Aaron
    Abdullah
    Alice
    Koala
    Zebra
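
    If you want to sanity-check the expected output outside the spreadsheet, the same union, de-duplicate, sort, and index steps can be sketched in shell. This is only an illustration, not a replacement for the formula: the three variables are hypothetical stand-ins for the tabs (one value per line, header row left out), and the pipeline is an analogy for what the Sheets functions do.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins for the three tabs: one value per line, no header row.
sheet1='Aardvark
Aardwolf
Aaron'
sheet2='Aardvark
Koala
Zebra'
sheet3='Aaron
Abdullah
Alice'

# {range1; range2; range3}: printf stacks the three ranges vertically.
# unique() and sort():      sort -u sorts and drops duplicates in one pass.
# LC_ALL=C pins the collation so the order does not depend on the locale.
summary=$(printf '%s\n' "$sheet1" "$sheet2" "$sheet3" | LC_ALL=C sort -u)

# index(range, i): sed -n 'Np' prints only the N-th line; here the 4th.
fourth=$(printf '%s\n' "$summary" | sed -n '4p')

printf '%s\n' "$summary"
echo "4th element: $fourth"
```

    The printed list should match the Summary column above, with Abdullah as the fourth element.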
  • Git commands for a clean history

    I recently saw a discussion about how very few people actually use the famous Gitflow workflow. This makes sense to me. It is relatively rare for software to have multiple simultaneously supported releases, because most software is written with one customer or use case in mind, and that use case will run the latest stable version. Your team is likely not working on Ruby, where you’ll want to release 2.5.5 and 2.6.2 at the same time. As a result, there won’t be any release branches the way Gitflow means it, and there won’t be any long-lived feature branches.

    I want to discuss something much lower level — how to keep your git history clean, which makes meaningful commit messages possible and gives you a changelog for free. That’s what the git workflow on my team is focused on. We accomplish this by amending one or very few commits while we work on a branch and then rebasing for the Pull Request.

    This is different from what I was accustomed to on my previous team, where I would continuously add new commits to a branch and then either merge or squash merge the Pull Request.

    The typical sequence of steps is to create a branch:

    git checkout -b <initials>/<branch-name>

    Create or modify some files, and then indicate your intent to add any new files (the command needs a path; it adds nothing without one):

    git add --intent-to-add <new-file>

    Examine the changes and add them interactively:

    git add --patch

    Initially create a new commit:

    git commit

    Later, amend the existing commit:

    git commit --amend

    Initially, push your branch to the repo:

    git push --set-upstream origin HEAD

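    Later, push the amended commit again. The --force-with-lease option refuses to overwrite the remote branch if someone else has pushed to it since you last fetched, so you won’t clobber other people’s changes: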

    git push --force-with-lease

    Finally, Rebase and Merge your pull request for that squeaky clean git history. PRs created this way have the added benefit of being easy to review, so your teammates will get back to you with meaningful feedback faster.

    I like to alias all of these commands for economy of typing:

    git config --global alias.an 'add --intent-to-add'
    git config --global alias.ap 'add --patch'
    git config --global alias.ca 'commit --amend'
    git config --global alias.co checkout
    git config --global alias.pfwl 'push --force-with-lease'
    git config --global alias.pu 'push --set-upstream origin HEAD'
    git config --global alias.st status
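
    To see the whole amend-and-force-push loop in one place, here is a minimal sketch that runs it against a throwaway repository. Everything specific in it is an assumption made for the demonstration: the scratch directories, the branch name xy/demo-branch, the file notes.txt, and the commit identity. A plain git add stands in for the interactive git add --patch, and -m and --no-edit are used only so that no editor opens.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch locations: a bare repo plays the role of the shared remote.
work=$(mktemp -d)
git init --quiet --bare "$work/origin.git"
git init --quiet "$work/clone"
cd "$work/clone"
git config user.name "Example Dev"
git config user.email "dev@example.com"
git remote add origin "$work/origin.git"

# Create the branch (initials/branch-name).
git checkout --quiet -b xy/demo-branch

# First pass: new file, intent-to-add, stage, commit, push.
echo "first draft" > notes.txt
git add --intent-to-add notes.txt
git add notes.txt                       # stands in for interactive `git add --patch`
git commit --quiet -m "Add notes"
git push --quiet --set-upstream origin HEAD

# Later: more work goes into the same commit instead of a new one.
echo "second draft" > notes.txt
git add notes.txt
git commit --quiet --amend --no-edit    # --no-edit keeps the existing message
git push --quiet --force-with-lease

# The branch still has a single commit, and the remote matches it.
commit_count=$(git rev-list --count HEAD)
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$work/origin.git" rev-parse refs/heads/xy/demo-branch)
echo "commits on branch: $commit_count"
```

    However many times you amend and push, the branch keeps a single commit, which is what makes the final Rebase and Merge produce a clean history.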

  • Contributing to open source at IBM Skills Network in 2019

    In 2019, my team embraced the philosophy that any library or package we create should be open sourced with a permissive license in a public source repo. This is not just the right thing to do; it’s also a more productive approach. It’s the right thing to do because we get to collaborate effectively with the larger development community and benefit from public tools like GitHub’s neat dependency update bot. It’s more productive because we avoid maintaining a shadow infrastructure of private package repos.

    Some of these libraries are very minor, but the point is not to open source only perfect, polished, popular things. If you only open source perfection, you’ll open source nothing. Perfect is the enemy of good.

    Of all our open source projects, the one that impresses me the most is a dashboard for monitoring a Gluster storage cluster that some very brilliant people on my team put together.

    We also open sourced a Ruby client for Kubernetes. Its distinguishing feature is that it can update, among other things, Pods, Services, Deployments, Endpoints, and Ingresses.

    Another thing we released publicly was our extensible backup framework, Backwork. We deploy lots of little, inconsequential databases and files that we nevertheless need to be able to back up with one tool rather than memorizing the syntax of the MySQL backup tool for one thing, the MongoDB backup tool for another, and so on.

    We use JupyterLab a lot in our Skills Network Labs environment that lets learners learn Python, machine learning, and other technologies. To that end, we’ve put together several plugins.

    I also open sourced an older Ruby on Rails library I put together when I needed to write several REST services in quick succession. It catches common Rails exceptions and transforms them into an appropriate RESTful status code (401, 404, 422, etc.).

    I expect there are a lot more contributions coming in 2020, including several tools for the Open edX ecosystem for online courses.

  • ISOC is selling .org to a private equity firm

    The Internet Society is selling the .org registry to Ethos Capital, a private equity firm. Such .org domains are typically used by non-profits, the open source community, and — for ineffable reasons — my blog.

    The function of a registry is to grant monopolies over specific domain names for a year. It can set a new price for renewal every year: say, $10,000/year instead of $10/year. If I did not renew my domain name when it expires, then all the existing links to my blog would instantly break.

    When Ted Nelson invented hyperlinks in 1965 for the Xanadu hypertext system, they were bidirectional and impossible to break. For practicality, Tim Berners-Lee simplified the hypertext architecture to unidirectional links when adapting it to create the World Wide Web and the first web browser in 1989, and broken links are the reality we live with today.

    Consider insulin. The Canadian inventor of insulin, Frederick Banting, sold the patent for a dollar so that it would be freely accessible to everyone in the world. It costs a dollar to produce a dose. It retails for $30 a dose in Canada. It retails for $700 a dose in the US. When you monetize someone’s lifeline, you can charge their life savings.

    My blog depends on the stability of its domain name. How much is not breaking all incoming links worth to me?

    Just to be safe, I’ve renewed my domain at the current reasonable price for the next nine years to postpone my decision point. In the meantime, I encourage everyone to sign EFF’s open letter to ISOC to block the sale of .ORG.

  • Hiring Developer Interns for January 2020

    My team is hiring a new cohort of paid developer interns in Toronto (Markham) for January 2020:

    We have a great, friendly, supportive environment. Everyone on the team gets two big monitors and the peripherals to go with them. IBM Canada Lab has free snacks and good coffee.

    Our team operates at scale. Millions of learners use our education portals and containerized, microservice-driven hands-on labs. We develop with Python and Ruby, and we operate with Terraform and Kubernetes. Our course authors teach AI, machine learning, and other exciting technologies. There are a lot more details in the job descriptions above.

    I think these are great opportunities. If you’re in school, I encourage you to apply and to refer your friends.

  • My team at IBM Skills Network

    Some of the out-of-town team members visited in October. We wrapped up the workday at a local escape room here in Ontario. It was good to have everyone in the same room, or perhaps trapped in three separate rooms filled with puzzling situations, each room more inescapable than the last.

    At IBM Skills Network, we run education portals for companies and educational institutions, including CognitiveClass.ai. We also create the best-rated and most popular data science courses on Coursera and edX, and we operate a containerized data science and machine learning labs environment.