Blog

  • History meme

    Substantial content is in the pipes, but in the meantime here’s Arve Bersvendsen’s history meme:

    history | awk '{a[$2]++ } END{for(i in a){print a[i] " " i}}'|sort -rn|head
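
    For anyone without awk to hand, the same tally can be sketched in Python. This is only an illustration, not part of the original meme: the name top_commands and the sample lines are made up, and it assumes each line looks like bash's history output (an entry number followed by the command).

```python
from collections import Counter

def top_commands(history_lines, limit=10):
    """Count the first word of each command, like awk's a[$2]++."""
    counts = Counter()
    for line in history_lines:
        fields = line.split()
        # fields[0] is the history entry number, fields[1] the command name
        if len(fields) >= 2:
            counts[fields[1]] += 1
    # most_common sorts by descending count, like sort -rn | head
    return counts.most_common(limit)

# Hypothetical sample input, shaped like `history` output
sample = [
    "  1  ls -l",
    "  2  cd /tmp",
    "  3  ls",
    "  4  python script.py",
    "  5  ls -a",
]
for command, count in top_commands(sample):
    print(count, command)
```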
    

    In my cygwin:

    52 python
    44 ssh
    33 exit
    33 cd
    18 ls
    7 java
    5 diff
    4 nano
    2 ping
    1 svn
    

    And on a Linux machine I administrate:

    211 sudo
    92 ls
    80 cd
    34 locate
    14 nano
    6 rm
    6 exit
    6 cp
    3 tar
    3 python2.4
    

    I can survive in vi if pressed, but nano is my text-mode editor of choice. I do serious Linux development in Eclipse, Kate, or KDevelop.

  • An IDE for TeX

    TeXnicCenter is an excellent IDE for developing TeX documents on Windows. It follows the usual interface conventions and is quite helpful in getting started and debugging. TeX is the standard page layout language for writing mathematical and scientific papers.

    Here are some excellent tutorials for getting started with TeX.

    The IDE requires MiKTeX to actually compile TeX files into PDF, PS, and so on.

  • Unknown root password in SuSE Linux

    After I installed SUSE Linux Enterprise Desktop 10, I tried to update it. It prompted me for a root password even though the installer hadn’t asked for one; it had only created a regular user. sudo didn’t help.

    I tried booting in single-user mode, but that also prompted me for the root password.

    Finally, I appended init=/bin/bash to the Linux boot command in GRUB. That booted me straight into a root shell with no password prompt, letting me run passwd and fix things.

  • Repair table failed. Please run repair table.

    One of the tables of my MediaWiki installation crashed. When I tried to repair it, I got this less-than-helpful error message. The mysqlcheck utility gave the same message when I SSHed to the server. However, the REPAIR TABLE command has extra options for repairing severe corruption in MyISAM tables, such as when the .MYI index file is missing.

    REPAIR TABLE tablename USE_FRM;
    
  • Jaxer

    John Resig writes very positively about Jaxer. It runs Javascript on the server while serving documents to the client, with seamless communication between JS on the client and JS on the server.

    Jaxer provides:

    1. Full DOM on the server
    2. Shared code between client and server
    3. Database, file, and socket access from JavaScript
    4. Familiar APIs
    5. Integration with PHP, Java, Rails, etc. apps

    In other news, IE8 will use the latest rendering mode by default for documents with the HTML5 doctype:

    <!DOCTYPE html>

    Finally, Good Math has published a good defense of Google’s MapReduce algorithm.

  • Dean

    Base2 has just come out in beta on Google Code. It’s hosted there and can be included straight off the Google Code server.

    I think it’s a really neat library because it fixes browsers so that the built-in DOM, events, etc. work the same way everywhere. Instead of providing an API of its own, it makes the existing API work consistently and reliably.

    Base2 Features:
    – A fast implementation of the Selectors API
    – Fixes broken browser implementations of the DOM events module including document.createEvent(), dispatchEvent(), addEventListener(), etc
    – Supports DOMContentLoaded
    – Fixes getAttribute()/hasAttribute()/setAttribute() (Internet Explorer)
    – Implements a few other useful DOM methods like getComputedStyle() and compareDocumentPosition()
    – Supports a variety of browsers including ancient browsers like IE5.0 (Windows and Mac)

    Dean Edwards also wrote the excellent Packer Javascript minifier. It probably has the most in-depth support for obscure language features that simpler minifiers tend to mangle.

  • How to enable logging in Python LDAP

    When writing Python scripts that rely on python-ldap and OpenLDAP, it is often useful to turn on debug messages as follows:

    import ldap

    # enable debug output from the underlying OpenLDAP library (libldap)
    ldap.set_option(ldap.OPT_DEBUG_LEVEL, 4095)

    # enable python-ldap's own call tracing
    l = ldap.initialize('ldap://yourserver:port', trace_level=2)
    

    This is also useful when debugging the LDAP Plugin for Trac.

  • ECMAScript 4

    John Resig has posted a whitepaper outlining the new features in ECMAScript4 (aka the Javascript standard), how it differs from ECMAScript3, and the rationale for any incompatibilities.

    Many of the features have already made their way into Opera and into Firefox, which is at the Javascript 1.7 level. ES3 is equivalent to JS1.5, and ES4 is the basis for the forthcoming JS2.

    I look forward to optional strict typing, multiline strings, comprehensions, and generators making their way into browsers. A lot of the new features make Javascript more like Python without losing all the nice things about Javascript.

  • Some Facebook network stats

    I’m part of three Facebook networks, and I’ve been keeping track of their size since May of this year.

    Facebook network size stats

    Toronto has gone from 600k people in May to 800k people in September. That’s 32% of the municipality or 16% of the metropolitan area, which is an impressive proportion.
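
    The percentages can be checked with a little arithmetic. This is a minimal sketch using only the figures quoted above; the implied populations are derived from the quoted shares, not looked up, and work out to roughly 2.5 million and 5 million.

```python
# Figures quoted in the post
toronto_may = 600_000
toronto_sep = 800_000

# Relative growth from May to September
growth = (toronto_sep - toronto_may) / toronto_may
print(f"growth May to September: {growth:.0%}")

# Populations implied by the quoted shares (derived, not census figures)
implied_city = toronto_sep / 0.32
implied_metro = toronto_sep / 0.16
print(f"implied municipal population: {implied_city:,.0f}")
print(f"implied metropolitan population: {implied_metro:,.0f}")
```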

    University of Toronto has been stable at 55k, but there should be a flurry of new users in September when first-year students get their UofT email addresses.

    The population of IBMers on Facebook has actually declined. Conversely, I suspect our population on LinkedIn, a career-oriented networking site, has not.

  • Array access and virtual memory

    (This applies to Java and C, but the code is given in Python for readability.)

    Is it faster to iterate over multiple separate arrays (tuples) of simple variables?

    for i in range(n):
        phone = phones[i]
        # ...
        email = emails[i]

    Or over a single array of complex variables?

    for i in range(n):
        phone = people[i].phone
        # ...
        email = people[i].email

    One array is faster than multiple arrays. An array is stored in a contiguous block of memory, so all the fields of one record sit next to each other. The same fields spread across separate arrays can land on several different pages, each of which may have to be loaded from virtual memory. Memory access is slow, and paging in from the hard drive is far slower. As your application and data set grow, a significant performance difference may manifest itself.
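
    To make the page argument concrete, here is a rough sketch that counts how many pages one record touches under each layout. Everything in it is an assumption for illustration: 4096-byte pages, 8-byte fields, separate arrays laid out back to back from address 0, and records stored inline as in a C array of structs. The function names are made up.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
FIELD_SIZE = 8     # assumed size of one field in bytes

def pages_separate_arrays(i, n, num_fields):
    """Pages touched when record i is spread over num_fields arrays of n items."""
    pages = set()
    for f in range(num_fields):
        # array f is assumed to start right after array f-1
        address = f * n * FIELD_SIZE + i * FIELD_SIZE
        pages.add(address // PAGE_SIZE)
    return pages

def pages_single_array(i, num_fields):
    """Pages touched when record i is stored as one contiguous record."""
    record_size = num_fields * FIELD_SIZE
    start = i * record_size
    end = start + record_size - 1
    return set(range(start // PAGE_SIZE, end // PAGE_SIZE + 1))

n = 1_000_000
print(len(pages_separate_arrays(12345, n, 4)))  # one page per field
print(len(pages_single_array(12345, 4)))        # the whole record on one page
```

    With these numbers, reading four fields of one record touches four pages in the separate-array layout and one page in the single-array layout.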

    Arrays from the Second Dimension

    When iterating over a multidimensional array with indexes i and j, is it faster to iterate over j inside i?

    for i in range(n):
        for j in range(m):
            cell = cells[i][j]

    Or over i inside j?

    for j in range(m):
        for i in range(n):
            cell = cells[i][j]

    In C, a multidimensional array[n][m] is stored in row-major order: n contiguous blocks of m elements each. Java stores it as an array of n row arrays, each holding m contiguous elements. Let i range over n and j over m. For a given i, consecutive j-cells are adjacent. For a given j, consecutive i-cells are m elements apart. Accessing adjacent values in memory is generally faster.

    For an array[i][j], putting i in the outer loop and j in the inner loop will significantly reduce potential virtual memory slowdowns.

    This is the right way:

    for i in range(n):
        for j in range(m):
            cell = cells[i][j]

    The above only makes a difference with large data sets, but I like to cultivate good habits.
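
    C lays out a two-dimensional array in row-major order, so cells[i][j] sits at offset i*m + j from the start of the array. A minimal sketch of the offsets each loop order visits, with a made-up helper name and a tiny 3 by 4 array:

```python
def row_major_offset(i, j, m):
    """Offset of cells[i][j] in a flat row-major array with m columns."""
    return i * m + j

n, m = 3, 4

# i outer, j inner: each step moves to the next element in memory
i_outer = [row_major_offset(i, j, m) for i in range(n) for j in range(m)]

# j outer, i inner: each step jumps m elements ahead
j_outer = [row_major_offset(i, j, m) for j in range(m) for i in range(n)]

print(i_outer)
print(j_outer)
```

    The first list walks memory in order, 0 through 11. The second jumps around in strides of m, and on a large array each jump can land on a different page.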