Exploring Ruby and Python interactively

Both Ruby and Python offer great interactive shells, also known as REPLs (read-eval-print loops). These are handy for verifying snippets of code. You can start Python’s by running python, and Ruby’s by running irb, jirb (for JRuby), or rails c (for Rails).

Sometimes, however, it isn’t obvious what you can do with an object or module. Lately, I’ve been finding the Ruby API documentation especially frustrating.

Fortunately, both Python and Ruby let you see what’s available. In Python, you can call the dir() function, while Ruby has the handy .methods() method.

Python:

>>> dir("some string")
['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__','__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser', 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

And Ruby:

irb(main):003:0> "some string".methods
=> [:to_java_bytes, :upcase!, :ascii_only?, :lstrip, :upto, :lines, :encoding, :prepend, :scan, :==, :clear, :squeeze!,:chop, :next!, :casecmp, :start_with?, :split, :to_f, :center, :reverse, :sub, :byteslice, :>, :upcase, :next, :strip!,:count, :sub!, :hash, :bytesize, :lstrip!, :to_sym, :<=, :replace, :length, :swapcase, :gsub, :intern, :succ, :capitalize, :each_codepoint, :oct, :delete!, :+, :initialize_copy, :to_java_string, :match, :unpack, :index, :rstrip!, :*, :each_char, :gsub!, :to_s, :empty?, :size, :swapcase!, :ljust, :downcase, :rpartition, :to_str, :getbyte, :sum, :crypt, :partition, :reverse!, :=~, :force_encoding, :each_byte, :tr!, :inspect, :to_c, :rstrip, :succ!, :<, :[]=, :valid_encoding?, :slice!, :slice, :insert, :tr_s!, :unseeded_hash, :squeeze, :dump, :===, :end_with?, :hex, :strip, :capitalize!, :bytes, :setbyte, :chop!, :each_line, :[], :encode, :include?, :chomp!, :<<, :encode!, :chomp, :rindex, :to_i, :<=>, :eql?, :tr_s, :chars, :codepoints, :delete, :chr, :to_r, :rjust, :%, :>=, :concat, :ord, :tr, :downcase!, :between?, :handle_different_imports, :include_class, :java_kind_of?, :java_signature, :methods, :define_singleton_method, :initialize_clone, :freeze, :extend, :nil?, :tainted?, :method, :is_a?, :instance_variable_defined?, :instance_variable_get, :singleton_class, :instance_variable_set, :public_method, :display, :send, :private_methods, :enum_for, :com, :to_java, :public_send, :instance_of?, :taint, :class, :java_annotation, :instance_variables, :!~, :org, :untrust, :protected_methods, :trust, :java_implements, :tap, :frozen?, :initialize_dup, :java, :respond_to?, :java_package, :untaint, :respond_to_missing?, :clone, :java_name, :to_enum, :singleton_methods, :untrusted?, :dup, :kind_of?, :javafx, :java_require, :javax, :public_methods, :instance_exec, :__send__, :instance_eval, :equal?, :object_id, :__id__, :!, :!=]
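Both listings are long, and most of the Python one is double-underscore plumbing. Here is a minimal sketch of how you might trim dir() down to the interesting names; public_names and public_methods are helper names I made up for illustration, not part of Python.

```python
def public_names(obj):
    """Return the names from dir(obj) that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


def public_methods(obj):
    """Like public_names, but keep only the attributes that are callable."""
    return [name for name in public_names(obj) if callable(getattr(obj, name))]


print(public_names("some string"))
```

Once a name looks promising, help(str.partition) prints its documentation right in the shell. A similar trick in Ruby is "some string".methods - Object.instance_methods, which drops everything inherited from Object.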

 

Array access and virtual memory

(This applies to Java and C, but the code is given in Python for readability.)

Is it faster to iterate over multiple separate arrays (tuples) of simple variables?

for i in range(0, n):
	phone = phones[i]
	# ...
	email = emails[i]

Or over a single array of complex variables?

for i in range(0, n):
	phone = people[i].phone
	# ...
	email = people[i].email

One array is faster than multiple arrays. This is because an array is stored in a contiguous block of memory. Accessing data in different arrays at the same time can require several different pages to be loaded from virtual memory, and memory access is slow, especially when a page has to come from the hard drive. As your application and data set grow, a significant performance difference may manifest itself.
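If your data already lives in parallel arrays, switching to the single-array shape is mostly mechanical. A minimal sketch, assuming two equal-length lists; the Person record, the merge_parallel helper, and the sample values are all made up for illustration.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the loops above.
Person = namedtuple("Person", ["phone", "email"])


def merge_parallel(phones, emails):
    """Combine two parallel sequences into one list of Person records."""
    if len(phones) != len(emails):
        raise ValueError("parallel arrays must have the same length")
    return [Person(phone, email) for phone, email in zip(phones, emails)]


# Made-up sample data.
phones = ["555-0100", "555-0101"]
emails = ["a@example.com", "b@example.com"]
people = merge_parallel(phones, emails)

for person in people:
    phone = person.phone
    # ...
    email = person.email
```

Keep in mind that a Python list only holds references to its elements, so this shows the shape of the data rather than the memory layout you would get from an array of structs in C.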

Arrays from the Second Dimension

When iterating over a multidimensional array with indexes i and j, is it faster to iterate over j inside i?

for i in range(0, n):
	for j in range(0, m):
		cell = cells[i][j]

Or over i inside j?

for j in range(0, m):
	for i in range(0, n):
		cell = cells[i][j]

In C, a multidimensional array[n][m] is stored in row-major order: n contiguous rows, each a contiguous block of m cells. In Java, it is an array of n row arrays, and each row is itself contiguous. Let i index the rows and j index the cells within a row. For a given i, consecutive j-cells are adjacent. For a given j, consecutive i-cells are at least a whole row apart. Accessing adjacent values in memory is generally faster.

For an array[i][j], putting i in the outer loop and j in the inner loop will significantly reduce potential virtual memory slowdowns.

This is the right way:

for i in range(0, n):
	for j in range(0, m):
		cell = cells[i][j]

The above only makes a difference with large data sets, but I like to cultivate good habits.
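To see which loop order walks memory in sequence, it helps to write out the offsets. C lays out a declared array[n][m] in row-major order, so cells[i][j] sits at offset i * m + j from the start of the block. The sketch below lists the offsets each loop order touches; row_major_offset and visit_order are names I made up for illustration.

```python
def row_major_offset(i, j, m):
    """Offset of cells[i][j] in a flat row-major block with m columns."""
    return i * m + j


def visit_order(n, m, inner="j"):
    """List the flat offsets touched, in order, for the chosen inner loop."""
    if inner == "j":
        return [row_major_offset(i, j, m) for i in range(n) for j in range(m)]
    return [row_major_offset(i, j, m) for j in range(m) for i in range(n)]


print(visit_order(2, 3, inner="j"))  # [0, 1, 2, 3, 4, 5]
print(visit_order(2, 3, inner="i"))  # [0, 3, 1, 4, 2, 5]
```

With j in the inner loop the offsets are consecutive. With i in the inner loop each step jumps by a whole row of m cells.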

Setting up svn with trac

Trac is an excellent web-based wrapper for SVN that adds bug tracking, a wiki, and several handy project management features. I keep setting up new repositories for all the little projects we cook up in DB2 Technical Marketing, so I thought I’d write up a guide.

Installing Trac, SVN, and dav_svn for Apache2 is left as an exercise for the reader.

Create a new SVN repository:

svnadmin create /var/svn/Project

Create a new Trac environment:

trac-admin /var/trac/Project initenv

Change the owner to Apache so that it can read and write:

cd /var/svn
chown -R www-data Project
cd ../trac
chown -R www-data Project

Navigate to Apache site settings:

cd /etc/apache2/sites-enabled

If you want Trac to support multiple repositories, edit the trac file to look like this:


        ServerAdmin me@somewhere.com
        ServerName mysite.com
        DocumentRoot /usr/share/trac/cgi-bin/
        
                Options Indexes FollowSymLinks MultiViews ExecCGI
                AllowOverride All
                Order allow,deny
                allow from all
        
        Alias /var/trac/chrome/common /usr/share/trac/htdocs
        
                Order allow,deny
                Allow from all
        
        Alias /trac "/usr/share/trac/htdocs"

        
                SetEnv TRAC_ENV_PARENT_DIR "/var/trac"
        
        
                AuthType Basic
                AuthName "Trac"
                AuthUserFile /etc/apache2/trac.passwd
                Require valid-user
        

        DirectoryIndex trac.cgi
        ErrorLog /var/log/apache2/error.trac.log
        CustomLog /var/log/apache2/access.trac.log combined

The above assumes that all the Trac environments are in /var/trac.

Navigate to Apache settings:

cd /etc/apache2/mods-enabled/

Append to dav_svn.conf:


<Location /svn/Project>
   DAV svn
   SVNPath /var/svn/Project

   AuthType Basic
   AuthName "Subversion Repository"
   AuthUserFile /etc/apache2/dav_svn.passwd

  AuthzSVNAccessFile /etc/apache2/dav_svn.authz

  <LimitExcept GET PROPFIND OPTIONS REPORT>
    Require valid-user
  </LimitExcept>
</Location>


The above lets you check out from http://yoursite/svn/Project

If you like, you can add a new user to dav_svn.passwd:

cd ..
htpasswd2 /etc/apache2/dav_svn.passwd NewUser

Users can then be granted permissions by editing the dav_svn.authz file. Sample file:

[groups]
developers = NewUser, OtherUser
others = ThirdUser

# Restrictions on the entire repository.
[/]
# Anyone can read.
* = r
# Developers can change anything.
@developers = rw

# Others can write here
[/trunk/public/images]
@others = rw

[/trunk/public/stylesheets]
@others = rw
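Since the authz file is INI-style, you can sanity-check it with a few lines of Python before restarting Apache. This is only a rough sketch: writable_paths is a helper I made up, and it simply lists the sections where a user (directly, via * or via a group) is given "w". It does not reproduce how Subversion resolves inheritance or precedence between paths.

```python
import configparser

# The sample file from above, minus the comments.
SAMPLE_AUTHZ = """\
[groups]
developers = NewUser, OtherUser
others = ThirdUser

[/]
* = r
@developers = rw

[/trunk/public/images]
@others = rw

[/trunk/public/stylesheets]
@others = rw
"""


def writable_paths(authz_text, user):
    """Return the path sections that explicitly grant write access to user."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep user and group names case-sensitive
    parser.read_string(authz_text)

    # Groups that list this user as a member.
    groups = {
        name
        for name, members in parser["groups"].items()
        if user in [member.strip() for member in members.split(",")]
    }

    paths = []
    for section in parser.sections():
        if section == "groups":
            continue
        for who, access in parser[section].items():
            in_group = who.startswith("@") and who[1:] in groups
            if (who == user or who == "*" or in_group) and "w" in access:
                paths.append(section)
                break
    return paths


print(writable_paths(SAMPLE_AUTHZ, "ThirdUser"))
```

For the sample file, ThirdUser can write only to the two /trunk/public directories, while NewUser gets write access at the repository root.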

Restart Apache:

killall apache2
apache2

You now have a Trac/SVN install with SVN at http://yoursite/svn/Project and Trac at http://yoursite/trac.cgi

No implementation defined for org.apache.commons.logging.LogFactory

While writing a DB2 stored procedure that invoked a SOAP/WSDL web service using Apache Axis as part of WSIF, I ran into this doozie:

org.apache.commons.discovery.DiscoveryException:
No implementation defined for org.apache.commons.logging.LogFactory

Ultimately, it’s caused by an overly restrictive lib/security/java.policy file that ships with DB2.

Wrong Solution

The standard way to define an implementation is to create the following commons-logging.properties file and place it anywhere in your CLASSPATH (such as the root of a JAR file):

# Default
#org.apache.commons.logging.LogFactory = org.apache.commons.logging.impl.LogFactoryImpl

# SimpleLog
#org.apache.commons.logging.Log = org.apache.commons.logging.impl.SimpleLog 

# JDK 1.4 logger
#org.apache.commons.logging.Log = org.apache.commons.logging.impl.Jdk14Logger

# Avalon Toolkit
#org.apache.commons.logging.Log = org.apache.commons.logging.impl.LogKitLogger

# Log4j (Recommended by Axis)
org.apache.commons.logging.Log = org.apache.commons.logging.impl.Log4JLogger

Alternatively, you can set the org.apache.commons.logging.Log configuration attribute for LogFactory programmatically.

Right Solution

Solution: Running an Axis SOAP client in Domino [or DB2]

My DB2 is installed into C:\Program Files\IBM\SQLLIB

1. Copy all your JARs to C:\Program Files\IBM\SQLLIB\java\jdk\jre\lib\ext
2. Open C:\Program Files\IBM\SQLLIB\java\jdk\jre\lib\security\
3. Open java.policy
4. Add:

permission java.util.PropertyPermission "java.protocol.handler.pkgs", "write";

5. Restart DB2