Array access and virtual memory

(This applies to Java and C, but the code is given in Python for readability.)

Is it faster to iterate over multiple separate arrays (tuples) of simple variables?

for i in range(0, n):
	phone = phones[i];
	# ...
	email = emails[i];

Or over a single array of complex variables?

for i in range(0, n):
	phone = people[i].phone;
	# ...
	email = people[i].email;

One array is faster than multiple arrays. This is because an array is stored in a contiguous block of memory. Accessing data in different arrays at the same time can require several different pages to be loaded from virtual memory. Memory access, especially hard drive access, is slow. As your application and data set grows, a significant performance difference may manifest itself.

Arrays from the Second Dimension

When iterating over a multidimensional array with indexes i and j, is it faster to iterate over j inside i?

for i in range(0, n):
	for j in range(0, m):
		cell = cells[i][j];

Or over i inside j?

for j in range(0, m):
	for i in range(0, n):
		cell = cells[i][j];

In Java and C, a multidimensional array[n][m] is stored as contiguous m-block of contiguous n-blocks. Let i be in n and j be in m. For a given i-cell, j-cells will be far apart. For a given j-cell, i-cells will be adjacent. Accessing adjacent values in memory is always faster.

For an array[i][j], putting j in the outer loop and i in the inner loop will significantly reduce potential virtual memory slowdowns.

This is the right way:

for j in range(0, m):
	for i in range(0, n):
		cell = cells[i][j];

The above only makes a difference with large data sets, but I like to cultivate good habits.

Published by

Leons Petrazickis

I'm a full-stack developer at IBM Digital Business Group. I do Ruby, Node, Python, Hadoop, Spark, as well as web scale devops with Docker and Terraform. My opinions are my own.

3 thoughts on “Array access and virtual memory”

  1. please put more details so that we could easily understand how to declare multidimensional arrays and does it work.. Thanks

  2. Hi Astarr,

    Try Googling:

    This is how you declare a multidimensional array in Java:
    int[][] a = new int[10][5];

    This is how you do it in C/C++:
    int a[10][5];

    Because of the hacky way C implements arrays, you can also do this and get exactly the same array:
    int a[10 * 5];

    It’s a bit trickier in C/C++ when objects get involved because of manual memory allocation. I leave that as an exercise for the reader.

Comments are closed.