Monday, May 28, 2012

bubble vs decorated sort

There were some comments on the previous post about using a limited number of bubble sort runs, as opposed to a full sort of the whole list weighted by original position, which was an appealing argument in a Keep-It-Simple-Stupid kind of way.

So I'm running some performance tests comparing limited bubblesorting of lists against a decorated sort weighted by initial position.  Here is the code tested:

# This is run on the data eight times by the testing code
def bubblepass(sortlist):
    skip = False
    for a in xrange(len(sortlist)-1):
        if skip == False:
            if sortlist[a] > sortlist[a+1]:
                temp = sortlist[a]
                sortlist[a] = sortlist[a+1]
                sortlist[a+1] = temp
                skip = True
        else:
            skip = False
    return sortlist

def decsortundec(sortlist):
    templist = zip(sortlist,range(len(sortlist)))
    declist = [(item[0]+(item[1]/4),item[0]) for item in templist]
    declist.sort()
    return [item[1] for item in declist]

In performance tests on lists of 3200 random small integers, the decorated sort runs from 40-60% faster than the bubblesort, despite it creating three entirely new lists along the way.

decsortundec uses list comprehensions, which are a powerful, yet to me somewhat confusing, aspect of the language.  To my understanding they are a way of creating a new list from a source list quickly, and along the way you can modify each item of the original list and/or choose whether to include it or not.  Note that this code runs in Python 2.7; Python 3 list comprehensions create iterators instead of lists, so the return value must be explicitly turned into a list, using list().  (There might be a better way to do this, in Python 2, using functional programming, but my brain hasn't been twisted into thinking in that particular Lispy way yet.)

Notes:
1. bubblepass is not a pure bubble sort, because on our trip through the list we skip positions that have already been swapped from the previous position.  That doesn't matter in a true bubble sort where we're ultimately trying to sort all the items as quickly as possible, but we'd need it here since we specifically don't want particular items to travel far.
2. It is possible that PyPy will optimize the bubblesort better, maybe far better, since it uses fairly elementary Python, while the other uses list comprehensions which are mostly implemented in C by the virtual machine.
3. I'm new to writing list comprehensions (which decsortundec uses twice) so it's possible there's a more efficient way to write this.  Would anyone know of a better way to factor in an item's original index number into its sorting weight?
4.  It might be possible to use some of that list comprehension stuff on the bubble sort, but I don't see how to implement it off the top of my head.

3 comments:

  1. enumerate(sortlist) is equivalent to templist, except that enumerate() returns an iterable instead of a list, and the index is before the item.

    (x for y in z) makes a generator.
    [x for y in z] makes a list.
    This is true in python 2.x and python 3.x. Python 3.x (and maybe 2.7) added similar syntax for making dictionaries and sets.

    You can pass a key function to sort(), but generally every time I use it for something CPU-bound I find that it's better to put the keys in a tuple as you have done.

    It's probably possible to do a bubble sort with list comprehensions, but you'd have to twist them beyond all recognition to make it work.

    It bothers me that the bubble sort and decsortundec are semantically different, and you don't seem very concerned with the semantics.

    ReplyDelete
  2. It's too early to be concerned about their behaviors being a bit different, in tests they're both close enough to what I want.

    My idea is to implement one or the other and see what the consequences is to the simulation. At the moment, I've realized, both solutions are better at simulating falling liquids than my current setup, which is basically to separate tiles into empty and fluid, then handle separately both flow between cells and inside the same cell. It'd be both simpler, and potentially faster, to handle fluids as a bunch of columns of space, treating air as basically the lightest fluid.

    I'm proceeding by programming experimentally, trying some algorithms that do generally what I want, more than designing it all first and going from there. It's an approach that's worked well for me in the past, but it helps that the end state is still in some flux.

    ReplyDelete
  3. Almost a full year later, I know, but I wanted to point out that Python allows swapping variable values without using a temp variable. You simply do

    a, b = b, a

    :)

    ReplyDelete