COIN DOOR INTERLOCK: Python Explorations

Most of my In Profundis work this week has been in research. Here's what I'm looking into.

There are benefits and drawbacks to using Python as a development language. The biggest drawbacks are performance (although that's not as bad as you'd think) and something called the GIL, or Global Interpreter Lock.

A feature of many dynamic languages (Ruby has one too), the GIL ensures that, even in multithreaded code, only one thread can execute Python code at a time. This is done to simplify the behind-the-scenes details of Python. On a single-core machine this is not so bad, since the code is stuck running that way, more or less, anyway, but it means we won't get the performance out of multicore machines that we might expect. (Note: I'm not currently sure if Pygame's rendering respects the GIL.) There are exceptions (especially regarding I/O), but for the most part threading seems more like a convenience feature than something one would turn to to improve speed.

But all is not lost; there are other solutions. One is to use the multiprocessing module, which offers a way around the GIL by running work in separate processes. Another is to use a version of Python that doesn't have a GIL; both Jython and IronPython qualify, although they don't interface with Pygame to my knowledge. And then there's the option of optimizing the cellular loop using something like NumPy and/or Cython.
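To make the multiprocessing option a little more concrete, here is a minimal sketch, not code from In Profundis. It assumes the world is a plain list of lists, uses a made-up stand-in rule (a cell takes the largest value among itself and its four neighbors), and splits the rows into bands that could each go to a separate process. The names step_band, split_rows, step_world and step_world_parallel are all hypothetical, and the syntax is Python 3.

```python
from multiprocessing import Pool


def step_band(args):
    """Compute the next state for rows [start, stop) of the world.

    The rule is a hypothetical stand-in: each cell becomes the maximum of
    itself and its up/down/left/right neighbors, so values spread one cell
    per step. It only reads the old world, never the new one.
    """
    world, start, stop = args
    height, width = len(world), len(world[0])
    band = []
    for y in range(start, stop):
        row = []
        for x in range(width):
            value = world[y][x]
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < height and 0 <= nx < width:
                    value = max(value, world[ny][nx])
            row.append(value)
        band.append(row)
    return band


def split_rows(height, parts):
    """Return non-empty (start, stop) row ranges covering 0..height."""
    bounds = [height * i // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1])
            for i in range(parts) if bounds[i] < bounds[i + 1]]


def step_world(world, parts=4, map_func=map):
    """Advance the whole world one step, band by band.

    map_func defaults to the built-in map (serial); pass pool.map to
    spread the bands over worker processes.
    """
    tasks = [(world, start, stop)
             for start, stop in split_rows(len(world), parts)]
    new_world = []
    for band in map_func(step_band, tasks):
        new_world.extend(band)
    return new_world


def step_world_parallel(world, workers=4):
    """Same result as step_world, but each band runs in its own process."""
    with Pool(workers) as pool:
        return step_world(world, parts=workers, map_func=pool.map)
```

Note that this naive version pickles the entire world and sends it to every worker on every step, so for a large world the copying could easily cost more than the parallelism gains. On platforms that start worker processes by spawning, step_world_parallel should be called from under an `if __name__ == "__main__":` guard.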

Other notes:

Discarding Psyco means some hard choices I had to go with in earlier versions, especially concerning Python version, aren't so hard anymore. This opens things up a bit concerning other modules, although I still can't go to Python 3.X because Pygame requires 2.X.

In the last message someone expressed concern that calculating the whole world would make the game unresponsive. This is actually not necessarily the case, as the calculation loop is written in such a way that on a given "frame" it can just calculate part of the world, remembering where it left off to continue later. So, I can calculate one-fifth of the world this frame, get player input, then another fifth the next frame, and so on. It queues cells up in a spiral pattern from around the player's location, and the visible screen is the beginning of each pass of the world, so we don't even have to worry about visible calculation artifacts. Neat, huh? In the future it would be nice to calculate different, non-adjacent areas of the world in different threads to make use of multicore systems -- maybe the multiprocessing module can help with that.
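Here is a rough sketch of that time-slicing idea, simplified from the real thing and not the actual game code. It assumes a rectangular world and approximates the spiral with square rings growing outward from the player; spiral_order, IncrementalUpdater, run_slice and the update_cell callback are names invented for this example.

```python
def spiral_order(cx, cy, width, height):
    """Yield every in-bounds (x, y) once, in square rings around (cx, cy)."""
    if 0 <= cx < width and 0 <= cy < height:
        yield (cx, cy)
    max_radius = max(cx, width - 1 - cx, cy, height - 1 - cy)
    for r in range(1, max_radius + 1):
        ring = []
        for x in range(cx - r, cx + r + 1):      # top and bottom edges
            ring.append((x, cy - r))
            ring.append((x, cy + r))
        for y in range(cy - r + 1, cy + r):      # left and right edges, corners excluded
            ring.append((cx - r, y))
            ring.append((cx + r, y))
        for x, y in ring:
            if 0 <= x < width and 0 <= y < height:
                yield (x, y)


class IncrementalUpdater:
    """Updates a slice of the world per frame and remembers where it stopped."""

    def __init__(self, width, height, update_cell):
        self.width = width
        self.height = height
        self.update_cell = update_cell   # hypothetical callback: update_cell(x, y)
        self._order = []                 # cell order for the current pass
        self._index = 0                  # next cell to process in that order
        self.passes_completed = 0

    def run_slice(self, player_x, player_y, budget):
        """Update up to `budget` cells; return how many were processed."""
        if self._index >= len(self._order):
            # Start a new pass, centered on wherever the player is now.
            self._order = list(
                spiral_order(player_x, player_y, self.width, self.height))
            self._index = 0
        chunk = self._order[self._index:self._index + budget]
        for x, y in chunk:
            self.update_cell(x, y)
        self._index += len(chunk)
        if self._index >= len(self._order):
            self.passes_completed += 1
        return len(chunk)
```

In a game loop this would be called once per frame, between reading input and drawing, with the budget set to whatever fraction of the world fits in a frame (one-fifth of the cells, say). Because every pass starts at the player, the cells nearest the player are always the first ones refreshed.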

2 comments:

Vincent Povirk, March 26, 2012 at 2:21 PM
I fear that with multiprocessing you will have problems with the overhead of marshaling the data between processes.

In general, calls into regular C code release the GIL. So if you write code in C and call it from Python, that code can use multiple threads. The trouble is you'd need a lot of thread-safe non-Python C code (which rules out Cython) to see any benefit. If this were my project, there's a good chance I'd end up writing all the game logic in a C library and making a small interface to call from Python using ctypes.

As I understood it, your system of queueing cells worked in such a way that certain parts of the world would systematically fall behind and not catch up when they need to, which is why you'd have problems with liquids building up around the edges. A correct implementation would require as many cells as there are in the world to be calculated every frame, on average.

Amit, April 5, 2012 at 9:39 PM
It might be useful for you to estimate just *how much* additional CPU you need. Twice as much? Ten times? A hundred times? A thousand?

Having an estimate of that will help you decide which direction to go. If you only need twice as much, then threads could help. But if you need ten times as much, going multicore may not be enough. Maybe writing a C (or C++) routine linked into Python would do the trick. If you need a thousand times as much horsepower, you need to change the game design.

Sunday, March 25, 2012
