Sunday, May 22, 2011

In Profundis Progress (5/22)

I'm back baby!

So, here is the thing that's been consuming my development time lately.  Cellular automation for use in real-time games frequently has to be heavily optimized.  When you're cycling though every cell of a hundreds by hundreds array every frame, then even the overhead from those cells in which you just check contents and decide nothing needs to happen is significant.

My current strategy for negating this is to not use a big loop that iterates through every cell.  Instead, I keep a list of every cell that _might_ change this frame.  I calculate those and, any that do change, I add adjacent cells to the list for the next frame.

It's much MUCH faster, but it's worth noting that this is entirely because, for most of the game world, large portions are static between each frame.  Stone is largely inert, so we don't need to worry about it except how other things react to it -- they are the initiating factors in that, so their routines handle the work.  Empty spaces (that don't contain weird gases) are similar.  These two cell types are 90% of the world.  By removing them from consideration, it's worth a huge performance gain.

A huge gain, but it turns out, not huge enough.  I remind you that, before, we kept performance under control by only updating the screen and a short distance outside of it.  Under this system we update the whole world, and factoring in all of the cells that CAN change, it's a bit too much to reliably handle.

There is a way around it.  We sort the update list, using one of the handy-dandy, heavily-optimized Python sort functions, with the sort key being a simple Cartesian distance from the player or cursor.  Then we can just pop off the first few hundred or so every frame, and leave the rest for the next frame.  If the player never goes anywhere near them they won't get handled, but as soon as he does they'll enter consideration.

It's a cheat, but real-time game development is basically a convoluted series of cheats, a set of makeshift concessions to processor speed and memory limits.  Now that we have multicore machines and multi-gigabyte memory spaces this fundamental fact is sometimes obscured, but it doesn't change it.


  1. This does mean though that some events far away may never get handled right ? How do you choose the number of events to pop ?

  2. Under this scheme (which experiments tonight proved to me is by no means final yet), distant events don't ever happen until the player approaches some distance to it.

    However, since the system is designed in such a way that, for the most part, systems settle into a state of minimum energy, and because as things settle they trigger, generally, fewer and fewer events, the more the system runs the further that range goes. And the game does a lot of pre-calculating full world turns before play begins, so the player enters the map with things already settled somewhat.

  3. So, now you have to find the perfect cap for events.

    Also, just an idea, maybe for gas (which must be the thing taking the most checks) maybe have the gas act only every x turns, x being dependant on the gas' volatility or something ?
    I guess that should make things a bit faster... maybe

  4. I forgot to mention last time, the last experiment ran with an event cap of 600, but that might be too much. Also I'm running into a problem that, research suggests, is due to Python's optimizations for lists not being made for the kind of use I'm putting it to. What I might need, the documentation mentions, is something called a "deque."

    Gas is currently broken under this system. I might end up using the old, screen-specific system just to handle gas, or redesign it, or find a way to only hand out gas turns depending on how strong gas pressure is in that area.

  5. It sounds like you need to be sorting your list in the reverse of the order you're currently using, so that you can pop the items you want from the END of the list. Popping an item from a list requires moving every item after it to the previous slot. So popping the first item can be very slow (especially if you do it 600 times), but popping the last item is fast.

    The advantage of a deque is that you can quickly add or remove items on either end, but sorting the deque will be much slower. It doesn't sound you really need insertion/deletion on both ends, and you do need fast sorting.

    If it's not obvious to you which operations on a list (basically an array) and a deque (basically a doubly-linked list) are fast, I suggest reading up on algorithm efficiency and on data structures.

    Just to make sure: You are profiling your code, right?

    You may want to consider trying the PyPy interpreter. It can run multi-threaded code and is often faster than CPython.

  6. I know something about the efficiencies of general data structures, but one thing I don't have a lot of data on are the efficiencies of Python's various implementations. I had sort of assumed that adding to the beginning of Python's lists didn't involve a lot of copying internally, which seems odd, but I suppose it means this way random access is really fast since it doesn't have to walk through data structures. I will take your advice, thanks very much.

    I haven't started profiling in earnest yet, just noting what frame rates are after various experiments. There's a good chance everything will be moved to C anyway and implementing too-specific Python optimizations could end up being wasted effort. One thing I am considering is abandoning using a sort altogether, maybe in favor of using a loop through a search pattern around the center of the screen. That would translate to C a lot better than using specific Python data structures anyway.

    I have considered using PyPy, but I ruled it out in favor of Psyco for some reason that doesn't immediately spring to mind. It might have been related to finding a version compatible with PyGame; PyPy targets 2.7, and I seem to remember that PyGame has been having trouble with that version, although checking the PyGame site now reveals they do have a 2.7 download available.