Profiling has revealed that it's not so much the sort that's taking all the time as the lambda function that determines distance from the player's location. Hmmm.

I would not be surprised if the profiler is skewing your results by adding overhead to the function call. I would check how many times that function is actually being called, just to be sure. (And if a function returning dx + dy really is THAT slow, I'd give up and use C at that point.)
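One way to check, assuming the stdlib cProfile is available: run the hot path under the profiler and sort the report by call count instead of time. The `call_report` helper and the sample sort below are made up for illustration.

```python
import cProfile
import io
import pstats

def call_report(fn):
    """Run fn under cProfile and return the stats text sorted by call
    count, so you can see how often each function really gets invoked."""
    pr = cProfile.Profile()
    pr.runcall(fn)
    buf = io.StringIO()
    pstats.Stats(pr, stream=buf).sort_stats("ncalls").print_stats(10)
    return buf.getvalue()

# Illustrative: the key lambda should show up with one call per element.
cells = [(3, 1), (0, 2), (5, 5)]
report = call_report(lambda: sorted(cells, key=lambda c: abs(c[0]) + abs(c[1])))
```

The "ncalls" column in the report is the number you want to sanity-check against your list size.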

I'm finding it very hard to believe that sorting the list of cells to update by distance actually makes sense. It should be O(1) to update one cell, so updating all the cells should be O(n). For large n, sorting the cells by distance is going to be O(n log n), assuming you're moving and therefore the list isn't already mostly sorted. So if the sort somehow comes out faster, I think the CA must be too slow at updating a cell.

On second thought, maybe that result makes sense. Calculating a sort key is going to happen more often than compares or swaps, and it's going to be slower than the other two because it's in Python and not C. (But the profiling probably does intensify that difference.)

And in hindsight it's not really surprising that the sort takes all the time.

The player moves around, and this can upset portions of the sort. If it wasn't for the fact that the "focus" changes position as the player moves around, we could just use a normal queue.

The only thing preventing us from just inserting the cells into the right place in the queue (not a dead-simple operation but certainly faster than a sort) is that the focus, the center of calculation, changes as the player moves around. But maybe we can take advantage of the fact that the player doesn't usually move quickly? If the player moves one space, it might upset the calculation order of some nearby spaces, but the lion's share are far away.
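For what it's worth, inserting into an already-sorted list is a one-liner with the stdlib bisect module if you store (distance, cell) pairs; the metric and the cells below are purely illustrative.

```python
import bisect

def dist(cell, focus):
    """Illustrative metric: Manhattan distance from the focus."""
    return abs(cell[0] - focus[0]) + abs(cell[1] - focus[1])

# Store (distance, cell) pairs so plain tuple comparison keeps the list
# ordered; each insert is an O(log n) search plus an O(n) list shift,
# still much cheaper than re-sorting the whole thing.
focus = (0, 0)
queue = []
for cell in [(5, 2), (1, 1), (3, 8), (2, 0)]:
    bisect.insort(queue, (dist(cell, focus), cell))
# queue is now ordered nearest-first; when the focus moves, only the
# entries whose relative order actually changed would need fixing up.
```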

The trouble with simply reducing the frequency of the sort is that the worst case for a frame is still the same, and those worst case frames will still have a long delay. You just end up with a jerky program instead of a slow one.

But here's a thought.

Arbitrarily divide the cells into rectangular blocks. Let's say they're the size of the screen. Instead of a giant sorted list of individual cell co-ordinates that might have changed, store an unsorted list (or set or whatever) for each block. Each frame, recalculate all of the cells that need it in some set of blocks (probably at least the block where the player is and all blocks one or two adjacent/diagonal spaces away, maybe followed by the least-recently-updated blocks if there's time). Or just go by distance like you do now, but using blocks.
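A minimal sketch of the block idea, with made-up names (BLOCK, mark_dirty, blocks_near, update_frame) and an arbitrary block size; the real update_cell would be the CA step.

```python
from collections import defaultdict

BLOCK = 32  # block edge length in cells; the text suggests screen-sized

# Map block coordinates -> set of dirty cells inside that block.
dirty = defaultdict(set)

def mark_dirty(cell):
    """File a changed cell under its containing block."""
    x, y = cell
    dirty[(x // BLOCK, y // BLOCK)].add(cell)

def blocks_near(player, radius=1):
    """Blocks within `radius` of the player's block, nearest first
    (by Chebyshev distance on the block grid)."""
    bx, by = player[0] // BLOCK, player[1] // BLOCK
    near = [(bx + dx, by + dy)
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)]
    return sorted(near, key=lambda b: max(abs(b[0] - bx), abs(b[1] - by)))

def update_frame(player, update_cell):
    """Recalculate every dirty cell in the player's block and its
    neighbours; blocks farther out wait for a later frame."""
    for block in blocks_near(player):
        for cell in dirty.pop(block, ()):
            update_cell(cell)
```

A least-recently-updated pass over the remaining blocks could then spend whatever frame time is left over.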

(Incidentally, I'm very curious how you can update the cells in any order and skip some updates, and not end up with bad things happening like water spontaneously disappearing or growing.)

On reducing frequency: that's only the case if we sort the entire list each time. We have to do that if we use Python's built-in sort, which blocks the program, but if we roll our own we could do a bit of the work each frame.

Of course, most sorts do less work when the list is already close to sorted, and the order doesn't change much each frame. So it's possible that the overhead of calling the comparison function thousands of times a frame ends up taking longer than the sort itself, and this might not actually be a useful technique.
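If you did roll your own, "a bit each frame" might look like a budgeted insertion-sort pass. This is a sketch, not drop-in code: on a nearly sorted list a few frames of this converge to fully sorted without one frame ever eating the whole cost.

```python
def sort_some(items, key, budget):
    """One bounded insertion-sort pass: perform at most `budget` swaps,
    then stop. Returns True once a full pass completes with budget to
    spare, i.e. the list is sorted."""
    moves = 0
    for i in range(1, len(items)):
        j = i
        while j > 0 and key(items[j - 1]) > key(items[j]):
            items[j - 1], items[j] = items[j], items[j - 1]
            j -= 1
            moves += 1
            if moves >= budget:
                return False  # out of budget; call again next frame
    return True
```

The caveat about key-function overhead still applies: each comparison here calls `key` twice, so caching the key per entry (as `sorted(key=...)` does internally) would matter in practice.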

In fact your very idea, about using blocks instead of individual cells, is one of the better ideas I myself have gotten so far concerning optimizing the sort. GMTA.

When material moves between cells, it's always effectively a move, not a copy. So a location that updates also effectively updates the adjacent locations.

You're sorting by distance from the player. Are you calculating distance as sqrt(dx*dx+dy*dy)? Because if so, and it's getting millions of calls, you might want to consider simply dx*dx+dy*dy. It's not the actual distance, but it's provably equivalent for sorting purposes. Getting rid of thousands of sqrts per frame is pretty much guaranteed to be a win.
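To see why that's safe: sqrt is strictly increasing, so sorting by the squared distance produces exactly the same order. A quick self-check with made-up data:

```python
import math
import random

random.seed(1)
cells = [(random.randint(-50, 50), random.randint(-50, 50)) for _ in range(1000)]
px, py = 10, -4  # made-up player position

by_sqrt = sorted(cells, key=lambda c: math.sqrt((c[0] - px) ** 2 + (c[1] - py) ** 2))
by_sq = sorted(cells, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)

# sqrt is strictly increasing, so both keys induce the same order; and
# Python's sort is stable, so even ties land in identical positions.
assert by_sqrt == by_sq
```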

It's an interesting problem at least.

No, I'm just using the largest of x or y distance. Basically that makes the update region square.
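That metric is the Chebyshev ("chessboard") distance; cells at equal distance form square rings around the focus, which is exactly why the update region comes out square. A tiny illustration (the names are mine):

```python
def chebyshev(ax, ay, bx, by):
    """Chebyshev distance: the larger of the two axis deltas."""
    return max(abs(ax - bx), abs(ay - by))

# Cells at the same Chebyshev distance form a square ring; the eight
# immediate neighbours of a cell are all at distance 1.
ring = [(x, y) for x in range(-2, 3) for y in range(-2, 3)
        if chebyshev(x, y, 0, 0) == 1]
```

It also happens to be cheaper than either the Euclidean or squared-Euclidean key: two subtractions, two abs calls, one max.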
