A bit of real progress! Profiling revealed that, anomalously, most of the processor time was being taken up by one particular lambda. I've mostly avoided lambdas in the code, partly because I knew the various machine code compilers don't tend to optimize them well. It didn't help at all that this lambda was the key for the coordinate sort function: it was being called millions of times over the course of a short run of the platform engine.
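For anyone wanting to spot the same kind of hotspot, here is a rough sketch of how a sort-key lambda shows up in a profile. It assumes Python and the standard `cProfile`/`pstats` modules. The names `coords`, `sort_coords` and `run_frames` are made up for illustration and are not the engine's real code.

```python
import cProfile
import pstats

# Hypothetical stand-in for the engine's coordinate data.
coords = [(x % 17, x % 5) for x in range(1000)]

def sort_coords(points):
    # The key lambda runs once per element on every sort call.
    return sorted(points, key=lambda p: (p[1], p[0]))

def run_frames(n_frames):
    # Pretend each frame re-sorts the coordinates once.
    for _ in range(n_frames):
        sort_coords(coords)

profiler = cProfile.Profile()
profiler.enable()
run_frames(50)
profiler.disable()

stats = pstats.Stats(profiler)
# Show the five entries with the most internal time; lambdas are listed as "<lambda>".
stats.sort_stats("tottime").print_stats(5)

# Total number of calls recorded for lambda entries.
# Each value in stats.stats is (primitive calls, total calls, tottime, cumtime, callers).
lambda_calls = sum(
    entry[1]
    for (filename, lineno, name), entry in stats.stats.items()
    if name == "<lambda>"
)
print("lambda calls:", lambda_calls)
```

Even this toy run records one key call per element per sort, so it is easy to see how a real run reaches millions of calls.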
I replaced it with a class method, and the result is a good 4-5 fps increase. I think further optimization is needed, but it's a solid improvement!
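The swap looks roughly like the sketch below, assuming Python. The `Sprite` class and its `sort_key` method are hypothetical names, not the engine's actual ones. Both forms give the same ordering, so any speed difference comes down to how the interpreter or compiler in use handles each one and is worth measuring rather than assuming.

```python
class Sprite:
    """Hypothetical object with a position, standing in for the engine's entities."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def sort_key(self):
        # Same ordering the lambda produced: by y first, then by x.
        return (self.y, self.x)

sprites = [Sprite(3, 1), Sprite(1, 2), Sprite(2, 1)]

# Before: an anonymous key function created inline.
by_lambda = sorted(sprites, key=lambda s: (s.y, s.x))

# After: the method is passed as the key and called with each sprite as self.
by_method = sorted(sprites, key=Sprite.sort_key)

print([(s.x, s.y) for s in by_method])
```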