Friday, July 31, 2009

Redlining the processor

Jetblade's written in Python. Python's a wonderful language: very easy to develop in, highly expressive without lacking power. It's not particularly fast, though. And while it doesn't matter so much if map generation takes a minute vs. 30 seconds to run, it does matter if you can only have 12 active creatures running around before your FPS starts to stumble. So this week was spent fixing things up a bit.

One of the great mottos of software development is to never optimize prematurely. Worrying about how fast your code runs when you write it for the first time will have you wasting a lot of time, probably without making any significant gains. You might spend hours optimizing your file load/save routines, only to discover that hey, you only call those once per session anyway (big hint to budding game developers: don't worry about binary file formats! It's not worth the hassle! Just use plaintext). Of course, you don't have carte blanche to just throw together any old code that gets the job done -- you want to keep your Big O at a reasonable level (as a general rule, O(n*log(n)) should be your limit). But you don't need to worry about the constant factors until you know they're causing you difficulty. And then you haul out a profiler and take a look. Don't guess which parts of your code are slow; have the profiler tell you. In Jetblade's case, the profiler is CProfile.

At the beginning of the week, the major time sucker was the Vector2D class, which does some handy vector math but is mainly there because it makes the code so much cleaner in comparison to using tuples and lists for 2D coordinates. It's such a simple class than there weren't any algorithmic problems with it, nor any obvious constant factors to get rid of. So I ditched Python.

The Vector2D module (and Range1D, and Polygon) are now written in Cython instead of Python. Cython is basically Python with static types and a little more rigour. It takes almost-Python code and compiles it into C, which is much faster. Just converting Vector2D over to Cython made mapgen 35% faster. Of course, now anyone who wants to use Jetblade needs to have Cython installed, and Cython in turn needs GCC. I'm not really happy about adding more dependencies, but faced with the clear improvements in speed, it's hard to argue. The alternative (writing the modules manually in C) would still have the GCC dependency but would also make development far more painful.

One odd bit of optimizing was discovering that a lot of time was being spent in, which is a file that, as far as I can tell, doesn't exist on my hard drive. A little work with the profiler revealed that was being called by Python's built-in range function, because I was calling it with floating-point numbers instead of the ints that it expected. And I was doing this for every creature in the game, every frame. Each time, it'd call, which would then decide whether or not to print a warning. Fixing that up got me another noticeable boost.

Another thing to check for is unnecessary calculations. Why recalculate a value if it doesn't change? For example, every Block instance is a fixed bit of terrain. I need to have bounding boxes for them, but those bounding boxes don't change because the block doesn't move. So those bounding boxes can be calculated ahead of time and stored.

Finally, of course, if you can avoid calling code at all, then you won't pay the price for it. For example, if I discover that a creature has run smack into one tile of a wall, then I don't need to check any of the tiles that are in the same column as the tile the creature hit. There's either going to be no collision, or they're going to have exactly the same collision data. Skipping the somewhat-expensive collision detection algorithm even a small amount can give noticeable speedups.

At the end of all of this, Jetblade now can handle upwards of 50-60 creatures at the same time without dipping below 30FPS, which is a reasonable framerate for playing games; not great, but decent. And that's where we're going to leave it for now, since the next step would probably be to farm out collision detection to a third-party physics engine like Box2D or Chipmunk. These pure-C physics engines would be nice and fast, but they'd introduce another dependency and require reworking Jetblade's object model. So, until we find that we need even more speed, that's getting put off.


  1. Are you also putting a range limit on critter sim? After all, if you can get away with it, there's no point doing anything with them if they're off-screen...

  2. I will be, yes. The current plan is to have an active region that's, oh, about six screens long on the diagonal; everything in that region gets simulated, but if they leave that region then they get deleted (and re-created when their spawning site gets close enough to the player again).

    Limiting the active area to immediately around the screen breaks immersion a bit, since creatures that left the screen thirty seconds ago will be right where you left them as soon as you scroll the screen a bit. The proper settings to preserve both performance and immersion will doubtless require some tweaking.

  3. Why not have doors a la metroid or rooms like from castlevania? Too much work to add them to map gen. or is it a goal to NOT have such things?

  4. It's partially too much work at mapgen, partially that I'm worried it would cut off options for making large map features (for example, having a large "room" that is actually composed of multiple closely-laid tunnels), and partially because I want to see how the flow of the game works out if you aren't interrupted by screen transitions on a regular basis. Metroid and Castlevania have rooms in large part because they're heavily resource-constrained (the SNES had a 21MHz processor and 128KB of RAM!) and the rooms provide a clean way to drop old game data and load new game data.