Fastdemo isn't very fast
Okay this is something I've been investigating recently and I'm not sure how well I'm going to be able to explain it.
Firstly I assume the reader is aware that Doom can record demos. Player input is recorded to a file, which can be replayed by reading the recorded input from the file and relying on a deterministic random number generator to make the monsters do exactly the same thing they did when the demo was recorded.
A bit of terminology:
- frame, view - "rendering the frame" or "drawing the view" just means drawing what the player is looking at.
- tic (the source spells it without a k, so I do too) - this means both an update of the game world and the nominal length of time between updates of the game world (1/35 of a second, about 28.6ms). I usually say gametic for an update of the game world, if it needs clarifying.
The reader should also know that in Doom there were two ways to play back a demo:
- -playdemo, which does just that: it plays back a demo as if the player were actually playing at the computer, rendering the view and then running however many tics it needs to "catch up".
- -timedemo, which is a timing/profiling option that repeatedly renders the frame and then runs exactly one tic - one gametic, one world update - without any of the adaptiveness the game usually uses. This is also referred to internally as singletics.
Boom added a third option.
- -fastdemo plays the demo "as fast as possible".
Okay now I've got through over 200 words of useless preamble I can get onto writing about what I wanted to start with.
Fastdemo's relationship to playdemo and timedemo
Fastdemo works by using TryRunTics, the usual adaptiveness loop - render a frame, see how many tics have passed, run as many gametics as needed to catch up - but with a clock that runs very quickly. So the theory is, you're rendering the frame, then running a lot of gametics. Hence "as fast as possible".
But, here's the thing. How many tics? This is what I stumbled upon the other week. It turns out it's a constant - in both rboom and PrBoom, it's three. Three tics. You render the frame, then run three tics, then go immediately back to rendering the frame.
An astute reader will at this point realise that's basically the timedemo/singletics loop, but with three tics being run instead of one. And indeed a fastdemo's framerate is typically almost three times the framerate of the corresponding timedemo! (It's only almost three because of some overhead somewhere; I never bothered to work out exactly what it was.)
Another thing I noticed, which prompted this investigation, was that PrBoom-Plus ran fastdemos noticeably quicker than rboom and PrBoom. I know PrBoom-Plus has been optimised a lot, but not that much. Well, it turns out PrBoom-Plus is running exactly seven tics per frame rendered.
The fastdemo clock
Why three? Why seven? To answer this I have to go back to the fastdemo clock. The crazy thing about the fastdemo clock is that what causes it to tick is the act of reading it. It's like some quantum mechanical nightmare - the act of observation changes the thing observed.
Stupid jokes aside, the point is that when you read the clock, it returns one more than the last time you read the clock. And clock time is what you use to determine how many gametics to run to catch up with the renderer.
So, in a sentence: it's running as many gametics per frame rendered, as there are calls to read the clock in each game loop iteration. And guess what, in PrBoom and rboom, there were three calls to I_GetTime (the name of the clock-reading function) per loop.
Indeed, it turns out the speed (frame rate) of a fastdemo is proportional to the speed of a timedemo multiplied by the number of calls to read the current time per game loop, up to a limit. The limit is determined by the size of the "backup tics" buffer: if the game gets too far behind on a slow computer, input is discarded. It seems the maximum, which PrBoom-Plus has reached, is seven.
Incidentally, fastdemos in PrBoom run faster if you use the 'idrate' cheat code to measure the frame rate while a fastdemo is running, because that adds one extra call to I_GetTime per loop.
As fast as possible
Let's return to the question: is fastdemo, as the documentation says, "as fast as possible"? Well, no, else I wouldn't be writing all this rubbish.
Firstly, what is as fast as possible? Recall that a timedemo is drawing the screen then running one gametic in a tight loop. Therefore if you turn off the rendering, all you're doing is running tics in a tight loop. So this, the so-called renderless timedemo, is as fast as possible. Of course, you're not doing any drawing, but at least that gives us something to compare to.
Let's pull some numbers out of the air. On my current computer, with my usual game settings, a typical doom2.exe demo (such as the ones in the iwad) runs a timedemo at about 5 times normal speed (about 175 frames per second, or 5.7ms per frame). Of that, nearly all of it is rendering; the gametic takes a fraction of a millisecond to run (gametics run really fast when you don't have thousands of monsters or masses of ridiculously complex and badly-programmed Boom features to worry about).
Therefore when running a fastdemo you're drawing a frame, then running 3 tics (a fraction of a millisecond, three times), then going straight back to rendering.
In conclusion, it's spending the vast majority of its time rendering frames, then running a small, fixed, and very much arbitrary number of gametics for each frame. That's what bothers me - the arbitrariness of the ratio. Why three? Why seven? Why not a hundred? Or a thousand?
Intuitive fastdemo speed
So what should happen? As I said the fastest we can possibly go is a renderless timedemo, running tics in a tight loop without drawing anything at all. But you actually want to draw something so you can see the demo progressing, right? So how often do you stop running tics at a rate of several per millisecond, and pause for an order of magnitude longer, to draw the view?
The answer suggested itself immediately: for however long you spend drawing the view, you should spend the same amount of time running gametics. This makes the number of gametics per frame rather variable, but keeps the two tasks "balanced". It also means a fastdemo should be about half the speed of a renderless timedemo.
So, that's what I ended up rewriting it to do. It runs 30ns7155.lmp (a 32-map Nightmare skill speedrun of Doom 2, getting 100% secrets, in just under 72 minutes) in about 30 seconds, with the number of gametics per frame drawn varying wildly around a hundred or so; a renderless timedemo of the same demo takes about 12 seconds. This is reasonable (there is obviously some more overhead somewhere) and I was satisfied.
If you got this far, congratulations~