Analysing Performance Problems with Xcode Performance Tools, Part 1

This is the first post in a two-part series. Stay tuned for the second part.

One of the projects that we’re most excited about is our ongoing work to bring Night in the Woods to iOS. This has been an enormous amount of technical effort, and has led to a bunch of very cool spin-off projects – for example, Yarn Spinner only exists because we built it for Night in the Woods, and the research we’ve been doing in sprite compression to fit the game into a tiny amount of memory has been a blast.

A screenshot of Night in the Woods.
Night in the Woods.

One of the things we’re doing right now is improving the performance of the game on the device. Just like when you’re building a game for PCs or consoles, you want your game to be running at a high, stable frame rate. For mobile devices, this is a little more complex – you also have to consider the impact that the game has on the phone’s battery, because nobody wants to play a game and then find that their phone’s about to die.

Another important element of performance for mobile games is thermal impact. When the phone’s running a game, it’s making the hardware do a lot of work, and this makes it heat up. Modern systems – that is, anything made in the last 40 years – detect when they’re under thermal stress, and reduce their performance to avoid damage. This means that you might be able to achieve a solid 60 frames per second when you start playing, but that might get laggy in a couple minutes (or seconds!) of play. (Thermal stress is also a consideration for PCs and consoles, but it’s more critical for phones due to the reduced amount of space inside the chassis, the lack of active cooling, and the fact that you’re holding it in your hands.)

Spotting the Problem

The first thing that we noticed was that the reported frames per second in the game was low.

(We use an FPS counter called Graphy to visualise FPS. Graphy’s great – it’s easy to set up, reliable, low-impact, and open source. You should use it!)

The FPS counter was reporting an FPS count of about 20 to 40 frames per second on my iPhone X, depending on the scene. The other thing that was spotted was that the frame rate was inconsistent to boot.

Inconsistent frame rates in the Towne Centre East scene. Note the slightly jerky camera movement.

What was going on? Night in the Woods, despite its gorgeous visual style, doesn’t actually have hugely complex scenes. When diagnosing a problem like this, the first thing to do is to figure out where the bottleneck is – on the CPU, or the GPU.

Finding the Bottleneck

Xcode has a very simple tool for checking which part of the system is under the heaviest load. When running the game from Xcode, you can just click on the Debug navigator, which will show you a summary of the app’s performance.

A screenshot of Xcode's debug view. It shows the CPU at 79%, memory usage at 859MB, energy usage as Very High, disk usage at 80 KB/s, network usage at 2.9MB/s, and a frames per second count of 37.

The CPU usage and energy impact here are pretty high. Not enormously high, considering it’s a game, but still high. As the game itself reports, it’s barely achieving 30FPS frame rate. When we select the FPS element, we get some more detail:

A screenshot of the GPU load level. It shows the tiler's load at 7%, renderer at 86%, device utilisation at 100%, and CPU and GPU both taking 28.4 milliseconds to render.

The FPS report is showing that the device is 100% utilised, and that the CPU and GPU are both taking a full 28 milliseconds to produce the frames. The renderer is under significantly more load than the tiler, which effectively means that most of the work is being done by the fragment shaders, and not by having to deal with a lot of geometry (something that we’d expect, given that the polygon count of Night in the Woods, like most other 2D games, is not terribly high.)

Even though the CPU and GPU times were the same, this doesn’t necessarily mean that they’re under the same degree of load. Switching over to the CPU report showed that the CPU wasn’t really at maximum load (though this can be deceiving, since it may have been the case that a single core was at max load.)

The CPU load indicator, showing 87%.

The analysis that we’d seen so far seemed to suggest that the GPU might be the best place to look at, so we pulled out one of the two biggest and baddest tools in the chest: Instruments.

The application icon for Instruments.
.

Instruments can show a huge amount of data about the performance of an app. Happily, recent versions of the app come with the Game Performance template, which sets up a number of recorders that relate to how a game performs.

The instruments template selector. The "game performance" template is selected.

So, we ran our scene while recording data, and got back a simply enormous amount of data. It’s actually rather pretty.

There’s a lot of charts here, but the key thing to look at is at the bottom of the window, where the GPU state and display information is shown. Let’s zoom in on that.

A close-up view showing performance data from Instruments.

The purple bar at the top shows when the GPU was busy doing something. The bar is entirely filled, which means that there was no point in the run when the GPU was allowed to be idle. That’s bad, because when the GPU is active, it’s pulling energy out of the battery, and heating up the phone. We already knew this, because Xcode’s report was showing us that the device utilisation was 100%, but it’s nice to see this in a little more detail.

Where we start to see more useful information is in the ‘Display’ instrument. The Display instrument shows four charts:

  1. The current frame shown on the screen
  2. When a new frame was delivered to the screen, ready to be shown
  3. When the screen vsync’ed (that is, when it attempted to swap to the next available frame)
  4. Any frame stutters that Instruments found. There aren’t any in this screenshot, but that doesn’t mean that the display isn’t stuttering.

Take a close look at the top row of the Displays instrument. It’s showing the duration of each frame on screen, and they’re not all the same. Some are 33 milliseconds, some are 16. That’s what’s causing the janky appearance.

What’s interesting about this is that the frames are taking the same amount of time to be produced! We know this because the second row, labelled ‘scaler’, shows that each frame is being submitted to the GPU after about the same amount of time. The reason why some frames list for 33 milliseconds and some last for 16 is due to the fixed rate at which the display is swapping to the next available frame.

Understanding Frame Judder

To figure this out, let’s look at the timing information for four frames.

A close-in screenshot of Instruments, showing the timing of four frames, coloured blue, green, orange and blue again.

Here’s what’s happening at each of these four steps.

  1. The blue frame is on screen. The next frame, shown in green, is submitted to the display.
  2. The display hits its next vsync point. The green frame is ready, so it’s shown to the user.
  3. The orange frame is submitted to the screen, but it’s just too late for vsync. The screen therefore has no new frame to show, so it keeps showing the green frame for another sync interval. The orange frame is eventually shown at the next vsync.
  4. Meanwhile, the blue frame was being drawn, and it’s ready to go right away. It’s submitted to the display, and drawn right after that. The orange frame was only on screen for a single sync interval.

The problem here is not that the frames aren’t taking too long to draw. The problem is that the game is trying to send them to the screen at too high a rate. Ironically, in order to make the game smoother, we need to reduce the frame rate.

This was a one line change:

Application.targetFrameRate = 30;

The result was this:

A screenshot of Instruments, showing frame timing data. The timing of the frames is more consistent.

There are two things to note here.

First, the purple area now has gaps in it, indicating periods of time when the GPU was not active. This is a good thing! Less energy being drawn from the battery, and less heat.

Secondly, all of the frames are on screen for a consistent amount of time. They’re never missing a vsync, because the game isn’t trying to get them onto the screen as fast as it can. The game can now focus on operating at a consistent 30 frames per second.

The result is lower power consumption, and a smoother gameplay experience.

We’re Not Done Yet

But this was only part of the problem. As I mentioned earlier, the scenes in Night in the Woods are not incredibly complex; why was it taking so long to produce the frames? To solve that question, we needed to take a much closer look at the internals of how the GPU was drawing each frame.

We’ll look at how we achieved even better results in the next post. In the meantime, I’ll tease you with this:

A screenshot of Instruments, showing significantly less GPU usage.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s