Black Pixel

We are Black Pixel and we make software for your Mac & your iPhone. Software design & development junkies, we eat Cocoa for breakfast.


iPhone Vertex Buffer Object Performance

Another performance enhancement we attempted on Plasma was to follow Apple's Techniques for Working with Vertex Data guidelines, which state:

Each time glDrawElements is called, the data is retransmitted to the graphics hardware to be rendered. If the data did not change, those additional copies are unnecessary. To avoid this, your application should store its geometry in a vertex buffer object (VBO). Data stored in a vertex buffer object is owned by OpenGL ES and may* be cached by the hardware or driver to improve performance.

Vertex Buffer Performance Comparison

To get a sense of how VBOs might affect performance, I modified the original code samples provided by Aaftab Munshi, Dan Ginsburg, and Dave Shreiner to add an option to use VBOs, and did the same with the ES 1.1 port I'd made.

I expected that the ES 2.0 shader code would show some degree of performance boost, because all of the vertex data is defined once, at startup, and permuted on the GPU based on a few uniforms that are passed to the shaders at the start of the draw pass.

The results of the tests appear below:

Implementation       3G/ES 1.1   3GS/ES 1.1   3GS/ES 2.0
Client-side arrays   46 fps      56 fps       55 fps
Interleaved VBOs     46 fps      56 fps       55 fps
Improvement          None        None         None

In the case of the ES 1.1 VBO code, I'm calling glBufferData with the GL_DYNAMIC_DRAW usage hint, as GL_STREAM_DRAW isn't available under 1.1. Given that this is a particle animation, and the vertex data changes on each drawing pass, one could definitely question the efficacy of using VBOs at all, but these results are also consistent with what we've seen using VBOs on the iPhone for static scene elements.
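For readers who haven't set up an interleaved buffer before, here is a minimal sketch of the layout arithmetic involved. The ParticleVertex struct and both helper functions are hypothetical (the actual vertex format used in the tests isn't shown in this post), and the GL calls appear only in a comment so the snippet compiles without a GL context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved particle vertex: position and color packed
   together so that one buffer serves both attribute arrays. */
typedef struct {
    float         position[3];  /* x, y, z */
    unsigned char color[4];     /* r, g, b, a */
} ParticleVertex;

/* Layout values that the gl*Pointer calls need. */
typedef struct {
    size_t stride;          /* bytes from one vertex to the next */
    size_t position_offset; /* byte offset of position within a vertex */
    size_t color_offset;    /* byte offset of color within a vertex */
} VertexLayout;

VertexLayout particle_vertex_layout(void) {
    VertexLayout layout;
    layout.stride          = sizeof(ParticleVertex);
    layout.position_offset = offsetof(ParticleVertex, position);
    layout.color_offset    = offsetof(ParticleVertex, color);
    return layout;
}

/* Total bytes to hand to glBufferData for `count` vertices. */
size_t particle_buffer_size(size_t count) {
    assert(count > 0);
    return count * sizeof(ParticleVertex);
}

/* With a GL context, the values above would be used roughly like this
   on the ES 1.1 path (`vbo`, `vertices`, and `count` are hypothetical):

     glBindBuffer(GL_ARRAY_BUFFER, vbo);
     glBufferData(GL_ARRAY_BUFFER, particle_buffer_size(count),
                  vertices, GL_DYNAMIC_DRAW);
     glVertexPointer(3, GL_FLOAT, layout.stride,
                     (const GLvoid *)layout.position_offset);
     glColorPointer(4, GL_UNSIGNED_BYTE, layout.stride,
                    (const GLvoid *)layout.color_offset);
*/
```

While a buffer is bound, the last argument to each pointer call is treated as a byte offset into that buffer rather than a client-side address, which is why the offsets are cast to pointers.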

What was very surprising to me is that there was also no benefit from using VBOs with the ES 2.0 shaders, even on the newer PowerVR SGX graphics processor. Evidently the difference in overhead between pushing the vertex data to the GPU on each pass and just accessing the buffered data is negligible. I was also somewhat taken aback by the fact that the ES 2.0 version on the 3GS consistently ran slightly slower than the ES 1.1 version.
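To put a rough number on how much data is actually being pushed, a back-of-the-envelope helper like the one below can be useful. This is only a sketch: the function name is made up, and the vertex count and vertex size in the usage note are assumed values, not figures measured in these tests.

```c
#include <assert.h>

/* Bytes of vertex data submitted per second if every vertex is
   re-sent on every frame. All inputs are caller-supplied assumptions. */
unsigned long vertex_bytes_per_second(unsigned long vertex_count,
                                      unsigned long bytes_per_vertex,
                                      unsigned long frames_per_second) {
    assert(bytes_per_vertex > 0);
    return vertex_count * bytes_per_vertex * frames_per_second;
}
```

For example, assuming 10,000 vertices of 16 bytes each at the 56 fps seen on the 3GS, vertex_bytes_per_second(10000, 16, 56) comes to 8,960,000 bytes, or roughly 9 MB, per second. Whether that is a meaningful cost depends on the device's memory bandwidth, which I haven't measured here.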

Analysis

In retrospect, most of these results actually make sense, given that the iPhone uses a shared memory architecture. The VBOs that are set up in the program are not actually cached directly on the graphics hardware, but are regular blocks of memory, right alongside all of your other application data.

As an outsider, it's hard to say for sure what is going on under the hood, but my speculation is that all of these implementations do nothing more than move pointers to the data buffers around instead of performing block copy operations, whether you are using VBOs or not.

The bottom line

Given the results shown, I'm really unsure exactly why Apple has been banging the vertex buffer drum as stridently as they have, unless they were simply copying the PowerVR technical notes word-for-word. The only real benefit I can see to using VBOs is that it can allow you to use a consistent codebase, if parts of your GL code are also running on platforms that actually cache the buffers on the graphics hardware, or for possible future-proofing for any forthcoming Apple devices in which VBOs actually make a difference.

It's really hard to say if a shared memory implementation is better than having onboard memory for the GPU. Most CPUs enjoy a considerable performance benefit from having a local cache, letting them access cached data directly instead of having to go out over the bus to access it.

-Daniel


*Considering the results of our testing, it might be more appropriate to say "won't".