Coding

I tried rendering to a Pbuffer instead of a regular surface on the device, as proposed in the issue comment, but it did not work for me. For now I will therefore focus on polishing the API to allow operator chaining and on adding broadcast operators. I will also try unrolling the loops, as mentioned in the 2D convolution query.

Update: I was able to render to the Pbuffer, so BBB now supports 32-bit rendering. See the bottom of this post for details.


I was able to implement the 2D convolution with unrolled loops, with some hurdles along the way. Notably, what worked on my host did not work on BBB, so I had to debug the shader code. It turns out that I had to store the sampled texels in variables before using them in calculations. I would like to know why this helped, and why the code works as-is on some machines while on others I have to resort to hacks like this 😅. Maybe in time I will know…
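To make the idea concrete, here is a minimal CPU-side sketch in C of a 3x3 convolution with the loop over the kernel taps unrolled by hand, where every sample is first stored in a named local before it is used. It only mirrors the structure described above and is not the shader from the library: the names `sample` and `conv3x3_unrolled`, the clamp-to-edge addressing and the 3x3 kernel size are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: read one "texel" with clamp-to-edge addressing,
 * standing in for a texture lookup in the shader. */
static float sample(const float *img, int w, int h, int x, int y)
{
    if (x < 0) x = 0;
    if (x >= w) x = w - 1;
    if (y < 0) y = 0;
    if (y >= h) y = h - 1;
    return img[(size_t)y * (size_t)w + (size_t)x];
}

/* 3x3 convolution with the loop over the nine kernel taps unrolled.
 * The kernel is applied without flipping (i.e. as a correlation).
 * The outer loops stand in for the per-fragment invocations on the GPU. */
void conv3x3_unrolled(const float *in, float *out, int w, int h,
                      const float k[9])
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            /* Store every sample in its own variable first ... */
            float t00 = sample(in, w, h, x - 1, y - 1);
            float t01 = sample(in, w, h, x,     y - 1);
            float t02 = sample(in, w, h, x + 1, y - 1);
            float t10 = sample(in, w, h, x - 1, y);
            float t11 = sample(in, w, h, x,     y);
            float t12 = sample(in, w, h, x + 1, y);
            float t20 = sample(in, w, h, x - 1, y + 1);
            float t21 = sample(in, w, h, x,     y + 1);
            float t22 = sample(in, w, h, x + 1, y + 1);

            /* ... and only then combine them with the kernel weights. */
            out[(size_t)y * (size_t)w + (size_t)x] =
                k[0] * t00 + k[1] * t01 + k[2] * t02 +
                k[3] * t10 + k[4] * t11 + k[5] * t12 +
                k[6] * t20 + k[7] * t21 + k[8] * t22;
        }
    }
}
```

With an identity kernel (only `k[4]` set to 1) the output equals the input, which makes a handy sanity check when comparing the host against BBB.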

I also updated the benchmarking post with the newly added 2D convolution calculation.

Chaining operations API

The API discussed so far is the so-called single-shot API: the rendering context is set up for a single operation, and once the operation has been performed the data is read back immediately. The other API allows operation chaining, where data is passed through a series of operations and transferred to the CPU only after it has gone through all of them. This aims to minimize the overhead of GPU <=> CPU transfers.

I implemented broadcast operations for adding, subtracting, multiplying and dividing the input array of floats. The API is designed to allow chaining arbitrary operations, so in the future it could be used for very complex tasks, such as machine learning or deep learning.

In order to process the data without transfers to the CPU, I had to create an additional texture for copying data from previous operations. This texture is bound as the input to the next operation, and the resultant texture is bound to the Framebuffer (as described in the library innards blog post). The data has to be copied from texture to texture using the glCopyTexSubImage2D() function; otherwise it is overwritten when the new operation commences.
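The data flow of a chain can be modelled on the CPU with two plain buffers. The sketch below is only a mock of the idea, not the library's API: `chain`, `chain_begin`, `chain_broadcast`, `chain_end` and `CHAIN_MAX` are hypothetical names, "broadcast" is assumed to mean applying one scalar to every element, and the `memcpy` after each operation merely plays the role that glCopyTexSubImage2D() plays between textures.

```c
#include <stddef.h>
#include <string.h>

enum { CHAIN_MAX = 1024 };          /* hypothetical fixed capacity */

typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } op_kind;

typedef struct {
    float  input[CHAIN_MAX];        /* stands in for the input texture */
    float  output[CHAIN_MAX];       /* stands in for the framebuffer texture */
    size_t n;                       /* number of elements in use */
    int    transfers;               /* simulated CPU <-> GPU transfers */
} chain;

/* "Upload" the data once. Returns 0 on success, -1 if it does not fit. */
int chain_begin(chain *c, const float *data, size_t n)
{
    if (n > CHAIN_MAX) return -1;
    memcpy(c->input, data, n * sizeof(float));
    /* Also seed the output so an empty chain reads back the input. */
    memcpy(c->output, data, n * sizeof(float));
    c->n = n;
    c->transfers = 1;
    return 0;
}

/* Apply one scalar to every element, then feed the result back as input. */
void chain_broadcast(chain *c, op_kind op, float s)
{
    for (size_t i = 0; i < c->n; ++i) {
        float v = c->input[i];
        switch (op) {
        case OP_ADD: v += s; break;
        case OP_SUB: v -= s; break;
        case OP_MUL: v *= s; break;
        case OP_DIV: v /= s; break;
        }
        c->output[i] = v;
    }
    /* Copy the result into the input buffer before the next operation
     * overwrites the output (the role of the texture-to-texture copy). */
    memcpy(c->input, c->output, c->n * sizeof(float));
}

/* "Read back" the final result once, after all operations. */
void chain_end(chain *c, float *dst)
{
    memcpy(dst, c->output, c->n * sizeof(float));
    c->transfers += 1;
}
```

Calling `chain_begin`, then `chain_broadcast` with `OP_ADD` and `OP_MUL`, then `chain_end` leaves `transfers` at 2 no matter how many operations were chained, whereas a single-shot style would pay for an upload and a read-back per operation.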

Rendering to a pixel buffer with 32 bits

I was finally able to render using all 32 bits (ARGB8888) thanks to Omar from Imagination. It turns out that I had to use a pixel buffer and set up some configuration variables. I also changed /etc/powervr.ini to set DefaultPixelFormat=ARGB8888. This way I am using 32-bit data in the same manner as on my host PC.
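One way to see why the full 32 bits matter: an 8888 pixel has exactly as many bits as a single-precision float, so a value can pass through one pixel without loss, while any format with fewer bits per pixel cannot offer that. The sketch below is a CPU-only illustration under the assumption that a float is carried as its raw four bytes. The type `pixel8888` and the two helper functions are hypothetical and do not describe how the library actually encodes its data.

```c
#include <stdint.h>
#include <string.h>

/* Compile-time check of the assumption that float is 32 bits wide. */
typedef char float_must_be_32_bits[sizeof(float) == 4 ? 1 : -1];

/* Hypothetical 4-channel, 8-bits-per-channel pixel. */
typedef struct { uint8_t b[4]; } pixel8888;

/* Store the raw bytes of a float in one pixel. */
pixel8888 float_to_pixel(float v)
{
    pixel8888 p;
    memcpy(p.b, &v, sizeof v);
    return p;
}

/* Recover the float from the four channel bytes. */
float pixel_to_float(pixel8888 p)
{
    float v;
    memcpy(&v, p.b, sizeof v);
    return v;
}
```

Because all four bytes are preserved, `pixel_to_float(float_to_pixel(v))` returns the original bit pattern of `v`.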