Man it sounds like you’ve had a rough go. I’m sorry that’s been your experience.
Graphics Programmer here.
More likely you would just write data to a buffer (basically an array of whatever element type you want) rather than a render target, and then read it back to the CPU. DX12, Vulkan, etc. all have APIs to upload/download to/from the GPU quite easily, and CUDA makes it even easier, so a simple compute shader or CUDA kernel that writes to a buffer would make the most sense for general-purpose computation like an Advent of Code problem.
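To make that concrete, here's a minimal CUDA sketch of the "write to a buffer, read it back" pattern. The kernel and names are just for illustration, not from any particular AoC solution:

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread squares its index and writes the result into a plain
// device buffer (no render target involved).
__global__ void squareIndices(int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i * i;
    }
}

int main() {
    const int n = 256;
    int* d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(int));             // allocate GPU buffer

    squareIndices<<<(n + 63) / 64, 64>>>(d_out, n);  // run the kernel

    std::vector<int> h_out(n);
    // Download the buffer back to the CPU.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    printf("out[10] = %d\n", h_out[10]);             // prints 100
    cudaFree(d_out);
    return 0;
}
```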
As a graphics programmer in the games industry, it’s always exciting to me when people discover how much fun all this stuff is.
Also, if you branch on a GPU, the compiler has to reserve enough registers to get through both sides of the branch (handwavey), which means lower occupancy.
Often you have no choice, or removing the branch leaves you with just as much code, so it’s irrelevant. But sometimes it matters. If you know that a particular draw call will always use one side of the branch but not the other, a typical optimization is to compile a separate version of the shader that strips out the unused branch and saves the registers.
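Shader permutations per se are a D3D/HLSL or Vulkan thing, but the same idea shows up in CUDA via template parameters. Here's a rough sketch (names made up) of how baking the branch decision in at compile time lets the compiler drop the dead path entirely instead of budgeting registers for it:

```cuda
#include <cuda_runtime.h>

// USE_FANCY_PATH is resolved at compile time, so each instantiation
// contains only one side of the "branch" and the compiler never has to
// reserve registers for the path that's never taken.
template <bool USE_FANCY_PATH>
__global__ void shade(float* out, const float* in, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (USE_FANCY_PATH) {
        // Expensive variant: more temporaries, more register pressure.
        float a = in[i] * 0.5f;
        float b = __sinf(a) * __cosf(a);
        out[i] = a + b * b;
    } else {
        // Cheap variant.
        out[i] = in[i] * 2.0f;
    }
}

// Host side picks the permutation per "draw call", analogous to choosing
// a pre-compiled shader variant instead of branching inside one shader.
void launchShade(bool fancy, float* d_out, const float* d_in, int n) {
    dim3 block(128), grid((n + 127) / 128);
    if (fancy)
        shade<true><<<grid, block>>>(d_out, d_in, n);
    else
        shade<false><<<grid, block>>>(d_out, d_in, n);
}
```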
Great stuff. I find it really funny that a big selling point of the modern APIs is that applications place barriers themselves instead of leaving it to the driver, and then we all tried it for a while, threw up our hands, and decided to write graph systems to place them automatically because it was too hard.