Here are some tips for filing a good bug report:
Step 1: Is it a vertex or a fragment shader?
If running the program with RADEON_NO_TCL=1 fixes the problem, then a vertex shader is probably broken. If it doesn’t, then it is probably a bad fragment shader.
Step 2: Does running with RADEON_DEBUG=noopt help?
If it does, then the bug is in one of the optimization passes. If it doesn’t help, then your bug is in the main part of the compiler.
Step 3: Collect debug output.
There are 3 debug options that are useful for collecting debug output: FP, VP, and PSTAT. FP dumps debug output for fragment shaders, VP dumps debug output for vertex shaders, and PSTAT dumps statistics for each compiled shader. Here are the debug logs you should attach to a bug:
|Does noopt fix it?
I spent the last 3 weeks working on adding presubtract support to the r300 compiler. It turned out to be quite an undertaking. I had to make some major changes to some core parts of the compiler to get it working. I think it is pretty stable at the moment, but I would like to refactor some of the code and add support for the add and subtract operations before I merge it into master. I would really like to create a sort of optimization framework that makes writing new optimizations a lot easier, so that will be part of what I do when I add the remaining presubtract operations. I’ll probably pick this up again after my GSoC project is finished. For now, I am going to focus on loops again. Right now, I am working on handling breaks and continues for r500 fragment shaders. Once I get that working, I’ll see what I can do about loops in vertex shaders.
For the last week I’ve been trying to get presubtract operations working for the r300 compiler. Presubtract operations are basically “free” instructions that modify source values before they are sent to the ALU. The four presubtract operations for r300 cards are (1 - src0), (src1 + src0), (src1 - src0), and (1 - 2 * src0). At this point the compiler only uses (1 - src0), but now that I have one working, adding the others shouldn’t be too hard. I had to make some major changes to the compiler to get this working, so I am going to let it sit in its own branch (presub branch at http://cgit.freedesktop.org/~tstellar/mesa/) and test it out for a while before I merge it into the master branch.
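To make the four operations concrete, here is a minimal sketch in C that models what the ALU would see after the presubtract stage, and how (1 - src0) lets two instructions collapse into one. The enum values and function names are hypothetical and chosen only for illustration; they are not the identifiers used in the r300 compiler, and the code models the arithmetic only, not the hardware encoding.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the r300 compiler's identifiers. */
enum presub_op {
    PRESUB_INV,   /* 1 - src0     */
    PRESUB_ADD,   /* src1 + src0  */
    PRESUB_SUB,   /* src1 - src0  */
    PRESUB_BIAS   /* 1 - 2 * src0 */
};

/* Value the ALU would receive for a source after the presubtract stage. */
static float presub_apply(enum presub_op op, float src0, float src1)
{
    switch (op) {
    case PRESUB_INV:  return 1.0f - src0;
    case PRESUB_ADD:  return src1 + src0;
    case PRESUB_SUB:  return src1 - src0;
    case PRESUB_BIAS: return 1.0f - 2.0f * src0;
    }
    assert(!"unknown presubtract op");
    return 0.0f;
}

/* Without presubtract: (1 - a) costs its own ALU instruction. */
static float mul_without_presub(float a, float b)
{
    float t = 1.0f - a;   /* instruction 1 */
    return t * b;         /* instruction 2 */
}

/* With presubtract: (1 - a) is applied while the source is read,
 * so only the multiply occupies an ALU slot. */
static float mul_with_presub(float a, float b)
{
    return presub_apply(PRESUB_INV, a, 0.0f) * b;
}
```

Both mul_without_presub and mul_with_presub compute the same value; the difference an optimization pass cares about is that the second form needs one fewer ALU instruction.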
I just pushed commit 3724a2e65f5b3aa6e123889342a3e9c4d05903f5 to the mesa master branch that fixes this bug. I filed this bug 8 months ago as a user without knowing anything about mesa or the r300 driver, and today I fixed it! How cool is that?
A few weeks ago I began working on using the hardware loop capabilities for fragment shaders on R500 cards. My original plan was to use the specialized loop instructions provided by the graphics card, but as it turned out, the documentation for these instructions was a little confusing (or so I thought), and I could never get them to work the way I wanted. So, instead I ended up using JUMP instructions to execute loops the same way you would if you were generating code for a CPU. This is an OK solution, but it makes it very difficult to generate code for loops that have continue or break statements.
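As a rough illustration of the CPU-style approach, the C sketch below writes the same loop twice: once in structured form, and once lowered to a conditional jump out of the loop plus an unconditional jump back to the top. This is an analogy using goto, not r500 shader code, and the function names are made up for the example. It deliberately leaves out break and continue.

```c
/* Structured form: what the shader author writes. */
static int sum_structured(int n)
{
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc += i;
    return acc;
}

/* Lowered form: the loop expressed only with jumps. */
static int sum_lowered(int n)
{
    int acc = 0;
    int i = 0;
loop_top:
    if (!(i < n))
        goto loop_end;   /* conditional JUMP out of the loop */
    acc += i;            /* loop body */
    i++;
    goto loop_top;       /* unconditional JUMP back to the test */
loop_end:
    return acc;
}
```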
After taking a few days off from loops, I decided to give the specialized loop instructions another try. I went back and reviewed the documentation and still it did not make sense to me, so I decided to ask Alex Deucher, who works at AMD, for some clarification on the documentation. As it turns out, the documentation was fine: Alex pointed out a short but very important part of it that I had overlooked. I’ve probably read the documentation one hundred times, but I always missed that one crucial part! Thanks, Alex.
I will start working on hardware loop instructions again soon, but first I am going to take a little detour to fix a bug in the compiler’s instruction scheduler that is preventing me from playing civ4 and causing problems with Compiz for some people.
I have been making good progress on implementing loop emulation for the r300 compiler. I just published a branch containing loop emulation code here: http://cgit.freedesktop.org/~tstellar/mesa/
For loops like for(i=0; i<10; i++), the compiler is able to figure out how many iterations the loop will have and then unroll it that many times. It can't handle every possible loop, but I think I have the most common ones covered. For the rest of the loops that don't have a known number of iterations at compile time, the compiler will just unroll the loop until it hits the maximum instruction limit.
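The two cases can be sketched in C as follows. This is a simplified model under stated assumptions, not the code in the branch: it only handles loops of the form for(i = init; i < limit; i += step) with a positive constant step, and the function names, the "negative means unknown" convention, and the instruction-budget parameters are all hypothetical. It also does not check whether a known trip count actually fits in the instruction limit.

```c
#include <assert.h>

/* Iterations of: for (i = init; i < limit; i += step), with step > 0.
 * Assumes small values, so the arithmetic below cannot overflow. */
static int trip_count(int init, int limit, int step)
{
    assert(step > 0);
    if (init >= limit)
        return 0;
    return (limit - init + step - 1) / step;   /* ceiling division */
}

/* Number of copies of the loop body to emit.
 * known_trips < 0 means the count is not known at compile time. */
static int unroll_copies(int known_trips, int body_insts,
                         int insts_used, int max_insts)
{
    int budget;

    assert(body_insts > 0);
    if (known_trips >= 0)
        return known_trips;   /* unroll exactly that many times */

    /* Unknown count: emit as many copies as the instruction limit allows. */
    budget = (max_insts - insts_used) / body_insts;
    return budget > 0 ? budget : 0;
}
```

For for(i=0; i<10; i++), trip_count(0, 10, 1) gives 10, so the body is emitted ten times. For a loop with an unknown count, a 4-instruction body, 12 instructions already used, and a limit of 64 (numbers invented for the example), unroll_copies(-1, 4, 12, 64) gives 13 copies.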
I have started working on my Google Summer of Code project, which is to improve the GLSL compiler for the open source r300 driver. Right now, I am working on emulating loops in the compiler backend for cards that don’t have hardware looping instructions.