I just pushed a new branch to my LLVM repo that enables two important LLVM codegen features (machine scheduling and subreg liveness) for SI+ targets, which should improve the performance of the radeonsi driver.
The biggest improvement I’m seeing with this branch is in the LuxMark LuxBall OpenCL demo, which is about 60% faster on my Bonaire. Other tests I’ve done show performance improvements of 10% to 25%. I haven’t done much OpenGL benchmarking, but I expect these changes will have a much bigger impact on the OpenCL benchmarks, so OpenGL improvements may be at the lower end of that range. I still need more benchmark results to know for sure.