The bug was that bitIdx was not being taken into account when we realized that there was enough pixel depth to stay within the current byte when reading pixel values.
This class accepts a pixels with up to 8-bit components. With the way PVRTC is laid out, there are many different bit modes that we could run into. This lets us change between any that we'd like to deal with.
Added comments to the imagefile header. There were method declarations in the file that did not actually correspond to methods either. These were removed.
In the previous commit, we simply accomodated for alpha errors when compressing single color partitions. In fact, the issue was a bit more greivous: we weren't computing the proper error term at all! This fixed that function so that we emphasize the error metrics induced by *squaring* the error in each channel and then returning that as a measurement of the acceptability of using a single color compression for that partition.
When the compressor recognized that a shape was a single color, it determines
an optimal encoding for that color. However, only the error in the single
pixel was returned as the error for the overall shape. This caused problems
with modes that do not support alpha and shapes that do have alpha.
When we detect that a partition has a single color in each subset, we can generate almost an exact representation of this value for most compression modes. However, when we were doing this subset matching, we were ignoring the error introduced by modes that had completely opaque representations against data that had transparent pixels. This bug fix essentially includes this error in our "best fit" calculations and makes everything work out for the better.
With the old code, it was possible that we skipped a compression with unlucky
preemption of our threads. I'm not exactly sure why, but that caused deadlock
(livelock?) in some very unfortunate circumstances. This new algorithm should
work regardless of how many threads execute at once and should also prevent
textures in the compression job list from being skipped. This algorithm seems
to be an improvement on low-core count machines (around 4 cores), but it is
slower on high-core count machines (40 cores or more)...
In general, we want to use this algorithm only with self-contained compression
lists. As such, we've added all of the proper synchronization primitives in
the list object itself. That way, different threads that are working on the
same list will be able to communicate. Ideally, this should eliminate the
number of user-space context switches that happen. Whether or not this is
faster than the other synchronization algorithms that we've tried remains
to be seen...