1. Split compression parameter generation and compression parameter
packing. This gives a good performance boost, since we don't pack every
single time we compress. The error is computed each time, and only the
best parameters are packed.
2. Allow the shape selection function to specify up to ten shapes to
try for compression. We were already doing this kind of hackily where
we allowed both a three and two partition shape. This makes it a little
cleaner and exposes it to the user.
Change RGBACluster to be a class that only really persists once per block.
When we switch shapes and do operations on them, then we really only need
to change which points in the block are accessed. We don't need to do this
very often, so just change the mask whenever we need it. This brings us back
closer to our original performance, but we're still not where we were when
we started refactoring.
We suffered another performance hit. This time it comes from the fact
that we're copying around a lot of data based on what partition we're
choosing. We can get rid of this a tad by only copying the data that we
need once and then using getters/setters that selectively pull from
an array based on our shape index.
Changed the RGBAEndpoints to use the vector/matrix classes in
FasTCBase. This caused a ~20ms performance hit on an 8-core machine
which is likely due to the compiler having difficulty compiling away
some procedure call overheads. Upon profiling, the biggest bottleneck
is still by far the QuantizedError function, so any and all further
optimization should be focused on that.
I'm not completely sure what the best strategy is in this case. Ultimately, it's good
that the format itself carries the block dimensions. It makes a lot of the code somewhat
uglier though, but really the only thing that we're sullying is the succinct ability to
determine what large-scale format it's in (PVRTC vs ASTC instead of 2bpp PVRTC vs 4bpp).
In general, we should really copy the images with the built-in Clone()
function, but then we'd need to manage memory, etc. To avoid that headache,
we can simply just use references.
Most notably, we need to actually fix a bug in MSVC that doesn't know how to properly instantiate
enums in partial template specialization. There are more details outlined here:
http://stackoverflow.com/questions/15466594/why-does-msvc-fail-to-compile-this-template-function
The fix in this commit closes#10
Also in this commit is a hacky way to allow GL defines. Apparently "LoadImage" is defined as a
macro even with WIN32_LEAN_AND_MEAN. This means that we have to #undef the code that includes
it, meaning that we also need to make sure not to actually mix GLDefines.h with any file that needs
to use the functions from Windows.h