FasTC

mirror of https://github.com/yuzu-emu/FasTC.git synced 2024-11-24 06:15:37 +01:00

Author	SHA1	Message	Date
Pavel Krajcevski	9144db4de6	Actually pass block coordinates to shape selection function	2014-03-22 19:25:21 -04:00
Pavel Krajcevski	891e2cfee8	Formatting	2014-03-22 19:24:51 -04:00
Pavel Krajcevski	9f259744de	Get rid of comment	2014-03-21 20:36:54 -04:00
Pavel Krajcevski	e936cce0cb	More refactoring. Change RGBACluster to be a class that only really persists once per block. When we switch shapes and do operations on them, then we really only need to change which points in the block are accessed. We don't need to do this very often, so just change the mask whenever we need it. This brings us back closer to our original performance, but we're still not where we were when we started refactoring.	2014-03-21 20:27:00 -04:00
Pavel Krajcevski	cf937f2ad3	Refactor shape and mode selection. We suffered another performance hit. This time it comes from the fact that we're copying around a lot of data based on what partition we're choosing. We can get rid of this a tad by only copying the data that we need once and then using getters/setters that selectively pull from an array based on our shape index.	2014-03-21 18:02:02 -04:00
Pavel Krajcevski	26e816b3db	Add settings for BPTC compression	2014-03-21 12:45:47 -04:00
Pavel Krajcevski	6954d7b154	Refactor RGBAEndpoints. Changed the RGBAEndpoints to use the vector/matrix classes in FasTCBase. This caused a ~20ms performance hit on an 8-core machine which is likely due to the compiler having difficulty compiling away some procedure call overheads. Upon profiling, the biggest bottleneck is still by far the QuantizedError function, so any and all further optimization should be focused on that.	2014-03-21 01:21:07 -04:00
Pavel Krajcevski	e06f60c536	Fix some compiler warnings.	2014-03-21 01:14:36 -04:00
Pavel Krajcevski	c6948e8421	Merge branch 'master' into ModularizeBPTC	2014-02-27 14:20:50 -05:00
Pavel Krajcevski	1a5b748b2c	Check for C++11 types in base library	2014-01-30 13:55:55 -05:00
Pavel Krajcevski	c37dca1068	Split calculation of compression parameters from packing them.	2014-01-21 16:23:18 -05:00
Pavel Krajcevski	ea953979fe	Move bitstream to FasTC base lib	2014-01-21 15:04:39 -05:00
Pavel Krajcevski	f12ee09f7e	Some formatting and rearrange the BPTC code to be more structured like the others	2014-01-21 14:46:25 -05:00
Pavel Krajcevski	3734d643a6	Fix some compiler warnings on MSVC	2013-12-02 12:52:44 -05:00
Pavel Krajcevski	7359f9e758	Some compilers treat hex literals as unsigned, which causes problems	2013-11-19 14:54:59 -05:00
Pavel Krajcevski	6794a0fffb	Add hooks to NVTT bc7_export library if present on the user's machine. Assumes that all of the cross platform problems are fixed for incorporation into FasTC... Otherwise the options to use NVTT are ignored.	2013-11-19 12:03:03 -05:00
Pavel Krajcevski	a80944901e	Refactor CompressionJob struct. In order to better facilitate the change from block stream order to non-block stream order, a lot of changes were introduced to the way that we feed texture data to the compressors. This data is embodied in the CompressionJob struct. We have made it so that the compression job points to both the in and out pointers for our compressed and uncompressed data. Furthermore, we have made sure that the struct also contains the format that it's compressing for, so that if any threading programs would like to chop up a compression job into smaller chunks based on the format, it doesn't need to know the format explicitly, it just needs to know certain properties about the format. Moreover, the user can now define the start and end pixels from which we would like to compress to. We can compress subsets of data by changing the in and out pointers and the width and height values. The compressors will read data linearly until they reach the out pixels based on the width of the given pixel.	2013-11-08 16:31:19 -05:00
Pavel Krajcevski	f70b26a47f	Change interface of compression/decompression jobs.	2013-11-06 18:55:53 -05:00
Pavel Krajcevski	8e76d149ba	Remove a bunch of code that assumes that we get our pixel data in block stream order...	2013-11-06 18:23:19 -05:00
Pavel Krajcevski	289bcc9d44	Make the block index for the stat function the pointer reinterpreted as an integer. This way we know exactly what block it is because we simply need to sort the stats in the output log.	2013-09-28 22:39:27 -04:00
Pavel Krajcevski	baab69dc99	Fix some MSVC compiler snafus	2013-09-28 22:21:31 -04:00
Pavel Krajcevski	f1924bd221	Try to send a single string that encompasses a stat to the stream so that when we do synchronization it will crunch the entire string at once.	2013-09-28 21:43:25 -04:00
Pavel Krajcevski	dcf389d346	Merge PVRTC compressor into split library.	2013-09-27 17:30:16 -04:00
Pavel Krajcevski	e0ec005ac8	Fix link problems	2013-09-18 14:00:53 -04:00
Pavel Krajcevski	29bd1368e6	Fix a few compiler warnings and add the BPTCEncoder license.	2013-09-15 14:56:09 -04:00
Pavel Krajcevski	28cf254fe5	Initial decoupling of base library from core library. Includes a few formatting changes as well.	2013-09-13 19:36:37 -04:00
Pavel Krajcevski	9fe7a08422	Fix a bunch of errors incurred from refactoring.	2013-08-27 14:39:31 -04:00
Pavel Krajcevski	03a7934644	Get rid of evil tabs once and forever (from cpp/h files)	2013-08-26 16:54:08 -04:00
Pavel Krajcevski	0304bd4187	Refactor a bunch of things to reinforce a bunch of style rules.	2013-08-26 16:11:39 -04:00
Pavel Krajcevski	25eba39870	Change the name of everything to FasTC	2013-08-22 18:35:01 -04:00
Pavel Krajcevski	f1f1294b2e	Add tab formatting.	2013-08-22 18:33:42 -04:00
Pavel Krajcevski	921c3e9f16	Add comments to BC7CompressionMode.h	2013-08-22 18:33:41 -04:00
Pavel Krajcevski	b072d10b6c	Multiply single pixel error by number of pixels in the partition	2013-04-08 17:03:14 -04:00
Pavel Krajcevski	d23125e14c	Another bug fix. In the previous commit, we simply accommodated for alpha errors when compressing single color partitions. In fact, the issue was a bit more grievous: we weren't computing the proper error term at all! This fixed that function so that we emphasize the error metrics induced by squaring the error in each channel and then returning that as a measurement of the acceptability of using a single color compression for that partition.	2013-04-08 16:44:15 -04:00
Pavel Krajcevski	ff18e8f33e	Bug fix. When the compressor recognized that a shape was a single color, it determines an optimal encoding for that color. However, only the error in the single pixel was returned as the error for the overall shape. This caused problems with modes that do not support alpha and shapes that do have alpha.	2013-03-30 11:16:32 -04:00
Pavel Krajcevski	f825b28051	Single color partition with alpha bugfix. When we detect that a partition has a single color in each subset, we can generate almost an exact representation of this value for most compression modes. However, when we were doing this subset matching, we were ignoring the error introduced by modes that had completely opaque representations against data that had transparent pixels. This bug fix essentially includes this error in our "best fit" calculations and makes everything work out for the better.	2013-03-19 11:58:21 -04:00
Pavel Krajcevski	6f6ca2d867	Another bug fix. With the old code, it was possible that we skipped a compression with unlucky preemption of our threads. I'm not exactly sure why, but that caused deadlock (livelock?) in some very unfortunate circumstances. This new algorithm should work regardless of how many threads execute at once and should also prevent textures in the compression job list from being skipped. This algorithm seems to be an improvement on low-core count machines (around 4 cores), but it is slower on high-core count machines (40 cores or more)...	2013-03-11 16:20:52 -04:00
Pavel Krajcevski	9c48aaa7f2	Remove unused ResetTestAndSet function	2013-03-11 15:10:15 -04:00
Pavel Krajcevski	da44e58160	Actual bug fix	2013-03-11 15:08:44 -04:00
Pavel Krajcevski	cd17ddaa0b	Add check for Clang.	2013-03-11 14:51:32 -04:00
Pavel Krajcevski	fa56d37080	Fix a few bugs in our atomic compression algorithm	2013-03-11 14:41:25 -04:00
Pavel Krajcevski	ae2324153d	Repurpose the rest of our scaffolding to use Compression Jobs	2013-03-09 13:36:39 -05:00
Pavel Krajcevski	435f935de3	Update atomics compression algorithm. In general, we want to use this algorithm only with self-contained compression lists. As such, we've added all of the proper synchronization primitives in the list object itself. That way, different threads that are working on the same list will be able to communicate. Ideally, this should eliminate the number of user-space context switches that happen. Whether or not this is faster than the other synchronization algorithms that we've tried remains to be seen...	2013-03-09 13:34:10 -05:00
Pavel Krajcevski	1aa62003b9	Apparently rand() returns zero too. Avoid that.	2013-03-07 02:43:08 -05:00
Pavel Krajcevski	42e75a5e4c	Fix debug image comparison to make sure that the difference in our images takes into account alpha.	2013-03-07 02:35:40 -05:00
Pavel Krajcevski	3d1d1e359f	Actually, it turns out the min/max thing was an MSVC issue.	2013-03-06 20:57:05 -05:00
Pavel Krajcevski	599ded49d1	Remove global scope min/max	2013-03-06 20:38:00 -05:00
Pavel Krajcevski	bacf327246	Fix MSVC compiler errors with the atomics	2013-03-06 19:57:20 -05:00
Pavel Krajcevski	342614a6ec	Fix the horribly wrong check for atomic support with MSVC	2013-03-06 19:56:38 -05:00
Pavel Krajcevski	53fe825e49	Add first pass of atomic implementation. This is a first pass of what I believe to be a not too terrible implementation of a cooperative thread-based compressor. The idea is simple... If a compressor is invoked with the same parameters on multiple threads, then the threads cooperate via an atomic counter to compress the texture. Each thread can take as long as possible until the texture is finished. If a caller calls a compression routine that has different parameters, then it will help the current compression finish before starting on its own compression. In this way, we can split the textures up among the threads and guarantee that we maximize the resource usage between them. I.e. this becomes more efficient: Thread 1: tex0, tex1, .., texN-1; Thread 2: texN, texN+1, .., tex2N; Thread N: tex(N-1)N, tex(N-1)(N+1), .., tex(N-1)N. I have not tested this for bugs, so I'm still not completely convinced that it is deadlock-free although it should be...	2013-03-06 18:47:15 -05:00


98 Commits