the problem is that a median of 3 data points is, in general, far less accurate than an average. here, I’ll RNG(0-100) some numbers for us: two sets of 3 and two sets of 11. 50 is the true answer.
12,64,89 (median = 64, average = 55)
8,78,88 (median = 78, average = 58)
88,13,7,67,66,47,0,60,8,79,45 (median = 47, average = 43.6)
64,58,77,30,99,41,29,12,20,8,73 (median = 41, average = 46.5)
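if anyone wants to check the arithmetic, here's a quick stdlib-only Python sketch using the exact sets above:

```python
# minimal sketch to reproduce the numbers above (stdlib only)
from statistics import mean, median

sets = [
    [12, 64, 89],                                   # 3 samples
    [8, 78, 88],                                    # 3 samples
    [88, 13, 7, 67, 66, 47, 0, 60, 8, 79, 45],      # 11 samples
    [64, 58, 77, 30, 99, 41, 29, 12, 20, 8, 73],    # 11 samples
]

for s in sets:
    # the "true answer" is 50; see how far each estimator lands from it
    print(f"n={len(s):2d}  median={median(s):5.1f}  mean={mean(s):5.1f}")
```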
as you can see, median performs poorly on small data sets, and below about 10 samples it can fail outright. it is very weak to small or “clumpy” datasets, but very good for diffuse, high-sample ones. Say we have all the numbers above as results for a test expected to take around 50 seconds, but one user accidentally had a priority render running in the background, or a broken GPU that was severely undervolted, and their run took an hour.
64,58,77,30,99,41,29,12,20,8,73,88,13,7,67,66,47,0,60,8,79,45,8,78,88,12,64,89,3544 (median = 58, average = 168)
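same check for the combined set, again just stdlib Python:

```python
from statistics import mean, median

# the 28 normal samples from above, plus the one ruined hour-long run
normal = [64, 58, 77, 30, 99, 41, 29, 12, 20, 8, 73,
          88, 13, 7, 67, 66, 47, 0, 60, 8, 79, 45,
          8, 78, 88, 12, 64, 89]
ruined = normal + [3544]

print(f"median={median(ruined)}  mean={mean(ruined):.0f}")      # median=58  mean=168
print(f"apparent speed vs. expected: {50 / mean(ruined):.0%}")  # roughly 30%
```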
notice how much that single bad run (1 sample out of 29) did to the dataset: the average jumps to 168 against an expected ~50 seconds, which would lead people to decide that the card is only about 30% as fast as it really is. (the median, meanwhile, barely moved to 58 — with this many samples it shrugs the outlier off.)
and that’s why we don’t use a median when calculating the cumulative score for only 3 tests. an average would be better, but still would not be indicative of a card that, say, handles raster better, or volumes better.
so add them up.
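to illustrate with completely made-up numbers (these are hypothetical, not pulled from opendata), here's a quick sketch of why a 3-test median can make two very different cards look identical:

```python
from statistics import mean, median

# completely made-up per-test scores, just to illustrate the point --
# not real opendata results
card_a = {"raster": 120, "volumes": 40, "erosion": 50}   # much stronger at raster
card_b = {"raster": 60,  "volumes": 45, "erosion": 50}

for name, scores in (("card_a", card_a), ("card_b", card_b)):
    vals = list(scores.values())
    print(f"{name}: median={median(vals)}  mean={mean(vals):.1f}  sum={sum(vals)}")

# card_a: median=50  mean=70.0  sum=210
# card_b: median=50  mean=51.7  sum=155
# a 3-test median calls these cards identical; the sum (or mean) keeps the overall
# gap, but neither tells you *where* it comes from -- which is why the per-test
# scores matter.
```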
@JohnDow so, TL;DR, we’ve been agreeing with you this whole time that we should make the individual scores more easily available (you can already see, browse, and parse them), but disagreeing that samples per minute means anything concrete at all. After all, my $400 2011 Lenovo laptop outperforms my 2017 workstation in erosion simulation, and only erosion simulation.
The scores, despite being a concrete number, don’t really mean anything beyond a relative metric for comparing the results of 2 sets of uncontrolled variables across a small range of common use cases. they would only mean something if you controlled the variables and ran the test twice on the same system after changing a single component…
and it seems you’ve seen these exact tests. those are not the kinds of tests opendata will give you. you need to have the hardware or a coordinated effort for that.