What were the tests that WinG did to evaluate video cards?


Georg Rottensteiner was curious about the weird things that WinG performed on installation to evaluate video cards. "What did it do actually and what for?"

I don't actually know, since I was not involved in the WinG project, but I remember chatting with one of the developers who was working on video card benchmarks.

He says that video card benchmarks are really hard to develop, not just because video cards are complicated, but also because video drivers cheat like a Mississippi riverboat card sharp on a boat full of blind tourists.

He discovered all sorts of crazy shenanigans. Like a video driver which compares the string you ask it to display with the text "The quick brown fox jumps over the lazy dog." If the string matches exactly, then it returns without drawing anything three quarters of the time. The reason: Benchmarks often use that sample string to evaluate text rendering performance. The driver vendors realized that the fastest code is code that doesn't run, so by ignoring three quarters of the "draw this string" requests, they could improve their text rendering performance numbers fourfold.
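To make the arithmetic concrete, here is a minimal Python sketch of such a check. The class and method names are hypothetical stand-ins for a real driver's text-output entry point, not code from any actual driver. The point is only that honoring one request in four makes a timing loop appear four times faster, and that an exact-match test is defeated by a one-character change.

```python
# Hypothetical model of the cheat described above; not real driver code.
BENCHMARK_STRING = "The quick brown fox jumps over the lazy dog."


class CheatingTextDriver:
    def __init__(self) -> None:
        self.benchmark_requests = 0  # times the benchmark string was requested
        self.strings_drawn = 0       # stand-in for actual rendering work

    def draw_text(self, text: str) -> None:
        if text == BENCHMARK_STRING:
            self.benchmark_requests += 1
            # Only every fourth request for the benchmark string is honored.
            if self.benchmark_requests % 4 != 0:
                return
        self.strings_drawn += 1


driver = CheatingTextDriver()
for _ in range(1000):
    driver.draw_text(BENCHMARK_STRING)
print(driver.strings_drawn)  # 250: a quarter of the work, so about four times the apparent speed
```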

That was the only one of the sneaky tricks I remember from that conversation. (I didn't realize there was going to be a quiz 17 years later or I'd have taken notes.) Another example of benchmark cheating was a driver which checked if the program name was TUNNEL.EXE and if so, enabled a collection of benchmark-specific optimizations.

Anyway, I suspect that the weird things that the WinG installer did were specifically chosen to be things that no video card driver had figured out a way to cheat, at least at the time he wrote the test. I wouldn't be surprised if fifteen seconds after WinG was released, video driver vendors started studying it to see how they could cheat the WinG benchmark...

Comments (46)
  1. John says:

    Never heard the term "card sharp" before; must be a regional thing.

  2. Motti says:

    @John, it's very common among C-Shark developers.

  3. TarjetaTiberon says:

    @John – "Card Sharp" is the correct term for what most people think of as "card shark".

  4. Boris says:

    Motti, "sharp" and "shark" mean different things.  For the former, see dictionary.reference.com/…/sharp definition 36.

  5. Another solution might be to cut the benchmark scores in half each time a vendor is caught cheating the test.

    They'd at least have to get more creative and invent plausibly deniable heuristics and shortcuts. :)

    [You'd then get all sorts of appeals saying "Hey, we're not cheating. You just found a bug in our renderer. I'm going to sue Microsoft for abusing its monopoly position!" -Raymond]
  6. Bob says:

    WinG had many methods for drawing to the screen, each one named after an item on the Denny's late-night menu (Super Bird, Grand Slam, Moons Over My Hammy, …). The Profile phase tried them all to determine which method worked fastest, and did not crash, on the current display driver.
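A profiling phase like the one described amounts to "time every candidate, discard the ones that fail, keep the fastest." The Python sketch below models that under stated assumptions: the method names are hypothetical placeholders rather than WinG's real ones, and a method that "crashes" is modeled as one that raises an exception, which is far gentler than a real display driver crash.

```python
import time
from typing import Callable, Dict, Optional


def pick_fastest_method(methods: Dict[str, Callable[[], None]],
                        trials: int = 3) -> Optional[str]:
    """Return the name of the fastest candidate that ran without failing."""
    best_name: Optional[str] = None
    best_time = float("inf")
    for name, blit in methods.items():
        try:
            start = time.perf_counter()
            for _ in range(trials):
                blit()
            elapsed = time.perf_counter() - start
        except Exception:
            # A method that fails on this driver is disqualified, not fatal.
            continue
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name


def method_a() -> None:
    raise RuntimeError("this path misbehaves on the current driver")


def method_b() -> None:
    time.sleep(0.01)  # stands in for a slow copy


def method_c() -> None:
    pass  # stands in for a fast copy


print(pick_fastest_method({"method_a": method_a, "method_b": method_b, "method_c": method_c}))
```

Running each candidate several times smooths out timer noise; a real profiler would also want to draw to a representative surface rather than call empty functions.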

  7. Dylan says:

    Even more fun than cheating the score is cheating on a heavy benchmark to hide that your chip overheats at full power.

    en.expreview.com/…/ati-officially-optimize-catalyst-for-furmark-making-it-run-slower

  8. Ñop says:

    Can't you sue them for this? It wouldn't be cheap or fast, but MS does have a buttload of money, and maybe you'd manage to scare them straight.

  9. John says:

    This is why you should rename the executables of all your games to quack3.exe.

  10. Kzinti says:

    I remember writing specific code paths in a D3D driver (I won't say which company) to optimize tunnel.exe. Ah the memories!

  11. Maurits says:

    it wonders what would induce a blind person to play cards for money…

  12. I remember lots of those programs under the Microsoft Home brand used WinG. My, were those products exciting back in the good old days.

  13. Randomise the test. Generate a new EXE name every time. Don't use predefined strings, just generate some random ones.

    [Now you're just playing Walls and Ladders. The driver will just switch to some other detector, like the EXE size in bytes or the linker timestamp or the address of the GUID used to create the surface. And by letting them base their cheat on something easily testable by a non-expert, it's also easier to expose. (If the driver didn't base the cheat on the file TUNNEL.EXE, but rather on, say, the address of the GUID used to create the surface, how would you expose the cheat? "Here, run this alternate version of the benchmark. Trust me, it's identical except for one thing.") -Raymond]
  14. Kzinti says:

    If we didn't have access to strings or executable names, we would actually analyse the usage pattern and enable fast paths. This was in 1996. I can't imagine what they do nowadays.

  15. JM says:

    @Maurits: as long as everyone's cheating, the blind players could get pretty far with marked cards.

  16. Kzinti says:

    Basically anything goes to get higher benchmark scores. As long as the reviewers can't tell the difference, you do what you have to do.

  17. Joshua Ganes says:

    @Kzinti – You write as someone who has pulled these "dirty tricks" to game benchmark tests. I'm curious if you had any reservations about doing it then, or if you had any regrets about it after the fact. If not, how do you justify it?

  18. Kzinti says:

    I didn't have any reservations about this then and still don't. At the time, benchmarking GPUs was a new thing and the benchmarks were poorly written / were not representative of actual games. To give you an idea, some benchmarks were using D3DRM (Retained Mode) if you know what that is… Let's just say it measured your CPU / memory bandwidth more than your GPU.

    Tunnel.exe was a fillrate test IIRC and there was another one that was a triangle throughput test. These ended up being used as benchmarks because the rest weren't that good (see above). In any case it provided another data point. Optimizing fillrate and throughput is always a good idea. We would also optimize code paths for popular games. We would implement tricks like replacing black (or near black) fog with color modulation and things like that. Of course, that didn't change the image much (if at all). We got away with it because there was no such thing as a conformance test at the time.

    Just to be clear: optimizing for tunnel.exe simply meant that you made sure the triangle setup code path was shortest for tunnel. You also spent a lot of time hand-tuning the assembly code for it. Other code paths might end up using the generic "C" path.

    In the end I don't recall us ever doing dishonest things like not rendering some triangles or skipping frames or rendering features. But we would actively get our hands on pre-release versions of important games and benchmarks and make sure we would look as fast as we could.

  19. Kzinti says:

    Here is a specific example using tunnel:

    • You profile what the app sends you and notice that 70% of the triangles are back-facing. So you want a code path for it that rejects them as early as possible.

    • Then you notice that most triangles cover only 1 pixel on the screen. So you write a specific test that detects these triangles and, instead of setting up a whole triangle, just copies the texel where it needs to be.

    • Then you notice the fog color used is pure black. It's cheaper to not use the fog hardware (blending) and instead modulate the vertex colors by black. This is done on the CPU.

    • Then you notice tunnel.exe only uses one texture and keeps setting it every frame. You decide to ignore texture change requests after the first one.

    • Optimize the crap out of it using hand-written assembly code.

    Of course you only do all of the above when "tunnel.exe" is the program calling you.

    That's the kind of work I did.
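The steps in that list can be sketched as a driver whose fast paths are gated on the calling program's name. This is a hypothetical Python model, not code from any real driver. It assumes that positive (counter-clockwise) screen-space winding means front-facing and that back-face culling is enabled, so the early-rejected triangles would have been discarded anyway, just later. The fog and hand-tuned-assembly steps are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec2 = Tuple[float, float]


def signed_area2(a: Vec2, b: Vec2, c: Vec2) -> float:
    """Twice the signed area; positive means counter-clockwise winding."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


@dataclass
class SketchDriver:
    process_name: str
    bound_texture: Optional[int] = None
    texture_binds: int = 0   # expensive binds actually performed
    culled: int = 0          # triangles rejected early as back-facing
    single_pixel: int = 0    # triangles handled by writing one texel
    full_setups: int = 0     # triangles sent through full setup

    @property
    def fast_paths(self) -> bool:
        # Hypothetical gate: special paths only for one known program.
        return self.process_name.lower() == "tunnel.exe"

    def set_texture(self, texture_id: int) -> None:
        if self.fast_paths and texture_id == self.bound_texture:
            return  # redundant re-bind of the same texture: skip it
        self.bound_texture = texture_id
        self.texture_binds += 1

    def draw_triangle(self, a: Vec2, b: Vec2, c: Vec2) -> None:
        if self.fast_paths:
            if signed_area2(a, b, c) <= 0:
                self.culled += 1  # back-facing or degenerate
                return
            xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
            # Narrower than one pixel in both axes: covers at most one pixel center.
            if max(xs) - min(xs) < 1 and max(ys) - min(ys) < 1:
                self.single_pixel += 1  # copy one texel instead of full setup
                return
        self.full_setups += 1  # generic path


def run_scene(driver: SketchDriver) -> None:
    for _ in range(3):
        driver.set_texture(7)                                  # same texture every frame
    driver.draw_triangle((0, 0), (0, 10), (10, 0))             # negative winding: back-facing
    driver.draw_triangle((5.1, 5.1), (5.5, 5.1), (5.1, 5.5))   # tiny, front-facing
    driver.draw_triangle((0, 0), (10, 0), (0, 10))             # ordinary, front-facing


tunnel = SketchDriver("TUNNEL.EXE")
run_scene(tunnel)
other = SketchDriver("game.exe")
run_scene(other)
print(tunnel.texture_binds, tunnel.culled, tunnel.single_pixel, tunnel.full_setups)  # 1 1 1 1
print(other.texture_binds, other.culled, other.single_pixel, other.full_setups)      # 3 0 0 3
```

Note that the redundant-bind skip in this sketch is harmless for any program, since re-binding the texture that is already bound changes nothing; it is the name gate that makes it look like a benchmark special case.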

  20. Ivan K says:

    "Even more fun than cheating the score is cheating on a heavy benchmark to hide that your chip overheats at full power."

    If this is true, then it must be fun for laptop designers & manufacturers whose goal is to make their products as light and thin as possible, but still be stable under all expected use (including gaming). If the system's thermal specs change based on what's running, and the standard tests under-report real world conditions, then I see blue screens, early failed hardware, and angry users ahead.

  21. Anonymous Coward says:

    As a programmer, I never thought I'd say this, but man do I ever hate programmers. [expletive deleted] Not only will they do anything for money, no matter what, but afterwards they'll get all high an' mighty, rationalising and pretending to be right and defending the indefensible. What gall. I swear, the same kind of people become hitmen if nature gave them a bit more brawn than brain.

    P.S. Whatever fix the people who operate the blog rolled out… it didn't work; comment posting issues persist.

  22. Andrew says:

    @Anonymous Coward

    Your anger is misplaced; it's almost certainly one of those "I bet somebody got a really nice bonus for that feature" situations. The programmer wants to be honest (and wants to be proud to say, yes, we really are the best), but generally marketing wants to inflate the numbers at any expense. Kzinti may have enjoyed the challenge of analyzing the code for TUNNEL.EXE, but he probably did it at the direction of someone above him anyway.

  23. Georg Rottensteiner says:

    Woo, I'm famous :)

    Thanks Raymond for answering that. I assumed it would be doing all kinds of strange blits to find the best-performing path, but never expected it to trick cheaters.

  24. John Hensley says:

    @Ñop Microsoft can't just sue someone for running sneaky code on its platform. That's not a legal cause of action. It would be thrown out of court and before you know it there'd be some urban legend about Microsoft litigating a video manufacturer into bankruptcy to protect its interests with another manufacturer.

  25. Anonymous Coward says:

    Andrew, you only confirm what I say. In the mean, programmers will do anything for money. The only reason they aren't killers is because they're too wimpy. And they'll put on their best saint voice and tell everyone within earshot that this is a good character trait.

    On the 29th, a bus crashed. Seven people were killed and thirty-eight injured because the chauffeur was sleep-driving. Of course his boss told him to do so, but still the chauffeur is just as guilty as management. Guilt is not an additive property, it needn't total 100%.

    Anyway, I've never been so ashamed in my life. I knew programmers were a bit off, but I didn't realise until now I was a member of a caste of sociopaths.

  26. Joshua Ganes says:

    @JamesJohnston – I have no problem with optimizing for a real-world game. This is in the interests of both the company and their customers. The problem I have is with optimizing for benchmark tests. This artificially inflates the reported benchmark numbers for your product's capabilities without giving any benefit to the customer. Instead, it is misleading and could lead customers to purchase an inferior product based on incorrect measurements.

  27. Kzinti says:

    I'm not sure why the anger is directed at the programmers doing what they've been told to do. That's how the workplace works: marketing gives requirements to your project manager, your project manager tells you what to do, you do what your manager tells you or you look for another job. This isn't any different than any other type of work.

    Do you think other fields / industries are any more "honest" about what they do / their products? How naive.

  28. Kzinti says:

    Also consider this: modern video drivers compile PS and VS shaders to a native format. You can bet they use every optimization they can based on the current context/state and their hardware capabilities. How is that different from analyzing common code paths and optimizing them? It is not.

  29. Kzinti says:

    @Anonymous Coward: "Andrew, you only confirm what I say. In the mean, programmers will do anything for money"

    I wasn't getting more or less money for doing what I was told to do. I was paid a fixed annual salary. The technical challenges were great and I learned a lot.

    "Anyway, I've never been so ashamed in my life. I knew programmers were a bit off, but I didn't realise until now I was a member of a caste of sociopaths."

    Really? Doing your work makes you a sociopath? Aren't we being a bit idealistic here? I guess I should quit programming and start begging for money on the street, lest I become a nuisance to society.

  30. Joshua Ganes says:

    @Kzinti – I'm going to take a page from Raymond's book and try out the bad analogy gun.

    Imagine that a car manufacturer wanted to perform well on safety crash tests. They added a little measuring device to detect the body temperature of the person in the driver's seat. If it was too cool (a crash test dummy) it used hydraulics to change the car's suspension and weight distribution to be ideal for the crash test measurements. This feature changes the traction, steering, fuel economy and other statistics about how the car normally functions, but is only ever used for crash tests.

    Would you feel that this is appropriate from the car manufacturer, or is there something questionable about this behavior?

  31. Kzinti says:

    Like you said yourself, bad analogy. We can go back and forth picking up bad analogies to say anything we want.

    Keep in mind that most optimizations done as a result of analyzing one benchmark or one game ended up benefiting many games. For example, optimizing the driver for the Quake engine would result in many games running faster/smoother. (I made up this example; Quake was using OpenGL at the time.) Optimizations were also done based on profiling the top 5 games at a given time.

    The whole thing is just an optimization process: identify the most important code paths and optimize them. In this case it means the top 5 games and the top benchmarks. Why would you spend time optimizing games and apps that aren't important?

    A huge part of my work was also getting back to developers and helping them optimize their D3D usage. I would of course tell them how to get the most out of our video cards. I wasn't actively trying to screw other vendors (hell, I didn't even know how their hardware worked), but it's quite plausible that I recommended things that wouldn't work well on other vendors' GPUs. Is that wrong? The hardware vendor is a business and it makes money by selling video cards, not by being idealistic.

    I am a bit surprised to get this type of reaction here on Raymond's blog… If anyone values practical matters over idealistic ones, it is Raymond.

  32. @Joshua:  I think it's a bad analogy.  Your change in behavior negatively altered the external, visible functionality of the car – it was not safe any more.  Kzinti's optimizations didn't negatively impact the functionality: the apps still rendered the same output to the screen.  Your analogy seems to better fit vendors who cheat on WHQL, where the driver goes very unstable outside of the WHQL test lab.  That has very negative effects for the user: crashing and incorrect results!

    Kzinti says that he mostly optimized games, which I'm sure is still true today.  If we dump the concept of "just benchmark the hardware, please", and instead think of it as benchmarking the entire ecosystem: hardware PLUS the talents of the driver writers (which is more useful anyway, because games are going to get optimized), then optimizing even the benchmarks makes sense and isn't offensive.  Games will be optimized, so benchmarks should be too.

    That said, some very synthetic benchmarks that measure very specific things shouldn't be optimized to generate an inaccurate benchmark result (but, OK if the benchmark was flawed and optimization was needed to deliver accurate result).  But a big-picture benchmark like 3DMark is fair game for optimizing, in my book.  I don't know about now, but in the past they actually used a real game engine for this benchmark.  The intention of that benchmark is to reflect real-world gaming performance.  So why not optimize it, if real games are being optimized?  Real-world gaming involves optimization, so optimize the benchmarks that are supposed to reflect real-world games, too.

  33. Joshua Ganes says:

    @JamesJohnston – You said, "Kzinti's optimizations didn't negatively impact the functionality: the apps still rendered the same output to the screen"

    If I took the state-of-the-art game from the time of the test and renamed its executable to "Tunnel.exe", would it still work?

  34. @Joshua: Then hope that the optimizer did a better job of fingerprinting the program than just the EXE name.  Which, I understand, was not always the case.  (One nice way to work around this might be to check for the publisher's digital signature/certificate, plus a unique string from the EXE like a version resource.)

    Not that it really matters from a practical standpoint.  We're nitpicking here.  If the game EXE has a relatively unique name, as it usually does, then there won't realistically be problems.  I'm not going to compile MyFancy3DApp to Quake3.exe, for example.  (I've never, ever had a problem with a 3D app not working right because it had the same EXE as some other optimized program.)

  35. Kzinti says:

    @Joshua Ganes: "If I took the state-of-the-art game from the time of the test and renamed its executable to "Tunnel.exe", would it still work?"

    Yes. It would work. Why wouldn't it?

  36. Joshua Ganes says:

    @Kzinti – "Then you notice tunnel.exe only uses one texture and keeps setting it every frame. You decide to ignore texture change request after the first one"

    If the optimization keys off the executable name and the game issues multiple texture change requests, doesn't this change the behavior?

  37. "almost certainly one of those 'I bet somebody got a really nice bonus for that feature' situations"

    "generally marketing wants to inflate the numbers at any expense. Kzinti may have enjoyed the challenge of analyzing the code for TUNNEL.EXE, but he probably did it at the direction of someone above him anyway."

    And why not?  If I had the resources, I would seriously consider it too.  I'm sure he didn't stop with benchmarks, and probably optimized some real-world games, too.  This would have provided real, tangible benefits to their customers who would experience better performance.  As a GPU customer, I would prefer they do this instead of leaving me to suffer with poor performance on a game.

    Ideally, the game developers would do the optimization, but perhaps they didn't due to (1) lack of time, (2) lack of funding, (3) lack of understanding/skills for that particular GPU, (4) not prioritizing performance on a particular GPU, (5) any number of other reasons.  Your competitors might be implementing "cheats" in their drivers to optimize the performance of that game, so why not you?  (Also, from what I understand – a lot of optimization is sometimes GPU specific anyway.  Where should this be handled?  App-specific optimizations in the GPU driver, or GPU-specific optimizations in the app?  I guess it depends on who wants to pay for the development?)  Also, a lot of PC games these days are console ports, where the developers might have focused on console performance first, with PC performance a distant second.  (Both versions of Halo for PC, anyone?)  Let the GPU vendors have at it I say, if the game developer can't be bothered to make it run better on a PC!

    (Certainly with benchmarks, I could see the line could be a little gray – Kzinti noticed it set the same texture every time, so ignored future "set texture" requests.  But maybe the benchmark developer *wanted* to measure this metric?  Good developer communication between GPU companies and benchmark companies is probably needed to establish what optimizations are appropriate and what are not for an application.  I could see that too much optimization might go beyond what optimizations are possible with a game.)

    To me, this is no different from the kind of things Raymond writes every week on his blog, where Windows bends over backwards for compatibility with old apps.  Yes, we all wish the app developers had their heads on straight and cleaned up their act before releasing.  But they didn't, so Windows has to either (1) implement an app-specific hack or (2) leave customers who upgrade Windows and have incompatible apps out to hang.

    I buy my GPUs by looking at reviews that examine numerous benchmarks of various games of the type I like to play.  Assuming both major GPU vendors cheat "equally", I assume the benchmarks for what I buy will be representative of what I should expect in the real world relative to other GPUs.  And if someone "cheats"/optimizes my favorite game – great!!  More frames for me!  GPUs and games are too complex for just comparing raw GPU specs, anyway.

  38. steveg says:

    @Kzinti: Sounds like a fantastically interesting job, very jealous!

  39. Kzinti says:

    @Joshua Ganes "I have no problem with optimizing for a real-world game. This is in the interests of both the company and their customers. The problem I have is with optimizing for benchmark tests."

    What difference do you see between optimizing benchmarks and optimizing real-world games? They are the same thing. Game frame rates *are* benchmarks and are used as such.

    Rest assured I spent a lot more time optimizing games than tunnel.exe. I only used it as a relatively simple example of the optimization work done by driver vendors.

    As JamesJohnston also mentions, every manufacturer does it… If you don't, you are out of the picture. Part of my work was to reverse-engineer the competitors' video drivers.

    Anyone offended by this simply lives in a fairy land with rainbows.

  40. 640k says:

    If a customer is basing a purchase on benchmarks and not real application performance, he/she deserves to be cheated.

    As long as the pixels are rendered to the screen correctly, I accept any "cheat". You can compare this with the optimizations a C++ compiler is *allowed* to do.

  41. Cheong says:

    Indeed. When code from a real game is used as a benchmark, it will be optimized too.

    It seems both the ATI and Nvidia control panels have an option to import application profiles that improve performance for certain games. So I'd say this is a fair point.

  42. Kzinti says:

    Okay, I was just describing high-level ideas without getting into details. You can easily figure out that the app has created more than one texture and therefore you *know* you are not dealing with tunnel.exe. In fact, I think that's what the driver did. As soon as a second texture was created, that optimization was disabled. Looking at the filename of the program calling you was just one piece of information used in the heuristics. Every optimization path could be disabled when we detected it wasn't right to do it.

    Just because you enable a bunch of optimizations for certain apps doesn't mean you can get away with crippling other apps. I don't know where you got that idea.
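The refinement described in that comment, where the executable name only tentatively enables an optimization and contrary evidence switches it off, might look like the minimal sketch below. The "more than one texture" rule follows the comment's description; the class and method names are hypothetical. It also addresses the renamed-executable question: a real game renamed to tunnel.exe would lose the fast path as soon as it created a second texture.

```python
class SingleTextureHeuristic:
    """Tentatively enable a single-texture fast path; revoke it on contrary evidence."""

    def __init__(self, process_name: str) -> None:
        # The executable name is only a hint, not proof.
        self.enabled = process_name.lower() == "tunnel.exe"
        self.textures_created = 0

    def on_create_texture(self) -> None:
        self.textures_created += 1
        if self.textures_created > 1:
            # A second texture contradicts the assumption: disable for good.
            self.enabled = False


renamed_game = SingleTextureHeuristic("tunnel.exe")
print(renamed_game.enabled)  # True
for _ in range(5):
    renamed_game.on_create_texture()
print(renamed_game.enabled)  # False
```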

  43. John Hensley says:

    Microsoft has to accept that manufacturers are going to cheat (i.e. act in their self-interest) and it can try to box them in as much as possible, but this has to happen essentially in the open. You can look up "AARD code" and see what happened when Microsoft went a bit too far trying to prevent reverse engineering of platform tests.

  44. M1EK says:

    D3D drivers were notorious cheats. Everybody did it – in our case (S3 was where I worked at the time), we actually had to cheat to work around our god-awful hardware. (Detect an .exe name at startup, use different settings – this was so actual games would run well enough to be tolerated).

    We (and by this I assume all graphics card companies at the time) were all aware that we'd better not 'cheat' for benchmarks in a way so transparently obvious – so the way this really ended up happening was that the "I didn't find an .exe I recognize" case was the optimized one (that would sometimes result in bad drawing artifacts but be the fastest possible); and the other case (I recognize game X) would result in specifically more correct but possibly slower behavior.
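The inverted scheme described above, where unrecognized programs get the fastest path and recognized games get slower but correct settings, could be expressed as a simple lookup table. This is a minimal sketch; the settings fields and the game entry are hypothetical and do not describe any real driver's profile format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RenderSettings:
    fast_but_risky: bool  # fastest path; may show drawing artifacts
    note: str = ""


# Anything the driver does not recognize (including benchmarks) gets the fastest path.
DEFAULT_SETTINGS = RenderSettings(fast_but_risky=True, note="unrecognized: fastest path")

# Hypothetical per-game entries that trade speed for correct rendering.
KNOWN_GAMES: Dict[str, RenderSettings] = {
    "somegame.exe": RenderSettings(fast_but_risky=False, note="artifacts on fast path"),
}


def settings_for(process_name: str) -> RenderSettings:
    return KNOWN_GAMES.get(process_name.lower(), DEFAULT_SETTINGS)


print(settings_for("benchmark.exe").fast_but_risky)  # True
print(settings_for("SomeGame.EXE").fast_but_risky)   # False
```

The appeal of this arrangement, as the comment describes it, is that renaming a benchmark cannot make it faster, because the benchmark is already on the default fast path.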

  45. Worf says:

    Most drivers for 3D actually monitor the GPU instruction stream and de-optimize the stream when they detect sequences of instructions that can cause thermal problems. The problem is that such sequences are usually only discovered after a program comes out and starts blowing up cards; they are then added to the drivers so the cards don't generate as much heat.

  46. To be fair, it seems drivers nowadays genuinely "optimise" (in the legitimate sense) for specific cases: "Ah, this is Quake, so we need more texture buffers and not so many triangles" or whatever, rather than "ah, this is WidgetMark 3000, I'll just skip drawing every second frame". It sounds like Kzinti was on the legitimate side of that dividing line, tuning as opposed to cheating, if tunnel.exe was benefitting from a single-texture optimisation that could apply to other programs doing the same thing. Of course, changing the executable name to a random string is one of the tricks tools like Sysinternals RootkitRevealer use, for exactly this reason (graphics driver authors may cut corners occasionally, but malware authors have no concern at all for any niceties; you can hardly boycott them for poor-quality code…)

    This reminded me of the clever trick in DirectX mentioned here – blogs.msdn.com/…/71307.aspx – ask the driver a "trick" feature question, "do you support frobnicating nonaligned wodzits?" and if the driver says yes, you know it's a liar so don't fall for any lies in future.
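The trick-question idea can be sketched as follows. The capability name is the comment's own joke, and the query callback is a hypothetical stand-in for whatever capability-reporting call a driver interface exposes; this is an illustrative Python model, not the DirectX mechanism behind the link.

```python
from typing import Callable

# Deliberately nonsensical feature that no honest driver should claim.
BOGUS_CAPABILITY = "frobnicate_nonaligned_wodzits"


class CapabilityChecker:
    def __init__(self, query: Callable[[str], bool]) -> None:
        self._query = query
        # Ask the trick question once, up front.
        self.driver_is_honest = not query(BOGUS_CAPABILITY)

    def supports(self, capability: str) -> bool:
        if not self.driver_is_honest:
            return False  # a driver caught lying once gets no benefit of the doubt
        return self._query(capability)


honest = CapabilityChecker(lambda cap: cap == "alpha_blend")
liar = CapabilityChecker(lambda cap: True)
print(honest.supports("alpha_blend"), liar.supports("alpha_blend"))  # True False
```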

Comments are closed.