Solving a customer problem (by Calvinh)

VFP hosting ActiveX control made with Visual Studio 2003 on Win98

 

The other day, one of our support engineers, Brad Peterson, came into my office and described a problem to me and asked for my help.

 

A customer had an ActiveX control that had been working fine with VFP 6 for years. They upgraded to Visual Studio 2003, rebuilt their control, and all of a sudden the control stopped working on Win98. Visual FoxPro 6 gave the error message “80004005: unspecified error”.

 

I can just imagine the customer’s scenario. Upgrade to the latest and greatest. Test everything. It all works (they don’t use Win98, but their customers do). Deploy their upgrade to their customers. Bingo, it fails for some of them.

 

I ask Brad a few questions:

            Does this problem occur only on Win98 ? Yes

            What happens on WinXP ? Works fine

            What happens with Visual Studio 7.0 ? Works fine

            Is there *any* way we can avoid trying on Win98 ? No <sigh>

 

 

I groan because I know that the only way to figure out the problem is to debug 6 year old code on a 6 year old OS, and I’d probably need to use recent tools that would have a hard time in a 6 year old environment. Some may think that 6 years isn’t very long, but I have a 7 year old daughter Wendy. 6 years is a *very* long time<g>.

 

At home, I have a Win98 machine that isn’t even on my home wireless network because it’s so old. It’s just dedicated to playing and recording my piano. I haven’t installed new software or changed configuration on this machine in several years. No software patches needed: it’s not on the network, so nobody can attack it.

 

The piano is a Steinway with a PianoDisc and a USB interface to Win98 running Cakewalk. It shows the notes highlighted on sheet music on the screen as the piano is played by me or the PianoDisc. I can also download MIDI music from the web and transfer it via floppy disc. (Remember, it’s not on my network.) Good thing downloaded MIDI files are *much* smaller than WAV or MP3 files (30 Scott Joplin rags zip to 128K. That’ll fit on one of my original IBM PC 1981 floppy disks). I can compose (if you can call it that<g>) on the keyboard of the computer or the piano. I shudder to think what musical geniuses like Mozart or Beethoven could have done with such a tool. I record my daughter’s piano playing and can show it on sheet music.

 

Of course, I don’t have a Win98 machine sitting around at my office, but Brad gladly offered to let me have access to a Virtual PC on his main machine, which of course, is on a different campus. More reason to groan: I have to resort to using remote access tools to another machine which hosts a virtual machine hosting Win98. I could just imagine hitting the Single Step key in the debugger and going to the bathroom while waiting for it to step<sigh>. Good thing the drinks are free at Microsoft<g>.

 

Sure enough, there were connection problems. Using Remote Assistance is really great, until you accidentally hit the <esc> key, which immediately disconnects you from controlling the machine. Perhaps a better design would have been to bring up a disconnection confirmation MessageBox or to let the user configure the disconnect key.

 

Of course, I use the keyboard way more than the mouse because a keyboard is a higher bandwidth input device: I cringe when I see people typing with the keyboard, then taking their eyes off the screen, locating the mouse, and playing the eye-hand coordinated video game just to click OK when just hitting the Enter key would do. Using the keyboard, I don’t have to take my eyes off the screen or play that eye-hand game.

 

Speaking of pianos and bandwidth, what human activity requires the highest information output bandwidth ?

  • Playing a Rachmaninoff Prelude on a piano comes to mind. (Think of the brain orchestrating<g> all those messages to those fingers.)
  • Playing ice hockey? (Think of all that motor coordination. Lots of visual input processing too (which one of my teammates is open for a pass?))
  • How about programming a computer ? (nah.. there’s no time pressure<g>)
  • Solving crossword (or other kinds of) puzzles against a clock ? almost all input, little output.
  • Walking and chewing gum at the same time? That’s a lot of brain messages for President Ford<g>
  • Top Gun pilot: lots of input, not that much output compared to the pianist.

 

I’ve done all these things (the Top Gun thing was at a 1995 DevCon Speaker dinner where we actually got to sit in F-14 Tomcat planes at Miramar, although we didn’t actually fly<g>).

 

You might think that I’d use Remote Desktop, which I use quite often, instead of Remote Assistance.

 

In fact, my web server at home is just a 5 year old Dell laptop running WinXP sitting at the very top of a 7 foot tall bookcase, along with a wireless hub and a UPS (in case of power failure) and my DSL. I haven’t looked at that machine’s screen in months (I’m not tall enough<g>): I just use remote desktop to access it.

 

Brad had suggested that I use Remote Desktop, but I said perhaps Remote Assistance would be better because then he could see what I was doing to his main machine and perhaps help out while we were connected by that old technology, the telephone.

 

Anyway, Brad had the Win98 machine configured as a Virtual PC, which is a pretty cool idea: it’s just a bunch of disk space on your machine that can be used as another machine. However, there are a lot of problems getting a Win98 Virtual machine working so that it could be seen on the network. Just imagine all the security patches that are mandatory before connecting to the network. I’m glad Brad took care of configuring that.

 

Using the Virtual PC was pretty easy until I needed to type a backslash (“\”), which is used quite often when using the keyboard and navigating around the various directories. Each time I typed a “\” it came out as a “#”. Apparently it has something to do with the keyboard layout, font, or something, but we never did figure out how to get a real “\”. So I see the “#” and think, that’s not what I want, so I hit <esc> and boom I’m disconnected! Ai yi yi….Brad, can you please grant me access to controlling your main machine yet again? Thanks.

 

Brad had already set up VS 2003 remote debugging on the virtual PC, so we could run VS2003 debugger on the host machine. (Thanks Brad!) Remote debugging works pretty well, although you’re at the mercy of the network speeds.

 

It turns out that the problem could be reproduced without the customer’s source code: even an empty control would show the failure. I started creating an empty control. (This is very simple in Visual Studio: just choose File->New->Project, choose C++, choose the kind of project, give it a name, and accept all the defaults.) Brad said the issue only occurred with MFC based ActiveX controls. (In VC you can make controls using ATL (Active Template Library: my preference), MFC (Microsoft Foundation Classes), or just straight C++. Remember that ActiveX, OLE, and COM are names for sort of the same thing over the years.)

 

So I created a blank MFC control, using VS2003, which uses MFC version 7.1. I copied the files onto the Win98 virtual machine (with great difficulty due to the “#” and <esc> keys spuriously appearing) and tried to register the control.

I wasn’t surprised that registration failed: I used DEPENDS.EXE to figure out what dependency the OCX had, and saw that MFC71D.DLL was required (the d at the end meant Debug version). Bells started to ring. I copied the DLL onto the machine, but by this time, Brad had to leave the office, and he wouldn’t be available until the next afternoon. He said he’d get the DLL registered on the target machine.

 

When we reconnected the next day, it was easy to reproduce the problem. From Visual FoxPro (any version) choose File->New->Form, add an ActiveX control, choose the new control, then try to save the form. Boom. 0x80004005: Unspecified error.

 

(0x80004005: Unspecified Error is an error called E_FAIL defined in winerror.h. The MFC code returns E_FAIL for this exception. So if you ever see this error message from using a COM object in FoxPro, it isn’t an error from FoxPro.)
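
To make that error number a little less mysterious, here is a minimal sketch in C that pulls 0x80004005 apart into the fields an HRESULT packs into 32 bits. The function names and the MY_E_FAIL constant are my own stand-ins so the snippet compiles without the Windows headers; winerror.h has the real equivalents (HRESULT_SEVERITY, HRESULT_FACILITY, HRESULT_CODE).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for E_FAIL from winerror.h. */
#define MY_E_FAIL 0x80004005u

/* Top bit: 1 means failure, 0 means success. */
unsigned hr_severity(uint32_t hr) { return (unsigned)((hr >> 31) & 0x1u); }

/* Facility field; the mask mirrors the one used by the winerror.h macro. */
unsigned hr_facility(uint32_t hr) { return (unsigned)((hr >> 16) & 0x1fffu); }

/* Low 16 bits: the error code within that facility. */
unsigned hr_code(uint32_t hr) { return (unsigned)(hr & 0xffffu); }

/* Print the three fields of an HRESULT on one line. */
void print_hresult(uint32_t hr)
{
    printf("severity=%u facility=%u code=0x%04X\n",
           hr_severity(hr), hr_facility(hr), hr_code(hr));
}

/* Sanity check for the value discussed in the text. */
void check_e_fail(void)
{
    assert(hr_severity(MY_E_FAIL) == 1);   /* it is a failure code */
    assert(hr_facility(MY_E_FAIL) == 0);   /* facility 0: nothing specific */
    assert(hr_code(MY_E_FAIL) == 0x4005);
}
```

Calling print_hresult(MY_E_FAIL) shows a failure bit with facility 0, which fits the "unspecified" in the message: the number itself says nothing about which component gave up.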

 

Funny thing was, I could write this code:

 

ox=CREATEOBJECT("form")

ox.addobject("oc","olecontrol","the control’s progid") && e.g. "comctl.treectrl.1" for the TreeView or "shell.explorer.2" for IE

ox.oc.visible=.t.

ox.show(1)

 

and run it just fine on Win98. This was strange (and is a possible workaround for the customer).

 

Now that I was able to debug it using chopsticks through a garden hose (somehow my sympathy for laparoscopic surgeons grew), I could trace through the failure. I set Visual Studio to stop on exceptions being thrown, and, sure enough, a C++ exception was thrown when the form was being saved.

 

After tracing the code, I saw that the MFC 7.1 code was calling GetFullPathName in a function called AfxFullPath (in vc7\inc\atlmfc\src\mfc\OLESTRM.CPP). On Win98, this function was returning 0, indicating failure. The parameter was a hard coded #define in the fox source code that has been there for more than a decade:

#define SZOLEOBJECTDATASTREAM L"\3OleObjectData"

 

VFP uses the OLE feature Structured Storage, which uses IStorage and IStream. (A quick synopsis: structured storage is analogous to the file system on a disk. A storage is analogous to a folder, and a stream is analogous to a file.) The structured storage is used to save the ActiveX control to a FoxPro table (the form’s scx file). This explains why the 4 lines of code above work: the control isn’t saved to a table anywhere.

 

VFP uses a stream name with a leading 0x3 and that looked very suspicious. The documentation for IStorage::CreateStream says:

 

The name must not exceed 31 characters in length, not including the string terminator. The 000 through 01f characters, serving as the first character of the stream/storage name, are reserved for use by OLE. This is a compound file restriction, not a structured storage restriction

 

I wanted to know what the leading 3 meant (it’s a non-printable character) because it was being validated on Win98 and GetFullPathName was failing with it. I could change the 3 to ‘a’ and it succeeded. So after a few minutes googling around, I just reached for my trusty copy of “Inside OLE” (2nd edition) by Kraig Brockschmidt. A quick look in the index led me to p. 367:

• \00,\01,\02 specify an OLE managed element. OLE has special uses for each of these values, as we’ll see in later chapters.

• \03 marks an element as owned by the code that manages the parent storage of that element. This is useful when a client is handing out IStorage pointers to other components so that those components can store their data inside the client’s storage hierarchy. A client can save extra information for each instance of a component within such a storage by using the \03 prefix.

• \04 …
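
The naming rules quoted above (at most 31 characters, with a first character in the range 0x00 through 0x1f reserved) are easy to check mechanically. This is only a sketch in C: classify_stream_name and the name_kind enum are hypothetical names of mine, and it uses plain char strings where the real stream names are wide strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a stream/storage name. */
enum name_kind {
    NAME_EMPTY,           /* no characters at all */
    NAME_TOO_LONG,        /* more than 31 characters */
    NAME_RESERVED_PREFIX, /* first character is in 0x00..0x1f */
    NAME_ORDINARY         /* anything else */
};

/* Apply the two rules from the quoted documentation to a name. */
enum name_kind classify_stream_name(const char *name)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (len == 0)
        return NAME_EMPTY;
    if (len > 31)
        return NAME_TOO_LONG;
    if ((unsigned char)name[0] <= 0x1f)
        return NAME_RESERVED_PREFIX;
    return NAME_ORDINARY;
}
```

With this, "\3OleObjectData" lands in NAME_RESERVED_PREFIX, while the same name with the 3 changed to 'a' is NAME_ORDINARY. Reserved does not mean forbidden here: per the Inside OLE passage, the \03 prefix is available to the code that manages the parent storage.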

 

 

So I knew that it was valid to use this character for a stream name. After all, it’s been that way for 10 years, even way before Win98. So why was it failing now?

 

MFC was using the stream name and treating it as a filename and calling GetFullPathName on it. However, comparing the MFC 7.0 and 7.1 versions, I saw that the 7.0 version did the same thing, but ignored the return value of GetFullPathName, whereas the 7.1 version threw an exception if it failed. Why does it fail only on Win98 ? Because GetFullPathName succeeds on WinXP, but fails on Win98 with that leading 3.
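
The difference between the two versions can be sketched in a few lines of C. This is not the MFC source: fake_full_path is a hypothetical stand-in for GetFullPathName whose win98 flag simply mimics the behavior I observed (failing on a leading control character), and a -1 return stands in for the exception the 7.1 code throws.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for GetFullPathName: returns the length written,
   or 0 on failure. When win98 is nonzero it rejects a leading control
   character, imitating the failure described in the text. */
size_t fake_full_path(const char *name, char *out, size_t out_size, int win98)
{
    size_t len;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    if (win98 && (unsigned char)name[0] < 0x20)
        return 0;
    if (len + 1 > out_size)
        return 0;
    memcpy(out, name, len + 1);
    return len;
}

/* 7.0 style: call the function but ignore whether it worked. */
int create_stream_ignoring_failure(const char *name, int win98)
{
    char path[64];

    (void)fake_full_path(name, path, sizeof path, win98);
    return 0; /* carries on regardless */
}

/* 7.1 style: a failed call becomes an error for the caller. */
int create_stream_checking_failure(const char *name, int win98)
{
    char path[64];

    if (fake_full_path(name, path, sizeof path, win98) == 0)
        return -1; /* stands in for the thrown exception */
    return 0;
}
```

Passing "\3OleObjectData" with the win98 flag set, the first function returns 0 and the second returns -1. With the flag clear, both return 0, which matches what we saw: same control, same host, different OS, different result.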

 

I let Brad get off the phone line. He was probably sick of me saying oops, which was his cue to re-enable me to take control of the remote assistance session.

 

I did a little more research. I searched for any information on why the GetFullPathName failure was no longer being ignored. I found a bug report associated with the code change (after all, MFC is around a decade old too). The bug report title is “XP SECURITY REVIEW: MFC42: olestrm.cpp COleStreamFile::CreateStream validate param of AfxFullPath”. Note the words “Security” and “XP”.

 

So the MFC code was changed from silently ignoring the failure of GetFullPathName to throwing an exception. The comment for AfxFullPath says “turn a file, relative path or other into an absolute path.” One would think this should not be called for names of streams, but it is, and on Win98 it fails.

 

There is a big push at Microsoft to ensure that our software is secure for our customers. When much of this software was written, computers weren’t connected to each other online 24 hours a day. They existed in isolation, with the user occasionally lifting up his acoustic coupler on his modem to connect to some bulletin board. Malicious attacks from outside were difficult if not impossible. I challenge anyone to break into my piano’s Win98 machine (without breaking into my house<g>).

 

When the C language was designed in the early 70’s, it had no native support for strings. It still doesn’t. However, strings are supported by a library of functions. A string was defined as an array of bytes, with a null byte at the end signifying the end. To copy a string, you just call the library function “strcpy” which takes 2 parameters: a place to put the result, and the original input string. If the original input string is not null terminated or is longer than the result buffer, then a buffer overrun occurs, which is a security hole. The inherent design of strcpy makes this kind of buffer overrun occur frequently. If the original function had been designed to take the size of the result buffer as a 3rd parameter, then it would be much harder for this library function to overwrite memory unintentionally. Alas, hindsight is 20/20.
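
Here is roughly what a copy function that knows the size of the result buffer could look like. It is a sketch, and bounded_copy is a name I made up for illustration, not a standard library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst, writing at most dst_size bytes including the
   terminating null. Returns 0 if the whole string fit, or -1 if it was
   truncated (or if dst_size is 0 and nothing could be written). */
int bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return -1;

    /* Stop one byte early so there is always room for the terminator. */
    for (i = 0; i + 1 < dst_size && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';

    /* If src has characters left over, the copy was truncated. */
    return src[i] == '\0' ? 0 : -1;
}
```

Copying "abcdef" into a 4 byte buffer leaves "abc" in the buffer and returns -1, instead of scribbling over whatever sits after the buffer. It is still not foolproof: the caller has to pass the right size, and a wrong size brings the overrun right back.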

 

The MFC code for streams was changed to throw an exception rather than silently ignoring a failure. It sounds reasonable: it would be better for customers to be alerted to failure, rather than just ignoring the failure.

 

On another note, customers over the years have said that FoxPro’s support of ActiveX controls isn’t up to par. However, this code serves as a good example of what really is happening. There was a specification that allowed a leading non-printable character for OLE. You could use it if you wanted to. VFP did. Some other control hosts probably didn’t. So FoxPro customers hear that controls work in other products, but not in FoxPro, so it must be the case that FoxPro’s control support is not good enough.

 

So the overall result was a failure for the customer after an upgrade (I hate that). At least we’ve identified the problem and come up with a possible workaround (dynamically create the control as the 4 lines of code above do).

 

Our goal at Microsoft is to provide customers code that works and is secure. After all, I’m a customer as well. I don’t like it when software I use seems to fail. However, it’s not always simple to know the right thing to do. Especially with the mix and match of old and new code from several different groups over a decade.