Replacing a COREDLL component in CE 6.0

Posted by Kurt Kennett

The Windows CE Operating System is composed of many modules, some of which depend on one another. Most modules never need to be touched in order to provide the desired functionality. However, there are some cases where it is desirable or necessary to replace some default system processing with your own code. The Windows CE build system provides a mechanism for you to do this.

 

Let’s consider an example. Say you have an embedded device which has a display screen, but no standard input method. You will be providing your own custom shell/display mechanism. In this case, you want to include display support in your operating system design (SYSGEN_DISPLAY), as well as the basics to provide your own shell application for displaying windows, fonts, etc. (SYSGEN_MINGDI, SYSGEN_MINGWES, SYSGEN_MINWMGR). All display of your windows is done using programmatic control. You may think that since you are controlling the system and there is no user input, you can determine if or when a dialog box is presented.

 

This is not the case. There are built-in parts of the operating system that will display a dialog in certain circumstances. One such case is when an application crashes – you get a dialog box indicating “Application <XXX> has encountered a problem and must be closed.” The OS kernel, when it detects an application abort that requires a process be terminated, calls into a coredll module called “showerr”. This module receives information about the crashing thread/process, and actually displays the dialog box – waiting to be dismissed by the end user. If this happens in our example system, the dialog can never be dismissed, since there is no user input mechanism.

 

There are other examples as well, but this one is suitable for demonstration purposes. We shall replace the “ShowErr” component, changing it so that no dialog is presented. Instead, we will simply output a debugging message that the application crashed.

 

To start, let’s build an OS Design that has the SYSGENs I mentioned above set. We can use the Device Emulator platform that ships with Windows Embedded CE 6 as the reference ‘hardware.’ After starting Visual Studio 2005, we can choose to start a new project:

This will bring up the Visual Studio 2005 “New Project” dialog box. On the left hand side is a tree view which has as a leaf “Platform Builder for CE 6.0”. Choosing this leaf, we can pick off “OS Design” from the templates view, and enter a name for our new OS design.

After we click “OK”, we are asked which BSP this OS Design will be for. As mentioned above, the Device Emulator will suit us for this demonstration, since it is always available as a reference platform, and we are not concerned with performance issues at this time.

After picking the BSP to use, we can start with a Design Template. This would give us a set of standard SYSGEN variables that are recommended as a starting point for certain types of systems. In this case, we want to be able to choose any and all components to include, so we’ll use “Custom Device”.

After choosing the design template, we can now specify precisely the types of components we want in our OS Design. We’ll choose to have all the C/C++ libraries:

We don’t need any of the End User applications:

But we do want display support:

We need some default components to be able to have a useful file system:

And finally we need the GWES components that let us put up our own windows and manage them on the display screen:

After that, we can breeze through the remaining dialogs to complete our OS Design, pressing ‘Finish’ at the end of configuration:

The Platform Builder module will construct the design for us, and give us a display that looks like the following:

This default view is not entirely useful, but will be in the future. To start, let’s look at the operating system components that we specified and that were pulled in as dependencies. To do this, we switch to the Catalog Items View:

This shows us a tree view window in the left hand pane, which includes subcomponents from the available BSPs, Core OS components, Device Drivers, and the Platform Manager component.

Using the “Filter” tab, we can choose to just look at the components that we specified explicitly and the components that were brought in as dependencies:

This view is much more interesting. Expanded, it shows us the SYSGEN variables we picked in the preceding dialog boxes, as well as a couple of other components they depend on.

As shown above, we can use the drop-down list of configurations to select the ‘Release’ configuration. This speeds up both the build and the resulting runtime image, and we don’t need a debug build at this stage. Once we’ve selected this configuration, let’s open the project properties dialog to configure how the OS Design runs. We select ‘Properties…’ from the Project menu:

The property pages are very useful for allowing us to customize how the OS Design is built. The first configuration page looks like this:

Since we picked “Release” as a configuration type previously, these are the settings which are displayed when we open the properties pages dialog. For our design, let’s enable the Kernel Debugger (on the “Build Options” leaf in the tree at left), and also make sure that KITL is enabled, so we can talk to our device from Platform Builder:

Now we can click “OK” to dismiss the project properties dialog, and it will return us to the main Visual Studio 2005 window. From here, we need to set up our connectivity options so that when we’re ready to launch the OS it will start the emulator appropriately. We pick “Connectivity Options…” from the “Target” menu:

This displays a dialog which shows us currently defined connection parameters and allows us to make new ones. By clicking on the “Add Device” underlined button, we can enter in a name for a new set of connection parameters:

After we define the name of our new set of properties, the system lets us choose a download transport and a KITL transport mechanism. For Download, we use the “Device Emulator (DMA)” selection:

And for KITL transport we will use the same mechanism:

‘KdStub’ is already selected as the debugger. We can now click on “Apply”, then “Close”.

Returning to the main Visual Studio 2005 window, we need to do one more step – select the new set of communications parameters as the one we want to use. We do this through the drop-list in the ‘Device’ toolbar. Once we’ve picked this, we should choose the “Save All” option to persist all the configuration we just performed.

At this point, we’re ready to build the OS Design and deploy it, if we don’t want to change anything. However, before we do, as an example let’s make a test program that crashes and makes the system display the dialog box that we want to eventually suppress. To do this, we right-click on the “SubProjects” tree item in the Solution Explorer, and choose “Add New Subproject…”

The next set of dialogs that are displayed allow us to choose what type of subproject we want to make. For this example, we’ll choose to create a WCE application (one that we will make crash).

We can choose a name for this project – “mytest_crasher”, then click “Next >” to specify parameters for it. We’ll just start with a simple empty project.

This will create our subproject, which will appear in the solution explorer tree. Expanding this view gives us a look at the contents. The Subproject Wizard has made a few files for us, including a ‘.reg’, a ‘.bib’, and a few other components that are useful, and are automatically put into the build system for us.

In order for this application to be useful, we need to add some code to it. Let’s right-click on the “Source” folder in the tree, and pick “Add > New Item…” to create the source file.

This brings up the “Add New Item…” dialog box. We choose a C++ source file type, and fill in an example name, then click ‘Add’:

The new file is placed in our solution, and the editing window for it can be opened:

Into this file, let’s put the code for a simple program that crashes itself. This is really not that hard to do:
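A minimal sketch of such a program might look like the following. It assumes the masthead is written with RETAILMSG and that the entry point is a plain WinMain. The helper name CrashNow, the kBadAddress constant, and the desktop stand-ins in the #else branch are illustrative and are not taken from the original listing.

```cpp
// mytest_crasher sketch: print a masthead, then write 0 to the NULL address.
#ifdef UNDER_CE
#include <windows.h>
#include <dbgapi.h>    // RETAILMSG
#else
// Assumption: desktop stand-ins so the sketch also compiles off-target.
#include <cstdio>
#define TEXT(s) s
#define RETAILMSG(cond, args) do { if (cond) std::printf args; } while (0)
#endif

// The address we deliberately write to: NULL, which is never a valid target.
static volatile unsigned int* const kBadAddress = 0;

// Prints the masthead, then triggers the access violation.
// Does not return normally.
int CrashNow()
{
    RETAILMSG(1, (TEXT("CRASHER!\r\n")));
    *kBadAddress = 0;   // the access violation happens on this line
    return 0;           // not reached
}

#ifdef UNDER_CE
// Entry point on the device: no exception handling, so the crash is unhandled.
int WINAPI WinMain(HINSTANCE, HINSTANCE, LPWSTR, int)
{
    return CrashNow();
}
#endif
```

The pointer is declared volatile so that the compiler cannot optimize the store away.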

To make itself crash, this program simply writes a value of 0 to the NULL address. Since this is a protected address that cannot be used, an access violation should occur. If there is no debugger present, and since we didn’t put in any exception handling code, the process should be terminated at that point. Now that we have code for the program, we can build our entire solution to produce an OS image. The ‘mytest_crasher.exe’ program will automatically get put into the OS image:

The build will take a while to complete, but not as long as it does for some platforms, since we have selected a minimal set of components.

The build output shows that the OS image was successfully constructed, and is only 3.1MB in size. This isn’t that big, and is a great starting point for most types of images. We can use the “Target > Attach Device” menu command to start the emulator and run our image:

When it has started, the OS simply shows us a blank screen. This is not unexpected as we have no shell in our image, and have not specified any programs to run at startup:

This is not a very interesting OS project yet. To run the ‘mytest_crasher’ program we created and included in the OS image, we can use the CE Shell (CESH) tool in Platform Builder:

The ‘s’ command means ‘start’. For a list of other commands available, you can enter ‘help’ or the ‘?’ command. The command shown above simply starts our program that is going to crash. It does so immediately, breaking into the kernel debugger. If there was no kernel debugger included in the image, we would not get the debugger break. Instead, the exception would be passed to any application-level debugger being used, and then if unhandled would result in the termination of the program:

This is the first of two exceptions we will get. The first-chance exception is one which occurs before any exception handling takes effect – it gives us a chance to see the real problem before any of our code that we could write to handle it has a chance to. Looking at the debug output in source view, we can see we have crashed exactly where we planned to:

The debug output in the bottom window shows our program outputting its masthead (“CRASHER!”) so that we know the program is executing. The rest of the output is from the operating system, trying to help us find the source of the problem. If we choose to continue execution and let any exception handling take effect, we get another exception dialog. This is a “last chance” exception, which tells us that the program is being terminated. Again, if there were no debugger present in the system we would not see this message and the termination process would not stop at this point.

Since we have a debugger, we acknowledge this dialog and continue execution. We can then see on the Device Emulator screen that a dialog has been presented, informing us of the application crash:

Herein lies the core of the problem we are trying to solve. On a “headless” device (device with no display), or on a device with a custom display that does not have a generic input mechanism, there is no way for a user to dismiss this dialog (by clicking the ‘OK’ button). Also, for style and product branding, we may never want such a dialog to be displayed, even if an application has crashed – we may want it to just log the error and automatically restart.

 

The code that puts up the dialog box actually runs in the context of the ‘crashed’ process – during termination when the program stack is invalid. The code needs to run in the context of a current process, and can’t run in the kernel because it uses the user interface. To display the dialog, the termination code does all required tasks to kill the process, then calls into a COREDLL function called “ShowErr”. The function has the following prototype:

 

extern "C" BOOL ShowErr(DWORD dwExcpCode, DWORD dwExcpAddr);

 

The function lives in its own COREDLL module, aptly named “showerr”. The function is passed the exception code that was generated that resulted in the process’ termination, as well as the address that generated the exception. Since the function executes in the context of the terminated process, the function can retrieve the name of the process (simply the name of the current process), and can use the MessageBox() function to display the error message. This is the default processing.

 

We want to change the processing so that we log the error to the debug output stream, but do not stop to display a dialog box to a user. To do this, we need to add another project to our OS design. As before, we select “Add New Subproject…” from the context menu we get from right-clicking on the “Subprojects” tree item:

Code that goes into COREDLL is linked together during the SYSGEN phase of the build process. This is so that COREDLL only includes the functional blocks for features that are present in the OS design. In order for us to replace one of the modules in COREDLL, we must have the library we’re going to use available *before* SYSGEN starts. This will be explained as we go. To start this project, we know we’re going to need a library (.lib) that can get linked with the rest of the contents of COREDLL:

We’ll call our library ‘mytest_showerr’, as shown above. Making a static library is pretty simple, and we can just ignore the next dialog that asks about precompiled header support.

Now we can see our new project in the OS design tree, and add a source file to it so it is useful:

We’ll name the source file we’re going to add “showerr.cpp”, and it will simply include a replacement function for the one that is standard in COREDLL:

Now that we’ve added the source file, we can edit it and put in the code for our new ShowErr:
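A minimal sketch of such a replacement, using the prototype given earlier, could look like this. The use of RETAILMSG, the extra code/address fields in the message, the TRUE return value, and the desktop stand-ins in the #else branch are assumptions made for illustration. The original listing and the real COREDLL source may differ.

```cpp
// showerr.cpp sketch: replacement for the COREDLL "showerr" component.
#ifdef UNDER_CE
#include <windows.h>
#include <dbgapi.h>    // RETAILMSG
#else
// Assumption: desktop stand-ins so the sketch also compiles off-target.
#include <cstdio>
typedef unsigned int DWORD;
typedef int BOOL;
#define TRUE 1
#define TEXT(s) s
#define RETAILMSG(cond, args) do { if (cond) std::printf args; } while (0)
#endif

// Called in the context of the process that is being terminated.
// The default version retrieves the process name and calls MessageBox();
// a fuller replacement could also log the name of the current process.
extern "C" BOOL ShowErr(DWORD dwExcpCode, DWORD dwExcpAddr)
{
    // Terse note on the debug stream; no dialog is ever created.
    RETAILMSG(1, (TEXT("*** A PROCESS CRASHED *** code=0x%08X addr=0x%08X\r\n"),
                  dwExcpCode, dwExcpAddr));
    return TRUE;   // assumption: the meaning of the return value is not documented here
}
```

Because the function never calls MessageBox(), nothing waits on user input, which is the property a device without an input mechanism needs.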

I have put in some comments as to what a ShowErr() function could do to replicate all of the functionality of the default function. People who are able to view the Windows CE shared sources can locate the actual code used for the function. For our implementation (shown above), we’re simply going to output a very terse message to the debug stream, and return. MessageBox() is never called, so no dialog is displayed.

 

This is simplistic code and should compile properly. As mentioned above, we need to have the resulting library available *before* COREDLL is constructed in the SYSGEN phase. So how do we get our library to build at the appropriate time? How can it get built when COREDLL doesn’t exist yet in that stage of the build? The answer is that it is a *library*, and not a *dll*. Therefore, it does not get linked at all. It is just a collection of .obj files that can have external linkage. When COREDLL is linked, they will all be resolved correctly. However, to get the library to build at the appropriate point in the build process, we can adjust its properties:

As shown above, we select “Properties” off the context menu for our project in the tree pane view. This brings up the dialog specific for that project:

Many of the default properties for the project are editable through this dialog. We want to change only a couple of the default settings. We first want to change the “Release Type” found on the “General” tab to “OAK”. This is the location for SYSGENed libraries for our OS.

The next item to change is the name of the target. To get linked properly with COREDLL, we need our library to have the same name as the original one (the default one) has. This is “showerr”. We can change this in the dialog using the “Target Name” setting:

To include the appropriate header files, we need to add a path to the set of include directories. On the “C/C++” tab of the dialog, we edit the “Include Directories” setting, and add the location of the public common OAK include directory. We can only use SDK and OAK public header files prior to a SYSGEN.

With these changes, the library will build with the correct name, and be placed in the correct output directory so that it can be linked with COREDLL. However – how do we get it to build *before* the SYSGEN phase runs? The answer is to use a batch file to invoke a build of the component prior to the SYSGEN step starting. We can add a batch file to our project by using the context menu to add a file:

The file we’ll add is just a text file (normal batch file), and we’ll give it the extension ‘.cmd’. To denote that this batch file runs before SYSGEN, we’ll call it ‘presysgen.cmd’:

The commands we want to execute are simple – we want to set up the local environment, change to the directory where our new “showerr” library source is, and execute a clean build. These steps are accomplished using the commands we put into our batch file:

Once we have saved the file, the last thing we need to do is to make the execution of our batch file happen before SYSGEN when we build our OS image. This is easily done by adding a pre-sysgen step into the OS design project settings. To access these settings, we can use the “Project > Properties” menu command, or right-click on the project and use the context menu that is displayed.

This will display the project property pages. One of the items available in the tree view at the left of the dialog is “Custom Build Actions”. When we select this item in the tree, we can then select a point at which we want to execute our custom build action (the invocation of the batch file we made):

Selecting “Pre-Sysgen” for the “Build Step:”, we can click “New…” to bring up a dialog that lets us enter a command to execute. We put in the command to execute our batch file. Since it will be relative to where we put our project, we use a path specification relative to %_PROJECTROOT%:

Clicking OK, we can see the command in the list of custom build actions for the pre-sysgen build step:

We dismiss the dialog by clicking “OK”. Now when a SYSGEN occurs, prior to it starting, the command file we entered will get executed, and it will build our version of the ShowErr module, putting it into the right directory so that the creation of COREDLL will include it. However, in order to not replace our new SHOWERR.LIB with the one that would normally get created, we need to do some dirty work in the build process to override some of the default build actions. This involves editing the “Projsysgen.bat” file for our replacement component, and altering a couple of build variables. We open the “Projsysgen.bat” file that is in our list of parameter files for our project, and add some lines that will be executed during the “preproc” and “pass1” stages.

 

The first command we execute is in the “preproc” stage: one to reset the status of whether our settings have been made or not. This prevents us from re-setting settings that are already made, or changing things that the Windows CE build system has deliberately turned off. We use the command:

 

set MYTEST_SHOWERR_SET=

 

at this phase of SYSGEN to clear the variable to a known (empty) state. In other phases, we can check whether this variable is clear and, if it is, make our settings changes. If the variable is set when we check it, we do not need to make any updates to other variables. We do this exact check in the “pass1” SYSGEN stage, and make alterations to the variables we want:

 

if /I not "%MYTEST_SHOWERR_SET%" == "" goto :MyTest_Showerr_NowSet

set COREDLL_REPLACE_COMPONENTS=%COREDLL_REPLACE_COMPONENTS% showerr

set REPLACE_MODULES=%REPLACE_MODULES% coredll

set MYTEST_SHOWERR_SET=1

:MyTest_Showerr_NowSet

 

What have we done here? The CE build system looks at two variables when it is doing a SYSGEN, to see if any COREDLL modules are being replaced – “REPLACE_MODULES” which speaks globally to the system components that are being modified, and then for COREDLL, “COREDLL_REPLACE_COMPONENTS”. In each case, since we know that we have not modified these variables before, we simply add on to the end of the lists of components in them. For components, we specify that we’re replacing “showerr”, and for modules we specify that “coredll” is being modified.

 

With these changes, the build system knows specifically that “coredll” must be built prior to SYSGEN, and it also knows that the COREDLL component “showerr” is being replaced. The changes to the ProjSysgen.bat file are shown in their entirety below:

This is by far the most complex part of the replacement scheme. Other parts of the OS can be replaced by using the appropriate build variables – this includes the resources used by default for dialog boxes, icons, etc. Which variables are used is beyond the scope of this particular how-to.

 

Once we have saved our ProjSysgen.bat file, we can use the Build menu to start a clean SYSGEN:

A clean SYSGEN forces reconstruction of system components if they have been marked as having a replacement piece. When the SYSGEN commences, we will see that one of the first actions taken is to execute the pre-sysgen build commands. Our “presysgen.cmd” file will get executed as one of these commands, and will build our ShowErr() component:

We can see above that our showerr component is built, and the library is output to the correct OAK directory. This will allow SYSGEN to link with it when it builds/rebuilds COREDLL. Please note that if any changes are made to the showerr library, a completely new SYSGEN must be done in order to pick up the changes – COREDLL is not rebuilt when there is a project or OS design change unless it affects the SYSGEN or IMG variables used.

 

The OS design will build, maybe taking a bit longer than usual since it must replace a coredll component that is not part of the standard build. You will notice that since “mytest_showerr” is a project under our OS design, it will get built *again* when Platform Builder builds the subprojects. This does not affect COREDLL; it just overwrites the library file whose contents have already been linked into COREDLL.

Upon a successful build, we have an OS image with our replaced COREDLL component. There has been a tiny change in the size of the OS image:

 

Old image size: 3142643

New image size: 3142616

 

The image shrank by only 27 bytes, which is fewer than seven 32-bit ARM instructions (27 / 4 = 6.75). Evidently the code in the default ShowErr() did not do a lot, and did not include any Unicode strings like the one that we put in. We can now execute the new OS design by using the regular “Target > Attach Device” command:

The emulator should start normally, and once again display the black screen, just as it did in the previous iteration before we’d replaced the COREDLL component. Using the CE Shell, we can now start our “crasher” program, the same way we did before:

This time, since we are still using the kernel debugger in our image, we get the same exception as before (first chance):

Acknowledging this error, we can look at the point in the source that it happened if we want to, and deal with it. As mentioned previously, execution is only stopping because we have the kernel debugger enabled. These exceptions would not be presented if no debugger was running – the system would just do its normal processing.

 

After the first-chance exception is continued, we get our last-chance exception, the same way we got it before:

We click OK, and continue processing. With the standard version of COREDLL, we would now get the pesky dialog box. With our new version, we get a debug message output, and no dialog:

Processing has continued: we got the notification that a crash occurred (“*** A PROCESS CRASHED ***”), and no dialog appeared on our screen. No user action is required, and a robust system can restart the application that crashed and log the error for later analysis.