Let’s Talk About Touch (Part 1)

Windows Mobile (and CE) has supported touch screen interfaces since the beginning, but the release of Windows Mobile 6.5 brings something new to the platform: gesture support. Gestures are intended to be a more natural way of interacting with the device through the touch screen, creating more of an emotional connection between the user and the applications under the finger or stylus. Technically, gestures are a collection of input points generated by touching the screen in patterns that the system recognizes. However, the touch solution is much more than just the gestures; it’s also about the animation and interaction that take place as a result of the gesture input – for example, smoothly scrolling a list of items, or ensuring a UI item remains fixed to the ‘finger’ to give the illusion that the user is directly manipulating the screen content as if it were tangible.

What’s the difference between gestures and simple mouse input?

At first glance there appears to be a lot of commonality between raw mouse messages and gesture messages, such as select = mouse click, pan = mouse move and double select = double click. However, the gesture recognition code is designed to handle a quite different set of input limitations from mouse input. Mouse input on mobile devices is expected to originate from a pointing device like a stylus or a physical mouse; gesture messages, by contrast, are expected to originate from a variety of sources, such as a finger or thumb, shaking the device, or even broader input like smile recognition from a camera. Most users will initially experience gestures through the touch screen, and the majority of the work in 6.5 has been around getting finger input and response right.

Using a stylus or a mouse results in surprisingly accurate touch data which in turn makes small screen controls a viable user experience. In this situation tolerances for click, double click and tap’n’hold can be very small.

However, when using a finger instead of a stylus several things have to change – for example, the tolerances for click, double click, and tap’n’hold need to grow significantly to handle the huge variety of finger shapes and sizes found out in the wilds of humankind. Additionally, when moving your finger across the screen, the shape pressed against the screen changes with the angle of the finger. This often leads to unexpected input points at the end of a pan that can cause the movement to be misinterpreted.

A Word about Screens

Touch enabled Windows Mobile devices traditionally sport a plastic tipped stylus and have a touch screen based on resistive technology.

In brief, resistive screen technology is based on two layers of transparent conducting material (Indium Tin Oxide, or ITO) separated by an air gap and held apart by tiny insulating plastic beads. Pressing the screen deforms the two sheets and makes contact between them, and from the change in resistance the screen firmware can identify where the stylus has been placed. There are lots of variations on this technology.
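As a rough illustration of what the firmware does with that resistance change (the ADC resolution and screen size below are assumptions, and real controllers add calibration and filtering on top), the voltage measured across each layer is essentially a ratio of the touch position along that axis:

#include <windows.h>

// Simplified illustration only: mapping a raw sample from a 4-wire resistive
// controller onto screen coordinates. The 10-bit ADC range and the WVGA
// screen size are assumptions made for this sketch.
POINT AdcToScreen(int rawX, int rawY)
{
    const int ADC_MAX       = 1023;  // assumed 10-bit touch controller
    const int SCREEN_WIDTH  = 480;
    const int SCREEN_HEIGHT = 800;

    POINT pt;
    pt.x = (rawX * SCREEN_WIDTH)  / ADC_MAX;   // voltage ratio across the X layer
    pt.y = (rawY * SCREEN_HEIGHT) / ADC_MAX;   // voltage ratio across the Y layer
    return pt;
}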

Resistive screens have several killer properties: they are cheap, very accurate for a stylus, and they can continue to work in quite hostile environments i.e. dirty screens.

However, they do suffer in other areas: they require a certain amount of force to deform the screen and make contact between the conducting layers; because of the multiple plastic layers placed on top of the display and the air gap, some brightness is always lost; cheap and readily available traditional resistive screens really only support a single touch point (more advanced digital resistive sensors that do support multiple touch points have been demonstrated, but this is a future development); it’s quite tough to get information beyond just the point location, e.g. the size of the touch area; and durability can be an issue due to the use of moving parts, i.e. the deformation of the screen.

Another touch technology that has rapidly gained in popularity is capacitive (as found in the iPhone and Android G1). This technology works by continually measuring the capacitive property of different areas of the screen. When conducting material such as a finger is placed on the screen, its capacitive properties change and the screen driver can determine where the finger is based upon the changes.

Capacitive technology has several advantages: zero pressure is required to make an input because nothing needs to be deformed, which leads to a much more natural interface experience; although additional material is laid onto the screen there is no air gap, so optical clarity is much improved, reducing the need for backlighting and lowering power draw; multiple touch points can be supported; and things like touch size and pressure can be extrapolated from the capacitive data.

However, they do suffer in other areas: in general the cost is currently higher than for an equivalent resistive screen; supporting a stylus is hard because it must be made of conducting material and must make sufficient contact to change the capacitive property of the screen; and in several areas, e.g. around the edges of the screen, the accuracy tends to be lower than resistive, which, combined with the lack of a stylus and lower sample rates, makes things like handwriting input very hard.

There are other input technologies developing all the time, but at the moment these two represent nearly all the market for mobile devices.

Windows Mobile 6.5 has primarily been designed for resistive screens because some input areas still rely on small controls and require a level of input accuracy that can’t easily be achieved with a finger and therefore needs a stylus; however, some device manufacturers are considering options to ship capacitive screens.

Looking forward, the mobile team is considering how to address these issues and support many more screen types, including capacitive.

What Gestures are supported?

In Windows Mobile 6.5 we have implemented five primary gestures:

Select

User taps on the screen for less time than a specific threshold, and movement is less than a threshold distance.

Double select

A second select is detected within a timeout period of the first one.

Hold

User taps on the screen for more time than a specific threshold.

Pan

Once the distance moved exceeds a threshold all touch movement is represented as a pan gesture.

Scroll

At the end of a touch session (when the finger or stylus is lifted), if the preceding points are roughly linear and exceed a minimum speed, a scroll gesture is generated.
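The thresholds above are defined by the system, but to make the shape of the recognition concrete, here is a minimal sketch of the kind of classification the gesture engine performs. The threshold values, the Gesture enum and ClassifyTouch are all invented for illustration – the real recognizer is built into Windows Mobile 6.5 and your code only ever sees the resulting WM_GESTURE messages.

#include <windows.h>

// Illustration only - invented thresholds and names; the real recognizer is
// part of the OS. Double select and scroll are omitted because they also need
// the previous tap time and the recent point history.
enum Gesture { GESTURE_NONE, GESTURE_SELECT, GESTURE_HOLD, GESTURE_PAN };

Gesture ClassifyTouch(POINT down, POINT current, DWORD elapsedMs, bool fingerUp)
{
    const int   MOVE_THRESHOLD_PX = 12;    // assumed movement tolerance
    const DWORD HOLD_THRESHOLD_MS = 800;   // assumed hold timeout

    int dx = current.x - down.x;
    int dy = current.y - down.y;
    bool moved = (dx * dx + dy * dy) > (MOVE_THRESHOLD_PX * MOVE_THRESHOLD_PX);

    if (moved)
        return GESTURE_PAN;      // once movement exceeds the tolerance, everything is a pan
    if (!fingerUp && elapsedMs >= HOLD_THRESHOLD_MS)
        return GESTURE_HOLD;     // still down and past the timeout: hold
    if (fingerUp && elapsedMs < HOLD_THRESHOLD_MS)
        return GESTURE_SELECT;   // quick tap with little movement: select
    return GESTURE_NONE;
}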


Gestures are delivered using a new message, WM_GESTURE, which is accompanied by the gesture ID and a handle that can be used to retrieve the rest of the gesture data – such as the angle and velocity of a scroll, or the location of a pan – through the GetGestureInfo() API. Windows 7 on the desktop uses this same message but currently offers a slightly different set of gestures from those available on mobile, so be careful when searching the MSDN docs to get the right ones (at this time the mobile MSDN docs haven’t yet been published).


How do gestures work then?

There are a couple of things you need to know when working with gestures directly:

· Gestures and mouse messages are not intended to be interchangeable. In WM 6.5 you will probably get away with using mouse messages instead of select or double select gestures, but moving forward that’s highly likely to change as new hardware is designed to take advantage of the touch infrastructure – touch is designed to allow separate areas for touch and mouse input, so imagine a device with a mouse pad area as well as a touchable screen where the touch screen only generates gestures.

Ideally you should write your code to work either with mouse messages or with gestures, but not both at the same time.

· Gestures are always delivered to the window under the initial input – i.e. the touch-down location. You’ve probably never thought about this for mouse messages, but it makes total sense that all mouse messages are delivered to the window directly under the mouse at the point a mouse event happens (unless delivery is forced to a specific window using SetCapture()).

For gestures it’s a bit different. If the user wants to send a scroll gesture to a specific area of the screen, the touch input may start in the ‘target’ window area, but because it takes time and distance to describe a scroll gesture, the end of the gesture might happen in a completely different window somewhere else on the screen. So the gesture engine remembers where the initial point was and ensures the scroll gets delivered there as well. The same goes for a hold – the input may ‘wander’ under the finger, but the hold is sent to the window under the initial input point.

If for some reason the window under the initial input is destroyed, the whole of that input ‘session’ is lost; a new session will only start after the finger has been lifted and placed down again.

· We’ve also added some special routing for the WM_GESTURE message to help maximise the size of the touchable area. If a WM_GESTURE message reaches DefWindowProc(), it means the target window didn’t process it, either because it doesn’t support touch at all or because the specific gesture means nothing to the control. DefWindowProc() will then send the WM_GESTURE message to the parent window in case there is a larger control that will support the gesture. Consider the example of a form with labels on it – a pan gesture means nothing to an individual label, but the form itself can reasonably respond to the pan and move the whole form around.

So here’s something to watch out for: don’t send gesture messages from a parent to a child window. We’ve put loop protection in place to stop a stack overflow, but hitting it is still very inefficient, and I’m sure you could find some way of overcoming the protection if you tried!
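To see this routing in action, here’s a minimal sketch (the window procedures and the scrolling details are invented for illustration): a label control that knows nothing about gestures simply lets WM_GESTURE fall through to DefWindowProc(), which forwards it to the parent form, and the form handles the pan.

// Child control: it doesn't handle WM_GESTURE at all, so the message reaches
// DefWindowProc(), which forwards it up to the parent window.
LRESULT CALLBACK LabelWndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    // ... painting and other label messages ...
    return DefWindowProc(hWnd, message, wParam, lParam);
}

// Parent form: receives the forwarded gesture and scrolls all of its content.
LRESULT CALLBACK FormWndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
        case WM_GESTURE:
            if (wParam == GID_PAN)
            {
                GESTUREINFO gi = {sizeof(gi)};
                if (TKGetGestureInfo(reinterpret_cast<HGESTUREINFO>(lParam), &gi))
                {
                    // The pan location returned in gi can be used to offset the
                    // form's scroll position (details omitted).
                    return 0;   // handled - do NOT send it back down to the child
                }
            }
            break;
    }
    return DefWindowProc(hWnd, message, wParam, lParam);
}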

Can I extend the list of gestures?

No, not at this point although it’s something we might consider in the future.

How do I use them?

There are a couple of examples that shipped in the Windows Mobile 6.5 Developer Tool Kit showing how to use the WM_GESTURE message. The basics are here:

LRESULT CALLBACK WndProc(HWND hWnd, UINT message, WPARAM wParam, LPARAM lParam)
{
    switch (message)
    {
        case WM_GESTURE:
        {
            BOOL fHandled = FALSE;
            GESTUREINFO gi = {sizeof(gi)};

            // Go get the gesture - will return FALSE if the gesture engine
            // is not present in the system.
            if (TKGetGestureInfo(reinterpret_cast<HGESTUREINFO>(lParam), &gi))
            {
                switch (wParam)
                {
                    case GID_PAN:
                    {
                        ...
                        fHandled = TRUE;
                    }
                    break;

                    case GID_SCROLL:
                    {
                        ...
                        fHandled = TRUE;
                    }
                    break;
                }
            }

            // Anything we didn't handle goes to DefWindowProc() so it can be
            // routed on to the parent window.
            if (!fHandled)
            {
                return DefWindowProc(hWnd, message, wParam, lParam);
            }
            break;
        }

        default:
            return DefWindowProc(hWnd, message, wParam, lParam);
    }
    return 0;
}

What’s physics got to do with anything?

So far I’ve only covered half the story... or maybe it’s less than half, because the user is only ever aware of the response to a gesture, and without the right response there can be no real connection with the device.

The key point is that the device presents consistent responses across all applications so the user becomes confident in their interaction. So the expected responses are these:

Select and double select

Drill into, or perform an action on, the selected item.

Hold

Bring up a context menu

Pan

Content under the finger moves in direct proportion to the movement of the finger i.e. direct manipulation.

Scroll

Content under the finger continues to move in the direction of the last pan and at the same velocity, decaying to a halt over time.


From this we can see there is really only one gesture that requires any sort of physics-driven response, and that’s the scroll gesture. What we need is a way to implement consistent movement in response to the gesture. To make this possible we’ve implemented a number of routines in the Physics Engine that allow the caller to describe the shape of the data area and the client area, then feed in the speed and angle of the scroll gesture (both available from GetGestureInfo()) and query the location of the client area over time until it comes to rest.

There are a number of animation ‘modes’ available in the Physics Engine beyond just deceleration, and these are combined internally to move the content to the extent of the data area and then switch from one mode to another, e.g. decelerate to rubber band, so that the final resting place of the animation always shows valid data.
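Just to illustrate the deceleration and clamping idea, here is a rough sketch – this is not the Physics Engine API from the Tool Kit, and the decay factor, tick interval and the hard clamp (the real engine rubber-bands rather than stopping dead) are simplifications made for the illustration. The initial velocity would be derived from the speed and angle reported for the scroll gesture.

#include <math.h>

// Rough sketch only - not the Physics Engine API. A scroll animation keeps a
// position and a velocity; each timer tick the position advances, the velocity
// decays, and the result is clamped to the extent of the data area.
struct ScrollState
{
    double offsetY;    // top of the visible window into the data area, in pixels
    double velocityY;  // pixels per tick, seeded from the scroll gesture
};

// Called on a timer (say every ~16 ms) until it returns false.
bool StepScroll(ScrollState& s, double dataHeight, double viewHeight)
{
    const double DECAY   = 0.95;  // fraction of velocity kept each tick (invented value)
    const double MIN_VEL = 0.5;   // below this the animation is considered finished

    s.offsetY   += s.velocityY;
    s.velocityY *= DECAY;

    // Clamp so the animation always comes to rest showing valid data.
    // (The real engine switches to a rubber-band mode here instead of
    // stopping dead.)
    double maxOffset = dataHeight - viewHeight;
    if (maxOffset < 0) maxOffset = 0;
    if (s.offsetY < 0)         { s.offsetY = 0;         s.velocityY = 0; }
    if (s.offsetY > maxOffset) { s.offsetY = maxOffset; s.velocityY = 0; }

    return fabs(s.velocityY) >= MIN_VEL;  // keep animating while still moving
}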

By implementing this behaviour in a central Physics Engine module, each touchable UI component can give consistent and natural feedback to the user. This is key to raising user confidence in the device and achieving an emotional connection with the experience.

Take a look at the PhysicsEngineSample in the Windows Mobile 6.5 Developer Tool Kit for more information.

That’s enough for part 1

This is already a bit long, so let me break off now; I will post another update soon covering WindowAutoGesture, managed code and some more about the resource kit. Oh, and I will share some of the things we learned while optimizing the 6.5 touch-related animations.