This is something my colleagues and I come across quite often. It can be quite time-consuming to troubleshoot, even though there aren't that many possible causes.
The typical scenario is that a web application stores client-specific data in session variables. Intermittently, clients will see other clients' data, find that their session has been reset long before the timeout, or get an IndexOutOfRangeException, a NullReferenceException, or something similar, because their session data suddenly appears corrupt or compromised.
Moving to Production
Unfortunately, this problem is usually first discovered once the application is in production. During stress tests you rarely pay much attention to the actual data being presented. Instead, you focus on making sure that the response times are within an acceptable range, and at best you glance at the web page to check that it looks okay. So the glitch in session state ends up being discovered in production, by the actual clients.
Don't blame the Framework
The first thing you usually hear is: "This has to be a bug in the framework." I can almost guarantee that it isn't. It's usually caching, code, or configuration that's to blame.
The usual suspects
- Static objects
- Cookieless sessions
- Process recycling
- Load balancing
- Incorrect data retrieval
Caching
I normally begin troubleshooting these issues by disabling all caching. Remove all <%@ OutputCache %> directives from the .aspx pages and, if you're running .NET Framework 2.0, verify that the outputCache element is configured correctly in machine.config and in your web.config files.
Normally you wouldn't cache client-specific data anyway, so disabling the cache for the affected pages shouldn't have any impact on performance.
You should also investigate what happens on the way from the client to the server. Is the page being cached by a proxy somewhere along the way? Is the page even requested at all, or is it simply served from the browser cache?
Network traces and IIS logs are great tools for verifying that the request actually reaches the server and that the response is, in fact, returned properly.
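Once a network trace shows you the response headers, a rough check like the following can tell you whether a page with client-specific data asks proxies not to store it. This is only a Python sketch with a hypothetical function name, not a full implementation of the HTTP caching rules: it treats a response as protected from shared caches if its Cache-Control header contains private or no-store.

```python
from typing import Mapping


def protected_from_shared_caches(headers: Mapping[str, str]) -> bool:
    """Rough check: does this response tell proxies (shared caches) not to store it?

    Qualified forms such as private="Set-Cookie" are treated like a bare
    private directive, and the wider HTTP caching rules are not modelled.
    """
    # Header names are case-insensitive, so match the name accordingly.
    cache_control = next(
        (value for name, value in headers.items() if name.lower() == "cache-control"),
        "",
    )
    # Collect directive names, e.g. "private, max-age=0" -> {"private", "max-age"}.
    directives = {
        part.strip().partition("=")[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }
    return "private" in directives or "no-store" in directives


# Example: headers copied from a network trace of a client-specific page.
print(protected_from_shared_caches({"Cache-Control": "private"}))            # True
print(protected_from_shared_caches({"Cache-Control": "public, max-age=60"}))  # False
```

A page that fails this check and shows client-specific data is a candidate for being served to the wrong client by an intermediate cache.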
Static objects
Static objects can sometimes cause headaches in ASP.NET. A static field is shared by every request in the application domain, so anything client-specific stored in one is effectively shared between all clients. The common scenario is that two clients use a static object at the same time, and one client manages to change it while the other is reading it. If you have static objects in your code, I suggest you investigate them thoroughly, possibly eliminate them temporarily for testing purposes, and see whether the problem still reproduces.
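To make the race concrete, here is a small Python sketch (all names are hypothetical) that plays it out deterministically. A module-level dictionary stands in for a C# static field, and generators are used to interleave two requests so that the second one runs between the first one's write and its read. Keeping the value in a per-client session instead keeps the two clients apart.

```python
from typing import Dict, Generator, List

# Module-level state: the Python analogue of a C# static field.
# Every request handled by the process reads and writes the same dictionary.
_shared: Dict[str, str] = {}

Request = Generator[None, None, str]


def request_using_static(customer: str) -> Request:
    """A request that parks client-specific data in shared (static-like) state."""
    _shared["customer"] = customer
    yield  # another request may run at this point
    return f"Hello, {_shared['customer']}"


def request_using_session(customer: str, session: Dict[str, str]) -> Request:
    """The same request, but the data lives in this client's own session."""
    session["customer"] = customer
    yield  # another request may run at this point
    return f"Hello, {session['customer']}"


def run_interleaved(first: Request, second: Request) -> List[str]:
    """Start both requests, then let each finish: a worst-case interleaving."""
    next(first)
    next(second)
    pages: List[str] = []
    for request in (first, second):
        try:
            next(request)
        except StopIteration as done:
            pages.append(done.value)  # the generator's return value
    return pages


print(run_interleaved(request_using_static("Alice"), request_using_static("Bob")))
# ['Hello, Bob', 'Hello, Bob']
print(run_interleaved(request_using_session("Alice", {}), request_using_session("Bob", {})))
# ['Hello, Alice', 'Hello, Bob']
```

In a real application the interleaving depends on thread timing, which is exactly why the symptom is intermittent.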
Cookieless sessions
When you use cookieless sessions, the session ID is stored in the URL. That's not really a safe alternative, and not something I'd recommend for applications with sensitive client data. Also, bookmarked or shared URLs can cause all sorts of odd behavior when someone accesses the application with a session ID that no longer exists, or even with a reused one that now belongs to a completely different client.
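As an illustration of why a URL-borne session ID is fragile, the Python sketch below extracts the ID from a URL in the (S(...)) form that ASP.NET 2.0 uses for cookieless sessions and looks it up in a hypothetical in-memory session store. Anyone who opens the same URL while that session is alive ends up in the same session, and an expired ID simply finds nothing.

```python
import re
from typing import Dict, Optional

# Cookieless URLs embed the session ID in the path, e.g.
#   /app/(S(lit3py55t21z5v55vlm25s55))/orders.aspx
_SESSION_IN_PATH = re.compile(r"/\(S\(([a-z0-9]+)\)\)/")

# Hypothetical session store: session ID -> session variables.
sessions: Dict[str, Dict[str, str]] = {
    "lit3py55t21z5v55vlm25s55": {"customer": "Alice"},
}


def session_id_from_url(url: str) -> Optional[str]:
    """Return the session ID embedded in a cookieless URL, if any."""
    match = _SESSION_IN_PATH.search(url)
    return match.group(1) if match else None


def customer_for(url: str) -> Optional[str]:
    """Whoever requests this URL gets the session it names, if it still exists."""
    session_id = session_id_from_url(url)
    session = sessions.get(session_id) if session_id else None
    return session.get("customer") if session else None


bookmark = "http://example.com/app/(S(lit3py55t21z5v55vlm25s55))/orders.aspx"
print(customer_for(bookmark))  # Alice, no matter who opens the link
```

Nothing in the request ties the session to the person who created it; the URL alone decides.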
Process recycling
If you're only losing session data and you're using in-process session state, the most likely cause is the worker process (w3wp.exe) itself. In-process session state lives in memory, so it is lost whenever the process recycles or the application domain restarts. Is the process recycling at the default intervals? Have you changed any settings in IIS Manager so that it recycles after 300 requests? Is anything writing to the bin directory? I've written a post on common reasons why your application pool may recycle unexpectedly; if you suspect that this is what's happening, I suggest you check it out.
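The toy model below, a Python sketch with hypothetical names, shows why recycling wipes in-process session state. Sessions kept in the process's own memory disappear when it recycles after a configured number of requests, while sessions kept in an external store (standing in for the ASP.NET state server) survive.

```python
from typing import Dict, Optional

Sessions = Dict[str, Dict[str, str]]


class WorkerProcess:
    """Toy worker process that recycles after `max_requests` requests.

    If `state_server` is given, sessions are kept there (out of process);
    otherwise they live in the process's own memory (in-process).
    """

    def __init__(self, max_requests: int, state_server: Optional[Sessions] = None):
        self.max_requests = max_requests
        self.state_server = state_server
        self._in_proc: Sessions = {}
        self._served = 0

    def handle(self, session_id: str, key: str, value: Optional[str] = None) -> Optional[str]:
        """Serve one request: optionally store a value, then read it back."""
        if self._served >= self.max_requests:
            self._recycle()
        self._served += 1
        store = self.state_server if self.state_server is not None else self._in_proc
        session = store.setdefault(session_id, {})
        if value is not None:
            session[key] = value
        return session.get(key)

    def _recycle(self) -> None:
        # A fresh process starts with empty memory, so in-process sessions are gone.
        self._in_proc = {}
        self._served = 0


for label, proc in (("in-process", WorkerProcess(2)), ("state server", WorkerProcess(2, {}))):
    proc.handle("abc", "cart", "3 items")     # request 1 stores data
    proc.handle("abc", "cart")                # request 2 still sees it
    print(label, proc.handle("abc", "cart"))  # request 3 runs after a recycle
# in-process None
# state server 3 items
```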
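Load balancing
Another reason why sessions may reset, or possibly even jump between clients, is load balancing. With in-process session state, each server keeps its own sessions in memory, so a request routed to a different server won't find the client's session. Make sure the state server is configured correctly on all servers, and that you haven't accidentally switched one of them to in-process session state or ended up with one configuration on server 1 and another on server 2.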
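A quick way to catch this kind of configuration drift is to collect the sessionState settings from each server's web.config and compare them. The Python sketch below assumes you have already extracted the mode and stateConnectionString attributes into a dictionary per server (the server names and addresses are made up). It flags any server using InProc, which is also what ASP.NET falls back to when no mode is specified, and any setting that differs from the first server.

```python
from typing import Dict, List

# Per-server <sessionState> attributes, e.g. {"mode": "StateServer", ...}.
FarmConfig = Dict[str, Dict[str, str]]


def session_config_problems(configs: FarmConfig) -> List[str]:
    """List session-state settings that would break sessions in a web farm."""
    problems: List[str] = []
    if not configs:
        return problems
    servers = sorted(configs)
    reference_name = servers[0]
    reference = configs[reference_name]
    for name in servers:
        settings = configs[name]
        # ASP.NET falls back to InProc when no mode is specified.
        if settings.get("mode", "InProc") == "InProc":
            problems.append(f"{name}: InProc session state is not shared with other servers")
        # Every server should agree with the reference server on every attribute.
        for key in sorted(set(reference) | set(settings)):
            if settings.get(key) != reference.get(key):
                problems.append(
                    f"{name}: {key}={settings.get(key)!r}, "
                    f"but {reference_name} has {reference.get(key)!r}"
                )
    return problems


farm = {
    "web1": {"mode": "StateServer", "stateConnectionString": "tcpip=10.0.0.5:42424"},
    "web2": {"mode": "InProc"},
}
for problem in session_config_problems(farm):
    print(problem)
```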
Incorrect data retrieval
If a client is seeing incorrect data, verify that the data is actually retrieved correctly. By adding tracing to the code that records the session ID along with the variables used to retrieve the customer-specific data, you may find that the problem lies in the way the data is retrieved rather than in the session ID.
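As a sketch of that kind of tracing in Python (the data store, customer IDs and function name are all hypothetical), the lookup below logs the session ID, the customer recorded in the session, and the key actually used to fetch the data, and raises the log level to WARNING when the two disagree. A warning here points at the retrieval code rather than at session state.

```python
import logging
from typing import Dict, Optional

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("session-trace")

# Hypothetical data store: customer ID -> customer-specific data.
CUSTOMERS: Dict[str, Dict[str, str]] = {
    "C-1001": {"name": "Alice", "balance": "120.00"},
    "C-2002": {"name": "Bob", "balance": "75.50"},
}


def load_customer_data(session_id: str, session: Dict[str, str], lookup_key: str) -> Dict[str, object]:
    """Fetch customer data, tracing the session ID and the key used for the lookup.

    `lookup_key` is whatever the page actually uses to query the data (a query
    string value, a hidden field, a cached variable). If it disagrees with the
    customer stored in the session, the bug is in data retrieval.
    """
    owner: Optional[str] = session.get("customer_id")
    mismatch = owner != lookup_key
    level = logging.WARNING if mismatch else logging.INFO
    logger.log(level, "session=%s session.customer_id=%s lookup_key=%s", session_id, owner, lookup_key)
    return {"data": CUSTOMERS.get(lookup_key, {}), "mismatch": mismatch}


load_customer_data("abc", {"customer_id": "C-1001"}, "C-1001")  # logged at INFO
load_customer_data("abc", {"customer_id": "C-1001"}, "C-2002")  # logged at WARNING
```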
If this happens to you, be prepared to spend some time troubleshooting it; it usually takes a while to figure out where things go wrong. To avoid unpleasant surprises, take the time to verify that the correct data is displayed before deploying. It is far too easy to stare at the response times and forget to check that the data displayed is actually the data you'd expect.
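One way to build that verification into a stress test is to have every simulated client check that the page shows its own data, not just that it came back quickly. The Python sketch below assumes a hypothetical fetch callable that signs in as a given client (for example with its own cookie jar), requests the page, and returns the customer ID the page actually displays. It fires those requests concurrently and reports every mismatch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def find_mixups(
    fetch: Callable[[str], str], client_ids: List[str], workers: int = 8
) -> List[Tuple[str, str]]:
    """Request the page for many clients concurrently and return
    (expected_client, displayed_client) for every response that shows
    someone else's data."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as client_ids.
        displayed = list(pool.map(fetch, client_ids))
    return [
        (expected, shown)
        for expected, shown in zip(client_ids, displayed)
        if expected != shown
    ]


# Stand-ins for a real HTTP fetch: one correct, one that leaks a shared value.
def correct_fetch(client_id: str) -> str:
    return client_id


def leaky_fetch(client_id: str) -> str:
    return "client-1"  # always shows the same client's data


clients = [f"client-{n}" for n in range(1, 5)]
print(find_mixups(correct_fetch, clients))  # []
print(find_mixups(leaky_fetch, clients))
```

An empty result under load is a much stronger signal than good response times alone.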