A bit about WinInet’s Index.dat


Since a recent digg article and its underlying Wikipedia entry seem a little confused about index.dat, I’d like to give some more detail about what it is and what we have changed in IE7/Vista’s version of WinInet. As Jeffdav explained a while back, the index.dat file is a store for web-related things: the URL content cache, cookies, RSS feeds, and visited links. Each of these collections, called a container, has its own index.dat file that lives in the user profile.


First, let’s talk about these containers a bit more:


On most machines the biggest and most important container is the URL content cache index.dat. It lives (on Vista) at \Users\<user>\AppData\Local\Microsoft\Windows\Temporary Internet Files\Content.IE5\index.dat. Content such as pages and images that we fetch from the web and that is cacheable gets placed into this cache until it expires. The rules for whether content is cacheable and when entries expire from the cache are complex enough to warrant their own blog posting, but the common reasons that content doesn’t go in the cache are the server telling us not to via response headers, or the user telling us not to save any SSL resources to disk via the “Do not save encrypted pages to disk” option in Internet Options->Advanced->Security. Each cache entry has the URL and a file name, which lets us quickly find previously retrieved URLs and serve that content out of the content container. If a user just deletes all the files in the directory, the index.dat file will still contain all the URLs and paths until we realize that a cache entry is missing its file and should be deleted from the index.dat.
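
To make the stale-entry behavior concrete, here is a minimal sketch of an index that maps URLs to cached files and only drops an entry once it notices the file is gone. It is a toy model, not WinInet's actual format or code; the class name ToyCacheIndex, the demo URL, and the file name are all made up for illustration.

```python
import os
import tempfile


class ToyCacheIndex:
    """Toy stand-in for a content-cache index: maps URL -> cached file path.

    Hypothetical illustration only; the real index.dat layout is not modeled.
    """

    def __init__(self):
        self.entries = {}

    def add(self, url, path):
        self.entries[url] = path

    def lookup(self, url):
        """Return the cached file path, or None if there is no usable entry.

        An entry whose file has disappeared is removed when it is discovered.
        """
        path = self.entries.get(url)
        if path is None:
            return None
        if not os.path.exists(path):
            del self.entries[url]  # stale entry: the backing file is missing
            return None
        return path


def demo():
    with tempfile.TemporaryDirectory() as cache_dir:
        index = ToyCacheIndex()
        url = "http://example.com/"
        path = os.path.join(cache_dir, "page[1].htm")
        with open(path, "w") as f:
            f.write("<html></html>")
        index.add(url, path)

        os.remove(path)  # the user deletes the file behind the index's back
        listed_before = url in index.entries  # the index still lists the URL
        found = index.lookup(url)             # the lookup notices the gap
        listed_after = url in index.entries   # and the entry is now gone
    return listed_before, found, listed_after
```

Calling demo() shows the three stages: the URL is still listed right after the file is removed, the lookup finds nothing, and the entry is dropped afterwards.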


The visited container is a listing of the URLs that you click on when web browsing, which is how IE can do URL auto completion and mark the links that you have visited in a different color. This container is located on my Vista box in \Users\<user>\AppData\Local\Microsoft\Windows\History\History.IE5\index.dat. Visited only needs to know about each URL once, since you have either visited the site or you haven’t.


The history containers are a set of containers for the different date ranges that IE displays, like today, yesterday, last week, etc. These containers are in \Users\<user>\AppData\Local\Microsoft\Windows\History\History.IE5\MShist01<date><date>\index.dat. Again, top-level links that you visit are stored in these containers. When the date shifts, IE does the bookkeeping, often by merging these buckets.


The cookie container maps the cookie URLs to individual cookie files. It is stored in \Users\<user>\AppData\Roaming\Microsoft\Windows\Cookies\index.dat. The index.dat contains the associated URL, the path to the cookie data, and other cookie metadata. You might notice that unlike the other containers this container is under a path called Roaming. This has to do with a domain feature that copies your preferences from machine to machine on a domain. Cookies are one of those types of settings.


You might have also seen that starting in Vista almost all the containers have a Low\ directory with another index.dat. That is because these are specially marked directories that IE in protected mode can access. We completely partition off IE between the protected mode and normal modes. By design, normal only accesses the normal cache, cookies, etc. and by design and OS protection, protected mode only accesses the Low\ versions. The “how” of this partitioning is talked about on MSDN.
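
For anyone who wants to locate the containers described above, here is a small sketch that assembles the index.dat paths exactly as given in this post. The function name, the dictionary keys, and the default drive letter are assumptions for illustration. The per-date-range history containers and the Low\ variants are left out, since the post does not spell out their exact locations.

```python
from pathlib import PureWindowsPath


def container_index_paths(user, drive="C:"):
    """Build the Vista index.dat paths named in the post for one user.

    'drive' is an assumption: the post writes the paths without a drive letter.
    """
    app_data = PureWindowsPath(drive + "\\", "Users", user, "AppData")
    windows_local = app_data / "Local" / "Microsoft" / "Windows"
    windows_roaming = app_data / "Roaming" / "Microsoft" / "Windows"
    return {
        # URL content cache
        "content": windows_local / "Temporary Internet Files" / "Content.IE5" / "index.dat",
        # visited links
        "visited": windows_local / "History" / "History.IE5" / "index.dat",
        # cookies live under Roaming, unlike the other containers
        "cookies": windows_roaming / "Cookies" / "index.dat",
    }


if __name__ == "__main__":
    for name, path in container_index_paths("someuser").items():
        print(name, path)
```

PureWindowsPath only manipulates strings, so the sketch runs on any OS and never touches the disk.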


It’s important to note that pretty much all modern web browsers have to keep these types of data stores. Firefox (1.5.0.6 at least) uses different file formats for each of its index.dat equivalents, but they are there. The equivalent of the cache container index.dat is in Users\<user>\AppData\Local\Mozilla\Firefox\Profiles\59kuzm1n.default\Cache with the _CACHE_* files. The other containers are in the Roaming version of the directory, over in Users\<user>\AppData\Roaming\Mozilla\Firefox\Profiles\59kuzm1n.default\. The history and visited containers are probably combined into one container, history.dat, and the cookies container is cookies.txt.


There is one thing pretty special about WinInet and hence the index.dat files: they are OS components that many applications use, including Explorer. That means that they were highly optimized for sharing data between processes. Each application’s copy of WinInet opens the file for shared read and write, but not for delete. As long as any program is using WinInet, the index.dat file can’t be deleted. If you could delete it, the applications actively using the file would probably crash or start corrupting data in memory. This also means that many applications leave their own footprints in the different containers. For example, when Windows Media Player downloads an mp3 from a URL on the web, that file can end up in WinInet’s content cache.
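
The "read and write, but not delete" sharing rule can be illustrated with a toy model of Windows share modes. This is a simulation in plain Python, not a call into the OS: the three flag values match the Win32 FILE_SHARE_* constants, but the ToySharedFile class is hypothetical and models only the delete check, not the full sharing rules.

```python
# Win32 share-mode flag values (the constants are real; the model below is a toy).
FILE_SHARE_READ = 0x1
FILE_SHARE_WRITE = 0x2
FILE_SHARE_DELETE = 0x4


class ToySharedFile:
    """Tracks the share mode of every open handle on one file."""

    def __init__(self):
        self._handles = {}
        self._next_handle = 0

    def open(self, share_mode):
        """Record a new open handle and return its id."""
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = share_mode
        return handle

    def close(self, handle):
        del self._handles[handle]

    def can_delete(self):
        """Deletion is allowed only if every open handle agreed to share delete."""
        return all(mode & FILE_SHARE_DELETE for mode in self._handles.values())


if __name__ == "__main__":
    index_dat = ToySharedFile()
    # Two processes open the file the way the post describes:
    # shared for read and write, but not for delete.
    a = index_dat.open(FILE_SHARE_READ | FILE_SHARE_WRITE)
    b = index_dat.open(FILE_SHARE_READ | FILE_SHARE_WRITE)
    print(index_dat.can_delete())  # blocked while either handle is open
    index_dat.close(a)
    index_dat.close(b)
    print(index_dat.can_delete())  # allowed once nobody holds the file
```

In this model a single remaining handle without FILE_SHARE_DELETE is enough to block deletion, which matches the point that the file cannot be deleted as long as any program is using WinInet.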


So what’s new in IE7? Well, the first thing is that IE made the interface for cleaning up these files much simpler with “Cover My Tracks”. Under this idea WinInet made a bunch of improvements. The first improvement was in entry deletion. Those of you who remember the FAT file system on DOS might find the concepts behind this problem familiar. In DOS, when you delete a file, the file is still around and special tools can undelete it unless some new file has already been written over it. The way we used to delete entries in the index.dat file was pretty similar: the old URL data was marked free but was still there, at least until it was overwritten by a new entry. In IE7 we now zero out the entry. Another problem was that some applications (cough Outlook Express cough) would write temporary files, like attachments, into the cache file directory to allow other applications to open them. If the index.dat file didn’t know about the file, we wouldn’t clean it up. Now when you use the “Delete Files…” button we delete everything in the directory regardless of whether it’s in index.dat or not. There is one more feature in this area that I should mention even though it is not new. When we attempt to delete an entry from the cache but can’t delete the actual storage file, we still remove the entry from index.dat and put the file on a list of things to periodically try to clean up.
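
The difference between marking an entry free and zeroing it out can be shown with a toy fixed-slot store. This is only a sketch of the idea: the ToyIndex class, the 64-byte slot size, and the example URLs are invented for illustration and say nothing about the real index.dat layout.

```python
class ToyIndex:
    """Toy fixed-slot store contrasting two ways of deleting an entry."""

    SLOT = 64  # bytes per entry; an arbitrary size chosen for the example

    def __init__(self, slots=4):
        self.data = bytearray(slots * self.SLOT)
        self.in_use = [False] * slots

    def add(self, url):
        """Write the URL into the first free slot and return the slot number."""
        raw = url.encode("ascii")
        if len(raw) > self.SLOT:
            raise ValueError("URL too long for one slot")
        slot = self.in_use.index(False)  # raises ValueError if the store is full
        start = slot * self.SLOT
        self.data[start:start + self.SLOT] = raw.ljust(self.SLOT, b"\0")
        self.in_use[slot] = True
        return slot

    def delete_mark_free(self, slot):
        """Old style: flag the slot as free and leave the bytes in place."""
        self.in_use[slot] = False

    def delete_zero(self, slot):
        """New style: overwrite the bytes with zeros, then free the slot."""
        start = slot * self.SLOT
        self.data[start:start + self.SLOT] = bytes(self.SLOT)
        self.in_use[slot] = False


if __name__ == "__main__":
    index = ToyIndex()
    slot = index.add("http://example.com/private")
    index.delete_mark_free(slot)
    print(b"example.com" in index.data)  # the URL can still be read back

    slot = index.add("http://example.com/other")
    index.delete_zero(slot)
    print(b"example.com" in index.data)  # nothing left to recover
```

With mark-as-free deletion, anything that reads the raw bytes can still find the URL until a later entry happens to overwrite that slot. Zeroing removes it at delete time.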


Any Questions?


    — Ari Pernick

Comments (23)

  1. Continuing the discussion in the previous post, of course index.dat is not a secret record of any kind,…

  2. wagahai says:

    index.dat was the reason that I switched to Firefox.

    I do not want any of that information cached. I would set history to 0 days. The best that I could do for Temporary Internet files folder was setting it to 1MB, etc. Nevertheless, huge amounts of information were recorded.

    For years I would repeat the same old clearing of the various cache files.

    This was very tedious, so I eventually wrote a utility to do this for me. I was amazed at how much was cached, and went to further and further extremes to remove it. However, I could never get it all; the provided APIs left much behind. There were some values stuck in there from several years before.

    But, I would remove what I could.

    I would run this utility dozens of times a day.

    Then I heard that Firefox has a feature to automatically clear all of its cache. I broke down and tried it. It doesn’t store its cache in the index.dat files. I investigated its storage system. When it cleared the cache, it could actually clear it _all_. And this could be done automatically. This was the single reason that I switched to Firefox.

    I am aware that IE7 will have a similar feature (of course). I will give it a try. Depending on how thoroughly it removes the cache, I will decide my browser.

    My remaining problem is that index.dat is a system component. It is used for system-wide caching from other applications, not only by IE. So I still need to constantly run my utility to clear as much as I can. But at least I don’t have to worry about URIs being in there.

    I am looking forward to "Cover My Tracks."

    I do not want index.dat. Allow me to disable caching permanently, for the whole system. It has been a source of frustration for years. Speed is not a problem. I spend far too much time just trying to remove that information. Or at the very least, provide an option to clear the various index.dat files at system startup / shutdown. That way you would not have to worry about corrupting other applications.

    Call me paranoid, but I do not care to leave a trail of bread crumbs detailing my actions for a period of time.

  3. Mike says:

    I think this post is misleading. The ability to delete cookies has been available in Internet Explorer just about forever. The real problem behind index.dat is that, whether or not the indexes inside are still relevant, it keeps named URLs forever. This is a privacy issue. Any application can read index.dat and figure out which sites I visit, without me knowing.

    As a user, I want to be able to turn on a hypothetical "auto-delete" of everything either anytime the web browser is restarted, or Windows is restarted, or even on a scheduled basis. I don’t think IE7 is going to provide any of this, unless I have missed something.

  4. Why is the directory still called IE5? It’s exactly that kind of legacy thingy which frustrates me about MS software. The fact that UA string additions can be stored in HKLM\Software\Microsoft\Windows\CurrentVersion\Internet Settings\5.0\User Agent\Post Platform as well as HKLM\Software\Microsoft\Windows\CurrentVersion\Internet Settings\User Agent\Post Platform is just another example of this.

  5. Nicholas says:

    I agree on the legacy comment by Jorrit– why not start using a new folder name? Isn’t there a "system variable" for that system directory, much like there is one for "MyDocuments" and "Windows" ?

    Also, I remember there being issues with the index.dat becoming corrupted or full. This is what leads to the "right click-> view source -> nothing happens." bug, right?

    That’s why I use CacheSentry [http://www.enigmaticsoftware.com/cachesentry/] sometimes.. but wish it would just be fixed in the source!

  6. In my previous post I tried to explain a bit about what the index.dat files are and what has changed…

  7. wndpteam says:

    I’ve responded to a number of the questions asked here on the next post: http://blogs.msdn.com/wndp/archive/2006/08/07/WinInet_Index_dat_Q_and_A.aspx.

    — Ari

  8. The mysterious history file.

  9. Adam says:

    “The visited container […] is how IE can do URL auto completion and mark the links that you have visited a different color. […] Visited only needs to know about each URL once, since you have either visited the site or you haven’t.”

    I suppose … but you could do much nicer things if you kept a count of the number of times a URL has been visited. Like order the autocompletion list by most-visited instead of just alphabetically. This is one thing that FF does that I really miss in IE when I use it.

    For example, if I type “http://blogs“ in my FF address bar, the top 10 items in the autocomplete list are:

    http://blogs.msdn.com/
    http://blogs.msdn.com/oldnewthing/
    http://blogs.msdn.com/jensenh/
    http://blogs.msdn.com/michkap/
    http://blogs.msdn.com/ie/
    http://blogs.msdn.com/larryosterman/
    http://blogs.msdn.com/ericlippert/
    http://blogs.msdn.com/robmen/
    http://blogs.msdn.com/michkap/archive/2006/08/01/685351.aspx
    http://blogs.msdn.com/jensenh/archive/2006/08/04/688355.aspx

    That’s great! It gives me the links I visit most – the blog front pages, before any individual articles. If I did the same in IE, it would start with Abhinaba’s blog front page, and then list every single entry of his I’ve ever read, probably followed by Brian Jones and every article of his I’ve ever read, etc… despite the fact that I don’t care /that much/ about those guys’ blogs (interesting as they are). That really sucks in comparison to FF.

    Even ordering by most-recently-visited would be better than alphabetically. (Could this be implemented with the history info?)

    Also, if you’re parameterizing variable path elements, the cache path for FF is more like:

    Users\<user>\AppData\Local\Mozilla\Firefox\Profiles\<random>.default\Cache

    The <random> part is there so that if a security vulnerability allows an attacker to read arbitrary files on your HD, the attacker still has some work to do before they can read your cookie/password/history/settings/etc… files.

  10. Mark says:

    FYI, if you want to decode the contents of the Index.dat file, here’s a forensics tool to do that. (Useful if you want to see what websites may have done nefarious things to Internet Explorer):

    http://www.foundstone.com/index.htm?subnav=resources/navigation.htm&subcontent=/resources/proddesc/pasco.htm

  11. zzz says:

    Does this mean that a forensic company or the police now have to read the HDD with special tools (and might need to physically open it up) if they need access to visited URLs after the user has easily cleared them with IE7? Or maybe they elect to try to get the URLs from the ISP. But hey, someone might have been using your wireless access, and now it’s harder to tell the police it wasn’t you, since it was your IP but you had cleared the cache just yesterday.

  12. sery0ga says:

    I have to analyze visitors’ history on a local network, but I can’t find a way to pass file locations to the WinINet caching functions. So I have a set of index.dat files on a server, in different folders of course. How can I read them using WinINet?

  13. myob says:

    What about people with older computers or operating systems who can’t get IE7? How do they delete that info stored about them?

  14. wndpteam says:

    myob: I’m sorry to hear that you are not running XP or 2003. I’m not aware of an easy way to clean up these files on previous versions of IE.

  15. SJ says:

    Where do I go on the site to get the forensic tool to decode the index?

  16. TJ says:

    I created a batch file (below) that cleans up a bunch of stuff including clearing IE’s history, cookies, temporary internet files, and index.dat files. I run it (via a shortcut) whenever needed. Also, I’ll periodically log off then log in as a different user (with admin privs), which releases the index.dat files for that profile. Then upon running the batch file, all the index.dat files are deleted. After logging back in as the regular user, the files are recreated.

    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Adobe"

    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Macromedia"

    rmdir /S /Q "C:\Documents and Settings\user\Application Data\Microsoft\Internet Explorer\UserData"

    del /F /A:H /Q "C:\Documents and Settings\user\Local Settings\Application Data\*.db"

    del /F /Q "C:\Documents and Settings\user\Local Settings\Application Data\Microsoft\Media Player"

    del /F /Q "C:\Documents and Settings\user\Local Settings\Application Data\Microsoft\Windows Media\11.0"

    del /F /Q "C:\Documents and Settings\user\Application Data\Microsoft\Media Player\*.wpl"

    del /F /Q "C:\Documents and Settings\user\Application Data\Microsoft\Office\Recent"

    del /F /A:H /Q "C:\Documents and Settings\user\Application Data\Microsoft\Office\Shortcut Bar\*.tmp"

    del /F /A:H /Q "C:\Documents and Settings\user\Cookies"

    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\History"

    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\Temporary Internet Files"

    del /F /Q "C:\Documents and Settings\user\Local Settings\Temp"

    rmdir /S /Q "C:\Documents and Settings\user\Local Settings\Temp"

    mkdir "C:\Documents and Settings\user\Local Settings\Temp"

    rundll32.exe InetCpl.cpl,ClearMyTracksByProcess 255

    pause

    exit

  17. HawkPunk says:

    You can read these files using index.dat Viewer™, available for free at http://www.pointstone.com/products/

    ~Hawk~

  18. Robert says:

    Index.dat Suite by Ur I.T. Mate Group and Steven Burn (2007) is supposed to delete all index.dat files, or at least those that you want to delete.

    I have run it several times with no problems.

  19. Robert Waldock says:

    Clearing your cache is not a lot of help to be honest.  The police use tools like NetAnalysis from Digital Detective which has a history extractor that will recover the deleted data.  

    http://www.digital-detective.co.uk/netanalysis.asp

    I have been using this for a few years and it is amazing what it can recover.

    Bob

  20. kit10 says:

    I work with IE8 on Vista. After deleting cookies from IE using the "Delete History" button, I still have a non-empty index.dat in my Roaming\Microsoft\Windows\Cookies folder. I can see my whole browsing history using Notepad – index.dat is NOT filled with blanks, as I expected. "Unlocker" doesn’t help delete the file, and even "Index.dat Suite" doesn’t help (nothing changed after reboot) :-/

  21. kit10: Did you leave "preserve favorite website data" checked?

  22. Mike says:

    So…here is a question about this file….

    My current field of work depends on being able to reconstruct browsing history and if the suspect used IE, I go after the index.dat file.

    I’ve noticed some odd output from some forensics tools regarding this file, though.  I’ll see large holes in the history when evaluating this output.  Am I not processing all the files or is this something particular to the index.dat file?

    As an example, I’ll see URLs for one month and then the next URL has a date attached to it that is several months later.  I have no reason to believe that browsing didn’t occur, but somehow the index.dat file didn’t record it?  

    Can you explain this behavior?