Taxes: Hierarchical Storage Management

One of the taxes I alluded to some time ago when I broached the issue of software development "taxes" is Hierarchical Storage Management. In short, Hierarchical Storage Management is a way of archiving data transparently. When a file is due for archival, it is transferred to a slower (but less expensive) storage medium, such as magnetic tape, leaving a stub behind.

The stub retains some of the file's original metadata, such as last-modified time and file size, but none of the original file's contents are recorded by the stub. If a program tries to open the stub, the original file is "recalled" from tape backup, a process which can take minutes.

Programmatically, you can detect that you have stumbled across one of these stubs by checking for the FILE_ATTRIBUTE_OFFLINE file attribute. (Note that this is not the same as Offline Files.) We already saw that Explorer indicates such files with a black clock. The command prompt indicates such files by putting the file size in parentheses. If your program encounters a file with this attribute, it should not open the file unless the user explicitly asked it to do so. Examples of operations that should be suppressed for an offline file unless the user explicitly requests them:

  • Auto-preview.
  • Content indexing.
  • Searching.
  • Scanning for viruses.
  • Sniffing file content.

For example, a context menu handler should not open an offline file just to see which context menu options to offer. Right-clicking a file is not a strong enough reason to recall it from tape.
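
To make the rule concrete, here is a minimal sketch in Python of the decision described above. The function name may_open and its user_asked parameter are made up for illustration; the only real piece is stat.FILE_ATTRIBUTE_OFFLINE, which is Python's name for the same attribute bit. The attribute value itself would come from something like GetFileAttributes, or from the st_file_attributes field that os.stat reports on Windows.

```python
import stat


def may_open(attributes, user_asked):
    """Decide whether it is acceptable to open a file.

    attributes: the file's Win32 attribute bits (an int).
    user_asked: True only if the user explicitly requested this file.
    Both the function and its parameters are hypothetical names.
    """
    is_offline = bool(attributes & stat.FILE_ATTRIBUTE_OFFLINE)
    if is_offline and not user_asked:
        # Opening the stub would recall the file from slow storage,
        # so background features (preview, indexing, sniffing) back off.
        return False
    return True
```

A context menu handler, for instance, would pass user_asked=False and offer only its generic options when the function returns False.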

A search that fails to respect the FILE_ATTRIBUTE_OFFLINE file attribute recalls from tape every file it accesses. Left unchecked, this eventually recalls every single file on the system, completely negating the act of archiving the files to tape in the first place!
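
A search that respects the attribute could look something like the following sketch. It assumes Python's os.scandir, whose directory entries on Windows should carry the attribute bits from the directory listing itself (the st_file_attributes field), so the check does not need to open the file. The names search_tree and entry_is_offline are invented for the example, and the is_offline parameter exists only so the policy can be swapped out or exercised on a system without HSM.

```python
import os
import stat


def entry_is_offline(entry):
    """True if a directory entry carries FILE_ATTRIBUTE_OFFLINE.

    st_file_attributes exists only on Windows; elsewhere this
    falls back to 0, so nothing is treated as offline.
    """
    info = entry.stat(follow_symlinks=False)
    attrs = getattr(info, "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_OFFLINE)


def search_tree(root, needle, is_offline=entry_is_offline):
    """Search file contents under root for needle (bytes).

    Returns (matches, skipped): paths whose contents contain needle,
    and paths that were not opened because they are offline stubs.
    """
    matches, skipped = [], []
    pending = [root]
    while pending:
        with os.scandir(pending.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    pending.append(entry.path)
                elif not entry.is_file(follow_symlinks=False):
                    continue
                elif is_offline(entry):
                    # Do not open: that would trigger a recall.
                    skipped.append(entry.path)
                else:
                    # Reads the whole file at once; fine for a sketch.
                    with open(entry.path, "rb") as f:
                        if needle in f.read():
                            matches.append(entry.path)
    return matches, skipped
```

Returning the skipped paths lets the caller tell the user that some files were not searched, and offer to search them anyway if the user explicitly asks.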

[Raymond is currently away; this message was pre-recorded.]

Comments (11)
  1. PatriotB says:

    Since I don’t have a setup with HSM enabled, I just now manually set some files to offline (via SetFileAttributes) to test some things out with some of the shell extensions I’ve written. I’ve got some tweaks to make to my property sheet extensions: they show up when they shouldn’t.

    Looks like Windows Explorer (XP SP2) automatically disables thumbnails for offline files, so my thumbnail handler won’t need to do any extra work. However XP’s Filmstrip view does show offline files in the preview area.

  2. Anonymous says:

    This sounds like a useful flag for things like Webdrive to apply to their virtual files. I do like Webdrive, but the trouble is, Explorer is always downloading jpegs to preview them, or downloading thumbs.db, etc, and it slows things down.

    A friend once came to my house to use my broadband connection to upload a large EXE (a demo file). It was going really slowly, and then we realised that as Webdrive was uploading it, Explorer was downloading the EXE so that it could extract the icon to show it to me in the window. Ngghh. (In the end, we used command line ftp).

    In summary, more things should use this flag when appropriate.

    Or would that be bad? Is it ‘only’ for tapes? Or just for ‘files that take a long time to get’?

  3. A better (IMNSHO) solution would be for Explorer (and other programs) to ALWAYS ASK before opening files. It needn’t be intrusive. Just have a "view thumbnails" link.

  4. Anonymous says:

    It’s for offline files, which would include anything on secondary storage.

  5. Anonymous says:

    My university used to use this with UNIX based systems. During holidays, all the undergrad files ended up migrating to the optical storage jukeboxes. (The postgrads and staff worked during holidays so their new files pushed the undergrad ones out.)

    So when we all came back, the first thing we discovered was that access to our files was *really* slow. Even the first login took ages, as your .cshrc had to be fetched (in competition with everyone else). It didn’t take long before everyone figured out you could cat all of your files to bring them back. So everyone did.

    And every few terms/semesters that broke the optical jukeboxes! Moral of the story: HSM only works if the people using it understand the point and cooperate :-)

  6. Anonymous says:

    In one company I worked for, we were doing something like this on an IBM mainframe. We thought there was an automatic tape retrieval system, so we had some batch jobs to get these files. They were not the most efficient things in terms of minimising retrievals, but we didn’t care, as we thought it was an automated system….

    Then we got a phone call from a rather annoyed operator. It turned out that the command caused a message to appear on a console, and someone had to go and get the tape and load it into the drive. After the first 10 or so instructions to get the same tape, they began to lose their sense of humour about it…

  7. Anonymous says:

    You’re lucky if they phoned you, Harvey. We tended to just cancel jobs if we thought they were doing something wrong.

  8. Anonymous says:

    Do you work at the same place I do?

  9. Anonymous says:

    Of course there are several RTLs that require you to open a file before you can determine its size, even if that’s all you want… sigh.

  10. Anonymous says:

    Hello Raymond,

    I am right now writing just such a program, a server that needs to retrieve file information and send it to a client.

    The problem is that some information, like the file link count, is available from the system only by first opening the file.

    Now, I want to send the client the file link count if the client requested it. However, I don’t want this to cause the file to be loaded from the tape, either.

    Can I open the file with FILE_FLAG_OPEN_NO_RECALL in order to achieve this?

    Will this flag cause the open to succeed immediately, without causing anything to be loaded from the offline resource? Or will stuff still be loaded from the offline resource, even if I use this flag?

    All I want to use the handle for is to call GetFileInformationByHandle().

  11. Anonymous says:

    No comment? :(

    I wouldn’t bug, but the little there is about this flag in MSDN Library could be interpreted either way:

    "FILE_FLAG_OPEN_NO_RECALL – The file data is requested, but it should continue to be located in remote storage. It should not be transported back to local storage. This flag is for use by remote storage systems."

    What bothers me is "The file data is requested". What does that mean? I am not requesting file data, I just want to use the handle to call GetFileInformationByHandle() without causing any loading of optical media or tapes.
