Concerns about Spotlight and non-extensible types of data


I’m watching Apple’s 2004 WWDC broadcast, and the part on Spotlight (about 48 minutes in) got me somewhat worried. Jobs was pitching this as a Longhorn feature that Apple is bringing to market ahead of WinFS. I was impressed by the demonstrations. He showed searching in Mail being incredibly fast and powerful. But I wasn’t that enthusiastic; that just seemed like catching up to features that have been in Office since Office v.X (which came out years ago on the Mac).

OK, so then he showed something interesting. He went into Address Book and created a new search group with the criteria “Show me all people with a birthday in the next seven days”: a useful group that always reminds you of upcoming birthdays. What worried me was something he said, along the lines of “which you are now able to do since we’ve added the ‘Birthday’ field to the address book”. Ugh.

If this means what I think it means, then I’m suddenly far less satisfied with Spotlight compared to what WinFS gives me. In the Spotlight system the metadata associated with a type doesn’t seem to be extensible. I.e., if Apple had not provided this “birthday” field then I would be unable to add it myself to the existing “address” type. While that is still powerful (and probably good enough for most people’s needs), I want a lot more. I want to be able to add a “cuteness” scale to my address book. I could then create a query that asked “show me all the people with a cuteness factor of Kyooot!! within 10 miles of my current GPS coordinates”. Without extensible metadata there is no way to do that. And that makes me an unhappy camper :-(

Does anyone know more about this? I know you can create plug-ins to extract metadata from types that the system doesn’t know about. But can you add new metadata to existing types in the system? Could I, for example, add a “version” tag to every source file that I work on? Could I implement my revision control system on that metadata? (Imagine being able to search through all versions of a file automatically!)

I don’t just want fast searching and things like search folders/groups. I want to be able to extend the system to pay attention to the metadata _I_ care about.
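
To make the request concrete, here is a minimal Python sketch of what user-extensible metadata on an existing type could look like. Nothing in it is Spotlight or WinFS API: `Contact`, its `extra` field, `miles_between`, and `find` are hypothetical names invented for illustration, and the coordinates are made-up sample data.

```python
from dataclasses import dataclass, field
from math import asin, cos, radians, sin, sqrt


@dataclass
class Contact:
    """A built-in 'address' type plus a bag of user-defined fields."""
    name: str
    lat: float
    lon: float
    extra: dict = field(default_factory=dict)  # hypothetical extension point


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (haversine formula)."""
    earth_radius_miles = 3958.8
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_miles * asin(sqrt(a))


def find(contacts, here, max_miles, **wanted):
    """Contacts within max_miles of `here` whose extra fields equal `wanted`."""
    lat, lon = here
    return [
        c for c in contacts
        if miles_between(lat, lon, c.lat, c.lon) <= max_miles
        and all(c.extra.get(key) == value for key, value in wanted.items())
    ]


people = [
    Contact("Alex", 47.6101, -122.2015, {"cuteness": "Kyooot!!"}),
    Contact("Sam", 40.7128, -74.0060, {"cuteness": "Kyooot!!"}),
    Contact("Pat", 47.6205, -122.3493),  # nobody has tagged Pat yet
]
here = (47.6062, -122.3321)
nearby_cuties = find(people, here, 10, cuteness="Kyooot!!")
print([c.name for c in nearby_cuties])
```

The point of the sketch is that the query code never needs to know in advance which fields exist; whether Spotlight allows anything like this is exactly the open question.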

Also, it’s not clear to me that they’re providing anything more than an indexing service to search for things with. The other part of WinFS that intrigues me is that it provides a store that I, as an application developer, can place data in. I.e., if I want to write my own email app, I no longer have to create my own database to store emails and contacts in. Instead I can just place them straight in the WinFS store. I then get all of this additional functionality (searching, transactions, etc.) for free. Is this provided through Spotlight?

I’m hoping that by the time this ships, that’s the kind of functionality we’ll be seeing. If not, then I think it’s rather inappropriate for Apple to claim Longhorn functionality with what they’re offering.

Another thing I’m interested in is the new RSS support in Safari. I’m so happy that they’re adding this. However, it’s unclear to me whether I’m going to be able to use my .Mac account to write a blog. It seems like an ideal match and a really useful thing for .Mac to have.


Comments (26)

  1. Anon says:

    "show me all the people with a cuteness factor of [Cat] within 10 miles of my current GPS coordinates"

    Cyrus, do you think anyone other than programmers/power users would use functionality like this? What percentage of ‘average’ users do you expect to even know what metadata is, never mind add it themselves? My guess is that it is a very low number.

    One of the problems with metadata is that unless it is automagically added no-one will ever enter it – I can count on one hand (clever eh?:) ) the number of times I’ve seen Word documents with *any* metadata added manually by the user.

    I’ve said it elsewhere, and I know it sounds a little troll-ish (it is not intended to be), but Longhorn looks more like a techie toy than something that will make the user experience better, and by better I don’t mean flashier :)

  2. Anon: First off, I expect metadata to be automagically extracted from existing known types, and 3rd parties will be able to extend that to handle their own types.

    Note: I understand your concerns about users not using metadata. However, I see it from another perspective. Users don’t use metadata because normally it’s useless: nothing takes advantage of it. One area where I see users being extremely proactive with metadata is the tagging of music files. People will spend a lot of time on this, but they find it quite worthwhile because they see the immediate benefits. When you have thousands of songs, it’s important that the metadata be accurate so you can search quickly for the song you want. I think when you have ubiquitous support for this and you’re seeing the benefits of being able to say "find me all the stuff from my boss in the last week", then tagging information will become far more common.

    As I said, this doesn’t interest me purely from a user standpoint, but also from a developer standpoint. As a developer, I want to use these technologies in interesting and innovative ways. For example, imagine using WinFS/Spotlight as the backing store for your code editor and using extensible metadata in the context of an IDE. That would allow for amazing search options beyond what can be done with textual search. Maybe you want to be able to search for all structs with mutable fields. With extensible metadata I can tag .cs files with an incredible amount of information about what is stored within them. But, more importantly, if there’s something I miss that someone else wants to add, they can now do that.
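
As a rough illustration of the "someone else can add what I missed" idea, here is a small Python sketch of a registry in which any party can attach another metadata extractor to an existing file type. It is not how Spotlight importers or WinFS schemas work; `EXTRACTORS`, `extractor`, and `extract_metadata` are hypothetical names, and the regular expression only finds struct names (detecting mutable fields would need a real C# parser).

```python
import os
import re

EXTRACTORS = {}  # file extension -> list of extractor functions (hypothetical)


def extractor(extension):
    """Decorator that registers one more extractor for a file extension."""
    def register(fn):
        EXTRACTORS.setdefault(extension, []).append(fn)
        return fn
    return register


@extractor(".cs")
def struct_names(text):
    # Shipped by the "original" developer: names of declared structs.
    return {"structs": re.findall(r"\bstruct\s+(\w+)", text)}


@extractor(".cs")
def todo_count(text):
    # Added later by someone else, without touching the code above.
    return {"todo_count": text.count("TODO")}


def extract_metadata(filename, text):
    """Merge the output of every extractor registered for this file type."""
    extension = os.path.splitext(filename)[1]
    metadata = {}
    for fn in EXTRACTORS.get(extension, []):
        metadata.update(fn(text))
    return metadata


sample = "public struct Point { public int X; } // TODO: make X readonly\nclass Shape {}"
meta = extract_metadata("Point.cs", sample)
print(meta)
```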

    Note: I am not saying that what Apple is providing is without worth. On the contrary, I think it solves part of the problem fantastically. However, I think that it’s only part of the puzzle, and that while it is a good start, it really doesn’t get into what’s really possible.

    Also, this only addresses search. It doesn’t deal with the issue that this still puts the burden of having a DB for your app in the hands of the app developer. I want to get the DB for free, which will automatically give these benefits _and_ more :-)

  3. Juanxer says:

    At my job there are a few metadata types, such as Air Time, Client, etc., waiting to be invented, either by developers or by me by means of a user-extensible system.

    The best thing an extensible metadata system would allow for is to characterize legacy format files, plus folders.

    Also, one would hope that the same Bayesian algorithms that help tag mail as spam would help tag files with relevant metadata once trained.

  4. Anon says:

    This URL http://www.eweek.com/article2/0,1759,1618115,00.asp says that developers will be able to add their own metadata definitions although I assume it does not mean that they can extend existing ones. I couldn’t see anything useful about this on the Apple site.

    Cyrus said : "one area where I see users being extremely proactive with metadata is the tagging of music files."

    As an iTunes user I can sort of see your point, except the only metadata I ever manually add is a score/rating to particular tunes. All of the rest is fetched for me from CDDB and usable in Smart playlists (which is a *fantastic* feature and also appears in iPhoto).

    I do, however, see value in your point about the general backing store for app config and so forth. The current .Net (am I still allowed to call it that?) config solution, by which I mean the .config files, is not ideal, as it makes configuration difficult to manage remotely. In general this means I need to write a better configuration framework for my solution, where configuration is shared between many apps in the solution, and I really don’t want to.

    Maybe Apple will add more features to spotlight before final release, which is if I am not mistaken about a year before Longhorn is due? Or someone might implement BeFS (sniff – wipes tear from corner of eye) for OSX instead :)

  5. Juanxer: I never thought about using those techniques to help tag files. One issue is that that tagging is based on textual scanning. I’m curious what innovations we’re going to see in extracting relevant information from non-textual data (like pictures). I would love tools that would figure out what people were in photos, or would figure out which pictures were related (combining the time the image was taken, the elements in the shots, etc.).

  6. Anon: Absolutely. I’m taking a wait and see attitude on this. I’m disappointed with the lack of information being made available about this stuff. Where are the bloggers on all of this? :(

    If you know of good resources for learning more, let me know!

  7. Apple Fan says:

    Didn’t you download iBlog from .Mac? There’s your blogging tool.

  8. Anderson Imes says:

    As sad as it is, I think I partially agree with "Anon" on this point. I think that the extensibility of document metadata will go unused if the option is not presented to the user in such a way that the functionality is obvious and easily understandable. Current implementations of such functionality do not lend themselves well to use by your average user.

    As much research as Microsoft puts into usability, I’m sure that there will be an attempt at doing this, but it had better be good.

  9. Eddy Young says:

    I don’t have the SDK in hand, but if I were doing this, I would make it a service of Tiger that any implementing application could use like BeOS does it. I could add the metadata schema to include any fields that my application requires, and these would become available to all the applications that use the SDK.

  10. Apple Fan: I wanted a free solution :-)

  11. Anderson/Anon: One way to accomplish this is through the use of smart folders (which Apple may have with Tiger). Such a folder would be set to search for metadata like "related to my family". Then when I dragged items into it (like new pictures I’d gotten off my camera) they would be tagged with the "family" flag.

  12. shaitan says:

    Only trouble with metadata is that one has to be quite aaaaanal-retentive to use such a feature. But someone that anal probably won’t find search that useful.

    As far as adding metadata in a more automatic fashion goes: in, say, iPhoto, dragging camera files to the "My family" folder should automatically add the metadata "userX’s family", and if a subfolder is named "Beach" then files in it would have the metadata "userX’s family" and "beach". If I now e-mail the photo or move it somewhere else, that metadata is retained. It is the responsibility of the application to help one add at least the initial metadata as transparently as possible.
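
The folder-based tagging described in this comment can be sketched in a few lines of Python. This is only an illustration under simple assumptions: tags are the lowercased names of the folders a photo is dropped into, the `index` dictionary stands in for whatever store the OS would provide, and `tags_from_folders`, `import_photo`, and `move_photo` are hypothetical names rather than iPhoto or Spotlight functions.

```python
from pathlib import PurePosixPath


def tags_from_folders(path, library_root):
    """Derive tags from every folder between the library root and the file."""
    relative = PurePosixPath(path).relative_to(library_root)
    return [folder.lower() for folder in relative.parent.parts]


def import_photo(index, path, library_root):
    """Record folder-derived tags for a newly dropped photo."""
    index.setdefault(path, set()).update(tags_from_folders(path, library_root))
    return index[path]


def move_photo(index, old_path, new_path):
    """Moving a photo carries its tags along instead of recomputing them."""
    index[new_path] = index.pop(old_path)


index = {}
import_photo(index, "/Photos/My family/Beach/img_001.jpg", "/Photos")
move_photo(index, "/Photos/My family/Beach/img_001.jpg", "/Outbox/img_001.jpg")
print(sorted(index["/Outbox/img_001.jpg"]))
```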

  13. Anona says:

    You’re throwing around a lot of suppositions/conclusions without having the benefit of the SDKs or homework. For instance: "if I want to write my own email app, i no longer have to create my own database to store emails and contacts in. Instead I can just place them straight in the WinFS store. I then get all of this additional functionality (searching, transactions, etc.) for free." If you want WinFS, wait for 2-3 years for Longhorn. If you want OS-wide persistence (sooner in Tiger) read up on CoreData.

  14. Anona: No. I am not throwing around suppositions. I’m asking for information from the community to help understand what the keynote means.

    "Also, it’s not clear to me that they’re providing anything more than an indexing service to search for things with. "

    I am stating out right that I don’t know and I would like clarification on this.

    "Does anyone know more about this?"

    I am asking for people who do know to tell me more and to clarify this. I have not been able to find good data on this stuff. All I’ve found are high level overviews that leave a lot of questions unanswered.

    If you provide me with more information then that will help.

    BTW: I did a search for CoreData and found nothing helpful :(

    I also haven’t been able to find any useful information on this on developer.apple.com either.

    I’m sorry that I went to the community for information on this and that that bugs you. However, I’ve found that blogging is an incredible way to learn things.

    Note: I think you are twisting my words. What I am saying is that I know what capabilities Longhorn is planning, what I will be able to do with them, and why I’m excited about them. What I want to know is whether or not those will be available in Tiger.

    I tried to do my homework here but ran into wall after wall after wall. So I thought I might try actually talking to people first. I really didn’t think that that was such a bad thing to do.

  15. Shaitan: Yup. What would be cool also would be if iPhoto said "hey! that looks like your friend Mike in that photo, want me to add that info?"

  16. Tom Meschter says:

    As it turns out, the current Address Book API already allows programmers to extend the metadata in a record. A simple method allows you to add more properties and specify the data’s type. A (minor) catch is that the data needs to be one of several pre-defined types; however, as one of those types is NSDictionary and you can put essentially anything in there, it’s not much of an issue.

    Of course, the new property types won’t automagically show up in Address Book; it only uses the properties it knows about. But you could certainly create your own address book app that uses and extends the same data store.

    I don’t know how Spotlight will handle those new properties, or whether you will be able to search by them. Perhaps it will have an API allowing you to extend the range of data types it understands.

    The relevant Address Book API can be found at:

    http://developer.apple.com/documentation/AppleApplications/AddressBook-date.html
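
The shape of that mechanism, adding named properties whose values must be one of a few pre-defined types, can be modeled in Python as below. This is a toy model only and does not call or mirror the real Address Book API: `RecordSchema`, `add_properties`, `validate`, and the `ALLOWED_TYPES` set are assumptions made for illustration, with `dict` standing in for NSDictionary as the catch-all type.

```python
ALLOWED_TYPES = (str, int, float, dict)  # assumed set of pre-defined types


class RecordSchema:
    """Toy model of a record type whose property list can be extended."""

    def __init__(self, **builtin_properties):
        self.properties = dict(builtin_properties)

    def add_properties(self, **new_properties):
        """Register new typed properties; return how many were added."""
        added = 0
        for name, value_type in new_properties.items():
            if value_type not in ALLOWED_TYPES:
                raise TypeError(f"{name}: {value_type.__name__} is not an allowed type")
            if name not in self.properties:
                self.properties[name] = value_type
                added += 1
        return added

    def validate(self, record):
        """Check that every field is known and has the registered type."""
        for name, value in record.items():
            if name not in self.properties:
                raise KeyError(f"unknown property: {name}")
            if not isinstance(value, self.properties[name]):
                raise TypeError(f"{name} must be {self.properties[name].__name__}")
        return True


person = RecordSchema(name=str, birthday=str)
added = person.add_properties(cuteness=dict)  # dict holds "essentially anything"
ok = person.validate({"name": "Alex", "cuteness": {"scale": "Kyooot!!"}})
print(added, ok)
```

As in the comment above, the host application would simply ignore `cuteness`; only an app that knows about the new property would display or query it.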

  17. Rosyna says:

    Of course there is no documentation and CoreData doesn’t show up anywhere. Anything not in the keynote is under NDA unless Apple posts it on their site.

  18. Rosyna: Well… that kind of makes it difficult to learn anything…

    Why are these things under NDA? Shouldn’t Apple be trying to court developers across the globe into trying out all these new technologies?

  19. Tom: Adding key/value pairs is something that can be done in pretty much every OS today. Windows supports it as well and the current indexing service will index that data. I’m looking for more structured data. The ability to add _my_ own types as metadata to an existing object in the OS.

  20. Dr Pizza says:

    I thought Core Data was just a cut down version of the Cocoa EOF facilities used in things like WebObjects.

  21. Dr. Pizza: I have no idea. I can find like no information on it. Any links you have would be appreciated.

  22. Ben Donley says:

    I don’t know why people assume that Spotlight features depend on filesystem metadata. Sure, it’s clear that the indexed information is stored by the filesystem, but I’ve assumed that the source of the indexed information is a filesystem plugin that is file-type-specific. So I think you have two questions: 1) Can Apple-supplied plugins be overridden by developers? (Dunno, check the SDK or force it in with Mach.) And 2) How well written are those original plugins & filetypes? (If the filetype supports arbitrary extensions like "cuteness", and the plugin reads it properly, you won’t need to write a better plugin.)

    Or maybe I have totally mis-guessed how they’ve implemented Spotlight. My intuition is that Spotlight is to WinFS as .Mac is to .NET: They’ve implemented the features that end-users understood from MS’s high technology, without actually implementing the high technology. Spotlight looks like V-Twin, which has been in the OS for like a decade, improved to read more filetypes & automatically updated each time something gets written to the disk.

    But I haven’t looked for the SDK or developer docs at all. So I’m definitely talking out of my ass.

  23. Ben: So far I haven’t been able to learn any more. I was basing my concerns totally on what I’d seen reported. I’m also speaking very much from the perspective of a developer who wants to use this stuff for cool purposes, not from the perspective of a user who’s probably going to be happy with what comes out of the box :-)

  24. Ben Donley says:

    I guess I’m just reaffirming your concerns. It’s pretty clear that Spotlight does not allow extending HFS+ metadata. Neither will anything else. I don’t think Apple wants to do it due to the cross platform nightmares that we used to have with metadata based filetypes. You may be able to extend the full content searches so that it properly identifies a cuteness value inside whatever filetype you like, but not in the metadata.

    So when I say that Spotlight is to WinFS as .Mac is to .NET, I’m saying that it is nothing like WinFS and will not have the same features for developers. Back in the keynote where Jobs announced that iTools was now .Mac and it would cost $99/year, he bragged that we .Mac users were getting the features of .NET today: application integration with the internet. This is one of the many times that the Reality Distortion Field completely failed to fool anyone. That said, we’ll probably have the best desktop search solution ever deployed to a significant user base :)