What makes a database a database and not just a data store?


I am sitting in the ridiculously nice Seattle weather, sipping a local brew and watching The Patriot (get this: a decent movie starring Mel Gibson), and I started thinking about databases and data stores at a general, high level of abstraction that encompasses Sybase, Oracle, Yukon, and WinFS.


Disappointingly, I could not abstract much above the standard relational database stuff that came out of the 70s with the competition between IBM’s System R and Berkeley’s Ingres. (Is Berkeley the best CS school in the world, or what?) All I could think of was Chris Date’s explanation that a database is a computer-based record-keeping system.


OK, here’s my initial straw man list:




  • Data identified by unique keys.


  • Relationships expressed through matching keys.


  • The ability to retrieve information using mathematical set operations (relational algebra and calculus).
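
To make the three items above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customer and purchase tables, their columns, and the rows in them are all made up for illustration; nothing here is specific to Sybase, Oracle, Yukon, or WinFS.

```python
import sqlite3

# Hypothetical schema and data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- unique key identifies each row
        name        TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,      -- matches customer.customer_id
        item        TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO purchase VALUES (10, 1, 'lamp'), (11, 1, 'desk'), (12, 2, 'chair');
""")

# Relationship expressed through matching keys: a join on customer_id.
joined = conn.execute("""
    SELECT c.name, p.item
    FROM customer AS c
    JOIN purchase AS p ON p.customer_id = c.customer_id
    ORDER BY p.purchase_id
""").fetchall()

# Set algebra: all customer keys EXCEPT those that appear in purchase.
no_purchases = conn.execute("""
    SELECT customer_id FROM customer
    EXCEPT
    SELECT customer_id FROM purchase
""").fetchall()

print(joined)        # [('Ann', 'lamp'), ('Ann', 'desk'), ('Bob', 'chair')]
print(no_purchases)  # [(3,)]
```

The same three ideas could be hand-coded over flat files, so the sketch only illustrates the list; it does not by itself separate a database from a data store.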

These are somewhat implementation details that happen to be common among most RDBMSs, and they might not be what I am really looking for. So here is the big question: what makes a database a database and not just a data store? Don’t worry, this is going somewhere (in my follow-up posts).



 


Comments (12)

  1. Duncan Lamb says:

    I don’t think any of the things you mentioned are distinctive to databases. Keys, etc. can all be present in data stored in a bunch of flat files and nothing else.

    Here are a few things that come to mind:

    A database is an abstraction layer for the data that hides the physical storage (or physical model) of the data, allowing the user to access it through the logical layer. Nowadays, ANSI SQL is the standard for that querying. This abstraction is overkill for small databases (when flat files will do), but very necessary as you move into thousands, millions, and hundreds of millions of records.

    That abstraction layer handles the speed, searching, integrity, sorting, and all that nice stuff we expect when we store and ask for data. It hides that the data is stored across several filegroups and disks, and that some of those autogrow and some get reorganized and moved around from time to time. After that, all the rest is just enhancements, some of which have become very necessary. Oracle excels in stuff like table partitioning and materialized views, Yukon will be the easiest to administer by far, MySQL is blazingly fast, etc, but essentially, their first job is to make the data easy (and fast) to get while hiding all the storage stuff on the back end.

    From what I understand, WinFS is much the same idea: ease of use, and not requiring knowledge of the physical location to do everything.

  2. Dennis says:

    I’d say set algebra plus transactions.

  3. Scott says:

    I’d agree with Duncan. I’ve always just thought of an RDBMS, and XML to a larger extent, as just a wrapper for a flat file.

    There is nothing that can be done with a database that can not be done with a complicated flat file system.

  4. Matt says:

    At the physical level, I don’t think that there really is much difference between a database and a data store. At the physical level, a database is typically a collection of one or more files, and a DBMS is the complicated application that takes care of the task of managing the internals of those files and presenting them to the user.

    At the logical level, though, a database can be substantially more than a data store – which I interpret as simply a place to serialize your data, i.e. a file. At the logical level, a database conforms to some data model, with the relational model arguably being the best (some would say "only truly accurate") model. A data model defines not only the structure of data elements, it also defines operations that can be performed on those structures, and a system of integrity rules that can be applied to those structures. The DBMS, typically, takes on the responsibility of supporting the features of the data model, making it possible to declare integrity constraints, data types, and relational structures in your database and have them work properly to ensure that the data in your database is accurate.

    Personally, I think that integrity constraints are probably one of the most distinguishing characteristics, at least in the way that people approach databases. Those who tend to view a database as nothing more than a data store create databases that do not contain integrity constraints and that make only a nominal attempt at normalization. They then go on to talk about how "weak relational databases are," how they must "denormalize in order to achieve performance," and how they’ll handle all of the integrity in the application code. Those who understand what a database is will spend the time necessary to accurately capture business rules, develop a logical model that reflects those business rules via integrity constraints, normalize the logical model, choose appropriate data types, and so forth. They are the ones that reap the benefits of databases and see that they have much more power than a simple data store.

    Basically, a data store is a limp, lifeless, ignorant file that accepts what you put into it and spits it back out when you ask for it, while a database is a robust, vibrant, intelligent structure that accepts only facts (i.e. data that conforms to your declared business rules) and returns only facts.
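
    The contrast can be sketched in a few lines, assuming Python's built-in sqlite3 module as the DBMS. The department and employee tables and the two business rules (no negative salaries, every employee belongs to an existing department) are hypothetical, chosen only to show declared constraints at work.

    ```python
    import sqlite3

    # A bare "data store": it accepts whatever it is handed.
    store = []
    store.append({"name": "Bob", "salary": -500, "dept_id": 99})  # nonsense, but stored

    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
    conn.executescript("""
        CREATE TABLE department (
            dept_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE
        );
        CREATE TABLE employee (
            emp_id  INTEGER PRIMARY KEY,
            name    TEXT NOT NULL,
            salary  INTEGER NOT NULL CHECK (salary >= 0),
            dept_id INTEGER NOT NULL REFERENCES department(dept_id)
        );
        INSERT INTO department VALUES (1, 'Engineering');
    """)

    def try_insert(row):
        """Insert one employee row; report whether the database accepted it."""
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO employee (name, salary, dept_id) VALUES (?, ?, ?)",
                    row,
                )
            return "accepted"
        except sqlite3.IntegrityError:
            return "rejected"

    results = [
        try_insert(("Ann", 50000, 1)),   # satisfies every declared rule
        try_insert(("Bob", -500, 1)),    # violates the CHECK on salary
        try_insert(("Cy", 40000, 99)),   # refers to a department that does not exist
    ]
    print(results)  # ['accepted', 'rejected', 'rejected']
    ```

    The list keeps the bad record without complaint, while the database keeps only the one row that satisfies the declared rules.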

  5. Peter Evans says:

    What a loaded question! It’s like asking when an apple becomes an apple core while you are eating it.

    IMHO, a database provides features above and beyond the storage construct implied by the term ‘data store’.

    Now, what is the simplest data store? Well, probably bytes or bits, depending on how you want to begin defining a data store. But at some level a data store is just an abstraction from which you can begin defining more sophisticated data access and retrieval features for given user scenarios.

    Databases for me usually begin at some minimal functionality for a single user and progress to advanced multi-user and multi-protocol data access and retrieval systems.

    I always like to conceptualize it this way: a data store can always have a database feature set bound to it, while a database feature set cannot necessarily be bound to a given data store.