BlobId vs BlobReference, Collections

Pradeep just sent out a good description of the differences between an RBS BlobId and BlobReference token, and a quick description of the current state of the collection architecture.

Both the BlobId and the BlobReference are generated by RBS. The ID from the provider (called StoreBlobId) is not exposed to the application directly. The BlobId is about 20-32 bytes, and the application must store it in a registered RBS column (of type varbinary(64)). Storing the BlobId is required: if it is not stored, GC (garbage collection) will delete the blob.

When reading a blob, the application can specify either the BlobId or the BlobReference. The BlobReference can be retrieved from the database using the T-SQL function mssqlrbs.rbs_fn_get_blob_reference(blob_id). It contains all the data needed to locate the blob in the blob store and is typically larger than the BlobId. If only the BlobId is specified, RBS internally goes to the database, retrieves the BlobReference, and then reads the blob data. So the advantage of reading with the BlobReference rather than the BlobId is that it avoids that extra network round-trip to the database.
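The round-trip difference can be illustrated with a toy model. This is not the RBS client API: ToyDatabase, ToyBlobStore, ToyRbsClient and all of their methods are hypothetical names made up for this sketch, and the byte values are arbitrary. The only thing it models is which read path touches the database.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatabase:
    """Hypothetical stand-in for the database that maps BlobId -> BlobReference."""
    references: dict = field(default_factory=dict)
    round_trips: int = 0

    def get_blob_reference(self, blob_id: bytes) -> bytes:
        # Every lookup counts as one network round-trip to the database.
        self.round_trips += 1
        return self.references[blob_id]


@dataclass
class ToyBlobStore:
    """Hypothetical stand-in for the blob store, keyed by BlobReference."""
    blobs: dict = field(default_factory=dict)

    def read(self, blob_reference: bytes) -> bytes:
        return self.blobs[blob_reference]


@dataclass
class ToyRbsClient:
    db: ToyDatabase
    store: ToyBlobStore

    def read_by_id(self, blob_id: bytes) -> bytes:
        # The BlobId alone does not locate the blob, so resolve it first.
        reference = self.db.get_blob_reference(blob_id)
        return self.store.read(reference)

    def read_by_reference(self, blob_reference: bytes) -> bytes:
        # The reference already carries the location: no database lookup.
        return self.store.read(blob_reference)


def demo() -> tuple:
    """Return the database round-trip count after each kind of read."""
    blob_id = b"\x01" * 20  # arbitrary 20-byte id for the sketch
    blob_reference = b"ref:" + blob_id + b":store-location"
    db = ToyDatabase(references={blob_id: blob_reference})
    store = ToyBlobStore(blobs={blob_reference: b"hello"})
    client = ToyRbsClient(db, store)

    client.read_by_id(blob_id)
    trips_after_id_read = db.round_trips
    client.read_by_reference(blob_reference)
    trips_after_reference_read = db.round_trips
    return trips_after_id_read, trips_after_reference_read
```

In this model the read by BlobId costs one database round-trip and the read by BlobReference adds none, which is the saving described above. An application that reads the same blob repeatedly might hold on to the BlobReference for that reason, while still storing the BlobId in the registered column so GC does not delete the blob.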

Collections are logical groupings of blobs and are also the unit of migration (from one DB to another). Migration is not yet implemented, but is planned for the next version of RBS. Applications that want to be able to migrate a set of blobs from one DB to another (perhaps for load balancing) need to create collections, keep track of them, and use them when creating RBS blobs. For applications that don't need that complexity, using the default collection (CollectionId = 0) should be enough.
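As a rough picture of the bookkeeping this implies, here is a toy sketch of an application recording which collection each blob was created in. ToyCollectionTracker and its methods are hypothetical and exist only for illustration; nothing here calls RBS, and how a real application tracks its collections is up to the application. The only detail taken from the description above is that the default collection has CollectionId = 0.

```python
from collections import defaultdict

DEFAULT_COLLECTION_ID = 0  # the default collection (CollectionId = 0)


class ToyCollectionTracker:
    """Hypothetical application-side record of blobs grouped by collection."""

    def __init__(self) -> None:
        self._blobs_by_collection = defaultdict(list)

    def record_blob(self, blob_id: bytes,
                    collection_id: int = DEFAULT_COLLECTION_ID) -> None:
        # Remember which collection the blob was created in.
        self._blobs_by_collection[collection_id].append(blob_id)

    def blobs_in(self, collection_id: int) -> list:
        # A collection is the unit of migration, so this is the set of
        # blobs that would move together.
        return list(self._blobs_by_collection.get(collection_id, []))


tracker = ToyCollectionTracker()
tracker.record_blob(b"blob-a")                    # default collection
tracker.record_blob(b"blob-b", collection_id=7)   # application-defined collection
tracker.record_blob(b"blob-c", collection_id=7)
```

An application that never specifies a collection ends up with everything in collection 0, which matches the simple case described above.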

- mike