A conversation about how to implement Name/value pairs

 

Hello, Guys

   We have a conversation about how to implement a name/value pairs in our internal forum. I like to share your guys about this:

 

Question:

I am about to create EAV based tables to store name / value pairs. I am kind of f familiar with the drawback of this approach and kind of familiar with a common recommendation for it – redesign.

However, I don’t think I have a choice here - I need to be able to store name / value pairs that represent customer definable object metadata (e.g. there is an object and a customer should be able to attach any number of name / value pairs as metadata (properties) to it at run time).

Is there a better approach for solving this problem without using EAV?

Possibly there is a database-based Microsoft technology (extension of some sort?) that solves this problem better? (I looked into Entity Framework, but it does not seem like it supports this scenario?)

If I am to use EAV, what would be the recommended implementation? I am using MS SQL Server (2005?) (and .NET 3.5. on the backend). Should I use table relationship approach (something like https://weblogs.sqlteam.com/davidm/articles/12117.aspx) or an xml based one (something like https://weblogs.sqlteam.com/mladenp/archive/2006/10/14/14032.aspx)?

Answer 1

Do you just need to store these name/value pairs (and possibly just return all of them), or do you need to search for specific values given specific names? If the former, maybe you could store (all of them) as an xml column on whatever entity you're attaching the name-value-pairs to. (Even if you have to search for specific values, you can probably index the XML column, and define a relatively simple schema e.g (<nvp name="name">value</nvp>) for this xml so that it's indexable by name.

Reply to Answer 1:

I also need to support searching based on metadata name / values.

E.g. Pseudo code: find me all objects that have: (metadataName = “Size” and metadataValue = “small”) OR (metadataName = “HSize” and metadataValue > 500) OR NOT (metadataName = “VSize” and metadataValue < 400)

Answer 2

I'm not sure how well the XML index will work over values from different domains (strings and numbers). Probably should skip my suggestion and checkout the new SQL 2008 features mentioned on a related response. (Let me know which way you end up with as I'm curious the best way to do this for future projects of my own too.)

Answer 3

Note that the general use of sparse columns encodes the property name into the column name. If you know apriori all your properties or they don’t change too often, you can use sparse columns and get the best performance:

Create table nvp(id int, size nvarchar(100) sparse, HSize int sparse, VSize int sparse /*… */)

 If you have to parameterize your property names, you have to use dynamic SQL in the above case, or you could use sparse cols with filtered indexing together with a modified NVP model if you need to search for the names too:

Create table nvp (metadataName nvarchar(100), intval int sparse, strval nvarchar(4000) sparse, floatval float sparse /*… other types as needed */)

-- create filtered indexes

Create index size_index on nvp(strval) where metadataName=N’Size’

If you have an open schema, you can use the above modified NVP model or you can use XML to represent the values as

<size>small</size>

<HSize>500</HSize>

<VSize>400</VSize>

If you associate a schema, you can get the right type semantics and with XML indexes, you should get acceptable performance if you write the XPaths right. Eg. assuming you have a schema that types size as string, and HSize and VSize as integer, you could write the queries below as:

Select * from nvp

Where 1=properties.exist( .[size=”small” or HSize>500 or not(VSize<400)]

Again, if you need to parameterize the names in your query, then you need to use either dynamic SQL (parameterizing the XQuery expressions) or use the XML model <prop name=”…”>value</prop>.

Reply to Answer 3:

Thank you for your suggestions.  

Currently I started looking at the sparse columns solution, but I don’t have a good idea how EAV could be implemented using sparse columns.

I feel like I am missing something important here, but it seems like to use sparse columns for storing user defined metadata, when a request to add new metadata arrives, I need to

1) Verify that the sparse column with the requested name (name of a name / value pair) does not exists

2) ALTER TABLE and add a sparse column with a new name

Is there anything there that I am missing to avoid ALTER TABLE?

Possibly there is a whitepaper or best practices note, or a blog post on how to do it properly in the environment where propertyNames could change /created very often?

My Answer 1:

 The simple solution is that you pre-create a table with many predefined columns and different types, for example, 500 nvarchar(4000), 100 float, and 100 int types. All the columns will be sparse, so that they take 0 bytes if the value is NULL. Then, you have another table to keep track of the assignment/usage of these columns, so that you never need to alter the table, but only assign a column name to an attribute. You can also create a view on top of the table, and give each column a meaningful name, and you only query against the view. However, the view might need to be changed if new attributes are added.  

     You might run into issue when the number of attributes exceeds 1024. Then, I suggest you take advantage of the “Column Set” feature in SQL Server 2008, which 1) it allows you define 30K columns per table 2) it exposes an XML column which contains all not null sparse values in a row. For this case, you do not need to define view, because by default, all sparse columns will be hidden by the “ColumnSet”, such as select * from T, which not show the sparse columns. Also, you can get/set the whole properities easily.

     If you have multi-value per name, then you have to deal it by yourself. You can else concat the values into a string and store them in a string column, or use a sparse XML column and store it as a XML type which is also queryable (using XQuery). I personally recomend to use XML column to represent multi-values. Note, it is best to avoid use XQuery directly on the column set column.

    You can also take advantage a feature called is “Filter Index”. You can create up-to 1000 filtered index on the name values. By provide a filter like “where my_property1_value is not null”, the index size is small, and has low maintaince cost, and can answer the queries you provided.    

    Another approach is that you store the whole name/value pairs in XML column, and promoted your interested/queried name/values with the sparse column table, so that the XML column handle storing the whole name/value collection, you can search/query the promoted name/values.

 

Reply to my answer 1

 

Qingsong, possibly you could explain how the association between the pre-created and assignment of a column name (allocation of a column) works?

For instance, if you allocated a column HSize (assigned one of the precreated columns to HSize), can you use HSize column name in a query:

Select *

From Attributes

Where HSize > 120

If so, how do you associate a precreated column (e.g column1345) with column name HSize?

 

My answer 2

Yes, you can always explicitly select any columns in the table. In term of how you associate the column with a run-time name. I think you can use a table to track the usage of these columns and the name of the assigned columns.

 

Reply to my answer 2:

So, to associate the column with a run-time name I need to have a table that maps {name -> precreatedName}?

Is that correct? But then I will not be simply able to simply use the actual name without a precreated name?

If I understand this correctly, I will not be able to use this query:

 Select *

From Attributes

Where HSize > 120

But instead (From C#) I first obtain the real name

 Select Name

From AttributesPreallocatedMappings

Where AttributeNam = ‘HSize’

Then after I obtained the name (e.g. column1345), I query for the data:

 Select *

From Attributes

Where column1345 > 120

I am missing something here?

 

My answer 3:

You can always cache the result at your C# client, so that you do not need query the table every time. There are two alternative approach if you want meaningful columnname, 1) add column dynamically 2) use sp_ rename to rename a column. I am not sure the cost of doing the above two approach.

 

Guys, I hope your guys can follow the thread, and reach the end of the topic here. I feel that there are many ideas from the conversation. Frankly, I don’t know what is the best approach to implement this (just because I do not have experiment on implement such as system). In the further blogs, I like to explore this topic more.