I attended the PDC session by Shyam Pather and Chris Anderson at Microsft because I want to see learn about the way Microsoft looks at data. We’ve seen the evolution over the years. There is a wide array (some say dizzying array) of options for the developer to consume and publish data in its various forms.
The inventor of the Entity Relationship Model is Peter Chen. He makes the following points:
- We want to model the real world – a relational data model is not as real world as we need
- Too many headaches worrying about relationships
- We want to be as “truthful” as possible so if we see a table, we should make an entity called “Employee” or “Building”
- I will say this a lot. Good Entities are invariant over time. They exist independent of relationships.
- Whatever we see, “table, chairs, people, cups” – that should be what the developer should have in their code
- You need to know “relationships” in entities, like a group of people. They are gathered at a place and time. They have “relationships”
- There may be a spatial relationship, proximity or other type of “relationships”
- Relational model was a way to formalize the structure of data (customers have orders)
- Set theory
- Query and update is an abstraction over set operations
- A set of principles (3rd Normal Form)
- Meaningful structure for queries and updates, but not for a developer who wants to see a single employee object, not a collection of related tables.
- Issues – degree of concurrency, workloads
- Isolate applications from details of objects and data model layout.
- Applications are hard to write because you have to know all the relationships in a relational model
- Applications like concepts and abstractions that closely monitor the real world
- You don’t want apps to write complex queries or to have to understand the myriad of relationships
Making Life Easier For the Developer
- A good example is an employee.
- The information for an employee might be spread among many different tables. We have many different tables because most design sessions start with lists of data, which later become segregated into tables.
- But as a developer we only want to worry about a single object, not all the related tables and associations.
- Having to understand all the relationships in a database lowers productivity, obscures the structure and meaning of your code, and generally increases the time it takes to build typical business applications.
- Additional challenges include performance in terms of concurrency and response time.
- We need something natural, more intuitive.
- Semantics gets lost because some tables exist just for relationships, like in a many to many scenario
- Developers like real objects
- An “Employee” object is a lot easier than the 5 tables that make up an employee
- Relational Data
- No structured way to say “how many tables in an Employee object”
Key Facts about Entity Modeling
- Some data is designated as invariant to time
- It therefore exists on its own
- Objects have independent existence
- Relationships, on the other hand
- Are connections over time
- But still are first class concepts (Relationships)
- They are time dependent, no independent existence
- Entity Framework is basically a client side view manager
- Entities and relationships are views
- Build a view abstraction
- Allow extensions to SQL Language to greatly simplify joins
- Have intellisense work with query construction
Many times, data has an independent existence. In contrast, relationships are time dependent. For example, you may no longer be with a specific insurance company.
Usually you create a relational schema to see your chunks of data. But when it comes time to code you notice an employee is spread among a collection of tables. Relationships are important because they exist for data integrity, much like an order having many line items. Nice and easy from a query point of view. The ideal scenario is to have just an order object that automatically has all the line items in it already, or at least available by calling a method (you may want lazy loading, “loading on demand, not all the time).
Oftentimes, partitioning of data complicates matters
Tables are often horizontally partitioned to support better performance and concurrency. This complicates the developers life because data is now even more fragmented.
Evolving ADO.NET Entity Framework in .NET 4 and Beyond
It is somewhat overwhelming to think about all the Microsoft technology surrounding data.
There are some core pillars from which other frameworks are based. Those low-level core pillars include:
- SQL Server Modeling Services
- Cloud Storage
- On Premise SQL Server
Cloud Storage is the latest in the Microsoft offerings with options for relational as wells Windows Azure Storage for access to secure anytime, anywhere data that is durable and scalable.
On Premise SQL Server
- Run your most demanding mission-critical applications on premise.
- Reduce time and cost of development and management through great tooling
- Deliver actionable insight to your entire organization through business intelligence software and other offerings.
SQL Server Modeling Services
Note: Used to be called “Oslo” repository and the built-in domains.
I found this one to be the newest terminology. So I started to explore it. I essentially uncovered a ton of technologies being put under one umbrella:
- SQL Server Modeling Services provides tables and procedures that support features such as claims-based security, change tracking, versioning, and localization.
- Also includes clustering, database mirroring, resource governor, backup and restore, replication, reporting services, change data capture.
- These services include the Repository, a SQL Server database enhanced with capabilities for the storage of model schemas and instances.
- There are built-in domains such as the Common Language Runtime (CLR), Unified Modeling Language (UML), and System.Identity
- Key Point UML is playing a super big role here for application structure, behavior, and architecture, but also business process and data structure.
- UML is seen as a universal language for business analysts, architects, developers, testers and IT staff.
- When everything works well, you can enjoy:
- Maintaining catalogs of commonly used models,
- Providing impact analysis across models, and potentially traceability between software deliverables and the models that describe them
It is about being able to build customizable frameworks for database applications
These are the base layers that we want to “abstract away” for our applications.
Figure Above: The lower level data stores for the Entity Framework
Why Entity Framework?
- The ADO.NET Entity Framework is designed to enable developers to create data access applications by programming against a conceptual application model instead of programming directly against a relational storage schema.
- The goal is to decrease the amount of code and maintenance required for data-oriented applications.
We will start with Visual Studio 2010 and begin modeling our blogging. We’ll create 3 entities and 2 relationships.
Let’s not waste anytime and write some code. I’m using Visual Studio 2010 here.
Choose “WPF Application” and provide:
- Solution Name
Add a New Item (ADO.NET Entity Data Model)
Add “ADO.NET Entity Data Model.”
We will choose Empty model because we wish to build the database from the model, not the other way around.
An empty data model. This is the design surface where we can build our conceptual model. The design surface is ready for us to add entities.
On the design surface you can right mouse click to add a new entity. Note that the design surface is BloggingModel.edmx
You will need some simple skills. You will need to know how:
To add a “New Entity”
To add “scalar properties” right mouse click on the “Blog” entity:
You will add 3 entities. So you the 2 right mouse clicks above (Add Entity, Add Association)
- Id (int32)
- Name (string)
- Owner (string)
- Id (int32)
- CreatedDate (DateTime)
- ModifiedDate (DateTime)
- PostContent (string)
- Title (string)
- Id (int32)
- Name (string)
Built in pluralization that is pretty smart. If you say “Person,” it will pluralize to “People.”
Here is where we are so far:
It’s about relationships. From Blog to Post we need a 1-to-many relationship. A right-mouse click gets you “Add Association.”
How to setup the one-to-many relationship.
Blog has a one-to-many to Post.
Now it is time to setup a many-to-many.
Make sure you select:
- Post to Tag relationship
- Many to Many
Make sure to select many-to-many. It is from Post to Tag.
The promise of model driven development is that the database gets created based on the model. A simple right-mouse click gets you to “Generate Database from Model.”
Notice the model called BloggingModel.edmx.sql.
Here is the sql generated by Visual Studio 2010 Beta2.
Before we can run the script above, we need a database so what we have place to put the tables, relationships, indexes and so on.
Use Visual Studio 2010’s Server Explorer to “Create a New SQL Server Database.”
We will name ours BlogDB
Choose the target database for which the data will place all the tables, relationships, indices, etc.
Here is the script that results from the model we previously defined.
I pasted the script above into Microsoft SQL Server Management Studio.
We still need to run the sql script. I started Microsoft SQL Server Management Studio and hit “New Query”. The query uses “BlogDB” and then creates the needed tables and relationships.
To verify everything correctly, let’s get SQL Server Management Studio to generate a Database Diagram for us. Let’s add all the tables.
We’ve come full circle. We started with a conceptual entities in the ADO.NET Entity Framework and ended up with a physical representation in SQL Server.