Seeding and Debugging Entity Framework (EF) DBs

In Tom Dykstra’s terrific ASP.NET MVC EF tutorial, the following code is used to seed the student table.

 var students = new List<Student>
{
    new Student { FirstMidName = "Carson",   LastName = "Alexander", EnrollmentDate = DateTime.Parse("2005-09-01") },
    new Student { FirstMidName = "Meredith", LastName = "Alonso",    EnrollmentDate = DateTime.Parse("2002-09-01") },
    new Student { FirstMidName = "Arturo",   LastName = "Anand",     EnrollmentDate = DateTime.Parse("2003-09-01") },
    new Student { FirstMidName = "Gytis",    LastName = "Barzdukas", EnrollmentDate = DateTime.Parse("2002-09-01") },
    new Student { FirstMidName = "Yan",      LastName = "Li",        EnrollmentDate = DateTime.Parse("2002-09-01") },
    new Student { FirstMidName = "Peggy",    LastName = "Justice",   EnrollmentDate = DateTime.Parse("2001-09-01") },
    new Student { FirstMidName = "Laura",    LastName = "Norman",    EnrollmentDate = DateTime.Parse("2003-09-01") },
    new Student { FirstMidName = "Nino",     LastName = "Olivetto",  EnrollmentDate = DateTime.Parse("2005-09-01") },
};

 students.ForEach(s => context.Students.AddOrUpdate(p => p.LastName, s));
 context.SaveChanges(); 
 

The AddOrUpdate method should generally only be used to seed a DB,  you it cautiously for upsets in your code or you run the risk of losing data. ( For more information see Take care with EF 4.3 AddOrUpdate Method on Julie Lerman's blog.) Tom states:

The first parameter passed to the AddOrUpdate method specifies the property to use to check if a row already exists. For the test student data that we are providing, the LastName property can be used for this purpose since each last name in the list is unique.

An important consideration in many applications is how you handle nearly duplicate data. For example,  how will your application work when you have two students named Alexander Carson. Even very small databases like the NFL players have several members with exactly the same name (for example Steve Smith of the Carolina Panthers and Steve Smith of the St. Louis Rams).

Using the code above, you can’t add another Alexander Carson. You can add another Alexander Carson who enrolled on a different date from the first entry with the following code:

 foreach (Student e in students)
{
    var studentInDB = context.Students.Where(
        s => s.FirstMidName == e.FirstMidName &&
        s.LastName == e.LastName  &&
        s.EnrollmentDate == e.EnrollmentDate).SingleOrDefault();
      
    if (studentInDB == null)
    {
        context.Students.Add(e);
    }
}

context.SaveChanges();

While the code above solves the problem of adding another Alexander Carson, it’s not a general solution. You can’t add another Alexander Carson with a duplicate enrolment date. An approach I like to take, especially with larger seed databases, is to add each record one at a time. The following code shows how to do this:

 context.Students.Add(new Student { FirstMidName = "Carson", LastName = "Alexander" });
context.SaveChanges();
context.Students.Add(new Student { FirstMidName = "Carson", LastName = "Alexander" });
context.SaveChanges();

You don’t need to call SaveChanges after each Add, but doing so will show you exactly where a problem occurs. For example, when I was working on Jon Galloway’s fabulous MVC Music Store tutorial, I added a minimum string length constraint of 2. Everything worked fine, the constraint worked, I couldn’t add new albums with a name less than 2 characters. At some point I migrated the code to a new system where the migrations code had to insert all the seed data into the DB. The initializer failed with an obscure error. Switching the code to the Add/SaveChanges approach quickly exposed the problem (there was a seed album with a one character name).

While working on this blog, I was getting the following error when running update-database from the Package Manager Console.

Sequence contains more than one element

I couldn’t see how there was any duplicates in my code, so I went into debug mode. Adding System.Diagnostics.Debugger.Break(); to the migrations code caused Visual Studio to shut down and restart. I just happened to see Rowan in the kitchen and he suggested I look at his blog Running & Scripting Migrations From Code. I copy/pasted the following lines from his blog into the about method of my controller and was able to step into and debug the seed method.

 public ActionResult About()
{
    var configuration = new Configuration();
    var migrator = new DbMigrator(configuration);
    migrator.Update();
    ViewBag.Message = "Your app description page.";

    return View();
}

It was an excellent message, I had previously added duplicates of Alexander Carson with the Add method, and subsequently deleted that Add code. The debugger quickly showed my problem.  Rowans blog states:

Our team has been careful to ensure that all the operations available as Power Shell commands can be easily performed from your code.

It’s a good idea to run the command update-database multiple times to make sure it works each time (after the 1st run, it shouldn’t make any changes).

Follow me ( @RickAndMSFT )   on twitter where I have a no spam guarantee of quality tweets.