PDC 2008 Conference Scheduling Using a Genetic Algorithm

If you read my What Do I Do post, you'll know that I'm the Content Owner for this year's PDC 2008 in Los Angeles. MIX08 is behind us, and I've just recently transitioned away from my Web GO role. This means that I can now focus 100% of my time and attention on our October event. It's going to be fantastic!

One of the many responsibilities I have as Content Owner is to create the master schedule for the event. This is the schedule that tells you which session is in which room and at what time. For PDC 2005, we delivered over 200 sessions at the conference, not including repeats (we run repeats of popular sessions that are filled to capacity).

Because PDC is where we talk about the future of Microsoft's platform, all of the content relates in some way to our overall strategy (which is typically delivered during big keynotes and general sessions). This means that some sessions need to be scheduled ahead of others to provide foundational and prerequisite knowledge. For example, a 200-level (intermediate) session covering a specific topic should precede a 300-level (advanced) or 400-level deep-dive. The following diagram is a simplification, but it illustrates my point.

Here are some additional constraints that must be considered when creating a master schedule:

  • Only one session can be presented per room during any given time slot.
  • A speaker can only present one session during any given time slot (i.e. can't be in two places at the same time).
  • A speaker may only be available on specific days.
  • A session may require audio/video equipment that is only present in specific rooms.
  • Popular sessions should be scheduled in larger rooms.

To tackle the scheduling problem, sessions are typically grouped into tracks. These tracks are then scheduled in parallel, almost like mini-conferences running alongside each other. While this can be effective, many studies show that conference attendees don't attend tracks; they attend sessions. In other words, it's relatively rare for someone to sit through all of the sessions in a single track. An attendee is much more likely to hop from track to track to attend the sessions they're interested in.

Because of this track-hopping behavior, post-event surveys often reflect the inability of people to attend all of their favorite sessions. This is almost always due to a conflict where two or more desired sessions share the same time slot, and an attendee is forced to pick only one. For our MIX07 and MIX08 events, we tried to mitigate this undesirable outcome by publishing the session recordings within 24 hours of their completion (we actually averaged under 12 hours in both cases). This helps, but it's not a panacea.

In an ideal world, conference participants would be able to attend all of their favorite sessions. We almost always do a pre-event survey asking people to pick the sessions they'd like to attend, and we use that data to extrapolate expected room capacities for scheduling. For PDC, where we announce many brand new technologies, we have the additional challenge of making educated guesses about how many people will want to attend sessions that aren't revealed until the first day of the conference. It's an inexact science, to be sure.

For PDC 2008, we're going to try something brand new. In my spare time, I've been working on a genetic algorithm (you may want to review this article to understand some upcoming terms) that takes all of the above factors into account, including a couple more:

  • The total walking distance required for a participant to attend all of their favorite sessions. This distance can be significant in the Los Angeles Convention Center. The algorithm prefers shorter routes.
  • The degree to which room capacities are "balanced." This means that—on average—the algorithm prefers schedules that leave relatively equal space in each room. Otherwise, one room may be near 100% capacity while another is only at 25%.

In my version of the algorithm, each solution in the population represents a conference schedule. The fitness function takes all of the aforementioned factors into account, and penalizes solutions with undesirable attributes. At the end of each generation, an elite group of solutions is retained, and the remainder are subject to both crossover and mutation. Generally speaking, optimal solutions are discovered in under 500 total generations.

The end result is that most conference attendees should be able to attend most of their desired sessions, all without a rigid track structure.

Update: Channel 9 has published a 32-minute video interview of me discussing this technique.