Well is there really a "solution" at all in general? This particular case I think I constrained enough that you can claim an answer but does it generalize? Let's look at what I got first, the raw results are pretty easy to understand.
The experiment I conducted was to run a fixed number of queries (5000 in this case) but to break them up so that the compiled query was reused a decreasing amount. The first run is the "best" 1 batch of 5000 selects all using the compiled query. Then 2 batches of 2500, and so on down to 5000 batches of 1. As a control I also run the uncompiled case at each step expecting of course that it makes no difference. Note the output indicates we selected a total of 25000 rows of data -- that is 5 per select as expected. Here are the raw results:
Testing 1 batches of 5000 selects
5000 selects uncompiled 9200.0ms 25000 records total 543.48 selects/sec
5000 selects compiled 5401.0ms 25000 records total 925.75 selects/sec
Testing 2 batches of 2500 selects
5000 selects uncompiled 9181.0ms 25000 records total 544.60 selects/sec
5000 selects compiled 5402.0ms 25000 records total 925.58 selects/sec
Testing 5 batches of 1000 selects
5000 selects uncompiled 9169.0ms 25000 records total 545.32 selects/sec
5000 selects compiled 5432.0ms 25000 records total 920.47 selects/sec
Testing 100 batches of 50 selects
5000 selects uncompiled 9184.0ms 25000 records total 544.43 selects/sec
5000 selects compiled 5511.0ms 25000 records total 907.28 selects/sec
Testing 1000 batches of 5 selects
5000 selects uncompiled 9166.0ms 25000 records total 545.49 selects/sec
5000 selects compiled 6526.0ms 25000 records total 766.17 selects/sec
Testing 2500 batches of 2 selects
5000 selects uncompiled 9165.0ms 25000 records total 545.55 selects/sec
5000 selects compiled 7892.0ms 25000 records total 633.55 selects/sec
Testing 5000 batches of 1 selects
5000 selects uncompiled 9157.0ms 25000 records total 546.03 selects/sec
5000 selects compiled 10825.0ms 25000 records total 461.89 selects/sec
And there you have it. Even at 2 uses the compiled query still wins but at 1 use it loses. In fact, the magic number for this particular query is about 1.5 average uses to break even. But why? And how might it change?
Well, as has been observed in the comments, Linq query compilation isn't like regular expression compilation. In fact compiling the query doesn't do anything that isn't going to have to happen anyway. In fact, actually creating the compiled query with Query.Compile hardly does anything at all, it's all deferred until the query is run just as it would have been had the query not been compiled. So what is the overhead? Why is it slower at all? And what's the point of it?
Well the main purpose of that compiled query object is to have an object, of the correct type, that also has the correct lifetime. The compiled query can live across DataContexts, in fact it could potentially live for the entire life of your program. And since it has no shared state in it, it's thread-safe and so forth. It exists to:
1) Give the Linq to SQL system a place to store the results of analyzing the query (i.e. the actual SQL plus the delegate that will be used to extract data from the result set)
2) Allow the user to specify the "variable parts" of the query. The most common case isn't that the query is exactly the same from run to run, usually it's "nearly" the same... That is it's the same except that perhaps the search string is different in the where clause, or the ID being fetched is different. The shape is the same. Creating a delegate with parameters allows you to specify which things are fixed and which things are variable.
Now there was some debate about how to make compiled queries durable, automatically caching them was considered, but this was something I was strongly against. Largely because of the object lifetime issues it would cause. First, you would have to do complicated matching of a created query against something that was already in the cache -- something I'd like to avoid. Secondly you have to decide where to store the cache, if you associate it with the DataContext then you get much less query re-use because you only get a benefit if you run the same query twice in the same data context. To get the most benefit you want to be able to re-use the query across DataContexts. But then, do you make the cache global? If you do you have threading issues accessing it, and you have the terrible problem that you don't know when is a good time to discard items from the cache. Ultimately this was my strongest point, at the Linq data level we do not know enough about the query patterns to choose a good caching policy, and, as I've written many times before, when it comes to caching good policy is crucial. In fact, analogously, we had to make changes in the regular expression caching system back in Whidbey precisely because we were seeing cases where our caching assumptions were resulting in catastrophically bad performance (Mid Life Crisis due to retained compiled regular expressions in our cache) -- I didn't want to make that mistake again.
So that's roughly how we end up at our final design. Any Linq to SQL user can choose how much or how little caching is done. They control the lifetime, they can choose an easy mechanism (e.g. stuff it in a static variable forever) or a complicated recycling method depending on their needs. Usually the simple choice is adequate. And they can easily choose which queries to compile and which to just run in the usual manner.
Let's get back to the overhead of compiled queries. Besides the one-time cost of creating the delegate there is also an little extra delegate indirection on each run of the query plus the more complicated thing we have to do: since the compiled query can span DataContexts we have to make sure that the DataContext we are being given in any particular execution of a compiled query is compatible with the DataContext that was provided when the query was compiled the first time.
Other than that the code path is basically the same, which means you come out ahead pretty quickly. This test case was, as usual, designed to magnify the typical overheads so we can observe them. The result set is a small number of rows, it is always the same rows, the database is local, and the query itself is a simple one. All the usual costs of doing a query have been minimized. In the wild you would expect the query to be more complicated, the database to be remote, the actual data returned to be larger and not always the same data. This of course both reduces the benefit of compilation in the first place but also, as a consolation prize, reduces the marginal overhead.
In short, if you expect to reuse the query at all, there is no performance related reason not to compile it.