SQL 2012 Operations Research – Simulated Annealing implemented in T-SQL can cut CO² Emission / Traveltime and Cost by more than 30%


Many organizations face combinational problems. Our service division has a quite interesting one. The assignment of engineers to customer workshops. The following screenshot illustrates the issue.

(planned Workshop Travel EMEA subset of data)

Assume Microsoft would just employ 2 engineers and would be asked to deliver 3 workshops (W1,W2,W3). Let`s assume all engineers are skilled to deliver all workshops to all customers. How many assignment options would exist if there is just one possible start date per workshop?

In this case we have 8 options ( = number of engineers power to the number of requests). You can think of this as a combination lock where each workshop request is another wheel and each engineer is a number on each wheel.

Each combination comes at a cost. The target is to find the combination where the cost is minimal. Cost could be measured in money or distance or in the number of nights engineers have to stay in hotel rooms.

Iterating each possible combination would take very long! Assume we can calculate 1 Billion Combinations per second. It would still take more than 3170 years to find the best combination in a search space of 10 engineers^20 requests (=100,000,000,000,000,000,000)

The actual number of combinations for Microsoft has more than 5000 Digits! This is a huge haystack to search the needle.There are several algorithms to solve this problem. Exhaustive enumeration (Brute Force) is the slowest possible solution but guaranties that you find the absolute minimum.One could also use Evolutionary Algorithms or Dynamic Programming. However all heuristics have in common that sometimes the best combination will be missed.

Simulated annealing (SA) is a generic probabilistic metaheuristic for the global optimization problem of locating a good approximation to the global optimum of a given function.

To keep things simple we start with a small random testset. This allows us to iterate every possible combination to benchmark the result quality of our heuristic.

We have 5 Engineers and 5 Workshops

Each Workshop can be delivered on one out of 5 Days. This results in 25 assignment options per workshop.

e.g: Workshop 1 could be delivered by engineer #0 on day #0 for a cost of 1883 (in this case travel kilometers)

You might think the best solution is to select the cheapest option whenever a new workshop request appears. Unfortunately this approach will lead you to a local and not the global minimum! This strategy is described as greedy . The following example is great to understand where the name might come from:

Imagine you are a thief and you would like to maximize your stolen goods. You can pick from a list of 5 objects but you can just carry 20kg:

  EUR KG
Safe 1000 19kg
Sound System 400 5kg
Watch 200 1kg
TV Set 500 8kg
Vase 600 11kg

Taking the most expensive that would fit in you bag would not bring the optimal solution. In this case the safe would give you a value of 1000. Then you have just 1kg left. So you can also take the watch (1000+200 = 1200). Instead taking the watch the TV Set and the vase would give you a value of 1300 (200+500+600). In this example the weight is the limiting constraint. In our engineer workshop assignment problem the time is the constraint.

When engineer #0 delivers Workshop #0 on day #0 he/she cannot deliver another workshop on day #0. So the next step is to create a list of rules that tell us which options are NOT Allowed to be selected together!

E.g:

If you select the first option for workshop #1 you can no longer select one of the following options for any other workshop:

Creating these rules can be challenging! There is just one SQL statement required but it contains a cross join to validate if there are any collisions between two options.

Select o1.oid as 'SelectedOption',o2.oid as 'ExcludesOption' from Options o1 cross join Options o2 join Request r1 on o1.rid = r1.rid join Request r2 on o2.rid = r2.rid where o1.oid <> o2.oid and o1.eid = o2.eid and (o1.startday between o2.startday and o2.startday + r2.duration or  o2.startday between o1.startday and o1.startday + r1.duration)

Due to the cross join the statement complexity is of exponential nature O(n²) where n is the total number of options. If you have 25 options per workshop and 5 workshops, the total number of options is 100. So you have to test 100 * 100 = 10000 options for collisions. Imagine you have to plan 3000 workshops for 1000 engineers where each workshop can start on one out of 5 different days = 5000 options per workshop * 3000 = 150000000 Options = 22500000000000000 possible option conflicts. Evaluating all options at the speed of 1 billion per second would still take 260 days! So what I used to do is, I order the options per workshop by their cost and keep just the cheapest 100 options. This means I have 100 * 3000 = 300000 Options in Total = 90000000000. (takes around 90 seconds and in worst case 670GB (=90000000000*8Byte) uncompressed space.)

 

Now the actual Simulated Annealing starts!

You start with an unassigned solution.

Null is considered as a cost of 100000 km which is extremely expensive.This is very important! At the end the algorithm will only assign a NULL if there are not enough resources to deliver all the workshops.

1) WHILE @energy> 0 do Steps 2) to 6)

Calculate the cost of the current solution: Select @cost = SUM(COALESCE(o.cost,100000)) from Solution s left join Options o on s.Selection = o.oid

2) Change the Solution a bit

Take a random Workshop and assign a random option. Update Solution set Selection = @option where rid = @rid
Update the rest of the Solution according to the rules! (Remember selecting an option for one request could mean "deselecting" another option for another request!) Update Solution set Selection = null where Selection in (Select ExcludesOption from Rules where SelectedOption = @option)

3) Calculate the deltacost of the former and the new solution! Select @deltacost = SUM(COALESCE(o.cost,@unassignedoptioncost)) - @cost from Solution s left join Options o on s.Selection = o.oid

4) Compare the Solutions

If the new solution is cheaper -> Always accept it!

If the new solution is more expensive -> Accept it only with a certain probability that is related to an artificial energy level that is constantly decreasing
                          
declare @x float = RAND()                          
declare @probability float = EXP(@deltacost*-1.0/@energy)
if (@probability > @x)

...Accept the Solution

else

...Discard the Solution

5) Recalculate the energy

This can be as simple as set @energy*= 0.997

6) Start again from the beginning

 

Make sure that you save all tested combinations. It could be that you leave a good minimum and you will never find it again!

There are a few things that have a huge impact on result quality. E.g. using a cubic function instead of a geometric degressive returns great results much quicker:

Comparing the results

The Brute Force algorithm (it`s actually backtracking, see it as an optimized version of Brute Force) returns the absolute minimum in around 2 Minutes.

The SA Algorithm finds the same result in around 15 seconds and the good thing -> it scales!

As you can see here in our little example, there is not just one absolute minimum! We can see that several combinations lead to a cost of 2711 travel km distance!

Visualize the results - The fun part 🙂

Showing all stored combinations ordered by cost: You can see the unassigned requests are counted with 5000 km

 

Lets focus more on the results where cost < 3000:

Drill trough to the actual solution:

You could also use the new Excel Geoflow feature to visualize the different solutions:

 

Conclusion

Doing this project on SQL Server 2012 was comparable simple. I ended up with around 400 lines of code and it utilized full parallelism!

Simulated Annealing could be used in many other areas of our daily life. Chip manufacturer use the algorithm to plan the microprocessors layout. Why don`t we use it for City planning? I recently talked to a student of spacial planning and they never considered this method.

The http://www.unfpa.org/ forecasts a world population of 9.6 billion people by 2050. At the same time resources are limited and declining. We have to make smarter more efficient decisions in the future to keep prosperity! Luckily our computers are getting faster! 🙂

 

btw... this kind of problem is similar to the way the SQL Optimizer works! However the Optimizer relies on dynamic programming as a search strategy.

--Full Demo Script

use master

GO

Create Database OptimizerSample

GO

use OptimizerSample

 

Create Table City (cid int primary key identity(0,1), name varchar(64), location geography)

Create Table Engineer (eid int primary key identity(0,1),engineercity int references city)

Create Table StartDays (startday int primary key identity(0,1))

Create Table Request (rid int primary key identity (0,1), duration int,requestcity int references city)

Create Table Options (oid int primary key identity (0,1), rid int references request, eid int references Engineer, startday int, cost int)

Create Table Solution (rid int, Selection int)

Create Table Placement (rid int, Selection int)

Create Table SolutionHistory (SolutionID int,rid int, oid int)

Create Table Config ([key] varchar(32), [value] int)

GO

CREATE CLUSTERED INDEX ClusteredSH ON [dbo].[SolutionHistory]

(

       [SolutionID] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

Create View SolutionReport

as --Dont forget to change the Cost for unassigned option!

Select Top 100000000000 SolutionID,o.rid,o.oid,o.eid,o.cost,

ce.Location.ToString() as EngineerLocation, cr.Location.ToString() as RequestLocation,

ce.Location.Long as ELong,ce.Location.Lat as ELat , cr.Location.Long as RLong,cr.Location.Lat as RLat,

CAST(CONCAT('LINESTRING(',COALESCE(ce.Location.Long,0),' ',COALESCE(ce.Location.Lat,0), ',' ,COALESCE(cr.Location.Long,0),' ',COALESCE(cr.Location.Lat,0),')') as geography) as Line

from  SolutionHistory sh left join Options o on sh.oid = o.oid

left join Engineer e on o.eid = e.eid left join Request r on sh.rid = r.rid left join City cr on r.requestcity = cr.cid left join City ce on e.engineercity = ce.cid

order by SolutionID, r.rid

GO

Create View OptionsAusschluss

as

Select Top 100000000000 o1.oid as 'SelectedOption',o2.oid as 'ExcludesOption'

--,o1.startday,r1.duration,o2.startday

from Options o1 cross join Options o2 join Request r1 on o1.rid = r1.rid join Request r2 on o2.rid = r2.rid

where o1.oid <> o2.oid and o1.eid = o2.eid and

(      o1.startday between o2.startday and o2.startday + r2.duration

or     o2.startday between o1.startday and o1.startday + r1.duration)

--or (o1.rid = o2.rid and  o1.oid <> o2.oid )

order by o1.oid , o2.oid

GO

 

CREATE Procedure SetSolution (@rid int,@optionsperrequest int)

as

       begin

       set nocount on

                    --print CONCAT('SET RID:', @rid, ' TO OPTION: ', @optionsperrequest)

              declare @offset int = CAST(RAND(CHECKSUM(NEWID()))*@optionsperrequest as int);

              declare @option int

              Select @option = oid from Options O where O.rid = @rid Order by Oid OFFSET @offset ROWS FETCH NEXT 1 ROWS ONLY

              Update Solution set Selection = @option where rid = @rid

              --print CONCAT('SET RID:', @rid, ' TO OPTION: ', @option)

 

              Update Solution set Selection = null

              where Selection in (

              Select ExcludesOption from Rules where SelectedOption = @option)

       end

GO

--TESTDATA--##################################################################### REPLACE THIS PART WITH ACTUAL DATA!

Insert Into City Values ('Amsterdam','POINT (4.89330005645752 52.3730888366699)')

,('Berlin','POINT (13.3769798278809 52.5160713195801)'),('Brussels','POINT (5.52799987792969 52.3419990539551)')

,('Budapest','POINT (19.0648193359375 47.5062217712402)'),('Helsinki','POINT (25.992000579834 64.2900009155273)')

,('Kiev','POINT (30.5470008850098 50.4295501708984)'),('Lisboa','POINT (-9.1502103805542 38.7257308959961)')

,('London','POINT (8.22599983215332 46.8059997558594)'),('Madrid','POINT (-3.70577001571655 40.4202995300293)')

,('Milano','POINT (9.1810302734375 45.4689407348633)'),('Oslo','POINT (10.7499799728394 59.912281036377)')

,('Paris','POINT (2.34120011329651 48.8569297790527)'),('Rome','POINT (12.4958000183105 41.9030494689941)')

,('Vienna','POINT (16.3687992095947 48.2025413513184)'),('Zurich','POINT (8.53956031799316 47.3770713806152)')

Go

declare @numberofcities int

declare @numberofengineers int = 0

declare @numberofrequests int = 0

declare @numberofstartdays int = 0

declare @maxengineers int = 5

declare @maxStartDays int = 5

Select @numberofcities = COUNT(*) from City

while @numberofengineers < @maxengineers

begin

Insert Into Engineer VALUES (CAST(RAND(CHECKSUM(NEWID())) * @numberofcities as INT))

       set @numberofengineers += 1

end

while @numberofrequests < 5

begin

       Insert Into Request Values (3,CAST(RAND(CHECKSUM(NEWID())) * @numberofcities as INT))

       set @numberofrequests += 1

end

while @numberofstartdays< @maxStartDays

begin

Insert Into StartDays DEFAULT VALUES

   set @numberofstartdays += 1

end

--Engineer Anzahl * StartDays

INSERT INTO Config Values ('OptionsPerRequest',@numberofengineers*@maxStartDays) -- this could also be lower to select just the best available options

 

INSERT INTO Config Values ('UnassignedOptionCost',100000)

GO

WITH AllPossibleOptions (rid,eid,startday,cost)

AS

(

  Select rid,eid,startday, cr.location.STDistance(ce.location)/1000 as cost from Request R Cross Join Engineer E Cross Join StartDays

  join City cr on R.requestcity = cr.cid

  join City ce on E.engineercity = ce.cid

)

Insert Into Options

Select rid,eid, startday,cost from (

SELECT  rid,eid,startday,cost,

         rowid = ROW_NUMBER() OVER (PARTITION BY rid ORDER BY cost)

              FROM AllPossibleOptions)x where Rowid <= (Select [value] from Config where [key] = 'OptionsperRequest')  --maximale Optionsanzahl!

GO

--TESTDATA--#####################################################################

 

--Ermittle OptionsAusschlussregeln

Select * Into Rules from OptionsAusschluss

GO

 

 

 

--Solver

use OptimizerSample

Truncate Table SolutionHistory

Truncate Table Solution

Truncate Table Placement

Update Config set [Value] = 100000  where [Key] = 'UnassignedOptionCost'

Insert Into Solution Select rid,null as 'Selection' from Request

GO

 

--Simuliere Abkühlung

set nocount on

declare @optionsperrequest int

declare @sid int = 0

declare @overalliterations float = -100

declare @temp float = Power(@overalliterations,3)*-1 + 1000000

declare @cost float

declare @deltacost int

declare @requestcount int

declare @unassignedoptioncost int

Select @requestcount = COUNT(*) from Request

Select @optionsperrequest = [Value] from config where [key] = 'OptionsperRequest'

Select @unassignedoptioncost = [value] from Config where [key] = 'UnassignedOptionCost'

Insert Into placement Select * from Solution

--Simulated Annealing            

while (@temp > 0)

begin

              --stelle Ausgangssituation her:

            Truncate Table Solution

            Insert Into Solution Select * from Placement

            Select @cost = SUM(COALESCE(o.cost,@unassignedoptioncost)) from Solution s left join Options o on s.Selection = o.oid

            --change Solution a bit in the right direction more expensive are more likey to be selected

                     declare @rid int;

                     declare @rand float = RAND(CHECKSUM(NEWID()));

                     WITH WeightedSelection (rid,cost)

                     AS

                     (

                     Select s.rid, SUM(COALESCE(cost,@unassignedoptioncost)/@cost) OVER (ORDER BY s.rid RANGE BETWEEN unbounded preceding and current row) AS DistributionFunction

                     from Solution s left join Options o on s.Selection = o.oid

                     )

                     Select @rid = w2.rid

                     from WeightedSelection w1 right join WeightedSelection w2 on w1.rid = w2.rid -1

                     where @rand between COALESCE(w1.cost,0) and ROUND(w2.cost,7)

                     --Do the change

                     exec SetSolution @rid,@optionsperrequest

                     --Evaluate the Change

            Select @deltacost = SUM(COALESCE(o.cost,@unassignedoptioncost)) - @cost from Solution s left join Options o on s.Selection = o.oid

                     if (@deltacost < 0)  --if better or same solution accept placement!

                     begin

                           --print CONCAT('ACCEPTED BETTER LAST COST ', @cost, ' NEW COST ', @deltacost + @cost)       

                           Truncate Table Placement

                           Insert Into Placement Select * from Solution

                           Insert Into SolutionHistory Select @sid,* from Solution

                           set @sid = @sid + 1

                     end

                     else if (@deltacost > 0)

                     begin

                           declare @x float = RAND()

                           declare @probability float = POWER(2.71828182845905,((@deltacost*-1.0/@temp)))

                           if (@probability > @x) -- if not better but due to probability

                           begin

                                  --print CONCAT('ACCEPTED WORSE LAST COST ', @cost, ' NEW COST ', @deltacost + @cost,' RAND ',@x,' ACCEPTANCE PROBABILITY ',@probability,' TEMP ', @temp)

                                  Truncate Table Placement

                                  Insert Into Placement Select * from Solution

                                  Insert Into SolutionHistory Select @sid,* from Solution

                                  set @sid = @sid + 1

                           end

                           set @overalliterations = @overalliterations + 0.1

       end

       set @temp = Power(@overalliterations,3)*-1 + 1000000

          --set @temp *= 0.997

end

go

Select SolutionID,SUM(COALESCE(cost,100000)) from SolutionReport group by SolutionID order by SUM(COALESCE(cost,100000)) asc

 

 

Comments (2)

  1. Bryan Johnson says:

    Do you have the sample code for this post?

  2. LukasSteindl says:

    Sourcecode can be found at the bottom of the blog post now. The algorithm works fine but it turns out that genetic algorithms are much more useful for this kind of problem and that you can even model this problem in Excel and use the Solver Addin to find the optimal assignment.

    Cheers!

    Lukas

Skip to main content