Project Server 2007: Why is my Archive DB so big?


In the August Cumulative Update for Project Server 2007 there was a fix for an issue with archiving projects (the Administrative Backup feature).  Some customers had noticed that the archive database was bigger than expected in relation to the retention policy they had set, and the size of their draft database.  In normal terms you might expect that with a project retention policy of 3 your archive database would be about 3 times larger than your draft database – all things being equal (which of course they never are – but a reasonable estimate).  So when it was 10+ times bigger this didn’t look right.  The description of the fixed bug in the KB article was:

  • When a project is deleted from the archived database, the related entities are not deleted if a backup retention policy that includes more than one version is enabled.

The term “deleted from the archive database” refers to the automatic action taken when you have a retention policy of, for example, 3 and then the administrative backup saves the 4th iteration of the project.  The first iteration of this project that was archived gets flushed out of the archive database.  Before this fix only the entry in the MSP_PROJECTS table was removed – and the remaining entities referred to in the bug description are contained in the following tables in the Archive database:

MSP_ASSN_CUSTOM_FIELD_VALUES
MSP_PROJECT_FRONT_END
MSP_TASKS
MSP_TASK_BASELINES
MSP_PROJECT_RESOURCES
MSP_PROJECT_RESOURCE_BASELINES
MSP_ASSIGNMENTS
MSP_ASSIGNMENT_BASELINES
MSP_PROJECT_CALENDARS
MSP_PROJECT_CALENDARS
MSP_PROJECT_LOOKUP_TABLES
MSP_PROJECT_LOOKUP_TABLE_STRUCTURES
MSP_PROJECT_LOOKUP_TABLE_MASKS
MSP_PROJECT_CUSTOM_FIELDS
MSP_PROJECT_CUSTOM_FIELDS
MSP_TASK_CUSTOM_FIELD_VALUES
MSP_PROJ_RES_CUSTOM_FIELD_VALUES

After the fix all rows in all the tables for any newly deleted project version are correctly removed – but the old orphaned records are still there.

Assuming you have archived more versions than your retention policy (pre August CU) then you can expect there to be records in each (or most) of these tables that relate to project versions (by GUID) that no longer exist in the MSP_PROJECTS table.

The extent of the problem will depend on your use of this feature before the August CU, and on the size of your projects in terms of tasks, assignments and the other entities in these tables.  If you have made extensive use of this feature with very large projects then this could lead to your archive database being many times bigger than it needs to be.  So how do you recover this wasted space?  Not an easy answer.  The delete query that will get rid of the orphaned records is very straightforward – but before you rush to SQL Management Studio to delete them, a few words of warning regarding tempdb, transaction logs and estimating the size of the problem.

The basic query required to delete from each of the tables will look like this:

USE ProjectServer_Archive

Delete from MSP_ASSN_CUSTOM_FIELD_VALUES
where NOT EXISTS (select * from MSP_PROJECTS
where MSP_PROJECTS.PROJ_UID = MSP_ASSN_CUSTOM_FIELD_VALUES.PROJ_UID)

(repeat for each table listed above)

We are looking for any rows in the table MSP_ASSN_CUSTOM_FIELD_VALUES that do not have corresponding records (matching PROJ_UIDs) in the MSP_PROJECTS table – then we delete them.  Regardless of your recovery model this will get recorded in your database transaction log – and if you have millions of rows to delete then this will take a while and use GBs of space – increasing the size of your transaction logs and also potentially your tempdb.
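If you want to keep an eye on how much the delete is costing you while it runs, the following checks can be run from a second query window.  This is just a minimal sketch – DBCC SQLPERF(LOGSPACE) reports the log size and percentage used for every database, and sp_spaceused run in tempdb shows how much of that database is allocated:

-- Report transaction log size and percentage used for all databases
DBCC SQLPERF(LOGSPACE)

-- Check how much space tempdb is currently using
USE tempdb
EXEC sp_spaceused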

WARNING! The following details and guidelines are supplied “as-is” and if you are not confident with SQL Server queries and administration then either work with your database administrator or open a support incident to get help from Microsoft.  Any large scale deletions are best carried out outside normal hours, and also at a time that does not conflict with backups.  There are certainly ways to minimize the impact – and you don’t need to clean everything up at once – and indeed you don’t need to clean it up at all if you are happy to waste a little disk space.

The first thing to understand then is how many orphaned projects you have in your archive db – and how many projects should actually be there.  The first answer can be obtained by using the query:

Select count (DISTINCT PROJ_UID) from MSP_TASKS
where NOT EXISTS (select * from MSP_PROJECTS
where MSP_PROJECTS.PROJ_UID = MSP_TASKS.PROJ_UID)

This tells us how many distinct project UIDs exist in the MSP_TASKS table that don’t correspond to “real” projects. 

And for the total real projects this is simply:

Select count (DISTINCT PROJ_UID) from MSP_PROJECTS

With these two numbers you can get an idea how much excess baggage your archive db is carrying – then we can think about the different possibilities for cleaning this up.  I played around with one of my test databases where I had around 2000 projects and 400 orphans.  The archive database was about 30GB, but probably should have been 25GB.  Not too extreme – you may have significantly more or fewer orphans, and the clean-up approach for each scenario may be different.
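If you also want a feel for the number of rows (rather than projects) that would be removed, the same NOT EXISTS pattern can be turned into a row count.  A sketch for a single table – repeat for each of the tables in the list above to size the whole job:

USE ProjectServer_Archive

-- Count the orphaned rows (not just projects) in one of the affected tables
Select count (*) from MSP_TASKS
where NOT EXISTS (select * from MSP_PROJECTS
where MSP_PROJECTS.PROJ_UID = MSP_TASKS.PROJ_UID)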

If you only have a small number of orphans – both in terms of % and magnitude – then the simple delete query mentioned above may well cut it for you.  Just to give you an idea of how this works, the full deletion from all the tables on my server probably got rid of around 3 million rows across the 17 tables in about 9 hours.  My transaction log (with recovery mode set to simple) grew to 20GB and my tempdb to 800MB.  The memory usage (working set) of this instance of SQL Server went to 3GB.  During this time my server (x64, Dual Proc, 4GB RAM) was unusable for anything else.  If your server does other work or has other SQL instances then this may be a good time to limit the resources SQL Server can use so it does not slow your server down too much.
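One way to do that for memory is to cap “max server memory” with sp_configure while the clean-up runs.  This is a sketch only – the 2048MB figure is just an example, so pick a value that suits your server and remember to put it back afterwards:

-- Cap the memory SQL Server will use during the clean-up (value in MB is an example only)
EXEC sp_configure 'show advanced options', 1
RECONFIGURE
EXEC sp_configure 'max server memory (MB)', 2048
RECONFIGURE

-- When finished, set it back to your normal value (2147483647 is the default, meaning unlimited)
-- EXEC sp_configure 'max server memory (MB)', 2147483647
-- RECONFIGURE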

If you have a small % but it is still a large number then the deletion may be better handled using a different approach.  The reason the transaction log grows even with simple mode for recovery is that this is treated as a single transaction.  Once it is completed the used space in the transaction log is negligible and the data file can be shrunk down.

An approach that overcomes this growth in both transaction log and tempdb, but is a little slower is to use a loop to delete in a large number of smaller transactions.  This example was found at http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=62356 (thanks Kristen) but there are many good examples out there.

DECLARE @intRowCount INT
DECLARE @intErrNo INT
DECLARE @intRowsToDelete INT
DECLARE @dtLoop DATETIME
 
SELECT    @intRowCount = 1,    -- Force first iteration
    @intErrNo = 0
 
SELECT    @intRowsToDelete = COUNT(*)    -- Number of rows to be deleted
FROM    MSP_ASSN_CUSTOM_FIELD_VALUES
WHERE    NOT EXISTS (select * from MSP_PROJECTS 
 where MSP_PROJECTS.PROJ_UID = MSP_ASSN_CUSTOM_FIELD_VALUES.PROJ_UID)
 
WHILE @intRowCount > 0 AND @intErrNo = 0
BEGIN
    SELECT    @dtLoop = GetDate()
 
    SET ROWCOUNT 10000    -- number of delete rows / iteration
 
    DELETE    
    FROM    MSP_ASSN_CUSTOM_FIELD_VALUES
    WHERE    NOT EXISTS (select * from MSP_PROJECTS 
 where MSP_ASSN_CUSTOM_FIELD_VALUES.PROJ_UID = MSP_PROJECTS.PROJ_UID)
    SELECT    @intErrNo = @@ERROR, @intRowCount = @@ROWCOUNT
 
    SET ROWCOUNT 0    -- Reset batch size to "all"
 
    SELECT    @intRowsToDelete = @intRowsToDelete - @intRowCount
        
    -- Debugging usage only:
    PRINT 'Deleted: ' + CONVERT(varchar(20), @intRowCount)
        + ', Elapsed: ' + CONVERT(varchar(20), DATEDIFF(Second, @dtLoop, GetDate())) + ' seconds,'
        + ' remaining to delete=' + CONVERT(varchar(20), @intRowsToDelete)
    
 
    -- WAITFOR DELAY '000:00:05'    -- 5 seconds for other users to gain access
END

Again, this example just shows the code for a single one of the 17 tables.  I have commented out the “WAITFOR” section – but if you are using this type of code in a scenario where you need to let others have some db time during the process then it is worth using this option.

In my sample database this approach limited transaction log usage (simple recovery mode) to 600MB and tempdb to 15MB.  Memory still peaked quite high – so you might want to set some limits.  Also it took about 50% longer to run – at around 14 hours.

If your orphans make up the bulk of your table and the data you need to keep is just a small percentage, then another approach might be to copy out the rows you need to either a temporary table or a file, then truncate the table and finally copy the saved good rows back in.  There are still some gotchas here: although bulk inserts don’t generally generate a lot of log usage, if the table being inserted into has indexes then the index updates get logged – even in the simple or bulk-logged recovery models.  Here are some examples of the code you might use, again just dealing with a single table.

First let’s look at a simple temporary table:

use Test_Archive
 
select * into MSP_ASSN_CUSTOM_FIELD_VALUES_Backup from MSP_ASSN_CUSTOM_FIELD_VALUES
where EXISTS (select * from MSP_PROJECTS 
where MSP_PROJECTS.PROJ_UID = MSP_ASSN_CUSTOM_FIELD_VALUES.PROJ_UID)
 
truncate TABLE MSP_ASSN_CUSTOM_FIELD_VALUES
 
insert into MSP_ASSN_CUSTOM_FIELD_VALUES 
Select * from MSP_ASSN_CUSTOM_FIELD_VALUES_Backup

This is one of the fastest approaches, if you have the disk space, as it uses extra space in the archive db for the temporary table, then uses a whole load of transaction log and tempdb for the insert back into the original table.  In my sample db on the largest table in terms of rows, MSP_ASSN_CUSTOM_FIELD_VALUES, it took 9 minutes to push the data out, the truncate was instant (and is only minimally logged – so not much use of transaction logs up to here), then the insert took 30 minutes and used 19GB of transaction log and 1.7GB of tempdb!

The final approach was to try a similar process, but using bcp and going via a file.  My output was based on a query to get the rows I wanted to keep so the bcp statement run from a command prompt was:

bcp "select * from Test_Archive.dbo.MSP_ASSN_CUSTOM_FIELD_VALUES where EXISTS (select * from Test_Archive.dbo.MSP_PROJECTS where Test_Archive.dbo.MSP_PROJECTS.PROJ_UID = Test_Archive.dbo.MSP_ASSN_CUSTOM_FIELD_VALUES.PROJ_UID)" queryout "MSP_ASSN_CUSTOM_FIELD_VALUES.dat" -T –c

This exported my 8 million rows from this table in around 9 minutes, creating a 1.7GB file. I then truncated the table and used a single batch bcp to import:

bcp Test_Archive.dbo.MSP_ASSN_CUSTOM_FIELD_VALUES in MSP_ASSN_CUSTOM_FIELD_VALUES.dat -T -c

This took around an hour, used 19GB of transaction log and 1.7GB of tempdb – so similar to (but slower than) the table insert.  Another approach that limits the tempdb and transaction log usage is to use the -b flag on the bcp to send the data to the server in smaller batches.  I tried:

bcp Test_Archive.dbo.MSP_ASSN_CUSTOM_FIELD_VALUES in MSP_ASSN_CUSTOM_FIELD_VALUES.dat -T -c -b1000

and as I write this blog it has used no tempdb (none over the default 8MB anyway) and just 800MB of transaction log – but it has been running for 2 days so far and is just over halfway through.  I’ll post an update if and when it finishes!

So there is no simple answer – and there are even more options.  You could remove indexes while importing and then add them back later (a sketch follows below) – though you’d take the hit for the index rebuild at some point.  Re-organizing the data before import could also make the indexing work faster.  One other possibility, if you don’t have anything you want to keep, is to truncate all of the above tables in the archive db – including the MSP_PROJECTS one.  Or even completely re-provision your PWA using backups of your draft, published and reporting databases – but a blank archive.  You’d need some support help on this one.
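As a sketch of the index idea, something like the following could be used.  The index name below is purely illustrative – query sys.indexes first for the real names on your tables, and don’t disable the clustered index or the table becomes inaccessible:

-- List the indexes on the table being re-loaded
SELECT name, type_desc FROM sys.indexes
WHERE object_id = OBJECT_ID('MSP_ASSN_CUSTOM_FIELD_VALUES')

-- Disable a non-clustered index before the import (hypothetical index name)
ALTER INDEX IX_ExampleNonClustered ON MSP_ASSN_CUSTOM_FIELD_VALUES DISABLE

-- ... run the bulk import here ...

-- Rebuild the index afterwards
ALTER INDEX IX_ExampleNonClustered ON MSP_ASSN_CUSTOM_FIELD_VALUES REBUILD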

Finally, whichever option you choose you will then want to shrink your files and database to recover space.  This too might take some time – depending on which method you used and how much data/space is in the database.
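As a rough sketch, assuming the archive database is called ProjectServer_Archive – and checking sys.database_files for the actual logical file names before shrinking a specific file:

USE ProjectServer_Archive

-- See the logical file names and current sizes (size is in 8KB pages)
SELECT name, type_desc, size FROM sys.database_files

-- Shrink the whole database...
DBCC SHRINKDATABASE (ProjectServer_Archive)

-- ...or target an individual file (the target size in MB is an example only)
-- DBCC SHRINKFILE (ProjectServer_Archive_log, 1024)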

I hope this helps you to understand the nature of this issue and what, if anything, you might need to do to regain your lost space.  Certainly food for thought.  And on the subject of food – Happy Thanksgiving to my US readers who catch this posting before their holiday break (which hopefully will not be too many of you – at 4:30PM PST on 11/26!)

Comments (11)

  1. Carlos says:

    Hi,

    I have the following problem: when I assign a resource to a task, for example with 25% of max. units, and then this resource reports different amounts of hours each day (for example 2h, 4h and 6h), the % of assignment changes to the max % of dedication of this resource on this task, in this case 75% (6h). If I then replan, Project does it well with only the 25% of max units, but this value disappears and it is difficult for the Project Manager to know what the original value of the task assignment was if the project has a lot of tasks and a lot of resources.

    So the question is: Is there any way to show in Project the original max. units of the assignment of a resource to a task, independent of the reporting of the resource?

    Thanks!!

  2. James Fraser says:

    Thanks for the timely advice. Recently noticed that an archive database for a client was about twice what I expected size wise. This looks like the problem.

    The two queries you provide to show the number of "bad" vs. "good" projects need to be tweaked for SQL 2000. SQL 2000 isn’t as UID friendly as 2005.

    Here’s a version I used for those:

    Select count (DISTINCT CONVERT(NVARCHAR(100),PROJ_UID)) from MSP_TASKS
    where NOT EXISTS (select * from MSP_PROJECTS
    where MSP_PROJECTS.PROJ_UID = MSP_TASKS.PROJ_UID)

    Select count (DISTINCT CONVERT(NVARCHAR(100),PROJ_UID)) from MSP_PROJECTS

    I’m still considering the strategy to delete these, the Archive DB is 45 GB with 6700 orphaned projects for the 3800 valid versions. Thanks for the great list of options.

    jfraser

  3. Thanks James for the SQL 2000 version of the queries.  

    The good thing with this problem is that you can take your time with the fix – and probably the looping option is a good one to go with – not fast, but it gets the job done with minimal impact on tempdb and the transaction logs.  If you have the disk space and the down time available then the use of temporary tables is also worth considering.  Fast and furious, and it can hit the logs and tempdb hard.  I guess with 2000 you might have to tweak the queries a bit more too.

    Best regards,

    Brian

  4. jc says:

    Hi Brian,

    Thanks for the insight. Was the archive DB designed to store projects forever, even if a project gets deleted in the draft and published DBs? My environment was set to keep 5 versions and has grown over 120GB. I am looking for options to manage the space and hope I can claim space back once a project gets deleted. Any advice? Thanks.

  5. Hi JC,

    The design is intended to allow projects to be deleted from the active system (Draft and Published) to suit the needs of customers for data access and performance – while allowing old projects to be retrieved at a later date from the archives.  It also obviously gives an "on-line" backup facility where you can recover projects if something bad happens to the active one.

    There is also an option under Server Settings, Delete Enterprise Objects that allows deletion from the Archived Database Only – which will include projects that may no longer be in the other databases. This sounds like the option you are looking for – but if you have been using the archive very actively this will still not deal with the orphaned entities mentioned in my blog.

    Best regards,

    Brian

  6. Malik Al-Shayeb says:

    Hi Brian

    I have a simple question which I could not find an answer to on Google or on the blogs.

    I have noticed that in the PWA permissions there is a permission called Clean Up Project Server Database, and as I was reading about it, all the topics mentioned a link in the Server Settings which I can't find!

    If it is there, could you please instruct me where to find it, and if it's not available how can I perform a database clean up?

  7. Hi Malik,

    I hadn’t noticed that permission – but it is talking about the option to delete enterprise objects such as projects and resources.  There isn’t any option that just does background db housekeeping of any sort.  You will see that if you disable the permission, the Delete Enterprise Objects menu item is removed from Server Settings for all users.

    Best regards,

    Brian.

  8. You may also be able to squeeze some more space out of MSP_Links using the same query structure. I removed nearly 800MB from mine.

    Thx for paving the way – this post saved me a lot of heartache.

    -Dave-

  9. Thanks for the feedback David and the additional table we can clean.  I’ll update the article when I get a better connection – and we have also updated our internal docs as I also saw the question coming through our support team :).

    Thanks again,

    Brian

  10. Is your PWA Archive database too large? You may have a problem if your archive DB is significantly larger

  11. Rene Alvarez says:

    Brian. I’m seeing weird stuff in a Project Server 2007 deployment. It’s a small farm (1 app server, 1 sql 2005 server). Everything was ok until we ran a load test.

    We took 6 project managers to publish a 2,500 task project each, with an average of 15 resources per task (something like 270,000 assignments).

    The server took like half an hour to process all 6 projects and we didn’t see any errors during the process. However the log files for the databases grew like grass and ate all the space in the disks.

    Then we started a shrinking process on the databases with no errors, but the reporting database didn’t reduce its size. Now it’s growing extremely fast (like 1MB every time you hit refresh).

    I don’t like the option to reprovision the PWA site. Do you have any suggestions on how can we reduce the reporting database size?

    I also tried to rebuild it (with the backup-restore process). It took half an hour without errors but it is still growing like hell. It grew 20GB in a matter of 1 hour.

    Please help!
