SQL Server 2005 Service Pack 2 failing to install due to an error in Do_sqlGroupMember custom action

One colleague reported to me that one of his customers was getting the following error message when trying to install SQL Server 2005 Service Pack 2:

Func Name='RestoreSetupParams'>
Failed to find installation media path due to datastore exception
in FindSetupFolder()
MSI (s) (C0!C0) [12:50:39:060]: Note: 1: 2203 2: C:\WINDOWS\system32\Setup\SqlRun.msi 3: -2147287038
Loaded DLL:
C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\xmlrw.dll
Version:
2.0.3609.0
Failed to find installation media path due to datastore exception
in FindSetupFolder()
Loaded DLL:
C:\Program Files\Microsoft SQL Server\90\Setup Bootstrap\sqlsval.dll
Version:
2005.90.3042.0
Failed to find installation media path due to datastore exception
in FindSetupFolder()
.
.
.
Doing Action: Do_sqlGroupMember
PerfTime Start: Do_sqlGroupMember : Sat Dec 15 12:51:43 2007
<Func Name='Do_sqlGroupMember'>
Local group CLOGIC\SQL Services doesn't exist
Failure adding user _svcSQL@cl.prvopen.mycorporation.com to local group CLOGIC\SQL Services (2221)
Error Code: 0x800708ad (2221)
Windows Error Text: The user name could not be found.
Source File Name: sqlca\sqlsecurityca.cpp
Compiler Timestamp: Sat Oct 7 09:43:41 2006
Function Name: Do_sqlGroupMember
Source Line Number: 1132

So I went to the source code of our Do_sqlGroupMember custom action, and found that this part of code does the following:

  1. If the computer from where setup is running is not a Domain Controller, it calls NetGetAnyDCName, to try to find the name of one DC on which the security group exists. If none is found, or if the node itself is a DC, one local variable which stores the name of the DC is set to NULL.
  2. Later, we call NetLocalGroupAddMembers API passing it the DC Name we obtained in previous step, the group name (having removed everything that was before the backslash, so it would be "SQL Services" in your case), and the user name you're trying to add to that hypothetically existing local group ( _svcSQL@cl.prvopen.mycorporation.com in your case). If this call to NetLocalGroupAddMembers fails with ERROR_NO_SUCH_ALIAS, it means there's no Local group with such name, so we log the following in the setup log file "Local group %s doesn't exist" (which in your case looks like "Local group CLOGIC\SQL Services doesn't exist"). But this message is not really an error, rather it's an informational message.
  3. If the call to NetLocalGroupAddMembers failed with ERROR_NO_SUCH_ALIAS, then we try to add the user to the group as if the group was Global. To do so, we call the NetGroupAddUser API. And that's the one which is failing with NERR_UserNotFound (2221 == The user name could not be found).

Now, reading through the documentation of that API, it explicitly mentions this:

User account names are limited to 20 characters and group names are limited to 256 characters. In addition, account names cannot be terminated by a period and they cannot include commas or any of the following printable characters: ", /, \, [, ], :, |, <, >, +, =, ;, ?, *. Names also cannot include characters in the range 1-31, which are nonprintable.

In your case, _svcSQL@cl.prvopen.mycorporation.com is 36 characters in length, and that's the reason why it fails with that error, because the API is truncating the user name and not finding it.

The user name is taken from the attributes of the SQL Server Service, from the node from where Service Pack 2 setup is run.

We've reviewed the account under which the SQL Server Service was running for this instance, and it was actually set as _svcSQL@cl.prvopen.mycorporation.com. But what was more surprising was that on the other node, it was set using the expected format used by SQL Server Setup when it creates the services in the first place (i.e. CLOGIC\_svcSQL).

I have no way to explain who and when that change was made, but this is definitely something you should investigate, between the people who operates these servers, so that you avoid doing this again in the future.

So, suggested action plan was: change that service account to its correct/original format and retry the installation of Service Pack 2.

After following that action plan, the installation of Service Pack 2 went through with no errors. While we tried to change the service account, we kind of found what could have been the cause of that change. In customer environment, when they tried to set the service account from the Services console (services.msc), if they used the AD Object Browser window, and typed CLOGIC\_svcSQL and then clicked on "Check Names", it resolved it to its fully qualified name, as in _svcSQL@cl.prvopen.mycorporation.com. So, it looks somebody did that operation in the past in customer's environment, and that subtle difference in the format used to name the service account makes SQL Server's Service Pack code fail, because it doesn't expect it to be expressed that way.

The reason why Windows resolves the account name in one or another format I don't know and didn't have the time during that incident to continue investigating. So this, I leave it open for someone with more experience in Windows AD than me, explains in the comments section.

Thanks and enjoy your day!