‘No Disks suitable for cluster disks’: one more possible cause

Today in the lab I was working on a test 3-node cluster. I was using Hyper-V for this. We normally use the iSCSI Software Target for testing, so I had set that up on a central VM and then published the target to all the VMs.

The disks were seen in the disk management of Node1, and I brought them online, partitioned and formatted them. However when I used Cluster Management to add them to the Storage group, it complained:

‘No Disks suitable for cluster disks were found, For diagnostic information about disks available run the Validate config wizard’

I did run the cluster validation (storage only) and got no further inputs. Shortly after some troubleshooting I realized I had forgotten to configure the volumes at the iSCSI initiator on each node. The easiest thing to do for this is to click the ‘Auto-configure’ button:

image

Due disclaimer: This is just ONE, but not the ONLY reason for the above error message! This is somewhat specific to the iSCSI initiator, so only check it if you are using iSCSI volumes in your cluster and if you get the above message.

DB Mirroring Tips

 

Here are some essential things to consider when trying to tune a DB mirroring setup. They revolve round the following concepts:

  • The effect of Virtual Log Files (VLFs) on recovery and therefore failover times
  • The interplay (or lack thereof) between mirroring and some other SQL Server engine features
  • Network speed and issues

 

  1. Be on the latest available Service Pack and Cumulative Update for SQL 2005 or 2008. There are many issues related to mirroring which are fixed in these updates. Some of the key ones which I have seen are given below.
  • The remedial steps for this set of problems revolves round fixing the Virtual Log File Fragmentation issues, such as what is described by Kimberly Tripp. The relevant blog posts are:
  • You should also be aware of the article in http://support.microsoft.com/kb/937531 (The shrink operation is not duplicated on the mirror database when you use database mirroring in SQL Server 2005)
  • If you are using MSDTC transactions with a database which is mirrored, please consider fact that it is not officially supported, as the below articles.
  • For the networking side, the key things to consider are given below.

I will keep adding to this post as I receive additional information. In the meantime, I hope you stay informed and therefore hopefully avoid known issues with mirroring.

If you like this post, please take a few seconds to leave a comment and do rate the post!

Rolling upgrade of 1 instance from 2-node, 2-instance sql failover cluster to slipstreamed sql 2008 SP1

Now that SQL Server 2008 SP1 is released (download link here), it is time to test a very interesting capability: Service Pack slipstreaming. That means that you can save some time by doing an ‘integrated’ installation of SQL 2008 which includes the SP1 binaries and avoid having to apply SP1 later on. Windows had this capability for some time but only now is it officially supported for SQL Server 2008.

Scenario

This walk-through was conducted on a 2-node, 2-SQL instance Windows 2003 x86 cluster. The 2 instances were:

  • VIRSQL2K – default instance of SQL Server 2000 SP4 (2039 build)
  • VIRSQL2K5SQL2K5 – named instance of SQL Server 2005 SP2 (3042 build)

Objective and Tools

We want to upgrade the SQL 2000 instance to SQL 2008, in-place. Currently we will not be touching the SQL 2005 instance (maybe I will do another blog post for that if there are any specific observations.) The objective was to minimize downtime of either instance while performing an in-place upgrade to SQL Server 2008 SP1. To achieve the objective, we use 2 new features which are available in SQL Server 2008:

Step-by-step

1. Pre-requisites

  • Install the Windows Installer 4.5 (found on SQL 2008 DVD) prior to your actual downtime. Do not reboot at this stage.
  • Install the hotfix for KB article 937444 (from http://support.microsoft.com/kb/937444). The SQL issue pertaining to this OS hotfix is documented in http://support.microsoft.com/kb/955828
  • Review all the cluster specific best practices documented in my previous post.
  • Install .NET Framework 3.5 SP1 (found on SQL 2008 DVD) prior to your actual downtime.
  • At a suitable window (you may couple this with your actual upgrade downtime) you need to reboot to complete the process.
  • Do these steps on both nodes. Take care to stagger the reboots mentioned above to avoid total non-availability of your SQL instances.

1. Prepare the merged (slipstreamed) media

In my case I was just dealing with x86 instances, so I optimized some steps from the blogs above

  • From SQL 2008 DVD, copy the x86 subfolder and the files from the root (setup.exe etc.), I did not copy the IA64 and x64 folders. I copied these to c:sql2008 on my local nodes
  • Extract the SQL 2008 SP1 contents to c:sql2008pcu on each of the nodes. To do this I used
    • SQLServer2008SP1-KB968369-x86-ENU.exe /x:c:sql2008pcu /q
  • I then copied setup.exe and setup.rll from the c:sql2008pcu folder to c:sql2008, overwriting the older ones
  • I then copied the FILEs (NOT the sub-folders, and also ignore the Microsoft.SQL.Chainer.PackageData.dll from c:sql2008pcux86) of the c:sql2008pcux86 folder, into the c:sql2008x86 folder.
    • If you accidentally copied Microsoft.SQL.Chainer.PackageData.dll (like I did once) you will receive an error message when you launch setup: ‘specified action LandingPage is not supported for the sql server patching operation’. (In fact the blog post from Peter also talks about this error.)
  • Peter’s original blog post and also the current one from Bob Ward mention copying the sqlsupport.msi files over. I did do that, though Peter mentioned it is no longer required. Copy the sqlsupport.msi from c:sql2008pcux86setup1033 to c:sql2008x86setup (note that there is no 1033 in the destination, that is not a typo.) You can overwrite the older version and you may note that the older version was actually larger than the new one!
  • At this time you are ready to roll.

2. Run setup on passive node

I ran setup from the command prompt, specifying the PCUSource by hand:

  • cd c:sql2008
  • setup.exe /Action=UPGRADE /PCUSource=c:sql2008pcu

Initially I select the SQL 2000 instance, which is active on the other node:

image

In the cluster security policy screen, you need to enter the service account domain group names. Now something related to this step caused a problem for later on, I will explain it later:

image

At the Upgrade Rules screen in setup, you can verify that we are slipstreaming:

image

Also in Upgrade Rules you might be warned that any other SQL instances active on this node will be restarted due to cluster resource DLL update. This is very important if you have not planned on those other instances being restarted. So you should note it and factor it into your upgrade plans:

image

Later the cluster upgrade report clearly tells us it is going to upgrade just this (passive) node:

image

Once again we check we are slipstreaming:

image

The rest of setup was fairly uneventful and at the end I checked the sqlservr.exe version on this (passive) node:

image

If you check the old SQL 2000 installation folder, you will note that the binaries and other folders have been cleaned up.

3. Change group for failed Full-Text Search resource

At this stage you may notice that the full-text search service cluster resource is either stopped or plain does not even show up. In my case I received the following message in cluster admin. I just moved out the fulltext resource into another cluster group for the moment.

image

4. Failover and watch the upgrade of system databases happen

At this stage, you are ready to failover the SQL 2000 instance from (let’s say) NodeA (which is still running SQL 2000 and has not been touched by upgrade process) to the NodeB (on which the SQL 2000 service has been upgraded, binaries-only, to SQL 2008). What is supposed to happen after the move group, is that the SQL 2000 instance is temporarily unavailable, SQL 2008 service starts up on NodeB, it then upgrades the system databases to SQL 2008 ‘format’ and also upgrades the user databases to SQL 2008 ‘format’. This is technically a point of ‘no-return’ from which roll back means reinstalling SQL 2000 and restoring from backups.

In my case, things did not go well initially, and on failover from NodeA to NodeB, SQL 2008 instance failed to come online on NodeB, restarted and couple of times and then flipped back to NodeA. Checking the event log showed this message:

initerrlog: Could not open error log file ”. Operating system error = 3(The system cannot find the path specified.).

I verified the startup parameters using Configuration Manager on NodeB and also double-checked using Enterprise Manager on NodeA, and they were correct. The only other possibility was a permissions issue, and I used Process Monitor from SysInternals to track down the issue. It turned out that my SQL 2008 service account (which was the same as what the 2000 version used) did not have access to a registry key. Then it dawned on me that the domain group membership I referred to previously, was not granted. To fix this was easy, used AD Users and Computers and added my service account into this group:

image

After this was fixed, I was able to move group again to NodeB at which stage the database upgrades happened and the instance was online. Here are some random snippets from the SQL errorlog at the time of upgrade, just to show you what happened under the hood:

  • 2009-04-11 04:08:06.43 Server Microsoft SQL Server 2008 (SP1) – 10.0.2531.0 (Intel X86)
  • 2009-04-11 04:08:17.14 spid7s Database ‘master’ running the upgrade step from version 654 to version 655.
  • 2009-04-11 04:08:18.84 spid8s Converting database ‘model’ from version 539 to the current version 655.
  • 2009-04-11 04:08:33.61 spid7s Database ‘master’ is upgrading script ‘sqlagent90_msdb_upgrade.sql’ from level 0 to level 2.
  • 2009-04-11 04:09:26.75 spid7s Recovery is complete.  This is an informational message only.  No user action is required.

Total time from start to finish: 1:01 minutes. That means that my instance was potentially unavailable just for this long. That is pretty impressive!

5. Run setup on previously active node

The process on NodeA was pretty much the same as described above. All of the steps described above are relevant, except for #4 above, because the databases are already upgraded once. The only thing you may come across if you have not handled the full-text search resource issue would be an error in setup: ‘The device is not ready’:

image

I am not 100% sure, but it seems related to the fact that full text was not moved out to another group. Moving it out fixed the above setup error on NodeA.

6. Verify setup

Just a casual check here using Management Studio (yes, the management tools get upgraded from Query Analyzer / Enterprise Manager to SSMS) and verify that we are indeed on SQL 2008 SP1:

image

Key Learnings

  • Read the blog posts mentioned above very carefully while preparing the merged drop
  • Watch out for the exact DLL to be excluded (Microsoft.SQL.Chainer.PackageData.dll).
  • Pre-create the domain group for SQL Server 2008 service account in the AD (only required if you are on Windows 2003)
  • Add the SQL 2000 service account to the SQL 2008 domain group before you run the setup program
  • Change group for the failed Full-Text Search resource from the SQL group to any other group
  • Due to resource DLL update, be prepared for a restart of any other SQL instance on the node you are upgrading (such as in a N-instance cluster)

I hope you enjoyed this blog post and if you do like it, please do take a second to rate this post and also leave a few comments if you can. See you later!

SQL Server 2000 / 2005 Failover Cluster Pre-installation Checklist

Here’s a quick checklist before you install an instance of SQL Server on a failover cluster; or any service pack or hotfix for an instance. Following this list should eliminate 90% of the most common causes of setup failure.

Please note that this checklist does not cover all the ‘normal’ requirements for installation of a SQL Server failover cluster instance. For that you may refer to this excellent white paper on the subject. I also recommend that you view my colleague Uttam Parui’s webcast as well prior to attempting any installations on Windows Server 2008. He covers many issues specific to Windows 2008 as well.

Common points for all SQL 200x versions

Checklist item Description
1. Name resolution Verify network name resolution of physical nodes using machine name and FQDN (fully qualified DNS name) using PING
  Verify network name resolution of SQL Server virtual network name using PING of network name and FQDN
2. Verify admin shares From each node, try to access the C$ and ADMIN$ shares of each of the other nodes.
3. Verify network binding order
  • Click Start, point to Settings, click Control Panel, and then double-click Network and Dial-up Connections.
  • On the Advanced menu, click Advanced Settings.
  • In the Connections box, make sure that your bindings are in the following order, and then click OK:
    • External public network
    • Internal private network (Heartbeat)
    • [Remote Access Connections]

The idea is to have the public network as the highest in the binding order.

4. Verify the cluster network priority and ensure that private network is for internal communications only.
  • Start Cluster Administrator.
  • Click the cluster name at the root of Administrator. On the File menu, click Properties.
  • On the Network Priority tab, verify that the private network is listed at the top. If it is not, use the Move Upbutton to increase its priority.
  • Click the private network, and then click Properties.
  • Click to select the Enable this network for cluster usecheck box.
  • Click Internal cluster communications only (private Network).

The idea here is to make sure that the private network only carries internal cluster traffic and that it is listed as the #1 preferred network to carry that traffic.

Refer to http://support.microsoft.com/kb/258750 for this and the previous point.

5. Ensure that all the nodes are in perfect time synchronization with your DC Using NET TIME command you can check (and optionally set) the local time on each node. Failure to synchronize the time on the nodes will cause Kerberos authentication failures which in some cases can cause setup to fail.
6. Verify that there are no ‘special’ characters in resource names Avoid the following special characters in the resource names:

  • >
  • <
  • &
  • (, )
  • +
  • Characters with a dieresis, a tilde, or a space character.

Reference: http://support.microsoft.com/kb/302491

7. Ensure all the physical disk resources in the cluster are online Though this issue primarily has been observed for SQL Server 2005, it may be a good idea to check it for all versions.

Reference: http://blogs.msdn.com/sqlserverfaq/archive/2008/12/17/sql-server-2005-setup-failing-to-install-on-cluster-with-unexpected-failure-while-performing-the-scc-checks.aspx

8. Remote Registry Ensure that the remote registry service is started
9. Service Account User Rights Assignment Ensure that your SQL Server service account has the following privileges:

  • Act as part of the operating system (Microsoft Windows 2000 only
  • Adjust memory quotas for a process
  • Bypass traverse checking
  • Log on as a batch job
  • Log on as a service
  • Replace a process-level token
  • Lock Pages in Memory (Only for SQL Server Service account to use AWE on 32-bit)

Reference: http://support.microsoft.com/kb/283811, http://support.microsoft.com/kb/840219 and http://support.microsoft.com/kb/254321

10. Service Account should be member of local administrators Unfortunately the SQL Server 2000 program mandates this pre-requisite. But if you carefully follow the steps to assign the user rights, NTFS permissions and registry permissions as per the above mentioned KB articles, then you can change the account later on post setup.

SQL Server 2005 specific points

Checklist item Description
1. Verify required services are started The following services must be started on both nodes:

  • Task Scheduler
  • Remote Registry
  • COM+ System Application
  • Server
  • Cryptographic services
2. MSDTC cluster resource requirement If you are installing anything other than the database engine component on Windows Server 2003, then you must create the MSDTC resource in the cluster prior to running setup.

Reference: http://support.microsoft.com/kb/301600

3. Folder structure for installation files If you are installing from a network dump, make sure that the Servers and Tools sub-folders are preserved correctly. For example in the D:SQLServer2005 folder, create the following two subfolders:

  • Servers
  • Tools
4. Remote desktop connections on passive nodes. Except for the node from where the installation is being run, please log off from any Remote Desktop or Terminal Services sessions on all the other nodes. It may be a good idea to log off from the console session as well.
5. Non-administrator Service Account The SQL Server 2005 service account need not be a member of local administrators. As long as it has the user rights (from point #9 in the previous table) assigned to it, it will work without any local administrative privileges. The SQL 2005 setup program automatically grants the necessary permissions and user rights to the service accounts chosen.

Reference: http://support.microsoft.com/kb/283811

Please let me know if you liked this article and if it was useful for your purposes, by rating the article and optionally providing any comments. If you have any questions as well please leave them as comments; I will respond to them!

Howto: Have a Clustered instance of SQL Server listen on multiple virtual IP addresses

Recently a colleague asked me if there were any changes specific to Windows Server 2008 which allow SQL Server (clustered instance) to listen on multiple subnets. There was only one change in Windows 2008 to my knowledge (which I have detailed in this post later on) but that had nothing specific to do with SQL listening on different subnets. I did some testing and have posted the results below.


Network configuration


Assuming the following network configuration on the local nodes, there are 3 networks on this cluster:



  • Cluster_Public: subnet is 255.255.255.0 (used only for mixed communications in the cluster)
  • Second_Public: subnet is 255.255.0.0 (used only for public communications in the cluster)
  • Cluster_Private: subnet is 255.0.0.0 (used only for private communications in the cluster)

clip_image001


SQL Server 2000


This version had the capability to bind to multiple virtual IP addresses. When setting up SQL Server, on the following page, you can specify both the IP addresses. The same page can also be subsequently accessed by re-running the setup program and selecting the Advanced option to Maintain Failover Clustering.


clip_image002

Assuming that setup succeeds, you can check the SQL Server error log (shown below) and verify that the SQL instance is successfully listening on both virtual IPs:

clip_image003

The key to this is actually that the SQL Server resource in the cluster is now dependent on the 2 VIPs:

clip_image004

However, due to a limitation in SQL 2000, we cannot directly do this in the cluster administrator, you always have to use the setup program to do the same. As we will see later on the limitation no longer exists in SQL 2005 or 2008.

SQL Server 2005 / 2008

Things are different and easier in these versions. One option is at setup time, if you want to configure SQL Server to listen on multiple virtual IPs, you need to specify them in the setup program:

clip_image005

However, later on if you want to maintain the cluster and ADD a new virtual IP, all you need to do to have SQL listen on multiple Virtual IP addresses is to follow the steps below:


  • Take the SQL Server network name offline
  • Add an additional  IP resource with the new (additional) VIP for SQL to listen on
  • Add the new IP resource as an additional dependency for the sql network name resource
  • Bring the SQL Server resource online, the errorlog should show sql listening on the 2 IPs:

clip_image006

Windows 2008 cluster, multiple subnets, ‘OR’ dependencies and SQL Server

So, with all this background, what has changed in Windows 2008? Well, it is a question of dependencies and how they affect the dependent resource (in this case SQL Server.) By default, in Windows 2000 and 2003, the failure of any one of multiple virtual IP address resource dependencies for SQL Server, will cause the SQL Server resource to restart. This is because the dependencies by default work in a ‘AND’ mode.

In Windows 2008, there is now support for each node to be on a different subnet (see this MVP article for a nice explanation) due to which support for an OR dependency has been introduced. However, for SQL Server to co-exist in this environment, both the subnets must be up for SQL Server to function correctly. I will cover this topic (SQL 2005 / 2008 on Windows Server 2008 with nodes in different subnets) in a later post.

Concurrency of Full, Differential and Log backups on the same database

NOTE: As of Dec 2020, the information in this blog post should be considered out-of-date, and provided for historical reference only.

Overview

Many a time, we are asked by customers about the effect of overlapping backup schedules: for example, whether they can backup the transaction log at the same time that a long-running complete backup of the database is taking place.  Books Online does talk about the concurrency, but does not cover all versions or combinations.

Here is a short test case to illustrate the concepts involved. At the end of the post, I have provided a summary of what works in what version.

Test Case

I ran the 3 types of backups on a SQL 2005 / 2008 database at the same time:

  • COMPLETE a.k.a. FULL (backup database mydb to disk = ‘c:tempmydb.bak’ with init)
  • DIFFERENTIAL (backup database mydb to disk = ‘c:tempmydb_diff.bak’ with differential, init)
  • LOG (backup log mydb to disk = ‘c:tempmydb.trn’ with init)

Here is the output of each command for verbose reference. Also included immediately below is a snapshot from sysprocesses showing the blocking of the DIFFERENTIAL backup and the concurrent execution of the LOG backup.

spid blocked waittype lastwaittype waitresource cmd
61 0 0x0062 ASYNC_IO_COMPLETION BACKUP DATABASE
61 0 0x0086 BACKUPBUFFER BACKUP DATABASE
61 0 0x0087 BACKUPIO BACKUP DATABASE
62 61 0x0004 LCK_M_U DB: 16 [BULKOP_BACKUP_DB] BACKUP DATABASE (this is the differential backup)
68 0 0x0087 BACKUPIO BACKUP LOG
68 0 0x0087 BACKUPIO BACKUP LOG

Output from FULL backup

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Processed 17512 pages for database ‘mydb’, file ‘mydbData’ on file 1.

Processed 1 pages for database ‘mydb’, file ‘mydbLog’ on file 1.

BACKUP DATABASE successfully processed 17513 pages in 18.986 seconds (7.206 MB/sec).

SQL Server Execution Times:

CPU time = 70 ms,  elapsed time = 20256 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Output from DIFFERENTIAL backup

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

Processed 32 pages for database ‘mydb’, file ‘mydbData’ on file 1.

Processed 1 pages for database ‘mydb’, file ‘mydbLog’ on file 1.

BACKUP DATABASE WITH DIFFERENTIAL successfully processed 33 pages in 0.321 seconds (0.801 MB/sec). <– actual time for the differential backup to complete

SQL Server Execution Times:

CPU time = 51 ms,  elapsed time = 19777 ms.   <– total time to execute the command, including wait and runnable time

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Note that the time taken by the command is just 0.321 seconds but the elapsed time is 19.777 seconds, which clearly demonstrates the blocking.

Output from LOG backup

SQL Server parse and compile time:

   CPU time = 0 ms, elapsed time = 0 ms.

Processed 3 pages for database ‘mydb’, file ‘mydbLog’ on file 1.

BACKUP LOG successfully processed 3 pages in 7.958 seconds (0.002 MB/sec).

SQL Server Execution Times:

CPU time = 60 ms,  elapsed time = 15106 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

Time taken by command is around 52% of time elapsed, which is probably due to the fact that we are on a single-CPU system and the task had to wait for runnable time on the CPU.

Summary

Here is the summary of the testing I conducted. I hope the summary is useful to all of you.

What is requested?

What is already running? (rows below)

Complete

Differential

Log

Complete backup is running

Blocks in all versions *

Blocked in all versions *

Concurrently allowed in SQL 2005 / 2008. Not allowed in 2000.

Differential backup is running

Blocks in all versions *

Blocked in all versions *

Concurrently allowed in SQL 2005 / 2008. Not allowed in 2000.

Log backup already is running

Concurrently allowed in SQL 2005 / 2008.

Concurrently allowed in SQL 2005 / 2008.

Blocked in all versions *

* Note: wait type of the blocked process will be LCK_M_U with a waittype of BULKOP_BACKUP_DB