Take Caution and Consider well before using RADL or LOGREBUILDING

Article
10/06/2013

The purpose of this post is to discourage the use of RADL or LOGREBUILDING when corruption is encountered without understanding the impact. Unfortunate situations have been encountered where Database Administrators (or “experts” assisting them) have chosen to use them without grasping the effects of that decision.

Sometimes the BACKUP strategy and Backup validation process are insufficient to support your application when corruption is encountered. Such situations continue to arise from time to time. Either because the backed up images are corrupt themselves, or the restore of the backup would imply too great of a loss of data. In such situations, those with responsibility for the data are looking for any/all solutions to correct their
corrupt database situation. Unfortunately, those individuals don’t always understand the consistency needs of the application and can wrongly turn to solutions that work to simply eliminate the physical corruption of the database (like the two tools mentioned here), regardless of the consequences to the application consistency needs.

RADL and LOGREBUILDING are two tools you might encounter when trying to eliminate physical corruption in SQL Server databases. They are sometimes the only tools that will work. When used appropriately under the guidance of professionals, they have potential to eliminate some types of corruption. But they should not be used lightly and should not be used without engaging trained professionals to confirm their appropriate use for any specific corruption handling process. You might think of them as something akin to a chain saw when considering a home carpentry renovation task. In some rare situations they are the ideal tool.
Sometimes they are the only tool that can help. But for most situations, and if at all possible otherwise, try to avoid using them on databases where application consistency is something that is cared about.

So what is meant by the two terms? RADL refers to the phrase “Repair Allow Data Loss” for a parameter that is available through the SQL Server database consistency checking utility (DBCC) to remove physical corruption identified in a database. LOGREBUILDING refers to varying methods for rebuilding the transaction log in a database for the purpose of eliminating signs of corruption there. Transaction log corruption typically prevents recovery from completing successfully and the database from being available. Both RADL and LOGBUILDING work to restore the physical consistency by eliminating portions of the database. The details
of both are intentionally not discussed here to add an additional layer of indirection for their use.

If you actually have physical corruption in a database that you are trying to eliminate, then contact a support expert that is knowledgeable on their use and they can confirm with you the following: Your situation is appropriate for the use of either of them. Most commonly this demands that you have no viable backup that can be used for restore, for any number of reasons.

You understand that there is no guarantee that their use will lead to a successful result (for corruption is a gnarly beast with limitless and untestable variances to be able to reliably predict outcomes)
That you have contingency plans in place (i.e. that you have appropriately backed up everything before attempting their use)
That you acknowledge that there can, and usually WILL, be loss of data
That you understand the relationship between Relational DataBase Management System (RDBMS) consistency (both physical and logical) and Application consistency and how elimination of RDBMS inconsistency {a.k.a. physical corruption} may introduce Application inconsistency and that only Application or Data owners external to the RDBMS will be able to help you assess, identify, and resolve those logical inconsistencies. {More information on this can be found HERE }

DBCC CheckDB is a great utility. It is the best tool available to assess the physical and logical consistency of a SQL Server database, from the perspective of the SQL Server RDBMS.

You can find it documented at these locations:

For SQL Server 2012 - HERE

For SQL Server 2008r2 - HERE

For SQL Server 2008 - HERE

For SQL Server 2005 - HERE

There are three kinds of consistency that applications expect for proper functioning of an RDBMS: Physical and logical consistency in the database, and Application consistency outside of the control of the database. As mentioned above, more information on the layers of consistency can be found HERE. But specifically, in regards to this blog, the point is that SQL Server functionality can only check the first two of these. SQL Server cannot check Application consistency when it is not aware of that information; whether it is metadata about the data or relationships between the data elements.

Physical corruption happens. It happens to all Relational Database Management Systems and on all hardware providers. In rough terms, what does RADL actually do? It finds corruption and makes a ‘best guess’ on what it should do to correct it. It does this in most cases by simply taking out the inconsistency, whether it is a value, or a row, or a page, or an allocation chunk of pages. Think of it as removing chunks of the database. Sometimes, that can be fine. For instance, when the data isn’t mission critical, or where the consistency can be recreated and confirmed somehow by other means. But other times it is extremely bad, for it can undermine the entire confidence in the RDBMS functioning as a reliable datastore and invalidate the credibility of the applications that rely upon that data. It is usually effective whenever it is used (sometimes repeatedly) to eliminate all signs of corruption in a database. But you should always expect, that whenever RADL or LOGREBUILDING are used, that data WILL be lost. And in most cases it will.

Like injuries to your own body, there are limitless variations of corruption: number of files and pages affected, metadata about those pages, bits inside of those pages. But the buckets of corruption types can be
narrowed down to similar classifications for handling. But take note, the assessment of these corrupt states are only best guesses by the DBCC CHECKDB utility to provide insight into what might be going on and sometimes the only a tip of an iceberg can be observed. Just because CHECKDB only shows certain errors doesn’t mean that you can conclude with 100% confidence that only that corruption is present in the database. And this is why experienced database experts are usually restrained in their commitment ahead of time on the following things:

The true and complete extent of a corrupt state and all data and objects involved. It is not always fully known. And sometimes it is not fully knowable. And sometimes the time to understand the full
extent of the corruption exceeds practical limitations.
That a definitive pathway to correction is obtainable. This is a corollary of the preceding. And why one should always be suspicious if any one claims they completely grasp all the details to resolve a corrupt
situation. Slightly different symptoms can give rise to the same final error condition state.
What data will be lost on any cleanup or repair activity. This is the critical piece, but similarly is very hard to state definitively. Thus the importance of contingency planning or backing up before proceeding forward.

Operational Log Books are a good thing. It is valuable to have a persistent record of all changes that are made to a RDBMS and the operating system and hardware that support it. The use of RADL or LOGREBUILDING on a SQL Server database should be recorded in a persistent way for all to know. That can be valuable information for the data consumers, the administrators, and support professionals when working other issues involving that particular database. Their use is not something that should be hidden.

It is impossible to test and develop utilities to address the exact impact of each and every kind of corruption. If all error conditions could be deterministically known, then all corruptions could be resolved and even
prevented. There are High Availability (HA) solutions, however, that can greatly reduce the impact of physical corruption. In the SQL Server technology arena these technologies are all helpful:

Windows Azure – SQL Database

Always On

Database Mirroring

Log Shipping

In each case there are at least two separate instances of the physical files underneath the HA solution. And physical corruption rooted below the operating system is not transferred over into the HA solution. That is, if you get a corrupt file underneath a primary database, then the Always On, or Mirrorred, or Log Shipped secondary HA database will NOT have that corruption transferred to it. With Windows Azure – SQL Database, the corrupt state will never even likely appear. Windows Azure – SQL Database is a RDBMS solution with HA baked right into it. With SQL Server mirroring and Always On there is a feature such that when a corrupt page is detected ON the primary, that it automatically loads back an uncorrupt image of the page from the secondary. You can find more information on that feature HERE.

Without any High Availability solution the response to database corruption can be found in the steps for General SQL Database repair.

The use of RADL or LOGREBUILDING on a SQL Server database almost always implies insufficiency in Disaster Recovery Planning and validation efforts. And also missed exploitation of SQL Server’s various High Availability features. Usually this is as a result of a corruption handling event which are almost always stressful and unpleasant and have much visibility. Leverage the power of that pain to draw attention to the need for increased resources to consider HA solutions, version upgrades, and Disaster Recovery Planning and Validation efforts. This should serve to better avoid additional problems in the future.

RECOMMENDED ACTIONS

1) Test your database backups and restore processes to know they work and will satisfy operational needs if corruption should ever be encountered, to help avoid the need to ever have to use RADL or LOGREBUILDING.

2) Understand RADL/LOGREBUILDING impact. Don’t use RADL or LOGREBUILDING without understanding the impact.

3) Know the Application needs. Grasp the application consistency expectations of applications and users that use the data in the database. This is important if possible,
in order to better understand the impact should RADL or LOGREBUILDING unfortunately need to be used.

4) Contact support before using RADL/LOGREBUILDING. The use of RADL or LOGREBUILDING is risky. When using either of them, you need to engage support professionals to confirm that fact.

5) Backup before using RADL/LOGREBUILDING. Should you ever be given direction to use RADL or LAGREBUILDING always make contingency plans for the possibility that they might make things worse. Always work to backup the files (both online and offline) if possible {even though they have corruption in them} before moving forward with their use. And back them up to a separate disk subsystem.

6) Record the use of RADL/LOGBUILDING If you do use either of them, always record that fact clearly for others to be aware of. You should have a system log book, or such, where critical events can be logged for
awareness by future data users, administrators, and support professionals.

7) Leverage their use for more DRPV resources. Any time RADL or LOGREBUILDING is ever used, that situation should be provided upward as justification for more resources to be devoted to Disaster
Recovery Plan Validation (DRPV). Their use implies a deficient Disaster Recovery Plan. An appropriate backup strategy should be in place to avoid the need to ever use either of these two tools.

8) Consider SQL Server High Availability offerings to help reduce the impact of future database corruptions so RADL and LOGREBUILDING won’t be needed.

Take Caution and Consider well before using RADL or LOGREBUILDING

Additional resources