Dirty Records Recap
Most SQL Devs and DBAs people know what a “Dirty Record” or an “Intermediate record” is. In case this is new to you here is a very quick and simple explanation. Let’s follow along with the basic steps involved in updating a record in a table. When the insert runs, SQL Server gets the data from storage, such as a hard drive, and loads it into memory and the CPU. The data in memory is changed and then saved to the storage device. Finally, a message is sent confirming the rows that were affected.
For every data change to a table there is a brief moment where the change is made in this intermediate state, but is not committed. During this time, any other DML statement (SELECT, INSERT, DELETE, UPDATE) needing that data must wait until the lock is released. This is a safety feature put in place so that SQL Server evaluates only official data.
Locking and Concurrency
Some transactions may take a longer period of time and then rollback. In other words, the changes never become official data and in essence never took place.
Example: EmployeeID 1 was really hired in 1989, so while this update statement is running for fifteen seconds, there is dirty data (hire date of 1992) residing in the intermediate state, which is then rolled back leaving the original hire date of 1989 as the committed value.
SET HireDate = ’1/1/1992′
WHERE EmployeeID = 1
WAITFOR DELAY ’00:00:15′
Question: If, during those fifteen seconds, a 401K vesting program ran the query below looking for all the employees hired before 1990, would it overlook EmployeeID 1?
WHERE HireDate < ’1/1/1990′
A great deal of data can be changed during the intermediate state, but never get committed. This is known as locking, which prevents other processes from making decisions on dirty data. The result is that only committed data is used to isolate transactions. The drawback is that some processes that could run instantly now have to wait. Locking lowers the level of concurrency, which allows software the ability to run many processes at the same time.
Basic Table Hints
The next two examples show how the same query executed first without a table hint and then with a table hint, will return different Retail Price values for the same ProductID record.
By looking closely at the figure on the left, you can see the update statement has been running for one minute 38 seconds and the query on the right was started six seconds later. The committed value of the RetailPrice for ProductID 1 is $75 and the intermediate state of the RetailPrice value is $999. When the transaction reaches completion after 2:30, the price never changed and the $999 value was thrown away (never persisted) and the update was rolled back.
Since the query on the right does not have a table hint, it will wait until the data becomes official. The only downside to this safety feature is that the select query takes much longer to run.
While the update is in the uncommitted state, this query would show ProductID 1 having a value in the RetailPrice field of $999.00, even though this value was never committed to the database. Any query finding $999.00 would be returning false results based on dirty data. By running the same query three minutes later you would not know that the $999.00 value for the RetailPrice of ProductID 1 ever existed.
If you don’t want the query on the right to wait, there are some choices to have it run right away, regardless of the state of the data. If you are fine with the query on the right accessing dirty data, then issuing a locking hint for the query will yield results even if they include dirty records. Using the READUNCOMMITTED table hint allows the query to show uncommitted (dirty) data. The advantage is the query runs much faster. The following code tells the query to run without waiting:
FROM CurrentProducts (READUNCOMMITED)
Queries used for evaluation, such as tracking general trends, commonly use this type of table hint. Since this type of query is simply looking to see how many records are in a table, the data itself doesn’t need to be exact.
The following statements are equivalent and show more than one way of accomplishing the same result. In this case, SQL Server allows the query to run without waiting.
SELECT COUNT (*) FROM CurrentProducts ( READUNCOMMITED )
SELECT COUNT (*) FROM CurrentProducts ( NOLOCK )
The READUNCOMMITED and NOLOCK Table Hints are just two of the many used by SQL Server on how the isolation level for a query can be affected.
This lesson is a condensed excerpt from Chapter 9:
Beginning SQL Joes2Pros: The SQL Hands on Guide for Beginners (SQL Exam Prep Series 70-433 Volume 1 of 5)