posted by | on Uncategorized | No comments

SQL SERVER – CLUSTERED INDEX

August 08, 2012 by rickmorelan

Data Storage Terms

Think of a memory page as a carton of 12 eggs and each egg represents a row of data. If you only had two eggs to store, then only one carton would be needed. There are still 10 slots available for when you get more eggs to store. Since all the cartons are the same size and modifying the carton is not allowed, acquiring more and more eggs will eventually require more than one carton. What happens when the 12th and 13th egg finally arrives? Naturally, you will fill up the first carton with the 12th egg and get another carton for the 13th egg.

Memory Pages

So just how many records will fit into a single memory page? When more records come in, how does SQL Server manage multiple memory pages for each table?

A page can hold up to 8K of data. If, for example, each record used 2K of data, then 4 records would fit into a single memory page. If this table had 12 records, then it would fill 3 complete memory pages. When the 13th record arrives, a 4th memory page is needed to accommodate this new record. Since the physical storage of data impacts the speed and efficiency of queries, this post will briefly explore how indexes can impact the physical location of data and the way SQL Server retrieves data.

Clustered Index Data in Memory

Absent a clustered index or any constraints (e.g., a primary key, a foreign key, etc.), the default order of records is the order in which they were entered into the table. Let’s say we decide to add a clustered index to the table in the figure below based on the SSN field. Regardless of the order in which you insert these records, each record will be stored in order of SSN. If you insert SSN 888-88-8888 first and then later add 222-22-2222, SQL Server would physically reorder the records in storage so they line up by SSN. However if you tell the table the data need to be stored in order by SSN regardless if the entry order then you can Cluster your table in the SSN field.

Page Splits

The clustered index is the placement order of a table’s records in memory pages. When you insert new records, then each record will be inserted into the memory page in the order it belongs. In the figure below we see another new record (Major Disarray) being inserted, in sequence, between Jonny and Rick. Since there is no room in this memory page, some records will need to shift around. The page split occurs when Irenes’ record moves to the second page. Page splits are considered very bad for performance, and there are a number of techniques to reduce, or even eliminate, the risk of page splits.

Primary Key – Clustered or not?

You can create a clustered index on the table on any field you choose. Sometime SQL will create a clustered index for you. Often times the field having the Primary Key makes a great candidate for the clustered index. Therefore, the first two of the three statements shown below place the Clustered Index on the EmpID field:

– The EmpID field will be a Primary Key and Clustered
CREATE TABLE PayRates
(EmpID INT PRIMARY KEY,
YearlySalary SMALLMONEY NULL)
GO

– The EmpID field will be a Primary Key and Clustered
CREATE TABLE PayRates
(EmpID INT PRIMARY KEY CLUSTERED,
YearlySalary SMALLMONEY NULL)
GO

– The EmpID field will be a Primary Key and NOT Clustered
CREATE TABLE PayRates
(EmpID INT PRIMARY KEY NONCLUSTERED,
YearlySalary SMALLMONEY NULL)
GO

This lesson is a condensed excerpt from Chapter 8:
SQL Architecture Basics: Core Architecture Concepts (SQL Exam Prep Series 70-433 Volume 3 of 5)

Welcome to Joes2Pros
Joes2Pros Academy
Joes2Pros Website