Tuesday, September 20, 2005 11:10 PM
by
RepeatableRead
Nibbling Deletes - Intro
So, your application was a success. You're getting oodles and oodles
of traffic. Only, it's building up faster than you had anticipated, and
after a bit of calculation, you've determined that if the present
trends persist, you'll have 24 TB of data within 1 year. Only, you
really don't need data > 3 months old in the main OLTP tables.
Question: How do you archive, or nuke that older
data without impacting performance? Trying to delete vast sums of data
all at once will lock your table, which will block threads, which will
stall your application, and cause you all sorts of grief.
Enter the 'nibbling' delete - a trick I came up with to archive
oodles (0.3-10 million rows/day) of 'traffic' data on a very busy
server. In my situation, a clustered index didn't exist along the
'date' domain of the entity in question, so trying to handle that many
locks on a row-level basis (for a ranged, date-based, delete) would
have taken the server to its knees (well, would have hiccuped it for a
few seconds at least - and I couldn't afford ANY interuptions).
The Main Concept: Sql Server can, and will, do a
fine job of just getting rid of a few rows at a time. It's when you try
to remove 40k or 4 million at a gulp that it gets cranky. What we need
is a simple way to 'spoon feed' it a few rows to nibble on at a time.
When it's done with those rows, it can have a few more. Happily, SQL
Server provides a way to set up your own 'nibbling' operation, with the
use of some simple logic and the assistance of the WAITFOR operator. As
follows:
DECLARE @count int
SET @count = 2000
SET ROWCOUNT 2000
WHILE @count = 2000 BEGIN
DELETE FROM myBigTable
WHERE someCondition = true
SELECT @count = @@ROWCOUNT
WAITFOR DELAY '000:00:00.200'
END
Simple huh? The where clause can be as simple as something like
WHERE recordedDate > @30DaysAgo. However, you may find that spooling
all of those results is too expensive, so you could do something like:
DECLARE @count int
SET @count = 2000
SET ROWCOUNT 2000
WHILE @count = 2000 BEGIN
DELETE FROM myBigTable
WHERE targetID IN
(SELECT TOP 2000 targetID
FROM myBigTable WITH(NOLOCK)
WHERE something = somethingElse)
SELECT @count = @@ROWCOUNT
WAITFOR DELAY '000:00:00.200'
END
And that's it. Forcing SQL Server to limit the number of 'row
operations' (via setting the ROWCOUNT) stops it from gobbling up oodles
and oodles of locks, but allows SQL Server to just keep churning and
churning until all of the targeted rows are deleted. Giving SQL Server
2000 .2 seconds to 'think' between operation will give it enough time
to make sure that any operations that queue against your nibbling
delete will pass through without much of a wait (on a heavily trafficed
system some operations WILL queue behind yours, but the calling
application can't really tell the difference as they never take more
than .4 seconds to complete.
Of course, in SQL Server 2005, the SET ROWCOUNT operator has been
retired. Instead you'll just need to the TOP operator, which now
accepts variables:
DECLARE @target int
SET @target = 2000
DECLARE @count int
SET @count = 2000
WHILE @count = 2000 BEGIN
DELETE FROM myBigTable
WHERE targetID IN
(SELECT TOP @target targetID
FROM myBigTable WITH(NOLOCK)
WHERE something = somethingElse)
SELECT @count = @@ROWCOUNT
WAITFOR DELAY '000:00:00.200'
END
And frankly, that's a bit cleaner anyhow (i.e. WHERE mainTable.id IN (SELECT TOP # of ids from the mainTable)).
I'll post a bit more about this later, specifically on some other
tips and tricks, and ways to handle tables with FK constraints
(specifically with CASCADES enabled).