So, your application was a success. You're getting oodles and oodles of traffic. Only, the data is piling up faster than you had anticipated, and after a bit of calculation you've determined that if present trends persist, you'll have 24 TB of data within a year. The catch is, you really don't need data more than 3 months old in the main OLTP tables.

Question: How do you archive or nuke that older data without impacting performance? Trying to delete vast amounts of data all at once will lock your table, which will block threads, which will stall your application and cause you all sorts of grief.

Enter the 'nibbling' delete - a trick I came up with to archive oodles (0.3-10 million rows/day) of 'traffic' data on a very busy server. In my situation, no clustered index existed along the 'date' domain of the entity in question, so taking that many row-level locks for a ranged, date-based delete would have brought the server to its knees (well, it would have hiccuped for a few seconds at least - and I couldn't afford ANY interruptions).

The Main Concept: SQL Server can, and will, do a fine job of getting rid of a few rows at a time. It's when you try to remove 40k or 4 million rows in a gulp that it gets cranky. What we need is a simple way to 'spoon feed' it a few rows to nibble on at a time; when it's done with those rows, it can have a few more. Happily, SQL Server lets you set up your own 'nibbling' operation with some simple logic and the assistance of the WAITFOR statement. As follows:

DECLARE @count int
SET @count = 2000

-- Cap the number of rows any statement in this session touches
SET ROWCOUNT 2000

-- Keep looping as long as each pass deletes a full batch
WHILE @count = 2000 BEGIN

    DELETE FROM myBigTable
    WHERE someCondition = 1    -- whatever identifies the rows to purge

    SELECT @count = @@ROWCOUNT        -- rows actually deleted this pass
    WAITFOR DELAY '00:00:00.200'      -- pause 200 ms so queued work can get through

END

-- Reset, or every later statement in this session stays capped at 2000 rows
SET ROWCOUNT 0

Simple, huh? The WHERE clause can be as simple as something like WHERE recordedDate < @30DaysAgo. However, you may find that spooling all of those results is too expensive, so you could do something like:

DECLARE @count int
SET @count = 2000

SET ROWCOUNT 2000

WHILE @count = 2000 BEGIN

    -- Pick the next batch of keys with a cheap, dirty read, then delete by key
    DELETE FROM myBigTable
    WHERE targetID IN
        (SELECT TOP 2000 targetID
         FROM myBigTable WITH (NOLOCK)
         WHERE something = somethingElse)    -- your archive condition

    SELECT @count = @@ROWCOUNT
    WAITFOR DELAY '00:00:00.200'

END

SET ROWCOUNT 0

And that's it. Forcing SQL Server to limit the number of 'row operations' (by setting the ROWCOUNT) stops it from gobbling up oodles and oodles of locks, but lets it keep churning and churning until all of the targeted rows are deleted. Giving SQL Server 2000 0.2 seconds to 'think' between operations gives it enough time to let any operations that queue against your nibbling delete pass through without much of a wait (on a heavily trafficked system some operations WILL queue behind yours, but the calling application can't really tell the difference, as they never take more than 0.4 seconds to complete).
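If it helps to see the date-based case end to end, here's the same loop with the cutoff from the WHERE clause above spelled out (a sketch; @30DaysAgo, recordedDate, and myBigTable are illustrative names):

-- A concrete, date-based version of the nibble loop (illustrative names)
DECLARE @30DaysAgo datetime
SET @30DaysAgo = DATEADD(day, -30, GETDATE())

DECLARE @count int
SET @count = 2000

SET ROWCOUNT 2000

WHILE @count = 2000 BEGIN

    DELETE FROM myBigTable
    WHERE recordedDate < @30DaysAgo    -- only rows older than the cutoff

    SELECT @count = @@ROWCOUNT
    WAITFOR DELAY '00:00:00.200'

END

SET ROWCOUNT 0

Computing the cutoff once, up front, also keeps every pass of the loop working toward the same fixed boundary.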

Of course, in SQL Server 2005 the SET ROWCOUNT approach is on its way out (it's deprecated for use with INSERT, UPDATE, and DELETE). Instead you'll just need to use the TOP operator, which now accepts variables:

DECLARE @target int
SET @target = 2000
DECLARE @count int
SET @count = @target

WHILE @count = @target BEGIN

    DELETE FROM myBigTable
    WHERE targetID IN
        (SELECT TOP (@target) targetID    -- TOP with a variable requires parentheses
         FROM myBigTable WITH (NOLOCK)
         WHERE something = somethingElse)

    SELECT @count = @@ROWCOUNT
    WAITFOR DELAY '00:00:00.200'

END

And frankly, that's a bit cleaner anyhow (i.e. WHERE myBigTable.targetID IN (SELECT TOP (n) targetID FROM myBigTable)).
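For that matter, SQL Server 2005 will also let you hang the TOP directly off the DELETE and skip the subquery entirely. A minimal sketch of that variant, using the same illustrative table and a date-based condition:

DECLARE @count int
SET @count = 2000

WHILE @count = 2000 BEGIN

    -- TOP on the DELETE itself; no subquery (or SET ROWCOUNT) needed
    DELETE TOP (2000) FROM myBigTable
    WHERE recordedDate < DATEADD(day, -30, GETDATE())

    SELECT @count = @@ROWCOUNT
    WAITFOR DELAY '00:00:00.200'

END

(Note that TOP in a DELETE doesn't accept an ORDER BY, so each pass removes whichever qualifying rows SQL Server reaches first - which is fine for this kind of purge.)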

I'll post a bit more about this later - some other tips and tricks, and ways to handle tables with FK constraints (specifically with CASCADE enabled).
