
I think there's a tendency among DBAs and SQL Server wonks to be a bit dismissive of Full Text Indexing - in a lot of cases people tend to think of it mainly in terms of adding significant overhead.

For a long time I actually looked at it as being a more powerful, but more EXPENSIVE, form of searching. Then I had the chance to work with it a good deal (in a past life), and found that it didn't really add that much overhead in a lot of cases. In fact, it turned out to be pretty cool technology that afforded a lot of functionality with no noticeable performance hits. Though I did find that the indexes broke fairly often and needed an extra bit of 'love' to keep them functioning correctly. (This was all on SQL Server 2000.)

But in some cases it can yield insane performance benefits
Fast-forward to today. I'm currently working with a client that has a decent-sized database (about 20GB). More than 5GB of that DB is made up of a 'Books' table - which has about 8M rows. The table has both an author and a title column, and each of them is pretty wide (nvarchar(300) and above). This table provides the backing for key functionality, and users frequently search for authors or titles using LIKE and wildcards. We all know that's nasty - as it forces a scan.
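To give a feel for the pattern, the searches looked something like the following (table and column names here are stand-ins, not the client's actual schema):

-- The leading wildcard rules out an index seek, so SQL Server has to scan
SELECT BookId, Author, Title
FROM dbo.Books
WHERE Author LIKE '%JRR%Tolkien%'
   OR Title LIKE '%Return of the King%'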

These searches were causing a table scan, which was throwing off excessive reads and generating massive amounts of CPU utilization - to the point where you could literally SEE each of these queries show up in Task Manager as the CPU spiked to handle the underlying scan. The image below helps highlight the severity of the problem, with each of the scanning queries represented by a spike:

[Image: PreFullTextIndex - CPU spikes corresponding to each scanning query]

I threw an index against each of the columns, figuring that a search on something like '%JRR%Tolkien%' against such a big table would likely prefer an index scan over a table scan. Both of those indexes required 533MB of disk to create. And, sure enough, SQL Server used those indexes as soon as they appeared - but they only cut the reads to about 1/4th of what they used to be, and CPU utilization stayed about where it had been before, meaning that you could still SEE these queries being executed by their tell-tale CPU spikes.

Enter Full Text Indexing
I figured that since Full Text Indexing actually tokenizes index data, instead of merely storing it in a B-Tree, we'd end up with a much smaller index structure. I also hoped that, if we could get a smaller index to work from, we'd see a commensurate decrease in the amount of reads/churn on the server.

Creating a Full Text Index on 8M+ rows took a fair while. But I was happy to see that it was only 133MB (compared to the 533MB of the 'traditional' indexes). That made me hopeful that we'd at least see around a 1/4th reduction in the amount of reads.
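For reference, setting one up on SQL Server 2005 boils down to something like the following - with the caveat that the catalog name and the unique key index name (PK_Books) are assumptions on my part, not the client's actual object names:

CREATE FULLTEXT CATALOG BooksCatalog
GO

-- Full text index over the two wide columns, keyed off the table's unique PK index
CREATE FULLTEXT INDEX ON dbo.Books (Author, Title)
    KEY INDEX PK_Books
    ON BooksCatalog
    WITH CHANGE_TRACKING AUTO
GO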

Turns out I got way more than I had anticipated. The actual execution plan for the query using LIKE and % was consistently coming in at a cost of around 27 or 28, and each query was taking 5-7 seconds. (Yeah, that almost puts it into 'query from hell' territory - definitely not something you want people firing off in an ad-hoc fashion, which is exactly how this particular query NEEDS to be used.)

With the full text index, the actual execution plan cost drops down to .27 - fully 100 times cheaper than its traditional counterpart. Execution time dropped to 0 seconds (in Management Studio) as well. In past cases where I had used Full Text Indexing, I was only dealing with tables of fewer than 300K rows. Suffice it to say I was pretty tickled at the insane performance boost on an 8M row table with semi-wide columns (author and title) being indexed.
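For comparison, the full-text version of the search swaps LIKE for CONTAINS - again, object names here are illustrative:

-- Token-based search against the full text index instead of a wildcard scan
SELECT BookId, Author, Title
FROM dbo.Books
WHERE CONTAINS(Author, '"JRR" AND "Tolkien"')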

So, keep those insane performance benefits in mind the next time you're tempted to be a bit dismissive of the 'overhead' that Full Text Indexing might add to your environment.

I'm with Stephen Wynkoop on this one - I'm not sure I'm too hip on Microsoft's emerging direction to divide SQL Server Tools into Developer and Admin Roles.

Stephen just posted an editorial based on an interview that he had with Matt Nunn, and points out that Matt confirmed that Microsoft is trying to divide tools along discipline lines - wherein developer-related tools will live in Visual Studio (VS) and DBA-related tools will live in Management Studio (MS).

Stephen does a good, non-inflammatory review of the pros and cons - but effectively ends with the opinion that he doesn't think the split works for mixed-hat DBA/Developers.

I think he's spot on. Mostly because I think there are VERY few DBAs out there who don't see some part of their job as related to writing, modifying, or tuning code. Even if it's just a question of creating spiffy queries based on dynamic management views, most DBAs are going to want full-powered development IDEs/tools at their disposal to quickly pound out scripts that they'll likely end up saving and occasionally re-using.

I'd say give developers powerful tools as planned, give DBAs the additional 'DBA'-related tooling, and then give DBAs the option to install the additional tools and support when they install Management Studio on their own boxes. Just make it an option. Otherwise I can really see myself switching to a third-party tool.

If you do a lot of scripting, SP2 for SS 2005 is going to be a must-have. Luckily it was just released.

Here are some of the BIG improvements for people who like to script a lot:

  • Generate Script Wizard. You can now specify that the scripted objects include a DROP statement before the CREATE statement.  Scripting of objects into separate files is now possible. This feature is supported in SQL Server 2005 Express Edition SP2.
  • Fewer connection prompts when you open Transact-SQL scripts. Management Studio determines the connection of the current context. This feature is supported in SQL Server 2005 Express Edition SP2.
    [Taken from the What's New document - emphasis added]
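For what it's worth, the DROP-before-CREATE option spits out scripts shaped roughly like this (the sproc here is just a made-up example, not something the wizard ships with):

IF EXISTS (SELECT * FROM sys.objects
    WHERE object_id = OBJECT_ID(N'[dbo].[GetBooksByAuthor]') AND type IN (N'P', N'PC'))
DROP PROCEDURE [dbo].[GetBooksByAuthor]
GO

CREATE PROCEDURE [dbo].[GetBooksByAuthor]
    @Author nvarchar(300)
AS
    SELECT * FROM dbo.Books WHERE Author = @Author
GO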

Those are some HUGE improvements. I've complained before about not being able to script objects into separate files - not having it was just unpardonable. And I've wondered, sooo many times, why opening .sql files had to be so painful - glad to see that MS put some cycles into fixing that.

There are a number of other big improvements too, but these were the ones that I was most excited about.


One of my clients recently had a problem where a query with a semi-measly execution cost of .3 was taking 25 seconds to return. Worse, forcing the query to use a suitable index (instead of the PK with accompanying bookmarks) had a cost of 4.8 and returned instantly. Moreover, the query taking 25 seconds was only iterating over a couple of small tables (< 70k rows each) on decent hardware. There was really NO reason for the query to take 25 seconds - yet it was consistently doing so.


To troubleshoot I did the following:
a) updated statistics explicitly on the table in question (nightly statistics updates were already in place)
b) rebuilt indexes to remove fragmentation.
c) corrected/updated storage info using DBCC UPDATEUSAGE. (Usage stats were pretty out of whack - the table had a number of indexes showing that they were only 2-4 pages in size, while they were actually 200-400 pages in size. Imagine my disappointment when updating this didn't have ANY impact on that phantom 25 seconds.)
d) recompiled the offending sproc
e) double-checked to ensure that data type coercion issues weren't hosing an index or something similar...
Then I finally remembered (with help from Wally) a similar problem I'd run into a few years earlier with hyper-threading (HT) and some unexplained proc problems. As a simple test, I tweaked the single SELECT within the sproc to use OPTION (MAXDOP 1) to see if HT might be the issue - and the sproc returned instantly after a recompile. At that point I knew that I either had HT processors, or a nasty, ugly bug in the way parallelism was being handled. A few minutes later confirmation came back: HT processors. (Note, too, that the SQL Server in question has been patched for HT support.)
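In rough strokes, the checks (and the MAXDOP test that finally cracked it) looked like this - object names below are placeholders rather than the client's actual schema:

-- (a) Refresh statistics explicitly on the table in question
UPDATE STATISTICS dbo.Schedules WITH FULLSCAN

-- (c) Correct the out-of-whack usage/storage numbers (0 = current database)
DBCC UPDATEUSAGE (0, 'dbo.Schedules')

-- (d) Force the offending sproc to recompile on its next execution
EXEC sp_recompile 'dbo.GetSchedule'

-- The telling test: cap the single SELECT inside the sproc at one CPU
SELECT s.ScheduleId, s.StartDate, s.EndDate
FROM dbo.Schedules AS s
WHERE s.TemplateId = @TemplateId   -- sproc parameter
OPTION (MAXDOP 1)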

Solution
So, HT procs... I can easily disable hyper-threading in the BIOS. Trouble is that HT is nice on smaller boxes (I think the server in question only has 2 physical procs). But when it occasionally adds a phantom 25 seconds to a core query... that makes keeping it hard. Not sure what I'll recommend at this point, but it looks like HT might have to go.

Just an FYI that my latest product review for SQL Server Magazine is in the February Edition. (My February edition only just BARELY showed up in the mail...).

In this review I take a look at Quest Software's Spotlight on SQL Server 5.7. It's a decent solution that can be really beneficial for production DBAs tasked with monitoring lots of critical servers. Quest has also done a great job making the interface so intuitive that you could easily plunk the software in front of a non-technical user and have them call a DBA whenever an alert or alarm went off. (The UI uses simple color coding: green = good, yellow = warning, and red = components/sub-systems that need expert lovin' and attention.)


It's almost a rule of thumb to avoid indexing strategies on tables with less than a few thousand rows. The reason, of course, is that SQL Server will frequently just ignore indexes on 'tiny' tables - preferring instead to perform table scans.

However, in environments with heavy load, and with borderline memory bottlenecks (i.e. heavy memory pressure), indexing can prove VERY beneficial, even on absolutely puny tables (as in less than 600 rows).

During some recent performance monitoring/tuning for a client with a 160GB+ database which typically has > 2400 users on it at any given time, I noticed that two queries tied to scheduling and custom template 'goodness' were generating more than their fair share of READS. (These two queries accounted for 17% of all calls on the system, and > 35% of all of the reads on the system.) As is typically the case, I have NO control over the underlying code (it's a third-party system), but still need to keep things performant.

A big problem was a bookmark lookup against a 'small' table with roughly 120k rows in it, which was generating a very large number of reads. By tweaking the existing clustered index to cover the query better, I was able to reduce the total cost of both queries significantly. Then I noticed a number of other bookmark lookups, and decided to take a stab at them. When all was said and done, I was able to cut the cost of both queries to roughly HALF of what it was before. Granted, a sizeable portion of that came from the covering index on the 'small' table, but the covering indexes on the 'tiny' tables helped out as well.
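As a rough sketch of what a covering index on one of these 'tiny' lookup tables amounts to (table and column names are illustrative - the real schema belongs to the third-party system):

-- Every column the query touches lives in the index key, so the bookmark lookup
-- (and the reads it generates) simply disappears.
CREATE NONCLUSTERED INDEX IX_TemplateTypes_Covering
    ON dbo.TemplateTypes (TemplateTypeId, IsActive, TypeName)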

Empirical speak:

Original Cost of both Queries:
.176

Added Covering Index to 'small table' (~120k rows):
.140

Covering Index also added to a <500 row table:
.117

Covering Index also added to a ~600 row table:
.114 (not much of a boost)

Covering Index also added to a table with a whopping 364 rows:
.099 (nice boost!)

The beauty (other than seeing what amounted to a 50% reduction in overall cost) is the fact that indexes on such piddly tables are almost free - they certainly won't be taking up much space at all. (These were just standard/normal lookup tables - nothing hideous/super-wide).

So, keep micro-indexing in mind. Without micro-indexing I would have seen a drop from .176 down to .140 - a respectable improvement on such a heavily used system. But with micro-indexing I gained a significant additional performance boost.

Quick heads-up on a recently published article I did for SQLTeam.com. The article examines SQL Server Integration Services' Fuzzy-Lookup transformation for use in data-cleansing routines.

If you're interested, head over to the article - there's even a sample app you can use to take a quick look at how easy SSIS makes it to add robust data-cleansing to your ETL projects.


If you're into SQL Server Reporting Services, but want better/richer functionality in Word and Excel - then zip out and skim my article at the ASPAlliance: Using SoftArtisans' OfficeWriter with SQL Server Reporting Services 2005.

The article provides an overview of what OfficeWriter brings to the table, and includes a number of quick-and-dirty samples showing how you can EASILY leverage 'native' Office functionality within your SSRS Reports.

If it doesn't show in the article, I'm very impressed with the functionality that OfficeWriter provides - and with how cleanly it integrates with SQL Server Reporting Services.

Disclosure: I didn't get paid, compensated, or anything for writing this article. Heck, I didn't even get a free copy of the software - though I should have, it's great stuff. (SoftArtisans contacted me a while back to see if I'd be interested in looking at OfficeWriter after I posted some SSRS related stuff on the SQLAdvice.com lists. I took a peek at the product, LIKED what I saw, and the rest is a terribly booooring article to anyone who's not interested....)


ORDER BY has never been technically 'allowed' in a view definition. For example, the following code:

CREATE VIEW dbo.[UsersByFirstName]
 AS
     SELECT * FROM dbo.[users] ORDER BY [FirstName]
GO

will throw an error if you attempt to run it. Books Online for SQL Server 2000 says:

The ORDER BY clause is invalid in views, inline functions, derived tables, and subqueries, unless TOP is also specified.

Accordingly, people used to rely on the TOP 100 PERCENT hack - something I never really liked, because it typically ended up being seriously abused with lots of nested queries and could cause some very gross performance issues. But the HACK looked like so (and had pretty decent industry acceptance):

CREATE VIEW dbo.[UsersByFirstName]
 AS
     SELECT TOP 100 PERCENT * FROM dbo.[users] ORDER BY [FirstName]
GO

The problem, though, is that the ordering was never technically guaranteed to work. It just happened to work ALL the TIME in SQL Server 2000. Not so with SQL Server 2005 - where it actually makes the list of breaking changes:

In SQL Server 2005, the ORDER BY clause in a view definition is used only to determine the rows that are returned by the TOP clause. The ORDER BY clause does not guarantee ordered results when the view is queried, unless ORDER BY is also specified in the query itself. [1]

I imagine some people might be bummed by that... but it's really a non-issue. Technically speaking, ORDERING should be done on a query-by-query basis - it's hard to argue that it's not intrinsically related to presentation. (The only argument would be if you were SELECTing the TOP n of something - which will still be honored by SQL Server as part of a VIEW... it's just that the ordering of the resultant rows will still be the responsibility of the outermost query [2].)
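In other words, if you need ordered results from the view above, just say so in the query that consumes it:

SELECT * FROM dbo.[UsersByFirstName] ORDER BY [FirstName]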

References:
[1] - http://msdn2.microsoft.com/en-us/library/ms143179(SQL.90).aspx
[2] - http://blogs.msdn.com/sqltips/archive/2005/07/20/441053.aspx

Index Tuning is something that I _really_ enjoy - mostly because I just love seeing performance improvements. It's always exciting to see tangible improvements in existing systems.

Recently, while looking into a query that was causing a lot of reads for one of my clients, I determined that while the query was making very good use of index seeks, it was still doing some bookmark lookups - expensive ones that were accounting for nearly 18% of all reads on the client's system. So a covering index seemed the logical choice. Figuring out what's needed for a covering index only takes a bit of deciphering, but I've recently noticed that SQL Server 2005 makes it tons easier to figure out which columns need to be accounted for.

Check out the following screen cap of the sample query I was working on from SS 2000's Graphical Estimated Execution Plan:

It shows that the bookmark operation is taking 50% of the overall effort of the query. And that's about it.

Now look at this screencap from SS 2005's Graphical Estimated Execution Plan:

Spiffy eh?

Seriously, talk about a nice touch.
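(And if you'd rather get at the same kind of information with a query instead of the graphical plan, SQL Server 2005 also surfaces its missing-index suggestions through DMVs - a quick sketch:)

-- Columns SQL Server 2005 thinks would help as index keys or as covering 'includes'
SELECT d.[statement] AS table_name,
       d.equality_columns,
       d.inequality_columns,
       d.included_columns
FROM sys.dm_db_missing_index_details AS d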

 

I love PostXING - now I've just configured it to work with the CS 2.0 upgrade that just happened on SqlAdvice.

(I'm actually using the latest version of PostXING - right out of SVN - but it's working great, and setup was a breeze.)

Setup was simple. I just put in my username and password, pointed the host at sqladvice.com, and configured the metaweblogapi to point to /blogs/metablog.ashx.

Now I'm in.

Heck, I might even start blogging a bit more now ;)

VPN Software should come with big warning labels: "WARNING: Does not play well with others."

As a consultant I frequently need to gain access to remote networks via VPN. This means I'm frequently hosing myself by installing yet another VPN client.
That, and I hate connecting to the vast majority of VPNs because they hose my network settings, making it impossible to send/receive email and do other 'normal' things while connected.

So, I recently began toying with the perfect solution: Virtualization.

This approach works well using either Microsoft's Virtual PC or VMware's Workstation - though the performance in VMware really is that much better. (Even if I DO hate the way it steals the volume control on my box and resets it to the middle of the road each time the VM boots.)

But.. here's the solution:
1) Build out a VM/VPC. (You'll need a valid license/copy of Windows - I use XP).
2) Load SQL Server Client Tools on to the box.
3) Install your VPN client software on this box too.

Since I make heavy use of Red Gate's SQL Compare/Data Compare tools, I throw those on my VM / VPC as well. (I contacted them about this regarding licensing, and their response was that the SQL comparison tools are licensed per user (though the legal agreement during the install process makes it look a bit different)). In other words, a licensed user is free to install their copy of the software on as many machines as they like - as long as they are the ONLY user of the software. (Which is excellent, because not having the ability to do this was TOTALLY going to be a deal breaker for me - i.e. not being able to install my Red Gate tools was going to make this whole VM/VPC idea no fun. (Yes, I'm that much of a baby - and yes, the tools are that good).)

With all of this in place, you've now got a virtualized SQL Client that can connect to remote networks via VPN all day long - as much as needed and:
1) No interruption to your main/primary/host machine. (It has no clue that any VPN stuff is going on at all, since it's all taking place on the virtual machine).
2) No more bloat on your main box from all of those clients.
3) If something breaks, you can just restore a previous snapshot or copy of the VM / VPC and you're good to go.
4) If the VM just starts getting cluttered/gummed up, there's no need to repave - just revert to a 'last-known-good' configuration and all the changes you've made over the past little while are gone.

Frankly, spiffy. (There are a few hiccups/nits, such as moving files/data back and forth between the virtual machine and your host - but I'd say there's 95% less headache and worry than willy-nilly slapping VPN clients on your primary box (since the damned things RARELY uninstall correctly).)


I recently did some book reviews for the ASP Alliance (which will be throwing up a SQL Server Reporting Services portal in the near future).

The reviews are for (links are to the reviews):
Hitchhiker's Guide to SQL Server 2000 Reporting Services
and
The Rational Guide to SQL Server Reporting Services

The reviews were easy to do as both books are excellent in their own spheres. Honestly, if you could only ever own two SRS books these would be excellent choices.

As a consultant, it's not uncommon for me to need to rewrite a poorly performing query or sproc without a full understanding of the underlying biz rules or context. Improved performance is the stated goal, but preserving data fidelity is implicitly critical.

Enter unit testing - something many of us rely upon heavily during 'normal' development at the application level. With sprocs/queries I've found two approaches that work well for unit testing.

1) Text Files: CTRL+SHIFT+F is your friend. Use it to 'burn' query results in either Query Analyzer or Management Studio to a .txt file, instead of sending results to Text/Grid. Just make sure to evaluate the parameters of the targeted query/sproc, and burn a few result sets to disk as .txt files. (Make sure to keep track of which params correspond to which result sets.) Then tweak your sproc/query as needed, rerun the same params, and output to the corresponding text files. At that point, you can just use a simple differencing application (I prefer WinMerge - free and awesome) to verify that your changes haven't impacted the logic.

2) SQL Data Compare: Red Gate is your friend. (This is the approach I typically use.) Instead of 'burning' results to .txt files, 'burn' them to 'temp' tables in the DB. (By 'temp' I mean dbo.unittest__12a or something similar - not #temp tables, as I need the tables to stick around for use by the Red Gate tool.) Just make sure that each 'temp' table has a PK, and that you've created as many as needed to fully validate all of your parameters - then tweak the proc/query as needed, and use SQL Data Compare to compare all of the tables used for your unit test. I prefer this route as it provides a very VISUAL way to quickly compare results.
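A minimal sketch of approach 2 - the sproc, table, and parameter names here are hypothetical, so substitute your own:

-- 'Before' snapshot: capture the current sproc's output for one parameter set
CREATE TABLE dbo.unittest__orders_before
(
    OrderId int NOT NULL PRIMARY KEY,
    Total   money NOT NULL,
    Status  varchar(20) NOT NULL
)

INSERT INTO dbo.unittest__orders_before (OrderId, Total, Status)
EXEC dbo.GetOrdersByCustomer @CustomerId = 42

-- Tweak the sproc, capture an 'after' snapshot into an identical table
-- (e.g. dbo.unittest__orders_after) using the same parameters, and then point
-- SQL Data Compare at the two tables.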

Paolo has started up his own blog on SqlAdvice. Make sure to check it out and subscribe. The focus of his blog will be experimental SQL technologies, including AI and self-generating SQL entities. Should be a great ride.