SIDE NOTE - alot of these posts are going to ultimately sum up to "This doesn't really add value", or "I didn't find anything hugely interesting here". In order to save you, the faithful reader, some time, I'll try and tell you up front whether you're going to walk away a better person after reading a particular post.
In this post, I'm laying the groundwork for the next post, providing some background information. Probably useful.
A big piece of SQL performance involves how a query is processed logically. In a nutshell, the SQL Server compiler will parse a query into multiple pieces, and construct a series of "Virtual Tables" at each stage of the processing. It's useful knowing how this works - a few reasons being:
1) There are certain SQL rules that are pretty confounding, but make sense in light of how SQL Server is processing the query. For example, "why can't I use the 'MAX' aggregate function in my WHERE clause?"
2) Performance - if you can move certain processing to a logical step that will cut down the processing required by subsequent steps, you can improve performance
3) Weird Results - sometimes, things are included in a result set, when they don't seem like they should be. For example, sometimes the "GROUP BY" clause will toss in a result that seems like it should have been excluded by a WHERE clause.
There's more to this topic than I want to cover here - I didn't see anything on the web, but here's the book if you're interested:
So one of the big surprises for me was that, for each table you JOIN in a query, SQL Server does a Cartesian Join. Basically, for 2 tables, this means that the number of rows will be:
[# of rows in Table1] x [# of rows in Table2]
If you've got big tables, and queries that need to join alot of them at once, this seems like it would get pretty ridiculous within SQL Server? How does it manage to work with datasets this big? I've looked around on the web a bit for some sort of explanation, but haven't found anything.
My next post is going to contain a test I ran to try and illustrate the performance implications of this Cartesian join. I think I'll also post something to http://stackoverflow.com/ and see if someone there has an answer. Stay tuned...
Well, it looks like I got my answer!
ReplyDeletehttp://stackoverflow.com/questions/2328161/sql-server-logical-query-processing-how-does-it-manage-the-huge-datasets
Based on this, I'm not going to waste everyone's time with my "performance tests" :) - the virtual query processing is just that - "Virtual". By the time it gets to actually executing the query, alot of the "theory" (including Cartesian joins between all tables) - has been optimized out.