Monday, July 27, 2015

Logical Query Processing Order

One of the interesting things that I have learned during my 70-461 review is the Logical Query Processing order. I have been writing 'SELECT' statements for a while but it's only now that I really understood the order in how the statement is actually interpreted. So, a typical SELECT statement goes like this:

SELECT country, YEAR(hiredate) AS yearhired, COUNT(*) AS numemployees
FROM HR.Employees
WHERE hiredate >= '20030101'
GROUP BY country, YEAR(hiredate)
HAVING COUNT(*) > 1
ORDER BY country, yearhired DESC;


Before, I thought the main query clauses were processed in the order they were entered (also known as the "keyed-in order"):
1. SELECT
2. FROM
3. WHERE
4. GROUP BY
5. HAVING
6. ORDER BY

No wonder I used to get the occasional Invalid column name '<insert your bad column name here>'. That's because I didn't understand the logical query processing. The conceptual interpretation order actually goes like this:

1. FROM
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY

Whoa! Wait, so the SELECT statement is actually processed as the last (if no ORDER BY clause is present)?? Yes! I actually asked the same thing. Each of the steps above (phases) operates on one or more tables as inputs and the result is a virtual table that becomes that input of the next step. It's almost similar to method chaining. That explains why we get the Invalid column name error if for example, we used an alias defined in the SELECT clause in the WHERE clause. This is because the WHERE clause is actually evaluated before the SELECT clause, so it doesn't know anything about the alias yet.

No comments:

Post a Comment