The writing of stored procedures is a very common task in todays database world. Not only by database developers, but also application developers are writing the procedures. DBA’s also need to write procedures. In fact, I have faced a very common question in my interviews: ” How many stored procedures have you written?”
Since the world is going for fat servers (a server which runs an application where most of the program code resides on it rather than on the client computers), stored procedures are gaining more and more importance. Here is my humble try to give some tips to write good procedures. I hope this will add value for novice as well as experienced programmers.
1. Keywords – Use SQL keywords in capital letters to increase readability. Also use proper indentation to increase readability.
2. SQL-92 – Always try to use ANSI 92 syntax. Till now the old syntax is working the old syntax will be deprecated in the next release of MS SQL server.
As an example, for joining, use
SELECT
*
FROM
employee e1
INNER
JOIN
employee _dtl e2
ON
e1
.
id
=
e2
.
id
Instead of
SELECT
*
FROM
employee e1
,
employee_dtl e2
WHERE
e1
.
id
=
e2
.
id
3. Variables – Use as few as possible variables. It frees spaces in cache.
4. Dynamic Queries – Try to minimize the usage of dynamic queries. If you are using a dynamic query like: SELECT * FROM mydb.dbo.emp where empid = @eid then there is no problem. You can supply a value for the @eid parameter and there is no recompilation of the execution plan in the database cache. But if you are using a SQL query like SELECT * FROM emp where empid = ” + @eid and supply a parameter (say 100), then the cache will keep the execution plan for the value of 100 only. If the id changes (to say 101), it will recompile the statement. Hence, this approach is slower than the previous one. (You can get the exact value of the SQL statement from Profiler)
5. Fully Qualified Names – Always use the fully qualified name when calling stored procedures. This would be the format database_name.schema_name.table_name. For example, use EXEC master.dbo.Your_Proc_name instead of EXEC Your_Proc_name This is a very common mistake, which causes an extra trip to the procedure cache to get the execution plan for execution. Also try to use the schema name while creating a procedure. Like: CREATE PROCEDURE dbo.Your_Proc_name instead of CREATE PROCEDURE Your_Proc_name
6.SET NOCOUNT OFF – This returns the message that shows number of rows affected by SQL statement. This can cause extra network traffic and can have some serious impact on performance when the procedure is called frequently.
7. The sp_ prefix – Don’t use the “sp_” prefix in a stored procedure name as the “sp_” prefix is reserved for system stored procedures. Any stored procedure that has the “sp_” prefix will cause an extra lookup in the MASTER database If a stored procedure uses same name in both the user database and a system database, the stored procedure in the user database will never get executed.
8. sp_executeSQL and the KEEPFIXED PLAN options – Both sp_executesql and the KEEPFIXED PLAN option avoid the recompilation of a stored procedure. If you want to provide parameterized dynamic SQL, then go for sp_executesql instead of EXEC(proc_name). Here the execution plan for the procedure is stored with the variable name in cache memory. When the variable values are supplied, then the values are simply mapped to the query, hence no need for a recompilation.
Use the OPTION KEEPFIXED PLAN hint while selecting records from temporary tables. If the query contains this hint, then its plan is not recompiled. For more information about procedure recompilation, please go through the following article:http://technet.microsoft.com/en-us/library/cc966425.aspx
CREATE
PROCEDURE
my_proc
AS
CREATE
TABLE
#t
(
a
int
)
SELECT
*
FROM
#t
INSERT
#t
SELECT
*
from
retest
SELECT
COUNT
(
*
)
FROM
#t
WHERE
a
=
37
OPTION
(
KEEPFIXED PLAN
)
As an example of sp_executesql, we can write:
sp_executesql
N
'SELECT * FROM mydb.dbo.emp where empid = @eid'
,
N
'@eid int'
,
@eid
=
40
9. SELECT vs SET – A single SELECT statement can assign values to different variables and is much faster than multiple SET statements assigning values to multiple different variables.
SELECT
@Var1
=
@Var1
+
1
,
@Var2
=
@Var2
- 1
instead of
SET
@Var1
=
@Var1
+
1
SET
@Var2
=
@Var2
- 1
10. WHERE clauses – In a WHERE clause, the various operators used directly affect how fast a query can run. Here are the conditional operators used in the WHERE clause, ordered by their performance.
=, >, <, >=, <=, <>, !=, !>, !<
for details, refer to the article: http://msdn.microsoft.com/en-us/library/ms190276.aspx
11. More WHERE clause hints – Avoid unnecessary conditions in the WHERE Clause. You can easily use the first example instead of the second one. The second example uses one extra OR condition which can be avoided using the first example.
SELECT
emp_name
FROM
table_name
WHERE
LOWER
(
emp_name
)
=
'edu'
SELECT
emp_name
FROM
table_name
WHERE
emp_name
=
'EDU'
OR
emp_name
=
'edu'
Also, try to avoid IN. While checking the existence of some values, then use EXISTS instead of IN. Because IN counts the NULL values also, hence slower than EXISTS. Since EXISTS returns Boolean(Yes/No) but IN returns all values hence result set for IN is heavier than EXISTS.
SELECT
*
FROM
employee
WHERE
emp_no
NOT
IN
(
SELECT
emp_no
from
emp_detail
)
SELECT
*
FROM
employee
WHERE
NOT
EXISTS
(
SELECT
emp_no
FROM
emp_detail
)
12. CAST and CONVERT – Try to use CAST instead of CONVERT. CAST is ANSI-92 standard but CONVERT works in MS SQL server only. Also, Convert may be deprecated in future MS SQL releases. It is better to use CONVERT only when you need to format the DATETIME datatype with the style option. CAST cannot do this.
13. Avoid DISTINCT and ORDER BY – If you don’t need the DISTINCT/ORDER BY clause, then try to avoid so. Unnecessary DISTINCT or ORDER BY clauses cause extra work for the database engine. Hence making performance slower.
14. Avoid using cursors – Try to use temporary table/table variables with identity column and then iterate all the tables using WHILE loop and a looping counter, which will map with the identity column. For details, refer my previous article.http://www.sqlservercentral.com/articles/Stored+Procedures/64523/
15.SELECT statements – Try to use only the required number of columns in the SELECT clause instead of using *. Using * returns all columns, which unnecessarily create a fat recordset.
16.Subquery vs JOINs – This is a famous debate in many online forums. In fact most sub queries can be expressed as an equivalent form of JOIN. I have a good rule of thumb: subquery is faster when we have to retrieve data from large number of tables because it becomes tedious to join more tables. JOIN is faster to retrieve data from database when we have less number of tables. But try to avoid correlated sub queries because it makes the query much slower.
17. CREATE TABLE vs. SELECT INTO – Select * INTO works fine for small tables, but when dealing with large record sets or long-running queries, it creates locks on the system objects within the tempdb database. As a result, other queries and procedures that need to create objects within the tempdb database will have to wait for the long-running query to complete. This is because when an object is created, an exclusive lock is taken against the sysobjects, syscolumns, sysindexes tables.
Also, SELECT INTO not only copies data but also it copies the table structure, hence it performs slower.
18. Try to use table variables instead of Temporary Tables – Temp tables can cause stored procedures to recompile. But table variables were designed specifically to guard against stored procedure recompiles during execution.
If the result set is not containing a huge number of records then you should stick to table variable, otherwise temp table has its advantages. There is a misconception that temp tables always use the tembdb database but table variable do not. Table variables also use tempdb after a certain size. For more information refer to the following article : http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
19.Use proper indexes – You can use the help of the data tuning advisor, but it does not gives the proper result all the time. Index scans are much faster than table scans. So identify the table scans from the execution plans. But when a table returns smaller rows, then it is better to use a table scan. You can see an excellent article on execution plans by G Vijayakumara at the following link:http://www.sqlservercentral.com/articles/Administration/executionplans/1345/
20. Use Profiler – The cachemiss event class indicates that the stored procedure is not in the cache. If the SP:Cachemiss class occurs frequently, it can indicate that more memory should be available to MS SQL server, thereby increasing the size of procedure cache. The cachehit event class indicates that a stored procedure is in the cache.
Last but not the least, again I am repeating the same advice from my previous article. Keep these guidelines in your mind, but don’t hesitate to break them if needed. After all, performance is the ultimate goal. If violating the general rule gives you good performance (it may happen based on your database) then don’t stick with the guidelines.
No comments:
Post a Comment