I’m happy to report that some progress has been made on my performance problem. I’m still working on it, and there will be a full progress report to come, but for now I’m simply happy to say I’ve gotten somewhere in the EXECSYNC dept. Now, that’s not to say I’ve gotten rid of them, but I now understand where they’re coming from and why.
The breakthrough came Monday evening when I happened across a very timely blog post by Paul White. The topic of the post was whether or not a select query can cause page splits. But about a third of the way down he starts explaining the internals of Index Spools. Now if you’ve done any Googling on the EXECSYNC wait type, you’ve seen this explanation for them:
“Occurs during parallel queries while synchronizing in query processor in areas not related to the exchange iterator. Examples of such area are bitmaps, large binary objects (BLOBs) and the spool iterator. LOBs can frequently use this wait state. Bitmap and spool use should not cause contention.”
This never really clarified anything for me, so when Paul said “spool” and “internals” my ears perked up. A little farther along and he gets into the spool iterator and how it works with parallelism. He had my full attention now. And then, the money quote:
“Don’t be fooled by the parallelism indicator on the index spool – the index building portion of its execution happens on a single thread.”
A single thread. He goes on to say:
Ignore the CXPACKET waits (they are perfectly normal) and look at the EXECSYNC waits. This query ran at DOP 8 on my machine, so 7 parallel threads had to wait while a single thread built the temporary index.”
Well. Not that I don’t believe Paul, but I wanted to verify that this was what I was seeing in my own test case. So I gave myself a crash course in Extended Events and recorded all EXECSYNC waits for my bigger query.
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='EXECSYNC_Waits') DROP EVENT SESSION [EXECSYNC_Waits] ON SERVER; CREATE EVENT SESSION [EXECSYNC_Waits] ON SERVER ADD EVENT sqlserver.sql_statement_starting( ACTION (sqlserver.session_id, sqlserver.sql_text, sqlserver.plan_handle) WHERE (([sqlserver].[session_id]>(58) AND [sqlserver].[session_id]<(60)))), ADD EVENT sqlos.wait_info( ACTION (sqlserver.session_id, sqlserver.sql_text, sqlserver.plan_handle) WHERE (([sqlserver].[session_id]>(58) AND [sqlserver].[session_id]<(60) AND ([wait_type]=(191)) AND [duration]>(0)))), ADD EVENT sqlos.wait_info_external( ACTION (sqlserver.session_id, sqlserver.sql_text, sqlserver.plan_handle) WHERE (([sqlserver].[session_id]>(58) AND [sqlserver].[session_id]<(60) AND ([wait_type]=(191)) AND [duration]>(0)))), ADD EVENT sqlserver.sql_statement_completed( ACTION (sqlserver.session_id, sqlserver.sql_text, sqlserver.plan_handle) WHERE (([sqlserver].[session_id]>(58) AND [sqlserver].[session_id]<(60)))) ADD TARGET package0.ring_buffer( SET max_memory=4096) WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS, MAX_DISPATCH_LATENCY = 30 SECONDS, MAX_EVENT_SIZE = 0KB, MEMORY_PARTITION_MODE = NONE, TRACK_CAUSALITY = OFF, STARTUP_STATE = ON)
I started the event session and ran my query (MAXDOP=8) with the actual execution plan. When it finished I saved the execution plan and opened it in SQL Sentry. In the Top Operations tab I sorted by Operation. The plan had 8 Index Spool (Eager Spool) operations. I checked out the Extended Event output: 56 EXECSYNC waits. 7 per index spool. 7 threads waiting on one thread.
So why am I suddenly seeing longer EXECSYNC waits? While these servers have more CPUs than the VM Server, the processors are slower. All the CPUs in the world aren’t going to help a single threaded operation run faster.
But as I said, I am making progress. We turned off hyperthreading on the new servers and I’ve tried setting processor affinity to only 1 of the NUMA nodes. And that’s helped my test queries, including the one with all the index spools. In fact, it’s brought that one from 14 minutes to under 8 minutes, and it’s now faster than the VM. Unfortunately, the end user application has not seen the same improvements, so at this point I’ve opened a case with Microsoft. I’ll keep you posted.
P.S. If I ever happen to meet Paul White, I’m giving him a big ol’ hug.
I have read some good stuff here. Definitely value bookmarking for revisiting. I wonder how much attempt you set to create this type of magnificent informative website.
Great post. I stumbled across it today while troubleshooting a very similar issue. When I added OPTION(RECOMPILE) after my problem query, it ran much faster (sub-second vs. 17-20 seconds). I was considering this article by Erland Sommarskog as well: http://www.sommarskog.se/query-plan-mysteries.html .
Thanks for the help,