Wednesday, July 23

Using STRAIGHT_JOIN To Work Around The Optimiser

As explained in a previous post, I had an issue with an overzealous MySQL optimiser, which then decided to return incorrect results after I had chastised it. I was therefore stuck between incredibly slow (read: unusable) performance and unreliability and had no choice but to work my design around the behaviour.

I had already come across the STRAIGHT_JOIN keyword, but only as used as a replacement for a more regular JOIN. The idea is to tell the optimiser to join the tables in the order in which they appear in the query. On its own, JOIN is both commutative and associative, in that you can do them in any order. Left to its own devices the optimiser may choose to take what you have written and turn it upside down if it thinks the whole thing will be faster, but STRAIGHT_JOIN instructs it to stick to how it's written.

Outer joins, however, aren't commutative and you'd most likely get different result if you reversed one and for this reason the optimiser will never try to. Since the view that I was having difficulty with was made up of three outer joins, I didn't bother looking into STRAIGHT_JOIN much. Sure, I could have applied it to base views in order to cut short the work the optimiser had to do anyway, and this probably would have helped in this particular case too. But since the base views seemed to be working okay in other cases I chose to avoid doing this - I may as well take advantage of the optimiser when it works as it should do.

What I didn't realise was that you can selectively ask the optimiser to STRAIGHT_JOIN on demand. You do this by calling it during the select itself:

select STRAIGHT_JOIN a,b,c from rest-of-query;

This instructs the optimiser to STRAIGHT_JOIN as much as possible while evaluating the query. Since this implies one path, you don't have to adjust OPTIMIZER_SEARCH_DEPTH to restrict itself globally either - it can be left at 62 (or whatever you found it useful to be) and should, in theory, continue to optimise wheres and other parts of your query.

So my strategy is now the following - to write and use views as I am currently, with some consideration on what order to join the tables/views within them in. If ever, when consuming a view at a higher level (say, in a procedure or function), I find them to take more than a few seconds to complete, I will add the STRAIGHT_JOIN keyword to effectively disable join optimisation but only when that particular procedure or function is called.

Hopefully this will allow me to take advantage of the optimiser when I can, and tell it to bugger off when it can't quite handle the job it's supposed to. The only flaw in this plan is if STRAIGHT_JOINing results in a sub-optimal execution plan and makes a query slower than it could have been (albeit with a correctly functioning optimiser). That's something I'll have to leave to later, when I have a substantial amount of data to play around with.

No comments:

Post a Comment