
You can now use Python to send the slow queries captured by MySQL to our tuningwizard service
MySQL / MariaDB / Percona Server for MySQL has this cool feature called slow query log, which captures any SQL running for more than X seconds. That threshold is determined by the long_query_time parameter.
If you have the slow query log enabled on your server, then you have a list of slow queries to work with already!
Now, because tuningwizard is actually a web API, its easy to build automation around it. If we could somehow send it our top slow queries for a revision.. Wait, you can!
1. Is the slow query log enabled?
For MySQL/MariaDB/Percona to log slow queries to a file we need at least:
- slow_query_log set to 1 or ON
- slow_query_log_file set to a valid path on the server
- long_query_time set to a meaningful value for our environment
If slow_query_log is set to 0 or OFF you can enable it without restarting the server by running the statement below on your database server, BUT make sure your long_query_time is not set too low if you don’t want performance to suffer.
SET GLOBAL slow_query_log = 1;
Only new connections will see this configuration change, so make sure you start a new one before testing the slow query. A quick way to do it is to run
SELECT "Test",sleep(5);
Make sure the number you put in sleep() is higher than what you have in long_query_time
2. Prioritizing the slow queries
The slow query log contains all queries executed that took more than long_query_time seconds. This means that the same query could be logged many times, with same or different parameters. Also, anything above the threshold is logged, meaning it would still be nice to focus on those queries that took most of our precious database time.
Fortunately, Percona has this great tool called pt-query-digest which will organize the slow query log for us. You can download it for free here.
pt-query-digest --output=json /var/lib/mysql/slow_query.log --max-line-length=0 --limit=20 > /tmp/slow_query.json
With the command above, pt-query-digest will read the slow query log file at /var/lib/mysql/slow_query.log (check slow_query_log_file on your server to get the path), aggregate and sort the slow queries and keep the slowest top 20. It will send the result to a file in /tmp/slow_query.json (you can choose any path you want)
3. Connecting to the tuningwizard API through Python
Next is to get everything in place so we can connect to the database and the tuningwizard API through Python. The steps to get you set up can be found here
4. Sending your slow queries for revision
We are now going to make some modifications to the test script provided in the previous step so queries are read from the pt-query-digest output file (/tmp/slow_query.json in this case) and sent to tuningwizard for analysis.
Create/modify the Python script with your favorite editor and include the code below. For the purpose of this example we are going to call the script optimize_slow_queries.py
# Import optimization job object
from twcollector import OptJob
# Import json for pretty printing
import json
from pygments import highlight, lexers, formatters
pfile = open('params.json','r')
pjs = json.load(pfile)
sqjs = open('/tmp/slow_query.json','r')
sqdict = json.load(sqjs)
for i,queryblock in enumerate(sqdict['classes']):
print("============== Query ",i+1,"==================\n")
sqltext = queryblock['example']['query']
print(sqltext)
# For each slow query instantiate optimization job by
# passing connections parameters, together with the query
myjob = OptJob(pjs["hostname"],
pjs["username"],
pjs["password"],
pjs["dbname"],
pjs["port"],
sqltext,pjs["token"])
# Run analysis
myjob.analyze()
# Print the results in colorful JSON
print("\n \n")
print(" ## Analysis result ")
result = json.dumps(myjob.analysis_result,indent=4)
colorful_json = highlight(result,
lexers.JsonLexer(), formatters.TerminalFormatter())
print(colorful_json)
# Delete object
del myjob
print("\n \n")
Execute the script by running python ./optimize_slow_queries.py
If everything went well, you should start seeing the recommendations flowing
============== Query 1 ==================
select b.price,b.seat,f.flightno,p.firstname,p.lastname
from
booking b,
passenger p,
flight f
where
b.flight_id = f.flight_id
and b.passenger_id = p.passenger_id
and f.departure between '2015-06-01 00:00:00' and '2015-07-30 00:00:00'
## Analysis result
{
"costly_operations": [
"460698 rows were read from table `flight` by scanning it entirely. The conditions specified for this table accounted for 50% of the rows, so, even though indexes exist, the server decided that a full scan was cheaper.",
"The Join between `booking` and `flight` produced 25517771 rows. Approximately 110 rows were retrieved from `booking` for each of the 231980 rows obtained from `flight`",
"The Join between `passenger` and `booking` produced 25517771 rows. Approximately 1 rows were retrieved from `passenger` for each of the 25517771 rows obtained from `booking`"
],
"recommendations": [
{
"description": "For table `flight`, consider including conditions on columns `arrival` to leverage an existing index, with higher cardinality. This way, you can narrow down the initial amount of rows retrieved from the table before filtering by the original conditions",
"type": "QUERY_REWRITE",
"relevance": "medium"
},
{
"description": "For table `booking`, consider including additional conditions, even if they are not indexed. This will help reducing the amount of rows generated by the Join.",
"type": "QUERY_REWRITE",
"relevance": "low"
},
{
"description": "The LIMIT clause can be used in cases where long Joins are generated but either only a fixed amount of rows is required or where the whole result set is too long to consume",
"type": "QUERY_REWRITE",
"relevance": "low"
}
]
}
============== Query 2 ==================
select al.iata,al.airlinename,ap.capacity,count(*) from airline al,airplane ap, airplane_type at, flight f where al.airline_id=ap.airline_id and ap.type_id = at.type_id and f.airline_id = al.airline_id and flightno like 'AL'
...

Leave a Reply