Friday, July 27, 2012

Understanding JOINs in MySQL

“JOIN” is a SQL keyword used to query data from two or more related tables. Unfortunately, the concept is regularly explained using abstract terms or differs between database systems. It often confuses me. Developers cope with enough confusion, so this is my attempt to explain JOINs briefly and succinctly to myself and anyone who’s interested.

Related Tables

MySQL, PostgreSQL, Firebird, SQLite, SQL Server and Oracle are relational database systems. A well-designed database will provide a number of tables containing related data. A very simple example would be users (students) and course enrollments:

‘user’ table:

  id | name     | course
  ---+----------+-------
   1 | Alice    | 1
   2 | Bob      | 1
   3 | Caroline | 2
   4 | David    | 5
   5 | Emma     | (NULL)

MySQL table creation code:

  CREATE TABLE `user` (
      `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
      `name` varchar(30) NOT NULL,
      `course` smallint(5) unsigned DEFAULT NULL,
      PRIMARY KEY (`id`)
  ) ENGINE=InnoDB;

The course number relates to a subject being taken in a course table…

‘course’ table:

  id | name
  ---+-----------
   1 | HTML5
   2 | CSS3
   3 | JavaScript
   4 | PHP
   5 | MySQL

MySQL table creation code:

  CREATE TABLE `course` (
      `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
      `name` varchar(50) NOT NULL,
      PRIMARY KEY (`id`)
  ) ENGINE=InnoDB;

Since we’re using InnoDB tables and know that user.course and course.id are related, we can specify a foreign key relationship:

  ALTER TABLE `user`
  ADD CONSTRAINT `FK_course`
  FOREIGN KEY (`course`) REFERENCES `course` (`id`)
  ON UPDATE CASCADE;

In essence, MySQL will automatically:

  • re-number the associated entries in the user.course column if the course.id changes
  • reject any attempt to delete a course where users are enrolled.
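These two behaviours can be sketched with Python's built-in sqlite3 module standing in for MySQL/InnoDB (an assumption purely for illustration; note SQLite only enforces foreign keys when the pragma is switched on):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked

con.execute("CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE user (
    id     INTEGER PRIMARY KEY,
    name   TEXT,
    course INTEGER REFERENCES course(id) ON UPDATE CASCADE)""")
con.execute("INSERT INTO course VALUES (1, 'HTML5')")
con.execute("INSERT INTO user VALUES (1, 'Alice', 1)")

# 1. Renumbering the course cascades into user.course
con.execute("UPDATE course SET id = 10 WHERE id = 1")
print(con.execute("SELECT course FROM user").fetchone())   # (10,)

# 2. Deleting a course with enrolled users is rejected
try:
    con.execute("DELETE FROM course WHERE id = 10")
except sqlite3.IntegrityError as err:
    print("delete rejected:", err)
```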

Important: this is terrible database design!

This database is not efficient: a student can only be enrolled on zero or one course. It's fine for this example, but a real system would need to overcome that restriction, probably by using an intermediate 'enrollment' table that maps any number of students to any number of courses.
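As a sketch of what that intermediate table might look like (table and column names here are illustrative assumptions, again using Python's built-in sqlite3 in place of MySQL):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment (
    user_id   INTEGER REFERENCES user(id),
    course_id INTEGER REFERENCES course(id),
    PRIMARY KEY (user_id, course_id)       -- one row per student/course pair
);
INSERT INTO user   VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO course VALUES (1, 'HTML5'), (2, 'CSS3');
INSERT INTO enrollment VALUES (1, 1), (1, 2), (2, 1);  -- Alice takes two courses
""")

# Joining through the mapping table lists every enrollment
rows = con.execute("""
    SELECT user.name, course.name
    FROM enrollment
    INNER JOIN user   ON user.id   = enrollment.user_id
    INNER JOIN course ON course.id = enrollment.course_id
    ORDER BY user.name, course.name""").fetchall()
print(rows)   # [('Alice', 'CSS3'), ('Alice', 'HTML5'), ('Bob', 'HTML5')]
```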

JOINs allow us to query this data in a number of ways.


INNER JOIN

The most frequently used clause is INNER JOIN. This produces a set of records which match in both the user and course tables, i.e. all users who are enrolled on a course:

  SELECT user.name, course.name
  FROM `user`
  INNER JOIN `course` ON user.course = course.id;

Result:

  user.name | course.name
  ----------+------------
  Alice     | HTML5
  Bob       | HTML5
  Caroline  | CSS3
  David     | MySQL
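If you want to experiment with these queries without a MySQL server to hand, Python's built-in sqlite3 module is a convenient stand-in (an assumption for illustration; the JOIN syntax carries over directly):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, name TEXT, course INTEGER);
CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO user VALUES (1,'Alice',1),(2,'Bob',1),(3,'Caroline',2),
                        (4,'David',5),(5,'Emma',NULL);
INSERT INTO course VALUES (1,'HTML5'),(2,'CSS3'),(3,'JavaScript'),
                          (4,'PHP'),(5,'MySQL');
""")

# INNER JOIN: only users who are enrolled on a course
rows = con.execute("""
    SELECT user.name, course.name
    FROM user
    INNER JOIN course ON user.course = course.id
    ORDER BY user.id""").fetchall()
print(rows)
# [('Alice', 'HTML5'), ('Bob', 'HTML5'), ('Caroline', 'CSS3'), ('David', 'MySQL')]
```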



LEFT JOIN

What if we require a list of all students and their courses even if they're not enrolled on one? A LEFT JOIN produces a set of records which matches every entry in the left table (user) regardless of any matching entry in the right table (course):

  SELECT user.name, course.name
  FROM `user`
  LEFT JOIN `course` ON user.course = course.id;

Result:

  user.name | course.name
  ----------+------------
  Alice     | HTML5
  Bob       | HTML5
  Caroline  | CSS3
  David     | MySQL
  Emma      | (NULL)


RIGHT JOIN

Perhaps we require a list of all courses and students even if no one has been enrolled? A RIGHT JOIN produces a set of records which matches every entry in the right table (course) regardless of any matching entry in the left table (user):

  SELECT user.name, course.name
  FROM `user`
  RIGHT JOIN `course` ON user.course = course.id;

Result:

  user.name | course.name
  ----------+------------
  Alice     | HTML5
  Bob       | HTML5
  Caroline  | CSS3
  (NULL)    | JavaScript
  (NULL)    | PHP
  David     | MySQL

RIGHT JOINs are rarely used, since you can express the same result with a LEFT JOIN by swapping the table order. This can be more efficient and quicker for the database to parse:

  SELECT user.name, course.name
  FROM `course`
  LEFT JOIN `user` ON user.course = course.id;
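The swapped-table trick is also handy in engines that lack RIGHT JOIN altogether (historically, SQLite is one). A sketch with Python's built-in sqlite3, reproducing the RIGHT JOIN result via a swapped LEFT JOIN:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, name TEXT, course INTEGER);
CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO user VALUES (1,'Alice',1),(2,'Bob',1),(3,'Caroline',2),
                        (4,'David',5),(5,'Emma',NULL);
INSERT INTO course VALUES (1,'HTML5'),(2,'CSS3'),(3,'JavaScript'),
                          (4,'PHP'),(5,'MySQL');
""")

# LEFT JOIN with the tables swapped: every course, matched or not
rows = con.execute("""
    SELECT user.name, course.name
    FROM course
    LEFT JOIN user ON user.course = course.id
    ORDER BY course.id, user.id""").fetchall()
print(rows)
# [('Alice', 'HTML5'), ('Bob', 'HTML5'), ('Caroline', 'CSS3'),
#  (None, 'JavaScript'), (None, 'PHP'), ('David', 'MySQL')]
```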

We could, for example, count the number of students enrolled on each course:

  SELECT course.name, COUNT(user.name)
  FROM `course`
  LEFT JOIN `user` ON user.course = course.id
  GROUP BY course.id;

Result:

  course.name | COUNT(user.name)
  ------------+-----------------
  HTML5       | 2
  CSS3        | 1
  JavaScript  | 0
  PHP         | 0
  MySQL       | 1
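The aggregate behaves the same way in the sqlite3 stand-in; COUNT(user.name) skips NULLs, which is why unenrolled courses report 0 rather than 1:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, name TEXT, course INTEGER);
CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO user VALUES (1,'Alice',1),(2,'Bob',1),(3,'Caroline',2),
                        (4,'David',5),(5,'Emma',NULL);
INSERT INTO course VALUES (1,'HTML5'),(2,'CSS3'),(3,'JavaScript'),
                          (4,'PHP'),(5,'MySQL');
""")

# COUNT over the LEFT JOIN: NULL user.name rows don't add to the count
rows = con.execute("""
    SELECT course.name, COUNT(user.name)
    FROM course
    LEFT JOIN user ON user.course = course.id
    GROUP BY course.id
    ORDER BY course.id""").fetchall()
print(rows)
# [('HTML5', 2), ('CSS3', 1), ('JavaScript', 0), ('PHP', 0), ('MySQL', 1)]
```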





OUTER JOIN

Our last option is the OUTER JOIN, which returns all records in both tables regardless of any match. Where no match exists, the missing side will contain NULL.

OUTER JOIN is less useful than INNER, LEFT or RIGHT and it’s not implemented in MySQL. However, you can work around this restriction using the UNION of a LEFT and RIGHT JOIN, e.g.

  SELECT user.name, course.name
  FROM `user`
  LEFT JOIN `course` ON user.course = course.id
  UNION
  SELECT user.name, course.name
  FROM `user`
  RIGHT JOIN `course` ON user.course = course.id;

Result:

  user.name | course.name
  ----------+------------
  Alice     | HTML5
  Bob       | HTML5
  Caroline  | CSS3
  David     | MySQL
  Emma      | (NULL)
  (NULL)    | JavaScript
  (NULL)    | PHP
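The same UNION workaround can be tried in the sqlite3 stand-in; since older SQLite versions have no RIGHT JOIN, the second half uses a swapped LEFT JOIN, which produces the identical rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE user   (id INTEGER PRIMARY KEY, name TEXT, course INTEGER);
CREATE TABLE course (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO user VALUES (1,'Alice',1),(2,'Bob',1),(3,'Caroline',2),
                        (4,'David',5),(5,'Emma',NULL);
INSERT INTO course VALUES (1,'HTML5'),(2,'CSS3'),(3,'JavaScript'),
                          (4,'PHP'),(5,'MySQL');
""")

# Emulated FULL OUTER JOIN: UNION of a LEFT JOIN each way round
rows = con.execute("""
    SELECT user.name, course.name
    FROM user LEFT JOIN course ON user.course = course.id
    UNION
    SELECT user.name, course.name
    FROM course LEFT JOIN user ON user.course = course.id""").fetchall()
print(len(rows))   # 7 distinct rows, as in the result table above
```

UNION (as opposed to UNION ALL) removes the duplicate rows where the two halves overlap, so every matched pair appears exactly once.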

I hope that gives you a better understanding of JOINs and helps you write more efficient SQL queries.

The InnoDB Monitor


Starting from version 3.23.41, InnoDB includes the InnoDB Monitor, which prints information on the InnoDB internal state. When switched on, the InnoDB Monitor makes the MySQL server mysqld print data (note: the MySQL client will not print anything) to the standard output about once every 15 seconds. This data is useful in performance tuning.

There is a separate innodb_lock_monitor, which prints the same information as innodb_monitor plus information on locks set by each transaction.

The printed information includes data on:

  • lock waits of transactions,
  • semaphore waits of threads,
  • pending file i/o requests,
  • buffer pool statistics, and
  • purge and insert buffer merge activity of the main thread of InnoDB.

You can start InnoDB Monitor through the following SQL command:

 
CREATE TABLE innodb_monitor (a INT) ENGINE=InnoDB;

and stop it by

 
DROP TABLE innodb_monitor;

The CREATE TABLE syntax is just a way to pass a command to the InnoDB engine through the MySQL SQL parser: the created table is not relevant at all for InnoDB Monitor. If you shut down the database when the monitor is running, and you want to start the monitor again, you have to drop the table before you can issue a new CREATE TABLE to start the monitor. This syntax may change in a future release.

A sample output of the InnoDB Monitor:

=====================================
120727 15:50:57 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 3 1_second, 3 sleeps, 0 10_second, 4 background, 4 flush
srv_master_thread log flush and writes: 4
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 3, signal count 3
Mutex spin waits 1, rounds 25, OS waits 0
RW-shared spins 3, rounds 90, OS waits 3
RW-excl spins 0, rounds 0, OS waits 0
Spin rounds per wait: 25.00 mutex, 30.00 RW-shared, 0.00 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 2501
Purge done for trx's n:o < 1F2E undo n:o < 0
History list length 23
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 67, OS thread handle 0x1a00, query id 381 localhost 127.0.0.1 root
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
589 OS file reads, 16 OS file writes, 7 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 244997, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 12090794
Log flushed up to   12090794
Last checkpoint at  12090794
0 pending log writes, 0 pending chkp writes
10 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 62685184; in additional pool allocated 0
Dictionary memory allocated 83635
Buffer pool size   3776
Free buffers       3324
Database pages     451
Old database pages 0
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 450, created 1, written 10
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 451, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 3904, state: waiting for server activity
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================



Below is a description of each section.

SEMAPHORES
This section reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A large number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries or problems in operating system thread scheduling. Setting the innodb_thread_concurrency system variable smaller than the default value might help in such situations.

LATEST FOREIGN KEY ERROR
This section provides information about the most recent foreign key constraint error. It is not present if no such error has occurred. The contents include the statement that failed as well as information about the constraint that failed and the referenced and referencing tables.

LATEST DETECTED DEADLOCK
This section provides information about the most recent deadlock. It is not present if no deadlock has occurred. The contents show which transactions are involved, the statement each was attempting to execute, the locks they have and need, and which transaction InnoDB decided to roll back to break the deadlock. The lock modes reported in this section are explained in Section 14.2.8.1, “InnoDB Lock Modes”.

TRANSACTIONS
If this section reports lock waits, your applications might have lock contention. The output can also help to trace the reasons for transaction deadlocks.

FILE I/O
This section provides information about threads that InnoDB uses to perform various types of I/O. The first few of these are dedicated to general InnoDB processing. The contents also display information for pending I/O operations and statistics for I/O performance.

On Unix, the number of threads is always 4. On Windows, the number depends on the setting of the innodb_file_io_threads system variable.

INSERT BUFFER AND ADAPTIVE HASH INDEX
This section shows the status of the InnoDB insert buffer and adaptive hash index. (See Section 14.2.10.3, “Insert Buffering”, and Section 14.2.10.4, “Adaptive Hash Indexes”.) The contents include the number of operations performed for each, plus statistics for hash index performance.

LOG
This section displays information about the InnoDB log. The contents include the current log sequence number, how far the log has been flushed to disk, and the position at which InnoDB last took a checkpoint. (See Section 14.2.6.3, “InnoDB Checkpoints”.) The section also displays information about pending writes and write performance statistics.

BUFFER POOL AND MEMORY
This section gives you statistics on pages read and written. You can calculate from these numbers how many data file I/O operations your queries currently are doing.

For additional information about the operation of the buffer pool, see Section 8.6.2, “The InnoDB Buffer Pool”.

ROW OPERATIONS
This section shows what the main thread is doing, including the number and performance rate for each type of row operation.

Performance of MySQL Semi-Synchronous Replication

This is a blog post by Aaron: http://blog.9minutesnooze.com/performance-mysql-replication-high-latency/

I have seen a few posts on DBA.SE (where I answer a lot of questions) recommending the use of semi-synchronous replication in MySQL 5.5 over a WAN as a way to improve the reliability of replication. My gut reaction was that this is a very bad idea with even a tiny write load, but I wanted to test it out to confirm. Please note that I do not mean to disparage the author of those posts, a user whom I have great respect for.

What is semi-synchronous replication?

The short version is that one slave has to acknowledge receipt of the binary log event before the query returns. The slave doesn't have to execute it before returning control, so it's still an asynchronous commit. The net effect is that the slave should only miss a maximum of 1 event if the master has a catastrophic failure. It does not improve the reliability of replication itself or prevent data drift.

What about performance, though? Semi-synchronous replication causes the client to block until a slave has acknowledged that it has received the event. On a LAN with sub-millisecond latencies, this should not present much of a problem. But what if there is 85ms of latency between the master and the slave, as is the case between Virginia and California? My hypothesis is that, with 85ms of latency, it is impossible to do better than about 11 write queries (INSERT/UPDATE/DELETE) per second, since 1000ms / 85ms = 11.7.
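That prediction is easy to sanity-check with a couple of lines of arithmetic (the 85ms figure is the measured round-trip latency from above; the rest is just division):

```python
# Each semi-sync commit waits one network round trip, so serial writes
# are capped at 1/RTT commits per second.
rtt = 0.085                  # seconds, us-east-1 <-> us-west-1 round trip
max_qps = 1 / rtt            # theoretical ceiling on serial write throughput
runtime_10k = 10000 * rtt    # expected wall time for the 10k-insert test

print(round(max_qps, 1))           # ~11.8 qps
print(round(runtime_10k / 60, 1))  # ~14.2 minutes
```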

Let’s test that out.

I spun up identical m1.small instances in EC2's us-east-1 and us-west-1 regions using the latest Ubuntu 12.04 LTS AMI from Alestic and installed the latest Percona Server 5.5 from their apt repository.

gpg --keyserver hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | apt-key add -
echo deb http://repo.percona.com/apt precise main > /etc/apt/sources.list.d/percona.list
apt-get update
apt-get install percona-server-server-5.5 libmysqlclient-dev

Although mostly irrelevant to the benchmarks, my.cnf is configured thusly (basically identical to support-files/my-huge.cnf shipped with the distribution):

[mysqld]
port            = 3306
socket          = /var/run/mysql/mysql.sock
skip-external-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 8

server-id       = 1
log-bin         = mysql-bin
binlog_format   = mixed

innodb_data_home_dir = /var/lib/mysql
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql
innodb_buffer_pool_size = 384M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50

Then, I wrote a very simple Ruby script to perform 10k inserts into a table, using the sequel gem, which I love.

apt-get install rubygems
gem install sequel mysql2 --no-rdoc --no-ri



#!/usr/bin/env ruby
# insertperf.rb

require 'logger'
require 'rubygems'
require 'sequel'

logger = Logger.new(STDOUT)
localdb = "inserttest"

db = Sequel.connect( :database => localdb,
                     :adapter  => 'mysql2',
                     :user     => 'root',
                     :logger   => logger )

db["DROP DATABASE IF EXISTS #{localdb}"].all
db["CREATE DATABASE #{localdb}"].all
db["CREATE TABLE IF NOT EXISTS #{localdb}.foo (
  id int unsigned AUTO_INCREMENT PRIMARY KEY,
  text VARCHAR(8)
) ENGINE=InnoDB"].all

n = 10000
t1 = Time.new
n.times do
  # random 8-character string of capital letters
  value = (0...8).map{ (65 + rand(25)).chr }.join
  db["INSERT INTO #{localdb}.foo (text) VALUES (?)", value].insert
end
t2 = Time.new
elapsed = t2 - t1
logger.info "Elapsed: #{elapsed} seconds. #{n/elapsed} qps"

With MySQL configured, let’s knock out a few INSERTs into the us-east-1 database, which has no slaves:

# w/ no slaves
...
INFO -- : (0.000179s) INSERT INTO test.foo (text) VALUES ('FKGDLOWD')
...
INFO -- : Elapsed: 9.37364 seconds. 1066.82142689499 qps

My control is roughly 1000 inserts/sec, with each query taking less than 0.2ms.

Then, I set up a traditional, asynchronous MySQL slave on the server in us-west-1 and ran the test again:

# w/ traditional replication
...
INFO -- : (0.000237s) INSERT INTO test.foo (text) VALUES ('CVGAMLXA')
...
INFO -- : Elapsed: 10.601943 seconds. 943.223331798709 qps

Somewhat inexplicably, the performance was slightly worse with the slave attached, but not by much: roughly 950 inserts/sec.

Next is semi-synchronous replication. First, I tested the latency between us-east-1 and us-west-1.

# ping -c 1 184.72.189.235
PING 184.72.189.235 (184.72.189.235) 56(84) bytes of data.
64 bytes from 184.72.189.235: icmp_req=1 ttl=52 time=85.5 ms

Latency between us-east-1 and us-west-1 is 85ms, so I still predict 11 inserts/sec at most, which means the 10k-insert script would take about 15 minutes instead of 10 seconds.

I set up semi-synchronous replication like this:

master> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
master> SET GLOBAL rpl_semi_sync_master_enabled = 1;
slave>  INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
slave>  SET GLOBAL rpl_semi_sync_slave_enabled = 1;
slave>  STOP SLAVE; START SLAVE;

I started the script and, as predicted, each insert was taking approximately 85ms. There was a screaming 2-month-old in the next room, so in the interest of brevity, I reduced the count from 10k to 1k. That should take about 90s:

# w/ semi-sync replication
...
INFO -- : (0.086301s) INSERT INTO test.foo (text) VALUES ('JKIJTUDO')
...
INFO -- : Elapsed: 86.889529 seconds. 11.5088666207409 qps

Just as I suspected: 11 inserts/sec.

In conclusion, the speed of light is a bitch, so don’t enable semi-synchronous replication over wide area or high latency networks.