Friday, July 27, 2012

Understanding JOINs MySQL

“JOIN” is a SQL keyword used to query data from two or more related tables. Unfortunately, the concept is regularly explained using abstract terms or differs between database systems. It often confuses me. Developers cope with enough confusion, so this is my attempt to explain JOINs briefly and succinctly to myself and anyone who’s interested.

Related Tables

MySQL, PostgreSQL, Firebird, SQLite, SQL Server and Oracle are relational database systems. A well-designed database will provide a number of tables containing related data. A very simple example would be users (students) and course enrollments:

‘user’ table:

id
name
course
1
Alice
1
2
Bob
1
3
Caroline
2
4
David
5
5
Emma
(NULL)

MySQL table creation code:

  1. CREATE TABLE `user` (  
  2.     `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,  
  3.     `name` varchar(30) NOT NULL,  
  4.     `course` smallint(5) unsigned DEFAULT NULL,  
  5.     PRIMARY KEY (`id`)  
  6. ) ENGINE=InnoDB;  

The course number relates to a subject being taken in a course table…

‘course’ table:

id
name
1
HTML5
2
CSS3
3
JavaScript
4
PHP
5
MySQL

MySQL table creation code:

  1. CREATE TABLE `course` (  
  2.     `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,  
  3.     `name` varchar(50) NOT NULL,  
  4.     PRIMARY KEY (`id`)  
  5. ) ENGINE=InnoDB;  

Since we’re using InnoDB tables and know that user.course and course.id are related, we can specify a foreign key relationship:

  1. ALTER TABLE `user`  
  2. ADD CONSTRAINT `FK_course`  
  3. FOREIGN KEY (`course`) REFERENCES `course` (`id`)  
  4. ON UPDATE CASCADE;  

In essence, MySQL will automatically:

  • re-number the associated entries in the user.course column if the course.id changes
  • reject any attempt to delete a course where users are enrolled.

important: This is terrible database design!

This database is not efficient. It’s fine for this example, but a student can only be enrolled on zero or one course. A real system would need to overcome this restriction — probably using an intermediate ‘enrollment’ table which mapped any number of students to any number of courses.

JOINs allow us to query this data in a number of ways.


The most frequently used clause is INNER JOIN. This produces a set of records which match in both the user and course tables, i.e. all users who are enrolled on a course:

  1. SELECT user.name, course.name  
  2. FROM `user`  
  3. INNER JOIN `course` on user.course = course.id;  

Result:

user.name
course.name
Alice
HTML5
Bob
HTML5
Carline
CSS3
David
MySQL



What if we require a list of all students and their courses even if they’re not enrolled on one? A LEFT JOIN produces a set of records which matches every entry in the left table (user) regardless of any matching entry in the right table (course):

  1. SELECT user.name, course.name  
  2. FROM `user`  
  3. LEFT JOIN `course` on user.course = course.id;  

Result:

user.name
course.name
Alice
HTML5
Bob
HTML5
Carline
CSS3
David
MySQL
Emma
(NULL)


Perhaps we require a list all courses and students even if no one has been enrolled? A RIGHT JOIN produces a set of records which matches every entry in the right table (course) regardless of any matching entry in the left table (user):

  1. SELECT user.name, course.name  
  2. FROM `user`  
  3. RIGHT JOIN `course` on user.course = course.id;  

Result:

user.name
course.name
Alice
HTML5
Bob
HTML5
Carline
CSS3
(NULL)
JavaScript
(NULL)
PHP
David
MySQL

RIGHT JOINs are rarely used since you can express the same result using a LEFT JOIN. This can be more efficient and quicker for the database to parse:

  1. SELECT user.name, course.name  
  2. FROM `course`  
  3. LEFT JOIN `user` on user.course = course.id;  

We could, for example, count the number of students enrolled on each course:

  1. SELECT course.name, COUNT(user.name)  
  2. FROM `course`  
  3. LEFT JOIN `user` ON user.course = course.id  
  4. GROUP BY course.id;  

Result:

course.name
count()
HTML5
2
CSS3
1
JavaScript
0
PHP
0
MySQL
1





Our last option is the OUTER JOIN which returns all records in both tables regardless of any match. Where no match exists, the missing side will contain NULL.

OUTER JOIN is less useful than INNER, LEFT or RIGHT and it’s not implemented in MySQL. However, you can work around this restriction using the UNION of a LEFT and RIGHT JOIN, e.g.

  1. SELECT user.name, course.name  
  2. FROM `user`  
  3. LEFT JOIN `course` on user.course = course.id  
  4. UNION  
  5. SELECT user.name, course.name  
  6. FROM `user`  
  7. RIGHT JOIN `course` on user.course = course.id;  

Result:

user.name
course.name
Alice
HTML5
Bob
HTML5
Carline
CSS3
David
MySQL
Emma
(NULL)
(NULL)
JavaScript
(NULL)
PHP

I hope that gives you a better understanding of JOINs and helps you write more efficient SQL queries.

The InnoDB Monitor


Starting from version 3.23.41 InnoDB includes the InnoDB Monitor which prints information on the InnoDB internal state. When swithed on, InnoDB Monitor will make the MySQL server `mysqld' to print data (note: the MySQL client will not print anything) to the standard output about once every 15 seconds. This data is useful in performance tuning.

There is a separate innodb_lock_monitor which prints the same information as innodb_monitor plus information on locks set by each transaction.

The printed information includes data on:

  • lock waits of a transactions,
  • semaphore waits of threads,
  • pending file i/o requests,
  • buffer pool statistics, and
  • purge and insert buffer merge activity of the main thread of InnoDB.

You can start InnoDB Monitor through the following SQL command:

 
CREATE TABLE innodb_monitor(a int) Engine = innodb;

and stop it by

 
DROP TABLE innodb_monitor;

The CREATE TABLE syntax is just a way to pass a command to the InnoDB engine through the MySQL SQL parser: the created table is not relevant at all for InnoDB Monitor. If you shut down the database when the monitor is running, and you want to start the monitor again, you have to drop the table before you can issue a new CREATE TABLE to start the monitor. This syntax may change in a future release.

A sample output of the InnoDB Monitor:

=====================================
120727 15:50:57 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 3 1_second, 3 sleeps, 0 10_second, 4 background, 4 flush
srv_master_thread log flush and writes: 4
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 3, signal count 3
Mutex spin waits 1, rounds 25, OS waits 0
RW-shared spins 3, rounds 90, OS waits 3
RW-excl spins 0, rounds 0, OS waits 0
Spin rounds per wait: 25.00 mutex, 30.00 RW-shared, 0.00 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 2501
Purge done for trx's n:o < 1F2E undo n:o < 0
History list length 23
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 67, OS thread handle 0x1a00, query id 381 localhost 127.0.0.1 root
--------
FILE I/O
--------
I/O thread 0 state: wait Windows aio (insert buffer thread)
I/O thread 1 state: wait Windows aio (log thread)
I/O thread 2 state: wait Windows aio (read thread)
I/O thread 3 state: wait Windows aio (read thread)
I/O thread 4 state: wait Windows aio (read thread)
I/O thread 5 state: wait Windows aio (read thread)
I/O thread 6 state: wait Windows aio (write thread)
I/O thread 7 state: wait Windows aio (write thread)
I/O thread 8 state: wait Windows aio (write thread)
I/O thread 9 state: wait Windows aio (write thread)
Pending normal aio reads: 0 [0, 0, 0, 0] , aio writes: 0 [0, 0, 0, 0] ,
 ibuf aio reads: 0, log i/o's: 0, sync i/o's: 0
Pending flushes (fsync) log: 0; buffer pool: 0
589 OS file reads, 16 OS file writes, 7 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 244997, node heap has 1 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 12090794
Log flushed up to   12090794
Last checkpoint at  12090794
0 pending log writes, 0 pending chkp writes
10 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total memory allocated 62685184; in additional pool allocated 0
Dictionary memory allocated 83635
Buffer pool size   3776
Free buffers       3324
Database pages     451
Old database pages 0
Modified db pages  0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 450, created 1, written 10
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
No buffer pool page gets since the last printout
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 451, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 3904, state: waiting for server activity
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================



Below is the description of each section..

SEMAPHORES
This section reports threads waiting for a semaphore and statistics on how many times threads have needed a spin or a wait on a mutex or a rw-lock semaphore. A large number of threads waiting for semaphores may be a result of disk I/O, or contention problems inside InnoDB. Contention can be due to heavy parallelism of queries or problems in operating system thread scheduling. Setting the innodb_thread_concurrency system variable smaller than the default value might help in such situations.

LATEST FOREIGN KEY ERROR
This section provides information about the most recent foreign key constraint error. It is not present if no such error has occurred. The contents include the statement that failed as well as information about the constraint that failed and the referenced and referencing tables.

LATEST DETECTED DEADLOCK
This section provides information about the most recent deadlock. It is not present if no deadlock has occurred. The contents show which transactions are involved, the statement each was attempting to execute, the locks they have and need, and which transaction InnoDB decided to roll back to break the deadlock. The lock modes reported in this section are explained in Section 14.2.8.1, “InnoDB Lock Modes”.

TRANSACTIONS
If this section reports lock waits, your applications might have lock contention. The output can also help to trace the reasons for transaction deadlocks.

FILE I/O
This section provides information about threads that InnoDB uses to perform various types of I/O. The first few of these are dedicated to general InnoDB processing. The contents also display information for pending I/O operations and statistics for I/O performance.

On Unix, the number of threads is always 4. On Windows, the number depends on the setting of the innodb_file_io_threads system variable.

INSERT BUFFER AND ADAPTIVE HASH INDEX
This section shows the status of the InnoDB insert buffer and adaptive hash index. (See Section 14.2.10.3, “Insert Buffering”, and Section 14.2.10.4, “Adaptive Hash Indexes”.) The contents include the number of operations performed for each, plus statistics for hash index performance.

LOG
This section displays information about the InnoDB log. The contents include the current log sequence number, how far the log has been flushed to disk, and the position at which InnoDB last took a checkpoint. (See Section 14.2.6.3, “InnoDB Checkpoints”.) The section also displays information about pending writes and write performance statistics.

BUFFER POOL AND MEMORY
This section gives you statistics on pages read and written. You can calculate from these numbers how many data file I/O operations your queries currently are doing.

For additional information about the operation of the buffer pool, see Section 8.6.2, “The InnoDB Buffer Pool”.

ROW OPERATIONS
This section shows what the main thread is doing, including the number and performance rate for each type of row operation.