Usually we thinks/expect
that inserts with large number of threads would to be faster but this is not
always true(Due to MySQL limitations). The bottleneck might be the CPU, IO
controller and OS. MySQL cannot fully use available cores/cpus in server e.g. MySQL 5.0 (Innodb) is limited
to 4 cores etc. We are working on scaling inserts, we have two data processing
clusters each of which use 50 threads - so total 100 threads to insert data into MySQL database (version
5.0.51). The issue, inserts are delayed by minute(s) and the backlog continues
to grow... After examining innodb monitor status we found long list of
transactions waiting for AUTO-INC lock: For example:
------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `alertxxxx/alertdata` trx id 0 685590474 lock mode AUTO-INC waiting ------------------ ---TRANSACTION 0 685590465, ACTIVE 10 sec, process no 8457, OS thread id 1169045824 setting auto-inc lock
Why AUTO-INC lock?
[
Warning] O_DIRECT, serializes
the writes in ext3. Howerver, impact can be lowered by using
innodb_file_per_table)
3. Innodb thread concurrency - Keep the value low for this option variable (default 8 ok), however the correct value for this variable is dependent on environment and workload. This option variable is explained MySQL Site
4. Innodb buffer pool - Innodb maintains a buffer pool for caching data and indexes in memory. Making the pool larger can improve performance by reducing the amount of disk I/O needed, here is the detail about it.
5. Innodb log file size- The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash. As of MySQL 5.5, recovery times have been greatly improved and the whole log file flushing algorithm has been improved. In 5.5 you generally want larger log files as recovery is improved.
6. Filesystem selection and disk issues - 'xfs' is known to perform very well with MySQL. Also writing the redo logs, binary logs, data files in different physical disks is a good practice with a bigger gain than server configuration. RAID 10 is recommended for best performance, more detail about disk issue can be found at MySQL Site .
------- TRX HAS BEEN WAITING 9 SEC FOR THIS LOCK TO BE GRANTED: TABLE LOCK table `alertxxxx/alertdata` trx id 0 685590474 lock mode AUTO-INC waiting ------------------ ---TRANSACTION 0 685590465, ACTIVE 10 sec, process no 8457, OS thread id 1169045824 setting auto-inc lock
Why AUTO-INC lock?
When accessing the auto-increment counter,
InnoDB engine uses a special table-level AUTO-INC lock that it keeps to the end
of the current SQL statement, not to the end of the transaction. This basically
causes all inserts into the same table to serialize. With single row inserts it
is normally not too bad but could prevent scalability with multiple threads
inserting Bug #16979. However, we can get better through put (inserts per second)
with less number of threads. So after
dropping number of threads on both clusters by 50% initially - taking it to
20-20 sessions. The problem almost disappeared and when we further reduced
number of threads to 10-10 sessions, the problem disappeared!
Beginning with MySQL 5.1.22 - new locking model introduced for handling Innodb auto-increment in InnoDB. There is a good article which talks about this MySQL Site and Blog
Similarly, if you want to achieve fast insert performance, it can be interesting to load files instead of the loading the inserts one by one : it is 3 to 4 times faster. If the goal is a huge amount of data already known at that time, it is probably the best option.
Optimization: We can’t use load files for all projects. We need to insert data row by row. For same we can optimize variables/Configuration.
1. Optimize database structure - Design your tables to minimize their space on the disk. This can result in huge improvements by reducing the amount of data written to and read from disk. Smaller tables normally require less main memory while their contents are being actively processed during query execution. Any space reduction for table data also results in smaller indexes that can be processed faster. Ensure columns have the right data types and require least amount of storage; you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. For more information about optimizing database structure click MySQL Site. If you store large strings (TEXT) or BLOB, compression may help there.
Beginning with MySQL 5.1.22 - new locking model introduced for handling Innodb auto-increment in InnoDB. There is a good article which talks about this MySQL Site and Blog
Similarly, if you want to achieve fast insert performance, it can be interesting to load files instead of the loading the inserts one by one : it is 3 to 4 times faster. If the goal is a huge amount of data already known at that time, it is probably the best option.
Optimization: We can’t use load files for all projects. We need to insert data row by row. For same we can optimize variables/Configuration.
1. Optimize database structure - Design your tables to minimize their space on the disk. This can result in huge improvements by reducing the amount of data written to and read from disk. Smaller tables normally require less main memory while their contents are being actively processed during query execution. Any space reduction for table data also results in smaller indexes that can be processed faster. Ensure columns have the right data types and require least amount of storage; you should try to use the most precise type in all cases. For example, if an integer column is used for values in the range from 1 to 99999, MEDIUMINT UNSIGNED is the best type. For more information about optimizing database structure click MySQL Site. If you store large strings (TEXT) or BLOB, compression may help there.
Use procedure analyse () function to check
optimal data type based on table . But before changing data type, check your
future plans, So it will not create a
problem in near future.
2. Innodb flush method - e.g. O_DIRECT, if used can help to
avoid double buffering between the InnoDB buffer pool and the operating
system's filesystem cache. MySQL reference manual explain this MySQL Site.
[
3. Innodb thread concurrency - Keep the value low for this option variable (default 8 ok), however the correct value for this variable is dependent on environment and workload. This option variable is explained MySQL Site
4. Innodb buffer pool - Innodb maintains a buffer pool for caching data and indexes in memory. Making the pool larger can improve performance by reducing the amount of disk I/O needed, here is the detail about it.
5. Innodb log file size- The larger the value, the less checkpoint flush activity is needed in the buffer pool, saving disk I/O. But larger log files also mean that recovery is slower in case of a crash. As of MySQL 5.5, recovery times have been greatly improved and the whole log file flushing algorithm has been improved. In 5.5 you generally want larger log files as recovery is improved.
6. Filesystem selection and disk issues - 'xfs' is known to perform very well with MySQL. Also writing the redo logs, binary logs, data files in different physical disks is a good practice with a bigger gain than server configuration. RAID 10 is recommended for best performance, more detail about disk issue can be found at MySQL Site .
No comments:
Post a Comment