Monday, August 24, 2009

MySQL Engines and Optimization

MySQL Storage Engines

Storage engines are responsible for storing and retrieving all the data stored “in” MySQL. Each storage engine has its own strengths and weaknesses that determine its suitability in a particular situation.

MyISAM

MyISAM is the default storage engine in MySQL and provides a good compromise between performance and features.

Pros:

  • platform independent

  • concurrent inserts

  • full-text indexes

  • compression

Cons:

  • no transactions

  • table-level locking

  • potentially long repair times

Good For:

  • Applications with many reads and few writes.
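As a sketch (the table and column names are illustrative, not from the original), a read-heavy MyISAM table taking advantage of full-text indexing might look like this:

```sql
-- Hypothetical article table for a read-mostly site.
-- ENGINE=MyISAM enables FULLTEXT indexing and concurrent
-- inserts at the end of the data file.
CREATE TABLE article (
    id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
    title VARCHAR(255) NOT NULL,
    body  TEXT NOT NULL,
    PRIMARY KEY (id),
    FULLTEXT KEY ft_article (title, body)
) ENGINE=MyISAM;

-- Full-text searches use the MATCH ... AGAINST syntax:
SELECT id, title
FROM article
WHERE MATCH (title, body) AGAINST ('storage engines');
```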


InnoDB

InnoDB is a transactional storage engine that uses MVCC and row-level locking, and includes automatic crash recovery.

Pros:

  • ACID transactions

  • row-level locking and MVCC

  • crash recovery

  • clustered indexes

  • foreign key constraints

Cons:

  • alterations to table structure can be slow on large tables

Good For:

  • Online ordering and other transaction-based applications.
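A minimal sketch of an ordering schema (all names are illustrative) showing the two features MyISAM lacks, foreign keys and transactions:

```sql
-- Hypothetical ordering schema; names are illustrative.
CREATE TABLE customer (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(100) NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE orders (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    total       DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (id),
    FOREIGN KEY (customer_id) REFERENCES customer (id)
) ENGINE=InnoDB;

-- Either both inserts take effect or neither does:
START TRANSACTION;
INSERT INTO customer (name) VALUES ('Alice');
INSERT INTO orders (customer_id, total)
VALUES (LAST_INSERT_ID(), 99.95);
COMMIT;
```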


Memory

Memory tables store all their data in memory which means they are very fast because there is no waiting for disk I/O. They also use hash indexes which makes them very fast for lookup queries. The table definition of a Memory table will survive a server restart, but all data will be lost.

Pros:

  • Very fast

Cons:

  • Uses fixed-length rows which can waste memory

  • Table-level locking

  • No support for TEXT or BLOB datatypes

  • No transactions

Used for:

  • Lookup or mapping tables

  • Caching results of periodically aggregated data

  • Intermediate results when analysing data
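A typical lookup table might be declared as follows (names are illustrative). HASH is the default index type for the Memory engine and is very fast for equality lookups, though useless for range scans:

```sql
-- Hypothetical country-code lookup table, kept entirely in RAM.
-- Note: VARCHAR columns are stored at their full declared length.
CREATE TABLE country_lookup (
    code CHAR(2) NOT NULL,
    name VARCHAR(64) NOT NULL,
    PRIMARY KEY USING HASH (code)
) ENGINE=MEMORY;

SELECT name FROM country_lookup WHERE code = 'NO';
```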


Archive

The Archive engine is optimised for high-speed inserts and data compression. It supports only INSERT and SELECT queries; DELETE, REPLACE and UPDATE are not supported, and neither are indexes. Rows are buffered and compressed using zlib as they are inserted, which means much less disk I/O than MyISAM tables.

Pros:

  • Fast INSERTs

  • Compression

Cons:

  • No support for indexes; every SELECT performs a full table scan.

Uses:

  • Storing large amounts of rarely accessed data in a very small footprint, e.g. logs and audit records
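An audit log is the canonical case (table and column names are illustrative):

```sql
-- Hypothetical audit log; rows are compressed with zlib on insert.
-- No indexes are allowed, so any SELECT scans the whole table.
CREATE TABLE audit_log (
    logged_at DATETIME NOT NULL,
    user_id   INT UNSIGNED NOT NULL,
    action    VARCHAR(255) NOT NULL
) ENGINE=ARCHIVE;

INSERT INTO audit_log VALUES (NOW(), 42, 'login');
```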


CSV

The CSV storage engine stores data in text files using comma-separated values. Other applications can open the table data file directly and read the contents. Likewise, if an application exports a CSV and saves it in the server’s data directory, the server can read the file straight away. CSV tables do not support indexes.

Uses:

  • Data interchange and certain types of logging.
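A sketch of a CSV table (names are illustrative). The data file in the server's data directory can be opened directly in a spreadsheet; note that CSV columns must be declared NOT NULL:

```sql
-- Hypothetical interchange table; the rows live in a plain .CSV
-- file under the server's data directory.
CREATE TABLE imports (
    imported_at DATETIME NOT NULL,
    source      VARCHAR(100) NOT NULL,
    amount      DECIMAL(10,2) NOT NULL
) ENGINE=CSV;
```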


Other Engines

There are several other storage engines available.

  • Blackhole - Essentially a no-op storage engine: all INSERTs are discarded, although they are recorded in the binary log and can be replayed on slaves.

  • Federated - Federated tables refer to tables on a remote MySQL server.

  • NDB Cluster - A specialised storage engine designed for high-performance with redundancy and load-balancing capabilities.

  • Falcon - A next-generation storage engine designed for today's hardware (64-bit CPUs and plenty of memory).

  • Maria - A replacement for MyISAM that includes transactions, row-level locking, MVCC, and better crash recovery.

Blackhole, Federated and NDB Cluster are suitable only for specific purposes and should only be used after careful consideration. Falcon and Maria are the two modern storage engines, although neither is currently considered production-stable.
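To see which engines your server actually supports, and which one is the default, ask the server directly (the table name in the second statement is illustrative):

```sql
-- List every engine this server was built with, whether it is
-- enabled, and which one is the default:
SHOW ENGINES;

-- Check which engine an existing table uses:
SHOW TABLE STATUS LIKE 'article';
```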

--------------------------------------------------------------------------------


Optimization

Optimizing for read performance

Key buffer

The key buffer stores database indexes in memory. This buffer should be large enough to hold all indexes used by eZ Publish. This should be in the range of hundreds of megabytes. Sites with large amounts of data require larger key buffers. To allocate a buffer of 500MB:

key_buffer = 500M

To find a suitable value for the key buffer, investigate the status variables Key_read_requests and Key_reads. Key_read_requests is the total number of requests to read a key block from the cache, while Key_reads shows the number of times MySQL had to access the filesystem to fetch the keys.

The lower the number of Key_reads the better. The more memory you allocate to the key buffer the more requests will be served from the cache. There will always be some keys that need to be read from disk (for example when data changes), so the value will never be zero. By comparing the two values you see the hit ratio of your key buffer. Key_read_requests should be much larger than Key_reads. 99% cached requests is a good number to aim for in a read-intensive environment.
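For example, the two counters can be read and compared like this (the figures in the comments are from a hypothetical server, for illustration only):

```sql
-- Both counters match the pattern 'Key_read%':
SHOW STATUS LIKE 'Key_read%';

-- Suppose Key_read_requests = 2400000 and Key_reads = 12000:
--   hit ratio = 1 - (Key_reads / Key_read_requests)
--             = 1 - (12000 / 2400000)
--             = 0.995
-- i.e. 99.5% of key requests were served from the key buffer.
```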

Table cache

The table cache tells MySQL how many tables it can have open at any one time. In SQL queries, several tables are typically joined. The rule of thumb is that you should multiply the maximum number of connections (described below) by the maximum number of tables used in joins. For example, if the maximum number of connections is set to 400 and queries join at most 10 tables, the table cache should be at least 400 * 10 = 4000. The configuration setting below shows a table cache of 4000:

table_cache = 4000

Sort buffers

MySQL sorts query results before they are returned. The sort buffer is per connection, so you must multiply the size of the sort buffer by the maximum number of connections to predict the server memory requirements. In our case we use a 3MB sort buffer with 400 max connections, which can use a total of 1.2GB of memory.

sort_buffer_size = 3M

Max connections

MySQL has a limit on the number of concurrent connections it can keep open. If you are using persistent connections in PHP, each process in Apache will keep a connection to MySQL open. This means that you need to set the number of max connections in MySQL equal to or greater than the number of Apache processes that can connect to the database. In a clustered environment, you must add up the processes on each webserver to determine the maximum. Setting sufficient max connections also ensures that users do not get errors about connecting to the MySQL database. The setting for 400 connections is shown below.

max_connections = 400
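As a quick sanity check (not an exact sizing method), compare the configured limit with the peak number of simultaneous connections the server has actually seen since startup:

```sql
-- Configured limit:
SHOW VARIABLES LIKE 'max_connections';

-- Highest number of connections in use at any one time since startup:
SHOW STATUS LIKE 'Max_used_connections';

-- If Max_used_connections is close to max_connections, raise the
-- limit or reduce the number of Apache processes holding
-- persistent connections.
```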

Query cache

MySQL is capable of caching the results of a query. The next time the same query is executed the result is immediately returned, as it is read from the cache rather than the database. For a read-intensive site, this can provide a significant performance improvement.

To enable the query cache, set the type to "1":

query_cache_type = 1

You can set the maximum size of each query result that can be cached. If a query result is larger than the query cache limit, it will not be cached. This is normally set to 1M:

query_cache_limit = 1M

The amount of memory globally available for query caches is set with the query cache size setting. This should be fairly large, and should be increased in size for large databases.

query_cache_size = 100M

To tune the query cache, use the SHOW STATUS command. This can be used to determine which settings need to be altered and to see the effect of alterations. The output shows whether the query cache is heavily in use and whether it has free memory, which indicates whether the query cache buffer settings should be increased or decreased.

mysql> show status like "qcache%";
+-------------------------+----------+
| Variable_name           | Value    |
+-------------------------+----------+
| Qcache_free_blocks      | 34       |
| Qcache_free_memory      | 16466312 |
| Qcache_hits             | 1313227  |
| Qcache_inserts          | 78096    |
| Qcache_lowmem_prunes    | 0        |
| Qcache_not_cached       | 3328     |
| Qcache_queries_in_cache | 140      |
| Qcache_total_blocks     | 346      |
+-------------------------+----------+
8 rows in set (0.00 sec)
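From these counters you can estimate the cache hit ratio, as a rough guide:

```sql
-- Hit ratio from the counters above:
--   Qcache_hits / (Qcache_hits + Qcache_inserts + Qcache_not_cached)
SELECT 1313227 / (1313227 + 78096 + 3328) AS hit_ratio;
-- ≈ 0.94, i.e. about 94% of cacheable SELECTs were answered from
-- the cache. Qcache_lowmem_prunes = 0 means no entries have been
-- evicted for lack of memory, so query_cache_size need not grow.
```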

Optimizing for write performance


Disable flush transaction on commit

When using InnoDB, MySQL by default flushes the log to disk each time a transaction is committed. This provides data security in case the database server crashes.

The default behaviour can be overridden with the following setting:

innodb_flush_log_at_trx_commit = 0

This setting makes MySQL flush the transaction cache every second instead of after each commit. This means transactions are not flushed to disk the moment they happen. While this improves performance, you must decide whether the risk of losing data due to a server crash is acceptable.

InnoDB buffer pool size

The InnoDB buffer pool caches table data and indexes. The larger the buffer pool, the more data can be cached and the less disk I/O is needed. The InnoDB buffer pool in MySQL is quite small by default and can be increased to as much as 70% of the available memory. ("Available memory" means the memory not used by any other application or by another buffer in MySQL.) We increase this to 700MB to increase performance.

innodb_buffer_pool_size = 700M

InnoDB additional mem pool size

The InnoDB additional mem pool is the buffer used to store internal data structures. The more tables in the database, the more memory is required. If the additional mem pool size is not large enough to store data about the InnoDB tables, MySQL will use system memory and will write warnings to the error log.

innodb_additional_mem_pool_size = 50M

Key buffer

The key buffer is a memory cache of the indexes in a MySQL database. A large key buffer means that more indexes fit in memory and thus there is a faster execution of queries using indexes. We increase this to 500MB; the default is 16MB.

key_buffer = 500M

Log buffer size

The log buffer stores the transactions in memory before they are flushed to disk. By making the log buffer size larger, MySQL can wait longer before flushing the transaction log to disk and therefore use less disk I/O. The size recommended by MySQL is between 1MB and 8MB. We used 8MB for our test, which actually made MySQL a bit slower compared to the 1MB default. Therefore, we recommend somewhere in between, for example 4MB.

innodb_log_buffer_size = 4M
