Home News Feeds Planet MySQL
Newsfeeds
Planet MySQL
Planet MySQL - http://www.planetmysql.org/

  • Joins: inner, outer, left, right
    In (My)SQL, join is a means for combining records from two tables into a single set which can be either returned as is or used in another join. In order to perform the operation a join has to define the relationship between records in either table, as well as the way it will evaluate the relationship. The relationship itself is created through a set of conditions that are part of the join and usually are put inside ON clause. The rest is determined through a join type, which can either be an inner join or an outer join. The SQL clauses that set the respective join type in a query are [INNER] JOIN and {LEFT | RIGHT} [OUTER] JOIN. As you can see the actual keywords INNER and OUTER are optional and can be omitted, however outer joins require specifying the direction – either left or right. Examples of queries: SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.name = 1; SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1; Inner join, outer join The primary difference between the two basic types (each has several subtypes) is in making the decision whether joining of two rows was successful or not, which essentially determines whether the combined row can be returned or not. Inner joins require that a row from the first table has a match in the second table based on the join conditions. In means that the first query from the example above will only return any rows if files table contains at least one record where owner_id is 1 (has to be equal to users.id by the join conditions and users.id is filtered in WHERE to accept only that one value). Otherwise it will return no rows at all, even if users contains a valid user record. Assuming there are two users, but only one has any files: mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 1; +----+--------------+---------+----+----------+------------------+ | id | name | enabled | id | owner_id | filename | +----+--------------+---------+----+----------+------------------+ | 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc | +----+--------------+---------+----+----------+------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM users JOIN files ON files.owner_id = users.id WHERE users.id = 2; Empty set (0.01 sec) Outer joins, on the other hand, consider a join successful even if no records from the second table meet the join conditions (i.e. whether there are any matches or not). In such case outer join sets all values in the missing columns to NULL. The second query from the example will return rows whenever there are matches in users and regardless of the contents of files table. mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 1; +----+--------------+---------+------+----------+------------------+ | id | name | enabled | id | owner_id | filename | +----+--------------+---------+------+----------+------------------+ | 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc | +----+--------------+---------+------+----------+------------------+ 1 row in set (0.00 sec) mysql> SELECT * FROM users LEFT JOIN files ON files.owner_id = users.id WHERE users.id = 2; +----+-------------------+---------+------+----------+----------+ | id | name | enabled | id | owner_id | filename | +----+-------------------+---------+------+----------+----------+ | 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL | +----+-------------------+---------+------+----------+----------+ 1 row in set (0.00 sec) Left join, right join Unlike inner joins, outer joins require that the join direction is specified. Inner join is a symmetrical and bi-directional relationship, which means A JOIN B produces the same result as B JOIN A. That is not true for outer joins, because they accept when for a record in A there is no matching record in B and in such case the reverse operation is impossible (it would have to start with the non-existing record in B). This is the reason why setting the direction is necessary. A LEFT JOIN B finds matches for rows from table A in table B, while A RIGHT JOIN B finds matches for records from B in A. In practice there is very little or even no real purpose for using RIGHT JOIN and in majority of cases everyone just sticks to using LEFT JOIN if they ever need outer join. When does the join type matter? Choosing the appropriate type depends on the logic you are trying to implement. You have to use inner join when mandatory pieces of information are located in both tables and partial information is considered incomplete or even useless. The case of this could be listing user’s files based on the earlier example: mysql> SELECT * -> FROM users -> JOIN files -> ON files.owner_id = users.id -> WHERE users.name = 'Nadzieja Surowiec' -> AND users.enabled = 1; Empty set (0.00 sec) The query finds the user’s record by in users table and verifies that they are allowed to use the service through the value of users.enabled column and then searches for their files in files table. If there are no matches in either table, the query does not return any result, which is the correct behavior. If outer join was used in this case, a useless partial result could be returned or even incorrect result: mysql> SELECT * -> FROM users -> LEFT JOIN files -> ON files.owner_id = users.id -> WHERE users.name = 'Nadzieja Surowiec' -> AND users.enabled = 1; +----+-------------------+---------+------+----------+----------+ | id | name | enabled | id | owner_id | filename | +----+-------------------+---------+------+----------+----------+ | 2 | Nadzieja Surowiec | 1 | NULL | NULL | NULL | +----+-------------------+---------+------+----------+----------+ 1 row in set (0.00 sec) mysql> SELECT COUNT(1) -> FROM users -> LEFT JOIN files -> ON files.owner_id = users.id -> WHERE users.name = 'Nadzieja Surowiec' -> AND users.enabled = 1; +----------+ | COUNT(1) | +----------+ | 1 | +----------+ 1 row in set (0.00 sec) The application relying on such queries not only would not be able to make anything out of such file information where file data is all set to NULL values, but also it would have to include additional and in fact redundant logic to filter out such results. The row count in this case is correct, as the query returned a single row, but it does not represent how many files the user has, so it is not a valid information that the application could use. Outer join must be used to perform a join with a table, which holds information that is only optional for the result. In our example we are working with a query that lists user’s files and we already established that the join between users and files has to be inner join. But let’s give our users the opportunity to choose a custom icon for any file if they want to. The information could be kept in a separate table called file_icon. Now, for each listed file we also want to see if user has set a custom icon for that file and return the icon name if they have set it. The icon information is entirely optional, so we want the query to return rows regardless of whether there is an entry for the given file in file_icon or not. Therefore we have to use outer join for this particular task. mysql> SELECT * -> FROM users -> JOIN files -> ON files.owner_id = users.id -> LEFT JOIN file_icon -> ON file_icon.file_id = files.id -> WHERE users.name = 'Albin Kolano' -> AND users.enabled = 1; +----+--------------+---------+----+----------+------------------+---------+------------------+ | id | name | enabled | id | owner_id | filename | file_id | icon_image | +----+--------------+---------+----+----------+------------------+---------+------------------+ | 1 | Albin Kolano | 1 | 1 | 1 | audit report.doc | 1 | MS-Word-Icon.png | | 1 | Albin Kolano | 1 | 2 | 1 | stats-201104.xls | NULL | NULL | +----+--------------+---------+----+----------+------------------+---------+------------------+ The outer join allowed us to grab the complete list of user’s files and along with their icons if any were set. If we used inner join instead, the result would be missing the record of stats-201104.xls file.

  • Training in London next week
    I’m going to deliver MySQL Training next week (May 21-24) in London. This is a rare opportunity as I do not personally deliver a lot of Training, especially outside of US. There are still some places left if you want to sign up. You will also get a signed copy of High Performance MySQL 3rd edition as an attendee.

  • MySQL Workbench 5.2.40 GA Released
    The MySQL Developer Tools team is announcing the next maintenance release of it’s flagship product, MySQL Workbench, version 5.2.40. This version contains more than 28 bug fixes applied over version 5.2.39. MySQL Workbench 5.2 GA • Data Modeling • Query • Administration Please get your copy from our Download site. Sources and binary packages are available for several platforms, including Windows, Mac OS X and Linux. http://dev.mysql.com/downloads/workbench/ Workbench Documentation can be found here. http://dev.mysql.com/doc/workbench/en/index.html Utilities Documentation can be found here.http://dev.mysql.com/doc/workbench/en/mysql-utilities.html In addition to the new Query/SQL Development and Administration modules, version 5.2 features improved stability and performance – especially in Windows, where OpenGL support has been enhanced and the UI was optimized to offer better responsiveness. This release also includes improvements to the scripting capabilities of the SQL Editor. You can read more about it in http://wb.mysql.com/workbench/doc/ For a detailed list of resolved issues, see the change log. http://dev.mysql.com/doc/workbench/en/wb-change-history.html If you need any additional info or help please get in touch with us. Post in our forums or leave comments on our blog pages. - The MySQL Workbench Team

  • MySQL Backup & Recovery Essentials
    Download PDF Presentation A hardware, software or human failure can occur at any time. Are you prepared? Many organizations take a risk of serious data loss and system downtime with inadequate procedures in place to support a disaster recovery. This presentation covers the essentials of MySQL backup and recovery options, identifying the necessary tools for an effective strategy to support data resilience and business continuity for your organization. MySQL has no one single unbreakable backup solution, so it is important to understand the impact of MySQL replication, storage engines, configuration options for durability, hardware configuration and the impact on locking and uptime for the various hot/warm/cold options available. Short Url: http://j.mp/EM-BandR Presenter: Ronald Bradford Schedule: RMOUG QEW – May 2012- Denver, Colorado

  • Benchmarking single-row insert performance on Amazon EC2
    I have been working for a customer benchmarking insert performance on Amazon EC2, and I have some interesting results that I wanted to share. I used a nice and effective tool iiBench which has been developed by Tokutek. Though the “1 billion row insert challenge” for which this tool was originally built is long over, but still the tool serves well for benchmark purposes. OK, let’s start off with the configuration details. Configuration First of all let me describe the EC2 instance type that I used. EC2 Configuration I chose m2.4xlarge instance as that’s the instance type with highest memory available, and memory is what really really matters. High-Memory Quadruple Extra Large Instance 68.4 GB of memory 26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each) 1690 GB of instance storage 64-bit platform I/O Performance: High API name: m2.4xlarge As for the IO configuration I chose 8 x 200G EBS volumes in software RAID 10. Now let’s come to the MySQL configuration. MySQL Configuration I used Percona Server 5.5.22-55 for the tests. Following is the configuration that I used: ## InnoDB options innodb_buffer_pool_size = 55G innodb_log_file_size = 1G innodb_log_files_in_group = 4 innodb_buffer_pool_instances = 4 innodb_adaptive_flushing = 1 innodb_adaptive_flushing_method = estimate innodb_flush_log_at_trx_commit = 2 innodb_flush_method = O_DIRECT innodb_max_dirty_pages_pct = 50 innodb_io_capacity = 800 innodb_read_io_threads = 8 innodb_write_io_threads = 4 innodb_file_per_table = 1 ## Disabling query cache query_cache_size = 0 query_cache_type = 0 You can see that the buffer pool is sized at 55G and I am using 4 buffer pool instances to reduce the contention caused by buffer pool mutexes. Another important configuration that I am using is that I am using “estimate” flushing method available only on Percona Server. The “estimate” method reduces the impact of traditional InnoDB log flushing, which can cause downward spikes in performance. Other then that, I have also disabled query cache to avoid contention caused by query cache on write heavy workload. OK, so that was all about the configuration of the EC2 instance and MySQL. Now as far as the benchmark itself is concerned, I made no code changes to iiBench, and used the version available here. But I changed the table to use range partitioning. I defined a partitioning scheme such that every partition would hold 100 million rows. Table Structure The table structure of the table with no secondary indexes is as follows: CREATE TABLE `purchases_noindex` ( `transactionid` int(11) NOT NULL AUTO_INCREMENT, `dateandtime` datetime DEFAULT NULL, `cashregisterid` int(11) NOT NULL, `customerid` int(11) NOT NULL, `productid` int(11) NOT NULL, `price` float NOT NULL, PRIMARY KEY (`transactionid`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (transactionid) (PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB, PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB, PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB, PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB, PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB, PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB, PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB, PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB, PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB, PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ While the structure of the table with secondary indexes is as follows: CREATE TABLE `purchases_index` ( `transactionid` int(11) NOT NULL AUTO_INCREMENT, `dateandtime` datetime DEFAULT NULL, `cashregisterid` int(11) NOT NULL, `customerid` int(11) NOT NULL, `productid` int(11) NOT NULL, `price` float NOT NULL, PRIMARY KEY (`transactionid`), KEY `marketsegment` (`price`,`customerid`), KEY `registersegment` (`cashregisterid`,`price`,`customerid`), KEY `pdc` (`price`,`dateandtime`,`customerid`) ) ENGINE=InnoDB AUTO_INCREMENT=11073789 DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (transactionid) (PARTITION p0 VALUES LESS THAN (100000000) ENGINE = InnoDB, PARTITION p1 VALUES LESS THAN (200000000) ENGINE = InnoDB, PARTITION p2 VALUES LESS THAN (300000000) ENGINE = InnoDB, PARTITION p3 VALUES LESS THAN (400000000) ENGINE = InnoDB, PARTITION p4 VALUES LESS THAN (500000000) ENGINE = InnoDB, PARTITION p5 VALUES LESS THAN (600000000) ENGINE = InnoDB, PARTITION p6 VALUES LESS THAN (700000000) ENGINE = InnoDB, PARTITION p7 VALUES LESS THAN (800000000) ENGINE = InnoDB, PARTITION p8 VALUES LESS THAN (900000000) ENGINE = InnoDB, PARTITION p9 VALUES LESS THAN (1000000000) ENGINE = InnoDB, PARTITION p10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */ Also, I ran 5 instances of iiBench simultaneously to simulate 5 concurrent connections writing to the table, with each instance of iiBench writing 200 million single row inserts, for a total of 1 billion rows. I ran the test both with the table purchases_noindex which has no secondary index and only a primary index, and against the table purchases_index which has 3 secondary indexes. Another thing I would like to share is that, the size of the table without secondary indexes is 56G while the size of the table with secondary indexes is 181G. Now let’s come down to the interesting part. Results With the table purchases_noindex, that has no secondary indexes, I was able to achieve an avg. insert rate of ~25k INSERTs Per Second, while with the table purchases_index, the avg. insert rate reduced to ~9k INSERTs Per Second. Let’s take a look at the graphs have a better view of the whole picture. Note, in the above graph, we have “millions of rows” on the x-axis and “INSERTs Per Second” on the y-axis. The reason why I have chosen to show “millions of rows” on the x-axis so that we can see the impact of growth in data-set on the insert rate. We can see that adding the secondary indexes to the table has decreased the insert rate by 3x, and its not even consistent. While with the table having no secondary indexes, you can see that the insert rate is pretty much constant remaining between ~25k to ~26k INSERTs Per Second. But on the other hand, with the table having secondary indexes, we can see that there are regular spikes in the insert rate, and the variation in the rate can be classified as large, because it varies between ~6.5k to ~12.5k INSERTs per second, with noticeable spikes after every 100 million rows inserted. I noticed that the insert rate drop was mainly caused by IO pressure caused by increase in flushing and checkpointing activity. This caused spikes in write activity to the point that the insert rate was decreased. Conclusion As we all now there are pros and cons to using secondary indexes. While secondary indexes cause read performance to improve, but they have an impact on the write performance. Well most of the apps rely on read performance and hence having secondary indexes is an obvious choice. But for those applications that are write mostly or that rely a lot on write performance, reducing the no. of secondary indexes or even going away with secondary indexes causes a write throughput increase of 2x to 3x. In this particular case, since I was mostly concerned with write performance, so I went ahead to choose a table structure with no secondary indexes. Other important things to consider when you are concerned with write performance is using partitioning to reduce the size of the B+tree, having multiple buffer pool instances to reduce contention problems caused by buffer pool mutexes, using “estimate” checkpoint method to reduce chances of log flush storms and disabling the query cache.

Banner
Copyright © 2012 Joomfish Demo. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.