(Note : This an Article from last year when MySQL5.6 was released)
While Database technology is one of the oldest branches of computer science, it remains a fundamental computer technology that continues to attract new research. The current focus of Databases technology is towards adapting hot new tends like multi-core chips, solid state devices, NOSQL and Cloud. So what does a contemporary internet developer look for in a database for the internet era? And why does MySQL remain the most popular database for the web?
For a database to be useful while developing products for the Web, the most important requirements are that it should be quick and easy to download, quick to set up, powerful enough to get the job done, be fast and flexible to use and finally be scalable on the newest hardware. Compatibility with the latest technologies like the cloud also remains foremost in the minds of developers since they need their products to be future proof.
You may notice, some of these requirements are common for any web product to be successful.
Let us go into the details of each of the above requirements, discuss their importance and look at how MySQL addresses them.
In the instant web world of today with so many technologies competing for attention, you should either be available instantly on the cloud or should be quick and intuitive to be downloaded and setup. While 4-5 years ago someone may have given an application 10 minutes to download and set up, today, expect technologies to be ready for use within three minutes. The download process needs to be fast and the setup needs to be intuitive. A person using a technology for the first time does not want to be bothered with arcane questions to set up the software. The software should have reasonable defaults which can be tweaked later as the application is found suitable and is deemed worthy of some user time. It has been our endeavor to ensure that MySQL can indeed be set up, and the first query run within three minutes.
Many users and companies are looking to use the database as a service provided by a cloud provider. MySQL is the leading database in the cloud, offered by the vast majority of cloud services providers. Almost all cloud service providers like Amazon, Rackspace, Eucalyptus and Google provide MySQL as a service. It is a part of many cloud stacks including openstack and cloudstack which are available as a service. In many cases MySQL is the underlying database for cloud services provisioning and implementation. Being the ubiquitous cloud database means that MySQL is the best fit for the cloud and also the nimblest database to take keep up with the advances in cloud technology.
This assures a new user that MySQL is an easy database to use. It gives them the confidence that there will be a return for the time spent to adapt their software and infrastructure to the requirements for MySQL.
When we talk about software being powerful there are many aspects that need to be covered.
The first requirement to be considered powerful is that all the basics must be well implemented. This creates a strong foundation on which this power is built. In terms of databases this means that all aspects of ACID (atomicity, consistency, isolation and durability) are implemented. ACID is provided in most databases by having strong transactional capabilities. This is usually achieved by implementing the 2 phase commit protocol. As an over-simplification we can say the transaction is done in two phases first it is written to a log file and then this log file is flushed to the disk.
This logging needs to happen at the lowest layer in a database called the Storage Layer.
Besides the logging, there are many crucial database level algorithms that need to be implemented in a balanced way to ensure that a Web database is able to shoulder the load that large websites expect to generate. A good transactional storage engine may need to implement its own threading, memory management and disk management
Storage layer and engines
In MySQL the storage layer of the database is abstracted. See “Storage engines Layer” in fig. MySQL Architecture. Though the default Storage engine is InnoDB there can be any number of storage engines that can be plugged in at this layer.
InnoDB is an extremely powerful transactional storage engine that has withstood the test of time by handling databases having terabytes of data and used by millions of users.
Since MySQL has an abstract layer and then the real storage engine, the logging is done at 2 levels: first at the abstract storage engine layer and then per storage engine based on the capability of the storage engine. At the abstract storage engine layer the log is called the binary log. At the InnoDB engine level, it can be oversimplified and called the redo log.
The binary log can be used to provide some level of atomicity, consistency and durability between engines but since engines are so varied it is a complex topic and beyond the scope of this article. Later, we will cover another very important feature that uses the binary log for its implementation. Observant readers might wonder about the missing I- Isolation. In a multi-user, multi-threaded database there are thousands maybe millions of transactions that are happening per minute. Isolation is a way to ensure that these transactions are isolated from each other. The database system produces an output as if the transactions are happening serially and not in parallel. There are two high level algorithms that provide isolation, either with locks or via multi-version concurrency control (MVCC). Locking restricts or manages the modification of the same data by multiple transactions. This can lead to a delay in accessing data locked by a long running transaction. MVCC produces a new version of data for each new transaction that is initiated on that data. If any transaction commits, all the other transactions having an older version of the data may need to abort or take the new data into consideration. The InnoDB engine provides Isolation by implementing MVCC.
Besides these fundamentals, the database engine should provide common database features like stored procedures and triggers and a reasonable adherence to the SQL standards.
Development and management tools
Any software is as powerful as the tools that are provided to make the software more usable.
For databases it is important that users can visually model their database using entity relationship (E-R) diagrams to manage the database objects, manage users, and have an easy visual way to modify their existing database. For large installations it is not only important to manage a single instance of the database but also be able to manage multiple database installations. DBA (database administrator) today need to know exactly what is happening with their database in real-time. Is there an installation that is stalled? Is there a user, process or query that is hogging time? An answer to such question in minutes rather than milliseconds may translate to downtime for a database, involving loss of reputation and money. If a large website goes down it is front page news.
However well the data and database is managed, database users need a way to ensure that there is insurance from disaster in the form of a backup of data.
MySQL is shipped with a modern visual tool called MySQL Workbench, which allows users to model and manage their databases as well as their users. For large installations MySQL provides a paid tool called MySQL Enterprise Monitor to manage multiple installations and look at moving graphs of ongoing database activity in the data-center. There are also other tools available from multiple vendors to manipulate and manage MySQL installations. There are multiple free and paid tools available for backing up and restoring a MySQL database.
Some of the tools mentioned above are built on a strong foundation called Performance Schema. Performance Schema is (PS) — a framework that has the meta-data about all that is happening in the database as well as constructs to allow a user to view and filter this data. Performance Schema captures information about users and queries down to threads, mutexes and their corresponding wait times and process times. It contains constructs to allow this fine grained information to be extracted using events and filters. DBAs with a strong knowledge of PS can find out if anything out of the ordinary is happening with their MySQL database. Users can set up the amount of monitoring that they need and the performance schema will populate only the requested data. Since PS is expected to generate a huge amount of data; users can limit the total data stored. The data is transiently stored in cyclic tables in a limited amount of memory.
As computer technology progresses some things change while others remain the same. The number of computer languages available to a programmer is among the things that have changed while the popularity of the old C, C++ and Java remain almost the same. Programmers need to be able to access a database using the language of their choice. The latest enhancements to the database should be made available in the language that they are using. We are fortunate that the hottest new languages want to adopt MySQL. MySQL has excellent drivers for Node.js, Ruby and GoLang. To maintain hundreds of such language connectors is a huge drain on any engineering organization. MySQL is made powerful by the number of computer languages that programmers can use to connect to it. The topmost layer in MySQL Architecture shows the client layer where the connectors exist. Whichever language you may use to write your program, chances are; there will be a connector (or client API) that allows you to connect to MySQL and exchange data with it. The MySQL community is a great source of strength in this area. There are a huge number of language connectors not only written but which continue to be maintained by the MySQL community. MySQL is very lucky to have great community developed drivers for Ruby and GoLang MySQL continues to officially maintain the ODBC, C++, Java, Net and (the most recent) Python connectors.
Any Web product benefits from user and community attention. Attention generates and maintains the cycle of adoption, growth and stability. In the connected socially networked world of today, users expect quick answers to their problems. A large user base also ensures that any problem that a user experiences, may have already been manifested and resolved. Our user community is a huge blessing for MySQL. An open and well informed MySQL user community is ready to answer any questions users may have, related to their MySQL installation. There are a multitude of forums where myriad discussions are recorded about users having difficulties, along with the solutions to those problems. The community generates ideas and sometimes code to improve MySQL, files bugs and most importantly is the sounding board that encourages MySQL developers to do better and compliments them for a job well done.
Redundancy, replication and sharding
As a database grows popular, larger and more highly trafficked websites rely on it. For large and distributed websites, high availability is extremely important. Redundancy needs to be built into all layers of Web architecture, including the database layer. MySQL provides this via MySQL Replication a way to define and manage a series of master-slave set-ups. For some large installations where the read load is much higher that the write load, it makes sense to distribute this read load to many, sometimes hundreds if not thousands; of slave replicas of the master server. MySQL replication has many options to configure these master slave set-ups to ensure that a slave automatically takes over as a master if the master fails. Replication leverages the binary log to ensure that database events are transported from the master to the slave/s database instance. MySQL provides for replication that is row or statement based, synchronous or asynchronous, instant or time delayed. The capabilities are powerful while the administration is easy using tools like Workbench and MySQL Utilities. Database replication is a vast and complex topic with a huge amount of current database research being focused on it. It’s no wonder then that the latest version of MySQL has a number of enhancements for MySQL Replication like check-sums, global transaction ID, crash safe slaves etc.
A different, sometimes complementary, approach to handling large database implementations is to partition or shard the data. A single large table could be broken up into smaller portions depending upon the commonality of the data. For example a user table could have separate partitions or shards based on the nationality of the user. This is especially useful when the most frequent queries require data from a single partition. MySQL has extensive support for table partitioning at the InnoDB storage engine level. The latest labs releases demonstrate how MySQL can be sharded across multiple machines. The entire complexity of identifying the shard on which the required data resides is the responsibility of the new sharding infrastructure called MySQL Fabric.
No real world software is complete without adequate attention to security. A database must ensure that it is secure. In the world of multi-tenancy databases special care needs to be taken to ensure that a user has sufficient privileges before any access is given to the data. MySQL implements user privileges at the database, table and column levels. There are also privileges for objects like views, procedures etc. Besides the normal create, modify and delete privileges, DBAs can also restrict the quantum of a privilege given to a user by, for example limiting the number of queries the user can run per hour or the number of connections the user may open.
MySQL also provides integration with third-party authentication modules like PAM. Recent releases give the MySQL DBA extensive password control to ensure adequate strength of passwords and expiration rules.
Let us now talk about the speed and efficiency required from a database system for the Web. The easiest way to increase the speed of access to the data is by adding indexes. Indexes should not only be created on simple columns with integers and strings but complex columns with large text fields (called full text or CLOB). The indexation needs to understand the sorting rules for the different world languages and character sets. Users with large databases need the option to create sub-indexes or secondary indexes on the primary index. Indexes should be quick to create and quick to update when new data is added. MySQL provides easy index creation and modification with some special techniques for fast index creation. The latest version of MySQL provides options to create a full-text index for all language collations. On the fly creation and deletion of indexes is also offered by the latest version of MySQL.
The speed of a query depends on the amount of data that needs to be fetched for a query, for complex queries the sequence of fetching the data may matter. The optimizer uses statistics about the data contained in the database to determine the most efficient sequence for fetching data. Optimizers are complex and constantly changing because of the complex filtering, increasing the size and wider distribution of data that needs to be gathered for the query. MySQL database’s speed is also enhanced by its versatile optimizer. MySQL is constantly working on our optimizer which continues to evolve. The latest version of the optimizer provides exponential speed improvements for several classes of queries.
When databases were first created hard disks were in vogue, and many database algorithms were based on the rotation and buffering attributes of hard disks. Databases today need to adapt to the new world of solid state devices (SSDs). Though these devices are currently expensive, a limited use of these can result in huge speed gains. Databases for the web need to be SSD aware. The newest MySQL is adapting to use SSDs. It is now possible to relocate portions of the database which are more frequently accessed to a different path. A user can now choose his log files to be on a path which points to an SSD. The same flexibility is also provided for portions (read partitions) or complete tables or table spaces.
On the hardware side, there is not only progress in terms of SSD’s but the nature of the microprocessor itself has changed. The microprocessor technology continues to increase the number of processors on the chip and multiple cores on each processor. This means that a server has multiple threads and these threads can run uninterrupted on their separate core. This has huge implications for finely tuned software like a database servers. The thread bottlenecks and points of contention are now very different from a single or dual core chips. The database therefore needs to be re-architected to take advantage of say 64 threads running at the same time. Users expect that if they spend money to upgrade from a 32 core machine to a 64 core machine, the throughput of the database should also double. The MySQL team has had to work hard on architecture to ensure that the latest MySQL scales almost linearly up to 64 cores and half a million queries per second.
Read DimitryK’s blog at http://dimitrik.free.fr/blog/index.html for benchmark numbers for MySQL
Flexibility—online schema changes
Since I already talked about scalability earlier, let me devote the final paragraphs to flexibility
The current trend of NoSQL was initiated because database users felt constrained at having to define a fixed schema before developing their application. With today’s agile models; schema changes happen more frequently and need to happen dynamically on 24×7 sites. These sites are business critical and down-times for these changes are very expensive for an Internet business. Paradoxically, increasing competition means that new changes need to be provided at the speed of the Internet, without disruption to the current production environment. Traditionally, changing database schemas was a huge investment of time for a database that did not allow dynamic schema changes. This business requirement has led to the new online schema change model which the latest MySQL 5.6 also supports. You can add and modify columns in a table on the fly. New indexes can also be created and dropped online. Relationships between tables in the form of foreign keys can also be altered while the server is running.
This is the Internet age and trends rise and get established very quickly. One of the ways for any software to provide flexibility to its users is to embrace upcoming trends. MySQL adapted to the NoSQL trend by providing their users an option to access data using either SQL or NoSQL.
The user gets the full advantage of having a strong ACID compliant underlying layer while having the flexibility of a schema-less data architecture. The protocol we have chosen to expose for NoSQL is the memcached protocol. So if you have a running application using memcached you can choose to use it with MySQL. Since the NoSQL interface connects to the lowest layer of MySQL (See MySQL Architecture), there is no SQL processing overhead from the parser/optimizer and query execution layer.
MySQL has also embraced the big data trend with the integration of Hadoop using the MySQL applier for Hadoop. This allows realtime transfer of data from MySQL to a Hadoop database. Big data is being used to crunch the huge data coming getting created on the web. This data is being generated from multiple sources including devices connected to the Web. And since the underlying database of the web is MySQL, it is important that Hadoop and MySQL talk to each other realtime.
There are multiple success stories of MySQL which you can read on the web. Would encourage you to read blogs from Twitter (new-tweets-per-second-record-and-how), Facebook (Mark Callaghan).
Staying at the forefront and remaining the most popular database is a complex and interesting challenge for the MySQL development team. We thank you dear reader for your support and attention.