MySQL 5.7 vs. MySQL 8.0: New Features, Migration Planning, and Pre-Migration Checks

MySQL 8.0 introduces several powerful enhancements that significantly improve functionality, performance, and usability. It also makes substantial changes to the query optimizer and executor, for example hash joins (from 8.0.18) and histogram statistics. Critical databases therefore need a thorough evaluation before they are upgraded from 5.7 to 8.0.

In this article, we will explore the primary features of MySQL 8.0 and provide key guidelines for planning an upgrade from MySQL 5.7 to MySQL 8.0.

Reasons to upgrade from MySQL 5.7 to MySQL 8.0

  • An extended SQL dialect (window functions, common table expressions, JSON_TABLE) that lets data analysts write more sophisticated queries.
  • Additional features for database administrators.
  • Organizational policies that require every component to run a supported version.
  • End of life for MySQL 5.7: upstream support ended in October 2023. Cloud vendors (AWS, GCP, Azure) offer extended support for managed 5.7 instances, but only at a significantly higher price.

The most useful MySQL 8.0 features for programmers and data analysts

  1. Window Functions: MySQL 8.0 adds support for window functions. These perform a calculation across a set of rows related to the current row without collapsing those rows into a single output row, which is what GROUP BY does. They are particularly useful for cumulative sums, moving averages, running totals, and top-N-per-group queries.

Benefits of Using Window Functions

  • Efficiency: Reduces the complexity of SQL queries that require advanced data analysis, making them easier to write and often more efficient to execute.
  • Flexibility: Allows for advanced analytics right within SQL, reducing the need for external processing in applications or additional querying layers.
  • Clarity: Makes SQL queries more understandable by avoiding nested subqueries and complex joins that can obscure the intent of the query.

Example SQL with Window Functions

Here’s an example that demonstrates several window functions in action:

SELECT employee_id,
       department_id,
       salary,
       AVG(salary) OVER (PARTITION BY department_id) AS avg_department_salary,
       MAX(salary) OVER (PARTITION BY department_id) AS max_department_salary,
       ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank,
       LEAD(salary) OVER (PARTITION BY department_id ORDER BY salary DESC) AS next_lower_salary
FROM employees
ORDER BY department_id, salary DESC;

In this query:

  • AVG(salary) OVER (...) calculates the average salary per department.
  • MAX(salary) OVER (...) returns the highest salary per department.
  • ROW_NUMBER() OVER (...) ranks employees within each department by salary, highest first.
  • LEAD(salary) OVER (...) returns the salary of the next lower-paid employee in the same department (NULL for the lowest-paid one).

The same result can be obtained without window functions, but only with correlated subqueries and a separately aggregated derived table. The two versions agree exactly only when salaries within a department are distinct. The version below gives tied salaries the same rank and skips over ties when it looks for the next lower salary, whereas ROW_NUMBER() and LEAD() do not. In either case, the version without window functions is significantly more complex, harder to maintain, and harder to share among team members:

SELECT e.employee_id,
       e.department_id,
       e.salary,
       d.avg_department_salary,
       d.max_department_salary,
       e.salary_rank,
       e.next_lower_salary
FROM (SELECT e1.employee_id,
             e1.department_id,
             e1.salary,
             (SELECT COUNT(DISTINCT e2.salary)
              FROM employees e2
              WHERE e2.department_id = e1.department_id
                AND e2.salary > e1.salary) + 1 AS salary_rank,
             (SELECT e3.salary
              FROM employees e3
              WHERE e3.department_id = e1.department_id
                AND e3.salary < e1.salary
              ORDER BY e3.salary DESC
              LIMIT 1) AS next_lower_salary
      FROM employees e1) e
JOIN (SELECT department_id,
             AVG(salary) AS avg_department_salary,
             MAX(salary) AS max_department_salary
      FROM employees
      GROUP BY department_id) d ON e.department_id = d.department_id
ORDER BY e.department_id, e.salary DESC;
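
The example above does not show the running-total, moving-average, and top-N-per-group patterns mentioned earlier. The sketch below illustrates them. It assumes a MySQL 8.0 scratch schema, and the monthly_sales table and its rows are hypothetical:

```sql
-- Hypothetical sample data, for illustration only.
DROP TABLE IF EXISTS monthly_sales;
CREATE TABLE monthly_sales (
  region     VARCHAR(10)   NOT NULL,
  sale_month DATE          NOT NULL,
  amount     DECIMAL(10,2) NOT NULL
);
INSERT INTO monthly_sales VALUES
  ('north', '2024-01-01', 100), ('north', '2024-02-01', 200),
  ('north', '2024-03-01', 300), ('south', '2024-01-01',  50),
  ('south', '2024-02-01', 150), ('south', '2024-03-01', 100);

-- Running total and 3-month moving average per region. The named window w
-- is shared; the moving average adds an explicit ROWS frame to it.
SELECT region,
       sale_month,
       amount,
       SUM(amount) OVER w AS running_total,
       AVG(amount) OVER (w ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_3m
FROM monthly_sales
WINDOW w AS (PARTITION BY region ORDER BY sale_month)
ORDER BY region, sale_month;

-- Top 2 months by amount in each region (top-N per group).
SELECT region, sale_month, amount
FROM (SELECT region, sale_month, amount,
             ROW_NUMBER() OVER (PARTITION BY region ORDER BY amount DESC) AS rn
      FROM monthly_sales) AS ranked
WHERE rn <= 2
ORDER BY region, amount DESC;
```

A named WINDOW clause keeps the partitioning and ordering in one place, which helps when several functions share the same window.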
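
  2. Common Table Expressions (CTEs) are a powerful SQL feature. A CTE lets you define a named, temporary result set and then reference it within a SELECT, INSERT, UPDATE, or DELETE statement. In MySQL, the WITH clause can start a SELECT, UPDATE, or DELETE statement; for an INSERT it goes immediately before the SELECT part, as in INSERT ... WITH ... SELECT. CTEs are particularly useful for breaking complex queries into simpler parts, which makes them easier to read, maintain, and debug.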

Key Features of CTEs

  • Readability and Maintenance: CTEs make queries more readable and maintainable by isolating subqueries and named temporary results.
  • Recursion: One of the standout features of CTEs is their ability to recurse, which means they can reference themselves. This is especially useful for dealing with hierarchical or tree-structured data, such as organizational charts, categories, etc.
  • Multiple Uses in a Single Query: Once defined, a CTE can be used multiple times within the same query, which simplifies complex logic and can improve performance by avoiding redundant calculations.

CTE Example 1

A CTE is well suited to tasks where readability and maintainability matter most, particularly complex queries with several stages of aggregation or transformation. For example, in a telecommunications company, the marketing department plans to offer a special monthly bonus to five distinct customer groups, defined below. The groups may change from month to month based on the marketing department's insights and analyses:

  1. The 1,000 longest-standing customers (earliest sign-up date).
  2. The 500 most recently joined customers.
  3. The first 50 customers whose last name starts with "A".
  4. The first 100 customers who were born in the current month and are not in any of the previous groups.
  5. The first 1,000 customers who have never had a rejected payment and are not in any of the previous groups.

The resulting customer_id values should be inserted each month into a table named customers_bonus_table:

-- In MySQL, the WITH clause of an INSERT goes right before its SELECT.
-- "First" in groups 3-5 is taken to mean earliest creating_date.
INSERT INTO customers_bonus_table (customer_id, month)
WITH
  -- Group 1: the 1,000 longest-standing customers
  CustomersGroup1 AS (
    SELECT customer_id FROM customers
    ORDER BY creating_date ASC LIMIT 1000),
  -- Group 2: the 500 most recently joined customers
  CustomersGroup2 AS (
    SELECT customer_id FROM customers
    ORDER BY creating_date DESC LIMIT 500),
  -- Group 3: the first 50 customers whose last name starts with A
  CustomersGroup3 AS (
    SELECT customer_id FROM customers
    WHERE customer_last_name LIKE 'A%'
    ORDER BY creating_date ASC LIMIT 50),
  -- Group 4 (LIMIT stays outside the IN subquery, where MySQL disallows it)
  CustomersGroup4 AS (
    SELECT customer_id FROM customers
    WHERE MONTH(customer_birth_date) = MONTH(CURRENT_DATE)
      AND customer_id NOT IN (SELECT customer_id FROM CustomersGroup1
                              UNION SELECT customer_id FROM CustomersGroup2
                              UNION SELECT customer_id FROM CustomersGroup3)
    ORDER BY creating_date ASC LIMIT 100),
  -- Group 5: no rejected payments and not in groups 1-4
  CustomersGroup5 AS (
    SELECT customer_id FROM customers
    WHERE customer_id NOT IN (SELECT customer_id FROM customers_rejected_payment
                              UNION SELECT customer_id FROM CustomersGroup1
                              UNION SELECT customer_id FROM CustomersGroup2
                              UNION SELECT customer_id FROM CustomersGroup3
                              UNION SELECT customer_id FROM CustomersGroup4)
    ORDER BY creating_date ASC LIMIT 1000),
  -- UNION removes customers who qualify for more than one group
  CustomersTotal AS (
    SELECT customer_id FROM CustomersGroup1
    UNION SELECT customer_id FROM CustomersGroup2
    UNION SELECT customer_id FROM CustomersGroup3
    UNION SELECT customer_id FROM CustomersGroup4
    UNION SELECT customer_id FROM CustomersGroup5)
SELECT customer_id, MONTH(CURRENT_DATE)
FROM CustomersTotal;

In this example, it is readily apparent which part of the query corresponds to each customer group. This clarity makes the query easy to update each month, for example by adding or removing customer groups. It also makes it easier to hand the monthly maintenance over to other team members.

CTE Example 2

CTEs are also highly beneficial when five primary tables, each with its own indexes, must be joined in a particular way to perform well. In such cases, a database administrator (DBA) might write the first part of the CTE, using joins and filters that make the best use of those indexes. Data analysts or programmers can then tailor the second part to their specific requirements:

  • Customers (customer_id, customer_name, region_id, customer_status…)
  • Orders (order_id, customer_id, employee_id, order_date, ship_id, order_country, order_shipping_date…)
  • Order_Details (order_id, product_id, quantity, price…)
  • Products (product_id, product_name, supplier_id, category_id…)
  • Employees (employee_id, employee_name, department_id…)

The first part of the CTE is written by the DBA with performance in mind. It assumes that the `customers` table has an index on the `customer_status` column and that the `orders` table is indexed on `order_country` and `order_shipping_date`. Filtering on these indexed columns is intended to minimize the number of rows carried from one join to the next, and thereby improve performance. The MySQL optimizer still chooses the final join order itself, so the benefit comes mainly from the filters rather than from the order in which the joins are written:

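WITH CustomersOrdersActiveCTE AS (
  -- Part 1 (DBA): the columns are explicit because c.*, o.*, ... would
  -- duplicate names such as customer_id, which a CTE does not allow.
  SELECT c.customer_id, c.customer_name, o.order_id, o.order_date,
         od.product_id, od.quantity, od.price, p.product_name, e.employee_name
  FROM Customers c
  JOIN Orders o ON c.customer_id = o.customer_id
  JOIN Order_Details od ON o.order_id = od.order_id
  JOIN Products p ON od.product_id = p.product_id
  JOIN Employees e ON o.employee_id = e.employee_id
  WHERE c.customer_status = 'active'
    AND o.order_country IS NOT NULL
    AND o.order_shipping_date > CURRENT_DATE - INTERVAL 1 YEAR)
-- Part 2 (analyst), a hypothetical example: revenue per employee.
SELECT employee_name, SUM(quantity * price) AS revenue
FROM CustomersOrdersActiveCTE
GROUP BY employee_name;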

In this example, having a database expert write the first part of the CTE helps keep queries on critical, large tables consistent and efficient across the organization.
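
Neither example above uses recursion, which was listed earlier as one of the standout CTE features. The sketch below uses a recursive CTE to walk an organizational chart. The org_chart table and its rows are hypothetical:

```sql
-- Hypothetical org chart: each employee points to a manager.
DROP TABLE IF EXISTS org_chart;
CREATE TABLE org_chart (
  employee_id   INT PRIMARY KEY,
  employee_name VARCHAR(50) NOT NULL,
  manager_id    INT NULL
);
INSERT INTO org_chart VALUES
  (1, 'CEO', NULL), (2, 'CTO', 1), (3, 'CFO', 1),
  (4, 'DBA Lead', 2), (5, 'DBA', 4);

WITH RECURSIVE reports AS (
  -- Anchor member: the root of the tree.
  SELECT employee_id, employee_name, 0 AS depth,
         CAST(employee_name AS CHAR(500)) AS path
  FROM org_chart
  WHERE manager_id IS NULL
  UNION ALL
  -- Recursive member: direct reports of the rows found so far.
  SELECT o.employee_id, o.employee_name, r.depth + 1,
         CONCAT(r.path, ' > ', o.employee_name)
  FROM org_chart o
  JOIN reports r ON o.manager_id = r.employee_id
)
SELECT employee_id, depth, path
FROM reports
ORDER BY path;
```

The CAST in the anchor member matters. MySQL derives the column types of a recursive CTE from the anchor member alone, so without the CAST the path column would be too narrow for the longer values the recursive member produces. Recursion depth is capped by the cte_max_recursion_depth system variable (1000 by default), which guards against runaway queries on cyclic data.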

  3. JSON Table Functions: MySQL 8.0 improves JSON support. It adds the JSON_TABLE function, which lets you query JSON data as if it were a relational table and simplifies combining JSON data with relational queries.
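
The sketch below shows JSON_TABLE against a hypothetical orders_json table, whose payload column holds a list of order lines. Each array element becomes one row. JSON_TABLE can also reference columns of tables listed earlier in the same FROM clause:

```sql
-- Hypothetical table with a JSON column.
DROP TABLE IF EXISTS orders_json;
CREATE TABLE orders_json (order_id INT PRIMARY KEY, payload JSON NOT NULL);
INSERT INTO orders_json VALUES
  (1, '{"lines": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}]}'),
  (2, '{"lines": [{"sku": "C-300", "qty": 5}]}');

-- One output row per element of $.lines, joined to its parent order.
SELECT o.order_id, jt.line_no, jt.sku, jt.qty
FROM orders_json AS o,
     JSON_TABLE(o.payload, '$.lines[*]' COLUMNS (
       line_no FOR ORDINALITY,
       sku     VARCHAR(20) PATH '$.sku',
       qty     INT         PATH '$.qty'
     )) AS jt
ORDER BY o.order_id, jt.line_no;
```

FOR ORDINALITY numbers the array elements, which helps preserve their original order.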

The most useful MySQL 8.0 features for DBAs

  1. Descending Indexes: MySQL 8.0 honors DESC in index definitions, whereas MySQL 5.7 parsed the keyword but ignored it. Queries that sort in descending or mixed order, such as ORDER BY a ASC, b DESC, can therefore be served directly by an index.
  2. Invisible Indexes: These allow indexes to be "invisible" to the optimizer. This is useful for testing the potential impact of removing an index before actually deleting it, aiding in performance tuning.
  3. Roles: MySQL 8.0 introduced the concept of roles, which are named collections of privileges. Roles make it easier to manage privileges for users, simplifying database security administration.
  4. Histograms: The optimizer can use histogram statistics, created with ANALYZE TABLE ... UPDATE HISTOGRAM, to estimate the value distribution of columns, primarily non-indexed ones. Better estimates can lead to better execution plans.
  5. Performance Schema and sys Schema Enhancements: The Performance Schema and the sys schema (already bundled with MySQL 5.7) gain enhancements that improve monitoring, such as the data_locks and data_lock_waits tables for InnoDB lock analysis. These provide better insight into database performance and help with troubleshooting.
  6. Atomic DDL Statements: The new transactional data dictionary makes DDL operations such as CREATE TABLE and ALTER TABLE on InnoDB tables atomic. A schema change either completes entirely or is rolled back, which prevents partially applied changes.
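
The sketch below shows the syntax for several of the features above on a hypothetical events_demo table. The role names, user name, password, and app_db schema are placeholders:

```sql
DROP TABLE IF EXISTS events_demo;
CREATE TABLE events_demo (
  id         INT AUTO_INCREMENT PRIMARY KEY,
  user_id    INT         NOT NULL,
  created_at DATETIME    NOT NULL,
  status     VARCHAR(20) NOT NULL,
  country    CHAR(2)     NOT NULL
);
INSERT INTO events_demo (user_id, created_at, status, country) VALUES
  (1, '2024-01-01 10:00:00', 'new',  'US'),
  (1, '2024-01-02 11:00:00', 'done', 'US'),
  (2, '2024-01-03 12:00:00', 'new',  'DE');

-- 1. Descending index: can serve ORDER BY user_id, created_at DESC directly.
CREATE INDEX idx_user_created ON events_demo (user_id ASC, created_at DESC);

-- 2. Invisible index: still maintained on writes but ignored by the
--    optimizer, so the effect of dropping it can be tested and undone.
CREATE INDEX idx_status ON events_demo (status);
ALTER TABLE events_demo ALTER INDEX idx_status INVISIBLE;
-- To undo: ALTER TABLE events_demo ALTER INDEX idx_status VISIBLE;

-- 3. Roles: grant privileges to named roles, then roles to users.
CREATE ROLE IF NOT EXISTS 'app_read', 'app_write';
GRANT SELECT ON app_db.* TO 'app_read';
GRANT INSERT, UPDATE, DELETE ON app_db.* TO 'app_write';
CREATE USER IF NOT EXISTS 'report_user'@'%' IDENTIFIED BY 'Change_me_123!';
GRANT 'app_read' TO 'report_user'@'%';
SET DEFAULT ROLE 'app_read' TO 'report_user'@'%';

-- 4. Histogram on a non-indexed column, stored in the data dictionary.
ANALYZE TABLE events_demo UPDATE HISTOGRAM ON country WITH 16 BUCKETS;
SELECT column_name,
       histogram->>'$."histogram-type"' AS histogram_type
FROM information_schema.column_statistics
WHERE schema_name = DATABASE() AND table_name = 'events_demo';
```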

Upgrading a production-critical database from MySQL 5.7 to MySQL 8.0

Upgrading from MySQL 5.7 to MySQL 8.0 is a significant step that affects many aspects of the database system, including syntax, features, performance, and security. To ensure a smooth transition, several critical technical areas must be checked and prepared thoroughly. The organizational and human aspects must be addressed as well.

We have prepared a list of organizational and technical prerequisites for a smooth migration. It is based on real-world experience upgrading hundreds of RDS DB instances across several organizations.

Organizational Preparation

  • Advance Planning: Start planning the upgrade at least one year in advance. Inform all team and group leaders about the upcoming shift to v8.0, emphasizing its significance, potential risks, and the essential role of their collaboration.
  • Understanding Changes and Risks: Acknowledge that while MySQL 8.0 offers numerous new features, it also changes the optimizer significantly. This could cause serious performance regressions for some queries after the upgrade.
  • Risk of No Easy Rollback: MySQL does not support downgrading from 8.0 to 5.7. Rolling back means restoring a 5.7 backup or reloading a logical dump. That requires downtime and can lose any writes made after the upgrade.
  • Extended Testing Period: Allocate a 3-6 month period prior to the production upgrade to thoroughly test everything. This is critical to ensure a smooth transition.
  • Load Testing Tools: Building or adopting tools that can apply production-level load to a test database is crucial. This helps simulate real user activity and identify potential issues before going live.
  • Organizational Communication: Communicate the details of the upgrade to your organization several months in advance. Gaining the full support of development teams is essential, as their involvement is critical for thorough testing and verification.

These strategic steps are designed to mitigate risks and ensure that the upgrade process is managed effectively, supporting a seamless transition to MySQL 8.0.

Technical Preparation

1. Compatibility Checking

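  • Storage Engines and InnoDB Enhancements: In MySQL 8.0 the data dictionary is stored in InnoDB. Native partitioning is supported only by InnoDB (and NDB), so partitioned MyISAM tables must be converted before the upgrade. MyISAM remains available for regular tables but is non-transactional, so review any remaining MyISAM tables. InnoDB also gains enhancements such as instant ADD COLUMN and, from 8.0.29, instant DROP COLUMN.
  • Character Sets: The default character set changed from latin1 to utf8mb4 in MySQL 8.0, and the default collation is utf8mb4_0900_ai_ci. Ensure that your application supports this, and consider converting existing data to utf8mb4 if it is not already. Also watch for collation-mismatch errors when new tables that use the new default collation are joined with older ones.
  • Stored Procedures and Functions: Test all stored procedures, functions, triggers, and views to ensure they behave as expected under the new version. Pay particular attention to code affected by syntax changes: new reserved words (for example, RANK and GROUPS), removed SQL modes (such as NO_AUTO_CREATE_USER), and the removal of implicit sorting and ASC/DESC qualifiers for GROUP BY.
  • It is highly recommended to copy the production database to a staging environment and upgrade that copy to 8.0. Then check for errors and verify all system functionality with the help of the development teams.
  • Every reported error, such as a removed feature or a newly reserved keyword, must be resolved before you proceed.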
  • For AWS MySQL RDS, when transitioning to version 8.0, it is crucial to establish a new parameter group specifically for v8.0. Notably, the parameter group for v8.0 contains additional parameters compared to v5.7. Therefore, carefully assess the non-default settings configured in the v5.7 parameter group and apply the necessary adjustments to the new v8.0 parameter group to ensure optimal performance and compatibility.
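
Before you run a full upgrade on staging, a few lightweight checks on the 5.7 server can surface common blockers early. The queries below are a partial, hand-written subset, and the reserved-word list in particular is incomplete. Oracle's MySQL Shell upgrade checker utility, util.checkForServerUpgrade(), performs a far more complete set of checks and is worth running as well:

```sql
-- Run on the 5.7 server (or a staging copy of it).

-- 1) Partitioned tables not using InnoDB or NDB: MySQL 8.0 supports
--    partitioning only in those engines, so convert these first.
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE create_options LIKE '%partitioned%'
  AND engine NOT IN ('InnoDB', 'ndbcluster')
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- 2) Other non-InnoDB base tables (for example MyISAM) to review.
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND engine <> 'InnoDB'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- 3) Character columns not yet using utf8mb4.
SELECT table_schema, table_name, column_name, character_set_name, collation_name
FROM information_schema.columns
WHERE character_set_name IS NOT NULL
  AND character_set_name <> 'utf8mb4'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- 4) Columns named after words reserved in 8.0 (partial list); every
--    query that uses them will need backtick quoting.
SELECT table_schema, table_name, column_name
FROM information_schema.columns
WHERE UPPER(column_name) IN ('RANK', 'GROUPS', 'WINDOW', 'RECURSIVE', 'ROW_NUMBER',
                             'LEAD', 'LAG', 'OVER', 'SYSTEM')
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys');

-- 5) Returns 1 while the global sql_mode still contains NO_AUTO_CREATE_USER,
--    a mode that no longer exists in 8.0.
SELECT @@GLOBAL.sql_mode LIKE '%NO_AUTO_CREATE_USER%' AS uses_removed_sql_mode;
```

Queries 1 and 4 point to objects that must be changed before the upgrade. The results of queries 2 and 3 call for a case-by-case decision. Because the default 5.7 sql_mode includes NO_AUTO_CREATE_USER, query 5 returns 1 on an unmodified 5.7 configuration.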

2. Performance Considerations

  • Optimizer Changes: MySQL 8.0 includes improvements in the optimizer which may change the performance characteristics of your queries. Run thorough tests to check the performance implications.
  • Indexing: With the introduction of descending indexes and other indexing improvements, review your existing indexes to see if any optimizations are applicable.
  • Configuration Options: Many system variables and options have been added, deprecated, or removed in MySQL 8.0. Review and adjust the MySQL configuration to optimize performance.
  • Heavy Queries: Analyze the heaviest queries on the production database. As far as possible, avoid frequent heavy queries on large tables that have no suitable index. Fix all major performance issues before the upgrade, by adding an index or optimizing the query, because the differences between the two optimizers are hard to predict.
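
One way to capture the heaviest queries before the upgrade is to snapshot performance_schema.events_statements_summary_by_digest. This requires the Performance Schema to be enabled, which on RDS is a parameter-group setting. The sketch below stores a baseline in a hypothetical dba_tools schema. Running the same INSERT on the upgraded 8.0 server after replaying comparable load allows a side-by-side comparison. Digest hashes are computed differently in 8.0, so the comparison joins on the normalized statement text instead, and some statements may still fail to match:

```sql
-- Hypothetical schema and table for upgrade baselines.
CREATE DATABASE IF NOT EXISTS dba_tools;
CREATE TABLE IF NOT EXISTS dba_tools.query_baseline (
  captured_at    DATETIME      NOT NULL,
  server_version VARCHAR(64)   NOT NULL,
  schema_name    VARCHAR(64),
  digest_text    LONGTEXT,
  executions     BIGINT UNSIGNED,
  total_seconds  DECIMAL(20,3),
  avg_ms         DECIMAL(20,3),
  rows_examined  BIGINT UNSIGNED,
  no_index_used  BIGINT UNSIGNED
);

-- Top 20 statements by total execution time (timers are in picoseconds).
-- Optionally reset the counters before a test run with:
--   TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
INSERT INTO dba_tools.query_baseline
SELECT NOW(), VERSION(), schema_name, digest_text, count_star,
       sum_timer_wait / 1e12, avg_timer_wait / 1e9,
       sum_rows_examined, sum_no_index_used
FROM performance_schema.events_statements_summary_by_digest
WHERE digest_text IS NOT NULL
ORDER BY sum_timer_wait DESC
LIMIT 20;

-- After the upgrade: statements whose average latency grew the most.
SELECT b57.digest_text,
       b57.avg_ms AS avg_ms_57,
       b80.avg_ms AS avg_ms_80
FROM dba_tools.query_baseline AS b57
JOIN dba_tools.query_baseline AS b80 ON b80.digest_text = b57.digest_text
WHERE b57.server_version LIKE '5.7%'
  AND b80.server_version LIKE '8.0%'
ORDER BY b80.avg_ms / NULLIF(b57.avg_ms, 0) DESC;
```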

3. Testing

  • Upgrade Testing: Perform a full upgrade test in a staging environment that mirrors your production environment as closely as possible. This includes testing all application functionalities.
  • Rollback Plan: Always prepare a rollback plan in case the upgrade faces unforeseen problems. Ensure that backups are tested and can be restored, even though it is not easy to plan a rollback.
  • Functional testing should be done by the most experienced members of the development teams, who know how the application is expected to behave.
  • Rehearse exactly how you will upgrade the production database. One option is an in-place upgrade during planned downtime. Measure how long each step takes (stopping traffic, taking a backup, upgrading the production database), and make sure AWS Support is available for a quick response if anything goes wrong. Another option is the AWS Blue/Green Deployments feature. If you provision infrastructure with IaC, note that a Blue/Green deployment leaves you with a new DB instance that has a different resource ID, so make sure the change is reflected across all your IaC assets.

4. Data Backup

  • Complete Backups: Ensure that complete and reliable backups are available before starting the upgrade. This provides a fallback option if the upgrade encounters issues.
  • Log Files: Backup all log files that might help in diagnosing problems that occur during the upgrade.

5. Security Enhancements

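  • Authentication: MySQL 8.0 makes caching_sha2_password the default authentication plugin, replacing mysql_native_password. Older client libraries and drivers that do not support it cannot connect to accounts that use it, so ensure that your applications and drivers are compatible.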
  • Privileges and Roles: Review changes in the handling of privileges and roles. Implement roles for better security management if not already in use.
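
A quick way to review authentication after the upgrade is to list the plugin each account uses. If an application's driver cannot yet use caching_sha2_password, its account can be switched to mysql_native_password as a temporary bridge. That plugin is itself deprecated as of 8.0.34, so this is only a stopgap while drivers are updated. The legacy_app account and its password below are placeholders:

```sql
-- Authentication plugin per account.
SELECT user, host, plugin
FROM mysql.user
ORDER BY plugin, user;

-- Plugin used for new accounts (caching_sha2_password unless configured otherwise).
SELECT @@GLOBAL.default_authentication_plugin;

-- Stopgap for a legacy driver; placeholder account and password.
CREATE USER IF NOT EXISTS 'legacy_app'@'%'
  IDENTIFIED WITH mysql_native_password BY 'Replace_me_123!';
-- For an existing account:
-- ALTER USER 'legacy_app'@'%' IDENTIFIED WITH mysql_native_password BY 'Replace_me_123!';
```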

6. Replication

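  • Replication Formats: If you use replication, ensure that all replication partners are compatible with MySQL 8.0. Upgrade replicas before the source, because replication from a newer source to an older replica is not supported. Consider the impact of replication format changes and of new features such as binary log encryption.
  • CDC Consistency: Verify that the GTID (Global Transaction Identifier) and binary log settings remain compatible with your change data capture (CDC) consumers, and that those consumers resume correctly after the upgrade.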
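
The queries below give a quick view of the replication and binary log configuration that replicas and CDC consumers depend on. Run them on the upgraded 8.0 server and compare the results with the equivalent 5.7 settings. CDC tools such as Debezium generally expect row-based logging with full row images, but check your tool's documentation for its exact requirements:

```sql
-- Binary log and GTID settings (binlog_expire_logs_seconds is 8.0-only).
SELECT @@GLOBAL.log_bin                    AS log_bin,
       @@GLOBAL.binlog_format              AS binlog_format,
       @@GLOBAL.binlog_row_image           AS binlog_row_image,
       @@GLOBAL.gtid_mode                  AS gtid_mode,
       @@GLOBAL.enforce_gtid_consistency   AS enforce_gtid_consistency,
       @@GLOBAL.binlog_expire_logs_seconds AS binlog_expire_logs_seconds;

-- GTIDs executed so far; replicas and CDC consumers resume from this set.
SELECT @@GLOBAL.gtid_executed;

-- Replication status on a replica (8.0.22+ syntax; earlier: SHOW SLAVE STATUS).
SHOW REPLICA STATUS;

-- On Amazon RDS, binlog retention is managed with stored procedures instead:
-- CALL mysql.rds_show_configuration;
-- CALL mysql.rds_set_configuration('binlog retention hours', 24);
```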

7. Third-Party Tools and Integration

  • Compatibility Checks: Verify that all connected applications, third-party plugins, and tools are compatible with MySQL 8.0. This includes ORMs, monitoring tools, and backup solutions.
  • Driver Updates: Update the database drivers used by your applications (for example, JDBC and ODBC drivers) to versions that support MySQL 8.0.

8. System Variables and Configuration

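  • System Variable Changes: Review the changes in system variables. Some variables were removed in 8.0. For example, innodb_large_prefix and innodb_file_format (both deprecated in 5.7) no longer exist, and neither do the query cache variables such as query_cache_size. The server will not start if removed variables remain in its configuration file.
  • Tuning and Optimization: Adjust and tune the variables introduced or changed in MySQL 8.0, such as redo log sizing (innodb_redo_log_capacity, from 8.0.30) and the undo tablespace settings (for example, innodb_undo_directory).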
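
The sketch below can help with this review. On 8.0, performance_schema.variables_info records where each variable's value came from. Listing everything that is not at its compiled-in default gives the set of settings to carry over or re-tune. On 5.7, the second query shows the current values of a few variables that were removed in 8.0 (a partial list). If any of them is set explicitly in my.cnf or in the parameter group, remove it before upgrading:

```sql
-- On 8.0: variables not at their compiled-in defaults, and where they were set
-- (configuration file, command line, SET PERSIST, and so on).
SELECT v.variable_name, v.variable_source, v.variable_path, g.variable_value
FROM performance_schema.variables_info AS v
JOIN performance_schema.global_variables AS g
  ON g.variable_name = v.variable_name
WHERE v.variable_source <> 'COMPILED'
ORDER BY v.variable_name;

-- On 5.7: current values of some variables that no longer exist in 8.0
-- (on 8.0 this returns an empty result).
SHOW GLOBAL VARIABLES
WHERE Variable_name IN ('query_cache_type', 'query_cache_size',
                        'innodb_large_prefix', 'innodb_file_format',
                        'tx_isolation');
```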

Upgrading to MySQL 8.0 can significantly enhance database functionality and performance but requires careful planning and testing to avoid disruption. Ensure thorough documentation is available for every step of the upgrade process to facilitate troubleshooting and potential rollback.

Production Traffic Testing In Detail

Testing production-level traffic on a test environment is crucial for ensuring that your application can handle real-world usage scenarios without performance degradation or downtime. This process involves replicating the production traffic patterns and load in a controlled environment to validate stability, performance, scalability, and potentially discover any issues that might not surface under normal development testing conditions. Here’s a step-by-step guide on how to effectively simulate production-level traffic:

1. Understanding Production Traffic

Capture Traffic Patterns:

  • Use tools like Wireshark, tcpdump, or application logs to capture real traffic data on your production servers.
  • Analyze API call frequencies, user interactions, and data inputs to understand typical and peak load patterns.
  • If possible, enable the general query log on the database to capture, in chronological order, every query that reaches it. Because it records every statement, keep the capture window short to limit the overhead.
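
On a self-managed server, a capture along the lines of the sketch below writes the general log to the mysql.general_log table for the capture window, then reads the queries back in order. On Amazon RDS, SET GLOBAL is not permitted, so general_log and log_output are set in the DB parameter group instead:

```sql
-- Start capturing every statement into mysql.general_log.
SET GLOBAL log_output  = 'TABLE';
SET GLOBAL general_log = 'ON';

-- ... let production traffic run for a short capture window ...

-- Stop capturing.
SET GLOBAL general_log = 'OFF';

-- Client queries in chronological order, ready to export for replay.
SELECT event_time,
       thread_id,
       CONVERT(argument USING utf8mb4) AS query_text
FROM mysql.general_log
WHERE command_type = 'Query'
ORDER BY event_time, thread_id;

-- Clear the captured data once it has been exported
-- (on RDS: CALL mysql.rds_rotate_general_log;).
TRUNCATE TABLE mysql.general_log;
```

The log_output variable also controls where the slow query log is written, so restore its original value after the capture.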

Key Metrics to Consider:

  • Requests per second (RPS)
  • Concurrent users/sessions
  • Data throughput (e.g., MB/s)
  • Peak vs. average load
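
At the database level, several of these metrics can be derived from the server's status counters. The sketch below samples the counters ten seconds apart. Questions counts statements sent by clients, including the sampling statements themselves, so it serves as an approximation of requests per second. The sketch uses performance_schema.global_status, which is available in 8.0 and in 5.7 when show_compatibility_56 is OFF (the 5.7 default):

```sql
-- First sample of the cumulative counters.
SELECT SUM(IF(variable_name = 'Questions',  CAST(variable_value AS UNSIGNED), 0)),
       SUM(IF(variable_name = 'Bytes_sent', CAST(variable_value AS UNSIGNED), 0))
INTO @q1, @b1
FROM performance_schema.global_status
WHERE variable_name IN ('Questions', 'Bytes_sent');

DO SLEEP(10);

-- Second sample.
SELECT SUM(IF(variable_name = 'Questions',  CAST(variable_value AS UNSIGNED), 0)),
       SUM(IF(variable_name = 'Bytes_sent', CAST(variable_value AS UNSIGNED), 0))
INTO @q2, @b2
FROM performance_schema.global_status
WHERE variable_name IN ('Questions', 'Bytes_sent');

-- Rates over the 10-second interval plus current session counts.
SELECT (@q2 - @q1) / 10               AS queries_per_second,
       (@b2 - @b1) / 10 / 1024 / 1024 AS mb_sent_per_second,
       (SELECT variable_value FROM performance_schema.global_status
         WHERE variable_name = 'Threads_connected') AS connected_sessions,
       (SELECT variable_value FROM performance_schema.global_status
         WHERE variable_name = 'Threads_running')   AS running_sessions;
```

Running the same sampling during peak and off-peak periods gives the peak-versus-average comparison listed above.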

2. Environment Setup

Replicate Production Environment:

  • Hardware and software configuration (CPU, RAM, network latency and throughput, database servers, etc.) should mirror the production environment as closely as possible.
  • Use the same deployment configurations and networking topology (load balancers, firewalls, etc.).

Tooling for Load Testing:

  • JMeter: Useful for web applications to simulate heavy loads and analyze performance under different conditions.
  • Rapydo SQL Runner: Replays large logs of queries against a database from one or multiple clients.
  • Gatling: A powerful tool for simulating complex user behaviors and modern web applications.
  • Locust: An easy-to-use, scriptable tool for testing websites with Python, ideal for developers familiar with Python.
  • Taurus: Provides an abstraction layer over JMeter, Gatling, and other tools, simplifying configuration and execution of tests.

3. Test Data Management

Data Mirroring:

  • Use data anonymization or data masking techniques to create a production-like dataset while ensuring privacy and compliance.
  • The test database should be populated to reflect the scale of production data, maintaining the same data distribution and volume.
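
The sketch below shows one way to mask data on the staging copy. It assumes the customers table and columns used in CTE Example 1. Last names are replaced with salted-hash pseudonyms that keep the first letter, and birth dates are shuffled within the same month. Filters such as LIKE 'A%' and MONTH(customer_birth_date) therefore still see a realistic distribution. The salt value is a placeholder, and this is an illustration rather than a complete anonymization policy:

```sql
-- Run ONLY on the staging copy, never on production.
-- Placeholder secret; the salt prevents reversing pseudonyms with a name list.
SET @mask_salt = 'replace-with-a-long-random-secret';

-- Demo only: a minimal table so the sketch also runs on an empty schema.
CREATE TABLE IF NOT EXISTS customers (
  customer_id         INT PRIMARY KEY,
  customer_last_name  VARCHAR(100),
  customer_birth_date DATE
);

-- Keep the first letter; replace the rest with 10 hex characters of a salted SHA-256.
UPDATE customers
SET customer_last_name =
      CONCAT(LEFT(customer_last_name, 1),
             LEFT(SHA2(CONCAT(@mask_salt, customer_last_name), 256), 10));

-- Keep year and month; randomize the day (1-28 is valid in every month).
UPDATE customers
SET customer_birth_date =
      DATE_FORMAT(customer_birth_date, '%Y-%m-01') + INTERVAL FLOOR(RAND() * 28) DAY
WHERE customer_birth_date IS NOT NULL;
```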

4. Test Execution

Load Testing:

  • Gradually increase the load to identify the point of failure or performance degradation.
  • Simulate both average and peak loads.

Stress Testing:

  • Push the system beyond normal operational capacity to see how it handles extreme stress.
  • Helps in identifying the breaking point and how the system recovers from failure.

Soak Testing:

  • Run the system at high load for a prolonged period (hours or days) to identify performance degradation over time or memory leaks.

Realistic Scenarios:

  • Simulate user behavior as realistically as possible, including think times, session lengths, and navigation patterns.
  • Use randomized inputs to simulate a variety of actions.

5. Monitoring and Analysis

Setup Monitoring Tools:

  • Implement monitoring on the test environment similar to production (using tools like Prometheus, Grafana, ELK stack, etc.).
  • Monitor CPU, memory, database performance, response times, error rates, and throughput.

Analyze Results:

  • Collect and analyze metrics to identify bottlenecks, errors, and performance issues.
  • Use A/B testing comparisons if applicable to compare different configurations or software versions.

6. Iterative Improvement

Feedback Loop:

  • Use the insights gained from each test to modify application configurations, scale resources, optimize code, and improve database queries.
  • Repeat tests to validate changes and improvements.

7. Documentation

Record Findings:

  • Document test plans, configurations, results, and insights comprehensively.
  • Ensure that learning is accessible and actionable for future testing cycles and deployments.

Conclusion

By thoroughly testing with production-level traffic in a controlled test environment, you can significantly minimize risks associated with deployments and ensure that your application performs well under real-world conditions. This proactive approach is essential for maintaining reliability, performance, and user satisfaction.

Summary 

In this article, we explored the key features of MySQL 8.0, including examples of how these enhancements can benefit users and DB platform management. We also discussed the primary motivations for upgrading from MySQL 5.7 to 8.0 and outlined essential strategies for successfully transitioning to the new version in production environments with minimal risk.
