Technical Introduction to MemSQL
Whitepaper | 33 pages
TECHNICAL WHITEPAPER - Introduction to MemSQL 2019
Table of Contents

Introduction
Data Platform Landscape
How MemSQL Modernizes Data Platforms
Core MemSQL Technical Concepts
  Code Generation and Compiled Plans
  Lock-Free data structures
  Multi-Version Concurrency Control
  Disk-optimized columnstores and memory-optimized rowstore tables
  Distributed Query Processing
Core Architecture
  Key Components of a MemSQL Cluster
  Database Partitions and Sharding
  Sharded and Reference tables
  Parallel Data Ingest with Pipelines
Cluster Management
  Dynamic Cluster resizing
  Data Replication
  Feedback-driven Workload Manager
MemSQL Security
MemSQL Studio
Cloud-native Support and Managed Service
MemSQL Innovation History
Conclusion
Appendix: MemSQL Capability Checklist
Introduction

This whitepaper introduces MemSQL as a modern data platform with the speed, scale, and cost efficiencies to generate business value and insights from operational data. You’ll learn how MemSQL is designed and built to ingest data at high speeds, scale out efficiently, and deliver record-breaking query performance with familiar relational SQL.

Data Platform Landscape

Historically, databases fit into one of two categories: those optimized for online transactional processing (OLTP) and those optimized for online analytical processing (OLAP). Transactional systems are traditionally separate from analytical processing systems, even though they could potentially be combined into a single system.

Transaction systems are often revenue generating, have strict availability requirements, and are viewed as mission critical. OLAP workloads running on data warehouse systems analyze large amounts of data and require substantial computing resources. They are not generally as mission critical. Combining data warehouse and transaction systems in a single database generally causes the transaction workload to suffer, which affects the business adversely. Hence, separation of the two has become standard.

A third type of data platform, called an operational data store (ODS), is used to support operational analytics. This data platform gives the business near real-time visibility into rapidly changing events, such as orders and customer interactions. The success of an operational data store depends on its ability to handle streaming data ingest alongside a high number of concurrent analytical queries.
An ODS receives transactions from an OLTP system in a minimally intrusive manner, using techniques such as extract, transform, and load (ETL) or change data capture. It also serves as a source for the data warehouse. Traditionally, an ODS would serve as an up-to-date replica of operational data for a variety of analytic requirements. However, over the years, more data sources have driven a massive increase in data volume and velocity, making it difficult for legacy ODS technologies to keep up.

Figure 1. Legacy multi-tiered ODS architecture
How MemSQL Modernizes Data Platforms

First released in 2011, MemSQL is a third-generation RDBMS written in C/C++. MemSQL is designed to run efficiently on modern systems: both multi-core systems with a big memory footprint and lower-powered edge computing devices. MemSQL is ANSI SQL-compatible and natively supports structured, semi-structured, and unstructured (full-text search) data.

With built-in connectors to Kafka, Spark, S3, and Hadoop, as well as legacy transactional systems, MemSQL easily integrates with a broad ecosystem to cover both real-time streaming and batch workloads.

Figure 2. Modernizing legacy data platforms with MemSQL
With support for JSON and documents, MemSQL can ingest data from modern sources such as mobile phones, social media, and smart devices, and provide both transactional and analytical capabilities on a single platform with the ease and familiarity of SQL.

With MemSQL, all enterprise features, such as partitioning, security, high availability (HA), and disaster recovery (DR), are included in the product and not licensed separately. Because MemSQL is a modern and efficient platform, workloads often run on less hardware than they would on legacy systems. Cost savings are thus realized through reduced software spend, the reduction of data silos through decoupling transactions from analytic systems, and lower deployment and maintenance costs in comparison to legacy vendors.

MemSQL can run legacy transactional workloads and serve as the ODS or data hub that acts as an operational analytic backbone to power real-time decisions across reports, interactive dashboards, data science, and more.
Core MemSQL Technical Concepts

This section covers the key technical concepts that underpin MemSQL’s architecture. These technical differences distinguish MemSQL from other solutions and make it a solution of choice for customers.

Code Generation and Compiled Plans

When a query is submitted to a database, the query is interpreted, an execution plan is generated, and the query is executed. Optimizing this process is critical for performance. MemSQL embeds an industrial compiler (LLVM) for low-level optimizations along hot code paths - optimizations that are not possible when executing via interpretation. This approach also takes advantage of newer instruction sets that are available on modern CPUs.

When a MemSQL server encounters a query shape for the first time, it generates a just-in-time execution plan, written in C++, which is incrementally compiled to machine code as it processes the query. This has a two-fold benefit: good performance for first-run queries and extremely fast response times for repeated queries. Each compiled plan is cached to prepare for future invocations of the given query. When future queries match an existing parameterized query plan template, MemSQL bypasses code generation and executes the query immediately using the cached plan.
Figure 3. Compiled plans in MemSQL utilizing and bypassing the code generator

After code generation, the compiled plans are saved for later use in a plan cache. Each MemSQL node has its own plans and plan cache. A plan cache consists of two layers: the in-memory plan cache and the on-disk plan cache. Plans stored in the in-memory plan cache remain there until they expire or until the MemSQL node restarts. When a plan expires from memory, it remains in the on-disk plan cache and is loaded back into memory the next time the query is executed.

By compiling SQL statements into query plans rather than interpreting them, MemSQL removes interpretation overhead and minimizes code execution paths.

Another key feature of MemSQL’s compiled plans is that they do not pre-specify values for the parameters. Query parameters are dynamically extracted from a query template, producing a normalized query that is then transformed into a specialized native representation (MemSQL Plan Language, or MPL). The generated execution plan is written in C++ and compiled to machine code. When queries that match the query template are executed, MemSQL substitutes the parameter values, allowing the request to reuse already-compiled plans and run quickly. Additionally, compiled plans are reused across server restarts, so they need to be compiled only once in an application’s lifetime.
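The parameterization and two-layer caching described above can be illustrated with a small Python sketch. It is not MemSQL code: the regular expression, the `parameterize` helper, and the `PlanCache` class are hypothetical names invented for illustration, and the "compile" step is a placeholder string rather than real code generation.

```python
import re

# Hypothetical helper: match quoted-string and numeric literals so that
# queries differing only in constants normalize to the same shape.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Return (query_shape, parameters) for a SQL string."""
    params = []

    def _swap(match):
        params.append(match.group(0))
        return "?"

    shape = _LITERAL.sub(_swap, sql)
    return shape, params


class PlanCache:
    """Toy two-layer cache: an in-memory dict backed by an 'on-disk' dict."""

    def __init__(self):
        self.memory = {}
        self.disk = {}          # stands in for the on-disk plan cache
        self.compilations = 0   # counts how often code generation ran

    def _compile(self, shape):
        # Placeholder for code generation; a real compiler emits machine code.
        self.compilations += 1
        return "plan for: " + shape

    def get_plan(self, sql):
        shape, params = parameterize(sql)
        if shape not in self.memory:
            if shape not in self.disk:
                self.disk[shape] = self._compile(shape)
            # Reload from the persistent layer instead of recompiling.
            self.memory[shape] = self.disk[shape]
        return self.memory[shape], params

    def expire_memory(self):
        # Simulates plan expiry or a node restart: only memory is cleared.
        self.memory.clear()
```

In this sketch, two queries that differ only in their literal values map to one shape, so the second call skips compilation, and clearing the in-memory layer does not force a recompile because the plan is still present in the persistent layer.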
Lock-Free data structures

Traditional databases use locks (latches and enqueues) to manage serialization, and they run into issues such as deadlocks or priority inversion when processes block each other as they complete execution. This results in performance and scalability issues with increasing volumes of concurrent read and write operations.

Lock-free data structures that enable better scalability and performance are at the core of MemSQL’s engine. Every component of the engine is built using lock-free data structures, including queues, stacks, hash tables, skip lists, and linked lists. For superior memory management and efficient management of transaction state, lock-free queues and stacks are used throughout the system. In the area of code generation, lock-free hash tables are used to map query shapes to the compiled plans in the plan cache.

MemSQL also implements lock-free skip lists and hash tables that are kept in memory for fast, random access to data. Lock-free skip lists are the primary index-backing data structure in MemSQL. Compared to B-trees for disk-based databases, skip lists perform extremely well in memory and under high concurrency, thereby delivering better scalability.

Figure 4. Skiplist index
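To make the skip list idea concrete, here is a minimal single-threaded sketch in Python. It is deliberately not lock-free (a lock-free version relies on atomic compare-and-swap operations that this sketch does not attempt), and the class and method names are hypothetical. It only shows why a skip list supports fast ordered lookups and range scans: higher levels skip over many nodes, and the bottom level is a sorted linked list.

```python
import random


class _Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.forward = [None] * height  # next node at each level


class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_HEIGHT)
        self._height = 1

    def _random_height(self):
        # Each extra level is kept with probability 1/2.
        height = 1
        while height < self.MAX_HEIGHT and self._rng.random() < 0.5:
            height += 1
        return height

    def _find_predecessors(self, key):
        # For every level, find the last node whose key is < key.
        update = [self._head] * self.MAX_HEIGHT
        node = self._head
        for level in range(self._height - 1, -1, -1):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        height = self._random_height()
        self._height = max(self._height, height)
        node = _Node(key, value, height)
        for level in range(height):
            node.forward[level] = update[level].forward[level]
            update[level].forward[level] = node

    def get(self, key, default=None):
        candidate = self._find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def scan(self, low, high):
        # Ordered range scan along the bottom level, bounds inclusive.
        node = self._find_predecessors(low)[0].forward[0]
        while node is not None and node.key <= high:
            yield node.key, node.value
            node = node.forward[0]
```

Because an insert only rewires a handful of forward pointers, the structure lends itself to fine-grained atomic updates, which is one commonly cited reason skip lists are favored over B-trees for concurrent in-memory indexes.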
Multi-Version Concurrency Control

MemSQL favors parallelism and uses multi-version concurrency control (MVCC) to prevent queries from blocking each other in multi-threaded applications. Today’s real-time applications - especially those with high volumes of streaming data, or mixed read and write workloads - cannot tolerate the performance loss that comes with database locking. By having a concept of versioning, and preserving older versions, MVCC allows the database to reduce the number of read-write conflicts among operations.

Versions in MemSQL are implemented as a lock-free linked list. Each time a transaction modifies a row, MemSQL creates a new version that sits on top of the existing one. The new version is visible only to the transaction that performed the write - when accessing the same row, read queries see the old version. Modified rows are queued for garbage collection “behind the scenes,” so that old versions are efficiently cleaned up without the need for a full-table scan. MVCC delivers efficiency and consistency across transactions.

A lock in MemSQL only occurs in the case of a write-write conflict on the same row. MemSQL takes a row-level lock in this case because it’s easier to program around - the alternative would be to fail the second transaction, which requires the programmer to resolve the failure.

Implementing lock-free data structures with MVCC enables MemSQL to avoid locking on both reads and writes when updating tables. As a result, writes can operate at greater throughput, while a large number of concurrent reads happen simultaneously. Since reads and writes never block one another, this minimizes query stalls and allows for greater parallelism - concurrent threads can modify the same object, and even if one thread stalls or stops in the middle of an operation, the remaining threads can carry on processing data.
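The version-chain idea can be sketched in a few lines of Python. This is a simplified, single-threaded model rather than MemSQL's implementation: it assumes every write commits immediately and receives a monotonically increasing commit timestamp, and it assumes readers carry a snapshot timestamp. The names `Version`, `VersionedRow`, and the module-level clock are hypothetical.

```python
import itertools

# Hypothetical global source of commit timestamps.
_commit_clock = itertools.count(1)


class Version:
    def __init__(self, value, commit_ts, older):
        self.value = value
        self.commit_ts = commit_ts
        self.older = older  # link to the previous version of the row


class VersionedRow:
    def __init__(self, value):
        self.head = Version(value, next(_commit_clock), None)

    def write(self, value):
        """Stack a new version on top of the chain; return its timestamp."""
        self.head = Version(value, next(_commit_clock), self.head)
        return self.head.commit_ts

    def read(self, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        version = self.head
        while version is not None and version.commit_ts > snapshot_ts:
            version = version.older
        return None if version is None else version.value

    def collect_garbage(self, oldest_active_ts):
        """Unlink versions that no active snapshot can still see."""
        version = self.head
        while version is not None and version.commit_ts > oldest_active_ts:
            version = version.older
        if version is not None:
            version.older = None
```

A reader holding an older snapshot keeps seeing the old value after a write, without waiting on the writer, and garbage collection only trims the tail of one row's chain instead of scanning the table.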
Multi-version concurrency control summary:
● Every write creates a new version of a row
● Commits are atomic
● Old versions are garbage-collected
● Reads are never blocked
● Row-level locking applies to writes, including DELETE
● Allows for online ALTER TABLE

Figure 5. MVCC control summary

Disk-optimized columnstores and memory-optimized rowstore tables

MemSQL supports storing and processing data using two types of data stores: a completely in-memory rowstore and a disk-backed columnstore. Rowstores and columnstores differ both in storage format (row vs. column) and in storage medium (RAM vs. disk). MemSQL allows querying rowstore and columnstore data together in the same query.

The rowstore is typically used for highly concurrent online transaction processing (OLTP) and mixed OLTP/analytical workloads. The whole data set is kept in memory, providing fast writes and supporting thousands of concurrent queries. Rowstores use a transaction log to avoid disk I/O bottlenecks on writes. Transactions are committed to disk as logs and periodically compressed as snapshots of the entire database.
Figure 6. MemSQL designed for durability

To restore a database, MemSQL loads the most recent snapshot and replays the remaining transactions from the log. The granularity of logging and the frequency of snapshots are both configurable. Because MemSQL only writes logs and snapshots to disk, all disk I/O is sequential. In-memory writes are serialized into a transaction buffer. A background process pulls groups of transactions and persists them to disk.

MemSQL’s disk-based columnstore features up to 80% compression and is capable of storing petabytes of data. Columnstores are optimized for complex queries over large data sets that don’t fit in memory. The user determines whether tables are stored as rowstores or columnstores at table definition time.

Figure 7. Rowstores and columnstores in MemSQL
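The snapshot-plus-log recovery scheme described for the rowstore can be sketched as follows. This is an illustrative toy under stated assumptions, not MemSQL's on-disk format: the log and snapshot are held in Python objects that stand in for sequential files, records are JSON strings, and the `DurableStore` class is a hypothetical name.

```python
import json


class DurableStore:
    """Toy key-value store with an append-only log and periodic snapshots."""

    def __init__(self):
        self.data = {}
        self.log = []         # stands in for the sequential on-disk log
        self.snapshot = "{}"  # stands in for the latest on-disk snapshot

    def put(self, key, value):
        # Append to the log first, then apply the change in memory.
        self.log.append(json.dumps({"op": "put", "key": key, "value": value}))
        self.data[key] = value

    def delete(self, key):
        self.log.append(json.dumps({"op": "delete", "key": key}))
        self.data.pop(key, None)

    def take_snapshot(self):
        # Log entries taken before the snapshot are no longer needed.
        self.snapshot = json.dumps(self.data)
        self.log.clear()

    def recover(self):
        """Rebuild state: load the snapshot, then replay the remaining log."""
        state = json.loads(self.snapshot)
        for line in self.log:
            record = json.loads(line)
            if record["op"] == "put":
                state[record["key"]] = record["value"]
            else:
                state.pop(record["key"], None)
        return state
```

Both the log appends and the snapshot write are sequential, which matches the document's point that this design avoids random disk I/O on the write path.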
MemSQL supports indexing on rowstore tables (skiplist and hash indexes) and columnstore tables (hash indexes) to efficiently retrieve rows as needed.

Distributed Query Processing

MemSQL supports fast distributed query processing, with a query optimizer that is fully aware of data distribution and a query execution system that takes advantage of compilation and vectorization, achieving 10X to 100X performance gains. The following diagram illustrates MemSQL query processing at a high level.

Figure 8. Query optimization

Query Optimization

The MemSQL Query Optimizer uses search and heuristics, driven by cost models based on statistics, to find high-quality distributed query execution plans. The optimizer is fully aware of data distribution and can use broadcast, shuffle, local-global aggregation, semi-join reduction, and co-located join operations to solve queries with limited and judicious use of data movement across the cluster. Statistics and summary information available to the optimizer include distinct count information, histograms, and high-quality random samples of data from both rowstore and columnstore tables.

The MemSQL Query Optimizer is a modular component in the database engine. The optimizer framework is divided into three major modules:
(1) Rewriter: The Rewriter applies SQL-to-SQL rewrites on the query. Depending on the characteristics of the query and the rewrite itself, the Rewriter decides whether to apply the rewrite using heuristics or cost, the cost being the distributed cost of running the query. The Rewriter intelligently applies certain rewrites in a top-down fashion while applying others in a bottom-up manner, and also interleaves rewrites that can mutually benefit from each other.

(2) Enumerator: The Enumerator is a central component of the optimizer, which determines the distributed join order and data movement decisions as well as local join order and access path selection. It considers a wide search space of various execution alternatives and selects the best plan, based on the cost models of the database operations and the network data movement operations. The Enumerator is also invoked by the Rewriter to cost transformed queries when the Rewriter wants to perform a cost-based query rewrite.

(3) Planner: The Planner converts the chosen logical execution plan to a sequence of distributed query and data movement operations. The Planner uses SQL extensions called RemoteTables and ResultTables to represent a series of data movement operations and local SQL operations using a SQL-like syntax and interface, making it easy to understand, flexible, and extensible.

Query Execution

MemSQL query execution technology tends to be superior overall to query execution technology in legacy database systems, sometimes by a factor of 10 or more. MemSQL's query execution technology is thus often a motivating factor for moving applications to MemSQL to get lower TCO, a better user experience, or to enable applications that were not feasible before.

MemSQL parameterizes queries, compiles them, and stores them in a plan cache. On subsequent executions, MemSQL takes a query plan from the cache and runs it, so it need not be compiled again.
Unlike established database products, MemSQL compilation translates a query all the way to machine code. This, combined with in-memory rowstore storage structures designed with code generation in mind, allows query processing rates on the order of 20 million rows per second per core against an in-memory skip list rowstore table. In many cases, that is about 10X faster than the per-core processing rate for scans of the B-tree indexes found in most legacy databases.

For columnstore tables, MemSQL uses a high-performance vectorized query execution engine that operates very efficiently on blocks of 4K rows at a time. This vectorized execution engine also makes use of single-instruction, multiple-data (SIMD) instructions available on Intel and compatible processors that support the AVX-2 instruction set. Processing rates on columnstore tables are often over 100 million rows per second per core, and sometimes as much as 2 billion rows per second per core when using SIMD and operations on encoded (compressed) data.

MemSQL also supports high-performance data movement for broadcast and shuffle operations. This implementation gets its speed by sending data over the wire in its native in-memory format, so it does not have to be serialized on the sending side or deserialized on the receiving side. Rather, it can be operated on directly after it is received, saving CPU instructions and thus total execution time.

Query Processing Summary

Together, compiled query plans, flexible storage options, lock-free data structures, MVCC, and a mature optimizer allow for better performance on multiple cores with high data accessibility, even under high concurrency. This is part of the "secret sauce" that makes MemSQL different from other data platforms.
Core Architecture

MemSQL utilizes a distributed, shared-nothing architecture that runs on a cluster of servers and leverages memory and disk infrastructure for high throughput on concurrent workloads. No two nodes in a MemSQL cluster share CPU, memory, or disk.

Figure 9. MemSQL architecture

Our architecture is built for horizontal scalability on commodity hardware, in your data center or in the cloud. MemSQL enables high performance and fault tolerance on large data sets and high-velocity data.
Key Components of a MemSQL Cluster

As shown in Figure 9, a MemSQL cluster consists of aggregator nodes and leaf nodes. The aggregator serves as a query interceptor and router, manages cluster metadata, and is responsible for cluster monitoring and failover. A leaf node is a MemSQL server instance that stores data and executes queries issued by the aggregator.

In typical deployments, the aggregator-to-leaf node ratio is generally 1:5. Increasing the number of aggregators can improve operations like data loading and can allow MemSQL to process more client requests concurrently. Applications serving many clients have a higher aggregator-to-leaf ratio, and those with more demanding storage requirements need more leaves per aggregator.

Client applications connect to an aggregator, which serves as the query router in the cluster. When the client sends a SQL query, the aggregator parses, compiles, and distributes the query across the leaf nodes. On the leaf node, MemSQL may further optimize the query as needed and execute it on the local store of data. This allows MemSQL to maintain high query performance even with rapidly changing data. The leaf nodes quickly compute the query results and send them back to the aggregator. The aggregator then aggregates the results from each leaf and sends the final result back to the client.

Figure 10. Massively parallel processing for query execution
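The aggregator and leaf roles described above follow a scatter-gather pattern. The sketch below models it in Python for an AVG query, using local-global aggregation: each leaf computes a partial (sum, count) over its own rows, and the aggregator merges the partials. Threads stand in for leaf nodes, and the function names are hypothetical; none of this is MemSQL code.

```python
from concurrent.futures import ThreadPoolExecutor


def leaf_partial_avg(rows):
    """Runs on a 'leaf': return a partial (sum, count) for its local rows."""
    return sum(rows), len(rows)


def aggregator_avg(partitions):
    """Runs on the 'aggregator': fan out to leaves, then merge partials."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(leaf_partial_avg, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    # AVG over zero rows has no value, mirroring SQL's NULL result.
    return total / count if count else None
```

Note that the leaves return (sum, count) pairs rather than local averages. Averaging the averages would give the wrong answer when partitions hold different numbers of rows, which is why the merge step works on partial aggregates.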
Database Partitions and Sharding

When a user creates a database in MemSQL, it is always partitioned (a minimum of 2 partitions). As Figure 11 shows, a database is the sum of all of its partitions. Partitions reside on the leaf nodes.

Figure 11. Using leaf nodes as partitions with MemSQL

Each partition is itself implemented as a database on a leaf. When a sharded table is created, it is split according to the number of partitions of its encapsulating database. Each partition holds a slice of the table.

Figure 12. Leveraging shard keys for distributed tables
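How a sharded table might be split across partitions can be sketched as below. This assumes rows are placed by hashing a shard key and taking the result modulo the partition count; CRC32 is used only as a deterministic stand-in and is not claimed to be the hash function MemSQL uses. The function names are hypothetical.

```python
import zlib


def partition_for(shard_key, num_partitions):
    """Map a shard key to a partition index in [0, num_partitions)."""
    # CRC32 is an illustrative, deterministic stand-in hash.
    digest = zlib.crc32(repr(shard_key).encode("utf-8"))
    return digest % num_partitions


def shard_rows(rows, key_column, num_partitions):
    """Split rows (dicts) into per-partition slices by their shard key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = partition_for(row[key_column], num_partitions)
        partitions[index].append(row)
    return partitions
```

Because the mapping depends only on the key, a lookup by shard key can be routed to a single partition, and two tables sharded on the same key keep matching rows on the same partition, which is what makes co-located joins possible.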
Sharded and Reference tables

MemSQL supports both distributed (or sharded) and reference (or duplicated) tables. Both table types can be created as rowstore or columnstore tables. For sharded tables, the primary key serves as the hash key, and each shard is stored on its respective leaf node. Reference tables are replicated to all nodes (including aggregators) and are well suited for smaller, slowly changing tables.

Figure 13. Sharded vs Reference Table designs

Data Types

MemSQL supports a variety of data types, including integers, timestamps, string types like CHAR and VARCHAR, and compound types such as computed columns, ENUM, and SET. Additional complex data types that are supported include geospatial, full text (search capability), and semi-structured JSON data.

Parallel Data Ingest with Pipelines

MemSQL Pipelines is a MemSQL database feature that ingests data from external sources in a continuous manner. As a built-in component of the database, Pipelines can extract, transform, and load external data without the need for third-party tools or middleware.
Pipelines is robust, scalable, and highly performant, and it supports fully distributed workloads. Pipelines scales with MemSQL clusters as well as with distributed data sources like Kafka, Amazon S3, and HDFS. Pipelines data is loaded in parallel from the data source to MemSQL leaves, which improves throughput by bypassing the aggregator. Additionally, Pipelines has been optimized for low lock contention and high concurrency.

The architecture of Pipelines ensures that transactions are processed exactly once, even in the event of failover. Pipelines makes it easier to debug each step in the ETL process by storing exhaustive metadata about transactions, including stack traces and stderr messages.

MemSQL Pipelines Data Flow
Figure 14. Parallel data ingest in MemSQL using Pipelines
Cluster Management

This section covers the key cluster management aspects of MemSQL, including the workload manager, distributed storage, and security.

MemSQL’s distributed system allows clusters to be scaled out at any time to provide increased storage capacity and processing power. Sharding occurs automatically, and the cluster rebalances data and workload distribution. Data remains highly available, and nodes can go down with little effect on performance. Because data is distributed and the cluster is self-healing and elastic, data processing can scale out or in as needed.

With tiered storage, you can take advantage of MemSQL’s memory-optimized rowstore tables for high-speed query processing or ingestion, or disk-optimized columnstore tables for analytics. MemSQL has a SQL query optimizer that runs on both row-based and column-based tables. This gives you the ability to do transactional processing, analytic processing, or both at once, using the best table structure for each workload.

Dynamic Cluster resizing

MemSQL features powerful but simple cluster management with dynamic cluster operations and no single point of failure. You can add nodes (leaves or aggregators) to the cluster or remove them at any time while keeping the cluster online, even while running a workload.

Figure 15. Scaling-out a MemSQL cluster
Data Replication

A MemSQL cluster is resilient to failure, with automatic failover and self-healing capabilities. MemSQL allows you to store a redundant copy of data within a cluster. Leaf nodes are organized into availability groups such that each node is paired with a node in the other availability group. Each leaf node has a pair that replicates its data, and can be configured to do so synchronously or asynchronously. In case of node failure, MemSQL restores data and promotes replica partitions to put the cluster back online.

Figure 16. Replicas in MemSQL are promoted to master

MemSQL also supports fully automatic cross-data center replication that can be provisioned with a single command. The replica cluster stores a read-only copy of data asynchronously replicated from the primary cluster. In the event of a major failure in the primary cluster, MemSQL can promote the secondary cluster, immediately making it a "full" MemSQL cluster. In addition to providing disaster recovery assurance, the secondary cluster can also be used for heavy read-only workloads.
Feedback-driven Workload Manager

MemSQL automatically manages cluster workloads by limiting the execution of queries that require fully distributed execution, to ensure that they are matched with available system resources. Using built-in ML functions, workload management intelligently estimates the number of connections and threads needed to execute queries that require reshuffle and broadcast operations, and admits a query only if it can assign the necessary resources.

Workload management also estimates the amount of memory required to execute queries and only runs queries if sufficient memory is expected to be available. Queries that are not immediately executed are queued and are executed when system resources become available.

Workload management improves overall query execution efficiency and prevents workload surges from overwhelming the system. It allows queries to run successfully when the system is low on connections, threads, or memory.

Resource pools include the following settings:
● Memory Percentage: The percentage of memory resources allocated to the pool
● Query Timeout: The number of seconds after which a query running in the pool will be automatically terminated
● Soft and Hard CPU Limit Percentage: The percentage of CPU resources allocated to the pool
● Maximum Concurrency: The maximum number of concurrent SQL queries that are allowed to run cluster-wide across all aggregators
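The admit-or-queue behavior described above can be sketched as a small admission controller. This is an illustrative model only: it assumes a fixed memory budget, a fixed concurrency limit, caller-supplied memory estimates, and strict first-in-first-out queueing. How MemSQL actually estimates resources or orders its queue is not specified here, and the `WorkloadManager` class is a hypothetical name.

```python
from collections import deque


class WorkloadManager:
    """Toy admission control: run a query only if resources are available."""

    def __init__(self, total_memory_mb, max_concurrency):
        self.free_memory_mb = total_memory_mb
        self.free_slots = max_concurrency
        self.queue = deque()  # queries waiting for resources, FIFO
        self.running = {}     # query_id -> reserved memory in MB

    def submit(self, query_id, estimated_memory_mb):
        """Enqueue a query; return the ids admitted as a result."""
        self.queue.append((query_id, estimated_memory_mb))
        return self._admit()

    def finish(self, query_id):
        """Release a query's resources; return the ids admitted next."""
        self.free_memory_mb += self.running.pop(query_id)
        self.free_slots += 1
        return self._admit()

    def _admit(self):
        admitted = []
        while self.queue:
            query_id, need = self.queue[0]
            if self.free_slots == 0 or need > self.free_memory_mb:
                break  # the head of the queue waits, and so do those behind it
            self.queue.popleft()
            self.free_slots -= 1
            self.free_memory_mb -= need
            self.running[query_id] = need
            admitted.append(query_id)
        return admitted
```

Strict FIFO keeps the sketch short but means a large query at the head of the queue delays smaller ones behind it, and a query whose estimate exceeds the total budget would wait forever. A production scheduler would need policies for both cases.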
MemSQL Security

Security is an important aspect of any data platform. To meet regulatory and compliance requirements, MemSQL supports several security features in the areas of authentication, authorization, auditing, and encryption.

Existing account access can be easily managed via PAM (Pluggable Authentication Module), SAML, or GSSAPI (Kerberos) authentication support. MemSQL also implements RBAC to protect sensitive data for tens of thousands of distinct users and their specific access roles.

MemSQL’s auditing feature provides configurable database logging to a secure external location to support information security tasks such as tracking user access. Data can be encrypted at ingest time and is distributed across nodes over TLS.
MemSQL Studio

The MemSQL Studio interface lets you monitor, debug, and interact with all of your MemSQL clusters. Designed to be lightweight, easy to deploy, and easy to upgrade, MemSQL Studio provides the tools you need to maintain cluster health without the overhead of complex, heavyweight, and error-prone client software.

The built-in query profiler delivers historical usage statistics to shed light on which queries and resources consume the most time. You can visualize and diagnose query bottlenecks and compute resources to ensure optimal performance and availability.

Figure 17. MemSQL Studio tool

MemSQL Studio turns user actions into standard SQL queries that are run against your MemSQL cluster. Results are then displayed back to you in the form of tables and graphics that help you understand your cluster better. Conceptually, MemSQL Studio is a UI on top of the MemSQL database engine itself, pairing the stability and security guarantees of the command line with the ease of use of a visual UI. MemSQL Studio also comes with a built-in visual SQL client, so it can be used safely alongside other tools such as MemSQL Ops.
Cloud-native Support and Managed Service

As a cloud-native database, MemSQL can be deployed across hybrid, multi-cloud, and on-premises environments. The MemSQL Kubernetes Operator provides an easy way to deploy and manage data infrastructure on private or public clouds.

Managing a cluster is simple with the Kubernetes Operator. With the Kubernetes command-line interface (CLI) and the Kubernetes API, the Operator can be used the same way as other standard Kubernetes tools. The commands and operations are the same across all the major clouds, public and private, as well as in on-premises environments. You describe the state of the cluster that you want; Kubernetes creates it, and then maintains that state for you.

MemSQL also provides managed services on the AWS, Azure, and GCP public cloud platforms. MemSQL handles all management aspects of the cluster, freeing the customer from the need to maintain a dedicated staff for cluster administration.
MemSQL Innovation History

MemSQL offers a full-featured database platform built on more than six years of innovative engineering. The chart below describes some of the notable advances over time.

MemSQL History and Roadmap
Figure 18. MemSQL innovation history
Conclusion

Modern enterprises require a data platform that is versatile, cost efficient, and performant. Not only does the data platform need to support and improve legacy workloads, but it must also be able to deliver on new business requirements.

MemSQL is a modern data platform that is well suited to meet today’s demanding requirements. It offers an easy migration path from legacy platforms, is cloud-friendly, and supports modern workloads seamlessly. MemSQL allows for infrastructure convergence, simplicity, and support for predictive capabilities in a cost-effective and highly performant manner.
Appendix: MemSQL Capability Checklist

MemSQL Capabilities (⬤ = supported in MemSQL)

ARCHITECTURE
⬤ Distributed Shared Nothing Scale out Architecture
⬤ ACID transactions
⬤ High Availability and Disaster Recovery
⬤ Lock-free synchronization
⬤ Multi-version concurrency control
⬤ Distributed Query execution
⬤ Deployment flexibility (Multi-Cloud and On-Premises)
⬤ Compressed columnar table format on disk
⬤ Row store in memory

QUERY
⬤ SQL-92
⬤ SQL-99 OLAP Extensions
⬤ SQL-2003 extensions
⬤ Multi-statement transaction
⬤ SELECT FOR UPDATE
⬤ Procedural Language with support for Procedures, UDF, TVF and UDAF
⬤ Full text search
⬤ Vectorization and single instruction, multiple data (SIMD)
⬤ Operations on encoded data
⬤ Bloom filter pushdown for hash join
⬤ Local join support (hash, merge, nested loop)
⬤ Distributed join support (broadcast and reshuffle)
⬤ Query optimization and Auto Statistics
⬤ Oracle compatibility extensions
⬤ Adaptive query compilation and execution
⬤ Just-in-time (JIT) compilation

STORAGE
⬤ Rowstore In-Memory (with compression)
⬤ Rowstore skiplist indexes In-Memory
⬤ Columnstore on Disk
⬤ Columnstore Hash indexes on Disk
⬤ Sharded and Reference tables
⬤ Temporary tables

DATA TYPES
⬤ JSON
⬤ Relational
⬤ Full text
⬤ Geospatial

INGESTION
⬤ Native pipeline for data ingestion
⬤ LOAD DATA enabling bulk loads
⬤ Native Parallel Ingest from many data sources (Linux File System, S3, Azure Blob Store, HDFS, Kafka)
⬤ Supports most popular formats (CSV, Avro, JSON)
⬤ Load to stored procedures (ELT)
⬤ Pipelines with transform scripts

CLUSTER OPERATIONS
⬤ Zero downtime node management (add/remove)
⬤ Automatic recovery from node failures
⬤ Rolling production upgrades
⬤ Automated full backup
⬤ Database Replication across Geographies to enable disaster recovery
⬤ Management and Monitoring tools
⬤ Online operational capabilities (add/remove nodes/re-balance etc.)
⬤ Monitoring UI with Visual Explain plan
⬤ Troubleshooting UI
⬤ Automated Workload management

SECURITY
⬤ Auditing
⬤ Strict Mode
⬤ Encryption
⬤ RBAC
⬤ Authentication (GSSAPI/Kerberos)

ECOSYSTEM
⬤ Looker
⬤ Zoomdata
⬤ Tableau
⬤ Streamsets
⬤ SAS Access
⬤ Informatica
⬤ Data Virtuality pipes
⬤ Talend
⬤ Power BI

CLIENT DRIVERS
⬤ MariaDB command-line client
⬤ MariaDB C/C++ connector
⬤ JDBC
⬤ ODBC
⬤ MySQL Connector
⬤ DBD-MariaDB Perl library
⬤ MySQL Connector/NET (C# and other .NET languages)