I ntroducon to MemSQL Pipelines are robust, scalable, highly performant, and supports fully distributed workloads. Pipelines scales with MemSQL clusters as well as distributed data sources like Kaa, Amazon S3 and HDFS. Pipelines data is loaded in parallel from the data source to MemSQL leaves, which improves throughput by bypassing the aggregator. Addionally, Pipelines has been opmized for low lock contenon and concurrency. The architecture of Pipelines ensures that transacons are processed exactly once, even in the event of failover. Pipelines makes it easier to debug each step in the ETL process by storing exhausve metadata about transacons, including stack traces and stderr messages. MemSQL Pipelines Data Flow Figure 14. Parallel data ingest in MemSQL using Pipelines 20
