TimescaleDB: Designing a scalable time-series database on PostgreSQL
Michael J. Freedman is a Professor in the Computer Science Department at Princeton University, as well as the co-founder and CTO of Timescale, which provides an open-source database that scales out SQL for time-series data. His research broadly focuses on distributed systems, networking, and security, and has led to commercial products and deployed systems reaching millions of users daily. Honors include a Presidential Early Career Award (PECASE), Sloan Fellowship, NSF CAREER Award, ONR Young Investigator Award, DARPA CSSG membership, and multiple award-winning publications.
Time-series data is emerging everywhere, from IoT sensors and industrial machines, to transportation and logistics, to devops and monitoring, to finance. We have found that many users start by storing their time-series data in PostgreSQL, but then lose its query power and ecosystem by migrating to some NoSQL or time-series architecture once they reach a certain scale. Yet this "SQL or scale" tradeoff is a false one: we demonstrate how to build an efficient, scale-out time-series database engineered up from PostgreSQL.
In this talk, I describe why the characteristics of these time-series workloads allow for specific design decisions to scale relational DBs, even though (and why) this has remained elusive for general OLTP workloads. In TimescaleDB, we leverage these workload characteristics to perform automated time/space partitioning and distributed query optimizations, while still exposing the abstraction of a single continuous table across all your data.
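To make the idea of time/space partitioning behind a single-table abstraction concrete, here is a minimal Python sketch. It is not TimescaleDB's implementation: the names `chunk_for` and `PartitionedTable`, the one-day interval, and the four hash partitions are all assumptions chosen for illustration. Each row is routed to a chunk by its time bucket and by a hash of a "space" key (here a device ID), and a time-range query scans only the chunks whose interval overlaps the range.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Illustrative assumptions, not TimescaleDB defaults.
CHUNK_INTERVAL = timedelta(days=1)   # width of each time partition
SPACE_PARTITIONS = 4                 # number of hash partitions on the space key
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_for(ts, device_id):
    """Return (time_bucket_start, space_partition) for one row."""
    buckets = (ts - EPOCH) // CHUNK_INTERVAL          # whole intervals since epoch
    bucket_start = EPOCH + buckets * CHUNK_INTERVAL
    # Use a stable hash so routing does not change between processes.
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    space = int.from_bytes(digest[:8], "big") % SPACE_PARTITIONS
    return bucket_start, space


class PartitionedTable:
    """Hypothetical single-table facade over many time/space chunks."""

    def __init__(self):
        self.chunks = defaultdict(list)

    def insert(self, ts, device_id, value):
        # The caller never names a chunk; routing is automatic.
        self.chunks[chunk_for(ts, device_id)].append((ts, device_id, value))

    def query(self, start, end):
        """Rows with start <= ts < end, skipping chunks outside the range."""
        rows = []
        for (bucket_start, _space), chunk in self.chunks.items():
            if bucket_start < end and bucket_start + CHUNK_INTERVAL > start:
                rows.extend(r for r in chunk if start <= r[0] < end)
        return sorted(rows)
```

In this sketch the caller only ever sees `insert` and `query` on one object, while recent data lands in a small number of chunks and queries over a time range can ignore every chunk outside that range.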
TimescaleDB is implemented as a PostgreSQL extension, and offers performance improvements both for single-node and cluster deployments. I'll conclude with performance benchmarks, as well as intuition about the scenarios in which its design shines (and those in which it does not).
- 50 min
- PGConf US 2017