Designing a Multi-tenant Database for Scale with PostgreSQL
Ozgun is a co-founder and the CTO at Citus Data. Prior to Citus, Ozgun worked as a software developer for four years in the Distributed Systems Engineering team at Amazon. There, he proposed, designed, and implemented novel algorithms for distributed caching and consistency, and also worked on building systems for scalable data analytics. Ozgun earned his M.S. in Computer Science from Stanford University and his B.S. from Galatasaray University. He also holds patents on distributed cache consistency and load balancing.
Principal Engineer on the Citus Cloud team at Citus Data, developing and operating a distributed PostgreSQL-as-a-Service.
Creator of pganalyze.com, hosted PostgreSQL Performance Monitoring, author of pg_query (Ruby extension to parse queries using the raw_parser) and other tools. I love working with PostgreSQL statistics and visualizing them.
If you're building a SaaS application, you probably already have the notion of tenancy built into your data model. Typically, most information relates to tenants/customers/accounts, and your database tables capture this natural relation. With smaller amounts of data (tens of GB), it's easy to throw more hardware at the problem and scale up your database. As these tables grow, however, you need to think about ways to scale your multi-tenant database across dozens or hundreds of machines.

We'll cover the three major options for setting up a multi-tenant app: (a) create one database per tenant, (b) create one schema per tenant, and (c) have all tenants share the same table(s). We'll then describe each design pattern's trade-offs and focus on the one (used by Google and Salesforce) that optimizes for scale.

Afterwards, we'll continue the tutorial with two hands-on sessions. First, we'll look at a sample multi-tenant app and its database schema. We'll talk about the hierarchical data model and model an example data set for migration to a distributed environment. Next, we'll start a three-machine cluster, create tables, and load the data. We'll then run example distributed queries, go over the concept of colocated tables, and show how to scale out the cluster dynamically. We'll conclude with a review of common questions that come up when designing multi-tenant databases, followed by Q&A.
- 3 h
- PGConf US 2017