Building a Custom Data Warehouse Using PostgreSQL

Date: Thursday, January 18, 2018 - 18:30
Source: New York City PostgreSQL User Group
Attendees: 137
City: New York

6:30 - Meet, eat, drinks
6:55 - Introductions from organizers
7:00 - Bruce Momjian: Postgres v11 Roadmap
7:10 - David Kohn: Building a Custom Data Warehouse
8:10 - Socialize at TBD

At Moat we add ~500 million rows each day to just a few tables in our data warehouse. Previously, we used Postgres only for the most recent month of data (which takes ~15 TB of disk space), and we love its ability to handle concurrency and its cost compared to the commercial column stores we used for our historical data. By using a combination of composite types, arrays, and Postgres' TOAST mechanism, we were able to build a custom data warehouse that combines the best aspects of Postgres and a commercial column-store database, tailored specifically to our needs without modifying any of the Postgres source. Our new store uses ~1 TB of disk per month, has sped up our ETL considerably (we're now compute- rather than I/O-bound), and has allowed us to migrate our full historical data warehouse to a single Postgres instance. This talk will cover:
- An overview of our data problems and why we chose to go this route
- A deep dive into the different Postgres mechanisms (TOAST, arrays, composite types, SRFs) we use to make our solution work, how we put them together, and how others might apply similar techniques to their use cases
- Strengths and weaknesses of the approach, lessons learned
- Upcoming features in Postgres that are going to make this approach even more attractive
- Future work: new features in PG 10 that have helped us, and new features we want
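To give a flavor of the technique the talk describes, here is a minimal sketch (the type and table names are hypothetical, not from the actual Moat schema): many logical rows are packed into an array of a composite type, so Postgres' TOAST mechanism compresses and stores the large array values out of line, and unnesting recovers row-level access.

```sql
-- Hypothetical example: a composite type for one logical row.
CREATE TYPE impression AS (
    ts     timestamptz,
    ad_id  bigint,
    views  int
);

-- Each physical row holds a whole batch; large arrays are
-- compressed and moved out of line by TOAST automatically.
CREATE TABLE impressions_batched (
    day         date,
    campaign_id bigint,
    batch       impression[]
);

-- unnest() (a set-returning function) restores per-row access:
SELECT day, (i).ad_id, sum((i).views)
FROM impressions_batched, unnest(batch) AS i
GROUP BY day, (i).ad_id;
```

This is only an illustration of the general mechanism the abstract names (composite types + arrays + TOAST + SRFs), not the speaker's actual implementation.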

Speaker: David Kohn is a Data Engineer focusing mostly on Postgres at Moat Inc. How he came to do data engineering at an advertising analytics company after working at battery engineering and electrochemistry startups is a story too long for a short bio, but you can ask him about it if you'd like.