If you have >10 Million records or more in you traditional PostgreSQL db, doing some data analysis will be take painfully long, same for the disk space cost.

I have a dCache billing db that generates >1 Billion records / year, the origional purpose was to find a way to save space cost, so that I can have more years data keep on disk.

Throughly looked into the solutions https://wiki.postgresql.org/wiki/Foreign_data_wrappers, I found this column-oriented cstore_fdw is the best one to play with. 

cstore-fdw is an open sourced Postgres extension that uses the Optimized Row Columnar (ORC) format for its data layout. Table metadata  handled by protobuf-c for serializing and deserializing table metadata. It's dedicated for data analysis purpose.
ORC improves upon the RCFile format developed at Facebook, open source
protocal buffer is developed at Google, open source

It brings huge benefit on space cost, as well as performance.

Postgres works with it through Postgres foreign data wrapper APIs, so it's transparent to user end. Regular SQLs, stored procedures, triggers are all good with it. No  table level vacuum any more, but it does need vacuum analyze sometime for a while to let PG planner to work better with it.

You can find more feathers and quick setup in https://citusdata.github.io/cstore_fdw/


Here is the copy of from the link:

Introducing cstore_fdw, the first column-oriented store available for PostgreSQL. Using it will let you:

  • Leverage typical analytics benefits of columnar stores
  • Deploy on stock PostgreSQL or scale-out PostgreSQL (CitusDB)

Download and get started at https://github.com/citusdata/cstore_fdw.


Key areas improved by this extension:

  • Faster Analytics — Reduce analytics query disk and memory use by 10x
  • Lower Storage — Compress data by 3x
  • Easy Setup — Deploy as standard PostgreSQL extension
  • Flexibility — Mix row- and column-based tables in the same DB
  • Community — Benefit from PostgreSQL compatibility and open development

Learn more on our blog post.

Faster Analytics

cstore_fdw brings substantial performance benefits to analytics-heavy workloads:

  • Column projections: only read columns relevant to the query
  • Compressed data: higher data density reduces disk I/O
  • Skip indexes: row group stats permit skipping irrelevant rows
  • Stats collections: integrates with PostgreSQL’s own query optimizer
  • PostgreSQL-native formats: no deserialization overhead at query time

Lower Storage

Cleanly implements full-table compression:

Easy Setup

If you know how to use PostgreSQL extensions, you know how to use cstore_fdw:

  • Deploy as standard PostgreSQL extension
  • Simply specify table type at creation time using FDW commands
  • Copy data into your tables using standard PostgreSQL COPY command


Have the best of all worlds… mix row- and column-based tables in the same DB:

  (num integer, name text)
SERVER cstore_server;

CREATE TABLE plain_table
  (num integer, name text);

COPY cstore_table FROM STDIN (FORMAT csv);
-- 1, foo
-- 2, bar
-- 3, baz
-- \.

COPY plain_table FROM STDIN (FORMAT csv);
-- 4, foo
-- 5, bar
-- 6, baz
-- \.

SELECT * FROM cstore_table c, plain_table p WHERE c.name=p.name;
-- num | name | num | name 
--   1 |  foo |   4 |  foo
--   2 |  bar |   5 |  bar
--   3 |  baz |   6 |  baz


Join the cstore users Google Group.

Full integration with rich PostgreSQL ecosystem:

  • Compatible with all existing PostgreSQL data types
  • Leverage semi-structured data using hstore or json
  • Quickly keep track of distinct values with HyperLogLog

Learn more about the Optimized Row Column (ORC) file format , which influenced the layout used by cstore_fdw, or dive into the code.

Comments powered by CComment