[postgis-users] Optimizing PostGIS/Geoserver schema for huge dataset

Andy Colson andy at squeakycode.net
Fri Mar 31 19:33:50 PDT 2017


This says 100:
http://stackoverflow.com/questions/6104774/how-many-table-partitions-is-too-many-in-postgres

This says 1000 is too many:
http://dba.stackexchange.com/questions/95977/maximum-partitioning-postgresql

Honestly, you'd have to benchmark it, because I have no idea if there is a difference between 100 and 1000.

That being said, I'm surprised a good index isn't fast enough.  Partitions do cut the index size down, which is good, but the planner still has to check every child table's constraint to figure out which ones match.
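
If you want to check whether that pruning is actually happening, here is a minimal sketch (I'm assuming 'flow' as the parent table name based on this thread; adjust to your schema):

    -- Let the planner use the children's CHECK constraints
    -- ('partition' is the default setting):
    SET constraint_exclusion = 'partition';

    -- With a literal timeid, the plan should only touch the matching child:
    EXPLAIN ANALYZE
    SELECT count(*)
    FROM flow
    WHERE timeid = 101;

If the plan still shows an Append over thousands of children, exclusion isn't kicking in; note that it only works when the WHERE clause compares against constants, not parameters.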

Do you ever update or delete data from the flow table?
Correct me if I'm wrong, but it looks like your WHERE clause only uses these fields: timeid, wkb_geometry, and streamflow.  Yes?
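
If so, just as a sketch (index and table names made up; with inheritance partitioning each child table needs its own copies), the indexes I'd expect are:

    -- btree covering the scalar filters:
    CREATE INDEX flow_child_timeid_flow_idx
        ON flow_child (timeid, streamflow);

    -- GiST index for the spatial predicate on the geometry column:
    CREATE INDEX flow_child_geom_gix
        ON flow_child USING GIST (wkb_geometry);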


Your EXPLAIN ANALYZE includes the table conus_flow, which isn't in your query or view.  Are you sure that's the right EXPLAIN ANALYZE?

-Andy


On 03/31/2017 08:18 PM, Andrew Gaydos wrote:
> Thanks for the help!
>
> I originally tried putting everything into a single non-partitioned table, but the performance was horrible! Since each set of 2.3M rows shares the same timestamp, I thought this would be a good way to divide up the data when partitioning - I set a constraint on each, e.g.
>
> table 1: constraint: timeid=101
> table 2: constraint: timeid=102
> etc.
>
> I could try grouping times into a single table, e.g.
>
> table 1: constraint: 100 <= timeid < 110
> table 2: constraint: 110 <= timeid < 120
> etc.
>
> so that would give me 1000 partitions of 24 million rows each.
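> In DDL terms, something like this (just a sketch; child table names made up, using the 9.5-style inheritance partitioning and assuming 'flow' as the parent table):
>
>     CREATE TABLE flow_100 (
>         CHECK (timeid >= 100 AND timeid < 110)
>     ) INHERITS (flow);
>
>     CREATE TABLE flow_110 (
>         CHECK (timeid >= 110 AND timeid < 120)
>     ) INHERITS (flow);
>
>     -- ... and so on through the rest of the timeid range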
>
> Is this what you were suggesting? What do you think the optimal balance of partitions and rows would be? 100 partitions of 240 million rows each? 10 partitions of 2.4 billion rows each? At some point I think I would run into the insufferable performance I was getting with a single table, though.
>
> Actually, now that I check, the number of partitions is closer to 17,000, and the number of rows per partition is 2.7M, so about 46 billion rows altogether...
>
> Thanks again!
> -Andy
>
> On Fri, Mar 31, 2017 at 6:15 PM, Andy Colson <andy at squeakycode.net> wrote:
>
>     On 03/31/2017 11:38 AM, Andrew Gaydos wrote:
>
>         Hi,
>
>
>
>         My questions are
>
>          1. It seems that for every session, there is a one-time penalty for the first query (several minutes), after which queries run much quicker (about 10 seconds for all the tiles to be served). What is going on here?
>          2. Is there a way to optimize GeoServer's queries against this schema, or a more efficient query to try?
>          3. other postgres optimizations that might help?
>
>         I'm pretty new to both GeoServer and PostGIS and have a sinking feeling that I could be structuring this dataset and queries more efficiently, but I've run out of ideas and don't have any postgres experts at work to ask, so I'm posting here.
>
>         Thanks for any insight!
>
>         -Andy
>
>
>     Andys unite!
>
>     err... anyway, here is the problem:
>
>         data table: (10,000 partitions, each with 2.3 million rows)
>
>
>
>     Lots of partitions will kill planning time. Look at the very bottom of:
>     https://www.postgresql.org/docs/9.5/static/ddl-partitioning.html
>
>     Do you have your heart set on lots of partitions?  How would you feel about 100?  Or maybe 1000?
>
>     -Andy
>


