[postgis-users] Optimizing PostGIS/Geoserver schema for huge dataset

Andrew Gaydos gaydos at ucar.edu
Fri Mar 31 18:18:02 PDT 2017


Thanks for the help!

I originally tried putting everything into a single non-partitioned table,
but the performance was horrible! Since each set of 2.3M rows shares the
same timestamp, partitioning on the timestamp seemed like a natural way to
divide up the data, so I set a CHECK constraint on each partition, e.g.

table 1: constraint: timeid=101
table 2: constraint: timeid=102
etc.
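
Concretely, the per-timestamp layout looks like this (a simplified
sketch using 9.5-style inheritance partitioning - the column names
other than timeid are placeholders, not the actual schema):

    -- parent table; placeholder columns, the real schema differs
    CREATE TABLE data (
        timeid integer NOT NULL,
        geom   geometry(Point, 4326),
        val    double precision
    );

    -- one child per timestamp; the CHECK constraint lets
    -- constraint exclusion prune non-matching children
    CREATE TABLE data_101 (CHECK (timeid = 101)) INHERITS (data);
    CREATE TABLE data_102 (CHECK (timeid = 102)) INHERITS (data);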

I could try grouping times into a single table, e.g.

table 1: constraint: 100 <= timeid < 110
table 2: constraint: 110 <= timeid < 120
etc.

so that would give me 1000 partitions of 23 million rows each.
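
The grouped version would just widen the CHECK constraints (same
placeholder schema as the sketch above):

    CREATE TABLE data_100_109
        (CHECK (timeid >= 100 AND timeid < 110)) INHERITS (data);
    CREATE TABLE data_110_119
        (CHECK (timeid >= 110 AND timeid < 120)) INHERITS (data);

    -- constraint exclusion must be on for the planner to skip
    -- children whose CHECK can't match the WHERE clause
    SET constraint_exclusion = partition;  -- the default setting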

Is this what you were suggesting? What do you think the optimal balance of
partitions and rows would be? 100 partitions of 230 million rows each? 10
partitions of 2.3 billion rows each? At some point, though, I think I would
run back into the terrible performance I was getting with a single table.

Actually, now that I check, the number of partitions is closer to 17,000
and the number of rows per partition is 2.7M, so about 46 billion rows
altogether...

Thanks again!
-Andy

On Fri, Mar 31, 2017 at 6:15 PM, Andy Colson <andy at squeakycode.net> wrote:

> On 03/31/2017 11:38 AM, Andrew Gaydos wrote:
>
>> Hi,
>>
>> My questions are
>>
>>  1. It seems that for every session there is a one-time penalty on the
>> first query (several minutes), after which queries run much quicker
>> (about 10 seconds for all the tiles to be served). What is going on here?
>>  2. Is there a way to optimize GeoServer's queries against this schema,
>> or a more efficient query to try?
>>  3. Are there other Postgres optimizations that might help?
>>
>> I'm pretty new to both GeoServer and PostGIS and have a sinking feeling
>> that I could be structuring this dataset and queries more efficiently, but
>> I've run out of ideas and don't have any postgres experts at work to ask,
>> so I'm posting here.
>>
>> Thanks for any insight!
>>
>> -Andy
>>
>
> Andys unite!
>
> Err... anyway, here is the problem:
>
>> data table: (10,000 partitions, each with 2.3 million rows)
>
> Lots of partitions will kill planning time. Look at the very bottom of:
> https://www.postgresql.org/docs/9.5/static/ddl-partitioning.html
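>
> One way to see that cost directly (a sketch - the table and column
> names are placeholders, assuming a parent table named data
> partitioned on timeid): EXPLAIN ANALYZE reports planning time
> separately from execution time in 9.5.
>
>     EXPLAIN ANALYZE
>     SELECT count(*) FROM data WHERE timeid = 101;
>     -- watch the "Planning time:" line; it grows with the number
>     -- of child tables even when all but one get pruned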
>
> Do you have your heart set on lots of partitions? How would you feel
> about 100? Or maybe 1000?
>
> -Andy
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users