[postgis-users] Queries on partitioned table not behaving as expected
Rubio Vaughan
rubio.vaughan at geodan.nl
Tue Nov 3 03:26:28 PST 2015
Andy,
Thanks for the suggestion. Select performance wasn't the goal in itself.
But we're trying to keep the data in multiple tables for
maintenance/update purposes. Since this severely affects performance we
investigated partitioned tables in order to get performance close to
single-table select performance. As you suggest, it looks like my next
step will involve a custom function.
Best regards,
Rubio
On 2-11-2015 21:41, Andy Colson wrote:
> Yep, +1 do all this.
>
> Additionally: select performance is not a reason to use partitioned
> tables. Its only update/delete performance that's increased.
>
> Ok, _some_ selects might be faster with partitions, but I'd bet not
> indexed select statements. You'd have to benchmark it.
>
> If you really want 1400+ tables you might be able to write yourself a
> function that figures out what table/tables to query and generates a
> sql statement dynamically.
>
> Like from the docs:
>
> http://www.postgresql.org/docs/9.4/static/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN
>
>
> (sorry, watch the word wrap)
>
> -Andy
>
>
> On 11/2/2015 12:53 PM, Nicolas Ribot wrote:
>> Hi,
>>
>> AFAIU, the restriction on partitioned tables is even bigger. From PG doc
>> (http://www.postgresql.org/docs/current/interactive/ddl-partitioning.html#DDL-PARTITIONING-CONSTRAINT-EXCLUSION)
>>
>> :
>>
>> "The following caveats apply to constraint exclusion:
>>
>> *
>>
>> Constraint exclusion only works when the query's WHERE clause
>> contains constants (or externally supplied parameters). For example,
>> a comparison against a non-immutable function such
>> as|CURRENT_TIMESTAMP| cannot be optimized, since the planner cannot
>> know which partition the function value might fall into at run time.
>>
>> "
>> Using a subquery to provide values won't trigger the constraint
>> exclusion.
>> One solution could be an anonymous block or a function to explicitly
>> write query with constant values got from a previous query.
>>
>> Rubio, take also care that a lot of partitioned tables is not
>> recommended by Pg doc:
>> "All constraints on all partitions of the master table are examined
>> during constraint exclusion, so large numbers of partitions are likely
>> to increase query planning time considerably. Partitioning using these
>> techniques will work well with up to perhaps a hundred partitions; don't
>> try to use many thousands of partitions."
>>
>> I would go for a smaller set of bigger tables (10's millions rows or
>> so).
>>
>> Nicolas
>>
>> On 2 November 2015 at 18:28, Bborie Park <dustymugs at gmail.com
>> <mailto:dustymugs at gmail.com>> wrote:
>>
>> That check constraint is not going to help you as it is too
>> complicated for the partitioning task. I don't believe the
>> partitioning constraint can be functional but needs to be simpler
>> and built upon a basic data type...
>>
>> Something like...
>>
>> CREATE TABLE max_extent AS (
>> id integer PRIMARY KEY,
>> geom geometry(POLYGON)
>> );
>>
>> And then the parent/partition tables had a reference to pc_extent...
>>
>> CREATE TABLE ahn3_pointcloud.ahn3_all
>> (
>> id integer NOT NULL,
>> pa pcpatch(7),
>> max_extent_id integer
>> );
>>
>> Then your queries become...
>>
>> WITH max_extents AS (
>> SELECT
>> id
>> FROM max_extent
>> WHERE PC_Intersects(geom,
>> ST_MakeEnvelope(120740,486076,121074,486292, 28992))
>> )
>> SELECT
>> *
>> FROM ahn3_all
>> JOIN max_extents
>> ON ahn3_all.max_extent_id = max_extents.id
>> <http://max_extents.id>
>>
>> -bborie
>>
>> On Mon, Nov 2, 2015 at 8:45 AM, Rubio Vaughan
>> <rubio.vaughan at geodan.nl <mailto:rubio.vaughan at geodan.nl>> wrote:
>>
>> Dear all,
>>
>> We are trying to load a large LIDAR pointcloud dataset into
>> multiple tables in PostGIS, using the point cloud extension.
>> Eventually the whole data set will consist of 1400+ tables,
>> containing about 1 million records each. Using a union view on
>> all these tables would result in terrible performance, which is
>> why we are trying to optimize query performance using
>> partitioned tables. According to the documentation, the use of
>> partitioned tables with CHECK constraints should cause the query
>> planner to only scan those tables for which the CHECK constraint
>> matches. Excerpt from the documentation:
>> --------------------------------------------------------------------------------
>> *18.7.4. Other Planner Options*
>> constraint_exclusion(enum): Controls the query planner's use of
>> table constraints to optimize queries. The allowed values of
>> constraint_exclusion are on (examine constraints for all
>> tables), off (never examine constraints), and partition (examine
>> constraints only for inheritance child tables and UNION ALL
>> subqueries). partition is the default setting. It is often used
>> with inheritance and partitioned tables to improve performance.
>> When this parameter allows it for a particular table, the
>> planner compares query conditions with the table's CHECK
>> constraints, and omits scanning tables for which the conditions
>> contradict the constraints.
>> --------------------------------------------------------------------------------
>>
>> However, as you can see from the example query below, the
>> indexes for all child tables are still scanned. I would expect
>> the query planner to only scan table c_25gn1, which contains the
>> queried region. Does anyone here have experience with
>> partitioned tables? I would be delighted to get some pointers
>> for figuring out this problem.
>>
>> Thanks in advance for any help!
>>
>> Best regards,
>> Rubio Vaughan
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list