[postgis-users] Substantial different index use between 9.5 and 9.6 using Postgis
Bill Measday
bill at measday.com
Thu Dec 1 22:44:47 PST 2016
2 things
First, unfortunately the full query doesn't work as in some cases the
subquery returns more than one row. Not sure why, as the geom should be
a rectangle (more or less given the projection, but was created from 4
points), but maybe right near the boundary there are issues (on
reflection I guess the north/south edges are not quite parallel).
Second - I forgot to complete part of my response. Should have said
Second, you lost me a little with your second suggestion. As I
gather _st_contains explicitly avoids the index, so the explain
still uses the sequential scan. I assume I'm misunderstanding your
suggestion.
Rgds
Bill
On 2/12/2016 5:29 PM, Bill Measday wrote:
> Thanks Regina,
>
> First, your first suggestion didn't seem to achieve anything - even
> the costs in the explain remained the same (1e10).
>
> Second, you lost me a little with your second suggestion. As I gather
> _st_contains explicitly avoids the index.
>
> Tried
>
> explain SELECT address_default_geocode_pid,
> (SELECT elevation FROM m_elevations e WHERE e.geom &&
> ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)),
> ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
> FROM address_default_geocode
> WHERE latitude = -33.87718472 AND longitude = 151.27544336;
>
> (ie changed WHERE ST_Contains(e.geom,
> ST_SetSRID(ST_MakePoint(longitude, latitude), 4326)) to WHERE e.geom
> && ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))
>
> and that seems to have done the trick. Tee explain now uses the index
> and the actual query seems more or less on par with the version run
> under 9.5.
>
> Odd though, but thank you very much for the help/solution. Now I'll
> try the full insert query.
>
> Regards
>
> Bill
>
>
> On 2/12/2016 3:48 PM, Regina Obe wrote:
>> I have seen this on rare occasions in past but not recently where the
>> planner doesn't break apart the ST_Contains or ST_Intersects into the
>> component steps.
>> We made some changes to costing in 2.3.0, but ST_Contains (and
>> _ST_Contains) wasn't one of them so that doesn't quite explain this.
>> Couple of things to try
>>
>> 1) On the _ST_Contains function in set the cost to something higher
>> like 1000:
>>
>> CREATE OR REPLACE FUNCTION _st_contains(geom1 geometry, geom2
>> geometry) RETURNS boolean AS
>> '$libdir/postgis-2.3', 'contains'
>> LANGUAGE c IMMUTABLE STRICT
>> COST 1000;
>>
>> And then try your query again/
>>
>>
>> 2) If Item 1 doesn't work what happens if you explicitly put in &&
>> and _ST_Contains
>>
>> That should force the index to kick in.
>>
>>
>> -----Original Message-----
>> From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On
>> Behalf Of Bill Measday
>> Sent: Thursday, December 01, 2016 7:47 PM
>> To: postgis-users at lists.osgeo.org
>> Subject: [postgis-users] Substantial different index use between 9.5
>> and 9.6 using Postgis
>>
>> This is a post I put on the potgres performance list. They suggested
>> I take it up on the PostGis list
>>
>> Postgres versions 9.5.4 and 9.6.0 running on Windows Server 2012.
>> Installed using EnterpriseDB. Both instances are on the same server,
>> postgresql.conf for both are the same except max_locks_per_transaction =
>> 200 in 9.6 (caused insertion errors otherwise).
>>
>> On 9.5:
>> geo=# SELECT PostGIS_full_version();
>> postgis_full_version
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> POSTGIS="2.2.2 r14797" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel.
>> 4.9.1,
>> 04 March 2015" GDAL="GDAL 2.0.2, released 2016/01/26" LIBXML="2.7.8"
>> LIBJSON="0.12" RASTER
>> (1 row)
>>
>> On 9.6:
>>
>> postgis_full_version
>>
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> POSTGIS="2.3.0 r15146" GEOS="3.5.0-CAPI-1.9.0 r4090" PROJ="Rel.
>> 4.9.1,
>> 04 March 2015" GDAL="GDAL 2.1.1, released 2016/07/07" LIBXML="2.7.8"
>> LIBJSON="0.12" RASTER
>> (1 row)
>>
>> Databases on the 2 instances are the same (as far as I can tell).
>>
>> I have 2 relevant tables (created using same script in both
>> instances, except I added an auto increment RID to the 9.6 version as
>> primary key to the larger of the 2 tables - other already had one).
>> One contains a geometry column (geom geometry(1107464) - a polygon)
>> with gist index.
>> This table has around 10 billion records. The disks these databases
>> on aren't particularly fast, and indexing took about a week.
>> Second table has latitude (numeric(10, 8)), and longitude (numeric(11,
>> 8)) and about 10 million records.
>>
>> The query I'm running is (a part of an insertion into a new table I
>> was trying to run)
>> SELECT address_default_geocode_pid,
>> (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
>> ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
>> ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>> FROM address_default_geocode;
>>
>> Under 9.5 the insertion takes about 11 hours. I gave up on 9.6.
>>
>> I thought I'd try just one record, so:
>>
>> SELECT address_default_geocode_pid,
>> (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
>> ST_SetSRID(ST_MakePoint(longitude, latitude), 4326))),
>> ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>> FROM address_default_geocode
>> WHERE latitude = -33.87718472 AND longitude = 151.27544336;
>>
>> This returns 3 rows (which is more than the average I'd expect BTW). On
>> 9.5 takes a few seconds (3-5) and again I gave up on 9.6
>>
>> Looking just at the query shown above, I note a difference in
>> explained behaviour. Here is the output from 9.5:
>>
>> QUERY PLAN
>>
>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on address_default_geocode (cost=0.00..37760293.94 rows=1
>> width=25)
>> Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
>> 151.27544336))
>> SubPlan 1
>> -> Bitmap Heap Scan on m_elevations e
>> (cost=282802.21..37401439.43 rows=3512160 width=8)
>> Recheck Cond: (geom ~
>> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
>> precision, (address_default_geocode.latitude)::double precision), 4326))
>> Filter: _st_contains(geom,
>> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
>> precision, (address_default_geocode.latitude)::double precision), 4326))
>> -> Bitmap Index Scan on m_elevations_geom_idx
>> (cost=0.00..281924.17 rows=10536480 width=0)
>> Index Cond: (geom ~
>> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
>> precision, (address_default_geocode.latitude)::double precision), 4326))
>> (8 rows)
>>
>> From 9.6
>>
>> QUERY PLAN
>>
>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on address_default_geocode
>> (cost=10000000000.00..23297309357.08 rows=1 width=49)
>> Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
>> 151.27544336))
>> SubPlan 1
>> -> Seq Scan on m_elevations e
>> (cost=10000000000.00..13296950520.12 rows=3512159563 width=8)
>> Filter: st_contains(geom,
>> st_setsrid(st_makepoint((address_default_geocode.longitude)::double
>> precision, (address_default_geocode.latitude)::double precision), 4326))
>> (5 rows)
>>
>> Interestingly (change is hard coding of coordinates in second line):
>>
>> explain SELECT address_default_geocode_pid,
>> (SELECT elevation FROM m_elevations e WHERE ST_Contains(e.geom,
>> ST_SetSRID(ST_MakePoint(151.27544336, -33.87718472), 4326))),
>> ST_SetSRID(ST_MakePoint(latitude, longitude), 4283)
>> FROM address_default_geocode
>> WHERE latitude = -33.87718472 AND longitude = 151.27544336;
>>
>> Gives (in 9.6)
>>
>> QUERY PLAN
>>
>> --------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on address_default_geocode
>> (cost=10037428497.36..10037787334.33 rows=1 width=49)
>> Filter: ((latitude = '-33.87718472'::numeric) AND (longitude =
>> 151.27544336))
>> InitPlan 1 (returns $0)
>> -> Bitmap Heap Scan on m_elevations e
>> (cost=272194.20..37428497.36 rows=3512160 width=8)
>> Recheck Cond: (geom ~
>> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
>> Filter: _st_contains(geom,
>> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
>> -> Bitmap Index Scan on m_elevations_geom_idx
>> (cost=0.00..271316.16 rows=10536480 width=0)
>> Index Cond: (geom ~
>> '0101000020E610000036E3976ED0E86240B879C29647F040C0'::geometry)
>> (8 rows)
>>
>> Which looks better.
>>
>> So for some reason, 9.6 planner decides not to use the index for a small
>> number of records returned from address_default_geocode.
>> I have vacuum analysed both tables.
>> Clearly a sequential scan on 10 billion records is pretty slow (to say
>> the least).
>>
>> Has anyone seen anything like this/got any thoughts?
>>
>> I tried "set enable_seqscan=false" but didn't seem to have any effect.
>>
>> Regards
>>
>> Bill
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20161202/517b98d9/attachment.html>
More information about the postgis-users
mailing list