[postgis-users] PostGIS on Greenplum - Performance issue

suraj birla surajbirla at gmail.com
Tue Nov 15 16:04:03 PST 2016


I created the index on the geom column on both the tables..but the index is
not being used as per execution plan. Analyze the table also.. still index
not used.

here is the DDL for the table

CREATE TABLE test
(
  idle_viol_id character varying(36),
  country_cd character varying(3),
  country character varying(3),
  geom geometry,
  CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) = 'POINT'::text
OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (idle_viol_id);


-- Index: idol_viol_point_gix

-- DROP INDEX idol_viol_point_gix;

CREATE INDEX idol_viol_point_gix
  ON test
  USING gist
  (geom);



CREATE TABLE usa_adm1
(
  gid serial NOT NULL,
  id_0 numeric(10,0),
  iso character varying(3),
  name_0 character varying(75),
  id_1 numeric(10,0),
  name_1 character varying(75),
  hasc_1 character varying(15),
  ccn_1 numeric(10,0),
  cca_1 character varying(254),
  type_1 character varying(50),
  engtype_1 character varying(50),
  nl_name_1 character varying(50),
  varname_1 character varying(150),
  geom geometry,
  CONSTRAINT usa_adm1_pkey PRIMARY KEY (gid),
  CONSTRAINT enforce_dims_geom CHECK (st_ndims(geom) = 2),
  CONSTRAINT enforce_geotype_geom CHECK (geometrytype(geom) =
'MULTIPOLYGON'::text OR geom IS NULL),
  CONSTRAINT enforce_srid_geom CHECK (st_srid(geom) = 4326)
)
WITH (
  OIDS=FALSE
)
DISTRIBUTED BY (gid);


-- Index: usa_adm1_gix


CREATE INDEX usa_adm1_gix
  ON usa_adm1
  USING gist
  (geom);


  update test a
  set country = iso
  from usa_adm2 b  -- in (239,40,173,82,16,147)
  where ST_Intersects(a.geom, b.geom)

explain plan

Update  (cost=0.00..7504805.69 rows=4179111 width=1)
  ->  Result  (cost=0.00..5872340.80 rows=8358221 width=50)
        ->  Assert  (cost=0.00..5871922.89 rows=8358221 width=107)
              Assert Cond: (geometrytype(public.test.geom) = 'POINT'::text
OR public.test.geom IS NULL) IS DISTINCT FROM false AND
(st_ndims(public.test.geom) = 2) IS DISTINCT FROM false AND
(st_srid(public.test.geom) = 4326) IS DISTINCT FROM false
              ->  Split  (cost=0.00..5871028.56 rows=8358221 width=107)
                    ->  Nested Loop  (cost=0.00..5870134.23 rows=4179111
width=107)
                          Join Filter: public.test.geom && usa_adm1.geom
AND _st_intersects(public.test.geom, usa_adm1.geom)
                          ->  Broadcast Motion 4:4  (slice1; segments: 4)
 (cost=0.00..624.34 rows=52 width=258822)
                                ->  Table Scan on usa_adm1
 (cost=0.00..432.85 rows=13 width=258822)
                          ->  Table Scan on test  (cost=0.00..450.11
rows=282542 width=103)
Settings:  optimizer=on
Optimizer status: PQO version 1.650


 Thanks

Suraj



*From:* postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] *On
Behalf Of *Darafei "Kom?pa" Praliaskouski
*Sent:* Tuesday, November 15, 2016 1:43 PM
*To:* postgis-users at lists.osgeo.org
*Subject:* Re: [postgis-users] PostGIS on Greenplum - Performance issue



Hey,


did you create a gist index on geometry? can you show the query plans?
your e-mail also contains confidentiality notice and goes to a public list,
fix this into a right way.





вт, 15 нояб. 2016 г. в 21:55, Suraj Birla <sbirla at lytx.com>:



Hi,



We have greenplum database with PostGIS extension.



*Greenplum version :*

PostgreSQL 8.2.15 (Greenplum Database 4.3.9.0 build 1) on
x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Aug
8 2016 05:36:26

*PostGIS version*

POSTGIS="2.0.3 r11128" GEOS="3.3.8-CAPI-1.7.8" PROJ="Rel. 4.8.0, 6 March
2012" LIBXML="2.7.8"



I’m working on a POC to find the country/state/country of a transaction
based on the GPS lon and lat.  The transaction table has 15-20 millions
records.

I downloaded the polygon for all countries in a table.

Added a geometry column to the transaction table and derived the value
using ST_SetSRID(ST_MakePoint(idle_viol_pos_long_deg::double
precision,idle_viol_pos_lat_deg::double precision),4326).



Then started updating the transaction table with country code using the
below statement.



  update test a

  set country = su_a3

  from (select su_a3 ,geom  from ne_10m_admin_0_countries) b

  where ST_Intersects(a.geom, b.geom)

  and country is null



Updating 1 million records took around 1 hr.



If I try to update 20 millions , not sure how long it will take.



I tried to reduce the entry in polygon table to 5 as the transaction
received is only from 5 countries.. Didn’t see any gain in the performance.

Created index on geom column of the transaction table and still no gain.



Question. Whether I’m on the right track? If yes how to overcome the
performance issue?

Any guidance would be appreciated.



Thanks

Suraj

_______________________________________________
postgis-users mailing list
postgis-users at lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20161115/8f8d3c2d/attachment.html>


More information about the postgis-users mailing list