[postgis-users] Grid of points inside Polygon

Rémi Cura remi.cura at gmail.com
Fri Nov 29 02:22:08 PST 2013


First,
do you really need no duplicates?

Then, Creating an index on a 500 million row table is going to take time
and a lot of spaces (several gigas??).
I don't think you have a choice here.
Yet I'm not sure the index would be used,  I'm not an expert.

What you can do is
_create a table with only a part of points to test
_create an index on it
_check how much time distinct takes, and if it uses index.


CREATE TABLE grid_temp AS
SELECT st_x,st_y
FROM grid
LIMIT 1000000; --only 1 million points

CREATE INDEX ON grid_temp(st_x);
CREATE INDEX ON grid_temp(st_y);

EXPLAIN ANALYZE
SELECT DISTINCT st_x, st_y
FROM grid_temp ;
--(if you see some index uses in the return, it's good news)

Then doing it on all your data.



For the record,
The ideal solution would have been  :
no have a X | Y table , but a X | Y | id_of_the_road table
This way you can create a table y grouping points by id_of_the_road
 (creating so a multipoint per road).

With this , you would have far less rows, you could create an index on geom.

Then you would get the intersecting multipoints (which would be efficient),
and dedup only pair by pair (which would be efficient).


Cheers,

Rémi-C



2013/11/29 James David Smith <james.david.smith at gmail.com>

> Hi Remi / all,
>
> I am just coming back to this project. I have made some progress. But
> would appreciate advice on how best to proceed now. What I have now is
> a table with two columns, 'st_x' and 'st_y'. They are the 20 metre by
> 20 metre grid points that are within 1km of main roads in the UK. The
> table has 499,677,666 rows!  Because of the method that I used to
> create this however, I have alot of duplication of points. So I think
> I need to do something like:
>
> CREATE TABLE grid_no_duplications AS (SELECT DISTINCT(x, y) FROM grid);
>
> However I thought that first I could put an index on it?
>
> CREATE INDEX grid_x_y ON grid (st_x, st_y);
>
> Is this a good approach?
>
> Cheers
>
> James
>
> On 15 November 2013 16:39, Rémi Cura <remi.cura at gmail.com> wrote:
> > good =)
> >
> > It is better to do the buffer computation first,
> > because this way you are sure youwon't compute the same buffer many
> times.
> >
> > Something like (not so sure without trying it)
> > UPDATE ukmajrdbuffer SET geom = ST_Buffer(geom,1000);
> >
> > You may have an error if you hav e a fixed geom data type, then you could
> > just add a new column like
> >
> > ALTER TABLE ukmajrdbuffer ADD COLUMN the_geom geometry ;
> > UPDATE ukmajrdbuffer SET the_geom = ST_Buffer(geom,1000);
> >
> >
> > Cheers,
> >
> > Rémi-C
> >
> >
> >
> > 2013/11/15 James David Smith <james.david.smith at gmail.com>
> >>
> >> Hey Remi,
> >>
> >> I've actually managed to get the file 'ukmajorroads' already and have
> >> loaded it into my database.  There are 395356 rows in the database.
> There is
> >> a field called 'geom' and I have built an index on it as below;
> >>
> >> CREATE INDEX ukrds_index ON ukrds USING GIST (geom);
> >>
> >> So should I now run the same query, but do the buffering of the roads
> 'on
> >> the fly' maybe?  For example as below?
> >>
> >>
> >> CREATE TABLE lines_for_each_road AS
> >> WITH all_lines AS (
> >> SELECT *
> >> FROM all_lines
> >> ),
> >> cutted_lines AS ( --we cut the line to keep only part of lines inside
> >> road_buffer
> >> SELECT ST_Intersection(all_lines.the_geom,st_buffer(ukmajrdbuffer.geom,
> >> 1000)) as lines_cutted, direction
> >> FROM ukmajrdbuffer, all_lines
> >> WHERE ST_Intersects(st_buffer(ukmajrdbuffer.geom, 1000),
> >> all_lines.the_geom)=TRUE
> >> ),
> >> cutted_lines_SN AS ( --this is the cutted lines which going from South
> to
> >> North
> >> SELECT *
> >> FROM cutted_lines
> >> WHERE direction = 'SN'
> >> ),
> >> cutted_lines_EW AS ( --this is the cutted lines going from East toWest
> >> SELECT *
> >> FROM cutted_lines
> >> WHERE direction = 'EW'
> >> ),
> >> points AS ( -- we take the intersection of EW  lines with SN lines ,
> that
> >> is the points on the grid.
> >> SELECT ST_Intersection(clSN.lines_cutted, clEW.lines_cutted) AS point
> >> FROM cutted_lines_SN as clSN, cutted_lines_EW AS clEW
> >> WHERE ST_Intersects(clSN.lines_cutted, clEW.lines_cutted)=TRUE --no
> point
> >> ot compute an intersection if lines don't intersect
> >> )
> >> SELECT row_number() over() AS id  , point
> >> FROM points ;
> >>
> >>
> >> On 15 November 2013 15:42, Rémi Cura <remi.cura at gmail.com> wrote:
> >>>
> >>> Still is a shame :
> >>> with proper data we should have some result with about one hour i
> guess.
> >>>
> >>> Cheers,
> >>> Rémi C
> >>>
> >>>
> >>> 2013/11/15 James David Smith <james.david.smith at gmail.com>
> >>>>
> >>>> Remi,
> >>>>
> >>>> Ok. Cool. I've set the below query running. On Monday I will also
> >>>> attempt to get the original road lines file too. Let's see if we have
> >>>> a result on Monday.
> >>>> ----------------------------------
> >>>> CREATE TABLE lines_for_each_road AS
> >>>> WITH all_lines AS (
> >>>> SELECT *
> >>>> FROM all_lines
> >>>> ),
> >>>> cutted_lines AS ( --we cut the line to keep only part of lines inside
> >>>> road_buffer
> >>>> SELECT ST_Intersection(all_lines.the_geom,ukmajrdbuffer.geom) as
> >>>> lines_cutted, direction
> >>>> FROM ukmajrdbuffer, all_lines
> >>>> WHERE ST_Intersects(ukmajrdbuffer.geom, all_lines.the_geom)=TRUE
> >>>> ),
> >>>> cutted_lines_SN AS ( --this is the cutted lines which going from South
> >>>> to North
> >>>> SELECT *
> >>>> FROM cutted_lines
> >>>> WHERE direction = 'SN'
> >>>> ),
> >>>> cutted_lines_EW AS ( --this is the cutted lines going from East toWest
> >>>> SELECT *
> >>>> FROM cutted_lines
> >>>> WHERE direction = 'EW'
> >>>> ),
> >>>> points AS ( -- we take the intersection of EW  lines with SN lines ,
> >>>> that is the points on the grid.
> >>>> SELECT ST_Intersection(clSN.lines_cutted, clEW.lines_cutted) AS point
> >>>> FROM cutted_lines_SN as clSN, cutted_lines_EW AS clEW
> >>>> WHERE ST_Intersects(clSN.lines_cutted, clEW.lines_cutted)=TRUE --no
> >>>> point ot compute an intersection if lines don't intersect
> >>>> )
> >>>> SELECT row_number() over() AS id  , point
> >>>> FROM points ;
> >>>> ------------------------------------------
> >>>> Thanks
> >>>>
> >>>> James
> >>>>
> >>>> On 15 November 2013 15:37, Rémi Cura <remi.cura at gmail.com> wrote:
> >>>> >
> >>>> > You should try to get the original road file, because i will be very
> >>>> > easy to
> >>>> > reapply some buffer and filtering to keep only major road.
> >>>> > Yet, why not let it run during the week end?
> >>>> >
> >>>> > Cheers,
> >>>> >
> >>>> > Rémi-C
> >>>> >
> >>>> >
> >>>> > 2013/11/15 James David Smith <james.david.smith at gmail.com>
> >>>> >>
> >>>> >> Hey Remi,
> >>>> >>
> >>>> >> Do you think before I try running the big query you have just sent
> >>>> >> me,
> >>>> >> that I should go back and try to get the original file of uk
> roads? I
> >>>> >> mean the very original file that has not had any buffers applied or
> >>>> >> any merging done.
> >>>> >>
> >>>> >> Or shall we just go for it and see if it has finished when I come
> >>>> >> into
> >>>> >> work on Monday?! haha.
> >>>> >>
> >>>> >> Thanks
> >>>> >>
> >>>> >> James
> >>>> >>
> >>>> >> On 15 November 2013 14:01, Rémi Cura <remi.cura at gmail.com> wrote:
> >>>> >> > Outch,
> >>>> >> > you have only 8 roads in GB? =)
> >>>> >> >
> >>>> >> > This is not going to go fast till you don't have some dozens k's
> of
> >>>> >> > roads.
> >>>> >> >
> >>>> >> > I'm guessing you are not at will to send those roads?
> >>>> >> >
> >>>> >> >
> >>>> >> > The next step will be (when you'll be sure everything is OK)
> >>>> >> >
> >>>> >> >
> >>>> >> > CREATE TABLE lines_for_each_road AS
> >>>> >> > WITH all_lines AS (
> >>>> >> > SELECT *
> >>>> >> > FROM all_lines
> >>>> >> > ),
> >>>> >> > cutted_lines AS ( --we cut the line to keep only part of lines
> >>>> >> > inside
> >>>> >> > road_buffer
> >>>> >> > SELECT ST_Intersection(all_lines.the_geom,ukmajrdbuffer.geom) as
> >>>> >> > lines_cutted, direction
> >>>> >> > FROM ukmajrdbuffer, all_lines
> >>>> >> > WHERE ST_Intersects(ukmajrdbuffer.geom, all_lines.the_geom)=TRUE
> >>>> >> > ),
> >>>> >> > cutted_lines_SN AS ( --this is the cutted lines which going from
> >>>> >> > South
> >>>> >> > to
> >>>> >> > North
> >>>> >> > SELECT *
> >>>> >> > FROM cutted_lines
> >>>> >> > WHERE direction = 'SN'
> >>>> >> > ),
> >>>> >> > cutted_lines_EW AS ( --this is the cutted lines going from East
> >>>> >> > toWest
> >>>> >> > SELECT *
> >>>> >> > FROM cutted_lines
> >>>> >> > WHERE direction = 'EW'
> >>>> >> > ),
> >>>> >> > points AS ( -- we take the intersection of EW  lines with SN
> lines
> >>>> >> > ,
> >>>> >> > that is
> >>>> >> > the points on the grid.
> >>>> >> > SELECT ST_Intersection(clSN.lines_cutted, clEW.lines_cutted) AS
> >>>> >> > point
> >>>> >> > FROM cutted_lines_SN as clSN, cutted_lines_EW AS clEW
> >>>> >> > WHERE ST_Intersects(clSN.lines_cutted, clEW.lines_cutted)=TRUE
> --no
> >>>> >> > point ot
> >>>> >> > compute an intersection if lines don't intersect
> >>>> >> > )
> >>>> >> > SELECT row_number() over() AS id  , point
> >>>> >> > FROM points ;
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > Cheers,
> >>>> >> >
> >>>> >> > Rémi-C
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > 2013/11/15 James David Smith <james.david.smith at gmail.com>
> >>>> >> >>
> >>>> >> >> Hey Remi,
> >>>> >> >>
> >>>> >> >> I'll do a few checks and get back to you. Maybe I did something
> >>>> >> >> wrong
> >>>> >> >> because I set this query going on my local PostgreSQL
> installation
> >>>> >> >> but
> >>>> >> >> also on our Linux Cluster machine which is much more powerful.
> And
> >>>> >> >> it
> >>>> >> >> finished on my local installation BEFORE the cluster. So maybe I
> >>>> >> >> did
> >>>> >> >> something wrong on the local query.
> >>>> >> >>
> >>>> >> >> The query that has finished took about 1 hour.
> >>>> >> >> The query on our cluster is still running.
> >>>> >> >>
> >>>> >> >> select count(*) from ukmajrdbuffer = 8
> >>>> >> >> This is because before I was given the data the roads buffer had
> >>>> >> >> already been dissolved unfortunately.
> >>>> >> >>
> >>>> >> >> James
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >>
> >>>> >> >> On 15 November 2013 13:43, Rémi Cura <remi.cura at gmail.com>
> wrote:
> >>>> >> >> > Good !
> >>>> >> >> >
> >>>> >> >> > Something is strange with the result,
> >>>> >> >> > you get only 190k lines intersecting road buffer,
> >>>> >> >> > it is very few , I expected at least 10times this!
> >>>> >> >> > How many time took this computing by the way?
> >>>> >> >> >
> >>>> >> >> > How many road buffer geom do you have ? (select count(*) from
> >>>> >> >> > ukmajrdbuffer
> >>>> >> >> > ).
> >>>> >> >> >
> >>>> >> >> > Cheers,
> >>>> >> >> > Rémi-C
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > 2013/11/15 James David Smith <james.david.smith at gmail.com>
> >>>> >> >> >>
> >>>> >> >> >> Hey.
> >>>> >> >> >>
> >>>> >> >> >> Yes, it's done. Was just getting some lunch! :-)
> >>>> >> >> >>
> >>>> >> >> >> select count(*) from lines_for_each_road
> >>>> >> >> >> Result = 187033
> >>>> >> >> >>
> >>>> >> >> >> I have also just ran 'VACUUM ANALYZE' on the tables
> >>>> >> >> >> 'lines_for_each_road' as well as the table 'all_lines'
> >>>> >> >> >>
> >>>> >> >> >> I also can confirm that I have ran the following commands:
> >>>> >> >> >>
> >>>> >> >> >> CREATE INDEX all_lines_index ON all_lines USING GIST (
> the_geom
> >>>> >> >> >> )
> >>>> >> >> >> CREATE INDEX ukmajrdbuffer_index ON ukmajrdbuffer USING GIST
> >>>> >> >> >> (geom);
> >>>> >> >> >>
> >>>> >> >> >> Should we now uncomment this line from the previous query?
> >>>> >> >> >>
> >>>> >> >> >> "  SELECT row_number() over() AS id--,*  "
> >>>> >> >> >>
> >>>> >> >> >> Thanks again Remi,
> >>>> >> >> >>
> >>>> >> >> >> James
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >>
> >>>> >> >> >> On 15 November 2013 13:14, Rémi Cura <remi.cura at gmail.com>
> >>>> >> >> >> wrote:
> >>>> >> >> >> > Also if you do have indexes,
> >>>> >> >> >> > can you run a "VACUUM ANALYZE", so that the indexes will be
> >>>> >> >> >> > used?
> >>>> >> >> >> >
> >>>> >> >> >> > Cheers,
> >>>> >> >> >> >
> >>>> >> >> >> > Rémi-C
> >>>> >> >> >> >
> >>>> >> >> >> >
> >>>> >> >> >> > 2013/11/15 Rémi Cura <remi.cura at gmail.com>
> >>>> >> >> >> >>
> >>>> >> >> >> >> It should be finished by now,
> >>>> >> >> >> >> can you check you have geom indexes on :
> >>>> >> >> >> >> "ukmajrdbuffer.geom" and  "all_lines.the_geom"
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >> How many geoms do you have in "ukmajrdbuffer"?
> >>>> >> >> >> >>
> >>>> >> >> >> >> Cheers,
> >>>> >> >> >> >> Rémi-C
> >>>> >> >> >> >>
> >>>> >> >> >> >>
> >>>> >> >> >> >> 2013/11/15 Rémi Cura <remi.cura at gmail.com>
> >>>> >> >> >> >>>
> >>>> >> >> >> >>> Hey Sandro,
> >>>> >> >> >> >>>
> >>>> >> >> >> >>> Thanks for this, it is at least twice faster =)
> >>>> >> >> >> >>>
> >>>> >> >> >> >>> Cheers,
> >>>> >> >> >> >>> Rémi-C
> >>>> >> >> >> >>>
> >>>> >> >> >> >>>
> >>>> >> >> >> >>>
> >>>> >> >> >> >>>
> >>>> >> >> >> >>> 2013/11/15 James David Smith <
> james.david.smith at gmail.com>
> >>>> >> >> >> >>>>
> >>>> >> >> >> >>>> Thanks both. Geometries now fixed.
> >>>> >> >> >> >>>>
> >>>> >> >> >> >>>> The query 'CREATE TABLE lines_for_each_road....' has now
> >>>> >> >> >> >>>> been
> >>>> >> >> >> >>>> set
> >>>> >> >> >> >>>> running. Will report back when it's done. I suspect it
> may
> >>>> >> >> >> >>>> take
> >>>> >> >> >> >>>> a
> >>>> >> >> >> >>>> while!
> >>>> >> >> >> >>>>
> >>>> >> >> >> >>>> James
> >>>> >> >> >> >>>>
> >>>> >> >> >> >>>> On 15 November 2013 11:03, Sandro Santilli
> >>>> >> >> >> >>>> <strk at keybit.net>
> >>>> >> >> >> >>>> wrote:
> >>>> >> >> >> >>>> > On Fri, Nov 15, 2013 at 11:50:42AM +0100, Rémi Cura
> >>>> >> >> >> >>>> > wrote:
> >>>> >> >> >> >>>> >> Yep, maybe something like
> >>>> >> >> >> >>>> >>
> >>>> >> >> >> >>>> >> UPDATE ukmajrdbuffer SET the_geom =
> >>>> >> >> >> >>>> >> ST_MakeValid(the_geom)
> >>>> >> >> >> >>>> >> WHERE ST_IsValid(the_geom) = FALSE
> >>>> >> >> >> >>>> >
> >>>> >> >> >> >>>> > ST_MakeValid internally checks for ST_IsValid, so no
> >>>> >> >> >> >>>> > need
> >>>> >> >> >> >>>> > to add the condition (which would run the test twice).
> >>>> >> >> >> >>>> >
> >>>> >> >> >> >>>> > --strk;
> >>>> >> >> >> >>>> > _______________________________________________
> >>>> >> >> >> >>>> > postgis-users mailing list
> >>>> >> >> >> >>>> > postgis-users at lists.osgeo.org
> >>>> >> >> >> >>>> >
> >>>> >> >> >> >>>> >
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >> >> >>>> _______________________________________________
> >>>> >> >> >> >>>> postgis-users mailing list
> >>>> >> >> >> >>>> postgis-users at lists.osgeo.org
> >>>> >> >> >> >>>>
> >>>> >> >> >> >>>>
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >> >> >>>
> >>>> >> >> >> >>>
> >>>> >> >> >> >>
> >>>> >> >> >> >
> >>>> >> >> >> >
> >>>> >> >> >> > _______________________________________________
> >>>> >> >> >> > postgis-users mailing list
> >>>> >> >> >> > postgis-users at lists.osgeo.org
> >>>> >> >> >> >
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >> >> _______________________________________________
> >>>> >> >> >> postgis-users mailing list
> >>>> >> >> >> postgis-users at lists.osgeo.org
> >>>> >> >> >>
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> >
> >>>> >> >> > _______________________________________________
> >>>> >> >> > postgis-users mailing list
> >>>> >> >> > postgis-users at lists.osgeo.org
> >>>> >> >> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >> _______________________________________________
> >>>> >> >> postgis-users mailing list
> >>>> >> >> postgis-users at lists.osgeo.org
> >>>> >> >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> >
> >>>> >> >
> >>>> >> >
> >>>> >> > _______________________________________________
> >>>> >> > postgis-users mailing list
> >>>> >> > postgis-users at lists.osgeo.org
> >>>> >> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >> _______________________________________________
> >>>> >> postgis-users mailing list
> >>>> >> postgis-users at lists.osgeo.org
> >>>> >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> >
> >>>> >
> >>>> >
> >>>> > _______________________________________________
> >>>> > postgis-users mailing list
> >>>> > postgis-users at lists.osgeo.org
> >>>> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>> _______________________________________________
> >>>> postgis-users mailing list
> >>>> postgis-users at lists.osgeo.org
> >>>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> postgis-users mailing list
> >>> postgis-users at lists.osgeo.org
> >>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >>
> >>
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org
> >> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> >
> >
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20131129/928446ad/attachment.html>


More information about the postgis-users mailing list