[postgis-users] Help with SQL query?

Paragon Corporation lr at pcorp.us
Fri Nov 27 21:04:28 PST 2015


For completeness, your vector query would then look like

 

CREATE TABLE mymodel.network AS

SELECT rid As gid, ST_Polygon(ST_Reclass(rast,  1, '[0-600):0, [600-10000]:1','1BB', 0) ) As geom

FROM mymodel.concentrated ;

 

 

 

I think you can put in –number in there like - -1000-600.  There was an issue way back with negatives but I thnk that was fixed in PostGIS 2.1 something so should work

 

So what that basically does is create a new raster from original setting of pixel values from 0 to < 600 to 0 and from >= 600 to 10000 to 1 and making that a 1BB (1-bit booleanl raster), and then definiing 0 as no data so that when you convert it to a multipolygon you'll only vectorize the 600-10000 range.

 

I think ST_polygon against a 1BB is faster than larger band pixel types.

 

Hope that helps,

Regina

 

 

From: postgis-users [mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Paragon Corporation
Sent: Friday, November 27, 2015 10:04 PM
To: 'PostGIS Users Discussion' <postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Help with SQL query?

 

Darrel,

 

I think the equivalent in PostGIS terminology would be ST_Reclass -  <http://postgis.net/docs/manual-2.2/RT_ST_Reclass.html> http://postgis.net/docs/manual-2.2/RT_ST_Reclass.html

 

And that's a fairly fast operation as I recall.

 

Hope that helps,

Regina

 

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Darrel Maddy
Sent: Friday, November 27, 2015 4:09 PM
To: PostGIS Users Discussion < <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Help with SQL query?

 

Dear Regina,

 

Yes, I should have been clearer. To be honest I did not fully understand what I was trying to do at the outset :)

 

The binary raster to point conversion  was an afterthought as I had already been extracting data along short transects using a point shape file.  For the record arc has a function in the raster calculator which works as a conditional such that 
CON(raster, true, false, condition) which in my case was simply CON(FA, 1, 0 “value > 600”).  This produces a raster which I then had to convert to points and eliminate the zeros.  I’m sure there is a way to do this in QGIS but the current raster calculator does not allow that directly.  FYI my rasters have only one band as they are being used to store numerical model matrix outputs so that I can readily visualise them in order to allow me to see structures I would not recognise hidden within the 5 million cell datasets.

 

Anyhow I will experiment with doing more of this in postgis (it would be great if I could script an end-to-end solution). Once I am happy with the numerical model I will have the model write the data directly into postgis. 

 

I have made some progress this week  thanks to your help. Hopefully I am beginning to see how best to use this tool for my intended purpose.

 

Thanks again

 

Darrel

 

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Paragon Corporation
Sent: 27 November 2015 20:02
To: 'PostGIS Users Discussion' < <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Help with SQL query?

 

Darrel,

 

Oh that's what you are trying to do sorry I didn't recognize that whole CASE thing as a binary check operation until you described the purpose.

 

For the bit operation type stuff it is much faster to define that 0/1 as a geometry (which it looks like you've done, but I don't know if you just have one pixel cell per or what or details of how you do it in ArcGIS.  I suspect that logic can be recreated easily in PostGIS with something like below:

 

If your network raster is just a set of 0s and 1s (is it a 1BB) or you just want to treat 0 as no-date (which is essentially what you case statement was trying to do I think) then you could just convert it to a geometry with these functions

 <http://postgis.net/docs/manual-2.2/RT_ST_Polygon.html> http://postgis.net/docs/manual-2.2/RT_ST_Polygon.html,  <http://postgis.net/docs/manual-2.2/RT_ST_SetBandNoDataValue.html> http://postgis.net/docs/manual-2.2/RT_ST_SetBandNoDataValue.html 

 

and this SQL Statement

 

CREATE TABLE mymodel.network AS

SELECT rid As gid, ST_Polygon(ST_SetBandNoDataValue(rast,1, 0) ) As geom

FROM mymodel.concentrated ;

 

 

Once you have that concentrated as a network channel as a geometry, then you can use ST_Clip and that should be pretty fast and give you the same results.

 <http://postgis.net/docs/manual-2.2/RT_ST_Clip.html> http://postgis.net/docs/manual-2.2/RT_ST_Clip.html

 

 

So your query would look something like

 

WITH  foo AS (

  SELECT  mymodel.deposition.rid,   ST_SummaryStats( ST_Clip(rast, geom) ) As st

            FROM mymodel.deposition INNER JOIN mymodel.network ON ( ST_Intersects(rast,geom) )

           

)

SELECT SUM( (st).sum )

FROM foo;

 

Hope that helps,

Regina

 <http://www.postgis.us> http://www.postgis.us

 <http://postgis.net> http://postgis.net

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Darrel Maddy
Sent: Friday, November 27, 2015 12:42 PM
To: PostGIS Users Discussion < <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Help with SQL query?

 

Dear Regina,

 

Many thanks for this suggestion.

 

I ran the query in this form and for one raster it takes 3372s (~55 mins).  I guess that is what I had anticipated from the single tile  exercise I ran with the alternate algorithm.

 

This still seems a little too long however and so I have started to explore ways to improve upon this by pre-processing some of the data.  The cells of interest represent the ‘main channels’ in a drainage network.  Although in future this network may change position from one output timestep to another (they are actually 1000l iterations of the model apart) in this particular variant the position of these cells is static.  With that in mind I decided to create a binary raster where the 1’s represent the channel cells (I had to do this in arcgis as QGIS does not appear to have a Con function in the raster calculator!)  and then exported this as a point layer. I then deleted the points coded zero and saved the shp file in QGIS.  I imported the ntwork shp file into postgis (there are about 15500 15 points) and I am now running:

 

CREATE TABLE mymodel.networkdep AS

SELECT filename, gid, ST_Value(rast, geom) val

FROM mymodel.deposition, mymodel.network

WHERE ST_Intersects(rast, geom) 

ORDER BY gid, rid;

 

I will sum by raster (i.e. filename) using the new table but will settle for just having the relevant data for now. This took  2057s(~35 minutes!) to complete! 

 

If this is the best way to do it I will explore the OGR library and try and hardcode the network point output directly into the model or, more realistically, write a short routine to extract this automatically from the flow accumulation output rasters without recourse to a GIS.

 

I am learning a lot through this exercise, so thanks once again to all of you who have made suggestions, they are very much appreciated.

 

Best wishes

 

Darrel 

 

 

 

 

 

 

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Paragon Corporation
Sent: 26 November 2015 05:08
To: 'PostGIS Users Discussion' < <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org>
Subject: Re: [postgis-users] Help with SQL query?

 

That timing seems much slower than I recall.

 

FWIW expression based mapalgebra as I recall is slower than using the call back function approach.  So you could try wrapping your CASE in a call back function.

 

However I think something else might be going on here and postgres might be repeating work.  I forgot under what conditions it decides to reevaluate a function call, I just remember being really surprised by it.

 

To avoid that, you can try using a CTE, also you don't need that ST_Union call which for larger number of rasters is expensive, and you might even generate a raster that is too big to compute.

 

I'm also guessing your rasts are all tiled the same, so you really don't need ST_Intersects, just use the same box operator

 

So try this:

 

WITH  foo AS (

  SELECT ST_SummaryStats( ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN [rast2] > 0.0 THEN [rast1] ELSE NULL END' ) ) As st

            FROM mymodel.deposition INNER JOIN mymodel.concentrated ON ( deposition.rast  ~=  concentrated.rast )

            WHERE deposition.rid=1

 

)

SELECT SUM( (st).sum )

FROM foo;

 

 

Hope that helps,

Regina

 <http://www.postgis.us> http://www.postgis.us

 <http://postgis.net> http://postgis.net

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Darrel Maddy
Sent: Wednesday, November 25, 2015 5:06 PM
To: PostGIS Users Discussion < <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org>; Brent Wood < <mailto:pcreso at yahoo.com> pcreso at yahoo.com>
Subject: Re: [postgis-users] Help with SQL query?

 

Dear Brent,

 

I must confess that my attempts to do this are so far proving very unsuccessful

 

If  I run the following query:

 

SELECT  (ST_SummaryStats(ST_Union(rast))).sum AS sum

FROM  (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN [rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast

            FROM mymodel.deposition, mymodel.concentrated

            WHERE ST_Intersects(deposition.rast, concentrated.rast) AND deposition.rid=1 ) foo ;

 

It takes around 30 seconds to complete as I assume it is only looking at one tile(they are 256x256 pixels) i.e. rid 1. It is not easy to check the sum – for that I need one complete raster.

 

For the record this was marginally faster than

SELECT (ST_SummaryStats(ST_Union(rast))).sum AS sum

FROM (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN [rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast

            FROM mymodel.deposition, mymodel.concentrated

            WHERE mymodel.deposition.filename='10_depo.tif' AND ST_UpperleftX(mymodel.deposition.rast) = ST_UpperleftX(mymodel.concentrated.rast) AND 

                         ST_UpperleftY(mymodel.deposition.rast) = ST_UpperleftY(mymodel.deposition.rast) ) foo ;

Even after I built indexes for the clauses after the WHERE.

 

Now there are 144 tiles in each of the rasters I want to perform this operation on.  Logic would therefore suggest this should take ~4500s

 

However when I perform the following query

 

SELECT  (ST_SummaryStats(ST_Union(rast))).sum AS sum

FROM  (SELECT ST_MapAlgebra(deposition.rast, concentrated.rast, 'CASE WHEN [rast2] > 0.0 THEN [rast1] ELSE NULL END' ) As rast

            FROM mymodel.deposition, mymodel.concentrated

            WHERE ST_Intersects(deposition.rast, concentrated.rast) AND deposition.filename='10_depo.tif' ) foo ;

 

The query is still running after 18000s!  I must therefore assume I have done something wrong but as you may have guessed the answer eludes me.

 

Any further suggestions would be welcome but I will continue to try and find a solution as I have 135 rasters to perform this operations on now and potentially many thousands more in the future.

 

Darrel

 

.

 

I

 

 

 

 

 

From: postgis-users [ <mailto:postgis-users-bounces at lists.osgeo.org> mailto:postgis-users-bounces at lists.osgeo.org] On Behalf Of Darrel Maddy
Sent: 24 November 2015 19:52
To: Brent Wood < <mailto:pcreso at yahoo.com> pcreso at yahoo.com>;  <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org
Subject: Re: [postgis-users] Help with SQL query?

 

Dear Brent,

 

Many thanks. The data are tiled (256x256) hence the large number of rows from the original 135 tifs. I did not build any indexes however, so I will do some reading and see how best to approach that (the threads you listed look useful so thanks for that).

 

I will run some additional mini queries limited to just one comparison and check using QGIS as you suggest – I probably should have done that first!

 

My workstation has 64GB Ram and I would be surprised if it was significantly caching to disk. I also have a hexacore intel extreme processor so I would not expect this to be hardware limited. I must confess I expected it to finish within a couple of hours.

 

Anyhow very many thanks. I will continue to explore and report back hopefully with positive news.

 

Darrel

 

 

From: Brent Wood [ <mailto:pcreso at yahoo.com> mailto:pcreso at yahoo.com] 
Sent: 24 November, 2015 7:36 PM
To: Darrel Maddy < <mailto:darrel.maddy at newcastle.ac.uk> darrel.maddy at newcastle.ac.uk>;  <mailto:postgis-users at lists.osgeo.org> postgis-users at lists.osgeo.org
Subject: Re: [postgis-users] Help with SQL query?

 

Indexing can improve performance by 100s of x, without them things can be slow. Also, did you tile the images when you imported them? If not, then each iteration is working through all the pixels in the image, rather than a small subset. Essentially with tiles, you have a deep (long) table rather than a wide one. RDBMSs work better with lots of small records than a few wide ones, especially when indexes are used.

 

This might help:

 <http://gis.stackexchange.com/questions/43053/how-to-speed-up-queries-for-raster-databases> http://gis.stackexchange.com/questions/43053/how-to-speed-up-queries-for-raster-databases

 

and see the raster tutorial they mention for the SRTM data, as to how that is loaded into Postgis:

 <https://trac.osgeo.org/postgis/wiki/WKTRasterTutorial01> https://trac.osgeo.org/postgis/wiki/WKTRasterTutorial01

 

To test the logic (the syntax is correct or it wouldn't be working) you could add to the "where" clause an extra filter so that only a small subset of the entire dataset is included (like just one QGIS operation) then compare this with the QGIS result.

 

That would be much faster that testing on the entire dataset. Once you know it is correct for the test case(s), then you can run it on the complete set.

 

Note that some queries can build up large in-memory objects, so make sure your system is not swapping to disk, as that will also slow things down (hugely).

 

Cheers

 

Brent

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20151128/d8122fe6/attachment.html>


More information about the postgis-users mailing list