[postgis-users] Re: Transform Performance

strk at refractions.net strk at refractions.net
Wed Oct 26 11:34:11 PDT 2005


I made some tests today.

Let testme(int) be an IMMUTABLE function.
Let MyTable contain an arbitrary number of records
Let MyTable.MyField contain all integers of the same value.

-- testme() invoked 1 time
select testme(1) from MyTable

-- testme() invoked for every row !!
select testme(MyField) from MyTable

-- testme() invoked 3 times !!
select testme(1), testme(1), testme(1);

-- testme() invoked 3 times for every row !!!
select testme(1), testme(1), testme(1) from MyTable

Now let's see your tests:

On Wed, Oct 26, 2005 at 11:09:44AM -0600, Charlie Savage wrote:
> Interesting, I see a similar thing.  To first recap, this query takes 1 
> second:
> 
> create temp table t1 as
> select ogc_fid, wkb_geometry as wkb_geometry
> from my_table;
> 
> 
> And this query, with a transform, takes 34 seconds:
> 
> create temp table t3 as
> select ogc_fid, Transform(wkb_geometry, 4326) as wkb_geometry
> from my_table;
> 
> 
> Similar to you, calling transform_geometry directly is more than 3 times 
> faster, approximately 10 seconds:
> 
> create temp table t2 as
> select ogc_fid,
> transform_geometry(wkb_geometry, '+proj=tmerc +lat_0=49 +lon_0=-2 
> +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m 
> +no_defs'::text, '+proj=longlat +ellps=WGS84 +datum=WGS84 
> +no_defs'::text, 4326) as wkb_geometry
> from my_table;
> 
> 
> While this query takes about 1.2 seconds (get_proj4_from_srid is 
> specified as IMMUTABLE, see below):
> 
> create temp table t3 as
> select ogc_fid, get_proj4_from_srid(4326)
> from my_table;

get_proj4_from_srid() gets a constant input, so
it isn't invoked for every row of my_table.
This is the same as:

	select testme(1) from MyTable;


> And this query 1.1 seconds:
> 
> create temp table t4 as
> select ogc_fid, SRID(wkb_geometry)
> from my_table;
> 
> Unexpectedly, combining the calls together takes 9 seconds:
> 
> create temp table t5 as
> select ogc_fid, get_proj4_from_srid(SRID(wkb_geometry))
> from my_table;

This is because this time get_proj4_from_srid() does
NOT get a constant argument, so it is invoked for
every row, even if evaluation of SRID(wkb_geometry)
always return the same result.
This is the same as:

	select testme(MyField) from MyTable;


> And combining everything together takes 17 seconds:
> 
> create temp table t8 as
> select ogc_fid,
>   transform_geometry(wkb_geometry,
>     get_proj4_from_srid(SRID(wkb_geometry)), get_proj4_from_srid(4326), 
> 4326)
> from my_table;

In this case the first get_proj4_from_srid() is invoked once for
each row in my_table, while the second should be invoked only once.

> 
> It certainly seems like 1/2 the time is spent in the transform method 
> itself, although its not all obvious to me why (it doesn't do much 
> besides the method calls listed above).

For what I can tell this is because you don't have any IMMUTABLE function
call with all constant arguments in the outmost query.

	-- last get_proj4_from_srid(CONSTANT) can be simplified
	transform_geometry(g, get_proj4_from_srid(SRID(g)),
		get_proj4_from_srid(4326));

	-- g is not a CONSTANT, no simplification possible
	transform(g, 4326);

The following should compare with the previous:

	-- get_proj4_from_srid(srid_column) cannot be simplified
	transform_geometry(g, get_proj4_from_srid(SRID(g)),
		get_proj4_from_srid(srid_column));


POSSIBLE OPTIMIZATIONS

First optimization that comes to mind is reducing scans by making
outer-level get_proj4_from_srid() calls take a constant argument:

	transform_geometry(g, get_proj4_from_srid(find_srid(..)),
		get_proj4_from_srid(4326));

Given that we feed find_srid() with constant values (easy in a mapfile)
this should make:

	- SINGLE scan of geometry_columns for find_srid()
	- DOUBLE scan of spatial_ref_sys for first get_proj4_from_srid()

Next, we'll want to reduce scans or make them faster:

	- The find_srid() scan can be dropped using a constant SRID for
	  the source geometry.
	- spatial_ref_sys scan could be made faster with an index on the
	  srid column and a reduction of records in it.

Finally, for OPTIMAL PERFORMANCE, we can completely drop accessory scans,
manually extracting proj4text values from spatial_ref_sys (and/or mapfile):

	transform_geometry(g,
	  '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs',
	  '+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=m +no_defs',
	  32632) from MyTable;

Note that using '+init=epsg:4326' rather then explicit proj4 text takes more
time (my test is x1.5) probably due to the fact that it will be PROJ4 library
itself to require a scan externally to the db (epsg id database).

TIMINGS

Below the timing difference for transformation of 215 MultiPolygons
with a total of 237467 points.
Spatial_ref_sys has no index on the srid column and contains 2671 rows.

=# select count(transform(the_geom, 32632)) from input.plmshp02_1;

Time: 16150.968 ms

=# select count(transform_geometry(the_geom,
	'+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs',
	'+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=m +no_defs',
	32632)) from input.plmshp02_1;

Time: 1961.001 ms

First query is 8.2 times faster the second one.

EPILOGUE

Volunteers to fill the wiki are welcome.

--strk;




More information about the postgis-users mailing list