[postgis-users] Re: Transform Performance
strk at refractions.net
strk at refractions.net
Wed Oct 26 11:34:11 PDT 2005
I made some tests today.
Let testme(int) be an IMMUTABLE function.
Let MyTable contain an arbitrary number of records
Let MyTable.MyField contain all integers of the same value.
-- testme() invoked 1 time
select testme(1) from MyTable
-- testme() invoked for every row !!
select testme(MyField) from MyTable
-- testme() invoked 3 times !!
select testme(1), testme(1), testme(1);
-- testme() invoked 3 times for every row !!!
select testme(1), testme(1), testme(1) from MyTable
Now let's see your tests:
On Wed, Oct 26, 2005 at 11:09:44AM -0600, Charlie Savage wrote:
> Interesting, I see a similar thing. To first recap, this query takes 1
> second:
>
> create temp table t1 as
> select ogc_fid, wkb_geometry as wkb_geometry
> from my_table;
>
>
> And this query, with a transform, takes 34 seconds:
>
> create temp table t3 as
> select ogc_fid, Transform(wkb_geometry, 4326) as wkb_geometry
> from my_table;
>
>
> Similar to you, calling transform_geometry directly is more than 3 times
> faster, approximately 10 seconds:
>
> create temp table t2 as
> select ogc_fid,
> transform_geometry(wkb_geometry, '+proj=tmerc +lat_0=49 +lon_0=-2
> +k=0.999601 +x_0=400000 +y_0=-100000 +ellps=airy +units=m
> +no_defs'::text, '+proj=longlat +ellps=WGS84 +datum=WGS84
> +no_defs'::text, 4326) as wkb_geometry
> from my_table;
>
>
> While this query takes about 1.2 seconds (get_proj4_from_srid is
> specified as IMMUTABLE, see below):
>
> create temp table t3 as
> select ogc_fid, get_proj4_from_srid(4326)
> from my_table;
get_proj4_from_srid() gets a constant input, so
it isn't invoked for every row of my_table.
This is the same as:
select testme(1) from MyTable;
> And this query 1.1 seconds:
>
> create temp table t4 as
> select ogc_fid, SRID(wkb_geometry)
> from my_table;
>
> Unexpectedly, combining the calls together takes 9 seconds:
>
> create temp table t5 as
> select ogc_fid, get_proj4_from_srid(SRID(wkb_geometry))
> from my_table;
This is because this time get_proj4_from_srid() does
NOT get a constant argument, so it is invoked for
every row, even if evaluation of SRID(wkb_geometry)
always return the same result.
This is the same as:
select testme(MyField) from MyTable;
> And combining everything together takes 17 seconds:
>
> create temp table t8 as
> select ogc_fid,
> transform_geometry(wkb_geometry,
> get_proj4_from_srid(SRID(wkb_geometry)), get_proj4_from_srid(4326),
> 4326)
> from my_table;
In this case the first get_proj4_from_srid() is invoked once for
each row in my_table, while the second should be invoked only once.
>
> It certainly seems like 1/2 the time is spent in the transform method
> itself, although its not all obvious to me why (it doesn't do much
> besides the method calls listed above).
For what I can tell this is because you don't have any IMMUTABLE function
call with all constant arguments in the outmost query.
-- last get_proj4_from_srid(CONSTANT) can be simplified
transform_geometry(g, get_proj4_from_srid(SRID(g)),
get_proj4_from_srid(4326));
-- g is not a CONSTANT, no simplification possible
transform(g, 4326);
The following should compare with the previous:
-- get_proj4_from_srid(srid_column) cannot be simplified
transform_geometry(g, get_proj4_from_srid(SRID(g)),
get_proj4_from_srid(srid_column));
POSSIBLE OPTIMIZATIONS
First optimization that comes to mind is reducing scans by making
outer-level get_proj4_from_srid() calls take a constant argument:
transform_geometry(g, get_proj4_from_srid(find_srid(..)),
get_proj4_from_srid(4326));
Given that we feed find_srid() with constant values (easy in a mapfile)
this should make:
- SINGLE scan of geometry_columns for find_srid()
- DOUBLE scan of spatial_ref_sys for first get_proj4_from_srid()
Next, we'll want to reduce scans or make them faster:
- The find_srid() scan can be dropped using a constant SRID for
the source geometry.
- spatial_ref_sys scan could be made faster with an index on the
srid column and a reduction of records in it.
Finally, for OPTIMAL PERFORMANCE, we can completely drop accessory scans,
manually extracting proj4text values from spatial_ref_sys (and/or mapfile):
transform_geometry(g,
'+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs',
'+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=m +no_defs',
32632) from MyTable;
Note that using '+init=epsg:4326' rather then explicit proj4 text takes more
time (my test is x1.5) probably due to the fact that it will be PROJ4 library
itself to require a scan externally to the db (epsg id database).
TIMINGS
Below the timing difference for transformation of 215 MultiPolygons
with a total of 237467 points.
Spatial_ref_sys has no index on the srid column and contains 2671 rows.
=# select count(transform(the_geom, 32632)) from input.plmshp02_1;
Time: 16150.968 ms
=# select count(transform_geometry(the_geom,
'+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs',
'+proj=utm +zone=32 +ellps=WGS84 +datum=WGS84 +units=m +no_defs',
32632)) from input.plmshp02_1;
Time: 1961.001 ms
First query is 8.2 times faster the second one.
EPILOGUE
Volunteers to fill the wiki are welcome.
--strk;
More information about the postgis-users
mailing list