[postgis-users] On UpdateGeometrySRID() performance

Thu Nov 1 10:07:56 PDT 2018

When I need to load and transform a large dataset before it lands in its
final tables/rows, I generally use unlogged tables for the intermediate
steps.  This saves a lot of disk writing (especially so if running
PostgreSQL on a storage layer doing its own copy-on-write).  The downside
is the unlogged tables won't be saved if for whatever reason PostgreSQL
unexpectedly stops, but that generally doesn't matter in this case because
you can just restart the load.

On Thu, Nov 1, 2018, 11:54 Paul Ramsey <pramsey at cleverelephant.ca> wrote:

> Also, if you created an index during your import process, that'll also
> make your update slower.
> I usually skip bulk updates in favour of writing a fresh table with the
> stuff I want..
>
> CREATE TABLE mynewtable AS SELECT ST_SetSRID(geom, 4326) AS geom, ... FROM
> myoldtable
>
> That generally writes as fast as is possible though still only single
> threaded.
>
> P
>
>
> On Thu, Nov 1, 2018 at 9:51 AM Darafei "Komяpa" Praliaskouski <
> me at komzpa.net> wrote:
>
>> Hi,
>>
>> Please feed in your WKT polygons in EWKT format:
>>
>> SRID=4326;POLYGON(...)
>>
>> Since every update in Postgres is essentially Delete+Insert, time of
>> rewriting each and every row being equal to initial Insert time is expected
>> thing.
>>
>> You can also update SRID in two columns in one go:
>>
>> update tablename set geom1 = ST_SetSRID(geom1, 4326), geom2 =
>> ST_SetSRID(geom2, 4326);
>>
>> Out of curiosity, where did you learn about UpdateGeometrySRID before
>> learning about ST_SetSRID?
>>
>> ср, 31 окт. 2018 г. в 21:57, jerry73204 <jerry73204 at gmail.com>:
>>
>>> Hi all,
>>>
>>> I'm stuck in the low performance of UpdateGeometrySRID().
>>>
>>> I get started with a 50GB polygon dataset in CSV in EPSG:4326
>>> coordinates. Since I find no way to `\copy` the csv while preserving the
>>> SRID, the data is imported with null SRID and then `SELECT
>>> UpdateGeometrySRID('table', 'column', 4326)`.
>>>
>>> The `UpdateGeometrySRID()` takes as long time as that of `\copy`, which
>>> turns out to be approx two hours. The dataset has two geometry columns
>>> and thus I have to take triple time to finish this data.
>>>
>>> I profiled the postgresql daemon. The avg disk writing speed is 30MB/s,
>>> while occasionally peaks to 100MB/s. The SSD, F2FS formatted disk is
>>> capable of up to 150MB/s. The daemon does not utilize the 4-core
>>> i5-7600k CPU. It seems to be a single process task with avg CPU load
>>> 20%, while other workers are idle. I wonder if there's a room for
>>> improving the performance. Also, I'm looking for if it's possible to
>>> preserve SRID with `\copy`.
>>>
>>> Jerry Lin
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>> --
>> Darafei Praliaskouski
>> Support me: http://patreon.com/komzpa
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20181101/9381122e/attachment.html>