[gdal-dev] Python-OGR not looping over more than 500 PGSQL features

Even Rouault even.rouault at mines-paris.org
Fri Oct 9 14:40:22 EDT 2009


Selon Matthieu Rigal <rigal at rapideye.de>:

Matthieu,

I've been able to reproduce the behaviour you've observed by running the
following snippet :

>>>
import ogr

ds = ogr.Open('PG:dbname=autotest')
lyr = ds.GetLayerByName('testhuge')
feat = lyr.GetNextFeature()
count = 0
while feat is not None:
    count = count + 1
    feat.DumpReadable()
    ds.ExecuteSQL("UPDATE testhuge SET test = '%s' WHERE test = '%s'" % (
feat.GetField(0),feat.GetField(0) ))
    feat = lyr.GetNextFeature()

print 'count = %d' % count
<<<

The counter effectively stops at 500. This is the case with GDAL trunk, but also
with OSGEO4W's GDAL/OGR 1.5.4, so if it is a regression, it is against an older
version.

Let me try to explain that behaviour. To improve reading performance, we try to
minimize the number of round-trips to the server, so we read features by chunks
of 500. This is done by the following sequence:

BEGIN;
DECLARE OGRPGLayerReader CURSOR FOR SELECT * FROM testhuge;
FETCH 500 IN OGRPGLayerReader;

at that point we have the first 500 features in RAM, so we'll issue the next
'FETCH 500 IN...' only after 500 GetNextFeature() calls.

Now, as you add a ExecuteSQL() in the loop, we execute the SQL in a new
transaction. If one is already opened at that point, it is closed before. So you
have now the following sequence

COMMIT; # Close existing active transaction
BEGIN;
UPDATE testhuge SET test = '1' WHERE test = '1';
COMMIT;

So the OGRPGLayerReader cursor is now in a defunc state, and when issuing the
next 'FETCH 500 IN OGRPGLayerReader;' the server returns an error and
GetNextFeature() returns in trun NULL. End of loop, end of game.

There's a workaround though. If I've well understood your test case (updating a
feature that you've just read), you could probably replace the ExecuteSQL by a
SetFeature() call after updating the fields needed. In trunk, SetFeature() is
implemented by UPDATE and I could test that it does not stop the current
transaction opened by GetNextFeature() (the effect of this is that the updates
will only be commited to storage at the end of the read loop). In 1.6 branch and
earlier, a DELETE and an INSERT are done, so I'm not sure how they deal with
transactions. At first sight, this looks like it would work too, although a bit
less efficient.

I'm not sure how we can improve the situation more generally. Perhaps, we could
try not to force closing existing transaction when executing a SQL UPDATE or
DELETE (SELECT is another story as we open a new cursor) ? No idea what the
other consequences / side-effects could be.

I'd also note that there are many other things you cannot do. For that above
reasons, you cannot interleaving reading 2 layers like this as the
lyr2.GetNextFeature() will begin by COMMIT'ing the transaction opened by
lyr1.GetNextFeature()

ds = ogr.Open('PG:dbname=autotest')
lyr1 = ds.GetLayerByName('testhuge')
lyr2 = ds.GetLayerByName('testhuge')
feat1 = lyr1.GetNextFeature()
feat2 = lyr2.GetNextFeature()
count = 0
while feat1 is not None and feat2 is not None:
    count = count + 1
    feat1.DumpReadable()
    feat2.DumpReadable()
    feat1 = lyr1.GetNextFeature()
    feat2 = lyr2.GetNextFeature()

There's probably place for improvements in the management of transactions in the
driver but this looks like pretty tricky to implement reliably.

Best regards,

Even


> Hi people,
>
> I recently discovered a strange behaviour which was not happening before, I
> think...
> I'm using Python 2.5 with GDAL/OGR 1.6 and PGsql 8.3 on Linux.
>
> I am reading a table containing 858 features with several columns and
> geometry.
> I first do a SELECT on the table and the GetFeatureCount tells me 858 lines
> are being readen.
> Then I am looping on the OGR object Feature by feature, to make some
> calculation and I send an UPDATE for this particular line through another OGR
> object.
> I have a counter after every UPDATE.
>
> Whereas it should go through all the 858 features, it stops after having gone
> through 500 features, my counter at the end shows 500 and looking to the
> database, it clearly stops after 500.
> I've done a test on a second database and it also stopped at 500...
>
> Is there some recently introduced limitations on OGR side ? I have searched
> all the website without finding something about it... May it come from some
> Python memory limitation ( I doubt) ? Or from some PGSQL server limitation (I
> doubt also because 858 features are returned at the beginning ?
>
> Thanks for your help,
> Best Regards,
> Matthieu Rigal
>
> RapidEye AG
> Molkenmarkt 30
> 14776 Brandenburg an der Havel
> Germany
>
> Follow us on Twitter! www.twitter.com/rapideye_ag
>
> Head Office/Sitz der Gesellschaft: Brandenburg an der Havel
> Management Board/Vorstand: Wolfgang G. Biedermann
> Chairman of Supervisory Board/Vorsitzender des Aufsichtsrates:
> Juergen Breitkopf
> Commercial Register/Handelsregister Potsdam HRB 17 796
> Tax Number/Steuernummer: 048/100/00053
> VAT-Ident-Number/Ust.-ID: DE 199331235
> DIN EN ISO 9001 certified
>
> *************************************************************************
> Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte
> Informationen. Wenn Sie nicht der richtige Adressat sind oder diese
> E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den
> Absender und vernichten Sie diese E-Mail. Das unerlaubte Kopieren sowie
> die unbefugte Weitergabe dieser E-Mail ist nicht gestattet.
>
> The information in this e-mail is intended for the named recipients
> only. It may contain privileged and confidential information. If you
> have received this communication in error, any use, copying or
> dissemination of its contents is strictly prohibited. Please erase all
> copies of the message along with any included attachments and notify
> RapidEye AG or the sender immediately by telephone at the number
> indicated on this page.
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>




More information about the gdal-dev mailing list