[gdal-dev] Python-OGR not looping over more than 500 PGSQL features

Even Rouault even.rouault at mines-paris.org
Fri Oct 9 14:46:55 EDT 2009


Selon Even Rouault <even.rouault at mines-paris.org>:

Hmm, of course, read lyr2 = ds.GetLayerByName('testhuge2') in the last snippet.

> Selon Matthieu Rigal <rigal at rapideye.de>:
>
> Matthieu,
>
> I've been able to reproduce the behaviour you've observed by running the
> following snippet :
>
> >>>
> import ogr
>
> ds = ogr.Open('PG:dbname=autotest')
> lyr = ds.GetLayerByName('testhuge')
> feat = lyr.GetNextFeature()
> count = 0
> while feat is not None:
>     count = count + 1
>     feat.DumpReadable()
>     ds.ExecuteSQL("UPDATE testhuge SET test = '%s' WHERE test = '%s'" % (
> feat.GetField(0),feat.GetField(0) ))
>     feat = lyr.GetNextFeature()
>
> print 'count = %d' % count
> <<<
>
> The counter effectively stops at 500. This is the case with GDAL trunk, but
> also
> with OSGEO4W's GDAL/OGR 1.5.4, so if it is a regression, it is against an
> older
> version.
>
> Let me try to explain that behaviour. To improve reading performance, we try
> to
> minimize the number of round-trips to the server, so we read features by
> chunks
> of 500. This is done by the following sequence:
>
> BEGIN;
> DECLARE OGRPGLayerReader CURSOR FOR SELECT * FROM testhuge;
> FETCH 500 IN OGRPGLayerReader;
>
> at that point we have the first 500 features in RAM, so we'll issue the next
> 'FETCH 500 IN...' only after 500 GetNextFeature() calls.
>
> Now, as you add a ExecuteSQL() in the loop, we execute the SQL in a new
> transaction. If one is already opened at that point, it is closed before. So
> you
> have now the following sequence
>
> COMMIT; # Close existing active transaction
> BEGIN;
> UPDATE testhuge SET test = '1' WHERE test = '1';
> COMMIT;
>
> So the OGRPGLayerReader cursor is now in a defunc state, and when issuing the
> next 'FETCH 500 IN OGRPGLayerReader;' the server returns an error and
> GetNextFeature() returns in trun NULL. End of loop, end of game.
>
> There's a workaround though. If I've well understood your test case (updating
> a
> feature that you've just read), you could probably replace the ExecuteSQL by
> a
> SetFeature() call after updating the fields needed. In trunk, SetFeature() is
> implemented by UPDATE and I could test that it does not stop the current
> transaction opened by GetNextFeature() (the effect of this is that the
> updates
> will only be commited to storage at the end of the read loop). In 1.6 branch
> and
> earlier, a DELETE and an INSERT are done, so I'm not sure how they deal with
> transactions. At first sight, this looks like it would work too, although a
> bit
> less efficient.
>
> I'm not sure how we can improve the situation more generally. Perhaps, we
> could
> try not to force closing existing transaction when executing a SQL UPDATE or
> DELETE (SELECT is another story as we open a new cursor) ? No idea what the
> other consequences / side-effects could be.
>
> I'd also note that there are many other things you cannot do. For that above
> reasons, you cannot interleaving reading 2 layers like this as the
> lyr2.GetNextFeature() will begin by COMMIT'ing the transaction opened by
> lyr1.GetNextFeature()
>
> ds = ogr.Open('PG:dbname=autotest')
> lyr1 = ds.GetLayerByName('testhuge')
> lyr2 = ds.GetLayerByName('testhuge')
> feat1 = lyr1.GetNextFeature()
> feat2 = lyr2.GetNextFeature()
> count = 0
> while feat1 is not None and feat2 is not None:
>     count = count + 1
>     feat1.DumpReadable()
>     feat2.DumpReadable()
>     feat1 = lyr1.GetNextFeature()
>     feat2 = lyr2.GetNextFeature()
>
> There's probably place for improvements in the management of transactions in
> the
> driver but this looks like pretty tricky to implement reliably.
>
> Best regards,
>
> Even
>
>
> > Hi people,
> >
> > I recently discovered a strange behaviour which was not happening before, I
> > think...
> > I'm using Python 2.5 with GDAL/OGR 1.6 and PGsql 8.3 on Linux.
> >
> > I am reading a table containing 858 features with several columns and
> > geometry.
> > I first do a SELECT on the table and the GetFeatureCount tells me 858 lines
> > are being readen.
> > Then I am looping on the OGR object Feature by feature, to make some
> > calculation and I send an UPDATE for this particular line through another
> OGR
> > object.
> > I have a counter after every UPDATE.
> >
> > Whereas it should go through all the 858 features, it stops after having
> gone
> > through 500 features, my counter at the end shows 500 and looking to the
> > database, it clearly stops after 500.
> > I've done a test on a second database and it also stopped at 500...
> >
> > Is there some recently introduced limitations on OGR side ? I have searched
> > all the website without finding something about it... May it come from some
> > Python memory limitation ( I doubt) ? Or from some PGSQL server limitation
> (I
> > doubt also because 858 features are returned at the beginning ?
> >
> > Thanks for your help,
> > Best Regards,
> > Matthieu Rigal
> >
> > RapidEye AG
> > Molkenmarkt 30
> > 14776 Brandenburg an der Havel
> > Germany
> >
> > Follow us on Twitter! www.twitter.com/rapideye_ag
> >
> > Head Office/Sitz der Gesellschaft: Brandenburg an der Havel
> > Management Board/Vorstand: Wolfgang G. Biedermann
> > Chairman of Supervisory Board/Vorsitzender des Aufsichtsrates:
> > Juergen Breitkopf
> > Commercial Register/Handelsregister Potsdam HRB 17 796
> > Tax Number/Steuernummer: 048/100/00053
> > VAT-Ident-Number/Ust.-ID: DE 199331235
> > DIN EN ISO 9001 certified
> >
> > *************************************************************************
> > Diese E-Mail enthaelt vertrauliche und/oder rechtlich geschuetzte
> > Informationen. Wenn Sie nicht der richtige Adressat sind oder diese
> > E-Mail irrtuemlich erhalten haben, informieren Sie bitte sofort den
> > Absender und vernichten Sie diese E-Mail. Das unerlaubte Kopieren sowie
> > die unbefugte Weitergabe dieser E-Mail ist nicht gestattet.
> >
> > The information in this e-mail is intended for the named recipients
> > only. It may contain privileged and confidential information. If you
> > have received this communication in error, any use, copying or
> > dissemination of its contents is strictly prohibited. Please erase all
> > copies of the message along with any included attachments and notify
> > RapidEye AG or the sender immediately by telephone at the number
> > indicated on this page.
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
> >
>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>




More information about the gdal-dev mailing list