[gdal-dev] Large GeoJSONs and aborting file opening

Mike Flannigan mflan at mflan.com
Thu Jul 29 10:28:54 PDT 2021


Yeah, there must be a way of breaking this up, in PostGIS or by some
other method.  Maybe by county or city region, if that meets your
workflow needs.  I would break it up outside of QGIS, and then import
the various GeoJSONs into QGIS.
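
As a rough sketch of the split-by-county idea, assuming the
newline-delimited (GeoJSONL/Seq) version of the file with one feature
per line, something like the Python below streams the file and writes
one smaller file per CNTYNAME value.  It uses only the standard library
rather than GDAL.  The function name and the output file naming are made
up for illustration, and it is untested against the actual 30GB file.

```python
import json
from pathlib import Path


def split_by_property(src_path, out_dir, key="CNTYNAME"):
    """Split a line-delimited GeoJSON file into one file per value of `key`.

    Reads one feature per line so the whole file never sits in memory.
    Returns a dict mapping each output file stem to its feature count.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}  # file stem -> open file handle
    counts = {}   # file stem -> number of features written
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                # Drop whitespace and the optional RS (0x1E) record separator.
                line = line.strip("\x1e \t\r\n")
                if not line:
                    continue
                feature = json.loads(line)
                value = (feature.get("properties") or {}).get(key)
                text = str(value) if value not in (None, "") else "unknown"
                # Make the value safe to use as a file name.
                stem = "".join(
                    c if c.isalnum() or c in "-_" else "_" for c in text
                )
                if stem not in outputs:
                    outputs[stem] = open(
                        out_dir / f"{stem}.geojsonl", "w", encoding="utf-8"
                    )
                    counts[stem] = 0
                # Write the original line back out unchanged.
                outputs[stem].write(line + "\n")
                counts[stem] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

One output file stays open per distinct value, which is fine for a
county-sized key but would need rethinking for a key with many
thousands of values.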

I'm thinking these methods don't work on a GeoJSON:
https://www.reddit.com/r/QGIS/comments/mbljrw/render_visible_layers_only/
https://gis.stackexchange.com/questions/111784/displaying-only-selected-features-on-map-in-qgis


Mike


On 7/29/21 11:30 AM, Simon Eves wrote:
> The problematic one in this case is about 30GB, with ~5.9M features, 
> of property parcels in Florida, each with polygons with 5-10 vertices 
> and 57 (!) other columns. Below is the first feature as printed by 
> ogrinfo. It appears to have originated as a Shapefile, which we have 
> also converted to regular GeoJSON (the 30GB one), and linear 
> GeoJSONL/Seq.
>
> The Shapefile version imports with no issues or obvious delays, with 
> features flowing basically immediately, and the overall process taking 
> about 7 minutes.
>
> The regular GeoJSON version spends over 20 minutes in the GDALOpenEx 
> call, and another 10 minutes before features flow (I haven't yet 
> looked into where that time goes), after which it takes about the same 
> 7 minutes.
>
> The GeoJSONL/Seq version spends about 12 minutes in GDALOpenEx and 
> then the same 10 minutes, and then the same 7 minutes.
>
> Note that ogrinfo has the same initial delays (20 minutes and 12 
> minutes) before it prints anything.
>
> This is all with GDAL 3.2.2 on Ubuntu 20.04, on a quad-core i7 at 
> 4.2GHz with 32GB of RAM and an SSD.
>
> The 7 minutes of import is not the issue, and once features are 
> flowing, our code is able to be aborted. The issue is the 20-30 
> minutes where it can't because it's (seemingly) stuck in GDAL calls.
>
> Simon
> ______________________________
>
> OGRFeature(fl_parcels):0
>   CNTYNAME (String) = ESCAMBIA
>   LINK (String) = 27-083S321301060003
>   PARCELID (String) = 083S321301060003
>   NPARNO (String) = 12-033-083S321301060003
>   DORUC (String) = 000
>   PAUC (String) = 1
>   PARUSEDESC (String) = VACANT RESIDENTIAL
>   SPASS_CD (String) = (null)
>   IMPROVVAL (Integer) = 0
>   LNDVAL (Integer64) = 1069
>   JV (Integer64) = 1069
>   JV_CHNG (Integer) = (null)
>   JV_HMSTD (Integer) = (null)
>   AV_SD (Integer64) = 1069
>   AV_NSD (Integer64) = 1069
>   AV_HMSTD (Integer) = (null)
>   JV_CLASS_U (Integer) = (null)
>   ONAME (String) = GRAF MABIE PARTNERSHIP
>   OADDR1 (String) = 5544 BAKER RD
>   OADDR2 (String) = (null)
>   OCITY (String) = MILTON
>   OSTATE (String) = FL
>   OZIPCD (String) = 32570
>   PHYADDR1 (String) = (null)
>   PHYCITY (String) = PERDIDO KEY
>   PHYZIP (String) = 32507
>   SLEGAL (String) = LT 6 BLK 3 PERDIDO BAY COUNTRY
>   ALTKEY (String) = 103002391
>   ACTYRBLT (Integer) = (null)
>   EFFYRBLT (Integer) = (null)
>   TOTLVGAREA (Integer) = (null)
>   NOBULDNG (Integer) = (null)
>   NORESUNTS (Integer) = (null)
>   PARSPLT (String) = (null)
>   LNDSQFOOT (Real) = 14610.000000000000000
>   CONSTCLASS (String) = (null)
>   SALEPRC1 (Integer64) = (null)
>   SALEYR1 (Integer) = (null)
>   SALEMO1 (Integer) = (null)
>   ORBOOK1 (String) = (null)
>   ORPAGE1 (String) = (null)
>   SALEPRC2 (Integer) = (null)
>   SALEYR2 (Integer) = (null)
>   SALEMO2 (Integer) = (null)
>   NBRHDCD (String) = (null)
>   PUBLICLND (String) = (null)
>   TAXAUTHCD (String) = MSTU
>   SEC (String) = 8
>   TWN (String) = 03S
>   RNG (String) = (null)
>   CENSUSBK (String) = 12033002604
>   SOURCEAGE (String) = ESCAMBIA COUNTY PROPERTY APPRAISER
>   SOURCEDATE (Integer64) = 1506643200
>   LAT_DD (Real) = 30.334948776670700
>   LONG_DD (Real) = -87.417015732515793
>   MGRS (String) = 16RDU5991555975
>   ACRES (Real) = 0.335475404173654
>   EXMPT (String) = (null)
>   LU_RES (String) = (null)
>   LUCODE (String) = 000
>   GCID (Integer) = 3070217
>   DESCRIPT (String) = VACANT RESIDENTIAL
>   FLAG (String) = (null)
>   FGDLAQDATE (Integer64) = 1509494400
>   AUTOID (Integer) = 3070217
>   Shape_Leng (Real) = 161.443473611912992
>   Shape_Area (Real) = 1357.620793937390090
>   POLYGON ((-87.4169063602653 30.3347107536787,
>     -87.4169237108049 30.3346997733855,
>     -87.4172722303389 30.3349846323649,
>     -87.4172442347823 30.3350359296124,
>     -87.4172384512691 30.3351985804435,
>     -87.4171183385966 30.335195856325,
>     -87.4169790313658 30.3350555432658,
>     -87.4167283286418 30.3348031641612,
>     -87.4167416558679 30.3347974225575,
>     -87.4167635326352 30.3347875319118,
>     -87.4167852417644 30.3347773059899,
>     -87.4168030952182 30.334768463082,
>     -87.4168206972148 30.3347594525361,
>     -87.4168382153925 30.3347501905331,
>     -87.416855482113 30.334740677073,
>     -87.4168725811955 30.3347309121558,
>     -87.4168895964589 30.334720937691,
>     -87.4169063602653 30.3347107536787))
>
>
> On Thu, Jul 29, 2021 at 6:49 AM Mike Flannigan <mflan at mflan.com> wrote:
>
>
>     I would like to hear more about large GeoJSON files.
>     How large are they?
>
>     My GeoJSON files contain linear features only.  My
>     largest one is 50.2 MB with 1,230,000 newlines in it.
>     Next biggest one is 12 MB with 280,000 newlines.  These
>     and about 140 other geojsons are open in the same project
>     and I have no problems.  In fact I converted from
>     SHP to geojson 2 years ago because I used to have problems
>     with SHP linear files.
>
>     I use QGIS 3.16.8 on Linux Mint.
>
>
>     Mike
>
>
>     On 7/28/21 2:36 PM, gdal-dev-request at lists.osgeo.org wrote:
>     > Date: Wed, 28 Jul 2021 12:22:12 -0700
>     > From: Simon Eves <simon.eves at omnisci.com>
>     > To: gdal-dev at lists.osgeo.org
>     > Subject: [gdal-dev] Large GeoJSONs and aborting file opening
>     > Message-ID:
>     >     <CAJf0KTRsaskSOsPv8tbA+iTB+TqL_ui5y4n05WGLDw_3guRs4w at mail.gmail.com>
>     > Content-Type: text/plain; charset="utf-8"
>     >
>     > Dear All,
>     >
>     > I am aware that some improvements were made in the 2.3 timeframe
>     > with regards to dealing with large GeoJSON files, although even in
>     > 3.2, it's still very slow and memory hungry.
>     >
>     > Our system allows for aborting imports, but this only works
>     > reliably once it has actually got to the stage of reading features
>     > from the file. With the GeoJSON, it just sits in the GDALOpenEx
>     > call for ages.
>     >
>     > My question, therefore, is whether it might be practical to run the
>     > GDALOpenEx in a separate thread with a future to return the
>     > resulting handle, such that it could be monitored and killed if
>     > necessary?
>     >
>     > Mainly I would be concerned that killing the thread might trash
>     > some global GDAL state that might then not be recoverable, or that
>     > the open relies on some TLS for the process thread and therefore
>     > might not work properly.
>     >
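
On the thread question: as far as I know there is no safe way to
forcibly kill a thread that is stuck inside a library call, but a child
process can be killed without touching the parent's state.  Below is a
minimal sketch of that alternative (not something proposed above), in
Python with only the standard library.  The slow step runs as a separate
command and the parent polls an abort check.  The names run_abortable
and should_abort are hypothetical.  Note that a dataset handle cannot be
passed back from another process, so the child would have to do the
whole slow step itself.

```python
import subprocess


def run_abortable(argv, should_abort, poll_s=0.5):
    """Run `argv` as a child process and kill it if `should_abort()` is true.

    Returns the child's exit code, or None if it was aborted.
    """
    proc = subprocess.Popen(argv)
    try:
        while True:
            try:
                # Wait a short while for the child to finish on its own.
                return proc.wait(timeout=poll_s)
            except subprocess.TimeoutExpired:
                # Still running: check whether the caller wants to give up.
                if should_abort():
                    return None
    finally:
        # Covers both the abort path and unexpected errors in the parent.
        if proc.poll() is None:
            proc.kill()
            proc.wait()
```

For instance, argv could be an ogr2ogr command that converts the GeoJSON
to a format that opens quickly, which the importer would then read
instead.  Whether that fits your pipeline is an assumption on my part.
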
>     > We're going to try it anyway, but opinions welcomed, thanks!
>     >
>     > Simon
>
