[gdal-dev] Large GeoJSONs and aborting file opening
Mike Flannigan
mflan at mflan.com
Thu Jul 29 10:28:54 PDT 2021
Yeah, there must be a way of breaking this up in PostGIS or by some other
method, maybe by county or city region, if that fits your workflow.
I would break it up outside of QGIS, and then import the resulting
GeoJSON files into QGIS.
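
One way the "break it up by county" idea could look, as a rough sketch only: it assumes the newline-delimited GeoJSONL/Seq version of the file (one feature per line) and uses the CNTYNAME column from the ogrinfo dump quoted below. The function name `split_geojsonseq` and the `.geojsonl` output naming are made up for illustration, and it has not been run against the real 30GB file.

```python
import json
import os


def split_geojsonseq(src_path, out_dir, key="CNTYNAME"):
    """Split a newline-delimited GeoJSON file into one file per `key` value.

    Returns a dict mapping each property value to its feature count.
    """
    os.makedirs(out_dir, exist_ok=True)
    outputs = {}  # property value -> open output file
    counts = {}   # property value -> number of features written
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            for line in src:
                # RFC 8142 text sequences prefix records with 0x1E; drop it if present.
                record = line.lstrip("\x1e").strip()
                if not record:
                    continue
                feature = json.loads(record)
                value = str((feature.get("properties") or {}).get(key))
                if value not in outputs:
                    # Assumption: distinct values stay distinct after sanitising.
                    safe = "".join(c if c.isalnum() else "_" for c in value)
                    out_path = os.path.join(out_dir, safe + ".geojsonl")
                    outputs[value] = open(out_path, "w", encoding="utf-8")
                    counts[value] = 0
                outputs[value].write(json.dumps(feature) + "\n")
                counts[value] += 1
    finally:
        for handle in outputs.values():
            handle.close()
    return counts
```

This reads one line at a time, so memory use should stay small regardless of file size, and it keeps one output file open per county, which seems reasonable for a few dozen counties but not for a high-cardinality column.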
I'm thinking these methods don't work on a GeoJSON:
https://www.reddit.com/r/QGIS/comments/mbljrw/render_visible_layers_only/
https://gis.stackexchange.com/questions/111784/displaying-only-selected-features-on-map-in-qgis
Mike
On 7/29/21 11:30 AM, Simon Eves wrote:
> The problematic one in this case is about 30GB, with ~5.9M features,
> of property parcels in Florida, each with polygons with 5-10 vertices
> and 57 (!) other columns. Below is the first feature as printed by
> ogrinfo. It appears to have originated as a Shapefile, which we have
> also converted to regular GeoJSON (the 30GB one), and line-delimited
> GeoJSONL/Seq.
>
> The Shapefile version imports with no issues or obvious delays, with
> features flowing basically immediately, and the overall process taking
> about 7 minutes.
>
> The regular GeoJSON version spends over 20 minutes in the GDALOpenEx
> call, and another 10 minutes before features flow (I haven't yet looked
> into where), after which it takes about the same 7 minutes.
>
> The GeoJSONL/Seq version spends about 12 minutes in GDALOpenEx and
> then the same 10 minutes, and then the same 7 minutes.
>
> Note that ogrinfo has the same initial delays (20 minutes and 12
> minutes) before it prints anything.
>
> This is all with GDAL 3.2.2 on Ubuntu 20.04 on a quad-core i7 at 4.2 GHz
> with 32GB of RAM and an SSD.
>
> The 7 minutes of import is not the issue, and once features are
> flowing, our code is able to be aborted. The issue is the 20-30
> minutes where it can't because it's (seemingly) stuck in GDAL calls.
>
> Simon
> ______________________________
>
> OGRFeature(fl_parcels):0
> CNTYNAME (String) = ESCAMBIA
> LINK (String) = 27-083S321301060003
> PARCELID (String) = 083S321301060003
> NPARNO (String) = 12-033-083S321301060003
> DORUC (String) = 000
> PAUC (String) = 1
> PARUSEDESC (String) = VACANT RESIDENTIAL
> SPASS_CD (String) = (null)
> IMPROVVAL (Integer) = 0
> LNDVAL (Integer64) = 1069
> JV (Integer64) = 1069
> JV_CHNG (Integer) = (null)
> JV_HMSTD (Integer) = (null)
> AV_SD (Integer64) = 1069
> AV_NSD (Integer64) = 1069
> AV_HMSTD (Integer) = (null)
> JV_CLASS_U (Integer) = (null)
> ONAME (String) = GRAF MABIE PARTNERSHIP
> OADDR1 (String) = 5544 BAKER RD
> OADDR2 (String) = (null)
> OCITY (String) = MILTON
> OSTATE (String) = FL
> OZIPCD (String) = 32570
> PHYADDR1 (String) = (null)
> PHYCITY (String) = PERDIDO KEY
> PHYZIP (String) = 32507
> SLEGAL (String) = LT 6 BLK 3 PERDIDO BAY COUNTRY
> ALTKEY (String) = 103002391
> ACTYRBLT (Integer) = (null)
> EFFYRBLT (Integer) = (null)
> TOTLVGAREA (Integer) = (null)
> NOBULDNG (Integer) = (null)
> NORESUNTS (Integer) = (null)
> PARSPLT (String) = (null)
> LNDSQFOOT (Real) = 14610.000000000000000
> CONSTCLASS (String) = (null)
> SALEPRC1 (Integer64) = (null)
> SALEYR1 (Integer) = (null)
> SALEMO1 (Integer) = (null)
> ORBOOK1 (String) = (null)
> ORPAGE1 (String) = (null)
> SALEPRC2 (Integer) = (null)
> SALEYR2 (Integer) = (null)
> SALEMO2 (Integer) = (null)
> NBRHDCD (String) = (null)
> PUBLICLND (String) = (null)
> TAXAUTHCD (String) = MSTU
> SEC (String) = 8
> TWN (String) = 03S
> RNG (String) = (null)
> CENSUSBK (String) = 12033002604
> SOURCEAGE (String) = ESCAMBIA COUNTY PROPERTY APPRAISER
> SOURCEDATE (Integer64) = 1506643200
> LAT_DD (Real) = 30.334948776670700
> LONG_DD (Real) = -87.417015732515793
> MGRS (String) = 16RDU5991555975
> ACRES (Real) = 0.335475404173654
> EXMPT (String) = (null)
> LU_RES (String) = (null)
> LUCODE (String) = 000
> GCID (Integer) = 3070217
> DESCRIPT (String) = VACANT RESIDENTIAL
> FLAG (String) = (null)
> FGDLAQDATE (Integer64) = 1509494400
> AUTOID (Integer) = 3070217
> Shape_Leng (Real) = 161.443473611912992
> Shape_Area (Real) = 1357.620793937390090
> POLYGON ((
> -87.4169063602653 30.3347107536787,-87.4169237108049 30.3346997733855,
> -87.4172722303389 30.3349846323649,-87.4172442347823 30.3350359296124,
> -87.4172384512691 30.3351985804435,-87.4171183385966 30.335195856325,
> -87.4169790313658 30.3350555432658,-87.4167283286418 30.3348031641612,
> -87.4167416558679 30.3347974225575,-87.4167635326352 30.3347875319118,
> -87.4167852417644 30.3347773059899,-87.4168030952182 30.334768463082,
> -87.4168206972148 30.3347594525361,-87.4168382153925 30.3347501905331,
> -87.416855482113 30.334740677073,-87.4168725811955 30.3347309121558,
> -87.4168895964589 30.334720937691,-87.4169063602653 30.3347107536787))
>
>
> On Thu, Jul 29, 2021 at 6:49 AM Mike Flannigan <mflan at mflan.com> wrote:
>
>
> I would like to hear more about large GeoJSON files.
> How large are they?
>
> My GeoJSON files contain linear features only. My
> largest one is 50.2 MB with 1,230,000 newlines in it.
> Next biggest one is 12 MB with 280,000 newlines. These
> and about 140 other geojsons are open in the same project
> and I have no problems. In fact I converted from
> SHP to geojson 2 years ago because I used to have problems
> with SHP linear files.
>
> I use QGIS 3.16.8 on Linux Mint.
>
>
> Mike
>
>
> On 7/28/21 2:36 PM, gdal-dev-request at lists.osgeo.org wrote:
> > Date: Wed, 28 Jul 2021 12:22:12 -0700
> > From: Simon Eves <simon.eves at omnisci.com>
> > To: gdal-dev at lists.osgeo.org
> > Subject: [gdal-dev] Large GeoJSONs and aborting file opening
> > Message-ID:
> > <CAJf0KTRsaskSOsPv8tbA+iTB+TqL_ui5y4n05WGLDw_3guRs4w at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Dear All,
> >
> > I am aware that some improvements were made in the 2.3 timeframe with
> > regards to dealing with large GeoJSON files, although even in 3.2, it's
> > still very slow and memory hungry.
> >
> > Our system allows for aborting imports, but this only works reliably once
> > it has actually got to the stage of reading features from the file. With
> > the GeoJSON, it just sits in the GDALOpenEx call for ages.
> >
> > My question, therefore, is whether it might be practical to run the
> > GDALOpenEx in a separate thread with a future to return the resulting
> > handle, such that it could be monitored and killed if necessary?
> >
> > Mainly I would be concerned that killing the thread might trash some global
> > GDAL state that might then not be recoverable, or that the open relies on
> > some TLS for the process thread and therefore might not work properly.
> >
> > We're going to try it anyway, but opinions welcomed, thanks!
> >
> > Simon
>
>
>
>
> --
> <http://www.omnisci.com/>
>
> Simon Eves
> Senior Graphics Engineer, Rendering Group
> 100 Montgomery St (5th Floor), San Francisco, CA 94104, USA
>
>
>
> Email: simon.eves at omnisci.com | Cell: +1 (415) 902-1996
>
>
>
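
On the original question in this thread (running the open in a separate thread with a future so it can be monitored), here is a minimal sketch of the pattern, with loud caveats. It does not kill the thread, since Python cannot do that safely; on abort it simply stops waiting and abandons the worker, which keeps running until the open returns. `open_with_abort` and `opener` are hypothetical names. With the GDAL Python bindings the opener might be something like `lambda p: gdal.OpenEx(p, gdal.OF_VECTOR)`, but whether GDAL's global or thread-local state is happy with this is exactly the open question above, and nothing here answers it.

```python
import threading
import time
from concurrent.futures import Future, TimeoutError as FutureTimeout


def open_with_abort(opener, path, abort_event, poll_seconds=0.1):
    """Run opener(path) in a worker thread and wait for it, polling for abort.

    Returns the opener's result, or None if abort_event was set first.
    Exceptions raised by the opener are re-raised in the caller.
    """
    future = Future()

    def worker():
        try:
            future.set_result(opener(path))
        except BaseException as exc:  # hand any failure to the waiting caller
            future.set_exception(exc)

    # Daemon thread: an abandoned open will not block interpreter exit.
    threading.Thread(target=worker, daemon=True).start()

    while True:
        try:
            return future.result(timeout=poll_seconds)
        except FutureTimeout:
            if abort_event.is_set():
                # The worker is NOT killed; its eventual result is just dropped.
                return None


if __name__ == "__main__":
    def fake_slow_open(p):
        # Stand-in for a long blocking open call.
        time.sleep(2.0)
        return "handle for " + p

    abort = threading.Event()
    threading.Timer(0.5, abort.set).start()  # simulate the user aborting
    print(open_with_abort(fake_slow_open, "fl_parcels.geojson", abort))
```

The caller gets control back promptly, but the abandoned open still consumes CPU and memory until it finishes. If the work really has to be stopped, running the whole import in a child process that can be terminated is probably the safer variation, at the cost of not being able to hand the dataset handle back to the parent.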