[Qgis-user] Migrating legacy QGIS instance

Walt Ludwick walt at valedalama.net
Wed Aug 12 08:59:48 PDT 2020


Oh, i get it (duh!):  overhead = minimum file size.  Makes sense, since
every .gpkg is its own SQLite instance -and 5 MB is a small price to pay
for what is essentially an RDBMS in a single file.  Still, something to
bear in mind while designing one's information architecture for the GIS.
Thanks, Charles!

On Tue, Aug 11, 2020 at 9:02 PM Charles Dixon-Paver <charles at kartoza.com>
wrote:

> Sorry for the confusion Walt, but the "overhead" I was referring to here
> is actually the fact that gpkg is implemented as a SQLite container with a
> *minimum* file size, which adds a couple of MB. I think the "overhead" will
> vary depending on the type of data stored. Basically, if you make one for
> every shapefile you could probably expect to end up with an additional ~5MB
> of bloat in your existing data store for each shapefile converted...
>
> Upper limits as you stated should be (in theory) ~140TB, or at least
> somewhere upwards from whatever I would usually consider practical to store
> in a database that's stored as a single flat file...
>
> Regarding GeoMoose on Mac, you could try using Docker to test it out:
> https://github.com/geomoose/docker-geomoose
>
> In terms of the specifics on how to restructure your data infrastructure,
> it seems like it's going to depend a lot on the specifics of your use case
> and is probably outside the scope of this mailing list, or at least this
> thread... Migrating projects is another beast altogether, so maybe someone
> else can offer advice on that.
>
> Regards
>
> On Tue, 11 Aug 2020 at 20:20, Walt Ludwick <walt at valedalama.net> wrote:
>
>> This makes good sense to me, Charles.  I've got enough experience with
>> databases (tho not so much with geographic ones) that i'm comfortable w/
>> SQL query tools. Unless a list or directory is small enough to eyeball with
>> ease (certainly the case with this legacy QGIS instance i've inherited),
>> i'd much rather search than dig for the data, so... In this sense at least,
>> less fragmentation is more.
>>
>> That being said: i don't know if i can bundle all into a single .gpkg; if
>> there is a size limit as low as 5MB on each one, then certainly not.
>> Google search on string "Geopackage size limit" returns multiple
>> credible-looking pages that cite a limit (subject to filesystem
>> constraints) of 140TB.  Can you clarify about the "~5MB of storage overhead
>> for each unique .gpkg" comment?
>>
>> In any case: if i go for selective consolidation -selection scheme still
>> TBD[1]- then i must certainly bear in mind your caution about the data-loss
>> risk associated with careless use of certain processing
>> tools/configurations.  If there be tools & configs oriented to one & only
>> one .gpkg file, i don't yet know about them... But i'll certainly watch
>> out for that and keep a good backup!
>>
>> [1] As to selection (or classification, i should say) and naming of .gpkg
>> files that will consolidate any number of .shp files: i am thinking along
>> lines of either data type (raster and vector being two high-level
>> groupings, with subtypes that might have more to do with the schema of
>> tabular data), or else data source (which often has much to do with data
>> reliability, maintainability -and value, ultimately).  Need to think a bit
>> more deeply on this, and would be happy for any guidance from more
>> experienced GIS admins.
>>
>>
>> On Tue, Aug 11, 2020 at 2:31 PM Charles Dixon-Paver <charles at kartoza.com>
>> wrote:
>>
>>> Regarding the one-vs-many approach to gpkgs, I recommend consolidation
>>> (within reason). I feel that the temptation to use gpkg as a drop-in
>>> replacement for shp stems from familiarity with processes I personally
>>> consider largely outmoded. I think it's worth getting over the initial
>>> (relatively shallow) learning curve so that when you start working with
>>> db-oriented systems like PostGIS, everything makes sense right out of the gate.
>>>
>>> Basically it boils down to how you want to manage or distribute them as
>>> you don't have traditional db roles. Personally, I try to package things
>>> into "data.gpkg/something" and "data.gpkg/somethingelse" wherever possible,
>>> rather than "a.gpkg/a" and "b.gpkg/b". It usually makes moving data around
>>> easier for me. If you have a lot of inputs, maybe split it into unique
>>> gpkgs based on some categorising criteria (like you might do with a schema)
>>> rather than one monolithic gpkg. Performing maintenance (vacuum) on a large
>>> number of unique gpkgs seems like an unnecessary chore.
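>>>
>>> Since a gpkg is just a SQLite file, that maintenance can be scripted with
>>> any SQLite client. A minimal sketch in Python (the standard sqlite3
>>> module is enough; the file path below is a placeholder):

```python
import sqlite3

def vacuum_gpkg(path):
    """Rebuild the database file to reclaim space left by deleted rows.
    A GeoPackage is a plain SQLite database, so VACUUM applies directly."""
    conn = sqlite3.connect(path)
    try:
        # VACUUM must run outside a transaction; a fresh connection has none open
        conn.execute("VACUUM")
    finally:
        conn.close()

# e.g. vacuum_gpkg("data.gpkg")
```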
>>>
>>> One limitation of gpkg is that certain processing tools/configurations
>>> will only support writing to an entire gpkg, so if you lack experience
>>> you'll need to be careful not to overwrite all of your data, and you
>>> should also have a decent backup plan in place. Usually you can get away
>>> with utilising a scratch.gpkg for that purpose with no risk to your
>>> primary datastore.
>>>
>>> Using the one-gpkg-per-feature approach offers little data management
>>> benefit over shapefiles, aside from removing the auxiliary files and
>>> being able to store styles. There is little performance benefit over shp
>>> directly from what I understand (both use WKB), and there is ~5MB of
>>> storage overhead for each unique gpkg (if I remember correctly), though
>>> this will depend on your use case.
>>>
>>> Hope that helps.
>>>
>>> On Tue, 11 Aug 2020 at 15:13, Basques, Bob (CI-StPaul) <
>>> bob.basques at ci.stpaul.mn.us> wrote:
>>>
>>>> *Depending on your end goal, you might be more suited to leaving things
>>>> as they are and using  some sort of content explorer to organize the
>>>> existing data.  Then worry about migrating to different formats as needed.*
>>>>
>>>>
>>>>
>>>> *We’ve been using GeoMoose for this purpose.  It can connect to just
>>>> about any data source on the back end, such as SHP, Postgres, and
>>>> GeoPackage to name a few, and it can also connect to proprietary
>>>> services.  Because it can use Mapserver as a display engine and data
>>>> query tool, it lends itself to online exploration of the data without
>>>> the need for a full-blown GIS tool.  This allows for widespread use by
>>>> non-GIS pros.  The datasets can still be managed by you with QGIS and/or
>>>> in Postgres/PostGIS, or whatever you prefer for that purpose.  The
>>>> Mapserver setup allows for connecting to just about any type of service
>>>> behind the scenes, and with the right configuration, you can also enable
>>>> each dataset in the GeoMoose catalog as a WMS/WFS data source, the
>>>> standard for open data format access and publishing.*
>>>>
>>>>
>>>>
>>>> *Bobb*
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Qgis-user <qgis-user-bounces at lists.osgeo.org> * On Behalf Of *Walt
>>>> Ludwick
>>>> *Sent:* Tuesday, August 11, 2020 7:45 AM
>>>> *To:* qgis-user at lists.osgeo.org
>>>> *Subject:* Re: [Qgis-user] Migrating legacy QGIS instance
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> I'm on MacOS -and not so very comfortable with command line scripting-
>>>> so it looks like i might have to go the drag&drop way to import these .shp
>>>> files. Will take some time, but at least that way i can be sure about what
>>>> i've put where, and in what form.
>>>>
>>>>
>>>>
>>>> But i do wonder about the (a) "stick multiple shps into a single gpkg"
>>>> OR (b) "create one per feature" decision, since i'm not experienced enough
>>>> to have a clear preference about this.  Can you say anything about pros &
>>>> cons of going one way vs the other?
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Aug 11, 2020 at 11:45 AM Charles Dixon-Paver <
>>>> charles at kartoza.com> wrote:
>>>>
>>>> Easiest way for me is to use the GDAL ogr2ogr
>>>> <https://gdal.org/programs/ogr2ogr.html> command using a bash script
>>>> or cmd batch to traverse your directories (depending on how you installed
>>>> QGIS this should be on your path). I don't know what environment you're
>>>> running though.
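>>>>
>>>> To make that concrete, here is one possible sketch of the traversal in
>>>> Python instead of bash (it assumes ogr2ogr is on your PATH, as it
>>>> usually is with a QGIS install; the paths are placeholders):

```python
import subprocess
from pathlib import Path

def shapefiles_to_gpkg(src_dir, gpkg_path, dry_run=False):
    """Append every .shp under src_dir into one GeoPackage,
    one layer per shapefile, each layer named after the file's stem."""
    cmds = []
    for shp in sorted(Path(src_dir).rglob("*.shp")):
        cmd = ["ogr2ogr", "-f", "GPKG", "-update", "-append",
               str(gpkg_path), str(shp), "-nln", shp.stem]
        cmds.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # invoke GDAL's converter
    return cmds

# e.g. shapefiles_to_gpkg("/path/to/shapefiles", "all_data.gpkg")
```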
>>>>
>>>>
>>>>
>>>> You can either stick multiple shps into a single gpkg or create one per
>>>> feature as you prefer. ogr2ogr can also push shp files directly into
>>>> PostGIS. When you want to consolidate or migrate data (between gpkgs or
>>>> from gpkg to PostGIS) you can simply select the feature layers you want and
>>>> use drag and drop from the QGIS 3 Browser panel to copy multiple features
>>>> to a target location.
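>>>>
>>>> For the PostGIS route, the same idea works; a sketch (the connection
>>>> string and table name here are placeholders, and it requires a GDAL
>>>> build with the PostgreSQL driver):

```python
import subprocess

def shp_to_postgis(shp_path, table,
                   dsn="host=localhost dbname=gis user=me",
                   dry_run=False):
    """Build (and optionally run) an ogr2ogr command that loads one
    shapefile into a PostGIS table."""
    cmd = ["ogr2ogr", "-f", "PostgreSQL", f"PG:{dsn}",
           str(shp_path), "-nln", table,
           "-lco", "GEOMETRY_NAME=geom"]  # name the geometry column
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# e.g. shp_to_postgis("parcels.shp", "public.parcels")
```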
>>>>
>>>>
>>>>
>>>> Others might have different approaches though.
>>>>
>>>>
>>>>
>>>> Regards
>>>>
>>>>
>>>>
>>>> On Tue, 11 Aug 2020 at 12:24, Walt Ludwick <walt at valedalama.net> wrote:
>>>>
>>>> I've inherited a legacy GIS, built up over some years in versions 2.x,
>>>> that i'm now responsible to maintain.  Being an almost complete n00b (did
>>>> take a short course in QGIS a good few years ago, but still..), i could
>>>> really use some advice about migration.
>>>>
>>>> i've created a new QGIS instance in version 3.14, into which i am
>>>> trying to bring all useful content from our old system: essentially
>>>> oodles of shapefiles, plus all their companion files (each .shp file
>>>> appears to bring with it a set of .shx, .dbf, .prj, and .qpj files, plus
>>>> a .cpg file for each layer, it seems).  This is a significant dataset
>>>> -14 GB, >1000 files- and that is just base data, not counting Projects
>>>> built on this data or Layouts used for presenting these projects in
>>>> various ways. Some of this is cruft that i can happily do without, but
>>>> still: i've got a lot of porting-over to do, without a clear idea of how
>>>> best to do it.
>>>>
>>>> The one thing i'm clear about is: i want it all in a non-proprietary
>>>> database (i.e. no more mess of .shp and related files) that is above all
>>>> quick & easy to navigate & manage. It is a single-user system at this
>>>> point, but i do aim to open it up to colleagues (off-LAN, i.e. via
>>>> Internet) as soon as i've developed simple apps for them to use.  No idea
>>>> how long it'll take me to get there, so...
>>>>
>>>> Big question at this point is: What should be the new storage format
>>>> for all this data?  Having read a few related opinions on StackOverflow, i
>>>> get the sense that GeoPackage will probably make for easiest migration (per this
>>>> encouraging article
>>>> <https://medium.com/@GispoFinland/learn-spatial-sql-and-master-geopackage-with-qgis-3-16b1e17f0291>,
>>>> it's a simple matter of drag&drop -simple if you have just a few, i guess!
>>>> [1]), and can easily support my needs in the short term, but then i wonder:
>>>> How will i manage migration to PostGIS when i eventually put  this system
>>>> online with different users/ roles enabled?
>>>>
>>>>
>>>>
>>>> [1] Given that i need to pull in some hundreds of .shp files that are
>>>> stored in a tree of many folders & subfolders, i also wonder: is there a
>>>> simple way that i can ask QGIS to traverse a certain directory, pull in all
>>>> the .shp files -each as its own .gpkg layer, i suppose?
>>>>
>>>>
>>>>
>>>> Any advice about managing this migration would be much appreciated!
>>>>
>>>> _______________________________________________
>>>> Qgis-user mailing list
>>>> Qgis-user at lists.osgeo.org
>>>> List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>>
>>>
>
>

