[Qgis-user] Migrating legacy QGIS instance

Charles Dixon-Paver charles at kartoza.com
Tue Aug 11 13:02:26 PDT 2020


Sorry for the confusion, Walt. The "overhead" I was referring to here is the
fact that a gpkg is implemented as a SQLite container with a *minimum* file
size, which adds a couple of MB per file. The exact overhead will vary
depending on the type of data stored, but basically, if you create one gpkg
for every shapefile, you could probably expect to add roughly ~5MB of bloat
to your existing data store for each shapefile converted...
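
If you want to check this empirically on your own data (a minimal sketch,
assuming GDAL's ogr2ogr is on your PATH; "sample.shp" is a placeholder for
one of your shapefiles):

  ogr2ogr -f GPKG single.gpkg sample.shp   # convert one shapefile to its own gpkg
  du -h sample.* single.gpkg               # compare the on-disk sizes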

The upper limit, as you stated, should be (in theory) ~140TB, or at any rate
somewhere well beyond what I would usually consider practical to store in a
database that lives in a single flat file...

Regarding GeoMoose on Mac, you could try using Docker to test it out:
https://github.com/geomoose/docker-geomoose

As for how to restructure your data infrastructure, that is going to depend
a lot on the specifics of your use case and is probably outside the scope of
this mailing list, or at least of this thread... Migrating projects is
another beast altogether, so maybe someone else can offer advice on that.

Regards

On Tue, 11 Aug 2020 at 20:20, Walt Ludwick <walt at valedalama.net> wrote:

> This makes good sense to me, Charles.  I've got enough experience with
> databases (tho not so much with geographic ones) that i'm comfortable w/
> SQL query tools. Unless a list or directory is small enough to eyeball with
> ease (certainly the case with this legacy QGIS instance i've inherited),
> i'd much rather search than dig for the data, so... In this sense at least,
> less fragmentation is more.
>
> That being said: i don't know if i can bundle it all into a single .gpkg;
> if there is a size limit as low as 5MB on each one, then certainly not.  A
> Google search on the string "Geopackage size limit" returns multiple
> credible-looking pages that cite a limit (subject to filesystem
> constraints) of 140TB.  Can you clarify the "~5MB of storage overhead
> for each unique .gpkg" comment?
>
> In any case: if i go for selective consolidation -selection scheme still
> TBD[1]- then i must certainly bear in mind your caution about the risk of
> data loss associated with careless use of certain processing tools/
> configurations.  If there are tools & configs oriented to one & only one
> .gpkg file, i don't yet know about them... But i'll certainly watch out for
> that and keep a good backup!
>
> [1] As to selection (or classification, i should say) and naming of .gpkg
> files that will consolidate any number of .shp files: i am thinking along
> the lines of either data type (raster and vector being two high-level
> groupings, with subtypes that might have more to do with the schema of
> tabular data), or else data source (which often has much to do with data
> reliability, maintainability -and value, ultimately).  Need to think a bit
> more deeply on this, and would be happy for any guidance from more
> experienced GIS admins.
>
>
> On Tue, Aug 11, 2020 at 2:31 PM Charles Dixon-Paver <charles at kartoza.com>
> wrote:
>
>> Regarding the one-vs-many approach to gpkgs, I recommend consolidation
>> (within reason). I suspect the temptation to use gpkg as a drop-in
>> replacement for shp comes from familiarity with processes I personally
>> consider largely outmoded. I think it's worth getting over the initial
>> (relatively shallow) learning curve, so that when you start working with
>> db-oriented systems like PostGIS, everything makes sense right out of the gate.
>>
>> Basically it boils down to how you want to manage and distribute your
>> data, since you don't have traditional db roles. Personally, I try to
>> package things into "data.gpkg/something" and "data.gpkg/somethingelse"
>> wherever possible, rather than "a.gpkg/a" and "b.gpkg/b". That usually
>> makes moving data around easier for me. If you have a lot of inputs, you
>> might split them into a few gpkgs based on some categorising criterion
>> (much as you might do with schemas), rather than keeping one monolithic
>> gpkg. Performing maintenance (vacuum) on a large number of separate gpkgs
>> seems like an unnecessary chore; see the sketch below.
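>>
>> For what it's worth, since a gpkg is just a SQLite database, that
>> maintenance is plain SQL (a minimal sketch, assuming the sqlite3 CLI is
>> installed; file names are placeholders):
>>
>>   sqlite3 data.gpkg "VACUUM;"                        # one consolidated file: one command
>>   for f in *.gpkg; do sqlite3 "$f" "VACUUM;"; done   # many per-layer files: a loop to remember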
>>
>> One limitation of gpkg is that certain processing tools/configurations
>> will only support writing to an entire gpkg, so if you lack experience
>> you'll need to be careful not to overwrite all of your data, and you
>> should have a decent backup plan in place. Usually you can get away with
>> using a scratch.gpkg for that purpose, with no risk to your primary datastore.
>>
>> Using one gpkg per item offers little data-management benefit over
>> shapefiles, aside from removing the auxiliary files and being able to
>> store styles (as well as lae). From what I understand there is little
>> performance benefit over shp directly (both use WKB), but there is ~5MB of
>> storage overhead for each unique gpkg (if I remember correctly), though
>> this will depend on your use case.
>>
>> Hope that helps.
>>
>> On Tue, 11 Aug 2020 at 15:13, Basques, Bob (CI-StPaul) <
>> bob.basques at ci.stpaul.mn.us> wrote:
>>
>>> Depending on your end goal, you might be better off leaving things as
>>> they are and using some sort of content explorer to organize the existing
>>> data, then worrying about migrating to different formats as needed.
>>>
>>> We've been using GeoMoose for this purpose. It can connect to just about
>>> any data source on the back end (SHP, Postgres, and GeoPackage, to name a
>>> few), and to proprietary services as well. Because it can use MapServer
>>> as a display engine and data query tool, it lends itself to online
>>> exploration of the data without the need for a full-blown GIS tool, which
>>> allows for widespread use by non-GIS pros. The datasets can still be
>>> managed by you with QGIS and/or in Postgres/PostGIS, or whatever you
>>> prefer for that purpose. The MapServer setup allows for connecting to
>>> just about any type of service behind the scenes, and with the right
>>> configuration you can also expose each dataset in the GeoMoose catalog as
>>> a WMS/WFS data source, the standard for open data access and publishing.
>>>
>>>
>>>
>>> Bobb
>>>
>>> From: Qgis-user <qgis-user-bounces at lists.osgeo.org> On Behalf Of Walt
>>> Ludwick
>>> Sent: Tuesday, August 11, 2020 7:45 AM
>>> To: qgis-user at lists.osgeo.org
>>> Subject: Re: [Qgis-user] Migrating legacy QGIS instance
>>>
>>> I'm on macOS -and not so very comfortable with command-line scripting-
>>> so it looks like i might have to go the drag & drop way to import these
>>> .shp files. It will take some time, but at least that way i can be sure
>>> about what i've put where, and in what form.
>>>
>>>
>>>
>>> But i do wonder about the (a) "stick multiple shps into a single gpkg"
>>> OR (b) "create one per feature" decision, since i'm not experienced enough
>>> to have a clear preference about this.  Can you say anything about pros &
>>> cons of going one way vs the other?
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Aug 11, 2020 at 11:45 AM Charles Dixon-Paver <
>>> charles at kartoza.com> wrote:
>>>
>>> The easiest way for me is to use the GDAL ogr2ogr command
>>> <https://gdal.org/programs/ogr2ogr.html> from a bash script or cmd batch
>>> file that traverses your directories (depending on how you installed
>>> QGIS, ogr2ogr should already be on your path). I don't know what
>>> environment you're running, though.
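>>>
>>> Something along these lines, for example (a bash sketch, untested; the
>>> paths are placeholders, and with a reasonably recent GDAL the -append
>>> flag will create the target gpkg on the first pass):
>>>
>>>   find /path/to/legacy/data -name '*.shp' | while read -r shp; do
>>>       # use each shapefile's base name as its layer name in the gpkg
>>>       ogr2ogr -f GPKG -append all_data.gpkg "$shp" -nln "$(basename "$shp" .shp)"
>>>   done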
>>>
>>>
>>>
>>> You can either stick multiple shps into a single gpkg or create one gpkg
>>> per feature class, as you prefer. ogr2ogr can also push shp files
>>> directly into PostGIS (example below). When you want to consolidate or
>>> migrate data (between gpkgs, or from gpkg to PostGIS), you can simply
>>> select the feature layers you want in the QGIS 3 Browser panel and drag
>>> and drop them onto a target location to copy multiple layers at once.
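>>>
>>> For the PostGIS route the command has the same shape (connection
>>> parameters and file names here are placeholders):
>>>
>>>   ogr2ogr -f PostgreSQL PG:"host=localhost dbname=gis user=me" parcels.shp -nln public.parcels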
>>>
>>>
>>>
>>> Others might have different approaches though.
>>>
>>>
>>>
>>> Regards
>>>
>>>
>>>
>>> On Tue, 11 Aug 2020 at 12:24, Walt Ludwick <walt at valedalama.net> wrote:
>>>
>>> I've inherited a legacy GIS, built up over some years in QGIS 2.x
>>> versions, that i'm now responsible for maintaining.  Being an almost
>>> complete n00b (i did take a short course in QGIS a good few years ago,
>>> but still...), i could really use some advice about migration.
>>>
>>> i've created a new QGIS instance in version 3.14, into which i am trying
>>> to bring all useful content from our old system: oodles of shapefiles,
>>> essentially, plus all those other files (each .shp file appears to bring
>>> with it a set of .shx, .dbf, .prj, and .qpj files, plus a .cpg file for
>>> each layer, it seems).  This is a significant dataset (14 GB, >1000
>>> files), and that is just base data, not counting Projects built on this
>>> data or Layouts used for presenting these projects in various ways. Some
>>> of this is cruft that i can happily do without, but still: i've got a lot
>>> of porting-over to do, without a clear idea of how best to do it.
>>>
>>> The one thing i'm clear about is: i want it all in a non-proprietary
>>> database (i.e. no more mess of .shp and related files) that is above all
>>> quick & easy to navigate & manage. It is a single-user system at this
>>> point, but i do aim to open it up to colleagues (off-LAN, i.e. via
>>> Internet) as soon as i've developed simple apps for them to use.  No idea
>>> how long it'll take me to get there, so...
>>>
>>> The big question at this point is: What should be the new storage format
>>> for all this data?  Having read a few related opinions on StackOverflow,
>>> i get the sense that GeoPackage will probably make for the easiest
>>> migration (per this encouraging article
>>> <https://medium.com/@GispoFinland/learn-spatial-sql-and-master-geopackage-with-qgis-3-16b1e17f0291>,
>>> it's a simple matter of drag & drop -simple if you have just a few, i
>>> guess! [1]) and can easily support my needs in the short term. But then i
>>> wonder: how will i manage migration to PostGIS when i eventually put this
>>> system online, with different users/roles enabled?
>>>
>>>
>>>
>>> [1] Given that i need to pull in some hundreds of .shp files that are
>>> stored in a tree of many folders & subfolders, i also wonder: is there a
>>> simple way to ask QGIS to traverse a certain directory and pull in all
>>> the .shp files -each as its own .gpkg layer, i suppose?
>>>
>>>
>>>
>>> Any advice about managing this migration would be much appreciated!
>>>
>> _______________________________________________
> Qgis-user mailing list
> Qgis-user at lists.osgeo.org
> List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user

