[Qgis-user] Migrating legacy QGIS instance

chris hermansen clhermansen at gmail.com
Wed Aug 12 09:47:38 PDT 2020


Walt and everyone,

On Wed, Aug 12, 2020 at 9:00 AM Walt Ludwick <walt at valedalama.net> wrote:

> Oh, i get it (duh!):  overhead = minimum file size.  Makes sense, since
> every .gpkg is its own SQLite instance -and 5 MB is a small price to pay
> for an RDBMS in a single file, essentially.  Still, something to bear in
> mind while designing one's information architecture for the GIS.  Thanks,
> Charles!
>

Walt, since you state that you "aim to open it up to colleagues" at some
point, you might just want to bite the bullet right from the start and
stuff it all into PostGIS.  Not to belittle all the good stuff in SQLite,
but it's no kind of multi-user database.  Another cool thing you get
for free with PostGIS: when you get around to building your web-based
access, you can do a lot of spatial processing right in the database,
without requiring any client or middleware libraries/integration.

I don't know about the macOS setup, but on my Ubuntu 20.04 machine I see I
have somehow managed to install a "shp2pgsql-gui" tool that looks like it
might be of use in facilitating a migration.
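For the command-line route, a minimal sketch of what a shp2pgsql load might
look like. The database name "farmgis", the user "walt", the SRID, and the
shapefile path are all illustrative placeholders, not anything from Walt's
actual setup:

```shell
# Hypothetical example: load one shapefile into PostGIS with shp2pgsql
# (which ships with PostGIS). All names below are placeholders.
shp="fields/Fields_2019.shp"

# Derive a table name from the file name; lowercasing keeps PostGIS happy
table=$(basename "$shp" .shp | tr '[:upper:]' '[:lower:]')

# -s sets the SRID, -I builds a spatial index on the geometry column
if command -v shp2pgsql >/dev/null 2>&1 && command -v psql >/dev/null 2>&1; then
    shp2pgsql -s 4326 -I "$shp" "public.$table" | psql -d farmgis -U walt \
        || echo "load skipped: needs a reachable, configured PostGIS database"
fi
```

The same thing can be done per-directory in a loop, or with ogr2ogr as
Charles suggests further down the thread.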

Worth checking out is Regina Obe's and Leo Hsu's Manning book, PostGIS in
Action, which is in early access third edition
<https://www.manning.com/books/postgis-in-action-third-edition?utm_source=google&utm_medium=search&utm_campaign=dynamicsearch&gclid=EAIaIQobChMI2bWl9YuW6wIVchh9Ch1JfwrcEAAYAiAAEgIix_D_BwE>.
Chapter 5 "Using PostGIS on the desktop" gives an overview of using
PostGIS-hosted data with OpenJUMP, QGIS, gvSIG and Jupyter.


> On Tue, Aug 11, 2020 at 9:02 PM Charles Dixon-Paver <charles at kartoza.com>
> wrote:
>
>> Sorry for the confusion Walt, but the "overhead" I was referring to here
>> is actually the fact that gpkg is implemented as a SQLite container with a
>> *minimum* filesize which adds a couple MB. I think the "overhead" will
>> vary depending on the type of data stored. Basically, if you make one for
>> every shapefile you could probably expect to end up with an additional ~5MB
>> of bloat to your existing data store for each shapefile converted...
>>
>> Upper limits as you stated should be (in theory) ~140TB, or at least
>> somewhere upwards from whatever I would usually consider practical to store
>> in a database that's stored as a single flat file...
>>
>> Regarding GeoMoose on Mac, you could try using Docker to test it out
>> https://github.com/geomoose/docker-geomoose
>>
>> In terms of the specifics on how to restructure your data infrastructure,
>> it seems like it's going to depend a lot on the specifics of your use case
>> and is probably outside the scope of this mailing list, or at least this
>> thread... Migrating projects is another beast altogether, so maybe someone
>> else can offer advice on that.
>>
>> Regards
>>
>> On Tue, 11 Aug 2020 at 20:20, Walt Ludwick <walt at valedalama.net> wrote:
>>
>>> This makes good sense to me, Charles.  I've got enough experience with
>>> databases (tho not so much with geographic ones) that i'm comfortable w/
>>> SQL query tools. Unless a list or directory is small enough to eyeball with
>>> ease (certainly the case with this legacy QGIS instance i've inherited),
>>> i'd much rather search than dig for the data, so... In this sense at least,
>>> less fragmentation is more.
>>>
>>> That being said: i don't know if i can bundle all into a single .gpkg;
>>> if there is a size limit as low as 5MB on each one, then certainly not.
>>> Google search on string "Geopackage size limit" returns multiple
>>> credible-looking pages that cite a limit (subject to filesystem
>>> constraints) of 140TB.  Can you clarify about the "~5MB of storage overhead
>>> for each unique .gpkg" comment?
>>>
>>> In any case: if i go for selective consolidation -selection scheme still
>>> TBD[1]- then i must certainly bear in mind your caution about the data loss
>>> risk associated with careless use of certain processing tools/
>>> configurations.  If there be tools & configs oriented to one & only one
>>> .gpkg file, i don't yet know about them... But i'll certainly watch out for
>>> that and keep a good backup!
>>>
>>> [1] As to selection (or classification, i should say) and naming of
>>> .gpkg files that will consolidate any number of .shp files: i am thinking
>>> along lines of either data type (raster and vector being two high-level
>>> groupings, with subtypes that might have more to do with the schema of
>>> tabular data), or else data source (which often has much to do with data
>>> reliability, maintainability -and value, ultimately).  Need to think a bit
>>> more deeply on this, and would be happy for any guidance from more
>>> experienced GIS admins.
>>>
>>>
>>> On Tue, Aug 11, 2020 at 2:31 PM Charles Dixon-Paver <charles at kartoza.com>
>>> wrote:
>>>
>>>> Regarding the one-vs-many approach to gpkgs, I recommend consolidation
>>>> (within reason). I feel that the temptation to use gpkg as a drop-in
>>>> replacement for shp stems from familiarity with processes I personally
>>>> consider to be largely outmoded. I think it's worth getting over the initial
>>>> (relatively shallow) learning curve so that when you start working with db
>>>> oriented systems like PostGIS, everything makes sense right out of the gate.
>>>>
>>>> Basically it boils down to how you want to manage or distribute them as
>>>> you don't have traditional db roles. Personally, I try to package things
>>>> into "data.gpkg/something" and "data.gpkg/somethingelse" wherever possible,
>>>> rather than "a.gpkg/a" and "b.gpkg/b". It usually makes moving data around
>>>> easier for me. If you have a lot of inputs, maybe split it into unique
>>>> gpkgs based on some categorising criteria (like you might do with a schema)
>>>> rather than one monolithic gpkg. Performing maintenance (vacuum) on a large
>>>> number of unique gpkgs seems like an unnecessary chore.
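Since a GeoPackage is just a SQLite file, the maintenance Charles mentions is
the standard SQLite VACUUM, which rewrites the file and reclaims space left by
deleted or edited features. A minimal sketch, with "data.gpkg" as a
placeholder path:

```shell
# Vacuum a GeoPackage with the sqlite3 command-line tool.
# "data.gpkg" is a placeholder; point this at a real file.
gpkg="data.gpkg"
if command -v sqlite3 >/dev/null 2>&1; then
    # VACUUM rebuilds the database file, compacting free pages
    sqlite3 "$gpkg" 'VACUUM;'
fi
```

This is the chore that multiplies with the number of gpkgs, which is one
argument for consolidation.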
>>>>
>>>> One limitation for gpkg is that certain processing tools/configurations
>>>> will only support writing to an entire gpkg, so if you lack
>>>> experience you'll need to be careful not to overwrite all of your data and
>>>> also have a decent backup plan in place. Usually you can get away with
>>>> utilising a scratch.gpkg for that purpose with no risk to your primary
>>>> datastore.
>>>>
>>>> Using the one-gpkg-per-feature approach offers little data management
>>>> benefit over shapefiles aside from removing the auxiliary files and
>>>> being able to store styles (as well as lae). There is little performance
>>>> benefit over shp directly from what I understand (both use WKB), and
>>>> there is ~5MB of storage overhead for each unique gpkg (if I remember
>>>> correctly); this will depend on your use case.
>>>>
>>>> Hope that helps.
>>>>
>>>> On Tue, 11 Aug 2020 at 15:13, Basques, Bob (CI-StPaul) <
>>>> bob.basques at ci.stpaul.mn.us> wrote:
>>>>
>>>>> *Depending on your end goal, you might be more suited to leaving
>>>>> things as they are and using  some sort of content explorer to organize the
>>>>> existing data.  Then worry about migrating to different formats as needed.*
>>>>>
>>>>>
>>>>>
>>>>> *We’ve been using GeoMoose for this purpose.  It can connect to just
>>>>> about any data source on the back end, such as SHP, Postgres, and
>>>>> GeoPackage to name a few, but it can also connect to proprietary services
>>>>> as well.  Because it can use MapServer as a display engine and data query
>>>>> tool, it lends itself to online exploration of the data without the need
>>>>> for a full-blown GIS tool.  This allows for widespread use by non-GIS
>>>>> pros.  The datasets can still be managed by you with QGIS and/or in
>>>>> Postgres/PostGIS, or whatever you prefer for that purpose.  The MapServer
>>>>> setup allows for connecting to just about any type of service behind the
>>>>> scenes, and with the right configuration, you can also enable each dataset
>>>>> in the GeoMoose catalog as a WMS/WFS data source, the standard for open
>>>>> data format access and publishing.*
>>>>>
>>>>>
>>>>>
>>>>> *Bobb*
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Qgis-user <qgis-user-bounces at lists.osgeo.org> * On Behalf Of *Walt
>>>>> Ludwick
>>>>> *Sent:* Tuesday, August 11, 2020 7:45 AM
>>>>> *To:* qgis-user at lists.osgeo.org
>>>>> *Subject:* Re: [Qgis-user] Migrating legacy QGIS instance
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I'm on MacOS -and not so very comfortable with command line scripting-
>>>>> so it looks like i might have to go the drag&drop way to import these .shp
>>>>> files. Will take some time, but at least that way i can be sure about what
>>>>> i've put where, and in what form.
>>>>>
>>>>>
>>>>>
>>>>> But i do wonder about the (a) "stick multiple shps into a single gpkg"
>>>>> OR (b) "create one per feature" decision, since i'm not experienced enough
>>>>> to have a clear preference about this.  Can you say anything about pros &
>>>>> cons of going one way vs the other?
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Tue, Aug 11, 2020 at 11:45 AM Charles Dixon-Paver <
>>>>> charles at kartoza.com> wrote:
>>>>>
>>>>> Easiest way for me is to use the GDAL ogr2ogr
>>>>> <https://gdal.org/programs/ogr2ogr.html> command using a bash script
>>>>> or cmd batch to traverse your directories (depending on how you installed
>>>>> QGIS this should be on your path). I don't know what environment you're
>>>>> running though.
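A minimal sketch of the bash approach Charles describes, assuming a GDAL
install that puts ogr2ogr on the path. The SRC and OUT paths are
placeholders; -nln names each layer after its source file:

```shell
# Walk a shapefile tree and append every .shp into one GeoPackage,
# each as its own layer. Paths below are placeholders.
SRC="${SRC:-/path/to/legacy/shapefiles}"
OUT="data.gpkg"

find "$SRC" -name '*.shp' 2>/dev/null | while read -r shp; do
    # Layer name = file name without the .shp extension
    layer=$(basename "$shp" .shp)
    # -update -append adds layers to an existing gpkg
    # (ogr2ogr creates the file on first use)
    if command -v ogr2ogr >/dev/null 2>&1; then
        ogr2ogr -f GPKG -update -append "$OUT" "$shp" -nln "$layer"
    fi
done
```

The equivalent one-gpkg-per-shapefile variant would swap the fixed "$OUT"
for "${layer}.gpkg" inside the loop.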
>>>>>
>>>>>
>>>>>
>>>>> You can either stick multiple shps into a single gpkg or create one
>>>>> per feature as you prefer. ogr2ogr can also push shp files directly into
>>>>> PostGIS. When you want to consolidate or migrate data (between gpkgs or
>>>>> from gpkg to PostGIS) you can simply select the feature layers you want and
>>>>> use drag and drop from the QGIS 3 Browser panel to copy multiple features
>>>>> to a target location.
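For the eventual PostGIS migration, ogr2ogr can copy every layer of a
GeoPackage into the database in one call. A hedged sketch; the connection
parameters ("farmgis", "walt") and the gpkg path are placeholders:

```shell
# Hypothetical example: push all layers of a GeoPackage into PostGIS.
# Connection string values and the source path are placeholders.
src="data.gpkg"
conn="PG:dbname=farmgis user=walt"

if command -v ogr2ogr >/dev/null 2>&1; then
    ogr2ogr -f PostgreSQL "$conn" "$src" \
        || echo "copy skipped: needs a reachable PostGIS database"
fi
```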
>>>>>
>>>>>
>>>>>
>>>>> Others might have different approaches though.
>>>>>
>>>>>
>>>>>
>>>>> Regards
>>>>>
>>>>>
>>>>>
>>>>> On Tue, 11 Aug 2020 at 12:24, Walt Ludwick <walt at valedalama.net>
>>>>> wrote:
>>>>>
>>>>> I've inherited a legacy GIS, built up over some years in versions 2.x,
>>>>> that i'm now responsible to maintain.  Being an almost complete n00b (did
>>>>> take a short course in QGIS a good few years ago, but still..), i could
>>>>> really use some advice about migration.
>>>>>
>>>>> i've created a new QGIS instance in version 3.14, into which i am
>>>>> trying to bring all useful content from our old system: oodles of
>>>>> shapefiles, essentially, plus all those other files (each .shp file appears
>>>>> to bring with it a set of .shx, .dbf, .prj, and .qpj files, plus a .cpg file
>>>>> for each layer, it seems).  This is a significant dataset -14 GB, >1000 files
>>>>> -and that is just base data, not counting Projects built on this data or
>>>>> Layouts used for presenting these projects in various ways. Some of this is
>>>>> cruft that i can happily do without, but still:  i've got a lot of
>>>>> porting-over to do, without a clear idea of how best to do it.
>>>>>
>>>>> The one thing i'm clear about is: i want it all in a non-proprietary
>>>>> database (i.e. no more mess of .shp and related files) that is above all
>>>>> quick & easy to navigate & manage. It is a single-user system at this
>>>>> point, but i do aim to open it up to colleagues (off-LAN, i.e. via
>>>>> Internet) as soon as i've developed simple apps for them to use.  No idea
>>>>> how long it'll take me to get there, so...
>>>>>
>>>>> Big question at this point is: What should be the new storage format
>>>>> for all this data?  Having read a few related opinions on StackOverflow, i
>>>>> get the sense that GeoPackage will probably make for easiest migration (per this
>>>>> encouraging article
>>>>> <https://medium.com/@GispoFinland/learn-spatial-sql-and-master-geopackage-with-qgis-3-16b1e17f0291>,
>>>>> it's a simple matter of drag&drop -simple if you have just a few, i guess!
>>>>> [1]), and can easily support my needs in the short term, but then i wonder:
>>>>> How will i manage migration to PostGIS when i eventually put this system
>>>>> online with different users/ roles enabled?
>>>>>
>>>>>
>>>>>
>>>>> [1] Given that i need to pull in some hundreds of .shp files that are
>>>>> stored in a tree of many folders & subfolders, i also wonder: is there a
>>>>> simple way that i can ask QGIS to traverse a certain directory, pull in all
>>>>> the .shp files -each as its own .gpkg layer, i suppose?
>>>>>
>>>>>
>>>>>
>>>>> Any advice about managing this migration would be much appreciated!
>>>>>
>>>>> _______________________________________________
>>>>> Qgis-user mailing list
>>>>> Qgis-user at lists.osgeo.org
>>>>> List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>>> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>>>



-- 
Chris Hermansen · clhermansen "at" gmail "dot" com

C'est ma façon de parler.

