[Qgis-user] Migrating legacy QGIS instance

Walt Ludwick walt at valedalama.net
Wed Aug 12 10:46:47 PDT 2020


Thanks Chris for the pointer to that book;  looks like just the resource
i'll want to have on hand, when scope of my deployment calls for migrating
data to PostGIS -which time will certainly come (rather sooner than later,
i hope), but then should be no big deal, from all i've been able to gather.

At this point, i've still got a long way to go with understanding QGIS
basics, supported by some good learning material (shout out to Klas
Karlsson here for his excellent video tutorials [1]; rock on, Klas!) but
it's all file-based so far, which is complex enough already, without the
additional overhead of server administration.

[1] https://www.youtube.com/channel/UCxs7cfMwzgGZhtUuwhny4-Q

On Wed, Aug 12, 2020 at 5:47 PM chris hermansen <clhermansen at gmail.com>
wrote:

> Walt and everyone,
>
> On Wed, Aug 12, 2020 at 9:00 AM Walt Ludwick <walt at valedalama.net> wrote:
>
>> Oh, i get it (duh!):  overhead = minimum file size.  Makes sense, since
>> every .gpkg is its own SQLite instance -and 5MB is a small price to pay for
>> an RDBMS in a single file, essentially.  Still, something to bear in mind,
>> while designing one's information architecture for the GIS.  Thanks,
>> Charles!
>>
>
> Walt, since you state that you "aim to open it up to colleagues" at some
> point, you might just want to bite the bullet right from the start and
> stuff it all into PostGIS.  Not to belittle all the good stuff in SQLite at
> all, but that's no kind of multi-user database.  Another cool thing you get
> for free with PostGIS - when you get around to building your web-based
> access, you can do a lot of spatial processing right in PostGIS, without
> requiring any client or middleware libraries / integration.
>
> I don't know about the OS/X setup but on my Ubuntu 20.04 I see I have
> somehow managed to install a "shp2pgsql-gui" tool that looks like it might
> be of use in facilitating a migration.
>
> Worth checking out is Regina Obe's and Leo Hsu's Manning book, PostGIS in
> Action, which is in early access third edition
> <https://www.manning.com/books/postgis-in-action-third-edition?utm_source=google&utm_medium=search&utm_campaign=dynamicsearch&gclid=EAIaIQobChMI2bWl9YuW6wIVchh9Ch1JfwrcEAAYAiAAEgIix_D_BwE>.
> Chapter 5 "Using PostGIS on the desktop" gives an overview of using
> PostGIS-hosted data with OpenJUMP, QGIS, gvSIG and Jupyter.
>
>
>> On Tue, Aug 11, 2020 at 9:02 PM Charles Dixon-Paver <charles at kartoza.com>
>> wrote:
>>
>>> Sorry for the confusion Walt, but the "overhead" I was referring to here
>>> is actually the fact that gpkg is implemented as a SQLite container with a
>>> *minimum* filesize which adds a couple MB. I think the "overhead" will
>>> vary depending on the type of data stored. Basically, if you make one for
>>> every shapefile you could probably expect to end up with an additional ~5MB
>>> of bloat to your existing data store for each shapefile converted...
>>>
>>> Upper limits as you stated should be (in theory) ~140TB, or at least
>>> somewhere upwards from whatever I would usually consider practical to store
>>> in a database that's stored as a single flat file...
>>>
>>> Regarding geomoose on Mac, you could try use docker to test it out
>>> https://github.com/geomoose/docker-geomoose
>>>
>>> In terms of the specifics on how to restructure your data
>>> infrastructure, it seems like it's going to depend a lot on the specifics
>>> of your use case and is probably outside the scope of this mailing list, or
>>> at least this thread... Migrating projects is another beast altogether, so
>>> maybe someone else can offer advice on that.
>>>
>>> Regards
>>>
>>> On Tue, 11 Aug 2020 at 20:20, Walt Ludwick <walt at valedalama.net> wrote:
>>>
>>>> This makes good sense to me, Charles.  I've got enough experience with
>>>> databases (tho not so much with geographic ones) that i'm comfortable w/
>>>> SQL query tools. Unless a list or directory is small enough to eyeball with
>>>> ease (certainly the case with this legacy QGIS instance i've inherited),
>>>> i'd much rather search than dig for the data, so... In this sense at least,
>>>> less fragmentation is more.
>>>>
>>>> That being said: i don't know if i can bundle all into a single .gpkg;
>>>> if there is a size limit as low as 5MB on each one, then certainly not.
>>>> Google search on string "Geopackage size limit" returns multiple
>>>> credible-looking pages that cite a limit (subject to filesystem
>>>> constraints) of 140TB.  Can you clarify about the "~5MB of storage overhead
>>>> for each unique .gpkg" comment?
>>>>
>>>> In any case: if i go for selective consolidation -selection scheme
>>>> still TBD[1]- then i must certainly bear in mind your caution about the
>>>> data loss risk associated with careless use of certain processing tools/
>>>> configurations.  If there be tools & configs oriented to one & only one
>>>> .gpkg file, i don't yet know about them... But i'll certainly watch out for
>>>> that and keep a good backup!
>>>>
>>>> [1] As to selection (or classification, i should say) and naming of
>>>> .gpkg files that will consolidate any number of .shp files: i am thinking
>>>> along lines of either data type (raster and vector being two high-level
>>>> groupings, with subtypes that might have more to do with the schema of
>>>> tabular data), or else data source (which often has much to do with data
>>>> reliability, maintainability -and value, ultimately).  Need to think a bit
>>>> more deeply on this, and would be happy for any guidance from more
>>>> experienced GIS admins.
>>>>
>>>>
>>>> On Tue, Aug 11, 2020 at 2:31 PM Charles Dixon-Paver <
>>>> charles at kartoza.com> wrote:
>>>>
>>>>> Regarding the one-vs-many approach to gpkgs, I recommend consolidation
>>>>> (within reason). I feel that the temptation to use gpkg as a drop-in
>>>>> replacement for shp is familiarity with processes I personally consider to
>>>>> be largely outmoded. I think it's worth getting over the initial
>>>>> (relatively shallow) learning curve so that when you start working with db
>>>>> oriented systems like PostGIS, everything makes sense right out of the gate.
>>>>>
>>>>> Basically it boils down to how you want to manage or distribute them
>>>>> as you don't have traditional db roles. Personally, I try to package things
>>>>> into "data.gpkg/something" and "data.gpkg/somethingelse" wherever possible,
>>>>> rather than "a.gpkg/a" and "b.gpkg/b". It usually makes moving data around
>>>>> easier for me. If you have a lot of inputs, maybe split it into unique
>>>>> gpkgs based on some categorising criteria (like you might do with a schema)
>>>>> rather than one monolithic gpkg. Performing maintenance (vacuum) on a large
>>>>> number of unique gpkgs seems like an unnecessary chore.
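
A minimal sketch of the maintenance step mentioned above, assuming only Python's standard library. A GeoPackage is an SQLite file, so its layers can be listed from the `gpkg_contents` table and the file compacted with `VACUUM`. The function names are illustrative, not part of QGIS or GDAL, and the file should be closed in QGIS before vacuuming.

```python
import os
import sqlite3
from contextlib import closing


def list_layers(gpkg_path):
    """Return (table_name, data_type) for every layer registered in the gpkg."""
    with closing(sqlite3.connect(gpkg_path)) as con:
        rows = con.execute(
            "SELECT table_name, data_type FROM gpkg_contents ORDER BY table_name"
        ).fetchall()
    return rows


def vacuum(gpkg_path):
    """Run VACUUM and return the file size in bytes before and after."""
    before = os.path.getsize(gpkg_path)
    # isolation_level=None gives autocommit; VACUUM cannot run in a transaction.
    with closing(sqlite3.connect(gpkg_path, isolation_level=None)) as con:
        con.execute("VACUUM")
    after = os.path.getsize(gpkg_path)
    return before, after
```

With one consolidated file this is a single call, e.g. `vacuum("data.gpkg")`; with one gpkg per layer the same call has to be repeated for every file.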
>>>>>
>>>>> One limitation for gpkg is that certain processing tools/
>>>>> configurations will only support writing to an entire gpkg, so if you lack
>>>>> experience you'll need to be careful not to overwrite all of your data and
>>>>> also have a decent backup plan in place. Usually you can get away with
>>>>> utilising a scratch.gpkg for that purpose with no risk to your primary
>>>>> datastore.
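
One possible shape for the backup habit described above, as a sketch using only the standard library: copy the primary gpkg to a timestamped file before running any tool that writes to it. The function name and the naming pattern are assumptions for illustration; close the file in QGIS first so the copy is complete.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_gpkg(gpkg_path, backup_dir):
    """Copy gpkg_path into backup_dir under a timestamped name; return the copy's path."""
    src = Path(gpkg_path)
    dest_dir = Path(backup_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    # copy2 also preserves the modification time of the original file.
    shutil.copy2(src, dest)
    return dest
```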
>>>>>
>>>>> Using the one-gpkg-per-item approach offers little data management
>>>>> benefit over shapefiles, aside from removing the auxiliary files and being
>>>>> able to store styles (as well as lae). From what I understand there is
>>>>> little direct performance benefit over shp either (both use WKB), and there
>>>>> is ~5MB of storage overhead for each unique gpkg (if I remember correctly),
>>>>> though this will depend on your use case.
>>>>>
>>>>> Hope that helps.
>>>>>
>>>>> On Tue, 11 Aug 2020 at 15:13, Basques, Bob (CI-StPaul) <
>>>>> bob.basques at ci.stpaul.mn.us> wrote:
>>>>>
>>>>>> *Depending on your end goal, you might be more suited to leaving
>>>>>> things as they are and using  some sort of content explorer to organize the
>>>>>> existing data.  Then worry about migrating to different formats as needed.*
>>>>>>
>>>>>>
>>>>>>
>>>>>> *We’ve been using GeoMoose for this purpose.  It can connect to just
>>>>>> about any data source on the back end, such as SHP, Postgres, and
>>>>>> GeoPackage to name a few, and can also connect to proprietary services.
>>>>>> Because it can use MapServer as a display engine and data query
>>>>>> tool, it lends itself to online exploration of the data without the need
>>>>>> for a full-blown GIS tool.  This allows for widespread use by non-GIS
>>>>>> pros.  The datasets can still be managed by you with QGIS and/or in
>>>>>> Postgres/PostGIS, or whatever you prefer for that purpose.  The MapServer
>>>>>> setup allows for connecting to just about any type of service behind the
>>>>>> scenes, and with the right configuration, you can also enable each dataset
>>>>>> in the GeoMoose catalog as a WMS/WFS data source, the standard for open
>>>>>> data format access and publishing.*
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Bobb*
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From:* Qgis-user <qgis-user-bounces at lists.osgeo.org> *On Behalf Of*
>>>>>> Walt Ludwick
>>>>>> *Sent:* Tuesday, August 11, 2020 7:45 AM
>>>>>> *To:* qgis-user at lists.osgeo.org
>>>>>> *Subject:* Re: [Qgis-user] Migrating legacy QGIS instance
>>>>>>
>>>>>>
>>>>>>
>>>>>> I'm on MacOS -and not so very comfortable with command line
>>>>>> scripting- so it looks like i might have to go the drag&drop way to import
>>>>>> these .shp files. Will take some time, but at least that way i can be sure
>>>>>> about what i've put where, and in what form.
>>>>>>
>>>>>>
>>>>>>
>>>>>> But i do wonder about the (a) "stick multiple shps into a single
>>>>>> gpkg" OR (b) "create one per feature" decision, since i'm not experienced
>>>>>> enough to have a clear preference about this.  Can you say anything about
>>>>>> pros & cons of going one way vs the other?
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Aug 11, 2020 at 11:45 AM Charles Dixon-Paver <
>>>>>> charles at kartoza.com> wrote:
>>>>>>
>>>>>> Easiest way for me is to use the GDAL ogr2ogr
>>>>>> <https://gdal.org/programs/ogr2ogr.html> command using a bash script
>>>>>> or cmd batch to traverse your directories (depending on how you installed
>>>>>> QGIS this should be on your path). I don't know what environment you're
>>>>>> running though.
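
For anyone less comfortable with bash or cmd, the traversal described above could be sketched in Python instead. This is only an illustration: it assumes `ogr2ogr` is on the path, and the function names and the layer-naming scheme (folder names joined with underscores) are hypothetical choices. It builds one `ogr2ogr` call per shapefile, all writing into a single gpkg.

```python
import subprocess
from pathlib import Path


def build_commands(root, target_gpkg):
    """Return one ogr2ogr argument list per .shp file found under root."""
    root = Path(root)
    target = Path(target_gpkg)
    shapefiles = sorted(
        p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".shp"
    )
    commands = []
    for i, shp in enumerate(shapefiles):
        # Name the layer after its path relative to root, so that same-named
        # files in different folders do not collide.
        layer = "_".join(shp.relative_to(root).with_suffix("").parts)
        cmd = ["ogr2ogr", "-f", "GPKG"]
        if i > 0 or target.exists():
            # Add to the existing gpkg instead of trying to create it again.
            cmd.append("-update")
        cmd += ["-nln", layer, str(target), str(shp)]
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Printing the result of `build_commands("old_gis_data", "data.gpkg")` before calling `run_all` gives a chance to review what will be written where; both paths here are placeholders.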
>>>>>>
>>>>>>
>>>>>>
>>>>>> You can either stick multiple shps into a single gpkg or create one
>>>>>> per feature as you prefer. ogr2ogr can also push shp files directly into
>>>>>> PostGIS. When you want to consolidate or migrate data (between gpkgs or
>>>>>> from gpkg to PostGIS) you can simply select the feature layers you want and
>>>>>> use drag and drop from the QGIS 3 Browser panel to copy multiple features
>>>>>> to a target location.
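
The direct shp-to-PostGIS route mentioned above can be sketched the same way. The database name, host, user and schema below are placeholders, and the helper function is hypothetical; the `ogr2ogr` options used (`-nln`, `-nlt PROMOTE_TO_MULTI`, `-lco GEOMETRY_NAME=...`) are standard ones, though whether they suit a given dataset is a judgment call. The password is assumed to come from `~/.pgpass` or the `PGPASSWORD` environment variable rather than the command line.

```python
from pathlib import Path


def build_postgis_command(shp_path, dbname, host="localhost", user="gis",
                          schema="public"):
    """Return the ogr2ogr argument list that loads one shapefile into PostGIS."""
    conn = f"PG:dbname={dbname} host={host} user={user}"
    # Lower-case table names avoid the need for quoting in SQL later on.
    table = f"{schema}.{Path(shp_path).stem.lower()}"
    return [
        "ogr2ogr", "-f", "PostgreSQL",
        "-nln", table,
        # Shapefiles can mix single and multi geometries in one layer.
        "-nlt", "PROMOTE_TO_MULTI",
        "-lco", "GEOMETRY_NAME=geom",
        conn, str(shp_path),
    ]
```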
>>>>>>
>>>>>>
>>>>>>
>>>>>> Others might have different approaches though.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 11 Aug 2020 at 12:24, Walt Ludwick <walt at valedalama.net>
>>>>>> wrote:
>>>>>>
>>>>>> I've inherited a legacy GIS, built up over some years in versions
>>>>>> 2.x, that i'm now responsible to maintain.  Being an almost complete n00b
>>>>>> (did take a short course in QGIS a good few years ago, but still..), i
>>>>>> could really use some advice about migration.
>>>>>>
>>>>>> i've created a new QGIS instance in version 3.14, into which i am
>>>>>> trying to bring all useful content from our old system: oodles of
>>>>>> shapefiles, essentially, plus all those other files (each .shp file appears
>>>>>> to bring with it a set of .shx, .dbf, .prj, .qpj files, plus a .cpg file for
>>>>>> each layer, it seems).  This is a significant dataset- 14GB, >1000 files
>>>>>> -and that is just base data, not counting Projects built on this data or
>>>>>> Layouts used for presenting these projects in various ways. Some of this is
>>>>>> cruft that i can happily do without, but still:  i've got a lot of
>>>>>> porting-over to do, without a clear idea of how best to do it.
>>>>>>
>>>>>> The one thing i'm clear about is: i want it all in a non-proprietary
>>>>>> database (i.e. no more mess of .shp and related files) that is above all
>>>>>> quick & easy to navigate & manage. It is a single-user system at this
>>>>>> point, but i do aim to open it up to colleagues (off-LAN, i.e. via
>>>>>> Internet) as soon as i've developed simple apps for them to use.  No idea
>>>>>> how long it'll take me to get there, so...
>>>>>>
>>>>>> Big question at this point is: What should be the new storage format
>>>>>> for all this data?  Having read a few related opinions on StackOverflow, i
>>>>>> get the sense that GeoPackage will probably make for easiest migration (per this
>>>>>> encouraging article
>>>>>> <https://medium.com/@GispoFinland/learn-spatial-sql-and-master-geopackage-with-qgis-3-16b1e17f0291>,
>>>>>> it's a simple matter of drag&drop -simple if you have just a few, i guess!
>>>>>> [1]), and can easily support my needs in the short term, but then i wonder:
>>>>>> How will i manage migration to PostGIS when i eventually put this system
>>>>>> online with different users/ roles enabled?
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1] Given that i need to pull in some hundreds of .shp files that are
>>>>>> stored in a tree of many folders & subfolders, i also wonder: is there a
>>>>>> simple way that i can ask QGIS to traverse a certain directory, pull in all
>>>>>> the .shp files -each as its own .gpkg layer, i suppose?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Any advice about managing this migration would be much appreciated!
>>>>>>
>>>>>> _______________________________________________
>>>>>> Qgis-user mailing list
>>>>>> Qgis-user at lists.osgeo.org
>>>>>> List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>>>> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
>
>
>
> --
> Chris Hermansen · clhermansen "at" gmail "dot" com
>
> C'est ma façon de parler.
>