<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<p>Hi Andreas,</p>
<p>Interesting.<br>
</p>
<p>Behind the scenes, GeoSeer one-way hashes each GetCapabilities
document and uses that hash as the document key. Identical
GetCapabilities documents therefore get the same key and appear
only once in the final index. But if even a single character
anywhere in the document differs, the hash is completely different.<br>
</p>
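<p>For illustration, here is a minimal Python sketch of that hash-keyed de-duplication. It assumes SHA-256 as the one-way hash (which hash GeoSeer actually uses isn't stated) and a simple in-memory dictionary keyed by URL; the names <code>document_key</code> and <code>dedupe_by_hash</code> are hypothetical.</p>
<pre>
```python
import hashlib


def document_key(capabilities_xml: bytes) -> str:
    # One-way hash of the raw response bytes. SHA-256 is an assumption;
    # any collision-resistant hash gives the same behaviour.
    return hashlib.sha256(capabilities_xml).hexdigest()


def dedupe_by_hash(documents: dict[str, bytes]) -> dict[str, list[str]]:
    # Hypothetical input shape: fetched URL -> raw GetCapabilities body.
    # Each key of the result is indexed once; its list records every URL
    # that returned exactly that document.
    index: dict[str, list[str]] = {}
    for url, body in documents.items():
        index.setdefault(document_key(body), []).append(url)
    return index


fetched = {
    "http://maps.example.org/wms?": b"capabilities document A",
    "http://gis.example.org/wms?": b"capabilities document A",  # byte-identical
    "http://other.example.net/wms?": b"capabilities document A.",  # one extra character
}
groups = dedupe_by_hash(fetched)
print(len(groups))  # 2
```
</pre>
<p>The one-character difference in the third document is enough to give it its own key, which is exactly the sensitivity described above. The same grouping would also collapse DNS aliases, but only when they return byte-identical documents; a server that writes the requested hostname into its OnlineResource URLs (an assumption about the server, not something stated here) would produce a different hash for each alias.</p>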
<p>There's also de-duplication at the endpoint, service, and dataset
levels using a similar mechanism. GeoSeer also de-duplicates
across service types: if something is served from the same place
as both WMS and WFS, we glue the two together.<br>
</p>
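<p>A minimal sketch of that cross-service gluing follows, again in Python. It assumes that "the same place" means the same scheme, host and path once the OGC query string (SERVICE, REQUEST, VERSION, ...) is dropped; GeoSeer's actual matching rule may differ, and <code>endpoint_key</code> and <code>glue_services</code> are hypothetical names.</p>
<pre>
```python
from urllib.parse import urlsplit


def endpoint_key(url: str) -> str:
    # Assumed notion of "the same place": scheme + host + path, with the
    # query string discarded. Hostnames are case-insensitive, so lower-case them.
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"


def glue_services(records: list[tuple[str, str]]) -> dict[str, set[str]]:
    # records: (service_type, url) pairs, e.g. ("WMS", "https://...").
    # Result: endpoint -> set of service types offered there.
    glued: dict[str, set[str]] = {}
    for service_type, url in records:
        glued.setdefault(endpoint_key(url), set()).add(service_type.upper())
    return glued


records = [
    ("wms", "https://data.example.org/ows?SERVICE=WMS"),
    ("WFS", "https://DATA.example.org/ows?SERVICE=WFS"),
    ("WMS", "https://data.example.org/other?SERVICE=WMS"),
]
glued = glue_services(records)
print(sorted(glued["https://data.example.org/ows"]))  # ['WFS', 'WMS']
```
</pre>
<p>Keying on the base URL rather than on a document hash is what lets a WMS record and a WFS record merge, since their GetCapabilities documents are necessarily different.</p>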
<p>The problem with using DNS is that organisations the size of
NOAA/USGS have deployments across various subdomains that do
different (but similar) things. You also get a kind of opposite: a
single domain belonging to a geospatial "cloud" hosting provider
that has lots of layers with the same names and similar metadata,
because all of its local-government customers are sharing their
own fire stations, roads, etc.<br>
</p>
<p>There are all manner of ways in which server admins and data
custodians make this more complicated than it seems. :-)</p>
<p>Cheers,</p>
<p>Jonathan<br>
</p>
<div class="moz-cite-prefix">On 2020-06-09 12:25, Andreas Neumann
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:26edaa47dfaefa694829932750873625@carto.net">
<p>Hi Jonathan,</p>
<p>Thanks for sharing this information. I'm not aware of anything
better.</p>
<p>While looking at some services that I know personally, I also
found that other services are listed twice because a machine
might have a DNS alias. That is also something to consider:
perhaps filter out machines that return identical GetCapabilities
responses where only the DNS name differs.</p>
<p>I agree that the numbers probably wouldn't change significantly.</p>
<p>Thanks and greetings,</p>
<p>Andreas</p>
<p id="reply-intro">On 2020-06-09 13:14, Jonathan Moules wrote:</p>
<blockquote type="cite">
<div class="pre"><span>Hi Andreas,</span><br>
<span>Sure, happy to share.</span><br>
There's a little on the About page: <a
href="https://www.geoseer.net/about.php" target="_blank"
rel="noopener noreferrer" moz-do-not-send="true">https://www.geoseer.net/about.php</a>,
and more is scattered around the blog posts (the ones with the
"GeoSeer" tag are probably best for that: <a
href="https://www.geoseer.net/blog/?t=GeoSeer"
target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">https://www.geoseer.net/blog/?t=GeoSeer</a>).
Put simply, we scrape a lot of different sources and
metadata catalogs and extract the services from them. Then we
request not only the declared GetCapabilities document, but
also make educated guesses as to what else might be on the box,
and request those too.<br>
<br>
It's not perfect, but to the best of my knowledge it's by far
the largest such index in the world and, more importantly,
it's *current*. Everything in it returned a valid
GetCapabilities document with at least one meaningful named
dataset when it was last scraped, which was within the last few weeks.<br>
<br>
<span>Regarding the services you mentioned, GeoSeer knows of:</span><br>
<a href="http://geoweb.so.ch/wms/sogis_natgef.wms?"
target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">http://geoweb.so.ch/wms/sogis_natgef.wms?</a>
and a few others on that subdomain, as well as some on the
subdomain <a
href="http://www.sogis1.so.ch/cgi-bin/sogis/sogis_natgef.wms?"
target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">http://www.sogis1.so.ch/cgi-bin/sogis/sogis_natgef.wms?</a>.
I see that both are now defunct, which is why they're not in the
database.<br>
<br>
<span>Thanks for the URL, I've added it for scraping.</span><br>
<br>
<blockquote type="cite">So I wonder how many other QGIS server
installations may not be in your database?</blockquote>
Alas, that's an "unknown unknown"; there's no way to know (at
least, I can't think of a way to find out; suggestions welcome).
However, the vast majority of the time when I come across a new
service manually (e.g. from following mailing lists like this
one), it turns out it's already in the index, so I think
it's reasonably comprehensive at this point.<br>
<br>
While missing servers may change the absolute number of QGIS
installations, they're very unlikely to change the
proportions. For a sample this large, I'd expect the
proportions to remain largely the same, certainly for
deployments.<br>
<br>
<span>Hope that's of interest and answers the question,</span><br>
<span>Cheers,</span><br>
Jonathan<br>
<br>
<br>
<span>On 2020-06-09 10:45, Andreas Neumann wrote:</span>
<blockquote type="cite"><br>
<span>Hi Jonathan,</span><br>
<br>
Can you share with us how you harvest your information on
available public OGC services? You probably have that
information published somewhere, so if you could point me
to that URL, it would help.<br>
<br>
As an example, I noticed that none of the services of our
province (my employer) can be found.<br>
<br>
<span>Here is the starting point:</span><br>
<br>
<span><a
href="https://so.ch/verwaltung/bau-und-justizdepartement/amt-fuer-geoinformation/geoportal/geodienste/wms-web-map-service/"
target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">https://so.ch/verwaltung/bau-und-justizdepartement/amt-fuer-geoinformation/geoportal/geodienste/wms-web-map-service/</a></span><br>
<br>
<span>and the GetCapabilities link:</span><br>
<br>
<span><a
href="https://geo.so.ch/api/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0"
target="_blank" rel="noopener noreferrer"
moz-do-not-send="true">https://geo.so.ch/api/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0</a></span><br>
<br>
So I wonder how many other QGIS server installations may not
be in your database? Of course I know you don't claim full
coverage, but it would still be good to know how you harvest
your data.<br>
<br>
<span>Thanks for clarifying and greetings,</span><br>
<br>
Andreas<br>
<br>
</blockquote>
</div>
</blockquote>
</blockquote>
</body>
</html>