Hamish hamish_b at yahoo.com
Sun Feb 27 04:18:57 PST 2011

Even wrote:
> If I understand well what you are trying to do (try to
> evaluate the relative "popularity" of formats to present it
> graphically ?),

Yes. For the osgeo liveDVD demo disc we're trying to put together
a screenshot for the GDAL overview page. Screenshots of CLI or
libs are not very interesting, so we're being creative.

here's the current one (ignore image scaling, that's a sphinx
bug on the adhoc server; imagine it at 60% size),

> I'm afraid the method used here will lead to lots of false
> results and a biased view of the "reality"
> For example,
> * http://www.google.com/search?q=GMT shows that GMT is mainly
>  Greenwich Mean Time, Giant Magellan Telescope, Generative
> Modeling Technologies, ...

yes I know, that's why I pulled that one out as an example of
search terms which would need to be adjusted.

for GMT I think I'll change the search term to `"GMT+grid"`
and see how that does.

> * when I try "Portable Network Graphics", I get 1070000,
> and not 998000 (I've the feeling that the results given by
> Google depend somehow on your IP address) .

yes, rather frustrating. I find it often tells me the answers
already known to me and not the other uses (and I'm doing a
search for things I don't know, not for what I do know), which
makes it hard to treat the results as unbiased. anyway, I do it
while logged out of gmail if that helps. :)

e.g., if I do a search for "GRASS" while logged into gmail I
get "GRASS GIS" as the #2 result, after wikipedia. I could hope,
but I doubt random users see the same list. and a lot of hits

> And it will probably lead to far fewer hits than "PNG" which is the
> more popular term for it (but which is also Papua New
> Guinea...). PNG gives 1 010 000 000 hits !!!

For generic names I think we could add `"$NAME"+GIS` to the
search to remove some of the clutter.
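
A rough sketch of that term adjustment, assuming a hand-picked set of
"generic" names (the `GENERIC_NAMES` set and both function names are
made up for illustration; only the search URL prefix comes from the
example above):

```python
from urllib.parse import quote_plus

# Hypothetical list of names too generic to search for on their own.
GENERIC_NAMES = {"GMT", "PNG", "FIT", "FITS"}


def build_query(name, qualifier="GIS"):
    """Quote the name; append a qualifier if the name is generic."""
    phrase = f'"{name}"'
    if name in GENERIC_NAMES:
        return f"{phrase} {qualifier}"
    return phrase


def search_url(name):
    """URL-encode the query and attach it to the search URL."""
    return "http://www.google.com/search?q=" + quote_plus(build_query(name))
```

The qualifier could just as well be "raster" for the raster-only
formats; which one cuts the clutter better would need trying per name.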

> * "Shapefile" --> 2 570 000  but "ESRI Shapefile"
> only 183 000

right, more refinement needed there.

> * FIT and FITS have incredible high hits, while being quite
> obscure formats. 
> The reason for that popularity burst is that they are
> mainly English verbs... 

+GIS or +raster ?

> * KAK is not a very significant name (and none of the
> results in the first page of Google shows a link with the
> format) for the JP2KAK driver, which is itself 
> a GDAL-only codename for JPEG2000 with Kakadu library.

so perhaps reasonably accurate.

> * "PostgreSQL/PostGIS" --> 58 500, but "PostgreSQL"
> --> 9 650 000 and "PostGIS" --> 324 000 ...

probably reduce that one to just PostGIS.

> etc etc...

> What would be more interesting would be to install a spy in
> the GDAL lib which would connect to a web site and increase
> the hit count of the appropriate driver for each successful
> opening of a dataset :-)

users would love us for that.. :-)
(fwiw, see popcon package in debian/ubuntu)

> And I'm pretty sure you would see geotiff and shapefile appear
> in the top of the list.

yeah, the idea isn't to do an exhaustive scientific search of
what the most popular formats are, just to provide some weighting
to a font-sizing algorithm in a promo image.
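
One possible weighting, sketched under assumptions: hit counts are
mapped to font sizes on a log scale so that a few huge counts don't
flatten everything else. The log scaling, the 8-36 pt range and the
`font_sizes` name are illustrative choices, not what the image
actually uses; the counts are the ones quoted above.

```python
import math


def font_sizes(hits, min_pt=8.0, max_pt=36.0):
    """Map hit counts to font sizes, linear in log10(count)."""
    # Clamp to 1 so a zero-hit term does not break the logarithm.
    logs = {name: math.log10(max(count, 1)) for name, count in hits.items()}
    lo, hi = min(logs.values()), max(logs.values())
    span = hi - lo
    sizes = {}
    for name, value in logs.items():
        # If every count is equal, put everything at the midpoint.
        frac = (value - lo) / span if span else 0.5
        sizes[name] = min_pt + frac * (max_pt - min_pt)
    return sizes


hits = {
    "Shapefile": 2570000,
    "PostGIS": 324000,
    "ESRI Shapefile": 183000,
    "PostgreSQL/PostGIS": 58500,
}
sizes = font_sizes(hits)
```

With this, the term with the most hits gets `max_pt`, the one with the
fewest gets `min_pt`, and the rest fall in between.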

I feel my own use of GDAL will be strongly skewed to a few
formats on the list that I use in my field, and not necessarily
what is most used by other users.

for shared editing of the search terms, I've thrown them up



