[Geoinquiets Barcelona] data hackathon

ALBERTO GONZALEZ PAJE ekonlab at gmail.com
Fri Mar 1 07:54:41 PST 2013


-------------------------------------------------------------------------
Geoindex JISC UK Web Domain Dataset (1996-2010)

The ~2.5 billion 200 OK responses in the JISC UK Web Domain Dataset (1996-2010) have been scanned for geographic references, specifically postcodes. This set of postcode citations, found at particular URLs crawled at particular times, forms a historical geoindex of the UK web. For more details about how the data was created, its format, and how to use it, see here.

The geoindex is composed of 700,641,549 lines of TSV data, each asserting that a given web page, crawled at a given date, contained one or more references to a given postcode. Uncompressed, this comes to 61 GB of text, so care should be taken before downloading or attempting to use this data set.

The data is not hosted on GitHub, as it is far too large. It can be downloaded in a compressed format (total download size about 8 GB) from:

http://data.webarchive.org.uk/opendata/ukwa.ds.2/geo/
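
Since the uncompressed data runs to 61 GB, it is probably safer to stream it than to load it into memory. Below is a minimal Python sketch of that idea, counting how often each postcode is cited. Several things here are assumptions, not taken from the dataset documentation: that the files are gzip-compressed, that they are UTF-8 text, and that the postcode sits in the third tab-separated column (the postcode_column parameter is hypothetical and should be checked against the real format).

```python
import csv
import gzip
from collections import Counter


def count_postcodes(lines, postcode_column=2):
    """Count postcode citations in an iterable of TSV text lines.

    postcode_column is an ASSUMPTION (third column, zero-based index 2);
    check the dataset's format description before relying on it.
    """
    counts = Counter()
    # QUOTE_NONE: treat quote characters in URLs as ordinary text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= postcode_column:
            continue  # skip short or malformed rows
        # Normalise case so "sw1a 1aa" and "SW1A 1AA" count together.
        counts[row[postcode_column].strip().upper()] += 1
    return counts


def count_postcodes_in_gzip(path, postcode_column=2):
    """Stream a gzip-compressed TSV file line by line (gzip is ASSUMED)."""
    with gzip.open(path, mode="rt", encoding="utf-8",
                   errors="replace", newline="") as handle:
        return count_postcodes(handle, postcode_column)
```

Calling count_postcodes_in_gzip("some-part.gz").most_common(10) would then list the ten most cited postcodes in one file; the file name is a placeholder, not an actual name from the download directory.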

--------------------------------------------------------------------------

... this looks great ... count me in if you meet up to do something!!

Regards
Alberto

On 01/03/2013, at 16:39, Raf Roset <rafroset at gmail.com> wrote:

> After Thursday's talk with Javier and Eric we sent out a tweet, and it got a reply:
> https://twitter.com/anjacks0n/status/307498166608089088
> 
> Shall we take a look and try to do something with it?
> 
> Raf
