[Gdal-dev] Content length field mismatch in shapefiles

Roger Bivand Roger.Bivand at nhh.no
Sat Apr 29 14:23:30 EDT 2006


I have a question about shapefiles. Geolytics seem to provide US
subscribers with shapefiles in which the content length in the *.shx
index records is 6 (decimal, in 16-bit words) greater than the content
length in the *.shp record headers, and 4 greater than it should be (I
checked by creating a new *.shp and *.shx with shapelib). This throws
shapelib, usually on the final geometry. I am generalising from a small
sample, but the user who contacted me reported that the Geolytics files
he found at his university needed special treatment before
shapelib-based software could read them.
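The consistency check described above can be sketched in Python. This is a minimal sketch, not shapelib code, and the function names are hypothetical. It assumes the layout in ESRI's shapefile technical description: both files start with a 100-byte header; each *.shp record starts with an 8-byte header (record number, content length); each *.shx record is 8 bytes (offset, content length). All four fields are big-endian 32-bit integers counted in 16-bit words, and the content length excludes the 8-byte record header.

```python
import struct

HEADER_BYTES = 100  # .shp and .shx both start with a 100-byte file header


def shx_content_lengths(shx: bytes) -> list[int]:
    """Content lengths (16-bit words) from the 8-byte .shx index records."""
    n = (len(shx) - HEADER_BYTES) // 8
    # Each index record is (offset, content length), big-endian int32; keep the length.
    return [struct.unpack_from(">ii", shx, HEADER_BYTES + 8 * i)[1] for i in range(n)]


def shp_content_lengths(shp: bytes) -> list[int]:
    """Content lengths (16-bit words) from the .shp record headers, read sequentially."""
    lengths, pos = [], HEADER_BYTES
    while pos + 8 <= len(shp):
        # Record header: (record number, content length), big-endian int32.
        _recno, clen = struct.unpack_from(">ii", shp, pos)
        lengths.append(clen)
        pos += 8 + 2 * clen  # skip the header plus clen 16-bit words of content
    return lengths


def length_mismatches(shp: bytes, shx: bytes) -> list[tuple[int, int, int]]:
    """(record number, .shp length, .shx length) for every record where they differ."""
    pairs = zip(shp_content_lengths(shp), shx_content_lengths(shx))
    return [(i, s, x) for i, (s, x) in enumerate(pairs, start=1) if s != x]
```

For a real pair of files, pass the raw bytes, e.g. `pathlib.Path("jw_wacounty.shp").read_bytes()`. The sequential walk trusts the *.shp record headers, which is the same assumption the R shapefiles package relies on.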

Has anyone come across this before? The files read into ArcGIS. In R,
the shapefiles package's read.shp() and read.shx(), which use only native
R binary reads, can read them because they read the *.shp sequentially
rather than trying random access. ArcGIS seems to take longer than usual
for files of that complexity, but gets round the problem. v.in.ogr in
GRASS reports that no geometry is available for one DBF record, but
processes all but the last geometry.
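Why a sequential reader copes while an index-driven reader fails near the end can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not how shapelib, OGR or ArcGIS are actually implemented: the hypothetical `read_by_index` seeks to each *.shx offset and reads the record header plus the *.shx content length, while the hypothetical `read_sequentially` trusts only the record headers inside the *.shp. Offsets and lengths are assumed to be big-endian 32-bit integers in 16-bit words, as in ESRI's specification.

```python
import struct

HEADER_BYTES = 100  # .shp and .shx both start with a 100-byte file header


def read_sequentially(shp: bytes) -> list[bytes]:
    """Hypothetical reader that trusts only the record headers inside the .shp."""
    records, pos = [], HEADER_BYTES
    while pos + 8 <= len(shp):
        _recno, clen = struct.unpack_from(">ii", shp, pos)
        records.append(shp[pos + 8 : pos + 8 + 2 * clen])
        pos += 8 + 2 * clen
    return records


def read_by_index(shp: bytes, shx: bytes) -> list[bytes]:
    """Hypothetical random-access reader that trusts the .shx offset and length."""
    records = []
    for i in range((len(shx) - HEADER_BYTES) // 8):
        offset, clen = struct.unpack_from(">ii", shx, HEADER_BYTES + 8 * i)
        start = 2 * offset          # offsets are in 16-bit words
        end = start + 8 + 2 * clen  # record header plus content
        if end > len(shp):
            raise EOFError(f"record {i + 1}: index needs byte {end}, file has {len(shp)}")
        # An over-long length silently pulls in bytes from the next record.
        records.append(shp[start + 8 : end])
    return records
```

In this model, if every *.shx length is 6 words (12 bytes) too large, each over-long read except the last merely runs into the following record, so only the final geometry fails, which is consistent with the symptoms described above.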

The Geolytics problem seems to be that the length values in the *.shx file
don't agree with those in the *.shp. ESRI's shapefile specification says
"The content length stored in the index record is the same as the value
stored in the main file record header", but for a sample file:

> library(shapefiles)
> geolytics <- read.shp("jw_wacounty.shp") 
> geolytics_content.length <- sapply(geolytics$shp, function(x)
+    x$content.length)
> geolytics_content.length
 [1]  382  542  726 3574  750  846  398  806 1550 1438  878  646  902  590  960
[16]  710 2190  534 2374  582  982  854  438 2446  750  158 2390  414 1422  430
[31]  998  342 1782 1094  254  574 1096 1182 1558
> geoshx <- read.shx("jw_wacounty.shx")
> geoshx$index[,2]
 [1]  388  548  732 3580  756  852  404  812 1556 1444  884  652  908  596  966
[16]  716 2196  540 2380  588  988  860  444 2452  756  164 2396  420 1428  436
[31] 1004  348 1788 1100  260  580 1102 1188 1564
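As a quick arithmetic check on the two vectors printed above (values copied from the R session; Python is used here only to do the subtraction), every index length exceeds the corresponding record-header length by exactly 6 words, i.e. 12 bytes:

```python
# Content lengths (16-bit words) copied from the R output above.
shp_lengths = [382, 542, 726, 3574, 750, 846, 398, 806, 1550, 1438, 878, 646, 902, 590, 960,
               710, 2190, 534, 2374, 582, 982, 854, 438, 2446, 750, 158, 2390, 414, 1422, 430,
               998, 342, 1782, 1094, 254, 574, 1096, 1182, 1558]
shx_lengths = [388, 548, 732, 3580, 756, 852, 404, 812, 1556, 1444, 884, 652, 908, 596, 966,
               716, 2196, 540, 2380, 588, 988, 860, 444, 2452, 756, 164, 2396, 420, 1428, 436,
               1004, 348, 1788, 1100, 260, 580, 1102, 1188, 1564]

# Per-record discrepancy between the index (.shx) and the main file (.shp).
diffs = [x - s for s, x in zip(shp_lengths, shx_lengths)]
print(sorted(set(diffs)))        # distinct discrepancies -> [6]
print(len(diffs), 2 * diffs[0])  # record count, discrepancy in bytes -> 39 12
```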

A user who could not read the sample file into R with the shapelib-based
packages sent it to me, but because it is Geolytics data I can't post it.
I can ask for permission to email a copy.

Roger

-- 
Roger Bivand
Economic Geography Section, Department of Economics, Norwegian School of
Economics and Business Administration, Helleveien 30, N-5045 Bergen,
Norway. voice: +47 55 95 93 55; fax +47 55 95 95 43
e-mail: Roger.Bivand at nhh.no