[PROJ] Is PROJ unreasonably slow?

Thomas Knudsen knudsen.thomas at gmail.com
Fri Mar 13 02:38:31 PDT 2026


> don't know how you manage to write such long essays.

If we include your coding capabilities, I believe you are a far more productive
writer than I am :-)

I once suggested that "productivity in geospatial programming" should be
measured in the unit "Rouault" (Rt). I estimated that the entire world's
capacity for geospatial programming is on the order of 4 Rt, of which you, by
definition, represent one. We, the rest of the world, then split the
remaining 3 Rt, of which I, on a good day, may lurk in the low milli-Rouaults.

> Does my (non LLM generated) below summary reflect your findings

At least some of them. I agree that one should not expect speed miracles using
text based I/O. On the other hand, with JSON and XML based formats being so
important, text based I/O matters as well, even when not rooted in a shell
pipeline.

From a communication viewpoint, text I/O on the shell command line is very
important, because it is an unambiguous way of describing the exact operation to
carry out, including direct inline sample results. I have probably at least a
hundred times described, to a colleague or external user, the proper incantation
for a given task by including two lines in the style of:

$ echo 55 12 | proj -r +proj=utm +zone=32 +ellps=GRS80
  691875.63  6098907.83

But in these cases, obviously throughput is not a problem.

Regarding decompressing and caching, I am too unfamiliar with the details of
that part of PROJ to actually have an opinion worth considering, although I
could see a reason for using interleave-by-point, rather than by-band, to get
both channels of a datum-shift-grid at once. On the other hand, except for the
pathological case of randomly scattered points, the grid handling seems plenty
fast already, so probably not worth the effort.
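To illustrate the interleaving point, here is a back-of-envelope sketch (my own hypothetical layout arithmetic, not PROJ's or RG's actual grid code) of why point interleave gets all channels of a node in one access, where band interleave needs three widely separated ones:

```rust
// Hypothetical layout arithmetic (illustration only, not actual PROJ code):
// byte offsets needed to fetch all 3 channels of one node in a
// 313 x 303, 3-channel, f32 grid, under two storage layouts.

const ROWS: usize = 313;
const COLS: usize = 303;
const CHANNELS: usize = 3;
const BYTES: usize = 4; // sizeof(f32)

/// Band interleave: each channel stored as a full, separate band.
fn band_offset(row: usize, col: usize, channel: usize) -> usize {
    (channel * ROWS * COLS + row * COLS + col) * BYTES
}

/// Point interleave: all channels of a node stored adjacently.
fn point_offset(row: usize, col: usize, channel: usize) -> usize {
    ((row * COLS + col) * CHANNELS + channel) * BYTES
}

fn main() {
    // Band interleave: consecutive channels of the same node lie an entire
    // band apart - three widely separated accesses per grid node.
    assert_eq!(
        band_offset(100, 200, 1) - band_offset(100, 200, 0),
        ROWS * COLS * BYTES
    );

    // Point interleave: consecutive channels are adjacent - a single
    // access fetches everything relevant for the node.
    assert_eq!(point_offset(100, 200, 1) - point_offset(100, 200, 0), BYTES);

    println!(
        "band stride: {} bytes, point stride: {} bytes",
        ROWS * COLS * BYTES,
        BYTES
    );
}
```

With a 4 KiB page size, the band-interleaved strides all but guarantee that the three channels of one node live on three different pages.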

My main architectural suggestion was to unify the data flow, removing all the
variations of pj_fwd and pj_inv, and all the ceremony in pj_prepare and
pj_finalize. This would probably lead to slightly more code, as each operator
would have to implement the functionality provided by prepare/finalize, but each
would only need to handle the individually relevant branches, rather than having
the centralized versions infer what to do. This would admittedly be quite
intrusive, but in return we could get rid of the code adapting each operator to
the pj_fwd/pj_fwd3d/pj_fwd4d multiplex.
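The suggested unification can be sketched as follows. This is a hypothetical, heavily simplified illustration (the names `Operator`, `Offset` and `fwd` are my invention, not actual PROJ or RG API): every operator implements a single 4D entry point, and narrower input is widened once, at the edge, instead of having separate 2D/3D/4D code paths multiplexed inside the core:

```rust
// Hypothetical sketch of a unified 4D data flow (invented names, not the
// actual PROJ or Rust Geodesy API).

/// One coordinate: (x, y, z, t). 2D/3D input is widened to this before use.
type Coord = [f64; 4];

trait Operator {
    /// The single, unified entry point: forward transform, in place, in 4D.
    /// No pj_fwd/pj_fwd3d/pj_fwd4d triplication per operator.
    fn fwd(&self, operands: &mut [Coord]);
}

/// A toy operator: add a constant offset to the first two dimensions.
struct Offset {
    dx: f64,
    dy: f64,
}

impl Operator for Offset {
    fn fwd(&self, operands: &mut [Coord]) {
        // The operator handles only its own relevant branches; no
        // centralized prepare/finalize ceremony inferring what to do.
        for o in operands.iter_mut() {
            o[0] += self.dx;
            o[1] += self.dy;
        }
    }
}

fn main() {
    // A 2D coordinate, widened to 4D once, at the edge of the system.
    let mut coords: Vec<Coord> = vec![[55.0, 12.0, 0.0, 0.0]];
    let op = Offset { dx: 1.0, dy: 2.0 };
    op.fwd(&mut coords);
    assert_eq!(coords[0], [56.0, 14.0, 0.0, 0.0]);
}
```

The point of the sketch is the shape of the call graph: one `fwd` per operator, operating on a slice of coordinates, so the per-coordinate function call overhead and the dimensional multiplexing both disappear.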

In Rust Geodesy, the unified 4D data flow was what I started with. Later on, I
introduced the CoordinateSet trait, of which implementations are provided for
the simple "array-of-1d/2d/3d/4d-coordinate" cases, and which users can
implement for their own data structures (e.g. an OGC GeometryCollection), making
it possible to handle in-place transformations of coordinates embedded in more
general data types. Much like what is provided by proj_trans_generic, but more
general, and without the danger of needing to manually specify the offsets of
each coordinate dimension.

(Rust traits = "kind of like" Python Abstract Base Classes or C++ virtual base
classes)
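A rough sketch of the CoordinateSet idea (simplified by me for illustration; not RG's actual trait definition) might look like this: the operator sees only uniform 4D get/set access, so it can transform, in place, coordinates embedded in any implementing container:

```rust
// Simplified sketch of the CoordinateSet idea (not RG's actual definition).

type Coor4 = [f64; 4];

/// Uniform 4D access to coordinates, however the user stores them.
trait CoordinateSet {
    fn len(&self) -> usize;
    fn get(&self, i: usize) -> Coor4;
    fn set(&mut self, i: usize, value: Coor4);
}

/// 2D coordinates widen to 4D on read, and narrow back to 2D on write.
impl CoordinateSet for Vec<[f64; 2]> {
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn get(&self, i: usize) -> Coor4 {
        [self[i][0], self[i][1], 0.0, f64::NAN]
    }
    fn set(&mut self, i: usize, v: Coor4) {
        self[i] = [v[0], v[1]];
    }
}

/// A toy "operator": it sees only the trait, so it works in place on any
/// implementor - a plain array here, a user's geometry type elsewhere.
fn scale_eastings(data: &mut impl CoordinateSet, factor: f64) {
    for i in 0..data.len() {
        let mut c = data.get(i);
        c[0] *= factor;
        data.set(i, c);
    }
}

fn main() {
    let mut pts: Vec<[f64; 2]> = vec![[691875.63, 6098907.83]];
    scale_eastings(&mut pts, 2.0);
    assert!((pts[0][0] - 1383751.26).abs() < 1e-6);
}
```

A user wanting in-place transformation of, say, an OGC GeometryCollection would implement the same three methods for their own type, with no need to manually specify per-dimension offsets as with proj_trans_generic.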

Hence, while Rust Geodesy has a unified data flow, the physical width is
determined by the input data.

I'm quite sure a similar architecture would result in clearer code, but
retrofitting the architecture everywhere in PROJ would, as I wrote, be a
Herculean task. On the other hand, it may be possible to implement in smaller
steps. My main concern is really just that we should not neglect the long term
maintenance of the "classical parts" of PROJ - especially not if we can find a
way to do it in smaller steps. Rust Geodesy was an experiment with respect to
architecture. I think it has been quite successful in that respect, but would
it be worth the effort to retrofit a similar architecture onto PROJ?

> I wouldn't trust too much OS virtual memory mapping. Can easily lead to
> crashing your system (or making it unusable to the point you need to hard
> reboot) if you map too much memory vs your available RAM.

This is clearly the case for 32-bit systems. But read-only mapping of large
files "should" be a lazy process, where pages are only loaded on demand. And
since the access patterns tend to be localized, the demand may be meagre. The
Rust Geodesy Unigrid is clearly an experiment, though, and I have yet to flesh
it out fully. One thing I am considering is forcing a 16x16 blocking, meaning
that even for 3 channel grids, a block will fit into a 4k memory page.
Experience from our old transformation system, trlib, shows that for typical
usage patterns, 16x16 block access is very efficient, since a typical task will
only require one, or a few, block accesses. But how this will work in the case
of memory mapped input remains to be seen - it will probably be highly
OS-dependent, and probably restricted to 64-bit systems.
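The 4k-page claim is easy to check with back-of-envelope arithmetic (my own, not code from RG or trlib):

```rust
// Back-of-envelope check (my own arithmetic, not code from RG or trlib):
// does a 16x16 block of a 3-channel f32 grid fit in one 4 KiB memory page?

fn block_bytes(block_side: usize, channels: usize, bytes_per_value: usize) -> usize {
    block_side * block_side * channels * bytes_per_value
}

fn main() {
    const PAGE: usize = 4096;

    // 16 x 16 nodes x 3 channels x 4 bytes = 3072 bytes: fits in one page.
    let b16 = block_bytes(16, 3, 4);
    assert!(b16 <= PAGE);

    // A 32x32 block of the same grid (12288 bytes) would span 3 pages.
    let b32 = block_bytes(32, 3, 4);
    assert!(b32 > PAGE);

    println!("16x16x3xf32 block: {} bytes (page = {} bytes)", b16, PAGE);
}
```

So with 16x16 blocking, a typical task touching one or a few blocks faults in one or a few pages, which is exactly the access pattern demand paging handles well.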

But thanks for your warning - I will take it ad notam, and tread carefully!

/Thomas


On Wed, 11 Mar 2026 at 18:43, Even Rouault
<even.rouault at spatialys.com> wrote:
>
> Thomas,
>
> I don't know how you manage to write such long essays. Does my (non LLM
> generated) below summary reflect your findings:
>
> - Command line utilities are to be considered "toys" if you process a
> large amount of data and care about perf. Not that surprising, and I
> wouldn't expect anyone trying to use PROJ at its max perf capabilities
> to use them, but rather use the (binary) API. In GDAL, we use at some
> places the fast_float C++ header library for fast string->double:
> https://github.com/fastfloat/fast_float . They claim it can be up to
> about 4x faster than strtod(). We could also vendor it in PROJ if needed.
>
> - Grid usage could be made much faster if we decompressed everything in
> memory. There's already a bit of caching of decompressed tiles. But
> random spatial access patterns will easily defeat it, as it is quite
> modest currently. We could add an option to increase the cache to a
> specified size and/or increase the default size. I wouldn't trust too
> much OS virtual memory mapping. Can easily lead to crashing your
> system (or making it unusable to the point you need to hard reboot) if
> you map too much memory vs your available RAM.
>
> Even
>
> Le 11/03/2026 à 09:45, Thomas Knudsen a écrit :
> > TL;DR: Is PROJ unreasonably slow? The answer is "Mostly no", but with a few
> > caveats...
> >
> > DISCLAIMER: The evidence presented below is weak - timing experiments repeated
> > only 2-4 times, on just a single computer. But at first sight the speed tests
> > all hint at the same conclusion: That PROJ could be made significantly faster.
> >
> > Closer inspection, however, shows that, while in some corner cases, PROJ really
> > can be made faster, in the general cases, it already really is quite fast.
> >
> > But despite the weak evidence, and the limited speed gains expected, I allow
> > myself to present the material here, to frame some potential items for
> > discussion and/or action - because changes that may improve speed may also
> > improve maintainability. And while the latter is hard to quantify, the former is
> > readily quantifiable with a stopwatch.
> >
> > Setting
> > -------
> >
> > I recently did a validation experiment by reimplementing the PROJ based
> > canonical transformation from ITRF2014 to SWEREF99, the Swedish ETRS89
> > realization.
> >
> > The reimplementation is based on Rust geodesy (RG) [1], and the validation is
> > carried out by transforming a test set of 10 million randomly generated
> > coordinates. First using the RG coordinate processing program "kp" [2], then the
> > PROJ work horse "cs2cs" [3].
> >
> > But I never got around to actually validating, except for a handful of
> > sufficiently convincing random checks. I got sidetracked by something more
> > concerning - namely that PROJ appeared to be unreasonably slow:
> >
> > While kp transformed the 10 million coordinates in a little less than 17
> > seconds, cs2cs needed more than 20 minutes - i.e. approximately 75
> > times slower.
> >
> > Now, the ITRF2014->SWEREF99 transformation is non-trivial [5] and includes grid
> > lookups in the 5th and 7th step of the pipeline (the grid was, by the way,
> > already fully downloaded with projsync). So I had a hunch that the random
> > organization of the input data might be poison for PROJ's grid caching. And
> > correctly so: After sorting the 10 million input points from north to south, the
> > run time went from about 1200 seconds, to about 100 seconds.
> >
> > A respectable speed-up, although 6 times slower than RG. But as the points are
> > still randomly (dis)organized in the east-west direction, there may be even more
> > speed up possible for a more piecewise localized data set, such as the
> > coordinates of e.g. a typical OGC "simple feature" geometry collection.
> >
> > But before constructing a test data set with such characteristics, I figured I
> > would take a closer look at some PROJ operators not depending on grid access, to
> > get a feeling for how much time goes with accessing the test coordinates, and
> > converting the external text format to the internal IEEE 754 floating point
> > format.
> >
> > For this, I had to change tool from cs2cs, to cct [4]. But the results were
> > still disappointing: Running the same 10 million coordinates through first a
> > no-operation (proj=noop), then a pipeline of 8 concatenated noops, gave these
> > results:
> >
> > NOOP
> >
> > Running kp first.
> >      One noop.     kp: 8.95 s, cct: 83 s,   (kp  9.27 times faster)
> >      Eight noops.  kp: 9.38 s, cct: 97 s.   (kp 10.34 times faster)
> > Running cct first.
> >      One noop.     kp: 9.56 s, cct: 83 s,   (kp  8.68 times faster)
> >      Eight noops.  kp: 9.65 s, cct: 92 s.   (kp  9.53 times faster)
> >
> > To see what it costs to do some real computation, I also tried projecting the
> > test coordinates to UTM zone 32. Here kp consistently ran at close to 13
> > seconds, while cct had 3 cases of close to 100 seconds, and an oddly speedy
> > outlier of just 70 seconds. I suspect I may have misread the clock there.
> >
> > UTM
> >
> > Running kp first.
> >      utm zone=32   kp: 13.54 s, cct: 70 s,   (kp  5.17 times faster)
> >      utm zone=32   kp: 12.37 s, cct: 96 s,   (kp  7.76 times faster)
> > Running cct first.
> >      utm zone=32   kp: 13.36 s, cct:  96 s,  (kp  7.18 times faster)
> >      utm zone=32   kp: 13.57 s, cct: 101 s,  (kp  7.43 times faster)
> >
> > But still, RG seems to be between 5 and 7 times faster than PROJ: Even when
> > comparing the worst kp time with the outlying cct time, kp is still more than 5
> > times faster than cct.
> >
> > Why is it so? Some potential reasons
> > ------------------------------------
> >
> > FIRST, RG does grid access by lumping all grids together in an indexed file, a
> > "unigrid", and accessing that file through memory mapping (the rationale, with
> > the punchline "Don't implement poorly, what the OS already provides
> > excellently!" is described in [6]).
> >
> > PROJ accesses files in chunks, and since, in PROJ, files are typically band
> > interleaved, PROJ needs 3 accesses far away from each other, to get all relevant
> > information for a single point value. Whereas RG uses point interleave, and
> > hence gets all relevant information from a single read operation.
> >
> > Also, PROJ uses compressed files. And in this specific case
> > (eur_nkg_nkgrf17vel.tif), the file is just 303x313 grid nodes, each node
> > consisting of 3 32-bit values, hence 303x313x3x4 bytes = 1_138_068 bytes.
> >
> > But the compression is rather modest: The compressed file weighs 715_692 bytes,
> > i.e. a reduction of just 38%, but it prohibits direct access into the file.
> >
> > RG skips all this, accesses the file as if it is one long array, and leaves all
> > caching/swapping to the OS, which has a much better general view of the system
> > resources available, than any single running process.
> >
> > SECOND, PROJ handles single coordinates, while RG handles collections. Among
> > other things, this leads to a reduction in the number of function calls: PROJ
> > loops over the coordinates, and calls an operator on each coordinate, while RG
> > calls an operator, and lets the operator loop over the coordinates. For the same
> > reason, PROJ needs to interpret a pipeline for each coordinate, while RG just
> > interprets the pipeline once for each collection of e.g. 100_000 coordinates.
> >
> > Now, interpreting a pipeline is not a heavy task: Essentially, it is just an
> > iterator over the steps of the pipeline. But it is a little piece of extra
> > ceremony, that needs to be set up for every single coordinate.
> >
> > This leads me on to the THIRD potential reason, namely that PROJ's internal data
> > flow is rather complex, carrying leftovers that made good sense back when PROJ
> > was simply a projection library, but which are mostly annoying today.
> >
> > When all operators were projections, it made good sense to centralize the
> > handling of e.g. the central meridian into the pj_fwd and pj_inv functions.
> > Today, it is to a large degree something that needs to be worked around, when
> > the operator is not a projection, but another kind of geodetic operator [7].
> >
> > Also, originally PROJ was strictly 2D, so pj_fwd and pj_inv handle 2D data
> > only. When we had to extend it with both 3D and 4D variations, we also got
> > functional duplication and undesired messiness. This is likely one of the
> > reasons that PROJ's combined implementation of pipeline and stack functionality
> > weighs in at 725 lines, while RG, which has a unified data flow architecture,
> > provides (mostly) the same functionality in just 188 lines of code (in both
> > cases including blank lines and comments).
> >
> > RG started its life as an experiment with simpler data flows in geodetic
> > software. I believe it has succeeded in this respect. But I cannot yet provide
> > conclusive evidence that this difference between RG and PROJ also results in
> > faster execution. It is worth checking, though, and worth considering whether
> > it is worth the effort to retrofit a similar data flow architecture into PROJ.
> > It would clearly be a Herculean task.
> >
> > How to interpret the numbers above?
> > -----------------------------------
> >
> > First and foremost: As I stated up front, the evidence is weak, but it is also
> > unambiguous, and while a far cry from conclusively answering the question of
> > whether PROJ is "unreasonably slow", at least it indicates that
> > there are ways of making PROJ faster. Whether this will be worth the effort is
> > another discussion.
> >
> > That said, onto the interpretation.
> >
> > The input file is 406 MB, and I ran the tests twice: Once with PROJ running
> > first, once with RG running first. This should reveal whether disk caching made
> > a difference. It doesn't seem to, however.
> >
> > The full SWEREF transformation pipeline is evidently unreasonably slow, and
> > there is good evidence (the dramatic difference between sorted and random
> > input), that this is due to a grid access corner case. So PROJ is unreasonably
> > slow, when presented with unreasonable input data sets.
> >
> > Once the input is sorted, however, the PROJ timing clocks in at around 100 s, no
> > matter whether we do the full transformation, the 8 noops, or the single UTM
> > projection.
> >
> > So PROJ is very sensitive to the spatial ordering of input coordinate tuples. RG
> > not at all. Given the description above (band interleave vs. node/pixel
> > interleave, hand held caching vs. leaving it to the OS), this is probably not at
> > all surprising.
> >
> > But PROJ has the additional feature of being able to automagically download
> > missing grid files tile-wise, where RG is stuck with what the user has a priori
> > entered into the unigrid, or manually post-registered at run time.
> >
> > In the present test case, the download-on-demand feature is (hopefully) not
> > used, since the file is fully downloaded with projsync already. But might it
> > influence the overall grid access speed? I have not looked into that part of the
> > code recently, but I'm sure Even will spot it, if there are cheap gains to reap
> > here.
> >
> > The I/O effect
> > --------------
> >
> > Now, let's assume that the single-NOOP case mostly reflects the effort of
> > converting text based coordinates to the internal IEEE 754 binary format. I/O is
> > clearly a large part of the difference between kp, and the (cct, cs2cs) tuple:
> > "Anything" takes around 10 seconds for RG/kp, and "Anything" takes around 100
> > seconds for PROJ/(cct, cs2cs) - there is at least some evidence that this is
> > because string-to-double (and vice versa) are surprisingly heavyweight operations.
> >
> > But cs2cs uses the platform native `strtod()` function for string-to-double,
> > while cct uses `proj_strtod()` [8], which among other things allows underscores
> > as thousands separators (42_000). Both routines appear equally slow, compared to
> > the Rust version used in kp.
> >
> > Apparently it just so happens that the built-in Rust converter is much faster
> > than typical C/C++ implementations. This may very well be the case: Rust's float
> > parsing was dramatically improved by Alexander Huszagh some years ago [9][10].
> > But it is unlikely that this alone could account for a 10 times speed-up
> > compared to C.
> >
> > I do not trust myself to build a reliable C++ platform for timing the "real
> > functionality only" (i.e. ignoring the I/O overhead). I would, however, be
> > willing to provide a Rust version for intercomparison, if anyone would take up
> > the C++ task.
> >
> > But fortunately PROJ chairman, Kristian Evers, upon reading an early version of
> > this text, reminded me, that the proj app supports binary I/O (and actually that
> > exact part of the PROJ source code was the target of my first contribution to
> > PROJ, way back in 1999. So shame on me for not thinking about this possibility).
> >
> > Running the utm-projection case through proj (the app), with binary input,
> > significantly speeds up things, making PROJ almost as fast as RG, although with
> > only half the size of input and output, since proj is strictly 2D.
> >
> > But switching to binary output as well, makes it even faster: With binary input
> > and binary output, proj projects 10 million input points in just 3 seconds, i.e.
> > 300 ns/point. This is roughly 4 times as fast as kp, although also with just
> > half the amount of input and output, and no numeric conversion.
> >
> > This indicates that the floating point-to-string output is an even heavier load
> > than the string-to-floating point input. This is perhaps not surprising,
> > although the widespread interest in optimizing the former is much more recent
> > than for the latter.
> >
> > But taking a look at some published benchmarks is encouraging: David Tolnay's
> > Rust based shootout [11] indicates that the very recent (November 2025)
> > zmij-algorithm performs almost 8 times better than Rust's default floating
> > point-to-string implementation. Even wilder, when comparing with system-supplied
> > implementations: Victor Zverovich, the creator of the zmij algorithm, in his own
> > benchmarks [12] measures a 100 times (not 100%, 100 times!) speed up compared to
> > the system provided ostringstream implementation, running on an Apple M1.
> >
> > Hence, we may expect the PROJ command line filters (proj, cct, cs2cs) to speed
> > up significantly, as system libraries mature and include faster floating
> > point-to-string-to-floating point operations... if that ever happens.
> >
> > Obviously, we could also decide to introduce dependencies on standalone
> > implementations, such as zmij. It is, however, questionable whether it is worth
> > the effort: Back in the 1980s, when Gerald Evenden created PROJ (the system),
> > it was to a very large degree in order to use proj (the app) to handle
> > projections for his map plotting system, MAPGEN, where much of the work was
> > implemented as Unix shell pipelines, hence constantly doing floating point I/O.
> > I conjecture that this is also the reason for proj's binary I/O functionality:
> > It may have sped up things significantly.
> >
> > At that point in history, switching to some (not yet available) floating point
> > I/O algorithms would have made much sense, since so much work was done using
> > shell pipelines. Today, we can safely assume that in most cases, PROJ is used as
> > a linked library in a larger (GIS) system, and all inter-library communication
> > is binary.
> >
> > When PROJ is used from the command line, it is (probably) mostly by specialists,
> > testing hypotheses, or checking a few reference-system-defining benchmarks. And
> > handling even tens of thousands of input points will take insignificant amounts
> > of time on a reasonably modern computer.
> >
> > But I/O still takes some time: The recently launched "rewrite GDAL in Rust"
> > initiative, OxiGDAL [13], uses proj4rs [14] for its coordinate handling (proj4rs
> > is a Rust implementation of proj4js, which in turn is a JavaScript
> > reimplementation of PROJ.4). And OxiGDAL claims a handling time of 100
> > ns/coordinate tuple. Comparing this to the 300 ns from proj (the app) above
> > leads to the not-terribly-unreasonable conjecture that proj (the app) spends one
> > third of its time reading, one third on computing, and the last third on writing
> > the result.
> >
> > Hence, I would expect us to find that the general functionality is comparable
> > in speed between RG and PROJ (and proj4rs), while there are probably some modest
> > gains to realize in PROJ's handling of grids. So to answer my initial question:
> > No - PROJ is not unreasonably slow at the library level, although it sure can be
> > sped up.
> >
> > But at the application level, there should be quite a bit of gain possible in
> > the floating point parsing. Whether we should or should not take on this task
> > is an open question: Although I wrote proj_strtod, I would not trust myself to
> > do a reliable C++ port of Alexander Huszagh's work from Rust. But at the other
> > end of the I/O pipeline, the original version of the super fast zmij output
> > algorithm is already written in C++, under an MIT licence, and hence
> > unproblematic to use in the PROJ code base.
> >
> > But I would highly prefer to leave this kind of code to reside in system
> > libraries, not in an application library, like PROJ.
> >
> > Nevertheless: I hope y'all will consider this (much too) long writeup, and
> > give some deep thought to whether, and to what extent, rearchitecting PROJ
> > may be worth the effort.
> >
> > /Thomas Knudsen
> >
> >
> > [1] Rust Geodesy: https://lib.rs/geodesy
> > https://github.com/busstoptaktik/geodesy
> > [2] kp: https://github.com/busstoptaktik/geodesy/blob/main/ruminations/003-rumination.md
> > [3] cs2cs: https://proj.org/en/stable/apps/cs2cs.html
> > [4] cct: https://proj.org/en/stable/apps/cct.html
> > [5] The ITRF2014->SWEREF99 transformation:
> >       $ projinfo -o proj --hide-ballpark -s itrf2014 -t sweref99
> >       +proj=pipeline
> >         +step +proj=axisswap +order=2,1
> >         +step +proj=unitconvert +xy_in=deg +xy_out=rad
> >         +step +proj=cart +ellps=GRS80
> >         +step +proj=helmert +x=0 +y=0 +z=0 +rx=0.001785 +ry=0.011151
> > +rz=-0.01617 +s=0
> >               +dx=0 +dy=0 +dz=0 +drx=8.5e-05 +dry=0.000531 +drz=-0.00077 +ds=0
> >               +t_epoch=2010 +convention=position_vector
> >         +step +inv +proj=deformation +t_epoch=2000 +grids=eur_nkg_nkgrf17vel.tif
> >               +ellps=GRS80
> >         +step +proj=helmert +x=0.03054 +y=0.04606 +z=-0.07944 +rx=0.00141958
> >               +ry=0.00015132 +rz=0.00150337 +s=0.003002
> > +convention=position_vector
> >         +step +proj=deformation +dt=-0.5 +grids=eur_nkg_nkgrf17vel.tif
> > +ellps=GRS80
> >         +step +inv +proj=cart +ellps=GRS80
> >         +step +proj=unitconvert +xy_in=rad +xy_out=deg
> >         +step +proj=axisswap +order=2,1
> > [6]  Rumination 012: Unigrids and the UG grid maintenance utility
> >       https://github.com/busstoptaktik/geodesy/blob/main/ruminations/012-rumination.md
> > [7]  Even Rouault on lam0:
> >       https://github.com/OSGeo/PROJ/pull/4667/changes#diff-bfb0c333155a0c8bf863b0a3e76df46cfddf646cd5f13d6313eb8a3cb123f5f1R58
> > [8]  proj_strtod():
> > https://github.com/OSGeo/PROJ/blob/master/src/apps/proj_strtod.cpp
> > [9]  Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm
> >       https://github.com/rust-lang/rust/pull/86761
> > [10] Implementing a Fast, Correct Float Parser
> >       https://internals.rust-lang.org/t/implementing-a-fast-correct-float-parser/14670
> > [11] David Tolnay's dtoa-benchmark: https://github.com/dtolnay/dtoa-benchmark
> > [12] Victor Zverovich's zmij algorithm: https://github.com/vitaut/zmij/
> > [13] OxiGDAL - Pure Rust Geospatial Data Abstraction Library:
> >       https://github.com/cool-japan/oxigdal
> > [14] proj4rs - Rust adaptation of PROJ.4: https://crates.io/crates/proj4rs
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
>

