[PROJ] Is PROJ unreasonably slow?

Even Rouault even.rouault at spatialys.com
Wed Mar 11 10:43:04 PDT 2026


Thomas,

I don't know how you manage to write such long essays. Does my (non-LLM 
generated) summary below reflect your findings:

- Command line utilities are to be considered "toys" if you process a 
large amount of data and care about performance. Not that surprising, 
and I wouldn't expect anyone trying to use PROJ at its maximum 
performance to use them, but rather the (binary) API. In GDAL, we use 
in some places the fast_float C++ header library for fast 
string->double conversion: https://github.com/fastfloat/fast_float . 
They claim it can be up to about 4x faster than strtod(). We could also 
vendor it in PROJ if needed.

- Grid usage could be made much faster if we decompressed everything in 
memory. There's already a bit of caching of decompressed tiles, but a 
random spatial access pattern will easily defeat it, as it is currently 
quite modest. We could add an option to increase the cache to a 
specified size and/or increase the default size. I wouldn't trust OS 
virtual memory mapping too much: it can easily crash your system (or 
make it unusable to the point where you need to hard reboot) if you map 
too much memory relative to your available RAM.

Even

On 11/03/2026 at 09:45, Thomas Knudsen wrote:
> TL;DR: Is PROJ unreasonably slow? The answer is "Mostly no", but with a few
> caveats...
>
> DISCLAIMER: The evidence presented below is weak - timing experiments repeated
> only 2-4 times, on just a single computer. But at first sight the speed tests
> all hint at the same conclusion: That PROJ could be made significantly faster.
>
> Closer inspection, however, shows that while PROJ really can be made faster
> in some corner cases, in the general case it already is quite fast.
>
> But despite the weak evidence, and the limited speed gains expected, I allow
> myself to present the material here, to frame some potential items for
> discussion and/or action - because changes that may improve speed may also
> improve maintainability. And while the latter is hard to quantify, the former
> is readily quantifiable with a stopwatch.
>
> Setting
> -------
>
> I recently did a validation experiment, reimplementing the PROJ based
> canonical transformation from ITRF2014 to SWEREF99, the Swedish ETRS89
> realization.
>
> The reimplementation is based on Rust Geodesy (RG) [1], and the validation is
> carried out by transforming a test set of 10 million randomly generated
> coordinates: first using the RG coordinate processing program "kp" [2], then
> the PROJ workhorse "cs2cs" [3].
>
> But I did not get around to actually validating, except for a handful of
> sufficiently convincing random checks. I got sidetracked by something more
> concerning - namely that PROJ appeared to be unreasonably slow:
>
> While kp transformed the 10 million coordinates in a little less than 17
> seconds, cs2cs needed more than 20 minutes - i.e. approximately 75 times
> slower.
>
> Now, the ITRF2014->SWEREF99 transformation is non-trivial [5] and includes grid
> lookups in the 5th and 7th steps of the pipeline (the grid was, by the way,
> already fully downloaded with projsync). So I had a hunch that the random
> organization of the input data might be poison for PROJ's grid caching. And
> rightly so: after sorting the 10 million input points from north to south, the
> run time went from about 1200 seconds to about 100 seconds.
>
> A respectable speed-up, although still 6 times slower than RG. But as the
> points are still randomly (dis)organized in the east-west direction, there may
> be even more speed-up possible for a more piecewise localized data set, such
> as the coordinates of e.g. a typical OGC "simple feature" geometry collection.
>
> But before constructing a test data set with such characteristics, I figured I
> would take a closer look at some PROJ operators that do not depend on grid
> access, to get a feeling for how much time goes into accessing the test
> coordinates, and converting the external text format to the internal IEEE 754
> floating point format.
>
> For this, I had to change tool from cs2cs to cct [4]. But the results were
> still disappointing: running the same 10 million coordinates through first a
> no-operation (proj=noop), then a pipeline of 8 concatenated noops, gave these
> results:
>
> NOOP
>
> Running kp first.
>      One noop.     kp: 8.95 s, cct: 83 s,   (kp  9.27 times faster)
>      Eight noops.  kp: 9.38 s, cct: 97 s.   (kp 10.34 times faster)
> Running cct first.
>      One noop.     kp: 9.56 s, cct: 83 s,   (kp  8.68 times faster)
>      Eight noops.  kp: 9.65 s, cct: 92 s.   (kp  9.53 times faster)
>
> To see what it costs to do some real computation, I also tried projecting the
> test coordinates to UTM zone 32. Here kp consistently ran at close to 13
> seconds, while cct had 3 cases of close to 100 seconds, and an oddly speedy
> outlier of just 70 seconds. I suspect I may have misread the clock there.
>
> UTM
>
> Running kp first.
>      utm zone=32   kp: 13.54 s, cct: 70 s,   (kp  5.17 times faster)
>      utm zone=32   kp: 12.37 s, cct: 96 s,   (kp  7.76 times faster)
> Running cct first.
>      utm zone=32   kp: 13.36 s, cct:  96 s,  (kp  7.18 times faster)
>      utm zone=32   kp: 13.57 s, cct: 101 s,  (kp  7.43 times faster)
>
> But still, RG seems to be between 5 and 7 times faster than PROJ: Even when
> comparing the worst kp time with the outlying cct time, kp is still more than 5
> times faster than cct.
>
> Why is it so? Some potential reasons
> ------------------------------------
>
> FIRST, RG does grid access by lumping all grids together in an indexed file, a
> "unigrid", and accessing that file through memory mapping (the rationale, with
> the punchline "Don't implement poorly, what the OS already provides
> excellently!" is described in [6]).
>
> PROJ accesses files in chunks, and since PROJ files are typically band
> interleaved, PROJ needs 3 accesses, far away from each other, to get all
> relevant information for a single point value, whereas RG uses point
> interleaving and hence gets all relevant information from a single read
> operation.
>
> Also, PROJ uses compressed files. And in this specific case
> (eur_nkg_nkgrf17vel.tif), the file is just 303x313 grid nodes, each node
> consisting of 3 32-bit values, hence 303x313x3x4 bytes = 1_138_068 bytes.
>
> But the compression is rather modest: the compressed file weighs 715_692
> bytes, i.e. a reduction of just 37%, yet it prohibits direct access into the
> file.
>
> RG skips all this, accesses the file as if it is one long array, and leaves all
> caching/swapping to the OS, which has a much better general view of the system
> resources available, than any single running process.
>
> SECOND, PROJ handles single coordinates, while RG handles collections. Among
> other things, this leads to a reduction in the number of function calls: PROJ
> loops over the coordinates and calls an operator on each coordinate, while RG
> calls an operator once and lets the operator loop over the coordinates. For
> the same reason, PROJ needs to interpret a pipeline for each coordinate, while
> RG interprets the pipeline just once for each collection of e.g. 100_000
> coordinates.
>
> Now, interpreting a pipeline is not a heavy task: essentially, it is just an
> iterator over the steps of the pipeline. But it is a little piece of extra
> ceremony that needs to be set up for every single coordinate.
>
> This leads me on to the THIRD potential reason, namely that PROJ's internal data
> flow is rather complex, carrying leftovers that made good sense back when PROJ
> was simply a projection library, but which are mostly annoying today.
>
> When all operators were projections, it made good sense to centralize the
> handling of e.g. the central meridian in the pj_fwd and pj_inv functions.
> Today, it is to a large degree something that needs to be worked around when
> the operator is not a projection, but another kind of geodetic operator [7].
>
> Also, originally PROJ was strictly 2D, so pj_fwd and pj_inv handle 2D data
> only. When we had to extend them with both 3D and 4D variations, we also got
> functional duplication and undesired messiness. This is likely one of the
> reasons that PROJ's combined implementation of pipeline and stack
> functionality weighs in at 725 lines, while RG, which has a unified data flow
> architecture, provides (mostly) the same functionality in just 188 lines of
> code (in both cases including blank lines and comments).
>
> RG started its life as an experiment with simpler data flows in geodetic
> software. I believe it has succeeded in this respect. But I cannot yet provide
> conclusive evidence that this difference between RG and PROJ also results in
> faster execution. It is worth checking, though, and worth considering whether
> retrofitting a similar data flow architecture into PROJ would justify the
> effort. It would clearly be a herculean task.
>
> How to interpret the numbers above?
> -----------------------------------
>
> First and foremost: as I stated up front, the evidence is weak, but it is
> also unambiguous. While it is a far cry from answering conclusively whether
> PROJ is "unreasonably slow", it at least indicates that there are ways of
> making PROJ faster. Whether this will be worth the effort is another
> discussion.
>
> That said, onto the interpretation.
>
> The input file is 406 MB, and I ran the tests twice: Once with PROJ running
> first, once with RG running first. This should reveal whether disk caching made
> a difference. It doesn't seem to, however.
>
> The full SWEREF transformation pipeline is evidently unreasonably slow, and
> there is good evidence (the dramatic difference between sorted and random
> input), that this is due to a grid access corner case. So PROJ is unreasonably
> slow, when presented with unreasonable input data sets.
>
> Once the input is sorted, however, the PROJ timing clocks in at around 100 s, no
> matter whether we do the full transformation, the 8 noops, or the single UTM
> projection.
>
> So PROJ is very sensitive to the spatial ordering of input coordinate tuples;
> RG is not at all. Given the description above (band interleave vs. node/pixel
> interleave, hand-held caching vs. leaving it to the OS), this is probably not
> at all surprising.
>
> But PROJ has the additional feature of being able to automagically download
> missing grid files tile-wise, whereas RG is stuck with what the user has a
> priori entered into the unigrid, or manually post-registered at run time.
>
> In the present test case, the download-on-demand feature is (hopefully) not
> used, since the file is fully downloaded with projsync already. But might it
> influence the overall grid access speed? I have not looked into that part of the
> code recently, but I'm sure Even will spot it, if there are cheap gains to reap
> here.
>
> The I/O effect
> --------------
>
> Now, let's assume that the single-noop case mostly reflects the effort of
> converting text based coordinates to the internal IEEE 754 binary format. I/O
> is clearly a large part of the difference between kp and the (cct, cs2cs)
> tuple: "anything" takes around 10 seconds for RG/kp, and around 100 seconds
> for PROJ/(cct, cs2cs). There is at least some evidence that this is because
> string-to-double (and vice versa) are surprisingly heavyweight operations.
>
> But cs2cs uses the platform native `strtod()` function for string-to-double,
> while cct uses `proj_strtod()` [8], which, among other things, allows
> underscores as thousands separators (42_000). Both routines appear equally
> slow compared to the Rust version used in kp.
>
> Apparently it just so happens that the built-in Rust converter is much faster
> than typical C/C++ implementations. This may very well be the case: Rust's
> float parsing was dramatically improved by Alexander Huszagh some years ago
> [9][10], but it seems unlikely that this alone accounts for a 10 times
> speed-up compared to C.
>
> I do not trust myself to build a reliable C++ harness for timing the "real
> functionality only" (i.e. ignoring the I/O overhead). I would, however, be
> willing to provide a Rust version for intercomparison, if anyone would take
> up the C++ task.
>
> But fortunately PROJ chairman Kristian Evers, upon reading an early version
> of this text, reminded me that the proj app supports binary I/O (and actually
> that exact part of the PROJ source code was the target of my first
> contribution to PROJ, way back in 1999 - so shame on me for not thinking of
> this possibility).
>
> Running the utm-projection case through proj (the app), with binary input,
> significantly speeds up things, making PROJ almost as fast as RG, although with
> only half the size of input and output, since proj is strictly 2D.
>
> But switching to binary output as well, makes it even faster: With binary input
> and binary output, proj projects 10 million input points in just 3 seconds, i.e.
> 300 ns/point. This is roughly 4 times as fast as kp, although also with just
> half the amount of input and output, and no numeric conversion.
>
> This indicates that the floating point-to-string output is an even heavier
> load than the string-to-floating point input. This is perhaps not surprising,
> although widespread interest in optimizing the former is much more recent
> than for the latter.
>
> But taking a look at some published benchmarks is encouraging: David Tolnay's
> Rust based shootout [11] indicates that the very recent (November 2025)
> zmij algorithm performs almost 8 times better than Rust's default floating
> point-to-string implementation. Even wilder when comparing with
> system-supplied implementations: Victor Zverovich, the creator of the zmij
> algorithm, in his own benchmarks [12] measures a 100 times (not 100%, 100
> times!) speed-up compared to the system provided ostringstream
> implementation, running on an Apple M1.
>
> Hence, we may expect the PROJ command line filters (proj, cct, cs2cs) to speed
> up significantly, as system libraries mature and include faster floating
> point-to-string-to-floating point operations... if that ever happens.
>
> Obviously, we could also decide to introduce dependencies on standalone
> implementations, such as zmij. It is, however, questionable whether it is
> worth the effort: back in the 1980s, when Gerald Evenden created PROJ (the
> system), it was to a very large degree in order to use proj (the app) to
> handle projections for his map plotting system, MAPGEN, where much of the
> work was implemented as Unix shell pipelines, hence constantly doing floating
> point I/O. I conjecture that this is also the reason for proj's binary I/O
> functionality: it may have sped up things significantly.
>
> At that point in history, fast floating point I/O algorithms (had they been
> available) would have made much sense, since so much work was done using
> shell pipelines. Today, we can safely assume that in most cases, PROJ is used
> as a linked library in a larger (GIS) system, and all inter-library
> communication is binary.
>
> When PROJ is used from the command line, it is (probably) mostly by specialists,
> testing hypotheses, or checking a few reference-system-defining benchmarks. And
> handling even tens of thousands of input points will take insignificant amounts
> of time on a reasonably modern computer.
>
> But I/O still takes some time: the recently launched "rewrite GDAL in Rust"
> initiative, OxiGDAL [13], uses proj4rs [14] for its coordinate handling
> (proj4rs is a Rust implementation of proj4js, which in turn is a JavaScript
> reimplementation of PROJ.4). And OxiGDAL claims a handling time of 100
> ns/coordinate tuple. Comparing this to the 300 ns from proj (the app) above
> leads to the not-terribly-unreasonable conjecture that proj (the app) spends one
> third of its time reading, one third on computing, and the last third on writing
> the result.
>
> Hence, I would expect us to find that the general functionality is comparable
> in speed between RG and PROJ (and proj4rs), while there are probably some
> modest gains to realize in PROJ's handling of grids. So to answer my initial
> question: no - PROJ is not unreasonably slow at the library level, although
> it sure can be sped up.
>
> But at the application level, there should be quite a bit of gains possible
> in the floating point parsing. Whether we should take on this task is
> debatable: although I wrote proj_strtod, I would not trust myself to do a
> reliable C++ port of Alexander Huszagh's work from Rust. But at the other end
> of the I/O pipeline, the original version of the super fast zmij output
> algorithm is already written in C++, under an MIT licence, and hence
> unproblematic to use in the PROJ code base.
>
> But I would much prefer this kind of code to reside in system libraries, not
> in an application library like PROJ.
>
> Nevertheless: I hope y'all will consider this (much too) long writeup, and
> give deep thought to whether, and to what extent, rearchitecting PROJ may be
> worth the effort.
>
> /Thomas Knudsen
>
>
> [1] Rust Geodesy: https://lib.rs/geodesy
> https://github.com/busstoptaktik/geodesy
> [2] kp: https://github.com/busstoptaktik/geodesy/blob/main/ruminations/003-rumination.md
> [3] cs2cs: https://proj.org/en/stable/apps/cs2cs.html
> [4] cct: https://proj.org/en/stable/apps/cct.html
> [5] The ITRF2014->SWEREF99 transformation:
>       $ projinfo -o proj --hide-ballpark -s itrf2014 -t sweref99
>       +proj=pipeline
>         +step +proj=axisswap +order=2,1
>         +step +proj=unitconvert +xy_in=deg +xy_out=rad
>         +step +proj=cart +ellps=GRS80
>         +step +proj=helmert +x=0 +y=0 +z=0 +rx=0.001785 +ry=0.011151
> +rz=-0.01617 +s=0
>               +dx=0 +dy=0 +dz=0 +drx=8.5e-05 +dry=0.000531 +drz=-0.00077 +ds=0
>               +t_epoch=2010 +convention=position_vector
>         +step +inv +proj=deformation +t_epoch=2000 +grids=eur_nkg_nkgrf17vel.tif
>               +ellps=GRS80
>         +step +proj=helmert +x=0.03054 +y=0.04606 +z=-0.07944 +rx=0.00141958
>               +ry=0.00015132 +rz=0.00150337 +s=0.003002
> +convention=position_vector
>         +step +proj=deformation +dt=-0.5 +grids=eur_nkg_nkgrf17vel.tif
> +ellps=GRS80
>         +step +inv +proj=cart +ellps=GRS80
>         +step +proj=unitconvert +xy_in=rad +xy_out=deg
>         +step +proj=axisswap +order=2,1
> [6]  Rumination 012: Unigrids and the UG grid maintenance utility
>       https://github.com/busstoptaktik/geodesy/blob/main/ruminations/012-rumination.md
> [7]  Even Rouault on lam0:
>       https://github.com/OSGeo/PROJ/pull/4667/changes#diff-bfb0c333155a0c8bf863b0a3e76df46cfddf646cd5f13d6313eb8a3cb123f5f1R58
> [8]  proj_strtod():
> https://github.com/OSGeo/PROJ/blob/master/src/apps/proj_strtod.cpp
> [9]  Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm
>       https://github.com/rust-lang/rust/pull/86761
> [10] Implementing a Fast, Correct Float Parser
>       https://internals.rust-lang.org/t/implementing-a-fast-correct-float-parser/14670
> [11] David Tolnay's dtoa-benchmark: https://github.com/dtolnay/dtoa-benchmark
> [12] Victor Zverovich's zmij algorithm: https://github.com/vitaut/zmij/
> [13] OxiGDAL - Pure Rust Geospatial Data Abstraction Library:
>       https://github.com/cool-japan/oxigdal
> [14] proj4rs - Rust adaptation of PROJ.4: https://crates.io/crates/proj4rs

-- 
http://www.spatialys.com
My software is free, but my time generally not.


