[postgis-users] Function - Standard deviation ellipses

Francis Dupont Francis.X.Dupont at USherbrooke.ca
Wed Jun 25 10:50:35 PDT 2008


Hello!

I created a function that generate a standard deviation ellipse based on
crimestat function in chapter 4.  I thought the community might be interested
so this is why I post it!!  Maybe it's not the right place to post the
function, if so let me know where it should go.

Before executing the function, you need to insert your point data in the temp
table.  I only tried the function with Quebec Lambert projection.

Feel free (this is essential!) to contribute and optimized the function.

Regards,

Francis Dupont
francis.x.dupont at usherbrooke.ca


/***************

You will need that table to use the function generate_ellipse

CREATE TABLE tmp_data_ellipse
(
  x numeric NOT NULL,
  y numeric NOT NULL,
  id serial NOT NULL,
  CONSTRAINT pkid_tmp_data_ellipse PRIMARY KEY (id)
)
***************/

/*
Function: std_dist_ellipse

    Create a standard distance ellipse, with major and minor axes reflecting the
directional variation of the point pattern store in tmp_data_ellipse table.

Parameters:

    num_of_std :    Number of standard deviation to generate for each group of
point.
                    e.g.:  A value of 2 mean that an ellipse of distance of 1
standard
                    deviation and 2 standard deviation will be created for each
group.

    fix_point :     0 egual no action otherwise 1 mean that you want to fix the
problem
                    of point at the same coordinate.  If, for some reason, all
the points
                    are at the same coordinate the differences between the value
and the mean
                    will be 0.  This will give a division by zero error which
will be catch
                    by the exception handler.  To avoid this problem, the fix
point option
                    let you move the point that have the same coordinates.  Each
of those points
                    are move randomly on a circle of 1 meter radius.

Return:

    Recordset of the ellipses created with the corresponding standard deviation.

*/

CREATE OR REPLACE FUNCTION std_dist_ellipse(num_of_std smallint, fix_point
smallint)
  RETURNS SETOF record AS
$BODY$
DECLARE
    stats record;
    data record;
    sum_square_dif_avg_x numeric;   --SUM [(Xi - avg(X))^2]
    sum_square_dif_avg_y numeric;   --SUM [(Yi - avg(Y))^2]
    sum_dif_avg_x_y numeric;        --SUM [(Xi - avg(X) * (Yi - avg(Y)]
    sum_square_dif_avg_x_y numeric; --SUM [((Xi - avg(X) * (Yi - avg(Y))^2]
    C numeric;                      --Constant used (see standard ellipse
distance reference for more information)
    theta numeric;                  --Angle calculated in rad
    SDx numeric;                    --Standard deviation in X
    SDy numeric;                    --Standard deviation in Y
    SDx_sum_x_y_cos_sin_theta numeric;
    SDy_sum_x_y_sin_cos_theta numeric;
    stdev_rank smallint;            --Rank of the standard deviation being
calculated
    LastX numeric;                  --Value of the last X being verified (use in
fix_point option)
    LastY numeric;                  --Value of the last Y being verified (use in
fix_point option)
    r numeric;                      --radius of the circule (use in fix_point
option)
BEGIN

    /*
        For more information on the formula used in this function, take a look
at the following websites:

        Spatial Analysis Online:   
http://www.spatialanalysisonline.com/output/html/Directionalanalysisofpointdatasets.html
        CrimeStat (chapter 4):      http://www.icpsr.umich.edu/CRIMESTAT/


        A somewhat different form of directional analysis is used within many
packages to summarise the distribution
        of point sets around a mean centre. The average variation in the
distance of points around the mean can be viewed as
        a circle or set of circles at a set of standard distances (like standard
deviations in univariate statistics).
        In addition, separate variations in x- and y-coordinate values may be
used to generate a standard distance
        ellipse, with major and minor axes reflecting the directional variation
of the point pattern.
    */

    --Check if fix point option is activated
    IF (fix_point = 1) THEN

        --Reset value so there's no changed for the first point
        LastX := NULL;
        LastY := NULL;
        r     := 1;     --set a 1 meter radius circle

        FOR data IN SELECT id, x, y FROM tmp_data_ellipse ORDER BY x, y LOOP

            --Check if the last point have the same coordinate as the last one
            --Check null value to avoid calculation for the first point.
            IF ((LastX = data.x) AND (LastY = data.y)) AND ((LastX IS NOT NULL)
AND (LastY IS NOT NULL)) THEN

                theta := random() * 2 * pi();   --angle value in radian between
0 and 2pi

                --Move the point on a circle of r radius and update the point in
the temporary tmp_data_ellipse
                UPDATE tmp_data_ellipse
                SET x = data.x + ((cos(theta))*r),
                    y = data.y + ((sin(theta))*r)
                WHERE id = data.id;

            ELSE
                --Save the last point being processed
                LastX := data.x;
                LastY := data.y;
            END IF;

        END LOOP;


    END IF;

    --Calculate the average of the absolute values of X and Y, the X and Y value
of the centroid in the projection used and the number of points
    SELECT avg(abs(x)) as avg_x, avg(abs(y)) as avg_y, avg(x) as center_x,
avg(y) as center_y, COUNT(*) as n FROM tmp_data_ellipse INTO stats;


    --Reset all var values before calculation
    sum_square_dif_avg_x    := 0;
    sum_square_dif_avg_y    := 0;
    sum_dif_avg_x_y         := 0;
    sum_square_dif_avg_x_y  := 0;

    --Pre-calculate the values used in formula.  It is necessary to used the
absolute values in calculation to be in
    --a trigonometric circle otherwise you'll get the wrong angle when the angle
value calculated is beyond pi.
    --The negative values in X or Y depends on the cartographic projection you
are using.
    FOR data IN SELECT x, y FROM tmp_data_ellipse LOOP

        sum_square_dif_avg_x := sum_square_dif_avg_x + ((abs(data.x) -
stats.avg_x)^2);
        sum_square_dif_avg_y := sum_square_dif_avg_y + ((abs(data.y) -
stats.avg_y)^2);
        sum_dif_avg_x_y := sum_dif_avg_x_y + ((abs(data.x) - stats.avg_x) *
(abs(data.y) - stats.avg_y));
        sum_square_dif_avg_x_y := sum_square_dif_avg_x_y + (((abs(data.x) -
stats.avg_x) * (abs(data.y) - stats.avg_y))^2);

    END LOOP;

    --Calculate the square root of the C value (as in the formula)
    C := sqrt(((sum_square_dif_avg_x - sum_square_dif_avg_y)^2) + (4 *
sum_square_dif_avg_x_y));

    --Calulate the angle of the ellipse
    theta := atan((sum_square_dif_avg_x - sum_square_dif_avg_y + C) / (2 *
sum_dif_avg_x_y));

    --Reset values and calculate the standard deviation in X and Y for the
ellipse
    SDx_sum_x_y_cos_sin_theta := 0;
    SDy_sum_x_y_sin_cos_theta := 0;

    --Use the var center_x and center_y because they are in projection
coordinate values.
    FOR data IN SELECT x, y FROM tmp_data_ellipse LOOP

        SDx_sum_x_y_cos_sin_theta := SDx_sum_x_y_cos_sin_theta + ((((data.x -
stats.center_x) * cos(theta))- ((data.y - stats.center_y) * sin(theta)))^2);
        SDy_sum_x_y_sin_cos_theta := SDy_sum_x_y_sin_cos_theta + ((((data.x -
stats.center_x) * sin(theta))- ((data.y - stats.center_y) * cos(theta)))^2);

    END LOOP;

    --Note, again, that 2 is subtracted from the
    --number of points in both denominators to produce an unbiased estimate of
    --the standard deviational ellipse since there are two constants from which
the
    --distance along each axis is measured (mean of X, mean of Y) (Crimestat,
p.4.19).
    SDx := sqrt((2 * SDx_sum_x_y_cos_sin_theta) / (stats.n - 2));
    SDy := sqrt((2 * SDy_sum_x_y_sin_cos_theta) / (stats.n - 2));

    --Temporary insert for validating the result
    /*
    INSERT INTO tmp_data_ellipse(x, y)
    VALUES(sum_square_dif_avg_x, sum_square_dif_avg_y);
    INSERT INTO tmp_data_ellipse(x, y)
    VALUES(sum_dif_avg_x_y, sum_square_dif_avg_x_y);
    INSERT INTO tmp_data_ellipse(x, y)
    VALUES(C, C);
    INSERT INTO tmp_data_ellipse(x, y)
    VALUES(theta * 180 / pi(), theta);
    INSERT INTO tmp_data_ellipse(x, y)
    VALUES(SDx, SDy);
    */

    --Produce an ellipse for each number of standard deviation.  Start at
num_of_std (usually at 2 standard deviation) downto 1
    stdev_rank := num_of_std;
    WHILE stdev_rank >= 1 LOOP
        SELECT ellipse(stats.center_x, stats.center_y, (stdev_rank * SDx),
(stdev_rank * SDy), theta, 40) as the_geom, stdev_rank INTO data;
        RETURN NEXT data;
        stdev_rank := stdev_rank - 1;   --go to next rank
    END LOOP;

    RETURN;

EXCEPTION
    --Stop the process if a division by zero occured
    WHEN DIVISION_BY_ZERO THEN RAISE EXCEPTION 'error: Division by zero, check
the coordinates of your point dataset.';
END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE



More information about the postgis-users mailing list