[pgrouting-dev] Time dependent data and input format

Fri Jul 29 13:30:37 EDT 2011

On Fri, Jul 29, 2011 at 10:50 PM, Stephen Woodbridge <
woodbri at swoodbridge.com> wrote:

> On 7/29/2011 1:03 PM, Jay Mahadeokar wrote:
>
>>
>>
>> On Fri, Jul 29, 2011 at 9:54 PM, Stephen Woodbridge
>> <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.**com<woodbri at swoodbridge.com>>>
>> wrote:
>>
>>    On 7/29/2011 11:49 AM, Jay Mahadeokar wrote:
>>
>>        Hi Steve,
>>
>>        On Fri, Jul 29, 2011 at 6:26 PM, Stephen Woodbridge
>>        <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.**com<woodbri at swoodbridge.com>
>> >
>>        <mailto:woodbri at swoodbridge. com
>>        <mailto:woodbri at swoodbridge.**com <woodbri at swoodbridge.com>>>>
>> wrote:
>>
>>            Jay,
>>
>>            This make sense. What I'm trying to understand and get to is
>>        how do
>>            we make this easy to setup and use. It seems like we are
>> missing
>>            something somewhere. For example, the cyclical range of a
>> single
>>            entry can be different from other entries, think week days
>>        versus
>>            weekends. It should be possible to have some cyclic entries
>>        and some
>>            not cyclic. Based on this it seems that the cyclic flag
>>        needs to be
>>            set on each record along with the period of the cycle. Would
>>        you agree?
>>
>>
>>        The main motivation behind all this brain storming is to keep
>>        the core
>>        algorithm and the postgres c function implementation,
>>        independent of the
>>        nature of data, right? So, we need to somehow abstract the data
>>        dependent properties from the core functionality.
>>
>>        That was the reason why we came up with the plsql wrapper functions
>>        idea. For optimising again, we need to communicate the cyclic
>>        properties
>>        to core algorithm, but the interface should be independent of
>>        nature of
>>        data.
>>
>>            If this is the case, then a quick scan of the data would
>>        tell us if
>>            there are cyclic entries, in which case a simple function to
>>        scan
>>            the data needed and see if any entry is cyclic could return
>> that
>>            flag. But if every entry is marked as cyclic or not then you
>>        might
>>            not need that flag at all.
>>
>>
>>        What I feel is: if we scan the data and keep the logic of
>>        determining
>>        the cyclic nature within the postgres C function, then the above
>> aim
>>        will somehow get lost and it will again become a data format
>>        specific
>>        function.
>>
>>        So, I think it will be better if the application developers deal
>>        with
>>        the data themselves so that they can achieve the optimisation.
>>        Example -
>>        for weekday, weekend, the plsql function will provide specific
>>        values
>>        for each day and mark data as non cyclic if there is transition
>> from
>>        weekday to weekend and vice versa. (A journey more than few days is
>>        highly unlikely, so anyways it wont repeat the data too much,
>>        though it
>>        will not be optimal)
>>
>>        If the journey is within weekday or within weekend, then we can
>> mark
>>        data as cyclic, and just retrieve 24 hour data.
>>
>>        This is specific to example data you specified. I believe such
>>        improvisations will be applicable for most data formats. Thoughts?
>>
>>        Again, this is not the most optimal implementation, but still it
>>        keeps
>>        the core implementation independent of data format. Any other
>> ideas?
>>
>>
>>    Jay,
>>
>>    I think you are correct in this. This will require some real world
>>    examples of data and how to work with it. The abstraction makes
>>    sense but it is hard to map it to reality hence our discussions.
>>
>>    So if we propose a hypothetical example like:
>>
>>    1. roadway which it one way between 7-10am north bound on mon-fri
>>    2. same roadway is one way between 4-7pm south bound on mon-fri
>>    3. same roadway is two otherwise
>>
>>    A separate type of rule might be:
>>
>>    4. there was an accident at 11am causing reduced speeds for the next
>>    30 mins
>>
>>    we could define a table like:
>>
>>    time_dependency
>>    rule_id:seq:day_of_week:start_ time:end_time:from_cost:to_
>>    cost:expiry_date
>>    1:1:2,3,4,5,6:0700:1000:30:-1:
>>    1:2:2,3,4,5,6:1600:1900:-1:30:
>>    1:3:1,7:0000:2400:45:45:
>>    2:1:0:1100:1130:20:20:20110729
>>
>>    I hope that is clear.
>>    sun-sat === 1-7 for day_of_week
>>    cost -1 means you can travel that direction
>>
>>    then in the edges table, you can link the edge to time_dependency
>>    table via the rule_id
>>
>>    This is not perfect because, from_cost and to_cost imply a direction
>>    of digitization of the segment which might be reversed on some
>>    segments. You might have multiple rules associated with a single
>>    segment. I'm thinking of a "rule" as a single description of some
>>    conditions that might get reused by multiple edges in the graph. So
>>    we might want a linking table like:
>>
>>    edge_id:rule_id
>>
>>    that will support multiple rules to be associated.
>>
>>    Then the stored procedure could be written to query these tables
>>    based on start time, elapsed time from start, and edge_id that could
>>    return the cost of a single edge. Or you would need to write a
>>    stored procedure to transform these tables into whatever structure
>>    you need for the routing.
>>
>>
>> Hi Steve,
>>
>> Thanks for coming up with the example. It fits well in order to describe
>> our current plan. So, as proposed we will assume our data is in format
>> (instead of creating ids and seq separately im keeping gid as the id
>> like in ways table):
>>
>> gid: day_of_week: start_time: end_time: from_cost: to_cost: expiry_date
>>
>> Now, the application developer dealing with the data will write plsql
>> function which will query the above table and return set of rows:
>>
>> gid, start_time, end_time, travel_time  as expected by the tdsp query.
>>
>> Now, if the data format is such that there are rules associated to
>> specific edges, and there are linking tables etc as you mentioned, then
>> it would be the responsibility of the application developer to come up
>> with join queries etc that will be needed in the plsql wrapper function.
>> We will just expect data in the above format. And we can also keep track
>> of cyclic nature as discussed earlier.  There can be many other formats
>> and many other linking table possibilities, but we should not be
>> responsible for those. That was the reason of deciding the input data
>> format during the pre-coding period.
>>
>> I hope I am understanding your point and answering relevantly.  Thanks
>> again for coming up with the users perspective. This is really helping
>> me to understand the real-world scenario which I had not thought about
>> particularly. Any other comments?
>>
>
> I think you have it now and I understand it also. The application developer
> can have whatever structure they need to represent the data they have and
> link it to the edges, and then they have to write a procedure that will
> transform that into the record set you defined above that the engine needs.
>
> In the documentation, you can copy my example and/or simplify it a little.
> It is fine to discuss points that they need to consider like direction of
> digitization or how to handle transient one way streets, because these are
> things that people might need to deal with and it gives them an idea of how
> to deal with other data problems that they might be confronted with.
>
>
Sure! I will do that in last two weeks I think. If you can list such points
that we should document, I can come up with details of mapping each scenario
to the required format.

> It would be nice if we could get some real time dependent data and work
> through building a sample application the uses it. Maybe we will have time
> after GSoC to do that.
>

I have been trying to get real world time dependent data since a long time
now. Still I have not found a good data source. It would be great if we can
acquire that, and yes, we can do that after GSoc too!

>
> Thanks,
>  -Steve
>
>     I just trying to put this into a concrete example so we can work
>>    through how this works and to help understand how we need to
>>    document it so people can use it in a real world example.
>>
>>    Thoughts?
>>
>>    -Steve
>>
>>            May be I'm missing something because I have not had a full
>>        cup of
>>            tea yet :)
>>
>>            Thoughts,
>>              -Steve
>>
>>
>>            On 7/29/2011 8:39 AM, Jay Mahadeokar wrote:
>>
>>
>>
>>                On Fri, Jul 29, 2011 at 5:50 PM, Stephen Woodbridge
>>        <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.**com<woodbri at swoodbridge.com>
>> >
>>        <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.**com<woodbri at swoodbridge.com>
>> >>
>>        <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>>
>>
>>        <mailto:woodbri at swoodbridge. com
>>        <mailto:woodbri at swoodbridge.**com <woodbri at swoodbridge.com>>>>>
>> wrote:
>>
>>                    On 7/29/2011 5:49 AM, Jay Mahadeokar wrote:
>>
>>                        Hi all,
>>
>>                        I was looking at ways to optimise the tdsp
>>        implementation.
>>
>>                        Currently, if the time-interval is suppose 24, (24
>>                hours) and
>>                        the travel
>>                        time expected is around 60 hours, then the
>>        tdsp-wrapper
>>                query
>>                        retrieves
>>                        total 60 intervals and the data is repeated for
>>        the 24 hour
>>                        intervals.
>>                        (I hope what i am trying to say is clear).
>>
>>                        This is unnecessary space wastage. So, I edited the
>>                weight_map
>>                        class to
>>                        accommodate a variables is_cyclic and the
>>                cycle_interval, which will
>>                        keep track of the cyclic nature of the data. So,
>>                actually only
>>                        the 24
>>                        hour data will be fed to weight map and then it
>>        will be
>>                reused
>>                        in cycles.
>>
>>                        Because of the good design, I guess this has
>>        scaled up
>>                really
>>                        nice and
>>                        there is no need to modify actual core tdsp.
>>        Just the
>>                weight_map's
>>                        get_travel_time() function and the wrapper plsql
>>                function needed
>>                        to be
>>                        altered.
>>
>>
>>                    This makes perfect sense and sounds like a great
>>        optimization.
>>
>>
>>                        Now, in the main query, I am thinking of adding
>>        one more
>>                boolean
>>                        variable - is_cyclic which will be enabled if
>>        the data
>>                is cyclic in
>>                        nature.  Should we keep this? Or we should
>>        assume that
>>                the data
>>                        will be
>>                        always cyclic.
>>                        Any other views on the same are welcome.
>>
>>
>>                    I guess my questions are:
>>                    How does this get set?
>>                    What are the advantages of setting it?
>>                    Is there a way to dynamically check if it is cyclic
>>        at the
>>                start and
>>                    avoid having the app builder set it up?
>>
>>
>>                Advantages of setting it:
>>                If the data is not cyclic, then the whole data will be
>>        expected to
>>                reside in database itself. So, if the upper bound of the
>>        time is
>>                reached
>>                then by default, we can assume that the travel_time
>>                corresponding to the
>>                last existing time window should be used for such cases.
>>
>>                If data is cyclic, then we need to start all over again.
>>          These
>>                are the
>>                cases, what I could think of.
>>
>>                How does this get set:
>>                The tdsp query must have an additional parameter -
>>        is_cyclic which
>>                should be used to set the is_cyclic flag of the weight_map.
>>
>>                Now, if data is cyclic, then the plsql function that
>>        retrieves
>>                the data
>>                should be written in such a way that if interval is
>>        suppose 0 -
>>                24, and
>>                start time is 21, then  21->0 and 20->23. After that the
>>        cyclic flag
>>                will indicate us to loop again.  The plsql function will be
>>                generally
>>                written by app developer and he should take care of
>>        this. (I have
>>                already implemented this in my code.)
>>
>>                If data is cyclic, he should pass the flag appropriately.
>> If
>>                is_cyclic
>>                is false, we can assume that if intervals are exhausted,
>>        we will
>>                keep
>>                using time corresponding to last interval.
>>
>>                Is this reasonable?
>>
>>                    I suppose it is not a big deal for the app builder
>>        to set
>>                this flag
>>                    since he will know if the data loaded is cyclic in
>>        nature. I
>>                assume
>>                    you are only looking for some indication if there
>>        are cyclic
>>                entries
>>                    in the data and not if this specific query has a cyclic
>>                nature which
>>                    would be impossible to know without analyzing the
>>        start time
>>                and max
>>                    time window, in which case we should figure that out
>>                automatically
>>                    because the user making the request is not likely to
>>        know
>>                anything
>>                    more than start, end and start time or end time.
>>
>>                    -Steve
>>
>>                --
>>                Regards,
>>                -Jay Mahadeokar
>>
> ______________________________**_________________
> pgrouting-dev mailing list
> pgrouting-dev at lists.osgeo.org
> http://lists.osgeo.org/**mailman/listinfo/pgrouting-dev<http://lists.osgeo.org/mailman/listinfo/pgrouting-dev>
>

-- 
Regards,
-Jay Mahadeokar
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.osgeo.org/pipermail/pgrouting-dev/attachments/20110729/c7e58d75/attachment-0001.html