[pgrouting-dev] Time dependent data and input format

Sat Jul 30 16:00:34 EDT 2011

On 7/30/2011 2:44 PM, Jay Mahadeokar wrote:
>
>
> On Sat, Jul 30, 2011 at 11:22 PM, Stephen Woodbridge
> <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>> wrote:
>
>     Jay,
>
>     After reading your report, I'm still struggling with the is_cyclic
>     flag. I think the reason is because this seems to be a global flag
>     and the fact is that some records might be cyclic and some might not
>     be cyclic so the flag needs to be set at the record level not as a
>     global. If we set the is_cyclic flag and have two time records on is
>     cyclic and the other is not, how do we know which is which?
>
>     tdsp expects a table of:
>
>
>     gid, start_time, end_time, travel_time
>
>     So how do I translate my example table:
>
>
>        time_dependency
>        rule_id:seq:day_of_week:start_ time:end_time:from_cost:to_
>        cost:expiry_date
>        1:1:2,3,4,5,6:0700:1000:30:-1:
>        1:2:2,3,4,5,6:1600:1900:-1:30:
>        1:3:1,7:0000:2400:45:45:
>        2:1:0:1100:1130:20:20:20110729
>
>     into this? Note: rule_id=1 is cyclic, rule_id=2 is not.
>
>     For example lets say the route start 20110729 (friday) 6AM and ends
>     20110801 (monday) 8PM. What would be the records we would need to
>     return from the above table? For simplicity, lets assume rule_id is
>     the gid.
>
>
> Hi Steve,
>
> The is_cyclic flag will be returned by the plsql query that generates
> the time dependent data. This will be done according to the nature of
> the data, start-time, end-time etc and the logic for that will be in the
> plsql function.
>
> The example you gave, for route start friday to monday, the plsql
> function should return the data as not cyclic and give the specific time
> intervals from start to end. I think this is
>
> If query is from say monday to thursday, then the cyclic flag can be set
> to true and data corresponding to the 24 hour interval only can  be
> returned. If there is transition from weekday to weekend and vice versa
> then data should be marked as not cyclic (which we discussed yesterday).
>
> I am realising that this may not be the most optimal way, but I think
> this optimisation is still just a luxury, since usually travel times
> won't last more than a week in real world scenarios :).
>
> On second thought, Lets see if we keep the is_cyclic flag with each time
> interval as you are suggesting.
>
> So, suppose the wrapper plsql is now returning -
>
> gid, start_time, end_time, travel_time, is_cyclic
>
> Now, if data is not cyclic from 11 to 11:30 on 20110729, then we will
> have to take that for just one time and if it again cycles, we will have
> to take alternate value for same time next week.
> If some interval is not cyclic, I think it will be included only in the
> first cycle, as per your example I am not sure if this property is
> generic, but I think it might be.
>
> The logic to check all this would have to be within the weight_map
> class.  I can easily detail out algorithm for solving your example. But,
> for each cyclic interval information like how many times it will repeat,
> what will be the ordering etc should be provided to weight_map class.
>
> Ex - intervals corresp to weekdays repeat 5 times, intervals corresp to
> weekend repeat 2 times, while there is 1 non repeating interval.
> These intricacies will be more if we imagine a more complex data
> format.  Since all times are offset from start_time, all this
> information will have to be delivered in some standard format which
> should not be example specific.
>
> I think these features need a detailed understanding and discussion. We
> can surely continue to brainstorm over the issues. But I am not sure if
> we can implement this within GSoc timeline. So, maybe we can put that in
> TDSP Version 1.1?

Jay,

Ok, sounds like the optimization is a little premature but otherwise a 
good idea. I think that after GSoC, we need to look at time dependent 
handling and see if we can come up with something general that can be 
used for this code and for multimodal scheduling. It would be nice to 
have a general facility that any routing code that needs time dependent 
data can use and easily plug into. I think the best way to do this is to 
start with use cases for:

1. time dependent traffic data and events
2. mass transit bus, train, subway schedules
3. other specific use cases

Then from these, abstract the table definitions that can hold this data 
in a format that is easy to enter, edit, and manage. Then look at how we 
transform that data into a form that a specific query needs to solve a 
given problem.

And finally, we should be able to retrofit the existing code to use 
these facilities.

Regards your project, I would leave things as is and move forward. In 
your final report you should have a discussion of things we learned and 
future directions that should identify the above as one of those.

I'm open to other ideas or expanding on this if people have any 
additional thoughts.

Thanks,
-Steve

>     Also I just noticed that you assume the edges are not directed so
>     the cost is the same traversing from A-B and B-A. Actually, that is
>     probably not true, if gid is the directed edge and not the GIS edge
>     which might get entered twice in a directed graph. This is another
>     issue that the app developer needs to think about because GIS data
>     has a single edge with attributes that then might generate two graph
>     edges and then need to map time depend GIS edges back to graph
>     edges. Oh what a tangles web we weave :)
>
>
> Actually I assume that edges are directed.
> https://github.com/pgRouting/pgrouting/wiki/TDSP-Details. Refer section 3.
>
>
>     -Steve
>
>
>     On 7/29/2011 1:30 PM, Jay Mahadeokar wrote:
>
>
>
>         On Fri, Jul 29, 2011 at 10:50 PM, Stephen Woodbridge
>         <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>
>         <mailto:woodbri at swoodbridge. com
>         <mailto:woodbri at swoodbridge.com>>> wrote:
>
>             On 7/29/2011 1:03 PM, Jay Mahadeokar wrote:
>
>
>
>                 On Fri, Jul 29, 2011 at 9:54 PM, Stephen Woodbridge
>         <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com
>         <mailto:woodbri at swoodbridge.com>>>> wrote:
>
>                     On 7/29/2011 11:49 AM, Jay Mahadeokar wrote:
>
>                         Hi Steve,
>
>                         On Fri, Jul 29, 2011 at 6:26 PM, Stephen Woodbridge
>         <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>>. com
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com
>         <mailto:woodbri at swoodbridge.com>>>>> wrote:
>
>                             Jay,
>
>                             This make sense. What I'm trying to
>         understand and
>                 get to is
>                         how do
>                             we make this easy to setup and use. It seems
>         like we
>                 are missing
>                             something somewhere. For example, the
>         cyclical range
>                 of a single
>                             entry can be different from other entries, think
>                 week days
>                         versus
>                             weekends. It should be possible to have some
>         cyclic
>                 entries
>                         and some
>                             not cyclic. Based on this it seems that the
>         cyclic flag
>                         needs to be
>                             set on each record along with the period of the
>                 cycle. Would
>                         you agree?
>
>
>                         The main motivation behind all this brain
>         storming is to
>                 keep
>                         the core
>                         algorithm and the postgres c function
>         implementation,
>                         independent of the
>                         nature of data, right? So, we need to somehow
>         abstract
>                 the data
>                         dependent properties from the core functionality.
>
>                         That was the reason why we came up with the plsql
>                 wrapper functions
>                         idea. For optimising again, we need to
>         communicate the
>                 cyclic
>                         properties
>                         to core algorithm, but the interface should be
>                 independent of
>                         nature of
>                         data.
>
>                             If this is the case, then a quick scan of
>         the data would
>                         tell us if
>                             there are cyclic entries, in which case a simple
>                 function to
>                         scan
>                             the data needed and see if any entry is
>         cyclic could
>                 return that
>                             flag. But if every entry is marked as cyclic
>         or not
>                 then you
>                         might
>                             not need that flag at all.
>
>
>                         What I feel is: if we scan the data and keep the
>         logic of
>                         determining
>                         the cyclic nature within the postgres C
>         function, then
>                 the above aim
>                         will somehow get lost and it will again become a
>         data format
>                         specific
>                         function.
>
>                         So, I think it will be better if the application
>                 developers deal
>                         with
>                         the data themselves so that they can achieve the
>                 optimisation.
>                         Example -
>                         for weekday, weekend, the plsql function will
>         provide
>                 specific
>                         values
>                         for each day and mark data as non cyclic if there is
>                 transition from
>                         weekday to weekend and vice versa. (A journey
>         more than
>                 few days is
>                         highly unlikely, so anyways it wont repeat the
>         data too
>                 much,
>                         though it
>                         will not be optimal)
>
>                         If the journey is within weekday or within
>         weekend, then
>                 we can mark
>                         data as cyclic, and just retrieve 24 hour data.
>
>                         This is specific to example data you specified. I
>                 believe such
>                         improvisations will be applicable for most data
>         formats.
>                 Thoughts?
>
>                         Again, this is not the most optimal
>         implementation, but
>                 still it
>                         keeps
>                         the core implementation independent of data
>         format. Any
>                 other ideas?
>
>
>                     Jay,
>
>                     I think you are correct in this. This will require
>         some real
>                 world
>                     examples of data and how to work with it. The
>         abstraction makes
>                     sense but it is hard to map it to reality hence our
>         discussions.
>
>                     So if we propose a hypothetical example like:
>
>                     1. roadway which it one way between 7-10am north
>         bound on
>                 mon-fri
>                     2. same roadway is one way between 4-7pm south bound
>         on mon-fri
>                     3. same roadway is two otherwise
>
>                     A separate type of rule might be:
>
>                     4. there was an accident at 11am causing reduced
>         speeds for
>                 the next
>                     30 mins
>
>                     we could define a table like:
>
>                     time_dependency
>                     rule_id:seq:day_of_week:start_
>         time:end_time:from_cost:to_
>                     cost:expiry_date
>                     1:1:2,3,4,5,6:0700:1000:30:-1:
>                     1:2:2,3,4,5,6:1600:1900:-1:30:
>                     1:3:1,7:0000:2400:45:45:
>                     2:1:0:1100:1130:20:20:20110729
>
>                     I hope that is clear.
>                     sun-sat === 1-7 for day_of_week
>                     cost -1 means you can travel that direction
>
>                     then in the edges table, you can link the edge to
>                 time_dependency
>                     table via the rule_id
>
>                     This is not perfect because, from_cost and to_cost
>         imply a
>                 direction
>                     of digitization of the segment which might be
>         reversed on some
>                     segments. You might have multiple rules associated
>         with a single
>                     segment. I'm thinking of a "rule" as a single
>         description of
>                 some
>                     conditions that might get reused by multiple edges
>         in the
>                 graph. So
>                     we might want a linking table like:
>
>                     edge_id:rule_id
>
>                     that will support multiple rules to be associated.
>
>                     Then the stored procedure could be written to query
>         these tables
>                     based on start time, elapsed time from start, and
>         edge_id
>                 that could
>                     return the cost of a single edge. Or you would need
>         to write a
>                     stored procedure to transform these tables into whatever
>                 structure
>                     you need for the routing.
>
>
>                 Hi Steve,
>
>                 Thanks for coming up with the example. It fits well in
>         order to
>                 describe
>                 our current plan. So, as proposed we will assume our
>         data is in
>                 format
>                 (instead of creating ids and seq separately im keeping
>         gid as the id
>                 like in ways table):
>
>                 gid: day_of_week: start_time: end_time: from_cost: to_cost:
>                 expiry_date
>
>                 Now, the application developer dealing with the data
>         will write
>                 plsql
>                 function which will query the above table and return set
>         of rows:
>
>                 gid, start_time, end_time, travel_time  as expected by
>         the tdsp
>                 query.
>
>                 Now, if the data format is such that there are rules
>         associated to
>                 specific edges, and there are linking tables etc as you
>                 mentioned, then
>                 it would be the responsibility of the application
>         developer to
>                 come up
>                 with join queries etc that will be needed in the plsql
>         wrapper
>                 function.
>                 We will just expect data in the above format. And we can
>         also
>                 keep track
>                 of cyclic nature as discussed earlier.  There can be
>         many other
>                 formats
>                 and many other linking table possibilities, but we
>         should not be
>                 responsible for those. That was the reason of deciding
>         the input
>                 data
>                 format during the pre-coding period.
>
>                 I hope I am understanding your point and answering
>         relevantly.
>                   Thanks
>                 again for coming up with the users perspective. This is
>         really
>                 helping
>                 me to understand the real-world scenario which I had not
>         thought
>                 about
>                 particularly. Any other comments?
>
>
>             I think you have it now and I understand it also. The
>         application
>             developer can have whatever structure they need to represent the
>             data they have and link it to the edges, and then they have
>         to write
>             a procedure that will transform that into the record set you
>         defined
>             above that the engine needs.
>
>             In the documentation, you can copy my example and/or
>         simplify it a
>             little. It is fine to discuss points that they need to
>         consider like
>             direction of digitization or how to handle transient one way
>             streets, because these are things that people might need to deal
>             with and it gives them an idea of how to deal with other data
>             problems that they might be confronted with.
>
>
>         Sure! I will do that in last two weeks I think. If you can list such
>         points that we should document, I can come up with details of
>         mapping
>         each scenario to the required format.
>
>             It would be nice if we could get some real time dependent
>         data and
>             work through building a sample application the uses it. Maybe we
>             will have time after GSoC to do that.
>
>
>         I have been trying to get real world time dependent data since a
>         long
>         time now. Still I have not found a good data source. It would be
>         great
>         if we can acquire that, and yes, we can do that after GSoc too!
>
>
>             Thanks,
>               -Steve
>
>                     I just trying to put this into a concrete example so
>         we can work
>                     through how this works and to help understand how we
>         need to
>                     document it so people can use it in a real world
>         example.
>
>                     Thoughts?
>
>                     -Steve
>
>                             May be I'm missing something because I have
>         not had
>                 a full
>                         cup of
>                             tea yet :)
>
>                             Thoughts,
>                               -Steve
>
>
>                             On 7/29/2011 8:39 AM, Jay Mahadeokar wrote:
>
>
>
>                                 On Fri, Jul 29, 2011 at 5:50 PM, Stephen
>         Woodbridge
>         <woodbri at swoodbridge.com <mailto:woodbri at swoodbridge.com>
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>>. com
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com <mailto:woodbri at swoodbridge.com>>>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>>>. com
>
>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>>. com
>         <mailto:woodbri at swoodbridge <mailto:woodbri at swoodbridge>. com
>         <mailto:woodbri at swoodbridge. com
>         <mailto:woodbri at swoodbridge.com>>>>>> wrote:
>
>                                     On 7/29/2011 5:49 AM, Jay Mahadeokar
>         wrote:
>
>                                         Hi all,
>
>                                         I was looking at ways to
>         optimise the tdsp
>                         implementation.
>
>                                         Currently, if the time-interval is
>                 suppose 24, (24
>                                 hours) and
>                                         the travel
>                                         time expected is around 60
>         hours, then the
>                         tdsp-wrapper
>                                 query
>                                         retrieves
>                                         total 60 intervals and the data is
>                 repeated for
>                         the 24 hour
>                                         intervals.
>                                         (I hope what i am trying to say
>         is clear).
>
>                                         This is unnecessary space
>         wastage. So, I
>                 edited the
>                                 weight_map
>                                         class to
>                                         accommodate a variables
>         is_cyclic and the
>                                 cycle_interval, which will
>                                         keep track of the cyclic nature
>         of the
>                 data. So,
>                                 actually only
>                                         the 24
>                                         hour data will be fed to weight
>         map and
>                 then it
>                         will be
>                                 reused
>                                         in cycles.
>
>                                         Because of the good design, I
>         guess this has
>                         scaled up
>                                 really
>                                         nice and
>                                         there is no need to modify
>         actual core tdsp.
>                         Just the
>                                 weight_map's
>                                         get_travel_time() function and the
>                 wrapper plsql
>                                 function needed
>                                         to be
>                                         altered.
>
>
>                                     This makes perfect sense and sounds
>         like a great
>                         optimization.
>
>
>                                         Now, in the main query, I am
>         thinking of
>                 adding
>                         one more
>                                 boolean
>                                         variable - is_cyclic which will be
>                 enabled if
>                         the data
>                                 is cyclic in
>                                         nature.  Should we keep this? Or
>         we should
>                         assume that
>                                 the data
>                                         will be
>                                         always cyclic.
>                                         Any other views on the same are
>         welcome.
>
>
>                                     I guess my questions are:
>                                     How does this get set?
>                                     What are the advantages of setting it?
>                                     Is there a way to dynamically check
>         if it is
>                 cyclic
>                         at the
>                                 start and
>                                     avoid having the app builder set it up?
>
>
>                                 Advantages of setting it:
>                                 If the data is not cyclic, then the
>         whole data
>                 will be
>                         expected to
>                                 reside in database itself. So, if the upper
>                 bound of the
>                         time is
>                                 reached
>                                 then by default, we can assume that the
>         travel_time
>                                 corresponding to the
>                                 last existing time window should be used for
>                 such cases.
>
>                                 If data is cyclic, then we need to start all
>                 over again.
>                           These
>                                 are the
>                                 cases, what I could think of.
>
>                                 How does this get set:
>                                 The tdsp query must have an additional
>         parameter -
>                         is_cyclic which
>                                 should be used to set the is_cyclic flag
>         of the
>                 weight_map.
>
>                                 Now, if data is cyclic, then the plsql
>         function that
>                         retrieves
>                                 the data
>                                 should be written in such a way that if
>         interval is
>                         suppose 0 -
>                                 24, and
>                                 start time is 21, then  21->0 and
>         20->23. After
>                 that the
>                         cyclic flag
>                                 will indicate us to loop again.  The plsql
>                 function will be
>                                 generally
>                                 written by app developer and he should
>         take care of
>                         this. (I have
>                                 already implemented this in my code.)
>
>                                 If data is cyclic, he should pass the flag
>                 appropriately. If
>                                 is_cyclic
>                                 is false, we can assume that if
>         intervals are
>                 exhausted,
>                         we will
>                                 keep
>                                 using time corresponding to last interval.
>
>                                 Is this reasonable?
>
>                                     I suppose it is not a big deal for
>         the app
>                 builder
>                         to set
>                                 this flag
>                                     since he will know if the data loaded is
>                 cyclic in
>                         nature. I
>                                 assume
>                                     you are only looking for some
>         indication if
>                 there
>                         are cyclic
>                                 entries
>                                     in the data and not if this specific
>         query
>                 has a cyclic
>                                 nature which
>                                     would be impossible to know without
>                 analyzing the
>                         start time
>                                 and max
>                                     time window, in which case we should
>         figure
>                 that out
>                                 automatically
>                                     because the user making the request
>         is not
>                 likely to
>                         know
>                                 anything
>                                     more than start, end and start time
>         or end time.
>
>                                     -Steve
>
>                                 --
>                                 Regards,
>                                 -Jay Mahadeokar
>
>             ______________________________ _________________
>             pgrouting-dev mailing list
>         pgrouting-dev at lists.osgeo.org
>         <mailto:pgrouting-dev at lists.osgeo.org>
>         <mailto:pgrouting-dev at lists. osgeo.org
>         <mailto:pgrouting-dev at lists.osgeo.org>>
>
>         http://lists.osgeo.org/ mailman/listinfo/pgrouting-dev
>         <http://lists.osgeo.org/ mailman/listinfo/pgrouting-dev
>         <http://lists.osgeo.org/mailman/listinfo/pgrouting-dev> >
>
>
>
>
>         --
>         Regards,
>         -Jay Mahadeokar
>
>
>
>         ______________________________ _________________
>         pgrouting-dev mailing list
>         pgrouting-dev at lists.osgeo.org <mailto:pgrouting-dev at lists.osgeo.org>
>         http://lists.osgeo.org/ mailman/listinfo/pgrouting-dev
>         <http://lists.osgeo.org/mailman/listinfo/pgrouting-dev>
>
>
>     ______________________________ _________________
>     pgrouting-dev mailing list
>     pgrouting-dev at lists.osgeo.org <mailto:pgrouting-dev at lists.osgeo.org>
>     http://lists.osgeo.org/ mailman/listinfo/pgrouting-dev
>     <http://lists.osgeo.org/mailman/listinfo/pgrouting-dev>
>
>
>
>
> --
> Regards,
> -Jay Mahadeokar
>
>
>
> _______________________________________________
> pgrouting-dev mailing list
> pgrouting-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/pgrouting-dev