multi_locus_analysis.finite_window.munging

multi_locus_analysis.finite_window.munging.discrete_trajectory_to_wait_times(data, t_col='t', state_col='state')

Converts a discrete trajectory to a dataframe containing each wait time, with its start, end, rank order, and the state it’s leaving.

Discrete here means that the state of the system was observed at finite time points (on a lattice in time), as opposed to a system where the exact times of transitions between states are known.

Because a discrete trajectory only bounds the wait times, and does not determine their exact lengths (as a continuous trajectory might), additional columns are included that explicitly bound the wait times, in addition to returning the “natural” estimate.
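To make the bounding concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper name `bound_wait`, its arguments, and the choice of midpoint as the “natural” estimate are all assumptions made for illustration. It assumes a run of identical states first seen at `first_seen` and last seen at `last_seen`, with the neighboring observations (in a different state) at `prev_obs` and `next_obs`.

```python
def bound_wait(prev_obs, first_seen, last_seen, next_obs):
    """Bound the length of a wait seen only at discrete frame times.

    Assumption: the state was entered somewhere in (prev_obs, first_seen]
    and left somewhere in (last_seen, next_obs].
    """
    # Entered as late as possible, left as early as possible.
    min_wait = last_seen - first_seen
    # Entered as early as possible, left as late as possible.
    max_wait = next_obs - prev_obs
    # One plausible "natural" estimate (an assumption): midpoint of the bounds.
    natural = 0.5 * (min_wait + max_wait)
    return natural, min_wait, max_wait


# A state seen at frames 2, 3, 4, with other states seen at frames 1 and 5.
natural, min_wait, max_wait = bound_wait(prev_obs=1, first_seen=2,
                                         last_seen=4, next_obs=5)
```

With evenly spaced frames, the midpoint coincides with the time between the first frame of this run and the first frame of the next one.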

Parameters

data (pd.DataFrame) – should have at least state_col and t_col columns, and already be grouped (e.g. via groupby) so that there’s only one “trajectory” within the DataFrame. One row should correspond to an observation at a particular time point.
t_col (string, default: 't') – the name of the column containing the time of each time point
state_col (string, default: 'state') – the name of the column containing the state for each time point

Returns

wait_df – columns are [‘wait_time’, ‘start_time’, ‘end_time’, ‘state’, ‘wait_type’, ‘min_waits’, ‘max_waits’], where the [wait,end,start]_time columns are self-explanatory, state is the value of the state_col column during that waiting time, and wait_type is one of ‘interior’, ‘left exterior’, ‘right exterior’, ‘full exterior’, depending on what kind of waiting time was observed. See the Notes section below for a detailed explanation of these categories. The ‘min/max_waits’ columns contain the minimum/maximum possible value of the wait time (resp.), given the observations.

The default index is named “rank_order”, since it tracks the order (zero-indexed) in which the wait times occurred.

Return type

pd.DataFrame

Notes

The following types of wait times are of interest to us:

1) interior censored times: whenever you are observing a switching process for a finite amount of time, any waiting time you observe the entirety of is called “interior” censored

2) left exterior censored times: whenever the waiting time you observe started before you began observation (it overlaps the “left” side of your interval of observation)

3) right exterior censored times: same as above, but overlapping the “right” side of your interval of observation.

4) full exterior censored times: whenever you observe the existence of a single, particular state throughout your entire window of observation.
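The four categories above can be sketched as a small classifier. This is an illustrative sketch, not the library's code: the function name `classify_wait` is hypothetical, and treating a wait that starts or ends exactly on a window boundary as not overlapping that boundary is an assumption.

```python
def classify_wait(start, end, window_start, window_end):
    """Label a wait by how it overlaps the observation window."""
    starts_before = start < window_start  # overlaps the "left" side
    ends_after = end > window_end         # overlaps the "right" side
    if starts_before and ends_after:
        return 'full exterior'
    if starts_before:
        return 'left exterior'
    if ends_after:
        return 'right exterior'
    return 'interior'


# Observation window [0, 2] with three consecutive waits.
labels = [classify_wait(s, e, 0, 2)
          for s, e in [(-1.0, 0.5), (0.5, 1.5), (1.5, 3.0)]]
```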

multi_locus_analysis.finite_window.munging.movie_to_waits(*args, **kwargs): Alias of discrete_trajectory_to_wait_times().

multi_locus_analysis.finite_window.munging.sim_to_obs(simulations, traj_cols=['replicate']): Vectorized extracting of “observed” wait times.

multi_locus_analysis.finite_window.munging.simulation_to_frame_times(simulation, t, traj_cols=['replicate'])

Vectorized conversion into frames, using np.searchsorted.

Getting what “frames” each transition happened on is pretty easy. Consider the example trajectory with window_start=1.2, window_end=2.9, and transition times:

>>> transitions = [0.9, 1.5, 1.7, 2.5, 3.1]

with possible frames:

>>> frames = [0, 1, 2, 3, 4, 5, 6]

It may seem that, by definition, we can only observe the transitions between window_start and window_end. This is not actually true here: the stationary, “continuous” Markov renewal process that we’ve generated using the fw.simulation functions is valid from the “true” start of the left exterior time to the “true” end of the right exterior time, even if we normally discard these times for the purposes of simulating censored statistics. So the discrete movie’s window size is just the number of frames that fall in the interval [min(transitions), max(transitions)].

Similarly, if instead window_end=3.05, then nothing actually changes, because either way in a discrete movie we directly observe the state at t=3.

We’ve got nice strategies for extracting the first and last possible frame in a vectorized way, so the algorithm starts by doing that. Then, we discard transition times that are beyond our observation frames in either direction, and searchsort the remaining ones into the frames. In the above, that’s:

>>> t_i = [1, 2, 2, 3, 4]

or equivalently:

>>> start_t_i = [1, 2, 2, 3]

and

>>> end_t_i = [2, 2, 3, 4]
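The indices above can be reproduced with a dependency-free sketch. It uses the standard library's `bisect_left`, which follows the same convention as `np.searchsorted` with its default `side='left'`; the loop form is only for illustration and is not the vectorized code the function itself uses.

```python
from bisect import bisect_left

transitions = [0.9, 1.5, 1.7, 2.5, 3.1]
frames = [0, 1, 2, 3, 4, 5, 6]

# Index of the first frame at or after each transition time.
t_i = [bisect_left(frames, t) for t in transitions]

# Each wait runs from one transition to the next.
start_t_i = t_i[:-1]
end_t_i = t_i[1:]
```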

Now we simply notice that the last start time that searchsorts to being before a given frame is going to dictate the state that we observe at that frame. So if the states after each start_t_i were:

>>> states = ['A', 'B', 'A', 'B']

then we will observe states [‘A’, ‘A’, ‘B’] at frames [1, 2, 3], with no observation at frames 0, 4, 5, or 6.
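A minimal sketch of that lookup, again in plain Python rather than the library's vectorized form: for each frame inside the observed range, find the last start index that is at or before the frame and read off its state. The use of `bisect_right` and the dictionary output are choices made for this illustration.

```python
from bisect import bisect_right

start_t_i = [1, 2, 2, 3]
end_t_i = [2, 2, 3, 4]
states = ['A', 'B', 'A', 'B']

observed = {}
# Frames before the first start or at/after the last end are not observed.
for frame in range(start_t_i[0], end_t_i[-1]):
    # Position of the last start index that is <= frame.
    last_start = bisect_right(start_t_i, frame) - 1
    observed[frame] = states[last_start]
```

Frame 2 shows why the *last* matching start matters: two waits start at frame index 2, and only the later one ('A') is still in effect when the frame is taken.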

Finally, sometimes there will be multiple state switches between two frames, and yet by chance we end up in the same state as when we started. We must combine these.
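One way to combine such repeats is to run-length encode the observed states. The sketch below uses `itertools.groupby` on made-up data; it illustrates the idea and is not taken from the library.

```python
from itertools import groupby

# Hypothetical observations: consecutive frames that show the same state
# (whether or not hidden switches happened in between) belong to one run.
frames = [1, 2, 3, 4, 5]
observed = ['A', 'A', 'B', 'B', 'A']

runs = []
for state, group in groupby(zip(frames, observed), key=lambda fs: fs[1]):
    group = list(group)
    first_frame = group[0][0]
    last_frame = group[-1][0]
    runs.append((state, first_frame, last_frame))
```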

To get the correct output for every frame, we would simply have to notice that sometimes a waiting time observation will cover many frames, and add one last step where we simply fill frames not specifically marked by a start time with whatever the state was at the previous frame. We don’t do this right now because my thesis is due tomorrow.
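The fill step described here could look roughly like the following. This is a sketch of the missing step, not something the function currently does; `forward_fill` and the `marked` mapping are hypothetical names.

```python
def forward_fill(frames, marked):
    """Carry the most recent marked state forward to unmarked frames."""
    filled = {}
    current = None  # no state is known before the first marked frame
    for frame in frames:
        if frame in marked:
            current = marked[frame]
        filled[frame] = current
    return filled


# Only frames 1 and 4 were explicitly marked by a start time.
filled = forward_fill(range(6), {1: 'A', 4: 'B'})
```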

multi_locus_analysis.finite_window.munging.simulations_to_observations(simulations, traj_cols=['replicate']): Vectorized extracting of “observed” wait times.

multi_locus_analysis.finite_window.munging.state_changes_to_movie_frames(traj, times, state_col='state', start_times_col='start_time', end_times_col='end_time', wait_type_col='wait_type')

Convert state changes into discrete-time observations of state.

Converts a set of state change times into a Series containing observations at the times requested. The times become the index.

Parameters

times ((N,) array_like) – times at which to “measure” what state we’re in to make the new trajectories.
traj (pd.DataFrame) – should have state_col and start_times_col columns. The values of state_col will be copied over verbatim.
state_col (str, default: 'state') – name of column containing the state being transitioned out of for each measurement in traj.
start_times_col (str, default: 'start_time') – name of column containing times at which traj changed state
end_times_col ((optional) str) – by default, the function assumes that times after the last provided state transition time are in the same state. If passed, this column is used to determine at what time the last state “finished”. Times after this will be labeled as NaN. Analogously to how start times are treated, if the end time exactly matches the “transition” time, assume this is an “exterior” measurement.

Returns

movie – Series defining the “movie” with frames taken at times that simply measures what state traj is in at each frame. index is times, state_col is used to name the Series.

Return type

pd.Series

Notes

A start time means that if we observe at that time, the state transition will have already happened (right-continuity), except in the case when the transition happens at exactly the window end time. This is confusing in words, but simple to see in an example (see the example below).

Examples

For the DataFrame

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([['A',  -1, 0.1], ['B', 0.1, 1.0]],
...     columns=['state', 'start_time', 'end_time'])

the discretization into tenths of seconds would give

>>> state_changes_to_movie_frames(df, times=np.linspace(0, 1, 11),
...     end_times_col='end_time')
t
0.0    A
0.1    B
0.2    B
0.3    B
0.4    B
0.5    B
0.6    B
0.7    B
0.8    B
0.9    B
1.0    B
Name: state, dtype: object

Notice in particular how at 0.1, the state is already ‘B’. However at time 1.0 the state is not already “unknown”. This is what is meant by the Notes section above.
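The convention can also be spelled out in a few lines of plain Python. This is a sketch of the rule as stated in the Notes, not the function's actual code; `state_at` and its arguments are hypothetical, and `None` stands in for NaN.

```python
from bisect import bisect_right


def state_at(t, start_times, states, last_end):
    """State seen at time t under the right-continuity convention."""
    if t < start_times[0] or t > last_end:
        return None  # outside the known trajectory
    # A transition exactly at t has already happened, so take the last
    # start time that is <= t.
    return states[bisect_right(start_times, t) - 1]


start_times = [-1, 0.1]
states = ['A', 'B']
movie = [state_at(k / 10, start_times, states, last_end=1.0)
         for k in range(12)]
```

Here `movie[1]` (time 0.1) is already 'B', `movie[10]` (time 1.0, exactly the end time) is still 'B', and only `movie[11]` (time 1.1) is unknown.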

If the end_times_col argument is omitted, then the last observed state is assumed to continue for all times requested from then on:

>>> state_changes_to_movie_frames(df, times=np.linspace(0, 2, 11))
t
0.0    A
0.2    B
0.4    B
0.6    B
0.8    B
1.0    B
1.2    B
1.4    B
1.6    B
1.8    B
2.0    B
Name: state, dtype: object

multi_locus_analysis.finite_window.munging.state_changes_to_wait_times(traj)

DEPRECATED: in favor of simulations_to_observations().

Converts the output of ab_window_fast() into a pd.DataFrame containing each wait time, with its start, end, rank order, and the state it’s leaving.

This function deals with “continuous” wait times, as in not measured at discrete time points (on a grid), so the wait times it returns are exact.

multi_locus_analysis.finite_window.munging.traj_to_movie(*args, **kwargs): Alias of state_changes_to_movie_frames().

multi_locus_analysis.finite_window.munging.traj_to_waits(*args, **kwargs): Alias of state_changes_to_wait_times().