multi_locus_analysis.dataframes

Utilities for massaging DataFrames

multi_locus_analysis.dataframes.array_from_numpy_string(s)

Deserialize a NumPy array from the string that DataFrame.to_csv writes for it.

Notes

As of 2019-03-13, it looks like the elements of a NumPy array are printed separated by two spaces rather than by commas. This function removes all other whitespace, replaces the separating spaces with commas, and loads the resulting string as JSON.
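The approach described in the note can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name parse_numpy_string is hypothetical, and the input strings are hand-written examples of the space-separated format rather than output captured from a particular NumPy version.

```python
import json
import re

import numpy as np


def parse_numpy_string(s):
    """Hypothetical helper: turn a string like '[1.6  2.4  3.1]' into an array."""
    # Drop whitespace just inside the brackets, e.g. '[ 1.6' or '3.1 ]'.
    s = re.sub(r"\[\s+", "[", s.strip())
    s = re.sub(r"\s+\]", "]", s)
    # Every remaining run of whitespace now separates two elements (or two rows).
    s = re.sub(r"\s+", ",", s)
    return np.array(json.loads(s))


flat = parse_numpy_string("[1.6  2.4  3.1]")
nested = parse_numpy_string("[[1 2]\n [3 4]]")
```

One limitation of this sketch: NumPy prints whole-number floats with a trailing dot (for example "3."), which is not valid JSON, so such strings would need extra handling before json.loads accepts them.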

multi_locus_analysis.dataframes.pivot_loci(df, pivot_cols=['x', 'y', 'z'], spot_col='spot')

Pivot a DataFrame between its “long” and “condensed” (short) forms with respect to the spot id column.

Simply put, we want to be able to transform between the following two dataframes:

Condensed form
                                           X1   Y1   Z1   X2   Y2   Z2     t foci
locus genotype exp.rep meiosis cell frame
HET5  WT       2       t0      1    1      1.6  2.4  3.1  2.1  1.5  3.1    0  unp
                                    2      1.9  1.5  3.1  1.9  2.5  3.1   30  unp
                                    3      2.0  1.8  3.0  1.5  2.5  3.4   60  unp
                                    4      2.1  1.9  3.0  1.4  2.2  3.4   90  unp
                                    5      2.2  1.8  3.0  1.5  2.4  3.4  120  unp

and

"Long" form
                                                X    Y    Z      t foci
locus genotype exp.rep meiosis cell frame spot
HET5  WT       2       t0      1    1     1     1.6  2.4  3.1    0  unp
                                    2     1     1.9  1.5  3.1   30  unp
                                    3     1     2.0  1.8  3.0   60  unp
                                    ...
                                    1     2     2.1  1.5  3.1    0  unp
                                    2     2     1.9  2.5  3.1   30  unp
                                    3     2     1.5  2.5  3.4   60  unp

This function infers which direction to pivot from the form of its input. Because of this, I have found it much more convenient (and a smaller cognitive load) than using a MultiIndex for the column names together with e.g. DataFrame.unstack and friends.
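For comparison, the same round trip can be sketched with plain pandas. This is not how pivot_loci is implemented; it is only an illustration of the two forms, assuming a reduced version of the tables above with a single "frame" index level and the column stubs X, Y and Z. The variable names are made up for the example.

```python
import pandas as pd

PIVOT_COLS = ["X", "Y", "Z"]  # column stubs, without their numerical suffixes

# Condensed form: one row per frame, one set of coordinate columns per spot.
condensed = pd.DataFrame(
    {
        "frame": [1, 2, 3],
        "X1": [1.6, 1.9, 2.0], "Y1": [2.4, 1.5, 1.8], "Z1": [3.1, 3.1, 3.0],
        "X2": [2.1, 1.9, 1.5], "Y2": [1.5, 2.5, 2.5], "Z2": [3.1, 3.1, 3.4],
        "t": [0, 30, 60],
    }
)

# Condensed -> long: "X1", "X2", ... become one "X" column plus a "spot" index level.
long_df = pd.wide_to_long(
    condensed, stubnames=PIVOT_COLS, i="frame", j="spot"
).sort_index()

# Long -> condensed: unstack only the pivoted columns, then flatten the names.
wide = long_df[PIVOT_COLS].unstack("spot")
wide.columns = [f"{name}{spot}" for name, spot in wide.columns]
# Columns that do not vary by spot (here "t") are taken once per frame.
per_frame = long_df.drop(columns=PIVOT_COLS).groupby(level="frame").first()
roundtrip = wide.join(per_frame)
```

Each direction needs its own code path here, and the caller has to know which form the data is currently in.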

Parameters
  • df (pd.DataFrame) – The DataFrame to pivot, in either of the two forms.

  • pivot_cols (List<str>) – The names of the columns over which to pivot, without their numerical suffixes (which will be inferred).

  • spot_col (str) – The name of the column that holds (or will hold) the spot id.

Returns

df – The pivoted DataFrame.

Return type

pd.DataFrame