multi_locus_analysis.dataframes
Utilities for massaging DataFrames
-
multi_locus_analysis.dataframes.array_from_numpy_string(s)
For unserializing np.arrays after a DataFrame.to_csv round trip.
Notes
As of 2019-03-13, it looks like numpy array elements are printed separated by two spaces instead of commas. We remove all other spaces, replace each two-space separator with a comma, and load the resulting string as JSON.
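A minimal sketch of the parsing step, assuming a 1-D array written with numpy's default `str` formatting (no `...` summarization, no nested rows). It is not the module's implementation: the name `array_from_numpy_string_sketch` is hypothetical, and rather than the comma-and-JSON route described in the Notes it splits on any run of whitespace. numpy pads elements to a common width, so the number of spaces between elements varies (e.g. `[1.5  2.   3.25]`), and a token such as `2.` is not valid JSON, so each token goes through `float()` instead. The result is always a float array.

```python
import re

import numpy as np


def array_from_numpy_string_sketch(s):
    """Hypothetical helper: parse a 1-D array printed by str(np.ndarray)."""
    # Drop surrounding whitespace and the brackets numpy prints
    inner = s.strip().strip('[]').strip()
    if not inner:
        return np.array([])
    # numpy pads elements to a common width, so split on any run of whitespace
    tokens = re.split(r'\s+', inner)
    # float() accepts numpy's "2." style, which json.loads would reject
    return np.array([float(t) for t in tokens])
```

After `pd.read_csv`, the sketch can be applied column-wise, e.g. `df['col'].apply(array_from_numpy_string_sketch)`.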
-
multi_locus_analysis.dataframes.pivot_loci(df, pivot_cols=['x', 'y', 'z'], spot_col='spot')
Move between the “long” and “short” (condensed) forms for the spot id column.
Simply put, we want to be able to transform between the following two dataframes:
Condensed form
                                            X1   Y1   Z1   X2   Y2   Z2    t  foci
locus genotype exp.rep meiosis cell frame
HET5  WT       2       t0      1    1      1.6  2.4  3.1  2.1  1.5  3.1    0   unp
                                    2      1.9  1.5  3.1  1.9  2.5  3.1   30   unp
                                    3      2.0  1.8  3.0  1.5  2.5  3.4   60   unp
                                    4      2.1  1.9  3.0  1.4  2.2  3.4   90   unp
                                    5      2.2  1.8  3.0  1.5  2.4  3.4  120   unp
and
"Long" form
                                                  X    Y    Z    t  foci
locus genotype exp.rep meiosis cell frame spot
HET5  WT       2       t0      1    1    1      1.6  2.4  3.1    0   unp
                                    2    1      1.9  1.5  3.1   30   unp
                                    3    1      2.0  1.8  3.0   60   unp
                                    ...
                                    1    2      2.1  1.5  3.1    0   unp
                                    2    2      1.9  2.5  3.1   30   unp
                                    3    2      1.5  2.5  3.4   60   unp
This function infers which direction to pivot. Because of this, I have found it much more convenient (and less of a cognitive load) than using a MultiIndex for the column names together with e.g. DataFrame.unstack and friends.
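A minimal pandas sketch of both directions plus the inference step, under stated assumptions. It is not the module's implementation: `condensed_to_long`, `long_to_condensed`, and `pivot_loci_sketch` are hypothetical names. It assumes the spot id is an index level in the long form (as in the "Long" form table), that unsuffixed columns such as `t` and `foci` are identical for every spot, and it defaults `pivot_cols` to the uppercase names used in the tables rather than the documented `['x', 'y', 'z']`.

```python
import re

import pandas as pd


def condensed_to_long(df, pivot_cols=('X', 'Y', 'Z'), spot_col='spot'):
    """Hypothetical: X1, Y1, ..., X2, ... columns -> X, Y, Z plus a spot index level."""
    # Matches e.g. "X1" or "Z12"; group 2 is the numeric spot suffix
    pattern = re.compile('^(' + '|'.join(map(re.escape, pivot_cols)) + r')(\d+)$')
    spots = sorted({int(m.group(2)) for m in map(pattern.match, df.columns) if m})
    # Columns without a spot suffix (e.g. t, foci) are shared by every spot
    shared = [c for c in df.columns if not pattern.match(c)]
    pieces = []
    for spot in spots:
        renames = {f'{c}{spot}': c for c in pivot_cols}
        piece = df[list(renames) + shared].rename(columns=renames)
        pieces.append(piece.assign(**{spot_col: spot}))
    # Stack the per-spot pieces and append the spot id as the last index level
    return pd.concat(pieces).set_index(spot_col, append=True)


def long_to_condensed(df, pivot_cols=('X', 'Y', 'Z'), spot_col='spot'):
    """Hypothetical: spot index level -> X1, Y1, ..., X2, ... columns."""
    # Move the spot level out of the index and into the columns
    wide = df[list(pivot_cols)].unstack(spot_col)
    # Reorder (X, 1), (X, 2), ... into (X, 1), (Y, 1), ..., (X, 2), ...
    spots = wide.columns.get_level_values(spot_col).unique()
    wide = wide[[(c, s) for s in spots for c in pivot_cols]]
    wide.columns = [f'{c}{s}' for c, s in wide.columns]
    # Shared columns are assumed identical across spots, so keep the first copy
    other_levels = [name for name in df.index.names if name != spot_col]
    shared = [c for c in df.columns if c not in pivot_cols]
    return wide.join(df[shared].groupby(level=other_levels).first())


def pivot_loci_sketch(df, pivot_cols=('X', 'Y', 'Z'), spot_col='spot'):
    """Hypothetical: a spot index level means long form, otherwise condensed."""
    if spot_col in df.index.names:
        return long_to_condensed(df, pivot_cols, spot_col)
    return condensed_to_long(df, pivot_cols, spot_col)
```

Applied to a DataFrame shaped like the condensed table, `pivot_loci_sketch` should return the long form, and calling it again on that result should restore the X1 ... Z2 columns, so the caller never has to say which direction it wants.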
- Parameters
df (pd.DataFrame) – The DataFrame to pivot, in either the condensed or the long form.
pivot_cols (List<str>) – The names of the columns over which to pivot (without their numerical suffixes, which are inferred).
spot_col (str) – The name of the column that holds (or will hold) the spot id.
- Returns
df – The pivoted DataFrame.
- Return type
pd.DataFrame