larry._datasets

Package Contents

Classes

DimensionReduction

Helper class that provides a standard way to create an ABC using inheritance.

inVitroURLPaths

inVivoURLPaths

CytokinePerturbationURLPaths

inVitroData

inVivoData

CytokinePerturbationData

RunningQuantile

SplitDataForTask

AnnDataConfiguration

Construct AnnData from constituent components.

AnnDataPathManager

Functions

mkdir(→ None)

load_expr_matrix(path)

cell_cycle_genes([genes_added])

vscores(E[, min_mean, nBins, fit_percentile, error_wt])

Calculate v-score (above-Poisson noise statistic) for genes in the input sparse counts matrix

highly_variable_genes(adata[, base_ix, ...])

Filter genes by expression level and variability

remove_cell_cycle_correlated_genes(adata[, min_corr, ...])

Remove signature-correlated genes from a list of test genes

split_for_timepoint_recovery_task(adata[, split_key, ...])

split_for_fate_prediction_task(adata[, split_key, ...])

split_for_transfer_learning_task(adata[, split_key, ...])

class larry._datasets.DimensionReduction(n_pcs=50, n_components=2, metric='euclidean', n_neighbors=30)

Bases: larry._utils.AutoParseBase

Helper class that provides a standard way to create an ABC using inheritance.

property Scaler
property PCA
property UMAP
__configure__(kwargs, ignore=['self'])
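The Scaler, PCA and UMAP properties, together with the n_pcs, n_components, metric and n_neighbors parameters, suggest a scale → PCA → UMAP pipeline. The sketch below is a NumPy-only stand-in for the first two stages only, assuming z-score scaling and PCA computed by singular value decomposition. The function name scale_and_pca is hypothetical and is not part of larry. The UMAP stage, which would consume n_components, metric and n_neighbors, is omitted.

```python
import numpy as np


def scale_and_pca(X: np.ndarray, n_pcs: int = 50) -> np.ndarray:
    """Hypothetical stand-in for the Scaler -> PCA stages (not larry's code)."""
    X = np.asarray(X, dtype=float)
    # Z-score each feature; guard against zero-variance columns.
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    Z = (X - mean) / std
    # PCA via thin SVD of the centered (and scaled) matrix.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    k = min(n_pcs, Vt.shape[0])
    # Project cells onto the top-k principal axes.
    return Z @ Vt[:k].T


rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(100, 20))
X_pca = scale_and_pca(X, n_pcs=5)
print(X_pca.shape)  # (100, 5)
```

Because the rows of Vt are ordered by singular value, the variance of the projected components decreases from the first column to the last.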
larry._datasets.mkdir(path: str, silent: bool = False) → None
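Only the signature of mkdir is documented. A minimal sketch of a helper with this signature follows, assuming it creates missing parent directories and that silent suppresses a status message. Both behaviors are assumptions, not confirmed by the source.

```python
import os
import tempfile


def mkdir(path: str, silent: bool = False) -> None:
    """Sketch: create `path` (and any missing parents) if it does not exist."""
    if os.path.isdir(path):
        return
    os.makedirs(path, exist_ok=True)
    # Assumed behavior: report the new directory unless silenced.
    if not silent:
        print(f"Created directory: {path}")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "KleinLabData", "in_vitro")
    mkdir(target, silent=True)
    print(os.path.isdir(target))  # True
```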
class larry._datasets.inVitroURLPaths(download_path=os.getcwd())

Bases: URLPathInterface

_download_structure = KleinLabData/in_vitro
_dataset = inVitro
class larry._datasets.inVivoURLPaths(download_path=os.getcwd())

Bases: URLPathInterface

_download_structure = KleinLabData/in_vivo
_dataset = inVivo
class larry._datasets.CytokinePerturbationURLPaths(download_path=os.getcwd())

Bases: URLPathInterface

_download_structure = KleinLabData/cytokine_perturbation
_dataset = cytokinePerturbation
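Each URLPaths class pairs a download_path (default os.getcwd()) with a _download_structure such as KleinLabData/in_vitro. One plausible reading is that downloaded files are stored under download_path/_download_structure. The class URLPathsSketch and its download_dir property below are hypothetical illustrations of that reading, not larry's URLPathInterface.

```python
import os


class URLPathsSketch:
    """Hypothetical illustration; not larry's URLPathInterface."""

    _download_structure = "KleinLabData/in_vitro"

    def __init__(self, download_path: str = os.getcwd()):
        self.download_path = download_path

    @property
    def download_dir(self) -> str:
        # Local directory assumed to hold this dataset's files.
        return os.path.join(self.download_path, *self._download_structure.split("/"))


paths = URLPathsSketch(download_path="/data")
print(paths.download_dir)  # /data/KleinLabData/in_vitro on POSIX systems
```

A default written as os.getcwd() is evaluated once, when the class is defined, not on each instantiation, so changing the working directory later does not change the default.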
larry._datasets.load_expr_matrix(path)
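load_expr_matrix is documented only by its signature. A minimal sketch follows, assuming the file is a cells × genes matrix in Matrix Market format, read with SciPy's scipy.io.mmread and converted to CSR (the type AnnDataConfiguration.X is annotated with). The file format is an assumption.

```python
import os
import tempfile

import numpy as np
import scipy.io
import scipy.sparse


def load_expr_matrix(path: str) -> scipy.sparse.csr_matrix:
    """Sketch: read a Matrix Market file (cells x genes) into CSR format."""
    # mmread returns a COO matrix for sparse files; CSR gives fast row slicing.
    return scipy.sparse.csr_matrix(scipy.io.mmread(path))


counts = scipy.sparse.coo_matrix(
    np.array(
        [[1, 0, 2, 0], [0, 0, 3, 0], [4, 0, 0, 5], [0, 6, 0, 0], [0, 0, 0, 7]],
        dtype=float,
    )
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "counts.mtx")
    scipy.io.mmwrite(path, counts)
    X = load_expr_matrix(path)
print(X.shape)  # (5, 4)
```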
class larry._datasets.inVitroData(silent=False)

Bases: DataHandler

_url_paths
_dataset = in_vitro
fate_prediction(split_key='Well', write_h5ad=False)
timepoint_recovery(split_key='Time point')
transfer_learning(split_key='Time point')
class larry._datasets.inVivoData(silent=False)

Bases: DataHandler

_url_paths
_dataset = in_vivo
class larry._datasets.CytokinePerturbationData(silent=False)

Bases: DataHandler

_url_paths
_dataset = cytokine_perturbation
class larry._datasets.RunningQuantile(n_bins: int = 50)
__call__(x, y, p)

Calculate the p-th quantile of y within bins of x.
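A minimal sketch of a binned ("running") quantile follows, assuming equal-width bins spanning the range of x, p given as a fraction in [0, 1], and NaN for empty bins. The function name running_quantile is hypothetical, and larry's RunningQuantile may bin or interpolate differently.

```python
import numpy as np


def running_quantile(x, y, p, n_bins: int = 50):
    """Return bin centers of x and the p-quantile of y in each bin (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Assign each x to a bin; clip so that x.max() falls in the last bin.
    idx = np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    q = np.full(n_bins, np.nan)  # empty bins stay NaN
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            q[b] = np.quantile(y[in_bin], p)
    return centers, q


x = np.repeat([0.0, 1.0], 5)
y = np.array([1, 2, 3, 4, 5, 10, 20, 30, 40, 50], dtype=float)
centers, q = running_quantile(x, y, p=0.5, n_bins=2)
print(q)  # [ 3. 30.]
```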

larry._datasets.cell_cycle_genes(genes_added=[])
larry._datasets.vscores(E, min_mean=0, nBins=50, fit_percentile=0.1, error_wt=1)

Calculate the v-score (above-Poisson noise statistic) for genes in the input sparse counts matrix. Returns v-scores and other statistics.
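The raw ingredient of an above-Poisson noise statistic is the per-gene Fano factor (variance / mean): for Poisson-distributed counts it is about 1, so values well above 1 indicate extra variability. The sketch below computes only per-gene means and Fano factors from a sparse matrix. It is a simplified stand-in: the full v-score additionally corrects for a fitted noise model, and the roles of nBins, fit_percentile and error_wt in that fit are not reproduced here. The function name fano_factors is hypothetical.

```python
import numpy as np
import scipy.sparse


def fano_factors(E, min_mean: float = 0.0):
    """Simplified stand-in for vscores: per-gene mean and Fano factor (var/mean)."""
    E = scipy.sparse.csc_matrix(E, dtype=float)
    mu = np.asarray(E.mean(axis=0)).ravel()
    # Population variance = E[x^2] - E[x]^2, computed without densifying.
    mu_sq = np.asarray(E.multiply(E).mean(axis=0)).ravel()
    var = mu_sq - mu**2
    keep = mu > min_mean  # genes at or below min_mean get NaN
    fano = np.full(mu.shape, np.nan)
    fano[keep] = var[keep] / mu[keep]
    return mu, fano


rng = np.random.default_rng(0)
poisson_gene = rng.poisson(5.0, size=2000)
bursty_gene = rng.negative_binomial(1, 1 / 6, size=2000)  # mean 5, variance 30
E = scipy.sparse.csc_matrix(np.column_stack([poisson_gene, bursty_gene]))
mu, fano = fano_factors(E)
print(np.round(fano, 1))
```

The Poisson gene should land near 1 and the overdispersed gene near 6.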

larry._datasets.highly_variable_genes(adata, base_ix=[], min_vscore_pctl=85, min_counts=3, min_cells=3, show_vscore_plot=False, sample_name='', return_idx=False)

Filter genes by expression level and variability. Returns a list of filtered gene indices.
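A minimal sketch of the two-stage filter follows. It assumes that "expression level" means at least min_counts counts in at least min_cells cells, and that "variability" means a score at or above the min_vscore_pctl percentile among the expressed genes. The precomputed scores argument and the function name filter_genes are hypothetical.

```python
import numpy as np
import scipy.sparse


def filter_genes(E, scores, min_counts=3, min_cells=3, min_vscore_pctl=85):
    """Sketch: return indices of expressed genes with high variability scores."""
    # Densify for clarity; a production version would stay sparse.
    counts = E.toarray() if scipy.sparse.issparse(E) else np.asarray(E)
    scores = np.asarray(scores, dtype=float)
    # Number of cells in which each gene has at least `min_counts` counts.
    n_cells_expr = (counts >= min_counts).sum(axis=0)
    expressed = np.flatnonzero(n_cells_expr >= min_cells)
    if expressed.size == 0:
        return expressed
    # Percentile cutoff computed over the expressed genes only.
    cutoff = np.percentile(scores[expressed], min_vscore_pctl)
    return expressed[scores[expressed] >= cutoff]


E = np.array([[5, 0, 3, 1],
              [4, 0, 3, 1],
              [6, 9, 3, 1],
              [5, 0, 4, 1]])
scores = np.array([2.0, 9.0, 1.0, 0.5])
print(filter_genes(E, scores, min_vscore_pctl=50))  # [0]
```

Gene 1 has the highest score but is expressed in only one cell, so the expression filter removes it before the percentile cutoff is applied.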

larry._datasets.remove_cell_cycle_correlated_genes(adata, min_corr=0.1, key_added='use_genes')

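Remove signature-correlated genes from a list of test genes.

Parameters (descriptions carried over from the SPRING helper linked under Source; of these, only min_corr appears in this function's signature):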

E: scipy.sparse.csc_matrix, shape (n_cells, n_genes)
  • full counts matrix

gene_list: numpy array, shape (n_genes,)
  • full gene list

exclude_corr_genes_list: list of list(s)
  • Each sublist is used to build a signature. Test genes correlated with this signature are removed.

test_gene_idx: 1-D numpy array
  • indices of genes to test for correlation with the gene signatures from exclude_corr_genes_list

min_corr: float (default=0.1)
  • Test genes whose Pearson correlation with any signature from exclude_corr_genes_list is min_corr or higher are excluded.

Returns: numpy array of gene indices (subset of test_gene_idx) that are not correlated with any of the gene signatures.

Source: https://github.com/AllonKleinLab/SPRING_dev/blob/aa52c405b6f15efd53c66f6856799dfe46e72d01/data_prep/spring_helper.py#L307-L328
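The procedure described by the parameters above can be sketched as follows. The sketch mirrors the documented E, gene_list, exclude_corr_genes_list, test_gene_idx and min_corr parameters rather than larry's adata-based signature. It assumes each signature score is the per-cell sum of z-scored expression of the signature's genes. The function name remove_corr_genes_sketch is hypothetical.

```python
import numpy as np


def remove_corr_genes_sketch(E, gene_list, exclude_corr_genes_list, test_gene_idx, min_corr=0.1):
    """Sketch: drop test genes whose Pearson r with any signature is >= min_corr."""
    E = np.asarray(E, dtype=float)  # dense for clarity
    gene_list = list(gene_list)
    test_gene_idx = np.asarray(test_gene_idx, dtype=int)
    excluded = set()
    for signature_genes in exclude_corr_genes_list:
        seed_ix = [i for i, g in enumerate(gene_list) if g in signature_genes]
        seed = E[:, seed_ix]
        seed = seed[:, seed.std(axis=0) > 0]  # skip constant genes
        if seed.shape[1] == 0:
            continue
        # Assumed signature score per cell: sum of z-scored seed-gene expression.
        z = (seed - seed.mean(axis=0)) / seed.std(axis=0)
        signature = z.sum(axis=1)
        for g in test_gene_idx:
            r = np.corrcoef(signature, E[:, g])[0, 1]
            if r >= min_corr:  # positive correlation only, as documented
                excluded.add(int(g))
    return np.array([g for g in test_gene_idx if g not in excluded], dtype=int)


rng = np.random.default_rng(0)
cycle = rng.normal(size=200)
E = np.column_stack([
    cycle + 0.1 * rng.normal(size=200),  # 0: signature gene A
    cycle + 0.1 * rng.normal(size=200),  # 1: signature gene B
    cycle + 0.3 * rng.normal(size=200),  # 2: correlated test gene
    rng.normal(size=200),                # 3: independent test gene
])
genes = ["A", "B", "C", "D"]
kept = remove_corr_genes_sketch(E, genes, [["A", "B"]], test_gene_idx=[2, 3], min_corr=0.5)
print(kept)  # [3]
```

Because the test is r >= min_corr, strongly anti-correlated genes are kept.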

larry._datasets.split_for_timepoint_recovery_task(adata, split_key='Time point', write_h5ad=False)
larry._datasets.split_for_fate_prediction_task(adata, split_key='Well', write_h5ad=False)
larry._datasets.split_for_transfer_learning_task(adata, split_key='Time point', write_h5ad=False)
class larry._datasets.SplitDataForTask(adata, split_key: str, train_vals, test_vals, n_pcs=50, n_components=2, metric='euclidean', n_neighbors=30)
property train_idx
property test_idx
property X_train
property X_train_scaled
property X_train_pca
property X_train_umap
property X_test
property X_test_scaled
property X_test_pca
property X_test_umap
t_elapsed_message(message)
concat_train_test(adata_task)
__call__()
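SplitDataForTask selects cells by the values of obs[split_key] (train_vals, test_vals) and exposes scaled, PCA and UMAP views of each set. A minimal sketch of the split-and-scale step follows, assuming scaling statistics are fitted on training cells only and then reused for test cells, which avoids leaking test information. The function name split_and_scale and the time-point values are illustrative only.

```python
import numpy as np
import pandas as pd


def split_and_scale(X, obs: pd.DataFrame, split_key: str, train_vals, test_vals):
    """Sketch: split rows by obs[split_key]; scale both sets with train statistics."""
    X = np.asarray(X, dtype=float)
    train_idx = np.flatnonzero(obs[split_key].isin(train_vals).to_numpy())
    test_idx = np.flatnonzero(obs[split_key].isin(test_vals).to_numpy())
    X_train, X_test = X[train_idx], X[test_idx]
    # Fit scaling on training cells only, then apply it to test cells.
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0
    return train_idx, test_idx, (X_train - mean) / std, (X_test - mean) / std


obs = pd.DataFrame({"Time point": [2, 2, 4, 4, 6, 6]})
X = np.arange(12, dtype=float).reshape(6, 2)
train_idx, test_idx, X_train_scaled, X_test_scaled = split_and_scale(
    X, obs, "Time point", train_vals=[2, 4], test_vals=[6]
)
print(train_idx, test_idx)  # [0 1 2 3] [4 5]
```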
class larry._datasets.AnnDataConfiguration(X_path, obs_path, var_path, X_clone_path, silent=False)

Bases: larry._utils.AutoParseBase

Construct AnnData from constituent components.

property X: scipy.sparse.csr_matrix

Returns the expression matrix.

property var: pandas.DataFrame

Returns the var DataFrame (per-gene annotations).

property obs: pandas.DataFrame

Returns the obs DataFrame (per-cell annotations).

property X_clone: scipy.sparse.csr_matrix

Returns the cell × clonal barcode matrix.

property adata

Returns the formatted AnnData object.

class larry._datasets.AnnDataPathManager(dataset, download_dir)
property raw
property gene_filtered
property timepoint_recovery
property fate_prediction
property uniform
_path_constructor(obj_specifier)
__repr__()

Return repr(self).