spapros.se.select_reference_probesets

spapros.se.select_reference_probesets(adata, n, genes_key='highly_variable', obs_key='celltype', methods=['PCA', 'DE', 'HVG', 'random'], seeds=[0], verbosity=2, save_dir=None)

Select reference probesets with basic selection methods.

Parameters:
  • adata (AnnData) – Data with log-normalised counts in adata.X.

  • n (int) – Number of selected genes.

  • genes_key (Optional[str]) – adata.var key for the subset of preselected genes on which the selections are run (typically ‘highly_variable’, the default). Set to None to run on all genes.

  • obs_key (str) – Only required for method ‘DE’. Column name of adata.obs for which marker scores are calculated.

  • methods (Union[List[str], Dict[str, Dict]]) –

    Methods used for the selections. The supported methods, all of which are used by default, are [‘PCA’, ‘DE’, ‘HVG’, ‘random’]. To specify hyperparameters for the methods, provide a dictionary instead of a list, e.g.:

    {
        'DE':{},
        'PCA':{'n_pcs':30},
        'HVG':{},
        'random':{},
    }
    

  • seeds (List[int]) – List of random seeds. For each seed, one random gene set is selected if ‘random’ in methods.

  • verbosity (int) – Verbosity level.

  • save_dir (Optional[str]) – Directory path where all results are saved.
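
The two accepted forms of the methods parameter (a list of names, or a dictionary mapping names to hyperparameters) can be related as in the following minimal sketch. The helper methods_to_dict and the constant DEFAULT_METHODS are hypothetical names introduced here for illustration. They are not part of spapros, and the library's internal handling may differ. The sketch assumes that a method listed without hyperparameters corresponds to an empty dictionary, as in the example above.

```python
# Hypothetical illustration, not spapros code: relate the list form of
# `methods` to the dictionary form shown in the parameter description.
DEFAULT_METHODS = ["PCA", "DE", "HVG", "random"]


def methods_to_dict(methods=None):
    """Expand a list of method names into {name: hyperparameters}."""
    if methods is None:
        methods = DEFAULT_METHODS
    if isinstance(methods, dict):
        # Already in dictionary form: copy so the caller's dicts are untouched.
        return {name: dict(params) for name, params in methods.items()}
    # List form: assume every method runs with no extra hyperparameters.
    return {name: {} for name in methods}


print(methods_to_dict(["PCA", "DE"]))
print(methods_to_dict({"PCA": {"n_pcs": 30}, "random": {}}))
```

Passing the dictionary form is only needed when at least one method should deviate from its default hyperparameters, such as 'n_pcs' for ‘PCA’.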

Returns:

Dictionary with one entry for each method. The key is the selection method name and the value is a DataFrame with the same index as adata.var and at least one boolean column called ‘selection’ representing the selected probeset. For some methods, additional information is provided in other columns.

Return type:

Dict[str, DataFrame]
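
The sketch below shows one way to call the function and read the gene names out of the returned dictionary. It assumes spapros is importable as sp and that adata already contains the ‘highly_variable’ and ‘celltype’ annotations. The helper names selected_genes and run_reference_selection are hypothetical, and the mock dictionary only imitates the documented return structure with made-up gene names.

```python
import pandas as pd


def selected_genes(selections):
    """Map each method name to the genes flagged in its 'selection' column."""
    return {
        name: df.loc[df["selection"]].index.tolist()
        for name, df in selections.items()
    }


def run_reference_selection(adata, n=50):
    """Run the reference selections on a prepared AnnData object.

    Requires spapros; the import is local so selected_genes works without it.
    """
    import spapros as sp

    selections = sp.se.select_reference_probesets(
        adata,
        n,
        genes_key="highly_variable",
        obs_key="celltype",
        methods=["PCA", "DE", "HVG", "random"],
        seeds=[0],
    )
    return selected_genes(selections)


# Stand-in for the documented return value (made-up genes, not real output).
genes = ["g1", "g2", "g3"]
mock = {
    "PCA": pd.DataFrame({"selection": [True, False, True]}, index=genes),
    "HVG": pd.DataFrame({"selection": [False, True, True]}, index=genes),
}
print(selected_genes(mock))
```

Because each DataFrame shares its index with adata.var, the boolean ‘selection’ column can also be used directly to subset the AnnData object along the gene axis.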