spapros.se.select_pca_genes

spapros.se.select_pca_genes(adata, n, variance_scaled=False, absolute=True, n_pcs=20, penalty_keys=[], corr_penalty=None, inplace=True, progress=None, level=1, verbosity=2)

Select n features based on PCA loadings.
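The scoring idea can be sketched with NumPy alone. This is a minimal, hypothetical re-implementation, not spapros' actual code. It assumes the score of a gene is the sum of its loadings over the first n_pcs components. It computes the PCs by an eigendecomposition of the gene covariance matrix, and spapros may use a different PCA routine, so the values can differ. The `penalties` argument is a stand-in for the adata.var columns named by penalty_keys: plain per-gene arrays multiplied into the scores. The names pca_loading_scores and select_top_n are illustrative.

```python
import numpy as np


def pca_loading_scores(X, n_pcs=20, variance_scaled=False, absolute=True, penalties=()):
    """Hypothetical sketch: per-gene sum of PCA loadings over the first n_pcs PCs.

    X: (cells x genes) array of log-normalised counts.
    penalties: per-gene arrays multiplied with the scores (stand-in for penalty_keys).
    """
    cov = np.cov(X, rowvar=False)               # (genes x genes) covariance; np.cov centers each gene
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_pcs]     # indices of the n_pcs largest eigenvalues
    eigvals, loadings = eigvals[top], eigvecs[:, top]
    if variance_scaled:
        # loading = eigenvector_component * sqrt(eigenvalue)
        loadings = loadings * np.sqrt(np.clip(eigvals, 0.0, None))
    if absolute:
        loadings = np.abs(loadings)
    scores = loadings.sum(axis=1)               # one score per gene
    for p in penalties:
        scores = scores * np.asarray(p)         # multiply each penalty into the scores
    return scores


def select_top_n(scores, n):
    """Boolean mask marking the n highest-scoring genes."""
    mask = np.zeros(scores.shape[0], dtype=bool)
    mask[np.argsort(scores)[::-1][:n]] = True
    return mask


# Usage on random data: 100 cells x 8 genes, keep the top 2 genes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 8))
print(select_top_n(pca_loading_scores(X_demo, n_pcs=3), 2))
```

Taking absolute values matters because the sign of an eigenvector is arbitrary. Without it, positive and negative loadings of the same gene can cancel in the sum.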

Parameters:
  • adata (AnnData) – Data with log normalised counts in adata.X.

  • n (int) – Number of selected features.

  • variance_scaled (bool) – If True, loadings are defined as eigenvector_component * sqrt(eigenvalue). If False, loadings are defined as eigenvector_component.

  • absolute (bool) – If True, take the absolute value of the loadings.

  • n_pcs (int) – Number of PCs used to calculate the loading sums.

  • penalty_keys (list) – List of keys of columns in adata.var whose values are multiplied with the scores.

  • corr_penalty (Optional[Callable]) – Function that maps values from [0,1] to [0,1]. It describes an iterative penalty that is applied to the PCA-selected genes: each candidate gene is penalized according to the given function of its highest correlation with the genes already selected. The maximum correlation is recomputed after each selected gene.

  • inplace (bool) – If True, save the results in adata.var; otherwise, return a DataFrame.

  • progress (Optional[Progress]) – rich.Progress object if progress bars should be shown.

  • level (int) – Progress bar level.

  • verbosity (int) – Verbosity level.
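The iterative correlation penalty described for corr_penalty can be illustrated with a greedy loop. This sketch is hypothetical. It assumes the penalized score is score * corr_penalty(max |r|), where r is the Pearson correlation with the genes selected so far, and that corr_penalty accepts NumPy arrays. The parameter description does not say exactly how spapros combines the penalty with the scores. select_with_corr_penalty is an illustrative name.

```python
import numpy as np


def select_with_corr_penalty(X, scores, n, corr_penalty):
    """Hypothetical greedy selection with an iterative correlation penalty.

    Assumed scheme: penalized score = score * corr_penalty(max |r| with selected genes).
    X: (cells x genes) data; scores: per-gene scores; corr_penalty: vectorized [0,1] -> [0,1].
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))      # (genes x genes) |Pearson r|
    corr = np.nan_to_num(corr)                        # constant genes would give NaN
    n_genes = scores.shape[0]
    selected = []
    max_corr = np.zeros(n_genes)                      # max |r| with the genes selected so far
    for _ in range(min(n, n_genes)):
        penalized = scores * corr_penalty(max_corr)   # new array; scores is not modified
        penalized[selected] = -np.inf                 # never pick the same gene twice
        best = int(np.argmax(penalized))
        selected.append(best)
        max_corr = np.maximum(max_corr, corr[:, best])  # recompute max correlation
    return selected
```

With corr_penalty=lambda c: 1.0 - c, a gene that is perfectly correlated with an already selected gene gets a penalized score of zero. A less redundant, lower-scoring gene is then picked ahead of it.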

Returns:

  • if not inplace: pd.DataFrame (like adata.var) with columns:

    • 'selection': bool indicator of selected genes

    • 'selection_score': PCA-loading-based score of each gene

    • 'selection_ranking': ranking according to the selection scores

  • if inplace: the results are saved in adata.var[['selection', 'selection_score', 'selection_ranking']].

Return type:

  • pd.DataFrame if not inplace, otherwise None
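A DataFrame with the three documented columns can be assembled with pandas as below. The ranking convention is an assumption for illustration: rank 1 goes to the highest score, and ties are broken by order of appearance. The helper name selection_frame is also hypothetical. When corr_penalty is set, spapros' ranking may instead follow the iterative selection order.

```python
import numpy as np
import pandas as pd


def selection_frame(gene_names, scores, n):
    """Hypothetical helper: build the documented output columns from per-gene scores."""
    df = pd.DataFrame({"selection_score": np.asarray(scores, dtype=float)},
                      index=pd.Index(gene_names))
    # Rank 1 = highest score; method="first" breaks ties by order of appearance.
    ranks = df["selection_score"].rank(ascending=False, method="first")
    df["selection_ranking"] = ranks.astype(int)
    df["selection"] = df["selection_ranking"] <= n   # True for the n top-ranked genes
    return df[["selection", "selection_score", "selection_ranking"]]
```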