Single Cell Data

A glue Data class that wraps an AnnData object.

The primary motivation is to support on-disk access to a dataset that, even when stored as a sparse matrix, may be too large to fit comfortably in memory.

An AnnData object bundles several arrays and tables with different dimensions. The DataAnnData class exposes only the X matrix of data values, with dimension num_obs x num_vars, as a glue component. The other parts of the AnnData object are stored as regular glue data objects; the data loader handles their creation and their linking with the DataAnnData.

All the obsm arrays and obs table are combined as one dataset. All the varm arrays and var table are combined as one dataset.
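As a rough sketch of what combining a table with its multi-dimensional arrays can look like, the following uses plain pandas and NumPy. The helper `combine_table_and_arrays` and the `key_index` column naming are hypothetical and chosen for illustration; the actual data loader may organize and name components differently.

```python
import numpy as np
import pandas as pd


def combine_table_and_arrays(table, arrays):
    """Join a per-row table with per-row 2-D arrays, column by column.

    Hypothetical helper: each array column becomes a table column named
    "<key>_<i>", which is an assumed naming scheme.
    """
    pieces = [table]
    for key, arr in arrays.items():
        arr = np.asarray(arr)
        if arr.ndim != 2 or arr.shape[0] != len(table):
            raise ValueError(f"{key!r} must be 2-D with {len(table)} rows")
        columns = [f"{key}_{i}" for i in range(arr.shape[1])]
        pieces.append(pd.DataFrame(arr, index=table.index, columns=columns))
    return pd.concat(pieces, axis=1)


# Stand-ins for adata.obs and adata.obsm with three observations.
obs = pd.DataFrame({"cell_type": ["T", "B", "T"]}, index=["c0", "c1", "c2"])
obsm = {
    "X_pca": np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]),
    "X_umap": np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),
}

combined = combine_table_and_arrays(obs, obsm)
print(list(combined.columns))
```

The same pattern would apply to the var table and the varm arrays, since both are indexed by variable rather than by observation.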

The obsp and varp arrays can be extremely large and would require a dedicated data class. They are not supported yet.
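A back-of-the-envelope estimate shows why pairwise arrays are a concern. The sketch below assumes an obsp array of shape num_obs x num_obs, 4-byte values, and, for the sparse case, a CSR layout with 4-byte indices and a fixed number of neighbours per observation. The one million observations and 15 neighbours are illustrative numbers, not figures from this project.

```python
def dense_pairwise_bytes(n, itemsize=4):
    """Bytes needed for a dense n x n array with the given item size."""
    return n * n * itemsize


def csr_knn_bytes(n, k, data_itemsize=4, index_itemsize=4):
    """Approximate bytes for an n x n CSR matrix with k nonzeros per row."""
    nnz = n * k
    data_and_indices = nnz * (data_itemsize + index_itemsize)
    indptr = (n + 1) * index_itemsize
    return data_and_indices + indptr


n_obs = 1_000_000  # illustrative dataset size
dense = dense_pairwise_bytes(n_obs)
sparse = csr_knn_bytes(n_obs, k=15)

print(f"dense obsp:  {dense / 1e12:.1f} TB")
print(f"sparse obsp: {sparse / 1e6:.0f} MB")
```

Even when a sparse pairwise array fits in memory, its two axes index the same set of observations, which does not match the per-observation table layout used for obs and obsm.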

TODO: The anndata/scanpy convention of updating the data structure when new calculations are performed only works with glue if we know how to add components to the appropriate glue datasets. These datasets are currently only loosely linked by key and should probably be more tightly coupled.

anndata - Annotated Data https://anndata.readthedocs.io/en/latest/

Scanpy - Single-Cell Analysis in Python https://scanpy.readthedocs.io/en/stable/

DataAnnData Object

DataAnnData([label, full_anndata_obj, ...])

A Data class to handle on-disk and sparse access to a large AnnData X matrix.

DataAnnData Translator

DataAnnDataTranslator()

A translator between an AnnData object and a DataAnnData object.