lamindb.core.MappedCollection
- class lamindb.core.MappedCollection(path_list, layers_keys=None, obs_keys=None, obsm_keys=None, join='inner', encode_labels=True, unknown_label=None, cache_categories=True, parallel=False, dtype=None)
Bases: object
Map-style collection for use in data loaders.
This class virtually concatenates AnnData arrays as a PyTorch map-style dataset.
If your AnnData collection is in the cloud, move it into a local cache first for faster access.
__getitem__ of the MappedCollection object takes a single integer index and returns a dictionary with the observation data sample for this index from the AnnData objects in path_list. The dictionary has keys for layers_keys (.X is in "X"), obs_keys, obsm_keys (under f"obsm_{key}") and also "_store_idx" for the index of the AnnData object containing this observation sample.
Note
For a guide, see Train a machine learning model on a collection.
For more convenient use within MappedCollection, see mapped().
This currently only works for collections of AnnData objects.
The implementation was influenced by the SCimilarity data loader.
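The index lookup described above can be illustrated with a small stand-alone sketch. It is not the lamindb implementation: `VirtualConcat` is a hypothetical toy class, and plain Python lists stand in for the AnnData stores. It only shows how a single global integer index might be resolved to a store and a local row, and how the `"_store_idx"` key would be filled.

```python
from bisect import bisect_right
from itertools import accumulate


class VirtualConcat:
    """Toy map-style dataset over several in-memory 'stores' (lists of rows).

    Hypothetical illustration only; not part of lamindb.
    """

    def __init__(self, stores):
        self.stores = stores
        # Cumulative end offsets, e.g. store sizes [3, 2] -> [3, 5].
        self.offsets = list(accumulate(len(s) for s in stores))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # First store whose end offset is strictly greater than idx.
        store_idx = bisect_right(self.offsets, idx)
        start = self.offsets[store_idx - 1] if store_idx > 0 else 0
        return {"X": self.stores[store_idx][idx - start], "_store_idx": store_idx}


ds = VirtualConcat(
    [
        [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]],  # store 0: 3 observations
        [[4.0, 1.0], [5.0, 0.0]],              # store 1: 2 observations
    ]
)
sample = ds[3]  # first observation of store 1
```

No rows are copied when the dataset is built; only the offsets are stored, which is what makes the concatenation "virtual".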
- Parameters:
  - path_list (list[str | Path]) – A list of paths to AnnData objects stored in .h5ad or .zarr formats.
  - layers_keys (str | list[str] | None, default: None) – Keys from the .layers slot. layers_keys=None or "X" in the list retrieves .X.
  - obsm_keys (str | list[str] | None, default: None) – Keys from the .obsm slots.
  - obs_keys (str | list[str] | None, default: None) – Keys from the .obs slots.
  - join (Literal['inner', 'outer'] | None, default: 'inner') – "inner" or "outer" virtual joins. If None is passed, does not join.
  - encode_labels (bool | list[str], default: True) – Encode labels into integers. Can be a list with elements from obs_keys.
  - unknown_label (str | dict[str, str] | None, default: None) – Encode this label to -1. Can be a dictionary with keys from obs_keys if encode_labels=True or from encode_labels if it is a list.
  - cache_categories (bool, default: True) – Enable caching categories of obs_keys for faster access.
  - parallel (bool, default: False) – Enable sampling with multiple processes.
  - dtype (str | None, default: None) – Convert numpy arrays from .X, .layers and .obsm
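To make the interplay of encode_labels and unknown_label concrete, here is a minimal sketch of integer label encoding in which one designated label is mapped to -1. The helper `encode_with_unknown` is hypothetical, and assigning codes in sorted category order is an assumption made for the example; the order lamindb actually uses may differ.

```python
def encode_with_unknown(labels, unknown_label=None):
    """Map labels to integer codes; `unknown_label` (if given) maps to -1.

    Hypothetical helper; codes follow sorted category order (an assumption).
    """
    categories = sorted(set(labels) - {unknown_label})
    codes = {cat: i for i, cat in enumerate(categories)}
    if unknown_label is not None:
        codes[unknown_label] = -1
    return [codes[label] for label in labels], codes


cell_types = ["T cell", "B cell", "unknown", "T cell"]
encoded, mapping = encode_with_unknown(cell_types, unknown_label="unknown")
```

Keeping the unknown label at -1 makes it easy to mask such samples downstream, for example when computing a classification loss.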
Attributes
- closed property
Check if connections to array streaming backend are closed.
Does not matter if parallel=True.
- original_shapes property
Shapes of the underlying AnnData objects.
- shape property
Shape of the (virtually aligned) dataset.
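The difference between the original shapes and the aligned shape can be sketched as follows. This assumes that an "inner" join keeps the variables shared by all objects, that an "outer" join keeps the union, and that observations are stacked; these semantics and the helper names `join_var_names` and `aligned_shape` are assumptions for illustration, not taken from the library.

```python
def join_var_names(var_lists, join="inner"):
    """Return joint variable names under an assumed inner/outer join."""
    if join == "inner":
        # Variables present in every list, in the order of the first list.
        common = set(var_lists[0]).intersection(*var_lists[1:])
        return [v for v in var_lists[0] if v in common]
    if join == "outer":
        # All variables, in first-seen order (dicts preserve insertion order).
        seen = {}
        for names in var_lists:
            for v in names:
                seen.setdefault(v, None)
        return list(seen)
    raise ValueError(f"unsupported join: {join!r}")


def aligned_shape(n_obs_list, var_lists, join="inner"):
    """Shape of the stacked dataset: (total observations, joint variables)."""
    return (sum(n_obs_list), len(join_var_names(var_lists, join)))


var_lists = [["g1", "g2", "g3"], ["g2", "g3", "g4"]]
n_obs_list = [3, 2]
original = [(n, len(v)) for n, v in zip(n_obs_list, var_lists)]
inner = aligned_shape(n_obs_list, var_lists, join="inner")
outer = aligned_shape(n_obs_list, var_lists, join="outer")
```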
Methods
- close()
Close connections to array streaming backend.
No effect if parallel=True.
- get_label_weights(obs_keys)
Get all weights for the given label keys.
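Per-sample label weights are typically used for class-balanced sampling. The sketch below shows one common choice, inverse class frequency. Whether get_label_weights uses exactly this formula is an assumption, and `inverse_frequency_weights` is a hypothetical helper.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """One weight per sample: 1 / (number of samples sharing its label)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]


labels = ["a", "a", "b", "a"]
weights = inverse_frequency_weights(labels)
```

With these weights every class carries the same total weight, so a weighted sampler (for instance torch.utils.data.WeightedRandomSampler) would draw rare and frequent classes equally often in expectation.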
- get_merged_categories(label_key)
Get merged categories for label_key from all .obs.
- get_merged_labels(label_key)
Get merged labels for label_key from all .obs.
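The distinction between merged labels and merged categories can be shown with plain lists standing in for the .obs columns of each object. This is a sketch with hypothetical helpers `merged_labels` and `merged_categories`; the return types of the actual methods are not stated above and may differ.

```python
def merged_labels(obs_columns):
    """One label per observation, concatenated across all stores in order."""
    return [label for column in obs_columns for label in column]


def merged_categories(obs_columns):
    """Distinct label values that occur in any store."""
    return set().union(*obs_columns)


obs_columns = [["T cell", "B cell", "T cell"], ["NK cell", "B cell"]]
all_labels = merged_labels(obs_columns)
all_categories = merged_categories(obs_columns)
```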
- static torch_worker_init_fn(worker_id)
worker_init_fn for torch.utils.data.DataLoader.
Improves performance for num_workers > 1.