lamindb.FeatureSet
- class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)
Feature sets.
Stores references to sets of Feature and other registries that may be used to identify features (e.g., bionty.Gene or bionty.Protein).
- Parameters:
features – Iterable[Registry]
An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. If you’d like to pass values, use from_values() or from_df().
type – str | None = None
The simple type. Defaults to None for sets of Feature records, and otherwise defaults to "number" (e.g., for sets of Gene).
name – str | None = None
A name.
Note
Feature sets are useful because you likely have many datasets that measure the same features. In LaminDB, they are all linked against the exact same feature set. If you instead linked each dataset against single features (say, genes), you’d face exploding link tables.
A feature set is identified by the hash of the feature uids in the set.
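The note above can be illustrated with a minimal sketch in plain Python. It is not LaminDB's actual hashing code: the helper name `feature_set_hash`, the choice of SHA-256, the comma separator, and the uid strings are all assumptions made for illustration. It only shows why hashing the set of uids gives the same identity regardless of order, and how many link rows each linking strategy would need.

```python
import hashlib


def feature_set_hash(feature_uids):
    """Order-independent hash of a collection of feature uids (hypothetical helper)."""
    # Deduplicate and sort so that order and repeats do not affect the result.
    canonical = ",".join(sorted(set(feature_uids)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Made-up uids: two datasets measure the same features in a different order,
# a third dataset measures only a subset.
dataset_a = ["uid_feat1", "uid_feat2", "uid_feat3"]
dataset_b = ["uid_feat3", "uid_feat1", "uid_feat2"]
dataset_c = ["uid_feat1", "uid_feat2"]

same_set = feature_set_hash(dataset_a) == feature_set_hash(dataset_b)
different_set = feature_set_hash(dataset_a) != feature_set_hash(dataset_c)

# Link rows needed when every dataset measures the same features:
# one row per dataset when linking to a shared feature set,
# one row per (dataset, feature) pair when linking to single features.
n_datasets, n_features = 1000, 20000
rows_per_set_link = n_datasets
rows_per_feature_link = n_datasets * n_features
```

Under these assumptions, `dataset_a` and `dataset_b` resolve to the same identifier, so both would link to one shared feature set, while `dataset_c` would get its own.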
See also
from_values()
Create from values.
from_df()
Create from dataframe columns.
Examples
Create a featureset from df with types:
>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)
Create a featureset from features:
>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)
Create a featureset from feature values:
>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()
Link a feature set to an artifact:
>>> artifact.features.add_feature_set(feature_set, slot="var")
Link features to an artifact (will create a featureset under the hood):
>>> artifact.features.add(features)
Properties
- members
A queryset for the individual records of the set.
Fields
- created_at DateTimeField
Time of creation of record.
- created_by ForeignKey
Creator of record, a User.
- run ForeignKey
Last run that created or updated the record, a Run.
- id AutoField
Internal id, valid only in one DB instance.
- uid CharField
A universal id (hash of the set of feature values).
- name CharField
A name (optional).
- n IntegerField
Number of features in the set.
- dtype CharField
Data type, e.g., “number”, “float”, “int”. Is None for Feature. For Feature, types are expected to be heterogeneous and defined on a per-feature level.
- registry CharField
The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'. Depending on the registry, .members stores, e.g., Feature or Gene records.
- hash CharField
The hash of the set.
Methods
- classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)
Create feature set for validated features.
- Return type:
FeatureSet | None
- classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None, raise_validation_error=True)
Create feature set for validated features.
- Parameters:
values (List[str] | Series | array) – A list of values, like feature names or ids.
field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.
type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.
name (str | None, default: None) – A name.
organism (str | Registry | None, default: None) – An organism to resolve gene mapping.
public_source (Registry | None, default: None) – A public ontology to resolve feature identifier mapping.
raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.
- Raises:
ValidationError – If some values are not valid.
- Return type:
Examples
>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> import bionty as bt
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)
- save(*args, **kwargs)
Save.
- Return type:
None