lamindb.FeatureSet

class lamindb.FeatureSet(features: Iterable[Registry], type: str | None = None, name: str | None = None)

Bases: Registry, TracksRun

Feature sets.

Stores references to sets of Feature and other registries that may be used to identify features (e.g., bionty.Gene or bionty.Protein).

Parameters:
  • features (Iterable[Registry]) – An iterable of Feature records to hash, e.g., [Feature(...), Feature(...)]. Is turned into a set upon instantiation. To pass values instead, use from_values() or from_df().

  • type (str | None, default: None) – The simple type. Defaults to None for sets of Feature records, and otherwise defaults to "number" (e.g., for sets of Gene).

  • name (str | None, default: None) – A name.

Note

Feature sets are useful because many datasets typically measure the same features. In LaminDB, all such datasets are linked against the exact same feature set. If you instead linked each dataset against single features (say, genes), the link tables would explode in size.
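As a rough illustration of the link-table argument, the following sketch counts link rows under both designs. The dataset and feature counts are hypothetical numbers chosen for the example, and the row counting is a simplification, not a description of LaminDB's actual schema.

```python
# Hypothetical numbers, chosen for illustration only.
n_datasets = 1_000
n_features = 20_000  # e.g., genes measured in every dataset

# Design A: link every dataset to every single feature.
# One link row per (dataset, feature) pair.
rows_single_features = n_datasets * n_features

# Design B: link every dataset to one shared feature set.
# One row per dataset, plus one row per member of the set.
rows_feature_set = n_datasets + n_features

print(rows_single_features)
print(rows_feature_set)
```

With these assumed numbers, the per-feature design needs roughly a thousand times more link rows than the shared feature set.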

A feature set is identified by the hash of the feature uids in the set.
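To make the idea of identifying a set by a hash of its member uids concrete, here is a minimal sketch. The function name `hash_uid_set`, the choice of SHA-256, the 20-character truncation, and the example uids are all assumptions for illustration; the hashing scheme LaminDB actually uses may differ.

```python
import hashlib


def hash_uid_set(uids):
    """Return an order-independent digest for a collection of uids.

    Hypothetical helper, not part of lamindb.
    """
    # Deduplicate and sort so that the same set of uids always yields
    # the same string, regardless of input order or repeats.
    # Assumes uids do not contain the separator character ",".
    canonical = ",".join(sorted(set(uids)))
    # Truncating the hex digest is an arbitrary choice for readability.
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:20]


# Made-up uids: same members in a different order give the same hash.
h1 = hash_uid_set(["a1B2c3", "x9Y8z7"])
h2 = hash_uid_set(["x9Y8z7", "a1B2c3", "a1B2c3"])
print(h1 == h2)
```

Because the digest depends only on the members, two datasets that measure the same features resolve to the same feature set.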

See also

from_values()

Create from values.

from_df()

Create from dataframe columns.

Examples

Create a feature set from a DataFrame with types:

>>> import lamindb as ln
>>> import pandas as pd
>>> df = pd.DataFrame({"feat1": [1, 2], "feat2": [3.1, 4.2], "feat3": ["cond1", "cond2"]})
>>> feature_set = ln.FeatureSet.from_df(df)

Create a feature set from features:

>>> features = ln.Feature.from_values(["feat1", "feat2"], type=float)
>>> feature_set = ln.FeatureSet(features)

Create a feature set from feature values:

>>> import bionty as bt
>>> feature_set = ln.FeatureSet.from_values(adata.var["ensemble_id"], bt.Gene.ensembl_gene_id, organism="mouse")
>>> feature_set.save()

Link a feature set to an artifact:

>>> artifact.features.add_feature_set(feature_set, slot="var")

Link features to an artifact (will create a feature set under the hood):

>>> artifact.features.add(features)

Properties

members

A queryset for the individual records of the set.

Fields

created_at DateTimeField

Time of creation of record.

created_by ForeignKey

Creator of record, a User.

run ForeignKey

Last run that created or updated the record, a Run.

id AutoField

Internal id, valid only in one DB instance.

uid CharField

A universal id (hash of the set of feature values).

name CharField

A name (optional).

n IntegerField

Number of features in the set.

dtype CharField

Data type, e.g., “number”, “float”, “int”. Is None for Feature.

For Feature, types are expected to be heterogeneous and defined on a per-feature level.

registry CharField

The registry that stores the feature identifiers, e.g., 'core.Feature' or 'bionty.Gene'.

Depending on the registry, .members stores, e.g., Feature or Gene records.

hash CharField

The hash of the set.

Methods

classmethod from_df(df, field=FieldAttr(Feature.name), name=None, mute=False, organism=None, public_source=None)

Create feature set for validated features.

Return type:

FeatureSet | None

classmethod from_values(values, field=FieldAttr(Feature.name), type=None, name=None, mute=False, organism=None, public_source=None, raise_validation_error=True)

Create feature set for validated features.

Parameters:
  • values (List[str] | Series | array) – A list of values, like feature names or ids.

  • field (DeferredAttribute, default: FieldAttr(Feature.name)) – The field of a reference registry to map values.

  • type (str | None, default: None) – The simple type. Defaults to None if reference registry is Feature, defaults to "float" otherwise.

  • name (str | None, default: None) – A name.

  • organism (str | Registry | None, default: None) – An organism to resolve gene mapping.

  • public_source (Registry | None, default: None) – A public ontology to resolve feature identifier mapping.

  • raise_validation_error (bool, default: True) – Whether to raise a validation error if some values are not valid.

Raises:

ValidationError – If some values are not valid.

Return type:

FeatureSet

Examples

>>> features = ["feat1", "feat2"]
>>> feature_set = ln.FeatureSet.from_values(features)
>>> genes = ["ENS980983409", "ENS980983410"]
>>> feature_set = ln.FeatureSet.from_values(genes, bt.Gene.ensembl_gene_id, float)
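The effect of raise_validation_error can be pictured with a small standalone sketch. Everything here is hypothetical: `split_validated`, the local `ValidationError` class, and the `known` reference values are stand-ins, not lamindb code. The parameter list above does not state what happens to invalid values when the flag is False; this sketch assumes they are simply dropped.

```python
class ValidationError(Exception):
    """Local stand-in so the sketch runs without lamindb installed."""


def split_validated(values, known_values, raise_validation_error=True):
    """Return the values found in known_values (hypothetical helper).

    Raises ValidationError if any value is unknown and the flag is True.
    With the flag set to False, unknown values are dropped (an assumption).
    """
    valid = [v for v in values if v in known_values]
    invalid = [v for v in values if v not in known_values]
    if invalid and raise_validation_error:
        raise ValidationError(f"{len(invalid)} value(s) are not valid: {invalid}")
    return valid


known = {"feat1", "feat2"}  # made-up reference values

# All values are known: nothing is raised.
print(split_validated(["feat1", "feat2"], known))

# One unknown value with the default flag: an error is raised.
try:
    split_validated(["feat1", "typo"], known)
except ValidationError as error:
    print(f"ValidationError: {error}")

# Same input with the flag off: the unknown value is dropped in this sketch.
print(split_validated(["feat1", "typo"], known, raise_validation_error=False))
```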


save(*args, **kwargs)

Save.

Return type:

None