Tutorial: Features & labels¶

In Tutorial: Artifacts, we learned about provenance and data access. Here, we walk through validating & annotating datasets with features & labels to improve:

Findability: Which collections measured expression of cell marker CD14? Which characterized cell line K562? Which collections have a test & train split? Etc.
Usability: Are there typos in feature names? Are there typos in sampled labels? Are types and units of features consistent? Etc.

Hint

This is a low-level tutorial aimed at a basic understanding of registering features and labels for annotation & validation.

If you’re just looking to readily validate and annotate a dataset with features and labels, see this guide: Validate, standardize & annotate.

import lamindb as ln
import pandas as pd

ln.settings.verbosity = "hint"

Re-cap¶

Let’s briefly re-cap what we learned in Introduction. We started with simple labeling:

# create a label
study0 = ln.ULabel(name="Study 0: initial plant gathering", description="My initial study").save()
# query an artifact from the previous tutorial
artifact = ln.Artifact.filter(key="iris_studies/study0_raw_images").one()
# label the artifact
artifact.labels.add(study0)
# look at artifact metadata
artifact.describe()

In general, it’s good practice to associate labels with features so that we can later feed them into learning algorithms with a defined dimension:

feature = ln.Feature(name="study_name", dtype="cat").save()
artifact.labels.add(study0, feature)
artifact.describe()

✅ linked feature 'study_name' to registry 'ULabel'

✅ linked new feature 'study_name' together with new feature set FeatureSet(uid='AVSJ4LiRpONRelwyWJl5', n=1, registry='Feature', hash='mWcNd6CmMPMB0RzbKYFS', created_by_id=1)

Artifact(updated_at=2024-05-19 23:23:24 UTC, uid='UI4dKwLK1iDoIAWMvJ4j', key='iris_studies/study0_raw_images', suffix='', size=656692, hash='wVYKPpEsmmrqSpAZIRXCFg', hash_type='md5-d', n_objects=51, visibility=1, key_is_virtual=False)

Provenance:
  📎 created_by: User(uid='00000000', handle='anonymous')
  📎 storage: uid='EzvEnPnH', root='s3://lamindb-dev-datasets', type='s3', region='us-east-1', instance_uid='pZ1VQkyD3haH')
  📎 transform: Transform(version='0', uid='NJvdsWWbJlZS6K79', name='Tutorial: Artifacts', key='tutorial', type='notebook')
  📎 run: Run(uid='VBT7VLzVRSYnVrx1ESJm', started_at=2024-05-19 23:23:21 UTC, is_consecutive=True)
Features:
  external: FeatureSet(uid='AVSJ4LiRpONRelwyWJl5', n=1, registry='Feature')
    🔗 study_name (1, cat[ULabel]): 'Study 0: initial plant gathering'
Labels:
  📎 ulabels (1, ULabel): 'Study 0: initial plant gathering'

Register metadata¶

Features and labels are the primary ways of registering domain-knowledge related metadata in LaminDB.

Features represent measurement dimensions (e.g. "species") and labels represent measured values (e.g. "iris setosa", "iris versicolor", "iris virginica").

In statistics, you’d say a feature is a categorical or numerical variable while a label is a simple category. Categorical variables draw their values from a set of categories.
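
To make the distinction concrete, here is a minimal plain-Python sketch. It is not LaminDB code, and the names `categories`, `observations` and `one_hot` are made up for illustration. The feature is the variable, the labels are the categories it can take, and fixing the category list up front gives every observation an encoding of the same length.

```python
# Illustrative only: a feature is a variable, labels are the categories it can take.
feature_name = "species"
categories = ["iris setosa", "iris versicolor", "iris virginica"]  # the labels

# Observed values of the feature, one per sample (toy data).
observations = ["iris setosa", "iris virginica", "iris setosa"]


def one_hot(value: str, categories: list[str]) -> list[int]:
    """Encode one categorical value as a vector with len(categories) entries."""
    if value not in categories:
        raise ValueError(f"{value!r} is not a registered category")
    return [1 if value == category else 0 for category in categories]


encoded = [one_hot(value, categories) for value in observations]
print(feature_name, encoded)
```

Because the categories are registered once, a typo such as "iris setsa" raises an error instead of silently adding a new dimension.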

Register labels¶

We study 3 species of the Iris plant: setosa, versicolor & virginica. Let’s create 3 labels with ULabel.

speciess = [ln.ULabel(name=name) for name in ["setosa", "versicolor", "virginica"]]
ln.save(speciess)

ULabel lets you manage an in-house ontology of all kinds of generic labels.

Anticipating that we’ll have many different labels when working with more data, we’d like to express that all 3 labels are species labels:

is_species = ln.ULabel(name="is_species").save()
is_species.children.set(speciess)
is_species.view_parents(with_children=True)

studies = [ln.ULabel(name=name) for name in ["study0", "study1", "study2"]]
ln.save(studies)
is_study = ln.ULabel(name="is_study").save()
is_study.children.set(studies)
is_study.view_parents(with_children=True)
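
Conceptually, the parent labels above act as groups for their children. The following plain-Python sketch shows that idea with a dictionary. It is an illustration under that assumption, not how LaminDB stores the relationship, and `children_of` and `parents_of` are hypothetical names.

```python
# Illustrative only: parent labels group child labels, as is_species groups the species.
children_of = {
    "is_species": {"setosa", "versicolor", "virginica"},
    "is_study": {"study0", "study1", "study2"},
}


def parents_of(label: str) -> set[str]:
    """Return every parent label that lists `label` as a child."""
    return {parent for parent, children in children_of.items() if label in children}


print(parents_of("setosa"))
print(sorted(children_of["is_study"]))
```

Grouping labels this way is what later allows a query such as "all labels whose parent is is_species".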

Register features¶

For every set of studied labels (measured values), we typically also want an identifier for the corresponding measurement dimension: the feature.

When we integrate datasets, feature names will label columns that store data.

Let’s create and save a Feature record to identify measurements of the iris species label (the study_name feature for the study was already created in the re-cap):

ln.Feature(name="iris_species_name", dtype="cat").save()

# create a lookup object so that we can access features with auto-complete
features = ln.Feature.lookup()

Validate & link labels¶

We already looked at the metadata for study0 before:

meta_artifact = ln.Artifact.filter(key="iris_studies/study0_raw_images/meta.csv").one()
meta = meta_artifact.load(index_col=0)  # load a dataframe
meta.head()

💡 you can auto-track these data as a run input by calling `ln.track()`

	0	1
0	iris-0797945218a97d6e5251b4758a2ba1b418cbd52ce...	setosa
1	iris-0f133861ea3fe1b68f9f1b59ebd9116ff963ee710...	versicolor
2	iris-9ffe51c2abd973d25a299647fa9ccaf6aa9c8eecf...	versicolor
3	iris-83f433381b755101b9fc9fbc9743e35fbb8a1a109...	setosa
4	iris-bdae8314e4385d8e2322abd8e63a82758a9063c77...	virginica

Validate metadata¶

Depending on the data generation process, such metadata might or might not match the labels we defined in our registries.

Let’s validate the labels by mapping the values stored in the artifact against the ULabel registry:

ln.ULabel.validate(meta["1"], field="name")

Everything passed and no fixes are needed!

If validation doesn’t pass, standardize() and inspect() will help standardize data.
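
The idea behind validation can be sketched in plain Python as a membership test against the registered names, with standardization as a lookup in a synonym table. This is an assumption-laden illustration: the functions below only borrow the names `validate` and `standardize`, the `synonyms` table is hypothetical, and the real LaminDB methods may behave differently.

```python
# Illustrative only: validation as membership of each value in the set of registered names.
registered_names = {"setosa", "versicolor", "virginica"}
# Hypothetical synonym table used to standardize known variants.
synonyms = {"Setosa": "setosa", "iris versicolor": "versicolor"}


def validate(values: list[str]) -> list[bool]:
    """Return one boolean per value: True if it matches a registered name."""
    return [value in registered_names for value in values]


def standardize(values: list[str]) -> list[str]:
    """Replace known synonyms with the registered name; leave other values as they are."""
    return [synonyms.get(value, value) for value in values]


raw = ["setosa", "Setosa", "virginca"]
print(validate(raw))
print(validate(standardize(raw)))
```

In this toy example, standardization repairs the capitalized variant, while the typo "virginca" still fails validation and needs a manual fix.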

Label artifacts¶

You can label an artifact by calling artifact.labels.add() and passing one or more labels and, optionally, the corresponding feature.

Let’s do this based on the labels in meta.csv:

ln.Artifact.df()

	version	created_at	created_by_id	updated_at	uid	storage_id	key	suffix	accessor	description	size	hash	hash_type	n_objects	n_observations	transform_id	run_id	visibility	key_is_virtual
id
4	None	2024-05-19 23:23:24.836147+00:00	1	2024-05-19 23:23:24.836198+00:00	62uS4kSWwh553bVPhdkJ	2	iris_studies/study2_raw_images		None	None	665518	PX8Vt9T28y-uCEJO1tKm7A	md5-d	51.0	None	1	1	1	False
3	None	2024-05-19 23:23:24.646038+00:00	1	2024-05-19 23:23:24.646092+00:00	nE7BMpA4cluio47JdU66	2	iris_studies/study1_raw_images		None	None	640617	j61W__GgImA18CKrIf7FVg	md5-d	49.0	None	1	1	1	False
2	None	2024-05-19 23:23:24.280523+00:00	1	2024-05-19 23:23:24.280576+00:00	UI4dKwLK1iDoIAWMvJ4j	2	iris_studies/study0_raw_images		None	None	656692	wVYKPpEsmmrqSpAZIRXCFg	md5-d	51.0	None	1	1	1	False
1	None	2024-05-19 23:23:23.817202+00:00	1	2024-05-19 23:23:23.817253+00:00	5ruHcLPvlViem0z49X9f	2	iris_studies/study0_raw_images/meta.csv	.csv	None	None	4355	ZpAEpN0iFYH6vjZNigic7g	md5	NaN	None	1	1	1	False

# sort both query sets so that zip() pairs each study artifact with its own study label
study_artifacts = ln.Artifact.filter(key__startswith="iris_studies/", suffix="").order_by("key")
study_labels = ln.ULabel.filter(name="is_study").one().children.order_by("name")
for artifact, study in zip(study_artifacts, study_labels):
    # annotate with the study label
    artifact.labels.add(study, feature=features.study_name)
    # read the species labels from the metadata file stored with the images
    df = pd.read_csv(artifact.path / "meta.csv", index_col=0)
    species_labels = ln.ULabel.from_values(df["1"].unique())
    artifact.labels.add(species_labels, feature=features.iris_species_name)

Query artifacts by labels¶

Using the new annotations, you can now query image artifacts by species & study labels:

ulabels = ln.ULabel.lookup()
artifact = ln.Artifact.filter(ulabels=ulabels.study0).first()

We also see them when calling describe():

artifact.describe()
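
Why do annotations make artifacts findable? One way to picture it is an inverted index from labels to artifacts. The sketch below is plain Python with made-up artifact keys and annotations; it does not reflect how LaminDB executes queries.

```python
from collections import defaultdict

# Illustrative only: made-up artifacts and the labels they are annotated with.
annotations = {
    "artifact_a": {"study0", "setosa", "virginica"},
    "artifact_b": {"study1", "setosa"},
    "artifact_c": {"study1", "versicolor"},
}

# Invert the mapping so that a label points to all artifacts annotated with it.
artifacts_by_label = defaultdict(set)
for key, labels in annotations.items():
    for label in labels:
        artifacts_by_label[label].add(key)


def query(*labels: str) -> set[str]:
    """Return the artifacts annotated with all of the given labels."""
    matches = [artifacts_by_label.get(label, set()) for label in labels]
    return set.intersection(*matches) if matches else set()


print(sorted(query("setosa")))
print(sorted(query("study1", "setosa")))
```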

Label collections¶

Labeling collections works in the same way as labeling artifacts:

collection = ln.Collection.filter(name__startswith="Iris collection", version="1").one()
collection.labels.add(ulabels.study0, feature=features.study_name)
all_species_labels = ln.ULabel.filter(parents__name="is_species").all()
collection.labels.add(all_species_labels, feature=features.iris_species_name)

collection.describe()

Run an ML model¶

Let’s now run a mock ML model that transforms the images into 4 high-level features.

def run_ml_model() -> pd.DataFrame:
    transform = ln.Transform(name="Petal & sepal regressor", type="pipeline")
    ln.track(transform=transform)
    input_data = ln.Collection.filter(name__startswith="Iris collection", version="1").one()
    input_paths = [
        path.download_to(path.name) for path in input_data.artifacts[0].path.glob("*")
    ]
    # apply ML model
    output_data = ln.core.datasets.df_iris_in_meter_study1()
    return output_data


df = run_ml_model()

The output is a dataframe:

df.head()

	sepal_length	sepal_width	petal_length	petal_width	iris_organism_name
0	0.051	0.035	0.014	0.002	setosa
1	0.049	0.030	0.014	0.002	setosa
2	0.047	0.032	0.013	0.002	setosa
3	0.046	0.031	0.015	0.002	setosa
4	0.050	0.036	0.014	0.002	setosa

And this is the pipeline that produced the dataframe:

ln.core.run_context.transform.view_parents()

Register the output data¶

Let’s first register the features of the transformed data:

new_features = ln.Feature.from_df(df)
ln.save(new_features)
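
Registering features from a dataframe amounts to deriving one name and one data type per column. The following plain-Python sketch shows one possible inference rule that is consistent with the dtypes listed in the Feature table of this tutorial (float for numeric columns, cat for string columns). The rule itself is an assumption: the source does not describe how from_df infers types, and `infer_dtype` is a hypothetical helper.

```python
# Illustrative only: columns of the model output, with a few values taken from df.head().
columns = {
    "sepal_length": [0.051, 0.049],
    "petal_width": [0.002, 0.002],
    "iris_organism_name": ["setosa", "setosa"],
}


def infer_dtype(values: list) -> str:
    """Return 'float' for all-float columns, 'int' for all-int columns, else 'cat'."""
    if all(isinstance(value, float) for value in values):
        return "float"
    if all(isinstance(value, int) and not isinstance(value, bool) for value in values):
        return "int"
    return "cat"


inferred = {name: infer_dtype(values) for name, values in columns.items()}
print(inferred)
```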

We can now validate & register the dataframe in one line:

artifact = ln.Artifact.from_df(
    df,
    description="Iris study 1 - after measuring sepal & petal metrics",
)
artifact.save()

artifact.features.add_feature_set(ln.FeatureSet(new_features), slot="columns")

Feature sets¶

Get an overview of linked features:

artifact.features

You’ll see that they’re always grouped in sets that correspond to records of FeatureSet.

A slot provides a string key to access feature sets. It’s typically the accessor within the registered data object, here pd.DataFrame.columns.
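
A rough mental model: a feature set is identified by the features it contains, independent of their order, and a slot is the dictionary key under which it is stored. The sketch below uses a SHA-256 hash of the sorted feature names as that identity. This hashing scheme is an assumption made for illustration; the source does not describe how LaminDB computes the hash field of FeatureSet.

```python
import hashlib


def feature_set_hash(feature_names: list[str]) -> str:
    """Hash the sorted names so the same set of features always gives the same id."""
    joined = ",".join(sorted(feature_names))
    return hashlib.sha256(joined.encode()).hexdigest()[:20]


# Slots are string keys that point to a feature set.
feature_sets = {
    "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width", "iris_organism_name"],
    "external": ["study_name", "iris_species_name"],
}
hashes = {slot: feature_set_hash(names) for slot, names in feature_sets.items()}
print(hashes)
```

With an order-independent identity, two dataframes with the same columns in a different order map to the same feature set.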

Let’s use it to access all linked features:

artifact.features["columns"].df()

	created_at	created_by_id	run_id	updated_at	uid	name	dtype	unit	description	synonyms
id
3	2024-05-19 23:23:32.402119+00:00	1	None	2024-05-19 23:23:32.402192+00:00	a8s7SXkNIsRI	sepal_length	float	None	None	None
4	2024-05-19 23:23:32.402285+00:00	1	None	2024-05-19 23:23:32.402327+00:00	Xe5Tq6Dbvrb1	sepal_width	float	None	None	None
5	2024-05-19 23:23:32.402411+00:00	1	None	2024-05-19 23:23:32.402450+00:00	CByVI6TFPVMy	petal_length	float	None	None	None
6	2024-05-19 23:23:32.402534+00:00	1	None	2024-05-19 23:23:32.402574+00:00	drQTElEw1rf5	petal_width	float	None	None	None
7	2024-05-19 23:23:32.402656+00:00	1	None	2024-05-19 23:23:32.402695+00:00	lBmzw6jmRUgN	iris_organism_name	cat	None	None	None

There is one categorical feature. Let’s add the species labels:

species_labels = ln.ULabel.filter(parents__name="is_species").all()
artifact.labels.add(species_labels, feature=features.iris_species_name)

Let’s now add study labels:

artifact.labels.add(ulabels.study0, feature=features.study_name)

In addition to the columns feature set, we now have an external feature set:

artifact.features

This is the context for our artifact:

artifact.describe()
artifact.view_lineage()

See the database content:

ln.view(registries=["Feature", "FeatureSet", "ULabel"])

Feature

	created_at	created_by_id	run_id	updated_at	uid	name	dtype	unit	description	synonyms
id
7	2024-05-19 23:23:32.402656+00:00	1	None	2024-05-19 23:23:32.402695+00:00	lBmzw6jmRUgN	iris_organism_name	cat	None	None	None
6	2024-05-19 23:23:32.402534+00:00	1	None	2024-05-19 23:23:32.402574+00:00	drQTElEw1rf5	petal_width	float	None	None	None
5	2024-05-19 23:23:32.402411+00:00	1	None	2024-05-19 23:23:32.402450+00:00	CByVI6TFPVMy	petal_length	float	None	None	None
4	2024-05-19 23:23:32.402285+00:00	1	None	2024-05-19 23:23:32.402327+00:00	Xe5Tq6Dbvrb1	sepal_width	float	None	None	None
3	2024-05-19 23:23:32.402119+00:00	1	None	2024-05-19 23:23:32.402192+00:00	a8s7SXkNIsRI	sepal_length	float	None	None	None
2	2024-05-19 23:23:27.967153+00:00	1	None	2024-05-19 23:23:28.815905+00:00	0AUm4iYF8CpE	iris_species_name	cat[ULabel]	None	None	None
1	2024-05-19 23:23:27.748427+00:00	1	None	2024-05-19 23:23:27.757144+00:00	wCwcODTISR4C	study_name	cat[ULabel]	None	None	None

FeatureSet

	created_at	created_by_id	run_id	uid	name	n	dtype	registry	hash
id
2	2024-05-19 23:23:28.821520+00:00	1	None	QZRqz9cFg6SmJxzBU9wO	None	2	None	Feature	58lX_dBcok06ZlN12ryt
4	2024-05-19 23:23:32.433112+00:00	1	None	89vRDkS1DsNXvKWh80Hv	None	5	None	Feature	bHnAxI79Pu6350MpTFQN
5	2024-05-19 23:23:32.489564+00:00	1	None	oarAFpPXtq3yKlqmjTkS	None	1	None	Feature	pr7SYbKy1OLWX2q1FMAe

ULabel

	created_at	created_by_id	run_id	updated_at	uid	name	description	reference	reference_type
id
9	2024-05-19 23:23:27.906120+00:00	1	None	2024-05-19 23:23:27.906188+00:00	ELICS2S0	is_study	None	None	None
8	2024-05-19 23:23:27.902991+00:00	1	None	2024-05-19 23:23:27.903032+00:00	7nSTk5j6	study2	None	None	None
7	2024-05-19 23:23:27.902874+00:00	1	None	2024-05-19 23:23:27.902916+00:00	MSMDe9ja	study1	None	None	None
6	2024-05-19 23:23:27.902723+00:00	1	None	2024-05-19 23:23:27.902794+00:00	QqzvhQoW	study0	None	None	None
5	2024-05-19 23:23:27.816868+00:00	1	None	2024-05-19 23:23:27.816938+00:00	PIHQsKzj	is_species	None	None	None
4	2024-05-19 23:23:27.806211+00:00	1	None	2024-05-19 23:23:27.806254+00:00	8fXKs7fs	virginica	None	None	None
3	2024-05-19 23:23:27.806091+00:00	1	None	2024-05-19 23:23:27.806134+00:00	JBfBT1AI	versicolor	None	None	None

Manage follow-up data¶

Assume that a couple of weeks later, we receive a new dataset in a follow-up study 2.

Let’s track a new analysis:

ln.settings.transform.stem_uid = "dMtrt8YMSdl6"
ln.settings.transform.version = "1"
ln.track()

Register a joint collection¶

Assume we already ran all preprocessing including the ML model.

We get a DataFrame and store it as an artifact:

df = ln.core.datasets.df_iris_in_meter_study2()
ln.Artifact.from_df(df, description="Iris study 2 - transformed").save()

Let’s load it:

artifact2 = ln.Artifact.filter(description="Iris study 2 - transformed").one()

We can now store the joint collection:

collection = ln.Collection(
    [artifact, artifact2], name="Iris flower study 1 & 2 - transformed"
)
collection.save()

Auto-concatenate datasets¶

Because both datasets measured the same validated feature set, we can auto-concatenate the collection:

collection.load().tail()

	sepal_length	sepal_width	petal_length	petal_width	iris_organism_name
145	0.067	0.030	0.052	0.023	virginica
146	0.063	0.025	0.050	0.019	virginica
147	0.065	0.030	0.052	0.020	virginica
148	0.062	0.034	0.054	0.023	virginica
149	0.059	0.030	0.051	0.018	virginica
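
The precondition for concatenation, namely that all datasets share the same feature set, can be sketched in plain Python. The rows below are toy data, and `same_feature_set` and `concatenate` are hypothetical helpers rather than LaminDB functions.

```python
# Illustrative only: two datasets as lists of rows (dicts keyed by feature name).
study1 = [{"sepal_length": 0.051, "iris_organism_name": "setosa"}]
study2 = [{"sepal_length": 0.059, "iris_organism_name": "virginica"}]


def same_feature_set(*datasets: list[dict]) -> bool:
    """Check that every row in every dataset has exactly the same column names."""
    column_sets = {frozenset(row) for dataset in datasets for row in dataset}
    return len(column_sets) <= 1


def concatenate(*datasets: list[dict]) -> list[dict]:
    """Stack the rows of all datasets, refusing to mix different feature sets."""
    if not same_feature_set(*datasets):
        raise ValueError("datasets do not share the same feature set")
    return [row for dataset in datasets for row in dataset]


joint = concatenate(study1, study2)
print(len(joint), joint[-1])
```

Checking the columns first avoids a joint table with partially empty columns when one dataset measured different features.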

We can also access & query the underlying two artifact objects:

collection.artifacts.df()

	version	created_at	created_by_id	updated_at	uid	storage_id	key	suffix	accessor	description	size	hash	hash_type	n_objects	n_observations	transform_id	run_id	visibility	key_is_virtual
id
5	None	2024-05-19 23:23:32.418732+00:00	1	2024-05-19 23:23:32.418781+00:00	SlGWQhSRAsBBpNFCi64d	1	None	.parquet	DataFrame	Iris study 1 - after measuring sepal & petal m...	5347	zMBDnOFHeA8CwpaI_7KF9g	md5	None	None	2	2	1	True
6	None	2024-05-19 23:23:33.898346+00:00	1	2024-05-19 23:23:33.898402+00:00	G76wEGtq4G8Hku8MjZrz	1	None	.parquet	DataFrame	Iris study 2 - transformed	5397	1OWu4rEeeob4-ZdGnLhTLw	md5	None	None	3	3	1	True

Or look at their data lineage:

collection.view_lineage()

Or look at the database:

ln.view()

Artifact

	version	created_at	created_by_id	updated_at	uid	storage_id	key	suffix	accessor	description	size	hash	hash_type	n_objects	n_observations	transform_id	run_id	visibility	key_is_virtual
id
6	None	2024-05-19 23:23:33.898346+00:00	1	2024-05-19 23:23:33.898402+00:00	G76wEGtq4G8Hku8MjZrz	1	None	.parquet	DataFrame	Iris study 2 - transformed	5397	1OWu4rEeeob4-ZdGnLhTLw	md5	NaN	None	3	3	1	True
5	None	2024-05-19 23:23:32.418732+00:00	1	2024-05-19 23:23:32.418781+00:00	SlGWQhSRAsBBpNFCi64d	1	None	.parquet	DataFrame	Iris study 1 - after measuring sepal & petal m...	5347	zMBDnOFHeA8CwpaI_7KF9g	md5	NaN	None	2	2	1	True
4	None	2024-05-19 23:23:24.836147+00:00	1	2024-05-19 23:23:24.836198+00:00	62uS4kSWwh553bVPhdkJ	2	iris_studies/study2_raw_images		None	None	665518	PX8Vt9T28y-uCEJO1tKm7A	md5-d	51.0	None	1	1	1	False
3	None	2024-05-19 23:23:24.646038+00:00	1	2024-05-19 23:23:24.646092+00:00	nE7BMpA4cluio47JdU66	2	iris_studies/study1_raw_images		None	None	640617	j61W__GgImA18CKrIf7FVg	md5-d	49.0	None	1	1	1	False
2	None	2024-05-19 23:23:24.280523+00:00	1	2024-05-19 23:23:24.280576+00:00	UI4dKwLK1iDoIAWMvJ4j	2	iris_studies/study0_raw_images		None	None	656692	wVYKPpEsmmrqSpAZIRXCFg	md5-d	51.0	None	1	1	1	False
1	None	2024-05-19 23:23:23.817202+00:00	1	2024-05-19 23:23:23.817253+00:00	5ruHcLPvlViem0z49X9f	2	iris_studies/study0_raw_images/meta.csv	.csv	None	None	4355	ZpAEpN0iFYH6vjZNigic7g	md5	NaN	None	1	1	1	False

Collection

	version	created_at	created_by_id	updated_at	uid	name	description	hash	reference	reference_type	transform_id	run_id	artifact_id	visibility
id
4	None	2024-05-19 23:23:33.936147+00:00	1	2024-05-19 23:23:33.936192+00:00	zP8h0uQuKg8VAe9kxWvX	Iris flower study 1 & 2 - transformed	None	I5HABiYOqx1fjRZ2be5E	None	None	3	3	None	1
3	3	2024-05-19 23:23:24.843549+00:00	1	2024-05-19 23:23:24.843593+00:00	w7p8GWvoDRJViSLvgHz0	Iris collection	Now includes study2_raw_images	T-U8z2Zi5rFYdAD9pzmS	None	None	1	1	None	1
2	2	2024-05-19 23:23:24.653688+00:00	1	2024-05-19 23:23:24.653733+00:00	w7p8GWvoDRJViSLv1rKm	Iris collection	Now includes study1_raw_images	5cCK6ZLOPB0cV3tyeZup	None	None	1	1	None	1
1	1	2024-05-19 23:23:24.467012+00:00	1	2024-05-19 23:23:24.467066+00:00	w7p8GWvoDRJViSLvunTx	Iris collection	Iris study 0	WwFLpNFmK8GMC2dSGj1W	None	None	1	1	None	1

Feature

	created_at	created_by_id	run_id	updated_at	uid	name	dtype	unit	description	synonyms
id
7	2024-05-19 23:23:32.402656+00:00	1	None	2024-05-19 23:23:32.402695+00:00	lBmzw6jmRUgN	iris_organism_name	cat	None	None	None
6	2024-05-19 23:23:32.402534+00:00	1	None	2024-05-19 23:23:32.402574+00:00	drQTElEw1rf5	petal_width	float	None	None	None
5	2024-05-19 23:23:32.402411+00:00	1	None	2024-05-19 23:23:32.402450+00:00	CByVI6TFPVMy	petal_length	float	None	None	None
4	2024-05-19 23:23:32.402285+00:00	1	None	2024-05-19 23:23:32.402327+00:00	Xe5Tq6Dbvrb1	sepal_width	float	None	None	None
3	2024-05-19 23:23:32.402119+00:00	1	None	2024-05-19 23:23:32.402192+00:00	a8s7SXkNIsRI	sepal_length	float	None	None	None
2	2024-05-19 23:23:27.967153+00:00	1	None	2024-05-19 23:23:28.815905+00:00	0AUm4iYF8CpE	iris_species_name	cat[ULabel]	None	None	None
1	2024-05-19 23:23:27.748427+00:00	1	None	2024-05-19 23:23:27.757144+00:00	wCwcODTISR4C	study_name	cat[ULabel]	None	None	None

FeatureSet

	created_at	created_by_id	run_id	uid	name	n	dtype	registry	hash
id
2	2024-05-19 23:23:28.821520+00:00	1	None	QZRqz9cFg6SmJxzBU9wO	None	2	None	Feature	58lX_dBcok06ZlN12ryt
4	2024-05-19 23:23:32.433112+00:00	1	None	89vRDkS1DsNXvKWh80Hv	None	5	None	Feature	bHnAxI79Pu6350MpTFQN
5	2024-05-19 23:23:32.489564+00:00	1	None	oarAFpPXtq3yKlqmjTkS	None	1	None	Feature	pr7SYbKy1OLWX2q1FMAe

Run

	uid	transform_id	started_at	finished_at	created_by_id	report_id	environment_id	is_consecutive	reference	reference_type	created_at
id
1	VBT7VLzVRSYnVrx1ESJm	1	2024-05-19 23:23:21.424037+00:00	None	1	None	None	True	None	None	2024-05-19 23:23:21.424162+00:00
2	0LiZ43vXlo5liMUV0RSD	2	2024-05-19 23:23:29.190316+00:00	None	1	None	None	True	None	None	2024-05-19 23:23:29.190437+00:00
3	nadRBLpmOBWAh8BdZs61	3	2024-05-19 23:23:32.914125+00:00	None	1	None	None	True	None	None	2024-05-19 23:23:32.914248+00:00

Storage

	created_at	created_by_id	run_id	updated_at	uid	root	description	type	region	instance_uid
id
2	2024-05-19 23:23:23.780440+00:00	1	None	2024-05-19 23:23:23.780523+00:00	EzvEnPnH	s3://lamindb-dev-datasets	None	s3	us-east-1	pZ1VQkyD3haH
1	2024-05-19 23:23:19.714345+00:00	1	None	2024-05-19 23:23:19.714408+00:00	fgwMZj1JIHOy	/home/runner/work/lamindb/lamindb/docs/lamin-t...	None	local	None	5WuFt3cW4zRx

Transform

	version	uid	name	key	description	type	latest_report_id	source_code_id	reference	reference_type	created_at	updated_at	created_by_id
id
3	1	dMtrt8YMSdl65zKv	Tutorial: Features & labels	tutorial2	None	notebook	None	None	None	None	2024-05-19 23:23:32.907670+00:00	2024-05-19 23:23:32.907714+00:00	1
2	None	gzL8KJt6OelI	Petal & sepal regressor	None	None	pipeline	None	None	None	None	2024-05-19 23:23:29.186701+00:00	2024-05-19 23:23:29.186726+00:00	1
1	0	NJvdsWWbJlZS6K79	Tutorial: Artifacts	tutorial	None	notebook	None	None	None	None	2024-05-19 23:23:21.416956+00:00	2024-05-19 23:23:21.416999+00:00	1

ULabel

	created_at	created_by_id	run_id	updated_at	uid	name	description	reference	reference_type
id
9	2024-05-19 23:23:27.906120+00:00	1	None	2024-05-19 23:23:27.906188+00:00	ELICS2S0	is_study	None	None	None
8	2024-05-19 23:23:27.902991+00:00	1	None	2024-05-19 23:23:27.903032+00:00	7nSTk5j6	study2	None	None	None
7	2024-05-19 23:23:27.902874+00:00	1	None	2024-05-19 23:23:27.902916+00:00	MSMDe9ja	study1	None	None	None
6	2024-05-19 23:23:27.902723+00:00	1	None	2024-05-19 23:23:27.902794+00:00	QqzvhQoW	study0	None	None	None
5	2024-05-19 23:23:27.816868+00:00	1	None	2024-05-19 23:23:27.816938+00:00	PIHQsKzj	is_species	None	None	None
4	2024-05-19 23:23:27.806211+00:00	1	None	2024-05-19 23:23:27.806254+00:00	8fXKs7fs	virginica	None	None	None
3	2024-05-19 23:23:27.806091+00:00	1	None	2024-05-19 23:23:27.806134+00:00	JBfBT1AI	versicolor	None	None	None

User

	uid	handle	name	created_at	updated_at
id
1	00000000	anonymous	None	2024-05-19 23:23:19.709467+00:00	2024-05-19 23:23:19.709492+00:00

This is it! 😅

If you’re interested, please check out guides & use cases or open an issue on GitHub to discuss.

Appendix¶

Manage metadata¶

Avoid duplicates¶

Let’s create a label "project1":

ln.ULabel(name="project1").save()

We just created a project1 label. Let’s see what happens if we try to create it again:

label = ln.ULabel(name="project1")
label.save()

Instead of creating a new record, LaminDB loads and returns the existing record from the database.

If there is no exact match, LaminDB warns you about potential duplicates when you create a record.

Say, we spell “project 1” with a white space:

ln.ULabel(name="project 1")

To avoid inserting duplicates when creating new records, a search checks whether a similar record already exists.

You can switch it off for performance gains via upon_create_search_names.
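
To see why such a search catches "project 1" but costs time on every record creation, here is a sketch of a similarity search using Python's standard difflib module. difflib is a stand-in chosen for illustration; the source does not say which search algorithm LaminDB uses, and `existing_names` and `find_similar` are hypothetical.

```python
from difflib import get_close_matches

# Illustrative only: names that already exist in a registry.
existing_names = ["project1", "study0", "setosa"]


def find_similar(name: str, cutoff: float = 0.8) -> list[str]:
    """Return existing names that are close to, but not identical with, `name`."""
    candidates = [existing for existing in existing_names if existing != name]
    return get_close_matches(name, candidates, n=3, cutoff=cutoff)


print(find_similar("project 1"))
print(find_similar("virginica"))
```

Each call compares the new name against every existing name, which is the kind of overhead you avoid by switching the search off.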

Update & delete records¶

label = ln.ULabel.filter(name="project1").first()
label

label.name = "project1a"
label.save()
label

label.delete()

Manage storage¶

Change default storage¶

The default storage location is:

ln.settings.storage

You can change it by setting ln.settings.storage = "s3://my-bucket".

See all storage locations¶

ln.Storage.df()

	created_at	created_by_id	run_id	updated_at	uid	root	description	type	region	instance_uid
id
2	2024-05-19 23:23:23.780440+00:00	1	None	2024-05-19 23:23:23.780523+00:00	EzvEnPnH	s3://lamindb-dev-datasets	None	s3	us-east-1	pZ1VQkyD3haH
1	2024-05-19 23:23:19.714345+00:00	1	None	2024-05-19 23:23:19.714408+00:00	fgwMZj1JIHOy	/home/runner/work/lamindb/lamindb/docs/lamin-t...	None	local	None	5WuFt3cW4zRx