Query & search registries¶
Find & access data using registries.
Setup¶
!lamin init --storage ./mydata
💡 connected lamindb: testuser1/mydata
import lamindb as ln
ln.settings.verbosity = "info"
💡 connected lamindb: testuser1/mydata
We’ll need some toy data:
ln.Artifact(ln.core.datasets.file_jpg_paradisi05(), description="My image").save()
ln.Artifact.from_df(ln.core.datasets.df_iris(), description="The iris collection").save()
ln.Artifact(ln.core.datasets.file_fastq(), description="My fastq").save()
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact '08xrlE4OG68OqFT5uNK8' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/08xrlE4OG68OqFT5uNK8.jpg'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'VGLlUyzsmgMDTOywZNxC' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/VGLlUyzsmgMDTOywZNxC.parquet'
❗ no run & transform get linked, consider calling ln.track()
✅ storing artifact 'QVDzUN8FLQs5plEucqQ1' at '/home/runner/work/lamindb/lamindb/docs/mydata/.lamindb/QVDzUN8FLQs5plEucqQ1.fastq.gz'
Artifact(updated_at=2024-05-20 08:58:08 UTC, uid='QVDzUN8FLQs5plEucqQ1', suffix='.fastq.gz', description='My fastq', size=20, hash='hi7ZmAzz8sfMd3vIQr-57Q', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1)
Look up metadata¶
For registries that hold fewer than about 100k records, a lookup object can be a convenient way of selecting a record.
Consider the User registry:
users = ln.User.lookup(field="handle")
With auto-complete, we find a user:
user = users.testuser1
user
User(uid='DzTjkKse', handle='testuser1', name='Test User1', updated_at=2024-05-20 08:58:06 UTC)
Note
You can also auto-complete in a dictionary:
users_dict = ln.User.lookup().dict()
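To illustrate the idea behind a lookup object, here is a minimal sketch in plain Python. It is not LaminDB's implementation: the make_lookup helper and the second toy record are hypothetical. It only shows how the values of a field can become attributes that an editor can auto-complete.

```python
from collections import namedtuple

# Toy records standing in for rows of a small registry.
# The second record is hypothetical and only there for illustration.
User = namedtuple("User", ["uid", "handle", "name"])
records = [
    User(uid="DzTjkKse", handle="testuser1", name="Test User1"),
    User(uid="00000002", handle="testuser2", name="Test User2"),
]


def make_lookup(records, field):
    """Build an object whose attribute names are the values of `field`."""
    keys = [getattr(record, field) for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


users = make_lookup(records, field="handle")
user = users.testuser1        # attribute access, friendly to auto-complete
users_dict = users._asdict()  # dictionary access, comparable to .dict()
print(user.name)
```

Because every value becomes an attribute name, this approach only stays practical for registries of modest size, which matches the caveat above.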
Filter by metadata¶
Filter for all artifacts created by a user:
ln.Artifact.filter(created_by=user).df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
1 | None | 2024-05-20 08:58:08.339095+00:00 | 1 | 2024-05-20 08:58:08.339167+00:00 | 08xrlE4OG68OqFT5uNK8 | 1 | None | .jpg | None | My image | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | None | None | 1 | True |
2 | None | 2024-05-20 08:58:08.458688+00:00 | 1 | 2024-05-20 08:58:08.458741+00:00 | VGLlUyzsmgMDTOywZNxC | 1 | None | .parquet | DataFrame | The iris collection | 5629 | ah24lV9Ncc8nPL0MumEsdw | md5 | None | None | None | None | 1 | True |
3 | None | 2024-05-20 08:58:08.466297+00:00 | 1 | 2024-05-20 08:58:08.466341+00:00 | QVDzUN8FLQs5plEucqQ1 | 1 | None | .fastq.gz | None | My fastq | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | None | None | 1 | True |
To access the results encoded in a filter statement, execute its return value with one of:
.df(): A pandas DataFrame with each record stored as a row.
.all(): An indexable Django QuerySet.
.list(): A list of records.
.one(): Exactly one record. Raises an error if there is none or more than one.
.one_or_none(): Either one record or None if there is no query result.
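The difference between .one() and .one_or_none() can be sketched without a database. The two helper functions below are hypothetical stand-ins that mimic the described behavior on a plain list; they are not LaminDB's code.

```python
def one(results):
    """Return the single result; raise if there is not exactly one."""
    if len(results) != 1:
        raise ValueError(f"expected exactly one record, got {len(results)}")
    return results[0]


def one_or_none(results):
    """Return the single result, or None if the query result is empty."""
    if not results:
        return None
    return one(results)


descriptions = ["My image", "The iris collection", "My fastq"]

# A "query" that matches exactly one toy record.
matches = [d for d in descriptions if "iris" in d]
print(one(matches))

# A "query" that matches nothing yields None instead of an error.
print(one_or_none([d for d in descriptions if "missing" in d]))
```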
Note
The ORMs in LaminDB are Django Models and any Django query works. LaminDB extends Django’s API for data scientists.
Under the hood, any .filter() call translates into a SQL select statement.
.one() and .one_or_none() are two parts of LaminDB’s API that are borrowed from SQLAlchemy.
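As a rough illustration of what such a select statement looks like, the sketch below runs the SQL counterpart of a filter on suffix against an in-memory SQLite database. The table and column names are simplified assumptions; they do not reproduce LaminDB's actual schema or the exact SQL that Django generates.

```python
import sqlite3

# Simplified, hypothetical stand-in for the artifact table.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE artifact ("
    "id INTEGER PRIMARY KEY, suffix TEXT, description TEXT, size INTEGER)"
)
con.executemany(
    "INSERT INTO artifact (suffix, description, size) VALUES (?, ?, ?)",
    [
        (".jpg", "My image", 29358),
        (".parquet", "The iris collection", 5629),
        (".fastq.gz", "My fastq", 20),
    ],
)

# Roughly what a filter(suffix=".jpg") call asks the database to do.
rows = con.execute(
    "SELECT id, suffix, description FROM artifact WHERE suffix = ?",
    (".jpg",),
).fetchall()
print(rows)
```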
Search for metadata¶
ln.Artifact.search("iris").df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
2 | None | 2024-05-20 08:58:08.458688+00:00 | 1 | 2024-05-20 08:58:08.458741+00:00 | VGLlUyzsmgMDTOywZNxC | 1 | None | .parquet | DataFrame | The iris collection | 5629 | ah24lV9Ncc8nPL0MumEsdw | md5 | None | None | None | None | 1 | True |
Let us create 500 notebook objects with fake titles and save them:
ln.save(
    [
        ln.Transform(name=title, type="notebook")
        for title in ln.core.datasets.fake_bio_notebook_titles(n=500)
    ]
)
We can now search for any combination of terms:
ln.Transform.search("intestine").df().head()
version | uid | name | key | description | type | latest_report_id | source_code_id | reference | reference_type | created_at | updated_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||
9 | None | DcTc5KhdouRC | Intestine IgG3 IgG3 Connective-tissue macrophage. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.563151+00:00 | 2024-05-20 08:58:09.563164+00:00 | 1 |
22 | None | 1w9PkGLPgz1y | Efficiency intestine intestinal intestine visu... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.565117+00:00 | 2024-05-20 08:58:09.565130+00:00 | 1 |
43 | None | Wsny6GWDfKdu | Thymus intestine result IgY IgA IgM IgG2. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.568252+00:00 | 2024-05-20 08:58:09.568265+00:00 | 1 |
45 | None | Cf9A90HyplM0 | Efficiency visualize IgG2 rank Mesangial cell ... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.568549+00:00 | 2024-05-20 08:58:09.568562+00:00 | 1 |
64 | None | fTAvwHVev9Yu | Rank classify IgM study intestine IgM. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.571423+00:00 | 2024-05-20 08:58:09.571437+00:00 | 1 |
Leverage relations¶
Django has a double-underscore syntax to filter based on related tables.
This syntax enables you to traverse several layers of relations:
ln.Artifact.filter(run__created_by__handle__startswith="testuse").df()
version | created_at | updated_at | uid | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | visibility | key_is_virtual | created_by_id | storage_id | transform_id | run_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id |
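The filter selects all artifacts based on the users who ran the generating notebook. The result is empty here because the toy artifacts were saved without calling ln.track(), so no run is linked to them.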
(Under the hood, in the SQL database, it’s joining the artifact table with the run and the user table.)
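The join described above can be sketched with an in-memory SQLite database. The three tables below are heavily simplified and their names are assumptions made for illustration; they are not LaminDB's schema. The sketch also shows why an artifact without a linked run does not appear in the result.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE runs (
        id INTEGER PRIMARY KEY,
        created_by_id INTEGER REFERENCES users(id)
    );
    CREATE TABLE artifacts (
        id INTEGER PRIMARY KEY,
        description TEXT,
        run_id INTEGER REFERENCES runs(id)  -- NULL if no run was tracked
    );
    INSERT INTO users (id, handle) VALUES (1, 'testuser1');
    INSERT INTO runs (id, created_by_id) VALUES (1, 1);
    INSERT INTO artifacts (id, description, run_id) VALUES
        (1, 'tracked artifact', 1),
        (2, 'untracked artifact', NULL);
    """
)

# Counterpart of filter(run__created_by__handle__startswith="testuse"):
# each double underscore that crosses a relation becomes one JOIN.
rows = con.execute(
    """
    SELECT artifacts.id, artifacts.description
    FROM artifacts
    JOIN runs ON artifacts.run_id = runs.id
    JOIN users ON runs.created_by_id = users.id
    WHERE users.handle LIKE 'testuse%'
    """
).fetchall()
print(rows)
```

Only the artifact that is linked to a run survives the inner joins; the untracked one is dropped.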
Beyond __startswith, Django supports about two dozen field comparators of the form field__comparator=value.
Here are some of them.
and¶
ln.Artifact.filter(suffix=".jpg", created_by=user).df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
1 | None | 2024-05-20 08:58:08.339095+00:00 | 1 | 2024-05-20 08:58:08.339167+00:00 | 08xrlE4OG68OqFT5uNK8 | 1 | None | .jpg | None | My image | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | None | None | 1 | True |
less than / greater than¶
Or subset to artifacts smaller than 10 kB with the __lt comparator (use __gt for greater than):
ln.Artifact.filter(created_by=user, size__lt=1e4).df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
2 | None | 2024-05-20 08:58:08.458688+00:00 | 1 | 2024-05-20 08:58:08.458741+00:00 | VGLlUyzsmgMDTOywZNxC | 1 | None | .parquet | DataFrame | The iris collection | 5629 | ah24lV9Ncc8nPL0MumEsdw | md5 | None | None | None | None | 1 | True |
3 | None | 2024-05-20 08:58:08.466297+00:00 | 1 | 2024-05-20 08:58:08.466341+00:00 | QVDzUN8FLQs5plEucqQ1 | 1 | None | .fastq.gz | None | My fastq | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | None | None | 1 | True |
or¶
from django.db.models import Q
ln.Artifact.filter().filter(Q(suffix=".jpg") | Q(suffix=".fastq.gz")).df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
1 | None | 2024-05-20 08:58:08.339095+00:00 | 1 | 2024-05-20 08:58:08.339167+00:00 | 08xrlE4OG68OqFT5uNK8 | 1 | None | .jpg | None | My image | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | None | None | 1 | True |
3 | None | 2024-05-20 08:58:08.466297+00:00 | 1 | 2024-05-20 08:58:08.466341+00:00 | QVDzUN8FLQs5plEucqQ1 | 1 | None | .fastq.gz | None | My fastq | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | None | None | 1 | True |
in¶
ln.Artifact.filter(suffix__in=[".jpg", ".fastq.gz"]).df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
1 | None | 2024-05-20 08:58:08.339095+00:00 | 1 | 2024-05-20 08:58:08.339167+00:00 | 08xrlE4OG68OqFT5uNK8 | 1 | None | .jpg | None | My image | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | None | None | 1 | True |
3 | None | 2024-05-20 08:58:08.466297+00:00 | 1 | 2024-05-20 08:58:08.466341+00:00 | QVDzUN8FLQs5plEucqQ1 | 1 | None | .fastq.gz | None | My fastq | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | None | None | 1 | True |
order by¶
ln.Artifact.filter().order_by("-updated_at").df()
version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||||||||
3 | None | 2024-05-20 08:58:08.466297+00:00 | 1 | 2024-05-20 08:58:08.466341+00:00 | QVDzUN8FLQs5plEucqQ1 | 1 | None | .fastq.gz | None | My fastq | 20 | hi7ZmAzz8sfMd3vIQr-57Q | md5 | None | None | None | None | 1 | True |
2 | None | 2024-05-20 08:58:08.458688+00:00 | 1 | 2024-05-20 08:58:08.458741+00:00 | VGLlUyzsmgMDTOywZNxC | 1 | None | .parquet | DataFrame | The iris collection | 5629 | ah24lV9Ncc8nPL0MumEsdw | md5 | None | None | None | None | 1 | True |
1 | None | 2024-05-20 08:58:08.339095+00:00 | 1 | 2024-05-20 08:58:08.339167+00:00 | 08xrlE4OG68OqFT5uNK8 | 1 | None | .jpg | None | My image | 29358 | r4tnqmKI_SjrkdLzpuWp4g | md5 | None | None | None | None | 1 | True |
contains¶
ln.Transform.filter(name__contains="search").df().head(10)
version | uid | name | key | description | type | latest_report_id | source_code_id | reference | reference_type | created_at | updated_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||
13 | None | NmRGtti082RY | Igy Sebaceous gland IgG4 IgG2 research IgG Gol... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.563754+00:00 | 2024-05-20 08:58:09.563768+00:00 | 1 |
16 | None | xhPdwN25JfS0 | Iga classify IgG1 research. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.564224+00:00 | 2024-05-20 08:58:09.564238+00:00 | 1 |
27 | None | k3ny74UQqJJy | Iga Sebaceous gland IgG1 result research study. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.565860+00:00 | 2024-05-20 08:58:09.565873+00:00 | 1 |
28 | None | gnMWyP2oFGGu | Research IgA efficiency IgG2 Subcutaneous tiss... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566009+00:00 | 2024-05-20 08:58:09.566022+00:00 | 1 |
33 | None | jPmdm2LE7TEE | Taste Bud Supporting Cells research Subcutaneo... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566749+00:00 | 2024-05-20 08:58:09.566763+00:00 | 1 |
34 | None | VBwlyjPNUb4O | Research Sebaceous gland research Muscular sys... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566898+00:00 | 2024-05-20 08:58:09.566911+00:00 | 1 |
36 | None | KU4E6zNmuTDV | Research Cartwheel cells IgG3 Vagina intestina... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.567194+00:00 | 2024-05-20 08:58:09.567208+00:00 | 1 |
40 | None | U34NvKPaX3lj | Parotid Glands Muscular system IgG3 IgE Vagina... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.567789+00:00 | 2024-05-20 08:58:09.567802+00:00 | 1 |
48 | None | MpwSXHp392r3 | Igm Mesangial cell Connective-tissue macrophag... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.569026+00:00 | 2024-05-20 08:58:09.569040+00:00 | 1 |
75 | None | 38NEYTmxYUI4 | Ameloblast Muscular system IgE Cartwheel cells... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.573108+00:00 | 2024-05-20 08:58:09.573121+00:00 | 1 |
And case-insensitive:
ln.Transform.filter(name__icontains="Search").df().head(10)
version | uid | name | key | description | type | latest_report_id | source_code_id | reference | reference_type | created_at | updated_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||
13 | None | NmRGtti082RY | Igy Sebaceous gland IgG4 IgG2 research IgG Gol... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.563754+00:00 | 2024-05-20 08:58:09.563768+00:00 | 1 |
16 | None | xhPdwN25JfS0 | Iga classify IgG1 research. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.564224+00:00 | 2024-05-20 08:58:09.564238+00:00 | 1 |
27 | None | k3ny74UQqJJy | Iga Sebaceous gland IgG1 result research study. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.565860+00:00 | 2024-05-20 08:58:09.565873+00:00 | 1 |
28 | None | gnMWyP2oFGGu | Research IgA efficiency IgG2 Subcutaneous tiss... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566009+00:00 | 2024-05-20 08:58:09.566022+00:00 | 1 |
33 | None | jPmdm2LE7TEE | Taste Bud Supporting Cells research Subcutaneo... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566749+00:00 | 2024-05-20 08:58:09.566763+00:00 | 1 |
34 | None | VBwlyjPNUb4O | Research Sebaceous gland research Muscular sys... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566898+00:00 | 2024-05-20 08:58:09.566911+00:00 | 1 |
36 | None | KU4E6zNmuTDV | Research Cartwheel cells IgG3 Vagina intestina... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.567194+00:00 | 2024-05-20 08:58:09.567208+00:00 | 1 |
40 | None | U34NvKPaX3lj | Parotid Glands Muscular system IgG3 IgE Vagina... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.567789+00:00 | 2024-05-20 08:58:09.567802+00:00 | 1 |
48 | None | MpwSXHp392r3 | Igm Mesangial cell Connective-tissue macrophag... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.569026+00:00 | 2024-05-20 08:58:09.569040+00:00 | 1 |
75 | None | 38NEYTmxYUI4 | Ameloblast Muscular system IgE Cartwheel cells... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.573108+00:00 | 2024-05-20 08:58:09.573121+00:00 | 1 |
startswith¶
ln.Transform.filter(name__startswith="Research").df()
version | uid | name | key | description | type | latest_report_id | source_code_id | reference | reference_type | created_at | updated_at | created_by_id | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | |||||||||||||
28 | None | gnMWyP2oFGGu | Research IgA efficiency IgG2 Subcutaneous tiss... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566009+00:00 | 2024-05-20 08:58:09.566022+00:00 | 1 |
34 | None | VBwlyjPNUb4O | Research Sebaceous gland research Muscular sys... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.566898+00:00 | 2024-05-20 08:58:09.566911+00:00 | 1 |
36 | None | KU4E6zNmuTDV | Research Cartwheel cells IgG3 Vagina intestina... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.567194+00:00 | 2024-05-20 08:58:09.567208+00:00 | 1 |
88 | None | Rs8nwgOeSnok | Research IgA Connective-tissue macrophage inte... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.578096+00:00 | 2024-05-20 08:58:09.578118+00:00 | 1 |
267 | None | oxixfS0gbhH7 | Research study Vagina Cartwheel cells Adrenerg... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.609862+00:00 | 2024-05-20 08:58:09.609875+00:00 | 1 |
292 | None | X9aG49T8XCbG | Research Ameloblast IgG2 Cartwheel cells. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.613552+00:00 | 2024-05-20 08:58:09.613565+00:00 | 1 |
331 | None | LW7NRn2VsyVL | Research investigate cluster Cartwheel cells. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.621777+00:00 | 2024-05-20 08:58:09.621790+00:00 | 1 |
420 | None | ZOVyFwTNUoZ5 | Research intestinal Taste bud supporting cells... | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.637291+00:00 | 2024-05-20 08:58:09.637304+00:00 | 1 |
460 | None | 6TOkrutaRt3W | Research candidate IgM IgG3 IgG4. | None | None | notebook | None | None | None | None | 2024-05-20 08:58:09.645768+00:00 | 2024-05-20 08:58:09.645781+00:00 | 1 |
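To recap the field__comparator=value pattern used throughout this section, here is a toy emulation on plain dictionaries. The toy_filter function and the COMPARATORS table are hypothetical; Django evaluates these comparators in SQL, and this sketch neither follows relations nor covers the full set of comparators.

```python
import operator

# Toy predicates for a handful of comparators.
COMPARATORS = {
    "exact": operator.eq,
    "lt": operator.lt,
    "gt": operator.gt,
    "in": lambda value, options: value in options,
    "contains": lambda value, text: text in value,
    "icontains": lambda value, text: text.lower() in value.lower(),
    "startswith": lambda value, prefix: value.startswith(prefix),
}


def toy_filter(records, **expressions):
    """Keep the records that satisfy every field__comparator=value expression."""
    selected = list(records)
    for key, expected in expressions.items():
        # "size__lt" -> ("size", "lt"); a bare "suffix" means an exact match.
        field, _, comparator = key.partition("__")
        test = COMPARATORS[comparator or "exact"]
        selected = [r for r in selected if test(r[field], expected)]
    return selected


artifacts = [
    {"suffix": ".jpg", "description": "My image", "size": 29358},
    {"suffix": ".parquet", "description": "The iris collection", "size": 5629},
    {"suffix": ".fastq.gz", "description": "My fastq", "size": 20},
]

small = toy_filter(artifacts, size__lt=1e4)
by_suffix = toy_filter(artifacts, suffix__in=[".jpg", ".fastq.gz"])
iris = toy_filter(artifacts, description__icontains="IRIS")
print([a["description"] for a in small])
```

Passing several expressions at once combines them with a logical and, as in the and example above.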
# clean up test instance
!lamin delete --force mydata
!rm -r mydata