Keep artifacts local in a cloud instance

To keep artifacts local by default in a cloud instance, enable keep_artifacts_local.

!lamin login testuser1
!lamin init --storage s3://lamindb-ci/keep-artifacts-local
✅ logged in with email testuser1@lamin.ai (uid: DzTjkKse)
💡 go to: https://lamin.ai/testuser1/keep-artifacts-local
❗ updating & unlocking cloud SQLite 's3://lamindb-ci/keep-artifacts-local/cc7f2489bf7251f79ff9ca8df7ac045b.lndb' of instance 'testuser1/keep-artifacts-local'
💡 connected lamindb: testuser1/keep-artifacts-local
❗ locked instance (to unlock and push changes to the cloud SQLite file, call: lamin close)
import lamindb as ln

ln.settings.transform.stem_uid = "l9lFf83aPwRc"
ln.settings.transform.version = "1"
ln.track()
💡 connected lamindb: testuser1/keep-artifacts-local
💡 notebook imports: lamindb==0.72.0
💡 saved: Transform(version='1', uid='l9lFf83aPwRc5zKv', name='Keep artifacts local in a cloud instance', key='keep-artifacts-local', type='notebook', updated_at=2024-05-20 08:58:50 UTC, created_by_id=1)
💡 saved: Run(uid='zK2UjFG5IJ0rOTA6PKj1', transform_id=1, created_by_id=1)
# the setting should be enabled on lamin.ai
# we're temporarily setting it here only for testing purposes
ln.setup.settings.instance._keep_artifacts_local = True

You can register a managed local storage location as follows:

ln.settings.storage_local = "./my_storage_local"
💡 defaulting to local storage: /home/runner/work/lamindb/lamindb/docs/faq/my_storage_local

Now, you have two storage locations: one in the S3 bucket, and the other locally.

ln.Storage.df()
    created_at                        created_by_id  run_id  updated_at                        uid           root                                               description  type   region     instance_uid
id
2   2024-05-20 08:58:51.856155+00:00  1              None    2024-05-20 08:58:51.856236+00:00  qObmUkhYU2wo  /home/runner/work/lamindb/lamindb/docs/faq/my_...  None         local  None       6uGWmLpZlNoJ
1   2024-05-20 08:58:46.241792+00:00  1              None    2024-05-20 08:58:46.241858+00:00  dY6koACZwpEo  s3://lamindb-ci/keep-artifacts-local               None         s3     us-west-1  6uGWmLpZlNoJ

Update storage description

You can add a description to a storage location by setting the description field of its record:

storage_record = ln.Storage.filter(root=ln.settings.storage_local).one()
storage_record.description = "Files stored locally in site X on server Y for reason ABC"
storage_record.save()
ln.Storage.df()
    created_at                        created_by_id  run_id  updated_at                        uid           root                                               description                                        type   region     instance_uid
id
2   2024-05-20 08:58:51.856155+00:00  1              None    2024-05-20 08:58:51.894804+00:00  qObmUkhYU2wo  /home/runner/work/lamindb/lamindb/docs/faq/my_...  Files stored locally in site X on server Y for...  local  None       6uGWmLpZlNoJ
1   2024-05-20 08:58:46.241792+00:00  1              None    2024-05-20 08:58:46.241858+00:00  dY6koACZwpEo  s3://lamindb-ci/keep-artifacts-local               None                                               s3     us-west-1  6uGWmLpZlNoJ

Use local storage

When you save an artifact, it's stored in local storage by default.

original_filepath = ln.core.datasets.file_fcs()
artifact = ln.Artifact(original_filepath, description="My fcs file").save()
local_path = artifact.path
local_path
PosixUPath('/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local/.lamindb/frNkuOvJ6I19NOcmsWI3.fcs')

You'll see the .fcs file, named by its uid, in the .lamindb/ directory under ./my_storage_local/:

ln.settings.storage_local.view_tree()
1 sub-directory & 2 files with suffixes '', '.fcs'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    ├── _is_initialized
    └── frNkuOvJ6I19NOcmsWI3.fcs
assert local_path.exists()
assert artifact.path.as_posix().startswith(ln.setup.settings.instance.storage_local.root.as_posix())
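
The check above compares paths as strings. A plain prefix comparison also matches a sibling directory whose name merely starts with the same characters, so a component-wise comparison can be the safer choice. The following is a minimal sketch using only the standard library's pathlib (Python 3.9+); the paths are made up and is_under is a hypothetical helper, not part of lamindb.

```python
from pathlib import PurePosixPath


def is_under(path: str, root: str) -> bool:
    """Return True if `path` lies inside `root`, compared component by component."""
    return PurePosixPath(path).is_relative_to(PurePosixPath(root))


# Made-up paths, for illustration only
root = "/data/my_storage_local"
inside = "/data/my_storage_local/.lamindb/example.fcs"
sibling = "/data/my_storage_local_backup/example.fcs"

print(is_under(inside, root))    # True
print(is_under(sibling, root))   # False
print(sibling.startswith(root))  # True: the string prefix check is fooled
```

For the storage roots used on this page the two checks agree; the difference only matters when directory names share a prefix.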

To upload an artifact, pass upload=True to the save() method.

artifact.save(upload=True)
💡 moved local artifact to cache: /home/runner/.cache/lamindb/frNkuOvJ6I19NOcmsWI3.fcs
Artifact(updated_at=2024-05-20 08:58:52 UTC, uid='frNkuOvJ6I19NOcmsWI3', suffix='.fcs', description='My fcs file', size=6785467, hash='KCEXRahJ-Ui9Y6nksQ8z1A', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1)

You now see the artifact in the S3 bucket:

ln.setup.settings.storage.root.view_tree()
2 sub-directories & 3 files with suffixes '', '.lndb', '.fcs'
s3://lamindb-ci/keep-artifacts-local
├── cc7f2489bf7251f79ff9ca8df7ac045b.lndb
└── .lamindb
    ├── _is_initialized
    ├── frNkuOvJ6I19NOcmsWI3.fcs
    └── _exclusion

And it's no longer present in local storage:

ln.setup.settings.instance.storage_local.root.view_tree()
1 sub-directory & 1 files with suffixes ''
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    └── _is_initialized
assert artifact.path.exists()
assert not local_path.exists()
assert artifact.path.as_posix().startswith(ln.setup.settings.instance.storage.root.as_posix())
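
The effect of save(upload=True) on the three locations involved (local storage, remote storage, cache) can be pictured with a small stand-alone simulation. This is only a sketch under assumptions: ordinary temporary directories stand in for the S3 bucket and the cache, example.fcs is a made-up file, and upload_and_cache is a hypothetical helper rather than lamindb's implementation.

```python
import shutil
import tempfile
from pathlib import Path


def upload_and_cache(local_file: Path, remote_dir: Path, cache_dir: Path) -> Path:
    """Copy a file to the (simulated) remote location, then move the local copy to the cache."""
    remote_dir.mkdir(parents=True, exist_ok=True)
    cache_dir.mkdir(parents=True, exist_ok=True)
    remote_file = remote_dir / local_file.name
    shutil.copy2(local_file, remote_file)  # stands in for the upload
    shutil.move(str(local_file), str(cache_dir / local_file.name))
    return remote_file


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    local_file = base / "my_storage_local" / ".lamindb" / "example.fcs"
    local_file.parent.mkdir(parents=True)
    local_file.write_bytes(b"demo")

    remote_file = upload_and_cache(local_file, base / "remote" / ".lamindb", base / "cache")

    still_local = local_file.exists()                      # False: gone from local storage
    in_remote = remote_file.exists()                       # True: present in the simulated remote
    in_cache = (base / "cache" / "example.fcs").exists()   # True: moved to the cache
```

The three flags mirror what the assertions above verify for the real artifact: the remote path exists and the original local path does not.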

Direct upload

You can also directly upload a file by passing upload=True:

filepath = ln.core.datasets.file_mini_csv()
artifact2 = ln.Artifact(filepath, description="My csv file").save(upload=True)
artifact2.path
S3Path('s3://lamindb-ci/keep-artifacts-local/.lamindb/OLtZPqdQn44xgSfQhcIQ.csv')

Now we have two files on S3:

ln.Artifact.df(include="storage__root")
    storage__root                         version  created_at                        created_by_id  updated_at                        uid                   storage_id  key   suffix  accessor  description  size     hash                    hash_type  n_objects  n_observations  transform_id  run_id  visibility  key_is_virtual
id
2   s3://lamindb-ci/keep-artifacts-local  None     2024-05-20 08:58:53.059914+00:00  1              2024-05-20 08:58:53.059969+00:00  OLtZPqdQn44xgSfQhcIQ  1           None  .csv    None      My csv file  11       z1LdF2qN4cN0M2sXrcW8aw  md5        None       None            1             1       1           True
1   s3://lamindb-ci/keep-artifacts-local  None     2024-05-20 08:58:52.759403+00:00  1              2024-05-20 08:58:52.797539+00:00  frNkuOvJ6I19NOcmsWI3  1           None  .fcs    None      My fcs file  6785467  KCEXRahJ-Ui9Y6nksQ8z1A  md5        None       None            1             1       1           True
assert artifact2.path.exists()

Pre-existing artifacts

Assume we already have a file in our registered local storage location:

file_in_local_storage = ln.core.datasets.file_bam()
file_in_local_storage.rename("./my_storage_local/output.bam")
ln.UPath("my_storage_local/").view_tree()
1 sub-directory & 2 files with suffixes '', '.bam'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
├── .lamindb
│   └── _is_initialized
└── output.bam

If we create an artifact from it, it remains where it is during saving:

my_existing_file = ln.Artifact("./my_storage_local/output.bam", description="my existing file").save()
ln.UPath("my_storage_local/").view_tree()
1 sub-directory & 2 files with suffixes '', '.bam'
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
├── .lamindb
│   └── _is_initialized
└── output.bam

The storage path of the artifact is constructed using key because key_is_virtual=False:

my_existing_file
Artifact(updated_at=2024-05-20 08:58:53 UTC, uid='bYiUCB43lVtpQRmIH5uL', key='output.bam', suffix='.bam', description='my existing file', size=18, hash='D2yxELM5U3VLeyvrwWUMUA', hash_type='md5', visibility=1, key_is_virtual=False, created_by_id=1, storage_id=2, transform_id=1, run_id=1)

However, if we upload the artifact, the storage path is constructed from the uid instead, and key_is_virtual switches to True:

my_existing_file.save(upload=True)
💡 moved local artifact to cache: /home/runner/.cache/lamindb/output.bam
Artifact(updated_at=2024-05-20 08:58:53 UTC, uid='bYiUCB43lVtpQRmIH5uL', key='output.bam', suffix='.bam', description='my existing file', size=18, hash='D2yxELM5U3VLeyvrwWUMUA', hash_type='md5', visibility=1, key_is_virtual=True, created_by_id=1, storage_id=1, transform_id=1, run_id=1)

Here is the remote path of the artifact:

my_existing_file.path
S3Path('s3://lamindb-ci/keep-artifacts-local/.lamindb/bYiUCB43lVtpQRmIH5uL.bam')
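
The two path layouts seen for this artifact can be summarized in a small helper. This is a sketch inferred from the outputs shown on this page, not lamindb's actual implementation; storage_path is a hypothetical function, and the rule it encodes (key-based path when key_is_virtual is False, otherwise .lamindb/<uid><suffix>) is an assumption drawn from those outputs.

```python
from typing import Optional


def storage_path(
    root: str,
    uid: str,
    suffix: str,
    key: Optional[str] = None,
    key_is_virtual: bool = True,
) -> str:
    """Build a storage path from the key (real key) or from the uid (virtual key)."""
    root = root.rstrip("/")
    if key is not None and not key_is_virtual:
        # The key is the actual location of the file relative to the storage root
        return f"{root}/{key}"
    # The key is only a label; the file itself is named by its uid
    return f"{root}/.lamindb/{uid}{suffix}"


# Values taken from the outputs on this page
local = storage_path(
    "/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local",
    uid="bYiUCB43lVtpQRmIH5uL", suffix=".bam", key="output.bam", key_is_virtual=False,
)
remote = storage_path(
    "s3://lamindb-ci/keep-artifacts-local",
    uid="bYiUCB43lVtpQRmIH5uL", suffix=".bam", key="output.bam", key_is_virtual=True,
)
print(local)   # /home/runner/work/lamindb/lamindb/docs/faq/my_storage_local/output.bam
print(remote)  # s3://lamindb-ci/keep-artifacts-local/.lamindb/bYiUCB43lVtpQRmIH5uL.bam
```

Plain string joining is used here instead of pathlib because pathlib would collapse the double slash in the s3:// prefix.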

And here are the contents of the storage locations:

# the path on S3
ln.setup.settings.storage.root.view_tree()
# the local path
ln.setup.settings.instance.storage_local.root.view_tree()
2 sub-directories & 5 files with suffixes '', '.lndb', '.csv', '.fcs', '.bam'
s3://lamindb-ci/keep-artifacts-local
├── cc7f2489bf7251f79ff9ca8df7ac045b.lndb
└── .lamindb
    ├── OLtZPqdQn44xgSfQhcIQ.csv
    ├── _is_initialized
    ├── bYiUCB43lVtpQRmIH5uL.bam
    ├── frNkuOvJ6I19NOcmsWI3.fcs
    └── _exclusion
1 sub-directory & 1 files with suffixes ''
/home/runner/work/lamindb/lamindb/docs/faq/my_storage_local
└── .lamindb
    └── _is_initialized

Delete the test instance

Delete the artifacts:

artifact.delete(permanent=True)
artifact2.delete(permanent=True)
my_existing_file.delete(permanent=True)

Delete the instance:

ln.setup.delete("keep-artifacts-local", force=True)
💡 deleted storage record on hub eb35d80e9c4d5d02aab65a03c95bb70c
💡 deleted storage record on hub 5283fb842808478ab5fb37a73d7e067a
💡 deleted instance record on hub cc7f2489bf7251f79ff9ca8df7ac045b