Here, we’ll see how to track redun workflow runs with LaminDB.


This use case is based on github.com/ricomnl/bioinformatics-pipeline-tutorial.


!lamin init --storage . --name redun-lamin-fasta
💡 connected lamindb: testuser1/redun-lamin-fasta

Register the workflow

import lamindb as ln
import json
💡 connected lamindb: testuser1/redun-lamin-fasta

Register the workflow in the Transform registry:

Transform(version='0.1.0', uid='6q33ZrLo3YgG', name='lamin-redun-fasta', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2024-05-19 23:24:13 UTC, created_by_id=1)
How to amend a redun workflow.py to register input & output files in LaminDB?

To query input files via LaminDB, we added the following lines:

# register input files in lamindb
# query & track this pipeline
transform = ln.Transform.filter(name="lamin-redun-fasta", version="0.1.0").one()
# query input files
input_filepaths = [
    file.stage() for file in ln.Artifact.filter(key__startswith="fasta/")
]
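
To make the key__startswith="fasta/" lookup concrete, here is a minimal plain-Python sketch of the same prefix selection, applied to the artifact keys that appear in this instance. It does not call LaminDB, and select_by_prefix is a hypothetical helper introduced only for illustration:

```python
def select_by_prefix(keys, prefix):
    """Return the keys that start with `prefix`, preserving input order."""
    return [key for key in keys if key.startswith(prefix)]


# Keys as they appear in the Artifact registry of this instance
artifact_keys = [
    "fasta/PO5F1.fasta",
    "fasta/KLF4.fasta",
    "fasta/SOX2.fasta",
    "fasta/MYC.fasta",
    "data/results.tgz",
]

# Only the four input files match; the results archive does not
fasta_keys = select_by_prefix(artifact_keys, "fasta/")
print(fasta_keys)
```

The trailing slash in the prefix matters: it restricts the match to keys under the fasta/ folder rather than any key that merely begins with "fasta".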

To register the output file via LaminDB, we added the following line to the last task:


Run redun

Let’s see what the input files are:

!ls ./fasta
KLF4.fasta  MYC.fasta  PO5F1.fasta  SOX2.fasta

And call the workflow:

!redun run workflow.py main --input-dir ./fasta --tag run=test-run 1> redun_stdout.txt 2> redun_stderr.txt
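
If the workflow is launched from a Python script rather than a notebook shell escape, the same split of stdout and stderr into two files can be sketched with the standard library. The helper name run_and_capture is hypothetical and not part of redun or LaminDB:

```python
import subprocess
import sys


def run_and_capture(command, stdout_path, stderr_path):
    """Run `command`, writing stdout and stderr to separate files.

    Returns the exit code of the process instead of raising on failure,
    so the caller can decide how to handle a failed run.
    """
    with open(stdout_path, "w") as out, open(stderr_path, "w") as err:
        completed = subprocess.run(command, stdout=out, stderr=err, check=False)
    return completed.returncode
```

Assuming redun is installed and on the PATH, the call above would correspond to run_and_capture(["redun", "run", "workflow.py", "main", "--input-dir", "./fasta", "--tag", "run=test-run"], "redun_stdout.txt", "redun_stderr.txt").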

Inspect the output:

!cat redun_stdout.txt
💡 connected lamindb: testuser1/redun-lamin-fasta
❗ this creates one artifact per file in the directory - you might simply call ln.Artifact(dir) to get one artifact for the entire directory
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
❗ no run & transform get linked, consider calling ln.track()
💡 loaded: Transform(version='0.1.0', uid='6q33ZrLo3YgG', name='lamin-redun-fasta', type='pipeline', reference='https://github.com/laminlabs/redun-lamin-fasta', updated_at=2024-05-19 23:24:13 UTC, created_by_id=1)
💡 saved: Run(uid='l6XxrGvQpS3jjAnwpwhQ', transform_id=1, created_by_id=1)
File(path=/home/runner/work/redun-lamin-fasta/redun-lamin-fasta/docs/data/results.tgz, hash=19ba0167)

And the last line of the log that redun writes to stderr:

!tail -1 redun_stderr.txt
[redun] Execution duration: 2.22 seconds

View data lineage:

artifact = ln.Artifact.filter(key="data/results.tgz").one()  # query by key

Register the redun execution id

To make the redun execution ID queryable in LaminDB, export it from redun and store it on the run record:

# export the run information from redun
!redun log --exec --exec-tag run=test-run --format json --no-pager > redun_exec.json
# load the redun execution id from the JSON and store it in the LaminDB run record
with open("redun_exec.json") as f:
    redun_exec = json.load(f)
artifact.run.reference = redun_exec["id"]
artifact.run.reference_type = "redun_id"
artifact.run.save()  # persist the updated run record
Run(uid='l6XxrGvQpS3jjAnwpwhQ', started_at=2024-05-19 23:24:20 UTC, reference='92b77671-0d5d-474b-ae5a-d2f2599f40f9', reference_type='redun_id', transform_id=1, created_by_id=1)
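
The JSON-loading step can be wrapped in a small helper that closes the file and fails with a clear message when the export has no ID. This is a sketch, assuming (as in the code above) that the export is a single JSON object with a top-level "id" field; read_redun_execution_id is a hypothetical name:

```python
import json
from pathlib import Path


def read_redun_execution_id(path):
    """Load a redun execution export and return its "id" field."""
    with Path(path).open() as f:
        record = json.load(f)
    # Assumption: the export is one JSON object with a top-level "id" key
    if "id" not in record:
        raise KeyError(f"no 'id' field in {path}")
    return record["id"]
```

With this helper, the assignment above would read artifact.run.reference = read_redun_execution_id("redun_exec.json").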

View the database content

Artifact

| id | version | created_at | created_by_id | updated_at | uid | storage_id | key | suffix | accessor | description | size | hash | hash_type | n_objects | n_observations | transform_id | run_id | visibility | key_is_virtual |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 5 | None | 2024-05-19 23:24:23.035550+00:00 | 1 | 2024-05-19 23:24:23.035611+00:00 | 6LWYbDmkNuRWxtrOBWTM | 1 | data/results.tgz | .tgz | None | None | 83503 | qfxueUZlr1IqgWdX9nD3Lg | md5 | None | None | 1.0 | 1.0 | 1 | False |
| 4 | None | 2024-05-19 23:24:20.938244+00:00 | 1 | 2024-05-19 23:24:20.938284+00:00 | QvKR7DvQZliLBlwhVyk2 | 1 | fasta/MYC.fasta | .fasta | None | None | 536 | WGbEtzPw-3bQEGcngO_pHQ | md5 | None | None | NaN | NaN | 1 | False |
| 3 | None | 2024-05-19 23:24:20.937673+00:00 | 1 | 2024-05-19 23:24:20.937714+00:00 | UEJPjWZ4N7zP4tqHgWZx | 1 | fasta/SOX2.fasta | .fasta | None | None | 414 | C5q_yaFXGk4SAEpfdqBwnQ | md5 | None | None | NaN | NaN | 1 | False |
| 2 | None | 2024-05-19 23:24:20.936918+00:00 | 1 | 2024-05-19 23:24:20.936961+00:00 | pZ768DhSCkS3pYXRvqwt | 1 | fasta/KLF4.fasta | .fasta | None | None | 609 | LyuoYkWs4SgYcH7P7JLJtA | md5 | None | None | NaN | NaN | 1 | False |
| 1 | None | 2024-05-19 23:24:20.935771+00:00 | 1 | 2024-05-19 23:24:20.935829+00:00 | 3rrqpJ8wdTqs30lrwv0j | 1 | fasta/PO5F1.fasta | .fasta | None | None | 477 | -7iJgveFO9ia0wE1bqVu6g | md5 | None | None | NaN | NaN | 1 | False |

Run

| id | uid | transform_id | started_at | finished_at | created_by_id | report_id | environment_id | is_consecutive | reference | reference_type | created_at |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | l6XxrGvQpS3jjAnwpwhQ | 1 | 2024-05-19 23:24:20.944426+00:00 | None | 1 | None | None | None | 92b77671-0d5d-474b-ae5a-d2f2599f40f9 | redun_id | 2024-05-19 23:24:20.944554+00:00 |

Storage

| id | created_at | created_by_id | run_id | updated_at | uid | root | description | type | region | instance_uid |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2024-05-19 23:24:12.306322+00:00 | 1 | None | 2024-05-19 23:24:12.306384+00:00 | gTfS4lET0VGx | /home/runner/work/redun-lamin-fasta/redun-lami... | None | local | None | 8SgWe7slTFKk |

Transform

| id | version | uid | name | key | description | type | latest_report_id | source_code_id | reference | reference_type | created_at | updated_at | created_by_id |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.1.0 | 6q33ZrLo3YgG | lamin-redun-fasta | None | None | pipeline | None | None | https://github.com/laminlabs/redun-lamin-fasta | None | 2024-05-19 23:24:13.727772+00:00 | 2024-05-19 23:24:13.727803+00:00 | 1 |

User

| id | uid | handle | name | created_at | updated_at |
| --- | --- | --- | --- | --- | --- |
| 1 | DzTjkKse | testuser1 | Test User1 | 2024-05-19 23:24:12.301203+00:00 | 2024-05-19 23:24:12.301229+00:00 |

Delete the test instance:

!lamin delete --force redun-lamin-fasta
💡 deleting instance testuser1/redun-lamin-fasta
