Snakemake#

Snakemake is a workflow management system for running scientific workflows scalably, portably, and reproducibly across platforms.

Here, we’ll run the snakemake-workflows/rna-seq-star-deseq2 pipeline to perform differential gene expression analysis with STAR and DESeq2 (reference).

Setup#

Let’s create a test instance:

!lamin init --storage . --name snakemake-bulkrna
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-20 21:53:07)
✅ saved: Storage(id='t6Qgo37v', root='/home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs', type='local', updated_at=2023-09-20 21:53:07, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/snakemake-bulkrna
💡 did not register local instance on hub (if you want, call `lamin register`)

import lamindb as ln
💡 loaded instance: testuser1/snakemake-bulkrna (lamindb 0.54.0)

Download test data#

The Snakemake pipeline ships with test data, so we clone the whole repository at tag v2.0.0 with git:

!git clone https://github.com/snakemake-workflows/rna-seq-star-deseq2 --single-branch --branch v2.0.0
Cloning into 'rna-seq-star-deseq2'...
remote: Enumerating objects: 759, done.
remote: Counting objects: 100% (151/151), done.
remote: Compressing objects: 100% (92/92), done.
remote: Total 759 (delta 68), reused 105 (delta 52), pack-reused 608
Receiving objects: 100% (759/759), 16.95 MiB | 34.16 MiB/s, done.
Resolving deltas: 100% (379/379), done.
Note: switching to 'e103c1cc78feba97cc3cebe8d7f2a51c8958ab96'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false
root_dir = "rna-seq-star-deseq2"
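
Before registering anything, it can be worth confirming that the clone actually contains the test inputs this guide reads from. The following is a minimal sketch, not part of the original notebook: the helper name `missing_paths` is hypothetical, and the two relative paths are the ones used in the registration step.

```python
from pathlib import Path


def missing_paths(root_dir, relative_paths):
    """Return the relative paths that do not exist under root_dir."""
    root = Path(root_dir)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Test inputs that this guide registers with LaminDB.
expected = [
    ".test/config_basic/samples.tsv",
    ".test/ngs-test-data/reads",
]

missing = missing_paths("rna-seq-star-deseq2", expected)
if missing:
    print(f"Clone looks incomplete, missing: {missing}")
else:
    print("All expected test inputs are present.")
```

An empty list means both the sample sheet and the reads directory are in place.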

Track the download:

download = ln.Transform(name="Download")
download_url = "https://github.com/snakemake-workflows/rna-seq-star-deseq2"
# create global run containing the download_url
ln.track(download, reference=download_url, reference_type="url")
💡 Transform(id='hCfEkAZ94EeCNt', name='Download', type=notebook, updated_at=2023-09-20 21:53:10, created_by_id='DzTjkKse')
💡 Run(id='XltwcXAZ7H4twaCujeqX', run_at=2023-09-20 21:53:10, reference='https://github.com/snakemake-workflows/rna-seq-star-deseq2', reference_type='url', transform_id='hCfEkAZ94EeCNt', created_by_id='DzTjkKse')

Register the input files; they are automatically linked to the download run:

sample_sheet = ln.File(f"{root_dir}/.test/config_basic/samples.tsv")
ln.save(sample_sheet)
input_fastqs = ln.File.from_dir(f"{root_dir}/.test/ngs-test-data/reads/")
ln.save(input_fastqs)
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ file has more than one suffix (path.suffixes), using only last suffix: '.fq'
❗ there are different file ids with the same hashes, dropping 2 duplicates out of 10 files:
    File(id='giidtfhayfrI68utcbKw', key='rna-seq-star-deseq2/.test/ngs-test-data/reads/a.scerevisiae.2.fq', suffix='.fq', size=2218894, hash='B4hGJwLbEtEx8GCEdcvjgA', hash_type='md5', storage_id='t6Qgo37v', transform_id='hCfEkAZ94EeCNt', run_id='XltwcXAZ7H4twaCujeqX', created_by_id='DzTjkKse')
    File(id='lvWwIPa0ZMhEiiqbIwRW', key='rna-seq-star-deseq2/.test/ngs-test-data/reads/b.scerevisiae.1.fq', suffix='.fq', size=2218894, hash='DqfThx982Ai4akCcx-HikA', hash_type='md5', storage_id='t6Qgo37v', transform_id='hCfEkAZ94EeCNt', run_id='XltwcXAZ7H4twaCujeqX', created_by_id='DzTjkKse')
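
The two warnings above can be reproduced outside LaminDB. The sketch below is an illustration only and not LaminDB's implementation: the helper names (`md5_digest`, `group_by_content`, `duplicate_groups`) are hypothetical, and the URL-safe base64 encoding of the MD5 digest is an assumption chosen because it yields 22-character strings like the hashes shown in the log. It groups files by content hash to find duplicates, and shows why a name such as `a.scerevisiae.1.fq` has more than one suffix.

```python
import base64
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_digest(path, chunk_size=1 << 20):
    """Return a URL-safe base64 MD5 digest of a file, read in chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    # Assumed encoding: base64 without padding gives 22 characters.
    return base64.urlsafe_b64encode(md5.digest()).decode().rstrip("=")


def group_by_content(paths):
    """Map each digest to the list of paths that share that content."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_digest(path)].append(Path(path))
    return dict(groups)


def duplicate_groups(paths):
    """Keep only groups in which two or more files have identical content."""
    return {d: ps for d, ps in group_by_content(paths).items() if len(ps) > 1}


# The suffix warning comes from file names that contain several dots.
name = Path("a.scerevisiae.1.fq")
print(name.suffixes)  # prints ['.scerevisiae', '.1', '.fq']
print(name.suffix)    # prints .fq
```

Calling `duplicate_groups` on the files under `rna-seq-star-deseq2/.test/ngs-test-data/reads/` would list the paths that share content, which is the situation the duplicate warning describes.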

Visualize data lineage for one of the files:

sample_sheet.view_flow()
[Figure: data lineage graph for the sample sheet]

Track Snakemake run#

(We’d start here if input files were tracked in the cloud with LaminDB rather than downloaded through git.)

Track the Snakemake workflow and its run:

transform = ln.Transform(
    name="snakemake-workflows/rna-seq-star-deseq2",
    version="2.0.0",
    type="pipeline",
    reference="https://github.com/laminlabs/snakemake-lamin-usecases",
)
ln.track(transform)
run = ln.dev.run_context.run  # let's grab the global run record
💡 Transform(id='0wOgM2TBtkMV6M', name='snakemake-workflows/rna-seq-star-deseq2', version='2.0.0', type='pipeline', reference='https://github.com/laminlabs/snakemake-lamin-usecases', updated_at=2023-09-20 21:53:10, created_by_id='DzTjkKse')
💡 Run(id='O0aXDnvbt4EHLqbc0RO6', run_at=2023-09-20 21:53:10, transform_id='0wOgM2TBtkMV6M', created_by_id='DzTjkKse')

If we now stage input files, they’ll be tracked as run inputs.

(In this test case, data is already locally available and staging won’t download anything.)

input_sample_sheet_path = sample_sheet.stage()
input_paths = [input_fastq.stage() for input_fastq in input_fastqs]
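
Before committing to a full run, it can help to preview the job plan with a dry run. The sketch below is not part of the original notebook: the helper `snakemake_command` is hypothetical, the paths follow the layout of the cloned repository, and `--dry-run` is the Snakemake flag that plans jobs without executing them. Flag behavior may differ between Snakemake versions, so treat this as an illustration rather than a definitive invocation.

```python
import shlex


def snakemake_command(root_dir, cores=2, dry_run=False):
    """Assemble a Snakemake invocation for the workflow's bundled test config."""
    cmd = [
        "snakemake",
        "--directory", f"{root_dir}/.test",
        "--snakefile", f"{root_dir}/workflow/Snakefile",
        "--configfile", f"{root_dir}/.test/config_basic/config.yaml",
        "--use-conda",
        "--cores", str(cores),
    ]
    if dry_run:
        cmd.append("--dry-run")  # plan the jobs without executing them
    return cmd


cmd = snakemake_command("rna-seq-star-deseq2", dry_run=True)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list avoids shell quoting issues and makes it easy to toggle the dry-run flag.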

All data is now available locally, so we can run the Snakemake pipeline:

!snakemake \
    --directory rna-seq-star-deseq2/.test \
    --snakefile rna-seq-star-deseq2/workflow/Snakefile \
    --configfile rna-seq-star-deseq2/.test/config_basic/config.yaml \
    --use-conda \
    --show-failed-logs \
    --cores 2 \
    --conda-frontend conda \
    --conda-cleanup-pkgs cache
Workflow defines that rule get_genome is eligible for caching between workflows (use the --cache argument to enable this).

Workflow defines that rule get_annotation is eligible for caching between workflows (use the --cache argument to enable this).

Workflow defines that rule genome_faidx is eligible for caching between workflows (use the --cache argument to enable this).

Workflow defines that rule bwa_index is eligible for caching between workflows (use the --cache argument to enable this).

Workflow defines that rule star_index is eligible for caching between workflows (use the --cache argument to enable this).

Building DAG of jobs...

Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.

Creating conda environment ../workflow/envs/deseq2.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/deseq2.yaml created (location: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_)

Creating conda environment ../workflow/envs/biomart.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/biomart.yaml created (location: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_)

Creating conda environment ../workflow/envs/gffutils.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/gffutils.yaml created (location: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_)

Creating conda environment ../workflow/envs/pandas.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/pandas.yaml created (location: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_)

Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/index/environment.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/index/environment.yaml created (location: .snakemake/conda/d02df01c1a926fffbd5dd7a2048236cd_)

Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/cutadapt/pe/environment.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/cutadapt/pe/environment.yaml created (location: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_)

Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/align/environment.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/star/align/environment.yaml created (location: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_)

Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/reference/ensembl-annotation/environment.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/reference/ensembl-annotation/environment.yaml created (location: .snakemake/conda/b989f3f8888314c661109a11e4951d06_)

Creating conda environment https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/multiqc/environment.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for https://github.com/snakemake/snakemake-wrappers/raw/v1.21.4/bio/multiqc/environment.yaml created (location: .snakemake/conda/fdd167354aa72861c97737721eeeb361_)

Creating conda environment ../workflow/envs/rseqc.yaml...

Downloading and installing remote packages.

Cleaning up conda package tarballs and package cache.

Environment for /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs/rna-seq-star-deseq2/workflow/rules/../envs/rseqc.yaml created (location: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_)

Using shell: /usr/bin/bash

Provided cores: 2

Rules claiming more threads will be scaled down.

Singularity containers: ignored

Job stats:
job                          count
-------------------------  -------
align                            4
all                              1
count_matrix                     1
cutadapt_pe                      4
cutadapt_pipe                    8
deseq2                           1
deseq2_init                      1
gene_2_symbol                    3
get_annotation                   1
get_genome                       1
multiqc                          1
pca                              1
rseqc_gtf2bed                    1
rseqc_infer                      4
rseqc_innerdis                   4
rseqc_junction_annotation        4
rseqc_junction_saturation        4
rseqc_readdis                    4
rseqc_readdup                    4
rseqc_readgc                     4
rseqc_stat                       4
star_index                       1
total                           61

Select jobs to execute...

[Wed Sep 20 22:13:01 2023]

group job 7c81e762-1a4a-4e53-a41e-3fb22085eb04 (jobs in lexicogr. order):


    [Wed Sep 20 22:13:01 2023]

    rule cutadapt_pe:
        input: pipe/cutadapt/B2/1.fq1.fastq, pipe/cutadapt/B2/1.fq2.fastq
        output: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz, results/trimmed/B2_1.paired.qc.txt
        log: logs/cutadapt/B2_1.log
        jobid: 21
        reason: Missing output files: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/B2/1.fq2.fastq, pipe/cutadapt/B2/1.fq1.fastq
        wildcards: sample=B2, unit=1
        threads: 2
        resources: tmpdir=/tmp


    [Wed Sep 20 22:13:01 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/b.scerevisiae.1.fq
        output: pipe/cutadapt/B2/1.fq1.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/B2_1.fq1.fastq.log
        jobid: 22
        reason: Missing output files: pipe/cutadapt/B2/1.fq1.fastq
        wildcards: sample=B2, unit=1, fq=fq1, ext=fastq
        threads: 0
        resources: tmpdir=/tmp



    [Wed Sep 20 22:13:01 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/b.scerevisiae.2.fq
        output: pipe/cutadapt/B2/1.fq2.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/B2_1.fq2.fastq.log
        jobid: 23
        reason: Missing output files: pipe/cutadapt/B2/1.fq2.fastq
        wildcards: sample=B2, unit=1, fq=fq2, ext=fastq
        threads: 0
        resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

[Wed Sep 20 22:13:03 2023]

Finished job 22.

[Wed Sep 20 22:13:03 2023]

Finished job 23.

[Wed Sep 20 22:13:03 2023]

Finished job 21.

3 of 61 steps (5%) done

Select jobs to execute...

[Wed Sep 20 22:13:03 2023]

group job da2000d7-ffd9-4031-b802-451cbb89a149 (jobs in lexicogr. order):


    [Wed Sep 20 22:13:03 2023]

    rule cutadapt_pe:
        input: pipe/cutadapt/A1/1.fq1.fastq, pipe/cutadapt/A1/1.fq2.fastq
        output: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz, results/trimmed/A1_1.paired.qc.txt
        log: logs/cutadapt/A1_1.log
        jobid: 6
        reason: Missing output files: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/A1/1.fq1.fastq, pipe/cutadapt/A1/1.fq2.fastq
        wildcards: sample=A1, unit=1
        threads: 2
        resources: tmpdir=/tmp


    [Wed Sep 20 22:13:03 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/a.scerevisiae.1.fq
        output: pipe/cutadapt/A1/1.fq1.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/A1_1.fq1.fastq.log
        jobid: 7
        reason: Missing output files: pipe/cutadapt/A1/1.fq1.fastq
        wildcards: sample=A1, unit=1, fq=fq1, ext=fastq
        threads: 0
        resources: tmpdir=/tmp



    [Wed Sep 20 22:13:03 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/a.scerevisiae.2.fq
        output: pipe/cutadapt/A1/1.fq2.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/A1_1.fq2.fastq.log
        jobid: 8
        reason: Missing output files: pipe/cutadapt/A1/1.fq2.fastq
        wildcards: sample=A1, unit=1, fq=fq2, ext=fastq
        threads: 0
        resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

[Wed Sep 20 22:13:04 2023]

Finished job 7.

[Wed Sep 20 22:13:04 2023]

Finished job 8.

[Wed Sep 20 22:13:04 2023]

Finished job 6.

6 of 61 steps (10%) done

Select jobs to execute...

[Wed Sep 20 22:13:04 2023]

group job 6288b35b-0f54-4cfe-bd22-4dbdc36e76b6 (jobs in lexicogr. order):


    [Wed Sep 20 22:13:04 2023]

    rule cutadapt_pe:
        input: pipe/cutadapt/A2/1.fq1.fastq, pipe/cutadapt/A2/1.fq2.fastq
        output: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz, results/trimmed/A2_1.paired.qc.txt
        log: logs/cutadapt/A2_1.log
        jobid: 13
        reason: Missing output files: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/A2/1.fq2.fastq, pipe/cutadapt/A2/1.fq1.fastq
        wildcards: sample=A2, unit=1
        threads: 2
        resources: tmpdir=/tmp


    [Wed Sep 20 22:13:04 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/c.scerevisiae.1.fq
        output: pipe/cutadapt/A2/1.fq1.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/A2_1.fq1.fastq.log
        jobid: 14
        reason: Missing output files: pipe/cutadapt/A2/1.fq1.fastq
        wildcards: sample=A2, unit=1, fq=fq1, ext=fastq
        threads: 0
        resources: tmpdir=/tmp



    [Wed Sep 20 22:13:04 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/c.scerevisiae.2.fq
        output: pipe/cutadapt/A2/1.fq2.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/A2_1.fq2.fastq.log
        jobid: 15
        reason: Missing output files: pipe/cutadapt/A2/1.fq2.fastq
        wildcards: sample=A2, unit=1, fq=fq2, ext=fastq
        threads: 0
        resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

[Wed Sep 20 22:13:05 2023]

Finished job 14.

[Wed Sep 20 22:13:05 2023]

Finished job 15.

[Wed Sep 20 22:13:05 2023]

Finished job 13.

9 of 61 steps (15%) done

Select jobs to execute...

[Wed Sep 20 22:13:05 2023]

group job 38b55461-5e6f-47b3-9387-a9dcaf8b3b3c (jobs in lexicogr. order):


    [Wed Sep 20 22:13:05 2023]

    rule cutadapt_pe:
        input: pipe/cutadapt/B1/1.fq1.fastq, pipe/cutadapt/B1/1.fq2.fastq
        output: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz, results/trimmed/B1_1.paired.qc.txt
        log: logs/cutadapt/B1_1.log
        jobid: 17
        reason: Missing output files: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz; Input files updated by another job: pipe/cutadapt/B1/1.fq2.fastq, pipe/cutadapt/B1/1.fq1.fastq
        wildcards: sample=B1, unit=1
        threads: 2
        resources: tmpdir=/tmp


    [Wed Sep 20 22:13:05 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/c.scerevisiae.1.fq
        output: pipe/cutadapt/B1/1.fq1.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/B1_1.fq1.fastq.log
        jobid: 18
        reason: Missing output files: pipe/cutadapt/B1/1.fq1.fastq
        wildcards: sample=B1, unit=1, fq=fq1, ext=fastq
        threads: 0
        resources: tmpdir=/tmp



    [Wed Sep 20 22:13:05 2023]

    rule cutadapt_pipe:
        input: ngs-test-data/reads/c.scerevisiae.2.fq
        output: pipe/cutadapt/B1/1.fq2.fastq (pipe)
        log: logs/pipe-fastqs/catadapt/B1_1.fq2.fastq.log
        jobid: 19
        reason: Missing output files: pipe/cutadapt/B1/1.fq2.fastq
        wildcards: sample=B1, unit=1, fq=fq2, ext=fastq
        threads: 0
        resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

Activating conda environment: .snakemake/conda/686f2b5813d0bc7102134b5e3825944b_

[Wed Sep 20 22:13:06 2023]

Finished job 18.

[Wed Sep 20 22:13:06 2023]

Finished job 19.

[Wed Sep 20 22:13:06 2023]

Finished job 17.

12 of 61 steps (20%) done

Select jobs to execute...


[Wed Sep 20 22:13:06 2023]

rule get_genome:
    output: resources/genome.fasta
    log: logs/get-genome.log
    jobid: 10
    reason: Missing output files: resources/genome.fasta
    resources: tmpdir=/tmp


[Wed Sep 20 22:13:06 2023]

rule get_annotation:
    output: resources/genome.gtf
    log: logs/get_annotation.log
    jobid: 11
    reason: Missing output files: resources/genome.gtf
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/b989f3f8888314c661109a11e4951d06_

Activating conda environment: .snakemake/conda/b989f3f8888314c661109a11e4951d06_

[Wed Sep 20 22:13:09 2023]

Finished job 11.

13 of 61 steps (21%) done

Select jobs to execute...


[Wed Sep 20 22:13:09 2023]

rule rseqc_gtf2bed:
    input: resources/genome.gtf
    output: results/qc/rseqc/annotation.bed, results/qc/rseqc/annotation.db
    log: logs/rseqc_gtf2bed.log
    jobid: 28
    reason: Missing output files: results/qc/rseqc/annotation.bed; Input files updated by another job: resources/genome.gtf
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_

Activating conda environment: .snakemake/conda/5adf00fcbc983f732eb9e031c62ebc03_

[Wed Sep 20 22:13:13 2023]

Finished job 10.

14 of 61 steps (23%) done

Select jobs to execute...

[Wed Sep 20 22:13:15 2023]

Finished job 28.

15 of 61 steps (25%) done

Removing temporary output results/qc/rseqc/annotation.db.


[Wed Sep 20 22:13:15 2023]

rule star_index:
    input: resources/genome.fasta, resources/genome.gtf
    output: resources/star_genome
    log: logs/star_index_genome.log
    jobid: 9
    reason: Missing output files: resources/star_genome; Input files updated by another job: resources/genome.fasta, resources/genome.gtf
    threads: 2
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/d02df01c1a926fffbd5dd7a2048236cd_

[Wed Sep 20 22:13:45 2023]

Finished job 9.

16 of 61 steps (26%) done

Select jobs to execute...


[Wed Sep 20 22:13:45 2023]

rule align:
    input: results/trimmed/B1_1_R1.fastq.gz, results/trimmed/B1_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
    output: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/ReadsPerGene.out.tab
    log: logs/star/B1_1.log
    jobid: 16
    reason: Missing output files: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/ReadsPerGene.out.tab; Input files updated by another job: results/trimmed/B1_1_R1.fastq.gz, resources/star_genome, resources/genome.gtf, results/trimmed/B1_1_R2.fastq.gz
    wildcards: sample=B1, unit=1
    threads: 2
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_

[Wed Sep 20 22:14:33 2023]

Finished job 16.

17 of 61 steps (28%) done

Select jobs to execute...


[Wed Sep 20 22:14:33 2023]

rule rseqc_readdis:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B1_1.readdistribution.txt
    log: logs/rseqc/rseqc_readdis/B1_1.log
    jobid: 50
    reason: Missing output files: results/qc/rseqc/B1_1.readdistribution.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp



Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:33 2023]

rule rseqc_junction_annotation:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B1_1.junctionanno.junction.bed
    log: logs/rseqc/rseqc_junction_annotation/B1_1.log
    jobid: 30
    reason: Missing output files: results/qc/rseqc/B1_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/B1_1.log; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:35 2023]

Finished job 30.

18 of 61 steps (30%) done

Select jobs to execute...


[Wed Sep 20 22:14:35 2023]

rule rseqc_junction_saturation:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf
    log: logs/rseqc/rseqc_junction_saturation/B1_1.log
    jobid: 34
    reason: Missing output files: results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:36 2023]

Finished job 50.

19 of 61 steps (31%) done

Select jobs to execute...


[Wed Sep 20 22:14:36 2023]

rule rseqc_infer:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B1_1.infer_experiment.txt
    log: logs/rseqc/rseqc_infer/B1_1.log
    jobid: 38
    reason: Missing output files: results/qc/rseqc/B1_1.infer_experiment.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:38 2023]

Finished job 38.

20 of 61 steps (33%) done

Select jobs to execute...


[Wed Sep 20 22:14:38 2023]

rule rseqc_innerdis:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt
    log: logs/rseqc/rseqc_innerdis/B1_1.log
    jobid: 46
    reason: Missing output files: results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:38 2023]

Finished job 34.

21 of 61 steps (34%) done

Select jobs to execute...


[Wed Sep 20 22:14:38 2023]

rule rseqc_readdup:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf
    log: logs/rseqc/rseqc_readdup/B1_1.log
    jobid: 54
    reason: Missing output files: results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:40 2023]

Finished job 54.

22 of 61 steps (36%) done

Select jobs to execute...


[Wed Sep 20 22:14:40 2023]

rule rseqc_readgc:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B1_1.readgc.GC_plot.pdf
    log: logs/rseqc/rseqc_readgc/B1_1.log
    jobid: 58
    reason: Missing output files: results/qc/rseqc/B1_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:40 2023]

Finished job 46.

23 of 61 steps (38%) done

Select jobs to execute...


[Wed Sep 20 22:14:40 2023]

rule rseqc_stat:
    input: results/star/B1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B1_1.stats.txt
    log: logs/rseqc/rseqc_stat/B1_1.log
    jobid: 42
    reason: Missing output files: results/qc/rseqc/B1_1.stats.txt; Input files updated by another job: results/star/B1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:14:42 2023]

Finished job 42.

24 of 61 steps (39%) done

Select jobs to execute...

[Wed Sep 20 22:14:42 2023]

Finished job 58.

25 of 61 steps (41%) done


[Wed Sep 20 22:14:42 2023]

rule align:
    input: results/trimmed/A2_1_R1.fastq.gz, results/trimmed/A2_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
    output: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/ReadsPerGene.out.tab
    log: logs/star/A2_1.log
    jobid: 12
    reason: Missing output files: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/ReadsPerGene.out.tab; Input files updated by another job: results/trimmed/A2_1_R1.fastq.gz, resources/star_genome, results/trimmed/A2_1_R2.fastq.gz, resources/genome.gtf
    wildcards: sample=A2, unit=1
    threads: 2
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_

[Wed Sep 20 22:15:30 2023]

Finished job 12.

26 of 61 steps (43%) done

Select jobs to execute...


[Wed Sep 20 22:15:30 2023]

rule rseqc_infer:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A2_1.infer_experiment.txt
    log: logs/rseqc/rseqc_infer/A2_1.log
    jobid: 37
    reason: Missing output files: results/qc/rseqc/A2_1.infer_experiment.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_


[Wed Sep 20 22:15:30 2023]

rule rseqc_innerdis:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt
    log: logs/rseqc/rseqc_innerdis/A2_1.log
    jobid: 45
    reason: Missing output files: results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:32 2023]

Finished job 37.

27 of 61 steps (44%) done

Select jobs to execute...


[Wed Sep 20 22:15:32 2023]

rule rseqc_readdis:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A2_1.readdistribution.txt
    log: logs/rseqc/rseqc_readdis/A2_1.log
    jobid: 49
    reason: Missing output files: results/qc/rseqc/A2_1.readdistribution.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:33 2023]

Finished job 45.

28 of 61 steps (46%) done

Select jobs to execute...


[Wed Sep 20 22:15:33 2023]

rule rseqc_junction_annotation:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A2_1.junctionanno.junction.bed
    log: logs/rseqc/rseqc_junction_annotation/A2_1.log
    jobid: 29
    reason: Missing output files: results/qc/rseqc/A2_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/A2_1.log; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:35 2023]

Finished job 49.

29 of 61 steps (48%) done

Select jobs to execute...


[Wed Sep 20 22:15:35 2023]

rule rseqc_junction_saturation:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf
    log: logs/rseqc/rseqc_junction_saturation/A2_1.log
    jobid: 33
    reason: Missing output files: results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:35 2023]

Finished job 29.

30 of 61 steps (49%) done

Select jobs to execute...


[Wed Sep 20 22:15:35 2023]

rule rseqc_stat:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A2_1.stats.txt
    log: logs/rseqc/rseqc_stat/A2_1.log
    jobid: 41
    reason: Missing output files: results/qc/rseqc/A2_1.stats.txt; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:37 2023]

Finished job 41.

31 of 61 steps (51%) done

Select jobs to execute...


[Wed Sep 20 22:15:37 2023]

rule rseqc_readdup:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf
    log: logs/rseqc/rseqc_readdup/A2_1.log
    jobid: 53
    reason: Missing output files: results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:37 2023]

Finished job 33.

32 of 61 steps (52%) done

Select jobs to execute...


[Wed Sep 20 22:15:37 2023]

rule rseqc_readgc:
    input: results/star/A2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A2_1.readgc.GC_plot.pdf
    log: logs/rseqc/rseqc_readgc/A2_1.log
    jobid: 57
    reason: Missing output files: results/qc/rseqc/A2_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/A2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:40 2023]

Finished job 53.

33 of 61 steps (54%) done

Select jobs to execute...

[Wed Sep 20 22:15:40 2023]

Finished job 57.

34 of 61 steps (56%) done


[Wed Sep 20 22:15:40 2023]

rule align:
    input: results/trimmed/B2_1_R1.fastq.gz, results/trimmed/B2_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
    output: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/star/B2_1/ReadsPerGene.out.tab
    log: logs/star/B2_1.log
    jobid: 20
    reason: Missing output files: results/star/B2_1/ReadsPerGene.out.tab, results/star/B2_1/Aligned.sortedByCoord.out.bam; Input files updated by another job: resources/star_genome, results/trimmed/B2_1_R1.fastq.gz, resources/genome.gtf, results/trimmed/B2_1_R2.fastq.gz
    wildcards: sample=B2, unit=1
    threads: 2
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_

[Wed Sep 20 22:15:46 2023]

Finished job 20.

35 of 61 steps (57%) done

Select jobs to execute...


[Wed Sep 20 22:15:46 2023]

rule rseqc_junction_saturation:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf
    log: logs/rseqc/rseqc_junction_saturation/B2_1.log
    jobid: 35
    reason: Missing output files: results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp



Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:46 2023]

rule rseqc_infer:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B2_1.infer_experiment.txt
    log: logs/rseqc/rseqc_infer/B2_1.log
    jobid: 39
    reason: Missing output files: results/qc/rseqc/B2_1.infer_experiment.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:48 2023]

Finished job 39.

36 of 61 steps (59%) done

Select jobs to execute...


[Wed Sep 20 22:15:48 2023]

rule rseqc_innerdis:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt
    log: logs/rseqc/rseqc_innerdis/B2_1.log
    jobid: 47
    reason: Missing output files: results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:48 2023]

Finished job 35.

37 of 61 steps (61%) done

Select jobs to execute...


[Wed Sep 20 22:15:48 2023]

rule rseqc_readdis:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B2_1.readdistribution.txt
    log: logs/rseqc/rseqc_readdis/B2_1.log
    jobid: 51
    reason: Missing output files: results/qc/rseqc/B2_1.readdistribution.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:51 2023]

Finished job 47.

38 of 61 steps (62%) done

Select jobs to execute...


[Wed Sep 20 22:15:51 2023]

rule rseqc_junction_annotation:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/B2_1.junctionanno.junction.bed
    log: logs/rseqc/rseqc_junction_annotation/B2_1.log
    jobid: 31
    reason: Missing output files: logs/rseqc/rseqc_junction_annotation/B2_1.log, results/qc/rseqc/B2_1.junctionanno.junction.bed; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:51 2023]

Finished job 51.

39 of 61 steps (64%) done

Select jobs to execute...


[Wed Sep 20 22:15:51 2023]

rule rseqc_stat:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B2_1.stats.txt
    log: logs/rseqc/rseqc_stat/B2_1.log
    jobid: 43
    reason: Missing output files: results/qc/rseqc/B2_1.stats.txt; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:53 2023]

Finished job 43.

40 of 61 steps (66%) done

Select jobs to execute...


[Wed Sep 20 22:15:53 2023]

rule rseqc_readgc:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B2_1.readgc.GC_plot.pdf
    log: logs/rseqc/rseqc_readgc/B2_1.log
    jobid: 59
    reason: Missing output files: results/qc/rseqc/B2_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:53 2023]

Finished job 31.

41 of 61 steps (67%) done

Select jobs to execute...


[Wed Sep 20 22:15:53 2023]

rule rseqc_readdup:
    input: results/star/B2_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf
    log: logs/rseqc/rseqc_readdup/B2_1.log
    jobid: 55
    reason: Missing output files: results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/B2_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=B2, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:15:56 2023]

Finished job 59.

42 of 61 steps (69%) done

Select jobs to execute...

[Wed Sep 20 22:15:56 2023]

Finished job 55.

43 of 61 steps (70%) done


[Wed Sep 20 22:15:56 2023]

rule align:
    input: results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz, resources/star_genome, resources/genome.gtf
    output: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/star/A1_1/ReadsPerGene.out.tab
    log: logs/star/A1_1.log
    jobid: 5
    reason: Missing output files: results/star/A1_1/ReadsPerGene.out.tab, results/star/A1_1/Aligned.sortedByCoord.out.bam; Input files updated by another job: resources/star_genome, resources/genome.gtf, results/trimmed/A1_1_R1.fastq.gz, results/trimmed/A1_1_R2.fastq.gz
    wildcards: sample=A1, unit=1
    threads: 2
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/8c561395ac76b6c2cbace6e1f2e26ae4_

[Wed Sep 20 22:16:02 2023]

Finished job 5.

44 of 61 steps (72%) done

Select jobs to execute...


[Wed Sep 20 22:16:02 2023]

rule rseqc_infer:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A1_1.infer_experiment.txt
    log: logs/rseqc/rseqc_infer/A1_1.log
    jobid: 36
    reason: Missing output files: results/qc/rseqc/A1_1.infer_experiment.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_


[Wed Sep 20 22:16:02 2023]

rule rseqc_junction_saturation:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf
    log: logs/rseqc/rseqc_junction_saturation/A1_1.log
    jobid: 32
    reason: Missing output files: results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:04 2023]

Finished job 36.

45 of 61 steps (74%) done

Select jobs to execute...


[Wed Sep 20 22:16:04 2023]

rule rseqc_junction_annotation:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A1_1.junctionanno.junction.bed
    log: logs/rseqc/rseqc_junction_annotation/A1_1.log
    jobid: 27
    reason: Missing output files: logs/rseqc/rseqc_junction_annotation/A1_1.log, results/qc/rseqc/A1_1.junctionanno.junction.bed; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:04 2023]

Finished job 32.

46 of 61 steps (75%) done

Select jobs to execute...


[Wed Sep 20 22:16:04 2023]

rule rseqc_innerdis:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt
    log: logs/rseqc/rseqc_innerdis/A1_1.log
    jobid: 44
    reason: Missing output files: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:07 2023]

Finished job 27.

47 of 61 steps (77%) done

Select jobs to execute...


[Wed Sep 20 22:16:07 2023]

rule rseqc_readdis:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/annotation.bed
    output: results/qc/rseqc/A1_1.readdistribution.txt
    log: logs/rseqc/rseqc_readdis/A1_1.log
    jobid: 48
    reason: Missing output files: results/qc/rseqc/A1_1.readdistribution.txt; Input files updated by another job: results/qc/rseqc/annotation.bed, results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:07 2023]

Finished job 44.

48 of 61 steps (79%) done

Select jobs to execute...


[Wed Sep 20 22:16:07 2023]

rule rseqc_stat:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A1_1.stats.txt
    log: logs/rseqc/rseqc_stat/A1_1.log
    jobid: 40
    reason: Missing output files: results/qc/rseqc/A1_1.stats.txt; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:09 2023]

Finished job 40.

49 of 61 steps (80%) done

Select jobs to execute...


[Wed Sep 20 22:16:09 2023]

rule rseqc_readdup:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf
    log: logs/rseqc/rseqc_readdup/A1_1.log
    jobid: 52
    reason: Missing output files: results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:09 2023]

Finished job 48.

50 of 61 steps (82%) done

Select jobs to execute...


[Wed Sep 20 22:16:09 2023]

rule rseqc_readgc:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam
    output: results/qc/rseqc/A1_1.readgc.GC_plot.pdf
    log: logs/rseqc/rseqc_readgc/A1_1.log
    jobid: 56
    reason: Missing output files: results/qc/rseqc/A1_1.readgc.GC_plot.pdf; Input files updated by another job: results/star/A1_1/Aligned.sortedByCoord.out.bam
    wildcards: sample=A1, unit=1
    priority: 1
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/f4b389ae96894c64dac5fdb184105ba7_

[Wed Sep 20 22:16:12 2023]

Finished job 52.

51 of 61 steps (84%) done

Select jobs to execute...


[Wed Sep 20 22:16:12 2023]

rule count_matrix:
    input: results/star/A1_1/ReadsPerGene.out.tab, results/star/A2_1/ReadsPerGene.out.tab, results/star/B1_1/ReadsPerGene.out.tab, results/star/B2_1/ReadsPerGene.out.tab
    output: results/counts/all.tsv
    log: logs/count-matrix.log
    jobid: 4
    reason: Missing output files: results/counts/all.tsv; Input files updated by another job: results/star/B2_1/ReadsPerGene.out.tab, results/star/B1_1/ReadsPerGene.out.tab, results/star/A1_1/ReadsPerGene.out.tab, results/star/A2_1/ReadsPerGene.out.tab
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_

[Wed Sep 20 22:16:12 2023]

Finished job 56.

52 of 61 steps (85%) done

Select jobs to execute...


[Wed Sep 20 22:16:12 2023]

rule multiqc:
    input: results/star/A1_1/Aligned.sortedByCoord.out.bam, results/star/A2_1/Aligned.sortedByCoord.out.bam, results/star/B1_1/Aligned.sortedByCoord.out.bam, results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/A1_1.junctionanno.junction.bed, results/qc/rseqc/A2_1.junctionanno.junction.bed, results/qc/rseqc/B1_1.junctionanno.junction.bed, results/qc/rseqc/B2_1.junctionanno.junction.bed, results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A1_1.infer_experiment.txt, results/qc/rseqc/A2_1.infer_experiment.txt, results/qc/rseqc/B1_1.infer_experiment.txt, results/qc/rseqc/B2_1.infer_experiment.txt, results/qc/rseqc/A1_1.stats.txt, results/qc/rseqc/A2_1.stats.txt, results/qc/rseqc/B1_1.stats.txt, results/qc/rseqc/B2_1.stats.txt, results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A1_1.readdistribution.txt, results/qc/rseqc/A2_1.readdistribution.txt, results/qc/rseqc/B1_1.readdistribution.txt, results/qc/rseqc/B2_1.readdistribution.txt, results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A1_1.readgc.GC_plot.pdf, results/qc/rseqc/A2_1.readgc.GC_plot.pdf, results/qc/rseqc/B1_1.readgc.GC_plot.pdf, results/qc/rseqc/B2_1.readgc.GC_plot.pdf, logs/rseqc/rseqc_junction_annotation/A1_1.log, logs/rseqc/rseqc_junction_annotation/A2_1.log, logs/rseqc/rseqc_junction_annotation/B1_1.log, logs/rseqc/rseqc_junction_annotation/B2_1.log
    output: results/qc/multiqc_report.html
    log: logs/multiqc.log
    jobid: 26
    reason: Missing output files: results/qc/multiqc_report.html; Input files updated by another job: results/qc/rseqc/A1_1.inner_distance_freq.inner_distance.txt, results/star/A2_1/Aligned.sortedByCoord.out.bam, logs/rseqc/rseqc_junction_annotation/A1_1.log, logs/rseqc/rseqc_junction_annotation/B1_1.log, results/qc/rseqc/B1_1.stats.txt, results/qc/rseqc/B1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/A1_1.junctionanno.junction.bed, results/qc/rseqc/A1_1.stats.txt, results/qc/rseqc/A2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B2_1.readgc.GC_plot.pdf, results/star/A1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.readdistribution.txt, results/qc/rseqc/A1_1.readgc.GC_plot.pdf, results/qc/rseqc/A1_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B1_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/B1_1.infer_experiment.txt, logs/rseqc/rseqc_junction_annotation/A2_1.log, results/qc/rseqc/A2_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/B1_1.readdistribution.txt, results/qc/rseqc/A2_1.readdup.DupRate_plot.pdf, results/qc/rseqc/B2_1.junctionanno.junction.bed, logs/rseqc/rseqc_junction_annotation/B2_1.log, results/star/B2_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.inner_distance_freq.inner_distance.txt, results/qc/rseqc/A1_1.readdistribution.txt, results/qc/rseqc/A2_1.junctionanno.junction.bed, results/star/B1_1/Aligned.sortedByCoord.out.bam, results/qc/rseqc/B2_1.stats.txt, results/qc/rseqc/A1_1.infer_experiment.txt, results/qc/rseqc/B1_1.junctionsat.junctionSaturation_plot.pdf, results/qc/rseqc/A2_1.infer_experiment.txt, results/qc/rseqc/B1_1.readgc.GC_plot.pdf, results/qc/rseqc/A2_1.stats.txt, results/qc/rseqc/A2_1.readgc.GC_plot.pdf, results/qc/rseqc/B2_1.infer_experiment.txt, results/qc/rseqc/B1_1.junctionanno.junction.bed, results/qc/rseqc/A2_1.readdistribution.txt, 
results/qc/rseqc/A1_1.junctionsat.junctionSaturation_plot.pdf
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/45752da70eae001d2312f5cc28f0893b_

Activating conda environment: .snakemake/conda/fdd167354aa72861c97737721eeeb361_

Activating conda environment: .snakemake/conda/fdd167354aa72861c97737721eeeb361_

[Wed Sep 20 22:16:13 2023]

Finished job 4.

53 of 61 steps (87%) done

Select jobs to execute...


[Wed Sep 20 22:16:13 2023]

rule deseq2_init:
    input: results/counts/all.tsv
    output: results/deseq2/all.rds, results/deseq2/normcounts.tsv
    log: logs/deseq2/init.log
    jobid: 3
    reason: Missing output files: results/deseq2/all.rds, results/deseq2/normcounts.tsv; Input files updated by another job: results/counts/all.tsv
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_

[Wed Sep 20 22:16:16 2023]

Finished job 26.

54 of 61 steps (89%) done

Select jobs to execute...


[Wed Sep 20 22:16:16 2023]

rule gene_2_symbol:
    input: results/counts/all.tsv
    output: results/counts/all.symbol.tsv
    log: logs/gene2symbol/results/counts/all.log
    jobid: 25
    reason: Missing output files: results/counts/all.symbol.tsv; Input files updated by another job: results/counts/all.tsv
    wildcards: prefix=results/counts/all
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
ggplot2 3.4.3     purrr   1.0.1
tibble  3.2.1     dplyr   1.1.3
tidyr   1.3.0     stringr 1.5.0
readr   2.1.4     forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
dplyr::filter() masks stats::filter()
dplyr::lag()    masks stats::lag()
dplyr::select() masks biomaRt::select()
Possible SSL connectivity problems detected.
Please report this issue at https://github.com/grimbough/biomaRt/issues
Error in curl::curl_fetch_memory(url, handle = handle) : 
  SSL peer certificate or SSH remote key was not OK: [uswest.ensembl.org] SSL certificate problem: certificate has expired
[Wed Sep 20 22:16:26 2023]

Finished job 3.

55 of 61 steps (90%) done

Select jobs to execute...


[Wed Sep 20 22:16:26 2023]

rule deseq2:
    input: results/deseq2/all.rds
    output: results/diffexp/treated-vs-untreated.diffexp.tsv, results/diffexp/treated-vs-untreated.ma-plot.svg
    log: logs/deseq2/treated-vs-untreated.diffexp.log
    jobid: 2
    reason: Missing output files: results/diffexp/treated-vs-untreated.diffexp.tsv; Input files updated by another job: results/deseq2/all.rds
    wildcards: contrast=treated-vs-untreated
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_

Ensembl site unresponsive, trying asia mirror
[Wed Sep 20 22:16:36 2023]

Finished job 2.

56 of 61 steps (92%) done

Select jobs to execute...


[Wed Sep 20 22:16:36 2023]

rule gene_2_symbol:
    input: results/diffexp/treated-vs-untreated.diffexp.tsv
    output: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv
    log: logs/gene2symbol/results/diffexp/treated-vs-untreated.diffexp.log
    jobid: 1
    reason: Missing output files: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv; Input files updated by another job: results/diffexp/treated-vs-untreated.diffexp.tsv
    wildcards: prefix=results/diffexp/treated-vs-untreated.diffexp
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
ggplot2 3.4.3     purrr   1.0.1
tibble  3.2.1     dplyr   1.1.3
tidyr   1.3.0     stringr 1.5.0
readr   2.1.4     forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
dplyr::filter() masks stats::filter()
dplyr::lag()    masks stats::lag()
dplyr::select() masks biomaRt::select()
Ensembl site unresponsive, trying www mirror
Ensembl site unresponsive, trying www mirror
Batch submitting query [===============>---------------]  50% eta:  5s
                                                                      
[Wed Sep 20 22:17:00 2023]

Finished job 25.

57 of 61 steps (93%) done

Select jobs to execute...


[Wed Sep 20 22:17:00 2023]

rule pca:
    input: results/deseq2/all.rds
    output: results/pca.condition.svg
    log: logs/pca.condition.log
    jobid: 60
    reason: Missing output files: results/pca.condition.svg; Input files updated by another job: results/deseq2/all.rds
    wildcards: variable=condition
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/97c4b09bad6bfa4ada6fe14151f3ec1c_

[Wed Sep 20 22:17:10 2023]

Finished job 60.

58 of 61 steps (95%) done

Select jobs to execute...


[Wed Sep 20 22:17:10 2023]

rule gene_2_symbol:
    input: results/deseq2/normcounts.tsv
    output: results/deseq2/normcounts.symbol.tsv
    log: logs/gene2symbol/results/deseq2/normcounts.log
    jobid: 24
    reason: Missing output files: results/deseq2/normcounts.symbol.tsv; Input files updated by another job: results/deseq2/normcounts.tsv
    wildcards: prefix=results/deseq2/normcounts
    resources: tmpdir=/tmp


Activating conda environment: .snakemake/conda/c39789d0da717f333b9d416fae158cd7_

[Wed Sep 20 22:17:12 2023]

Finished job 1.

59 of 61 steps (97%) done

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
ggplot2 3.4.3     purrr   1.0.1
tibble  3.2.1     dplyr   1.1.3
tidyr   1.3.0     stringr 1.5.0
readr   2.1.4     forcats 1.0.0
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
dplyr::filter() masks stats::filter()
dplyr::lag()    masks stats::lag()
dplyr::select() masks biomaRt::select()
Ensembl site unresponsive, trying www mirror
[Wed Sep 20 22:17:38 2023]

Finished job 24.

60 of 61 steps (98%) done

Select jobs to execute...


[Wed Sep 20 22:17:38 2023]

localrule all:
    input: results/diffexp/treated-vs-untreated.diffexp.symbol.tsv, results/deseq2/normcounts.symbol.tsv, results/counts/all.symbol.tsv, results/qc/multiqc_report.html, results/pca.condition.svg, results/pca.condition.svg
    jobid: 0
    reason: Input files updated by another job: results/counts/all.symbol.tsv, results/qc/multiqc_report.html, results/diffexp/treated-vs-untreated.diffexp.symbol.tsv, results/pca.condition.svg, results/deseq2/normcounts.symbol.tsv
    resources: tmpdir=/tmp


[Wed Sep 20 22:17:38 2023]

Finished job 0.

61 of 61 steps (100%) done

Complete log: .snakemake/log/2023-09-20T215311.271697.snakemake.log

Register outputs#

QC#

multiqc_file = ln.File(f"{root_dir}/.test/results/qc/multiqc_report.html")
multiqc_file.save()
How would I register all QC files?
multiqc_results = ln.File.from_dir(f"{root_dir}/.test/results/qc/multiqc_report_data/")
ln.save(multiqc_results)
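
Before registering a whole directory, it can help to preview which files it contains. The following is a minimal sketch using only the standard library; `list_files_below` is a hypothetical helper, not part of lamindb, and whether `ln.File.from_dir` picks up exactly this set of files (for example, whether it descends into subdirectories) is an assumption to check against the lamindb documentation.

```python
from pathlib import Path
from typing import List


def list_files_below(directory: str) -> List[str]:
    """Return the sorted relative POSIX paths of all regular files below `directory`."""
    base = Path(directory)
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")  # walk the directory tree recursively
        if path.is_file()  # skip subdirectories themselves
    )
```

Calling `list_files_below` on the MultiQC data directory returns the candidate file names, which can be compared with the records that end up in the instance after saving.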

Count matrix#

count_matrix = ln.File(f"{root_dir}/.test/results/counts/all.symbol.tsv")
count_matrix.save()
❗ file has more than one suffix (path.suffixes), using only last suffix: '.tsv'
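
The warning appears because the file name `all.symbol.tsv` contains two dots. As an illustration with the standard library's `pathlib` (a sketch of the general behavior, not lamindb's internal code), `suffixes` returns every dot-separated extension, whereas `suffix` returns only the last one, which is what ends up in the `suffix` field of the file record:

```python
from pathlib import PurePosixPath

path = PurePosixPath("results/counts/all.symbol.tsv")

all_suffixes = path.suffixes  # every extension: ['.symbol', '.tsv']
last_suffix = path.suffix  # only the final extension: '.tsv'

print(all_suffixes)
print(last_suffix)
```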

Track Snakemake ID#

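Snakemake does not expose an easily accessible ID for a run. We therefore extract one from the name of the most recent log file in .snakemake/log: in a name such as 2023-09-20T215311.271697.snakemake.log, the field after the timestamp (271697) serves as the ID. We’re planning to simplify this process in the future.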

import pathlib
from datetime import datetime

PATH_TO_DOT_SNAKEMAKE_LOG = "rna-seq-star-deseq2/.test/.snakemake/log"
# Collect the names of all Snakemake log files, e.g. 2023-09-20T215311.271697.snakemake.log
log_file_names = [
    log_file.name
    for log_file in pathlib.Path(PATH_TO_DOT_SNAKEMAKE_LOG).glob("*.snakemake.log")
]

# Parse the timestamp prefix of each name to find the most recent run
timestamps = [
    datetime.strptime(filename.split(".")[0], "%Y-%m-%dT%H%M%S")
    for filename in log_file_names
]
# The second dot-separated field of the newest log's name is used as the ID
snakemake_id = log_file_names[timestamps.index(max(timestamps))].split(".")[1]
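
The same extraction can be wrapped in a function that also copes with an empty log directory. This is a minimal sketch under the assumption that log names follow the `<date>T<time>.<id>.snakemake.log` pattern seen in this run; `latest_snakemake_id` is a hypothetical helper name, not part of Snakemake or lamindb.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def latest_snakemake_id(log_dir: str) -> Optional[str]:
    """Return the ID field of the most recent Snakemake log in `log_dir`, or None."""
    newest = None  # (start time, ID) of the newest log seen so far
    for log_file in Path(log_dir).glob("*.snakemake.log"):
        # Names look like 2023-09-20T215311.271697.snakemake.log
        stamp, run_id = log_file.name.split(".")[:2]
        started = datetime.strptime(stamp, "%Y-%m-%dT%H%M%S")
        if newest is None or started > newest[0]:
            newest = (started, run_id)
    return None if newest is None else newest[1]
```

Returning `None` instead of raising lets the caller decide how to handle a workflow that has not been run yet.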

Add the Snakemake ID to the run record:

run.reference = snakemake_id
run.reference_type = "snakemake_id"
run.save()

Visualize#

View data lineage:

count_matrix.view_flow()
_images/c7f179ef3a299f53b035b6b7f40e4d3cb26baea2cd73321816eb9fb4ae31d256.svg

View the database content:

ln.view()
File
storage_id key suffix accessor description version size hash hash_type transform_id run_id initial_version_id updated_at created_by_id
id
M8klgyfZO619dYLGwUVy t6Qgo37v rna-seq-star-deseq2/.test/results/counts/all.s... .tsv None None None 115658 Zf_hnhy4E3w4b30mRmhS2w md5 0wOgM2TBtkMV6M O0aXDnvbt4EHLqbc0RO6 None 2023-09-20 22:17:38 DzTjkKse
hzj9JOr81f7F4MsM04pO t6Qgo37v rna-seq-star-deseq2/.test/results/qc/multiqc_r... .html None None None 1125890 YPAwYAd7A-mhaHCXV_aWXw md5 0wOgM2TBtkMV6M O0aXDnvbt4EHLqbc0RO6 None 2023-09-20 22:17:38 DzTjkKse
ajrQiMQ0VjTnbTY1EziY t6Qgo37v rna-seq-star-deseq2/.test/ngs-test-data/reads/... .fq None None None 2159449 ofhOQDhdGWvkyzMgeuVh1g md5 hCfEkAZ94EeCNt XltwcXAZ7H4twaCujeqX None 2023-09-20 21:53:10 DzTjkKse
NfeSeRde79nEawAOjdQx t6Qgo37v rna-seq-star-deseq2/.test/ngs-test-data/reads/... .fq None None None 935120 27JbZ5KW0JsMRkICIMVoAQ md5 hCfEkAZ94EeCNt XltwcXAZ7H4twaCujeqX None 2023-09-20 21:53:10 DzTjkKse
qc1Rwx7x3Kw1ZrBWfjfr t6Qgo37v rna-seq-star-deseq2/.test/ngs-test-data/reads/... .fq None None None 2218894 DqfThx982Ai4akCcx-HikA md5 hCfEkAZ94EeCNt XltwcXAZ7H4twaCujeqX None 2023-09-20 21:53:10 DzTjkKse
VMlHiTDcEtWocndgXCUG t6Qgo37v rna-seq-star-deseq2/.test/ngs-test-data/reads/... .fq None None None 925634 B1MLHnWgnl4yOok0kkYIvA md5 hCfEkAZ94EeCNt XltwcXAZ7H4twaCujeqX None 2023-09-20 21:53:10 DzTjkKse
Fz3z3IP3CAQevQzHtxfL t6Qgo37v rna-seq-star-deseq2/.test/ngs-test-data/reads/... .fq None None None 2159449 zg7RgcXv7ue_dHb_Q7-1LQ md5 hCfEkAZ94EeCNt XltwcXAZ7H4twaCujeqX None 2023-09-20 21:53:10 DzTjkKse
Run
transform_id run_at created_by_id reference reference_type
id
XltwcXAZ7H4twaCujeqX hCfEkAZ94EeCNt 2023-09-20 21:53:10 DzTjkKse https://github.com/snakemake-workflows/rna-seq... url
O0aXDnvbt4EHLqbc0RO6 0wOgM2TBtkMV6M 2023-09-20 21:53:10 DzTjkKse 271697 snakemake_id
Storage
root type region updated_at created_by_id
id
t6Qgo37v /home/runner/work/snakemake-lamin-usecases/sna... local None 2023-09-20 21:53:07 DzTjkKse
Transform
name short_name version type reference reference_type initial_version_id updated_at created_by_id
id
0wOgM2TBtkMV6M snakemake-workflows/rna-seq-star-deseq2 None 2.0.0 pipeline https://github.com/laminlabs/snakemake-lamin-u... None None 2023-09-20 22:17:38 DzTjkKse
hCfEkAZ94EeCNt Download None None notebook None None None 2023-09-20 21:53:10 DzTjkKse
User
handle email name updated_at
id
DzTjkKse testuser1 testuser1@lamin.ai Test User1 2023-09-20 21:53:07

Clean up the test instance:

!lamin delete --force snakemake-bulkrna
💡 deleting instance testuser1/snakemake-bulkrna
✅     deleted instance settings file: /home/runner/.lamin/instance--testuser1--snakemake-bulkrna.env
✅     instance cache deleted
✅     deleted '.lndb' sqlite file
❗     consider manually deleting your stored data: /home/runner/work/snakemake-lamin-usecases/snakemake-lamin-usecases/docs