Introduction
When I get *.fastq.gz files back for my Visium spatial libraries, I run spaceranger count to generate the output files for QC metrics and downstream analysis. The command in my slurm job script looks like this:
spaceranger count --id=18_57617_A1 --transcriptome=/home/skim823/projects/def-fdick/skim823/genomes/spacerange_hg38/refdata-gex-GRCh38-2020-A --probe-set=/home/skim823/projects/def-fdick/skim823/programs/spaceranger-2.1.1/probe_sets/Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv --fastqs=/scratch/skim823/visium/20240117_LH00244_0047_A22GM27LT3_Mura_Kim --sample=18_57617_A1_D1 --cytaimage=/scratch/skim823/visium/20240117_LH00244_0047_A22GM27LT3_Mura_Kim/etc/assay_CAVG10505_2023-12-06_10-13-34_V43L25-333_1701876913_CytAssist/CAVG10505_2023-12-06_10-35-13_2023-12-06_10-13-34_V43L25-333_D1_18-57617-A1.tif --image=/scratch/skim823/visium/20240117_LH00244_0047_A22GM27LT3_Mura_Kim/etc/tiff/18-57617-A1.tif --slide=V43L25-333 --area=D1 --loupe-alignment=/scratch/skim823/visium/20240117_LH00244_0047_A22GM27LT3_Mura_Kim/etc/json/18_57617_A1.json
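For readability, the same call can be assembled from variables inside the job script. This is only a sketch: the names `run_dir`, `ref_dir`, and `cmd` are made up for illustration, and the script prints the command rather than running it.

```bash
# Hypothetical variable names; the paths themselves are the ones used in this post.
run_dir=/scratch/skim823/visium/20240117_LH00244_0047_A22GM27LT3_Mura_Kim
ref_dir=/home/skim823/projects/def-fdick/skim823

# One array element per argument keeps every path safely quoted.
cmd=(
  spaceranger count
  --id=18_57617_A1
  --transcriptome="$ref_dir/genomes/spacerange_hg38/refdata-gex-GRCh38-2020-A"
  --probe-set="$ref_dir/programs/spaceranger-2.1.1/probe_sets/Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv"
  --fastqs="$run_dir"
  --sample=18_57617_A1_D1
  --cytaimage="$run_dir/etc/assay_CAVG10505_2023-12-06_10-13-34_V43L25-333_1701876913_CytAssist/CAVG10505_2023-12-06_10-35-13_2023-12-06_10-13-34_V43L25-333_D1_18-57617-A1.tif"
  --image="$run_dir/etc/tiff/18-57617-A1.tif"
  --slide=V43L25-333
  --area=D1
  --loupe-alignment="$run_dir/etc/json/18_57617_A1.json"
)

# Dry run: print the command. Replace this line with "${cmd[@]}" to execute it.
printf '%s\n' "${cmd[*]}"
```

Only the sample-specific arguments (id, sample, the two images, slide, area, and the alignment json) change from one library to the next, which is what makes the call a good candidate for automation.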
With future samples, I want to use Nextflow to automate job submission.
Strategy
My initial thought was to parse params.fastq, but the --cytaimage, --image, --area, and --loupe-alignment arguments are nowhere to be found in these fastq files (unless I submit an ungodly sample name to the genomics core). Instead, I can provide a metadata.csv with one row per sample and use splitCsv() to read in all the required arguments.
| id | sample | cytaimage | image | slide | area | json |
|---|---|---|---|---|---|---|
| 18_57617_A1 | 18_57617_A1_D1 | etc/assay_CAVG10505_2023-12-06_10-13-34_V43L25-333_1701876913_CytAssist/CAVG10505_2023-12-06_10-35-13_2023-12-06_10-13-34_V43L25-333_D1_18-57617-A1.tif | etc/tiff/18-57617-A1.tif | V43L25-333 | D1 | etc/json/18_57617_A1.json |
| 20_24241_B2 | 20_24241_B2_A1 | etc/assay_CAVG10505_2023-12-06_10-13-34_V43L25-333_1701876913_CytAssist/CAVG10505_2023-12-06_10-35-13_2023-12-06_10-13-34_V43L25-333_A1_20-24241-B2.tif | etc/tiff/20-24241-B2.tif | V43L25-333 | A1 | etc/json/20_24241_B2.json |
In the working directory, I have ${sample}_{S7,S8}_{L001,L002}_{R1,R2}_001.fastq.gz files. The id and sample values in the .csv file must follow the format shown above; in particular, sample has to match the prefix of the fastq file names. I think spaceranger expects the R1/R2 fastq.gz read pairs for each sample, possibly spread across a couple of sequencing lanes.
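A quick way to confirm that a sample value really matches the file names is to count the fastq files that carry it as a prefix. The helper below is a sketch, not part of the pipeline: `count_fastqs` is a hypothetical name, and the glob assumes the usual `<sample>_S<n>_L<lane>_R1|R2_001.fastq.gz` naming shown above.

```bash
# count_fastqs DIR SAMPLE
# Prints how many R1/R2 fastq files in DIR start with SAMPLE.
count_fastqs() {
  local dir=$1 sample=$2 n=0 f
  for f in "$dir/$sample"_S*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz; do
    # An unmatched glob stays literal, so check that the file really exists.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Example call (run from the directory that holds the fastq files):
# count_fastqs . 18_57617_A1_D1
```

A count of 0 for a row of metadata.csv means the sample column does not match any file prefix, which is worth catching before a job is submitted.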
etc/ is a subdirectory with CytAssist images, hi-res images, and alignment json files.
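Since every row of metadata.csv points at three files under etc/, it can help to check that they exist before launching anything. The function below is a minimal sketch under a few assumptions: `check_metadata` is a hypothetical name, the file is a plain comma-separated version of the table above with the same column order, and no field contains a comma or quotes.

```bash
# check_metadata CSV [BASE_DIR]
# Prints each cytaimage/image/json path that is missing under BASE_DIR
# (default: current directory) and returns 1 if anything is missing.
check_metadata() {
  local csv=$1 base=${2:-.} missing=0
  local header id sample cytaimage image slide area json f
  {
    read -r header   # skip the header row
    # The "|| [ -n "$id" ]" part keeps a last row that lacks a trailing newline.
    while IFS=, read -r id sample cytaimage image slide area json || [ -n "$id" ]; do
      for f in "$cytaimage" "$image" "$json"; do
        if [ ! -e "$base/$f" ]; then
          echo "missing for $id: $f"
          missing=1
        fi
      done
    done
  } < "$csv"
  return "$missing"
}

# Example call (run from the directory that contains etc/):
# check_metadata metadata.csv .
```

The relative paths are resolved against BASE_DIR here, so the check should be run from the same directory the pipeline is launched from.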
Nextflow
The full main.nf looks like this:
nextflow.enable.dsl=2

params.csv = "$projectDir/metadata.csv"
params.transcriptome = "/home/skim823/projects/def-fdick/skim823/genomes/spacerange_hg38/refdata-gex-GRCh38-2020-A"
params.probeSet = "/home/skim823/projects/def-fdick/skim823/programs/spaceranger-2.1.1/probe_sets/Visium_Human_Transcriptome_Probe_Set_v2.0_GRCh38-2020-A.csv"

csv_ch = Channel
    .fromPath(params.csv)
    .splitCsv(header: true)
    .map(
        row ->
        tuple(row.id,
              row.sample,
              file(row.cytaimage),
              file(row.image),
              row.slide,
              row.area,
              file(row.json))
    )

transcriptome_ch = Channel.fromPath(params.transcriptome)
probeSet_ch = Channel.fromPath(params.probeSet)

process SPACECOUNT {
    publishDir "$projectDir/output", mode: "copy"
    cpus 32
    memory 128.GB
    time 2.h
    clusterOptions '--account=def-muram'

    input:
    tuple val(id), val(sample), file(cytaimage), file(image), val(slide), val(area), file(json)
    // setting directories as path() doesn't seem to work. It can't resolve relative paths.
    // If I just use val(), I just have to express parameters as absolute paths in the script.
    // path doesn't work but file does!
    path transcriptome
    path probeSet

    output:
    path "$id/"

    script:
    """
    spaceranger count --id $id --fastqs $baseDir --sample $sample --cytaimage $cytaimage --image $image --slide $slide --area $area --loupe-alignment $json --transcriptome $transcriptome --probe-set $probeSet
    """
}

workflow {
    SPACECOUNT(csv_ch, transcriptome_ch.collect(), probeSet_ch.collect())
}
Within .map() (where csv_ch is built), I must use file() instead of path() (error otherwise).
In the input: tuple of SPACECOUNT, I must use file() for file paths instead of… path() (no error, but the relative path does not resolve). I thought file() was DSL=1 lingo, but maybe not?
\ (•◡•) /