Week 1: Intro and HPC
2024-01-15
Sample 1
…ATGCTAGATCG…
…GGTACTACT…
Sample 2
…TGGTGTGCTG…
…GCTAGCTGCA…
Gene | Sample 1 | Sample 2 | … |
---|---|---|---|
XYZ | 1234 | 4312 | … |
… | 12 | 44 | … |
`samples.fq.gz` files

Step | Environment | Rationale |
---|---|---|
reads ➡️ count table | HPC | |
count table ➡️ plots | local | |
To turn `reads` into a `count table`, we need to use an HPC at a remote location.
We connect to `graham` from our computer using `ssh`, through Terminal/Command Prompt.

Warning
Change `skim823` to your own username!
Type `yes` if asked whether to continue connecting.
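The login step might look like the sketch below. The hostname is an assumption (the text only names the cluster `graham`), so check it against your cluster's documentation. `skim823` is the example username from the warning. The sketch only assembles and prints the command, so it is safe to run anywhere.

```shell
# Example username from the slide; replace with your own
user="skim823"
# ASSUMPTION: this hostname is not given in the slides
host="graham.alliancecan.ca"

# Build the login command and print it instead of connecting
login_cmd="ssh ${user}@${host}"
echo "${login_cmd}"
```

To actually connect, run the printed command in Terminal/Command Prompt and enter your password when prompted.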
Basic commands
- pwd
- ls
- cd
- cp
- mv
- mkdir
- touch
- rm
- cat
- head
- tail
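A minimal sketch that tries each of the commands above once. The file and directory names (`demo`, `notes.txt`, `copy.txt`, `renamed.txt`) are made up for illustration, and everything happens in a throwaway temporary directory.

```shell
# Work in a throwaway directory so nothing important is touched
cd "$(mktemp -d)"
pwd                              # print the current (working) directory

mkdir demo                       # make a new directory
cd demo                          # change into it
touch notes.txt                  # create an empty file
printf 'line %s\n' 1 2 3 4 5 > notes.txt   # fill it with five short lines

ls                               # list the directory contents
cat notes.txt                    # print the whole file
head -n 2 notes.txt              # print the first two lines
tail -n 2 notes.txt              # print the last two lines

cp notes.txt copy.txt            # copy a file
mv copy.txt renamed.txt          # move (here: rename) a file
rm notes.txt                     # remove a file; there is no recycle bin
ls                               # only renamed.txt is left
```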
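Tips
- `tab` to auto-complete
- `ctrl + c` to abort
- `ctrl + a` to move the cursor to the beginning of the line
- `ctrl + e` to move the cursor to the end of the line
- `ctrl + l` to clear the screen

These shortcuts use ctrl on macOS as well (cmd + c copies text instead of aborting).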
- `genome.fa` file
- `genes.gtf`/`.gff` file
- `reads.fq.gz` files

- `reads.fq.gz` files come from your sequencing core
- `genome` is the reference build of your organism
- `genes` contains the coordinates (among other information) of all genes in that genome
- `genome` and `genes` must match

First, let’s create a new directory somewhere to save our files.
Question: what is your current directory?
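One way to carry out this step is sketched below. The directory name `week1` is a hypothetical choice, and a temporary directory stands in for wherever you keep your files on the cluster (for example your scratch folder). The final `pwd` prints the current directory, which answers the question.

```shell
# ASSUMPTION: a temporary directory stands in for your real base folder
base="$(mktemp -d)"

# Create the new directory; -p avoids an error if it already exists
mkdir -p "${base}/week1"

# Move into it and confirm where we are
cd "${base}/week1"
pwd
```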
`wget` is a command we can use to download files over the internet.
These are our dummy reads (`*.fq.gz`):
- `DKO` for double KO cells
- `D1` for DMSO-treated replicate 1
- `L00?` for sequencing lanes
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L001/DKO_D1_L001.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L002/DKO_D1_L002.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L003/DKO_D1_L003.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L004/DKO_D1_L004.gz
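The `L00?` in the naming scheme also works as a shell pattern, where `?` matches exactly one character. The sketch below uses empty placeholder files rather than real downloads. The `.fq.gz` file names are assumed from the `*.fq.gz` pattern, and `DKO_D1_L010.fq.gz` is a hypothetical extra file added only to show what does not match.

```shell
cd "$(mktemp -d)"

# Empty placeholder files standing in for the four lanes
touch DKO_D1_L001.fq.gz DKO_D1_L002.fq.gz DKO_D1_L003.fq.gz DKO_D1_L004.fq.gz
touch DKO_D1_L010.fq.gz   # hypothetical extra file that should NOT match

# "?" matches exactly one character, so L00? covers L001 to L004
ls DKO_D1_L00?.fq.gz

# Loop over the lanes and pull the lane label out of each file name
for f in DKO_D1_L00?.fq.gz; do
  lane="${f#DKO_D1_}"     # strip the sample prefix
  lane="${lane%.fq.gz}"   # strip the extension
  echo "${f} is lane ${lane}"
done

# Count how many files the pattern matched
n_lanes="$(ls DKO_D1_L00?.fq.gz | wc -l | tr -d ' ')"
echo "matched ${n_lanes} files"
```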
`reads.fq.gz`, `genome.fa`, and `genes.gtf`
STAR index
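Because `genome.fa` and `genes.gtf` need to agree with each other, a quick sanity check before indexing is to compare the sequence names used in the two files. This is a rough sketch on tiny made-up files (the sequences, gene IDs, and the mismatching `chr3` are invented for illustration), not a full validation.

```shell
cd "$(mktemp -d)"

# Tiny made-up stand-ins for genome.fa and genes.gtf
printf '>chr1\nATGCTAGATCG\n>chr2\nGGTACTACT\n' > genome.fa
printf 'chr1\tdemo\tgene\t1\t9\t.\t+\t.\tgene_id "XYZ";\n' > genes.gtf
printf 'chr3\tdemo\tgene\t1\t9\t.\t+\t.\tgene_id "ABC";\n' >> genes.gtf

# FASTA sequence names: header lines start with ">"; keep the first word
grep '^>' genome.fa | sed 's/^>//' | cut -d ' ' -f 1 | sort -u > genome_names.txt

# GTF sequence names: first tab-separated column, skipping "#" comment lines
grep -v '^#' genes.gtf | cut -f 1 | sort -u > gtf_names.txt

# Names used in the annotation but absent from the genome (ideally none)
missing="$(comm -13 genome_names.txt gtf_names.txt)"
echo "In genes.gtf but not in genome.fa: ${missing}"
```

Here the check reports `chr3`, the deliberately mismatched name. An empty result would suggest the two files use the same naming.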
Folder | Default quota | Backed up? | Purged? | Use case |
---|---|---|---|---|
/home/$USER | 50 GB, 500k files | Y | N | setup files |
/scratch/$USER | 20 TB, 1M files | N | Y, after 60 days | day-to-day (temp) |
/project/SLURM_GROUP/$USER | 1 TB, 500k files | Y | N | static data |
/home/$USER/nearline | 2 TB, 5k files per group | Y | N | long-term storage |
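Since the quotas limit both total size and number of files, it can help to check both for a folder. The sketch below uses only generic tools (`du`, `find`, `wc`) on a small temporary directory with made-up files. On the cluster you would point `target` at one of the folders in the table instead.

```shell
# ASSUMPTION: a temporary directory with three dummy files stands in for a real folder
target="$(mktemp -d)"
touch "${target}/a.txt" "${target}/b.txt"
mkdir "${target}/sub"
touch "${target}/sub/c.txt"

# Total size of the folder, in human-readable units
du -sh "${target}"

# Number of regular files, counted recursively
n_files="$(find "${target}" -type f | wc -l | tr -d ' ')"
echo "files: ${n_files}"
```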
command-line stuff (`bash`)
- basic commands
- basic commands (network stuff not relevant)

RNA-seq
- Griffith lab
- Data Carpentry
- Harvard Core