Week 1: Intro and HPC
2024-01-15
Sample 1
..ATGCTAGATCG…
…GGTACTACT…
Sample 2
..TGGTGTGCTG…
…GCTAGCTGCA…
| Gene | Sample 1 | Sample 2 | … |
|---|---|---|---|
| XYZ | 1234 | 4312 | … |
| … | 12 | 44 | … |
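
To make the idea of a count table concrete, here is a toy sketch that tallies reads per gene. The input is hypothetical (one line per read, naming the gene it was assigned to, with `ABC` as a made-up second gene). A real workflow aligns reads and counts them with dedicated tools, so treat this only as an illustration of the tallying step.

```shell
# HYPOTHETICAL input: one line per read, naming the gene it was assigned to.
assigned="XYZ
ABC
XYZ
XYZ
ABC"

# sort groups identical gene names together, uniq -c counts each group,
# and awk reorders the columns to "gene<TAB>count".
counts="$(printf '%s\n' "$assigned" | sort | uniq -c | awk '{print $2 "\t" $1}')"

# Print a small count table with a header row.
printf 'Gene\tCount\n%s\n' "$counts"
```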

samples.fq.gz files

| Step | Environment | Rationale |
|---|---|---|
| reads ➡️ count table | HPC | |
| count table ➡️ plots | local | |
To turn reads into a count table, we need to use an HPC at a remote location.
We connect to graham from our computer using ssh, through Terminal/Command Prompt.
Warning
Replace skim823 with your username!
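
A minimal sketch of what the login command could look like. The hostname `graham.computecanada.ca` is an assumption (it is not given in these notes), so confirm it against your account documentation. `skim823` is the example username from the warning.

```shell
# Example username from the warning; replace it with your own.
user="skim823"
# ASSUMPTION: the hostname is not stated in the notes; confirm it before use.
host="graham.computecanada.ca"
login="${user}@${host}"

# Print the command rather than running it, so this is safe to try anywhere.
echo "ssh ${login}"
# To actually connect, run:  ssh "${login}"
```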
yes
Basic commands
- pwd
- ls
- cd
- cp
- mv
- mkdir
- touch
- rm
- cat
- head
- tail
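
The commands listed can be tried together in a throwaway directory. This is only a sketch: the file and directory names (`practice`, `notes.txt`, and so on) are made up for the exercise.

```shell
# Work inside a temporary directory so nothing important is touched.
workdir="$(mktemp -d)"
cd "$workdir"
pwd                          # print the current (working) directory

mkdir practice               # make a new directory
cd practice                  # move into it
touch notes.txt              # create an empty file
printf 'line1\nline2\nline3\n' > notes.txt   # put three lines in it

cp notes.txt copy.txt        # copy a file
mv copy.txt renamed.txt      # rename (move) the copy
ls                           # list what is here

cat notes.txt                # print the whole file
head -n 1 notes.txt          # print only the first line
tail -n 1 notes.txt          # print only the last line

rm renamed.txt               # delete the renamed copy
```

Note that `rm` does not ask for confirmation and there is no recycle bin, so it pays to check with `ls` first.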
Tips
- tab to “auto” complete
- ctrl/cmd + c to abort
- ctrl/cmd + a to move the cursor to the beginning
- ctrl/cmd + e to move the cursor to the end
- ctrl/cmd + l to clear the screen
- genome.fa file
- genes.gtf/.gff file
- reads.fq.gz files

- reads.fq.gz comes from your sequencing core
- genome is the reference build of your organism
- genes contains coordinates (among other things) of all genes of that genome
- Make sure genome and genes match

First, let’s create a new directory somewhere to save our files
Question: what is your current directory?
wget is a command we can use to download files over the internet
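
A small sketch of how wget names what it saves, plus one way to check that a downloaded `.gz` file really is gzip data. The URL is hypothetical and nothing is downloaded here. `is_gzip` is a helper introduced for this example, not part of wget.

```shell
# HYPOTHETICAL URL, used only to show how wget names the saved file.
url="https://example.org/data/reads.fq.gz"
# By default wget saves under the last path component of the URL.
outfile="$(basename "$url")"
echo "wget $url   # would save to ./$outfile"

# Helper (introduced here): succeeds only if the file is valid gzip data.
is_gzip() {
  gzip -t "$1" 2>/dev/null
}

# Offline demonstration with two small local files.
demo="$(mktemp -d)"
printf 'ACGT\n' | gzip > "$demo/good.gz"
printf '<html></html>\n' > "$demo/bad.gz"
is_gzip "$demo/good.gz" && echo "good.gz: valid gzip"
is_gzip "$demo/bad.gz"  || echo "bad.gz: not gzip (perhaps a web page was saved instead)"
```

After a real download, running the same check on the saved file is a quick sanity test. A failure often means the server returned an HTML page rather than the file itself.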
These are our dummy reads (*.fq.gz):
- DKO for double KO cells
- D1 for DMSO treated replicate 1
- L00? for sequencing lanes
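
The naming convention can be taken apart in the shell. This sketch splits one file name on underscores and then uses `?` as a wildcard for a single character, which is what `L00?` stands for. The variable names are illustrative, and the files are empty placeholders created in a temporary directory.

```shell
# One of the file names used in this section; fields are separated by "_".
name="DKO_D1_L001"

# Split on "_" into three variables (names chosen for this example).
IFS=_ read -r genotype treatment lane <<EOF
$name
EOF
echo "genotype=$genotype treatment_replicate=$treatment lane=$lane"

# "?" matches exactly one character, so L00? covers L001 to L004.
demo="$(mktemp -d)"
for n in 1 2 3 4; do
  touch "$demo/DKO_D1_L00${n}.gz"   # empty placeholder files
done
ls "$demo"/DKO_D1_L00?.gz
```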
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L001/DKO_D1_L001.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L002/DKO_D1_L002.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L003/DKO_D1_L003.gz
wget https://github.com/kimsjune/ccir-bioinformatics2024/tree/main/week1/DKO_D1_L004/DKO_D1_L004.gz
reads.fq.gz, genome.fa, and genes.gtf
STAR index
| Folder | Default quota | Backed up? | Purged? | Use case |
|---|---|---|---|---|
| /home/$USER | 50GB, 500k files | Y | N | setup files |
| /scratch/$USER | 20TB, 1M files | N | Y, after 60d | day-to-day (temp) |
| /project/SLURM_GROUP/$USER | 1TB, 500k files | Y | N | static data |
| /home/$USER/nearline | 2TB, 5k files per group | Y | N | long term storage |
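
Because the quotas limit both size and number of files, it can help to check each. The sketch below uses standard tools on a temporary stand-in directory. On the cluster you would point `dir` at one of the folders in the table. It also lists files older than 60 days, assuming the purge goes by modification time (an assumption, since the notes do not say how file age is measured).

```shell
# Stand-in directory so the commands can be tried anywhere;
# on the cluster this could be /scratch/$USER instead.
dir="$(mktemp -d)"
touch "$dir/new.txt"
touch -t 202001010000 "$dir/old.txt"   # back-date this file to January 2020

# Total size of the directory.
du -sh "$dir"

# Number of files (quotas count files as well as bytes).
n_files="$(find "$dir" -type f | wc -l | tr -d ' ')"
echo "files: $n_files"

# Files not modified for more than 60 days.
# ASSUMPTION: the purge is based on modification time.
old_files="$(find "$dir" -type f -mtime +60)"
echo "older than 60 days: $old_files"
```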
command-line stuff (bash)
- basic commands
- basic commands (network stuff not relevant)

RNA-seq
- Griffith lab
- Data carpentry
- Harvard Core