Running FastqToGeneCounts | FastqToGeneCounts Documentation

This is an overview of how to run the pipeline

Overview

This section explains how to execute the workflow.
The following topics are covered:

(Optional) Setting up Screen
Using Snakemake's dry run
Executing the workflow

(Optional) Using Screen

Snakemake does not offer a way to close the terminal while keeping its jobs running, because the main snakemake --profile cluster command is tied directly to the terminal process that started it. To work around this, we start a Screen session. Screen keeps the shell running Snakemake alive on the remote host, so we can close the terminal window (or lose the SSH connection) without stopping the workflow.

Read more about Screen here

Alternatively, you can run snakemake in a bash script submitted to SLURM, as explained in this Google Doc
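As a rough illustration of the SLURM alternative, a batch script could look like the sketch below. It is not the script from the linked Google Doc: the file name run_snakemake.sh and every #SBATCH value (job name, time limit, memory, log file name) are placeholder assumptions to adapt to your cluster, and it assumes that module load mamba and mamba activate behave in a batch job as they do in an interactive shell.

```shell
# Write a hypothetical batch script; every #SBATCH value is a placeholder to adapt.
cat > run_snakemake.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=snakemake
#SBATCH --time=24:00:00
#SBATCH --mem=4G
#SBATCH --output=snakemake_%j.log

# Same environment setup as an interactive run
module load mamba
mamba activate snakemake

# Path taken from this guide; replace it with your own checkout
cd /work/helikarlab/joshl/FastqToGeneCounts || exit 1

# Start the workflow
snakemake --profile cluster
EOF

# Submit it from a login node (requires SLURM):
# sbatch run_snakemake.sh
```

The submitting job only coordinates the workflow, so it usually needs few resources but a time limit long enough to cover the whole run.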

First, set a large scrollback for Screen, so we can view more lines after we have detached from the terminal. Execute the following:

echo "defscrollback 10000" >> ~/.screenrc
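Running the echo command above more than once appends a duplicate line each time. A minimal sketch of a guard against that is shown here; append_once is a hypothetical helper name, not part of Screen or this project.

```shell
# Hypothetical helper: append a line to a file only if that exact line is absent.
append_once() {
  line=$1
  file=$2
  # Create the file if it does not exist yet
  touch "$file"
  # -x matches whole lines, -F treats the text literally, -q suppresses output
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage (uncomment to apply to your own Screen config):
# append_once "defscrollback 10000" "$HOME/.screenrc"
```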

Once this is done, we can start a screen session with the following command:

screen -S snakemake

To leave the screen session while keeping it running, do the following:

Press and hold the control key
Press a while continuing to hold control
Press d
The session will detach. Verify the session is alive but detached by executing screen -ls; it should say (Detached) next to the session name

To re-enter a screen session, execute the following:

# View all screen sessions
screen -ls

# This will show the following output (if a screen session is running)
> There is a screen on:
>	184700.snakemake	(Detached)
> 1 Socket in /run/screen/S-joshl.

# Pick the session you would like to enter (we are going to re-enter the `snakemake` session)
screen -r snakemake
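The check for (Detached) can also be scripted. The sketch below assumes screen -ls prints lines in the format shown above; screen_is_detached is a hypothetical helper name, and the session name is used inside a regular expression, so names containing special characters would need escaping.

```shell
# Hypothetical helper: reads `screen -ls` output on stdin and succeeds
# when the named session is listed as detached.
screen_is_detached() {
  # Matches lines such as "184700.snakemake   (Detached)"
  grep -Eq "[0-9]+\.$1[[:space:]]+.*\(Detached\)"
}

# Usage (requires screen):
# screen -ls | screen_is_detached snakemake && echo "snakemake session is detached"
```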

Snakemake Dry Run

It is highly recommended to run the workflow in dry run mode first to ensure that the workflow will run as expected. A dry run does several things:

Checks the syntax of the Snakefile
Shows which steps of the workflow will be executed
Ensures the preliminary configuration is set up properly

A dry run does not execute any component of the pipeline, so no results are generated.

Execute the following to perform a dry run:

# Activate our conda environment
module load mamba
mamba activate snakemake

# Change to the FastqToGeneCounts directory
cd /work/helikarlab/joshl/FastqToGeneCounts

# Perform a dry run
snakemake --profile cluster --dry-run

Note: If you renamed the cluster directory, replace cluster in --profile cluster with the name of your directory

Note: If you receive an error when running snakemake --profile cluster --dry-run, replace cluster with ./cluster

After several seconds, many lines should scroll through the terminal.
The output should end with: This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

If this is not the case, an error has occurred and needs to be investigated before continuing. If you are having trouble, please open an issue
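When the output scrolls past too quickly to read, one option is to save it to a file and search for the closing message. This is a minimal sketch: dry_run_ok and dry_run.log are hypothetical names, and checking the exit status of snakemake itself is an equally valid test.

```shell
# Hypothetical helper: succeeds if the saved output contains the
# closing dry-run message quoted in this guide.
dry_run_ok() {
  grep -qF "This was a dry-run (flag -n)" "$1"
}

# Usage (requires snakemake; 2>&1 captures both output streams):
# snakemake --profile cluster --dry-run > dry_run.log 2>&1
# dry_run_ok dry_run.log && echo "Dry run completed" || echo "Check dry_run.log for errors"
```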

Execution

Once you have confirmed that a dry-run will execute successfully, it is time to start a real run of the workflow.

Note: If you have started a screen session, now is the time to re-enter the session

To see what sessions are available:

screen -ls

To re-enter a session:

screen -r SESSION_NAME

The following steps will start the workflow:

# Activate the snakemake environment
module load mamba
mamba activate snakemake

# Make sure you are in the FastqToGeneCounts directory!
cd /work/helikarlab/joshl/FastqToGeneCounts

# Start the workflow
snakemake --profile cluster

Note: If you started a session with screen, detach from it with CTRL+a, d. The workflow keeps running inside the detached session

Log files are written to the logs directory inside the project directory.
Each rule has its own output folder, and its output files are named after the data they were produced from (tissue name, run number, etc.)
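To narrow down which log files to read after a failed run, a simple text search can help. The sketch below assumes that failing steps write the word "error" somewhere in their log, which is a heuristic and not a guarantee of this pipeline; logs_with_errors is a hypothetical helper name.

```shell
# Hypothetical helper: list files under a log directory that mention
# "error" (case-insensitive). Exits non-zero when no file matches.
logs_with_errors() {
  grep -rli -- "error" "$1"
}

# Usage, from the project directory:
# logs_with_errors logs
```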