Example Workflow | COMO Documentation


Overview

This example identifies drug targets for Rheumatoid Arthritis using a genome-scale metabolic network (GSMN) of naïve CD4+ T-cell subtypes.

Follow the steps found at Starting the Container to start the container.

Step 1

The outputs of the STAR aligner, run with the --quantMode GeneCounts option, can be passed directly to COMO. Under the data/STAR_output/ folder, we provide a template structure with naïve B control cells from bulk RNA-sequencing experiments found in NCBI’s Gene Expression Omnibus. Running src/rnaseq_preprocess.py with the create_count_matrix argument set to True merges the counts from each replicate in each study into a single matrix.
This follows Step 1 in the COMO container.

Alternatively, you can provide a count matrix directly and run with create_count_matrix set to False. In that case, the script only fetches the information required for normalization.
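To make the merge step more concrete, here is a minimal sketch of combining per-replicate STAR gene counts into one matrix. It is not COMO's implementation. It relies only on the ReadsPerGene.out.tab layout that STAR writes (gene ID, unstranded counts, and two stranded count columns, preceded by four N_ summary rows). The function names, the sample names (S1R1, S1R2), the gene IDs, and the choice to fill missing genes with 0 are assumptions made for illustration.

```python
import csv
import io

# STAR's ReadsPerGene.out.tab has four tab-separated columns:
# gene ID, unstranded counts, counts for strand 1, counts for strand 2.
# The first four rows (N_unmapped, N_multimapping, N_noFeature, N_ambiguous)
# are summary statistics rather than genes.
SUMMARY_PREFIX = "N_"


def read_star_counts(handle, column=1):
    """Return {gene_id: count} from one ReadsPerGene.out.tab file handle.

    column=1 selects the unstranded counts; 2 or 3 select a stranded column.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith(SUMMARY_PREFIX):
            continue  # skip blank lines and the summary rows
        counts[row[0]] = int(row[column])
    return counts


def merge_counts(samples):
    """Merge {sample_name: {gene_id: count}} into (header, rows).

    Assumption for this sketch: genes missing from a sample get a count of 0.
    """
    names = sorted(samples)
    genes = sorted({gene for counts in samples.values() for gene in counts})
    header = ["genes"] + names
    rows = [[gene] + [samples[name].get(gene, 0) for name in names]
            for gene in genes]
    return header, rows


# Tiny in-memory stand-ins for two replicate files (made-up values).
rep1 = io.StringIO(
    "N_unmapped\t10\t10\t10\nN_multimapping\t5\t5\t5\n"
    "N_noFeature\t2\t20\t3\nN_ambiguous\t1\t0\t1\n"
    "ENSG0001\t100\t2\t98\nENSG0002\t7\t0\t7\n"
)
rep2 = io.StringIO(
    "N_unmapped\t8\t8\t8\nN_multimapping\t4\t4\t4\n"
    "N_noFeature\t1\t15\t2\nN_ambiguous\t0\t0\t0\n"
    "ENSG0001\t120\t3\t117\nENSG0003\t4\t1\t3\n"
)

samples = {"S1R1": read_star_counts(rep1), "S1R2": read_star_counts(rep2)}
header, rows = merge_counts(samples)
```

For real data, open each replicate's ReadsPerGene.out.tab file instead of the in-memory strings, and pick the count column that matches the library's strandedness.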

Step 2

In the `` folder, we include the following files:

microarray_data_inputs.xlsx contains GEO accession numbers of microarray data
proteomics_data_inputs.xlsx contains sample names of proteomics data

Protein abundance is located under /ProteomicsDataMatrix_Naive.csv. Sample names for bulk RNA-sequencing data are given in /bulk_data_inputs.xlsx. Running src/rnaseq_preprocess.py will create data/results/Gene_Info_naiveB.csv and, if create_count_matrix is set to True, data/data_matrices/BulkRNAseqMatrix_naiveB.csv.

/microarray_data_inputs.xlsx includes microarray samples of naive CD4+ T cells from GSE22886, GSE43005, GSE22045, and GSE24634. The file data/config_sheets/proteomics_data_inputs.xlsx contains sample names of naive CD4+ T cells; the corresponding protein abundances are in data/data_matrices/naiveB/protein_abundance_naiveB.csv.

Using src/merge_xomics.py, you can specify any number of available data sources (microarray, bulk RNA-seq, and proteomics) as inputs. You can also set the expression_requirement parameter, which defines the minimum number of data sources in which a gene's expression must exceed the threshold for that gene to be considered active.

Note: If a gene is not supported by a data source or platform, the expression_requirement value will decrease by one for each input data source that does not support the gene.
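The rule in the note above can be sketched as follows. This is an illustration of the described logic, not the code in src/merge_xomics.py. The function name, the use of None to mark a source that does not support a gene, and the floor of 1 on the adjusted requirement are all assumptions of this sketch.

```python
def is_active(expressed_by_source, expression_requirement):
    """Decide whether a gene is active across several data sources.

    expressed_by_source maps a source name to:
      True  - the gene is expressed above threshold in that source
      False - the gene is measured but below threshold
      None  - the source does not support (cannot measure) the gene
    """
    supported = {source: flag for source, flag in expressed_by_source.items()
                 if flag is not None}
    unsupported = len(expressed_by_source) - len(supported)

    # Lower the requirement by one for each source that lacks the gene.
    # Assumption of this sketch: never go below 1, so a gene with no
    # supporting evidence at all is not called active.
    adjusted = max(1, expression_requirement - unsupported)

    # True counts as 1 and False as 0 when summed.
    return sum(supported.values()) >= adjusted


# Requirement of 3, but proteomics does not cover the gene: 2 sources suffice.
gene_a = is_active(
    {"microarray": True, "rnaseq": True, "proteomics": None}, 3)

# Same setup, but only one of the two supporting sources shows expression.
gene_b = is_active(
    {"microarray": True, "rnaseq": False, "proteomics": None}, 3)
```

Here gene_a is considered active and gene_b is not, because the adjusted requirement for both is two supporting sources.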

Running Step 1 in the COMO container will generate “gene activity” files based on transcriptomics and proteomics data, as described by Puniya et al., 2020.

This step saves the final output in GeneExpression_Naive_merged.csv and records that file's path in step1_results_files.json.

Step 3

Our pipeline includes a modified version of the Recon3D model to use as a reference for model contextualization. The modified version of Recon3D is available at data/GeneralModelUpdatedV2.mat.

Step 4 in COMO will use GeneExpression_Naive_merged.csv (from Step 1, above) in combination with data/GeneralModelUpdatedV2.mat to construct a cell-type specific model of Naive CD4+ cells.
We can use this model in the next steps. However, we advise users to properly investigate, manually curate, and reupload the refined version in data/results to use in Step 4.

Step 4

We used a dataset (GSE56649) of Rheumatoid Arthritis to identify differentially expressed genes (disease genes). We defined accession IDs of this dataset in the inputs found under /naiveB/disease/gene_counts_matrix_arthritis_naiveB.csv.

This step will generate the files data/results/naiveB/Disease_UP_GSE56649.txt and data/results/naiveB/Disease_DOWN_GSE56649.txt, and record their paths in step2_results_files.json. Finally, this step will create a disease_files variable that holds the paths of the files for up- and down-regulated genes.
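As a rough picture of how a differential expression result is divided into the two gene lists, consider the sketch below. It is not COMO's differential expression code. The table layout (gene, log fold-change, adjusted p-value), the 0.05 cutoff, and the gene identifiers are hypothetical and chosen only for illustration.

```python
def split_disease_genes(de_table, max_padj=0.05):
    """Split differential expression rows into up- and down-regulated genes.

    de_table is an iterable of (gene_id, log_fold_change, adjusted_p_value).
    Genes above the p-value cutoff, or with a fold change of exactly zero,
    are left out of both lists.
    """
    up, down = [], []
    for gene_id, log_fc, padj in de_table:
        if padj > max_padj:
            continue  # not significant under the assumed cutoff
        if log_fc > 0:
            up.append(gene_id)
        elif log_fc < 0:
            down.append(gene_id)
    return up, down


# Made-up rows: (gene ID, log fold-change disease vs. control, adjusted p).
de_table = [
    ("1001", 2.3, 0.001),
    ("1002", -1.7, 0.010),
    ("1003", 0.4, 0.600),   # not significant, dropped
    ("1004", -0.9, 0.049),
]

up_genes, down_genes = split_disease_genes(de_table)
```

Writing up_genes and down_genes one identifier per line would give files analogous to Disease_UP_GSE56649.txt and Disease_DOWN_GSE56649.txt, although the exact file format COMO uses is not specified here.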

Step 5

This step will use the model (either the one constructed in Step 2 or an uploaded curated version) and perform knock-out simulations of genes that overlap with the drug-target data file obtained from the ConnectivityMap database. We refined the drug-target data file and included it at data/RepurposingHub.txt.

This step will use the following files:

Naive_SpecificModel.json (or a pre-curated version uploaded as NaiveModel.mat)
Disease_UP_GSE56649.txt
Disease_DOWN_GSE56649.txt
RepurposingHub.txt
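The overlap between model genes and drug targets, which determines which genes are knocked out, can be sketched as below. This is an illustration only. The column names (drug, target), the pipe-separated target list, and the assumption that targets and model genes share one identifier type are hypothetical, and the actual layout of data/RepurposingHub.txt may differ.

```python
import csv
import io


def targets_in_model(drug_handle, model_genes):
    """Map each drug target that is also a model gene to its drugs.

    drug_handle: tab-separated text with (hypothetical) columns
                 "drug" and "target"; targets are pipe-separated.
    model_genes: set of gene identifiers present in the metabolic model.
    """
    hits = {}
    for row in csv.DictReader(drug_handle, delimiter="\t"):
        # filter(None, ...) drops empty strings from drugs with no target.
        for target in filter(None, row["target"].split("|")):
            if target in model_genes:
                hits.setdefault(target, []).append(row["drug"])
    return hits


# Made-up drug table and model gene set.
drug_table = io.StringIO(
    "drug\ttarget\n"
    "drugA\tG1|G2\n"
    "drugB\tG2\n"
    "drugC\t\n"
)
model_genes = {"G2", "G3"}

knockout_candidates = targets_in_model(drug_table, model_genes)
```

Only genes present in both the model and the drug table become knock-out candidates, and each candidate keeps the list of drugs that target it so that ranked genes can later be mapped back to drugs.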

The final output files will include drug targets ranked by Perturbation Effect Score (PES), as described by Puniya et al., 2020.

The output file data/output/d_score.csv will contain the Entrez IDs of ranked genes and their corresponding PES. The file drug_score.csv will contain PES-ranked drug targets (Entrez IDs and gene symbols) with their mapped repurposed drugs.