Open-source software pipeline for cancer classification from high-throughput data using machine learning.
CancerDiscover is an open-source command-line pipeline tool (released under the GNU General Public License v3) that allows users to efficiently and automatically process large high-throughput datasets by converting raw data (for example, CEL files), normalizing it, and selecting the best-performing features from multiple feature selection algorithms. The pipeline lets users apply different feature thresholds and various learning algorithms to generate multiple prediction models that distinguish different types and subtypes of cancer.
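The flow described above (normalize, rank features, apply several feature thresholds, train one model per threshold) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not CancerDiscover's own code: every function name here is hypothetical, the mean-difference ranking and nearest-centroid classifier are simple stand-ins for the pipeline's feature selection and learning algorithms, and the toy data is made up for the example.

```python
# Illustrative sketch only; NOT CancerDiscover's implementation.
# All names below are hypothetical and use only the Python standard library.
from statistics import mean, pstdev


def zscore_columns(rows):
    """Normalize each feature (column) to zero mean and unit variance."""
    cols = list(zip(*rows))
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stats = [(mean(c), pstdev(c) or 1.0) for c in cols]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def rank_features(rows, labels):
    """Rank features by the absolute difference of the two class means."""
    a, b = sorted(set(labels))  # assumes exactly two classes
    scores = []
    for j in range(len(rows[0])):
        mean_a = mean(r[j] for r, y in zip(rows, labels) if y == a)
        mean_b = mean(r[j] for r, y in zip(rows, labels) if y == b)
        scores.append((abs(mean_a - mean_b), j))
    return [j for _, j in sorted(scores, reverse=True)]


def train_centroids(rows, labels, features):
    """Stand-in learner: store the per-class mean of the selected features."""
    centroids = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        centroids[c] = [mean(r[j] for r in members) for j in features]
    return centroids


def predict(centroids, features, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c]))
    return min(centroids, key=dist)


def build_models(rows, labels, thresholds):
    """Train one model per feature threshold (top-k ranked features)."""
    normalized = zscore_columns(rows)
    ranking = rank_features(normalized, labels)
    models = {}
    for k in thresholds:
        feats = ranking[:k]
        models[k] = (feats, train_centroids(normalized, labels, feats))
    return normalized, models


# Toy data (made up): feature 0 separates the classes, features 1 and 2 do not.
rows = [[1.0, 5.0, 2.0], [1.2, 4.0, 2.0], [3.0, 5.0, 2.0], [3.2, 4.0, 2.0]]
labels = ["normal", "normal", "tumor", "tumor"]

normalized, models = build_models(rows, labels, thresholds=[1, 2])
for k, (feats, centroids) in models.items():
    predictions = [predict(centroids, feats, r) for r in normalized]
    print(k, feats, predictions)
```

Training one model per threshold makes it easy to compare how many features are actually needed before accuracy stops improving, which is the motivation for exposing thresholds as a user option.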
Cite: If you use our tool, please cite Mohammed, A., Biegert, G., Adamec, J., & Helikar, T. (2018). CancerDiscover: an integrative pipeline for cancer biomarker and cancer class prediction from high-throughput sequencing data. Oncotarget, 9(2), 2565–2573. (https://doi.org/10.18632/oncotarget.23511)
Note: CancerDiscover is open-source software. If you run across bugs or errors, please raise an issue here.
This README serves as a guide to using the software. We suggest reading through it to get an idea of the available options and how to customize the pipeline to fit your needs.
You will need a current or very recent version of your operating system: Linux or Mac OS X.
curl -sL bit.do/installation_linux | sh
curl -sL bit.do/installation_mac | sh
To install the CancerDiscover dependencies from scratch, check out our exhaustive guides:
Dr. Akram Mohammed akrammohd@gmail.com
Dr. Tomas Helikar (PI) thelikar2@unl.edu
Dr. Jiri Adamec jadamec2@unl.edu
Greyson Biegert greyson@huskers.unl.edu
This software has been released under the GNU General Public License v3.