diff --git a/CHANGELOG.md b/CHANGELOG.md
index cad1c567..a87983e0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -15,6 +15,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#507](https://github.com/nf-core/funcscan/pull/507) Updated to nf-core template v3.5.1 (by @jfy133)
 - [#510](https://github.com/nf-core funcscan/pull/510) Fixed code to make Nextflow strict-syntax compliant (by @jfy133)
 - [#521](https://github.com/nf-core funcscan/pull/521) Added option to turn on RGI's own cleanup of intermediate files (❤️ to @SamD28 for requesting, added by @jfy133)
+- [#519](https://github.com/nf-core/funcscan/pull/519) Added BiG-SLiCE (`bigslice`) as a new BGC clustering tool in the BGC subworkflow. Activated with `--bgc_run_bigslice` and requires `--bgc_bigslice_db` (by @SkyLexS)
+- [#528](https://github.com/nf-core/funcscan/pull/528) Updated pipeline template to nf-core/tools version 4.0.2 (by @jfy133)
 
 ### `Fixed`
 
@@ -22,11 +24,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### `Dependencies`
 
-| Tool    | Previous Version | New Version |
-| ------- | ---------------- | ----------- |
-| dbCAN   |                  | 5.2.9       |
-| MultiQC | 1.27             | 1.34        |
-| Bakta   | 1.10.4           | 1.11.4      |
+| Tool      | Previous Version | New Version |
+| --------- | ---------------- | ----------- |
+| dbCAN     |                  | 5.2.9       |
+| MultiQC   | 1.27             | 1.34        |
+| Bakta     | 1.10.4           | 1.11.4      |
+| BiG-SLiCE |                  | 2.0.2       |
+| nf-core   | 3.3.2            | 4.0.2       |
 
 ### `Deprecated`
 
diff --git a/CITATIONS.md b/CITATIONS.md
index f6c0630f..6705d544 100644
--- a/CITATIONS.md
+++ b/CITATIONS.md
@@ -38,6 +38,12 @@
 
   > Schwengers, O., Jelonek, L., Dieckmann, M. A., Beyvers, S., Blom, J., & Goesmann, A. (2021). Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microbial Genomics, 7(11). [DOI: 10.1099/mgen.0.000685](https://doi.org/10.1099/mgen.0.000685)
 
+- [BiG-SLiCE](https://github.com/medema-group/bigslice)
+
+  > Kautsar, S. A., van der Hooft, J. J. J., de Ridder, D., & Medema, M. H. (2021). BiG-SLiCE: A highly scalable tool maps the diversity of 1.2 million biosynthetic gene clusters. GigaScience, 10(1), giaa154. [DOI: 10.1093/gigascience/giaa154](https://doi.org/10.1093/gigascience/giaa154)
+
+  > Kautsar, S. A., et al. (2026). BiG-SLiCE 2.0: improved gene cluster family diversity mapping. Nature Communications. [DOI: 10.1038/s41467-026-68733-5](https://doi.org/10.1038/s41467-026-68733-5)
+
 - [comBGC](https://github.com/nf-core/funcscan)
 
   > Frangenberg, J., Fellows Yates, J. A., Ibrahim, A., Perelo, L., & Beber, M. E. (2023). nf-core/funcscan: 1.0.0 - German Rollmops - 2023-02-15. [DOI: 10.5281/zenodo.7643100](https://doi.org/10.5281/zenodo.7643099)
diff --git a/README.md b/README.md
index 55ceaf73..b3b9e6cd 100644
--- a/README.md
+++ b/README.md
@@ -39,7 +39,7 @@ The nf-core/funcscan AWS full test dataset are contigs generated by the MGnify s
 4. Annotation of coding sequences from 3. to obtain general protein families and domains with [`InterProScan`](https://github.com/ebi-pf-team/interproscan)
 5. Screening contigs for antimicrobial peptide-like sequences with [`ampir`](https://cran.r-project.org/web/packages/ampir/index.html), [`Macrel`](https://github.com/BigDataBiology/macrel), [`HMMER`](http://hmmer.org/), [`AMPlify`](https://github.com/bcgsc/AMPlify)
 6. Screening contigs for antibiotic resistant gene-like sequences with [`ABRicate`](https://github.com/tseemann/abricate), [`AMRFinderPlus`](https://github.com/ncbi/amr), [`fARGene`](https://github.com/fannyhb/fargene), [`RGI`](https://card.mcmaster.ca/analyze/rgi), [`DeepARG`](https://bench.cs.vt.edu/deeparg). [`argNorm`](https://github.com/BigDataBiology/argNorm) is used to map the outputs of `DeepARG`, `AMRFinderPlus`, and `ABRicate` to the [`Antibiotic Resistance Ontology`](https://www.ebi.ac.uk/ols4/ontologies/aro) for consistent ARG classification terms.
-7. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
+7. Screening contigs for biosynthetic gene cluster-like sequences with [`antiSMASH`](https://antismash.secondarymetabolites.org), [`BiG-SLiCE`](https://github.com/medema-group/bigslice), [`DeepBGC`](https://github.com/Merck/deepbgc), [`GECCO`](https://gecco.embl.de/), [`HMMER`](http://hmmer.org/)
 8. Screening contigs for carbohydrate-active enzymes (CAZymes), CAZyme gene clusters and substrates with [run_dbcan](https://github.com/bcb-unl/run_dbcan).
 9. Creating aggregated reports for all samples across the workflows with [`AMPcombi`](https://github.com/paleobiotechnology/AMPcombi) for AMPs, [`hAMRonization`](https://github.com/pha4ge/hAMRonization) for ARGs, and [`comBGC`](https://raw.githubusercontent.com/nf-core/funcscan/master/bin/comBGC.py) for BGCs
 10. Software version and methods text reporting with [`MultiQC`](http://multiqc.info/)
diff --git a/conf/modules.config b/conf/modules.config
index c8a394a9..60103280 100644
--- a/conf/modules.config
+++ b/conf/modules.config
@@ -541,6 +541,29 @@ process {
         ]
     }
 
+    withName: BIGSLICE_BIGSLICE {
+        errorStrategy = 'ignore'
+        ext.args = [
+            params.bgc_bigslice_complete ? '--complete' : '',
+            "--threshold ${params.bgc_bigslice_threshold}",
+            "--threshold_pct ${params.bgc_bigslice_thresholdpct}",
+            "--n_ranks ${params.bgc_bigslice_nranks}",
+        ].join(' ').trim()
+        publishDir = [
+            path: { "${params.outdir}/bgc/" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
+    withName: BIGSLICE_DOWNLOADDB {
+        publishDir = [
+            path: { "${params.outdir}/bgc/bigslice_db" },
+            mode: params.publish_dir_mode,
+            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+        ]
+    }
+
     withName: HAMRONIZATION_ABRICATE {
         publishDir = [
             path: { "${params.outdir}/arg/hamronization/abricate" },
diff --git a/conf/test_bgc_bakta.config b/conf/test_bgc_bakta.config
index 27717806..79f475cc 100644
--- a/conf/test_bgc_bakta.config
+++ b/conf/test_bgc_bakta.config
@@ -37,6 +37,8 @@ params {
     bgc_gecco_convertmode = 'gbk'
     bgc_gecco_convertformat = 'bigslice'
 
+    bgc_run_bigslice = true
+
     bgc_run_hmmsearch = true
     bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
 }
diff --git a/conf/test_preannotated_bgc.config b/conf/test_preannotated_bgc.config
index 15ca6d71..36d45ce9 100644
--- a/conf/test_preannotated_bgc.config
+++ b/conf/test_preannotated_bgc.config
@@ -34,6 +34,8 @@ params {
 
     bgc_gecco_runconvert = true
 
+    bgc_run_bigslice = true
+
     bgc_run_hmmsearch = true
     bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
 }
diff --git a/docs/CONTRIBUTING.md b/docs/CONTRIBUTING.md
index 87525580..b361a44e 100644
--- a/docs/CONTRIBUTING.md
+++ b/docs/CONTRIBUTING.md
@@ -182,4 +182,95 @@ If you update images or graphics, follow the nf-core [style guidelines](https://
 
 ## Pipeline specific contribution guidelines
 
-
+### Pipeline specific conventions
+
+- [ ] Use `nextflow` autoformatter
+- [ ] Channel assignment with `=` NOT `.set`
+- [ ] Parameter names structure: `__`
+  - All components of structure should not have `_`, i.e., they should all be concatenated: `amp_hmmsearch_savealignments` not `amp_hmmsearch_save_alignments`
+  - Exception: subworkflow specific activation parameters must start with 'run': `run__screening`, e.g. `run_arg_screening`
+  - Exception: tool specific skipping parameters must use 'skip' in the second position: `_skip_`, e.g. `amp_skip_macrel`
+
+### Adding new tool workflow
+
+This checklist covers adding a specific tool to an _existing_ screening subworkflow.
+
+> [!NOTE]
+> Does not have to be in this precise order
+
+- [ ] Installed modules `nf-core modules install /`
+- [ ] Added tool(s) to relevant `subworkflows/local/.nf`
+  - [ ] Added relevant modules at the top using `include` statement
+  - [ ] Added the tool-specific if/else statement controlled with `params._skip_`
+  - [ ] (Old nf-core module template only) Version channels mixed for every newly added module
+  - [ ] (If applicable) Include auto-database downloading module, if tool needs it
+  - [ ] (If applicable) Within if/else condition, output channels mixed into the subworkflow's final aggregation/summary tool mix output file for aggregating tool (e.g. hAMRonization, AMPcombi etc.)
+  - [ ] (If applicable) MultiQC channels mixed
+- [ ] Updated `workflows/funcscan.nf`
+  - [ ] (If applicable) Added new input files via dedicated input channel
+  - [ ] (If applicable) Added tool specific input control conditions within the screening subworkflow's if/else statement
+- [ ] Added parameters and defaults to `nextflow.config`
+  - [ ] Added all new parameters following [pipeline-specific conventions](#pipeline-specific-conventions)
+  - [ ] (If applicable) Where possible, include parameter for supplying locally-downloaded database
+- [ ] Updated `modules.config`
+  - [ ] Added `withName:` block
+  - [ ] Added `ext.args = {}` entry including relevant new pipeline parameters
+  - [ ] Added `publishDir` placing relevant output files into the screening-subworkflow specific directory with `"${params.outdir}///`
+- [ ] If necessary, added any additional pre-execution parameter validation checks at the top of `subworkflows/local/utils_nfcore_funcscan_pipeline.nf` (e.g. for mutually exclusive parameters)
+- [ ] Updated Documentation
+  - [ ] `nf-core pipelines schema build` has been run and updated
+  - [ ] Checked all tool-specific pipeline parameters moved to the relevant screen type section of schema
+  - [ ] Checked all tool-specific pipeline parameters have short help text
+  - [ ] (If applicable) Checked all tool-specific pipeline parameters have validation checks added (e.g. number range, fixed list etc.)
+  - [ ] Checked all tool-specific pipeline parameters have long-description with more information, including pointing to original documentation of tool itself
+  - [ ] (If applicable) Checked all tool-specific pipeline parameters have the `Modifies tool parameter(s)` quote block
+  - [ ] Added citation to `CITATIONS.md` (citation style: APA 7th edition)
+  - [ ] Added citation to the toolCitation/BibliographyText functions in `subworkflows/local/utils_nfcore_funcscan_pipeline`
+    - [ ] Added in-text citation
+    - [ ] Added bibliography (citation style: APA 7th edition)
+  - [ ] Added relevant documentation to `usage.md`
+    - [ ] (If applicable) Added entry in 'Databases and reference files' on how to download databases manually
+    - [ ] (If applicable) Added entry in 'Notes on screening tools <...>' if specific guidance is needed for execution
+    - [ ] (If applicable) If new sample input files (e.g. annotation files) required, updated samplesheet description
+  - [ ] Described module output in `output.md`
+    - [ ] Added entry in relevant screening subworkflow section in introduction
+    - [ ] Added entry in introduction `tree` of whole output directory
+    - [ ] Added entry in 'Pipeline overview' table of contents
+    - [ ] Added dedicated 'Tool details' section in relevant subworkflow section including collapsible output list, description of a tool, and (ideally) description of what primary output files can be used for
+    - [ ] Checked all output files specified in the `pattern:` section of `publishDir` are listed if `pattern:` is used, otherwise just all files found in results directory
+  - [ ] Added entry to 'Pipeline summary list' on `README.md`
+  - [ ] (Optional) Added to pipeline metro map diagram (can be done just before release)
+  - [ ] (First time contributor) add or move yourself to the Team list on `README.md`!
+  - [ ] (First time contributor) add or move yourself to the manifest section of `nextflow.config` as `contributor`
+- [ ] (If applicable) On nf-core/test-data: added small test-database on the [funcscan](https://github.com/nf-core/test-datasets/tree/funcscan) branch
+  - [ ] Added documentation of source and/or how test-data generated/modified
+- Updated relevant `conf/test*.config`
+  - Specified skipping new tool to `true` in `test_minimal.config`
+  - (If applicable) Included/adjusted parameters in all relevant test configs
+  - (If applicable) Added paths to new test database files
+- Updated tests
+  - Added assertions to all output files for each test the tool is executed in
+  - Updated relevant snapshots with `nf-test --tag --profile + --update-snapshot`
+  - Checked assertions stable with `nf-test --tag --profile +`
+- Added entry to `CHANGELOG.md` (note: PR number can be added after)
+  - Tagged issue reporter/feature requester as well as author of PR
+
+### Adding a new screening subworkflow workflow
+
+Screening subworkflows should
+
+- [ ] Be written as a local subworkflow
+- [ ] By default run all tools, and offer skipping of execution
+- [ ] Include final emit channels at a minimum including:
+  - [ ] Versions channel
+  - [ ] (If any tools supported) MultiQC channel
+  - [ ] (If aggregation tool exists) include an aggregation tool step
+
+Subworkflows within the primary `workflows/funcscan.nf` file should
+
+- [ ] Be imported at the top of the module
+- [ ] Have a dedicated if/else statement for running with and without taxonomic classification
+  - [ ] Should include an empty file filter
+  - [ ] (If subworkflow includes tools using old nf-core/modules structure) Should include the versions mixing
+- [ ] (If applicable) be added to the Annotation if/else statement
+- [ ] (If applicable) be included in the MultiQC annotation if/else statement
diff --git a/docs/images/funcscan_metro_workflow.png b/docs/images/funcscan_metro_workflow.png
index 2b9a6b6d..764fe5f4 100644
Binary files a/docs/images/funcscan_metro_workflow.png and b/docs/images/funcscan_metro_workflow.png differ
diff --git a/docs/images/funcscan_metro_workflow.svg b/docs/images/funcscan_metro_workflow.svg
index bbc850d2..cead0cdf 100644
--- a/docs/images/funcscan_metro_workflow.svg
+++ b/docs/images/funcscan_metro_workflow.svg
@@ -26,12 +26,12 @@
      inkscape:pagecheckerboard="true"
      inkscape:deskcolor="#d1d1d1"
      inkscape:document-units="mm"
-     showgrid="false"
+     showgrid="true"
      borderlayer="true"
      showborder="true"
-     inkscape:zoom="1.4142136"
-     inkscape:cx="1153.9983"
-     inkscape:cy="295.57064"
+     inkscape:zoom="0.92956239"
+     inkscape:cx="891.8175"
+     inkscape:cy="313.05053"
      inkscape:window-width="1920"
      inkscape:window-height="1173"
      inkscape:window-x="1440"
@@ -47,7 +47,7 @@
        spacingy="1"
        spacingx="1"
        units="mm"
-       visible="false" />
[SVG markup omitted; text labels: hAMRonization, AMPcombi, comBGC, DeepBGC, BiG-SLiCE]