Hello CNVkit developers,
I am using CNVkit (v0.9.12) for copy-number analysis of tumor-only whole-exome sequencing (WES) data generated with an Illumina hybrid-capture protocol (DNA Prep with Enrichment, GRCh38).
I would like to clarify a few points regarding CRAM support and preprocessing steps.
- CRAM support
Is CRAM considered fully supported for all CNVkit workflows (coverage, batch, fix, segment, scatter, diagram, etc.)?
- BAM versus CRAM
Assuming that:
  - the BAM and CRAM contain the same alignments,
  - the CRAM was generated directly from the BAM, and
  - the correct reference FASTA is available during CRAM decoding,
should CNVkit produce identical coverage estimates and copy-number results from BAM and CRAM inputs?
- MarkDuplicates versus BQSR outputs
In a typical GATK preprocessing workflow:
Aligned BAM/CRAM
↓
MarkDuplicates
↓
BaseRecalibrator
↓
ApplyBQSR
CNVkit can be run on either:
  - the MarkDuplicates output, or
  - the final BQSR-recalibrated CRAM/BAM.
Since CNVkit relies primarily on read depth rather than base qualities, should the copy-number results be expected to be identical (or nearly identical) between:
  - the MarkDuplicates CRAM/BAM, and
  - the ApplyBQSR CRAM/BAM,
or are there any known situations where BQSR could affect coverage estimation or downstream segmentation?
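To make "identical (or nearly identical)" concrete, here is a minimal sketch of how I could compare two per-bin coverage tables, one from each input. It is only an illustration: it assumes tab-separated tables with `chromosome`, `start`, `end`, `gene`, `depth` and `log2` columns, which is my understanding of the coverage output and may not match it exactly. The function names, the tolerance, and the toy rows are all hypothetical.

```python
import csv
import io


def read_bins(handle):
    """Map (chromosome, start, end) -> (depth, log2) from a tab-separated table.

    The column names are an assumption about the table layout.
    """
    bins = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        key = (row["chromosome"], int(row["start"]), int(row["end"]))
        bins[key] = (float(row["depth"]), float(row["log2"]))
    return bins


def compare_bins(bins_a, bins_b, tol=1e-6):
    """Return (bins present in only one table, shared bins differing by more than tol)."""
    only_one = sorted(set(bins_a) ^ set(bins_b))
    differing = []
    for key in sorted(set(bins_a) & set(bins_b)):
        depth_a, log2_a = bins_a[key]
        depth_b, log2_b = bins_b[key]
        if abs(depth_a - depth_b) > tol or abs(log2_a - log2_b) > tol:
            differing.append(key)
    return only_one, differing


# Toy in-memory tables standing in for the two coverage outputs (made-up values).
HEADER = "chromosome\tstart\tend\tgene\tdepth\tlog2\n"
table_a = HEADER + "chr1\t100\t200\tGENE_A\t50.0\t0.10\n" + "chr1\t300\t400\tGENE_B\t48.0\t0.05\n"
table_b = HEADER + "chr1\t100\t200\tGENE_A\t50.0\t0.10\n" + "chr1\t300\t400\tGENE_B\t47.0\t0.02\n"

only_one, differing = compare_bins(
    read_bins(io.StringIO(table_a)),
    read_bins(io.StringIO(table_b)),
)
print("bins in only one table:", only_one)
print("bins with differing depth/log2:", differing)
```

With real data, the two `io.StringIO` objects would be replaced by open file handles for the two coverage tables. An empty result for both lists would be what I mean by "identical"; a small number of bins differing within some tolerance would be "nearly identical".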
Thank you very much for your help and for maintaining CNVkit.