CRAM support and potential differences between BAM, MarkDuplicates CRAM, and BQSR-recalibrated CRAM

Hello CNVkit developers,

I am using CNVkit (v0.9.12) for copy-number analysis of tumor-only whole-exome sequencing (WES) data generated with an Illumina hybrid-capture protocol (DNA Prep with Enrichment, GRCh38).

I would like to clarify a few points regarding CRAM support and preprocessing steps.

1. CRAM support

Is CRAM considered fully supported for all CNVkit workflows (coverage, batch, fix, segment, scatter, diagram, etc.)?

2. BAM versus CRAM

Assuming that:

- the BAM and CRAM contain the same alignments,
- the CRAM was generated directly from the BAM, and
- the correct reference FASTA is available during CRAM decoding,

should CNVkit produce identical coverage estimates and copy-number results from BAM and CRAM inputs?
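One way to check this empirically would be a bin-by-bin comparison of the coverage tables produced from each input. The following is a minimal sketch, assuming the tables are tab-delimited with a header row containing `chromosome`, `start`, `end`, and `depth` columns; the function names and file paths are hypothetical and not part of CNVkit.

```python
import csv


def read_depths(path):
    """Map (chromosome, start, end) -> depth from a tab-delimited coverage table.

    Assumes a header row with 'chromosome', 'start', 'end' and 'depth' columns.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {
            (row["chromosome"], int(row["start"]), int(row["end"])): float(row["depth"])
            for row in reader
        }


def compare_coverage(path_a, path_b, tolerance=0.0):
    """Return the bins whose depth differs by more than `tolerance`.

    Each entry is (bin_key, depth_a, depth_b); a depth of None means the bin
    is missing from that table.
    """
    depths_a = read_depths(path_a)
    depths_b = read_depths(path_b)
    differences = []
    for bin_key in sorted(set(depths_a) | set(depths_b)):
        depth_a = depths_a.get(bin_key)
        depth_b = depths_b.get(bin_key)
        if depth_a is None or depth_b is None or abs(depth_a - depth_b) > tolerance:
            differences.append((bin_key, depth_a, depth_b))
    return differences
```

For example, `compare_coverage("from_bam.targetcoverage.cnn", "from_cram.targetcoverage.cnn")` (hypothetical file names) would return an empty list if the two inputs give identical per-bin depths.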

3. MarkDuplicates versus BQSR outputs

In a typical GATK preprocessing workflow:

```
Aligned BAM/CRAM
    ↓
MarkDuplicates
    ↓
BaseRecalibrator
    ↓
ApplyBQSR
```
CNVkit can be run on either:

- the MarkDuplicates output, or
- the final BQSR-recalibrated CRAM/BAM.

Since CNVkit relies primarily on read depth rather than base qualities, should the copy-number results be expected to be identical (or nearly identical) between:

- MarkDuplicates CRAM/BAM
- ApplyBQSR CRAM/BAM

or are there any known situations where BQSR could affect coverage estimation or downstream segmentation?
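The premise behind this question is that ApplyBQSR rewrites only base qualities and leaves positions, flags, mapping qualities, and CIGAR strings untouched. That premise is an assumption here, and it could be checked on SAM-formatted text records with a sketch like the one below. It compares the first ten mandatory SAM fields and ignores QUAL and optional tags; the function names are hypothetical.

```python
def alignment_key(sam_line):
    """Return the first ten mandatory SAM fields (QNAME .. SEQ).

    QUAL (field 11) and any optional tags are deliberately ignored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated SAM fields")
    return tuple(fields[:10])


def first_mismatch(lines_a, lines_b):
    """Return the index of the first record that differs outside QUAL and tags.

    Header lines (starting with '@') are skipped. Returns None when every
    record matches and both inputs contain the same number of records.
    """
    records_a = [alignment_key(line) for line in lines_a if not line.startswith("@")]
    records_b = [alignment_key(line) for line in lines_b if not line.startswith("@")]
    for index, (rec_a, rec_b) in enumerate(zip(records_a, records_b)):
        if rec_a != rec_b:
            return index
    if len(records_a) != len(records_b):
        # One input has extra records after the shared prefix.
        return min(len(records_a), len(records_b))
    return None
```

The input lines could come from the text output of `samtools view` on each file, restricted to a small region so the comparison stays in memory. A result of `None` would indicate that the two files differ only in base qualities or tags for that region.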

Thank you very much for your help and for maintaining CNVkit.

CRAM support and potential differences between BAM, MarkDuplicates CRAM, and BQSR-recalibrated CRAM #1105

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Uh oh!

CRAM support and potential differences between BAM, MarkDuplicates CRAM, and BQSR-recalibrated CRAM #1105

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions