CRAM support and potential differences between BAM, MarkDuplicates CRAM, and BQSR-recalibrated CRAM #1105

Description

@mi3112

Hello CNVkit developers,

I am using CNVkit (v0.9.12) for copy-number analysis of tumor-only whole-exome sequencing (WES) data generated with an Illumina hybrid-capture protocol (DNA Prep with Enrichment, GRCh38).

I would like to clarify a few points regarding CRAM support and preprocessing steps.

  1. CRAM support

Is CRAM considered fully supported for all CNVkit workflows (coverage, batch, fix, segment, scatter, diagram, etc.)?

  2. BAM versus CRAM

Assuming that:

the BAM and CRAM contain the same alignments,
the CRAM was generated directly from the BAM,
the correct reference FASTA is available during CRAM decoding,

should CNVkit produce identical coverage estimates and copy-number results from BAM and CRAM inputs?
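
To check this empirically on my side, a minimal sketch along the following lines could compare the two coverage tables bin by bin. It assumes the tables are tab-separated with `chromosome`, `start`, `end`, `depth`, and `log2` columns, as in the `.cnn` files I get from the `coverage` command; the helper names `read_cnn` and `compare_cnn` and the tolerance are hypothetical, and the inline tables are made-up stand-ins for real files.

```python
import csv
import io


def read_cnn(handle):
    """Read a tab-separated coverage table into {(chrom, start, end): (depth, log2)}.

    Column names are an assumption about the .cnn header.
    """
    rows = {}
    for rec in csv.DictReader(handle, delimiter="\t"):
        key = (rec["chromosome"], int(rec["start"]), int(rec["end"]))
        rows[key] = (float(rec["depth"]), float(rec["log2"]))
    return rows


def compare_cnn(a, b, tol=1e-6):
    """Return bins that are missing from one table or differ by more than tol."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            diffs.append((key, a.get(key), b.get(key)))
        elif any(abs(x - y) > tol for x, y in zip(a[key], b[key])):
            diffs.append((key, a[key], b[key]))
    return diffs


# Made-up example tables; in practice these would be opened from the
# BAM-derived and CRAM-derived coverage files.
header = "chromosome\tstart\tend\tgene\tdepth\tlog2\n"
from_bam = io.StringIO(header + "chr1\t100\t200\tGENE_A\t50.0\t5.64\n")
from_cram = io.StringIO(header + "chr1\t100\t200\tGENE_A\t50.0\t5.64\n")

differences = compare_cnn(read_cnn(from_bam), read_cnn(from_cram))
print(len(differences), "bins differ")
```

An empty result would mean the two inputs gave the same per-bin depth and log2 values within the chosen tolerance.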

  3. MarkDuplicates versus BQSR outputs

In a typical GATK preprocessing workflow:

Aligned BAM/CRAM
    ↓
MarkDuplicates
    ↓
BaseRecalibrator
    ↓
ApplyBQSR

CNVkit can be run either on:

the MarkDuplicates output, or
the final BQSR-recalibrated CRAM/BAM.

Since CNVkit relies primarily on read depth rather than base qualities, should the copy-number results be expected to be identical (or nearly identical) between:

MarkDuplicates CRAM/BAM
ApplyBQSR CRAM/BAM

or are there any known situations where BQSR could affect coverage estimation or downstream segmentation?
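
My expectation that the results should match comes from a toy model like the one below. It is only a sketch of my reasoning, not CNVkit's actual implementation: the `Read` class, the `bin_depth` function, and the filtering rules (a mapping-quality threshold and skipping duplicate-flagged reads) are assumptions made for illustration. In this model, depth depends on read position, mapping quality, and the duplicate flag, so rewriting base qualities alone leaves it unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Read:
    start: int           # 0-based leftmost aligned position
    length: int          # aligned length in bases
    mapq: int            # mapping quality
    is_duplicate: bool   # duplicate flag, as set by a duplicate-marking step
    base_quals: tuple    # per-base qualities, the only field BQSR would rewrite


def bin_depth(reads, bin_start, bin_end, min_mapq=0, skip_duplicates=True):
    """Mean per-base depth over [bin_start, bin_end); base qualities are never read."""
    covered = 0
    for r in reads:
        if r.mapq < min_mapq or (skip_duplicates and r.is_duplicate):
            continue
        overlap = min(r.start + r.length, bin_end) - max(r.start, bin_start)
        covered += max(overlap, 0)
    return covered / (bin_end - bin_start)


# Hypothetical reads after duplicate marking.
marked = [
    Read(0, 10, 60, False, (35,) * 10),
    Read(5, 10, 60, False, (35,) * 10),
    Read(5, 10, 60, True, (35,) * 10),
]

# Simulate recalibration by changing only the base qualities.
recalibrated = [replace(r, base_quals=(28,) * r.length) for r in marked]

assert bin_depth(marked, 0, 10) == bin_depth(recalibrated, 0, 10)
```

If any step between MarkDuplicates and ApplyBQSR also changed flags, mapping qualities, or the set of reads written out, this model would no longer predict identical depth, which is the kind of situation I am asking about.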

Thank you very much for your help and for maintaining CNVkit.
