What is a CRAM file and how is it used in WGS (Whole Genome Sequencing) analysis?


The CRAM file is a compressed format for storing genomic sequencing data that has already been aligned against a reference genome. In the context of Whole Genome Sequencing (WGS), which analyses virtually your entire genome, the CRAM file contains the DNA reads generated by sequencing, together with each read's position in the reference genome and its associated quality information.

Unlike the FASTQ file, which contains the raw data generated directly by the sequencing machine, the CRAM file represents a later stage of the bioinformatics analysis: the reads have already been processed and aligned, so it is possible to know which region of the genome each DNA fragment comes from.
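CRAM records follow the same data model as SAM/BAM: each aligned read stores a 1-based start position and a CIGAR string describing how the read lines up with the reference. As an illustration only, the hypothetical helper `reference_span` below computes which stretch of the genome a read covers, using the SAM specification's rules for which CIGAR operations consume reference bases. It works on values you have already extracted and does not read a CRAM file itself; in practice, tools such as samtools handle this for you.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification:
# M (alignment match), D (deletion), N (skipped region), = (match), X (mismatch).
# I (insertion), S (soft clip), H (hard clip) and P (padding) do not.
REF_CONSUMING = set("MDN=X")


def reference_span(pos: int, cigar: str) -> tuple[int, int]:
    """Return the 1-based, inclusive (start, end) reference interval of a read.

    pos   -- 1-based leftmost mapping position (the POS field of the record)
    cigar -- CIGAR string describing the alignment, e.g. "50M2D50M"
    """
    ref_len = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    if ref_len == 0:
        raise ValueError(f"CIGAR {cigar!r} does not consume any reference bases")
    return pos, pos + ref_len - 1


# A 100-base read with a 2-base deletion relative to the reference.
print(reference_span(10_000, "50M2D50M"))  # (10000, 10101)
```

Note that the read itself is 100 bases long, but because of the deletion it spans 102 bases of the reference.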

CRAM is used by bioinformaticians, geneticists and researchers who wish to perform advanced analyses on aligned genomic data, such as reviewing specific regions of the genome, identifying variants, analysing coverage or validating certain genetic findings. It is also useful for users who wish to keep a processed and more compact version of their genomic data for future analyses. 
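Coverage analysis asks how many aligned reads overlap each position of a region. The following is a minimal sketch, assuming each read has already been reduced to a 1-based, inclusive (start, end) interval on the reference; the function name `coverage` and the example intervals are hypothetical. Dedicated tools such as `samtools depth` compute this directly from the CRAM file.

```python
from itertools import accumulate


def coverage(reads, region_start, region_end):
    """Per-position read depth over a 1-based, inclusive region.

    reads -- iterable of (start, end) 1-based, inclusive reference intervals,
             one per aligned read
    """
    size = region_end - region_start + 1
    # Difference array: +1 where a read starts covering, -1 just after it stops.
    diff = [0] * (size + 1)
    for start, end in reads:
        # Clip each read to the region of interest and skip reads outside it.
        s, e = max(start, region_start), min(end, region_end)
        if s > e:
            continue
        diff[s - region_start] += 1
        diff[e - region_start + 1] -= 1
    # The running sum of the difference array gives the depth at each position.
    return list(accumulate(diff[:size]))


depth = coverage([(1, 5), (3, 8), (6, 10)], 1, 10)
print(depth)                    # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
print(sum(depth) / len(depth))  # 1.6
```

The difference array keeps the cost proportional to the number of reads plus the region length, rather than adding 1 to every covered position of every read.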

Characteristics of the CRAM file:

  • High compression: Takes up less space than other aligned formats such as BAM, thanks to reference-based compression, which stores mainly the differences between each read and the reference genome.

  • Aligned reads: Contains information about the position of each read in the reference genome.

  • Quality information: Preserves relevant technical data, such as quality scores and other metadata associated with sequencing and alignment.

  • Storage efficiency: Especially useful for WGS data, as it allows large volumes of genomic information to be stored in a more compact way.
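The idea behind reference-based compression can be illustrated with a deliberately simplified sketch: a read that closely matches the reference is stored as only the bases where it differs. The functions `encode_read` and `decode_read` are hypothetical and assume ungapped reads (no insertions or deletions) and ignore read names and quality scores; the real CRAM format encodes many more fields and applies further compression on top.

```python
def encode_read(read: str, reference: str, pos: int) -> list[tuple[int, str]]:
    """Store an ungapped read as only the bases that differ from the reference.

    pos is the 0-based offset of the read's first base in the reference.
    Returns (offset_in_read, base) pairs for each mismatch.
    """
    segment = reference[pos:pos + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    return [(i, b) for i, (b, r) in enumerate(zip(read, segment)) if b != r]


def decode_read(diffs, reference: str, pos: int, length: int) -> str:
    """Rebuild the read by copying the reference and re-applying the differences."""
    bases = list(reference[pos:pos + length])
    for i, b in diffs:
        bases[i] = b
    return "".join(bases)


reference = "ACGTACGTACGTACGT"
read = "GTACTTAC"  # matches reference[2:10] except for one base
diffs = encode_read(read, reference, 2)
print(diffs)                                                # [(4, 'T')]
print(decode_read(diffs, reference, 2, len(read)) == read)  # True
```

The same sketch shows why the reference genome is indispensable: `decode_read` cannot rebuild the read without the exact reference sequence it was encoded against.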

Limitations:

  • Dependence on the reference genome: To decode and interpret a CRAM file correctly, you normally need the same reference genome (in FASTA format) that was used during alignment.

  • Not human-readable: You cannot directly “read” a CRAM file to obtain information about your genetic traits or predispositions. It requires specific bioinformatics tools to visualise or analyse it.

  • Requires technical knowledge: Its use is intended for advanced users, bioinformaticians or professionals familiar with genomic data analysis.
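One way to confirm that you have the matching reference: the header of a SAM/BAM/CRAM file lists each reference sequence on an @SQ line, which can carry an M5 tag holding the MD5 checksum of that sequence. The hypothetical helper `sequence_m5` below sketches how that checksum is calculated, so you can compare it with the M5 values shown by `samtools view -H`. It assumes one chromosome's sequence has already been loaded from your FASTA file into a single string; FASTA parsing is not shown.

```python
import hashlib


def sequence_m5(sequence: str) -> str:
    """MD5 checksum of a reference sequence, as used in the @SQ M5 header tag.

    Following the SAM specification, characters outside the printable ASCII
    range 33-126 (e.g. newlines and spaces) are removed and lowercase letters
    are converted to uppercase before hashing.
    """
    cleaned = "".join(c for c in sequence if 33 <= ord(c) <= 126).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest()


# Line breaks and soft-masked (lowercase) bases do not change the checksum.
print(sequence_m5("acgtACGT\nNNNN") == sequence_m5("ACGTACGTNNNN"))  # True
```

If the checksum for a chromosome in your FASTA file does not match the M5 value in the CRAM header, that reference is not the one used during alignment.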

Formats and download:

In addition to CRAM, we also offer other technical formats such as FASTQ and VCF. You can download these files directly from your tellmeGen user account.

Technical requirements:

  • Operating system: Linux or macOS is recommended, although it can also be used on Windows with compatible tools or environments such as WSL.

  • Specific software: Bioinformatics tools such as samtools, IGV, bcftools or other specialised programs.

  • RAM: 32 GB or more is recommended for WGS analyses.

  • Storage: Although the CRAM file takes up less space than FASTQ, it is recommended to have sufficient storage capacity to work with complete genomic data.
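As a starting point with samtools, the sketch below builds the command that prints the reads overlapping one region of the genome. The file names `sample.cram` and `GRCh38.fa` and the helper names are hypothetical; it assumes samtools is installed and that the CRAM file has an index (`.crai`), which `samtools index` creates. The command is only built and printed here; `run_command` would actually execute it.

```python
import shlex
import subprocess


def view_region_command(cram_path: str, reference_fasta: str, region: str) -> list[str]:
    """Build a samtools command that prints the reads overlapping one region.

    -T gives samtools the reference FASTA needed to decode the CRAM, and the
    region uses samtools' 1-based "chrom:start-end" syntax.
    """
    return ["samtools", "view", "-T", reference_fasta, cram_path, region]


def run_command(cmd: list[str]) -> str:
    """Run the command (requires samtools on PATH) and return its output."""
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Hypothetical file names; the chromosome name must match the reference ("chr1" or "1").
cmd = view_region_command("sample.cram", "GRCh38.fa", "chr1:1000-2000")
print(shlex.join(cmd))  # samtools view -T GRCh38.fa sample.cram chr1:1000-2000
```

Passing the command as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.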

This format is suitable for advanced users who wish to work with already aligned genomic data and perform more specific, efficient and detailed technical analyses of their whole genome.