Quality Control

Sequencing Coverage Histograms

The sequencing coverage histograms show the distribution of coverage across all chromosomes. Samples with markedly lower coverage than the rest should be excluded from the analysis.

Figure 1

Sequencing coverage histograms visualize the bulk distribution of read coverage for each sample.

Sample coverage summary

Summary metrics on the number of sites and their coverage are given in the table below. The table is also available as a CSV file.

sampleName  sites_num  sites_covgMean    sites_covgMedian  sites_covgPerc25  sites_covgPerc75  sites_numCovg5  sites_numCovg10  sites_numCovg30  sites_numCovg60
HSC_1       1535856    57.5259047723224  43                11                78                1280343         1175788          923969           568650
HSC_2       1532604    68.6329221377473  54                16                95                1317401         1227364          1014681          702208
CLP_1       1562876    37.6499517556095  31                13                52                1341368         1235318          808100           291721
CLP_2       1611935    31.3210743609389  25                12                42                1426570         1281435          683855           162366
CMP_1       1577592    37.4243613050776  30                13                52                1377168         1260443          811698           288893
CMP_2       1603981    35.210826063401   29                15                47                1449164         1333852          791665           211269
CD4_1       1574822    46.0499218324357  39                17                64                1385345         1300649          961509           455348
CD4_2       1623584    35.9416205136291  29                14                48                1444739         1332125          803998           235704
B_cell_1    1575604    43.468518104803   35                16                60                1390870         1296874          895373           404242
B_cell_2    1607899    32.4855858483649  26                13                43                1430493         1297723          717193           170611
Eryth_1     1406335    19.0826979347026  14                8                 23                1204687         954506           190550           16582
Eryth_2     1330606    10.3067782649409  7                 3                 12                882078          455716           25135            8311
Mono_1      1399328    17.8618501166274  13                7                 21                1176040         908853           149869           13525
Mono_2      1372262    13.8322361181757  10                5                 16                1065614         712164           58746            9402

Figure 2

Number of covered sites and median coverage for each sample. Vertical bars depict the 0.05 and 0.95 quantiles of coverage.
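One way to act on the advice to exclude low-coverage samples is to screen the summary table programmatically. The sketch below is only an illustration: the numbers are copied from the table above, but the function names and the rule itself (flag a sample when its median coverage falls below half of the cohort median) are assumptions and not part of the report.

```python
from statistics import median

# (sites_num, sites_covgMedian, sites_numCovg10), copied from the summary table
summary = {
    "HSC_1": (1535856, 43, 1175788),
    "HSC_2": (1532604, 54, 1227364),
    "CLP_1": (1562876, 31, 1235318),
    "CLP_2": (1611935, 25, 1281435),
    "CMP_1": (1577592, 30, 1260443),
    "CMP_2": (1603981, 29, 1333852),
    "CD4_1": (1574822, 39, 1300649),
    "CD4_2": (1623584, 29, 1332125),
    "B_cell_1": (1575604, 35, 1296874),
    "B_cell_2": (1607899, 26, 1297723),
    "Eryth_1": (1406335, 14, 954506),
    "Eryth_2": (1330606, 7, 455716),
    "Mono_1": (1399328, 13, 908853),
    "Mono_2": (1372262, 10, 712164),
}


def flag_low_coverage(table, rel_cutoff=0.5):
    """Return samples whose median coverage is below rel_cutoff times
    the median of all per-sample medians (hypothetical rule)."""
    cohort_median = median(med for _, med, _ in table.values())
    return sorted(
        name
        for name, (_, med, _) in table.items()
        if med < rel_cutoff * cohort_median
    )


def fraction_covg10(table):
    """Fraction of each sample's sites that are covered by at least 10 reads."""
    return {name: n10 / n for name, (n, _, n10) in table.items()}


if __name__ == "__main__":
    print("Candidates for exclusion:", flag_low_coverage(summary))
    for name, frac in fraction_covg10(summary).items():
        print(f"{name}: {frac:.1%} of sites with coverage >= 10")
```

With these values the rule flags the Eryth and Mono samples. Whether they are actually removed remains a judgment call that should also take the histograms into account.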

Sequencing Coverage Violin Plots

The plots below show an alternative approach to visualizing the coverage distribution.

Figure 3

Sequencing coverage visualized as violin plots. Distributions are based on 1975284 methylation sites.

Sequencing Coverage Thresholds

Depending on the minimum coverage threshold, between 0.4 and 1.1 million sites are covered in all samples of the dataset. The figure below shows how the support changes for different coverage thresholds. The exact values are available in a dedicated comma-separated file accompanying this report.

Figure 4

Line plot showing the number of CpG sites with a given support for different thresholds of minimal coverage. The support of a CpG site is the minimal number of samples that interrogate it.
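The support curves can be sketched on a toy coverage matrix to make the definition concrete. The example assumes that a sample "interrogates" a site when its read coverage reaches the threshold; the function name and the toy data are hypothetical and this is not the code used to produce the report.

```python
def support_curves(coverage, thresholds):
    """For each minimum-coverage threshold, count the sites with support
    of at least k samples, for k = 1 .. number of samples.

    coverage: list of sites, each a list of per-sample read counts.
    """
    n_samples = len(coverage[0])
    curves = {}
    for t in thresholds:
        # Support of a site: number of samples covering it with >= t reads
        supports = [sum(c >= t for c in site) for site in coverage]
        curves[t] = [
            sum(s >= k for s in supports) for k in range(1, n_samples + 1)
        ]
    return curves


if __name__ == "__main__":
    # Toy data: 4 sites (rows) x 3 samples (columns)
    toy = [
        [12, 30, 5],
        [0, 8, 40],
        [60, 61, 70],
        [3, 0, 0],
    ]
    for t, curve in support_curves(toy, [5, 10, 30]).items():
        print(f"coverage >= {t}: sites with support >= 1..3 = {curve}")
```

The last entry of each curve is the number of sites covered in all samples at that threshold, which is the quantity summarized at the start of this section.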