Preprocessing

Removal of SNP-enriched Sites

187194 sites were removed because they overlap with SNPs. The list of removed sites is available in a dedicated table accompanying this report.

Removal of High Coverage Outlier Sites

No sites were detected as high coverage outliers in any sample, so none were removed at this step. An outlier site is defined as one whose coverage exceeds 50 times the 0.95-quantile of coverage values in its sample. The list of removed sites is available in a dedicated table accompanying this report.
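The outlier rule above can be sketched as follows. This is a minimal illustration, not the code that produced this report: the function names, the dictionary layout, and the toy data are hypothetical, and the linear-interpolation quantile is an assumption about how the 0.95-quantile is computed.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile (assumed rule; the report does not state one)."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = math.floor(pos)
    upper = math.ceil(pos)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])

def high_coverage_outliers(coverage, factor=50, q=0.95):
    """Return indices of sites flagged as outliers in at least one sample.

    coverage: dict mapping sample name -> list of per-site coverage values,
    with the same site order in every sample (hypothetical layout).
    """
    flagged = set()
    for values in coverage.values():
        cutoff = factor * quantile(values, q)
        # "Exceeds" is read as a strict comparison.
        flagged.update(i for i, v in enumerate(values) if v > cutoff)
    return sorted(flagged)

# Toy data: 40 sites, with site 3 extremely covered in sample "s2" only.
s2 = [10] * 40
s2[3] = 5000
toy = {"s1": [10] * 40, "s2": s2}
outliers = high_coverage_outliers(toy)
```

A site is flagged as soon as one sample exceeds its own cutoff, which matches the "in at least one sample" wording of the definition.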

Masking of Sites with Low Coverage

A total of 33773795 sites with coverage less than 5 were masked with NA in the methylation table. The numbers of masked sites per sample are available in a dedicated table accompanying this report.
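As an illustration of the masking step, the sketch below replaces methylation values by a missing-value marker wherever the matching coverage is below the threshold of 5, and counts the masked values per sample. The row-per-site layout, the use of `None` for NA, and all names are assumptions made for this example.

```python
def mask_low_coverage(meth, coverage, min_coverage=5):
    """Mask methylation values whose coverage is below min_coverage.

    meth, coverage: lists of rows, one row per site and one column per
    sample (hypothetical layout). None stands in for NA.
    Returns the masked table and the number of newly masked values per sample.
    """
    n_samples = len(meth[0]) if meth else 0
    masked_per_sample = [0] * n_samples
    masked = []
    for meth_row, cov_row in zip(meth, coverage):
        new_row = []
        for j, (beta, cov) in enumerate(zip(meth_row, cov_row)):
            if beta is not None and cov < min_coverage:
                new_row.append(None)
                masked_per_sample[j] += 1
            else:
                new_row.append(beta)
        masked.append(new_row)
    return masked, masked_per_sample

# Toy data: three sites, two samples.
toy_meth = [[0.9, 0.8], [0.1, 0.2], [0.5, 0.4]]
toy_cov = [[12, 3], [4, 4], [5, 30]]
masked_table, masked_counts = mask_low_coverage(toy_meth, toy_cov)
```

Note that a coverage of exactly 5 is retained, since only values with coverage less than 5 are masked.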

Removal of Sites with (Many) Missing Values

1374972 sites were removed because they contain more than 7 missing values in the methylation table. This threshold corresponds to 60% of all samples. The total number of missing values in the methylation table before this filtering step was 33257563. A dedicated table of all removed sites is attached to this report.
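The rule described above can be sketched as a per-site count of missing values compared against the absolute threshold. The function name, the table layout, and the toy data are hypothetical; how the 60% fraction is converted into the count of 7 is not stated in this report, so the sketch takes the count directly.

```python
def split_by_missing(meth, max_missing=7):
    """Split site indices into kept and removed by their number of missing values.

    meth: list of rows, one row per site and one column per sample
    (hypothetical layout). None stands in for NA.
    A site is removed when it has MORE than max_missing missing values.
    """
    kept, removed = [], []
    for index, row in enumerate(meth):
        n_missing = sum(value is None for value in row)
        (removed if n_missing > max_missing else kept).append(index)
    return kept, removed

# Toy data: four samples and a smaller threshold, for illustration only.
toy = [
    [0.1, None, None, 0.3],   # 2 missing: kept
    [None, None, None, 0.2],  # 3 missing: removed
    [0.5, 0.6, 0.7, 0.8],     # 0 missing: kept
]
kept_sites, removed_sites = split_by_missing(toy, max_missing=2)
```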

The figure below shows the distribution of missing values per site.

Figure 1

Histogram of the number of missing values per site. The vertical line, if visible, denotes the applied threshold.

Filtering Summary

As a final outcome of the filtering procedures, 1562166 sites and 0 samples were removed. These statistics are presented in a dedicated table that accompanies this report and visualized in the figure below.
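The reported total can be cross-checked against the per-step counts given earlier in this report. The short tally below assumes that the steps act on disjoint sets of sites (each step only sees sites that survived the previous ones); the dictionary keys are labels chosen for this example.

```python
# Site counts reported by each removal step in this report.
removed_per_step = {
    "SNP overlap": 187194,
    "high coverage outlier": 0,
    "too many missing values": 1374972,
}

total_removed = sum(removed_per_step.values())
reported_total = 1562166

# True when the per-step counts add up to the reported total.
consistent = total_removed == reported_total
```

Masking of low-coverage values is not part of this tally, because it replaces individual values with NA rather than removing sites.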

Figure 2

Fractions of removed values in the dataset after applying filtering procedures.

The figure below compares the distributions of the removed methylation β values and of the retained ones.

Figure 3

Comparison of removed and retained β values. Both distributions are estimated by randomly sampling 1000000 values from each group.