Preprocessing

Removal of SNP-enriched Sites

SNP-based filtering was not performed because the site annotation does not include SNP information.

Removal of High Coverage Outlier Sites

434 sites were detected as high coverage outliers in at least one sample and were removed at this step. A site is considered an outlier in a sample if its coverage exceeds 50 times the 0.95-quantile of all coverage values in that sample. The list of removed sites is available in a dedicated table accompanying this report.
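The outlier rule above can be sketched as follows. This is a minimal, hypothetical illustration, not the code that produced this report: it assumes the coverage values are held as a list of rows (sites) by columns (samples) with no missing entries, and it assumes the 0.95-quantile is computed with linear interpolation between order statistics (R's default, type 7). The function names `quantile` and `high_coverage_outliers` are illustrative only.

```python
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics (R's default, type 7)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def high_coverage_outliers(coverage: List[List[float]],
                           q: float = 0.95,
                           factor: float = 50.0) -> List[int]:
    """Return indices of sites (rows) whose coverage exceeds factor * q-quantile
    of the corresponding sample (column) in at least one sample."""
    n_samples = len(coverage[0])
    # One threshold per sample, computed from that sample's coverage column only.
    thresholds = [factor * quantile([row[j] for row in coverage], q)
                  for j in range(n_samples)]
    # A site is flagged if it is an outlier in any sample.
    return [i for i, row in enumerate(coverage)
            if any(row[j] > thresholds[j] for j in range(n_samples))]


# Example: 20 ordinary sites plus one extreme site per sample.
cov = [[10, 12]] * 20 + [[1000, 12], [10, 700]]
print(high_coverage_outliers(cov))  # [20, 21]
```

Because the threshold is relative to each sample's own coverage distribution, a deeply sequenced sample does not cause sites in shallower samples to be flagged.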

Masking of Sites with Low Coverage

A total of 3143364 sites with coverage less than 5 were masked as NA in the methylation table. The numbers of masked sites per sample are available in a dedicated table accompanying this report.
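A minimal sketch of the masking step, under stated assumptions: β values and coverages are parallel lists of rows (sites) by columns (samples), `None` stands in for NA, and every value with coverage below the threshold is counted as masked even if it was already missing. The function `mask_low_coverage` is hypothetical and not taken from the pipeline behind this report.

```python
from typing import List, Optional, Tuple

Matrix = List[List[Optional[float]]]


def mask_low_coverage(beta: Matrix, coverage: List[List[int]],
                      min_cov: int = 5) -> Tuple[Matrix, List[int]]:
    """Replace beta values whose coverage is below min_cov with None (stand-in
    for NA) and count the masked values per sample (column)."""
    per_sample = [0] * len(beta[0])
    masked: Matrix = []
    for b_row, c_row in zip(beta, coverage):
        new_row: List[Optional[float]] = []
        for j, (b, c) in enumerate(zip(b_row, c_row)):
            if c < min_cov:
                new_row.append(None)   # low coverage: value is masked
                per_sample[j] += 1
            else:
                new_row.append(b)      # sufficient coverage: value kept
        masked.append(new_row)
    return masked, per_sample


beta = [[0.10, 0.90], [0.50, 0.40]]
cov = [[3, 10], [7, 4]]
masked, counts = mask_low_coverage(beta, cov)
print(masked)  # [[None, 0.9], [0.5, None]]
print(counts)  # [1, 1]
```

Masking keeps the site in the table; whether the site survives depends on the missing-value filter that follows.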

Removal of Sites with (Many) Missing Values

579135 sites were removed because they contain more than 7 missing values in the methylation table. This threshold corresponds to 50% of all samples. The total number of missing values in the methylation table before this filtering step was 6338692. A dedicated table of all removed sites is attached to this report.
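The missing-value filter can be sketched as below. This is a hypothetical illustration, not the report's implementation: it assumes `None` marks a missing β value and that the maximum allowed number of missing values is obtained by rounding down 50% of the sample count, so a site is removed only when it has strictly more missing values than that. The rounding rule is an assumption; the report states only the resulting threshold of 7.

```python
import math
from typing import List, Optional, Tuple


def remove_sites_with_missing(beta: List[List[Optional[float]]],
                              fraction: float = 0.5) -> Tuple[List[int], List[int], int, int]:
    """Split site (row) indices into kept and removed ones.

    A site is removed when its number of missing values (None) is greater than
    floor(fraction * n_samples). Returns (kept, removed, max_na, total_na)."""
    n_samples = len(beta[0])
    max_na = math.floor(fraction * n_samples)  # assumed rounding rule
    total_na = sum(v is None for row in beta for v in row)  # counted before filtering
    kept: List[int] = []
    removed: List[int] = []
    for i, row in enumerate(beta):
        n_na = sum(v is None for v in row)
        (removed if n_na > max_na else kept).append(i)
    return kept, removed, max_na, total_na


beta = [[None, None, None, 0.2],   # 3 NAs > 2 -> removed
        [None, None, 0.1, 0.2],    # 2 NAs, at the threshold -> kept
        [0.1, 0.2, 0.3, 0.4]]      # no NAs -> kept
print(remove_sites_with_missing(beta))  # ([1, 2], [0], 2, 5)
```

The returned `total_na` corresponds to the "total number of missing values before this filtering step" figure quoted above, and the per-site counts `n_na` are the quantity whose distribution is shown in the histogram.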

The figure below shows the distribution of missing values per site.

Figure 1

Histogram of the number of missing values per site. The vertical line, if visible, denotes the applied threshold.

Filtering Summary

As a final outcome of the filtering procedures, 579569 sites and 0 samples were removed. These statistics are presented in a dedicated table that accompanies this report and visualized in the figure below.
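The total number of removed sites equals the sum of the two site-removal steps described above; masking low-coverage values replaces individual β values by NA but does not remove sites. This check assumes, as the order of the steps suggests, that the missing-value filter is applied only to sites that survived the outlier step, so the two sets of removed sites do not overlap:

```latex
\underbrace{434}_{\text{coverage outliers}} + \underbrace{579135}_{\text{too many missing values}} = 579569
```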

Figure 2

Fractions of removed values in the dataset after applying filtering procedures.

The figure below compares the distributions of the removed methylation β values and of the retained ones.

Figure 3

Comparison of removed and retained β values. Both distributions are estimated by randomly sampling 1000000 values from each group.
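The subsampling mentioned in the caption can be sketched as follows. This is a hypothetical illustration: it assumes sampling without replacement, that missing values (`None`) are dropped because they cannot be plotted, and that a group with no more than the requested number of values is used in full. The function name `sample_for_density` and the fixed seed are illustrative choices, not details taken from this report.

```python
import random
from typing import List, Optional, Sequence, Tuple


def sample_for_density(removed: Sequence[Optional[float]],
                       retained: Sequence[Optional[float]],
                       n: int = 1_000_000,
                       seed: int = 1) -> Tuple[List[float], List[float]]:
    """Draw up to n non-missing beta values from each group without replacement.
    Groups with n values or fewer are returned in full."""
    rng = random.Random(seed)  # fixed seed so the figure is reproducible

    def draw(values: Sequence[Optional[float]]) -> List[float]:
        present = [v for v in values if v is not None]
        if len(present) <= n:
            return present
        return rng.sample(present, n)

    return draw(removed), draw(retained)


removed_s, retained_s = sample_for_density([0.1] * 10, [0.5] * 5 + [None], n=3)
print(len(removed_s), len(retained_s))  # 3 3
```

Drawing the same number of values from each group keeps the cost of density estimation bounded regardless of how many values were removed or retained.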