producing Escherichia coli colonising adults in Blantyre, Malawi
reveals previously undescribed diversity
Supplementary material
[Flowchart: sources of included genomes]
This study: 473 genomes, 473 included
Musicha et al: 97 genomes, 4 excluded (failed QC), 93 included
Horesh et al: 2776 genomes included
• 50 largest popPUNK clusters: 500 representative genomes as selected in original publication
• Other popPUNK clusters: all isolates if ≤ 10 isolates; 10 randomly selected isolates if > 10 isolates
Total: 3342 genomes
Supplementary Figure 1: Flowchart of included contextualising isolates.
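
The per-cluster rule shown in the flowchart for the smaller popPUNK clusters (keep every isolate when a cluster has ≤ 10 isolates, otherwise draw 10 at random) can be sketched as below. This is a minimal illustration, not the authors' code: the function name, the `cluster_of` mapping, and the fixed random seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def subsample_clusters(cluster_of, max_per_cluster=10, seed=0):
    """Return selected isolate IDs given a mapping isolate ID -> cluster ID.

    Hypothetical helper: clusters with <= max_per_cluster isolates are kept
    whole; larger clusters contribute max_per_cluster random isolates.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility

    # Group isolates by their cluster assignment.
    members = defaultdict(list)
    for isolate, cluster in cluster_of.items():
        members[cluster].append(isolate)

    selected = []
    for cluster in sorted(members):
        isolates = sorted(members[cluster])
        if len(isolates) <= max_per_cluster:
            selected.extend(isolates)  # small cluster: keep all isolates
        else:
            selected.extend(rng.sample(isolates, max_per_cluster))
    return selected
```

Sorting the cluster and isolate IDs before sampling makes the draw depend only on the seed and not on input order.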
[Figure, panel A: heatmap of number of participants, by No. of samples per participant (columns) and No. of STs per participant (rows)]

No. of STs    No. of samples per participant
              1     2     3     4     5
5             -     -     -     -     2
4             -     -     -     10    5
3             -     -     22    10    4
2             -     62    7     4     0
1             99    4     0     1     0

[Figure, panel B: histograms of number of sample-pairs (y-axis) against core genome SNP distance (x-axis, 0 to 200), in two facets: "Between-participant, same ST" and "Within-participant, same ST"]
Supplementary Figure 2: Describing within-participant diversity. A: Heatmap
showing the number of participants with a given number of samples and number of STs
per participant. Most (195/230, 85%) participants do not contribute samples with a
duplicated ST. B: Histograms of the distribution of pairwise core genome SNP distance,
considering only sample pairs of the same ST, stratified by whether the sample pair
is between- or within-participant. The distributions are similar, justifying keeping all
samples in the analysis. The histograms have a bin size of 4 SNPs and the x-axis is
restricted to ≤ 200 SNPs to show closely related isolates.
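
The stratification described for panel B can be sketched as follows: take every same-ST sample pair, label it within- or between-participant, and count pairs in 4-SNP bins up to 200 SNPs. This is an illustrative sketch only; the function name, the tuple layout of `samples`, the `snp_distance` callable, and the choice of left-closed bins starting at 0 are assumptions, not details given in the source.

```python
from collections import Counter
from itertools import combinations


def stratified_histogram(samples, snp_distance, bin_size=4, max_distance=200):
    """Bin same-ST pairwise SNP distances, split by participant relationship.

    samples: list of (sample_id, participant_id, st) tuples (assumed layout).
    snp_distance: callable (sample_id, sample_id) -> int core genome SNP
        distance (hypothetical; supplied by the caller).
    Returns {"within": Counter, "between": Counter} keyed by bin lower edge.
    """
    hist = {"within": Counter(), "between": Counter()}
    for (id_a, part_a, st_a), (id_b, part_b, st_b) in combinations(samples, 2):
        if st_a != st_b:
            continue  # only sample pairs of the same ST are considered
        distance = snp_distance(id_a, id_b)
        if distance > max_distance:
            continue  # restrict to closely related isolates
        stratum = "within" if part_a == part_b else "between"
        bin_start = (distance // bin_size) * bin_size  # lower edge of the bin
        hist[stratum][bin_start] += 1
    return hist
```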