May 27, 2024
Network of large pedigrees reveals social practices of Avar communities – Nature

Network of large pedigrees reveals social practices of Avar communities – Nature

Ancient-DNA laboratory analyses

For the archaeogenetic investigations, petrous bones and teeth were preferentially sampled (Supplementary Table 1). Samples were prepared in dedicated ancient-DNA laboratory facilities at the HUN-REN RCH Institute of Archaeogenomics in Budapest. Sample surfaces were decontaminated using UVC light and cleaned by mechanical removal. About 25–50 mg bone powder was obtained by drilling or powdering and transferred to MPI-EVA in Leipzig, Germany. DNA extraction and subsequent laboratory steps were done in the Ancient DNA Core Unit of the MPI-EVA. DNA was extracted from between 25 mg and 52 mg of powdered sample material using a silica-based method optimized for the recovery of short DNA fragments35. Briefly, lysates were prepared by adding 1 ml extraction buffer (0.45 M EDTA, pH 8.0, 0.25 mg ml–1 proteinase K, 0.05% Tween-20) to the sample material in 2.0-ml Eppendorf LoBind tubes and rotating the tubes at 37 °C for approximately 16 h35,36. Using an automated liquid-handling system (Bravo NGS Workstation B, Agilent Technologies), DNA was purified from 150 µl lysate using silica-coated magnetic beads and binding buffer D, as described previously36. Elution volume was 30 µl. Extraction blanks without sample material were carried alongside the samples during DNA extraction.

DNA libraries were prepared from 30 µl extract using an automated version of single-stranded DNA-library preparation37 described in detail previously38. Escherichia coli uracil–DNA–glycosylase (UDG) and E. coli endonuclease VIII were added during library preparation to remove uracils from the interior of molecules. Libraries were prepared from both the sample DNA extracts and the extraction blanks, and further negative controls (library blanks) were added. Library yields and efficiency of library preparation were determined using two quantitative PCR assays38. Libraries were tagged with pairs of sample-specific indices by PCR extension using AccuPrime Pfx DNA polymerase as described previously38. Indexed libraries were amplified and purified using SPRI (solid-phase reversible immobilization) technology39 as described previously38.

Sample and control libraries were enriched in solution for 1,237,207 informative SNPs (a method commonly used in the field and known as 1240k capture40) targeting 394,577 SNPs first reported in ref. 41 (390k panel) and 842,630 SNPs first reported in ref. 42 (840k panel). Two consecutive rounds of 1240k capture were performed using the Bravo NGS workstation B. Up to 20 libraries were pooled together and sequenced single-read or pair-read on a HiSeq4000 sequencing platform (Illumina Technology). In total, 440 1240k-enriched libraries were sequenced and an average coverage of 2.6× (median 2.25×) for the 1,237,207 sites in the genome, corresponding to a median of 708,514 1240k SNPs covered at least once (Supplementary Table 1).

Ancient-DNA data process and quality controls

The raw sequenced read data (fastq files) were processed through a nf-core/eager v.2.3.2 pipeline43 (https://nf-co.re/eager). To remove adaptors and short reads of less than 30 base pairs, AdapterRemoval v.2.3.1 was used44. The reads were then mapped to the Human Reference Genome Hs37d5 using the bwa v0.7.17 aln/samse alignment algorithm45 with the parameters -n and -l set to 0.01 and 1,024, respectively. The reads with phred mapping quality of less than 30 were then discarded using -q (q30-reads) in Samtools v1.9 (ref. 46). We then used the Picard tools MarkDuplicates function (https://github.com/broadinstitute/picard) to remove PCR duplicates. To estimate the amount of cytosine-to-thymine taphonomic deamination at the ends of the mapped fragments, we used mapDamage v.2.0 (ref. 47) run on a subset of 100,000 q30 reads. Exogenous human autosomal DNA contamination was estimated in male individuals by assessing X-chromosome heterozygosity levels using ANGSD v.0.910 (ref. 48) and mtDNA contamination in male and female individuals was estimated using Schmutzi49. Schmutzi was also used to reconstruct the consensus mitochondrial genome sequence of each individual used as input for HaploGrep2 (ref. 50) to assign mitochondrial haplogroups. For the purpose of graphical representation in Extended Data Fig. 4, all the mitochondrial haplogroups were pruned to the first three characters. If two individuals had, respectively, a two- and three-characters resolution, both of their haplogroups were trimmed to the first two characters. Individuals with only a one-character resolution were excluded from the plot.

Y-chromosome haplogroups were inferred using two different methodologies and the results compared. The Y-chromosome variants were called from in the bam files from samples whose genetic sex was estimated to be male or unassigned using the Samtools v1.946 mpileup and PileupCaller (https://github.com/stschiff/sequenceTools) using the mode –majorityCall; Y-chromosome haplogroup assignment was performed using the software yHaplo (https://github.com/23andMe/yhaplo), with ISOGG panel v.11.349 as a reference (https://isogg.org/tree/; date of access: 2 February 2023). Y-chromosome haplogroups were also defined using the Y-Lineage-Tracker subcommand ‘classify’51, using as a reference panel the ISOGG Y-haplogroup tree v.15.73 (https://isogg.org/tree/); in this case the input files were genotypes from each individual, estimated using the allelePresence method from the ATLAS (https://bitbucket.org/wegmannlab/atlas/)52 call tool, accounting for post-mortem damage patterns and base-score recalibration patterns, estimated respectively with the ATLAS tools PMD and recal.

The results from the two methodologies were then compared, taking into account the differences between the two reference panels. In cases where the two methodologies yielded deeply diverging results (that is, to the first two ISOGG alphanumeric classification symbols) or were discordant with the estimated reciprocal genetic relatedness between individuals (described in the Biological relatedness section), the haplogroup assessment was further investigated using the software pathPhynder53 with default options, using as reference the BigTree Y-chromosome dataset and the reference phylogenetic tree for sample placement provided by GitHub with the software and as input files the bam files filtered for phred mapping quality more than 30. In any other case, the conservative results from Y-LineageTracker (the column Key haplogroup) were considered reliable, given the more-stringent estimation of the genotypes and the updated ISOGG Y-chromosome phylogenetic tree version.

The results of the whole procedure can be found in Supplementary Table 1. PileupCaller (https://github.com/stschiff/sequenceTools) was used to carry out genotype calling from the q30 reads with the –randomHaploid flag that calls haploid genotypes by randomly choosing one high-quality base (phred base quality score ≥30) on the 1240k panel (pseudodiploid calls). We also used the –singleStrandMode, which removes only real cytosine-to-thymine deamination observed with single-stranded DNA libraries by ignoring cytosine–thymine polymorphisms at reads aligning to the forward strand and guanine–adenine polymorphisms at reads aligning to the reverse strand.

To produce the Y-chromosome haplogroup plots in Extended Data Fig. 4, all the haplogroup nomenclature was pruned to the first three characters; haplogroups with less than three characters of ISOGG notation were excluded from the plots. Complete Y-chromosome haplogroups can be found in Supplementary Table 1.

We found low mitochondrial contamination estimates (Supplementary Table 1). Most were less than 5% and only five samples had values between 5% and 10%. Of these we excluded one female individual (RKF048) with 7% contamination and one individual (KFJ019) with 5% contamination and ambiguous sex determination (an indirect sign of possible contamination); the remaining male individuals had low nuclear contamination and were therefore kept for nuclear genomic analyses. We also found low nuclear contamination estimates among the male individuals. We excluded four further individuals with values of more than 7%; RKF094 (15% contamination) was still counted among the related as showing high likelihood of close genetic relatedness with other individuals (Supplementary Table 4). We also excluded individuals with particularly low coverage (more than 20,000 SNPs) because they were not practically usable for further analyses (additional filtering for higher coverage thresholds is detailed for specific analyses in the following sections); these include two individuals also excluded for contamination and another 15 individuals still included as showing high likelihoods of close relatedness (RKF225, HNJ005, HNJ009 and RKF128). We kept 419 individuals for further analyses, 413 excluding one pair among the identical pairs found, and 424 including the previously published individuals from the KUP and KFJ sites10 (Supplementary Table 1). We then merged them with a reference genome-wide panel of 2,280 modern individuals genotyped with microarray technology using the commercial HumanOrigins chip54,55,56 and previously published ancient-individuals’ genotypes sequenced with the same 1240k capture method or a 1240k SNPs subset from data obtained using whole-genome shotgun sequencing10,27,54,55,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74 downloaded from Poseidon (https://poseidon-framework.github.io). We produced two datasets, one including the modern data and the SNPs overlap between the 1240k sites and the HumanOrigins SNP chip (1240KHO dataset, around 600,000 SNPs), and one with ancient data and the whole 1240k panel (the 1240k dataset).

Genomic ancestry modelling with PCA, qpWave/qpAdm, DATES

We used principal component analysis (PCA) with smartpca v.16000 in the EIGENSOFT v.6.0.1 package75 on the 1240KHO dataset using the lsqproject and the autoshrink parameters to project the genotypes of the ancient individuals (containing variable amounts of missing data) on top of the principal components calculated on the set of modern worldwide populations. For one PCA (Fig. 4a) we used a subset of Eurasian populations (the Eurasian PCA) as originally in reference54 adapted as in reference27, and for another PCA (Extended Data Fig. 8b) we used a standard subset of only west Eurasian populations (the west Eurasian PCA), as originally reported76 and then adapted10.

We used the software qpWave/qpAdm (v.1520) of the ADMIXTOOLS package56 to run the f4-statistics-based ancestry analyses on the 1240k dataset41,77. Standard errors for the computed f-statistics were estimated using a block jack-knife with a 5-cM block. We used the default allsnps: NO parameter, thereby calculating all the underlying f4-statistics using the SNP overlap between all the groups for each test. We used a set of outgroups (or right populations) that are similar to those of a previous study10 that included representatives of ancient Eurasian lineages (European Mesolithic hunter-gatherers, European/Anatolia Neolithic, Levant Neolithic, Iranian Neolithic for western Eurasia, and ancient North Eurasian lineage (ANE76), ANA, ancient Siberian and southern East Asia for eastern Eurasia, and key non-Eurasian ones (African, South Asian, Native American) when available, otherwise hte present-day proxies Mbuti.DG, Levant_N, Onge.DG, Iran_N, Iron_Gates_HG, EHG, Mixe.DG, Anatolia_N, DevilsCave_N.SG, Tarim_EMBA1, Kolyma_M.SG and YR_LN. The only difference with respect to ref. 10 is that we used Tarim_EMBA1 (ref. 72) instead of the three Russia_Bolshoy individuals65, which is a higher-coverage dataset of 12 individuals and a better representative of the ANE lineage73 than any other high-ANE ancestry group available in the literature.

To select the sources (or left populations) to model the admixed ancestry of our newly sequenced individuals (the targets), we followed the following rationale. Among the data available from previous studies, we selected only ancient populations (of more than two individuals) that are either approximately contemporaneous or temporally preceding but are as close as possible to the time period of our target individuals, as suggested previously78. In our selection, we also considered the findings from a previous genomic study of the Avar period10, as well as populations that are geographically, historically and archaeologically relevant. This led to a selection of 13 different source groups falling in 3 categories. (1) Sources representative of the east Eurasian Steppe ancestry that include ancient populations and cultures available from preceding time periods in the east Eurasian Steppe and surrounding areas in east Asia. (2) ‘Pre-Avar’ populations that are found in the Carpathian Basin in the first centuries ad, before the Avar period. (3) Relevant temporally preceding (first millennia bc and ad) populations available from across the Pontic- and central Asian Steppe (the ‘steppe’ sources).

Two- and three-way combinations of these sources led to a total of 190 different combinations being tested, all with qpWave P-values of much less than 0.05, which means that the sources are sufficiently differentiated with respect to the set of outgroups. They are therefore suitable sources to be tested76 (Supplementary Table 5), applying the following rationale, which is the same as that used in a previous study10 based on suggestions discussed previously78. We first tested two-way admixing sources using all combinations of eastern Eurasian Steppe groups plus the pre-Avar and steppe sources. If we could reject one but not the other, between the pre-Avar and steppe source models (if one had P < 0.05 we can reject; if the other had P  >  0.05 we cannot reject), we considered the one we cannot reject (P  >  0.05) as valid. If the two-way models did not significantly reject one or the other between the pre-Avar and steppe sources (both with P  >  0.05) or produced no fitting results at all (both with P   < 0.05), we proceeded by testing three-way competitive models, including the eastern Eurasian populations and contrasting directly the pre-Avar plus steppe sources as well as pre-Avar plus pre-Avar, accounting for the variability in ancestry and time period between the pre-Avar populations.

If the three-way models resulted in one of the two contrasting sources between pre-Avar plus steppe resetting the other (bringing its estimated admixture proportion to 0%), we considered these models. If the contrasting sources had intermediate admixture proportions, we considered as successful only those tests that could reject one of the two scenarios between either pre-Avar plus steppe or pre-Avar plus pre-Avar. The individuals who still had unresolved or non-fitting models between a pre-Avar or a steppe source were considered unsolved or failed and were not used for further meta-analyses or interpretations.

For the sake of simplicity and consistency, we chose one eastern Eurasian source to include in our plots and summary statistics: the genetically easternmost group of individuals from the early Avar period in the DTI region (DTI_EA_East; Fig. 4 and previously published10), to which we added data from unrelated individuals at the early-period site of KUP that presented the same genomic profile (Supplementary Fig. 12). We always used this eastern proxy, except in the few instances in which it did not produce fitting models, in favour of another one, suggesting an existing heterogeneity in the eastern component although much reduced with respect to the variability in the western sources (Supplementary Table 5). Nevertheless, it is important to note that although DTI_EA_East is the source that overall produced more fitting models, several other eastern sources (including lateXiongnu, AR_Xianbei_P_2c) resulted in many equally fitting models as well (Supplementary Table 5).

We used DATES v.753 (https://github.com/priyamoorjani/DATES) to date the average time of the east–west Eurasian ancestry admixture estimated for most of the Avar period individuals from the four sites analysed. This method is based on the same principle as many admixture dating methods70,79; it assumes an admixture event between two admixing source populations, an east Asian and a west Eurasian one; in our case we used the unadmixed and high-SNP-covered LBA/IA group of the Ulaanzuukh_SlabGrave in Mongolia63 or the same DTI_EA_East group used in the main qpAdm models as an ANA proxy and the pre-Avar Carpathian Basin ancient sources, Sarmatian10 and Longobard period58 individuals, as a west Eurasian ancestry proxy. DATES calculates the decay of ancestry covariance coefficients between every pair of available overlapping SNPs between the test individuals and the source populations over increasing-genetic-distance windows70. Population-genetic theory suggests that if admixture happens, an exponential function can be fitted to the decay of weighted ancestry covariance, and the number of generations since admixture can be derived from the parameters of such functions79. The higher age limit of admixture events that would still produce detectable decays is theoretically considered to be around 4,000 years80. In practice, recent admixture events (about one to three generations ago) are not properly detected because chromosomal recombination had insufficient generation time to start producing the expected decay pattern81,82. To estimate the goodness of a fit, DATES calculates standard errors and Z-scores using a jack-knife approach, dropping a chromosome at a time. We set a maximum distance parameter of 0.5 cM, a bin size of 0.001 and a starting genetic distance of 0.45 cM. The integrated least-square function was used to estimate the number of generations since admixture parameter. If the raw data show no decay, the exponential function either cannot be fitted or is fitted with low Z-scores, much less than 2, and unreasonable dating estimates with negative values, or large numbers over the theoretical maximum of 4,000 years back in time. All samples showing such values were also inferred as non-admixed by PCA and qpAdm and were excluded from our inferences. For Extended Data Fig. 9, we also included dates with Z-scores of less than 2 (shown with a transparency factor) because in part they reflect the recent (for example, first or second generation) admixture events that we can observe directly in the pedigrees. These DATES estimates are mostly not significant because there is no decay pattern yet to fit an exponential function, but some still provide qualitatively correct recent admixture dates (Supplementary Table 1). We used a standard of 29 years per generation70 to convert the generation times in years since admixture, and used the Avar-period chronological phase of the individuals as the date at death.

Biological relatedness

We used KIN16 as the primary method to assess biological relatedness between each pair of individuals from the four sites we investigated, although we validated the relatedness estimates with the independent methods of haplotype-IBD (detailed below) and BREAD (https://github.com/jonotuke/BREADR) (Supplementary Information). Given that single-stranded UDG-half treated libraries still preserve a roughly 10–30% proportion of C-to-T deamination at the last two base pairs of the mapped fragments, for this analyses we masked two base pairs at both ends of the q30 reads using the trimBam module of bamUtil v.1.0.13 (ref. 83) and used these masked bam files as input data. KIN can confidently identify first- and second-degree relations while differentiating between parent–child and sibling relations16. Although the method does not explicitly differentiate relationships within the second degree, it outputs information about IBD sharing that can help to differentiate between avuncular and grandparent–grandchild relationships. We simulated avuncular, half-sibling and grandparent–grandchild pairs (Supplementary Information) to show that the length of IBD segments and the number of IBD segments can be used to differentiate between avuncular and grandparent–grandchild relationships, while half siblings overlap with both cases. Furthermore, KIN provides indications about third-degree relationships (with around 70% accuracy at 4× sequence coverage). Although these analyses are not sufficient to confidently identify within second-degree relationships, and may lack the power to identify third-degree relatives, they can be crucial when combined with other information, such as pedigree information from different pairs as well as from information about the skeletal age at death, the sex and the uniparental haplogroups (Y chromosome and mtDNA). Therefore, all this information was considered when building and cross-checking the pedigrees of biological relatedness (Supplementary Information). For clarity, we numbered the pedigrees that we found and we define one pedigree as a group of individuals who can be directly connected with close genetic relatedness and for whom a line of descent can be traced. In the case of the largest pedigree we reconstructed (146 individuals from RK), we divided it into five pedigrees descending from five different groups of 11 ‘founder male individuals’ (including multiple brothers as co-founders).

Simulations on second-degree relationships

We followed the methods section for KIN16 and simulated eight diploid individuals using msprime84 with default parameters for the mutation rate (1 × 10–8 per base per generation), the recombination rate r (1 × 10–8 per base per generation) and an effective population size of 3,000. For each individual, we simulated 22 chromosomes with the same lengths as the GRCh38.p14 genome. To form a pedigree, we first simulated a recombined set of chromosomes for each parent and combined them to create the progeny. We obtained recombination points for each chromosome from the software Ped-sim85. We matched the genotype density and the coverage of reads to that of our samples. We simulated 60 such pedigrees (see figure S9 in ref. 16 and Supplementary Figs. 17 and 18).

Consanguinity test (ROHs)

Consanguinity can be tested genetically by a straightforward approach: counting the length and number of long stretches of homozygous portions along the genome of an individual. This analysis is usually defined as ROHs. To estimate ROH, we applied a method called hapROH86 that was designed to infer them on pseudo-haploid, lower-coverage and higher-missing-data ancient DNA samples; the method has also been shown empirically to be highly consistent with independent ROH estimates calculated on the same ancient imputed diploid genomes10. Specific patterns of long ROH (more than 4 cM) along the genome of an individual are typical of consanguineous unions between some of its recent ancestors (up to second-degree cousins86). In Extended Data Fig. 5 we plotted ROH using the python package implemented in hapROH (https://pypi.org/project/hapROH/).

Genotype likelihood calls and imputation/phasing

Haplotype-based analyses (such as IBD described below) require information of the phase for each pair of paternal and maternal chromosomes of an individual, and this in turn requires there to be virtually no missing data along the genome. Obtaining such data from ancient genomes has been shown by recent studies87,88 to be reliable in other similar contexts for coverage of more than 0.5–0.7×, and it has also been applied to 1240k capture data10,89 through simultaneous statistical imputation and phasing. We used the ancient-DNA-specific genotype caller MLE function of ATLAS (https://bitbucket.org/wegmannlab/atlas/)52 to call genotype likelihoods. ATLAS can also calculate the base-quality recalibration (the recal function) that we performed in batches among libraries sequenced in the same sequencing run, accounting for specific sequencing errors. ATLAS recalibration also corrects the base qualities accounting for the empirical ancient DNA-damage pattern observed from the data and reduces the effect of reference bias introduced by genome mapping by relying on a list of 10 million highly conserved genomic positions across 88 mammal species downloaded from ensembl (https://grch37.ensembl.org/). We called genotype likelihoods on the whole 1,000-genomes SNPs panel of around 20 million SNPs and used these calls as input data for imputation with GLIMPSE90, for which we used the phased 1,000 genomes phase-3 release data as reference haplotypes91. We ran GLIMPSE with the default parameters using sex-averaged genetic maps from HapMap, as suggested previously88. The function GLIMPSE_phase was used to perform simultaneous imputation and phasing on genomic chunks of 2,000,000 base pairs with a buffer of 200,000 base pairs. We then used the integrated GLIMPSE_ligate and GLIMPSE_sample functions and bcftools v1.3 (refs. 88,92) to obtain the final phase/imputed vcf files with the genotypes posterior probabilities at every 1240k position.

Haplotype IBD sharing analysis

We performed haplotype IBD analysis with ancIBD, a recently developed method that accounts for the high phasing errors of ancient DNA93. This analysis searches for long haploid blocks along the genomes of two individuals that are identical by descent (IBD), meaning they have been inherited by a common ancestor at some time in the past. Therefore, it can detect close genetic relatives (first to third degrees of relation) as KIN does, but it can also detect more-distant relations, up to sixth degree, within ranges of biological stochasticity85. However, it requires a much higher threshold of coverage, reducing the number of individuals analysed relative to KIN. We used imputed or phased data, including only those individuals with more than 450,000 SNPs obtained with our pseudo-haploid calls and SNPs with genotype posterior probabilities greater than 0.99 after imputation. We used the HapBLOCK function of ancIBD to perform the pairwise estimation with default parameters and only shared blocks of more than 8 cM containing more than 220 SNPs per centimorgan were considered. To further filter for possible false-positive hits, we considered only shared IBD segments longer than 12 cM, and if a pair of individuals had segments of less than 16 cM, we included them only if they had more than one such segment (Supplementary Table 4). We used Cytoscape v.3.9.1 (ref. 94) to plot the networks of pairwise IBD relations.

Network analysis

For the IBD network analysis, only the Avar-period individuals were included. Because subadult individuals might be a confounding factor when assessing the sex-specific patterns of mobility and connectedness, we made an additional network that included only adults. The threshold of adulthood was set at 18 years of age, based on the lower limit of the estimated age of the youngest parent. The entire network consisted of 257 nodes, of which 195 represented adults (105 male and 90 female individuals) and 62 subadults (35 male and 27 female individuals) from four archaeological sites. The links of the network are represented by the IBD connections, which number 2,658 if the entire network is considered and 1,211 if only the adults are selected (Supplementary Table 4). In our analysis, we considered both unweighted and weighted networks. The unweighted network represents a configuration in which the found IBD relations define the presence or absence of links irrespective of their values. However, in the weighted network, the links are weighted by the maximum IBD values of the analysis, allowing the magnitude of relatedness to be evaluated. Both networks are undirected because sharing of IBD segments between two individuals has no directionality.

Degree centrality (k) is defined as the number of links held by the node. The average degree k of the Avar-period adults’ network is 18.07. Considering the assigned weights on the links, which in our case is the sum of the weights (max_IBD) of the links attached to each node, the mean strength w is 1,620.54. When sex is considered as a node attribute, the degree and the strength distributions are significantly different between male and female individuals (Fig. 3c and Supplementary Figs. 47 and 48). For male individuals, k is 27.39 and w is 2,392.37, whereas for female individuals, k is 7.21 and w is 720.08. The two-sample Kolmogorov–Smirnov test revealed significant differences between the male and female individuals’ degree and strength distribution (P < 0.05).

The degree centrality of a node can be partitioned into within-module (kW) and between-module (kB) links by considering the archaeological site of the burial as a module. The kB/k ratio represents the ratio of between-module connections over the total connections, which can range between 0 and 1, with 0 indicating that related individuals are buried solely at the same site and 1 indicating that related individuals are buried only at a different site. To evaluate this ratio, the value of degree centrality must also be considered because individuals with small degree centrality may have a higher kB/k ratio. The other results of the analysis are explained in Supplementary Figs. 4448. The analysis was performed using R and the node measurements were calculated using customized R scripts with the igraph package95.

Isotope analysis and 14C dating

14C dating and isotope analysis (δ13C, δ15N) was performed in the same bone material in the isotope and radiocarbon laboratories at the Curt Engelhorn Centre Archaeometry in Mannheim, Germany. Bone samples were cleaned, chemically treated and collagen extracted using a modified Longin method96. For stable isotope analysis of carbon and nitrogen, triplicates of the resulting collagen were combusted in an elemental analyser (PYROcube, Elementar) and isotopic ratios were measured by isotope ratio mass spectrometry (precisION, Elementar). The same collagen extract was used for 14C dating. After ultrafiltration to remove short-chained macromolecules, the collagen was reduced to graphite using either a commercially available system (AGE3, IonPlus) or a custom-made system. A MICADAS-type accelerator mass spectrometer (IonPlus) was used to determine the conventional 14C ages97. 14C dates were modelled in the software Oxcal v.4.4.4 (ref. 98) and terrestrial samples were calibrated using IntCal20 (ref. 99). Bayesian modelling of 14C dates include prior information of relative chronological information provided by pedigrees following methods outlined previously24. Model results and detailed explanations are given in Supplementary Tables 2 and 3 and Supplementary Information.

For all strontium measurements, the tooth enamel was extracted in a laboratory at the Institute of Archaeogenomics in Budapest. The surface of the teeth was cleaned by a Dremel tool with an abrasion tip, then, after a ten-minute ultrasonic bath, the enamel was carefully powdered with a diamond-coated dental drill bit attached to the Dremel tool, until 25–50 mg was obtained. Strontium separation chemistry for all samples followed a previous method100. Analyses were performed on a Nu Instruments NuPlasma HR at the MC-ICP-MS facility in the Department of Geological Sciences at the University of Cape Town in Rondebosch, South Africa, and followed the procedure and referencing values (SRM987 87Sr/86Sr of 0.710255) described previously101. Past 4.11 software102 was used for the statistical analysis of the isotope data.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Source link