Peanut Genome Assembly | PeanutBase(Legacy)

This page describes the high-quality genome assembly for peanut (Arachis hypogaea), cultivar "Tifrunner", which was released in December 2017 (Bertioli et al., 2019), and the improved version 2.0. An explanation of the reason for the corrected version and a list of differences is available here.

IMPORTANT NOTE: the version 2.0 assembly (gnm2) for Arachis hypogaea cv. Tifrunner has not been submitted to GenBank.

Tifrunner is an important U.S. variety, with good market and growth characteristics and resistance to several peanut diseases (early and late leaf spot and TSWV/spotted wilt).

The assembly is a project of the International Peanut Genome Initiative (IPGI), which aims to accelerate breeding progress and get more productive, disease-resistant, stress-tolerant varieties to farmers. The IPGI project has sequenced the genomes of the two diploid progenitors of cultivated peanut, as well as the genome of cultivated peanut itself.

Cultivated peanut
Version 2 (gnm2): Arachis hypogaea cv. Tifrunner: assembly, annotation (gnm2.ann1)
Genome browser: GBrowse
Version 1 (gnm1): Arachis hypogaea cv. Tifrunner: assembly, annotation (gnm1.ann1)
Genome browser: GBrowse and JBrowse
GenBank: assembly GCA_003086295.2

Changes between the genome assemblies version 1 and version 2.
Note: The gene models on gnm2.ann1 are the result of liftover, not a new annotation procedure; i.e., the gene model structures and base names stay the same, though some coordinates changed and a few gene models were duplicated.

Download folders for Arachis hypogaea at data store (includes genomes and other available data types).

Diploid progenitors
Arachis duranensis
assembly (gnm1), annotation (gnm1.ann1)
Genome browser: GBrowse and JBrowse
Download folders for Arachis duranensis at data store (includes genomes and other available data types).
GenBank: Assembly GCF_000817695.2

Arachis ipaensis
Version 2 (gnm2): assembly (gnm2). This is a high-quality chromosome-scale genome sequence, based on PacBio RSII and PacBio Sequel read data, with ordering into pseudomolecules by Dovetail using Hi-C Chicago libraries and the HiRise and SNAP software. See more information about the assembly in the assembly directory. (No annotation is available yet.)
Genome browser: GBrowse and JBrowse
Version 1 (gnm1): assembly (gnm1), annotation (gnm1.ann1)
Genome browser: GBrowse and JBrowse
Download folders for Arachis ipaensis at data store (includes genomes and other available data types).
GenBank: Assembly GCF_000816755.2

Additional details about the A. hypogaea assembly:
The assembly size is 2,556 Mbp, which we estimate to span more than 99% of the actual genome. The scaffold N50 (a measure of the assembly contiguity) is 135.2 Mbp (the scale of the complete peanut chromosomes). A total of 48.25x of PacBio sequence (average read length of 11,525 bp) was used to generate the initial assembly, which was subsequently polished using Illumina sequences and Arrow. Homozygous SNPs and indels were corrected in the release sequence using ~40x of Illumina reads (2x250, 800 bp insert, library IDs ICIH and ICID). Synteny with the diploid A. duranensis and A. ipaensis, along with 1 genetic map and 2 synthetic maps (provided by David Bertioli), was used to identify misjoins in the raw assembly. The resulting assembly was then scaffolded using Hi-C data. After scaffolding, 6 additional breaks were made to resolve misjoins introduced during the scaffolding procedure.
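For readers unfamiliar with the N50 statistic quoted above, the following is a minimal sketch of how it is commonly computed: sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches half of the assembled bases. The function name `n50` and the toy lengths are illustrative assumptions, not part of the PeanutBase pipeline or the actual Tifrunner scaffold data.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable: the running total always reaches half")


# Toy scaffold lengths (hypothetical, not real assembly data).
toy_lengths = [80, 70, 50, 40, 30, 20]
toy_n50 = n50(toy_lengths)
```

In the toy example the total is 290, and the two longest scaffolds (80 + 70 = 150) already cover half of it, so the N50 is 70. An N50 close to chromosome length, as reported for this assembly, indicates that most of the sequence sits in chromosome-scale scaffolds.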

The original sequences were combined with the duplicated tetrasomic regions and joined together using 26 joins to create the 20 A. hypogaea chromosomes. During the construction of the chromosomes, all 500 bp scaffolded gaps were converted to 1,000 bp gaps, and the map joins that were added consisted of 10,000 bp gaps. Chromosomes were numbered as Arahy.01-Arahy.20, where the A genome is represented as Arahy.01-Arahy.10 and the B genome is represented as Arahy.11-Arahy.20. The chromosomes contain 99.3% of the assembled sequence.
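The naming and gap conventions described above lend themselves to a short sketch. The helper names `subgenome` and `gap_kinds` are hypothetical. The code also assumes that chromosome identifiers appear exactly as `Arahy.NN` and that gaps are stored as runs of `N` characters, which is a common FASTA convention but is not stated on this page. Identifiers in the distributed files may carry additional prefixes.

```python
import re
from collections import Counter

# Assumed identifier form: "Arahy." followed by a two-digit number.
CHROM_RE = re.compile(r"^Arahy\.(\d{2})$")
# Assumed gap representation: runs of N (either case).
GAP_RE = re.compile(r"[Nn]+")


def subgenome(chrom_id: str) -> str:
    """Map Arahy.01-Arahy.10 to 'A' and Arahy.11-Arahy.20 to 'B'."""
    match = CHROM_RE.match(chrom_id)
    if match is None:
        raise ValueError(f"unrecognized chromosome identifier: {chrom_id!r}")
    number = int(match.group(1))
    if 1 <= number <= 10:
        return "A"
    if 11 <= number <= 20:
        return "B"
    raise ValueError(f"chromosome number out of range: {chrom_id!r}")


def gap_kinds(sequence: str) -> Counter:
    """Count gaps by the lengths described in the text.

    1,000 bp runs are counted as scaffold gaps, 10,000 bp runs as
    map joins, and any other run length as 'other'.
    """
    kinds = Counter()
    for match in GAP_RE.finditer(sequence):
        length = match.end() - match.start()
        if length == 1000:
            kinds["scaffold_gap"] += 1
        elif length == 10000:
            kinds["map_join"] += 1
        else:
            kinds["other"] += 1
    return kinds


# Toy sequence (hypothetical) with one gap of each kind.
toy_sequence = "ACGT" + "N" * 1000 + "ACGT" + "N" * 10000 + "AC" + "N" * 5 + "G"
toy_gaps = gap_kinds(toy_sequence)
```

Here `subgenome("Arahy.07")` gives `"A"` and `subgenome("Arahy.15")` gives `"B"`, while `toy_gaps` records one gap in each category. Classifying gaps by exact length works only because the assembly uses fixed gap sizes; it would not distinguish gap types in an assembly with estimated gap lengths.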