Computation of Large-Scale Genomic Evaluations

Contents

Slide 2

Early genomic theory

Nejati-Javaremi et al. (1997) tested use of a genomic relationship matrix in BLUP
Meuwissen et al. (2001) tested linear and nonlinear estimation of haplotype effects
Both studies assumed that few (<1,000) markers could explain all genetic variance (no polygenic effects in model)
Polygenic variance was only 5% with 50,000 SNP (VanRaden, 2008), but 50% with 1,000

Slide 3

Multi-step genomic evaluations

Traditional evaluations computed first and used as input data to genomic equations
Allele effects estimated for 45,187 markers by multiple regression, assuming equal prior variance
Polygenic effect estimated for genetic variation not captured by markers, assuming pedigree covariance
Selection index step combines genomic info with traditional info from non-genotyped parents
Applied to 30 yield, fitness, calving and type traits
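
The allele-effect step can be sketched as ridge regression: with equal prior variance for every marker, the equations are (Z'Z + λI) a = Z'y, where λ is the ratio of residual to marker variance. The following is a minimal sketch under stated assumptions, not the original software: the data are toy values, λ = 1 is hypothetical, the polygenic effect is omitted, and the function name `snp_effects` is made up for illustration. Gauss-Seidel iteration is used only because it needs no matrix inverse.

```python
def snp_effects(Z, y, lam, rounds=500):
    """Solve (Z'Z + lam*I) a = Z'y by Gauss-Seidel iteration.

    Z[i][j] is the centered genotype of animal i at marker j, y[i] is the
    input evaluation of animal i, lam is the assumed variance ratio.
    """
    n, m = len(Z), len(Z[0])
    # Left-hand side and right-hand side of the ridge-regression equations.
    lhs = [[sum(Z[i][j] * Z[i][k] for i in range(n)) + (lam if j == k else 0.0)
            for k in range(m)] for j in range(m)]
    rhs = [sum(Z[i][j] * y[i] for i in range(n)) for j in range(m)]
    a = [0.0] * m
    for _ in range(rounds):
        for j in range(m):
            # Update one marker effect using the latest values of the others.
            off_diag = sum(lhs[j][k] * a[k] for k in range(m) if k != j)
            a[j] = (rhs[j] - off_diag) / lhs[j][j]
    return a, lhs, rhs

# Toy data (hypothetical): 4 animals, 3 markers, genotypes centered to -1, 0, 1.
Z = [[1, 0, -1], [0, 1, 1], [-1, 1, 0], [1, -1, 0]]
y = [2.0, -1.0, -2.5, 3.0]
effects, lhs, rhs = snp_effects(Z, y, lam=1.0)

# Genomic prediction for each animal is the sum of its allele effects.
gebv = [sum(z * a for z, a in zip(row, effects)) for row in Z]
```

With 45,187 markers the same equations are far too large to form densely, which is one reason production programs iterate on the data instead.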

Slide 4

Single-step genomic evaluation

Benefits of 1-step genomic evaluation
Account for genomic pre-selection
Expected Mendelian Sampling ≠ 0
Improve accuracy and reduce bias
Include many genotyped animals
Redesign animal model software used since 1989

Slide 5

Pedigree: Parents, Grandparents, etc.

Slide 6

O-Style Haplotypes chromosome 15

Slide 7

Expected Relationship Matrix¹

1HO9167 O-Style

¹Calculated assuming that all grandparents are unrelated

Slide 8

Pedigree Relationship Matrix

1HO9167 O-Style
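
Both the expected and the pedigree relationship matrices can be computed with the standard tabular method; they differ only in how far back the pedigree is traced. The sketch below is illustrative only: the seven-animal pedigree is hypothetical (four unrelated grandparents, two parents, one bull) and is not O-Style's actual pedigree.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) index pairs, parents listed before
    offspring, None for an unknown parent.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding, i.e. half the relationship between parents.
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of j with the parents of i.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
    return A

# Hypothetical pedigree: 0-3 unrelated grandparents, 4 and 5 parents, 6 the bull.
ped = [(None, None)] * 4 + [(0, 1), (2, 3), (4, 5)]
A = relationship_matrix(ped)
```

Under the unrelated-grandparent assumption the bull is related 0.5 to each parent and 0.25 to each grandparent; a deeper pedigree changes these values only through common ancestors.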

Slide 9

Genomic Relationship Matrix

1HO9167 O-Style

Slide 10

Difference (Genomic – Pedigree)

1HO9167 O-Style

Slide 11

Pseudocolor Plots - O-Style

Slide 12

1-Step Equations

Model: y = X b + W u + e
+ other random effects not shown

$$
\begin{bmatrix}
X' R^{-1} X & X' R^{-1} W \\
W' R^{-1} X & W' R^{-1} W + H^{-1} k
\end{bmatrix}
\begin{bmatrix} b \\ u \end{bmatrix}
=
\begin{bmatrix} X' R^{-1} y \\ W' R^{-1} y \end{bmatrix}
$$

$$
H^{-1} = A^{-1} +
\begin{bmatrix} 0 & 0 \\ 0 & G^{-1} - A_{22}^{-1} \end{bmatrix}
$$

Size of G and A22 >300,000 and doubling each year
Size of A is 60 million animals

Aguilar et al., 2010
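
The structure of H⁻¹ can be illustrated on a tiny example. This is a sketch under stated assumptions: the five-animal pedigree (two non-genotyped parents followed by three genotyped full sibs) and the values in G are hypothetical, dense NumPy inverses stand in for the sparse methods a real program would need, and `h_inverse` is an illustrative name.

```python
import numpy as np

def h_inverse(A, G, n_geno):
    """H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]], genotyped animals ordered last."""
    Hinv = np.linalg.inv(A)
    A22 = A[-n_geno:, -n_geno:]
    # Only the block for genotyped animals differs from A^-1.
    Hinv[-n_geno:, -n_geno:] += np.linalg.inv(G) - np.linalg.inv(A22)
    return Hinv

# Two unrelated, non-genotyped parents and their three genotyped full sibs.
A = np.array([[1.0, 0.0, 0.5, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5, 0.5],
              [0.5, 0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 0.5, 1.0]])
# Hypothetical genomic relationships among the three genotyped sibs.
G = np.array([[1.02, 0.55, 0.42],
              [0.55, 0.98, 0.50],
              [0.42, 0.50, 1.00]])

Hinv = h_inverse(A, G, n_geno=3)
H = np.linalg.inv(Hinv)  # the genotyped block of H equals G
```

Inverting back shows that the genotyped block of H is exactly G, while the rows of H⁻¹ for non-genotyped animals are unchanged from A⁻¹.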

Slide 13

Modified 1-Step Equations

To avoid inverses, add equations for γ, φ
Use math opposite of absorbing effects

$$
\begin{bmatrix}
X' R^{-1} X & X' R^{-1} W & 0 & 0 \\
W' R^{-1} X & W' R^{-1} W + A^{-1} k & Q & Q \\
0 & Q' & -G/k & 0 \\
0 & Q' & 0 & A_{22}/k
\end{bmatrix}
\begin{bmatrix} b \\ u \\ \gamma \\ \phi \end{bmatrix}
=
\begin{bmatrix} X' R^{-1} y \\ W' R^{-1} y \\ 0 \\ 0 \end{bmatrix}
$$

Iterate for γ using G = Z Z' / [2 Σ p(1-p)]
Iterate for φ using A22 multiply (Colleau)
Q' = [ 0 I ] (I for genotyped animals)

Legarra and Ducrocq, 2011
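
A small numerical check can show that the extra equations for γ and φ give the same solutions as the equations that use H⁻¹. The sketch below rests on several assumptions: toy genotypes for three animals, allele frequencies fixed at 0.5 (so that G is invertible for the comparison), R = I, one record per animal, and a hypothetical variance ratio k = 2. Dense solves replace the iteration a real program would use.

```python
import numpy as np

# Genotypes (0, 1, 2 copies of allele A) for 3 genotyped animals, 6 SNP (toy data).
M = np.array([[2, 2, 1, 0, 1, 2],
              [0, 1, 2, 2, 1, 1],
              [1, 0, 2, 1, 2, 0]], dtype=float)
p = np.full(6, 0.5)                      # assumed base allele frequencies
Z = M - 2 * p                            # centered genotypes
G = Z @ Z.T / (2 * np.sum(p * (1 - p)))  # G = Z Z' / [2 sum p(1-p)]

# Pedigree relationships: two non-genotyped parents, then 3 genotyped full sibs.
A = np.array([[1.0, 0.0, 0.5, 0.5, 0.5],
              [0.0, 1.0, 0.5, 0.5, 0.5],
              [0.5, 0.5, 1.0, 0.5, 0.5],
              [0.5, 0.5, 0.5, 1.0, 0.5],
              [0.5, 0.5, 0.5, 0.5, 1.0]])
A22, Ainv = A[2:, 2:], np.linalg.inv(A)
k = 2.0                                  # hypothetical variance ratio
y = np.array([10.0, 12.0, 9.0, 11.0, 14.0])
X = np.ones((5, 1))                      # overall mean only
W = np.eye(5)                            # one record per animal, R = I
Q = np.vstack([np.zeros((2, 3)), np.eye(3)])  # Q' = [0 I]

# Standard 1-step equations using H^-1.
Hinv = Ainv.copy()
Hinv[2:, 2:] += np.linalg.inv(G) - np.linalg.inv(A22)
lhs_h = np.block([[X.T @ X, X.T @ W], [W.T @ X, W.T @ W + Hinv * k]])
sol_h = np.linalg.solve(lhs_h, np.concatenate([X.T @ y, W.T @ y]))

# Modified equations: extra unknowns gamma and phi, no inverse of G or A22.
z13, z33 = np.zeros((1, 3)), np.zeros((3, 3))
lhs_m = np.block([
    [X.T @ X, X.T @ W,            z13,    z13],
    [W.T @ X, W.T @ W + Ainv * k, Q,      Q],
    [z13.T,   Q.T,                -G / k, z33],
    [z13.T,   Q.T,                z33,    A22 / k]])
rhs_m = np.concatenate([X.T @ y, W.T @ y, np.zeros(6)])
sol_m = np.linalg.solve(lhs_m, rhs_m)    # first 6 entries are b and u
```

Eliminating γ = k G⁻¹ Q'u and φ = -k A22⁻¹ Q'u from the second row of equations recovers the H⁻¹ form, which is why the first six solutions agree.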

Slide 14

Genomic Algorithms Tested

1-step genomic model
Add extra equations for γ and φ (Legarra and Ducrocq)
Converged OK for Jersey (JE), badly for Holstein (HO)
Extended to multi-trait (MT) using block diagonal
Invert 3x3 A⁻¹u, Gγ, -A22φ blocks? No
PCG iteration (hard to debug)? Maybe

Slide 15

Genomic Algorithms (continued)

Multi-step insertion of GEBV
[W'R⁻¹W + A⁻¹k] u = W'R⁻¹y (without G)
Previous studies added genomic information to W'R⁻¹W and W'R⁻¹y
Instead: insert GEBV into u, iterate
1-step genomic model using DYD
Solve SNP equations from DYD & YD
May converge faster, but approximate

Slide 16

Data for 1-Step Test

National U.S. Jersey data
4.4 million lactation phenotypes
4.1 million animals in pedigree
Multi-trait milk, fat, protein yields
5,364 male, 11,488 female genotypes
Deregressed MACE evaluations for 7,072 bulls with foreign daughters (foreign dams not yet included)

Slide 17

Jersey Results
New = 1-step GPTA milk, Old = multi-step GPTA milk

Slide 18

1-Step vs Multi-Step: Results

Multi-step regressions also improved by modified selection index weights
Data cutoff in August 2008

Slide 19

Computation Required

CPU time for 3-trait single-trait (ST) model
JE took 11 sec / round including G
HO took 1.6 min / round including G
JE needed ~1000 rounds (3 hours)
HO needed >5000 rounds (>5 days)
Memory required for HO
30 Gigabytes (256 available)
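
The totals follow directly from time per round multiplied by the number of rounds. A small worked check, using only the figures quoted on the slide (the helper name `total_hours` is illustrative):

```python
def total_hours(seconds_per_round, rounds):
    """Wall-clock hours for a fixed number of iteration rounds."""
    return seconds_per_round * rounds / 3600.0

je_hours = total_hours(11, 1000)            # JE: 11 sec / round, ~1000 rounds
ho_days = total_hours(1.6 * 60, 5000) / 24  # HO: 1.6 min / round, 5000 rounds
print(f"JE: {je_hours:.1f} hours, HO: {ho_days:.1f} days")
```

The JE run comes to about 3 hours, and 5000 HO rounds already exceed 5.5 days, so poor convergence costs far more than the slower rounds.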

Slide 20

Remaining Issues

Difficult to match G and A across breeds
Nonlinear model (Bayes A) possible with SNP effect algorithm
Interbull validation not designed for genomic models
MACE results may become biased

Slide 21

Steps to prepare genotypes

Nominate animal for genotyping
Collect blood, hair, semen, nasal swab, or ear punch
Blood may not be suitable for twins
Extract DNA at laboratory
Prepare DNA and apply to BeadChip
Do amplification and hybridization, 3-day process
Read red/green intensities from chip and call genotypes from clusters

Slide 22

Ancestor Validation and Discovery

Ancestor discovery can accurately confirm, correct, or discover parents and more distant ancestors for most dairy animals because most sires are genotyped.
Animal checked against all candidates
SNP test and haplotype test both used
Since December 2011, parents and maternal grandsires (MGS) have been suggested to breed associations and breeders to improve pedigrees.

Slide 23

Ancestor Discovery Results by Breed

*Confirmation = top MGS candidate matched true pedigree MGS.
†50K genotyped animals only.

Slide 24

Data (Yield and Health)

One step model includes:
72 million lactation phenotypes
50 million animals in pedigree
29 million permanent environment
7 million herd mgmt groups
11 million herd by sire interactions
7 traits: Milk, Fat, Protein, SCS, longevity, fertility
Genotypes not yet included

Slide 25

New Features Added

Model options now include:
Multi-trait models
Multiple class and regress variables
Suppress some factors / each trait
Random regressions
Foreign data
Parallel processing
Genomic information
Renumber factors in same program

Slide 26

Computation Required: Evaluation

CPU for all-breed model (7 traits)
ST: 4 min / round with 7 processors and ~1000 rounds (2.8 days)
MT: 15 min / round and ~1000 rounds
~200 rounds for updates using priors
Little extra cost to include foreign
Memory required
ST or MT: 32 Gbytes (256 available)

Slide 27

Computation Required: Imputation

Impute 636,967 markers for 103,070 animals
Required 10 hours with 6 processors (findhap)
Required 50 Gbytes memory
Program FImpute from U. Guelph slightly better
Impute 1 million markers on 1 chromosome (sequences) for 1,000 animals
Required 15 minutes with 6 processors
Required 4 Gbytes memory

Slide 28

Methods to Trace Inheritance

Few markers
Pedigree needed
Prob (paternal or maternal alleles inherited) computed within families
Many markers
Can find matching DNA segments without pedigree
Prob (haplotypes are identical) mostly near 0 or 1 if segments contain many markers

Slide 29

Haplotype Probabilities with Few Markers (12 SNP / chromosome)

Slide 30

Haplotype Probabilities with More Markers (50 SNP / chromosome)

Slide 31

Haplotyping Program: findhap.f90

Population haplotyping
Divide chromosomes into segments
List haplotypes by genotype match
Similar to FastPhase, IMPUTE, or long range phasing
Pedigree haplotyping
Look up parent or grandparent haplotypes
Detect crossovers, fix noninheritance
Impute nongenotyped ancestors

Slide 32

Coding of Alleles and Segments

Genotypes
0 = BB, 1 = AB or BA, 2 = AA, 5 = __ (missing)
Allele frequency used for missing
Haplotypes
0 = B, 1 = not known, 2 = A
Segment inheritance (example)
Son has haplotype numbers 5 and 8
Sire has haplotype numbers 8 and 21
Son got haplotype number 5 from dam
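
The segment-inheritance example can be expressed as a lookup on haplotype numbers. This is an illustrative sketch, not code from findhap.f90; the function name and the handling of ambiguous cases are assumptions.

```python
def parent_of_origin(child, sire):
    """Split a child's two haplotype numbers into (paternal, maternal).

    child and sire are pairs of haplotype numbers for one segment.
    Returns None when the sire alone cannot resolve the inheritance.
    """
    a, b = child
    a_in, b_in = a in sire, b in sire
    if a == b:
        # Homozygous segment: both copies are the same haplotype.
        return (a, b) if a_in else None
    if a_in and not b_in:
        return a, b
    if b_in and not a_in:
        return b, a
    return None  # both or neither match the sire: ambiguous or a conflict

# Example from the slide: son has 5 and 8, sire has 8 and 21.
paternal, maternal = parent_of_origin((5, 8), (8, 21))
```

Here the son shares only haplotype 8 with the sire, so haplotype 5 must have come from the dam.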

Slide 33

Population Haplotyping Steps

Put first genotype into haplotype list
Check next genotype against list
Do any homozygous loci conflict?
If haplotype conflicts, continue search
If match, fill any unknown SNP with homozygote
2nd haplotype = genotype minus 1st haplotype
Search for 2nd haplotype in rest of list
If no match in list, add to end of list
Sort list to put frequent haplotypes 1st
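
The steps above can be sketched as a single pass over the genotypes. This is a simplified illustration under stated assumptions, not the findhap.f90 implementation: it processes one segment, sorts only once at the end, and resumes the search for the 2nd haplotype at the matched entry so that a homozygous animal can reuse it. All function names are made up for the example.

```python
def conflicts(seq, hap):
    """True if two definite alleles (0 or 2) disagree at any locus."""
    return any(s != h for s, h in zip(seq, hap) if s in (0, 2) and h in (0, 2))

def fill_unknown(hap, seq):
    """Fill unknown alleles (1) of hap with definite codes from seq."""
    return [s if h == 1 and s in (0, 2) else h for h, s in zip(hap, seq)]

def subtract(geno, hap):
    """2nd haplotype = genotype minus 1st haplotype (1 where undetermined)."""
    return [g if g in (0, 2) else (2 - h if g == 1 and h in (0, 2) else 1)
            for g, h in zip(geno, hap)]

def build_haplotype_list(genotypes):
    haps, counts = [], []

    def record(seq, start):
        """Count the first compatible entry from start on, or append seq."""
        for i in range(start, len(haps)):
            if not conflicts(seq, haps[i]):
                haps[i] = fill_unknown(haps[i], seq)
                counts[i] += 1
                return i
        haps.append([1 if s == 5 else s for s in seq])  # missing -> unknown
        counts.append(1)
        return None

    for geno in genotypes:
        first = record(geno, 0)
        if first is not None:
            record(subtract(geno, haps[first]), first)
    # Put the most frequent haplotypes first.
    order = sorted(range(len(haps)), key=lambda i: -counts[i])
    return [haps[i] for i in order], [counts[i] for i in order]

# Toy segment of 5 loci: genotypes built from haplotypes A, B and C.
hap_a, hap_b, hap_c = [2, 2, 0, 0, 2], [0, 2, 2, 0, 0], [2, 0, 0, 2, 0]
genotypes = [[2, 2, 0, 0, 2],   # A + A
             [1, 2, 1, 0, 1],   # A + B
             [2, 1, 0, 1, 1],   # A + C
             [1, 1, 1, 1, 0]]   # B + C
haps, counts = build_haplotype_list(genotypes)
```

On this toy input the list recovers the three haplotypes used to build the genotypes. With very short segments or all-heterozygous genotypes the same pass can accept a wrong match, which is why segments need to contain many markers.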

Slide 34

Check New Genotype Against List
1st segment of chromosome 15

5.16% 022222222020020022002020200020000200202000022022222202220
4.37% 022020220202200020022022200002200200200000200222200002202
4.36% 022020022202200200022020220000220202200002200222200202220
3.67% 022020222020222002022022202020000202220000200002020002002
3.66% 022222222020222022020200220000020222202000002020220002022
3.65% 022020022202200200022020220000220202200002200222200202222
3.51% 022002222020222022022020220200222002200000002022220002220
3.42% 022002222002220022022020220020200202202000202020020002020
3.24% 022222222020200000022020220020200202202000202020020002020
3.22% 022002222002220022002020002220000202200000202022020202220

Search for 1st haplotype that matches genotype (the 3.66% entry is the first without a conflict):
022112222011221022021110220010110212202000102020120002021

Get 2nd haplotype by removing 1st from genotype (the result is the 3.42% entry):
022002222002220022022020220020200202202000202020020002020
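
The search on this slide can be reproduced with two small string operations. A minimal sketch, assuming the ten haplotypes and the genotype are copied exactly from the slide; the helper names are illustrative and not taken from findhap.f90.

```python
# Haplotype list for the segment, most frequent first (copied from the slide).
HAPLOTYPES = [
    "022222222020020022002020200020000200202000022022222202220",  # 5.16%
    "022020220202200020022022200002200200200000200222200002202",  # 4.37%
    "022020022202200200022020220000220202200002200222200202220",  # 4.36%
    "022020222020222002022022202020000202220000200002020002002",  # 3.67%
    "022222222020222022020200220000020222202000002020220002022",  # 3.66%
    "022020022202200200022020220000220202200002200222200202222",  # 3.65%
    "022002222020222022022020220200222002200000002022220002220",  # 3.51%
    "022002222002220022022020220020200202202000202020020002020",  # 3.42%
    "022222222020200000022020220020200202202000202020020002020",  # 3.24%
    "022002222002220022002020002220000202200000202022020202220",  # 3.22%
]
GENOTYPE = "022112222011221022021110220010110212202000102020120002021"

def conflicts(geno, hap):
    """True if a homozygous locus (0 or 2) disagrees with a known allele."""
    return any(g != h for g, h in zip(geno, hap) if g in "02" and h in "02")

def subtract(geno, hap):
    """2nd haplotype = genotype minus 1st haplotype ('1' where undetermined)."""
    return "".join(
        g if g in "02" else (str(2 - int(h)) if g == "1" and h in "02" else "1")
        for g, h in zip(geno, hap))

# First listed haplotype with no conflict at homozygous loci.
first = next(i for i, h in enumerate(HAPLOTYPES) if not conflicts(GENOTYPE, h))
second = subtract(GENOTYPE, HAPLOTYPES[first])
second_index = HAPLOTYPES.index(second)  # position of the 2nd haplotype in the list
```

The four most frequent haplotypes each conflict with the genotype at a homozygous locus, so the search stops at the fifth entry, and the remainder is found further down the same list.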

Slide 35

Net Merit by Chromosome
Freddie - highest Net Merit bull

Slide 36

Net Merit by Chromosome
O Man - sire of Freddie

Slide 37

Net Merit by Chromosome
Die-Hard - maternal grandsire

Slide 38

Net Merit by Chromosome
Planet - high Net Merit bull

Slide 39

What’s the best cow we can make?

A “Supercow” constructed from the best haplotypes in the Holstein population would have an EBV(NM$) of $7515

Slide 40

Conclusions

1-step genomic evaluations tested
Inversion avoided using extra equations
Converged well for JE but not for HO
Same accuracy, less bias than multi-step
Foreign data from MACE included
Further work needed on algorithms
Including genomic information
Extending to all-breed evaluation

Slide 41

Conclusions

Foreign data can add to national evaluations
In one step model instead of post-process
High correlations of national with MACE
Multi-trait all-breed model developed
Replace software used since 1989
Many new features added
Correlations ~.99 with traditional AM
Tested with 7 yield and health traits
Also tested with 14 JE conformation traits