When I joined the Department of Animal and Dairy Science at the University of Georgia, whose researchers pioneered the single-step GBLUP (ssGBLUP) procedure, I found that although ssGBLUP adds simplicity and flexibility to a genetic evaluation system, implementing it involves many details and requires knowledge of, and experience with, the peculiarities of the procedure.
ssGBLUP has become a standard procedure in livestock breeding, mainly because it combines all pedigree, phenotype, and genotype data in one single evaluation, with no need for post-analysis processing. It does so by combining the genomic (G) and pedigree (A) relationship matrices into a new matrix called H.
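To make the combination concrete, here is a minimal sketch of how the inverse of H is usually assembled: the inverse of A is kept as is, and the block for genotyped animals receives the difference between the inverse of G and the inverse of A22, where A22 is the pedigree relationship matrix among genotyped animals. The function name h_inverse, the argument names, and the three-animal toy pedigree are illustrative assumptions, not part of any BLUPF90 program.

```python
import numpy as np

def h_inverse(A_inv, G_inv, A22_inv, genotyped_idx):
    """Sketch of H^-1 = A^-1 + [[0, 0], [0, G^-1 - A22^-1]].

    genotyped_idx holds the rows/columns of A that belong to genotyped animals
    (hypothetical argument name, used only for this illustration).
    """
    H_inv = np.array(A_inv, dtype=float, copy=True)
    idx = np.asarray(genotyped_idx)
    # Only the genotyped-by-genotyped block of A^-1 is modified.
    H_inv[np.ix_(idx, idx)] += np.asarray(G_inv) - np.asarray(A22_inv)
    return H_inv

# Toy pedigree: two unrelated parents (0, 1) and their offspring (2).
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
genotyped = [1, 2]                      # assume animals 1 and 2 are genotyped
A22 = A[np.ix_(genotyped, genotyped)]
G = np.array([[1.1, 0.6],
              [0.6, 0.9]])              # made-up genomic relationships

H_inv = h_inverse(np.linalg.inv(A), np.linalg.inv(G), np.linalg.inv(A22), genotyped)
H = np.linalg.inv(H_inv)
```

In this toy case the genotyped block of H equals G, and the relationships of the non-genotyped animal are adjusted through the pedigree, which is the point of the single-step approach.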
A central requirement when combining G and A is that the two matrices are compatible. Two remedies are used for this purpose. The first is blending, in which G is replaced by a weighted sum of G and A22 (the pedigree relationship matrix among genotyped animals): G = alpha*G + (1-alpha)*A22. The default value of alpha in BLUPF90 is 0.95. Blending also overcomes singularity problems in G. The second remedy is tuning, which puts G on the same scale as A22 (which accounts for inbreeding among genotyped animals) so that the information in G and A is balanced. The adjustment can be based on the Fst value; for other approaches, see Vitezica et al. (2011). Related to this, Tau and Omega are scaling factors applied to the inverses of G and A22 when the inverse of H is built. The default in BLUPF90 is Tau = Omega = 1.
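The blending formula can be sketched in a few lines of NumPy. The example below uses a deliberately singular toy G (two identical genotypes) to show that the blended matrix becomes invertible. It also assumes, as in the usual ssGBLUP formulation, that Tau and Omega multiply the inverses of G and A22 in the genotyped block of the inverse of H. The function names blend_g and scaled_genomic_block are hypothetical and do not correspond to BLUPF90 options.

```python
import numpy as np

def blend_g(G, A22, alpha=0.95):
    """Blend G with A22: alpha*G + (1 - alpha)*A22."""
    return alpha * np.asarray(G, dtype=float) + (1.0 - alpha) * np.asarray(A22, dtype=float)

def scaled_genomic_block(G_inv, A22_inv, tau=1.0, omega=1.0):
    """Genotyped block added to A^-1, assuming the form tau*G^-1 - omega*A22^-1."""
    return tau * np.asarray(G_inv) - omega * np.asarray(A22_inv)

# Toy data: G is singular because both animals have identical genotypes.
G = np.array([[1.0, 1.0],
              [1.0, 1.0]])
A22 = np.eye(2)                 # assume the two animals are unrelated in the pedigree

G_blended = blend_g(G, A22)     # off-diagonals shrink toward A22, so G becomes invertible
block = scaled_genomic_block(np.linalg.inv(G_blended), np.linalg.inv(A22))
```

With alpha = 0.95 only 5% of A22 enters the blended matrix, which is typically enough to make G invertible without changing the genomic relationships much.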
One limitation of ssGBLUP is the cost of inverting G when the number of genotyped individuals is larger than about 100,000. Misztal et al. (2014) and Misztal (2016) proposed the algorithm for proven and young (APY), which constructs the inverse of G without explicitly inverting the full matrix. How does this algorithm work?
Instead of inverting the whole G matrix of the genotyped animals, APY selects a subset of them, called core animals, and directly inverts only the block of G for this subset. For the remaining genotyped animals (non-core animals), only the diagonal elements of the inverse are computed directly, and the off-diagonal elements linking core and non-core animals are obtained through recursion on the core animals.
The next question is how to select the core animals. According to several studies, the number of core animals should equal the number of the largest eigenvalues of G that explain at least 98 percent of the variation in G. This number depends strongly on the effective population size (Ne): as Ne increases, the number of core animals required also increases. Should the core animals be selected at random, taken from the last generation, or chosen as the animals that contribute most to the eigenvalues? Several studies have addressed this question, and most of them concluded that a random subset gives the same accuracy and bias as a non-random subset. Random sampling is therefore the most common way of choosing the core animals.
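The two steps described above (choosing the number of core animals from the eigenvalues of G, then building the inverse by recursion) can be sketched with dense NumPy arrays. This is a small illustration under stated assumptions, not the BLUPF90 implementation: the function names, the 98% default, and the toy G are all hypothetical, and a production version would avoid forming or decomposing the full G.

```python
import numpy as np

def n_core_from_eigenvalues(G, fraction=0.98):
    """Number of largest eigenvalues of G explaining at least `fraction` of its variation."""
    eigvals = np.linalg.eigvalsh(G)[::-1]        # eigenvalues in descending order
    eigvals = np.clip(eigvals, 0.0, None)        # guard against tiny negative values
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    n = int(np.searchsorted(cumulative, fraction)) + 1
    return min(n, len(eigvals))

def apy_inverse(G, core_idx):
    """APY-style inverse: invert only the core block, recurse for non-core animals."""
    n = G.shape[0]
    core = np.asarray(core_idx)
    noncore = np.setdiff1d(np.arange(n), core)
    Gcc_inv = np.linalg.inv(G[np.ix_(core, core)])   # the only direct inversion
    Gcn = G[np.ix_(core, noncore)]
    P = Gcc_inv @ Gcn                                # recursion coefficients on core animals
    # Diagonal term of each non-core animal: m_i = g_ii - g_ic Gcc^-1 g_ci
    m = G[noncore, noncore] - np.einsum("ij,ij->j", Gcn, P)
    m_inv = 1.0 / m
    G_inv = np.zeros((n, n))
    G_inv[np.ix_(core, core)] = Gcc_inv + (P * m_inv) @ P.T
    G_inv[np.ix_(core, noncore)] = -P * m_inv
    G_inv[np.ix_(noncore, core)] = (-P * m_inv).T
    G_inv[noncore, noncore] = m_inv                  # non-core block stays diagonal
    return G_inv

# Toy example: a random positive definite "G" for 6 animals, with a random core subset.
rng = np.random.default_rng(1)
Z = rng.standard_normal((6, 20))
G = Z @ Z.T / 20 + 0.01 * np.eye(6)
n_core = n_core_from_eigenvalues(G)
core = rng.choice(6, size=n_core, replace=False)     # random sampling of core animals
G_apy_inv = apy_inverse(G, core)
```

The only dense inversion is that of the core block, so the cost grows with the cube of the number of core animals and only linearly with the number of non-core animals, which is what makes the approach attractive for very large genotyped populations.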
Misztal, I.; Legarra, A.; Aguilar, I. Computing procedures for genetic evaluation including phenotypic, full pedigree, and genomic information. J. Dairy Sci. 2009, 92, 4648–4655.
Legarra, A.; Aguilar, I.; Misztal, I. A relationship matrix including full pedigree and genomic information. J. Dairy Sci. 2009, 92, 4656–4663.
Misztal, I.; Legarra, A.; Aguilar, I. Using recursion to compute the inverse of the genomic relationship matrix. J. Dairy Sci. 2014, 97, 3943–3952.
Misztal, I. Inexpensive computation of the inverse of the genomic relationship matrix in populations with small effective population size. Genetics 2016, 202, 401–409.
Lourenco, D.; Legarra, A.; Tsuruta, S.; Masuda, Y.; Aguilar, I.; Misztal, I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes 2020, 11, 790.