Minimac4
hidden_markov_model Class Reference

Implements a Hidden Markov Model for genotype imputation.

#include <hidden_markov_model.hpp>

Public Member Functions

 hidden_markov_model (float s3_prob_threshold, float s1_prob_threshold, float diff_threshold, float background_error, float decay)
 Constructs a Hidden Markov Model with specified parameters.
 
void traverse_forward (const std::deque< unique_haplotype_block > &ref_haps, const std::vector< target_variant > &tar_variant, std::size_t hap_idx)
 Performs a forward traversal over reference haplotypes for a given target haplotype.
 
void traverse_backward (const std::deque< unique_haplotype_block > &ref_haps, const std::vector< target_variant > &tar_variant, std::size_t hap_idx, std::size_t out_idx, const std::vector< std::vector< std::vector< std::size_t > > > &reverse_maps, full_dosages_results &output, const reduced_haplotypes &full_reference_data)
 Performs a backward traversal over reference haplotypes to compute posterior probabilities.
 

Detailed Description

Implements a Hidden Markov Model for genotype imputation.

This class performs multi-stage HMM-based imputation of genotype dosages for target haplotypes using reference haplotype blocks. It maintains forward and backward probability matrices, junction proportions, and intermediate haplotype states for S1, S2, and S3 probability transformations.

  • The model supports typed and untyped site imputation.
  • Probabilities are normalized and adjusted for recombination, background error, and leave-one-out cross-validation.
  • Implements forward traversal (traverse_forward) and backward traversal (traverse_backward) along haplotype blocks.

Constructor & Destructor Documentation

◆ hidden_markov_model()

hidden_markov_model::hidden_markov_model ( float s3_prob_threshold,
float s1_prob_threshold,
float diff_threshold,
float background_error,
float decay )

Constructs a Hidden Markov Model with specified parameters.

Parameters
s3_prob_threshold  Threshold probability for S3 state.
s1_prob_threshold  Threshold probability for S1 state.
diff_threshold     Minimum difference required between probabilities to make a confident state call.
background_error   Expected background error rate.
decay              Decay factor controlling the influence of previous states.

This constructor initializes the internal HMM parameters. These thresholds and the decay factor influence the model's sensitivity to differences in observed probabilities and determine how the hidden states are inferred.

Member Function Documentation

◆ traverse_backward()

void hidden_markov_model::traverse_backward ( const std::deque< unique_haplotype_block > & ref_haps,
const std::vector< target_variant > & tar_variant,
std::size_t hap_idx,
std::size_t out_idx,
const std::vector< std::vector< std::vector< std::size_t > > > & reverse_maps,
full_dosages_results & output,
const reduced_haplotypes & full_reference_data )

Performs a backward traversal over reference haplotypes to compute posterior probabilities.

This function implements the backward algorithm of a Hidden Markov Model (HMM) for a given target haplotype. It computes probabilities by combining forward probabilities (from traverse_forward) with backward likelihoods, handling recombination events, junction proportions, and leave-one-out considerations.

Parameters
ref_haps             A deque of reference haplotype blocks (unique_haplotype_block). Each block contains multiple haplotypes and variant positions.
tar_variant          A vector of target variants (target_variant) to traverse backward.
hap_idx              Index of the target haplotype to traverse within tar_variant.
out_idx              Output index used for storing results in full_dosages_results.
reverse_maps         Maps from expanded to unique haplotypes for each block.
output               Reference to a full_dosages_results structure where computed posterior dosages are stored.
full_reference_data  Reduced haplotype reference data needed for imputation.

The function performs the following steps:

  1. Initializes backward probability vectors (backward, backward_norecom) and junction proportions for the last block.
  2. Iterates backward over all reference blocks:
    • Computes junction probabilities using the backward pass.
    • Normalizes probabilities to ensure valid distributions.
    • Updates constants used for imputation with combined forward and backward probabilities.
    • Calls impute to compute posterior dosages for each variant position.
    • Conditions backward probabilities on observed target genotypes if available.
  3. Handles recombination corrections and precision jumps using transpose.
Note
  • Assertions ensure probability values remain within valid ranges and that the global index reaches -1 at the end.
  • Observed genotypes with negative values are treated as missing and skipped during conditioning.
  • This method relies on data structures populated during traverse_forward.
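The way forward and backward probabilities combine into a posterior dosage can be illustrated with a small standalone sketch. This is not Minimac4's code: the functions posterior and alt_dosage are hypothetical, the state space is one state per reference haplotype (no unique-haplotype blocks, junction proportions, or leave-one-out handling), and the forward vector is assumed to already include the emission at the current site while the backward vector does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Minimac4): posterior state probabilities
// at one site, obtained as the normalized element-wise product of the
// forward and backward probabilities for that site.
std::vector<float> posterior(const std::vector<float>& fwd,
                             const std::vector<float>& bwd)
{
  assert(fwd.size() == bwd.size());
  std::vector<float> post(fwd.size());
  float total = 0.0f;
  for (std::size_t i = 0; i < fwd.size(); ++i)
  {
    post[i] = fwd[i] * bwd[i];
    total += post[i];
  }
  assert(total > 0.0f); // at least one state must remain possible
  for (float& p : post)
    p /= total;
  return post;
}

// Hypothetical helper: expected alternate-allele dosage for one target
// haplotype, i.e. the posterior mass on reference haplotypes carrying
// allele 1 at this site.
float alt_dosage(const std::vector<float>& post,
                 const std::vector<int>& ref_alleles)
{
  assert(post.size() == ref_alleles.size());
  float dosage = 0.0f;
  for (std::size_t i = 0; i < post.size(); ++i)
    if (ref_alleles[i] == 1)
      dosage += post[i];
  return dosage;
}
```

For example, with forward probabilities {0.6, 0.4} and backward probabilities {0.5, 0.25}, the posterior is {0.75, 0.25}; if only the second reference haplotype carries the alternate allele, the dosage is 0.25.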

◆ traverse_forward()

void hidden_markov_model::traverse_forward ( const std::deque< unique_haplotype_block > & ref_haps,
const std::vector< target_variant > & tar_variant,
std::size_t hap_idx )

Performs a forward traversal over reference haplotypes for a given target haplotype.

This function implements the forward algorithm of a Hidden Markov Model (HMM) to compute the probability of observing the target variants given the reference haplotypes. It accounts for recombination events and optional leave-one-out calculations.

Parameters
ref_haps     A deque of reference haplotype blocks (unique_haplotype_block). Each block contains multiple haplotypes and variant positions.
tar_variant  A vector of target variants (target_variant) for which the forward probabilities are computed.
hap_idx      The index of the target haplotype to traverse within tar_variant.

The function performs the following steps:

  1. Initializes internal forward probability matrices (forward_probs_, forward_norecom_probs_) and junction probabilities (junction_prob_proportions_).
  2. For the first reference block, likelihoods are initialized using initialize_likelihoods.
  3. For subsequent blocks:
    • Computes junction probabilities based on previous block probabilities and recombination proportions.
    • Normalizes junction probabilities to ensure they sum to 1.
    • Updates precision_jumps_ by transposing probabilities to the next position.
  4. Iterates over all variants in each block:
    • Updates probabilities conditioned on observed target genotypes using condition.
    • Computes forward probabilities for the next position using transpose.
Note
  • Assertions ensure probability values remain within valid ranges [0, 1].
  • The function supports handling missing genotypes (observed < 0) by skipping conditioning for those positions.
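The condition/transpose loop described above can be sketched on a toy model. The names condition and transpose are borrowed from the steps listed here, but the bodies below are assumptions for illustration, not Minimac4's implementation: there is one state per reference haplotype, a single recombination probability recom for every interval, a single error rate, and no block junctions or leave-one-out adjustment. The helpers normalize and forward_pass are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Rescale so the probabilities sum to 1 (also guards against underflow).
void normalize(std::vector<float>& probs)
{
  const float total = std::accumulate(probs.begin(), probs.end(), 0.0f);
  assert(total > 0.0f);
  for (float& p : probs)
    p /= total;
}

// Assumed emission model: a state matching the observed allele is weighted
// by (1 - error), a mismatching state by error. A negative observation is
// treated as missing and conditioning is skipped.
void condition(std::vector<float>& probs, const std::vector<int>& ref_alleles,
               int observed, float error)
{
  if (observed < 0)
    return;
  for (std::size_t i = 0; i < probs.size(); ++i)
    probs[i] *= (ref_alleles[i] == observed) ? 1.0f - error : error;
  normalize(probs);
}

// Assumed transition model: stay on the same haplotype with probability
// (1 - recom), otherwise jump to a uniformly chosen haplotype.
void transpose(std::vector<float>& probs, float recom)
{
  const float total = std::accumulate(probs.begin(), probs.end(), 0.0f);
  const float jump = recom * total / static_cast<float>(probs.size());
  for (float& p : probs)
    p = (1.0f - recom) * p + jump;
}

// ref[site][hap] holds reference alleles; target[site] holds the observed
// allele (negative if missing). Returns the forward probabilities per site.
std::vector<std::vector<float>> forward_pass(
  const std::vector<std::vector<int>>& ref, const std::vector<int>& target,
  float recom, float error)
{
  assert(!ref.empty() && ref.size() == target.size());
  const std::size_t n_haps = ref[0].size();
  std::vector<float> probs(n_haps, 1.0f / static_cast<float>(n_haps));
  std::vector<std::vector<float>> result;
  for (std::size_t site = 0; site < ref.size(); ++site)
  {
    condition(probs, ref[site], target[site], error);
    result.push_back(probs);
    if (site + 1 < ref.size())
      transpose(probs, recom);
  }
  return result;
}
```

With two reference haplotypes carrying alleles 0 and 1 at every site, a target of {0, missing, 0}, and recom = error = 0.01, the first haplotype has probability 0.99 after the first site and about 0.9851 at the missing site, where only the transition step applies.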
