cmemit(1)    Infernal Manual    cmemit(1)
   
 
cmemit - sample sequences from a covariance model 
cmemit [options] <cmfile> 
The cmemit program samples (emits) sequences from the
    covariance model(s) in <cmfile>, and writes them to output.
    Sampling sequences may be useful for a variety of purposes, including
    creating synthetic true positives for benchmarks or tests. 
The default is to sample ten unaligned sequences from each CM.
    Alternatively, with the -c option, you can emit a single
    majority-rule consensus sequence; or with the -a option, you can emit
    an alignment. 
The <cmfile> may contain a library of CMs, in which
    case each CM will be used in turn. 
<cmfile> may be '-' (dash), which means reading this
    input from stdin rather than a file. 
For models with zero basepairs, sequences are sampled from the
    profile HMM filter instead of the CM. However, since these models will be
    nearly identical (unless special options were used in cmbuild to
    prevent this), using the HMM instead of the CM will not change the output in
    a significant way, unless the -l option is used. With -l, the
    HMM will be configured for equiprobable model begin and end positions, while
    the CM will not. You can force cmemit to always sample from the CM
    with the --nohmmonly option. 
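
The invocation styles described above can be sketched as a small Python helper that assembles a cmemit argument list. This is only an illustration: the helper name cmemit_argv, its keyword parameters, and the file names my.cm and emitted.sto are hypothetical, and the sketch assumes that the three output modes (unaligned, -a, -c) are chosen one at a time. Only options documented on this page are used.

```python
import shlex


def cmemit_argv(cmfile, n=None, mode="unaligned", outfile=None,
                seed=None, local=False):
    """Build a cmemit argument list (hypothetical helper, not part of Infernal)."""
    argv = ["cmemit"]
    if n is not None:
        argv += ["-N", str(n)]          # number of sequences (default is 10)
    if mode == "aligned":
        argv.append("-a")               # emit an alignment
    elif mode == "consensus":
        argv.append("-c")               # emit one majority-rule consensus sequence
    elif mode != "unaligned":
        raise ValueError(f"unknown mode: {mode!r}")
    if local:
        argv.append("-l")               # local mode
    if seed is not None:
        argv += ["--seed", str(seed)]   # nonzero seed gives reproducible output
    if outfile is not None:
        argv += ["-o", outfile]         # write to a file instead of stdout
    argv.append(cmfile)                 # '-' means read the CM file from stdin
    return argv


print(shlex.join(cmemit_argv("my.cm", n=5, mode="aligned",
                             seed=42, outfile="emitted.sto")))
# prints: cmemit -N 5 -a --seed 42 -o emitted.sto my.cm
```

If cmemit is installed and on the PATH, the returned list could be passed to subprocess.run(argv, check=True).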
  - -h
 
  - Help; print a brief reminder of command line usage and available options.
    
  
 
  - -o <f>
 
  - Save the synthetic sequences to file <f> rather than writing
      them to stdout.
    
  
 
  - -N <n>
 
  - Generate <n> sequences. The default value for
      <n> is 10.
    
  
 
  - -u
 
  - Write the generated sequences in unaligned format (FASTA). This is the
      default behavior.
    
  
 
  - -a
 
  - Write the generated sequences in an aligned format (STOCKHOLM) with
      consensus structure annotation rather than FASTA. Other output formats are
      possible with the --outformat option.
    
  
 
  - -c
 
  - Predict a single majority-rule consensus sequence instead of sampling
      sequences from the CM's probability distribution. Highly conserved
      residues (base paired residues that score higher than 3.0 bits, or single
      stranded residues that score higher than 1.0 bits) are shown in upper
      case; others are shown in lower case.
    
  
 
  - -e <n>
 
  - Embed the CM emitted sequences in a larger randomly generated sequence of
      length <n>, generated from an HMM that was trained on real
      genomic sequences with various GC contents (the same HMM used by
      cmcalibrate). You can use the --iid option to generate the larger
      sequence as 25% each A, C, G, and U instead. The CM emitted sequence
      will begin at a random position within the larger sequence and will be
      included in its entirety unless the --u5p or --u3p options are used.
      When -e is used in combination with --u5p, the CM emitted
      sequence will always begin at position 1 of the larger sequence and will
      be truncated 5'. When used in combination with --u3p, the CM emitted
      sequence will always end at position <n> of the larger
      sequence and will be truncated 3'.
    
  
 
  - -l
 
  - Configure the CMs into local mode before emitting sequences. By default
      the model will be in global mode. In local mode, large insertions and
      deletions are more common than in global mode.
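
The embedding performed by -e can be illustrated with a minimal Python sketch. It models only the --iid background (25% each A, C, G and U), not the trained HMM that cmemit uses by default. The function name embed_iid, the anchor parameter, and the decision to raise an error when the emitted sequence is longer than <n> are all assumptions made for illustration; they do not describe how cmemit is implemented.

```python
import random

ALPHABET = "ACGU"


def embed_iid(emitted, n, rng, anchor="random"):
    """Embed `emitted` in an i.i.d. background of total length n.

    anchor="random": emitted sequence starts at a random position
    anchor="start":  starts at position 1 (placement used with -e and --u5p)
    anchor="end":    ends at position n   (placement used with -e and --u3p)
    Returns (full_sequence, 1-based start position of the emitted sequence).
    """
    if len(emitted) > n:
        # Assumption: this sketch simply refuses; cmemit's behavior is not stated here.
        raise ValueError("emitted sequence is longer than the requested length")
    slack = n - len(emitted)            # background residues to distribute
    if anchor == "random":
        left = rng.randint(0, slack)    # randint is inclusive on both ends
    elif anchor == "start":
        left = 0
    elif anchor == "end":
        left = slack
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")

    def flank(k):
        # k background residues, each drawn uniformly from A, C, G, U
        return "".join(rng.choice(ALPHABET) for _ in range(k))

    return flank(left) + emitted + flank(slack - left), left + 1


rng = random.Random(1)
seq, start = embed_iid("GGGAAACCC", 30, rng)
print(start, seq)
```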
    
  
 
 
  - --u5p
 
  - Truncate all emitted sequences at a randomly chosen start position
      <n>, by only outputting residues beginning at
      <n>. A different start point is randomly chosen for each
      sequence.
    
  
 
  - --u3p
 
  - Truncate all emitted sequences at a randomly chosen end position
      <n>, by only outputting residues up to position
      <n>. A different end point is randomly chosen for each
      sequence.
    
  
 
  - --a5p <n>
 
  - In combination with the -a option, truncate the emitted alignment
      at a randomly chosen start match position <n>, by only
      outputting alignment columns for positions after match state
      <n> - 1. <n> must be an integer between 0 and
      the consensus length of the model (which can be determined using the
      cmstat program). As a special case, using 0 as <n> will
      result in a randomly chosen start position.
    
  
 
  - --a3p <n>
 
  - In combination with the -a option, truncate the emitted alignment
      at a randomly chosen end match position <n>, by only
      outputting alignment columns for positions before match state
      <n> + 1. <n> must be an integer between 1 and
      the consensus length of the model (which can be determined using the
      cmstat program). As a special case, using 0 as <n>
      will result in a randomly chosen end position.
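
The truncation options above can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the random start or end position is drawn uniformly and that at least one residue is always kept, neither of which is stated on this page. The alignment helper covers only an explicit, nonzero <n> and ignores insert columns.

```python
import random


def truncate_5p(seq, rng):
    """Model of --u5p: keep residues from a random 1-based start position on."""
    start = rng.randint(1, len(seq))    # assumption: uniform over positions
    return seq[start - 1:], start


def truncate_3p(seq, rng):
    """Model of --u3p: keep residues up to a random 1-based end position."""
    end = rng.randint(1, len(seq))      # assumption: uniform over positions
    return seq[:end], end


def match_positions_kept(clen, a5p=1, a3p=None):
    """Match positions kept by --a5p/--a3p for a model of consensus length clen.

    --a5p <n> keeps positions after match state n - 1 (that is, n and up);
    --a3p <n> keeps positions before match state n + 1 (that is, up to n).
    """
    last = clen if a3p is None else a3p
    return list(range(a5p, last + 1))


rng = random.Random(0)
print(truncate_5p("ACGUACGUAC", rng))
print(truncate_3p("ACGUACGUAC", rng))
print(match_positions_kept(10, a5p=3, a3p=7))
```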
    
  
 
 
  - --seed <n>
 
  - Seed the random number generator with <n>, an integer >=
      0. If <n> is nonzero, stochastic sampling of sequences will
      be reproducible; the same command will give the same results. If
      <n> is 0, the random number generator is seeded arbitrarily,
      and stochastic samplings will vary from run to run of the same command.
      The default seed is 0.
    
  
 
  - --iid
 
  - With -e, generate the larger sequences as 25% each A, C, G and U.
    
  
 
  - --rna
 
  - Specify that the emitted sequences be output as RNA sequences. This is
      true by default.
    
  
 
  - --dna
 
  - Specify that the emitted sequences be output as DNA sequences. By default,
      the output alphabet is RNA.
    
  
 
  - --idx <n>
 
  - Specify that the emitted sequences be named starting with
      <modelname>.<n>. By default <n> is 1.
    
  
 
  - --outformat <s>
 
  - With -a, specify the output alignment format as <s>.
      Acceptable formats are: Pfam, AFA, A2M, Clustal, and Phylip. AFA is
      aligned fasta. Only Pfam and Stockholm alignment formats will include
      consensus structure annotation.
    
  
 
  - --tfile <f>
 
  - Dump tabular sequence parsetrees (tracebacks) for each emitted sequence to
      file <f>. Primarily useful for debugging.
    
  
 
  - --exp <x>
 
  - Exponentiate the emission and transition probabilities of the CM by
      <x> and then renormalize those distributions before emitting
      sequences. This option changes the CM probability distribution of
      parsetrees relative to default. With <x> less than 1.0 the
      emitted sequences will tend to have lower bit scores upon alignment to the
      CM. With <x> greater than 1.0, the emitted sequences will tend to
      have higher bit scores upon alignment to the CM. This bit score difference
      will increase as <x> moves further away from 1.0 in either
      direction. If <x> equals 1.0, this option has no effect relative to
      default. This option is useful for generating sequences that are either
      more difficult (<x> < 1.0) or easier (<x> > 1.0) for the CM to
      distinguish as homologous from background, random sequence.
    
  
 
  - --hmmonly
 
  - Emit from the filter profile HMM instead of the CM.
    
  
 
  - --nohmmonly
 
  - Never emit from the filter profile HMM, always use the CM, even for models
      with zero basepairs.
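
The effect of --exp <x> on a single probability distribution can be sketched in Python. The function name exponentiate and the example distribution are hypothetical; the sketch only shows the "raise to the power <x>, then renormalize" step described above, applied to one distribution rather than to a whole CM.

```python
def exponentiate(probs, x):
    """Raise each probability to the power x, then renormalize to sum to 1."""
    powered = [p ** x for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


# Hypothetical emission distribution over A, C, G, U at one position.
probs = [0.7, 0.1, 0.1, 0.1]

sharper = exponentiate(probs, 2.0)   # x > 1.0: mass moves toward the likeliest residue
flatter = exponentiate(probs, 0.5)   # x < 1.0: distribution moves toward uniform

print([round(p, 3) for p in sharper])
print([round(p, 3) for p in flatter])
```

With <x> greater than 1.0 the most probable residue becomes even more probable, which is consistent with emitted sequences scoring higher against the CM; with <x> less than 1.0 the distribution flattens and sequences become harder to distinguish from background.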
    
  
 
 
See infernal(1) for a master man page with a list of all
    the individual man pages for programs in the Infernal package. 
For complete documentation, see the user guide that came with your
    Infernal distribution (Userguide.pdf); or see the Infernal web page
    (http://eddylab.org/infernal/). 
Copyright (C) 2023 Howard Hughes Medical Institute.
Freely distributed under the BSD open source license. 
For additional information on copyright and licensing, see the
    file called COPYRIGHT in your Infernal source distribution, or see the
    Infernal web page (http://eddylab.org/infernal/). 
 
 