LIBRARY
    #include <biolibc/align.h>
    -lbiolibc -lxtend

SYNOPSIS
    size_t bl_align_map_seq_exact(const bl_align_t *params,
            const char *big, size_t big_len,
            const char *little, size_t little_len)

ARGUMENTS
    params      bl_align_t parameters.  Only min_match is used.
    big         Sequence to be searched for matches to little
    big_len     Length of big
    little      Sequence to be located within big
    little_len  Length of little

DESCRIPTION
    Locate the leftmost (farthest 5') match for sequence little within
    sequence big, using exact matching only.

    The content of little is assumed to be all upper case.  This improves
    speed by avoiding redundant toupper() conversions on the same string,
    on the assumption that many big strings will be searched for the same
    little, as in adapter removal and read mapping.  Use strlupper(3) or
    strupper(3) before calling this function if necessary.

    A minimum of min_match bases must match between little and big.  This
    mainly matters near the end of big, where fewer bases remain than the
    length of little.

    Note that alignment is not an exact science.  We cannot detect every
    true little sequence without falsely detecting other sequences, since
    it is impossible to know whether any given sequence is really from the
    source of interest (e.g. an adapter) or naturally occurring from
    another source.  The best we can do is estimate what will provide the
    most true positives (best statistical power) and the fewest false
    positives.

    In the case of adapter removal, it is usually not important to remove
    every adapter, only to minimize adapter contamination.  Failing to
    align a small percentage of sequences due to adapter contamination
    will not change the story told by the downstream analysis.  Nor will
    erroneously trimming the 3' end of a small percentage of reads that
    contain natural sequences resembling adapters.

    Trimming only exact matches of the adapter sequence will generally
    remove 99% or more of the adapter contamination and minimize false
    positives.  Tolerating 1 or 2 differences has been shown to do
    slightly better overall.
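
    The matching rule described above can be sketched in plain C.  This is
    an illustration under stated assumptions, not the biolibc source: the
    name map_seq_exact_sketch() is hypothetical, and min_match is passed
    directly as a size_t instead of through a bl_align_t.  It assumes that
    big is compared case-insensitively, that little is already upper case,
    and that a match cut short by the end of big counts when it covers at
    least min_match bases.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for illustration only, not the biolibc code.
 * Returns the leftmost index in big where little matches exactly, or
 * big_len (the index of the null terminator) if there is no match.
 * little must already be upper case.
 */
size_t map_seq_exact_sketch(size_t min_match,
        const char *big, size_t big_len,
        const char *little, size_t little_len)
{
    size_t start, matched;

    assert(big != NULL && little != NULL);

    for (start = 0; start < big_len; ++start)
    {
        /* Count consecutive matching bases beginning at start */
        matched = 0;
        while ( (matched < little_len) && (start + matched < big_len) &&
                (toupper((unsigned char)big[start + matched])
                    == little[matched]) )
            ++matched;

        /* All of little was found */
        if ( matched == little_len )
            return start;

        /* Match ran off the 3' end of big: accept if long enough */
        if ( (start + matched == big_len) && (matched >= min_match) )
            return start;
    }
    return big_len;
}
```

    With the made-up adapter AGATCGGAAG and a min_match of 3, the
    16-base read ACGTACGTACGTAGAT yields index 12, because the final 4
    bases match the start of the adapter.  Raising min_match to 5 makes
    the same call return 16, the length of the read, meaning no match.
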

    Modern read mapping software is also tolerant of adapter contamination
    and can clip adapters as needed.

RETURN VALUES
    Index of little sequence within big if found, index of the null
    terminator of big otherwise

EXAMPLES
    bl_align_t  params;
    bl_fastq_t  read;
    char        *adapter;
    size_t      index;

    bl_align_set_min_match(&params, 3);
    index = bl_align_map_seq_exact(&params,
            BL_FASTQ_SEQ(&read), BL_FASTQ_SEQ_LEN(&read),
            adapter, strlen(adapter));
    if ( index != BL_FASTQ_SEQ_LEN(&read) )
        bl_fastq_3p_trim(&read, index);

SEE ALSO
    bl_align_map_seq_sub(3), bl_align_set_min_match(3), bl_fastq_3p_trim(3)