gmap (1)

NAME

gmap - Genomic Mapping and Alignment Program

SYNOPSIS

gmap -dDB|-gFASTA [OPTION]... [QUERY]...

DESCRIPTION

Align the sequences QUERY to the reference, specified with -d or -g. With no QUERY, read standard input.
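
The invocation described above can be sketched from a script. This is a minimal illustration, assuming exactly one of -d or -g is given, as the SYNOPSIS suggests; modes that read both sequences from standard input (-1, -2) are outside the sketch. The helper names gmap_argv and run_gmap, the database name "mygenome", and the file "query.fa" are placeholders and not part of gmap.

```python
import shutil
import subprocess


def gmap_argv(queries=(), db=None, gseg=None, extra=()):
    """Build an argument list following the SYNOPSIS: one of -d or -g, then queries."""
    if (db is None) == (gseg is None):
        raise ValueError("specify exactly one of db (-d) or gseg (-g)")
    reference = ["-d", db] if db is not None else ["-g", gseg]
    return ["gmap", *reference, *extra, *queries]


def run_gmap(argv, stdin_text=None):
    """Run gmap if it is on PATH and return its stdout; return None otherwise.

    With no QUERY files in argv, pass the FASTA text as stdin_text.
    """
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(
        argv, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout


# Placeholder names: "mygenome" and "query.fa" are assumptions for illustration.
argv = gmap_argv(queries=["query.fa"], db="mygenome")
print(" ".join(argv))
```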

OPTIONS

Input options

-D, --dir=directory: Genome directory
-d, --db=STRING: Genome database. If argument is '?' (with the quotes), this command lists available databases.
-k, --kmer=INT: k-mer size to use in genome database (allowed values: 16 or less). If not specified, the program will find the highest available k-mer size in the genome database
--basesize=INT: Base size to use in genome database. If not specified, the program will find the highest available base size in the genome database within selected k-mer size
--sampling=INT: Sampling to use in genome database. If not specified, the program will find the smallest available sampling value in the genome database within selected basesize and k-mer size
-G, --genomefull: Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version
-g, --gseg=filename: User-supplied genomic segment
-1, --selfalign: Align one sequence against itself in FASTA format via stdin (Useful for getting protein translation of a nucleotide sequence)
-2, --pairalign: Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA
--cmdline=STRING,STRING: Align these two sequences provided on the command line, first one being genomic and second one being cDNA
-q, --part=INT/INT: Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm)
--input-buffer=INT: Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
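
As an illustration of distributing work with -q, the following sketch generates one command line per part. It assumes, based on the 0/100 and 99/100 examples above, that i runs from 0 to n-1; the function name part_commands, the database "mygenome", and the file "reads.fa" are hypothetical. It only builds the commands and leaves submission to whatever job scheduler is in use.

```python
def part_commands(db, query_file, n_parts):
    """Return one gmap argument list per part, using -q i/n for i in 0..n-1."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [
        ["gmap", "-d", db, "-q", f"{i}/{n_parts}", query_file]
        for i in range(n_parts)
    ]


# Placeholder database and query file names, for illustration only.
for argv in part_commands("mygenome", "reads.fa", 4):
    print(" ".join(argv))
```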

Computation options

-B, --batch=INT: Batch mode, which determines how each data structure is loaded:
   Mode   Offsets       Positions       Genome
   0      allocate      mmap            mmap
   1      allocate      mmap & preload  mmap
   2      allocate      mmap & preload  mmap & preload (default)
   3      allocate      allocate        mmap & preload
   4      allocate      allocate        allocate
   5      expand        allocate        allocate
Note: For a single sequence, all data structures use mmap. If mmap is not available and allocate is not chosen, the program falls back to fileio (very slow).
--nosplicing: Turns off splicing (useful for aligning genomic sequences onto a genome)
--min-intronlength=INT: Min length for one internal intron (default 9). Below this size, a genomic gap will be considered a deletion rather than an intron.
-K, --intronlength=INT: Max length for one internal intron (default 1000000)
-w, --localsplicedist=INT: Max length for known splice sites at ends of sequence (default 200000)
-L, --totallength=INT: Max total intron length (default 2400000)
-x, --chimera-margin=INT: Amount of unaligned sequence that triggers search for the remaining sequence (default 40). Enables alignment of chimeric reads, and may help with some non-chimeric reads. To turn off, set to a large value (greater than the query length).
-t, --nthreads=INT: Number of worker threads
-C, --chrsubsetfile=filename: User-supplied chromosome subset file
-c, --chrsubset=string: Chromosome subset to search
-z, --direction=STRING: cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))
-H, --trimendexons=INT: Trim end exons with fewer than given number of matches (in nt, default 12)
--cross-species: For cross-species alignments, use a more sensitive search for canonical splicing
--canonical-mode=INT: Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default), 2=low reward for high-identity sequences and high reward otherwise
--allow-close-indels=INT: Allow an insertion and deletion close to each other (0=no, 1=yes (default), 2=only for high-quality alignments)
--microexon-spliceprob=FLOAT: Allow microexons only if one of the splice site probabilities is greater than this value (default 0.90)
--cmetdir=STRING: Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
--atoidir=STRING: Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
--mode=STRING: Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or atoi-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs on the genome
-p, --prunelevel: Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive
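
The interaction between --min-intronlength and -K can be pictured with a small sketch. It is not gmap's implementation: it only restates the documented thresholds, assuming a gap exactly equal to the minimum counts as an intron ("below this size" is read as strictly less than) and a gap exactly equal to the maximum is still allowed. The function name and return labels are hypothetical.

```python
DEFAULT_MIN_INTRONLENGTH = 9        # default of --min-intronlength
DEFAULT_MAX_INTRONLENGTH = 1000000  # default of -K, --intronlength


def classify_genomic_gap(
    gap_length,
    min_intronlength=DEFAULT_MIN_INTRONLENGTH,
    max_intronlength=DEFAULT_MAX_INTRONLENGTH,
):
    """Label a genomic gap using the documented length thresholds."""
    if gap_length < min_intronlength:
        return "deletion"      # too short to be treated as an intron
    if gap_length <= max_intronlength:
        return "intron"        # within the range for one internal intron
    return "exceeds_max"       # longer than a single internal intron may be


print(classify_genomic_gap(5))    # deletion
print(classify_genomic_gap(250))  # intron
```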

Output types

-S, --summary: Show summary of alignments only
-A, --align: Show alignments
-3, --continuous: Show alignment in three continuous lines
-4, --continuous-by-exon: Show alignment in three lines per exon
-Z, --compress: Print output in compressed format
-E, --exons=STRING: Print exons ("cdna" or "genomic")
-P, --protein_dna: Print protein sequence (cDNA)
-Q, --protein_gen: Print protein sequence (genomic)
-f, --format=INT: Other format for output (also note the -A and -S options and other options listed under Output types):
psl (or 1) = PSL (BLAT) format,
gff3_gene (or 2) = GFF3 gene format,
gff3_match_cdna (or 3) = GFF3 cDNA_match format,
gff3_match_est (or 4) = GFF3 EST_match format,
splicesites (or 6) = splicesites output (for GSNAP splicing file),
introns = introns output (for GSNAP splicing file),
map_exons (or 7) = IIT FASTA exon map format,
map_genes (or 8) = IIT FASTA map format,
coords (or 9) = coords in table format,
sampe = SAM format (setting paired_read bit in flag),
samse = SAM format (without setting paired_read bit)
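
For scripts that accept either spelling of a -f value, the list above can be captured in a lookup table. This is a sketch that uses only the names and numbers listed here (introns, sampe, and samse have no numeric alias in this list, and 5 is not listed); the function name format_name is hypothetical.

```python
# Numeric aliases for -f, --format as listed in this manual page.
FORMAT_CODES = {
    1: "psl",
    2: "gff3_gene",
    3: "gff3_match_cdna",
    4: "gff3_match_est",
    6: "splicesites",
    7: "map_exons",
    8: "map_genes",
    9: "coords",
}
# Formats listed by name only.
NAME_ONLY_FORMATS = {"introns", "sampe", "samse"}


def format_name(value):
    """Return the format name for a numeric alias or a listed name."""
    text = str(value)
    if text.isdigit():
        code = int(text)
        if code not in FORMAT_CODES:
            raise ValueError(f"no format is listed for code {code}")
        return FORMAT_CODES[code]
    if text in FORMAT_CODES.values() or text in NAME_ONLY_FORMATS:
        return text
    raise ValueError(f"unknown format name: {text}")


print(format_name(2))        # gff3_gene
print(format_name("samse"))  # samse
```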

Output options

-n, --npaths=INT: Maximum number of paths to show. If set to 0, prints two paths if chimera detected, else one.
--quiet-if-excessive: If more than the maximum number of paths are found, nothing is printed.
--suboptimal-score=INT: Report only paths whose score is within this value of the best path. By default, if this option is not provided, the program prints all paths found.
-O, --ordered: Print output in same order as input (relevant only if there is more than one worker thread)
-5, --md5: Print MD5 checksum for each query sequence
-o, --chimera-overlap: Overlap to show, if any, at chimera breakpoint
--failsonly: Print only failed alignments, those with no results
--nofails: Exclude printing of failed alignments
--fails-as-input: Print completely failed alignments as input FASTA or FASTQ format
-V, --usesnps=STRING: Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for reporting output
--split-output=STRING: Basename for multiple-file output, with separate files for nomapping, uniq, mult (and chimera, if --chimera-margin is selected)
--output-buffer-size=INT: Buffer size, in queries, for output thread (default 1000). When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared
-F, --fulllength: Assume full-length protein, starting with Met
--cdsstart=INT: Translate codons from given nucleotide (1-based)
-T, --truncate: Truncate alignment around full-length protein, Met to Stop. Implies -F flag.
-Y, --tolerant: Translates cDNA with corrections for frameshifts

Options for SAM output

--no-sam-headers: Do not print headers beginning with '@'
--sam-use-0M: Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools
--read-group-id=STRING: Value to put into read-group id (RG-ID) field
--read-group-name=STRING: Value to put into read-group name (RG-SM) field
--read-group-library=STRING: Value to put into read-group library (RG-LB) field
--read-group-platform=STRING: Value to put into read-group platform (RG-PL) field

Options for quality scores

--quality-protocol=STRING: Protocol for input quality scores. Allowed values:
illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
sanger (ASCII 33-126) (equivalent to -J 33 -j 0)
Default is sanger (no quality print shift). SAM output files should have quality scores in sanger protocol. Alternatively, specify the print shift directly with this flag:
-j, --quality-print-shift=INT: Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)
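
The arithmetic behind the print shift is simple: each quality character is moved by a fixed number of ASCII positions. The sketch below illustrates the -31 shift from Illumina (offset 64) to Sanger (offset 33) encoding; it is a standalone illustration, not gmap's code, and the function name shift_quality is hypothetical.

```python
def shift_quality(quality_string, print_shift):
    """Shift every quality character by print_shift ASCII positions."""
    return "".join(chr(ord(ch) + print_shift) for ch in quality_string)


# 'h' is ASCII 104 (Illumina Q40) and 'B' is ASCII 66 (Illumina Q2).
illumina_quality = "hhhB"
sanger_quality = shift_quality(illumina_quality, -31)
print(sanger_quality)  # III#  ('I' is ASCII 73, '#' is ASCII 35)
```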

External map file options

-M, --mapdir=directory: Map directory
-m, --map=iitfile: Map file. If argument is '?' (with the quotes), this lists available map files.
-e, --mapexons: Map each exon separately
-b, --mapboth: Report hits from both strands of genome
-u, --flanking=INT: Show flanking hits (default 0)
--print-comment: Show comment line for each hit

Alignment output options

-N, --nolengths: No intron lengths in alignment
-I, --invertmode=INT: Mode for alignments to genomic (-) strand:
0=Don't invert the cDNA (default)
1=Invert cDNA and print genomic (-) strand
2=Invert cDNA and print genomic (+) strand
-i, --introngap=INT: Nucleotides to show on each end of intron (default=3)
-l, --wraplength=INT: Wrap length for alignment (default=50)

Help options

--version: Show version
--help: Show this help message

ENVIRONMENT

GMAPDB: genome directory (equivalent to -D)

FILES

~/.gmaprc: configuration file

AUTHOR

Thomas D. Wu and Colin K. Watanabe

REPORTING BUGS

Report bugs to Thomas Wu <twu@gene.com>.