gmap (1)
NAME
gmap - Genomic Mapping and Alignment Program
SYNOPSIS
gmap -dDB|-gFASTA [OPTION]... [QUERY]...
DESCRIPTION
Align the sequences QUERY to the reference, specified with -d or -g. With no QUERY, read standard input.
OPTIONS
Input options
- -D, --dir=directory
- Genome directory
- -d, --db=STRING
- Genome database. If argument is '?' (with the quotes), this command lists available databases.
- -k, --kmer=INT
- K-mer size to use in genome database (allowed values: 16 or less). If not specified, the program will find the highest available k-mer size in the genome database
- --basesize=INT
- Base size to use in genome database. If not specified, the program will find the highest available base size in the genome database within selected k-mer size
- --sampling=INT
- Sampling to use in genome database. If not specified, the program will find the smallest available sampling value in the genome database within selected basesize and k-mer size
- -G, --genomefull
- Use full genome (all ASCII chars allowed; built explicitly during setup), not compressed version
- -g, --gseg=filename
- User-supplied genomic segment
- -1, --selfalign
- Align one sequence against itself in FASTA format via stdin (useful for getting protein translation of a nucleotide sequence)
- -2, --pairalign
- Align two sequences in FASTA format via stdin, first one being genomic and second one being cDNA
- --cmdline=STRING,STRING
- Align these two sequences provided on the command line, first one being genomic and second one being cDNA
- -q, --part=INT/INT
- Process only the i-th out of every n sequences, e.g., 0/100 or 99/100 (useful for distributing jobs to a computer farm).
- --input-buffer=INT
- Size of input buffer (program reads this many sequences at a time for efficiency) (default 1000)
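The -q, --part option above lends itself to a short illustration. The following is a minimal Python sketch, not part of gmap itself: the helper names part_specs and select_part are hypothetical, and the round-robin rule (sequence index modulo n equals i) is an assumption about how "the i-th out of every n sequences" is chosen.

```python
def part_specs(n_jobs):
    """Return the --part arguments "0/n" ... "(n-1)/n" for n_jobs jobs."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    return [f"{i}/{n_jobs}" for i in range(n_jobs)]


def select_part(sequences, spec):
    """Pick the sequences one job would process.

    Assumption: sequences are assigned round-robin, so the job with
    spec "i/n" takes every sequence whose 0-based index k has k % n == i.
    """
    i, n = (int(x) for x in spec.split("/"))
    if not 0 <= i < n:
        raise ValueError(f"invalid part spec: {spec}")
    return [seq for k, seq in enumerate(sequences) if k % n == i]
```

Under that assumption, the jobs started with the specs from part_specs(n) together cover every input sequence exactly once.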
Computation options
- -B, --batch=INT
- Batch mode (default 2):
Mode  Offsets   Positions       Genome
0     allocate  mmap            mmap
1     allocate  mmap & preload  mmap
2     allocate  mmap & preload  mmap & preload (default)
3     allocate  allocate        mmap & preload
4     allocate  allocate        allocate
5     expand    allocate        allocate
Note: For a single sequence, all data structures use mmap. If mmap is not available and allocate is not chosen, the program falls back to fileio (very slow).
- --nosplicing
- Turns off splicing (useful for aligning genomic sequences onto a genome)
- --min-intronlength=INT
- Min length for one internal intron (default 9). Below this size, a genomic gap will be considered a deletion rather than an intron.
- -K, --intronlength=INT
- Max length for one internal intron (default 1000000)
- -w, --localsplicedist=INT
- Max length for known splice sites at ends of sequence (default 200000)
- -L, --totallength=INT
- Max total intron length (default 2400000)
- -x, --chimera-margin=INT
- Amount of unaligned sequence that triggers search for the remaining sequence (default 40). Enables alignment of chimeric reads, and may help with some non-chimeric reads. To turn off, set to a large value (greater than the query length).
- -t, --nthreads=INT
- Number of worker threads
- -C, --chrsubsetfile=filename
- User-supplied chromosome subset file
- -c, --chrsubset=string
- Chromosome subset to search
- -z, --direction=STRING
- cDNA direction (sense_force, antisense_force, sense_filter, antisense_filter, or auto (default))
- -H, --trimendexons=INT
- Trim end exons with fewer than given number of matches (in nt, default 12)
- --cross-species
- For cross-species alignments, use a more sensitive search for canonical splicing
- --canonical-mode=INT
- Reward for canonical and semi-canonical introns: 0=low reward, 1=high reward (default), 2=low reward for high-identity sequences and high reward otherwise
- --allow-close-indels=INT
- Allow an insertion and deletion close to each other (0=no, 1=yes (default), 2=only for high-quality alignments)
- --microexon-spliceprob=FLOAT
- Allow microexons only if one of the splice site probabilities is greater than this value (default 0.90)
- --cmetdir=STRING
- Directory for methylcytosine index files (created using cmetindex) (default is location of genome index files specified using -D, -V, and -d)
- --atoidir=STRING
- Directory for A-to-I RNA editing index files (created using atoiindex) (default is location of genome index files specified using -D, -V, and -d)
- --mode=STRING
- Alignment mode: standard (default), cmet-stranded, cmet-nonstranded, atoi-stranded, or atoi-nonstranded. Non-standard modes require you to have previously run the cmetindex or atoiindex programs on the genome.
- -p, --prunelevel
- Pruning level: 0=no pruning (default), 1=poor seqs, 2=repetitive seqs, 3=poor and repetitive
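The --min-intronlength rule described above (a genomic gap below the threshold is treated as a deletion rather than an intron) can be sketched in a few lines of Python. This is only an illustration of the documented threshold, not gmap's implementation; the function name classify_gap is hypothetical.

```python
MIN_INTRON_LENGTH = 9  # documented default of --min-intronlength


def classify_gap(gap_length, min_intronlength=MIN_INTRON_LENGTH):
    """Label a genomic gap following the documented threshold rule.

    Gaps shorter than min_intronlength count as deletions; gaps of at
    least that length count as introns. A zero-length gap is no gap.
    """
    if gap_length < 0:
        raise ValueError("gap_length must be non-negative")
    if gap_length == 0:
        return "none"
    return "deletion" if gap_length < min_intronlength else "intron"
```

With the default of 9, a gap of 8 nt is labeled a deletion and a gap of 9 nt an intron. The sketch ignores the upper limits set by -K and -L.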
Output types
- -S, --summary
- Show summary of alignments only
- -A, --align
- Show alignments
- -3, --continuous
- Show alignment in three continuous lines
- -4, --continuous-by-exon
- Show alignment in three lines per exon
- -Z, --compress
- Print output in compressed format
- -E, --exons=STRING
- Print exons ("cdna" or "genomic")
- -P, --protein_dna
- Print protein sequence (cDNA)
- -Q, --protein_gen
- Print protein sequence (genomic)
- -f, --format=INT
- Other format for output (also note the -A and -S options and other options listed under Output types):
psl (or 1) = PSL (BLAT) format,
gff3_gene (or 2) = GFF3 gene format,
gff3_match_cdna (or 3) = GFF3 cDNA_match format,
gff3_match_est (or 4) = GFF3 EST_match format,
splicesites (or 6) = splicesites output (for GSNAP splicing file),
introns = introns output (for GSNAP splicing file),
map_exons (or 7) = IIT FASTA exon map format,
map_genes (or 8) = IIT FASTA map format,
coords (or 9) = coords in table format,
sampe = SAM format (setting paired_read bit in flag),
samse = SAM format (without setting paired_read bit)
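The numeric and named values of -f listed above can be mapped onto each other when assembling a command line from a script. The following Python sketch is an illustration only: normalize_format and build_command are hypothetical helpers, and the database and file names in the usage note are placeholders. It uses only options documented on this page (-d, -f, -t).

```python
# Numeric aliases exactly as listed under -f, --format above.
FORMAT_ALIASES = {
    "1": "psl",
    "2": "gff3_gene",
    "3": "gff3_match_cdna",
    "4": "gff3_match_est",
    "6": "splicesites",
    "7": "map_exons",
    "8": "map_genes",
    "9": "coords",
}
# Formats that the list above gives by name only.
NAMED_ONLY = {"introns", "sampe", "samse"}


def normalize_format(value):
    """Return the format name for a documented name or numeric alias."""
    text = str(value)
    if text in FORMAT_ALIASES:
        return FORMAT_ALIASES[text]
    if text in FORMAT_ALIASES.values() or text in NAMED_ONLY:
        return text
    raise ValueError(f"format not listed in the man page: {text}")


def build_command(db, query_file, fmt, nthreads=None):
    """Assemble an argument list for a gmap run (nothing is executed)."""
    argv = ["gmap", "-d", db, "-f", normalize_format(fmt)]
    if nthreads is not None:
        argv += ["-t", str(nthreads)]
    argv.append(query_file)
    return argv
```

For example, build_command("mygenome", "reads.fa", 1, nthreads=4) yields an argument list that could be passed to subprocess.run; "mygenome" and "reads.fa" are placeholder names.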
Output options
- -n, --npaths=INT
- Maximum number of paths to show. If set to 0, prints two paths if chimera detected, else one.
- --quiet-if-excessive
- If more than maximum number of paths are found, then nothing is printed.
- --suboptimal-score=INT
- Report only paths whose score is within this value of the best path. By default, if this option is not provided, the program prints all paths found.
- -O, --ordered
- Print output in same order as input (relevant only if there is more than one worker thread)
- -5, --md5
- Print MD5 checksum for each query sequence
- -o, --chimera-overlap
- Overlap to show, if any, at chimera breakpoint
- --failsonly
- Print only failed alignments, those with no results
- --nofails
- Exclude printing of failed alignments
- --fails-as-input
- Print completely failed alignments as input FASTA or FASTQ format
- -V, --usesnps=STRING
- Use database containing known SNPs (in <STRING>.iit, built previously using snpindex) for reporting output
- --split-output=STRING
- Basename for multiple-file output, written separately for nomapping, uniq, and mult (and chimera, if --chimera-margin is selected)
- --output-buffer-size=INT
- Buffer size, in queries, for output thread (default 1000). When the number of results to be printed exceeds this size, the worker threads are halted until the backlog is cleared
- -F, --fulllength
- Assume full-length protein, starting with Met
- --cdsstart=INT
- Translate codons from given nucleotide (1-based)
- -T, --truncate
- Truncate alignment around full-length protein, Met to Stop. Implies -F flag.
- -Y, --tolerant
- Translates cDNA with corrections for frameshifts
Options for SAM output
- --no-sam-headers
- Do not print headers beginning with '@'
- --sam-use-0M
- Insert 0M in CIGAR between adjacent insertions and deletions. Required by Picard, but can cause errors in other tools.
- --read-group-id=STRING
- Value to put into read-group id (RG-ID) field
- --read-group-name=STRING
- Value to put into read-group name (RG-SM) field
- --read-group-library=STRING
- Value to put into read-group library (RG-LB) field
- --read-group-platform=STRING
- Value to put into read-group platform (RG-PL) field
Options for quality scores
- --quality-protocol=STRING
- Protocol for input quality scores. Allowed values:
illumina (ASCII 64-126) (equivalent to -J 64 -j -31)
sanger (ASCII 33-126) (equivalent to -J 33 -j 0)
Default is sanger (no quality print shift). SAM output files should have quality scores in sanger protocol. Or you can specify the print shift with this flag:
- -j, --quality-print-shift=INT
- Shift FASTQ quality scores by this amount in output (default is 0 for sanger protocol; to change Illumina input to Sanger output, select -31)
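The print shift is plain arithmetic on ASCII codes: Illumina qualities start at ASCII 64 and Sanger qualities at ASCII 33, hence the shift of -31. The Python sketch below illustrates that arithmetic only; it is not gmap code, and the function name shift_qualities is hypothetical.

```python
def shift_qualities(quality_string, shift):
    """Shift every FASTQ quality character by `shift` ASCII positions.

    Raises ValueError if a shifted character would leave the printable
    range 33-126 given for the sanger protocol above.
    """
    shifted = []
    for ch in quality_string:
        code = ord(ch) + shift
        if not 33 <= code <= 126:
            raise ValueError(f"shifted quality out of range: {ch!r}")
        shifted.append(chr(code))
    return "".join(shifted)
```

For instance, shift_qualities("hhh", -31) gives "III": ASCII 104 ('h') is quality 40 at offset 64, and ASCII 73 ('I') is quality 40 at offset 33.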
External map file options
- -M, --mapdir=directory
- Map directory
- -m, --map=iitfile
- Map file. If argument is '?' (with the quotes), this lists available map files.
- -e, --mapexons
- Map each exon separately
- -b, --mapboth
- Report hits from both strands of genome
- -u, --flanking=INT
- Show flanking hits (default 0)
- --print-comment
- Show comment line for each hit
Alignment output options
- -N, --nolengths
- No intron lengths in alignment
- -I, --invertmode=INT
- Mode for alignments to genomic (-) strand:
0=Don't invert the cDNA (default)
1=Invert cDNA and print genomic (-) strand
2=Invert cDNA and print genomic (+) strand
- -i, --introngap=INT
- Nucleotides to show on each end of intron (default=3)
- -l, --wraplength=INT
- Wrap length for alignment (default=50)
Help options
- --version
- Show version
- --help
- Show this help message
ENVIRONMENT
- GMAPDB
- genome directory (equivalent to -D)
FILES
- ~/.gmaprc
- configuration file
AUTHOR
Thomas D. Wu and Colin K. Watanabe
REPORTING BUGS
Report bugs to Thomas Wu <twu@gene.com>.
COPYRIGHT
Copyright 2005 Genentech, Inc. All rights reserved.
SEE ALSO
gmap_setup(1), gsnap(1)
http://research-pub.gene.com/gmap/