Except where noted, all gene sets were obtained from the BROAD Institute. Pairwise ortholog/in-paralog mapping to G217B was performed by running INPARANOID[12] with default parameters and no outgroup for each genome. Predicted genes were classified as validated by homology if they were a member of an orthogroup (direct ortholog to a gene in the target genome or in-paralog of a G217B gene with a direct ortholog in the target genome) for at least 3 of the 16 target genomes.

Accession codes

Microarray data have been submitted to the NCBI Gene Expression Omnibus (GEO) under accession number [GEO:GSE31155]. Nucleotide sequence
data for the reported novel TARs are available in the Third Party Annotation Section of the DDBJ/EMBL/GenBank databases under the accession numbers TPA: BK008128-BK008391.

Acknowledgements

This work was supported by the Burroughs Wellcome Fund (Request ID 1006254 to A.S.), U54 AI65359 (to A.S.), 2R01 AI066224-06 (to A.S.), and a Howard
Hughes Medical Institute Early Career Scientist Award (to A.S.). We are grateful to Elaine Mardis at the Washington University Genome Sequencing Center for spearheading the sequencing and annotation of the G217B genome, as well as timely sharing of data and resources. We thank the Sil lab for useful discussions and Davina Hocking Murray for assistance with figures.

Electronic supplementary material

Additional file 1: Table S1. CSV formatted table of gene validation results, corresponding to the classification in Figure 7. Columns: gene – GSC predicted gene name; NAm1ortholog – BROAD gene name for the INPARANOID identified ortholog in H. capsulatum WU24; repeat, wgtaValid, exprValid, and orthoValid – 1 if a gene was classified as repeat or validated by tiling, expression, or homology, respectively, and 0 otherwise. Sequences (G217B_predicted.fasta) and gene structures (G217B_predicted.gff3) of the GSC predictions are mirrored at http://histo.ucsf.edu/downloads/. (CSV 668 KB)

Additional
file 2: Table S2. CSV formatted table giving GSC predicted gene names corresponding to H. capsulatum G217B genes referenced in the text. As noted in the results section, the predicted gene structures are not necessarily identical to experimentally characterized transcripts. (CSV 679 bytes)

Additional file 3: Table S3. GFF3 formatted (tab delimited) table of detected TAR genomic coordinates. Coordinates are given relative to the 11/30/2004 GSC G217B assembly, which is mirrored at http://histo.ucsf.edu/downloads/F_HCG217B.fasta.041130.gz. (GFF3 474 KB)

Additional file 4: Data S4. WIG formatted plus strand tiling probe intensities mapped to the 11/30/2004 GSC G217B assembly, suitable for viewing in GBrowse2 (http://gmod.org/wiki/GBrowse). (WIG 9 MB)

Additional file 5: Data S5.