A total of 145 protein-coding genes are unpaired, with 63 genes in group I, 8 genes in group II, and 74 genes in group III. When we characterized clustered protein-coding genes with small RNAs, we identified 26 clusters, which ranged in size from 3 genes to as many as 14 genes. A substantial number of clusters are in previously identified regions of D1, D2 and D4 genome duplication, which are segmental duplications with each segment flanked on both ends by inverted repeats such as IR/EhERE1/EhLINEs. Clusters that are not in regions of D1-D4 genome duplication are still flanked by repetitive elements at one or both ends. Thus, clustered genes are more likely to be associated with repetitive elements.
To determine whether small RNAs are enriched on paired or clustered protein-coding genes or in the intergenic DNA regions, we calculated the small RNA density on these paired/clustered genes as well as on the intergenic regions between genes. Density was calculated as the number of small RNAs per base pair (small RNA/bp). We found that the density of small RNAs mapping to intergenic regions was significantly lower than the density of small RNAs mapping to paired/clustered genes. This indicates that small RNA synthesis is most likely templated on a given gene rather than on a long template covering several genes. In intergenic regions that had high small RNA density, the small RNAs were often in discrete sections or adjacent to predicted genes. We therefore postulate that these small RNAs map to unannotated genes or UTRs.
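The density measure described above (small RNA/bp) can be illustrated with a minimal Python sketch. The `Region` class, the function names, the coordinates, and the read positions are all hypothetical and are not taken from the study; the sketch also assumes that each small RNA is represented by a single mapped coordinate.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A genomic interval (hypothetical toy representation)."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    kind: str   # "gene" or "intergenic"


def length_bp(region):
    """Length of the region in base pairs."""
    return region.end - region.start + 1


def count_reads(region, read_positions):
    """Number of small RNAs whose mapped coordinate falls in the region."""
    return sum(1 for p in read_positions if region.start <= p <= region.end)


def small_rna_density(region, read_positions):
    """Small RNAs per base pair for a single region."""
    return count_reads(region, read_positions) / length_bp(region)


def pooled_density_by_kind(regions, read_positions):
    """Pooled density (total reads / total bp) for each kind of region."""
    totals = {}
    for region in regions:
        reads, bp = totals.get(region.kind, (0, 0))
        totals[region.kind] = (
            reads + count_reads(region, read_positions),
            bp + length_bp(region),
        )
    return {kind: reads / bp for kind, (reads, bp) in totals.items()}


# Toy data (invented for illustration only): two genes and the spacer between them.
regions = [
    Region("geneA", 1, 100, "gene"),
    Region("spacer", 101, 200, "intergenic"),
    Region("geneB", 201, 300, "gene"),
]
read_positions = [10, 20, 30, 150, 210, 220]

densities = pooled_density_by_kind(regions, read_positions)
print(densities)
```

In this toy example the pooled density on genes (5 small RNAs over 200 bp) is higher than on the intergenic spacer (1 small RNA over 100 bp). A statistical test would still be needed to call such a difference significant, and the sketch does not attempt one.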
In summary, our analysis suggests that the protein-coding genes targeted by small RNAs tend to be in pairs or clusters, and that clusters of genes with small RNAs are more often associated with repetitive elements. The small RNA density on paired/clustered genes versus intergenic regions implies that neither DNA nor a long transcript covering several genes is likely to be used as a template; rather, a transcript derived from each gene is the most likely template.

Small RNA distribution patterns within protein coding genes

We have previously shown that small RNAs that map antisense to protein-coding genes tend to be most abundant toward the 5′ end of genes. However, that analysis was done using a very limited dataset of small RNAs generated from Sanger sequencing.
Our new pyrosequencing dataset enabled us to examine this observation on a larger scale. Using the stringent criterion of 50 small RNAs mapping to a gene, a total of 226 protein-coding genes were categorized as group I. We plotted the small RNA distribution along each gene. There was a clear trend showing that most antisense small RNAs mapped toward the 5′ termini of predicted genes. This trend holds true for most targeted protein-coding genes and was not caused by a few genes with a high number of small RNAs at the 5′ end.
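One way to examine such a positional trend is to express each small RNA position as a fraction of gene length, measured from the 5′ end of the gene, and then to bin those fractions. The following Python sketch illustrates the idea under stated assumptions: the function names, the ten-bin layout, the gene coordinates, and the read positions are hypothetical, and this is not necessarily how the plots in this study were produced.

```python
def relative_position(read_pos, gene_start, gene_end, strand):
    """Fractional position of a read along a gene, 0.0 = 5' end, ~1.0 = 3' end.

    Coordinates are 1-based and inclusive. For a minus-strand gene the 5' end
    is at gene_end, so the offset is measured from there.
    """
    length = gene_end - gene_start + 1
    if strand == "+":
        offset = read_pos - gene_start
    else:
        offset = gene_end - read_pos
    return offset / length


def bin_counts(rel_positions, n_bins=10):
    """Histogram of fractional positions in n_bins equal-width bins."""
    counts = [0] * n_bins
    for x in rel_positions:
        idx = min(int(x * n_bins), n_bins - 1)  # clamp x == 1.0 into last bin
        counts[idx] += 1
    return counts


def five_prime_fraction(rel_positions):
    """Fraction of a gene's reads that fall in the 5' half of the gene."""
    return sum(1 for x in rel_positions if x < 0.5) / len(rel_positions)


# Toy data (invented for illustration only): (start, end, strand, read positions).
genes = {
    "gene1": (1, 1000, "+", [50, 120, 180, 900]),
    "gene2": (2001, 3000, "-", [2950, 2990]),
}

per_gene = {
    name: [relative_position(p, start, end, strand) for p in reads]
    for name, (start, end, strand, reads) in genes.items()
}

# Pooled profile across all genes.
pooled = bin_counts([x for xs in per_gene.values() for x in xs])

# Per-gene summary, to check that the pooled trend is not driven by a few genes.
fractions = {name: five_prime_fraction(xs) for name, xs in per_gene.items()}

print(pooled)
print(fractions)
```

Reporting a per-gene summary alongside the pooled histogram is one simple check that a 5′ bias is shared by most genes and is not produced by a handful of genes with very many reads.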