
The UCSC Xena platform for public and private cancer genomics data visualization and interpretation

bioRxiv preprint first posted online May. 18, 2018; doi: http://dx.doi.org/10.1101/326470. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Mary Goldman (1,+,*), Brian Craft (1,+), Mim Hastie (2), Kristupas Repečka (3), Fran McDade (2), Akhil Kamath (4), Ayan Banerjee (5), Yunhai Luo (6), Dave Rogers (2), Angela N. Brooks (7), Jingchun Zhu (1,*), and David Haussler

1. UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
2. Clever Canary, New York, NY, USA
3. Vilnius University, Vilnius, Lithuania
4. Birla Institute of Technology and Science, Goa, India
5. National Institute of Technology, Durgapur, India
6. Department of Genetics, Stanford University, Stanford, CA, USA
7. Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA

+ These authors contributed equally to this work
* Corresponding author

Introduction

There is a great need for easy-to-use cancer genomics visualization tools for both large public data resources such as TCGA (The Cancer Genome Atlas) (Chin 2011) and the GDC (Genomic Data Commons) (Grossman 2016), as well as smaller-scale datasets generated by individual labs. Commonly used interactive visualization tools are either web-based portals or desktop applications.
Data portals have a dedicated backend and are a powerful means of viewing centrally hosted resource datasets (e.g. Xena's predecessor, the UCSC Cancer Browser (currently retired, Zhu 2009), cBioPortal (Cerami 2012), ICGC (International Cancer Genome Consortium) Data Portal (Zhang 2019), GDC Data Portal (Grossman 2016)). However, researchers wishing to use a data portal to explore their own data have to either redeploy the entire platform, a difficult task even for bioinformaticians, or upload private data to a server outside the user's control, a non-starter for protected patient data such as germline variants (e.g. MAGI (Mutation Annotation and Genome Interpretation, Leiserson 2015), WebMeV (Wang 2017), Ordino (Streit 2019)). Desktop tools can view a user's own data securely (e.g. IGV (Integrative Genomics Viewer, Thorvaldsdóttir 2013), Gitools (Perez-Llamas 2011)), but lack well-maintained, prebuilt files for the ever-evolving and expanding public data resources. This dichotomy between data portals and desktop tools highlights the challenge of using a single platform for both large public data and smaller-scale datasets generated by individual labs.

Complicating this dichotomy is the growing volume and complexity of cancer genomics data resulting from numerous technological advances, including lower-cost high-throughput sequencing and single-cell based technologies. Cancer genomics datasets are now being generated using new assays such as whole-genome sequencing (Campbell 2017), DNA methylation whole-genome bisulfite sequencing (Zhou 2018), and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing, Corces 2018).
Visualizing and exploring these diverse data modalities is important but challenging, especially as many tools have traditionally specialized in only one or perhaps a few data types. And while these complex datasets generate insights individually, integration with other omics datasets is crucial to help researchers discover and validate findings.

UCSC Xena was developed as a high-performance visualization and analysis tool for both large public repositories and private datasets. It was built to scale with current and future growth in data volume and complexity. Xena's unique privacy-aware architecture enables cancer researchers of all computational backgrounds to explore large diverse datasets. Researchers use the same system to explore their own data, together with or separately from the public data, all while keeping private data secure. The system easily supports many tens of thousands of samples and has been tested on up to a million cells. The simple and flexible architecture supports a variety of common and uncommon data types. Xena's unique Visual Spreadsheet visualization integrates gene-centric and genomic-coordinate-centric views across multiple data modalities, providing a deep, comprehensive view of genomic events within a cohort of tumors.

Xena's privacy-aware architecture

UCSC Xena (http://xena.ucsc.edu) has two components: the frontend Xena Browser and the backend Xena Hubs (Figure 1). The web-based Xena Browser empowers biologists to explore data across multiple Xena Hubs with a variety of visualizations and analyses. The backend Xena Hubs host genomics data on laptops, on public servers, behind a firewall, or in the cloud, and are configured to be public or private (Supplemental Figure 1). The Xena Browser receives data simultaneously from multiple Xena Hubs and integrates them into a single coherent visualization within the browser.
There are two types of private Xena Hubs (Supplemental Figure 2). The first type is a hub installed on a user's own computer. It is configured to respond only to requests from the computer's localhost network interface (i.e. http://127.0.0.1). This ensures that the hub only communicates with the computer on which it is installed. The second type of private hub is configured to respond to requests from external computers; however, access to the computer is controlled via a firewall or similar technology. The hub relies on the security provided by the computer to secure the data. This model takes advantage of security that is typically already in place to protect such data, thereby reducing user workload by not requiring re-authorization. Users who host this type of private hub share the URL with authorized individuals. Other users who inadvertently learn of the protected hub will not be able to connect because of the firewall. This second type of Xena Hub is useful for sharing private data within a lab or institution.

In addition to the private hubs, there are public Xena Hubs, that is, hubs that are configured to respond to requests from external computers and are not blocked by a firewall (Supplemental Figure 2). Public hubs enable data sharing by hosting large public resources. While we host a number of public hubs (Supplemental Table 1), users can also set up their own. One example is the Treehouse Hub set up by the Childhood Cancer Initiative to share pediatric cancer RNA-seq gene expression data (Supplemental Note).

Public and private Xena Hubs use the same software; the only difference is in their configuration.
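The three hub types described above differ only in where the hub listens and whether the host is protected. The following is a minimal sketch of that classification, not Xena's actual configuration format: the `HubConfig` class, its field names, and the `hub_kind` function are hypothetical names introduced for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class HubConfig:
    """Hypothetical deployment description (not Xena's real config schema)."""
    bind_host: str         # address the hub listens on, e.g. "127.0.0.1" or "0.0.0.0"
    behind_firewall: bool  # True if the host machine restricts inbound access


def hub_kind(cfg: HubConfig) -> str:
    """Classify a hub into the three cases described in the text."""
    if ipaddress.ip_address(cfg.bind_host).is_loopback:
        return "private-local"       # answers only the machine it runs on
    if cfg.behind_firewall:
        return "private-firewalled"  # reachable only inside the protected network
    return "public"                  # reachable by any external computer


print(hub_kind(HubConfig("127.0.0.1", behind_firewall=False)))
```

In this sketch the same code path serves all three cases, mirroring the point that public and private hubs run identical software and differ only in configuration.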
By default, hubs respond only to requests from the computer's localhost network interface, restricting data access to the host computer. Hubs respond to external network requests only if a user configures them to do so.

Xena Hubs are designed to be turn-key, allowing users who may not be computationally savvy to easily install and use a Xena Hub on their personal computer (https://xena.ucsc.edu/private-hubs/). An interactive setup wizard guides users through the process of installing and running a Xena Hub on their Windows or Mac computers, while a web wizard guides them through the data loading process (https://tinyurl.com/localXenaHub). In addition to these user-friendly wizards, Xena Hubs can be installed and used via the command line on Windows, Mac and Linux machines.

The Xena Browser automatically connects to a default list of the public hubs we host (Supplemental Table 1) and, if it exists, to the private local hub on the user's computer. Users can add a new public or private hub by entering the hub URL on the Data Hubs page of the Xena Browser.

Data integration occurs only within the Xena Browser, keeping private data secure. Genomic data flows from a Xena Hub to the Xena Browser (Figure 1), and the Xena Browser never communicates the data it displays back to any server. The only exception to this model occurs when saving bookmarks as URLs, a feature that allows users to save live views of their current visualization. If a visualization contains only data from public Xena Hubs, users can generate a URL for their current view, which will take researchers back to the live browser session.
Since the data is already public, we store the data in view for each URL on our web server, allowing it to be shared with colleagues or included in presentations. If a view contains any data from a non-public Xena Hub, users are instead required to download the current visualization as a file. By giving users a file instead of a URL, we ensure that we never keep users' private data on our public servers. This file can then be shared via email, etc., and imported back into the Xena Browser to recreate the live browser session. Thus, even when bookmarking, protected data is kept secure through Xena's architecture and the use of private hubs.

In addition to its security advantages, Xena's unique architecture of a decoupled Xena Browser and multiple Xena Hubs enables several other features. First, researchers can easily view their own private data by installing their own Xena Hub. Xena Hubs are lightweight compared to a full-fledged application and install easily on most computers. Second, users can use the same platform to view both public and private data together. Xena integrates data across multiple hubs, allowing users to view data from separate hubs as a coherent data resource (Figure 2). Xena does this while keeping private data secure and avoiding the need to download large public resources. This is especially useful for researchers who wish to view their own analysis results on public data, such as their own clustering calls, but don't want to host a separate version of these resources. Third, the Xena platform scales easily.
As more datasets are generated, more Xena Hubs are added to the network, effectively growing with expanding genomics resources. These advantages, in addition to the security advantages, are a major departure from and innovation over the UCSC Cancer Browser.

Xena Browser visualizations and functionalities

The Xena Browser (https://xenabrowser.net) has a wide variety of visualizations and analyses including survival analyses, scatter plots, bar graphs, statistical tests, genomic signatures, as well as our unique Visual Spreadsheet view.

The Xena Visual Spreadsheet was designed to enable and enhance integration across diverse data modalities, providing researchers a more biologically complete understanding of genomic events and tumor biology. Analogous to an office spreadsheet application, it is a visual representation of a data grid where each column is a slice of genomic or phenotypic data (e.g. gene expression, mutation calls, methylation probes, subtype classifications, or age), and each row is a single entity (e.g. a bulk tumor sample, cell line, or single cell) (Figure 2). Xena's Visual Spreadsheet displays genomic data in a wide variety of gene-centric, coordinate-centric, and feature-centric views (Supplemental Figure 3) for both coding and non-coding regions (Supplemental Figure 4). Dynamic web links to the UCSC Genome Browser give genomic context to any gene, chromosome region, or feature. Researchers can easily re-order the Visual Spreadsheet, hierarchically cluster genes, and zoom in to just a few samples or out to the whole cohort, all leading to an infinite variety of views in real time.
These dynamic views enable the discovery of patterns among genomic and phenotype parameters, even when the data are hosted across multiple data hubs (Figure 2).

The power of the Visual Spreadsheet is its deep data integration. Integration across different data modalities, such as gene expression, copy number variation, and DNA methylation, gives users a more comprehensive view of a genomic event in a tumor sample. For example, Xena's Visual Spreadsheet can help elucidate if higher expression for a gene is driven by copy number amplification, or by a missense mutation (Supplemental Figure 5), or by demethylation and opening of the promoter region as reflected in the DNA methylation and ATAC-seq data (Supplemental Figure 3). Integration across gene- and coordinate-centric views helps users examine genomic events in different chromosome contexts. For example, Xena's Visual Spreadsheet can help elucidate if a gene amplification is part of a chromosomal arm duplication or a focal amplification (Supplemental Figure 6). Integration across genomic and clinical data gives users the ability to make connections between genomic patterns and clinically relevant phenotypes such as subtypes. For example, Xena's Visual Spreadsheet can help elucidate if increased HRD (Homologous Recombination Deficiency) signature scores are enriched in a specific cancer type or subtype (Supplemental Figure 7). Finally, integration across users' own data and public resources on the same samples helps users gain insights into their own data. For example, Xena's Visual Spreadsheet can help a researcher see how a fusion call from the literature relates to the expression of other downstream genes (Figure 2). Because the Xena Browser does not differentiate between public and private data when rendering, it appears to the user that all data come from a single coherent source. These diverse integrations help researchers harness the power of comprehensive genomics studies, either their own or of public resources, driving discovery and a deeper understanding of cancer biology.

In addition to the Visual Spreadsheet, Xena has many other powerful views, analyses, and functionalities. Our text-based search allows users to dynamically highlight, filter, and group samples (Supplemental Figure 8). Researchers use this to search the data on the screen, similar to the 'find' functionality in Microsoft Word. Samples are matched and highlighted in real time as the user types. Researchers can then filter to their samples of interest, or dynamically build subgroups. This is a powerful way to construct sub-populations based on any genomic data for comparison and analysis. Xena also has highly configurable Kaplan-Meier analyses, bar charts, box plots, and scatter plots, all with statistical tests automatically computed (Supplemental Figure 7, Supplemental Figure 9). We support data sharing through bookmarked views and high-resolution PDFs. Genomic signatures are easily built over gene expression data or any other genomic data type.

Performance is critical for interactive visualization tools, especially on the web. Growing sample sizes for genomic experiments have become a challenge for many tools, including the UCSC Cancer Browser. Knowing this, we optimized Xena to support visualizations on many tens of thousands of samples, delivering slices of data in milliseconds to a few seconds. During the first 8 months of 2019, we averaged 1,400 users/week with an average concurrent usage of 3.34 users.
To ensure we will continue to be performant as we scale, we tested our public hubs deployed in the cloud with 50 concurrent requests and measured an average response time of 244 ms.

Supported data types and public Xena Hubs

Today, cancer genomics research studies commonly collect data on somatic mutations, copy number, and gene expression, with other data types being relatively rare. However, as genomics technology advances, we expect these rarer data types to increase in frequency and new data types to be produced. With this in mind, we designed Xena to be able to load any tabular or matrix formatted data, giving us exceptional flexibility in the types of data we can visualize, such as ATAC-seq peak signals (Supplemental Figure 1) and structural variant data (Supplemental Figure 10), a significant advantage over the UCSC Cancer Browser. Currently supported data modalities include somatic and germline SNPs (Single Nucleotide Polymorphisms), INDELs, large structural variants, copy number variation, gene-, transcript-, exon-, protein-, and miRNA-expression, DNA methylation, ATAC-seq peak signals, phenotypes, clinical data, and sample annotations.

UCSC Xena provides interactive online visualization of seminal cancer genomics datasets through multiple public Xena Hubs. We host over 1600 datasets from more than 50 cancer types, including the latest from TCGA, ICGC, TCGA Pan-Cancer Atlas (Hoadley 2018), and the GDC (Supplemental Table 1). Xena Hubs offer a significant and important performance advantage over these resources' native APIs, especially when visualizing more than just a few samples.
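The concurrent-request measurement mentioned earlier in this section (50 simultaneous requests, averaged response time) can be approximated with a small timing harness. This is only a sketch under stated assumptions: it is not the authors' benchmark, `fake_hub_request` is a hypothetical stand-in that sleeps instead of contacting a real hub, and the helper names are invented for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, List


def timed_call(request: Callable[[], object]) -> float:
    """Run one request and return its wall-clock duration in milliseconds."""
    start = time.perf_counter()
    request()
    return (time.perf_counter() - start) * 1000.0


def concurrent_latencies(request: Callable[[], object], n: int = 50) -> List[float]:
    """Issue n requests from n threads and collect the per-request latencies."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(timed_call, request) for _ in range(n)]
        return [f.result() for f in futures]


def fake_hub_request() -> None:
    """Hypothetical stand-in for a hub query; sleeps instead of using the network."""
    time.sleep(0.01)


latencies = concurrent_latencies(fake_hub_request, n=50)
average_ms = mean(latencies)
print(f"{len(latencies)} requests, average {average_ms:.1f} ms")
```

To time a real deployment, `fake_hub_request` would be replaced by a function that performs one actual data query against the hub under test.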
We use custom ETL (Extract-Transform-Load) processes to keep the Xena Hubs updated with the latest data from their respective sources (Supplemental Figure 1). We only download and process the derived datasets from each source, such as gene expression values, leaving the raw sequencing data at their respective locations. Xena complements each of these resources by providing powerful interactive visualizations for these data.

In addition to these well-known resources, we also host results from the UCSC Toil RNAseq recompute compendium, a uniformly re-aligned and re-called gene and transcript expression dataset for all TCGA, TARGET and GTEx samples (Vivian 2017). This dataset allows users to compare gene and transcript expression of TCGA 'tumor' samples to corresponding GTEx 'normal' samples. The UCSC Public hub hosts data curated from various publications.

Conclusion

UCSC Xena complements existing tools including the cBioPortal, ICGC Portal, GDC Portal, IGV, and St. Jude Cloud (Ma 2018) in a number of ways. First, our focus is on providing researchers a lightweight, easy-to-install platform to visualize their own data as well as data from the public sphere. By visualizing data across multiple hubs simultaneously, Xena differentiates itself from other tools by enabling researchers to view their own data together with consortium data while still maintaining privacy. Further, Xena focuses on integrative visualization of multi-omics datasets across different genomic contexts, including genes, genomic elements, or any genomic region, for both coding and non-coding parts of the genome. Finally, Xena is built for performance.
It can easily visualize tens of thousands of samples in a few seconds and has been tested on single-cell data with up to a million cells. With single-cell technology, datasets will become orders of magnitude larger than traditional bulk tumor datasets, and Xena is well positioned to rise to this challenge.

While it is widely recognized that data sharing is key to advancing cancer research, how data is shared can impact the ease of data access. UCSC Xena is designed for cancer researchers both with and without computational expertise to easily share and access data. Users without a strong computational background can explore their own data by installing a Xena Hub on their personal computer using our installation and data upload wizards. Bioinformaticians can install a private or public Xena Hub on a server, in the cloud, or as part of an analysis pipeline, making generated data available in a user-friendly manner that requires little extra effort. Security for private hubs shared with a limited set of researchers is currently provided by protecting the computer itself, for example with a firewall. In the future, we plan to develop hub-wide user authorization capability. This would be useful for collaborative projects that share private data with users across multiple institutions. It will also allow integration with existing federated authentication and authorization services. Data sharing has advanced, and will continue to advance, cancer biology, and Xena is part of the technological ecosystem that supports this.

UCSC Xena is a scalable solution to the rapidly expanding and decentralized cancer genomics data.
Xena's architecture, with its web-browser-based visualization and separate data hubs, allows new projects to easily add their data to the growing public compendium. We support many different data modalities, both now and in the future, by maintaining flexible input formats. Xena excels at showing trends across cohorts of samples, cells, or cell lines. While we have focused on cancer genomics, the platform is general enough to host any functional genomics data. In this age of expanding data resources, Xena's design supports the ongoing data sharing, integration, and visualization needs of the cancer research community.

Acknowledgements

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under award numbers 5U24CA180951-04 and 5U24CA210974-02. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This project has also been made possible in part by grant number 2018-182812 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. We would also like to thank AWS Cloud Credits for Research and Google Summer of Code.

References

Campbell, P. J., Getz, G., Stuart, J. M., Korbel, J. O., Stein, L. D., et al. Pan-cancer analysis of whole genomes. Preprint at https://www.biorxiv.org/content/early/2017/07/12/162784 (2017).

Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery 2, 401-404 (2012).

Chin, L., Hahn, W. C., Getz, G. & Meyerson, M. Making sense of cancer genomic data. Genes & Development 25, 534-555 (2011).

Corces, M. R., Granja, J. M., Shams, S., Louie, B. H., Seoane, J. A., Zhou, W., et al. The chromatin accessibility landscape of primary human cancers. Science 362, 6413 (2018).

Gao, Q., Liang, W. W., Foltz, S. M., et al. Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep. 23, 227-238.e3 (2018).

Grossman, R. L., Heath, A. P., Ferretti, V., Varmus, H. E., Lowy, D. R., et al. Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine 375, 1109-1112 (2016).

GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204-213 (2017).

Hoadley, K. A., Yau, C., Hinoue, T., Wolf, D. M., Lazar, A. J., et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 173, 291-304 (2018).

Leiserson, M. D. M., Gramazio, C. C., Hu, J., Wu, H. T., Laidlaw, D. H. & Raphael, B. J. MAGI: visualization and collaborative annotation of genomic aberrations. Nature Methods 12, 483-484 (2015).

Ma, X., Liu, Y., Liu, Y., Alexandrov, L. B., Edmonson, M. N., et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371-376 (2018).

Perez-Llamas, C. & Lopez-Bigas, N. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 6, e19541 (2011).

Streit, M., Gratzl, S., Stitz, H., Wernitznig, A., Zichner, T. & Haslinger, C. Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines, and tissue samples. Bioinformatics (2019).

Thorvaldsdóttir, H., Robinson, J. T. & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178-192 (2013).

Vivian, J., Rao, A. A., Nothaft, F. A., Ketchum, C., Armstrong, J., et al. Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology 35, 314-316 (2017).

Wang, Y. E., Kuznetsov, L., Partensky, A., Farid, J. & Quackenbush, J. WebMeV: A Cloud Platform for Analyzing and Visualizing Cancer Genomic Data. Cancer Research 77, e11-e14 (2017).

Zhang, J., Bajari, R., Andric, D., Gerthoffert, F., Lepsa, A., Nahal-Bose, H., Stein, L. D. & Ferretti, V. The International Cancer Genome Consortium Data Portal. Nature Biotechnology 37, 367-369 (2019).

Zhou, W., Dinh, H. Q., Ramjan, Z., Weisenberger, D. J., Nicolet, et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nature Genetics 50, 591-602 (2018).

Zhu, J., Sanborn, J. Z., Benz, S., Szeto, C., Hsu, F., Kuhn, R. M., Karolchik, D., Archie, J., Lenburg, M. E., Esserman, L. J., et al. The UCSC Cancer Genomics Browser. Nature Methods 6, 239-240 (2009).

Figure 1. Xena's architecture to securely join public and private data. Data always flows from the Xena Hubs to the Xena Browser for visualization and integration. a) The user's web browser (e.g. Chrome) requests the Xena Browser code and runs it. b) Using the Xena Browser, the user requests a visualization, initiating a request for data from the Xena Browser's list of public hubs.
Simultaneous with this request, the Xena Browser requests data from the private local hub on the user's computer. c) The Xena Browser code combines data from all Xena Hubs together into one coherent visualization. The user can then interact with the visualization to trigger a new data request.

[Figure 2 image. Panels a, b, c. Visual Spreadsheet columns: A Sample, B Fusion (ERG-TMPRSS2; Yes/No), C CNV (chr21), D gene expression (ERG), E exon expression (ERG), F mutation (SPOP). Legend units: log2(tumor/normal), log2(count+1), log2(RPKM+1). Mutation classes: Deleterious, Missense.]

Figure 2. An example Xena Browser Visual Spreadsheet examining published ERG (ETS Transcription Factor) - TMPRSS2 (Transmembrane Serine Protease 2) fusion calls in TCGA PRAD (prostate cancer) by combining data from local and public Xena Hubs. a) A user downloaded ERG-TMPRSS2 fusion calls on TCGA PRAD samples from Gao et al. 2018 (n=492) and loaded the data into their own local Xena Hub. b) TCGA copy number, gene expression and mutation data from the same samples are available via the public TCGA hub. c) The user then compared the fusion calls to the public data using the Xena Browser Visual Spreadsheet. Column B is the fusion call from Gao et al. Column C is copy number variation data, zoomed in to a region of chromosome 21 (37-44Mb). Amplifications are in red and deletions are in blue. The diagram at the top shows genes along the chromosome, where red genes are on the positive strand and blue are on the negative strand.
Column D is ERG gene expression and Column E is ERG exon expression. Expression is colored red to green for high to low expression. The gene diagram at the top shows exons as boxes, with tall coding regions and shorter untranslated regions. Column F is SPOP (Speckle Type BTB/POZ Protein) mutation status and also has a gene diagram at the top. The position of each mutation is marked in relation to the gene diagram and colored by its functional impact: deleterious mutations are red and missense mutations are blue. We can see that the fusion calls are highly consistent with the characteristic overexpression of ERG (columns D, E). However, only a subset of the samples in which a fusion was called also show the fusion event in the copy number data, as an intra-chromosomal deletion on chromosome 21 that fuses TMPRSS2 to ERG (column C). This observation is consistent with the 63.3% validation rate described in Gao et al. 2018. SPOP mutations (blue tick marks in column F) are mutually exclusive with the fusion event. Rows are sorted by the left-most data column (column B) and subsorted on columns thereafter.
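The Figure 2 workflow, joining a column from a local hub with a column from a public hub on shared sample IDs and then sorting rows by the left-most data column with later columns as tie-breakers, can be sketched in a few lines. This is an illustrative toy, not the Xena Browser's implementation: the sample IDs and values are invented, the function names are hypothetical, and the sort direction (fusion "Yes" first, higher expression first, missing values last) is an assumption the caption does not specify.

```python
from typing import Dict, List, Optional, Tuple

# Invented toy data standing in for two hubs that share sample IDs.
local_hub: Dict[str, str] = {"S1": "Yes", "S2": "No", "S3": "Yes"}  # fusion calls
public_hub: Dict[str, float] = {"S1": 11.2, "S2": 7.9, "S3": 12.0, "S4": 8.1}  # ERG expression

Row = Tuple[str, Optional[str], Optional[float]]


def build_rows(fusion: Dict[str, str], expression: Dict[str, float]) -> List[Row]:
    """Join both columns on sample ID; a sample absent from a hub gets None."""
    samples = sorted(set(fusion) | set(expression))
    return [(s, fusion.get(s), expression.get(s)) for s in samples]


def sort_rows(rows: List[Row]) -> List[Row]:
    """Sort by the fusion column first, then by expression as a sub-sort."""
    def key(row: Row) -> Tuple[int, float]:
        _, call, expr = row
        rank = {"Yes": 0, "No": 1}.get(call, 2)              # missing calls last
        expr_key = -expr if expr is not None else float("inf")  # high expression first
        return (rank, expr_key)
    return sorted(rows, key=key)


for sample, call, expr in sort_rows(build_rows(local_hub, public_hub)):
    print(sample, call, expr)
```

Because the join happens on the client side in this sketch, the private fusion calls never need to be sent to the public hub, which is the property the architecture relies on.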


References (42)

  • Z. Stephens, Sk Lee, F. Faghri, R. Campbell, ChengXiang Zhai, Miles Efron, R. Iyer, M. Schatz, S. Sinha, G. Robinson (2015)

    Big Data: Astronomical or Genomical?

    PLoS Biology, 13

  • Eric Be, Eric Be, E. Collisson, Joshua Campbell, Angela Brooks, A. Berger, William Lee, J. Chmielecki, D. Beer, L. Cope, C. Creighton, Ludmila Danilova, L. Ding, G. Getz, P. Hammerman, D. Hayes, Bryan Hernandez, J. Herman, J. Heymach, I. Jurisica, R. Kucherlapati, D. Kwiatkowski, M. Ladanyi, Gordon Robertson, N. Schultz, R. Shen, Rileen Sinha, C. Sougnez, M. Tsao, W. Travis, J. Weinstein, D. Wigle, M. Wilkerson, Andy Chu, A. Cherniack, Angela Hadjipanayis, Mara Rosenberg, D. Weisenberger, P. Laird, Amie Radenbaugh, Singer Ma, Joshua Stuart, Lauren Byers, S. Baylin, R. Govindan, M. Meyerson, Mara Li, S. Gabriel, K. Cibulskis, Jaegil Kim, C. Stewart, Lee Lichtenstein, E. Lander, M. Lawrence, Cyriac M, C. Kandoth, R. Fulton, L. Fulton, M. McLellan, R. Wilson, K. Ye, C. Fronick, Christopher Maher, Christopher Miller, M. Wendl, Christopher Cabanski, E. Mardis, C. Wheeler, David Wheeler, Miruna Dhalla, M. Balasundaram, Y. Butterfield, R. Carlsen, E. Chuah, Noreen Dhalla, R. Guin, Carrie Hirst, Darlene Lee, H. Li, Michael Mayo, Richard Moore, A. Mungall, J. Schein, Payal Sipahimalani, Angela Tam, R. Varhol, A. Robertson, N. Wye, N. Thiessen, R. Holt, Steven Jones, M. Marra, Joshua Hodi, M. Imieliński, R. Onofrio, Eran Hodis, Travis Zack, E. Helman, Chandra Pedamallu, J. Mesirov, G. Saksena, S. Schumacher, S. Carter, L. Garraway, R. Beroukhim, Angela Re, Semin Lee, Harshad Mahadeshwar, A. Pantazi, A. Protopopov, X. Ren, S. Seth, Xingzhi Song, Jiabin Tang, Lixing Yang, Jianhua Zhang, Peng-Chieh Chen, Michael Parfenov, Andrew Xu, Netty Santoso, L. Chin, Peter Park, Katherine T, K. Hoadley, J. Auman, S. Meng, Yan Shi, Elizabeth Buda, S. Waring, Umadevi Veluvolu, Donghui Tan, P. Mieczkowski, Corbin Jones, J. Simons, Matthew Soloway, T. Bodenheimer, S. Jefferys, J. Roach, A. Hoyle, Junyuan Wu, S. Balu, Darshan Singh, J. Prins, J. Marron, J. Parker, C. Perou, Jinze Liu, Leslie Bootwalla, D. Maglinte, Philip Lai, M. Bootwalla, D. Berg, Timothy Jr, Mara Mallard, Juok Cho, D. 
Dicara, David Heiman, Pei Lin, William Mallard, Douglas Voet, Hailei Zhang, L. Zou, M. Noble, N. Gehlenborg, H. Thorvaldsdóttir, Marc-Danie Nazaire, Jim Robinson, William Gross, B. Aksoy, G. Ciriello, B. Taylor, Gideon Dresdner, Jianjiong Gao, Benjamin Gross, V. Seshan, B. Reva, Rileen Sinha, S. Sumer, Nils Weinhold, C. Sander, Sam Haussler, S. Ng, Jingchun Zhu, C. Benz, C. Yau, D. Haussler, P. Spellman, Matthew Perou, P. Kimes, Bradley Liu, B. Broom, Jing Wang, Yiling Lu, Patrick Ng, L. Diao, Wenbin Liu, C. Amos, R. Akbani, G. Mills, Erin Gardn, Erin Curley, J. Paulauskis, Kevin Lau, S. Morris, T. Shelton, D. Mallery, J. Gardner, R. Penny, Charles Tarvin, Charles Saller, Katherine Tarvin, W. Richards, Robert Bryant, R. Cerfolio, A. Bryant, Daniel Farver, D. Raymond, N. Pennell, C. Farver, Christine Raben, Christine Czerwinski, L. Huelsenbeck-Dill, M. Iacocca, N. Petrelli, B. Rabeno, Jennifer Brown, Thomas Bauer, Oleg Nemirovich-Dan, O. Dolzhanskiy, O. Potapova, D. Rotin, Olga Voronina, Elena Nemirovich-Danchenko, K. Fedosenko, Anthony Sica, A. Gal, M. Behera, S. Ramalingam, G. Sica, Douglas Weaver, D. Flieder, J. Boyd, J. Weaver, Bernard Thinh, B. Kohl, Dang Thinh, G. Sandusky, Hartmut Juhl, E. Duhig, Peter Brock, P. Illei, E. Gabrielson, James Shin, Beverly Lee, Kristen Rodgers, D. Trusty, M. Brock, Christina Sullivan, C. Williamson, E. Burks, K. Rieger-Christ, A. Holway, T. Sullivan, Dennis Kosari, M. Asiedu, F. Kosari, William Rusch, N. Rekhtman, M. Zakowski, V. Rusch, Paul Owusu-Sarpong, Paul Zippile, James Suh, H. Pass, C. Goparaju, Y. Owusu-Sarpong, John Albert, John Bartlett, S. Kodeeswaran, J. Parfitt, H. Sekhon, Monique Albert, John Myers, J. Eckman, J. Myers, Richard Gaudioso, R. Cheney, Carl Morrison, Carmelo Gaudioso, Jeffrey Liptay, J. Borgia, P. Bonomi, M. Pool, M. Liptay, Fedor Zaytseva, F. Moiseenko, I. Zaytseva, Hendrik Muley, H. Dienemann, M. Meister, P. Schnabel, T. Muley, M. Peifer, Carmen Egea, C. 
Gomez-Fernandez, Lynn Herbert, Sophie Egea, Mei Kimryn, Mei Huang, L. Thorne, L. Boice, Ashley Salazar, W. Funkhouser, W. Rathmell, Rajiv Siegfried, R. Dhir, S. Yousem, S. Dacic, F. Schneider, J. Siegfried, R. Hajek, Mark Meyers, M. Watson, Sandra McDonald, B. Meyers, Belinda Bowman, B. Clarke, I. Yang, K. Fong, L. Hunter, M. Windsor, R. Bowman, Solange Letovanec, Solange Peters, I. Letovanec, K. Khan, Mark Pot, M. Jensen, E. Snyder, Deepak Srinivasan, A. Kahn, J. Baboud, D. Pot, Kenna Tarnuz, Kenna Shaw, Margi Sheth, Tanja Davidsen, John Demchok, Liming Yang, Zhining Wang, R. Tarnuzzer, Jean Zenklusen, Bradley Sofia, B. Ozenberger, H. Sofia, William Illei, E. Duhig (2014)

    Comprehensive molecular profiling of lung adenocarcinoma

    Nature, 511

  • M. Corces, Jeffrey Granja, Shadi Shams, Bryan Louie, J. Seoane, Wanding Zhou, T. Silva, C. Groeneveld, Christopher Wong, S. Cho, Ansuman Satpathy, Maxwell Mumbach, K. Hoadley, A. Robertson, Nathan Sheffield, Ina Felau, Mauro Castro, B. Berman, L. Staudt, J. Zenklusen, P. Laird, C. Curtis, W. Greenleaf, Howard Chang (2018)

    The chromatin accessibility landscape of primary human cancers

    Science, 362

  • Quin Wills, A. Mead (2015)

    Application of single-cell genomics in cancer: promise and challenges

    Human Molecular Genetics, 24

  • M. Streit, S. Gratzl, Holger Stitz, Andreas Wernitznig, T. Zichner, C. Haslinger (2019)

    Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines and tissue samples

    Bioinformatics, 35

  • T. Hudson, W. Anderson, Axel Artez, A. Barker, C. Bell, R. Bernabé, M. Bhan, F. Calvo, I. Eerola, D. Gerhard, A. Guttmacher, M. Guyer, F. Hemsley, Jennifer Jennings, D. Kerr, P. Klatt, Patrik Kolar, Jun Kusada, D. Lane, F. Laplace, Lu Youyong, G. Nettekoven, B. Ozenberger, Jane Peterson, T. Rao, J. Remacle, A. Schafer, T. Shibata, M. Stratton, J. Vockley, Koichi Watanabe, Huanming Yang, M. Yuen, B. Knoppers, M. Bobrow, A. Cambon-Thomsen, L. Dressler, S. Dyke, Y. Joly, Kazuto Kato, Karen Kennedy, Pilar Nicolàs, M. Parker, E. Rial‐Sebbag, C. Romeo-Casabona, K. Shaw, S. Wallace, G. Wiesner, N. Zeps, P. Lichter, A. Biankin, C. Chabannon, L. Chin, B. Clement, E. Álava, F. Degos, M. Ferguson, Peter Geary, D. Hayes, A. Johns, A. Kasprzyk, H. Nakagawa, R. Penny, M. Piris, R. Sarin, A. Scarpa, M. Vijver, P. Futreal, H. Aburatani, M. Bayés, David Botwell, P. Campbell, X. Estivill, S. Grimmond, I. Gut, M. Hirst, C. López-Otín, P. Majumder, M. Marra, J. McPherson, Z. Ning, X. Puente, Y. Ruan, H. Stunnenberg, H. Swerdlow, V. Velculescu, R. Wilson, H. Xue, Liu Yang, P. Spellman, Gary Bader, P. Boutros, Paul Flicek, G. Getz, R. Guigó, Guangwu Guo, D. Haussler, S. Heath, T. Hubbard, T. Jiang, Steven Jones, Qibin Li, N. López-Bigas, Ruibang Luo, L. Muthuswamy, B. Ouellette, J. Pearson, V. Quesada, Benjamin Raphael, C. Sander, T. Speed, Lincoln Stein, Joshua Stuart, J. Teague, Y. Totoki, T. Tsunoda, A. Valencia, D. Wheeler, Honglong Wu, Shancen Zhao, Guangyu Zhou, M. Lathrop, G. Thomas, Teruhiko Yoshida, M. Axton, C. Gunter, L. Miller, Junjun Zhang, Syed Haider, Jianxin Wang, C. Yung, A. Cros, Yong Liang, S. Gnaneshan, J. Guberman, J. Hsu, D. Chalmers, K. Hasel, T. Kaan, W. Lowrance, T. Masui, L. Rodriguez, C. Vergely, D. Bowtell, N. Cloonan, A. deFazio, J. Eshleman, D. Etemadmoghadam, B. Gardiner, J. Kench, R. Sutherland, M. Tempero, N. Waddell, P. Wilson, S. Gallinger, M. Tsao, P. Shaw, G. Petersen, D. Mukhopadhyay, R. DePinho, S. Thayer, K. Shazand, Timothy Beck, M. 
Sam, Lee Timms, Vanessa Ballin, Youyong Lu, J. Ji, Xiuqing Zhang, Feng Chen, Xueda Hu, Qi Yang, G. Tian, Lianhai Zhang, Xiaofang Xing, Xianghong Li, Zheng‐gang Zhu, Yingyan Yu, Jun Yu, J. Tost, P. Brennan, I. Holcatova, D. Zaridze, A. Brazma, L. Egevard, E. Prokhortchouk, R. Banks, M. Uhlén, Juris Viksna, F. Pontén, K. Skryabin, E. Birney, Å. Borg, A. Børresen-Dale, C. Caldas, J. Foekens, Sancha Martin, J. Reis-Filho, A. Richardson, C. Sotiriou, G. Thoms, L. Veer, D. Birnbaum, H. Blanché, Pascal Boucher, S. Boyault, Jocelyne Masson-Jacquemier, I. Pauporté, X. Pivot, A. Vincent-Salomon, E. Tabone, C. Theillet, I. Treilleux, P. Bioulac-Sage, T. Decaens, D. Franco, M. Gut, Didier Samuel, J. Zucman‐Rossi, R. Eils, B. Brors, J. Korbel, A. Korshunov, P. Landgraf, H. Lehrach, S. Pfister, B. Radlwimmer, G. Reifenberger, Michael Taylor, C. Kalle, P. Majumder, P. Pederzoli, R. Lawlor, M. Delledonne, A. Bardelli, T. Gress, D. Klimstra, G. Zamboni, Y. Nakamura, S. Miyano, Akihiro Fujimoto, E. Campo, S. Sanjosé, E. Montserrat, M. González-Díaz, P. Jares, H. Himmelbauer, S. Beà, S. Aparicio, D. Easton, F. Collins, C. Compton, E. Lander, W. Burke, A. Green, S. Hamilton, O. Kallioniemi, T. Ley, E. Liu, B. Wainwright (2010)

    International network of cancer genome projects

    Nature, 464

  • Xiaotu Ma, Yu Liu, Yanling Liu, L. Alexandrov, M. Edmonson, Charles Gawad, Xin Zhou, Yongjin Li, Michael Rusch, J. Easton, R. Huether, V. Gonzalez-Pena, M. Wilkinson, L. Hermida, S. Davis, Edgar Sioson, S. Pounds, Xueyuan Cao, R. Ries, Zhaoming Wang, Xiang Chen, Li Dong, S. Diskin, Malcolm Smith, J. Auvil, P. Meltzer, C. Lau, E. Perlman, J. Maris, Soheil Meschinchi, S. Hunger, D. Gerhard, Jinghui Zhang (2018)

    Pan-cancer genome and transcriptome analyses of 1,699 pediatric leukemias and solid tumors

    Nature, 555

  • Monica Kong-Beltran, S. Seshagiri, J. Zha, Wenjing Zhu, K. Bhawe, N. Mendoza, Thomas Holcomb, Kanan Pujara, Jeremy Stinson, L. Fu, Christophe Severin, Linda Rangell, R. Schwall, L. Amler, D. Wickramasinghe, R. Yauch (2006)

    Somatic mutations lead to an oncogenic deletion of met in lung cancer.

    Cancer research, 66 1

  • R. Grossman, Allison Heath, Vincent Ferretti, H. Varmus, D. Lowy, W. Kibbe, L. Staudt (2016)

    Toward a Shared Vision for Cancer Genomic Data.

    The New England journal of medicine, 375 12

  • Junjun Zhang, Joachim Baran, A. Cros, Jonathan Guberman, Syed Haider, J. Hsu, Yong Liang, Elena Rivkin, Jianxin Wang, Brett Whitty, Marie Wong-Erasmus, Long Yao, Arek Kasprzyk (2011)

    International Cancer Genome Consortium Data Portal—a one-stop shop for cancer genomics data

    Database: The Journal of Biological Databases and Curation, 2011

  • Anthony Mathelier, Calvin Lefebvre, Allen Zhang, David Arenillas, Jiarui Ding, W. Wasserman, Sohrab Shah (2014)

    Cis-regulatory somatic mutations and gene-expression alteration in B-cell lymphomas

    Genome Biology, 16

  • J. Barretina, G. Caponigro, Nicolas Stransky, K. Venkatesan, Adam Margolin, Sungjoon Kim, Christopher Wilson, J. Lehár, G. Kryukov, D. Sonkin, Anupama Reddy, Manway Liu, Lauren Murray, M. Berger, J. Monahan, Paula Morais, Jodi Meltzer, A. Korejwa, Judit Jané-Valbuena, F. Mapa, Joseph Thibault, Eva Bric-Furlong, P. Raman, Aaron Shipway, I. Engels, Jill Cheng, Guoying Yu, Jianjun Yu, Peter Aspesi, M. Silva, Kalpana Jagtap, Michael Jones, Li Wang, C. Hatton, E. Palescandolo, Supriya Gupta, Scott Mahan, C. Sougnez, R. Onofrio, T. Liefeld, L. Macconaill, W. Winckler, Michael Reich, Nanxin Li, J. Mesirov, S. Gabriel, G. Getz, K. Ardlie, Vivien Chan, V. Myer, B. Weber, Jeff Porter, M. Warmuth, P. Finan, Jennifer Harris, M. Meyerson, T. Golub, Michael Morrissey, W. Sellers, R. Schlegel, L. Garraway (2012)

    The Cancer Cell Line Encyclopedia enables predictive modeling of anticancer drug sensitivity

    Nature, 483

  • Christian Perez-Llamas, N. López-Bigas (2011)

    Gitools: Analysis and Visualisation of Genomic Data Using Interactive Heat-Maps

    PLoS ONE, 6

  • K. Hoadley, C. Yau, T. Hinoue, D. Wolf, A. Lazar, E. Drill, R. Shen, Alison Taylor, A. Cherniack, V. Thorsson, R. Akbani, R. Bowlby, Christopher Wong, M. Wiznerowicz, F. Sánchez-Vega, A. Robertson, B. Schneider, M. Lawrence, H. Noushmehr, T. Malta, Joshua Stuart, C. Benz, P. Laird (2018)

    Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer

    Cell, 173

  • F. Sánchez-Vega, Marco Mina, J. Armenia, Walid Chatila, Augustin Luna, Konnor La, Sofia Dimitriadoy, David Liu, Havish Kantheti, S. Saghafinia, D. Chakravarty, Foysal Daian, Qingsong Gao, Matthew Bailey, Wen-Wei Liang, S. Foltz, I. Shmulevich, L. Ding, Zachary Heins, Angelica Ochoa, Benjamin Gross, Jianjiong Gao, Hongxin Zhang, Ritika Kundra, C. Kandoth, Istemi Bahceci, L. Dervishi, U. Dogrusoz, Wanding Zhou, Hui Shen, P. Laird, G. Way, C. Greene, Han Liang, Yonghong Xiao, Chen Wang, A. Iavarone, A. Berger, T. Bivona, A. Lazar, G. Hammer, T. Giordano, L. Kwong, G. McArthur, Chenfei Huang, A. Tward, M. Frederick, F. McCormick, M. Meyerson, E. Allen, A. Cherniack, G. Ciriello, C. Sander, N. Schultz, Samantha Caesar-Johnson, John Demchok, Ina Felau, M. Kasapi, M. Ferguson, C. Hutter, H. Sofia, R. Tarnuzzer, Zhining Wang, Liming Yang, J. Zenklusen, J. Zhang, Sudha Chudamani, Jia Liu, Laxmi Lolla, R. Naresh, T. Pihl, Qiang Sun, Yunhu Wan, Ye Wu, Juok Cho, T. DeFreitas, S. Frazer, N. Gehlenborg, G. Getz, David Heiman, Jaegil Kim, M. Lawrence, Pei Lin, S. Meier, M. Noble, G. Saksena, Douglas Voet, Hailei Zhang, Brady Bernard, N. Chambwe, Varsha Dhankani, T. Knijnenburg, R. Kramer, Kalle Leinonen, Yuexin Liu, Michael Miller, Sheila Reynolds, V. Thorsson, Wei Zhang, R. Akbani, B. Broom, A. Hegde, Z. Ju, R. Kanchi, Anil Korkut, Jun Li, Shiyun Ling, Wenbin Liu, Yiling Lu, G. Mills, Kwok-Shing Ng, A. Rao, Michael Ryan, Jing Wang, J. Weinstein, Jiexin Zhang, Adam Abeshouse, I. Bruijn, Benjamin Gross, Zachary Heins, Konnor La, M. Ladanyi, Moriah Nissan, Sarah Phillips, E. Reznik, R. Sheridan, S. Sumer, Yichao Sun, B. Taylor, Jioajiao Wang, Pavana Anur, Myron Peto, P. Spellman, C. Benz, Joshua Stuart, Christopher Wong, C. Yau, D. Hayes, J. Parker, M. Wilkerson, Adrian Ally, M. Balasundaram, R. Bowlby, Denise Brooks, R. Carlsen, E. Chuah, Noreen Dhalla, Robert Holt, Steven Jones, K. Kasaian, Darlene Lee, Yussanne Ma, M. Marra, Michael Mayo, Richard Moore, A. Mungall, K. 
Mungall, A. Robertson, S. Sadeghi, J. Schein, Payal Sipahimalani, Angela Tam, N. Thiessen, Kane Tse, Tina Wong, Ashton Berger, R. Beroukhim, C. Cibulskis, S. Gabriel, G. Gao, G. Ha, S. Schumacher, J. Shih, M. Kucherlapati, R. Kucherlapati, Stephen Baylin, L. Cope, Ludmila Danilova, M. Bootwalla, Phillip Lai, D. Maglinte, D. Berg, D. Weisenberger, J. Auman, S. Balu, T. Bodenheimer, C. Fan, K. Hoadley, A. Hoyle, S. Jefferys, Corbin Jones, S. Meng, P. Mieczkowski, Lisle Mose, Amy Perou, C. Perou, J. Roach, Yan Shi, J. Simons, Tara Skelly, Matthew Soloway, Donghui Tan, Umadevi Veluvolu, Huihui Fan, T. Hinoue, Michelle Bellair, K. Chang, K. Covington, C. Creighton, H. Dinh, H. Doddapaneni, L. Donehower, J. Drummond, R. Gibbs, R. Glenn, Walker Hale, Yi Han, Jianhong Hu, V. Korchina, Sandy Lee, L. Lewis, Wei Li, Xiuping Liu, M. Morgan, Donna Morton, D. Muzny, J. Santibanez, Margi Sheth, E. Shinbrot, Linghua Wang, Min Wang, D. Wheeler, Liu Xi, Fengmei Zhao, J. Hess, Elizabeth Appelbaum, Matthew Bailey, M. Cordes, C. Fronick, L. Fulton, R. Fulton, E. Mardis, M. McLellan, Christopher Miller, Heather Schmidt, R. Wilson, D. Crain, Erin Curley, J. Gardner, Kevin Lau, D. Mallery, S. Morris, J. Paulauskis, R. Penny, C. Shelton, T. Shelton, M. Sherman, E. Thompson, P. Yena, Jay Bowen, J. Gastier-Foster, M. Gerken, K. Leraas, T. Lichtenberg, N. Ramirez, L. Wise, E. Zmuda, N. Corcoran, T. Costello, C. Hovens, A. Carvalho, A. Carvalho, José Fregnani, A. Longatto-Filho, R. Reis, C. Scapulatempo-Neto, H. Silveira, D. Vidal, Andrew Burnette, J. Eschbacher, Beth Hermes, Ardene Noss, Rosy Singh, Matthew Anderson, Patricia Castro, M. Ittmann, D. Huntsman, B. Kohl, X. Le, Richard Thorp, C. Andry, Elizabeth Duffy, V. Lyadov, O. Paklina, G. Setdikova, A. Shabunin, M. Tavobilov, C. McPherson, R. Warnick, R. Berkowitz, Daniel Cramer, C. Feltmate, N. Horowitz, A. Kibel, M. Muto, C. Raut, A. Malykh, J. Barnholtz-Sloan, Wendi Barrett, K. Devine, J. Fulop, Q. Ostrom, K. Shimmel, Yingli Wolinsky, A. 
Sloan, A. Rose, F. Giuliante, M. Goodman, B. Karlan, C. Hagedorn, J. Eckman, Jodi Harr, J. Myers, Kelinda Tucker, L. Zach, B. Deyarmin, Hai Hu, L. Kvecher, C. Larson, R. Mural, S. Somiari, A. Vicha, T. Zelinka, Joseph Bennett, M. Iacocca, B. Rabeno, P. Swanson, M. Latour, L. Lacombe, B. Têtu, A. Bergeron, Mary McGraw, S. Staugaitis, J. Chabot, H. Hibshoosh, Antonia Sepulveda, Tao Su, Timothy Wang, O. Potapova, Olga Voronina, L. Desjardins, O. Mariani, S. Roman-Roman, X. Sastre, M. Stern, F. Cheng, S. Signoretti, A. Berchuck, D. Bigner, E. Lipp, J. Marks, S. McCall, R. McLendon, A. Secord, A. Sharp, M. Behera, D. Brat, Amy Chen, K. Delman, S. Force, F. Khuri, K. Magliocca, S. Maithel, J. Olson, T. Owonikoko, A. Pickens, S. Ramalingam, Dong-Myung Shin, G. Sica, Erwin Meir, Hongzhen Zhang, Wil Eijckenboom, A. Gillis, E. Korpershoek, L. Looijenga, W. Oosterhuis, H. Stoop, K. Kessel, E. Zwarthoff, C. Calatozzolo, L. Cuppini, S. Cuzzubbo, F. DiMeco, G. Finocchiaro, L. Mattei, A. Perin, B. Pollo, Chu Chen, J. Houck, Pawadee Lohavanichbutr, A. Hartmann, C. Stoehr, R. Stoehr, H. Taubert, S. Wach, B. Wullich, W. Kycler, D. Murawa, M. Wiznerowicz, K. Chung, W. Edenfield, Julie Martin, E. Baudin, G. Bubley, R. Bueno, A. Rienzo, W. Richards, S. Kalkanis, T. Mikkelsen, H. Noushmehr, L. Scarpace, N. Girard, M. Aymerich, E. Campo, E. Giné, A. Guillermo, N. Bang, Phan Hanh, Bui Phu, Yufang Tang, H. Colman, K. Evason, P. Dottino, J. Martignetti, H. Gabra, H. Juhl, Teniola Akeredolu, Serghei Stepa, D. Hoon, Keun-Young Ahn, K. Kang, F. Beuschlein, A. Breggia, M. Birrer, D. Bell, M. Borad, A. Bryce, Erik Castle, V. Chandan, J. Cheville, J. Copland, M. Farnell, T. Flotte, N. Giama, T. Ho, Michael Kendrick, J. Kocher, Karla Kopp, C. Moser, D. Nagorney, D. O'Brien, B. O'neill, T. Patel, G. Petersen, F. Que, M. Rivera, L. Roberts, R. Smallridge, T. Smyrk, M. Stanton, R. Thompson, M. Torbenson, J. Yang, Lizhi Zhang, F. Brimo, J. Ajani, Ana Gonzalez, C. Behrens, J. Bondaruk, R. Broaddus, B. 
Czerniak, B. Esmaeli, J. Fujimoto, J. Gershenwald, C. Guo, Christopher Logothetis, F. Meric-Bernstam, C. Morán, L. Ramondetta, D. Rice, A. Sood, P. Tamboli, T. Thompson, P. Troncoso, A. Tsao, I. Wistuba, Candace Carter, L. Haydu, P. Hersey, V. Jakrot (2018)

    Oncogenic Signaling Pathways in The Cancer Genome Atlas.

    Cell, 173 2

  • D. Robinson, Yi-Mi Wu, R. Lonigro, Pankaj Vats, E. Cobain, Jessica Everett, Xuhong Cao, Erica Rabban, Chandan Kumar-Sinha, V. Raymond, S. Schuetze, A. Alva, J. Siddiqui, R. Chugh, F. Worden, M. Zalupski, J. Innis, R. Mody, S. Tomlins, David Lucas, L. Baker, N. Ramnath, A. Schott, Daniel Hayes, J. Vijai, K. Offit, E. Stoffel, J. Roberts, David Smith, L. Kunju, M. Talpaz, M. Cieslik, A. Chinnaiyan (2017)

    Integrative Clinical Genomics of Metastatic Cancer

    Nature, 548

  • John Vivian, A. Rao, Frank Nothaft, Christopher Ketchum, J. Armstrong, Adam Novak, Jacob Pfeil, Jake Narkizian, A. Deran, Audrey Musselman-Brown, Hannes Schmidt, P. Amstutz, Brian Craft, M. Goldman, K. Rosenbloom, M. Cline, Brian O’Connor, M. Hanna, Chet Birger, W. Kent, D. Patterson, A. Joseph, Jingchun Zhu, S. Zaranek, G. Getz, D. Haussler, B. Paten (2017)

    Toil enables reproducible, open source, big biomedical data analyses

    Nature Biotechnology, 35

  • N. Niknafs, Dewey Kim, Ryangguk Kim, M. Diekhans, Michael Ryan, P. Stenson, D. Cooper, R. Karchin (2013)

    MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures

    Human Genetics, 132

  • M. Cieslik, A. Chinnaiyan (2017)

    Cancer transcriptome profiling at the juncture of clinical translation

    Nature Reviews Genetics, 19

  • L. Chin, W. Hahn, G. Getz, M. Meyerson (2011)

    Making sense of cancer genomic data.

    Genes & development, 25 6

  • Yulia Newton, Adam Novak, Teresa Swatloski, D. McColl, Sahila Chopra, Kiley Graim, A. Weinstein, R. Baertsch, S. Salama, K. Ellrott, Manu Chopra, Theodore Goldstein, D. Haussler, O. Morozova, Joshua Stuart (2017)

    TumorMap: Exploring the Molecular Similarities of Cancer Samples in an Interactive Portal.

    Cancer research, 77 21

  • Michael Schroeder, A. Gonzalez-Perez, N. López-Bigas (2013)

    Visualizing multidimensional cancer genomics data

    Genome Medicine, 5

  • John Gómez, L. García, Gustavo Salazar, J. Villaveces, S. Gore, A. Castro, M. Martin, G. Launay, Rafael Alcántara, N. del-Toro, M. Dumousseau, S. Orchard, S. Velankar, H. Hermjakob, Chenggong Zong, P. Ping, Manuel Corpas, R. Jiménez (2013)

    BioJS: an open source JavaScript framework for biological data visualization

    Bioinformatics, 29 8

  • P. Helmbold, J. Haerting, H. Kölbl (2003)

    Gene-expression signatures in breast cancer.

    The New England journal of medicine, 348 17

  • Qingsong Gao, Wen-Wei Liang, S. Foltz, Gnanavel Mutharasu, R. Jayasinghe, Song Cao, Wen-Wei Liao, Sheila Reynolds, Matthew Wyczalkowski, Lijun Yao, Lihua Yu, Sam Sun, Ken Chen, A. Lazar, R. Fields, M. Wendl, B. Tine, R. Vij, Feng Chen, M. Nykter, I. Shmulevich, L. Ding (2018)

    Driver Fusions and Their Implications in the Development and Treatment of Human Cancers

    Cell reports, 23

  • Ben Langmead, Abhinav Nellore (2018)

    Cloud computing for genomic data analysis and collaboration

    Nature Reviews Genetics, 19

  • L. Chin, L. Chin, Jannik Andersen, P. Futreal (2011)

    Cancer genomics: from discovery science to personalized medicine

    Nature Medicine, 17

  • E. Mardis (2008)

    The impact of next-generation sequencing technology on genetics.

    Trends in genetics : TIG, 24 3

  • O. Morozova, S. Salama, Isabel Bjork, Theodore Goldstein, S. Mueller, L. Sender, A. Sweet-Cordero, D. Haussler, California Team (2017)

    Comparative genomic analysis for pediatric cancer patients evaluated in a California Initiative to Advance Precision Medicine Demonstration Project.

    Journal of Clinical Oncology, 35

  • François Pars, François Pars, François Aguet, Andrew Brown, Stephane Castel, Joe Davis, Yuan He, Brian Jo, P. Mohammadi, YoSon Park, P. Parsana, Ayellet Segrè, B. Strober, Zachary Zappala, Beryl Nguyen, Beryl Cummings, Ellen Gelfand, Kane Hadley, Katherine Huang, Monkol Lek, Xiao Li, Jared Nedzel, Duyen Nguyen, Michael Noble, Timothy Sullivan, Taru Tukiainen, Daniel MacArthur, Gad Getz, Anjene S, Anjene Addington, P. Guan, S. Koester, A. Little, N. Lockhart, Helen Moore, Abhi Rao, Jeffery Struewing, Simona Volpi, Lori Leinweber, Lori Brigham, Richard Hasz, Marcus Hunter, Christopher Johns, Mark Johnson, G. Kopen, W. Leinweber, J. Lonsdale, Alisa McDonald, Bernadette Mestichelli, K. Myer, Bryan Roe, Mike Salvatore, Saboor Shad, Jeffrey Thomas, Gary Walters, Michael Washington, Joseph Wheeler, J. Bridge, B. Foster, Bryan Gillard, E. Karasik, Rachna Kumar, Mark Miklos, M. Moser, Scott Jewell, Robert Montroy, D. Rohrer, Dana Valley, Deborah Mash, David Davis, Leslie Branton, Leslie Sobin, Mary Barcus, Philip Branton, Nathan Garrido-Mart, Nathan Abell, B. Balliu, O. Delaneau, Laure Frésard, Eric Gamazon, D. Garrido-Martín, Ariel Gewirtz, Genna Gliner, Michael Gloudemans, Buhm Han, Amy He, Farhad Hormozdiari, Xin Li, Boxiang Liu, Eun Kang, Ian McDowell, H. Ongen, John Palowitch, Christine Peterson, G. Quon, S. Ripke, A. Saha, Andrey Shabalin, Tyler Shimko, J. Sul, Nicole Teran, Emily Tsang, Hailei Zhang, Yi-Hui Zhou, C. Bustamante, Nancy Cox, Roderic Guigó, Manolis Kellis, M. McCarthy, Donald Conrad, Eleazar Eskin, Gen Li, A. Nobel, C. Sabatti, Barbara Stranger, X. Wen, Fred Wright, Kristin Ardlie, E. Dermitzakis, T. Lappalainen, François Handsake, François Aguet, Kristin Ardlie, Beryl Cummings, Ellen Gelfand, Gad Getz, Kane Hadley, R. Handsaker, Katherine Huang, Seva Kashin, K. Karczewski, Monkol Lek, Xiao Li, Daniel MacArthur, Jared Nedzel, Duyen Nguyen, Michael Noble, Ayellet Segrè, Casandra Trowbridge, Taru Tukiainen, Nathan Brown, Ruth Barshir, Omer Basha, A. 
Battle, G. Bogu, Andrew Brown, Christopher Brown, Lin Chen, Colby Chiang, Donald Conrad, Nancy Cox, Farhan Damani, B. Engelhardt, Eleazar Eskin, Pedro Ferreira, Laure Frésard, Eric Gamazon, D. Garrido-Martín, Ariel Gewirtz, Genna Gliner, Michael Gloudemans, Roderic Guigo, I. Hall, Buhm Han, Farhad Hormozdiari, C. Howald, Hae Im, Brian Jo, Eun Kang, Yungil Kim, Sarah Kim-Hellmuth, Boxiang Liu, S. Mangul, Ian McDowell, Jean Monlong, Stephen Montgomery, Manuel Muñoz-Aguirre, Anne Ndungu, D. Nicolae, Meritxell Oliva, N. Panousis, Panagiotis Papasaikas, YoSon Park, Anthony Payne, J. Quan, F. Reverter, M. Sammeth, Alexandra Scott, Andrey Shabalin, Reza Sodaei, M. Stephens, Barbara Stranger, S. Urbut, M. Bunt, Gao Wang, Fred Wright, H. Xi, Esti Yeger-Lotem, Judith Zaugg, Yi-Hui Zhou, Joshua Diegel, J. Akey, Daniel Bates, Joanne Chan, M. Claussnitzer, Kathryn Demanelis, Morgan Diegel, J. Doherty, A. Feinberg, Maria Fernando, J. Halow, Kasper Hansen, E. Haugen, P. Hickey, Lei Hou, F. Jasmine, Ruiqi Jian, Lihua Jiang, Audra Johnson, R. Kaul, Manolis Kellis, M. Kibriya, Kristen Lee, Jin Li, Qin Li, Xiao Li, Jessica Lin, Shin Lin, Sandra Linder, C. Linke, Yaping Liu, Matthew Maurano, B. Molinie, Stephen Montgomery, Jemma Nelson, Fidencio Neri, Yongjin Park, B. Pierce, Nicola Rinaldi, L. Rizzardi, R. Sandstrom, Andrew Skol, Kevin Smith, Michael Snyder, J. Stamatoyannopoulos, Hua Tang, L. Wang, M. Wang, Nicholas Wittenberghe, Fan Wu, Rui Zhang, C. Nierras, Philip Vaught, Philip Branton, Latarsha Carithers, P. Guan, Helen Moore, Abhi Rao, J. Vaught, Sarah Volpi, Sarah Gould, Nicole Lockart, Casey Martin, Jeffery Struewing, Simona Volpi, Anjene Koester, Anjene Addington, S. Koester, A. Little, Lori Leinweber, Lori Brigham, Mark Johnson, Brian Roe, Gary Walters, Michael Washington, Jason Moser, Scott Valley, Scott Jewell, Robert Montroy, D. Rohrer, Dana Valley, David Mash, David Davis, Deborah Mash, Anita Robinson, Anita Undale, Anna Smith, D. Tabor, Nancy Roche, J. 
McLean, Negin Vatanian, Karna Robinson, Leslie Sobin, Mary Barcus, Kimberly Valentino, L. Qi, Steven Hunter, P. Hariharan, Shilpi Singh, K. Um, Takunda Matose, M. Tomaszewski, Laura Traino, Laura Barker, M. Mosavel, L. Siminoff, H. Traino, Paul Trevanio, Paul Flicek, Thomas Juettemann, Magali Ruffier, Daniel Sheppard, K. Taylor, S. Trevanion, D. Zerbino, Brian Rosenbloom, Brian Craft, M. Goldman, M. Haeussler, W. Kent, Christopher Lee, B. Paten, K. Rosenbloom, John Vivian, Jingchun Zhu (2017)

    Genetic effects on gene expression across human tissues

    Nature, 550

  • Yaoyu Wang, Lev Kutnetsov, Antony Partensky, Jalil Farid, John Quackenbush (2017)

    WebMeV: a Cloud Platform for Analyzing and Visualizing Cancer Genomic Data

    bioRxiv

  • Mark Leiserson, Connor Gramazio, Jason Hu, Hsin-Ta Wu, D. Laidlaw, Benjamin Raphael (2015)

    MAGI: visualization and collaborative annotation of genomic aberrations

    Nature Methods, 12

  • T. Malta, Artem Sokolov, A. Gentles, T. Burzykowski, L. Poisson, J. Weinstein, B. Kamińska, J. Huelsken, L. Omberg, O. Gevaert, A. Colaprico, Patrycja Czerwińska, Sylwia Mazurek, L. Mishra, H. Heyn, A. Krasnitz, A. Godwin, A. Lazar, Joshua Stuart, K. Hoadley, P. Laird, H. Noushmehr, M. Wiznerowicz (2018)

    Machine Learning Identifies Stemness Features Associated with Oncogenic Dedifferentiation.

    Cell, 173 2

  • E. Cerami, Jianjiong Gao, U. Dogrusoz, Benjamin Gross, S. Sumer, B. Aksoy, A. Jacobsen, Caitlin Byrne, M. Heuer, E. Larsson, Yevgeniy Antipin, B. Reva, A. Goldberg, C. Sander, N. Schultz (2012)

    The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data.

    Cancer discovery, 2 5

  • Wanding Zhou, Huy Dinh, Z. Ramjan, D. Weisenberger, C. Nicolet, Hui Shen, P. Laird, B. Berman (2018)

    DNA methylation loss in late-replicating domains is linked to mitotic cell division

    Nature genetics, 50

  • M. Jensen, Vincent Ferretti, R. Grossman, L. Staudt (2017)

    The NCI Genomic Data Commons as an engine for precision medicine.

    Blood, 130 4

  • L. Ding, Matthew Bailey, E. Porta-Pardo, V. Thorsson, A. Colaprico, D. Bertrand, David Gibbs, A. Weerasinghe, Kuan-lin Huang, Collin Tokheim, I. Cortés-Ciriano, R. Jayasinghe, Feng Chen, Lihua Yu, Sam Sun, Catharina Olsen, Jaegil Kim, Alison Taylor, A. Cherniack, R. Akbani, Chayaporn Suphavilai, N. Nagarajan, Joshua Stuart, G. Mills, Matthew Wyczalkowski, B. Vincent, C. Hutter, J. Zenklusen, K. Hoadley, M. Wendl, L. Shmulevich, A. Lazar, D. Wheeler, G. Getz (2018)

    Perspective on Oncogenic Processes at the End of the Beginning of Cancer Genomics.

    Cell, 173 2

  • Heidi Ledford (2010)

    Big science: The cancer genome challenge

    Nature, 464

  • P. Campbell, G. Getz, Joshua Stuart, J. Korbel, Lincoln Stein, Icgc (2020)

    Pan-cancer analysis of whole genomes

    Nature, 578

  • H. Thorvaldsdóttir, James Robinson, J. Mesirov (2012)

    Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration

    Briefings in Bioinformatics, 14

  • D. Hanahan, R. Weinberg (2011)

    Hallmarks of Cancer: The Next Generation

    Cell, 144

Publisher
bioRxiv
Copyright
© 2019, Posted by Cold Spring Harbor Laboratory. This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at http://creativecommons.org/licenses/by-nc-nd/4.0/
DOI
10.1101/326470

Abstract

The UCSC Xena platform for public and private cancer genomics data visualization and interpretation

Mary Goldman 1,+,*, Brian Craft 1,+, Mim Hastie 2, Kristupas Repečka 3, Fran McDade 2, Akhil Kamath 4, Ayan Banerjee 5, Yunhai Luo 6, Dave Rogers 2, Angela N. Brooks 7, Jingchun Zhu 1,*, and David Haussler

1 UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
2 Clever Canary, New York, NY, USA
3 Vilnius University, Vilnius, Lithuania
4 Birla Institute of Technology and Science, Goa, India
5 National Institute of Technology, Durgapur, India
6 Department of Genetics, Stanford University, Stanford, CA USA
7 Department of Biomolecular Engineering, University of California, Santa Cruz, CA, USA
+ These authors contributed equally to this work
* Corresponding author

Introduction

There is a great need for easy-to-use cancer genomics visualization tools for both large public data resources such as TCGA (The Cancer Genome Atlas) (Chin 2011) and the GDC (Genomic Data Commons) (Grossman 2016), as well as smaller-scale datasets generated by individual labs. Commonly used interactive visualization tools are either web-based portals or desktop applications. Data portals have a dedicated backend and are a powerful means of viewing centrally hosted resource datasets (e.g.
Xena’s predecessor, the UCSC Cancer Browser (currently retired, Zhu 2009), cBioPortal (Cerami 2012), ICGC (International Cancer Genome Consortium) Data Portal (Zhang 2019), GDC Data Portal (Grossman 2016)). However, researchers wishing to use a data portal to explore their own data have to either redeploy the entire platform, a difficult task even for bioinformaticians, or upload private data to a server outside the user's control, a non-starter for protected patient data such as germline variants (e.g. MAGI (Mutation Annotation and Genome Interpretation, Leiserson 2015), WebMeV (Wang 2017), Ordino (Streit 2018)). Desktop tools can view a user’s own data securely (e.g. IGV (Integrative Genomics Viewer, Thorvaldsdóttir 2013), Gitools (Perez-Llamas 2011)), but lack well-maintained, prebuilt files for the ever-evolving and expanding public data resources. This dichotomy between data portals and desktop tools highlights the challenge of using a single platform for both large public data and smaller-scale datasets generated by individual labs.

Complicating this dichotomy is the expanding amount, and complexity, of cancer genomics data resulting from numerous technological advances, including lower-cost high-throughput sequencing and single-cell based technologies. Cancer genomics datasets are now being generated using new assays such as whole-genome sequencing (Campbell 2017), DNA methylation whole-genome bisulfite sequencing (Zhou 2018), and ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing, Corces 2018).
Visualizing and exploring these diverse data modalities is important but challenging, especially as many tools have traditionally specialized in only one or perhaps a few data types. And while these complex datasets generate insights individually, integration with other -omics datasets is crucial to help researchers discover and validate findings.

UCSC Xena was developed as a high-performance visualization and analysis tool for both large public repositories and private datasets. It was built to scale with current and future data growth and complexity. Xena's unique privacy-aware architecture enables cancer researchers of all computational backgrounds to explore large diverse datasets. Researchers use the same system to explore their own data, together with or separately from the public data, all while keeping private data secure. The system easily supports many tens of thousands of samples and has been tested up to as many as a million cells. The simple and flexible architecture supports a variety of common and uncommon data types. Xena's unique Visual Spreadsheet visualization integrates gene-centric and genomic-coordinate-centric views across multiple data modalities, providing a deep, comprehensive view of genomic events within a cohort of tumors.

Xena's privacy-aware architecture

UCSC Xena (http://xena.ucsc.edu) has two components: the frontend Xena Browser and the backend Xena Hubs (Figure 1). The web-based Xena Browser empowers biologists to explore data across multiple Xena Hubs with a variety of visualizations and analyses. The backend Xena Hubs host genomics data on laptops, on public servers, behind a firewall, or in the cloud, and can be configured to be public or private (Supplemental Figure 1). The Xena Browser receives data simultaneously from multiple Xena Hubs and integrates them into a single coherent visualization within the browser.

There are two types of private Xena Hubs (Supplemental Figure 2). The first type is a hub installed on a user's own computer. It is configured to only respond to requests from the computer's localhost network interface (i.e. http://127.0.0.1). This ensures that the hub only communicates with the computer on which the hub is installed. The second type of private hub is one configured to respond to requests from external computers; however, access to the computer is controlled via a firewall or similar technology. The hub relies on the security provided by the computer to secure the data. This model takes advantage of security that is typically already in place to protect such data, thereby reducing user workload by not requiring re-authorization. Users who host this type of private hub share the URL with authorized individuals. Other users who inadvertently learn the URL of the protected hub will not be able to connect due to the firewall. This second type of Xena Hub is useful for sharing private data within a lab or institution.

In addition to the private hubs, there are public Xena Hubs: hubs that are configured to respond to requests from external computers and are not blocked by a firewall (Supplemental Figure 2). Public hubs enable data sharing by hosting large public resources. While we host a number of public hubs (Supplemental Table 1), users can also set up their own. One example is the Treehouse Hub, set up by the Childhood Cancer Initiative to share pediatric cancer RNA-seq gene expression data (Supplemental Note). Public and private Xena Hubs use the same software; the only difference is in their configuration.
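The localhost-only versus externally reachable distinction described above can be illustrated with a minimal sketch. This is not the Xena Hub's code or its configuration syntax; it uses Python's standard `http.server` module, and the names `bind_address`, `HubHandler`, and `make_hub` are hypothetical. It only shows the general idea that the same server software becomes "local private" or "externally reachable" depending on the network interface it binds to.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

LOOPBACK = "127.0.0.1"       # reachable only from the computer running the server
ALL_INTERFACES = "0.0.0.0"   # reachable from other machines, unless a firewall blocks them


def bind_address(allow_external: bool) -> str:
    """Pick the interface to listen on; localhost-only is the safe default."""
    return ALL_INTERFACES if allow_external else LOOPBACK


class HubHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a data hub: answers every GET with 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_hub(allow_external: bool = False, port: int = 0) -> ThreadingHTTPServer:
    """Create (but do not start) a server; port 0 lets the OS choose a free port."""
    return ThreadingHTTPServer((bind_address(allow_external), port), HubHandler)
```

Calling `make_hub()` with no arguments yields a server bound to the loopback interface only; passing `allow_external=True` is the explicit opt-in, after which access control falls to the surrounding firewall, as in the second type of private hub.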
Hubs default to responding only to requests from the computer's localhost network interface, restricting data access to the host computer. Hubs respond to external network requests only if a user configures them to do so.

Xena Hubs are designed to be turn-key, allowing users who may not be computationally savvy to easily install and use a Xena Hub on their personal computer (https://xena.ucsc.edu/private-hubs/). An interactive setup wizard guides users through the process of installing and running a Xena Hub on their Windows or Mac computers, while a web wizard guides them through the data loading process (https://tinyurl.com/localXenaHub). In addition to these user-friendly wizards, Xena Hubs can be installed and used via the command line on Windows, Mac and Linux machines.

The Xena Browser automatically connects to a default list of the public hubs we host (Supplemental Table 1) and, if it exists, the private local hub on the user's computer. Users can add a new public or private hub by entering the hub URL on the Data Hubs page of the Xena Browser.

Data integration occurs only within the Xena Browser, keeping private data secure. Genomic data flows from a Xena Hub to the Xena Browser (Figure 1), and the Xena Browser never sends the data it displays back to any server. The only exception to this model occurs when saving bookmarks as URLs, a feature that allows users to save live views of their current visualization. If a visualization contains only data from public Xena Hubs, users can generate a URL for their current view, which will take researchers back to the live browser session.
Since the data is already public, we store the data in view for each URL on our web server, allowing the view to be shared with colleagues or included in presentations. If a view contains any data from a non-public Xena Hub, users are instead required to download the current visualization as a file. By giving users a file instead of a URL, we ensure that we never keep users' private data on our public servers. This file can then be shared (for example, via email) and imported back into the Xena Browser to recreate the live browser session. Thus, even when bookmarking, protected data is kept secure through Xena's architecture and the use of private hubs.

In addition to its security advantages, Xena's unique architecture of a decoupled Xena Browser and multiple Xena Hubs enables several other features. First, researchers can easily view their own private data by installing their own Xena Hub. Xena Hubs are lightweight compared to a full-fledged application and install easily on most computers. Second, users can use the same platform to view both public and private data together. Xena integrates data across multiple hubs, allowing users to view data from separate hubs as a coherent data resource (Figure 2). Xena does this while keeping private data secure and avoiding the need to download large public resources. This is especially useful for researchers who wish to view their own analysis results on public data, such as their own clustering calls, but don't want to host a separate version of these resources. Third, the Xena platform scales easily.
As more datasets are generated, more Xena Hubs can be added to the network, so the platform grows with expanding genomics resources. These advantages, in addition to the security advantages, are a major departure from and innovation over the UCSC Cancer Browser.

Xena Browser visualizations and functionalities

The Xena Browser (https://xenabrowser.net) has a wide variety of visualizations and analyses including survival analyses, scatter plots, bar graphs, statistical tests, genomic signatures, as well as our unique Visual Spreadsheet view.

The Xena Visual Spreadsheet was designed to enable and enhance integration across diverse data modalities, providing researchers a more biologically complete understanding of genomic events and tumor biology. Analogous to an office spreadsheet application, it is a visual representation of a data grid where each column is a slice of genomic or phenotypic data (e.g. gene expression, mutation calls, methylation probes, subtype classifications, or age), and each row is a single entity (e.g. a bulk tumor sample, cell line, or single cell) (Figure 2). Xena's Visual Spreadsheet displays genomic data in a wide variety of gene-centric, coordinate-centric, and feature-centric views (Supplemental Figure 3) for both coding and non-coding regions (Supplemental Figure 4). Dynamic web links to the UCSC Genome Browser give genomic context to any gene, chromosome region, or feature. Researchers can easily re-order the Visual Spreadsheet, hierarchically cluster genes, and zoom in to just a few samples or out to the whole cohort, yielding a virtually unlimited variety of views in real time.
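The data grid behind the Visual Spreadsheet (one row per sample, one column per data slice, with columns possibly coming from different hubs) can be sketched as follows. This is an illustrative assumption about the data shape, not Xena's implementation: the function `build_grid`, the sample IDs `S1` to `S3`, the column names, and all values are hypothetical.

```python
from typing import Dict, List

# A column maps sample ID -> value (a number, a category label, ...).
Column = Dict[str, object]


def build_grid(samples: List[str], columns: Dict[str, Column]) -> List[Dict[str, object]]:
    """Return one row per sample with one field per column.

    A sample that a column has no value for gets None, so every row
    has the same fields regardless of which hub supplied the column.
    """
    rows = []
    for sample in samples:
        row: Dict[str, object] = {"sample": sample}
        for name, values in columns.items():
            row[name] = values.get(sample)
        rows.append(row)
    return rows


# Hypothetical columns: one from a local (private) hub, one from a public hub.
local_hub_columns = {"fusion_call": {"S1": "Yes", "S3": "No"}}
public_hub_columns = {"ERG_expression": {"S1": 11.2, "S2": 7.9, "S3": 8.1}}

# The join happens on the client side, keyed by sample ID.
grid = build_grid(["S1", "S2", "S3"], {**local_hub_columns, **public_hub_columns})
```

Because the join is keyed only on sample ID, the rendering step does not need to know which hub a column came from, which matches the idea of presenting separate hubs as one coherent resource.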
These dynamic views enable the discovery of patterns among genomic and phenotype parameters, even when the data are hosted across multiple data hubs (Figure 2).

The power of the Visual Spreadsheet is its deep data integration. Integration across different data modalities, such as gene expression, copy number variation, and DNA methylation, gives users a more comprehensive view of a genomic event in a tumor sample. For example, Xena's Visual Spreadsheet can help elucidate if higher expression for a gene is driven by copy number amplification, or by a missense mutation (Supplemental Figure 5), or by demethylation and opening of the promoter region as reflected in the DNA methylation and ATAC-seq data (Supplemental Figure 3). Integration across gene- and coordinate-centric views helps users examine genomic events in different chromosome contexts. For example, Xena's Visual Spreadsheet can help elucidate if a gene amplification is part of a chromosomal arm duplication or a focal amplification (Supplemental Figure 6). Integration across genomic and clinical data gives users the ability to make connections between genomic patterns and clinically relevant phenotypes such as subtypes. For example, Xena's Visual Spreadsheet can help elucidate if increased HRD (Homologous Recombination Deficiency) signature scores are enriched in a specific cancer type or subtype (Supplemental Figure 7). Finally, integration across users' own data and public resources on the same samples helps users gain insights into their own data. For example, Xena's Visual Spreadsheet can help a researcher see how a fusion call from the literature relates to the expression of other downstream genes (Figure 2). Because the Xena Browser does not differentiate between public and private data from a rendering perspective, it appears to the user that all data come from a single coherent source. These diverse integrations help researchers harness the power of comprehensive genomics studies, either their own or of public resources, driving discovery and a deeper understanding of cancer biology.

In addition to the Visual Spreadsheet, Xena has many other views, analyses, and functionalities. Our text-based search allows users to dynamically highlight, filter, and group samples (Supplemental Figure 8). Researchers use this to search the data on the screen, similar to the 'find' functionality in Microsoft Word. Samples are matched and highlighted in real time as the user types. Researchers can then filter to their samples of interest, or dynamically build subgroups. This is a powerful way to construct sub-populations based on any genomic data for comparison and analysis. Xena also has highly configurable Kaplan-Meier analyses, bar charts, box plots, and scatter plots, all with statistical tests automatically computed (Supplemental Figure 7, Supplemental Figure 9). We support data sharing through bookmarked views and high-resolution PDFs. Genomic signatures are easily built over gene expression data or any other genomic data type.

Performance is critical for interactive visualization tools, especially on the web. Growing sample sizes for genomic experiments have become a challenge for many tools, including the UCSC Cancer Browser. Knowing this, we optimized Xena to support visualizations on many tens of thousands of samples, delivering slices of data in milliseconds to a few seconds. During the first 8 months of 2019, we averaged 1,400 users/week with an average concurrent usage of 3.34 users.
To ensure we will continue to be performant as we scale, we tested our public hubs deployed in the cloud with 50 concurrent requests and measured an average response time of 244 ms.

Supported data types and public Xena Hubs

Today, cancer genomics research studies commonly collect data on somatic mutations, copy number, and gene expression, with other data types being relatively rare. However, as genomics technology advances, we expect these rarer data types to increase in frequency and new data types to be produced. With this in mind, we designed Xena to be able to load any tabular or matrix formatted data, giving us exceptional flexibility in the types of data we can visualize, such as ATAC-seq peak signals (Supplemental Figure 1) and structural variant data (Supplemental Figure 10), a significant advantage over the UCSC Cancer Browser. Currently supported data modalities include somatic and germline SNPs (Single Nucleotide Polymorphisms), INDELs, large structural variants, copy number variation, gene-, transcript-, exon-, protein-, and miRNA-expression, DNA methylation, ATAC-seq peak signals, phenotypes, clinical data, and sample annotations.

UCSC Xena provides interactive online visualization of seminal cancer genomics datasets through multiple public Xena Hubs. We host over 1600 datasets from more than 50 cancer types, including the latest from TCGA, ICGC, TCGA Pan-Cancer Atlas (Hoadley 2018), and the GDC (Supplemental Table 1). Xena Hubs offer a significant performance advantage over these resources' native APIs, especially when visualizing more than just a few samples.
We use custom ETL (Extract-Transform-Load) processes to keep the Xena Hubs updated with the latest data from their respective sources (Supplemental Figure 1). We only download and process the derived datasets from each source, such as gene expression values, leaving the raw sequencing data at their respective locations. Xena complements each of these resources by providing powerful interactive visualizations for these data.

In addition to these well-known resources, we also host results from the UCSC Toil RNA-seq recompute compendium, a uniformly re-aligned and re-called gene and transcript expression dataset for all TCGA, TARGET and GTEx samples (Vivian 2017). This dataset allows users to compare gene and transcript expression of TCGA 'tumor' samples to corresponding GTEx 'normal' samples. The UCSC Public hub hosts data curated from various publications.

Conclusion

UCSC Xena complements existing tools including the cBioPortal, ICGC Portal, GDC Portal, IGV, and St. Jude Cloud (Ma 2018) in a number of ways. First, our focus is on providing researchers a lightweight, easy-to-install platform to visualize their own data as well as data from the public sphere. By visualizing data across multiple hubs simultaneously, Xena differentiates itself from other tools by enabling researchers to view their own data together with consortium data while still maintaining privacy. Further, Xena focuses on integrative visualization of multi-omics datasets across different genomic contexts, including genes, genomic elements, or any genomic region, for both coding and non-coding parts of the genome. Finally, Xena is built for performance.
It can easily visualize tens of thousands of samples in a few seconds and has been tested on single-cell data with up to a million cells. With single-cell technology, datasets will become orders of magnitude larger than traditional bulk tumor datasets, and Xena is well positioned to rise to this challenge.

While it is widely recognized that data sharing is key to advancing cancer research, how data is shared can impact the ease of data access. UCSC Xena is designed for cancer researchers both with and without computational expertise to easily share and access data. Users without a strong computational background can explore their own data by installing a Xena Hub on their personal computer using our installation and data upload wizards. Bioinformaticians can install a private or public Xena Hub on a server, in the cloud, or as part of an analysis pipeline, making generated data available in a user-friendly manner that requires little extra effort. Security for private hubs shared with a limited set of researchers is currently provided by protecting the computer itself, such as by using a firewall. In the future, we plan to develop hub-wide user authorization capability. This would be useful for collaborative projects that share private data with users across multiple institutions. It will also allow integration with existing federated authentication and authorization services. Data sharing has advanced cancer biology and will continue to do so, and Xena is part of the technological ecosystem that supports this.

UCSC Xena is a scalable solution to the rapidly expanding and decentralized cancer genomics data.
Xena's architecture, with its web-browser-based visualization and separate data hubs, allows new projects to easily add their data to the growing public compendium. We support many different data modalities, both now and in future, by maintaining flexible input formats. Xena excels at showing trends across cohorts of samples, cells, or cell lines. While we have focused on cancer genomics, the platform is general enough to host any functional genomics data. In this age of expanding data resources, Xena's design supports the ongoing data sharing, integration, and visualization needs of the cancer research community.

Acknowledgements

Research reported in this publication was supported by National Cancer Institute of the National Institutes of Health under award numbers 5U24CA180951-04 and 5U24CA210974-02. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This project has also been made possible in part by grant number 2018-182812 from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation. We would also like to thank AWS Cloud Credits for Research and Google Summer of Code.

References

Campbell, P. J., Getz, G., Stuart, J. M., Korbel, J. O., Stein, L. D., et al. Pan-cancer analysis of whole genomes. Preprint at https://www.biorxiv.org/content/early/2017/07/12/162784 (2017).
Cerami, E., Gao, J., Dogrusoz, U., Gross, B. E., Sumer, S. O., et al. The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data. Cancer Discovery 5, 401-404 (2012).
Chin, L., Hahn, W.C., Getz, G.
& Meyerson, M. Making sense of cancer genomic data. Genes & Development 25, 534-555 (2011).
Corces, M. R., Granja, J. M., Shams, S., Louie, B. H., Seoane, J. A., Zhou, W., et al. The chromatin accessibility landscape of primary human cancers. Science 362, 6413 (2018).
Gao, Q., Liang, W.W., Foltz, S.M. et al. Driver Fusions and Their Implications in the Development and Treatment of Human Cancers. Cell Rep. 23, 227-238.e3 (2018).
Grossman, R.L., Heath, A. P., Ferretti, V., Varmus, H. E., Lowy, D. R., et al. Toward a Shared Vision for Cancer Genomic Data. New England Journal of Medicine 375, 1109-1112 (2016).
GTEx Consortium. Genetic effects on gene expression across human tissues. Nature 550, 204–213 (2017).
Hoadley, K. A., Yau, C., Hinoue, T., Wolf, D. M., Lazar, A. J., et al. Cell-of-Origin Patterns Dominate the Molecular Classification of 10,000 Tumors from 33 Types of Cancer. Cell 173, 291–304 (2018).
Leiserson, M.D.M., Gramazio, C.C., Hu, J., Wu, H.T., Laidlaw, D.H. & Raphael, B.J. MAGI: visualization and collaborative annotation of genomic aberrations. Nature Methods 12, 483–484 (2015).
Ma, X., Liu, Y., Liu, Y., Alexandrov, L. B., Edmonson, M.N., et al. Pan-cancer genome and transcriptome analyses of 1,699 paediatric leukaemias and solid tumours. Nature 555, 371-376 (2018).
Perez-Llamas, C. & Lopez-Bigas, N. Gitools: analysis and visualisation of genomic data using interactive heat-maps. PLoS One 6, e19541 (2011).
Streit, M., Gratzl, S., Stitz, H., Wernitznig, A., Zichner, T., & Haslinger, C. Ordino: a visual cancer analysis tool for ranking and exploring genes, cell lines, and tissue samples. Bioinformatics (2019).
Thorvaldsdóttir, H., Robinson, J. T., & Mesirov, J. P. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics 14, 178-192 (2013).
Vivian, J., Rao, A. A., Nothaft, F. A., Ketchum, C., Armstrong, J., et al. Toil enables reproducible, open source, big biomedical data analyses. Nature Biotechnology 35, 314-316 (2017).
Wang, Y.E., Kuznetsov, L., Partensky, A., Farid, J., & Quackenbush, J. WebMeV: A Cloud Platform for Analyzing and Visualizing Cancer Genomic Data. Cancer Research 77, e11-e14 (2017).
Zhang, J., Bajari, R., Andric, D., Gerthoffert, F., Lepsa, A., Nahal-Bose, H., Stein, L. D., & Ferretti, V. The International Cancer Genome Consortium Data Portal. Nature Biotechnology 37, 367-369 (2019).
Zhou, W., Dinh, H.Q., Ramjan, Z., Weisenberger, D.J., Nicolet, et al. DNA methylation loss in late-replicating domains is linked to mitotic cell division. Nature Genetics 50, 591-602 (2018).
Zhu, J., Sanborn, J.Z., Benz, S., Szeto, C., Hsu, F., Kuhn, R.M., Karolchik, D., Archie, J., Lenburg, M.E., Esserman, L.J., et al. The UCSC Cancer Genomics Browser. Nature Methods 6, 239–240 (2009).

Figure 1. Xena's architecture to securely join public and private data. Data always flows from the Xena Hubs to the Xena Browser for visualization and integration. a) The user's web browser (e.g. Chrome) requests the Xena Browser code and runs it. b) Using the Xena Browser, the user requests a visualization, initiating a request for data from the Xena Browser's list of public hubs.
Simultaneous with this request, the Xena Browser requests data from the private local hub on the user's computer. c) The Xena Browser code combines data from all Xena Hubs together into one coherent visualization. The user can then interact with the visualization to trigger a new data request.

[Figure 2 image: panels a, b, c; Visual Spreadsheet columns A (Sample), B (Fusion, ERG-TMPRSS2: Yes/No), C (CNV), D (gene expression), E (exon expression), F (mutation, SPOP); legend labels log2(tumor/normal), log2(count+1), log2(RPKM+1), Deleterious, Missense.]

Figure 2. An example Xena Browser Visual Spreadsheet examining published ERG (ETS Transcription Factor) - TMPRSS2 (Transmembrane Serine Protease 2) fusion calls in TCGA PRAD (prostate cancer) by combining data from local and public Xena Hubs. a) A user downloaded ERG-TMPRSS2 fusion calls on TCGA PRAD samples from Gao et al. 2018 (n=492) and loaded the data into their own local Xena Hub. b) TCGA copy number, gene expression and mutation data from the same samples are available via the public TCGA hub. c) The user then compared the fusion calls to the public data using the Xena Browser Visual Spreadsheet. Column B is the fusion call from Gao et al. Column C is copy number variation data, zoomed in to a region of chromosome 21 (37-44Mb). Amplifications are in red and deletions are in blue. The diagram at the top shows genes along the chromosome, where red genes are on the positive strand and blue are on the negative strand.
Column D is ERG gene expression and Column E is ERG exon expression. Expression is colored red to green for high to low expression. The gene diagram at the top shows exons as boxes, with tall coding regions and shorter untranslated regions. Column F is SPOP (Speckle Type BTB/POZ Protein) mutation status and also has a gene diagram at the top. The position of each mutation is marked in relation to the gene diagram and colored by its functional impact: deleterious mutations are red and missense mutations are blue. We can see that the fusion calls are highly consistent with the characteristic overexpression of ERG (columns D, E). However, only a subset of the samples in which a fusion was called also show the fusion event in the copy number data, as an intra-chromosomal deletion on chromosome 21 that fuses TMPRSS2 to ERG (column C). This observation is consistent with the 63.3% validation rate described in Gao et al. 2018. SPOP mutations (blue tick marks in column F) are mutually exclusive with the fusion event. Rows are sorted by the left-most data column (column B) and subsorted on columns thereafter.
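The row ordering stated at the end of the caption (sort by the left-most data column, then subsort on each column to its right) is a lexicographic sort over the columns. A minimal sketch follows, assuming ascending order with missing values placed last; the caption does not specify the direction or the handling of missing values, so both are assumptions. The function `sort_rows`, the column names, and the values are hypothetical.

```python
from typing import Dict, List, Sequence


def sort_rows(rows: List[Dict[str, object]], column_order: Sequence[str]) -> List[Dict[str, object]]:
    """Sort rows by the first column in column_order, breaking ties with later columns.

    Each value is wrapped as (is_missing, value) so that None sorts after
    real values and is never compared against them with '<'.
    Values within one column are assumed to be mutually comparable.
    """
    def key(row: Dict[str, object]):
        return tuple((row[col] is None, row[col]) for col in column_order)

    return sorted(rows, key=key)


# Hypothetical rows: a categorical fusion call and a numeric expression value.
rows = [
    {"sample": "S1", "fusion": "No", "expr": 2.0},
    {"sample": "S2", "fusion": "Yes", "expr": 1.0},
    {"sample": "S3", "fusion": "No", "expr": 1.0},
    {"sample": "S4", "fusion": None, "expr": 0.5},
]

ordered = sort_rows(rows, ["fusion", "expr"])
order = [r["sample"] for r in ordered]
```

Re-ordering the columns in `column_order` changes the primary sort key, which is one simple way to think about how dragging a column to the left-most position regroups the samples.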

Journal

bioRxiv

Published: Sep 26, 2019
