Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Thera-SAbDab: the Therapeutic Structural Antibody Database

Thera-SAbDab: the Therapeutic Structural Antibody Database Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Published online 26 September 2019 Nucleic Acids Research, 2020, Vol. 48, Database issue D383–D388 doi: 10.1093/nar/gkz827 Thera-SAbDab: the Therapeutic Structural Antibody Database 1 1 2 3 4 Matthew I.J. Raybould , Claire Marks , Alan P. Lewis ,JiyeShi , Alexander Bujotzek , 5 1,* Bruck Taddese and Charlotte M. Deane Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK, Data and Computational Sciences, GlaxoSmithKline Research and Development, Gunnels Wood Road, 3 4 Stevenage SG1 2NY, UK, Chemistry Department, UCB Pharma, 216 Bath Road, Slough SL1 3WE, UK, Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, DE-82377 Penzberg, Germany and Discovery Sciences Department, AstraZeneca, Granta Park, Cambridge CB21 6GH, UK Received August 08, 2019; Revised September 09, 2019; Editorial Decision September 13, 2019; Accepted September 24, 2019 ABSTRACT Whole monoclonal antibody (mAb) therapies dominate the industry - drugs that mimic natural antibodies by con- The Therapeutic Structural Antibody Database taining two identical variable domain structures with a (Thera-SAbDab; http://opig.stats.ox.ac.uk/webapps/ particular specificity ( 3). The broader class of monoclonal therasabdab) tracks all antibody- and nanobody- therapies also includes Fragment antigen binding (Fab) re- related therapeutics recognized by the World Health gions (a single arm of a whole antibody), single-chain Fv Organisation (WHO), and identifies any correspond- (scFv) regions (a heavy and light chain variable domain connected by an engineered glycine-rich linker), and single- ing structures in the Structural Antibody Database domain variable fragments. These fragments can be ex- (SAbDab) with near-exact or exact variable domain pressed in dimeric form to improve avidity, or conjugated sequence matches. Thera-SAbDab is synchronized with polyethylene glycol (‘pegylated’) for slower clearance with SAbDab to update weekly, reflecting new Pro- (4), with radioisotopes for diagnostic purposes (5), or with tein Data Bank entries and the availability of new se- radioisotopes or noxious small molecules/peptides for cy- quence data published by the WHO. Each therapeutic totoxicity (6). summary page lists structural coverage (with links to Recent developments in protein engineering have resulted the appropriate SAbDab entries), alignments show- in bispecific immunotherapies, where two distinct variable ing where any near-matches deviate in sequence, domain binding sites are incorporated into a single protein. and accompanying metadata, such as intended tar- As of June 2019, bispecific mAbs, linked Fabs, linked scFvs get and investigated conditions. Thera-SAbDab can and linked single-domain variable fragments have all been assessed in clinical trials (7). be queried by therapeutic name, by a combination of A primary source of information on immunotherapies is metadata, or by variable domain sequence - return- the World Health Organisation (WHO), which publishes ing all therapeutics that are within a specified se- biannual ‘Proposed’ (8) and ‘Recommended’ (9) Interna- quence identity over a specified region of the query. tional Nonproprietary Name (INN) lists. These INNs serve The sequences of all therapeutics listed in Thera- as globally-recognized generic names by which pharma- SAbDab (461 unique molecules, as of 5 August 2019) ceuticals can be identified. To be granted an INN, appli- are downloadable as a single file with accompanying cants must include a full amino acid sequence, the clos- metadata. est V and J gene, the IG subclass, and the light chain type (see https://extranet.who.int/tools/inn online application/). INTRODUCTION This information, coupled with the $12 000 cost of applica- tion (as of August 2019), makes INN lists a useful source of Immunotherapeutics derived from B-cell genes are an in- therapies that companies intend to carry forward into clin- creasingly successful and significant proportion of the ical trials. global drugs market, designed to treat a wide range of dis- Several databases already harvest this information. Two eases (1–3). non-commercial antibody-specific resources are the IMGT To whom correspondence should be addressed. Tel: +44 1865 272860; Email: deane@stats.ox.ac.uk C The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D384 Nucleic Acids Research, 2020, Vol. 48, Database issue Monoclonal Antibody Database (IMGT mAb-DB; http: DATA SOURCES //www.imgt.org/mAb-DB (10), and WHOINNIG (http:// Sequence data www.bioinf.org.uk/abs/abybank/whoinnig). The Therapeutic Antibody Database (TABS; https:// Proposed INN lists (8,9), published by the WHO, are tabs.craic.com) is antibody-specific and commercial, also the source of the majority of sequence information in scraping patents for therapies. Other databases not spe- Thera-SAbDab. These are released biannually (one in cific to antibodies can also capture WHO information, such January/February and another in June/July) and––since as ChEMBL (https://www.ebi.ac.uk/chembl), DrugBank list P95 in 2006––represent a reliable record of variable (https://www.drugbank.ca) and KEGG DRUG (https:// domain sequences for all antibody- and nanobody-related www.genome.jp/kegg/drug). therapeutics granted a proposed INN. Of the 129 antibody- Most databases supply additional metadata for their related therapeutics proposed before 2006, we were able therapeutic entries, such as clinical trial status, companies to find sequence information for 47 (36.4%) through involved in development, target specificity, and alternative theIMGTmAb-DB(http://www.imgt.org/mAb-DB/). Al- names. For example, the recently published ABCD database though we continue to search, and joint academia-industry provides antibody synonyms, antigen UniProt links and initiatives such as Abvance encourage their release (https: publication references (11). However, while these reposito- //www.pistoiaalliance.org/projects/abvance/), sequences for ries supply sequence information (either on individual sum- the remaining 82 may never become public knowledge. mary pages or through reference to the primary literature), All sequences are then numbered by ANARCI (17), it is currently not possible to query them by sequence, nor which uses Hidden Markov Models to align input sequences to bulk-download relevant sets of therapeutic sequences for to pre-numbered germline sequences. Assigning a number- direct bioinformatic analysis. ing allows users to more easily interpret the significance of Structural knowledge about both the intended target and mutations in near-identical sequence matches. For example, the therapeutic lead compound is of high importance for ra- if the mismatch occurs in the extremities of the framework tional drug discovery (12,13). For example, co-crystal com- region, it may be judged to have minimal effect on binding plexes reveal where a drug binds to its target (the surface site structure. ‘epitope’), and separately-solved structures enable more ac- curate docking experiments. It can also assist subsequent Structural data development and optimization, as homology models of mu- tants derived from a known structure are in general more Thera-SAbDab compares all numbered therapeutic se- accurate than those for which no close structural partner quences to the structures in SAbDab (16), which prefilters is available (14). The Protein Data Bank (15) (PDB) now the PDB (15) for all structures whose sequences align to contains over 150 000 solved structures, and though it is B-cell germline genes. As all SAbDab structures are also highly biased towards certain protein classes, many diverse pre-numbered, the comparison of therapeutics to public targets of pharmacological interest are represented. A sig- structural space is efficient. All the existing functionality of nificant fraction of these structures contain antibody vari- SAbDab (e.g. interactive molecular viewers and numbered able domains, and these are recorded by the Structural Anti- structure downloads) is made easily accessible from Thera- body Database (SAbDab (16); 7184 variable domain struc- SAbDab search results. tures over 3663 PDB entries as of 5 August 2019). Both IMGT mAb-DB and TABS report a set of known thera- Therapeutic metadata peutic structures in the PDB, but their reported structural coverage of therapeutic space is low. For example, neither Therapeutic metadata comprises a mixture of inherent database reports any known structural information for bis- characteristics and continually-changing status updates. pecific immunotherapeutics. Certain static properties can be acquired automatically. To address these deficiencies, we have created the Ther- For example, light chain type is identified through our AN- apeutic Structural Antibody Database (Thera-SAbDab; ARCI germline alignment (17), while isotype, INN Pro- http://opig.stats.ox.ac.uk/webapps/therasabdab). We har- posed and Recommended years, and intended target(s) can vest sequences as they are released by the WHO, number be harvested directly from the INN lists. Sequence com- them with ANARCI (17), and perform a weekly sequence parison can also be used to identify where different INN alignment of all therapeutic variable domain sequences to names refer to identical variable domains. Other character- the sequences of known structures stored in SAbDab. Struc- istics, such as which companies are involved in therapeutic tures with sequence identity matches of 100%, 99% and development, must be manually curated at the time of de- 95–98% are recorded and categorized, with alignments on position. each therapeutic summary page to show precisely where Time-dependent characteristics for new entries are also each near-identical structure differs from the therapeutic se- manually curated after sequence identification, and there- quence. after every 3 months. We source clinical trial infor- Thera-SAbDab can be queried by INN, by a combina- mation, developmental status, and investigated condi- tion of metadata, such as INN proposal year, clinical trial tion data from a range of sources including AdisInsight status, or target, or by sequence (including over a specified (https://adisinsight.springer.com), ClinicalTrials.gov (https: region of the sequence). We make available all therapeu- //clinicaltrials.gov), and DrugBank (https://www.drugbank. tic sequences contained within Thera-SAbDab, alongside ca). These websites are updated more regularly, and so metadata, to facilitate further research. are preferable sources for this time-sensitive metadata; we Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Nucleic Acids Research, 2020, Vol. 48, Database issue D385 chain designed to bind to TNF-, and VH(ALB) is an- other heavy chain designed to bind ALB. Thera-SAbDab has identified a structure for the TNFA binding domain with sequence identity of 95.65% [5m2j; chain D]. Inspec- tion of the sequence alignment shows that 5m2j has a 100% Chothia-defined CDRH3 sequence match to VH(TNFA), and in fact only differs by one mutation across all Chothia- defined ( 19) CDRs: 31D in VH(TNFA) is 31N in 5m2j. 5m2j is a VHH2 llama nanobody, suggesting that SAbDab’s coverage of nanobody structural space will be increasingly highlighted by Thera-SAbDab as more single-chain thera- pies arrive in the clinic. Therapeutically-relevant structures are continually being deposited in the PDB, even many years after initial devel- Figure 1. The number of antibody- and nanobody-related therapeutics as- opment. For example, since 2009, the WHO have recorded signed an International Nonproprietary Name (INN) by year. A record nine antibody-related therapeutics against IL17A––seven number of 72 of these therapeutics were recognized by the WHO in 2018. monoclonals and two bispecifics. The first, secukinumab, was recognized in 2009, and since 2014 has been approved include these fields in Thera-SAbDab to allow for more for use in certain types of arthritis, psoriasis, and spondyli- pharmacologically-relevant searches, as well as to identify tis. As of early June 2019, there were no close structures for all post Phase-I candidates for inclusion in our vfi e updat- any of these IL17A-binders. However, on 19 June 2019, Eli ing developability guidelines (18). Lilly deposited an exact variable domain structure for ixek- izumab (an IL17A-targetting monoclonal antibody, 6nov) and a close structure for tibulizumab (an IL17A-binding CONTENTS and TNFSF13B-binding bispecific antibody, 6nou) in the As of 5 August 2019, Thera-SAbDab is tracking 558 PDB (20). SAbDab detected and numbered them in its INNs, representing 543 unique therapeutics. Of the 558 weekly update, making Thera-SAbDab the first antibody INN names, 473 could be mapped to variable domain se- database to link to the structures of IL17A-binding ther- quences (87.1%), representing 461 unique therapeutics with apeutic antibodies. sequence data. 436 were monoclonal therapies (three pairs of which share identical variable domains: avelumab & bin- USAGE trafusp, losatuxizumab & serclutamab and radretumab & bifikafusp), and 25 were bispecific therapies. Plotting the cu- There are multiple ways to search Thera-SAbDab. Thera- mulative sum of these unique therapeutics by year deposited SAbDab can be queried directly by INN if structural infor- in a WHO ‘Proposed INN’ list shows an exponential in- mation about a particular therapeutic is needed. Alterna- crease since the early 2000s (Figure 1). tively a combination of metadata can be specified to iden- We searched the IMGT mAb-DB (10)and TABS tify structures for a particular subset of therapeutic space, databases (on 28 June 2019) for structures of these 461 ther- for example binders to a particular antigen, or therapeutics apeutics. IMGT mAb-DB identified 72 structures of ther- at a particular stage of clinical trials (Figure 2A). Results apeutic variable domains, across 36 different monoclonal are returned in a table format, with links to each therapeu- therapeutics, while TABS reported 53 structures of ther- tic summary page and a selected array of metadata (Figure apeutic variable domains, across 32 different monoclonal 2B). therapeutics. In contrast, Thera-SAbDab (at the 100% se- Each therapeutic summary page lists a structural sum- quence identical threshold) contained 152 therapeutic vari- mary (including our database sequence), with links to rel- able domain structures, across 84 distinct monoclonal ther- evant SAbDab entries (with PDB codes and chains), and apeutics and 7 distinct bispecific therapeutics. A further alignment charts (if structures with 95–99% sequence iden- 21 monoclonal therapeutics had maximum sequence iden- tity are detected). Each SAbDab link redirects the user to tity matches of 99% (up to two mutations away from a the SAbDab summary page for the relevant PDB entry, publicly-available structure), and 13 monoclonals and 4 bis- where all existing functionality can be accessed. Links to ap- pecifics had maximum sequence identity matches of 95– propriate SAbPred (21) informatics tools (such as ABody- 98%. We conclude that, at present, around a quarter (27.1%) Builder (22) for variable domain structure modelling, and of WHO-recognized monoclonal therapeutics have exact or TAP (18) for developability assessment) are also provided. close (≥95% sequence identity) structural coverage. 44.0% Finally, we list all the remaining metadata that we have of bispecific therapeutics have at least one variable domain recorded for the therapeutic, ranging from records of inves- with exact or close structural coverage, and two have exact tigated conditions, to which companies are developing the matches for both variable domains. therapeutic, to its estimated developmental status. Thera-SAbDab contains structural information for even A third way to search Thera-SAbDab is by sequence (Fig- the most diversely-formatted therapeutics. Ozoralizumab, ure 2C and D). This can be harnessed in numerous ways. For a bispecific therapy in active Phase-III clinical trials example, by querying with a known therapeutic sequence, for rheumatoid arthritis, has a VH(TNFA)–VH(ALB)– researchers can look for sequence commonalities between VH(TNFA) configuration, where VH(TNFA) is a heavy therapeutics over any region of the variable domain. Alter- Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D386 Nucleic Acids Research, 2020, Vol. 48, Database issue Figure 2. Searching Thera-SAbDab. (A) Search by attribute. Here, we search for any therapeutic designed to bind to ERBB2 (often over-expressed in breast cancer). (B) Eight therapeutics are designed to bind to ERBB2, seven monoclonals and one bispecific. Four have exact structural information for the ERBB2 binding site. Click the therapeutic name to enter the therapeutic summary page. (C) Search by sequence. Here we search for therapeutics with at least 70% sequence identity across the heavy and light chain CDRs of the input sequence. (D) Any results are returned alongside sequence identity across the specified region. Alignments show any sequence mismatches across the variable domain sequence. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Nucleic Acids Research, 2020, Vol. 48, Database issue D387 natively, by querying with a developmental candidate se- FUNDING quence, researchers can search for similarity to any other Engineering and Physical Sciences Research Council therapeutic, or specifically to those designed to bind to the and Medical Research Council [EP/L016044/1]; Glaxo- same target. This could identify potential patenting issues, SmithKline plc; AstraZeneca plc; F. Hoffmann-La Roche highlight a risk of polyspecificity, or suggest a binding mode AG; UCB Celltech. Funding for open access charge: RCUK to the intended target. Open Access Block Grant and processed through the A further selection of sample use cases for Thera- Bodleian Library, University of Oxford. SAbDab are available at http://opig.stats.ox.ac.uk/ Conflict of interest statement. None declared. webapps/therasabdab/about. REFERENCES ACCESSIBILITY OF THE DATA 1. Grilo,A.L. and Mantalaris,A. (2018) The increasingly human and profitable monoclonal antibody market. Trends Biotechnol., 37, 9–16. Thera-SAbDab can be queried at http://opig.stats.ox.ac. 2. Steeland,S., Vandenbroucke,R.E. and Libert,C. (2017) Nanobodies uk/webapps/therasabdab. All sequence data harvested by as therapeutics: big opportunities for small antibodies. Drug Discov. Thera-SAbDab can be downloaded from the ‘Downloads’ Today, 21, 1076–1113. tab of the search page. Sequences are supplied alongside 3. Kaplon,H. and Reichert,J.M. (2019) Antibodies to watch in 2019. mAbs, 11, 219–238. the therapeutic INN, format, isotype, light chain category, 4. Jevse ˇ var,S., Kusterle,M. and Kenig,M. (2012) PEGylation of highest clinical trial stage reached, and estimated develop- Antibody Fragments for Half-Life Extension. In: Proetzel,G and mental status. We also supply a list of therapeutics for which Ebersbach,H (eds). Antibody Methods and Protocols. Methods in sequence information has not yet been released. Molecular Biology (Methods and Protocols).Vol. 901, Humana Press, Totowa. 5. Steiner,M. and Neri,D. (2011) Antibody-radionuclide conjugates for cancer therapy: historical considerations and new trends. Clin. Cancer Res., 17, 6406–6416. CONCLUSION 6. Beck,A., Goetsch,L., Dumontet,C. and Corva¨ıa,N. (2017) Strategies and challenges for the next generation of antibody-drug conjugates. We have created Thera-SAbDab with the central aim of col- Nat. Rev. Drug Discov., 16, 315–337. lating all public structural knowledge for WHO-recognized 7. Labrijn,A.F., Janmaat,M.L., Reichert,J.M. and Parren,P.W.H.I. antibody- and nanobody-related therapeutic variable do- (2019) Bispecific antibodies: a mechanistic review of the pipeline. Nat. Rev. Drug Disc., 18, 585–608. mains. Rather than relying on text-mining approaches, 8. WHO (2018) Proposed International Nonproprietary Names (INN) which can miss PDB depositions that omit reference to the List 120. WHO Drug Information, 32, 559–689. structure’s therapeutic relevance, Thera-SAbDab uses a sys- 9. WHO (2019) Recommended International Nonproprietary Names tematic approach at the level of sequence identity to detect (INN) List 81. WHO Drug Information, 33, 59–134. exact and close matches to our repository of therapeutic 10. Poiron,C., Wu,Y., Ginestoux,C., Ehrenmann,F., Duroux,P. and Lefranc,M.-P. (2010) IMGT/mAb-DB: the IMGT database for variable domains. therapeutic monoclonal antibodies. JOBIM 2010, 13, 470b. This approach has not only enabled us to identify over 11. Lima,W. C., Gasteiger,E., Marcatili,P., Duek,P., Bairoch,A. and twice the number of monoclonal therapies with 100% Cosson,P. (2019) The ABCD database: a repository for chemically sequence-identical structures in the PDB than in existing defined antibodies. Nucleic Acids Res., 48, gkz714. 12. van Montfont,R.L.M. and Workman,P. (2017) Structure-based drug databases, but has also identified exact variable domain design: aiming for a perfect fit. Essays Biochem. 61, 431–437. structures for several bispecific therapies. Our approach 13. Raybould,M.I.J., Wong,W.K. and Deane,C.M. (2019) can also distinguish between PDB structures with 100%, Antibody-antigen complex modelling in the era of immunoglobulin 99%, and 95–98% sequence identity matches. Sequence repertoire sequencing. Mol. Syst. Des. Eng., 4, 679–688. alignments guide the interpretation of structures of near- 14. Muhammed,M.T. and Aki-Yalcin,E. (2019) Homology modeling in drug discovery: overview, current applications, and future identical sequence. perspectives. Chem. Biol. Drug Des., 93, 12–20. Like IMGT-DB, Thera-SAbDab can be queried by meta- 15. Berman,H.B., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., data, but uniquely it can also be queried by variable domain Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data sequence. This enables researchers to identify any therapeu- Bank. Nucleic Acids Res., 28, 235–242. 16. Dunbar,J., Krawczyk,K., Leem,J., Baker,T., Fuchs,A., Georges,G., tics proximal over any variable domain region to their query Shi,J. and Deane,C.M. (2014) SAbDab: the structural antibody sequence. database. Nucleic Acids Res., 42, D1140–D1146. Thera-SAbDab’s sequence database will be updated with 17. Dunbar,J. and Deane,C.M. (2016) ANARCI: antigen receptor new sequence information twice per year, in line with the numbering and receptor classification. Bioinformatics, 32, 298–300. release of new WHO Proposed INN lists. An updated list 18. Raybould,M.I.J., Marks,C., Krawczyk,K., Taddese,B., Nowak,J., Lewis,A.P., Bujotzek,A., Shi,J. and Deane,C.M. (2019) Five of all therapeutic variable domain sequences with metadata computational developability guidelines for therapeutic antibody is supplied as a single file to facilitate further analysis, for profiling. Proc. Natl. Acad. Sci. U.S.A., 116, 4025–4030. example into the properties of therapeutic antibody-antigen 19. Al-Lazikani,B., Lesk,A.M. and Chothia,C. (1997) Standard interfaces. conformations for the canonical structures of immunoglobulins. J. Mol. Biol., 273, 927–948. As shown for IL17A-binding therapeutics, new clinically- 20. Benschop,R.J., Chow,C.-K., Tian,Y., Nelson,J., Barmettler,B., relevant structures are continually being released. Accord- Atwell,S., Clawson,D., Chai,Q., Jones,B., Fitchett,J. et al. (2019) ingly, Thera-SAbDab checks SAbDab after each weekly up- Development of tibulizumab, a tetravalent bispecific antibody date for new matches, ensuring that this data is rapidly cap- targeting BAFF and IL-17A for the treatment of autoimmune tured. disease. mAbs, 11, 1175–1190. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D388 Nucleic Acids Research, 2020, Vol. 48, Database issue 21. Dunbar,J., Krawczyk,K., Leem,J., Marks,C., Nowak,J., Regep,C., 22. Leem,J., Dunbar,J., Georges,G., Shi,J. and Deane,C.M. (2016) Georges,G., Kelm,S., Popovic,B. and Deane,C.M. (2014) SAbPred: a ABodyBuilder: automated antibody structure prediction with structure-based antibody prediction server. Nucleic Acids Res., 44, data-driven accuracy estimation. mAbs, 8, 1259–1268. W474–W478. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Nucleic Acids Research Oxford University Press

Loading next page...
 
/lp/oxford-university-press/thera-sabdab-the-therapeutic-structural-antibody-database-nsUVx50ebD

References (28)

Publisher
Oxford University Press
Copyright
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
ISSN
0305-1048
eISSN
1362-4962
DOI
10.1093/nar/gkz827
Publisher site
See Article on Publisher Site

Abstract

Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Published online 26 September 2019 Nucleic Acids Research, 2020, Vol. 48, Database issue D383–D388 doi: 10.1093/nar/gkz827 Thera-SAbDab: the Therapeutic Structural Antibody Database 1 1 2 3 4 Matthew I.J. Raybould , Claire Marks , Alan P. Lewis ,JiyeShi , Alexander Bujotzek , 5 1,* Bruck Taddese and Charlotte M. Deane Oxford Protein Informatics Group, Department of Statistics, University of Oxford, 24-29 St Giles’, Oxford OX1 3LB, UK, Data and Computational Sciences, GlaxoSmithKline Research and Development, Gunnels Wood Road, 3 4 Stevenage SG1 2NY, UK, Chemistry Department, UCB Pharma, 216 Bath Road, Slough SL1 3WE, UK, Roche Pharma Research and Early Development, Large Molecule Research, Roche Innovation Center Munich, DE-82377 Penzberg, Germany and Discovery Sciences Department, AstraZeneca, Granta Park, Cambridge CB21 6GH, UK Received August 08, 2019; Revised September 09, 2019; Editorial Decision September 13, 2019; Accepted September 24, 2019 ABSTRACT Whole monoclonal antibody (mAb) therapies dominate the industry - drugs that mimic natural antibodies by con- The Therapeutic Structural Antibody Database taining two identical variable domain structures with a (Thera-SAbDab; http://opig.stats.ox.ac.uk/webapps/ particular specificity ( 3). The broader class of monoclonal therasabdab) tracks all antibody- and nanobody- therapies also includes Fragment antigen binding (Fab) re- related therapeutics recognized by the World Health gions (a single arm of a whole antibody), single-chain Fv Organisation (WHO), and identifies any correspond- (scFv) regions (a heavy and light chain variable domain connected by an engineered glycine-rich linker), and single- ing structures in the Structural Antibody Database domain variable fragments. These fragments can be ex- (SAbDab) with near-exact or exact variable domain pressed in dimeric form to improve avidity, or conjugated sequence matches. Thera-SAbDab is synchronized with polyethylene glycol (‘pegylated’) for slower clearance with SAbDab to update weekly, reflecting new Pro- (4), with radioisotopes for diagnostic purposes (5), or with tein Data Bank entries and the availability of new se- radioisotopes or noxious small molecules/peptides for cy- quence data published by the WHO. Each therapeutic totoxicity (6). summary page lists structural coverage (with links to Recent developments in protein engineering have resulted the appropriate SAbDab entries), alignments show- in bispecific immunotherapies, where two distinct variable ing where any near-matches deviate in sequence, domain binding sites are incorporated into a single protein. and accompanying metadata, such as intended tar- As of June 2019, bispecific mAbs, linked Fabs, linked scFvs get and investigated conditions. Thera-SAbDab can and linked single-domain variable fragments have all been assessed in clinical trials (7). be queried by therapeutic name, by a combination of A primary source of information on immunotherapies is metadata, or by variable domain sequence - return- the World Health Organisation (WHO), which publishes ing all therapeutics that are within a specified se- biannual ‘Proposed’ (8) and ‘Recommended’ (9) Interna- quence identity over a specified region of the query. tional Nonproprietary Name (INN) lists. These INNs serve The sequences of all therapeutics listed in Thera- as globally-recognized generic names by which pharma- SAbDab (461 unique molecules, as of 5 August 2019) ceuticals can be identified. To be granted an INN, appli- are downloadable as a single file with accompanying cants must include a full amino acid sequence, the clos- metadata. est V and J gene, the IG subclass, and the light chain type (see https://extranet.who.int/tools/inn online application/). INTRODUCTION This information, coupled with the $12 000 cost of applica- tion (as of August 2019), makes INN lists a useful source of Immunotherapeutics derived from B-cell genes are an in- therapies that companies intend to carry forward into clin- creasingly successful and significant proportion of the ical trials. global drugs market, designed to treat a wide range of dis- Several databases already harvest this information. Two eases (1–3). non-commercial antibody-specific resources are the IMGT To whom correspondence should be addressed. Tel: +44 1865 272860; Email: deane@stats.ox.ac.uk C The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D384 Nucleic Acids Research, 2020, Vol. 48, Database issue Monoclonal Antibody Database (IMGT mAb-DB; http: DATA SOURCES //www.imgt.org/mAb-DB (10), and WHOINNIG (http:// Sequence data www.bioinf.org.uk/abs/abybank/whoinnig). The Therapeutic Antibody Database (TABS; https:// Proposed INN lists (8,9), published by the WHO, are tabs.craic.com) is antibody-specific and commercial, also the source of the majority of sequence information in scraping patents for therapies. Other databases not spe- Thera-SAbDab. These are released biannually (one in cific to antibodies can also capture WHO information, such January/February and another in June/July) and––since as ChEMBL (https://www.ebi.ac.uk/chembl), DrugBank list P95 in 2006––represent a reliable record of variable (https://www.drugbank.ca) and KEGG DRUG (https:// domain sequences for all antibody- and nanobody-related www.genome.jp/kegg/drug). therapeutics granted a proposed INN. Of the 129 antibody- Most databases supply additional metadata for their related therapeutics proposed before 2006, we were able therapeutic entries, such as clinical trial status, companies to find sequence information for 47 (36.4%) through involved in development, target specificity, and alternative theIMGTmAb-DB(http://www.imgt.org/mAb-DB/). Al- names. For example, the recently published ABCD database though we continue to search, and joint academia-industry provides antibody synonyms, antigen UniProt links and initiatives such as Abvance encourage their release (https: publication references (11). However, while these reposito- //www.pistoiaalliance.org/projects/abvance/), sequences for ries supply sequence information (either on individual sum- the remaining 82 may never become public knowledge. mary pages or through reference to the primary literature), All sequences are then numbered by ANARCI (17), it is currently not possible to query them by sequence, nor which uses Hidden Markov Models to align input sequences to bulk-download relevant sets of therapeutic sequences for to pre-numbered germline sequences. Assigning a number- direct bioinformatic analysis. ing allows users to more easily interpret the significance of Structural knowledge about both the intended target and mutations in near-identical sequence matches. For example, the therapeutic lead compound is of high importance for ra- if the mismatch occurs in the extremities of the framework tional drug discovery (12,13). For example, co-crystal com- region, it may be judged to have minimal effect on binding plexes reveal where a drug binds to its target (the surface site structure. ‘epitope’), and separately-solved structures enable more ac- curate docking experiments. It can also assist subsequent Structural data development and optimization, as homology models of mu- tants derived from a known structure are in general more Thera-SAbDab compares all numbered therapeutic se- accurate than those for which no close structural partner quences to the structures in SAbDab (16), which prefilters is available (14). The Protein Data Bank (15) (PDB) now the PDB (15) for all structures whose sequences align to contains over 150 000 solved structures, and though it is B-cell germline genes. As all SAbDab structures are also highly biased towards certain protein classes, many diverse pre-numbered, the comparison of therapeutics to public targets of pharmacological interest are represented. A sig- structural space is efficient. All the existing functionality of nificant fraction of these structures contain antibody vari- SAbDab (e.g. interactive molecular viewers and numbered able domains, and these are recorded by the Structural Anti- structure downloads) is made easily accessible from Thera- body Database (SAbDab (16); 7184 variable domain struc- SAbDab search results. tures over 3663 PDB entries as of 5 August 2019). Both IMGT mAb-DB and TABS report a set of known thera- Therapeutic metadata peutic structures in the PDB, but their reported structural coverage of therapeutic space is low. For example, neither Therapeutic metadata comprises a mixture of inherent database reports any known structural information for bis- characteristics and continually-changing status updates. pecific immunotherapeutics. Certain static properties can be acquired automatically. To address these deficiencies, we have created the Ther- For example, light chain type is identified through our AN- apeutic Structural Antibody Database (Thera-SAbDab; ARCI germline alignment (17), while isotype, INN Pro- http://opig.stats.ox.ac.uk/webapps/therasabdab). We har- posed and Recommended years, and intended target(s) can vest sequences as they are released by the WHO, number be harvested directly from the INN lists. Sequence com- them with ANARCI (17), and perform a weekly sequence parison can also be used to identify where different INN alignment of all therapeutic variable domain sequences to names refer to identical variable domains. Other character- the sequences of known structures stored in SAbDab. Struc- istics, such as which companies are involved in therapeutic tures with sequence identity matches of 100%, 99% and development, must be manually curated at the time of de- 95–98% are recorded and categorized, with alignments on position. each therapeutic summary page to show precisely where Time-dependent characteristics for new entries are also each near-identical structure differs from the therapeutic se- manually curated after sequence identification, and there- quence. after every 3 months. We source clinical trial infor- Thera-SAbDab can be queried by INN, by a combina- mation, developmental status, and investigated condi- tion of metadata, such as INN proposal year, clinical trial tion data from a range of sources including AdisInsight status, or target, or by sequence (including over a specified (https://adisinsight.springer.com), ClinicalTrials.gov (https: region of the sequence). We make available all therapeu- //clinicaltrials.gov), and DrugBank (https://www.drugbank. tic sequences contained within Thera-SAbDab, alongside ca). These websites are updated more regularly, and so metadata, to facilitate further research. are preferable sources for this time-sensitive metadata; we Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Nucleic Acids Research, 2020, Vol. 48, Database issue D385 chain designed to bind to TNF-, and VH(ALB) is an- other heavy chain designed to bind ALB. Thera-SAbDab has identified a structure for the TNFA binding domain with sequence identity of 95.65% [5m2j; chain D]. Inspec- tion of the sequence alignment shows that 5m2j has a 100% Chothia-defined CDRH3 sequence match to VH(TNFA), and in fact only differs by one mutation across all Chothia- defined ( 19) CDRs: 31D in VH(TNFA) is 31N in 5m2j. 5m2j is a VHH2 llama nanobody, suggesting that SAbDab’s coverage of nanobody structural space will be increasingly highlighted by Thera-SAbDab as more single-chain thera- pies arrive in the clinic. Therapeutically-relevant structures are continually being deposited in the PDB, even many years after initial devel- Figure 1. The number of antibody- and nanobody-related therapeutics as- opment. For example, since 2009, the WHO have recorded signed an International Nonproprietary Name (INN) by year. A record nine antibody-related therapeutics against IL17A––seven number of 72 of these therapeutics were recognized by the WHO in 2018. monoclonals and two bispecifics. The first, secukinumab, was recognized in 2009, and since 2014 has been approved include these fields in Thera-SAbDab to allow for more for use in certain types of arthritis, psoriasis, and spondyli- pharmacologically-relevant searches, as well as to identify tis. As of early June 2019, there were no close structures for all post Phase-I candidates for inclusion in our vfi e updat- any of these IL17A-binders. However, on 19 June 2019, Eli ing developability guidelines (18). Lilly deposited an exact variable domain structure for ixek- izumab (an IL17A-targetting monoclonal antibody, 6nov) and a close structure for tibulizumab (an IL17A-binding CONTENTS and TNFSF13B-binding bispecific antibody, 6nou) in the As of 5 August 2019, Thera-SAbDab is tracking 558 PDB (20). SAbDab detected and numbered them in its INNs, representing 543 unique therapeutics. Of the 558 weekly update, making Thera-SAbDab the first antibody INN names, 473 could be mapped to variable domain se- database to link to the structures of IL17A-binding ther- quences (87.1%), representing 461 unique therapeutics with apeutic antibodies. sequence data. 436 were monoclonal therapies (three pairs of which share identical variable domains: avelumab & bin- USAGE trafusp, losatuxizumab & serclutamab and radretumab & bifikafusp), and 25 were bispecific therapies. Plotting the cu- There are multiple ways to search Thera-SAbDab. Thera- mulative sum of these unique therapeutics by year deposited SAbDab can be queried directly by INN if structural infor- in a WHO ‘Proposed INN’ list shows an exponential in- mation about a particular therapeutic is needed. Alterna- crease since the early 2000s (Figure 1). tively a combination of metadata can be specified to iden- We searched the IMGT mAb-DB (10)and TABS tify structures for a particular subset of therapeutic space, databases (on 28 June 2019) for structures of these 461 ther- for example binders to a particular antigen, or therapeutics apeutics. IMGT mAb-DB identified 72 structures of ther- at a particular stage of clinical trials (Figure 2A). Results apeutic variable domains, across 36 different monoclonal are returned in a table format, with links to each therapeu- therapeutics, while TABS reported 53 structures of ther- tic summary page and a selected array of metadata (Figure apeutic variable domains, across 32 different monoclonal 2B). therapeutics. In contrast, Thera-SAbDab (at the 100% se- Each therapeutic summary page lists a structural sum- quence identical threshold) contained 152 therapeutic vari- mary (including our database sequence), with links to rel- able domain structures, across 84 distinct monoclonal ther- evant SAbDab entries (with PDB codes and chains), and apeutics and 7 distinct bispecific therapeutics. A further alignment charts (if structures with 95–99% sequence iden- 21 monoclonal therapeutics had maximum sequence iden- tity are detected). Each SAbDab link redirects the user to tity matches of 99% (up to two mutations away from a the SAbDab summary page for the relevant PDB entry, publicly-available structure), and 13 monoclonals and 4 bis- where all existing functionality can be accessed. Links to ap- pecifics had maximum sequence identity matches of 95– propriate SAbPred (21) informatics tools (such as ABody- 98%. We conclude that, at present, around a quarter (27.1%) Builder (22) for variable domain structure modelling, and of WHO-recognized monoclonal therapeutics have exact or TAP (18) for developability assessment) are also provided. close (≥95% sequence identity) structural coverage. 44.0% Finally, we list all the remaining metadata that we have of bispecific therapeutics have at least one variable domain recorded for the therapeutic, ranging from records of inves- with exact or close structural coverage, and two have exact tigated conditions, to which companies are developing the matches for both variable domains. therapeutic, to its estimated developmental status. Thera-SAbDab contains structural information for even A third way to search Thera-SAbDab is by sequence (Fig- the most diversely-formatted therapeutics. Ozoralizumab, ure 2C and D). This can be harnessed in numerous ways. For a bispecific therapy in active Phase-III clinical trials example, by querying with a known therapeutic sequence, for rheumatoid arthritis, has a VH(TNFA)–VH(ALB)– researchers can look for sequence commonalities between VH(TNFA) configuration, where VH(TNFA) is a heavy therapeutics over any region of the variable domain. Alter- Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D386 Nucleic Acids Research, 2020, Vol. 48, Database issue Figure 2. Searching Thera-SAbDab. (A) Search by attribute. Here, we search for any therapeutic designed to bind to ERBB2 (often over-expressed in breast cancer). (B) Eight therapeutics are designed to bind to ERBB2, seven monoclonals and one bispecific. Four have exact structural information for the ERBB2 binding site. Click the therapeutic name to enter the therapeutic summary page. (C) Search by sequence. Here we search for therapeutics with at least 70% sequence identity across the heavy and light chain CDRs of the input sequence. (D) Any results are returned alongside sequence identity across the specified region. Alignments show any sequence mismatches across the variable domain sequence. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 Nucleic Acids Research, 2020, Vol. 48, Database issue D387 natively, by querying with a developmental candidate se- FUNDING quence, researchers can search for similarity to any other Engineering and Physical Sciences Research Council therapeutic, or specifically to those designed to bind to the and Medical Research Council [EP/L016044/1]; Glaxo- same target. This could identify potential patenting issues, SmithKline plc; AstraZeneca plc; F. Hoffmann-La Roche highlight a risk of polyspecificity, or suggest a binding mode AG; UCB Celltech. Funding for open access charge: RCUK to the intended target. Open Access Block Grant and processed through the A further selection of sample use cases for Thera- Bodleian Library, University of Oxford. SAbDab are available at http://opig.stats.ox.ac.uk/ Conflict of interest statement. None declared. webapps/therasabdab/about. REFERENCES ACCESSIBILITY OF THE DATA 1. Grilo,A.L. and Mantalaris,A. (2018) The increasingly human and profitable monoclonal antibody market. Trends Biotechnol., 37, 9–16. Thera-SAbDab can be queried at http://opig.stats.ox.ac. 2. Steeland,S., Vandenbroucke,R.E. and Libert,C. (2017) Nanobodies uk/webapps/therasabdab. All sequence data harvested by as therapeutics: big opportunities for small antibodies. Drug Discov. Thera-SAbDab can be downloaded from the ‘Downloads’ Today, 21, 1076–1113. tab of the search page. Sequences are supplied alongside 3. Kaplon,H. and Reichert,J.M. (2019) Antibodies to watch in 2019. mAbs, 11, 219–238. the therapeutic INN, format, isotype, light chain category, 4. Jevse ˇ var,S., Kusterle,M. and Kenig,M. (2012) PEGylation of highest clinical trial stage reached, and estimated develop- Antibody Fragments for Half-Life Extension. In: Proetzel,G and mental status. We also supply a list of therapeutics for which Ebersbach,H (eds). Antibody Methods and Protocols. Methods in sequence information has not yet been released. Molecular Biology (Methods and Protocols).Vol. 901, Humana Press, Totowa. 5. Steiner,M. and Neri,D. (2011) Antibody-radionuclide conjugates for cancer therapy: historical considerations and new trends. Clin. Cancer Res., 17, 6406–6416. CONCLUSION 6. Beck,A., Goetsch,L., Dumontet,C. and Corva¨ıa,N. (2017) Strategies and challenges for the next generation of antibody-drug conjugates. We have created Thera-SAbDab with the central aim of col- Nat. Rev. Drug Discov., 16, 315–337. lating all public structural knowledge for WHO-recognized 7. Labrijn,A.F., Janmaat,M.L., Reichert,J.M. and Parren,P.W.H.I. antibody- and nanobody-related therapeutic variable do- (2019) Bispecific antibodies: a mechanistic review of the pipeline. Nat. Rev. Drug Disc., 18, 585–608. mains. Rather than relying on text-mining approaches, 8. WHO (2018) Proposed International Nonproprietary Names (INN) which can miss PDB depositions that omit reference to the List 120. WHO Drug Information, 32, 559–689. structure’s therapeutic relevance, Thera-SAbDab uses a sys- 9. WHO (2019) Recommended International Nonproprietary Names tematic approach at the level of sequence identity to detect (INN) List 81. WHO Drug Information, 33, 59–134. exact and close matches to our repository of therapeutic 10. Poiron,C., Wu,Y., Ginestoux,C., Ehrenmann,F., Duroux,P. and Lefranc,M.-P. (2010) IMGT/mAb-DB: the IMGT database for variable domains. therapeutic monoclonal antibodies. JOBIM 2010, 13, 470b. This approach has not only enabled us to identify over 11. Lima,W. C., Gasteiger,E., Marcatili,P., Duek,P., Bairoch,A. and twice the number of monoclonal therapies with 100% Cosson,P. (2019) The ABCD database: a repository for chemically sequence-identical structures in the PDB than in existing defined antibodies. Nucleic Acids Res., 48, gkz714. 12. van Montfont,R.L.M. and Workman,P. (2017) Structure-based drug databases, but has also identified exact variable domain design: aiming for a perfect fit. Essays Biochem. 61, 431–437. structures for several bispecific therapies. Our approach 13. Raybould,M.I.J., Wong,W.K. and Deane,C.M. (2019) can also distinguish between PDB structures with 100%, Antibody-antigen complex modelling in the era of immunoglobulin 99%, and 95–98% sequence identity matches. Sequence repertoire sequencing. Mol. Syst. Des. Eng., 4, 679–688. alignments guide the interpretation of structures of near- 14. Muhammed,M.T. and Aki-Yalcin,E. (2019) Homology modeling in drug discovery: overview, current applications, and future identical sequence. perspectives. Chem. Biol. Drug Des., 93, 12–20. Like IMGT-DB, Thera-SAbDab can be queried by meta- 15. Berman,H.B., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., data, but uniquely it can also be queried by variable domain Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data sequence. This enables researchers to identify any therapeu- Bank. Nucleic Acids Res., 28, 235–242. 16. Dunbar,J., Krawczyk,K., Leem,J., Baker,T., Fuchs,A., Georges,G., tics proximal over any variable domain region to their query Shi,J. and Deane,C.M. (2014) SAbDab: the structural antibody sequence. database. Nucleic Acids Res., 42, D1140–D1146. Thera-SAbDab’s sequence database will be updated with 17. Dunbar,J. and Deane,C.M. (2016) ANARCI: antigen receptor new sequence information twice per year, in line with the numbering and receptor classification. Bioinformatics, 32, 298–300. release of new WHO Proposed INN lists. An updated list 18. Raybould,M.I.J., Marks,C., Krawczyk,K., Taddese,B., Nowak,J., Lewis,A.P., Bujotzek,A., Shi,J. and Deane,C.M. (2019) Five of all therapeutic variable domain sequences with metadata computational developability guidelines for therapeutic antibody is supplied as a single file to facilitate further analysis, for profiling. Proc. Natl. Acad. Sci. U.S.A., 116, 4025–4030. example into the properties of therapeutic antibody-antigen 19. Al-Lazikani,B., Lesk,A.M. and Chothia,C. (1997) Standard interfaces. conformations for the canonical structures of immunoglobulins. J. Mol. Biol., 273, 927–948. As shown for IL17A-binding therapeutics, new clinically- 20. Benschop,R.J., Chow,C.-K., Tian,Y., Nelson,J., Barmettler,B., relevant structures are continually being released. Accord- Atwell,S., Clawson,D., Chai,Q., Jones,B., Fitchett,J. et al. (2019) ingly, Thera-SAbDab checks SAbDab after each weekly up- Development of tibulizumab, a tetravalent bispecific antibody date for new matches, ensuring that this data is rapidly cap- targeting BAFF and IL-17A for the treatment of autoimmune tured. disease. mAbs, 11, 1175–1190. Downloaded from https://academic.oup.com/nar/article-abstract/48/D1/D383/5573951 by guest on 05 March 2020 D388 Nucleic Acids Research, 2020, Vol. 48, Database issue 21. Dunbar,J., Krawczyk,K., Leem,J., Marks,C., Nowak,J., Regep,C., 22. Leem,J., Dunbar,J., Georges,G., Shi,J. and Deane,C.M. (2016) Georges,G., Kelm,S., Popovic,B. and Deane,C.M. (2014) SAbPred: a ABodyBuilder: automated antibody structure prediction with structure-based antibody prediction server. Nucleic Acids Res., 44, data-driven accuracy estimation. mAbs, 8, 1259–1268. W474–W478.

Journal

Nucleic Acids ResearchOxford University Press

Published: Jan 8, 2020

There are no references for this article.