Monte Carlo Method Based QSAR Modeling of Coumarin Derivates as Potent HIV‐1 Integrase Inhibitors and Molecular Docking Studies of Selected 4‐phenyl Hydroxycoumarins

ACTA FACULTATIS MEDICAE NAISSENSIS
DOI: 10.2478/afmnai-2014-0011
UDC: 547.587.51:616.98:578.828
Scientific Journal of the Faculty of Medicine in Niš 2014;31(2):95-103
Original article

Jovana Veselinović¹, Aleksandar Veselinović², Andrey Toropov³, Alla Toropova³, Ivana Damnjanović¹, Goran Nikolić²

¹ University of Niš, Faculty of Medicine, Department of Pharmacy, Serbia
² University of Niš, Faculty of Medicine, Department of Chemistry, Serbia
³ IRCCS-Istituto di Ricerche Farmacologiche Mario Negri, Milano, Italy

SUMMARY

In the search for new and promising coumarin compounds as HIV-1 integrase inhibitors, chemoinformatic methods such as quantitative structure-activity relationship (QSAR) modeling and molecular docking play an important role, since they can predict the desired activity and propose how a molecule binds to the enzyme. The aim of this study was to build QSAR models for coumarin derivatives as HIV-1 integrase inhibitors using the Monte Carlo method. SMILES notation was used to represent the molecular structure and to define optimal SMILES-based descriptors. Molecular docking of flexible molecules into the rigid enzyme active site was performed. The computational results indicated that this approach can satisfactorily predict the desired activity with very good statistical significance. For the best models the statistical parameters were: a) 3' processing activity: R²=0.9980 and Q²=0.9977 for the training set and R²=0.9788 for the test set; b) integration activity: R²=0.9999 and Q²=0.9998 for the training set and R²=0.9213 for the test set. The QSAR models were applied to selected 4-phenyl hydroxycoumarins to calculate the desired activity and to estimate HIV-1 integrase inhibition. Additionally, a molecular docking study was performed on a newly identified pocket in the HIV-1 integrase structure to determine the binding mode of the selected 4-phenyl hydroxycoumarins. The Monte Carlo method proved to be an efficient approach for building a robust model for estimating HIV-1 integrase inhibition by coumarin compounds. Based on the QSAR and molecular docking studies, 4-phenyl hydroxycoumarins can be considered promising model compounds for developing new HIV-1 integrase inhibitors.

Key words: coumarins, HIV-1 integrase inhibition, QSAR, molecular docking

Corresponding author: Jovana Veselinović · phone: +381 18 4570029 · e-mail: milosavljevic.jovana@hotmail.com

INTRODUCTION

Acquired immunodeficiency syndrome (AIDS), first reported in 1981 (1), is a fatal disorder resulting from chronic persistent infection by a human retrovirus, the human immunodeficiency virus (HIV) (2). Today, AIDS is considered one of the most devastating diseases faced by mankind, with an estimated 34 million people living with HIV worldwide at the end of 2010 according to the Joint United Nations Programme on HIV/AIDS (3). To date, successful chemotherapy has not been developed. Currently, reverse transcriptase (RT) and protease (PT) are the main targets of the majority of drugs available for HIV treatment. However, toxicity and the rapid development of resistance to these inhibitors are the main problems of current therapy (4). Therefore, the development of new anti-HIV agents with varied structures and mechanisms of action is of great importance. HIV-1 integrase (HIV-1IN) is a very attractive and unexplored target for the development of new anti-HIV drugs, as it plays a vital role in the replication cycle and has no cellular counterpart (5-7). Various compounds exhibit HIV-1IN inhibitory activity, including lignanolides (8), curcumins (9), aurintricarboxylic acids (10), dicaffeoylquinic acids and analogues (11, 12), and diaryl sulfones (13).
Unfortunately, all of the stated inhibitors contain the 1,2-dihydroxy (catechol) moiety, separated by an appropriate linker, so all of them show significant cytotoxicity caused by autoxidation of the catechol moiety to reactive quinone species (14, 15). To overcome this problem, a series of coumarin derivatives that do not contain the catechol functionality but possess good HIV-1IN inhibitory activity was synthesized (16).

The importance of quantitative structure-activity relationship (QSAR) studies in modern drug design is well established, since QSAR allows early prediction of activity-related characteristics of drug candidates and can eliminate molecules with undesired properties (17). The main goal of the QSAR approach is to correlate the biological activity of a series of compounds with calculated molecular properties expressed as descriptors (18). Thousands of molecular descriptors are used in QSAR studies to encode the chemical and structural features of molecules (19, 20), and topological descriptors calculated from molecular graphs are of great importance (21). The simplified molecular input line entry system (SMILES) is an alternative to molecular graphs and can be used to represent molecular structures (22). Recent papers have reported the applicability of SMILES-based descriptors in QSAR analysis with models built using the Monte Carlo method (23-27). Several QSAR studies dealing with coumarin compounds as HIV-1IN inhibitors have been reported (28-31).

The aim of this research is to build QSAR models for coumarin derivatives as HIV-1IN inhibitors with SMILES-based optimal descriptors and the Monte Carlo method, using the CORAL software. The QSAR models were applied to selected 4-phenyl hydroxycoumarins with good antioxidant properties (32) but with no literature data on their HIV-1IN inhibitory activity. Further, a docking study was performed on a newly identified pocket right behind helix 4 of the catalytic core domain (CCD) (33) of the HIV-1IN enzyme in order to determine the possible binding mode of the selected 4-phenyl hydroxycoumarins.

METHOD

Data. A dataset of 26 coumarin derivatives with determined HIV-1 integrase inhibitory activity was selected for the QSAR study (16). Figure 1 presents the general structures of the coumarin compounds used for QSAR modeling. The endpoints used for QSAR model building were pIC50 for enzyme 3' processing and pIC50 for integration.

Figure 1. General molecular structures of the molecules used

Canonical SMILES for all compounds were generated with the ACD/ChemSketch program (ACD/ChemSketch v.11.0) in order to preserve consistency, because different software may generate different SMILES notations. One random split into training and test sets was examined (20% of the molecules were taken as test compounds). The role of the training set is to develop the model. The role of the test set is to select preferable values for the number of epochs of the Monte Carlo optimization and for the threshold value.

Optimal descriptors. SMILES represents the molecular structure as a sequence of symbols. Some symbols represent molecular fragments, such as atoms or bonds (e.g. 'C', 'N', '=', '#', etc.). Some of these fragments are represented by two symbols (e.g. 'Br', 'Cl', '@@', etc.) which cannot be separated. Optimal SMILES-based descriptors, determined by the descriptor correlation weight DCW(T,Nepoch), were calculated with the CORAL software (http://www.insilico.eu/coral) as:

$\mathrm{DCW}(T, N_{epoch}) = \sum \mathrm{CW}(S_k) + \sum \mathrm{CW}(SS_k) + \sum \mathrm{CW}(SSS_k)$ (1)

where Sk, SSk, and SSSk are one-, two-, and three-component SMILES attributes, respectively, and each component of a SMILES attribute is a SMILES symbol as previously defined (27). Two parameters in Eq. 1 should be defined for the Monte Carlo optimization: the threshold (T) and the number of epochs (Nepoch).
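To make Eq. 1 concrete, the following is a minimal sketch of how a DCW value could be assembled from a SMILES string. It is not the CORAL implementation: the tokenizer only handles the two-character symbols named in the text, the helper names (`tokenize`, `smiles_attributes`, `dcw`) are hypothetical, and the correlation weights are made-up values chosen purely for illustration. Attributes missing from the weight table contribute zero, which mirrors the zero weight assigned to rare attributes.

```python
# Illustrative sketch only: CORAL's real tokenizer and attribute encoding are
# not described in the text, and all weights below are made-up values.

TWO_CHAR_SYMBOLS = ("Br", "Cl", "@@")  # inseparable symbols named in the text


def tokenize(smiles):
    """Split a SMILES string into symbols, keeping two-character symbols whole."""
    tokens, i = [], 0
    while i < len(smiles):
        width = 2 if smiles[i:i + 2] in TWO_CHAR_SYMBOLS else 1
        tokens.append(smiles[i:i + width])
        i += width
    return tokens


def smiles_attributes(tokens):
    """Return all one-, two- and three-symbol attributes (Sk, SSk, SSSk)."""
    attrs = []
    for size in (1, 2, 3):
        for start in range(len(tokens) - size + 1):
            attrs.append(tuple(tokens[start:start + size]))
    return attrs


def dcw(smiles, weights):
    """Sum the correlation weights of all attributes; unknown ones count as 0."""
    return sum(weights.get(a, 0.0) for a in smiles_attributes(tokenize(smiles)))


# Hypothetical correlation weights, for illustration only.
example_weights = {("C",): 1.0, ("=",): 0.5, ("C", "="): 0.25, ("C", "=", "O"): 2.0}
print(dcw("C=O", example_weights))  # 1.0 + 0.5 + 0.25 + 2.0
```

In this toy example the string `C=O` yields three one-symbol, two two-symbol and one three-symbol attribute; only four of them appear in the weight table, so the other two add nothing to the sum.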
The components of the representation of the molecular structure are classified into two classes, rare and active, according to the threshold T. The correlation weight of a rare component is fixed at zero, because such a component brings noise into the model, so a rare component is discarded from building the model and T is zero. Nepoch is the number of epochs of the Monte Carlo optimization (one epoch is a cycle of modifications of all correlation weights involved in the model). The predictive potential of the model is a mathematical function of T and Nepoch in the Monte Carlo optimization. The search for the most predictive combination of T and Nepoch was carried out over the values 0-7 for T and 0-70 for Nepoch for all models, according to previously published methodology (23-27). Having numerical data on the correlation weights (CW), one can calculate DCW(T,Nepoch) for the compounds of the training and test sets. The least squares method was used to calculate the endpoint from these data:

$\mathrm{Endpoint} = C_0 + C_1 \times \mathrm{DCW}(T, N_{epoch})$ (2)

Molecular docking. 3D structures of the compounds for the docking simulation were constructed using MarvinSketch 6.1.0, 2013, ChemAxon (http://www.chemaxon.com). Geometry optimization was carried out with the MMFF94 molecular force field (34). To date, no full-length structure of HIV-1IN is available to elucidate the spatial arrangement of its three domains: N-terminal (NTD), catalytic core (CCD) and C-terminal (CTD). In the field of allosterically targeted HIV-1IN inhibitors, a new advantageous approach for the discovery of compounds effective against viral strains resistant to HIV-1IN strand-transfer drugs has been proposed recently (35). A new site in integrase, a valid region for the structure-based design of allosteric integrase inhibitors, has been identified using a structure-based design process (Protein Data Bank code: 3NF7) (33). The compounds were docked into the enzyme binding site using Molegro Virtual Docker (MVD) (36). The MVD software (v. 2013.6.0.1) was employed for docking ligands to the rigid enzyme model in order to identify hydrogen bonds and hydrophobic interactions with residues at the active site. The binding site was computed with a grid resolution of 0.3 Å. MolDock SE was used as the search algorithm and the number of runs was set to 100. The parameters of the docking procedure were: population size 50, maximum number of iterations 1500, energy threshold 100.00 and maximum number of steps 300. The number of generated poses was 10. Ligand-receptor interactions were estimated with the MVD scoring functions: MolDock Score, Rerank Score, Hbond Score, Similarity Score, and Docking Score. The ligand was docked into the computed cavity in place of the ligand from 3NF7 using the MolDock Optimizer algorithm, and its interactions were monitored using detailed energy estimates. A maximum population of 100 and a maximum of 10,000 iterations were used for each run, and the 5 best poses were retained.

RESULTS

The chemical structures represented with SMILES notation, the experimental activity data for 3' processing and integration (expr), the data calculated with CORAL (calc) and the difference (diff) between expr and calc are presented in Table 1. Statistical criteria of the predictability of the models are presented in Table 2 (37). Using Eq. 2 to predict pIC50, the following equations were obtained from the best Monte Carlo runs:

3' Processing: $\mathrm{pIC_{50}} = 1.8584\,(\pm 0.0036) + 0.0187\,(\pm 0.0000215) \times \mathrm{DCW}(0,3)$ (3)

Integration: $\mathrm{pIC_{50}} = 2.5636\,(\pm 0.0004) + 0.0180\,(\pm 0.0000034) \times \mathrm{DCW}(0,3)$ (4)

The QSAR models were applied to predict pIC50 values of selected 4-phenyl hydroxycoumarins (7-hydroxy-4-phenyl coumarin (7C), 5,7-dihydroxy-4-phenyl coumarin (5,7C) and 7,8-dihydroxy-4-phenyl coumarin (7,8C)) with good antioxidant properties (32) but with no literature data on their HIV-1IN inhibitory activity. Eq. 3 and Eq. 4 were applied to calculate pIC50 for enzyme 3' processing and integration for the selected 4-phenyl hydroxycoumarins. The calculated values and the molecular structures of these coumarin derivatives are presented in Table 3. The Monte Carlo method can be used to classify molecular features (SAk) calculated with SMILES-based descriptors. The list of SAk together with their correlation weights for the three runs of the Monte Carlo optimization for both enzyme activities is given in Table 4. In order to gain insight into the plausible mechanism of the 3' processing and integration actions, docking simulations were performed for 7C, 5,7C and 7,8C. Figure 2 presents the best docking poses for all investigated coumarins inside the enzyme binding pocket. Two-dimensional representations of the best docking poses for all investigated coumarins inside the enzyme binding pocket are shown in Figure 3 (38).

Table 1.
Structures of the 26 examined coumarin derivatives as HIV-1 integrase inhibitors represented with SMILES notation, calculated values of DCW, experimental activity data (pIC50; expr) (16), pIC50 values calculated with CORAL (calc) and the difference (diff) between expr and calc. 3'P = 3' processing; IN = integration.

| No. | SMILES notation | Set | DCW (3'P) | Expr (3'P) | Calc (3'P) | Diff (3'P) | DCW (IN) | Expr (IN) | Calc (IN) | Diff (IN) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | OC=1c5ccccc5OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccccc4 | Train | 133.901 | 4.367 | 4.362 | 0.005 | 130.045 | 4.411 | 4.423 | -0.012 |
| 2 | Oc1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O | Test | 112.152 | 3.893 | 3.956 | -0.063 | 117.992 | 4.131 | 4.191 | -0.06 |
| 3 | Oc1ccc(cc1OC)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O | Train | 101.154 | 3.752 | 3.75 | 0.002 | 114.24 | 4.119 | 4.119 | 0 |
| 4 | CN(C)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O | Train | 116.902 | 4.055 | 4.044 | 0.011 | 123.728 | 4.301 | 4.301 | 0 |
| 5 | [O-][N+](=O)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O | Train | 130.123 | 4.301 | 4.292 | 0.009 | 156.225 | 4.921 | 4.925 | -0.004 |
| 6 | O=C(O)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O | Train | 131.395 | 4.319 | 4.315 | 0.004 | 124.463 | 4.31 | 4.315 | -0.005 |
| 7 | OC=4c5ccccc5OC(=O)C=4C(C=1C(=O)Oc2ccccc2C=1O)c3cccs3 | Train | 114.362 | 4 | 3.997 | 0.003 | 99.275 | 3.83 | 3.832 | -0.002 |
| 8 | OC=1c5ccccc5OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccco4 | Test | 129.901 | 4.468 | 4.288 | 0.18 | 126.551 | 3.951 | 4.355 | -0.404 |
| 9 | OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4cc5ccccc5nc4 | Train | 142.612 | 4.538 | 4.525 | 0.013 | 152.307 | 4.854 | 4.85 | 0.004 |
| 10 | OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4cc5ccccc5cc4 | Train | 155.618 | 4.721 | 4.768 | -0.047 | 163.071 | 5.046 | 5.057 | -0.011 |
| 11 | OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc(cc4)c5ccccc5 | Train | 160.847 | 4.854 | 4.866 | -0.012 | 154.801 | 4.886 | 4.898 | -0.012 |
| 12 | OC=1c7ccccc7OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc5c6ccccc6Cc5c4 | Train | 167.13 | 5 | 4.984 | 0.016 | 165.807 | 5.108 | 5.109 | -0.001 |
| 13 | OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c5ccc(/C=C/c4ccccc4)cc5 | Train | 180.856 | 5.26 | 5.24 | 0.02 | 182.81 | 5.432 | 5.436 | -0.004 |
| 14 | OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c5ccc(OCc4ccccc4)cc5 | Test | 154.346 | 5.071 | 4.745 | 0.326 | 157.301 | 4.854 | 4.946 | -0.092 |
| 15 | OC=1c7ccccc7OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c6cc(OCc4ccccc4)cc(OCc5ccccc5)c6 | Train | 172.077 | 5.097 | 5.076 | 0.021 | 183.03 | 5.432 | 5.44 | -0.008 |
| 16 | OC=1c9ccccc9OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc(cc4)C(C5=C(O)c6ccccc6OC5=O)C7=C(O)c8ccccc8OC7=O | Train | 204.771 | 5.699 | 5.688 | 0.011 | 217.421 | 6.097 | 6.1 | -0.003 |
| 17 | Oc3ccc4C(O)=C(CC1=C(O)c2ccc(O)cc2OC1=O)C(=O)Oc4c3 | Train | 131.927 | 4.334 | 4.325 | 0.009 | 126.441 | 4.348 | 4.353 | -0.005 |
| 18 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccccc5 | Train | 150.123 | 4.764 | 4.666 | 0.098 | 141.188 | 4.654 | 4.637 | 0.017 |
| 19 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccc(cc5)C(C6=C(O)c7ccc(O)cc7OC6=O)C8=C(O)c9ccc(O)cc9OC8=O | Train | 244.311 | 6.432 | 6.427 | 0.005 | 237.543 | 6.481 | 6.487 | -0.006 |
| 20 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccccn5 | Train | 122.112 | 4.076 | 4.142 | -0.066 | 124.932 | 4.31 | 4.324 | -0.014 |
| 21 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5cccnc5 | Train | 123.609 | 4.208 | 4.17 | 0.038 | 139.441 | 4.602 | 4.603 | -0.001 |
| 22 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5cc6ccccc6cc5 | Test | 161.091 | 5.377 | 4.871 | 0.506 | 171.71 | 5.45 | 5.223 | 0.227 |
| 23 | Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c6ccc(/C=C/c5ccccc5)cc6 | Train | 177.588 | 5.155 | 5.179 | -0.024 | 199.447 | 5.745 | 5.755 | -0.01 |
| 24 | Oc5ccc6C(O)=C(C2c4ccccc4OC=1c3ccc(O)cc3OC(=O)C=12)C(=O)Oc6c5 | Train | 109.849 | 3.917 | 3.913 | 0.004 | 103.785 | 3.914 | 3.918 | -0.004 |
| 25 | Oc6ccc7C(O)=C(C2C=5C(=O)Oc1ccccc1C=5OC4=C2C(=O)Oc3cc(O)ccc34)C(=O)Oc7c6 | Test | 124.82 | 4.447 | 4.193 | 0.254 | 140.916 | 4.648 | 4.631 | 0.017 |
| 26 | Oc1ccc2c(c1)OC(=O)C=C2O | Train | 88.717 | 3.523 | 3.517 | 0.006 | 81.441 | 3.488 | 3.489 | -0.001 |

Table 2. Statistical quality of the built QSAR models

| Activity | Set | Parameter | Run 1 | Run 2 | Run 3 | Av |
|---|---|---|---|---|---|---|
| 3' Processing | Training | R² | 0.9993 | 0.9977 | 0.9980 | 0.9983 |
| 3' Processing | Training | Q² | 0.9992 | 0.9974 | 0.9977 | 0.9981 |
| 3' Processing | Training | s | 0.020 | 0.033 | 0.032 | 0.028 |
| 3' Processing | Test | R² | 0.9671 | 0.9368 | 0.9788 | 0.9609 |
| 3' Processing | Test | r²m(av) | 0.8083 | 0.6138 | 0.6213 | 0.6811 |
| 3' Processing | Test | Δr²m | 0.0577 | 0.1677 | 0.1467 | 0.1240 |
| 3' Processing | Test | s | 0.305 | 0.265 | 0.341 | 0.304 |
| Integration | Training | R² | 0.9999 | 0.9999 | 0.9999 | 0.9999 |
| Integration | Training | Q² | 0.9998 | 0.9998 | 0.9998 | 0.9998 |
| Integration | Training | s | 0.010 | 0.005 | 0.008 | 0.008 |
| Integration | Test | R² | 0.9185 | 0.9213 | 0.9186 | 0.9195 |
| Integration | Test | r²m(av) | 0.5230 | 0.5678 | 0.5798 | 0.5569 |
| Integration | Test | Δr²m | 0.1825 | 0.2041 | 0.1965 | 0.1945 |
| Integration | Test | s | 0.248 | 0.268 | 0.239 | 0.252 |

Av is the average value from three independent Monte Carlo runs (1, 2 and 3); R² is the correlation coefficient; Q² is the cross-validated correlation coefficient; s is the standard error of estimation; r²m(av) should be > 0.5 (37); Δr²m should be < 0.2 (37).

Table 3. Molecular structures of the coumarin derivatives used, with pIC50 values for enzyme 3' processing and integration activities calculated using Eq. 3 and Eq. 4

| Molecule | R1 | R2 | R3 | pIC50 (3' Processing) | pIC50 (Integration) |
|---|---|---|---|---|---|
| 7C | H | OH | H | 3.329 | 3.640 |
| 5,7C | OH | OH | H | 3.138 | 3.451 |
| 7,8C | H | OH | OH | 3.259 | 3.402 |

Table 4. The list of SAk with correlation weights for three independent Monte Carlo optimization runs for the best QSAR model

3' PROCESSING

| SAk | Effect | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
| n........... | Decrease | -3.997 | -3.248 | -5.004 |
| s...c....... | Decrease | -1.002 | -1.502 | -4.005 |
| N...+....... | Decrease | -0.745 | -1.252 | -0.004 |
| C...(...C... | Decrease | -0.502 | -1.249 | -0.999 |
| N...(...C... | Decrease | -0.496 | -1.246 | -2.251 |
| O...(...C... | Increase | 0.252 | 1.003 | 1.25 |
| c........... | Increase | 0.504 | 0.746 | 0.998 |
| o........... | Increase | 0.997 | 0.996 | 0.997 |
| C...=....... | Increase | 4.254 | 4.5 | 2.748 |
| C...C....... | Increase | 4.5 | 3.496 | 3.497 |

INTEGRATION

| SAk | Effect | Run 1 | Run 2 | Run 3 |
|---|---|---|---|---|
| n...c...c... | Decrease | -3.999 | -3.002 | -2.497 |
| n........... | Decrease | -2.998 | -2.248 | -2.247 |
| O...C...=... | Decrease | -1.253 | -0.996 | -1.245 |
| O...=...(... | Decrease | -1 | -0.998 | -0.998 |
| O...=....... | Decrease | -0.496 | -0.997 | -1.005 |
| c...O....... | Increase | 0.245 | 0.753 | 1.001 |
| O...C....... | Increase | 0.502 | 0.996 | 0.999 |
| C.../....... | Increase | 0.997 | 0.504 | 0.004 |
| c...C....... | Increase | 1.001 | 1.501 | 1 |
| C...(....... | Increase | 1.505 | 0.747 | 0.998 |

Figure 2. Surface diagram showing the docked selected 4-phenyl hydroxycoumarins

Figure 3.
Two-dimensional representations of the best docking pose for a) 7-hydroxy-4-phenyl coumarin, b) 5,7-dihydroxy-4-phenyl coumarin and c) 7,8-dihydroxy-4-phenyl coumarin inside the binding pocket

DISCUSSION

QSAR study

The results in Table 2 show that the predictability of all models is good. The results are also satisfactory with respect to the newer validation criteria (37). The correlation weights of the molecular features calculated from SMILES can be used to classify these features according to their values in the three runs of a given Monte Carlo model. The features can be divided into three categories: features with stable positive correlation weights (promoters of an increase of the endpoint); features with stable negative correlation weights (promoters of a decrease of the endpoint); and unstable features, which have positive correlation weights in some models and negative ones in others (26, 27). For example, if the correlation weight CW(Sk) of Sk is > 0 in all three runs of the optimization, then Sk is a promoter of an increase of the activity (Ac). If CW(Sk) is < 0 in all three runs of the optimization, then Sk is a promoter of a decrease of Ac. Finally, if both CW(Sk) > 0 and CW(Sk) < 0 occur, or Sk is blocked in the three runs of the optimization, then Sk has an undefined role. The same rule is applied to all SAk.

It must be noted that the SAk have a mechanistic interpretation, and according to the results in Table 4 they can be classified as follows. For example, for 3' processing 'n...........', 'N...+.......' and 'C...(...C...' are promoters of decrease, while 'C...C.......', 'C...=.......' and 'o...........' are promoters of increase. 'n...........' can be interpreted as an aromatic nitrogen atom, 'N...+.......' as an sp3 nitrogen atom with a positive charge and 'C...(...C...' as sp3 carbon atoms with branching. 'C...C.......' can be interpreted as two sp3 carbon atoms without branching, 'C...=.......' as an sp2 carbon atom, since '=' is the symbol for a double bond, and 'o...........' as an aromatic oxygen atom.

Molecular docking

The results of the molecular docking studies are presented in Figures 2 and 3. The surface diagram (Figure 2) shows that the hydrophobic parts of the molecules are oriented toward the hydrophobic parts of the enzyme binding pocket (red colored surface). The two-dimensional representations of the best docking pose for the selected coumarins (Figure 3) give a more detailed insight into the interactions with particular amino acids in the enzyme binding pocket. Based on the presented results, it can be concluded that hydrophobic interactions between the investigated coumarins and the binding pocket play an important role. However, the number, length and energy of the hydrogen bonds formed between ligand and enzyme also have an important role in the effect of the ligand on the investigated activity. The in silico studies of compound binding to 3NF7 showed that the carbonyl oxygen of 7C forms a hydrogen bond with Asn-144 (bond length 4.67 Å). 5,7C does not form any hydrogen bonds with the enzyme. The hydroxyl group in position 8 of compound 7,8C forms three hydrogen bonds, two with Ser-147 (2.19 Å and 4.59 Å) and one with Asn-144 (4.23 Å). The carbonyl oxygen forms one hydrogen bond with Asn-144 (4.69 Å) and the sp3 oxygen one with Tyr-143 (3.55 Å).

CONCLUSION

QSAR models for coumarin compounds as potent HIV-1 integrase inhibitors were built. The Monte Carlo method proved to be an efficient tool for building a robust model for estimating HIV-1 integrase inhibition. In the suggested modeling process the optimal descriptors were based on SMILES notation. The predictive potential of the applied approach was tested with one split into training and test sets. The robustness of the model was confirmed with different methods. The SMILES attributes that are promoters of an increase or decrease of HIV-1 integrase inhibition were identified. The QSAR models were applied to selected 4-phenyl coumarins for inhibition prediction. Further, the correlation between the calculated inhibitory activity and the in silico molecular docking scores of these compounds was obtained through hydrogen bonding interactions. Our results suggest that 4-phenyl hydroxycoumarins may be considered good molecular templates for potential HIV-1 integrase inhibitors.

Acknowledgment

This work has been financially supported by the Ministry of Education and Science, Republic of Serbia, under Project Numbers OI 172044 and TR 31060.

Publisher
de Gruyter
Copyright
Copyright © 2014 by the
ISSN
2217-2521
eISSN
2217-2521
DOI
10.2478/afmnai-2014-0011
Publisher site
See Article on Publisher Site

Abstract

ACTA FACULTATIS MEDICAE NAISSENSIS DOI: 10.2478/afmnai-2014-0011 UDC:547.587.51:616.98:578.828 Scientific Journal of the Faculty of Medicine in Nis 2014;31(2):95-103 Original article Jovana Veselinovi1, Aleksandar Veselinovi2, Andrey Toropov3, Alla Toropova3, Ivana Damnjanovi1, Goran Nikoli2 University of Nis, Faculty of Medicine, Department of Pharmacy, Serbia Univerity of Nis, Faculty of Medicine, Department of Chemistry, Serbia 3 IRCCS-Instituto di Ricerche Farmacologiche Mario Negri, Milano, Italy SUMMARY In search for new and promising coumarin compounds as HIV-1 integrase inhibitors, chemoinformatic methods like quantitative structure-activity relationships (QSAR) modeling and molecular docking have an important role since they can predict desired activity and propose molecule binding to enzyme. The aim of this study was building of QSAR models for coumarin derivatives as HIV-1 integrase inhibitors with the application of Monte Carlo method. SMILES notation was used to represent the molecular structure and for defining optimal SMILES-based descriptors. Molecular docking into rigid enzyme active site with flexible molecule was performed. Computational results indicated that this approach can satisfactorily predict the desired activity with very good statistical significance. For best built model statistical parameters were: a) 3' Processing activity: R2=0.9980 and Q2=0.9977 for training set and R2=0.9788 for test set and b) Integration activity: R2=0.9999 and Q2=0.9998 for training set and R2= 0.9213 for test set. Built QSAR models were applied to selected 4-phenyl hydroxycoumarins for calculating desired activity and for HIV-1 integrase inhibition estimation. Additionally, molecular docking study was performed to a newly identified pocket in the HIV-1 integrase enzyme structure for determination of selected 4-phenyl hydroxycoumarins binding mode. 
Monte Carlo method proved to be an efficient approach to build up a robust model for estimating HIV-1 integrase inhibition of coumarin compounds. Based on QSAR and molecular docking studies, 4-phenyl hydroxycoumarins can be considered as promising model compounds for developing new HIV-1 integrase inhibitors. Key words: coumarins, HIV-1 integrase inhibition, QSAR, molecular docking Corresponding author: Jovana Veselinovi · phone: +381 18 4570029· e-mail: milosavljevic.jovana@hotmail.com · 95 INTRODUCTION Acquired immunodeficiency syndrome (AIDS), reported in 1981 (1), is a fatal disorder resulting from a chronic persistent infection by the human retrovirus, human immunodeficiency virus (HIV) (2). Today, AIDS is considered as one of the most devastating diseases faced by mankind, with an estimation of 34 million people living with HIV worldwide at the end of 2010 according to Joint United Nations Programme on HIV/AIDS (3). Up to now, successful chemotherapy has not been developed. Currently, Reverse Transcriptase (RT) and Protease (PT) inhibitors are the main targets for the majority of available drugs for HIV treatment. However, toxicity and rapid development of resistance to these inhibitors are the main issues related to the current therapy (4). Therefore, the development of new anti-HIV agents with varied structure and mechanisms of action is of great importance. HIV-1 integraze (HIV-1IN) is a very attractive and unexplored target for developing of new anti-HIV drugs as it plays a vital role in replication cycle and it has no cellular counterpart (5-7). Various compounds exhibit HIV-1IN inhibitory activity, including lignanolides (8), curcumins (9), aurintricarboxylic acids (10), dicaffeoyl quinic acids and analogues (11, 12), diaryl sulfones (13). 
Unfortunately, all of stated inhibitors have the 1,2-dihydroxy (catechol) moiety, separated by an appropriate linker, so all of them have significant cytotoxicity because of catechol moiety autoxidation to reactive quinone species (14, 15). To overcame this problem, a series of coumarin derivatives which do not contain catechol functionality but possess good HIV-1IN inhibition activity was synthesized (16). The importance of quantitative structure-reactivity relationship (QSAR) studies in modern drug design is well established since QSAR can make the early prediction of activity-related characteristics of drug candidates and can eliminate molecules with undesired properties (17). The main goal of QSAR approach is to correlate the biological activity of a series of compounds with the calculated molecular properties in terms of descriptors (18). Thousands of molecular descriptors are used in QSAR studies for the purpose of encoding molecules chemical and structural features (19, 20) with great importance of topological descriptors calculated on the basis of molecular graphs (21). The simplified molecular input line entry system (SMILES) is an alternative to molecular graphs and it can be used for representation of molecular structures (22). Recent papers have reported the applicability of SMILES based descriptors in QSAR analysis with models built on the basis of Monte Carlo method (23-27). Several QSAR studies dealing with coumarin compounds as HIV-1IN inhibitors are reported (28-31). The aim of this research is to build QSAR modes for coumarin derivates as HIV-1IN inhibitors with SMILES based optimal descriptors and application of Monte Carlo method by using CORAL software. Built QSAR models were applied to selected 4-phenyl hydroxycouma96 rins with good antioxidant properties (32) but with no literature data about their HIV-1IN inhibition activity. 
Further, docking study is performed to a newly identified pocket right behind catalytic core domain (CCD) helix 4 (33) in the HIV-1IN enzyme for determinating the possible binding mode of selected 4-phenyl hydroxycoumarins. METHOD Data. A dataset of 26 coumarin derivatives with determined HIV-1 integrase inhibition activity was selected for QSAR study (16). Figure 1 presents general structures of used coumarin compounds for QSAR modeling. As an endpoint for QSAR model building pIC50 for enzyme 3' processing and integration was used. Figure 1. General molecular structures of used molecules Canonical SMILES for all compounds were generated with the ACD/ChemSketch program (ACD/Chem Sketch v.11.0) in order to preserve consistency because different software may generate different SMILES notations. One random split into the training and test set was examined (20% of molecules are taken as test compounds). The role of the training set is in developing of the model. The role of test set is selection of preferable values for the number of epoch of the Monte Carlo optimization and the threshold value. Optimal descriptors. SMILES is a representation of the molecular structure by sequence of symbols. Some symbols represent molecular fragments, such as atoms or bonds (e.g. 'C', 'N','=', '#', etc.). Some of these fragments are represented by two symbols (e.g. 'Br', 'Cl', '@@', etc.) which cannot be separated. Optimal SMILES-based descriptors, determined by descriptor correlation weight (DCW(T,Nepoch)), were calculated with CORAL software (http://www.insilico.eu/coral) as: DCW(T,Nepoch)=CW(Sk)+CW(SSk)+CW(SSSk) (1) where Sk, SSk, and SSSk are one-, two-, and threecomponent SMILES attributes, respectively; the component of SMILES attribute is SMILES symbol previously defined (27). Two parameters in Eq. 1 should be defined for the Monte Carlo optimization: threshold (T) and the number of epochs (Nepoch). 
The threshold T divides the components of the molecular representation into two classes, rare and active. The correlation weight of a rare component is fixed at zero, because such a component brings noise to the model; rare components are therefore excluded from model building. Nepoch is the number of epochs of the Monte Carlo optimization (one epoch is a cycle of modifications of all correlation weights involved in the model). The predictive potential of the model is a function of T and Nepoch. The search for the most predictive combination of T and Nepoch covered the values 0-7 for T and 0-70 for Nepoch for all models, according to previously published methodology (23-27). Given the numerical values of the correlation weights (CW), DCW(T,Nepoch) can be calculated for the compounds of the training and test sets. The least squares method was used to calculate the endpoint from these data:

Endpoint = C0 + C1 × DCW(T,Nepoch) (2)

Molecular docking. 3D structures of the compounds for the docking simulation were constructed using MarvinSketch 6.1.0, 2013, ChemAxon (http://www.chemaxon.com). Geometry optimization was carried out with the MMFF94 molecular force field (34). To date, no full-length structure of HIV-1 IN is available to elucidate the spatial arrangement of its three domains: N-terminal (NTD), catalytic core (CCD) and C-terminal (CTD). In the development of allosterically targeted HIV-1 IN inhibitors, a new advantageous approach for the discovery of compounds effective against viral strains resistant to HIV-1 IN strand-transfer inhibitors has been proposed recently (35). A new site in integrase, a valid region for the structure-based design of allosteric integrase inhibitors, has been identified using a structure-based design process (Protein Data Bank code: 3NF7) (33).

The compounds were docked into the enzyme binding site using Molegro Virtual Docker (MVD v. 2013.6.0.1) (36). Ligands were docked to the rigid enzyme model to identify hydrogen bonds and hydrophobic interactions with residues at the active site. The binding site was computed with a grid resolution of 0.3 Å. MolDock SE was used as the search algorithm and the number of runs was set to 100. The parameters of the docking procedure were: population size 50, maximum number of iterations 1500, energy threshold 100.00 and maximum number of steps 300. The number of generated poses was 10. Ligand-receptor interactions were estimated with the MVD scoring functions: MolDock Score, Rerank Score, Hbond Score, Similarity Score and Docking Score. Each ligand was docked into the computed cavity in place of the ligand from 3NF7 using the MolDock Optimizer algorithm, and its interactions were monitored using detailed energy estimates. A maximum population of 100 and a maximum of 10,000 iterations were used for each run, and the 5 best poses were retained.

RESULTS

The chemical structures represented with SMILES notation, the experimental activity data (expr) for 3' processing and integration, the data calculated with CORAL (calc) and the difference (diff) between expr and calc are presented in Table 1. Statistical criteria of the predictability of the models are presented in Table 2 (37). Using Eq. 2 for predicting pIC50, the following equations were obtained from the best Monte Carlo runs:

3' Processing: pIC50 = 1.8584 (±0.0036) + 0.0187 (±0.0000215) × DCW(0,3) (3)

Integration: pIC50 = 2.5636 (±0.0004) + 0.0180 (±0.0000034) × DCW(0,3) (4)

The built QSAR models were applied to predict pIC50 values of selected 4-phenyl hydroxycoumarins (7-hydroxy-4-phenyl coumarin (7C), 5,7-dihydroxy-4-phenyl coumarin (5,7C) and 7,8-dihydroxy-4-phenyl coumarin (7,8C)), which have good antioxidant properties (32) but no literature data on their HIV-1 IN inhibition activity. Eq. 3 and Eq. 4 were applied to calculate pIC50 for enzyme 3' processing and integration for the selected 4-phenyl hydroxycoumarins. The calculated values and the molecular structures of these coumarin derivatives are presented in Table 3.

The Monte Carlo method can be used to classify molecular features (SAk) calculated with SMILES-based descriptors. The list of SAk together with their correlation weights for the three probes of the Monte Carlo optimization for both enzyme activities is given in Table 4.

To gain insight into the plausible mechanism of the 3' processing and integration actions, docking simulations were performed for 7C, 5,7C and 7,8C. Figure 2 presents the best docking poses for all investigated coumarins inside the enzyme binding pocket. Two-dimensional representations of the best docking poses for all investigated coumarins inside the enzyme binding pocket are shown in Figure 3 (38).

Table 1.
Structures of the 26 examined coumarin derivatives as HIV-1 integrase inhibitors represented with SMILES notation, calculated DCW values, experimental activity data (pIC50, expr) (16), pIC50 values calculated with CORAL (calc) and the difference (diff) between expr and calc

No.  Set    SMILES notation
1    Train  OC=1c5ccccc5OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccccc4
2    Test   Oc1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O
3    Train  Oc1ccc(cc1OC)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O
4    Train  CN(C)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O
5    Train  [O-][N+](=O)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O
6    Train  O=C(O)c1ccc(cc1)C(C2=C(O)c3ccccc3OC2=O)C4=C(O)c5ccccc5OC4=O
7    Train  OC=4c5ccccc5OC(=O)C=4C(C=1C(=O)Oc2ccccc2C=1O)c3cccs3
8    Test   OC=1c5ccccc5OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccco4
9    Train  OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4cc5ccccc5nc4
10   Train  OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4cc5ccccc5cc4
11   Train  OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc(cc4)c5ccccc5
12   Train  OC=1c7ccccc7OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc5c6ccccc6Cc5c4
13   Train  OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c5ccc(/C=C/c4ccccc4)cc5
14   Test   OC=1c6ccccc6OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c5ccc(OCc4ccccc4)cc5
15   Train  OC=1c7ccccc7OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c6cc(OCc4ccccc4)cc(OCc5ccccc5)c6
16   Train  OC=1c9ccccc9OC(=O)C=1C(C2=C(O)c3ccccc3OC2=O)c4ccc(cc4)C(C5=C(O)c6ccccc6OC5=O)C7=C(O)c8ccccc8OC7=O
17   Train  Oc3ccc4C(O)=C(CC1=C(O)c2ccc(O)cc2OC1=O)C(=O)Oc4c3
18   Train  Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccccc5
19   Train  Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccc(cc5)C(C6=C(O)c7ccc(O)cc7OC6=O)C8=C(O)c9ccc(O)cc9OC8=O
20   Train  Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5ccccn5
21   Train  Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5cccnc5
22   Test   Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c5cc6ccccc6cc5
23   Train  Oc1ccc2C(O)=C(C(=O)Oc2c1)C(C3=C(O)c4ccc(O)cc4OC3=O)c6ccc(/C=C/c5ccccc5)cc6
24   Train  Oc5ccc6C(O)=C(C2c4ccccc4OC=1c3ccc(O)cc3OC(=O)C=12)C(=O)Oc6c5
25   Test   Oc6ccc7C(O)=C(C2C=5C(=O)Oc1ccccc1C=5OC4=C2C(=O)Oc3cc(O)ccc34)C(=O)Oc7c6
26   Train  Oc1ccc2c(c1)OC(=O)C=C2O

            3' PROCESSING                     INTEGRATION
No.  Set    DCW      Expr   Calc   Diff      DCW      Expr   Calc   Diff
1    Train  133.901  4.367  4.362  0.005     130.045  4.411  4.423  -0.012
2    Test   112.152  3.893  3.956  -0.063    117.992  4.131  4.191  -0.06
3    Train  101.154  3.752  3.75   0.002     114.24   4.119  4.119  0
4    Train  116.902  4.055  4.044  0.011     123.728  4.301  4.301  0
5    Train  130.123  4.301  4.292  0.009     156.225  4.921  4.925  -0.004
6    Train  131.395  4.319  4.315  0.004     124.463  4.31   4.315  -0.005
7    Train  114.362  4      3.997  0.003     99.275   3.83   3.832  -0.002
8    Test   129.901  4.468  4.288  0.18      126.551  3.951  4.355  -0.404
9    Train  142.612  4.538  4.525  0.013     152.307  4.854  4.85   0.004
10   Train  155.618  4.721  4.768  -0.047    163.071  5.046  5.057  -0.011
11   Train  160.847  4.854  4.866  -0.012    154.801  4.886  4.898  -0.012
12   Train  167.13   5      4.984  0.016     165.807  5.108  5.109  -0.001
13   Train  180.856  5.26   5.24   0.02      182.81   5.432  5.436  -0.004
14   Test   154.346  5.071  4.745  0.326     157.301  4.854  4.946  -0.092
15   Train  172.077  5.097  5.076  0.021     183.03   5.432  5.44   -0.008
16   Train  204.771  5.699  5.688  0.011     217.421  6.097  6.1    -0.003
17   Train  131.927  4.334  4.325  0.009     126.441  4.348  4.353  -0.005
18   Train  150.123  4.764  4.666  0.098     141.188  4.654  4.637  0.017
19   Train  244.311  6.432  6.427  0.005     237.543  6.481  6.487  -0.006
20   Train  122.112  4.076  4.142  -0.066    124.932  4.31   4.324  -0.014
21   Train  123.609  4.208  4.17   0.038     139.441  4.602  4.603  -0.001
22   Test   161.091  5.377  4.871  0.506     171.71   5.45   5.223  0.227
23   Train  177.588  5.155  5.179  -0.024    199.447  5.745  5.755  -0.01
24   Train  109.849  3.917  3.913  0.004     103.785  3.914  3.918  -0.004
25   Test   124.82   4.447  4.193  0.254     140.916  4.648  4.631  0.017
26   Train  88.717   3.523  3.517  0.006     81.441   3.488  3.489  -0.001

Table 2. Statistical quality of built QSAR models

3' PROCESSING
       Training                 Test
Run    R2      Q2      s        R2      rm(av)2  Δrm2    s
1      0.9993  0.9992  0.020    0.9671  0.8083   0.0577  0.305
2      0.9977  0.9974  0.033    0.9368  0.6138   0.1677  0.265
3      0.9980  0.9977  0.032    0.9788  0.6213   0.1467  0.341
Av     0.9983  0.9981  0.028    0.9609  0.6811   0.1240  0.304

INTEGRATION
       Training                 Test
Run    R2      Q2      s        R2      rm(av)2  Δrm2    s
1      0.9999  0.9998  0.010    0.9185  0.5230   0.1825  0.248
2      0.9999  0.9998  0.005    0.9213  0.5678   0.2041  0.268
3      0.9999  0.9998  0.008    0.9186  0.5798   0.1965  0.239
Av     0.9999  0.9998  0.008    0.9195  0.5569   0.1945  0.252

Av is the average value from three independent Monte Carlo runs (1, 2 and 3)
R2 is the squared correlation coefficient
Q2 is the cross-validated correlation coefficient
s is the standard error of estimation
rm(av)2 should be > 0.5 (37)
Δrm2 should be < 0.2 (37)

Table 3. Molecular structures of the selected coumarin derivatives with pIC50 values for enzyme 3' processing and integration activities calculated using Eq. 3 and Eq. 4

Molecule  R1  R2  R3  pIC50 (3' Processing)  pIC50 (Integration)
7C        H   OH  H   3.329                  3.640
5,7C      OH  OH  H   3.138                  3.451
7,8C      H   OH  OH  3.259                  3.402

Table 4. The list of SAk with correlation weights for three independent Monte Carlo optimization runs for the best QSAR model

3' PROCESSING
SAk           Role      Run 1   Run 2   Run 3
n...........  Decrease  -3.997  -3.248  -5.004
s...c.......  Decrease  -1.002  -1.502  -4.005
N...+.......  Decrease  -0.745  -1.252  -0.004
C...(...C...  Decrease  -0.502  -1.249  -0.999
N...(...C...  Decrease  -0.496  -1.246  -2.251
O...(...C...  Increase  0.252   1.003   1.25
c...........  Increase  0.504   0.746   0.998
o...........  Increase  0.997   0.996   0.997
C...=.......  Increase  4.254   4.5     2.748
C...C.......  Increase  4.5     3.496   3.497

INTEGRATION
SAk           Role      Run 1   Run 2   Run 3
n...c...c...  Decrease  -3.999  -3.002  -2.497
n...........  Decrease  -2.998  -2.248  -2.247
O...C...=...  Decrease  -1.253  -0.996  -1.245
O...=...(...  Decrease  -1      -0.998  -0.998
O...=.......  Decrease  -0.496  -0.997  -1.005
c...O.......  Increase  0.245   0.753   1.001
O...C.......  Increase  0.502   0.996   0.999
C.../.......  Increase  0.997   0.504   0.004
c...C.......  Increase  1.001   1.501   1
C...(.......  Increase  1.505   0.747   0.998

Figure 2. Surface diagram showing the docked selected 4-phenyl hydroxycoumarins

Figure 3.
Two-dimensional representations of the best docking pose for a) 7-hydroxy-4-phenyl coumarin, b) 5,7-dihydroxy-4-phenyl coumarin and c) 7,8-dihydroxy-4-phenyl coumarin inside the binding pocket

DISCUSSION

QSAR study

The results in Table 2 show that the predictability of all models is good. The results are also satisfactory with respect to the more recent validation criteria (37). The correlation weights of the molecular features calculated from SMILES can be used to classify these features according to their values in the three probes of a given Monte Carlo model. The features can be divided into three categories: features with stable positive correlation weights (promoters of an increase of the endpoint); features with stable negative correlation weights (promoters of a decrease of the endpoint); and unstable features, which have positive correlation weights in some models and negative ones in others (26, 27). For example, if the correlation weight CW(Sk) of Sk is > 0 in all three runs of the optimization, then Sk is a promoter of an increase in activity. If CW(Sk) is < 0 in all three runs, then Sk is a promoter of a decrease in activity. Finally, if both CW(Sk) > 0 and CW(Sk) < 0 occur, or if Sk is blocked in the three runs of the optimization, then Sk has an undefined role. The same rule is applied to all SAk.

The SAk have a mechanistic interpretation, and according to the results in Table 4 they can be classified as follows. For 3' processing, 'n...........', 'N...+.......' and 'C...(...C...' are promoters of decrease, while 'C...C.......', 'C...=.......' and 'o...........' are promoters of increase. 'n...........' can be interpreted as an aromatic nitrogen atom, 'N...+.......' as an sp3 nitrogen atom with a positive charge, and 'C...(...C...' as sp3 carbon atoms with branching. 'C...C.......' can be interpreted as two sp3 carbon atoms without branching, 'C...=.......' as an sp2 carbon atom, since '=' is the symbol for a double bond, and 'o...........' as an aromatic oxygen atom.

Molecular docking

The results of the molecular docking studies are presented in Figures 2 and 3. The surface diagram (Figure 2) shows that the hydrophobic parts of the molecules are oriented toward the hydrophobic parts of the enzyme binding pocket (red colored surface). The two-dimensional representations of the best docking pose for the selected coumarins (Figure 3) give a more detailed insight into the interactions with particular amino acids in the enzyme binding pocket. Based on the presented results, it can be concluded that hydrophobic interactions between the investigated coumarins and the binding pocket play an important role. However, the number, length and energy of the hydrogen bonds formed between ligand and enzyme also have an important role in the ligand's effect on the investigated activity. The in silico studies of compound binding to 3NF7 showed that the carbonyl oxygen of 7C forms a hydrogen bond with Asn-144 (bond length 4.67 Å). 5,7C does not form any hydrogen bonds with the enzyme. In 7,8C, the hydroxyl group in position 8 forms three hydrogen bonds, two with Ser-147 (2.19 Å and 4.59 Å) and one with Asn-144 (4.23 Å); the carbonyl oxygen forms one with Asn-144 (4.69 Å) and the sp3 oxygen forms one with Tyr-143 (3.55 Å).

CONCLUSION

QSAR models for coumarin compounds as potent HIV-1 integrase inhibitors were built. The Monte Carlo method proved to be an efficient tool for building a robust model for estimating HIV-1 integrase inhibition. In the suggested modeling process, the optimal descriptors were based on SMILES notation. The predictive potential of the applied approach was tested with one split into training and test sets. The robustness of the model was confirmed with different methods. The SMILES attributes that are promoters of an increase or decrease of HIV-1 integrase inhibition were identified. The built QSAR models were applied to selected 4-phenyl coumarins for inhibition prediction. Further, the correlation between the calculated inhibitory activity and the in silico molecular docking scores of these compounds was obtained through hydrogen bonding interactions. Our results suggest that 4-phenyl hydroxycoumarins may be considered good molecular templates for potential HIV-1 integrase inhibitors.

Acknowledgment

This work has been financially supported by the Ministry of Education and Science, Republic of Serbia, under Project Numbers OI 172044 and TR 31060.
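As a worked check of Eq. 3, the short sketch below recomputes the 3' processing Calc and Diff values of a few Table 1 compounds from their DCW values. The coefficients are the rounded ones printed in Eq. 3 and the rows are copied from Table 1; the function name and the choice of compounds are illustrative assumptions, so agreement is expected only to about three decimals.

```python
# Rounded coefficients as printed in Eq. 3 (3' processing model).
C0, C1 = 1.8584, 0.0187


def predict_pic50(dcw, c0=C0, c1=C1):
    """Linear model of Eq. 2: Endpoint = C0 + C1 * DCW."""
    return c0 + c1 * dcw


# (compound no., DCW, Expr, Calc) from the 3' processing columns of Table 1.
rows = [
    (1, 133.901, 4.367, 4.362),
    (8, 129.901, 4.468, 4.288),
    (19, 244.311, 6.432, 6.427),
    (26, 88.717, 3.523, 3.517),
]

for no, dcw, expr, calc in rows:
    pred = predict_pic50(dcw)
    diff = expr - pred  # Table 1 reports Diff as expr minus calc
    print(f"compound {no}: predicted {pred:.3f} (table {calc}), diff {diff:.3f}")
```

For compound 1 this gives 1.8584 + 0.0187 × 133.901 ≈ 4.362, matching the tabulated Calc value.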

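The sign-stability rule used in the Discussion to label SMILES attributes can be written as a small function. This is only a sketch of the rule as stated in the text: the function name is hypothetical, the first two weight triples are taken from the 3' processing part of Table 4, and the third triple is a made-up example of an unstable attribute.

```python
def classify_attribute(weights_per_run):
    """Label an attribute from its correlation weights in independent runs.

    All weights > 0: promoter of increase; all weights < 0: promoter of
    decrease; mixed signs or blocked (zero) weights: undefined role.
    """
    if all(w > 0 for w in weights_per_run):
        return "increase"
    if all(w < 0 for w in weights_per_run):
        return "decrease"
    return "undefined"


examples = {
    "n...........": (-3.997, -3.248, -5.004),  # Table 4, 3' processing
    "C...C.......": (4.5, 3.496, 3.497),       # Table 4, 3' processing
    "hypothetical": (0.5, -0.2, 0.1),          # made-up unstable attribute
}

for name, weights in examples.items():
    print(name, classify_attribute(weights))
```

Treating a zero weight as "undefined" mirrors the statement that a blocked attribute has no defined role.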
Journal

Acta Facultatis Medicae Naissensis, de Gruyter

Published: Jun 26, 2014

There are no references for this article.