Lipid Data Analyzer: unattended identification and quantitation of lipids in LC-MS data

BIOINFORMATICS ORIGINAL PAPER, Vol. 27 no. 4 2011, pages 572–577
doi:10.1093/bioinformatics/btq699
Data and text mining
Advance Access publication December 17, 2010

Jürgen Hartler (1,2), Martin Trötzmüller (3), Chandramohan Chitraju (2), Friedrich Spener (2), Harald C. Köfeler (3) and Gerhard G. Thallinger (1,4,*)

(1) Institute for Genomics and Bioinformatics, Graz University of Technology, (2) Institute of Molecular Biosciences, University of Graz, (3) Core Facility for Mass Spectrometry, Center for Medical Research, Medical University of Graz and (4) Core Facility Bioinformatics, Austrian Center for Industrial Biotechnology (ACIB GmbH), Graz, Austria

Associate Editor: Martin Bishop

ABSTRACT

Motivation: The accurate measurement of the lipidome permits insights into physiological and pathological processes. Of the present high-throughput technologies, LC-MS in particular has the potential to monitor quantitative changes in hundreds of lipids simultaneously. To extract valuable information from the huge amount of mass spectrometry data, automated, reliable, highly sensitive and specific analysis algorithms are indispensable.

Results: We present here a novel approach for the quantitation of lipids in LC-MS data. The new algorithm obtains its analytical power from two major innovations: (i) a 3D algorithm that confines the peak borders in m/z and time direction and (ii) the use of the theoretical isotopic distribution of an analyte as selection/exclusion criterion. The algorithm is integrated in the Lipid Data Analyzer (LDA) application, which additionally provides standardization, a statistics module for results analysis, a batch mode for unattended analysis of several runs and a 3D viewer for manual verification. The statistics module offers sample grouping, tests between sample groups and export functionalities, where the results are visualized by heat maps and bar charts. The presented algorithm has been applied to data from a controlled experiment and to biological data, containing analytes distributed over an intensity range of 10^[…]. Our approach shows improved sensitivity and an extremely high positive predictive value compared with existing methods. Consequently, the novel algorithm, integrated in a user-friendly application, is a valuable improvement in the high-throughput analysis of the lipidome.

Implementation and availability: The Java application is freely available for non-commercial users at http://genome.tugraz.at/lda. Raw data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: ZBh3nS5bXk6I/Vn32tB5Vh0qnMpVIW71HByFFQqM0RmdF4/4HcnH3Wggh9kU2teYVOtM1JWwHIeMHqSS/bc2yYNFmyUAAAAAAAClDQ==

Contact: Gerhard.Thallinger@tugraz.at

Supplementary information: Supplementary data are available from Bioinformatics online.

Received on October 1, 2010; revised on December 11, 2010; accepted on December 13, 2010

1 INTRODUCTION

Advances in lipidomics technologies utilizing mass spectrometry have led to a rapid increase in the number and size of datasets and in the rate at which they are generated. LC-MS in particular has the potential to monitor quantitative changes in hundreds of lipids simultaneously. To deal with the acquired data, automated and reliable software tools are required.

Currently available lipidomics MS-quantitation tools can be classified in two groups, each extracting 2D subsets of MS data: one group (Ejsing et al., 2006; Haimi et al., 2006, 2009; Leavell and Leary, 2006; Song et al., 2007) requires m/z profiles or m/z spectra; and the other group (Katajamaa et al., 2006; Pluskal et al., 2010) extracts chromatograms for quantitation. A representative of the first group is LIMSA, which can analyse m/z spectra only (Haimi et al., 2006, 2009). It utilizes an additional program (SECD) to extract the m/z spectra. The user has to define manually a trapezoid in the m/z-time map of the LC-MS data containing the lipid series of interest (a collection of lipids having the same number of C atoms). The extracted m/z spectra serve as input for the LIMSA algorithm. In contrast, mzMine2 (Katajamaa et al., 2006; Pluskal et al., 2010), a program from the second group, provides several algorithms to automatically extract chromatograms for processing. Both programs select the peaks for an analyte based on the exact m/z value.

However, an LC-MS peak is a 3D object, each of whose data points is characterized by retention time, m/z ratio and intensity. Consequently, all of the available tools lose the information from a whole dimension (the first group through the m/z profile extraction, the second through the chromatogram extraction), and thus have a reduced ability to distinguish clearly between closely overlapping peaks (Fig. 1). Furthermore, overlapping peaks occur frequently in lipidomic data due to the particular nature of lipids. The chemical structures of lipids can differ by just one double bond, resulting in a mass difference of 2 Da between the analytes. Thus, the base peak (+0 isotope) of the analyte has almost the same mass as the second isotopic peak (+2 isotope) of the analyte with one additional double bond. In addition, elution times are usually very close due to the physicochemical similarity of the analytes.

In this article, we present a novel algorithm that addresses the difficulties specific to lipidomics MS-data quantitation by two novel approaches: (i) exact peak border confinement and (ii) accounting for the theoretical isotopic distribution for peak selection.
The novel border confinement increases the ability to detect overlaps and reduces their effects. The theoretical isotopic distribution is used as an exclusion and selection criterion and as such increases specificity. These two principles allow accurate and highly specific peak identification, which is required for the unattended high-throughput analysis of lipids. To assess the performance of the new algorithm, we (i) conducted a controlled experiment to test the accuracy of the quantitation and (ii) applied the algorithm to biological data to test its sensitivity and positive predictive value.

* To whom correspondence should be addressed.

© The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

Fig. 1. LDA 3D-view on MS data. (A) Shows the 3D view at the top of the picture and the 2D chromatogram view at the bottom. In the 3D view, the isotopic peaks (+0, +1 and +2) of TG56:6 and TG56:7, plus the +0 peak of TG56:5 are depicted; the peaks stained in red correspond to the quantified peaks of TG56:6. The 2D view shows the extracted chromatogram at the m/z value of TG56:6. In the chromatogram, the overlapping +2 isotopic peak of TG56:7 is not distinguishable from the main peak of TG56:6, resulting in one peak. (B) Shows the zoomed 3D view of the overlap of the +2 peak of TG56:7 with the +0 peak of TG56:6. In 3D the two peaks are clearly discernible. The red stained part of the overlapping peaks indicates the part that has been used for the quantitation of TG56:6.

2 SYSTEM AND METHODS

The Lipid Data Analyzer (LDA) software is a stand-alone, platform-independent Java application, using Java3D (https://java3d.dev.java.net) for visualization.

2.1 Theoretical isotopic distribution

The theoretical isotopic distribution is calculated for each mass shift with the aid of probability theory, so that the probability for an isotopic mass peak is the sum of the probabilities of the mutual combinations of atomic isotopes that result in the +x mass shift:

$$p(+x) = \sum p(\mathrm{mutualCombination}) \tag{1}$$

For the theoretical isotopic distribution, the reference values are then determined based on the measured base peak according to the formula (Supplementary Section 1):

$$\mathrm{Area}(+x) = \mathrm{Area}(+0) \cdot \frac{p(+x)}{p(+0)} \tag{2}$$

2.2 Controlled experiment

The controlled experiment consisted of five mixtures (prepared as triplicates) of the analytes TG54:0, TG54:1, TG54:2 and TG54:3, and TG48:0 as reference value, which was present at a constant concentration (5 µM). In the first mix, the analytes were present at the same concentration (2 µM) to test the ionization efficiencies of the analytes, while in mixes 2–5 the analytes were mixed at different concentrations to test the quantitation at different intensities. The relative concentrations of the analytes were: TG54:3 : TG54:2 = 2:1; TG54:2 : TG54:1 = 5:1; TG54:1 : TG54:0 = 10:1. Mixes 2–5 correspond to a dilution series with the following ratios: mix2 : mix3 = 2:1; mix2 : mix4 = 5:1; mix2 : mix5 = 10:1.

Data were acquired by MS (more details about sample preparation and MS can be found in Supplementary Section 2) and analysed with LDA, LIMSA and mzMine2. The standard parameters of LIMSA and mzMine2 were inappropriate for high-resolution MS data (too few analytes were identified), and thus the parameters were adapted accordingly (Supplementary Section 3). The analytes showed ammonium and ammonium-acetonitrile adducts, and the sum of both adducts was taken as the quantitative measure. The analysis with LDA was performed in an automated way relying only on the m/z value. In contrast, LIMSA required manual extraction of the profiles before the automated quantitation, and mzMine2 required information on the retention time. Following that, the ratios of the dilution series were calculated and the average deviation from the expected ratio was taken as quality parameter:

$$\mathrm{AverageDeviation} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\mathrm{ratio}(i)}{\mathrm{idealRatio}(i)} - 1 \right| \tag{3}$$

2.3 Biological experiment

For the biological data, three groups of C57BL male mice (n = 3 each) aged 12 weeks were maintained on a regular light-dark cycle (14 h light, 10 h dark) and fed on either standard laboratory chow diet (4.5% w/w crude fat) or high fat diet (34% w/w crude fat). For fasted samples, mice fed on chow diet were fasted overnight for 16 h. The aim was to characterize the TG species of lipid droplets (LD) isolated from primary hepatocytes of mice subjected to the different feeding regimes. Primary hepatocytes were isolated by collagenase digestion according to Riccalton-Banks et al. (2003) with some modifications (Supplementary Section 4). Primary hepatocytes were lysed by nitrogen cavitation and LDs were isolated by ultracentrifugation. Lipids were extracted according to Folch et al. (1957). The sample was split into two parts, since LDs contain mainly TG (Blouin et al., 2010) and the chromatographic separation and peak shape would be very poor for TGs without dilution, whereas other lipids would be hardly detectable in a diluted sample. One part of the extract was diluted 1:66 and served for the analysis of TGs, and the other fraction remained undiluted for the analysis of phospholipids and sphingomyelin. The extracts were dissolved in CHCl3/MeOH (1:1, v/v) and acquired by LC-MS as described in Supplementary Section 2.

First, the peaks were manually verified (visualization with the implemented 3D viewer).
A peak was considered valid if it occurred at the expected retention time (each additional double bond in a molecular species lowers the retention time by about 0.5–1.5 min). Subsequently, TG data were automatically analysed by LDA and mzMine2 and the results were compared. While LDA performs the automated quantitation based on the m/z value of the analyte only, mzMine2 additionally requires information about the retention time (cf. controlled experiment). The data from undiluted samples were analysed with LDA only to show the applicability to molecule classes other than TG.

3 ALGORITHM IMPLEMENTATION

3.1 3D algorithm

The novel algorithm for accurate peak confinement works on the basis of several chromatogram and profile extractions, in which the chromatogram/profile extraction is based on an adapted version (Hartler et al., 2007) of the ASAPRatio algorithm (Li et al., 2003) with a changed peak border detection method. This new method detects peak borders based on abrupt changes in the gradient of a peak curve (see Section 5 and SFig. 1 in the Supplementary Material).

Fig. 2. Illustration of the novel algorithm from a top-view on the m/z map. The bigger ellipse represents the peak to be quantified; the smaller one is the overlapping one. First, a standard chromatogram 'A' with a broad m/z range 'a' is extracted (no overlap distinguishable). Second, at the time point with the highest intensity an m/z-profile 'B' with a narrow range 'b' is extracted. Third, the m/z borders and the m/z value with the highest intensity are used to extract chromatogram 'C' with range 'c' (uses borders found in 'B') and 'D' with narrow range 'd'. 'C' and 'D' are used to identify borders in the time range. Now four border points of the peak are determined (dots on the borders of the bigger ellipse) and an ellipse is fitted through them. The ellipse is used as the peak border. Only intensities inside the ellipse contribute to the total peak intensity.
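The final step of the Fig. 2 procedure, summing raw intensities inside an ellipse spanned by four border points, can be sketched as follows. This is a minimal illustration and not LDA's implementation (the actual fit is described in Supplementary Section 6): it assumes an axis-aligned ellipse centred on the midpoints of the two m/z borders and the two time borders, and it uses a plain sum of raw intensities as a stand-in for the peak area. The names `DataPoint` and `ellipse_area` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DataPoint:
    """One raw LC-MS data point (hypothetical container for this sketch)."""
    time: float       # retention time
    mz: float         # mass-to-charge ratio
    intensity: float  # raw, unsmoothed intensity


def ellipse_area(points: Iterable[DataPoint],
                 mz_low: float, mz_high: float,
                 t_low: float, t_high: float) -> float:
    """Sum the raw intensities of all points inside the border ellipse.

    Assumption: the ellipse is axis-aligned, with semi-axes given by half
    the distance between the two m/z borders and the two time borders.
    """
    semi_mz = (mz_high - mz_low) / 2.0
    semi_t = (t_high - t_low) / 2.0
    if semi_mz <= 0 or semi_t <= 0:
        raise ValueError("border points must span a positive range")
    centre_mz = (mz_low + mz_high) / 2.0
    centre_t = (t_low + t_high) / 2.0

    total = 0.0
    for p in points:
        # Standard ellipse equation: inside or on the border if <= 1.
        d = ((p.mz - centre_mz) / semi_mz) ** 2 + ((p.time - centre_t) / semi_t) ** 2
        if d <= 1.0:
            total += p.intensity
    return total
```

Because only points inside the ellipse are counted, intensity from a neighbouring peak that overlaps in the chromatogram but lies outside the m/z borders does not contribute, which is the property the 3D approach relies on.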
The chromatograms and profiles are used to determine four peak border points. An ellipse is fitted through these border points, forming the border of the peak (Supplementary Section 6). Raw intensities inside the ellipse are used to calculate the area of the peak. This approach minimizes the effect of potential overlaps and does not contain smoothing artefacts, since it works on raw intensities.

Figure 1 shows two overlapping peaks which cannot be separated by the conventional 2D methods. In Figure 2, these two peaks are depicted from a top-view on the m/z map and the novel 3D algorithm is illustrated schematically. First, the algorithm extracts a chromatogram with a broad m/z range and identifies the time point at the peak summit. Second, at the time point of the summit, an m/z profile with a narrow time range is extracted. The algorithm determines the peak summit again and detects the two border points of the profile peak, which already form points for the ellipse fit in m/z direction. Then, two chromatograms are extracted; the first one uses the two profile border points as m/z range for the extraction; the second one uses a very narrow range around the m/z value of the profile peak summit. Now, in both chromatograms peak border points are detected, which can be used for the ellipse fit in the time direction.

In general, the border points of the chromatogram with the smaller m/z range are used for the ellipse. However, the chromatogram with the narrow m/z range could possibly cut just a small part of the overlapping peak, resulting in an undetectable change of the gradient in the curve, whereas the extracted broader chromatogram detects the change. Hence, the broader chromatogram returns a much smaller time range than the narrower one. In this case, the border points of the broad chromatogram are taken for the ellipse fit. The ellipse is then fitted with the two border points in m/z direction and the two border points in time direction (blue points in Fig. 2), and only intensities within this ellipse contribute to the final peak area.

3.2 Theoretical isotopic distribution

Due to the nature of lipids, there is in general more than one candidate peak available for a distinct m/z value. In order to select the correct peak, LDA takes the theoretical isotopic distribution into account, which is used as a selection/exclusion criterion. First, the algorithm checks whether the peak belongs to another isotopic distribution (exclusion). Second, it checks whether the isotopic peaks related to the analyte match (within a certain range) their theoretically calculated values (selection). If both criteria are fulfilled the peak is accepted. In contrast with the other existing algorithms, the LDA has no deisotoping step; every isotopic peak is quantified separately and the sum is taken as the quantitative measure.

3.3 Robust standardization

In addition to the novel algorithm the software features a statistics section with standardization. Three different standardization methods are provided: (i) correction based on a single standard; (ii) a median method; (iii) standardization based on an internally developed methodology. The normalization on a single standard is trivial: each standard forms a reference value. The median method uses the median of the standards in a sample as a reference. The third method has been developed to provide a very robust method of standardization because suppression effects are quite common in MS. First, statistics for each standard are calculated (median and coefficient of variation). Second, the 'most robust' standard is selected by a two-step procedure. In the first step, standards are selected only if they are found at least 90% as often as the most often found standard. The best standard is then the one with the highest intensity which is within 5% of the standard with the best coefficient of variation. If there is none within the 5%, the one with the best coefficient of variation is taken. Third, applicable standards are selected for each experiment by a generous outlier removal. A standard is an outlier only if it is less than half of the lower quartile value or if it is higher than double the upper quartile value. Fourth, a 'global reference area' is defined, which is the 'most robust' standard area of an experiment whose value is closest to the mean value of the 'most robust' standard. Fifth, for each experiment a reference ratio between each applicable standard for the experiment and the experiment chosen in the previous step is calculated:

$$r_1 = \frac{s_{1,\mathrm{exp}_n}}{s_{1,\mathrm{exp}_{\mathrm{ref}}}}, \quad r_2 = \frac{s_{2,\mathrm{exp}_n}}{s_{2,\mathrm{exp}_{\mathrm{ref}}}}, \quad \ldots \tag{4}$$

The reference for the experiment is then the 'global reference' times the mean of the single standard ratios (a comparison of the standard median method and the new method can be found in Supplementary Section 7):

$$\mathrm{refArea}_{\mathrm{exp}_n} = \mathrm{globalRefArea} \cdot \mathrm{mean}(r_1, r_2, r_3, \ldots) \tag{5}$$

4 EXPERIMENTAL VALIDATION

In order to test the accuracy of the algorithm, we conducted a controlled experiment (see Section 2) and compared the results of LDA, mzMine2 (Katajamaa et al., 2006; Pluskal et al., 2010) and LIMSA (Haimi et al., 2006, 2009). First, samples were prepared containing all analytes at the same concentration to test the ionization efficiency. LDA and mzMine2 showed nearly the same intensities for TG54:0-2, but TG54:3 and the standard TG48:0 showed smaller intensities than expected (Supplementary Table TS1). TG54:3 and TG48:0 eluted at exactly the same retention times (25.9–26.1 min) in all of the samples (Supplementary Table TS1), and suppressed each other. Consequently, we did not use TG48:0 as a standard, since it would have skewed the results. Second, we compared analytes over a dilution series (mixes 2–5), from which we calculated a ratio relative to the areas found in mix 3 (Table 1 and for more details see Supplementary Table TS1). In this experiment, the suppression effects on TG54:3 were noticeable only in mix 2, because here the intensity of TG54:3 was highest. In 8 of 12 possible ratios, the LDA was closer to the ideal ratio than mzMine2, and in 10 out of 12 ratios closer than LIMSA. The average deviation from the ideal ratio was 9.7% for LDA, 10.5% for mzMine2 and 36.3% for LIMSA. The LDA performed just slightly better (0.8%) than mzMine2 in this experiment, since the peaks in the controlled experiment were quite well separated, and no overlap occurred. However, the intention of this experiment was just to show the accuracy of the results. Nevertheless, the areas of the LDA could be taken for further analysis without manual review, whereas for mzMine2 manual selection of the correct hit from a list of ambiguous peaks was necessary in 25 of the 150 possible cases.

Table 1. Comparison of quantitation accuracy of the controlled experiment

          mix2 : mix3                     mix4 : mix3                     mix5 : mix3
Lipid     Ideal  LDA   mzMine2  LIMSA     Ideal  LDA   mzMine2  LIMSA     Ideal  LDA   mzMine2  LIMSA
TG54:0    2      2.3   2.5      0.9       0.4    0.42  0.43     0.57      0.2    0.20  0.20     0.24
TG54:1    2      2.3   2.4      3.8       0.4    0.38  0.38     0.31      0.2    0.15  0.15     0.16
TG54:2    2      2.3   2.2      0.9       0.4    0.40  0.40     0.35      0.2    0.17  0.17     0.17
TG54:3    2      1.8   1.8      1.0       0.4    0.43  0.43     0.38      0.2    0.19  0.19     0.12

The values are calculated relative to mix3. 'Ideal' corresponds to the expected ratio.

The performance of the algorithms on biological data was assessed without LIMSA, because of its unsatisfactory results in the controlled experiment. The comparison with mzMine2 was conducted solely based on TG data, since it allowed for a more objective manual identification of a correct peak due to the regular distribution of TG peaks. For these data, LDA showed higher sensitivity (86.1% → 93.3%) and a significantly improved positive predictive value (89.4% → 99.1%) compared with mzMine2 (Table 2 and for more details see Supplementary Table TS2), which is extremely important for automated high-throughput analysis. Regarding sensitivity, LDA correctly identified 678 (93.3%) of the 727 possible peaks, whereas mzMine2 identified 626 peaks (86.1%). However, 168 of the mzMine2 identifications returned more than one candidate peak (ambiguous identifications), compared with only 21 for LDA. Nineteen of those 21 peaks were either located in the peak tail belonging to the analyte with <5% intensity of the main peak, or were peaks appearing with two summits that were considered as two possible identifications. The remaining two peaks would not be viable for analysis and a manual decision would have been necessary. Regarding positive predictive value, 678 (99.1%) of the 684 reported LDA identifications were correct, while 626 (89.4%) of the 700 mzMine2 hits were true positives, with the same number of ambiguous identifications as in the sensitivity evaluation. This high positive predictive value demonstrates the extremely high reliability of the algorithm over a large intensity range, since the intensity ratio between highest and lowest valid identifications is almost 10^[…].

Table 2. Comparison of the identification performance of the LDA and mzMine2

                                               LDA      mzMine2
Peaks unambiguously identified (A)             657      458
Peaks ambiguously identified (B)               21       168
Wrong peak identified (C)                      4        71
Additional wrong peak (no peak there) (D)      2        3
Peaks not identified (E)                       45       30
Peaks present in sample (F = A+B+C+E)          727      727
Total peaks identified (G = A+B+C+D)           684      700
Sensitivity ((A+B)/F)                          93.3%    86.1%
Sensitivity unambiguous (A/F)                  90.4%    63.0%
Positive predictive value ((A+B)/G)            99.1%    89.4%

The sensitivity and the positive predictive value are the final quality parameters.

The data from the undiluted extract were used to demonstrate the general applicability of the software to samples containing minor amounts of phospholipids, i.e. phosphatidyl choline (PC), phosphatidyl ethanolamine (PE) and sphingomyelin (SM), in the presence of bulk TG. Even though the average intensity of a PC peak is around 5% and that of an average SM peak around 1% of the intensity of an average TG peak (Supplementary Table TS3), the positive predictive values attained were 97.2% for PC, 93.4% for PE and 97.3% for SM, and the respective sensitivities were 92.5% for PC, 84.0% for PE and 85.7% for SM (Supplementary Table TS4).

Fig. 3. LDA result visualization. Three groups of C57BL male mice (n = 3 each) were analysed in this experiment (fed/high fat diet 34–36; fed/normal diet 37–39; fasted/normal diet 40–42). The values are normalized on the total TG content. Changes of single molecules are detected at a glance with the heat map (A). The bar charts (B) allow a more detailed comparison (here TG 56:1-11 is depicted; the values are normalized on the total content of TG56). The heat map cells provide direct links to the 3D chromatogram viewer (Fig. 1). In Supplementary Material, a detailed list of the lipid assignments can be found in Table TS2, sheet 'assignments'.

5 SOFTWARE DESCRIPTION

The application supports mzXML (Pedrioli et al., 2004) as raw data format. In addition, the Thermo RAW format and the Waters MassLynx raw data format are supported via the freely available software ReAdW and massWolf (Keller et al., 2005), which can be directly integrated in the program. Before quantitation, the mzXML files are translated into an internal indexed chrom file format, which provides rapid access to the data (especially useful for data visualization). The molecules to be identified are defined in a Microsoft Excel file, containing, for each analyte, the name, the molecular formula and the mass. The m/z must be entered manually, since it is possible to define several adducts or modifications for one lipid. The quantitation itself can be performed in batch mode. The calculation of the theoretical isotopic distribution is based on the list of chemical elements and the probabilities of their isotopes provided in an extendable XML file.

The software features a statistics section with normalization, grouping, visualization and export functionalities. The data can be normalized on 'internal standards', which are added before the MS data acquisition and should compensate instrument-specific variations, as well as on 'external standards', which are added before sample preparation and should compensate losses throughout sample preparation. The standardization procedures are described in Section 3.3.

The results can be displayed in heat maps and/or bar charts (Fig. 3), in which several display options are available (normalization on standards, base peak, total class content, content of a specific group, etc.), and an overview of the content of the total classes. The results can be exported into Microsoft Excel or tab-delimited format, and the heat maps and bar charts in PNG or SVG format.

In the LDA, we implemented a sophisticated 3D viewer for the manual verification of the results (Fig. 1), because the extracted profiles/chromatograms are the result of data processing procedures and provide a constrained view on the data. The 3D viewer offers adjustable m/z display ranges and resolution levels. Furthermore, hits can be deleted, added or corrected with several available quantitation methods (3D method, standard ASAPRatio method, the method described in Hartler et al. (2007), and the method described in Section 5 and SFig. 1 in the Supplementary Data).

6 DISCUSSION

The LDA proved to be a powerful and reliable tool for high-throughput analysis of the lipidome using LC-MS. In a controlled experiment, we showed that the results are more accurate than with other available tools (average deviation from the ideal ratio was 9.7% for LDA, 10.5% for mzMine2 and 36.3% for LIMSA). Moreover, on biological data the novel algorithm proved its real strength: the sensitivity increased from 86.1 to 93.3% and the positive predictive value from 89.4 to 99.1% compared with mzMine2. The high positive predictive value is particularly important for an automated analysis with minimal human intervention. Moreover, we demonstrated the applicability of LDA to low abundant lipid species in lipid droplets, like phospholipids and sphingomyelin. Although the intensities measured were close to noise levels, our algorithm still achieved good to excellent positive predictive values. Although the experimental data in this study were generated on an FT-MS instrument, the application is not limited to high-resolution data. The parameters for the 3D algorithm have already been adapted for QTOF and low-resolution QTRAP.

LDA is not limited to the analysis of lipids, but can be extended to other singly charged analytes by providing the respective molecule definition file. Support for multiply charged molecules like peptides would require the extraction of more chromatograms for the theoretical isotopic distribution as selection/exclusion criterion (see Section 3.2). For example, when the algorithm checks whether or not the peak belongs to another isotopic distribution, a chromatogram at m/z(analyte) - mass(neutron) is extracted. To check a doubly charged molecule, a chromatogram at m/z(analyte) - mass(neutron)/2 is required, and so on. The smoothing of a chromatogram is quite time consuming; thus a pre-screening method that works on the raw data is expected to be needed to analyse multiply charged molecules in a rapid manner.

ACKNOWLEDGEMENTS

The authors thank the mass spectrometry department of Gerald N. Rechberger of the University of Graz for providing the QTOF data, the group of Bernd Helms, Department for Biochemistry and Cell Biology of the University of Utrecht for providing the QTRAP data, Anita Eberl of the Core Facility for Mass Spectrometry, Center for Medical Research, Medical University of Graz for the fruitful discussions and Ravi Tharakan, Bayview Proteomics Center, Johns Hopkins University for critically reading the manuscript.

Funding: LipidomicNet, an EU Framework 7 project (grant no. 202272); the Austrian Ministry of Science and Research, GEN-AU project Bioinformatics Integration Network.

Conflict of Interest: none declared.

REFERENCES

Blouin,C.M. et al. (2010) Lipid droplet analysis in caveolin-deficient adipocytes: alterations in surface phospholipid composition and maturation defects. J. Lipid Res., 51, 945–956.
Ejsing,C.S. et al. (2006) Automated identification and quantification of glycerophospholipid molecular species by multiple precursor ion scanning. Anal. Chem., 78, 6202–6214.
Folch,J. et al. (1957) A simple method for the isolation and purification of total lipides from animal tissues. J. Biol. Chem., 226, 497–509.
Haimi,P. et al. (2006) Software tools for analysis of mass spectrometric lipidome data. Anal. Chem., 78, 8324–8331.
Haimi,P. et al. (2009) Instrument-independent software tools for the analysis of MS-MS and LC-MS lipidomics data. Methods Mol. Biol., 580, 285–294.
Hartler,J. et al. (2007) MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data. BMC Bioinformatics, 8, 197.
Katajamaa,M. et al. (2006) MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics, 22, 634–636.
Keller,A. et al. (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol., 1, 2005.
Leavell,M.D. and Leary,J.A. (2006) Fatty acid analysis tool (FAAT): An FT-ICR MS lipid analysis algorithm. Anal. Chem., 78, 5497–5503.
Li,X.J. et al. (2003) Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry. Anal. Chem., 75, 6648–6657.
Pedrioli,P.G. et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol., 22, 1459–1466.
Pluskal,T. et al. (2010) MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics, 11, 395.
Riccalton-Banks,L. et al. (2003) A simple method for the simultaneous isolation of stellate cells and hepatocytes from rat liver tissue. Mol. Cell Biochem., 248, 97–102.
Song,H. et al. (2007) Algorithm for processing raw mass spectrometric data to identify and quantitate complex lipid molecular species in mixtures by data-dependent scanning and fragment ion database searching. J. Am. Soc. Mass Spectrom., 18, 1848–1858.
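The isotopic-distribution calculation of Section 2.1 and its use as a selection criterion in Section 3.2 can be illustrated with a short sketch. It is not LDA's code: the element table below holds approximate natural isotope abundances keyed by nominal mass shift (LDA reads its own values from an XML file), the distribution is built by repeated convolution over all atoms, and the names `isotopic_distribution`, `expected_areas` and `matches_theory` as well as the 30% tolerance are hypothetical choices for illustration only.

```python
# Approximate natural isotope abundances, keyed by nominal mass shift
# relative to the lightest isotope (illustrative values, not LDA's table).
ISOTOPES = {
    "C": {0: 0.9893, 1: 0.0107},
    "H": {0: 0.999885, 1: 0.000115},
    "N": {0: 0.99636, 1: 0.00364},
    "O": {0: 0.99757, 1: 0.00038, 2: 0.00205},
}


def _convolve(p, q, max_shift):
    """Combine two shift distributions, truncating above max_shift."""
    out = [0.0] * (max_shift + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j <= max_shift:
                out[i + j] += p_i * q_j
    return out


def isotopic_distribution(formula, max_shift=4):
    """Return [p(+0), p(+1), ..., p(+max_shift)] for an element-count dict.

    Each p(+x) is the summed probability of all isotope combinations that
    produce a nominal mass shift of x, as in Equation (1).
    """
    dist = [1.0] + [0.0] * max_shift
    for element, count in formula.items():
        atom = [ISOTOPES[element].get(s, 0.0) for s in range(max_shift + 1)]
        for _ in range(count):
            dist = _convolve(dist, atom, max_shift)
    return dist


def expected_areas(area_base, dist):
    """Scale the distribution to the measured +0 area, as in Equation (2)."""
    return [area_base * p / dist[0] for p in dist]


def matches_theory(measured, expected, tolerance=0.3):
    """Selection check: every isotopic peak within a relative tolerance.

    The tolerance value is an assumption; the paper only says 'in a
    certain range'.
    """
    return all(abs(m - e) <= tolerance * e
               for m, e in zip(measured[1:], expected[1:]))
```

For example, `isotopic_distribution({"C": 57, "H": 104, "O": 6})` gives the distribution for a C57H104O6 triacylglycerol such as triolein; passing it to `expected_areas` together with a measured +0 area yields the reference areas against which the measured +1 and +2 peaks would be compared.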

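The average deviation reported for the controlled experiment can be cross-checked against Table 1 with a few lines of Python. The sketch below assumes that Equation (3) is the mean absolute relative deviation of each measured ratio from its ideal ratio; the function name `average_deviation` is hypothetical. Because Table 1 lists rounded ratios, the results come out close to, but not exactly at, the reported 9.7% (LDA), 10.5% (mzMine2) and 36.3% (LIMSA).

```python
def average_deviation(ratios, ideal_ratios):
    """Mean of |ratio / idealRatio - 1| over all ratio pairs (assumed form of Eq. 3)."""
    if not ratios or len(ratios) != len(ideal_ratios):
        raise ValueError("ratios and ideal_ratios must be non-empty and equally long")
    return sum(abs(r / ideal - 1.0)
               for r, ideal in zip(ratios, ideal_ratios)) / len(ratios)


# Table 1 values, ordered TG54:0..TG54:3 for mix2:mix3, mix4:mix3, mix5:mix3.
IDEAL = [2.0] * 4 + [0.4] * 4 + [0.2] * 4
LDA = [2.3, 2.3, 2.3, 1.8, 0.42, 0.38, 0.40, 0.43, 0.20, 0.15, 0.17, 0.19]
MZMINE2 = [2.5, 2.4, 2.2, 1.8, 0.43, 0.38, 0.40, 0.43, 0.20, 0.15, 0.17, 0.19]
LIMSA = [0.9, 3.8, 0.9, 1.0, 0.57, 0.31, 0.35, 0.38, 0.24, 0.16, 0.17, 0.12]

for name, values in (("LDA", LDA), ("mzMine2", MZMINE2), ("LIMSA", LIMSA)):
    print(f"{name}: {100 * average_deviation(values, IDEAL):.1f}%")
```

The ranking of the three tools is the same as in the paper, which supports the assumed form of the equation; the small numerical differences are consistent with the rounding in Table 1.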
Lipid Data Analyzer: unattended identification and quantitation of lipids in LC-MS data


References (29)

Publisher
Oxford University Press
Copyright
© The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
ISSN
1367-4803
eISSN
1460-2059
DOI
10.1093/bioinformatics/btq699
pmid
21169379

Abstract

Vol. 27 no. 4 2011, pages 572–577
BIOINFORMATICS ORIGINAL PAPER
doi:10.1093/bioinformatics/btq699

Data and text mining
Advance Access publication December 17, 2010

Lipid Data Analyzer: unattended identification and quantitation of lipids in LC-MS data

Jürgen Hartler (1,2), Martin Trötzmüller (3), Chandramohan Chitraju (2), Friedrich Spener (2), Harald C. Köfeler (3) and Gerhard G. Thallinger (1,4,∗)

(1) Institute for Genomics and Bioinformatics, Graz University of Technology, (2) Institute of Molecular Biosciences, University of Graz, (3) Core Facility for Mass Spectrometry, Center for Medical Research, Medical University of Graz and (4) Core Facility Bioinformatics, Austrian Center for Industrial Biotechnology (ACIB GmbH), Graz, Austria

∗To whom correspondence should be addressed.

Associate Editor: Martin Bishop

ABSTRACT

Motivation: The accurate measurement of the lipidome permits insights into physiological and pathological processes. Of the present high-throughput technologies, LC-MS especially bears the potential of monitoring quantitative changes in hundreds of lipids simultaneously. In order to extract valuable information from the huge amount of mass spectrometry data, the aid of automated, reliable, highly sensitive and specific analysis algorithms is indispensable.

Results: We present here a novel approach for the quantitation of lipids in LC-MS data. The new algorithm obtains its analytical power by two major innovations: (i) a 3D algorithm that confines the peak borders in m/z and time direction and (ii) the use of the theoretical isotopic distribution of an analyte as selection/exclusion criterion. The algorithm is integrated in the Lipid Data Analyzer (LDA) application, which additionally provides standardization, a statistics module for results analysis, a batch mode for unattended analysis of several runs and a 3D viewer for manual verification. The statistics module offers sample grouping, tests between sample groups and export functionalities, where the results are visualized by heat maps and bar charts. The presented algorithm has been applied to data from a controlled experiment and to biological data, containing analytes distributed over an intensity range of 10 . Our approach shows improved sensitivity and an extremely high positive predictive value compared with existing methods. Consequently, the novel algorithm, integrated in a user-friendly application, is a valuable improvement in the high-throughput analysis of the lipidome.

Implementation and availability: The Java application is freely available for non-commercial users at http://genome.tugraz.at/lda. Raw data associated with this manuscript may be downloaded from ProteomeCommons.org Tranche using the following hash: ZBh3nS5bXk6I/Vn32tB5Vh0qnMpVIW71HByFFQqM0RmdF4/4HcnH3Wggh9kU2teYVOtM1JWwHIeMHqSS/bc2yYNFmyUAAAAAAAClDQ==

Contact: Gerhard.Thallinger@tugraz.at

Supplementary information: Supplementary data are available from Bioinformatics online.

Received on October 1, 2010; revised on December 11, 2010; accepted on December 13, 2010

1 INTRODUCTION

Advances in lipidomics technologies utilizing mass spectrometry have led to a rapid increase in the number, size and rate at which datasets are generated. LC-MS especially bears the potential for monitoring quantitative changes in hundreds of lipids simultaneously. In order to deal with the acquired data, automated and reliable software tools are required.

Currently available lipidomics MS-quantitation tools can be classified in two groups, each extracting 2D subsets of MS data: one group (Ejsing et al., 2006; Haimi et al., 2006, 2009; Leavell and Leary, 2006; Song et al., 2007) requires m/z profiles or m/z spectra; and the other group (Katajamaa et al., 2006; Pluskal et al., 2010) extracts chromatograms for quantitation. A representative of the first group is LIMSA, which can analyse m/z spectra only (Haimi et al., 2006, 2009). It utilizes an additional program (SECD) to extract the m/z spectra. The user has to manually define a trapezoid in the m/z-time map of the LC-MS data containing the lipid series of interest (a collection of lipids having the same number of C atoms). These extracted m/z spectra serve as input for the LIMSA algorithm. In contrast, mzMine2 (Katajamaa et al., 2006; Pluskal et al., 2010), a program from the second group, provides several algorithms to automatically extract chromatograms for processing. Both programs select the peaks for an analyte based on the exact m/z value. However, an LC-MS peak is a 3D object, where each of its data points is characterized by retention time, m/z ratio and intensity. Consequently, all of the available tools lose the information from a whole dimension (the first group by the m/z profile extraction and the other group by the chromatogram extraction), and thus decrease the ability to clearly distinguish between closely overlapping peaks (Fig. 1). Furthermore, overlapping peaks occur frequently in lipidomic data due to the particular nature of lipids. The chemical structures of lipids can differ by just one double bond, resulting in a mass difference of 2 Da between the analytes. Thus, the base peak (+0 isotope) of the analyte has almost the same mass as the second isotopic peak (+2 isotope) of the analyte with one additional double bond. Furthermore, elution times are usually very close due to the physicochemical similarity of the analytes.

In this article, we present a novel algorithm that addresses the difficulties specific to lipidomics MS-data quantitation by two novel approaches: (i) exact peak border confinement and (ii) accounting for the theoretical isotopic distribution for peak selection. The novel border confinement increases the ability to detect overlaps and reduces their effects. The theoretical isotopic distribution is used as an exclusion and selection criterion and as such increases specificity. These two principles allow accurate and highly specific peak identification, which is required for the unattended high-throughput analysis of lipids. To assess the performance of the new algorithm, we (i) conducted a controlled experiment to test the accuracy of the quantitation and (ii) applied the algorithm to biological data to test its sensitivity and positive predictive value.

Fig. 1. LDA 3D-view on MS data. (A) Shows the 3D view at the top of the picture and the 2D chromatogram view at the bottom. In the 3D view, the isotopic peaks (+0, +1 and +2) of TG56:6 and TG56:7, plus the +0 peak of TG56:5, are depicted, where the peaks stained in red correspond to the quantified peaks of TG56:6. The 2D view shows the extracted chromatogram at the m/z value of TG56:6. In the chromatogram, the overlapping +2 isotopic peak of TG56:7 is not distinguishable from the main peak of TG56:6, resulting in one peak. (B) Shows the zoomed 3D view of the overlap of the +2 peak of TG56:7 with the +0 peak of TG56:6. In 3D the two peaks are clearly discernible. The red stained part of the overlapping peaks indicates the part that has been used for the quantitation of TG56:6.

2 SYSTEM AND METHODS

The Lipid Data Analyzer (LDA) software is a stand-alone, platform-independent Java application, using Java3D (https://java3d.dev.java.net) for visualization.

2.1 Theoretical isotopic distribution

The theoretical isotopic distribution is calculated for each mass shift with the aid of probability theory, so that the probability for an isotopic mass peak is the sum of the probabilities of the mutual combinations of atomic isotopes which result in the +x mass shift:

$$p(+x) = \sum p(\mathrm{mutualCombination}) \tag{1}$$

For the theoretical isotopic distribution, the reference values are then determined based on the measured base peak according to the formula (Supplementary Section 1):

$$\mathrm{Area}(+x) = \mathrm{Area}(+0) \cdot \frac{p(+x)}{p(+0)} \tag{2}$$

2.2 Controlled experiment

The controlled experiment consisted of five mixtures (prepared as triplicates) of the analytes TG54:0, TG54:1, TG54:2 and TG54:3, and TG48:0 as reference value, which was always present at the same concentration (5 µM). In the first mix, the analytes were present at the same concentration (2 µM) to test the ionization efficiencies of the analytes, while in mixes 2–5 the analytes were mixed at different concentrations to test the quantitation at different intensities. The relative concentrations of the analytes were: TG54:3 : TG54:2 = 2:1; TG54:2 : TG54:1 = 5:1; TG54:1 : TG54:0 = 10:1. Mixes 2–5 correspond to a dilution series with the following ratios: mix2 : mix3 = 2:1; mix2 : mix4 = 5:1; mix2 : mix5 = 10:1.

Data were acquired by MS (more details about sample preparation and MS can be found in Supplementary Section 2) and analysed with LDA, LIMSA and mzMine2. The standard parameters of LIMSA and mzMine2 were inappropriate for high-resolution MS data (too few analytes were identified), and thus the parameters were adapted accordingly (Supplementary Section 3). The analytes showed ammonium and ammonium-acetonitrile adducts, and the sum of both adducts was taken as the quantitative measure. The analysis with LDA was performed in an automated way relying only on the m/z value. In contrast, LIMSA required manual extraction of the profiles for the automated quantitation, and mzMine2 required information on retention time. Following that, the ratios of the dilution series were calculated and the average deviation from the expected ratio was taken as quality parameter:

$$\mathrm{AverageDeviation} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{\mathrm{ratio}(i)}{\mathrm{idealRatio}(i)} - 1 \right| \tag{3}$$

2.3 Biological experiment

For the biological data, three groups of C57BL male mice (n = 3 each) aged 12 weeks were maintained on a regular light-dark cycle (14 h light, 10 h dark) and fed on either standard laboratory chow diet (4.5% w/w crude fat) or high fat diet (34% w/w crude fat). For fasted samples, mice fed on chow diet were fasted overnight for 16 h. The aim was to characterize the TG species of lipid droplets (LD) isolated from primary hepatocytes of mice subjected to the different feeding regimes. Primary hepatocytes were isolated by collagenase digestion according to Riccalton-Banks et al. (2003) with some modifications (Supplementary Section 4). Primary hepatocytes were lysed by nitrogen cavitation and LDs were isolated by ultracentrifugation. Lipids were extracted according to Folch et al. (1957). The sample was split into two parts, since LDs contain mainly TG (Blouin et al., 2010) and the chromatographic separation and peak shape would be very poor for TGs without dilution, whereas other lipids would be hardly detectable in a diluted sample. One part of the extract was diluted 1:66 and served for the analysis of TGs, and the other fraction remained undiluted for the analysis of phospholipids and sphingomyelin. The extracts were dissolved in CHCl3/MeOH (1:1, v/v) and acquired by LC-MS as described in Supplementary Section 2. First, the peaks were manually verified (visualization with the implemented 3D viewer).
A peak was considered valid if it occurred at the expected retention time (each additional double bond in a molecular species lowers the retention time by about 0.5–1.5 min). Subsequently, TG data were automatically analysed by LDA and mzMine2 and the results were compared. While LDA performs the automated quantitation based on the m/z value of the analyte only, mzMine2 additionally requires information about the retention time (cf. controlled experiment). The data from undiluted samples were analysed with LDA only, to show the applicability to molecule classes other than TG.

3 ALGORITHM IMPLEMENTATION

3.1 3D algorithm

The novel algorithm for accurate peak confinement works on the basis of several chromatogram and profile extractions, in which the chromatogram/profile extraction is based on an adapted version (Hartler et al., 2007) of the ASAPRatio algorithm (Li et al., 2003) with a changed peak border detection method. This new method detects peak borders based on abrupt changes in the gradient of a peak curve (see Section 5 and SFig. 1 in the Supplementary Material).

Fig. 2. Illustration of the novel algorithm from a top-view on the m/z map. The bigger ellipse represents the peak to be quantified; the smaller one is the overlapping one. First, a standard chromatogram 'A' with a broad m/z range 'a' is extracted (no overlap distinguishable). Second, at the time point with the highest intensity an m/z-profile 'B' with a narrow range 'b' is extracted. Third, the m/z borders and the m/z value with the highest intensity are used to extract chromatogram 'C' with range 'c' (uses borders found in 'B') and 'D' with narrow range 'd'. 'C' and 'D' are used to identify borders in the time range. Now four border points of the peak are determined (dots on the borders of the bigger ellipse) and an ellipse is fitted through them. The ellipse is used as the peak border. Only intensities inside the ellipse contribute to the total peak intensity.
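The ellipse step described in the caption of Fig. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LDA implementation (the actual fit is described in Supplementary Section 6, which is not reproduced here): the ellipse is assumed to be axis-aligned in the (time, m/z) plane, and the names `Ellipse`, `fit_ellipse` and `peak_area` are hypothetical. The four border points are taken as the two m/z borders at the summit time and the two time borders at the summit m/z, and the peak area is approximated by summing the raw intensities inside the ellipse.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Ellipse:
    """Axis-aligned ellipse: u*(t - ct)^2 + v*(mz - cmz)^2 <= 1 (assumed shape)."""
    ct: float   # centre in time direction
    cmz: float  # centre in m/z direction
    u: float    # 1 / (semi-axis in time)^2
    v: float    # 1 / (semi-axis in m/z)^2

    def contains(self, t: float, mz: float) -> bool:
        return self.u * (t - self.ct) ** 2 + self.v * (mz - self.cmz) ** 2 <= 1.0


def fit_ellipse(t_lo: float, t_hi: float, mz_lo: float, mz_hi: float,
                t_summit: float, mz_summit: float) -> Ellipse:
    """Ellipse through (t_summit, mz_lo), (t_summit, mz_hi),
    (t_lo, mz_summit) and (t_hi, mz_summit)."""
    if not (t_lo < t_summit < t_hi and mz_lo < mz_summit < mz_hi):
        raise ValueError("summit must lie strictly between the border points")
    ct, cmz = (t_lo + t_hi) / 2.0, (mz_lo + mz_hi) / 2.0
    ht, hm = (t_hi - t_lo) / 2.0, (mz_hi - mz_lo) / 2.0
    dt, dm = t_summit - ct, mz_summit - cmz
    # Solve  dt^2*u + hm^2*v = 1  and  ht^2*u + dm^2*v = 1  for u and v.
    det = dt * dt * dm * dm - hm * hm * ht * ht
    u = (dm * dm - hm * hm) / det
    v = (dt * dt - ht * ht) / det
    return Ellipse(ct, cmz, u, v)


def peak_area(points: Iterable[Tuple[float, float, float]], ellipse: Ellipse) -> float:
    """Sum the raw intensities of all (time, m/z, intensity) points inside the ellipse."""
    return sum(i for t, mz, i in points if ellipse.contains(t, mz))


# Toy data: two points inside the border, two outside.
border = fit_ellipse(10.0, 12.0, 900.0, 900.2, 11.0, 900.1)
raw = [(11.0, 900.1, 5.0), (11.5, 900.15, 3.0), (11.9, 900.19, 7.0), (13.0, 900.1, 2.0)]
area = peak_area(raw, border)
```

Because the sum runs over raw data points, no smoothing is involved at this stage, which mirrors the motivation given in the text for working on raw intensities.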
The chromatogram and profiles are used to determine four peak border points. An ellipse through these border points is fitted, forming the border for the peak (Supplementary Section 6). Raw intensities inside the ellipse are used to calculate the area of the peak. This approach minimizes the effect of potential overlaps and does not contain smoothing artefacts, since it works on raw intensities.

Figure 1 shows two overlapping peaks which cannot be separated by the conventional 2D methods. In Figure 2, these two peaks are depicted from a top-view on the m/z map and the novel 3D algorithm is illustrated schematically. First, the algorithm extracts a chromatogram with a broad m/z range and identifies the time point at the peak summit. Second, at the time point of the summit, an m/z profile with a narrow time range is extracted. The algorithm determines the peak summit again and detects the two border points of the profile peak, which already form points for the ellipse fit in m/z direction. Then, two chromatograms are extracted; the first one uses the two profile border points as m/z range for the extraction; the second one uses a very narrow range around the m/z value of the profile peak summit. Now, in both chromatograms peak border points are detected, which can be used for the ellipse fit in the time direction.

In general, the border points of the chromatogram with the smaller m/z range are used for the ellipse. However, the chromatogram with the narrow m/z range could possibly cut just a small part of the overlapping peak, resulting in an undetectable change of the gradient in the curve, whereas the extracted broader chromatogram detects the change. Hence, the broader chromatogram returns a much smaller time range than the narrower one. In this case, the border points of the broad chromatogram are taken for the ellipse fit. The ellipse is then fitted with the two border points in m/z direction and the two border points in time direction (blue points in Fig. 2), and only intensities within this ellipse contribute to the final peak area.

3.2 Theoretical isotopic distribution

Due to the nature of lipids, there is in general more than one candidate peak available for a distinct m/z value. In order to select the correct peak, LDA takes the theoretical isotopic distribution into account, which is used as a selection/exclusion criterion. First, the algorithm checks whether the peak belongs to another isotopic distribution (exclusion). Second, it checks whether the isotopic peaks related to the analyte match (within a certain range) their theoretically calculated values (selection). If both criteria are fulfilled, the peak is accepted. In contrast with the other existing algorithms, the LDA has no deisotoping step; every isotopic peak is quantified separately and the sum is taken as the quantitative measure.

3.3 Robust standardization

In addition to the novel algorithm, the software features a statistics section with standardization. Three different standardization methods are provided: (i) correction based on a single standard; (ii) a median method; (iii) standardization based on an internally developed methodology. The normalization on a single standard is trivial: each standard forms a reference value. The median method uses the median of the standards in a sample as a reference. The third method has been developed to provide a very robust method of standardization, because suppression effects are quite common in MS. First, statistics for each standard are calculated (median and coefficient of variation). Second, the 'most robust' standard is selected by a two-step procedure. In the first step, standards are selected only if they are found at least 90% as often as the most often found standard. The best standard is then the one with the highest intensity which is within 5% of the standard with the best coefficient of variation. If there is none within the 5%, the one with the best coefficient of variation is taken. Third, applicable standards are selected for each experiment by a generous outlier removal. A standard is an outlier only if it is less than half of the lower quartile value or if it is higher than double the upper quartile value. Fourth, a 'global reference area' is defined, which is the 'most robust' standard area of an experiment whose value is closest to the mean value of the 'most robust' standard. Fifth, for each experiment a reference ratio between each applicable standard for the experiment and the experiment chosen in the previous step is calculated:

$$r_1 = \frac{s_{1,\mathrm{exp}_n}}{s_{1,\mathrm{exp}_{\mathrm{ref}}}}, \quad r_2 = \frac{s_{2,\mathrm{exp}_n}}{s_{2,\mathrm{exp}_{\mathrm{ref}}}}, \quad \ldots \tag{4}$$

The reference for the experiment is then the 'global reference' times the mean of the single standard ratios (a comparison of the standard median method and the new method can be found in Supplementary Section 7):

$$\mathrm{refArea}_{\mathrm{exp}_n} = \mathrm{globalRefArea} \cdot \mathrm{mean}(r_1, r_2, r_3, \ldots) \tag{5}$$

4 EXPERIMENTAL VALIDATION

In order to test the accuracy of the algorithm, we conducted a controlled experiment (see Section 2) and compared the results of LDA, mzMine2 (Katajamaa et al., 2006; Pluskal et al., 2010) and LIMSA (Haimi et al., 2006, 2009). First, samples were prepared containing all analytes at the same concentration to test the ionization efficiency. LDA and mzMine2 showed nearly the same intensities for TG54:0-2, but TG54:3 and the standard TG48:0 showed smaller intensities than expected (Supplementary Table TS1). TG54:3 and TG48:0 eluted at exactly the same retention times (25.9–26.1 min) in all of the samples (Supplementary Table TS1), and suppressed each other. Consequently, we did not use TG48:0 as a standard, since it would have skewed the results. Second, we compared analytes over a dilution series (mix 2–5), from which we calculated a ratio relative to the areas found in mix 3 (Table 1 and for more details see Supplementary Table TS1). In this experiment, the suppression effects on TG54:3 could only be noticed in mix2, because here the intensity of TG54:3 was highest. In 8 of 12 possible ratios, the LDA was closer to the ideal ratio than mzMine2, and in 10 out of 12 ratios closer than LIMSA. The average deviation from the ideal ratio was 9.7% for LDA, 10.5% for mzMine2 and 36.3% for LIMSA. The LDA performed just slightly better (0.8%) than mzMine2 in this experiment, since the peaks in the controlled experiment were quite well separated, and no overlap occurred. However, the intention of this experiment was just to show the accuracy of the results. Nevertheless, the areas of the LDA could be taken for further analysis without manual review, whereas for mzMine2 manual selection of the correct hit from a list of ambiguous peaks was necessary in 25 of the 150 possible cases.

Table 1. Comparison of quantitation accuracy of the controlled experiment

          mix2 : mix3                     mix4 : mix3                     mix5 : mix3
Lipid     Ideal  LDA   mzMine2  LIMSA     Ideal  LDA   mzMine2  LIMSA     Ideal  LDA   mzMine2  LIMSA
TG54:0    2      2.3   2.5      0.9       0.4    0.42  0.43     0.57      0.2    0.20  0.20     0.24
TG54:1    2      2.3   2.4      3.8       0.4    0.38  0.38     0.31      0.2    0.15  0.15     0.16
TG54:2    2      2.3   2.2      0.9       0.4    0.40  0.40     0.35      0.2    0.17  0.17     0.17
TG54:3    2      1.8   1.8      1.0       0.4    0.43  0.43     0.38      0.2    0.19  0.19     0.12

The values are calculated relative to mix3. 'Ideal' corresponds to the expected ratio.

The performance of the algorithms on biological data was assessed without LIMSA, because of the unsatisfactory results of the controlled experiment. The comparison with mzMine2 was conducted solely based on TG data, since it allowed for a more objective manual identification of a correct peak due to the regular distribution of TG peaks. For this data, LDA showed higher sensitivity (86.1% → 93.3%) and a significantly improved positive predictive value (89.4% → 99.1%) compared with mzMine2 (Table 2 and for more details see Supplementary Table TS2), which is extremely important for automated high-throughput analysis. Regarding sensitivity, LDA correctly identified 678 (93.3%) of the 727 possible peaks, whereas mzMine2 identified 626 peaks (86.1%). However, 168 of the mzMine2 identifications returned more than one candidate peak (ambiguous identifications), compared with only 21 for LDA. For 19 of those 21 peaks, the second candidate was either located in the peak tail belonging to the analyte, with <5% intensity of the main peak, or resulted from a peak appearing with two summits being considered as two possible identifications. The remaining two peaks would not be viable for analysis and a manual decision would have been necessary. Regarding positive predictive value, 678 (99.1%) of the 684 reported LDA identifications were correct, while 626 (89.4%) of the 700 mzMine2 hits were true positives, with the same amount of ambiguous identifications as in the sensitivity evaluation. This high positive predictive value demonstrates the extremely high reliability of the algorithm over a large intensity range, since the intensity ratio between highest and lowest valid identifications is almost 10 .

Table 2. Comparison of the identification performance of the LDA and mzMine2

                                               LDA      mzMine2
Peaks unambiguously identified (A)             657      458
Peaks ambiguously identified (B)               21       168
Wrong peak identified (C)                      4        71
Additional wrong peak (no peak there) (D)      2        3
Peaks not identified (E)                       45       30
Peaks present in sample (F = A+B+C+E)          727      727
Total peaks identified (G = A+B+C+D)           684      700
Sensitivity ((A+B)/F)                          93.3%    86.1%
Sensitivity unambiguous (A/F)                  90.4%    63.0%
Positive predictive value ((A+B)/G)            99.1%    89.4%

The sensitivity and the positive predictive value are the final quality parameters.

The data from the undiluted extract were used to demonstrate the general applicability of the software to samples containing minor amounts of phospholipids, i.e. phosphatidyl choline (PC), phosphatidyl ethanolamine (PE) and sphingomyelin (SM), in the presence of bulk TG. Even though the average intensity of a PC peak is around 5% and that of an average SM peak around 1% of the intensity of an average TG peak (Supplementary Table TS3), the positive predictive values attained were 97.2% for PC, 93.4% for PE and 97.3% for SM, and the respective sensitivities were 92.5% for PC, 84.0% for PE and 85.7% for SM (Supplementary Table TS4).

Fig. 3. LDA result visualization. Three groups of C57BL male mice (n = 3 each) were analysed in this experiment (fed/high fat diet 34–36; fed/normal diet 37–39; fasted/normal diet 40–42). The values are normalized on the total TG content. Changes of single molecules are detected at a glance with the heat map (A). The bar charts (B) allow a more detailed comparison (here TG 56:1-11 is depicted; the values are normalized on the total content of TG56). The heat map cells provide direct links to the 3D chromatogram viewer (Fig. 1). In Supplementary Material, a detailed list of the lipid assignments can be found in Table TS2, sheet 'assignments'.

5 SOFTWARE DESCRIPTION

The application supports mzXML (Pedrioli et al., 2004) as raw data format. In addition, the Thermo RAW format and the Waters MassLynx raw data format are supported via the freely available software ReAdW and massWolf (Keller et al., 2005), which can be directly integrated in the program. Before quantitation, the mzXML files are translated into an internal indexed chrom file format, which provides rapid access to the data (especially useful for data visualization). The molecules to be identified are defined in a Microsoft Excel file, containing, for each analyte, the name, the molecular formula and the mass. The m/z must be entered manually, since it is possible to define several adducts or modifications for one lipid. The quantitation itself can be performed in batch mode. The calculation of the theoretical isotopic distribution is based on the list of chemical elements and the probabilities of their isotopes provided in an extendable XML file.

The software features a statistics section with normalization, grouping, visualization and export functionalities. The data can be normalized on 'internal standards', which are added before the MS data acquisition and should compensate instrument-specific variations, as well as on 'external standards', which are added before sample preparation and should compensate losses throughout sample preparation. The standardization procedures are described in Section 3.3.

The results can be displayed in heat maps and/or bar charts (Fig. 3), in which several display options are available (normalization on standards, base peak, total class content, content of a specific group, etc.), and an overview of the content of the total classes. The results can be exported into Microsoft Excel or tab-delimited format, and the heat maps and bar charts in PNG or SVG format.

In the LDA, we implemented a sophisticated 3D viewer for the manual verification of the results (Fig. 1), because the extracted profiles/chromatograms are the result of data processing procedures and provide a constrained view on the data. The 3D viewer offers adjustable m/z display ranges and resolution levels. Furthermore, hits can be deleted, added or corrected with several available quantitation methods (3D method, standard ASAPRatio method, the method described in Hartler et al. (2007), and the method described in Section 5 and SFig. 1 in the Supplementary Data).

6 DISCUSSION

The LDA proved to be a powerful and reliable tool for high-throughput analysis of the lipidome using LC-MS. In a controlled experiment, we showed that the results are more accurate than with other available tools (average deviation from the ideal ratio was 9.7% for LDA, 10.5% for mzMine2 and 36.3% for LIMSA). Moreover, on biological data the novel algorithm proved its real strength: the sensitivity increased from 86.1 to 93.3% and the positive predictive value from 89.4 to 99.1% compared with mzMine2. The high positive predictive value is particularly important for an automated analysis with minimal human intervention. Moreover, we demonstrated the applicability of LDA to low-abundance lipid species in lipid droplets, like phospholipids and sphingomyelin. Although the intensities measured were close to noise levels, our algorithm still achieved good to excellent positive predictive values. Although the experimental data in this study were generated on an FT-MS instrument, the application is not limited to high-resolution data. The parameters for the 3D algorithm have already been adapted for QTOF and low-resolution QTRAP.

LDA is not limited to the analysis of lipids, but can be extended to other singly charged analytes by providing the respective molecule definition file. Support for multiply charged molecules like peptides would require the extraction of more chromatograms for the theoretical isotopic distribution as selection/exclusion criterion (see Section 3.2). For example, when the algorithm checks whether or not the peak belongs to another isotopic distribution, a chromatogram at m/z(analyte) − mass(neutron) is extracted. To check a doubly charged molecule, a chromatogram at m/z(analyte) − mass(neutron)/2 is required, and so on. The smoothing of a chromatogram is quite time consuming, thus a pre-screening method that works on the raw data is expected to analyse multiply charged molecules in a rapid manner.

ACKNOWLEDGEMENTS

The authors thank the mass spectrometry department of Gerald N. Rechberger of the University of Graz for providing the QTOF data, the group of Bernd Helms, Department for Biochemistry and Cell Biology of the University of Utrecht for providing the QTRAP data, Anita Eberl of the Core Facility for Mass Spectrometry, Center for Medical Research, Medical University of Graz for the fruitful discussions and Ravi Tharakan, Bayview Proteomics Center, Johns Hopkins University for critically reading the manuscript.

Funding: LipidomicNet, an EU Framework 7 project (grant no. 202272); the Austrian Ministry of Science and Research, GEN-AU project Bioinformatics Integration Network.

Conflict of Interest: none declared.

REFERENCES

Blouin,C.M. et al. (2010) Lipid droplet analysis in caveolin-deficient adipocytes: alterations in surface phospholipid composition and maturation defects. J. Lipid Res., 51, 945–956.
Ejsing,C.S. et al. (2006) Automated identification and quantification of glycerophospholipid molecular species by multiple precursor ion scanning. Anal. Chem., 78, 6202–6214.
Folch,J. et al. (1957) A simple method for the isolation and purification of total lipides from animal tissues. J. Biol. Chem., 226, 497–509.
Haimi,P. et al. (2006) Software tools for analysis of mass spectrometric lipidome data. Anal. Chem., 78, 8324–8331.
Haimi,P. et al. (2009) Instrument-independent software tools for the analysis of MS-MS and LC-MS lipidomics data. Methods Mol. Biol., 580, 285–294.
Hartler,J. et al. (2007) MASPECTRAS: a platform for management and analysis of proteomics LC-MS/MS data. BMC Bioinformatics, 8, 197.
Katajamaa,M. et al. (2006) MZmine: toolbox for processing and visualization of mass spectrometry based molecular profile data. Bioinformatics, 22, 634–636.
Keller,A. et al. (2005) A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol., 1, 2005.
Leavell,M.D. and Leary,J.A. (2006) Fatty acid analysis tool (FAAT): An FT-ICR MS lipid analysis algorithm. Anal. Chem., 78, 5497–5503.
Li,X.J. et al. (2003) Automated statistical analysis of protein abundance ratios from data generated by stable-isotope dilution and tandem mass spectrometry. Anal. Chem., 75, 6648–6657.
Pedrioli,P.G. et al. (2004) A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol., 22, 1459–1466.
Pluskal,T. et al. (2010) MZmine 2: modular framework for processing, visualizing, and analyzing mass spectrometry-based molecular profile data. BMC Bioinformatics, 11, 395.
Riccalton-Banks,L. et al. (2003) A simple method for the simultaneous isolation of stellate cells and hepatocytes from rat liver tissue. Mol. Cell Biochem., 248, 97–102.
Song,H. et al. (2007) Algorithm for processing raw mass spectrometric data to identify and quantitate complex lipid molecular species in mixtures by data-dependent scanning and fragment ion database searching. J. Am. Soc. Mass Spectrom., 18, 1848–1858.
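As an illustration of the calculation in Section 2.1, the following Python sketch sums the probabilities of all isotope combinations that lead to a given +x mass shift and then derives the expected area of an isotopic peak from the measured base peak. It is a minimal sketch under assumptions, not the LDA code: the abundance values are approximate natural abundances typed in by hand (the LDA reads its own list from an XML file), only nominal integer mass shifts are tracked, and the function names are hypothetical. The formula C57H104O6 (triolein, i.e. TG54:3 without adduct) is used purely as an example input.

```python
from typing import Dict, List

# Approximate natural isotope abundances as (nominal mass shift, probability).
# Assumption for illustration only; not taken from the LDA element file.
ISOTOPES = {
    "C": [(0, 0.9893), (1, 0.0107)],
    "H": [(0, 0.999885), (1, 0.000115)],
    "N": [(0, 0.99636), (1, 0.00364)],
    "O": [(0, 0.99757), (1, 0.00038), (2, 0.00205)],
}


def convolve(p: List[float], q: List[float], max_shift: int) -> List[float]:
    """Combine two shift distributions, discarding shifts above max_shift."""
    out = [0.0] * (max_shift + 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            if i + j <= max_shift:
                out[i + j] += pi * qj
    return out


def isotopic_distribution(formula: Dict[str, int], max_shift: int = 4) -> List[float]:
    """Return [p(+0), p(+1), ..., p(+max_shift)] for an elemental formula."""
    dist = [1.0] + [0.0] * max_shift
    for element, count in formula.items():
        single = [0.0] * (max_shift + 1)
        for shift, prob in ISOTOPES[element]:
            if shift <= max_shift:
                single[shift] = prob
        for _ in range(count):  # add one atom at a time
            dist = convolve(dist, single, max_shift)
    return dist


def expected_area(area_base: float, dist: List[float], x: int) -> float:
    """Expected area of the +x peak given the measured +0 area."""
    return area_base * dist[x] / dist[0]


dist = isotopic_distribution({"C": 57, "H": 104, "O": 6})
plus_one = expected_area(1000.0, dist, 1)
```

A selection check in the spirit of Section 3.2 would then compare the measured +1 and +2 areas with these expected values within some tolerance; the size of that tolerance is not stated in the text and would have to be chosen by the user.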
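The quality parameter of the controlled experiment (Section 2.2) can be recomputed from the rounded values in Table 1. The sketch below assumes that the average deviation is the mean absolute relative deviation of each measured ratio from its ideal ratio; the function name is hypothetical. With the rounded LDA values from Table 1 this gives about 9.8%, close to the 9.7% reported in the text, which was presumably computed from unrounded areas.

```python
from statistics import mean
from typing import Sequence


def average_deviation(ratios: Sequence[float], ideal_ratios: Sequence[float]) -> float:
    """Mean absolute relative deviation of measured ratios from ideal ratios."""
    if len(ratios) != len(ideal_ratios) or not ratios:
        raise ValueError("need two non-empty sequences of equal length")
    return mean(abs(r / ideal - 1.0) for r, ideal in zip(ratios, ideal_ratios))


# Table 1, columns mix2:mix3, mix4:mix3 and mix5:mix3 for TG54:0 to TG54:3.
ideal = [2.0] * 4 + [0.4] * 4 + [0.2] * 4
lda = [2.3, 2.3, 2.3, 1.8, 0.42, 0.38, 0.40, 0.43, 0.20, 0.15, 0.17, 0.19]
limsa = [0.9, 3.8, 0.9, 1.0, 0.57, 0.31, 0.35, 0.38, 0.24, 0.16, 0.17, 0.12]

lda_deviation = average_deviation(lda, ideal)
limsa_deviation = average_deviation(limsa, ideal)
```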
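Two steps of the robust standardization in Section 3.3 lend themselves to a short sketch: the generous outlier removal and the reference area of Equations (4) and (5). The code below is an illustration with hypothetical function names, not the LDA implementation. In particular, the quartile definition is an assumption (Python's `statistics.quantiles` with the inclusive method), since the text does not say how the quartiles are computed, and the selection of the 'most robust' standard is not shown.

```python
from statistics import mean, quantiles
from typing import Dict, List


def remove_outliers(areas: List[float]) -> List[float]:
    """Keep values that are at least half the lower quartile
    and at most double the upper quartile."""
    q1, _, q3 = quantiles(areas, n=4, method="inclusive")
    return [a for a in areas if q1 / 2.0 <= a <= 2.0 * q3]


def reference_area(global_ref_area: float,
                   standards_exp: Dict[str, float],
                   standards_ref: Dict[str, float]) -> float:
    """refArea = globalRefArea * mean(r_1, r_2, ...), with r_i the ratio of
    standard i in this experiment to the same standard in the reference experiment."""
    ratios = [standards_exp[name] / standards_ref[name]
              for name in standards_exp if name in standards_ref]
    if not ratios:
        raise ValueError("no standard is present in both experiments")
    return global_ref_area * mean(ratios)


# Toy numbers: one standard is strongly enhanced and is dropped as an outlier.
kept = remove_outliers([10.0, 11.0, 12.0, 13.0, 100.0])
ref = reference_area(1000.0,
                     {"IS1": 200.0, "IS2": 50.0},   # areas in experiment n
                     {"IS1": 100.0, "IS2": 50.0})   # areas in reference experiment
```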
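The quality parameters in Table 2 follow directly from the five peak counts A to E. The sketch below recomputes them with the definitions given in the table; the class and property names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeakCounts:
    unambiguous: int       # A: peaks unambiguously identified
    ambiguous: int         # B: peaks ambiguously identified
    wrong: int             # C: wrong peak identified
    additional_wrong: int  # D: additional wrong peak (no peak there)
    missed: int            # E: peaks not identified

    @property
    def present(self) -> int:      # F = A + B + C + E
        return self.unambiguous + self.ambiguous + self.wrong + self.missed

    @property
    def identified(self) -> int:   # G = A + B + C + D
        return self.unambiguous + self.ambiguous + self.wrong + self.additional_wrong

    @property
    def sensitivity(self) -> float:                 # (A + B) / F
        return (self.unambiguous + self.ambiguous) / self.present

    @property
    def sensitivity_unambiguous(self) -> float:     # A / F
        return self.unambiguous / self.present

    @property
    def positive_predictive_value(self) -> float:   # (A + B) / G
        return (self.unambiguous + self.ambiguous) / self.identified


lda = PeakCounts(657, 21, 4, 2, 45)
mzmine2 = PeakCounts(458, 168, 71, 3, 30)
```

For example, `round(100 * mzmine2.sensitivity, 1)` evaluates to 86.1, matching the table.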

Journal

Bioinformatics, Oxford University Press

Published: Dec 17, 2010
