Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in space habitats

Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in... www.nature.com/npjmgrav ARTICLE OPEN Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in space habitats 1✉ 1✉ 1 1 1 Matthew Overlin , Steven Iannucci , Bradly Wilkins , Alexander McBain and Jason Provancher With growing interest for human space tourism in the twenty-first century, much attention has been directed to the robust engineering of Environmental Control and Life Support Systems in space habitats. The stable, reliable operation of such a habitat is partly achieved with an ability to recognize and predict faults. For these two purposes, a reinforcement learning adversarial agent (ReLAA) is utilized in this work. A ReLAA is trained with experimental data to actively recognize and predict faults. These capabilities are achieved by proposing actions that activate known faults in a system. Instead of issuing these harmful actions to the actual hardware, a digital twin of the mock space habitat is simulated to discover vulnerabilities that would lead to faulted operation. The methods developed in this work will allow for the discovery of damaging latent behavior, and the reduction of false positive and negative fault identification. npj Microgravity (2023) 9:15 ; https://doi.org/10.1038/s41526-023-00252-9 13,14 INTRODUCTION approaches . In short, models with physical basis are under- stood to be more explainable, generalizable, and interpretable; all Space tourism is a budding industry with increased interest from 1,2 are qualities necessary in models of life-sustainment systems. the general public . Companies such as SpaceX, Blue Origin, Other digital twin systems have been successful in integrating Virgin Galactic, and Boeing are either planning sub-orbital leisure physics-based models in lieu of a surplus of data that is necessary flights or have already completed such trips. Union Bank of 7,15 for supervised machine-learned systems . Switzerland estimates that the space tourism sector of the space Each ReLAA developed and implemented in this work is an economy will be worth US$4bn by 2030 . Present-day trips, artificial neural network (ANN). Such networks are often trained however, may only be short-duration visits that occur over a with variants of gradient descent (ADAM, SGD, etc.), a first-order period of hours or days. For such trips to be possible, science and optimization algorithm used to find local minima in objective engineering research has sought to understand the potential for functions. Instead of these traditional optimization algorithms, medical risk during these crewed missions . To increase the safety some have found advantages with training ANNs through and reliability of these missions, accurate system health monitor- neuroevolution, an evolutionary process that allows an ANN’s ing (SHM) must be deployed. The methods in this work will parameters to change with new training data. Unlike gradient- identify faulted operation in order to enable safer leisure travel based approaches, activation functions, hyperparameters, archi- with reduced medical risk. tectures, and algorithms can be learned in addition to the ANN Separate from the design, engineering, and construction of 16,17 parameters . As explained later in this article, the training vessels launched into low Earth orbit, this work primarily seeks to process for the ReLAA is completed through neuroevolution. monitor the operation of these vessels. Many conventional SHM Artificial intelligence is usually implemented in the form of an fault detection methods compare measurement data with 5–9 ANN which may be described as a universal approximator. They established healthy operational bounds . Such methods may are trained through a learning process where the parameters of be described as passive, since a fault is declared based on static the ANN are optimized to approximate a unique policy. Through pre-defined rules. Passive HM does not capture and understand reinforcement learning, a perturb-and-observe approach is used the short- or long-term dynamics of the system, thus leaving it during the training process . Actions selected by an agent are vulnerable to unexpected or sudden faults. For example, a passive issued to a digital twin of an experimental apparatus (or an actual system will not understand the relationship between two features experimental apparatus), and the results are quantified as that could combine into a coupled or cascading fault. Other research has developed fault detection solutions that can desirable or not by calculating a reward. If the actions lead to a be updated during operation, but these methods may be purely large reward, then the ANN yielding the high reward is used to 10–12 19 data-driven models . Because the agent in this work employs a generate offspring agents . In this work, a ReLAA is not used for controlling elements in the mock space habitat, but instead for physics-based digital twin, potential failures may be captured in a fault recognition and prediction. digital twin’s simulation results. Such information is useful during Unfortunately, genetic algorithms like NeuroEvolution tend to the agent’s training and deployment phases. Purely data-driven 19,20 converge upon a single solution . In a system as complex as models may not have failure data available during training. Such failure data may not be easily attainable from an experimental the one designed and implemented in this work, a single solution setup with expensive hardware assets . is not capable of understanding the varied dynamics while still Prior research has found that purely physics-based models or being computationally feasible . Research into expanding the hybrid physics-based and data-driven models hold certain key diversity of the solution space is ongoing, with promising 17,20,22 advantages not found with purely data-driven modeling results . For the ReLAAs, two common techniques are Department of Autonomy and Data Science, PacMar Technologies, Honolulu, HI, USA. email: moverlin@gmail.com; siannucci@pacmartech.com Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA 1234567890():,; M. Overlin et al. implemented, the clustering and fitness sharing, both of which are placed in states where power consumption is approximately equal designed to promote diversity and outliers during training. to the system’s capacity. This work introduces a framework to develop and test an active fault detection strategy on a physical demonstration system. First, in Thermal control system (TCS) Methods section, the design and construction of the physical mock The TCS was designed, built, instrumented, and operated as one space habitat for life sustainment is outlined as well as the sensors of the sub-systems in the mock space habitat. The TCS in this work and tools required to measure and communicate the physical is different from more practical systems which may be integrated system state to the software implementation. Also, in Methods with ECLSSs in modern spacecraft . Certain assumptions were section, a brief description is given for the following project tasks: made and should be noted. There is a single closed loop of middleware implementation, digital twin development, fault circulating water, which would not be practically implemented on elicitation and reinforcement learning. In Results section, results spacecraft. Typically, there would be internal and external loops, are shared from the experimental operation and fault emulation. and other fluids would be circulated through these loops such as a Results from the deployment of multiple ReLAAs are also presented. propylene glycol mixture or ammonia which have lower freezing Finally, a discussion and conclusion are included. temperatures. Because this TCS is to be used as part of a mock space habitat for the ReLAA, simplifications in the TCS design were accepted. With the current design and implementation, a variety METHODS of faults could still be realized in the experimental setup. Physical demonstration system In the TCS, the temperature measurements are insightful since Each ReLAA developed in this work was tested and validated with the goal of the TCS is to regulate the habitat’s room temperature. live sensor stream data collected from the mock space habitat. The habitat temperature is ultimately regulated with the proper The experimental setup is described as a system of systems: operation of the TCS, and this is achieved with additional thermal control system (TCS), grey water filtering system (GWS), observability from other sensors. Pressure and flow measurements and an electrical integration model. obtained at various locations in the TCS allow for sufficient The goal of the physical demonstration system is to have a visibility. An annotated picture of the TCS is shown in Fig. 1. physical test system capable of eliciting measurable, realistic faults that are representative of an actual space habitat. Grey water filtering system (GWS) Considerations in the design must be made for not only the The GWS was designed based on a water recycling system actuation of faults within the system(s) but also the measurement designed and operated in prior NASA work . The components in and detection of the faults. The measurement and detection the GWS were sized to deliver roughly 21 L/h of potable water. A capability, primarily achieved with a variety of sensors, allows for simplified schematic of the GWS is shown in Fig. 2. When filtering the physical system to be integrated with a physics-based model grey water, the forward osmosis (FO) module is first used. Water to create a unified digital twin. The large volumes of data, flows through two different paths within the FO module, an inner collected via instrumentation hardware in the mock space path (consisting of feed/dirty water) and an outer path (consisting habitat, will be key in training the ReLAAs and also validating of draw/salt water) in the opposite direction. These two cavities the digital twin used by the ReLAAs. containing water flow are separated by a semi-permeable An empty, isolated room was re-purposed to serve as the mock 3 membrane. Due to the osmotic pressure differential between space habitat in this work with an assumed volume of 28 m and 3 the feed water and draw water, water passes through this 9.3 m of habitable space per occupant. Some prior work has membrane from the feed to draw side and ultimately into the investigated many of the factors that would lead to a certain draw solution (DS) tank. A majority of the contaminants would be habitable volume, and has suggested a lower limit for the removed in this filtering process with the resulting product from habitable volume given a certain number of days for a crewed 23 3 the FO being a salt water solution. voyage . A volume of 9.3 m would roughly translate to a crewed Water is then pumped out of the DS tank and through a reverse duration of 17 days (or fewer). Thus, the decision was made to osmosis (RO) module. The RO module uses hydraulic pressure consider a habitat capable of sustaining three personnel. Then, the rather than osmotic pressure to force a solution through a semi- number of habitat occupants (3) was used as the basis for sizing permeable membrane. The RO has one input and two outputs, the GWS. Waste produced by each occupant is assumed to be with potable water exiting one output and rejecting water cycling 7L/day/person. With the assumption that the GWS would be back to the DS tank. For every 1 L of potable water flowing into processing a day’s volume of water in 1 h, a through-flow rate of the product tank, 2 L of water re-enter the DS tank from the 21 L/h is assumed. The estimated maximum power consumed by rejection of the RO module. A variety of sensors are included in the whole GWS is 250 W. The room’s temperature would be the experimental setup for the GWS: pressure, flow, total dissolved controlled with the TCS so that a habitat temperature of 20 °C is solids, electric power consumption, and tank water level. maintained. Altogether, a maximum power of 560 W is assumed from the TCS. These design decisions spurred an initially estimated Experimental instrumentation, data acquisition, and control power draw for each system to ensure appropriate relative power draws and the appropriate consideration of components. These The experimental setup integrates the data collection and power draws were then used to size the electrical system. Given operation activities of the physical demonstration system to a 250 W from the GWS, 560 W from the TCS and 200 W from a load sensor and computational hub that can remotely monitor and bank, the electrical system was sized to supply 1 kW of power to actuate the system. This computational hub uses NASA’s core the whole mock space habitat. Flight System middleware software architecture to ingest sensor As the scope of this project is limited to the aforementioned data, process raw voltages to physical values, and record historical systems, other potential systems typically found in an Environ- data for use by a digital twin. Further developments were also mental Control and Life Support Systems (ECLSS) will be emulated accomplished to allow a terminal user to have remote access to in the experimental setup with the 200 W load bank. The load the system, control programmable actuators on the system, and bank was sized to account for the difference between the GWS synchronize data to a cloud storage service. The sensor and and TCS power draws and the capacity of the DC power source to control hub computer is a Raspberry Pi 4b single-board computer ensure that all habitat systems cannot be powered simulta- that communicates with two Arduino Mega microcontroller neously. This will enable faults to cascade when the system is boards via USB. One microcontroller is used for reading sensor npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA 1234567890():,; M. Overlin et al. Fig. 2 In the grey water filtering system (GWS), grey water is sourced from a feed tank, filtered through a forward osmosis (FO) module, filtered through a reverse osmosis (RO) module, and finally fed into the product tank as potable water. a A simplified schematic illustrates the operation of the GWS. Essentially, there are 3 loops in which water flows. The FO and RO modules are key components. b The GWS was mostly assembled, installed, and instrumented on one wall within the mock space habitat. The GWS’s feed tank (left) and product tank (right) are out of view in this picture. (The annotated picture is provided by PacMar Technologies and used with permission.). Fig. 1 In the thermal control system (TCS), water is circulated through a closed loop to regulate the mock space habitat’s room temperature. a A simplified schematic illustrates the operation of Using the Functional Mock-up Interface (FMI) standard, the the TCS by showing how important components (heat exchanger, model was compiled as a functional mock-up unit (FMU) for pump, chiller, and heater) are arranged in the loop. b The TCS was use by a ReLAA . The FMI standard is often used to simplify the mostly assembled, installed, and instrumented on one wall within creation, storage, exchange, and use of dynamic system models so the mock space habitat (chiller not shown). (The annotated picture that they may be flexibly simulated on a variety of computational is provided by PacMar Technologies and used with permission.). platforms. When using the FMU, the ReLAA would specify input stimuli, set points, and other necessary information needed to values, and the other is used for issuing commands to the specify a what-if scenario. With the integrated digital twin model demonstration system’s programmable actuators. implemented as an FMU, the ReLAA could be deployed offline to recognize faults in previously captured data or online to forward- Digital twin development look from a present snapshot of data. The integrated physics-based model (TCS, GWS, and electrical subsystem model), simulated in the MathWorks Simscape Fault elicitation environment, is referred to as a digital twin because it is a virtual A failure modes and effects analysis was conducted to identify representation of the mock space habitat that is updated from faults of interest within the system(s). These faults inform where real-time data and used to inform decision-making processes 26,27 and how perturbations are applied to the mock habitat and how (fault recognition and prediction) . In this work, the integrated they will be measured, and the expected system response given the model’s parameters were adjusted so that its simulation results operational scenario. Table 1 highlights some faults to be emulated would agree with experimentally captured data from the habitat. in the mock habitat and the associated perturbation mechanism. Because the ReLAA would need the ability to forward-look and For binary mechanisms like switches and relays, the agent has the anticipate faults, the digital twin would need to execute ability to toggle the position between ON and OFF states. For simulations faster than real time. The difference between simulated and measured waveforms was quantified, on average, continuously adjustable values such as variable resistors and values with a mean absolute deviation of 7% or less. There is often a the agent had the ability to discretely increment the actuator by a tradeoff between model accuracy and simulation speed, and this set value. That is, the rightmost column in Table 1 lists the actuation work prioritized simulation speed, though the model’s accuracy points in the experimental setup where the ReLAA may affect was found to be satisfactory. change with its actions. The faults and mechanisms outlined in Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. KL divergence metric, the fitness function is scaled by γ, which is Table 1. Identified faults through the failure modes and effects computed as shown in Eq. (3): analysis and their associated mechanisms where the ReLAA can affect change in the experimental setup with its actions. γ ¼ fKLðπ ; π ÞjKLðπ ; π Þ < δg P Q P Q (3) Subsystem Faults Mechanism This scaling factor γ is applied such that the ith agent has a GWS Membrane fouling, clogging, Valves −1 restrictions, leaks fitness of F , but a scaled fitness of F γ . This scaled fitness i i TCS Freezing Chiller, external heater accounts for diversity between agents and is the metric used to Blockages, leaks Valves select those agents which will parent subsequent generations of Electrical Power spikes Variable resistors agents or be used in deployment. If a population member Q is Shorts Circuit breaker within δ KL divergence it is considered a neighbor and included in Power loss DC power supply the distance scalar. Population members that are similar to each Sensors Sensor failure Disconnect from power, other will not survive to the next generation, increasing diversity software logic in the population. Sensor drift Variable resistors, The agent learns a mapping from the state space to the action software logic space using guidance from a fitness function. The state space of Misc. LSS systems Load spikes Load bank the adversarial agent is the sensor data that would be mirrored in the real system. The action space is the set of components that the agent could perturb to cause a fault. For example, this includes Table 1 were evaluated individually to uncover specificfault each of the pipes it can clog and the filters it can foul. Each agent is responses, as well as combinations of perturbations to elicit a feed-forward neural network with inputs of the state-space sensor cascaded faults. This fault analysis informed a design of experi- measurements, a list of 50 values. It has four layers, with a SoftMax ments that was later executed as part of this work. output layer that has the same length as the number of possible faults, an array with 14 units. The output vector contains floats from Reinforcement learning 0 to 1 that represent the percentage probability that an action is This framework uses reinforcement-learned adversarial agents to taken. Then, when assuming a normal distribution, an action is learn perturbations to the digital twin that cause faults. The selected. After this selection, the action is then issued to the digital adversarial agent executes forward simulations while performing twin or the mock space habitat. When the action is issued, a fault these perturbations on the digital twin to predict faults early, may or may not occur. If the ReLAA is well-trained (indicated with a thereby identifying latent conditions which could lead to future high fitness), then it is more successful in causing faults with the faults. The adversarial agent is trained using a neuroevolutionary actions selected from its output vector of probabilities. approach to learn how to cause and diagnose faults in the Before training, a wide variety of experiments were performed system. Here, a neural network represents the learned policy to capture several possible healthy and faulted operational while an evolutionary algorithm iteratively optimizes the policy. scenarios. In this data which shows actions and sensor data The validated digital twin allows for the use of a genetic together, the ReLAA learns this mapping. That is, the agent was algorithm for training. Without it, an intractable amount of trained to find actions that push the system into failure. Agents experimentally captured data from the mock space habitat only act during a limited time horizon to encourage the discovery would be necessary. The digital twin allows for the agents to of imminent failure cases. break the system during training while not actually harming any The digital twin can be simulated with a real-time factor of physical systems. The verification of the demonstration system 20 to 1. To train agents efficiently, each training run is conducted and the speed of simulation become essential to the quality of on a standalone CPU process, allowing for 24 branches to be the adversarial agent. simulated at once. Due to the total policy of the agents being In addition, fitness sharing is introduced after clustering to spread over the entire population, parallelization allows for the increase diversity in the population. In fitness sharing, each current state space to be subject to numerous operational population member scales its fitness based on its proximity to scenarios, leading to varied possible fault generation events. population members. Therefore, densely packed population Overall, the parallelization enables training to be conducted up to members have a lower fitness value than comparably good 18× faster than without parallelization. solutions in sparsely populated regions. The distance is the KL The flow of data during training and deployment is shown in divergence between solutions, each represented by a discrete Fig. 3. The iterative neuroevolutionary optimization occurs in the distribution based on states . In general, the KL divergence training block on the left, while data generated by agents is saved between two probability distributions P and Q is computed as for future use in the deployment. Once the agents are trained, shown below in Eq. (1): each can be loaded into the deployment framework where it will be given the chance to perturb simulation events from the current pðiÞ KLðP; QÞ¼ pðiÞ ln (1) simulation state, as indicated in live sensor data provided to a qðiÞ deployed ReLAA. Note that the ReLAA can be provided with past or present sensor data and a digital twin simulation with 0 time The policy calculates a discrete distribution based on the current steps can be performed to identify past or present faults. Of state, therefore to compute the KL divergence between two course, the results of the 0-time step simulation are trivial. From policies, denoted by π and π . A set of recent states, s ∈ S,isused P Q the digital twin’s initial state, the simulation would illustrate the to get divergence by running the population member through an time evolution into another state- the same state since the optimization problem. That is, the quantity computed in Eq. (2)is to simulation is performed for 0 time steps. The simulated data is be maximized to ensure diversity between two policies π and π . P Q then analyzed to identify faults. A database of historical data is generated during the training phase due to the limitations of the KLðπ ; π Þ¼ KLðπ ðsÞ; π ðsÞÞ P Q P Q (2) jjSjj FMU software standard and sensors. In order to generate a s2S simulation from a complex state, the sensor values are fed into a Next, a population member’s fitness scales with respect to the Ball Tree algorithm. This structure finds the closest internal state distance of all nearby population members. Therefore, with the that can be loaded into the FMU given the current sensor state. npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. Fig. 3 Multiple ReLAAs are trained before they are deployed to detect faults. The FMU (the implementation form of the digital twin) is then RESULTS simulated to identify faults and appropriately notify a user. The mock space habitat is operated in a variety of conditions to emulate the normal and faulted operations of an actual space habitat. ReLAA’s rewards and fitness The fault detection methods shared in this work, using ReLAAs, will identify actions that lead to faulted operation. This is achieved Mock space habitat operation and fault emulation utilizing a digital twin rather than taking potentially harmful Twenty different experiments were conducted, each containing actions in the mock space habitat. The ReLAA agent is adversarial several disturbances to allow the mock habitat to operate in in nature and earns a reward (during its training process) when different states. Measurement data from the TCS and GWS are damaging actions are found. The reward function at any given shown in Fig. 4. state, r is defined below where, m, is the number of features First, an experiment was designed and performed over several measured. For each feature, there is a reward. For a single feature hours with regard to the TCS. The time evolution of the x , the feature reward depends upon if this feature value is above, temperatures in the TCS is shown in the first plot in Fig. 4. The below, or within upper and lower operational bounds. This is true TCS is allowed to reach a steady state, but then certain faults are emulated: sudden pump blockage, reduced flow through for all features when the current state value is within the features chiller, bypass valve opening/closing. When such faults are lower bound and upper bounds, x and x . The equation below is l u actuated, there is usually a deviation in the room temperature, used to compute a feature reward. which is undesirable. 1; if x >x Second, an experiment was performed with regard to the GWS, > v u and a relevant plot of pressures is shown in Fig. 4. In the GWS, its 1; if x <x r ¼ v l (4) normal or faulted operation can be visualized in these pressure x x x x : u v v l maxf ; g; otherwise 1 1 measurements obtained at various points in the experimental ðxuxlÞ ðxuxlÞ 2 2 setup. During healthy operation, the pressures are generally the largest. With the presence of a clog, leak, or other damage, several Then, the reward function for a particular state r can be pressure measurements typically deviate from their nominal computed. r depends on all r values as shown in the equation s v values. Measurement data were collected during various testing below: conditions—healthy, drifted, or faulted—allowing for an operation baseline to be established for agent training and deployment. r ¼ ð1  r Þ (5) s v v¼1 Fault detection and prediction Training was conducted on a validated digital twin, assessing 36 Finally, the fitness of a given agent, F , depends on all rewards r i s agents per epoch. On each generation, agents were sorted into a in a simulation run. goal of six clusters, with each top 2 performing agents in each cluster chosen as parents for the successive generation. Each agent’s parameters were subject to Gaussian noise with a F ¼ r i s (6) jjSjj maximum variability of 0.25 with a standard deviation of 0.10 s2S (each parameter is between –1 and 1). From 12 parents, a total of 36 agents were to be tested in the next training epoch. The The fitness for one ReLAA is ultimately the metric used to performance of an agent is quantified as the cumulative sum of judge whether or not the ReLAA is used to be deployed or the rewards over the 10 testing runs of that generation. worthy of parenting subsequent generations in the neuroevolu- The fitness of the top performer in each generation is shown in tionary process. Fig. 5. Over the scheduled 100 generations, the performance of the top agents steadily improved until the 65th epoch where the Reporting summary population reached its maximum reward. After this peak, the Further information on research design is available in the Nature next 40 generations show a leveling-off in performance as Research Reporting Summary linked to this article. agents find optimal solutions. In addition, the diversity of each Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. Fig. 4 Experimentally captured data from the mock space habitat illustrate the operation during normal and faulted conditions. a Normal and faulted operation is shown for a variety of faults emulated in the TCS. b Pressures throughout the GWS change in response to normal and faulted conditions. agent is calculated at each generation. Some distance between level, triggering the pumps to activate. The GWS pumps draw clusters is desirable and incentivized through the neuroevolu- more power than typical to push water through the clog. This tionary structure. Relative distances between clusters increase by results in a decrease in the habitat’s DC bus voltage. With a lower a factor of 20 throughout the first 20 generations as the system operating voltage for the TCS’s centrifugal pump, the pump performs its state space exploration. With many agents trained cannot provide a sufficient pressure differential. For this reason, to discover faulted behavior, the pool of agents used for fault flow is reduced, and the room temperature cannot be effectively detection is considered diverse. Following a peak in diversity regulated with the TCS. In this particular example of an indirect around the 20th epoch, the clusters begin to approximately fault, a clog in the GWS has indirectly prohibited the TCS from converge again to a steady state distance. During this time, the regulating the habitat temperature. fitness of each cluster is improving as denoted by the lightening Agent 2 (blue rollout in Fig. 6) simulates the clogged strainer of each dot’s color. within the TCS. This filter is located on the intake of the TCS pump. When properly trained, the agents will issue actions to the As the clog builds, pressure slowly drops as there is a linear system which leads to faulted behavior. For example, Fig. 6 shows downtrend to the blue, jagged output. When the clogged strainer an agent’s rollout in an operational scenario. As the flow meter fault is further actuated, it is not the ideal choice to break the 2 sensor measurement (shown in black in Fig. 6) would indicate, system. This demonstrates a key contribution to this active the TCS in the mock space habitat is operating satisfactorily, learning framework. Multiple underlying component issues, within healthy operating bounds (shown in red in Fig. 6). where none are independently causing a component fault, can When testing two of our agents on simulation-derived training lead to a full system fault. As demonstrated in this work, our active data, both are able to force the system to a fault using two fault detection framework with the adversarial learning agent will separate actuation methods. In the testing scenario, a clog within predict these hard-to-discover faults. the RO module is actuated at a rate of 0.5% per second until it reaches an over-fault percentage of 30. This slight disturbance is DISCUSSION recognized by the adversarial agents, and they each present a unique solution. The neuroevolutionary training of a population of adversarial Agent 1 (green rollout in Fig. 6) actuates the RO clog fault agents was successful, as seen by increasing rewards for the within the GWS, causing a cascade effect through the electrical agents throughout the training process. In the clustering step of system. Because of the RO clog in the GWS, reduced flow is training, diversity was achieved, and the system didn’t converge observed throughout the TCS. As the clog in the RO filter to a single solution. The top agent’s reward increases steadily at becomes worse, pressure grows above the control threshold the beginning of training to a maximum reward, but then npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. reaches a plateau. The inconsistent growth from epoch to epoch This approach to creating adversarial agents was successful in is a result of fitness sharing in the system due to the random reaching its goals, but has several technical limitations. Due to the noise added to the parameters on each generation. This is a complexity of the FMU simulation and the corresponding necessary concession made in the system to not optimize toward operation software, each simulation takes minutes to run and a single local minimum. generates substantial amounts of data. This limits the amount of Fig. 5 KL divergence during adversarial agent training. Average reward is represented by the dot color, with lighter being higher performers. Fig. 6 ReLAA rollout shown for TCS flow meter 2. A successfully trained ReLAA issues actions to the system which allow for faulted operation: the reduction in flow (outside of satisfactory operating bounds shown in red) throughout the TCS as shown in green. Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. training that can be conducted and how much data can be stored 11. Iverson, D. et al. General purpose data-driven system monitoring for space operations. J. Aerosp. Comput. Inf. Commun. 9,26–44 (2012). for the deployment framework. 12. Spirkovska, L. et al. Anomaly detection for next-generation space launch ground The digital twins were leveraged to allow for the creation of a true operations. Proceedings of the AIAA SpaceOps 2010 Conference (AIAA, Huntsville, adversarial agent. It allows agents to learn how to break the system AL, 2010). components repeatedly without lasting impact, physical or monetary. 13. Wang, J., Li, Y., Gao, R. & Zhang, F. Hybrid physics-based and data-driven models This process is key when developing intelligent systems that allow for for smart manufacturing: modelling, simulation, and explainability. J. Manuf. Syst. limited or no data collection before deployment. The simulation 63, 381–391 (2022). provides for faster-than-real-time operation, pivotal when attempting 14. Rackauckas, C. et al. Universal Differential Equations for Scientific Machine to accurately predict and observe possible latent faults. Learning. Proceedings of the National Academy of Sciences of the United States of In this work, a framework for active fault detection is America. https://doi.org/10.21203/rs.3.rs-55125/v1 (2020). proposed, implemented and demonstrated. A mock space 15. Uhlemann, T. H.-J., Schock, C., Lehmann, C., Freiberger, S. & Steinhilper, R. The digital twin: demonstrating the potential of real time data acquisition in pro- habitat was designed, built, instrumented, and operated to duction systems. Procedia Manuf. 9, 113–120 (2017). enable the demonstration of the fault detection method 16. Stanley, K., Clune, J., Lehman, J. & Miikkulainen, R. Designing neural networks developed in this work. A suitable FMEA analysis was conducted through neuroevolution. Nat. Mach. Intell. 1,24–35 (2019). to identify damaging faults that would cause irreparable 17. Papavasileiou, E., Cornelis, J. & Jansen, B. A systematic literature review of the suc- damage to the mock space habitat. A design of experiments cessors of “neuroevolution of augmenting topologies”. Evol. Comput. 29,1–73 (2021). was executed to exercise the experimental setup in a variety of 18. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, operating conditions, normal and faulted. A digital twin model, 2018). simulated as an FMU, was validated with the mock habitat’s 19. Such, F. P. et al. Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. Preprint measurement data. at arXiv:1712.06567 [cs] (2018). With a large volume of data collected, multiple ReLAAs were 20. Ibrahim, M. Y., Sridhar, R., Geetha, T. V. & Deepika, S. S. Advances in neuroevo- trained to recognize normal and faulted behavior. The training lution through augmenting topologies – a case study. 2019 11th International for these ReLAAs was completed in several generations using a Conference on Advanced Computing (ICoAC) 111–116. https://doi.org/10.1109/ neuroevolutionary process so that diversity could be achieved. ICoAC48765.2019.246825 (2019). Then, the agents were deployed to discover damaging actions 21. Miller, B. L. & Shaw, M. J. Genetic algorithms with dynamic niche sharing for that could damage the mock space habitat. Because there are multimodal function optimization. Proceedings of IEEE International Conference on multiple agents, a variety of damaging actions were found. With Evolutionary Computation (IEEE, 1996). these vulnerabilities discovered, a system operator can then use 22. Chang, P.-C., Huang, W.-H. & Ting, C.-J. Dynamic diversity control in genetic algorithm for mining unsearched solution space in TSP problems. Expert Syst. this information to take proper action, thereby mitigating or Appl. 37, 1863–1878 (2010). preventing faults. 23. Simon, M. Whitmire, A., Otto, C. & Neubek, D. Factors impacting habitable volume requirements: results from the 2011 Habitable Volume Workshop. National Aeronautics and Space Administration (NASA), Center for Advanced Space Studies- DATA AVAILABILITY Universities Space Research Association (2011). The simulation data and experimental data are available from the corresponding 24. Caldwell, S. & Dunbar, B. National Aeronautics and Space Administration (NASA). author upon request. “7.0 Thermal Control.” Updated 4 November 2021. https://www.nasa.gov/ smallsat-institute/sst-soa/thermal-control (2021). 25. Indranil, R., Hafiychuk, V., & Goebel, K. Model-based diagnosis and prognosis of a CODE AVAILABILITY water recycling system. IEEE Aerospace Conference (2013). The design files, simulation models, and software that support this work are available 26. Gelernter, D. Mirror Worlds or the Day Software Puts the Universe in a Shoebox: from the corresponding author upon request. How Will It Happen and What It Will Mean. ISBN: 0195068122 (Oxford University Press, Inc., 1991). 27. International Business Machine Corporation (IBM). How does a digital twin work? Received: 27 June 2022; Accepted: 10 January 2023; https://www.ibm.com/topics/what-is-a-digital-twin. 28. Functional Mock-up Interface (FMI). https://fmi-standard.org/ (2022). 29. Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Statist. 22, 79–86 (1951). REFERENCES 1. Beard, S. S. & Starzyk, J. Space tourism market study: orbital space travel & ACKNOWLEDGEMENTS destinations with suborbital space travel. 1–72 (Futron Corp., 2002). This work has been supported SBIR contract number 80NSSC-20-C-0129. The authors 2. Ballard, R. & Connolly, J. US/USSR joint research in space biology and medicine on thank the National Aeronautics and Space Administration (NASA), especially Cosmos biosatellites. FASEB J. 4,5–9 (1990). Dr Rodney Martin and Dr Craig Moore, for their financial and technical support. 3. UBS Investment Bank. Future of Space Tourism: Lifting Off? Or has COVID-19 The authors also thank Sierra Nevada Corporation (SNC) for their technical guidance stunted adoption? 20 July. https://www.ubs.com/global/en/investment-bank/in- when designing the mock space habitat and serving as a sub-contractor for this work. focus/2021/space-tourism.html (2021). Finally, the authors thank other personnel at Martin Defense Group who have 4. Antonsen, E. L. et al. Estimating medical risk in human spaceflight. npj Micro- provided technical feedback and guidance throughout the duration of this work, gravity 8, 8 (2022). 5. Tang, S. et al. Operation-aware ISHM for environmental control and life support in especially William Curran who was involved at the beginning of this work. deep space habitants. https://doi.org/10.2514/6.2018-1365 (2018). 6. Abid, A., Khan, M. T. & Iqbal, J. A review on fault detection and diagnosis tech- niques: basics and beyond. Artif. Intell. Rev. 54, 3639–3664 (2021). AUTHOR CONTRIBUTIONS 7. Daigle, M. et al. A comprehensive diagnosis methodology for complex hybrid M.O. helped build and instrument the mock space habitat, served as the principal systems. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and investigator, and helped prepare this article. S.I. developed software for the ReLAA Humans (2010). and helped prepare this article. B.W. helped with the design, simulation, building, and 8. Jiang, J. & Yu, X. Fault-tolerant control systems: a comparative study between testing of the mock space habitat. A.M. helped with the data analysis, middleware active and passive approaches. Ann. Rev. Control 36,60–72 (2012). development, and execution of experiments. J.P. initiated and supervised this work. 9. Mustapha, S., Lu, Y., Ng, C.-T. & Malinowski, P. Sensor networks for structures health monitoring: placement, implementations, and challenges–areview. Vibration 4, 551–585 (2021). COMPETING INTERESTS 10. Colombano, S. et al. A system for fault management for NASA’s deep space habitat. International Conference on Environmental Systems (ICES) (2013). The authors declare no competing interests. npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. ADDITIONAL INFORMATION Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, Supplementary information The online version contains supplementary material adaptation, distribution and reproduction in any medium or format, as long as you give available at https://doi.org/10.1038/s41526-023-00252-9. appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party Correspondence and requests for materials should be addressed to Matthew Overlin material in this article are included in the article’s Creative Commons license, unless or Steven Iannucci. indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory Reprints and permission information is available at http://www.nature.com/ regulation or exceeds the permitted use, you will need to obtain permission directly reprints from the copyright holder. To view a copy of this license, visit http:// creativecommons.org/licenses/by/4.0/. Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. © The Author(s) 2023 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png npj Microgravity Springer Journals

Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in space habitats

Loading next page...
 
/lp/springer-journals/reinforcement-learned-adversarial-agent-relaa-for-active-fault-R6HTkLfQL7
Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2023
eISSN
2373-8065
DOI
10.1038/s41526-023-00252-9
Publisher site
See Article on Publisher Site

Abstract

www.nature.com/npjmgrav ARTICLE OPEN Reinforcement learned adversarial agent (ReLAA) for active fault detection and prediction in space habitats 1✉ 1✉ 1 1 1 Matthew Overlin , Steven Iannucci , Bradly Wilkins , Alexander McBain and Jason Provancher With growing interest for human space tourism in the twenty-first century, much attention has been directed to the robust engineering of Environmental Control and Life Support Systems in space habitats. The stable, reliable operation of such a habitat is partly achieved with an ability to recognize and predict faults. For these two purposes, a reinforcement learning adversarial agent (ReLAA) is utilized in this work. A ReLAA is trained with experimental data to actively recognize and predict faults. These capabilities are achieved by proposing actions that activate known faults in a system. Instead of issuing these harmful actions to the actual hardware, a digital twin of the mock space habitat is simulated to discover vulnerabilities that would lead to faulted operation. The methods developed in this work will allow for the discovery of damaging latent behavior, and the reduction of false positive and negative fault identification. npj Microgravity (2023) 9:15 ; https://doi.org/10.1038/s41526-023-00252-9 13,14 INTRODUCTION approaches . In short, models with physical basis are under- stood to be more explainable, generalizable, and interpretable; all Space tourism is a budding industry with increased interest from 1,2 are qualities necessary in models of life-sustainment systems. the general public . Companies such as SpaceX, Blue Origin, Other digital twin systems have been successful in integrating Virgin Galactic, and Boeing are either planning sub-orbital leisure physics-based models in lieu of a surplus of data that is necessary flights or have already completed such trips. Union Bank of 7,15 for supervised machine-learned systems . Switzerland estimates that the space tourism sector of the space Each ReLAA developed and implemented in this work is an economy will be worth US$4bn by 2030 . Present-day trips, artificial neural network (ANN). Such networks are often trained however, may only be short-duration visits that occur over a with variants of gradient descent (ADAM, SGD, etc.), a first-order period of hours or days. For such trips to be possible, science and optimization algorithm used to find local minima in objective engineering research has sought to understand the potential for functions. Instead of these traditional optimization algorithms, medical risk during these crewed missions . To increase the safety some have found advantages with training ANNs through and reliability of these missions, accurate system health monitor- neuroevolution, an evolutionary process that allows an ANN’s ing (SHM) must be deployed. The methods in this work will parameters to change with new training data. Unlike gradient- identify faulted operation in order to enable safer leisure travel based approaches, activation functions, hyperparameters, archi- with reduced medical risk. tectures, and algorithms can be learned in addition to the ANN Separate from the design, engineering, and construction of 16,17 parameters . As explained later in this article, the training vessels launched into low Earth orbit, this work primarily seeks to process for the ReLAA is completed through neuroevolution. monitor the operation of these vessels. Many conventional SHM Artificial intelligence is usually implemented in the form of an fault detection methods compare measurement data with 5–9 ANN which may be described as a universal approximator. They established healthy operational bounds . Such methods may are trained through a learning process where the parameters of be described as passive, since a fault is declared based on static the ANN are optimized to approximate a unique policy. Through pre-defined rules. Passive HM does not capture and understand reinforcement learning, a perturb-and-observe approach is used the short- or long-term dynamics of the system, thus leaving it during the training process . Actions selected by an agent are vulnerable to unexpected or sudden faults. For example, a passive issued to a digital twin of an experimental apparatus (or an actual system will not understand the relationship between two features experimental apparatus), and the results are quantified as that could combine into a coupled or cascading fault. Other research has developed fault detection solutions that can desirable or not by calculating a reward. If the actions lead to a be updated during operation, but these methods may be purely large reward, then the ANN yielding the high reward is used to 10–12 19 data-driven models . Because the agent in this work employs a generate offspring agents . In this work, a ReLAA is not used for controlling elements in the mock space habitat, but instead for physics-based digital twin, potential failures may be captured in a fault recognition and prediction. digital twin’s simulation results. Such information is useful during Unfortunately, genetic algorithms like NeuroEvolution tend to the agent’s training and deployment phases. Purely data-driven 19,20 converge upon a single solution . In a system as complex as models may not have failure data available during training. Such failure data may not be easily attainable from an experimental the one designed and implemented in this work, a single solution setup with expensive hardware assets . is not capable of understanding the varied dynamics while still Prior research has found that purely physics-based models or being computationally feasible . Research into expanding the hybrid physics-based and data-driven models hold certain key diversity of the solution space is ongoing, with promising 17,20,22 advantages not found with purely data-driven modeling results . For the ReLAAs, two common techniques are Department of Autonomy and Data Science, PacMar Technologies, Honolulu, HI, USA. email: moverlin@gmail.com; siannucci@pacmartech.com Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA 1234567890():,; M. Overlin et al. implemented, the clustering and fitness sharing, both of which are placed in states where power consumption is approximately equal designed to promote diversity and outliers during training. to the system’s capacity. This work introduces a framework to develop and test an active fault detection strategy on a physical demonstration system. First, in Thermal control system (TCS) Methods section, the design and construction of the physical mock The TCS was designed, built, instrumented, and operated as one space habitat for life sustainment is outlined as well as the sensors of the sub-systems in the mock space habitat. The TCS in this work and tools required to measure and communicate the physical is different from more practical systems which may be integrated system state to the software implementation. Also, in Methods with ECLSSs in modern spacecraft . Certain assumptions were section, a brief description is given for the following project tasks: made and should be noted. There is a single closed loop of middleware implementation, digital twin development, fault circulating water, which would not be practically implemented on elicitation and reinforcement learning. In Results section, results spacecraft. Typically, there would be internal and external loops, are shared from the experimental operation and fault emulation. and other fluids would be circulated through these loops such as a Results from the deployment of multiple ReLAAs are also presented. propylene glycol mixture or ammonia which have lower freezing Finally, a discussion and conclusion are included. temperatures. Because this TCS is to be used as part of a mock space habitat for the ReLAA, simplifications in the TCS design were accepted. With the current design and implementation, a variety METHODS of faults could still be realized in the experimental setup. Physical demonstration system In the TCS, the temperature measurements are insightful since Each ReLAA developed in this work was tested and validated with the goal of the TCS is to regulate the habitat’s room temperature. live sensor stream data collected from the mock space habitat. The habitat temperature is ultimately regulated with the proper The experimental setup is described as a system of systems: operation of the TCS, and this is achieved with additional thermal control system (TCS), grey water filtering system (GWS), observability from other sensors. Pressure and flow measurements and an electrical integration model. obtained at various locations in the TCS allow for sufficient The goal of the physical demonstration system is to have a visibility. An annotated picture of the TCS is shown in Fig. 1. physical test system capable of eliciting measurable, realistic faults that are representative of an actual space habitat. Grey water filtering system (GWS) Considerations in the design must be made for not only the The GWS was designed based on a water recycling system actuation of faults within the system(s) but also the measurement designed and operated in prior NASA work . The components in and detection of the faults. The measurement and detection the GWS were sized to deliver roughly 21 L/h of potable water. A capability, primarily achieved with a variety of sensors, allows for simplified schematic of the GWS is shown in Fig. 2. When filtering the physical system to be integrated with a physics-based model grey water, the forward osmosis (FO) module is first used. Water to create a unified digital twin. The large volumes of data, flows through two different paths within the FO module, an inner collected via instrumentation hardware in the mock space path (consisting of feed/dirty water) and an outer path (consisting habitat, will be key in training the ReLAAs and also validating of draw/salt water) in the opposite direction. These two cavities the digital twin used by the ReLAAs. containing water flow are separated by a semi-permeable An empty, isolated room was re-purposed to serve as the mock 3 membrane. Due to the osmotic pressure differential between space habitat in this work with an assumed volume of 28 m and 3 the feed water and draw water, water passes through this 9.3 m of habitable space per occupant. Some prior work has membrane from the feed to draw side and ultimately into the investigated many of the factors that would lead to a certain draw solution (DS) tank. A majority of the contaminants would be habitable volume, and has suggested a lower limit for the removed in this filtering process with the resulting product from habitable volume given a certain number of days for a crewed 23 3 the FO being a salt water solution. voyage . A volume of 9.3 m would roughly translate to a crewed Water is then pumped out of the DS tank and through a reverse duration of 17 days (or fewer). Thus, the decision was made to osmosis (RO) module. The RO module uses hydraulic pressure consider a habitat capable of sustaining three personnel. Then, the rather than osmotic pressure to force a solution through a semi- number of habitat occupants (3) was used as the basis for sizing permeable membrane. The RO has one input and two outputs, the GWS. Waste produced by each occupant is assumed to be with potable water exiting one output and rejecting water cycling 7L/day/person. With the assumption that the GWS would be back to the DS tank. For every 1 L of potable water flowing into processing a day’s volume of water in 1 h, a through-flow rate of the product tank, 2 L of water re-enter the DS tank from the 21 L/h is assumed. The estimated maximum power consumed by rejection of the RO module. A variety of sensors are included in the whole GWS is 250 W. The room’s temperature would be the experimental setup for the GWS: pressure, flow, total dissolved controlled with the TCS so that a habitat temperature of 20 °C is solids, electric power consumption, and tank water level. maintained. Altogether, a maximum power of 560 W is assumed from the TCS. These design decisions spurred an initially estimated Experimental instrumentation, data acquisition, and control power draw for each system to ensure appropriate relative power draws and the appropriate consideration of components. These The experimental setup integrates the data collection and power draws were then used to size the electrical system. Given operation activities of the physical demonstration system to a 250 W from the GWS, 560 W from the TCS and 200 W from a load sensor and computational hub that can remotely monitor and bank, the electrical system was sized to supply 1 kW of power to actuate the system. This computational hub uses NASA’s core the whole mock space habitat. Flight System middleware software architecture to ingest sensor As the scope of this project is limited to the aforementioned data, process raw voltages to physical values, and record historical systems, other potential systems typically found in an Environ- data for use by a digital twin. Further developments were also mental Control and Life Support Systems (ECLSS) will be emulated accomplished to allow a terminal user to have remote access to in the experimental setup with the 200 W load bank. The load the system, control programmable actuators on the system, and bank was sized to account for the difference between the GWS synchronize data to a cloud storage service. The sensor and and TCS power draws and the capacity of the DC power source to control hub computer is a Raspberry Pi 4b single-board computer ensure that all habitat systems cannot be powered simulta- that communicates with two Arduino Mega microcontroller neously. This will enable faults to cascade when the system is boards via USB. One microcontroller is used for reading sensor npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA 1234567890():,; M. Overlin et al. Fig. 2 In the grey water filtering system (GWS), grey water is sourced from a feed tank, filtered through a forward osmosis (FO) module, filtered through a reverse osmosis (RO) module, and finally fed into the product tank as potable water. a A simplified schematic illustrates the operation of the GWS. Essentially, there are 3 loops in which water flows. The FO and RO modules are key components. b The GWS was mostly assembled, installed, and instrumented on one wall within the mock space habitat. The GWS’s feed tank (left) and product tank (right) are out of view in this picture. (The annotated picture is provided by PacMar Technologies and used with permission.). Fig. 1 In the thermal control system (TCS), water is circulated through a closed loop to regulate the mock space habitat’s room temperature. a A simplified schematic illustrates the operation of Using the Functional Mock-up Interface (FMI) standard, the the TCS by showing how important components (heat exchanger, model was compiled as a functional mock-up unit (FMU) for pump, chiller, and heater) are arranged in the loop. b The TCS was use by a ReLAA . The FMI standard is often used to simplify the mostly assembled, installed, and instrumented on one wall within creation, storage, exchange, and use of dynamic system models so the mock space habitat (chiller not shown). (The annotated picture that they may be flexibly simulated on a variety of computational is provided by PacMar Technologies and used with permission.). platforms. When using the FMU, the ReLAA would specify input stimuli, set points, and other necessary information needed to values, and the other is used for issuing commands to the specify a what-if scenario. With the integrated digital twin model demonstration system’s programmable actuators. implemented as an FMU, the ReLAA could be deployed offline to recognize faults in previously captured data or online to forward- Digital twin development look from a present snapshot of data. The integrated physics-based model (TCS, GWS, and electrical subsystem model), simulated in the MathWorks Simscape Fault elicitation environment, is referred to as a digital twin because it is a virtual A failure modes and effects analysis was conducted to identify representation of the mock space habitat that is updated from faults of interest within the system(s). These faults inform where real-time data and used to inform decision-making processes 26,27 and how perturbations are applied to the mock habitat and how (fault recognition and prediction) . In this work, the integrated they will be measured, and the expected system response given the model’s parameters were adjusted so that its simulation results operational scenario. Table 1 highlights some faults to be emulated would agree with experimentally captured data from the habitat. in the mock habitat and the associated perturbation mechanism. Because the ReLAA would need the ability to forward-look and For binary mechanisms like switches and relays, the agent has the anticipate faults, the digital twin would need to execute ability to toggle the position between ON and OFF states. For simulations faster than real time. The difference between simulated and measured waveforms was quantified, on average, continuously adjustable values such as variable resistors and values with a mean absolute deviation of 7% or less. There is often a the agent had the ability to discretely increment the actuator by a tradeoff between model accuracy and simulation speed, and this set value. That is, the rightmost column in Table 1 lists the actuation work prioritized simulation speed, though the model’s accuracy points in the experimental setup where the ReLAA may affect was found to be satisfactory. change with its actions. The faults and mechanisms outlined in Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. KL divergence metric, the fitness function is scaled by γ, which is Table 1. Identified faults through the failure modes and effects computed as shown in Eq. (3): analysis and their associated mechanisms where the ReLAA can affect change in the experimental setup with its actions. γ ¼ fKLðπ ; π ÞjKLðπ ; π Þ < δg P Q P Q (3) Subsystem Faults Mechanism This scaling factor γ is applied such that the ith agent has a GWS Membrane fouling, clogging, Valves −1 restrictions, leaks fitness of F , but a scaled fitness of F γ . This scaled fitness i i TCS Freezing Chiller, external heater accounts for diversity between agents and is the metric used to Blockages, leaks Valves select those agents which will parent subsequent generations of Electrical Power spikes Variable resistors agents or be used in deployment. If a population member Q is Shorts Circuit breaker within δ KL divergence it is considered a neighbor and included in Power loss DC power supply the distance scalar. Population members that are similar to each Sensors Sensor failure Disconnect from power, other will not survive to the next generation, increasing diversity software logic in the population. Sensor drift Variable resistors, The agent learns a mapping from the state space to the action software logic space using guidance from a fitness function. The state space of Misc. LSS systems Load spikes Load bank the adversarial agent is the sensor data that would be mirrored in the real system. The action space is the set of components that the agent could perturb to cause a fault. For example, this includes Table 1 were evaluated individually to uncover specificfault each of the pipes it can clog and the filters it can foul. Each agent is responses, as well as combinations of perturbations to elicit a feed-forward neural network with inputs of the state-space sensor cascaded faults. This fault analysis informed a design of experi- measurements, a list of 50 values. It has four layers, with a SoftMax ments that was later executed as part of this work. output layer that has the same length as the number of possible faults, an array with 14 units. The output vector contains floats from Reinforcement learning 0 to 1 that represent the percentage probability that an action is This framework uses reinforcement-learned adversarial agents to taken. Then, when assuming a normal distribution, an action is learn perturbations to the digital twin that cause faults. The selected. After this selection, the action is then issued to the digital adversarial agent executes forward simulations while performing twin or the mock space habitat. When the action is issued, a fault these perturbations on the digital twin to predict faults early, may or may not occur. If the ReLAA is well-trained (indicated with a thereby identifying latent conditions which could lead to future high fitness), then it is more successful in causing faults with the faults. The adversarial agent is trained using a neuroevolutionary actions selected from its output vector of probabilities. approach to learn how to cause and diagnose faults in the Before training, a wide variety of experiments were performed system. Here, a neural network represents the learned policy to capture several possible healthy and faulted operational while an evolutionary algorithm iteratively optimizes the policy. scenarios. In this data which shows actions and sensor data The validated digital twin allows for the use of a genetic together, the ReLAA learns this mapping. That is, the agent was algorithm for training. Without it, an intractable amount of trained to find actions that push the system into failure. Agents experimentally captured data from the mock space habitat only act during a limited time horizon to encourage the discovery would be necessary. The digital twin allows for the agents to of imminent failure cases. break the system during training while not actually harming any The digital twin can be simulated with a real-time factor of physical systems. The verification of the demonstration system 20 to 1. To train agents efficiently, each training run is conducted and the speed of simulation become essential to the quality of on a standalone CPU process, allowing for 24 branches to be the adversarial agent. simulated at once. Due to the total policy of the agents being In addition, fitness sharing is introduced after clustering to spread over the entire population, parallelization allows for the increase diversity in the population. In fitness sharing, each current state space to be subject to numerous operational population member scales its fitness based on its proximity to scenarios, leading to varied possible fault generation events. population members. Therefore, densely packed population Overall, the parallelization enables training to be conducted up to members have a lower fitness value than comparably good 18× faster than without parallelization. solutions in sparsely populated regions. The distance is the KL The flow of data during training and deployment is shown in divergence between solutions, each represented by a discrete Fig. 3. The iterative neuroevolutionary optimization occurs in the distribution based on states . In general, the KL divergence training block on the left, while data generated by agents is saved between two probability distributions P and Q is computed as for future use in the deployment. Once the agents are trained, shown below in Eq. (1): each can be loaded into the deployment framework where it will be given the chance to perturb simulation events from the current pðiÞ KLðP; QÞ¼ pðiÞ ln (1) simulation state, as indicated in live sensor data provided to a qðiÞ deployed ReLAA. Note that the ReLAA can be provided with past or present sensor data and a digital twin simulation with 0 time The policy calculates a discrete distribution based on the current steps can be performed to identify past or present faults. Of state, therefore to compute the KL divergence between two course, the results of the 0-time step simulation are trivial. From policies, denoted by π and π . A set of recent states, s ∈ S,isused P Q the digital twin’s initial state, the simulation would illustrate the to get divergence by running the population member through an time evolution into another state- the same state since the optimization problem. That is, the quantity computed in Eq. (2)is to simulation is performed for 0 time steps. The simulated data is be maximized to ensure diversity between two policies π and π . P Q then analyzed to identify faults. A database of historical data is generated during the training phase due to the limitations of the KLðπ ; π Þ¼ KLðπ ðsÞ; π ðsÞÞ P Q P Q (2) jjSjj FMU software standard and sensors. In order to generate a s2S simulation from a complex state, the sensor values are fed into a Next, a population member’s fitness scales with respect to the Ball Tree algorithm. This structure finds the closest internal state distance of all nearby population members. Therefore, with the that can be loaded into the FMU given the current sensor state. npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. Fig. 3 Multiple ReLAAs are trained before they are deployed to detect faults. The FMU (the implementation form of the digital twin) is then RESULTS simulated to identify faults and appropriately notify a user. The mock space habitat is operated in a variety of conditions to emulate the normal and faulted operations of an actual space habitat. ReLAA’s rewards and fitness The fault detection methods shared in this work, using ReLAAs, will identify actions that lead to faulted operation. This is achieved Mock space habitat operation and fault emulation utilizing a digital twin rather than taking potentially harmful Twenty different experiments were conducted, each containing actions in the mock space habitat. The ReLAA agent is adversarial several disturbances to allow the mock habitat to operate in in nature and earns a reward (during its training process) when different states. Measurement data from the TCS and GWS are damaging actions are found. The reward function at any given shown in Fig. 4. state, r is defined below where, m, is the number of features First, an experiment was designed and performed over several measured. For each feature, there is a reward. For a single feature hours with regard to the TCS. The time evolution of the x , the feature reward depends upon if this feature value is above, temperatures in the TCS is shown in the first plot in Fig. 4. The below, or within upper and lower operational bounds. This is true TCS is allowed to reach a steady state, but then certain faults are emulated: sudden pump blockage, reduced flow through for all features when the current state value is within the features chiller, bypass valve opening/closing. When such faults are lower bound and upper bounds, x and x . The equation below is l u actuated, there is usually a deviation in the room temperature, used to compute a feature reward. which is undesirable. 1; if x >x Second, an experiment was performed with regard to the GWS, > v u and a relevant plot of pressures is shown in Fig. 4. In the GWS, its 1; if x <x r ¼ v l (4) normal or faulted operation can be visualized in these pressure x x x x : u v v l maxf ; g; otherwise 1 1 measurements obtained at various points in the experimental ðxuxlÞ ðxuxlÞ 2 2 setup. During healthy operation, the pressures are generally the largest. With the presence of a clog, leak, or other damage, several Then, the reward function for a particular state r can be pressure measurements typically deviate from their nominal computed. r depends on all r values as shown in the equation s v values. Measurement data were collected during various testing below: conditions—healthy, drifted, or faulted—allowing for an operation baseline to be established for agent training and deployment. r ¼ ð1  r Þ (5) s v v¼1 Fault detection and prediction Training was conducted on a validated digital twin, assessing 36 Finally, the fitness of a given agent, F , depends on all rewards r i s agents per epoch. On each generation, agents were sorted into a in a simulation run. goal of six clusters, with each top 2 performing agents in each cluster chosen as parents for the successive generation. Each agent’s parameters were subject to Gaussian noise with a F ¼ r i s (6) jjSjj maximum variability of 0.25 with a standard deviation of 0.10 s2S (each parameter is between –1 and 1). From 12 parents, a total of 36 agents were to be tested in the next training epoch. The The fitness for one ReLAA is ultimately the metric used to performance of an agent is quantified as the cumulative sum of judge whether or not the ReLAA is used to be deployed or the rewards over the 10 testing runs of that generation. worthy of parenting subsequent generations in the neuroevolu- The fitness of the top performer in each generation is shown in tionary process. Fig. 5. Over the scheduled 100 generations, the performance of the top agents steadily improved until the 65th epoch where the Reporting summary population reached its maximum reward. After this peak, the Further information on research design is available in the Nature next 40 generations show a leveling-off in performance as Research Reporting Summary linked to this article. agents find optimal solutions. In addition, the diversity of each Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. Fig. 4 Experimentally captured data from the mock space habitat illustrate the operation during normal and faulted conditions. a Normal and faulted operation is shown for a variety of faults emulated in the TCS. b Pressures throughout the GWS change in response to normal and faulted conditions. agent is calculated at each generation. Some distance between level, triggering the pumps to activate. The GWS pumps draw clusters is desirable and incentivized through the neuroevolu- more power than typical to push water through the clog. This tionary structure. Relative distances between clusters increase by results in a decrease in the habitat’s DC bus voltage. With a lower a factor of 20 throughout the first 20 generations as the system operating voltage for the TCS’s centrifugal pump, the pump performs its state space exploration. With many agents trained cannot provide a sufficient pressure differential. For this reason, to discover faulted behavior, the pool of agents used for fault flow is reduced, and the room temperature cannot be effectively detection is considered diverse. Following a peak in diversity regulated with the TCS. In this particular example of an indirect around the 20th epoch, the clusters begin to approximately fault, a clog in the GWS has indirectly prohibited the TCS from converge again to a steady state distance. During this time, the regulating the habitat temperature. fitness of each cluster is improving as denoted by the lightening Agent 2 (blue rollout in Fig. 6) simulates the clogged strainer of each dot’s color. within the TCS. This filter is located on the intake of the TCS pump. When properly trained, the agents will issue actions to the As the clog builds, pressure slowly drops as there is a linear system which leads to faulted behavior. For example, Fig. 6 shows downtrend to the blue, jagged output. When the clogged strainer an agent’s rollout in an operational scenario. As the flow meter fault is further actuated, it is not the ideal choice to break the 2 sensor measurement (shown in black in Fig. 6) would indicate, system. This demonstrates a key contribution to this active the TCS in the mock space habitat is operating satisfactorily, learning framework. Multiple underlying component issues, within healthy operating bounds (shown in red in Fig. 6). where none are independently causing a component fault, can When testing two of our agents on simulation-derived training lead to a full system fault. As demonstrated in this work, our active data, both are able to force the system to a fault using two fault detection framework with the adversarial learning agent will separate actuation methods. In the testing scenario, a clog within predict these hard-to-discover faults. the RO module is actuated at a rate of 0.5% per second until it reaches an over-fault percentage of 30. This slight disturbance is DISCUSSION recognized by the adversarial agents, and they each present a unique solution. The neuroevolutionary training of a population of adversarial Agent 1 (green rollout in Fig. 6) actuates the RO clog fault agents was successful, as seen by increasing rewards for the within the GWS, causing a cascade effect through the electrical agents throughout the training process. In the clustering step of system. Because of the RO clog in the GWS, reduced flow is training, diversity was achieved, and the system didn’t converge observed throughout the TCS. As the clog in the RO filter to a single solution. The top agent’s reward increases steadily at becomes worse, pressure grows above the control threshold the beginning of training to a maximum reward, but then npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. reaches a plateau. The inconsistent growth from epoch to epoch This approach to creating adversarial agents was successful in is a result of fitness sharing in the system due to the random reaching its goals, but has several technical limitations. Due to the noise added to the parameters on each generation. This is a complexity of the FMU simulation and the corresponding necessary concession made in the system to not optimize toward operation software, each simulation takes minutes to run and a single local minimum. generates substantial amounts of data. This limits the amount of Fig. 5 KL divergence during adversarial agent training. Average reward is represented by the dot color, with lighter being higher performers. Fig. 6 ReLAA rollout shown for TCS flow meter 2. A successfully trained ReLAA issues actions to the system which allow for faulted operation: the reduction in flow (outside of satisfactory operating bounds shown in red) throughout the TCS as shown in green. Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15 M. Overlin et al. training that can be conducted and how much data can be stored 11. Iverson, D. et al. General purpose data-driven system monitoring for space operations. J. Aerosp. Comput. Inf. Commun. 9,26–44 (2012). for the deployment framework. 12. Spirkovska, L. et al. Anomaly detection for next-generation space launch ground The digital twins were leveraged to allow for the creation of a true operations. Proceedings of the AIAA SpaceOps 2010 Conference (AIAA, Huntsville, adversarial agent. It allows agents to learn how to break the system AL, 2010). components repeatedly without lasting impact, physical or monetary. 13. Wang, J., Li, Y., Gao, R. & Zhang, F. Hybrid physics-based and data-driven models This process is key when developing intelligent systems that allow for for smart manufacturing: modelling, simulation, and explainability. J. Manuf. Syst. limited or no data collection before deployment. The simulation 63, 381–391 (2022). provides for faster-than-real-time operation, pivotal when attempting 14. Rackauckas, C. et al. Universal Differential Equations for Scientific Machine to accurately predict and observe possible latent faults. Learning. Proceedings of the National Academy of Sciences of the United States of In this work, a framework for active fault detection is America. https://doi.org/10.21203/rs.3.rs-55125/v1 (2020). proposed, implemented and demonstrated. A mock space 15. Uhlemann, T. H.-J., Schock, C., Lehmann, C., Freiberger, S. & Steinhilper, R. The digital twin: demonstrating the potential of real time data acquisition in pro- habitat was designed, built, instrumented, and operated to duction systems. Procedia Manuf. 9, 113–120 (2017). enable the demonstration of the fault detection method 16. Stanley, K., Clune, J., Lehman, J. & Miikkulainen, R. Designing neural networks developed in this work. A suitable FMEA analysis was conducted through neuroevolution. Nat. Mach. Intell. 1,24–35 (2019). to identify damaging faults that would cause irreparable 17. Papavasileiou, E., Cornelis, J. & Jansen, B. A systematic literature review of the suc- damage to the mock space habitat. A design of experiments cessors of “neuroevolution of augmenting topologies”. Evol. Comput. 29,1–73 (2021). was executed to exercise the experimental setup in a variety of 18. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, operating conditions, normal and faulted. A digital twin model, 2018). simulated as an FMU, was validated with the mock habitat’s 19. Such, F. P. et al. Deep neuroevolution: genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning. Preprint measurement data. at arXiv:1712.06567 [cs] (2018). With a large volume of data collected, multiple ReLAAs were 20. Ibrahim, M. Y., Sridhar, R., Geetha, T. V. & Deepika, S. S. Advances in neuroevo- trained to recognize normal and faulted behavior. The training lution through augmenting topologies – a case study. 2019 11th International for these ReLAAs was completed in several generations using a Conference on Advanced Computing (ICoAC) 111–116. https://doi.org/10.1109/ neuroevolutionary process so that diversity could be achieved. ICoAC48765.2019.246825 (2019). Then, the agents were deployed to discover damaging actions 21. Miller, B. L. & Shaw, M. J. Genetic algorithms with dynamic niche sharing for that could damage the mock space habitat. Because there are multimodal function optimization. Proceedings of IEEE International Conference on multiple agents, a variety of damaging actions were found. With Evolutionary Computation (IEEE, 1996). these vulnerabilities discovered, a system operator can then use 22. Chang, P.-C., Huang, W.-H. & Ting, C.-J. Dynamic diversity control in genetic algorithm for mining unsearched solution space in TSP problems. Expert Syst. this information to take proper action, thereby mitigating or Appl. 37, 1863–1878 (2010). preventing faults. 23. Simon, M. Whitmire, A., Otto, C. & Neubek, D. Factors impacting habitable volume requirements: results from the 2011 Habitable Volume Workshop. National Aeronautics and Space Administration (NASA), Center for Advanced Space Studies- DATA AVAILABILITY Universities Space Research Association (2011). The simulation data and experimental data are available from the corresponding 24. Caldwell, S. & Dunbar, B. National Aeronautics and Space Administration (NASA). author upon request. “7.0 Thermal Control.” Updated 4 November 2021. https://www.nasa.gov/ smallsat-institute/sst-soa/thermal-control (2021). 25. Indranil, R., Hafiychuk, V., & Goebel, K. Model-based diagnosis and prognosis of a CODE AVAILABILITY water recycling system. IEEE Aerospace Conference (2013). The design files, simulation models, and software that support this work are available 26. Gelernter, D. Mirror Worlds or the Day Software Puts the Universe in a Shoebox: from the corresponding author upon request. How Will It Happen and What It Will Mean. ISBN: 0195068122 (Oxford University Press, Inc., 1991). 27. International Business Machine Corporation (IBM). How does a digital twin work? Received: 27 June 2022; Accepted: 10 January 2023; https://www.ibm.com/topics/what-is-a-digital-twin. 28. Functional Mock-up Interface (FMI). https://fmi-standard.org/ (2022). 29. Kullback, S. & Leibler, R. A. On information and sufficiency. Ann. Math. Statist. 22, 79–86 (1951). REFERENCES 1. Beard, S. S. & Starzyk, J. Space tourism market study: orbital space travel & ACKNOWLEDGEMENTS destinations with suborbital space travel. 1–72 (Futron Corp., 2002). This work has been supported SBIR contract number 80NSSC-20-C-0129. The authors 2. Ballard, R. & Connolly, J. US/USSR joint research in space biology and medicine on thank the National Aeronautics and Space Administration (NASA), especially Cosmos biosatellites. FASEB J. 4,5–9 (1990). Dr Rodney Martin and Dr Craig Moore, for their financial and technical support. 3. UBS Investment Bank. Future of Space Tourism: Lifting Off? Or has COVID-19 The authors also thank Sierra Nevada Corporation (SNC) for their technical guidance stunted adoption? 20 July. https://www.ubs.com/global/en/investment-bank/in- when designing the mock space habitat and serving as a sub-contractor for this work. focus/2021/space-tourism.html (2021). Finally, the authors thank other personnel at Martin Defense Group who have 4. Antonsen, E. L. et al. Estimating medical risk in human spaceflight. npj Micro- provided technical feedback and guidance throughout the duration of this work, gravity 8, 8 (2022). 5. Tang, S. et al. Operation-aware ISHM for environmental control and life support in especially William Curran who was involved at the beginning of this work. deep space habitants. https://doi.org/10.2514/6.2018-1365 (2018). 6. Abid, A., Khan, M. T. & Iqbal, J. A review on fault detection and diagnosis tech- niques: basics and beyond. Artif. Intell. Rev. 54, 3639–3664 (2021). AUTHOR CONTRIBUTIONS 7. Daigle, M. et al. A comprehensive diagnosis methodology for complex hybrid M.O. helped build and instrument the mock space habitat, served as the principal systems. IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and investigator, and helped prepare this article. S.I. developed software for the ReLAA Humans (2010). and helped prepare this article. B.W. helped with the design, simulation, building, and 8. Jiang, J. & Yu, X. Fault-tolerant control systems: a comparative study between testing of the mock space habitat. A.M. helped with the data analysis, middleware active and passive approaches. Ann. Rev. Control 36,60–72 (2012). development, and execution of experiments. J.P. initiated and supervised this work. 9. Mustapha, S., Lu, Y., Ng, C.-T. & Malinowski, P. Sensor networks for structures health monitoring: placement, implementations, and challenges–areview. Vibration 4, 551–585 (2021). COMPETING INTERESTS 10. Colombano, S. et al. A system for fault management for NASA’s deep space habitat. International Conference on Environmental Systems (ICES) (2013). The authors declare no competing interests. npj Microgravity (2023) 15 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA M. Overlin et al. ADDITIONAL INFORMATION Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, Supplementary information The online version contains supplementary material adaptation, distribution and reproduction in any medium or format, as long as you give available at https://doi.org/10.1038/s41526-023-00252-9. appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party Correspondence and requests for materials should be addressed to Matthew Overlin material in this article are included in the article’s Creative Commons license, unless or Steven Iannucci. indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory Reprints and permission information is available at http://www.nature.com/ regulation or exceeds the permitted use, you will need to obtain permission directly reprints from the copyright holder. To view a copy of this license, visit http:// creativecommons.org/licenses/by/4.0/. Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. © The Author(s) 2023 Published in cooperation with the Biodesign Institute at Arizona State University, with the support of NASA npj Microgravity (2023) 15

Journal

npj MicrogravitySpringer Journals

Published: Feb 13, 2023

References