How to Build Self-Healing Cybersecurity Systems

Much of the current cybersecurity effort in organizations is concentrated on building robustness rather than resilience. The root of the problem is often that businesses treat cybersecurity as a purely "technical" or "technological" problem, one that can be "fixed" either by automation or by separating and limiting human action and decision making within such systems. Think, for example, about the so-called "zero trust" systems, whose purpose is to minimize human impact within cybersecurity systems. The problem, however, is that by minimizing human impact, such systems also tend to minimize human understanding of the cybersecurity domain. And if people do not understand the risks they encounter in cyberspace, or if they are actively discouraged from thinking about those risks, their risk taking increases, which ultimately has a detrimental effect on organizational cybersecurity. Since no automated or technologically empowered protection mechanism can achieve 100% security, too much reliance on such mechanisms might be just as bad as having no defense systems at all.

Why Human-out-of-the-loop Systems Are a Bad Idea

When thinking about human-risk interactions, much depends on how humans perceive risk, as people will always behave not based on the actual (objective) risk, but on their subjective understanding of what that risk might be. For example, even though the probability of becoming a victim of a phishing attack is high, if your staff and customers believe that your security filters fully protect them, they might be particularly risk-seeking when deciding whether to click on email-embedded links. Equally, if your staff and customers overestimate the risk of a phishing attack, they will miss important messages delivered by email out of fear of clicking "on the wrong thing".

Many of these issues originate in the desire of businesses to construct "silver bullet" detection tools that would work every time. Yet cybersecurity threats are often customized to infiltrate particular organizations, are highly context-dependent, and often rely on social engineering. This means that such "silver bullet" solutions simply do not exist, so "business logic" contradicts reality where cybersecurity is concerned. Business logic, which represents the highest level of system activity (i.e., balancing business interests, customer value, trading, and commercial behavior), is perhaps the hardest layer to understand when trying to map the paths through the business that lead to vulnerabilities in that logic and, ultimately, to cybersecurity disasters. After all, it is highly likely that, after all the business-driven effort to minimize risk via internal investment, testing, and monitoring, external security experts, members of the public, or routine investigations into the organization will find things that were missed.

You cannot know for certain whether a system is secure, though there are mathematical methods that can prove that a set of rules is secure. A set of cryptographic rules, then, can be used to determine whether an encryption scheme is secure. Even a scheme that is secure in theory, however, may still be vulnerable from a cybersecurity practitioner's perspective, and practitioners must decide whether the level of security is sufficient. If a system is broken, then clearly it is insecure; but if it is functioning, then it is secure to some degree that may or may not be acceptable. Another complication is that the harm from a security breach can be delayed. For example, hackers might quietly exploit a zero-day vulnerability, or they might break into the system and grab important data but hold on to it until a lucrative opportunity comes along (for example, instead of launching an immediate ransomware attack, they might wait until the date of your annual report or demand ransom on the verge of a merger or acquisition deal).

Self-healing Cybersecurity

Clearly, self-healing cybersecurity systems would be impossible to achieve using the human-out-of-the-loop approach, as automation, no matter how well it is designed, always relies on previous data about (i) the types of threats that exist; (ii) their relative or absolute frequency; and (iii) the likelihood of harm associated with each threat type. Yet such data is often scarce or even non-existent. Therefore, the future of cybersecurity management lies in building human-in-the-loop systems by adjusting organizational KPIs to include cybersecurity targets and by integrating all the threat and vulnerability information from:

  • intelligence sources

  • vendor communities

  • academic communities

  • user communities

  • hacker communities

In the first instance, there is automation of detection for protection, such as firewall rule updates. Then there are more complex issues, such as a code library vulnerability that needs fixing through vendor and/or internal and external actions. These libraries are subsequently updated to ensure that developers are using improved code. This represents a move towards the concept of self-healing systems, which may include using ML techniques to fix applications based on inputs from threat sources, information gathered from human interactions, and other models that treat business logic as rule-based logic. The aim is to develop insights and prevent common errors in the design of coding libraries.
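To make the split between simple automation and human judgment concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names ThreatReport, TriageResult, and triage, the "network"/"library" categories, the confidence field, and the 0.9 threshold are all hypothetical and do not come from any real product or feed format. High-confidence network indicators go straight to an automatic block list (the "firewall rule update" case), while everything else, including library vulnerabilities, is queued for people to look at.

```python
from dataclasses import dataclass, field

# Source labels mirror the five communities listed earlier in this article.
SOURCES = {"intelligence", "vendor", "academic", "user", "hacker"}


@dataclass(frozen=True)
class ThreatReport:
    """One piece of threat or vulnerability information (hypothetical shape)."""
    indicator: str     # e.g. an IP address or the name of a vulnerable library
    kind: str          # "network" or "library" (illustrative categories)
    source: str        # one of SOURCES
    confidence: float  # 0.0 to 1.0, assumed to be supplied by the source


@dataclass
class TriageResult:
    auto_block: set = field(default_factory=set)       # safe to automate
    human_review: list = field(default_factory=list)   # needs a person


def triage(reports, auto_threshold=0.9):
    """Route reports: automate the simple cases, escalate the rest to humans."""
    result = TriageResult()
    for report in reports:
        if report.source not in SOURCES:
            raise ValueError(f"unknown source: {report.source}")
        # Only high-confidence network indicators are blocked automatically.
        if report.kind == "network" and report.confidence >= auto_threshold:
            result.auto_block.add(report.indicator)
        else:
            # Library issues and uncertain reports need vendor or internal
            # action, so they are never handled without a human.
            result.human_review.append(report)
    return result
```

The design choice worth noting is that the automated path is deliberately narrow: anything the data cannot settle with confidence stays visible to people instead of being silently dropped or silently blocked.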

The Importance of Leveraging Domain Knowledge

In complex systems where you cannot anticipate every interaction, the use of subject experts in their field, as well as non-experts with experience, can enable a broader evaluation of this complexity, similar to the concepts of crowdsourcing ideas and solutions. A key part of this is domain knowledge experts, who can facilitate an effective dialog between experts, users, and the business, filtering and taking on board a variety of ideas.

Any application that is compromised could lead to catastrophic financial losses. The immediate risk approach would be to ask how secure the network is and whether to use a VPN, where the APIs are, how we can safeguard them with HTTPS and secure API protocols, and so on. But just fixing the API may be insufficient from the viewpoint of "complete security" or, let's just call it, "good enough security". The question can be reframed from "How do we secure APIs?" to "How can we run insecure APIs?" This assumes that even though we are running a pretty secure ("good enough") system, there may still be vulnerabilities and attacks that can cause harm. The question then becomes: how can the system be managed to respond to these threats? This typically generates many more ideas, potential solutions, and what-if scenarios than a focus on "fixing" or "patching the problem with tech". This approach can also introduce other levels and layers of security to improve the resilience of the system overall.
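One way to picture "running an insecure API" is a thin guard that assumes the handler behind it may be vulnerable. The Python sketch below is illustrative only: GuardedEndpoint, is_suspicious, max_strikes, and the status codes returned are hypothetical choices, not a reference to any real framework. The guard rejects suspicious requests, takes the endpoint out of service after repeated hits, keeps the evidence, and only comes back online when a named person has reviewed the incidents.

```python
class GuardedEndpoint:
    """Wraps a handler that is assumed to be possibly vulnerable."""

    def __init__(self, handler, is_suspicious, max_strikes=3):
        self.handler = handler              # the (possibly insecure) API logic
        self.is_suspicious = is_suspicious  # detection rule supplied by the caller
        self.max_strikes = max_strikes      # suspicious hits tolerated before quarantine
        self.strikes = 0
        self.quarantined = False
        self.incidents = []                 # evidence kept for human analysts

    def call(self, request):
        if self.quarantined:
            # Degrade gracefully instead of exposing the weak handler.
            return {"status": 503, "body": "temporarily unavailable"}
        if self.is_suspicious(request):
            self.strikes += 1
            self.incidents.append(request)
            if self.strikes >= self.max_strikes:
                self.quarantined = True
            return {"status": 400, "body": "rejected"}
        return {"status": 200, "body": self.handler(request)}

    def restore(self, approved_by):
        """A human reviews the incidents and puts the endpoint back in service."""
        if not approved_by:
            raise ValueError("a named reviewer is required")
        self.quarantined = False
        self.strikes = 0
        reviewed, self.incidents = self.incidents, []
        return reviewed


# Example usage with a toy handler and a toy detection rule.
endpoint = GuardedEndpoint(
    handler=lambda req: req.upper(),
    is_suspicious=lambda req: "DROP TABLE" in req,
    max_strikes=2,
)
endpoint.call("hello")
endpoint.call("a; DROP TABLE users")
endpoint.call("b; DROP TABLE users")   # second strike quarantines the endpoint
endpoint.restore(approved_by="analyst on duty")
```

The guard does not make the API secure. It adds a layer that limits harm, preserves what analysts need to learn from the attack, and makes recovery an explicit human decision.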


Making a system architecture that is robust enough to withstand certain failures is one thing; a better approach would be to make the system more flexible, such that it can adjust itself, resolve issues, and recover from attacks. Instead of patching everything, we should have more responsive support systems that can investigate and fix attacks in a flexible way, adapting and learning all the time. This might also enable better investment decisions regarding cybersecurity tools and processes, which may not necessarily be just technical but may also involve organizational awareness, culture, and leadership. One thing is clear: tech solutions will never be able to fully replace the human touch. Hence, self-healing cybersecurity systems are only possible when humans are part of the solution.

#cybersecurity #humanfactor #cyberattack #cyberthreat #cyberrisk #infosec #cybermindset #cyberculture #resilience #robustness


© 2020 by Ganna Pogrebna and Boris Taratine