Over Saturated, Dense, Context-dependent: Cybersecurity and The Market for Personal Data

The development of information and communication technology in the modern economy has created opportunities for businesses to provide customised products, services and experiences to their clients. This customisation became possible thanks to the large volumes of (personal) data which customers generate on a day-to-day basis and which businesses collect, store and analyse. For many businesses, the future relies on their ability to process these data in order to accurately predict consumer preferences and create personalised products, services and experiences in the most cost-effective way. This task has become particularly important in the aftermath of COVID-19. Why? With face-to-face communication restricted, most brand-building, reputation-building and customer engagement activities critically depend on the ability of companies to understand a customer just by looking at that customer's data.

Yet, at the moment, even though the COVID pandemic has catalysed efforts in this direction, data-driven business models built on personalisation are still in their infancy: even companies with access to large amounts of data struggle to create reliable forecasts of future customer wants and needs and so to react quickly to changes in market trends. One of the most notable examples of forecasting inefficiency is the so-called recommendation system (available via major retailers), which is supposed to suggest what a customer might like to purchase in the future but which is, in fact, rarely highly regarded by customers. We have run many tests of customer satisfaction with suggestion/recommendation systems in my lab, and the average satisfaction rates reported by customers across major retail brands are between 13% and 28%. Furthermore, we do not see the development of effective markets for data in which consumers of goods and services (users) would trade their self-generated data with producers of goods and services (providers). This, in turn, inhibits the effective use of data as a service. At the same time, personal data is one of the major targets for cybercriminals and other types of cyber adversaries, who creatively navigate the gaps in the current data markets and make them work to their advantage (through tools like ransomware, for example).

Current Market for Data: Value and Worth

Let us first consider the current market for data. In this market, users supply data and providers demand data. For the purposes of this argument let us concentrate on user self-generated data, which may include personal data (data reflecting the behaviour of an individual user) or social data (data for a whole household, etc.). Providers are willing to pay the demand price for the data (some positive number, say PD, which is substantially greater than 0); this is how much the data is worth to providers. This price is relatively high because the data allows providers to offer better (more personalised) goods and services to users and to increase their own profitability, both by better understanding user demand for goods and services and by increasing user value. We define providers broadly: these could be companies which trade data, data analysts, app developers and providers of goods/services.

Users are willing to offer data at a supply price PS which they perceive as very low. In fact, it is often equal to 0 or very close to 0 (this is how much the data is worth to users). In practice, this price is not expressed in monetary terms, i.e., users do not directly receive any money from the providers. Instead, it reflects the “cost” of data to users in terms of, e.g., loss of privacy.

Abstracting from different types of data as well as from different ways in which the data is perceived by users and providers, the levels of PD and PS remain stable irrespective of the quality of the data as a service. The data as a service variable depicts how effectively available data can be converted into meaningful business models (provision mechanisms). In other words, it reflects the value of the data for providers and users on the market. Note that the more valuable the data is for users and providers, the more it is potentially worth to cyber adversaries.

However, let's imagine for a moment that the value of data can be the same for providers and users, for the following reason. If providers receive valuable data about user behaviour, they will be able to provide better (more personalised) goods and services to the users. Under these conditions, what is important is the quality of data. In other words, the higher the quality, the more valuable the data is. Data of higher quality, which produces better predictions of behaviour and leads to an increase in user wellbeing and provider profitability, should be valued higher by both sides of the market (users and providers). In practice, there is, of course, a lot of uncertainty as to the value of the data. Yet, this question requires a separate investigation, and for the purposes of this argument let's just put it aside.

One thing is clear: the current market for personal data is inefficient. Since the disparity between the supply and demand prices for data is very large, the data is not traded properly. In principle, providers are willing to pay PD to obtain the data, but users are offering the data at a very low price PS, which means that providers can either (a) obtain the data themselves at a very low (or even zero) price, in which case they receive a profit margin of PD–PS>0 (e.g., Google, Facebook, etc.); or (b) purchase the data from other providers (intermediaries) at PD, in which case the intermediaries (same old Google, Facebook, etc.) receive a profit margin of PD–PS>0. Note that the obtained/purchased data can be of low or high quality, as captured by the data as a service variable, and that the demand/supply price does not depend on it. The data could be highly saturated or dense, context-free or context-dependent, yet, at the moment, differentiating between these data types is extremely problematic, as the value of a particular data set depends almost entirely on the perceived quality of the insights which could be derived from it.
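The two sourcing routes above can be illustrated with a minimal Python sketch. The function name `margins`, the numeric prices and the reading that a provider who buys at PD keeps no margin on the data itself are assumptions made purely for illustration; the only relationship taken from the argument is the spread PD–PS>0, which does not depend on data quality.

```python
def margins(demand_price: float, supply_price: float, via_intermediary: bool) -> dict:
    """Split the spread PD - PS between a provider and an intermediary.

    Illustrative assumption: a provider that buys data from an intermediary
    at PD keeps no margin on the data itself; the intermediary keeps it all.
    """
    spread = demand_price - supply_price  # PD - PS, assumed > 0
    if via_intermediary:
        # Route (b): the intermediary sources from users at PS and sells at PD.
        return {"provider": 0.0, "intermediary": spread}
    # Route (a): the provider sources from users directly at PS.
    return {"provider": spread, "intermediary": 0.0}


PD = 10.0  # hypothetical demand price (what the data is worth to providers)
PS = 0.0   # hypothetical supply price (what the data is worth to users)

direct = margins(PD, PS, via_intermediary=False)
brokered = margins(PD, PS, via_intermediary=True)

# Data quality never enters the calculation, mirroring the point that
# PD and PS do not currently depend on the data as a service variable.
print("route (a):", direct)
print("route (b):", brokered)
```

In both routes the user's side of the market captures none of the spread, which is the inefficiency the paragraph describes.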

Future Markets for Data Ignoring Behavioural HDI

In recent years, various issues have been raised with regard to the supply price for data. Specifically, the development of new technologies has resulted in concerns about data ownership, data privacy and the inequality between users and providers in terms of how the profit from data usage is distributed. Under these circumstances, user perceptions of data markets have changed, giving rise to scepticism about the potential of trading data with providers. According to this view, providers in the future will still be willing to purchase data at a demand price PD. At the same time, the supply price PS for users will range from very low for less valuable data to high for more valuable data. Therefore, users will only trade data with providers at an equilibrium price PE at the intersection of the supply and demand price functions. Effectively, this means that in order to trade, users would need to provide data of high quality, exert a significant amount of effort to accumulate the data, and engage with providers. This creates serious objections to direct user-provider markets for data, since the potential logistical costs of users engaging with providers are very high and very few users would be able to engage in trading data. However, applying such a model of market relationships would not be correct, because it does not capture the complex human-data interactions within the digital economy.
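To make the intersection of the supply and demand price functions concrete, here is a minimal Python sketch. It assumes, purely for illustration, that the demand price PD is constant in data quality and that the supply price PS rises linearly with quality; the class `DataMarket`, its parameters and the numbers are hypothetical and are not taken from any empirical estimate.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataMarket:
    demand_price: float  # PD: assumed constant across quality levels
    supply_base: float   # PS for the least valuable data (assumed near 0)
    supply_slope: float  # assumed rise in PS per unit of data quality

    def supply_price(self, quality: float) -> float:
        """Hypothetical linear supply price PS(quality)."""
        return self.supply_base + self.supply_slope * quality

    def equilibrium(self) -> Tuple[float, float]:
        """Return (quality, PE) at which PS(quality) equals PD."""
        if self.supply_slope <= 0:
            raise ValueError("PS must rise with quality for an intersection to exist")
        quality = (self.demand_price - self.supply_base) / self.supply_slope
        return quality, self.supply_price(quality)


market = DataMarket(demand_price=10.0, supply_base=0.5, supply_slope=2.0)
q_star, p_e = market.equilibrium()
print(f"trade happens at quality {q_star} and price PE = {p_e}")
```

Under these assumptions trade only takes place at the quality level where PS has climbed all the way up to PD, which is why this view implies that users would have to supply high-quality data and invest considerable effort before any exchange occurs.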

Yet, if we imagine that data quality is not uniform and that there are types of data which are distinct from each other in quality (as measured by the potential insight which might be generated from the data), the picture becomes very different. This differentiation will have several important implications, not only for new business models but also for the research and practice of data collection visibility, data ownership structure, platform trade-offs and, ultimately, the security of personal (customer) data.

Current systems often collect data in ways which are subtle to users: many people do not realise that their supermarket or coffee shop club cards, smartphones or social media webpages constantly collect and accumulate their personal data. Even though providers seem to believe that users prefer subtle data collection to visible collection (judging, for example, from the caution around the deployment of Google Glass), it is not clear whether users actually prefer devices which collect their personal data in a subtle way to those which do it in a visible way. It is also not clear whether users are more concerned about the visibility of data collection or about the possibility that a device may be collecting information without the user's knowledge. Data quality differentiation allows us to study these issues systematically by eliciting user preferences over different types of data.

Since the supply of data is dependent on the technology, the ownership of the data often remains with the technology owner. For example, Internet search data trends are owned by large corporations (e.g., Google) and supermarket data are owned by large supermarkets (e.g., Tesco), and it is often difficult or even impossible for individual users to obtain their self-generated data. Furthermore, the data collection mechanism, structure, representation, storage and, therefore, the potential applicability of the data are dependent on the technology, i.e., how the data is collected affects how it can be used. Since such data often has a vertical structure, it is primarily beneficial to companies and not to individual users. However, it is not clear whether users would be interested in having access to their own data (should they be able to view their data in a different way through novel visualisation mechanisms) or would prefer to outsource data management and analysis to a third party, which would then present it in a meaningful way and communicate it to each user as a set of summary statistics or recommendations. Understanding these individual preferences is very important, and fields like behavioural data science or behavioural analytics can provide novel data ownership solutions through increased user participation in data markets. There are also many interesting ideas which go in that direction from the AI ethics field (e.g., the idea of bottom-up data trusts). Specifically, the main idea behind the bottom-up trusts is that "...Unlike the current ‘one size fits all’ approach to data governance, there should be a plurality of Trusts, allowing data subjects to choose a Trust that reflects their aspirations, and to switch Trusts when needed. Data Trusts may arise out of publicly or privately funded initiatives.
By potentially facilitating access to ‘pre-authorised’, aggregated data (consent would be negotiated on a collective basis, according to the terms of each Trust), [the] data Trust proposal may remove key obstacles to the realisation of the potential underlying large datasets..."


Data security is closely related to the idea of data value. Currently, the value of personal data is judged subjectively and ultimately depends only on a company's or an individual's perceptions of this value. Yet, if the value of data were judged on the level and quality of insight derived from it, this differentiation would allow us to (i) better prioritise important versus less critical data assets within businesses; (ii) more effectively organise data collection, storage, transfer and analytics; and (iii) better protect the data from adversaries, who currently thrive on our cluelessness and lack of a systematic approach as far as data are concerned.

#businessmodels #cybersecurity #cyberrisks #cyberthreats #datasecurity #cyberattack #dataprotection #risk #infosec #security #reputation #dataownership #informationsecurity