Sunday 2 April 2023

To what extent do you trust the robots?

This Sunday morning, fueled by two strong coffees, I'm cogitating on the issue of workers thoughtlessly disclosing all manner of sensitive personal or proprietary information in their queries to AI/ML/LLM systems and services run by third parties, such as ChatGPT.

This is clearly topical given:
(1) the deluge of publicity and chatter around ChatGPT right now, coupled with 
(2) our natural human curiosity to explore new tech toys, plus 
(3) limited appreciation of the associated information risks, and 
(4) the rarity of controls such as policies and Data Leakage Protection technologies. 

Furthermore, even if we do persuade our colleagues (and, let's be honest, ourselves!) to be more careful and circumspect about whatever we type or paste into various online systems, the general nature of our interests and queries often remains sensitive in itself.
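
By way of illustration only, here is the kind of crude, client-side redaction a DLP-style control might apply before anything is pasted or sent to a third-party service. It is a sketch, not a recommendation: the regex patterns and the 'Project' codename convention are invented for the example, and real DLP products do far more (classification, fingerprinting, context awareness).

    # Toy redaction filter - the patterns and codename convention are purely
    # illustrative, not a real DLP ruleset.
    import re

    PATTERNS = {
        "EMAIL":   re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
        "CARD":    re.compile(r"\b(?:\d[ -]?){13,16}\b"),
        "PROJECT": re.compile(r"\bProject [A-Z][a-z]+\b"),   # assumed internal codename style
    }

    def redact(text):
        # Replace anything matching a sensitive pattern before it leaves the building
        for label, pattern in PATTERNS.items():
            text = pattern.sub("[" + label + " REDACTED]", text)
        return text

    print(redact("Summarise the Project Falcon incident for jo.bloggs@example.com"))
    # -> Summarise the [PROJECT REDACTED] incident for [EMAIL REDACTED]

Crude pattern-matching like this obviously misses plenty, which is rather the point of the next paragraph.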

For instance, imagine that ChatGPT's owners/admins/systems notice a peak in queries relating to malware from various workers at a given company: what does that suggest? Likewise with searches on Google, lookups on Wikipedia and so on. What if recent queries/searches are heavy on, say, 'responding to incidents' and 'paying or not paying ransoms' - what does that imply?

There is significant potential here for 'traffic analysis' and other surveillance on a grand scale, even if the actual data is innocuous (which it often isn't). That makes meta-data about our queries and searches both sensitive and highly valuable.
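
To make that concrete, here is a toy sketch of the sort of aggregation an operator - or anyone else watching the traffic - could run over nothing more than query text and its origin. The topics, keywords and sample queries are all made up; the pattern they reveal is the point.

    # Toy traffic-analysis sketch: tally sensitive topics across one
    # organisation's queries for a week. Topics, keywords and queries are
    # invented for illustration.
    from collections import Counter

    TOPICS = {
        "incident": ["incident response", "forensics", "containment"],
        "ransom":   ["ransom", "decryptor", "negotiat"],
        "malware":  ["malware", "trojan", "sandbox"],
    }

    def profile(queries):
        tally = Counter()
        for q in queries:
            q = q.lower()
            for topic, keywords in TOPICS.items():
                if any(k in q for k in keywords):
                    tally[topic] += 1
        return tally

    week = [
        "how to negotiate a ransomware demand",
        "incident response retainer providers",
        "should we pay the ransom",
        "malware sandbox for a suspicious attachment",
    ]
    print(profile(week))   # Counter({'ransom': 2, 'incident': 1, 'malware': 1})

Nothing in those queries is confidential in itself, yet the tally alone strongly suggests a company in the middle of a ransomware incident.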

Odd, then, that I don't recall seeing this information risk widely discussed. Perhaps I have been sound asleep at the keyboard for the decades since AltaVista (anyone else remember that?).

------------------------------------------

Out of interest, I pasted the above text verbatim into ChatGPT and asked the robot to 'rephrase it for an information risk and security blog'. This is what it produced at its third attempt:


The Hidden Risk of Sensitive Information Disclosure in AI/ML/LLM Systems

Introduction

This Sunday morning, fueled by two strong coffees, I'm diving into the issue of workers inadvertently disclosing sensitive personal or proprietary information in their queries to AI/ML/LLM systems and services run by third parties, such as ChatGPT. This topic is crucial, given the widespread adoption of AI technologies and the potential risk to both individuals and organizations.

The Growing Concern

The rapid increase in popularity of AI systems like ChatGPT, driven by human curiosity and the desire to explore new technologies, has led to a surge in queries containing sensitive information. This phenomenon can be attributed to a lack of awareness about the associated information risks and the scarcity of controls such as policies and Data Leakage Protection (DLP) technologies.

The Subtleties of Information Disclosure

Even if we successfully encourage our colleagues (and ourselves) to be more cautious about the information we input into these online systems, the general nature of our interests and queries can still be sensitive. For example, if the operators of ChatGPT notice an uptick in malware-related queries from employees at a particular company, it could indicate potential issues. This applies to Google searches and Wikipedia lookups as well.

Traffic Analysis and Metadata Risks

This situation opens the door for traffic analysis and other forms of large-scale surveillance. Even if the actual data seems innocuous, metadata about our queries and searches can be both sensitive and highly valuable. The potential for abuse of this information has not been widely discussed, which is surprising considering the long history of search engine usage, dating back to AltaVista.

Addressing the Issue

  1. Raising Awareness: Organizations must educate employees about the risks associated with sharing sensitive information on third-party platforms. This includes creating a strong security culture that emphasizes the importance of data protection.

  2. Implementing Controls: Companies should establish policies and controls to prevent inadvertent data leakage. This may include using DLP technologies, monitoring data usage patterns, and restricting access to certain platforms.

  3. Encouraging Privacy: Users should be encouraged to use private browsing modes, virtual private networks (VPNs), and other privacy-enhancing tools to limit the exposure of sensitive metadata.

  4. Engaging with AI Providers: Companies should collaborate with AI service providers to address data privacy concerns and work towards creating secure AI systems that protect users' information.

Conclusion

As AI systems become more prevalent, it is essential to address the information risk posed by the thoughtless disclosure of sensitive data. By raising awareness, implementing controls, encouraging privacy, and engaging with AI providers, we can protect sensitive information and ensure the responsible use of these powerful technologies.



Hmmm, not too bad that, for a soulless box of electronics. It's weak on the security controls, although it has come up with an approach that hadn't even occurred to me ('4. Engaging with AI Providers ...'). It has vaguely misinterpreted my intended use of 'metadata' - perhaps my fault! Keywords such as 'crucial', 'must' and 'essential' catch my beady eye too - a stylistic issue as much as the robot's over-emphasis (which may well be due to my explicitly asking it to write a blog piece). In short, I feel like a uni tutor critiquing an undergrad's term paper.
