AI-assisted reporting: Why journalists must relearn data security and privacy risks

Over the past few years, there has been a significant surge in the adoption of AI by Nigerian journalists. Like their colleagues elsewhere in the Global South, they have moved from unfamiliarity with AI to full adoption of AI-assisted reporting within a very short period.

A Thomson Reuters Foundation survey, sampling over 200 journalists from 70 countries in the Global South, including Nigeria, found that at least eight in every ten journalists use AI in their workflow. Half of them use the tools daily, for everything from research and brainstorming to transcription and fact-checking.

This surge, however, has been largely a grassroots movement. Most journalists using AI are self-taught, experimenting with a vast ecosystem of free, unvetted, and unsanctioned AI tools. While this demonstrates incredible initiative, it also reveals a significant technical knowledge gap and a risky phenomenon of shadow AI, the use of AI tools outside any institutional vetting or oversight, in the journalism ecosystem.

 A recent study from Northumbria University revealed a critical vulnerability: journalists often possess a “less-than-ideal mental model” of the privacy policies and data-handling practices of the AI companies whose tools they use on a daily basis.

This technical oversight, although less prominent in the conversation about AI in journalism, is a direct threat to the foundational principles of data security and privacy in journalism practice. Journalists are unknowingly introducing an unvetted, insecure, and highly absorbent collaborator into sensitive reporting workflows, creating risks that extend far beyond the individual journalist to their newsroom, sources, legal standing, and the very integrity of the information they produce.

The Unvetted Collaborator in the AI-Assisted Journalism Workflow

Traditionally, data security in journalism involved a clear set of practices: securing notes, protecting leads and sources, using secure communication channels, and maintaining confidentiality from sourcing to publication. Introducing AI, particularly tools powered by Large Language Models (LLMs), fundamentally disrupts this framework. 

Many journalists, through no fault of their own, are not aware that once information is entered into a public LLM, its confidentiality can no longer be guaranteed. The business model of many AI companies relies on using users' prompts, data, and conversations to further train their models.

Used this way, these AI tools cease to be passive instruments and become independent, active participants in the reporting process: sometimes as a research assistant working on raw ideas and information from privileged documents and spreadsheets, sometimes as a transcription specialist handling unfiltered information from primary sources, and sometimes as a first-draft editor reworking critical information compiled at the nascent stage of a report.

But unlike a human colleague bound by ethical obligations, these free, public AI tools are data sponges, trained to absorb any and all information fed into them for further improvement. The sensitive details of your investigation risk being absorbed into the model, potentially surfacing later in response to another user’s query.

The gravity of this risk is best understood through a practical, real-world scenario.

Simulating an AI-Assisted Investigation Workflow

Let’s trace the workflow of a freelance investigative reporter who has received a cache of leaked documents exposing a high-profile human trafficking syndicate.

The documents form a large file of haphazardly compiled structured and unstructured information: email chains, chat transcripts, receipts, transaction evidence, and so on. To make sense of it, the reporter pastes the entire raw text into a public AI tool, asking it to summarise the contents and extract key entities. He then uses the same AI tool to research and background his preliminary findings from the summary before deciding on an investigation plan.

Following leads from the AI’s summary, the reporter conducts revealing interviews with victims and witnesses. To save time, the audio files are uploaded to an AI transcription service.

To find corroborating evidence, the reporter uses an AI-powered tool to scrape social media conversations and public directories related to the suspects. The tool gathers names, usernames, posts, comments, etc., which the reporter then asks the AI to analyse for sentiment and connections.

Afterwards, the reporter compiles a first draft and uses an AI writing assistant to refine the language, check for clarity, and strengthen the narrative angle before sending it to a human editor.

In this standard AI-assisted journalism workflow, the journalist has handed over, at no fewer than four distinct points, sensitive information and critical components of their investigation to systems with no enforceable duty of confidentiality.

Analysing the vulnerability points and risks in the workflow

Right from the moment the documents’ contents are submitted for a quick summary, they are no longer confidential. The AI now possesses the entire unfiltered leak. It “knows” the names of perpetrators and victims, operational details, and financial trails before the investigation has even truly begun.
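
One mitigation worth noting at this point: the same entity-extraction value can often be had without handing the leak to a third party at all, by running the analysis locally. Below is a minimal sketch, assuming the open-source spaCy library and its small English model are installed; the file name is hypothetical.

```python
# pip install spacy && python -m spacy download en_core_web_sm
# Everything below runs on the reporter's own machine; nothing leaves the laptop.
import spacy

nlp = spacy.load("en_core_web_sm")  # small local English model

with open("leak_cache.txt", encoding="utf-8") as f:  # hypothetical local file
    doc = nlp(f.read())

# Pull out the entity types an investigation typically cares about.
for ent in doc.ents:
    if ent.label_ in {"PERSON", "ORG", "GPE", "MONEY"}:
        print(f"{ent.label_}: {ent.text}")
```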

A very similar exposure occurs during the transcription task, where audio containing unfiltered, emotional, and personally identifiable information (PII), such as names, locations, and ages, is fed into an AI model. All this raw intelligence from primary sources is now stored on a third-party server and will likely be used to train the LLMs to answer other users’ queries better.
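
A local alternative exists for transcription too. Below is a minimal sketch using the open-source Whisper model, which transcribes audio entirely offline once the model weights are downloaded; the audio file name is hypothetical, and ffmpeg must be installed.

```python
# pip install openai-whisper  (requires ffmpeg; weights download once, then run offline)
import whisper

model = whisper.load_model("base")  # cached locally after the first download

# Transcription happens on the reporter's own hardware;
# the interview audio never touches a third-party server.
result = model.transcribe("victim_interview.mp3")  # hypothetical file
print(result["text"])
```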

AI-powered scraping and analysis opens another realm of data security risks and privacy concerns. Web scraping, whose legitimacy is still debated in some instances, collects varying levels of quantitative and qualitative data, from negligible datasets to high-risk corpora of sensitive information.

The simple act of scraping a webpage or social media posts and comments could make the reporter a data controller, subject to national data protection laws. Although the Nigeria Data Protection Act (NDPA) grants journalistic data processing a public-interest exemption, adding AI tools to the mix puts that exemption in a tricky position. The tool not only automates data processing, which may require special permissions, but also leaves the journalist unable to fully guarantee the data protection principles of purpose limitation and storage limitation. These principles require that collected data be processed only for the identified purpose and stored only for as long as needed.
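
Where scraping is genuinely necessary, the purpose- and storage-limitation principles can be built into the pipeline itself. Below is a minimal sketch of pseudonymising scraped records and flagging those past a retention window; the field names, salt handling, and 30-day window are illustrative assumptions.

```python
import hashlib
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30  # illustrative storage-limitation window
SALT = b"per-project-secret"  # hypothetical; keep it separate from the dataset

def pseudonymise(record: dict) -> dict:
    """Swap direct identifiers for salted hashes before any AI analysis."""
    digest = hashlib.sha256(SALT + record["username"].encode()).hexdigest()[:12]
    return {
        "user": digest,                          # pseudonym, not the real handle
        "text": record["text"],                  # the content the analysis needs
        "collected_at": record["collected_at"],  # ISO-8601 UTC timestamp
    }

def expired(record: dict) -> bool:
    """Flag records past the retention window so they can be deleted."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    return datetime.fromisoformat(record["collected_at"]) < cutoff

sample = {"username": "@suspect_handle", "text": "…", "collected_at": "2025-01-01T00:00:00+00:00"}
print(pseudonymise(sample), expired(sample))
```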

The last vulnerability point is in refining the first draft with AI. A human editor ideally provides a critical security check, identifying details that might inadvertently expose a source or place the reporter in danger. An AI is optimised for clarity and flow, not for the security of its user. By feeding it the draft, the reporter has revealed the entire story arc, the specific evidence being used, and the final angle of the investigation to the AI model before publication.
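
If a draft must pass through an AI assistant at all, scrubbing direct identifiers first at least limits the exposure. Below is a minimal sketch; the patterns and the protected-terms list are illustrative assumptions, not a complete PII filter.

```python
import re

# Hypothetical identifiers the reporter knows must never leave the newsroom.
PROTECTED_TERMS = ["Adaeze Okafor", "Warehouse 7, Apapa"]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s()./-]{7,}\d")

def redact(draft: str) -> str:
    """Mask direct identifiers before the text reaches a third-party model."""
    draft = EMAIL.sub("[EMAIL]", draft)
    draft = PHONE.sub("[PHONE]", draft)
    for term in PROTECTED_TERMS:
        draft = draft.replace(term, "[REDACTED]")
    return draft

print(redact("Reach Adaeze Okafor at tip@example.com or +234 800 000 0000."))
```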

More Security Risks to Journalists

The security risks are not limited to the data of a single story; they extend to the journalist, too. AI tools are designed to learn from user interactions to customise the experience. With consistent use of an AI tool for journalistic work, the tool builds a detailed profile of you in order to serve you better. It learns that you are an investigative journalist, which topics you specialise in (e.g., political corruption, security sector abuses), the regions you focus on, and even your style of inquiry. This is a real security risk.

Every prompt, communicating your core ideas, methodologies, and lines of investigation to an AI assistant, creates a detailed digital footprint of your work. This is accessible to the AI company and potentially to others through legal requests or data breaches.

AI companies are not known for championing privacy principles, nor are they immune to security breaches. Even advanced, high-profile products like Grok and Meta AI have recently had users’ private chats exposed to the public. OpenAI has also faced several breach incidents.

OpenAI CEO Sam Altman’s recent warning that ChatGPT conversations lack legal confidentiality is a timely reminder: a government wanting to monitor an investigative journalist would not need a complex cyberattack (though that option remains); it could simply issue a legal request to the AI company for the journalist’s entire chat history.

In an era of digital surveillance and an environment where a journalist’s work can place them at risk of targeted attacks and forced disappearance, these represent a significant personal security threat.

These valid concerns sit alongside the growing cybersecurity threat surrounding AI use itself. Cyberattacks targeting AI users keep growing by the day, especially with the advent of AI agents: from model poisoning to prompt injection attacks and outright agent hijacking.

Way forward for more security in AI-assisted reporting

The solution to these security and privacy risks is clearly not to abandon AI tools in reporting tasks, but to approach them with a new level of critical awareness and security-consciousness.

Journalists need to stop adopting just any AI tool without caution, and to continuously relearn the data security and privacy risks involved in every AI tool they bring into their workflow, whether individually or as an organisation. They should also be armed with the knowledge of how to mitigate those risks.

Every tool should be screened to ensure it offers a reasonable level of security, and every task within a reporting workflow should be assessed for security impact before an AI assistant is brought into it.
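
What that assessment might look like in practice can be sketched very simply. The signal list below is an illustrative assumption, not a formal standard; the point is that the screening question gets asked before any sensitive material reaches a public tool.

```python
# A minimal triage sketch: deciding whether a reporting task is safe to hand
# to a public AI tool. Real screening still needs human judgement.

HIGH_RISK_SIGNALS = {
    "leaked document",
    "source identity",
    "victim testimony",
    "unpublished evidence",
}

def safe_for_public_ai(task_description: str) -> bool:
    """Return False if the task description mentions any high-risk material."""
    text = task_description.lower()
    return not any(signal in text for signal in HIGH_RISK_SIGNALS)

for task in ("Summarise leaked document cache", "Brainstorm headline ideas"):
    verdict = "public AI acceptable" if safe_for_public_ai(task) else "local/vetted tools only"
    print(f"{task}: {verdict}")
```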

Journalists and newsrooms also need to refer to the foundations of digital security to protect themselves against digital surveillance brought about by the use of digital and AI reporting tools.

Ultimately, we must view data security in AI-assisted reporting not as a technical chore but as a core journalistic ethic, essential for protecting our sources, our credibility, and the integrity of our work.
