Expert Tips for Performing Data Journalism with Minimum Risks

Publishers, responding to readers tired of negative news, plan to embrace explanatory content in 2024. However, this type of content requires large amounts of data, and reporters face challenges while mining it online, such as the risk of being blocked by websites or having their identity exposed.

William Belov is the CEO of Infatica, a global proxy network. According to him, proxy servers can play a significant role in solving these issues, helping journalists securely navigate the world of information for the sake of their readers.

2023 is over, and media organizations are building plans for the new year. A recent survey conducted by the Reuters Institute among publishers showed that readers are tired of negative news, and most of the respondents plan to counter this with more explainer content (94%), which helps their audience better understand global events.

One of the best examples of such content is data journalism, a form of investigative journalism dedicated to discovering and sharing the stories hidden in data sets. It emerged in the late 2000s and gained significant attention in 2013 when the UK online media outlet The Guardian published an article about American computer intelligence consultant Edward Snowden, who had leaked classified files from the country’s National Security Agency, Belov notes. The contents of those documents could be difficult for many readers to understand. However, reporters made an effort to ensure the story was clear by adding graphs and charts to the text.

Since then, The Guardian and many other publications worldwide have published countless stories following that approach. Their work combines traditional journalistic techniques of narrative and reportage with visuals such as charts, animations, and infographics to help their audience grasp the story.

Keep a Low Profile

Data-driven journalism is a growing field. However, reporters often face difficulties while mining data. First of all, their investigations require a large number of searches, Belov highlights. If reporters run those searches from personal devices, such as laptops or smartphones, they risk exposing their residential Internet Protocol (IP) address. That, in turn, can reveal a user’s approximate location, such as the country and city.

IPs can be used by companies or other organizations to track down reporters, as nearly happened with staff members from BuzzFeed News and the Financial Times in late 2022. ByteDance, the Chinese internet company behind the popular social network TikTok, had tried unsuccessfully to identify its own employees who had shared internal company documents with the reporters. Data connected to IPs can be especially problematic for media representatives who deal with sensitive topics, such as in-depth profiles on the business of government, and who are trying to keep a low profile while collecting data.

To mitigate these risks, journalists often use the well-known Tor Browser. According to Belov, it allows users to surf the internet anonymously by routing traffic through a series of volunteer-operated servers, making it difficult to trace the user’s identity or location.

Reporters also tend to use VPN servers, which create a secure, encrypted connection between the user’s device and a server. This masks the user’s IP address from the websites they visit and encrypts the data traveling between the device and the VPN server. In addition to aiding in data mining, both VPN servers and Tor can give a reporter access to geo-specific content. Some websites are only accessible within specific regions, and these technologies help bypass such restrictions.

Avoid Red Flags

A high volume of searches from a single residential IP can not only expose a reporter but also raise a red flag for a search engine, potentially leading to the user being flagged or blocked entirely, Belov warns. This can make it harder to gather data and develop a story.

Furthermore, collecting information from online sources often requires web scraping, that is, extracting data from a website with dedicated scraping software. Some information-gathering tools are legal within certain limits. However, websites employ security systems that can detect the activity of scraper bots, recognize their IP addresses, and either block them or feed them false information.

Nevertheless, there are ways to address these challenges, too, Belov says. He suggests using residential IP proxies, which obscure a user’s actual IP address by routing traffic through an intermediary server. The website sees the intermediary’s residential IP address instead of the original one, which creates the impression that it is being accessed by a different person.
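As a rough illustration of routing traffic through an intermediary, the Python sketch below builds an HTTP client that sends its requests via a proxy. It uses only the standard library. The proxy address, username, and password are hypothetical placeholders, not details from Belov or Infatica; a real provider would supply its own endpoint and credentials.

```python
import urllib.request


def build_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Hypothetical endpoint and credentials, used here only for illustration.
PROXY_URL = "http://user:password@proxy.example.com:8080"

opener = build_proxy_opener(PROXY_URL)

# With a working proxy, a call such as
#     opener.open("https://example.com", timeout=10)
# would fetch the page through the intermediary, so the website would see
# the proxy's IP address rather than the reporter's own.
```

The request itself is left as a comment because the placeholder proxy does not exist.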

According to Belov, some providers offer proxy rotation, which hides a user’s real IP address behind a pool of addresses. The addresses switch at regular intervals to make it appear that searches are originating from multiple sources. Rotation is essential when dealing with search engines, some of which allow only a specific number of searches per minute from the same IP. If that limit is exceeded, the system suspects malicious activity and blocks the address.
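The idea of rotating through a pool while staying under a per-address limit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not how any particular provider implements rotation: it switches address on every request rather than at timed intervals, and the class name, proxy addresses, and limit of 10 requests per 60 seconds are all hypothetical.

```python
import itertools
import time
from collections import defaultdict, deque


class RotatingProxyPool:
    """Round-robin over proxies, capping uses per proxy within a time window."""

    def __init__(self, proxies, max_per_window, window_seconds=60.0,
                 clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._max = max_per_window
        self._window = window_seconds
        self._clock = clock
        self._history = defaultdict(deque)  # proxy -> timestamps of recent uses

    def next_proxy(self):
        """Return the next proxy under its limit, or None if all are at the cap."""
        now = self._clock()
        for _ in range(len(self._proxies)):
            proxy = next(self._cycle)
            used = self._history[proxy]
            # Forget uses that have aged out of the window.
            while used and now - used[0] >= self._window:
                used.popleft()
            if len(used) < self._max:
                used.append(now)
                return proxy
        return None


# Hypothetical pool; the addresses come from a range reserved for documentation.
pool = RotatingProxyPool(
    ["http://203.0.113.10:8080", "http://203.0.113.11:8080"],
    max_per_window=10,
)
first = pool.next_proxy()   # first address in the pool
second = pool.next_proxy()  # rotation moves on to the second address
```

When `next_proxy()` returns `None`, every address has reached its limit, and the caller would need to wait before sending more searches.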

Despite their critical role, proxy servers have limitations, Belov admits. Certain devices and applications may not be compatible with proxy configurations, and proxy servers can become overloaded. Thus, he advises journalists to choose reputable proxy providers and combine proxies with other security measures for enhanced protection.

Spencer Hulse is the Editorial Director at Grit Daily. He is responsible for overseeing other editors and writers, day-to-day operations, and covering breaking news.
