In November 2024, ANI led Indian publishers in opening a legal battle against OpenAI over alleged copyright infringement. Many other publishers worldwide have raised similar concerns. The concern is not unjustified. Publishers invest considerable effort and resources in creating content, and they earn from the engagement it generates.

After Google revolutionised digital advertising, millions of publishers emerged worldwide, creating their own content spaces and serving ads to earn a slice of the pie. BuiltWith estimates that over 40 million websites use Google AdSense.
Publishers' content was being used long before AI chatbots appeared, ChatGPT in particular (which is why OpenAI is a party to all these legal battles). However, there were two differences: the content was properly cited and referenced, either through a hyperlink or an embed, and only a limited portion of it was used. In both cases, the revenue potential of the content did not diminish.
Now, with AI apps being trained, a great deal of content is being scraped with the sole objective of collecting vast amounts of data to train the models that power these apps. This scraping is not restricted to unknown or little-known apps; it is happening everywhere. Even LinkedIn was found to be using its users' data for model training.
The legal battle is on, and we don't know how long it will take. By the time a decision is reached, many AI apps may no longer need to scrape, as their training will have matured and achieved the desired accuracy. So, what is the alternative for publishers? Should they simply sit idle and let their hard work feed the AI revolution?
One option for publishers is to stop offering free content and make everything available only on subscription. However, running a subscription-based revenue model is not easy for digital businesses. Many subscription-based digital content services have failed miserably. Publishers cannot afford to restrict their audience or limit their traffic, which directly determines their earning potential.
Publishers will therefore have to keep their content available to all and earn by maximising traffic. Some publishers, and even features on popular social networking apps like X and Instagram, allow creators who meet certain qualifying criteria to monetise through subscriptions, offering early and exclusive content to those who pay for it. Again, this will not work in every case. Let's see why.
Take the example of ANI. Like every publisher, it wants to be the first to bring news to its audience. If it used subscriptions and similar features to make news available only to a subset of its audience, the rest would learn about that news anyway. Eventually, any such platform, ANI included, risks losing subscribers.
Another model that seems prima facie feasible is for publishers to reverse their content monetisation strategy. Going by typical content consumption patterns, more than 95% of the traffic to any news item or feature falls within the first 24 hours, and in many cases within the first few hours. After that, traffic declines sharply. In other words, content is optimally monetisable only in the first 24 hours; after that, it is no longer monetised optimally. In fact, for a lot of content, the cost of maintaining the page exceeds the revenue it generates after a few days.
Most of the audience for content such as news and interviews, that is, the general public, consumes it within the first few hours. The content is revisited mainly by people who need it for professional purposes such as research, teaching, publishing reports, and investigations. For example, when President Trump announced reciprocal tariffs, it sent shockwaves through world trade and triggered a huge research effort across the world. Everyone, from policymakers, diplomats, economists, researchers, businesspeople, and think tanks to the media, started looking for data and information that was available in old articles and news features.
This is where a reverse model can work. Publications should keep content free for the initial few hours, up to 24 hours or whatever their own statistics suggest. This is the window in which most of their audience will consume the content and generate advertising revenue for them. After this, articles and other features should automatically move behind a paywall. Publications can price them modestly, say anywhere between ₹1 and ₹50 per view, depending on the factors that determine the value of the content.
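To make the mechanism concrete, here is a minimal sketch of the time-gated check the reverse model implies, written in Python. The function names, the `has_paid` flag, and the 24-hour default are illustrative assumptions, not any publisher's actual system. Pricing, payment processing, and reader authentication are deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical default free window. The reverse model suggests "up to 24 hours
# or whatever their stats suggest", so each publisher would tune this value.
FREE_WINDOW = timedelta(hours=24)


def is_in_free_window(published_at: datetime, now: datetime,
                      free_window: timedelta = FREE_WINDOW) -> bool:
    """True while the article is younger than the free, ad-supported window."""
    return now - published_at < free_window


def access_decision(published_at: datetime, now: datetime, has_paid: bool,
                    free_window: timedelta = FREE_WINDOW) -> str:
    """Decide how to serve an article under the free-first, paid-later model.

    Returns "free" inside the window, "granted" for a reader who has paid
    for this view after the window, and "paywall" otherwise.
    """
    if is_in_free_window(published_at, now, free_window):
        return "free"
    return "granted" if has_paid else "paywall"


if __name__ == "__main__":
    # Illustrative timestamps only; not taken from any real article.
    published = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
    print(access_decision(published, published + timedelta(hours=3), False))
    print(access_decision(published, published + timedelta(days=2), False))
    print(access_decision(published, published + timedelta(days=2), True))
```

Run as a script, this prints `free`, `paywall`, and `granted`. An article reaching exactly the end of its window goes behind the paywall; a publisher whose statistics point to a shorter peak could pass, for example, `timedelta(hours=6)` as `free_window`.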
Publishers, especially news portals, have to recognise that the information they publish has archival value. Anyone who seeks out such news, with its facts, figures, and analysis, some time after publication is a serious user of the content with a definite interest in it. The publisher has every right to charge such users a nominal fee for maintaining this information repository.
The other benefit, of course, is that it becomes difficult for scrapers to harvest content for training their AI models. Data that sits behind a paywall is generally hard to scrape and collect. In that case, the only way for anyone wanting data to train AI models would be to enter into a commercial arrangement with the publication.
Technology makes this challenging, and safeguarding interests, especially content, is not easy. Even this proposed flip will have loopholes that people will find. But it is worth giving it a shot.