Protecting Your Customers' Privacy: Why Large Language Models are a Concern for eCommerce Companies

John Suder
May 12, 2023

Large Language Models (LLMs) have received praise for their significant contributions in the fields of natural language processing, machine learning, and artificial intelligence. With their groundbreaking capabilities, LLMs are revolutionizing the way we interact with technology and opening up new possibilities for the future. 

Although these models could simplify certain aspects of running a business, eCommerce companies must be mindful of privacy concerns. 

History of Large Language Models

The roots of large language models can be traced back to the early days of natural language processing. Their initial purpose was to develop machines that could comprehend human language. 

As time passed, these models evolved into more advanced and powerful tools capable of performing various tasks, such as generating text and translating languages.

How large language models share data, and why that can trouble privacy-conscious companies

Large language models are deep neural networks trained on vast amounts of textual data to predict the likelihood of a word given its context. They are trained using a process often called “unsupervised learning” (more precisely, self-supervised learning), which means that the training text does not need to be labeled by humans.

Instead, they use massive amounts of data to learn patterns and relationships in language, enabling them to generate natural-sounding sentences and paragraphs.
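To make "predicting the likelihood of a word given its context" concrete, here is a minimal Python sketch. It is a toy bigram counter, not a neural network, and the three sample sentences and the function names are invented for illustration. It only shows the basic idea that the probabilities come straight from the text the model has seen.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probabilities(counts, prev):
    """Return P(next word | previous word) as a plain dict."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the model has never seen this word
    return {word: n / total for word, n in followers.items()}

# Hypothetical training text; a real LLM sees billions of sentences.
corpus = [
    "your order has shipped",
    "your order has arrived",
    "your order is delayed",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "has")
print(probs)
```

Notice that everything the toy model "knows" is a direct echo of its training sentences. That is the root of the privacy question: whatever goes into the training data can shape what comes out.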

One of the key ways in which large language models share data is through pre-training. Pre-training involves training a large language model on a massive corpus of text, such as all of Wikipedia or large portions of the public web. The resulting model is then fine-tuned on a smaller, more specific dataset for a particular task, such as language translation or sentiment analysis. This process allows developers to create highly accurate and effective models with much less task-specific data than training from scratch would require.

Learning from your customers' data

However, both pre-training and fine-tuning require access to data, often in large amounts, which raises privacy concerns. For example, if a company wants to train or fine-tune a large language model on its customer data, it may need to share that data with a third-party provider. This could expose sensitive information to unintended parties, such as competitors or hackers.

In addition, large language models are commonly used on cloud-based systems, which involve transmitting and processing data through third-party servers and networks. This introduces additional opportunities for data breaches and privacy violations. For example, a cloud provider could inadvertently or intentionally access a company's data, or an attacker could exploit a vulnerability in the provider's infrastructure to gain unauthorized access.
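One common way to reduce this exposure is to strip obvious personal details from text before it ever leaves your own systems. The Python sketch below is a minimal illustration under stated assumptions: the three regular expressions and the `redact` helper are hypothetical, cover only simple email, phone, and card-number formats, and are no substitute for a proper PII detection tool.

```python
import re

# Illustrative patterns only; real customer data needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD = re.compile(r"\b(?:\d[ -]?){13,16}\b")            # 13 to 16 digits, optional separators
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")  # e.g. 215-555-0100

def redact(text):
    """Replace matched personal details with placeholders before sharing the text."""
    text = EMAIL.sub("[EMAIL]", text)
    text = CARD.sub("[CARD]", text)
    text = PHONE.sub("[PHONE]", text)
    return text

message = "Hi, I'm jane@example.com, call 215-555-0100 about card 4111 1111 1111 1111."
safe = redact(message)
print(safe)
```

Only the redacted string would then be passed to a cloud-hosted model. Names, addresses, and order details would still need their own handling.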

Data sharing of this kind was a significant issue even before LLMs became popular: a vendor that learns from one client's data can end up, in effect, taking from one client to give to another.

How large language models are being used in eCommerce

LLMs are used to drive various applications, from chatbots and search engines to virtual assistants and language translation tools that can enable cross-border trade. These models can also assist in fraud detection and, on the marketing side, aid with content generation such as product descriptions and image editing.

As with any powerful technology, there are concerns about how it shares data, especially concerning privacy.

As these models have grown in complexity and size, the issue of user privacy has become more pressing. When using large language models, eCommerce companies should be mindful of protecting their customers' privacy.

Large language models can analyze customer data to identify trends and insights, which can be used to improve product offerings and marketing strategies. 

However, utilizing customer data in this way requires eCommerce companies to prioritize privacy concerns and ensure that sensitive information is not mishandled or exposed to unintended parties.

Large language models use conversion and industry data to control your marketing. 

Say an email vendor has Company A as a client for personalized eCommerce newsletter emails. The vendor knows the day and time the emails are sent, the open rate, the conversion rate, and so on. Company B signs up, and the vendor's model suggests, “Other sites in your industry convert best when sent at 1:00 pm local time.” That advice is built on Company A's results. And what if a customer has accounts at both?

We must be cautious about AIs that access our documents, recordings, and other personal data, as ethical and privacy concerns have yet to be resolved. Large companies are considering data isolation as a solution. This means that the dataset used for training won't be as extensive, but it will consist only of your data.
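As a rough illustration of data isolation, the Python sketch below keeps each client's campaign results in a separate bucket and builds a send-time recommendation only from the requesting client's own records. The class name, the tenant IDs, and the numbers are all hypothetical; this is a sketch of the idea, not any vendor's actual design.

```python
from collections import defaultdict

class IsolatedCampaignStats:
    """Keeps each client's campaign results in its own bucket."""

    def __init__(self):
        # tenant_id -> list of (send_hour, conversion_rate)
        self._records = defaultdict(list)

    def record(self, tenant_id, send_hour, conversion_rate):
        self._records[tenant_id].append((send_hour, conversion_rate))

    def best_send_hour(self, tenant_id):
        """Recommend a send hour using only the requesting tenant's data."""
        by_hour = defaultdict(list)
        for hour, rate in self._records.get(tenant_id, []):
            by_hour[hour].append(rate)
        if not by_hour:
            return None  # deliberately no fallback to other tenants' data
        return max(by_hour, key=lambda h: sum(by_hour[h]) / len(by_hour[h]))

stats = IsolatedCampaignStats()
stats.record("company_a", 13, 0.042)  # 1:00 pm performs best for Company A
stats.record("company_a", 9, 0.031)
stats.record("company_b", 9, 0.020)   # Company B has only morning data so far

print(stats.best_send_hour("company_a"))
print(stats.best_send_hour("company_b"))
```

The trade-off is the one described above: Company B gets a weaker recommendation from its smaller dataset, but nothing it sees is derived from Company A.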

Atlassian faced some issues when it allowed users to connect their Google Docs accounts to display smart links for pasted URLs. The concern was that Atlassian had complete access to the user's account and may have been able to download all the documents the user had access to for "training their data" (although we do not know if they did so).

LLM Training Alternatives

Some companies are exploring alternative approaches to large-scale language model training and deployment to address these concerns. One approach is federated learning, which allows models to be trained on data distributed across multiple devices or servers without sharing the data itself. This approach could allow companies to train large language models on customer data without exposing that data to third-party providers.
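The federated idea can be sketched in a few lines of Python. This is a simplified illustration, assuming a one-parameter model (y is approximately weight times x) and two made-up clients; the function names are hypothetical, and production systems add protections such as secure aggregation that are not shown here. Each client trains on its own data, and only the resulting weight, never the raw records, is sent back to be averaged.

```python
def local_update(weight, data, lr=0.01, epochs=20):
    """Gradient descent on one client's private (x, y) pairs; the data never leaves the client."""
    for _ in range(epochs):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight

def federated_average(global_weight, clients, rounds=10):
    """Each round, average the clients' locally trained weights, weighted by sample count."""
    for _ in range(rounds):
        updates = [(local_update(global_weight, data), len(data)) for data in clients]
        total = sum(n for _, n in updates)
        global_weight = sum(w * n for w, n in updates) / total
    return global_weight

# Two hypothetical clients whose private data both follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]
weight = federated_average(0.0, clients)
print(round(weight, 3))
```

The shared model ends up close to the true value of 3 even though the coordinating server only ever sees weights. Note that model updates can still leak some information, so this reduces exposure rather than eliminating it.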

Another approach is to use on-device learning, which involves training models directly on mobile devices or other endpoints. This approach eliminates the need to transmit sensitive data over networks and servers, making it a more secure option for companies with privacy concerns.


Large language models can potentially transform how we interact with machines and each other. However, how they share data can raise significant privacy concerns for companies and individuals. 

To address these concerns, companies must carefully consider their data-sharing practices and explore alternative approaches to large language model training and deployment. By doing so, they can harness the power of these technologies while also protecting the privacy of their customers and users.


Photo by Philipp Katzenberger on Unsplash
