An AI widget is two data events, not one
A plain chat widget loads a script and sets a cookie. An AI chat widget does that, and then it sends the words a visitor types to a large language model that may be running on someone else's servers in another jurisdiction. Those are two separate data events with two separate legal footings, and if you only think about the first one you'll miss the part regulators care about most.
Event one is the widget bootstrapping: it writes to the browser (cookies, localStorage, a session id) before anyone types. Event two is the message itself leaving your site for an LLM provider like OpenAI, Anthropic, or Google. Event one is an ePrivacy storage question. Event two is a GDPR processing-and-transfer question. Handle both.
The storage layer: same rule as any embed
The cookie rule in Article 5(3) of the ePrivacy Directive covers storing information on, or accessing information from, a user's device, and the EDPB confirmed in its Guidelines 2/2023 on the technical scope of Article 5(3) (finalized in October 2024) that the rule is technology-agnostic. Cookies, localStorage, and sessionStorage all count. So the AI widget's boot-up storage is treated like any other non-essential tag: unless the chat is the specific service the visitor asked for, hold the loader until consent.
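A minimal TypeScript sketch of the hold-until-consent pattern. It assumes your consent platform can tell you which categories are currently granted, and that a click on a placeholder "Start chat" button counts as the visitor requesting the service. The category ids, the `GatedWidgetLoader` class, and the injected `load` callback are hypothetical, not any vendor's API.

```typescript
// Hypothetical category ids; use whatever ids your consent platform reports.
type Category = "necessary" | "functional" | "analytics" | "marketing";

// Runs a widget loader at most once, and only after the visitor has either
// granted the widget's consent category or clicked a "Start chat" placeholder.
class GatedWidgetLoader {
  private loaded = false;
  private readonly category: Category;
  private readonly load: () => void;

  constructor(category: Category, load: () => void) {
    this.category = category;
    this.load = load;
  }

  // Call whenever the consent platform reports the current granted categories.
  onConsentChange(granted: ReadonlySet<Category>): void {
    if (granted.has(this.category)) this.loadOnce();
  }

  // Call from the placeholder button's click handler: the click is the
  // visitor asking for the chat service.
  onUserRequestedChat(): void {
    this.loadOnce();
  }

  isLoaded(): boolean {
    return this.loaded;
  }

  private loadOnce(): void {
    if (this.loaded) return;
    this.loaded = true;
    // In a browser, this is where the vendor's <script> tag would be appended.
    this.load();
  }
}
```

Until one of the two public methods fires, the vendor's script is never requested, so it has no chance to write cookies, localStorage, or a session id. The `loaded` flag stops a consent update and a click from injecting the script twice.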
The mechanics are identical to those of a normal chat tool, so the load-on-click and category-gating patterns in our guide to live chat widget consent apply directly. Categorize the widget honestly using the framework in our guide to cookie categories. The new wrinkle is what happens after the visitor starts typing.
The data layer: you're disclosing personal data to a processor
When your chatbot forwards a visitor's message to an LLM API, you're a controller handing personal data to a processor. Chat messages routinely contain names, emails, order numbers, and sometimes health or financial details the visitor volunteers. That triggers three obligations at once:
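- A legal basis for the processing. If the visitor opened the chat to get help, you can often rely on the fact that you're providing the service they asked for; if the AI is also profiling visitors or their messages are used for training, those purposes need a legal basis of their own.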
- A data processing agreement with the AI provider under Article 28 GDPR. The provider is your processor for the API traffic. See data processing agreements for website owners.
- A transfer mechanism if the provider processes in the US or elsewhere outside the EEA. That means checking the provider's certification under the EU-US Data Privacy Framework or its standard contractual clauses. See EU-US data transfers.
None of this is exotic. It's the same controller-processor-transfer chain you'd map for any SaaS vendor. The trap is forgetting that a chat bubble is now a pipe to a third-party model.
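To make that chain concrete, here is a sketch of a vendor-register entry with a check for the two gaps named above: a missing Article 28 DPA, and processing outside the EEA with no transfer mechanism on record. The `ProcessorRecord` shape, its field names, and the `findGaps` helper are hypothetical; only the Data Privacy Framework and standard contractual clauses are modelled because those are the mechanisms discussed here.

```typescript
// Hypothetical vendor-register shape; field names are illustrative only.
// "dpf" = EU-US Data Privacy Framework, "sccs" = standard contractual clauses.
type TransferMechanism = "dpf" | "sccs";

interface ProcessorRecord {
  vendor: string;
  purpose: string;
  dataCategories: string[];
  dpaSigned: boolean; // Article 28 GDPR agreement in place
  processesOutsideEEA: boolean;
  transferMechanism: TransferMechanism | null;
}

// Lists obvious gaps in one record; an empty array means none of these checks failed.
function findGaps(r: ProcessorRecord): string[] {
  const gaps: string[] = [];
  if (!r.dpaSigned) {
    gaps.push(`${r.vendor}: no Article 28 DPA on file`);
  }
  if (r.processesOutsideEEA && r.transferMechanism === null) {
    gaps.push(`${r.vendor}: processes outside the EEA with no transfer mechanism recorded`);
  }
  return gaps;
}
```

Running `findGaps` over every entry, the AI provider included, turns "we probably have a DPA" into a list you can act on.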
The training trap
The single biggest mistake is wiring a public widget to a consumer AI account. Consumer ChatGPT uses conversation content to train models by default unless a user turns that off in settings. The API is different: OpenAI states that data sent to the API platform is not used to train its models by default (a change in effect since 1 March 2023), though inputs and outputs may be retained for up to 30 days for abuse monitoring, with zero-data-retention available for eligible endpoints. Check your provider's current data controls and confirm which account type your widget actually uses.
If your integration is on a business or API tier with training disabled, visitor messages don't feed a model. If someone wired the widget to a personal account or left a share-data setting on, your visitors' words could end up in training data, and you'd have no way to promise deletion. Verify this before you ship, not after a complaint.
Regulators are already looking
This isn't hypothetical. The EDPB adopted Opinion 28/2024 on 17 December 2024, addressing when AI models can be considered anonymous, whether legitimate interest can justify developing and deploying them, and what happens when a model is built on unlawfully processed data. That same week, Italy's Garante announced a 15 million euro fine against OpenAI over ChatGPT's data handling. For a site embedding an AI chatbot, the practical read-across is transparency (name the AI vendor and what it does), a clear legal basis, and a route to honor deletion and access requests for whatever the system retained.
Watch special-category and children's data
Two categories raise the stakes. First, visitors will type health, financial, and other sensitive details into a chat box without being asked. Some of that, such as health data, is special-category data under Article 9 GDPR, which may be processed only if one of that article's specific exceptions applies, on top of an ordinary legal basis. An AI widget that hoovers up whatever gets typed and forwards it to a third-party model is a bad place for that to land. Put a visible line in the widget telling people not to share sensitive information, minimize what you log, and consider redacting obvious identifiers before anything reaches the LLM. Second, if minors use your site, an AI chatbot that profiles or retains their inputs runs into children's-privacy rules; our guides on age assurance and children's privacy and COPPA cover what that adds.
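The redaction step mentioned above can start as a small filter that runs on your server before a message is forwarded. A minimal sketch, assuming plain-text messages; the `redact` function and its patterns are hypothetical and deliberately crude. They catch well-formed emails, card-shaped numbers, and phone-shaped numbers, will over-redact some order numbers, and will not catch names, addresses, or a sentence like "I was just diagnosed with diabetes", so treat this as one layer alongside the warning line, not a substitute for it.

```typescript
// Hypothetical, regex-only redactor: it catches obvious, well-formed identifiers
// and nothing else. Order matters: card-shaped numbers are labelled before the
// broader phone pattern gets a chance to claim them.
const PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  // 13 to 19 digits, optionally separated by single spaces or hyphens.
  ["CARD", /\b(?:\d[ -]?){12,18}\d\b/g],
  // Optional "+", then at least 8 digits/separators; may also hit long order numbers.
  ["PHONE", /\+?\d[\d ()-]{6,}\d/g],
];

// Replaces each match with a bracketed label such as "[EMAIL]".
function redact(text: string): string {
  let out = text;
  for (const [label, re] of PATTERNS) {
    out = out.replace(re, `[${label}]`);
  }
  return out;
}
```

For example, `redact("Mail me at jane.doe@example.com or call +44 20 7946 0958, card 4111 1111 1111 1111.")` returns `"Mail me at [EMAIL] or call [PHONE], card [CARD]."`, and only that redacted string would be sent to the model.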
A checklist before you embed one
- Gate the widget loader. Load it on click or on category consent so it sets nothing until the visitor wants chat.
- Confirm the account tier and that model training on inputs is off. Prefer an API or business tier with a retention policy you can point to.
- Put a DPA in place with the AI provider and confirm its transfer mechanism if it processes outside the EEA.
- Discourage visitors from entering special-category data: add a line telling users not to share sensitive details, and don't log more than you need.
- Name the AI vendor in your privacy notice and describe what it receives and retains.
- Have a way to find, export, and delete stored conversations, so you can answer access and erasure requests.
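The last two checklist items come down to being able to find every stored message for one person. A sketch of that shape, assuming each conversation can be tied to a subject id such as an account id or a verified email; `ConversationStore`, `StoredMessage`, and the method names are hypothetical, and the in-memory array stands in for your real database.

```typescript
// Hypothetical message shape. `subjectId` is whatever links a conversation to a
// person in your system (an account id or a verified email, for example).
interface StoredMessage {
  subjectId: string;
  at: Date;
  text: string;
}

// In-memory stand-in for a real table of chat transcripts.
class ConversationStore {
  private messages: StoredMessage[] = [];

  add(m: StoredMessage): void {
    this.messages.push(m);
  }

  // Access request: return everything held about one subject.
  exportFor(subjectId: string): StoredMessage[] {
    return this.messages.filter((m) => m.subjectId === subjectId);
  }

  // Erasure request: remove every message for one subject, returning how many went.
  deleteFor(subjectId: string): number {
    const before = this.messages.length;
    this.messages = this.messages.filter((m) => m.subjectId !== subjectId);
    return before - this.messages.length;
  }

  // Retention: drop anything older than the cutoff, so you keep only what you need.
  purgeOlderThan(cutoff: Date): number {
    const before = this.messages.length;
    this.messages = this.messages.filter((m) => m.at.getTime() >= cutoff.getTime());
    return before - this.messages.length;
  }
}
```

A real deletion path also has to reach backups and logs, and it cannot reach copies the AI provider keeps under its own retention window (the up-to-30-day abuse-monitoring retention described earlier), which is one more reason to check the provider's data controls before launch.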
A consent platform like CookieBeam blocks the widget's loader until the matching category is granted, and its scanner detects the outbound connection to the AI endpoint, so you can check, page by page, that nothing leaves for the model before the visitor actually types. Back that with server-side enforcement if you want a hard stop that a client-side bypass can't defeat.
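What server-side enforcement might look like, as a sketch: a handler in front of the LLM call that checks the visitor's consent record before forwarding anything. It assumes consent decisions are stored server-side and can be looked up by a first-party session id. `handleChat`, `ConsentLookup`, and `Forwarder` are hypothetical names, and the actual call to your provider's API would live inside the injected `forwardToModel` function.

```typescript
// Hypothetical shapes: adapt to however your consent records are stored server-side.
interface ChatRequest {
  sessionId: string;
  message: string;
}

type ConsentLookup = (sessionId: string) => ReadonlySet<string>;
type Forwarder = (message: string) => Promise<string>;

// Runs on your own server, so a visitor who tampers with the page cannot skip it.
async function handleChat(
  req: ChatRequest,
  consentFor: ConsentLookup,
  requiredCategory: string,
  forwardToModel: Forwarder,
): Promise<{ status: number; body: string }> {
  const granted = consentFor(req.sessionId);
  if (!granted.has(requiredCategory)) {
    // Refuse before any message text is sent to the AI provider.
    return { status: 403, body: "Chat is unavailable until the chat category is enabled." };
  }
  const reply = await forwardToModel(req.message);
  return { status: 200, body: reply };
}
```

Because the check runs before `forwardToModel` is ever called, a visitor who strips the client-side blocker still gets a 403, and none of their words reach the model.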