What Sora Collected Before the Lights Went Out
On March 24, 2026, OpenAI announced it was shutting down Sora. Not much else was said: a goodbye post on X and a brief mention of sharing timelines later. The standalone Sora app launched in September 2025, quickly hit number one on the App Store, and reached a million downloads even faster than the ChatGPT app had. It was also more or less dead a couple of months later. By January, downloads were down 45% month over month. The Android version launched on November 3, 2025. The app lasted about six months. OpenAI made around $2.1 million from it.
We had a look at the APK around the time of the shutdown.
It Was Never Just a Video App
The public pitch was simple: type a prompt, get a video. Cast yourself in it. Share it.
Sora's backend runs under the internal codename "project_y". Every API call routes through paths like "project_y/feed", "project_y/cameos", or "project_y/profile/pymk". That last abbreviation, PYMK, is one the social media industry has used for years: People You May Know. Sora had follows, followers, DMs, leaderboards, block lists, and a notification inbox, much like any standard social media app. The video generation was the hook; the platform underneath it was the product.
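The routing described above can be sketched as a simple endpoint table. The "project_y" paths are the real strings from the APK; the base URL and the lookup helper are hypothetical, for illustration only:

```python
# Hypothetical sketch of how the client might map features to
# "project_y" endpoints. Paths are from the APK; the base URL and
# helper are illustrative assumptions.
BASE = "https://sora.example/api"  # hypothetical base URL

ENDPOINTS = {
    "feed": "project_y/feed",
    "cameos": "project_y/cameos",
    "people_you_may_know": "project_y/profile/pymk",
}

def endpoint_url(feature: str) -> str:
    """Resolve a feature name to its full project_y path."""
    return f"{BASE}/{ENDPOINTS[feature]}"
```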
At the center of that platform was the Cameo system. A Cameo was a short video of your face that you uploaded to the app. Once created, anyone on Sora could use your Cameo as the basis for AI-generated video: your face, in content you never made and may never have seen. To create a Cameo, the app runs Google's ML Kit face detection library against your video on-device. The APK bundles "play-services-mlkit-face-detection" with native JNI bindings for raw frame-by-frame processing: "detectFacesImageByteArrayMultiPlanesJni". From that analysis, the app extracts a gender attribute. That attribute is stored as a typed API object, "SoraCameoGender", and transmitted to the backend through an endpoint called "ApiGenderUpdate", routed to "project_y/cameos/update_v2".
You did not enter your gender. The app inferred it from your face and sent it to a server.
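The pipeline can be reconstructed roughly as follows. "SoraCameoGender" and the update path are real strings from the APK; the detection stub, its output shape, and the payload field names are assumptions:

```python
from dataclasses import dataclass

# Illustrative reconstruction of the Cameo gender pipeline.
# Only the type name and endpoint path come from the APK;
# everything else here is a hypothetical sketch.

@dataclass
class SoraCameoGender:   # typed API object named in the APK
    value: str           # inferred from the face, not user-entered

def detect_faces(frames):
    """Stand-in for ML Kit's on-device face detection
    ("detectFacesImageByteArrayMultiPlanesJni" in the APK)."""
    return [{"gender": "female"}]  # hypothetical classifier output

def build_gender_update(frames) -> dict:
    """Build the payload an ApiGenderUpdate call would carry
    to project_y/cameos/update_v2."""
    face = detect_faces(frames)[0]
    gender = SoraCameoGender(value=face["gender"])
    return {"path": "project_y/cameos/update_v2",
            "body": {"gender": gender.value}}
```

The key point the sketch makes concrete: the gender field originates in the face-analysis step, not in any form the user filled out.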
Four Systems Watching at Once
Running underneath all of this is a surveillance stack that most users did not know existed.
Segment is an analytics platform that logs your behavior inside the Sora app: every tap, every scroll, every button press gets recorded and sent back to Segment's servers. Datadog does something similar at a slightly higher level, tracking how you move through the app from the moment you open it to the moment you close it. Then there is Pioneer, OpenAI's own in-house version of the same kind of tooling, separate from both. Pioneer registers the moment you install the app and keeps sending data about you and your device with every event after that. Unlike Segment or Datadog, the data Pioneer collects goes directly to OpenAI.
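The effect is a triple fan-out: one in-app action, three pipelines. A minimal sketch, where the sink names are real but the event shape and dispatch function are assumptions:

```python
# Illustrative fan-out of a single behavioral event to the three
# analytics pipelines described above. Sink names come from the
# article; the event shape is a hypothetical simplification.
SINKS = ["segment", "datadog", "pioneer"]

def track(event_name: str, device_id: str) -> list[dict]:
    """Send one behavioral event to every configured sink."""
    event = {"name": event_name, "device_id": device_id}
    return [{"sink": sink, **event} for sink in SINKS]
```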
On top of all three, Sora includes a tool called Sentry Session Replay, which records your screen. It captures what is on your screen, compiles those captures into a replay inside the app, and sends it to Sentry's servers. OpenAI's stated reason is debugging: catching crashes and errors. The problem is that whether the text you type gets blurred out in that recording is a setting OpenAI controls remotely. They can change it without updating the app.
You would never know.
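Sentry's Session Replay SDK does expose text-masking options; whether Sora gated them behind a server-side flag exactly like this is an assumption, but the mechanism would look something like:

```python
# Sketch of a remotely controlled redaction setting. The flag name
# and config store are hypothetical; the point is that the value
# lives server-side, so it can flip without an app update.
REMOTE_CONFIG = {"session_replay_mask_text": True}  # server-controlled

def should_mask_text() -> bool:
    """Read the masking flag at runtime. Changing the server-side
    value changes what gets recorded, with no new APK shipped."""
    return REMOTE_CONFIG.get("session_replay_mask_text", True)
```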
The app also collects your location. Not just your rough area: it requests permission for precise GPS. That location gets quietly attached to the requests your phone sends to Sora's servers every time you use the app. On top of that, Sora used Cloudflare to read which country your internet connection was coming from.
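The exact mechanism for attaching location is not visible from strings alone; whether it rides in a header or a body field is unknown. A hypothetical sketch of the header variant:

```python
# Hypothetical sketch only: the header name and precision are
# illustrative assumptions, not strings recovered from the APK.
def attach_location(headers: dict, lat: float, lon: float) -> dict:
    """Quietly add device coordinates to an outgoing API request."""
    out = dict(headers)
    out["X-Device-Location"] = f"{lat:.6f},{lon:.6f}"  # hypothetical header
    return out
```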
The Training Default Nobody Opted Into
Two strings in the Sora APK deserve a closer look.
The first is "videoTrainingAllowed", a boolean that controls whether the videos a user generates can be used to train OpenAI's models. The second, "no_auth_training_enabled_by_default", is exactly what it sounds like: for users who are not logged in, video training was on by default.
Both strings also appear in the ChatGPT Android APK. There, corresponding UI state strings, "Video Training Enabled" and "Video Training Disabled", suggest the setting is surfaced somewhere in the settings screen. In Sora, there is no corresponding disabled string for the unauthenticated case.
A third string appears in both APKs: "shared-training". Training data collected through Sora and ChatGPT would feed into the same pool.
The EU Was Blocked for a Reason
Sora was never available in the European Union. The moment you open the app, it checks where you are connecting from. If you are in the EU, you get a hard block.
The reason is not technical; it is legal.
Sora scans your face and draws conclusions from it. That requires explicit consent under GDPR. Sora defaults unauthenticated users into AI training without any agreement. That requires a legal basis under GDPR and it does not have one. Sora sends screen recordings to servers in the United States. That requires safeguards under GDPR. Whether those exist is unknown.
Three problems. One solution: block the EU, and no regulator ever gets to ask.
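The launch-time check can be sketched in a few lines. Cloudflare really does expose the connecting country to origin servers via its CF-IPCountry header; the EU country list and the gate function here are illustrative:

```python
# Sketch of a Cloudflare-based EU hard block. CF-IPCountry is a real
# Cloudflare header; the gate function is an illustrative assumption.
EU_COUNTRIES = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR",
    "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL",
    "PL", "PT", "RO", "SK", "SI", "ES", "SE",
}

def is_blocked(cf_ipcountry: str) -> bool:
    """Hard-block any connection Cloudflare tags as an EU country."""
    return cf_ipcountry.upper() in EU_COUNTRIES
```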
The App Is Gone. The Data Is Not.
OpenAI says it is "exploring ways to support export and preservation" of user content. It has said nothing about what happens to the data it already collected, the Cameo videos, the biometric gender inferences, the session recordings, the Pioneer analytics events, the content flagged as training-eligible under a default-on setting.
The APK contains strings for "Account Deletion Request" and "Account Deletion Success". A deletion flow exists. Whether that deletion propagates to Sentry's servers, to Segment's pipeline, to whatever system holds the "shared-training" data pool, the code cannot tell you that.
Sora ran for six months. It built a social graph around users' faces, inferred biometric attributes without disclosure, screen-recorded sessions and sent the footage to US servers, and defaulted unauthenticated users into AI training. It excluded the one jurisdiction with the legal infrastructure to investigate any of it. Now it is shutting down, and OpenAI has not explained what happens to any of that data next.
The app is now gone.