Unsuspecting Data Leaks: A healthtech nightmare

Anuj Agrawal
November 11, 2022

Recently, several healthcare companies have come under the FTC's radar for failing to secure their users' data. For example, Flo, a period and fertility tracking app with 100 million+ users, was scrutinized by the FTC for sharing users' personal information, such as period cycles and intention to get pregnant, with Facebook and Google in violation of its own privacy policy. More recently, Advocate Aurora Health (AAH) faced a class action lawsuit for sharing sensitive healthcare data with Meta (previously Facebook). In this article, we discuss the potential causes of such leaks and explore how organizations can protect their users' data and ensure that their applications adhere to their stated privacy practices.

Why it is difficult to detect data leaks

Google and Facebook provide developers with analytics platforms to understand their users' behavior and earn revenue through targeted ads. To use them, developers integrate the platforms' third-party SDKs into their apps. Data is shared through these SDKs and sent to Google's and Facebook's servers. Once it is there, organizations and developers have very little control over how it is used to target ads at their users.
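To make this concrete, here is a minimal sketch of how an ordinary-looking analytics call can carry sensitive information. `AnalyticsSdk`, `RecordingSdk`, `OnboardingScreen`, and the event name are all hypothetical and stand in for a third-party SDK; they are not the actual Facebook or Google APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical app screen that reports a user action to an analytics SDK.
class OnboardingScreen {
    private final AnalyticsSdk analytics;

    OnboardingScreen(AnalyticsSdk analytics) {
        this.analytics = analytics;
    }

    // Looks like routine product analytics, but the event name alone reveals a health detail.
    void onGoalSelected(String userId) {
        analytics.logEvent("goal_selected_get_pregnant", Map.of("user_id", userId));
    }

    public static void main(String[] args) {
        RecordingSdk sdk = new RecordingSdk();
        new OnboardingScreen(sdk).onGoalSelected("user-123");
        System.out.println(sdk.sent); // prints [goal_selected_get_pregnant {user_id=user-123}]
    }
}

// Hypothetical stand-in for a third-party analytics SDK; not a real Facebook or Google API.
interface AnalyticsSdk {
    void logEvent(String eventName, Map<String, String> params);
}

// Records what would leave the app, so the example runs without any network access.
class RecordingSdk implements AnalyticsSdk {
    final List<String> sent = new ArrayList<>();

    @Override
    public void logEvent(String eventName, Map<String, String> params) {
        sent.add(eventName + " " + params);
    }
}
```

Nothing in the call site marks the event as health data, which is part of why such flows are easy to miss in a manual review.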

When developers integrated the SDKs, they shared a host of users' personal data, including sensitive health information. The company's privacy policy, on the other hand, is usually prepared by privacy professionals and engineers who have limited visibility into the code that runs the company's applications. Privacy engineers then send assessments to developers, asking them to declare their coding and data-sharing practices. Most of these assessments have to be filled in manually, and they carry the risk of privacy holds if the declared practices violate the company's privacy policy.

We believe that this lack of visibility, combined with manual processes, led to a gap between the company's privacy policy and the data-sharing practices actually followed in the app. If the privacy team had had clear visibility into the app's data-sharing practices, it could have taken appropriate steps to align them with its privacy policy and with regulations that protect users' data.

How the data could have been protected

To ensure that a company's privacy policy is strictly followed, data security and privacy engineers need complete visibility into the codebase and the third-party integrations connected to it. However, with hundreds of developers committing code every day, it becomes impossible for privacy engineers to check the code manually and ensure that it is compliant.

To solve this problem, organizations need an automated mechanism that analyzes data flows across their applications and out of the company to third parties. Such a tool helps organizations monitor and flag privacy issues as soon as developers commit new code. The best way to do this is to integrate the tool into the CI/CD pipelines of the development workflow.
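As a rough illustration of such a check, the sketch below compares the third-party flows detected in a codebase against the flows the privacy policy allows. It is a minimal sketch under stated assumptions: `PrivacyPolicyGate`, `findViolations`, and the input maps are hypothetical and do not represent Privado's API or report format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical CI gate: none of these names come from Privado or any real tool.
class PrivacyPolicyGate {

    // Returns one entry per (data element, third party) pair that the policy does not allow.
    static List<String> findViolations(Map<String, Set<String>> detectedFlows,
                                       Map<String, Set<String>> allowedByPolicy) {
        List<String> violations = new ArrayList<>();
        // Sorted copies keep the report order stable between CI runs.
        for (Map.Entry<String, Set<String>> flow : new TreeMap<>(detectedFlows).entrySet()) {
            Set<String> allowed = allowedByPolicy.getOrDefault(flow.getKey(), Set.of());
            for (String thirdParty : new TreeSet<>(flow.getValue())) {
                if (!allowed.contains(thirdParty)) {
                    violations.add(flow.getKey() + " -> " + thirdParty);
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Assumed inputs: in practice these would come from a scan report and the written policy.
        Map<String, Set<String>> detected = Map.of(
            "Medical Certificate", Set.of("Facebook Ads SDK"),
            "Email Address", Set.of("Email Provider"));
        Map<String, Set<String>> allowed = Map.of(
            "Email Address", Set.of("Email Provider"));

        // A CI wrapper could fail the job whenever this list is non-empty.
        for (String violation : findViolations(detected, allowed)) {
            System.out.println("Policy violation: " + violation);
        }
    }
}
```

With these sample inputs, only the flow of Medical Certificate data to the Facebook Ads SDK is reported, because the assumed policy does not declare it.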

Creating data flows with Privado

Let's walk through an example of shifting monitoring and detection closer to the development workflow. We will use the HealthPlus repository, a healthcare practice management application that handles its users' sensitive health data. First, we will map out the data flows of the existing repository to see how its various data elements move through the code. To do that, we will use the open-source Privado code scanner. To install Privado, we follow these steps.

Once Privado is installed, clone and scan the repository:

git clone https://github.com/AnujAgrawal30/HealthPlus
privado scan HealthPlus

The scan will take about a minute, after which we can see the various data elements and data flows in the repository. An example data flow of Medical Certificate data is shown below:

Data flow of Medical Certificates in the original repository

Now, let's assume that a developer needs to add the Facebook Ads SDK to the repository. To simulate this, I've created an Example.java file with the following contents.

package Doctor;

import java.util.ArrayList;

import com.facebook.ads.sdk.APIContext;
import com.facebook.ads.sdk.AdAccount;
import com.facebook.ads.sdk.AdAccount.EnumCampaignStatus;
import com.facebook.ads.sdk.AdAccount.EnumCampaignObjective;
import com.facebook.ads.sdk.Campaign;
import com.facebook.ads.sdk.APIException;

public class Example {

  public static final String ACCESS_TOKEN = "[Your access token]";
  public static final Long ACCOUNT_ID = 0L;
  public static final String APP_SECRET = "[Your app secret]";

  public static final APIContext context = new APIContext(ACCESS_TOKEN, APP_SECRET);

  // The type argument was lost in the original listing;
  // ArrayList<ArrayList<String>> is an assumption.
  public Example(ArrayList<ArrayList<String>> data) {
    try {
      // Passing data here only simulates a flow into the SDK for the scanner;
      // it may not match an actual AdAccount constructor.
      AdAccount account = new AdAccount(ACCOUNT_ID, context, data);
      Campaign campaign = account.createCampaign()
        .setName("Java SDK Test Campaign")
        .execute();
    } catch (APIException e) {
      // The original listing is cut off here; this is a minimal completion.
      e.printStackTrace();
    }
  }
}

Now, if a developer sends any data to this SDK for marketing purposes, the Privado scanner will detect it and show it in the dashboard. Let's scan the repository again and see the results:

Data flow of Medical Certificates after adding Facebook SDK

As we can see, the scanner was able to detect the addition of the Facebook SDK in the code. We can also look up the Code Analysis section to view a detailed line-by-line flow of the data to the SDK, as displayed below:

Code Analysis of Medical Certificate data after adding Facebook SDK

In this way, we can move privacy and data security assessments closer to the developer workflow and detect flaws and violations early, saving time and reducing the risk of a regulatory violation.

As a side note, while scanning the repository I came across an interesting data element, categorized as “Religion / Religious Beliefs”. I wanted to know why such a data element was being used in a healthcare repository. By navigating the Code Analysis tool for the Religion data element, I was able to pinpoint exactly where it was initialized and to follow its entire journey through the code, including 5 log leakages. This is useful from a privacy engineer's perspective: privacy engineers often cannot review the entire codebase manually and have to resort to sending assessments to developers to map out the data flows.

Code Analysis showing the storage of religious data in the database
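A log leakage here means that a sensitive value ends up in application logs. One common way to avoid this is to mask sensitive fields before a record is logged. The sketch below illustrates the idea; `LogRedactor` and its list of sensitive field names are hypothetical and are not taken from HealthPlus or Privado.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not taken from HealthPlus or Privado.
class LogRedactor {

    // Field names treated as sensitive; this list is an assumption for the example.
    static final Set<String> SENSITIVE_FIELDS = Set.of("religion", "diagnosis");

    // Returns a copy of the record with sensitive values masked, leaving the input untouched.
    static Map<String, String> redact(Map<String, String> record) {
        Map<String, String> safe = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            boolean sensitive =
                SENSITIVE_FIELDS.contains(field.getKey().toLowerCase(Locale.ROOT));
            safe.put(field.getKey(), sensitive ? "[REDACTED]" : field.getValue());
        }
        return safe;
    }

    public static void main(String[] args) {
        Map<String, String> patient = new LinkedHashMap<>();
        patient.put("patientId", "P-001");
        patient.put("religion", "example value");
        // Logging the raw map instead would write the religion value to the logs.
        System.out.println("Saving patient " + redact(patient));
        // prints: Saving patient {patientId=P-001, religion=[REDACTED]}
    }
}
```

A helper like this only covers the fields it knows about, so it complements a scanner that discovers which data elements reach the logs rather than replacing it.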

You can check out the tool yourself on GitHub. Feel free to drop comments and share your experiences of creating data flows and mapping the data elements used in your repositories.


Anuj Agrawal is a Developer Relations Engineer at Privado
