International: RTA and NIST publish blog on data distribution in privacy preserving federated learning

On February 27, 2024, the UK Department for Science, Innovation, and Technology (DSIT) Responsible Technology Adoption Unit (RTA), in collaboration with the US National Institute of Standards and Technology (NIST), published a blog post titled 'Data Distribution in Privacy-Preserving Federated Learning.' In particular, the blog post explains how data can be distributed or partitioned among participants to build privacy-preserving federated learning systems.

What is data partitioning?

The blog post describes data partitioning as the way in which data is distributed among participating organizations, in contrast to a centralized scheme in which one party holds all the data. The blog categorizes data partitioning into two primary schemes: horizontal partitioning and vertical partitioning.

Horizontal Partitioning

According to the blog, in horizontal partitioning data is distributed such that each participant holds different records (rows) but retains all variables (columns). The blog explains that horizontal partitioning simplifies the development of privacy-preserving federated learning systems, allowing each participant to train a model on their dataset and later combine these models for enhanced performance.
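As a rough illustration of the horizontal case, the sketch below has two hypothetical participants that hold different rows with the same columns. Each fits a simple local model, and the results are combined by a record-weighted average. The combining rule, the function names, and the data are assumptions made for illustration, not a method taken from the blog, and the sketch applies no privacy protection to the shared parameters.

```python
# Toy sketch: horizontal partitioning with naive model averaging.
# All names and data here are hypothetical illustrations.

def fit_line(records):
    """Ordinary least squares for y = slope * x + intercept on (x, y) records."""
    n = len(records)
    mean_x = sum(x for x, _ in records) / n
    mean_y = sum(y for _, y in records) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in records)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in records)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

def combine(models, sizes):
    """Average local parameters, weighted by each participant's record count."""
    total = sum(sizes)
    slope = sum(m[0] * s for m, s in zip(models, sizes)) / total
    intercept = sum(m[1] * s for m, s in zip(models, sizes)) / total
    return slope, intercept

# Each participant holds different rows but the same columns (x, y).
participant_a = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
participant_b = [(4.0, 9.0), (5.0, 11.0)]

# Only the fitted parameters leave each participant, never the raw rows.
local_models = [fit_line(participant_a), fit_line(participant_b)]
global_model = combine(local_models, [len(participant_a), len(participant_b)])
```

Because every participant has all the columns, each local model can be trained independently, which is why this scheme is the simpler of the two to work with.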

Vertical Partitioning

By contrast, the blog notes that vertical partitioning distributes different features (columns) of the data across participants, each of whom holds data on the same entities. The blog adds that this makes it harder to link data points from different partitions during training, which in turn makes privacy-preserving systems more complex to develop.
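The linking step can be sketched with two hypothetical participants that hold different columns for overlapping customer IDs. The dictionaries, field names, and `link_records` helper are assumptions made for illustration. The plain join shown here exposes identifiers to whoever performs it, so it only shows what has to be linked, not how to link it privately.

```python
# Toy sketch: vertical partitioning and record linkage on a shared key.
# All names and data are hypothetical; this plain join is NOT privacy-preserving.

# Participant A (e.g. a bank) holds one set of columns per customer ID.
bank_features = {
    "c1": {"balance": 1200},
    "c2": {"balance": 50},
    "c3": {"balance": 980},
}

# Participant B holds different columns for (mostly) the same customers.
telecom_features = {
    "c2": {"monthly_calls": 40},
    "c3": {"monthly_calls": 12},
    "c4": {"monthly_calls": 7},
}

def link_records(left, right):
    """Join two column sets on the IDs that both participants hold."""
    shared_ids = sorted(left.keys() & right.keys())
    return {
        entity_id: {**left[entity_id], **right[entity_id]}
        for entity_id in shared_ids
    }

# Only entities known to both participants yield a complete training row.
linked = link_records(bank_features, telecom_features)
```

Neither participant can build a full training row alone, which is the source of the extra complexity compared with the horizontal case.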

The blog acknowledges the challenges of both partitioning schemes, particularly in scenarios that involve a combination of horizontal and vertical partitioning. The blog references the financial crime track of the US-UK privacy-enhancing technologies (PETs) prize challenges as an example of such a scenario, highlighting the added difficulties in ensuring privacy while achieving accurate model training.

You can read the NIST blog post here and the DSIT post here.