A new safe and efficient data processing technology

Saint Petersburg Electrotechnical University

Published on 27 July 2021

Scientists from Saint Petersburg Electrotechnical University ETU "LETI", together with Smartilizer, studied a new approach to data analysis that does not require transferring data from the source to an analytical center. The researchers tested existing open-source systems on different data sets, including sensor readings from moving cars and X-rays of pneumonia patients. To assess applicability to IoT systems, the authors evaluated ease of use and installation, analysis capabilities, accuracy, and performance. The paper was published in the journal Sensors.

The Internet of Things (IoT) is a data transmission network that consists of physical objects with built-in connectors. Using these connectors, the objects can communicate with each other and with their environment. For example, in a smart home, appliances are connected to each other and to an external control device, so they can be managed from a cell phone.

The standard architecture of an IoT system consists of three layers. The first, the device layer, is the hardware that produces and collects the data. The middle layer transfers data from the devices to the third, the application layer, which provides services or applications that integrate or analyze the data.

Traditional approaches to such systems collect data from IoT devices into one centralized repository for further analysis. However, they are not always applicable because of the large volume of collected data, communication channels with limited bandwidth, and security and privacy requirements. Their significant disadvantages are an increase in total processing time, in network traffic, and in the risk of unauthorized access to the data.

Therefore, new approaches to the analysis of such data are being developed. One of them is federated learning, which analyzes data directly at the sources and federates the results of each analysis to yield a result, as in traditional centralized data processing. Because all the data is processed locally, there is less load on the network and less risk of unauthorized access.
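
The article does not say which aggregation algorithm the tested frameworks use. As a minimal sketch of the idea, assuming a simple federated-averaging scheme and a toy one-parameter model (y = weight * x), the loop below shows how each source can train on its own readings and share only a model weight. The function names and the sample data are hypothetical and serve only as an illustration.

```python
# Illustrative sketch only: a toy federated-averaging loop.
# All names and the sample data are hypothetical, not taken from the paper.

def local_update(weight, data, lr=0.01, epochs=5):
    """Run a few gradient-descent steps for y = weight * x on one source's data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to the weight.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight, len(data)


def federated_average(updates):
    """Combine local weights, weighting each one by its number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Each source keeps its raw (x, y) readings; only weights leave the source.
sources = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
    [(1.0, 2.0), (4.0, 8.0)],
    [(2.0, 4.0), (5.0, 10.0), (6.0, 12.0), (3.0, 6.0)],
]

global_weight = 0.0
for _ in range(50):  # communication rounds
    updates = [local_update(global_weight, data) for data in sources]
    global_weight = federated_average(updates)

print(round(global_weight, 3))
```

In this sketch the central side only ever sees pairs of (weight, sample count), never the readings themselves, which is the property the federated approach relies on.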

One of the main applications of this AI-based technology is protecting the security and privacy of personal data, which is collected around the world every second. This issue became extremely important after the adoption of several data protection regulations, such as the GDPR in the European Union, the CCPA in California (USA), and the PDPA in Singapore. They require transparent processing of personal data with an explicitly stated purpose and the consent of the data subject.

In a smart home, the data sources are the devices in each apartment: the alarm clock, the bathroom faucet, the underfloor heating, and the lights. In the traditional approach, all data from each apartment is collected in a centralized repository and used to train a model (such as a neural network). The trained model is then transmitted back to the smart home control system.

When the alarm goes off, such a model "knows" that the heating should start warming up, the bathtub should be filled, and the lights in certain rooms should turn on. On the one hand, data collection is necessary to train such a model, because the more data there is, the smarter the model.

On the other hand, information about you (when you get up, when you go to the bathroom, when you eat, and so on) becomes available to someone else, and you do not know how it will be used. Under the principles of federated learning, the data does not leave your apartment.

ETU "LETI" scientists tested systems from different developers: Google, WeBank, Baidu, the OpenMined community, and others. The authors conducted a series of experiments with them on three data sets.

The first contained the parameters of a moving passenger car (average speed, engine load, etc.) and was used to assess the driving style, the road surface, and the traffic state. The second included similar signal data for dump trucks, and its analysis provided information about vehicle operation. Finally, the third set consisted of X-ray images from 5,232 patients (3,383 of the images showed signs of pneumonia). Its analysis allowed the researchers to distinguish sick people from healthy ones.

"We compared all currently available open-source federated learning frameworks and evaluated their capabilities. Our approach proved to be effective in all three cases. However, not all of them are suitable for industrial development now. Some systems are still in their early stages and not ready for widespread use. Nevertheless, the federated learning technology itself is extremely relevant and rapidly developing," says Ivan Kholod, Dean of the Faculty of Computer Science and Technology at ETU "LETI."
