Anastasius Gavras
Eurescom GmbH
The rapid evolution of communication technologies, coupled with the imminent deployment of sixth-generation (6G) wireless networks, underscores the importance of harnessing Artificial Intelligence (AI) and Machine Learning (ML) techniques to address emerging challenges and optimize network performance. AI/ML have emerged as powerful tools capable of revolutionising various facets of communication networks, offering solutions to issues in network planning, diagnostics, and optimization. As the telecommunications landscape continues to evolve, understanding the potential applications of AI/ML in communication networks becomes imperative for industry stakeholders, researchers, and policymakers alike.
This article delves into the multifaceted role of AI/ML in shaping the future of telecommunication networks, and provides recommendations concerning the future availability of large data sets, which are necessary for training and benchmarking algorithms. By elucidating the transformative potential of AI/ML in telecommunications, this article seeks to provide insights into the future trajectory of network research and innovation in this area.
Current Use of AI/ML
Artificial Intelligence (AI) and Machine Learning (ML) are currently being investigated for application in future generations of communication networks, such as 5G and 6G. A broad family of neural networks, ML techniques typically used to model complex relationships between the input and output parameters of a system or to find patterns in data, is being considered to solve various networking challenges of future networks; this family includes feed-forward neural networks, deep neural networks, recurrent neural networks, and convolutional neural networks. The related challenges can be grouped into the following three main areas:
- Network Planning,
- Network Diagnostics/Insights,
- Network Optimization and Control.
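To illustrate how the neural networks mentioned above model input/output relationships, the following minimal feed-forward network with one hidden layer learns the XOR function via backpropagation. This is a pure-Python sketch for illustration only; the data set and all hyperparameters are invented for the example and are not drawn from any networking system.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy data set: XOR, a relationship no linear model can capture.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
Y = [0.0, 1.0, 1.0, 0.0]

H = 4  # hidden units (chosen arbitrarily for the example)
W1 = [[random.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(H)]
b1 = [0.0] * H
W2 = [random.uniform(-1.0, 1.0) for _ in range(H)]
b2 = 0.0

def forward(x):
    """One forward pass: input -> hidden layer -> scalar output."""
    h = [sigmoid(W1[j][0] * x[0] + W1[j][1] * x[1] + b1[j]) for j in range(H)]
    o = sigmoid(sum(W2[j] * h[j] for j in range(H)) + b2)
    return h, o

def mse():
    return sum((forward(x)[1] - y) ** 2 for x, y in zip(X, Y)) / len(X)

lr = 0.5
loss_before = mse()
for _ in range(5000):  # stochastic gradient descent with backpropagation
    for x, y in zip(X, Y):
        h, o = forward(x)
        d_o = (o - y) * o * (1.0 - o)                 # output-layer error term
        for j in range(H):
            d_h = d_o * W2[j] * h[j] * (1.0 - h[j])   # hidden-layer error term
            W2[j] -= lr * d_o * h[j]
            W1[j][0] -= lr * d_h * x[0]
            W1[j][1] -= lr * d_h * x[1]
            b1[j] -= lr * d_h
        b2 -= lr * d_o
loss_after = mse()
```

The same training loop generalises to the deeper and structured variants (recurrent, convolutional) listed above; only the layer topology changes.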
In Network Planning, attention is given to AI/ML-assisted approaches to guide planning solutions. As beyond-5G networks become increasingly complex and multi-dimensional, with parallel layers of connectivity and a trend towards disaggregated deployments in which a base station is distributed over a set of separate physical network elements, the number of services and network slices that need to be operated keeps growing. This climbing complexity renders traditional approaches to network planning obsolete and calls for their replacement with automated methods that use AI/ML to guide planning decisions. In this respect, solutions in two main areas emerge:
- network element placement – finding the optimum constellation of base stations, each located to provide the best network performance (coverage, terminal density and mobility, required hardware/cabling, overall costs)
- Cloud-RAN cluster dimensioning – providing an optimal allocation of baseband unit (BBU) functions.
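As a simple illustration of the first planning task, base-station placement can be approximated by clustering user locations and placing a site at each cluster centre. The sketch below uses Lloyd's k-means algorithm on synthetic user coordinates; the hotspot positions and site count are assumptions made for the example, and real planning tools would additionally account for coverage, cabling, and cost constraints:

```python
import random

random.seed(42)

def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-point initialisation:
    centroids approximate candidate base-station sites."""
    centroids = [points[0]]
    while len(centroids) < k:  # spread the initial seeds apart
        centroids.append(max(points, key=lambda p: min(
            (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each user to the nearest site
            j = min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                            + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        for j, cl in enumerate(clusters):
            if cl:  # move each site to the centre of the users it serves
                centroids[j] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids

# Two synthetic user hotspots around (0, 0) and (10, 10).
users = ([(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]
         + [(random.gauss(10, 1), random.gauss(10, 1)) for _ in range(100)])
sites = kmeans(users, 2)
```

Each returned site ends up near the centre of one user hotspot, which is the intuition behind the placement solutions discussed above.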
In Network Diagnostics, attention is given to the tools that can autonomously inspect the network state and trigger alarms when necessary. The specific investigations target:
- network characteristics forecast solutions (forecasting mobile traffic for Quality of Experience and Quality of Service improvement and maintaining required service level agreement),
- precise user localizations methods, and
- security incident identification and forecast (e.g., real-time detection of distributed denial-of-service [DDoS] attacks).
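As a minimal sketch combining the forecasting and incident-detection ideas above, the following detector compares each traffic sample against an exponentially weighted moving-average (EWMA) forecast and raises an alarm when the forecast error exceeds a multiple of the running mean absolute deviation. The traffic series, smoothing factor, and threshold are invented for illustration; a production DDoS detector would be considerably more elaborate:

```python
def detect_anomalies(series, alpha=0.3, threshold=3.0, warmup=5):
    """Flag samples whose one-step-ahead forecast error exceeds
    `threshold` times the running mean absolute deviation (MAD)."""
    forecast = series[0]
    mad = 0.0
    alarms = []
    for t, x in enumerate(series):
        err = abs(x - forecast)
        if t >= warmup and err > threshold * mad:
            alarms.append(t)
        # Update the statistics after the decision (one-step-ahead forecast).
        mad = alpha * err + (1.0 - alpha) * mad
        forecast = alpha * x + (1.0 - alpha) * forecast
    return alarms

# Synthetic per-second packet counts with a sudden flood at t = 7.
traffic = [100, 102, 98, 101, 99, 103, 100, 500, 101, 99]
```

The `warmup` period prevents alarms while the deviation estimate is still settling; the same pattern underlies more sophisticated ML-based detectors that replace the EWMA with a learned traffic model.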
In Network Optimization and Control, attention is given to the different network segments, including radio access, transport / fronthaul (FH) / backhaul (BH), virtualization infrastructure, end-to-end (E2E) network slicing, security, and application functions. Among the applications of AI/ML in radio access, the slicing in multi-tenant networks, radio resource provisioning and traffic steering, user association, demand-driven power allocation, joint MAC scheduling (across several gNBs), and propagation channel estimation and modelling are being investigated and discussed. The considered solutions can operate in a real-time, near-real-time, or non-real-time manner, depending on the time-scale needs of the specific application.
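To give one concrete example of the demand-driven power allocation mentioned above, the classic water-filling scheme assigns more transmit power to channels with better gain. The sketch below is a textbook formulation with invented channel gains, not an implementation from any particular system:

```python
def water_fill(gains, power_budget):
    """Allocate a total power budget across parallel channels: each active
    channel is filled up to a common water level above its floor 1/g,
    so higher-gain channels (lower floors) receive more power."""
    floors = sorted(1.0 / g for g in gains)
    for k in range(len(floors), 0, -1):  # try the k best channels, then fewer
        level = (power_budget + sum(floors[:k])) / k
        if level >= floors[k - 1]:       # all k floors are under water
            break
    return [max(level - 1.0 / g, 0.0) for g in gains]

# Three channels with decreasing quality; the worst one is switched off.
powers = water_fill([1.0, 0.5, 0.1], power_budget=2.0)
```

The loop finds the largest set of channels whose floors lie below the common water level; very poor channels receive zero power, which is exactly the demand-driven behaviour the optimization solutions above aim to learn rather than compute in closed form.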
Availability of Data Sets
For the application of AI/ML in communication networks to flourish, the availability of reliable data sets is a crucial prerequisite. Only this will enable the efficient use of AI/ML algorithms, allow the validation of AI/ML-based solutions, and support system troubleshooting. The success of AI/ML models in a variety of network applications and services relies heavily on the use of network data at diverse levels of granularity. Publicly available real and simulated benchmark data sets play an important role in model development and evaluation, as well as in fair comparison with state-of-the-art solutions.
Training AI/ML algorithms requires large amounts of data, which are typically not readily available, for many reasons. In research projects, the amount of data that can be generated with prototype systems and experimentation use cases often does not suffice to train the algorithms efficiently. Therefore, access to "ready-to-use" large data sets of network traffic data from the different network domains, in publicly accessible repositories, is required for the benefit of all involved researchers and developers, facilitating the:
- development of a comprehensive testing and evaluation framework for assessing the performance, reliability, safety and explainability of AI/ML systems
- collaboration and knowledge sharing through building, contributing to and maintaining an open repository of data and models supporting the development of AI/ML-based solutions
- integration and interoperability testing of various solutions, e.g. by including outcomes of the numerous projects using or developing AI/ML-based solutions in the public domain, in order to address scalability questions, or to consider specific use cases for vertical industries, etc.
So far, however, no sustainable initiative has emerged that attempted to create the ultimate reference networking-related AI/ML data sets. The main reasons are:
- limited willingness to share data – with a few notable exceptions, data and model owners do not share their assets, for business, data-privacy, and similar reasons, and assured methods to sanitize data against these concerns are not trusted
- contextual variations of available data – spanning along various network technology domains and vertical industry sectors, each with its own unique requirements and challenges
- pace of technological advancements, where AI/ML technologies currently evolve very rapidly, and the landscape changes significantly within a short period
- available resources and duplication of effort in creating sustainable repositories of data sets, or in consolidating individually created data sets (e.g. by research projects in publications) within publicly available repositories.
Conclusions
In the context of European projects, telecom operators have proposed the idea of building a repository of open data sets, so far with very limited success. Choosing a pre-competitive environment, such as a European framework programme, to build an open repository seems attractive and could be the best setting in which to overcome the potential concerns. Considering the heavy dependency of AI/ML on such open training data, it is worthwhile to consider launching an open initiative for creating such a repository in the near future, while considering the following obstacles:
- Data privacy and security considerations are critical when it comes to sharing and releasing data for AI/ML training. Protecting sensitive information and ensuring compliance with privacy regulations restricts the availability and release of data. Stakeholders hesitate to share data due to concerns about unauthorized access, misuse, or breaches of privacy.
- Data bias and fairness, where data used for training AI/ML models can reflect biases present in the data collection process or societal contexts.
- Data access and availability imbalance, where data availability may not be evenly distributed across different network technology domains, vertical industry sectors, or geographic regions.
- Data ownership and proprietary restrictions limit the availability and release of data.
- Data quality and reliability, where data inconsistencies, biases, inaccuracies, or insufficient quantities impact the performance and generalization of the trained models.
- Gathering and aggregating large-scale data for AI/ML training is complex and time-consuming. It involves collaboration among multiple sources, data cleaning and pre-processing, and addressing legal and ethical considerations.
Establishing an open repository of network data sets, and continuously maintaining it so that it remains accessible and keeps gathering the latest relevant data for training and benchmarking algorithms by all involved researchers, remains one of the main challenges in applying the newest AI/ML techniques to networking.