Telco goes Cloud. Really?

June 14, 2021

We have been living in an era of digital transformation for many years now, driven by a continuously increasing pace of innovation and new disruptive business models. The speed at which a new business idea reaches the market as a digital service is one of the key success factors, but it is worth nothing if you cannot scale the new service as soon as it gains market traction and demand increases.
Another aspect is minimizing the initial investment needed to build a new digital service, so that you can test market acceptance and collect first feedback with a minimum viable product. Using somebody else’s compute, networking and storage infrastructure as a service and paying only for usage makes it much easier to start a new business, and results in a very different cash flow compared to building your own data center first.
These are just two of the many reasons why The Cloud has become the environment for many different businesses and industries in today’s IT landscape.

The Telecommunications industry (aka Telco), with its many Communication Service Providers (CSPs) worldwide, is one of these industries. In this article I describe my view on how the Telco industry is adopting and evolving cloud technology, processes and culture to transform its business of producing and selling communication and multimedia services.
Since the outbreak of the Covid-19 pandemic and its impact on our professional, social and private lives, I think the value of reliable communication services cannot be overestimated: they allow many of us to work remotely in a safe environment.

A bit of history

Without going too much into the details, I just want to list some important cornerstones here, based on [1]. Ignoring the early phases, where the cloud metaphor was used to describe the changes in responsibility between service providers for compute resources and users, the basic principles of the cloud as we know it today were created at the beginning of the 2000s. AWS was founded in 2002 and, from 2006 on, provided first infrastructure services like EC2 (Compute) and S3 (Storage), enabling developers to easily build applications by just using these services, aka Infrastructure as a Service (IaaS). Google followed in 2008 with Google App Engine, which was already a first kind of Platform as a Service (PaaS), meaning a fully maintained infrastructure and a deployment platform for users to create web applications using common languages/technologies such as Python, Node.js and PHP. Microsoft followed in 2010 with Microsoft Azure, and started to provide, among other services, its cash cow Microsoft Office as Software as a Service (SaaS). All three companies evolved their cloud services into what we today call the Hyperscalers, also thanks to companies like Netflix, LinkedIn, Salesforce, Facebook, Twitter or AirBnB, to name just a few, who built their own businesses and related services on the Hyperscalers’ cloud infrastructures.
In 2010 Rackspace and NASA founded an open source project for cloud computing called OpenStack, with the intention of helping organizations offer cloud-computing services running on standard hardware. A huge community of contributors worldwide started to work on creating an industry standard for building private cloud environments or public cloud infrastructure services. Nowadays OpenStack is “The Most Widely Deployed Open Source Cloud Software in the World”.

Starting in 2012, ETSI created a huge set of specifications to transform Telco networks into software-based networks using cloud-based infrastructure and virtualized network functions and services, under the term Network Function Virtualization (NFV). The Telco industry chose OpenStack as the de-facto standard for IaaS to implement NFV infrastructure solutions. The following diagram illustrates the ETSI NFV Architecture Framework and the related specification areas.

From a technology perspective, cloud computing in IT and Telco uses different virtualization technologies to abstract from the underlying compute hardware (hypervisors like KVM, Xen, ESXi or Hyper-V, or container technologies based on OCI), as well as Software Defined Networking and Software Defined Storage. Together they provide all resources to end users in a self-service manner via APIs and allow Telco Service Providers to operate their networks in a highly automated way. Linux is the dominant operating system in the cloud and hosts manifold types of applications. Based on Linux, container technology evolved rapidly, and Kubernetes is now the industry standard for container orchestration. Kubernetes services are by now available on almost all Public Clouds, and as enterprise products like Red Hat’s OpenShift Container Platform for Hybrid and Multi Cloud deployments (i.e. on-premises on Bare Metal or virtualized, or on many Public Clouds), helping companies on their digital transformation journey with an open hybrid multi cloud strategy. Kubernetes is also set as a core component of new Telco platforms to manage Containerized Network Functions for 5G or Open RAN (Radio Access Network).

Telco Cloud – what does it mean?

One short and crisp description from Red Hat [2]: Telco cloud is a software-defined, highly resilient cloud infrastructure that allows telcos to add services more quickly, respond faster to changes in demand, and centrally manage their resources more efficiently. It is one of the key foundational components for transforming a telecommunications company into a digital service provider.

In contrast to Hyperscalers, Telco clouds are not primarily built to offer IaaS, PaaS or SaaS to the market, unless we count network services like voice or mobile broadband as SaaS offerings, or look at Public Cloud offerings like Open Telekom Cloud from Deutsche Telekom.
Telco clouds are enablers for Telco companies to become more agile, to implement new services or features faster and to operate the network more efficiently. With more traditional Telco network technology, provided by so-called Network Equipment Providers (NEPs) or Independent Software Vendors (ISVs), Telcos had, and often still have, the challenge of integrating complex physical network functions into network services, often with proprietary operations, administration and maintenance concepts. This requires highly specialized experts to operate and maintain these heterogeneous solutions. It usually does not allow sharing of hardware resources, because most physical network functions come as closed boxes, similar to appliances in the IT world, even though most of them are complex software systems that often already run on x86 COTS servers.

One of the main targets of NFV was to decouple the hardware from the network function software by introducing a virtualization layer. This allows more flexible deployment of new software releases and easier capacity expansion on a more standardized infrastructure, leveraging cost savings from economies of scale. Hand in hand with NFV, Software Defined Networking (SDN) was introduced to enable highly automated network operations.
But when looking under the hood of existing physical network functions, it quickly turned out that OpenStack in its early stages did not provide all the capabilities needed to host virtualized network functions cost-effectively, especially when it came to matching the data path performance of physical network functions. This forced the industry and open source communities to innovate and to implement specific features like Enhanced Platform Awareness, SR-IOV, DPDK, vCPU pinning, Huge Pages or a Realtime Kernel, just to name a few. As mentioned above, Telco cloud is a software-defined, highly resilient cloud infrastructure.
You may ask what that means: why does a Telco cloud need to be highly resilient?
At least some Telco services are, in contrast to many IT services, systemically relevant for a country and usually regulated by government policies. Therefore the availability requirement for Telco services like Voice or Emergency Call Services is usually 99.999% (the famous Five Nines), in contrast to IT or Public Cloud services with Service Level Agreements committing to 99.99% or lower. Achieving 99.999% service availability requires specific architecture capabilities like redundancy and automatic failover, resilience and self-healing functions to handle error situations, overload protection and so on. It also requires in-service software upgrade mechanisms to roll out changes (new features, patches, security fixes etc.) without shutting down the service, as 99.999% means about 5 minutes of downtime per year, no matter whether it is planned or unplanned. This is why Telcos often made, and still make, software changes in 3-4 hour maintenance windows during the night, when service usage is lower and a potential problem has less impact on end users. Fast recovery mechanisms and safe rollback are further capabilities Telco solutions need to meet these high availability requirements.
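The downtime budget behind these availability figures follows from simple arithmetic. Here is a minimal sketch, assuming an average year of 365.25 days; the function name is my own and purely illustrative:

```python
# Assumption: an average year of 365.25 days (leap years included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, pct in [("four nines", 99.99), ("five nines", 99.999)]:
        minutes = allowed_downtime_minutes(pct)
        print(f"{label} ({pct}%): {minutes:.2f} minutes per year")
```

Five nines leaves roughly 5.3 minutes per year, four nines roughly 53 minutes. A single 3-4 hour maintenance window with the service switched off would therefore exceed the yearly budget many times over, which is why in-service upgrades matter so much.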
The introduction of cloud brought some fundamental paradigm changes in how to design, build and operate so-called cloud-native apps. Examples are the Twelve-Factor App methodology, the pets vs. cattle analogy and the Stateless design principle for Microservices-based applications. While it is relatively easy to apply these principles to new applications, for the Telco industry it was, and still is, a big challenge and a huge effort to adapt existing network functions, which are complex software systems with hundreds of staff-years of development effort behind them. The Stateless principle in particular is challenging for network functions, at least some of which are Stateful Entities as specified by standardization bodies like ETSI or 3GPP. Telco Network Services are highly standardized to allow interoperability between Network Functions from different vendors, and use e.g. Diameter-based protocols on different interfaces between network entities. These protocols require storing a so-called session state and maintaining it over the lifetime of a session and across different message flows. Adapting Network Functions to Stateless principles therefore often requires re-architecting and redesigning them to externalize the state, which raises the new challenge of finding cost-efficient technical solutions that meet the low latency, high performance and data consistency needs of state handling.
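To illustrate what externalizing the state means, here is a minimal sketch. All class and field names are hypothetical, and the in-memory dictionary only stands in for an external, replicated low-latency data store. It does not model any real Diameter stack.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SessionState:
    """Hypothetical per-session record; a real session carries far more data."""
    session_id: str
    request_count: int = 0


class InMemorySessionStore:
    """Stand-in (assumption) for an external, replicated session database."""

    def __init__(self) -> None:
        self._data: Dict[str, SessionState] = {}

    def load(self, session_id: str) -> Optional[SessionState]:
        return self._data.get(session_id)

    def save(self, state: SessionState) -> None:
        self._data[state.session_id] = state


class StatelessWorker:
    """Keeps no session data between calls; each request reads and writes the store."""

    def __init__(self, name: str, store: InMemorySessionStore) -> None:
        self.name = name
        self._store = store

    def handle(self, session_id: str) -> int:
        # Load the session from the store, or start a new one if none exists.
        state = self._store.load(session_id) or SessionState(session_id)
        state.request_count += 1
        self._store.save(state)
        return state.request_count


store = InMemorySessionStore()
worker_a = StatelessWorker("a", store)
worker_b = StatelessWorker("b", store)

worker_a.handle("session-1")          # first message lands on worker a
count = worker_b.handle("session-1")  # worker b continues the same session
```

Because neither worker holds the session itself, any instance can process the next message of a session, and an instance can be replaced without losing it. The price is a store access on every message, which is exactly where the latency, performance and consistency requirements show up.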

5G – Cloud Native from the Core to the Edge

5G, the fifth generation technology standard for cellular networks specified by the 3rd Generation Partnership Project (3GPP), is often described as both evolutionary and revolutionary. But what does that mean?

Let’s look back to the 1970s, when the introduction of personal computers started a rapid development of computers and electronic devices. This led to the creation of the internet as well as new digital communication services for fixed and mobile telecommunication networks, and has been referred to as the 3rd industrial revolution.

In mobile telecommunication there was an evolution from 1G, which provided only analog voice calls, to 4G Long Term Evolution (LTE) mobile networks, which offer service providers the technology to meet the constantly and rapidly increasing demand for more bandwidth, flexibility and reliability. LTE has a flat, All-IP network architecture, which supports higher capacity and greater speed for mobile broadband access than the generations before. LTE facilitates an enhanced internet experience on application-centric mobile devices like smartphones and tablets and has dramatically changed the way people access the internet via email, social media, music or video streaming applications.
This evolution from 1G to 4G was mainly driven by increasing demand: for more mobile voice capacity (2G), for mobile data capabilities (3G) and for faster and better mobile broadband (4G), giving people higher-speed access to the internet so they can work, play or consume multimedia services wherever they are.

The future of mobile telecommunication is likely to be different with the introduction of 5G, which will affect different areas of our lives. 5G networks are designed to provide data speeds in excess of 10 Gbit/s and ultra-reliable, extremely low latency connections in a secured and trusted environment. Envisioned as a social game changer that goes far beyond consuming e.g. high-definition video, 5G enables the widespread use of a range of technologies, including but not limited to artificial intelligence (AI) and machine learning (ML), cloud, robotics and virtual reality (VR).
The blend of physical, digital and biological domains enabled by these technologies will significantly impact different industries and the global economy. Therefore 5G has been referred to as the 4th industrial revolution.
The following picture illustrates a set of vertical industry use cases, which were deeply analyzed to derive specific requirements for each use case.

These requirements were consolidated into basic 5G Telco Service Classes called:

  • enhanced Mobile Broadband (eMBB)
  • Ultra-Reliable Low-Latency Communication (URLLC)
  • massive Machine Type Communications (mMTC)
  • Vehicle to Everything (V2X)

The following diagram describes key technical capabilities to evolve mobile broadband services with higher E2E Service Availability, significantly higher Scalability and significantly shorter Service Deployment Time.

As mentioned above, Cloud technology is a key enabler and a fundamental basis for the 5G network architecture. The overall architecture in the following diagram shows Cloud infrastructure management together with Software Defined Networking (SDN) as the basis on the Resource & Functional level, providing a programmable infrastructure for the Network and Services Level. This infrastructure has to support the concept of Network Slicing: automatically instantiating so-called Slices, separated networks with the service classes described above (e.g. mMTC), to address the use cases from different industry segments mentioned above (e.g. Factory of the future). This is supported by a Service-based architecture of all 5G Network Functions, which allows an API-driven, Microservices-based cloud native architecture.

Another important aspect is the extension of the Cloud from the Core (a small number of centralized Data Centers) to the Edge (a high number of smaller, highly distributed locations with space, power and cooling constraints). Low-Latency communication in particular requires bringing services close to the end user, because otherwise data needs too much time to travel over longer distances (the speed of light is finite). Processing massive amounts of data close to the data source also provides new opportunities for real-time services and allows data aggregation for a much more efficient transfer to central processing and storage services. That means 5G is also one of the drivers for Edge computing, which can be seen as a logical extension of the initially more centralized Cloud computing.
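How much distance alone contributes to latency can be estimated with a short calculation. This is a minimal sketch, assuming signals in optical fibre travel at about two thirds of the speed of light in vacuum (a common rule of thumb). It ignores routing detours, queuing and processing time, so real latencies are higher.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
FIBRE_VELOCITY_FACTOR = 2 / 3      # assumption: refractive index of about 1.5


def round_trip_ms(distance_km: float) -> float:
    """Return the round-trip propagation delay in milliseconds over fibre."""
    speed_km_s = SPEED_OF_LIGHT_KM_S * FIBRE_VELOCITY_FACTOR
    return 2 * distance_km / speed_km_s * 1000


if __name__ == "__main__":
    for distance_km in (10, 100, 1000):
        print(f"{distance_km:>5} km: {round_trip_ms(distance_km):.2f} ms round trip")
```

Under these assumptions, a central data center 1,000 km away costs about 10 ms per round trip from propagation alone, while an edge site 10 km away costs about 0.1 ms.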

Telcos are large Enterprises

Most Telco companies are large enterprises with their own enterprise IT, and the same IT trends can be seen in Telco companies as in enterprises from any other industry segment. Telco IT organizations face the same challenge of keeping the lights on while modernizing and innovating with a limited budget. They often try to address this with more efficient IT operations, by introducing Cloud infrastructure and automation or by moving at least parts of their IT systems to the Public Cloud.
Besides their Core Network Systems and related Operational Support Systems (OSS), Telcos also operate so-called Business Support Systems (BSS), which are often based on industry-standard IT technology and applications. They provide services like Charging and Billing, Customer Relationship Management or Product Catalog and Order Management to support the Telco Core business. The following picture illustrates the different segments of a typical Telco company.

Looking holistically at the different IT and Network segments, it is obvious that Telcos can benefit from a Horizontal Telco Cloud with an Open Hybrid Cloud strategy, providing a standardized operating environment and automation technology stack at scale, regardless of the underlying infrastructure. For the introduction of Edge clouds in particular, it is important to provide an application operating environment that is compatible with the one in centralized clouds, so that developers can build applications once and deploy them wherever the business needs them. As Edge computing introduces new challenges, especially for managing a high number of highly distributed systems, a modern application design should centralize wherever possible and distribute only what is absolutely necessary. Following this design principle leads to distributed application architectures that benefit from a standardized application operating environment; different operating environments would introduce additional complexity.

Telco and Open Source

While most Network Services were and still are built on proprietary products from so-called Network Equipment Providers like Ericsson, Nokia or Juniper Networks, or ISVs like Mavenir, Metaswitch or Affirmed Networks, Open Source software is by now used in many of these products, either embedded or as an infrastructure component of Telco solutions. It started with Linux, which became the standard Operating System for physical Telco Network Functions ten to fifteen years ago. As mentioned above, another important milestone was the introduction of Network Function Virtualization (NFV), which established OpenStack as the de-facto standard NFV infrastructure stack. For more details on OpenStack see also the blogpost of my colleague Wolfgang Marx.

In that context we could also see how collaboration started between the Telco community, which worked and still works with standardization groups in a more waterfall-oriented way, and the Open Source community, which works in a more agile way with fast prototyping and short feedback cycles. By now, Open Source community projects like Open Network Automation Platform (ONAP) and Open Platform for NFV (OPNFV) exist to work on specific Telco platforms and solutions, with members from both the Telco industry and the Open Source community. The O-RAN Alliance is another example of a new way of collaborating in the Telco industry to create standards for open solutions.

There are many other examples of how technology from the Open Source community is used in Telco. The most prominent is, in my opinion, Kubernetes as the orchestration platform to manage containerized workloads at scale in a very efficient way, for example by running Containers on Bare Metal and by having highly standardized life cycle management built into the platform stack through technologies like the Operator Framework. From now on, Network Functions will be built as Containerized Network Functions, and 5G will drive the introduction of Kubernetes-based Container Platforms as the standard NFV infrastructure for Telco Networks.

Automation is key for efficient operations of cloud-based network services. Technologies like Chef, Puppet or Ansible are used, the latter of which provides a comprehensive library for network automation.
CI/CD (Jenkins), Git and other tools are by now standard in every Telco Network operations team, and modern Kubernetes-native operations models like GitOps are becoming more and more popular and will help Telcos transform the way they operate their network services.

One example of a modern Telco Network solution using Open Source based Cloud technologies is the NIMS project at Deutsche Telekom, see [3], [4] and [5].

Telco goes Cloud. Really!

“The network is the computer”, which you may see as a legendary phrase or a visionary statement from Sun Microsystems from 1984, has brought us to the cloud as we know it today. And cloud services as we use them today would not exist without the Telco Network, which provides access for end users from different devices with different connectivity, be it an old-fashioned DSL line via copper cable or a broadband fibre optic connection at home, or the variety of smartphones, tablets or pocket-sized mobile network access points that connect easily and nearly everywhere to the internet via the Radio Access Network of our Telco Service Provider.

With 5G the journey will continue and Telcos will transform their networks, their processes and their company culture to become cloud-native, whatever this buzzword really means. With Open Source, Telcos will benefit from the innovation power coming from the communities and will have the choice to select the right technology stack and partners. But the competition is not sleeping: Hyperscalers are investing massively to enter the Telco market and to position their proprietary infrastructure, technology stacks and cloud services to extend their share of the value chain. It will be interesting to see how this evolves and where the already existing “coopetition” between Telcos and Hyperscalers will lead, but also how governments will influence it through regulation, legal requirements and state investments in these systemically relevant services.

Telco cloud is already real.

References

[1] https://en.wikipedia.org/wiki/Cloud_computing

[2] https://www.redhat.com/en/topics/cloud-computing/what-is-telco-cloud

[3] https://www.youtube.com/watch?v=XUMe_xIUVPk

[4] https://www.youtube.com/watch?v=JWONP0UGy-w

[5] https://www.analysysmason.com/research/content/white-papers/DT-cloud-NIMS-rma16/