How to achieve robust, high-availability communication for critical energy and infrastructure applications

It's essential that your communication solutions operate with high availability, even in the face of device errors and maintenance requirements. In industries such as manufacturing, high availability helps avoid downtime, along with its associated costs and the risk of putting your operations behind schedule. In the electric power sector, high-availability communication is crucial for delivering a reliable power supply and maintaining safety.


With the right standards, protocols and equipment, such a high-availability network architecture is attainable. In this guide, we discuss robust, high-availability communication solutions with zenon.


Local SCADA system design: The need for high availability

High availability is crucial in the electric power industry. It ensures the continuous delivery of power and the safe operation of equipment. The communication network must continue operating even if a piece of equipment fails or needs to be shut down for maintenance work. To ensure high availability, a redundant architecture is often used, with several Supervisory Control and Data Acquisition (SCADA) servers that each have multiple Ethernet communication interfaces.


A platform like zenon provides powerful functions for redundant operation. The primary server and the standby server are constantly synchronized and maintain a redundant connection to the related devices. If the primary server breaks down in an uncontrolled way, the standby server takes over without any loss of data. Preventive scenarios are also supported: “Rated Redundancy” in zenon makes it possible to observe specific metrics within the system. If any value starts heading toward a critical state, a controlled switchover can be triggered automatically. This may help to mitigate critical system states or even prevent them from being reached.
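To illustrate the general hot-standby principle (not zenon's internal implementation, which is not described here), the following Python sketch assumes that the primary pushes every value change to the standby together with a heartbeat, and that the standby takes over once heartbeats stop for longer than a timeout. The class name, the timeout and the variable names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StandbyServer:
    """Hypothetical hot-standby model, not zenon's implementation."""

    heartbeat_timeout: float  # seconds without a heartbeat before takeover (assumed)
    last_heartbeat: float = 0.0
    is_active: bool = False
    process_image: dict = field(default_factory=dict)  # mirrored variable values

    def on_sync(self, timestamp: float, updates: dict) -> None:
        # The primary pushes every value change together with a heartbeat,
        # so the standby's process image stays identical to the primary's.
        self.process_image.update(updates)
        self.last_heartbeat = timestamp

    def check(self, now: float) -> bool:
        """Return True once the standby has taken over."""
        if not self.is_active and now - self.last_heartbeat > self.heartbeat_timeout:
            # Uncontrolled breakdown: the primary is presumed dead.
            self.is_active = True
        return self.is_active


standby = StandbyServer(heartbeat_timeout=2.0)
standby.on_sync(0.0, {"breaker_Q1": "closed"})
standby.on_sync(1.0, {"feeder_P_kW": 420})
print(standby.check(2.5))  # False: last heartbeat only 1.5 s ago
print(standby.check(3.5))  # True: 2.5 s of silence exceeds the timeout
print(standby.process_image)  # {'breaker_Q1': 'closed', 'feeder_P_kW': 420}
```

Because the standby already holds the complete process image when it takes over, no values have to be re-read before it can continue.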


The important role of standard protocols

Today, a number of protocol standards are discussed in the context of automation and control solutions in the energy domain. Because it is simple to apply in smaller facilities, MODBUS was often used, and can still be found, for local as well as remote monitoring and control. Thanks to its universal design, OPC UA can also be found in some installations. However, the majority of today's applications and retrofit initiatives rely on DNP3, IEC 60870 or IEC 61850, or on closely related protocols. These protocols were created specifically for mission-critical energy environments. Therefore, in addition to providing appropriate data models and data-exchange services to the application, they use well-conceived mechanisms under the hood, such as confirmed data delivery and connection supervision.


For example, the DNP3 protocol was designed specifically to work in remote areas with low bandwidth or even intermittent connection losses. A consistent handshake between the DNP3 master and the DNP3 outstation ensures that a set of events has been received properly by the master. Only after the master confirms receipt do both communication partners consider the transaction successful, and only then does the outstation clear those events from its local send buffer. Importantly, the protocol makes very economical use of network resources and remains functional even with limited bandwidth.
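The buffer-until-confirmed principle described above can be sketched in a few lines of Python. This is a deliberately simplified model under stated assumptions: the class and its methods are hypothetical, and real DNP3 additionally uses application-layer sequence numbers, confirmation timeouts and event classes, none of which are modeled here.

```python
from collections import deque


class OutstationEventBuffer:
    """Simplified model of a DNP3 outstation event buffer (hypothetical class)."""

    def __init__(self, max_events_per_response: int = 10):
        self.buffer = deque()  # events not yet confirmed by the master
        self.max_events = max_events_per_response
        self.in_flight = []  # events sent in the last, still unconfirmed response

    def add_event(self, event) -> None:
        self.buffer.append(event)

    def build_response(self) -> list:
        # Always send the oldest buffered events. If the previous response
        # was lost, the same events are simply sent again.
        self.in_flight = list(self.buffer)[: self.max_events]
        return list(self.in_flight)

    def on_confirm(self) -> None:
        # Only the master's confirmation lets the outstation discard the events.
        for _ in self.in_flight:
            self.buffer.popleft()
        self.in_flight = []


outstation = OutstationEventBuffer()
for event in ("Q1 open", "Q2 closed", "U_L1 high"):
    outstation.add_event(event)

first = outstation.build_response()  # response lost on the way: no confirmation
retry = outstation.build_response()  # the same events are sent again
assert retry == first
outstation.on_confirm()  # master confirms receipt
print(len(outstation.buffer))  # 0
```

Events that occur while a response is still unconfirmed are appended behind the in-flight ones, so a confirmation never discards an event the master has not seen.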


The example above shows that communication protocols like DNP3, IEC 60870 or IEC 61850 can also help determine the current status of communication. In a SCADA system, it is crucial to know at all times whether a data link or the quality of data is bad, a connection has been lost, or a device isn't behaving as expected.
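As an illustration of how a protocol can reveal a dead link even when no process data is flowing, the following Python sketch borrows the idea of test frames from IEC 60870-5-104: after an idle period (timer t3), a test frame is sent, and if it is not confirmed within t1, the connection is flagged as lost. The class is hypothetical, and the values of 20 s and 15 s are the commonly cited protocol defaults, used here only as assumptions.

```python
from enum import Enum


class LinkState(Enum):
    OK = "ok"
    TESTING = "testing"  # test frame sent, waiting for the confirmation
    LOST = "lost"


class LinkSupervisor:
    """Hypothetical supervisor; timer names borrowed from IEC 60870-5-104."""

    def __init__(self, t1: float = 15.0, t3: float = 20.0):
        self.t1 = t1  # max. wait for a test frame confirmation (s)
        self.t3 = t3  # idle time after which a test frame is sent (s)
        self.state = LinkState.OK
        self.last_rx = 0.0
        self.test_sent_at = 0.0

    def on_frame_received(self, now: float) -> None:
        # Any received frame, including a test confirmation, proves the link is alive.
        self.last_rx = now
        self.state = LinkState.OK

    def tick(self, now: float) -> LinkState:
        if self.state is LinkState.OK and now - self.last_rx >= self.t3:
            self.test_sent_at = now  # a real driver would transmit a test frame here
            self.state = LinkState.TESTING
        elif self.state is LinkState.TESTING and now - self.test_sent_at >= self.t1:
            # No answer: flag values as invalid and raise a communication alarm.
            self.state = LinkState.LOST
        return self.state


link = LinkSupervisor()
link.on_frame_received(0.0)
for t in (10.0, 20.0, 30.0, 35.0):
    print(t, link.tick(t).value)  # ok, testing, testing, lost
```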


A closer look into protocol architectures

The OSI model (Open Systems Interconnection model) characterizes the communication functions within a protocol stack. A protocol as a whole is decomposed into functional layers, starting with the physical layer (Layer 1) and ending with the actual application functions (Layer 7). The layers in between deal with the addressing and routing of individual telegrams, as well as with grouping a set of messages into a persistent communication session. The TCP/IP stack running over Ethernet is one of the most popular, not least because it is the basis of classic Internet applications.


The protocol layers responsible for “guiding” telegrams through the different sections of the communication network offer room to improve the resilience of data exchange. Whenever several paths exist between sender and receiver, a disruption of the active path can be compensated by using an alternative path instead. Different techniques have been proposed to handle this. The Rapid Spanning Tree Protocol (RSTP), for instance, can quickly determine a new path through the network, so that an alternative path is available soon after a specific path breaks down. The Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy (HSR) are both based on the concept of sending each telegram twice through the network, via different physical paths. Even if one path is disrupted, the telegram still reaches its destination via the other.


Protocol standards for energy applications make use of these measures. The IEC 61850 standard, for example, specifically references PRP and HSR (both specified in IEC 62439-3) to enhance the resilience of communication.


RSTP vs. PRP vs. HSR

Which redundancy protocols do you need to use to achieve the necessary level of availability? Is Rapid Spanning Tree Protocol (RSTP) sufficient? Or do you need to apply the Parallel Redundancy Protocol (PRP) or High-availability Seamless Redundancy (HSR)?


Rapid Spanning Tree Protocol (RSTP)

To create redundancy, you need to provide alternative communication paths between source and destination devices. However, standard Ethernet cannot tolerate rings and loops: broadcast frames would circulate endlessly and flood the network. You therefore need to establish a default path and be able to switch to a new path if a fault occurs.


RSTP prevents loops by creating a logical tree that spans all of the switches in the network and blocking the remaining redundant links. After a network failure, a previously blocked link is activated and recovery occurs rapidly, within a few hundred milliseconds or even faster. This quick recovery helps to minimize data loss and keep the system functioning properly.
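The outcome of this process can be illustrated with a short Python sketch. It is a simplified, centralized approximation under stated assumptions: the lowest bridge ID becomes root (as in RSTP root election), all links have equal cost, and the tree is computed with a breadth-first search. Real RSTP computes the same kind of tree in a distributed way, using path costs, port roles and proposal/agreement handshakes, which are not modeled here.

```python
from collections import deque


def forwarding_links(bridges, links):
    """Return the links left forwarding in a loop-free tree; all others are blocked.

    bridges: bridge IDs; the lowest ID becomes root, as in (R)STP root election.
    links:   set of frozenset({a, b}) for links that are currently up.
    Simplification (assumed): equal link costs, computed centrally via BFS.
    """
    root = min(bridges)
    neighbors = {b: [] for b in bridges}
    for link in links:
        a, b = sorted(link)
        neighbors[a].append(b)
        neighbors[b].append(a)
    tree, visited, queue = set(), {root}, deque([root])
    while queue:
        node = queue.popleft()
        for nxt in sorted(neighbors[node]):  # deterministic visiting order
            if nxt not in visited:
                visited.add(nxt)
                tree.add(frozenset({node, nxt}))
                queue.append(nxt)
    return tree


ring = {frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 1)]}
switches = [1, 2, 3, 4]
before = forwarding_links(switches, ring)
print(ring - before)  # {frozenset({3, 4})}: blocked to break the loop
after = forwarding_links(switches, ring - {frozenset({2, 3})})  # link 2-3 fails
print(frozenset({3, 4}) in after)  # True: the blocked link takes over
```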


Parallel Redundancy Protocol (PRP)

PRP provides seamless failover and requires specific support only in the end devices; the switches within the network are standard Ethernet switches. An end device that uses PRP is referred to as a Double Attached Node for PRP (DANP). Each DANP is connected to both of the two independent networks, which can either be identical or differ in structure.


A standard device with a single network port is referred to as a Single Attached Node (SAN) and can be connected to only one of the two networks. Alternatively, a device that lacks the onboard capability to connect directly to both redundant networks can be connected via a “Redundancy Box” (RedBox), which in turn connects it to both networks.


Whenever data needs to be transmitted, a PRP device sends the same frame through both ports at the same time. The two frames travel through their respective networks and typically experience different delays along the way. The PRP unit on the receiving side accepts the first frame to arrive and discards the duplicate.
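The receiver-side duplicate discard can be sketched in Python as follows. PRP identifies the two copies of a frame by the source address and a sequence number carried in the PRP trailer; the classes below, the size of the duplicate table and the simple eviction policy are assumptions for illustration, not the exact algorithm prescribed by IEC 62439-3.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass(frozen=True)
class PrpFrame:
    source_mac: str
    seq: int  # sequence number carried in the PRP trailer
    lan: str  # "A" or "B"
    payload: str


class DuplicateDiscard:
    """Accept the first copy of each frame, drop the second (simplified)."""

    def __init__(self, max_entries: int = 1024):
        self.max_entries = max_entries  # assumed size of the duplicate table
        self.seen = OrderedDict()  # (source_mac, seq) -> LAN of first arrival

    def accept(self, frame: PrpFrame) -> bool:
        key = (frame.source_mac, frame.seq)
        if key in self.seen:
            del self.seen[key]  # both copies arrived; entry no longer needed
            return False  # duplicate from the other LAN: discard
        self.seen[key] = frame.lan
        if len(self.seen) > self.max_entries:
            self.seen.popitem(last=False)  # forget the oldest frame whose twin never arrived
        return True


rx = DuplicateDiscard()
arrivals = [
    PrpFrame("00:11:22:33:44:55", 7, "A", "trip"),
    PrpFrame("00:11:22:33:44:55", 7, "B", "trip"),  # slower copy via LAN B
    PrpFrame("00:11:22:33:44:55", 8, "B", "close"),  # LAN B happened to be faster
    PrpFrame("00:11:22:33:44:55", 8, "A", "close"),
]
delivered = [f.payload for f in arrivals if rx.accept(f)]
print(delivered)  # ['trip', 'close']
```

If LAN A fails completely, every frame arriving via LAN B is simply the first copy and is accepted, which is why the failure causes no switchover delay.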


High-Availability Seamless Redundancy (HSR)

HSR is designed mainly for use in ring topologies. A Double Attached Node for HSR (DANH) uses its two network ports to form a ring. Each node removes unicast frames addressed to it from the ring and passes them to its application. Multicast and broadcast frames are passed to the application and also forwarded to the next node. To prevent frames from circulating endlessly, the node that originally placed a frame on the ring removes it once it has traveled all the way around.


Unlike with PRP, SANs cannot be integrated into an HSR network directly without breaking the ring; they must be connected via a RedBox. As with PRP, an HSR node sends duplicate frames, one from each of its ports, so that they travel around the ring in opposite directions. If one path fails, the data is still transmitted over the path that remains intact. This enables zero switchover time without the need for two parallel networks. However, HSR is less flexible than PRP because it is always structured as a ring or as coupled rings. Because of the duplicate transmission, only half of the bandwidth is available for data traffic.
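The following Python sketch simulates a single HSR unicast on a ring to show why one link failure does not interrupt delivery. It assumes equal delay per hop (so the copy with fewer hops arrives first) and models only unicast frames, which stop at their destination; the function and its parameters are hypothetical.

```python
def hsr_unicast(n_nodes, src, dst, broken_links=frozenset()):
    """Simulate one HSR unicast on a ring of nodes 0..n_nodes-1.

    The source sends one copy in each direction. A copy stops at the
    destination or is lost at a failed link. Assumes equal delay per hop,
    so the copy with fewer hops arrives first and is accepted; the other
    copy is discarded as a duplicate.
    Returns (accepted_direction, discarded_directions).
    """
    arrivals = []
    for direction, step in (("clockwise", 1), ("counterclockwise", -1)):
        node, hops = src, 0
        while node != dst:
            nxt = (node + step) % n_nodes
            if frozenset({node, nxt}) in broken_links:
                break  # this copy is lost at the failed link
            node, hops = nxt, hops + 1
        if node == dst:
            arrivals.append((hops, direction))
    arrivals.sort()
    if not arrivals:
        return None, []  # ring broken on both sides: frame lost
    return arrivals[0][1], [d for _, d in arrivals[1:]]


print(hsr_unicast(6, 0, 2))  # ('clockwise', ['counterclockwise'])
print(hsr_unicast(6, 0, 2, broken_links={frozenset({0, 1})}))  # ('counterclockwise', [])
```

Note that both copies occupy ring links in normal operation, which is the source of the halved usable bandwidth mentioned above.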


It is possible to meet the requirements of IEC 61850 at low cost by combining RSTP with link aggregation. However, when critical functions run on a SCADA system, PRP or HSR is recommended to achieve the fast network failover these functions need. PRP also offers maintenance and operational benefits because it uses two separate, independent networks, which makes it well suited to solutions that require high availability.

PRP and HSR provide a fail-safe communication link for SCADA-related protocols, such as IEC 61850 MMS, and for protection functions via GOOSE.

Safeguarding IEC 61850 communication with PRP and HSR

The IEC 61850 standard, Communication networks and systems for power utility automation, establishes standard communication methods for intelligent electronic devices (IEDs) that are connected via an Ethernet network in electrical substations. The standard is part of the reference architecture for electric power systems created by Technical Committee 57 of the International Electrotechnical Commission (IEC).


The standard provides several communication services, including client/server communication based on the Manufacturing Message Specification (MMS), the Generic Object Oriented Substation Event (GOOSE) protocol for quickly transmitting event data over the network, and the Sampled Values (SV) protocol for quickly transmitting analog measurements over the network.


MMS runs over TCP/IP, whereas GOOSE and SV are mapped directly onto Ethernet (Layer 2). All of them run on substation LANs built with high-speed switched Ethernet to achieve the response times that protection systems require. The standard builds on the Open Systems Interconnection (OSI) model over Ethernet and references the Layer 2 redundancy protocols High-availability Seamless Redundancy (HSR) and Parallel Redundancy Protocol (PRP).


An important factor in availability is the time it takes to recover when a failure occurs or a piece of equipment goes offline for any reason. For a redundant architecture to provide the level of availability that electrical substations require, this recovery time must be minimal. The network recovery times required for the various substation functions, as compiled by IEC Technical Committee 57 Working Group 10, range from 100 milliseconds (ms) down to 0 ms; a recovery time of 0 ms is also called “bumpless”.
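To make the consequence of these requirements concrete, the following Python sketch checks which redundancy approach meets a given recovery-time requirement. PRP and HSR are modeled with zero switchover time; the RSTP value of 200 ms is an assumed placeholder within the “few hundred milliseconds” range this article quotes for RSTP, and a real figure depends heavily on topology and switch implementation.

```python
# Illustrative worst-case recovery times in ms (assumptions, not measured values).
# PRP and HSR recover with zero switchover time; the RSTP figure is a placeholder.
RECOVERY_MS = {"RSTP": 200.0, "PRP": 0.0, "HSR": 0.0}


def suitable_protocols(required_recovery_ms: float) -> list:
    """Return the redundancy protocols whose recovery time meets the requirement."""
    return sorted(p for p, t in RECOVERY_MS.items() if t <= required_recovery_ms)


# The two ends of the range compiled by IEC TC 57 WG 10: 100 ms and 0 ms (bumpless).
for requirement in (100.0, 0.0):
    print(requirement, suitable_protocols(requirement))
# 100.0 ['HSR', 'PRP']
# 0.0 ['HSR', 'PRP']
```

Under these assumptions, RSTP only qualifies for functions that tolerate recovery times of several hundred milliseconds, which is why PRP or HSR is recommended for the most critical functions.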


“Detect and mitigate” when communication is disturbed

Even though the primary goal is permanent connectivity, an automation solution must always be prepared to handle situations in which the transmitted data is “bad” or the connection is lost. In such situations, it is crucial to detect any malfunction in the communication architecture and to react accordingly.

zenon offers multiple functions that help detect communication malfunctions and take protective measures against their negative effects. Protocol drivers in zenon can be monitored using operational information, such as communication statistics and connection states. Each variable can be monitored for validity or health based on its specific status information. This makes it possible to detect anomalies or interruptions. In response, maintenance staff can be informed via specific alarms or direct messages. In severe cases, the process can be driven into a safe state.
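The escalation logic described above (driver state, per-variable status, alarms, safe state) can be sketched in Python. This is not zenon's API: the `Variable` class, the single `invalid` flag and the 50 % safe-state threshold are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    value: float
    invalid: bool = False  # hypothetical status flag set by the driver


def evaluate(driver_connected: bool, variables: list, safe_state_ratio: float = 0.5):
    """Return (alarms, enter_safe_state) for one monitoring cycle.

    safe_state_ratio is an assumed threshold: if at least this share of the
    variables is unusable, the process should be driven into a safe state.
    """
    alarms = [] if driver_connected else ["Driver connection lost"]
    # Without a driver connection, every value is treated as unusable.
    bad = [v.name for v in variables if v.invalid or not driver_connected]
    alarms += [f"Invalid value: {name}" for name in bad]
    enter_safe_state = bool(variables) and len(bad) / len(variables) >= safe_state_ratio
    return alarms, enter_safe_state


feeder = [Variable("U_L1", 230.1), Variable("I_L1", 0.0, invalid=True), Variable("P_kW", 12.5)]
print(evaluate(True, feeder))  # (['Invalid value: I_L1'], False)
print(evaluate(False, feeder))  # driver alarm plus three invalid values, safe state: True
```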


zenon also offers a native function for feeding a data point from different value sources. Whenever the primary value source (a driver) proves unreliable, the system can automatically switch to an alternative data source. If the alternative source provides equivalent values, this may allow operation to continue without interruption; otherwise, it may at least support continued operation in a safe production state. The decision to switch to an alternative data source is made by a configurable algorithm.
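A possible switching rule is sketched below in Python. The actual algorithm in zenon is configurable and not described here; this example assumes a simple rule in which the primary source is used while its value is valid and no older than a maximum age, the secondary source is used otherwise, and neither is used if both fail. The `Sample` class, `select_source` and the 5 s age limit are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Sample:
    value: float
    valid: bool
    timestamp: float  # seconds


def select_source(primary: Optional[Sample], secondary: Optional[Sample], now: float,
                  max_age: float = 5.0) -> Tuple[str, Optional[float]]:
    """Hypothetical switching rule: use the primary while it is valid and fresh."""

    def usable(s: Optional[Sample]) -> bool:
        return s is not None and s.valid and now - s.timestamp <= max_age

    if usable(primary):
        return "primary", primary.value
    if usable(secondary):
        return "secondary", secondary.value
    return "none", None  # no reliable source: caller should move to a safe state


print(select_source(Sample(49.98, True, 9.0), Sample(50.01, True, 9.5), now=10.0))
# ('primary', 49.98)
print(select_source(Sample(49.98, True, 2.0), Sample(50.01, True, 9.5), now=10.0))
# ('secondary', 50.01) because the primary value is 8 s old
```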


A redundant layout with a primary server and a standby server gives you valuable options in case of communication disturbances. The rated redundancy mode in zenon lets you continuously evaluate the “fitness” of the current primary server and the standby server. Here, again, the evaluation algorithm can be configured based on the various values and metrics available from the system platform (PC/server) or the drivers. If the current primary server has weak connections, these weaknesses may not be present on the standby server. In this case, the system can decide to perform a seamless switchover to the standby server. While the standby server leads the process, any malfunction on the former primary server can be investigated and fixed.
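One way to express such a fitness evaluation is a weighted score per server, as in the following Python sketch. The metric names, weights and switchover margin are hypothetical and merely illustrate the idea. In zenon, the evaluation is configured in the product rather than written as code. The margin ensures that small fluctuations do not cause the servers to swap roles back and forth.

```python
def fitness(metrics: dict, weights: dict) -> float:
    """Weighted score in [0, 1]; every metric is pre-normalized to 0 (bad) .. 1 (good)."""
    return sum(weights[k] * metrics[k] for k in weights) / sum(weights.values())


def should_switch(primary: dict, standby: dict, weights: dict, margin: float = 0.1) -> bool:
    """Switch only if the standby is clearly fitter; the margin prevents flapping."""
    return fitness(standby, weights) > fitness(primary, weights) + margin


# Hypothetical metrics and weights: driver connectivity counts three times as much.
weights = {"drivers_connected": 3, "cpu_headroom": 1, "disk_headroom": 1}
primary = {"drivers_connected": 0.5, "cpu_headroom": 0.8, "disk_headroom": 0.9}
standby = {"drivers_connected": 1.0, "cpu_headroom": 0.7, "disk_headroom": 0.9}

print(round(fitness(primary, weights), 2), round(fitness(standby, weights), 2))  # 0.64 0.92
print(should_switch(primary, standby, weights))  # True
```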


Monitoring of network and device health with SNMP

The Simple Network Management Protocol, or SNMP, gives you visibility into and control of your communication network. It enables you to monitor devices and network equipment within the local facility. With a solution like zenon, you can collect and store data about your network, receive notifications if problems occur, and even enable automatic adjustments based on the data you gather. An SNMP agent can give you, for example, information about whether a device is working properly, where errors have occurred, what types of errors have occurred, which ports on a switch are being used, and what the server central processing unit (CPU) temperature is.


The primary capabilities that SNMP enables include:

  • Monitoring network devices
  • Remotely controlling and configuring network devices
  • Recognizing and reporting device faults throughout the communication network


These functions are crucial for the safe and continuous operation of communication networks and the specific devices within electrical substations.


How SNMP works with zenon

zenon has an SNMP driver and can act as an SNMP manager, which allows you to monitor and configure your SNMP agents as needed. zenon maps the collected data to variables, which you can then process further in zenon. For example, your data can be shown in a process graphic, evaluated as part of a report or stored in an archive.


When acting as a manager, zenon can also trigger alarms if, for instance, it receives a critical value. It can also intervene automatically, based on the data it receives, to control devices.
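As an example of turning SNMP data into alarms, the following Python sketch evaluates the standard IF-MIB object ifOperStatus (RFC 2863) of a switch, whose values 1 to 7 denote up, down, testing, unknown, dormant, notPresent and lowerLayerDown. The polling itself is out of scope: the `polled` dictionary stands in for the values an SNMP manager would retrieve. The function name, port numbers and alarm texts are hypothetical.

```python
# IF-MIB::ifOperStatus (RFC 2863); the last OID arc is the interface index.
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"
OPER_STATUS = {1: "up", 2: "down", 3: "testing", 4: "unknown",
               5: "dormant", 6: "notPresent", 7: "lowerLayerDown"}


def port_alarms(varbinds: dict, monitored_ports: list) -> list:
    """Return alarm texts for monitored switch ports whose status is not 'up'.

    varbinds maps OID strings to integer values, as an SNMP GET or walk would
    return them; the polling itself is out of scope here.
    """
    alarms = []
    for if_index in monitored_ports:
        raw = varbinds.get(f"{IF_OPER_STATUS}.{if_index}")
        if raw is None:
            status = "noResponse"  # our own marker, not an IF-MIB state
        else:
            status = OPER_STATUS.get(raw, f"invalid({raw})")
        if status != "up":
            alarms.append(f"Port {if_index}: {status}")
    return alarms


polled = {  # simulated answers from a substation switch
    f"{IF_OPER_STATUS}.1": 1,
    f"{IF_OPER_STATUS}.2": 2,
    f"{IF_OPER_STATUS}.3": 7,
}
print(port_alarms(polled, [1, 2, 3, 4]))
# ['Port 2: down', 'Port 3: lowerLayerDown', 'Port 4: noResponse']
```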


zenon can also act as an SNMP agent. In this role, zenon sends data to the SNMP manager via the zenon Process Gateway, so that a higher-level system can monitor the operational state of zenon. If, for example, zenon is used as a control system in an unmanned substation, it could act as an SNMP agent.

SNMP makes it possible to monitor network components. It helps to detect whether a device is close to failure or requires maintenance.

The benefits of SNMP

SNMP is one of the most commonly used protocols for monitoring and managing network devices. The protocol works reliably and does not require an especially complicated architecture. It is not, for example, tied to a specific transport: it usually runs over UDP/IP, but the SNMP standards also define mappings to other transport services. This relative simplicity makes it easier to implement.


SNMP is also extremely versatile. A wide variety of hardware supports it, including routers, switches, access points, gateways, printers, scanners and Internet of Things (IoT) devices. It can be used for everything from switch monitoring to managing entire networks.


SNMP's modularity is another benefit. You can easily add and remove devices and set up your network in various configurations without interrupting your monitoring or management capabilities.


Compliance and security

Protocol standards are very valuable in energy automation, not least because they stimulate functional innovation and interoperability between different devices and vendors. In the context of resilient networking, they precisely describe how particular endpoints must behave under specific circumstances. All transactions and states are clearly defined, which also makes it possible to observe specific connections. Compliance with standards such as DNP3, IEC 60870 or IEC 61850 is therefore an important prerequisite for a well-managed communication solution. zenon offers comprehensive support for these protocols, and its use in a wide variety of applications has resulted in numerous enhancements. COPA-DATA closely follows all related standardization activities. The IEC 61850 client driver of zenon, for instance, is certified by TÜV SÜD for Edition 2.0. This confirms that the driver works reliably and supports up-to-date functions.


Cyber security is another critical consideration. First and foremost, measures such as authentication and encryption are applied to prevent intruders from compromising and disrupting the communication network. Some of these measures also verify the integrity of data and transaction sequences as well as the authenticity of the communicating entities. The IEC 62351-3 standard, for instance, specifies requirements for the end-to-end security of TCP/IP-based connections via TLS (Transport Layer Security). TLS, in turn, uses various cryptographic methods for key exchange, authentication, encryption and hashing. These methods also enable thorough integrity checks on the overall communication and help ensure that it proceeds correctly. In general, security mechanisms contribute to reliable communication within your overall solution.
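To show what TLS with certificate-based authentication of both peers looks like on the client side, here is a sketch using Python's standard `ssl` module. The helper name and the certificate file paths are hypothetical, and the TLS 1.2 minimum is an assumption; check the edition of IEC 62351-3 and the device profiles you must meet. This is not how zenon configures TLS internally.

```python
import ssl
from typing import Optional


def make_tls_client_context(ca_file: Optional[str] = None,
                            cert_file: Optional[str] = None,
                            key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context for mutual (certificate-based) authentication."""
    # PROTOCOL_TLS_CLIENT verifies the server certificate and host name by default.
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    # Assumed minimum version; check the edition of IEC 62351-3 you must meet.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signed the outstation/IED certificate
    if cert_file and key_file:
        # Our own certificate, so that the server can authenticate the SCADA client.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


ctx = make_tls_client_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

With real certificate files, the returned context would wrap the TCP socket before the protocol session starts, for example via `ctx.wrap_socket(sock, server_hostname=...)`.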


Achieving high-availability architecture with zenon

zenon from COPA-DATA integrates SCADA, HMI, reporting, alarms and other features in one powerful software platform. It enables the automation, monitoring, control and analysis of operational processes.

It includes compliant, up-to-date implementations of all major energy protocols, such as DNP3, IEC 60870 and IEC 61850, and supports PRP. The zenon software platform also supports a number of redundancy scenarios for the related protocol drivers. Cyber security according to IEC 62351-3 can be achieved when using these protocols with zenon. Fundamental features, such as redundant server operation or conditional switching to backup data sources, protect the solution against instabilities in the system environment. With SNMP, zenon also makes it easy to monitor network devices. All relevant signals and status values are accessible, so you can manage your solution infrastructure reliably.


Contact Us

With more than 30 years in the industry, we continually improve our products so that they provide all the logic and algorithms required to achieve the performance and functionality you need. Do you have any questions about zenon or about how zenon can enable high availability at your facility?