High Availability and Redundancy

Quuppa Technical White Paper

September 2020

Table of Contents

1. Introduction

2. General Principles

3. Tags

4. Locators

5. Quuppa Positioning Engine (QPE)

6. Quuppa Monitoring

7. IT Infrastructure

8. Conclusion

1. Introduction

High availability means that a system can run continuously without interruptions (so-called downtime), i.e. the system is always available. For business-critical systems requiring high availability and uptime, it is very important to plan the system environment so that no single point of failure (SPOF) can cause a service interruption.

Building redundancy to achieve higher availability typically also means higher costs for the system. This cost should be weighed against the cost to the business if the system is not working. Therefore, the design and planning of the system should be based on business requirements:

  • How much unplanned downtime can the business tolerate?

  • What is the cost that the business is willing to pay for higher availability?

It is important to clarify the business requirements ahead of time, so that the right decisions can be made during the design phase about how much redundancy is required and what cost is acceptable for achieving the required redundancy and availability.
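To make the two questions above concrete, an availability target can be translated into a yearly downtime budget and a rough cost figure. The following is a minimal sketch in Python; the function names and the cost-per-hour input are illustrative assumptions, not part of any Quuppa product.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


def downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Estimate the yearly cost of downtime (cost_per_hour is a business input)."""
    return allowed_downtime_hours(availability_percent) * cost_per_hour


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(target)
        print(f"{target}% availability allows about {hours:.2f} h of downtime per year")
```

Comparing the estimated downtime cost for two availability targets with the price of the extra redundancy needed to move between them gives a first indication of whether the investment is justified.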

In practice, there are many different options for how to build redundancy for systems using Quuppa technology. It is the responsibility of the Quuppa Partner or Quuppa Customer to design the system redundancy according to the requirements specific to the end Customer case.

The purpose of this white paper is to describe the basic principles for building redundancy to achieve high availability for Quuppa Partner system environments and to support the selection of the right way to implement redundancy for the specific customer project.

2. General Principles

There are different ways to improve the availability of a system environment. This section walks through some of the general principles for improving availability.

Redundancy and Failover

A SPOF poses a risk to system availability, and this kind of risk can be eliminated by adding sufficient redundancy. The added redundancy ensures that the failure of a single component no longer causes the failure of the whole system environment.

Built-in system redundancy can use either automatic or manual failover between the components, as long as the failover is reliable and suitable for the use case. In order to detect possible component downtime, implementing monitoring or regular checks for the system is highly recommended.
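The effect of redundancy on availability can be illustrated with a simple calculation. The sketch below assumes that components fail independently and that failover always succeeds, which is an idealisation; real figures depend on how reliable the failover mechanism is. The function names are illustrative.

```python
from typing import Iterable


def series_availability(availabilities: Iterable[float]) -> float:
    """Availability of a chain where every component is a single point of failure."""
    result = 1.0
    for availability in availabilities:
        result *= availability
    return result


def redundant_availability(availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, assuming independent
    failures and ideal failover: the group is down only if all copies are down."""
    if copies < 1:
        raise ValueError("at least one component is required")
    return 1 - (1 - availability) ** copies


if __name__ == "__main__":
    print(f"one server at 99%:        {redundant_availability(0.99, 1):.4f}")
    print(f"two servers at 99%:       {redundant_availability(0.99, 2):.4f}")
    print(f"two 99% parts in series:  {series_availability([0.99, 0.99]):.4f}")
```

Under these assumptions, a second copy of a component reduces the probability of an outage considerably, while every additional non-redundant component in the chain lowers the overall availability.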

Reactive and Proactive Monitoring

Reactive monitoring triggers alerts when something has already happened, e.g. a system component is down. In some cases, it may also be possible to set up proactive detection and alert systems to indicate that a failure might occur soon unless action is taken to prevent it. This kind of monitoring helps prevent unplanned downtime, because it allows problems to be fixed before downtime occurs. For example, disk utilisation monitoring that raises an alert when a disk is e.g. 90% full can help prevent disks from filling up and stopping the system from running.
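A proactive disk check of this kind can be sketched in a few lines of Python using the standard library. The 90% critical threshold follows the example above; the 80% warning threshold and the function names are assumptions for illustration, and a real deployment would normally rely on the monitoring tooling already in use in the customer environment.

```python
import shutil


def disk_usage_percent(path: str = "/") -> float:
    """Return how full the file system containing `path` is, in percent."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


def classify(percent_used: float, warn_at: float = 80.0, critical_at: float = 90.0) -> str:
    """Map a utilisation figure to an alert level (thresholds are assumptions)."""
    if percent_used >= critical_at:
        return "critical"
    if percent_used >= warn_at:
        return "warning"
    return "ok"


if __name__ == "__main__":
    used = disk_usage_percent(".")
    print(f"disk is {used:.1f}% full: {classify(used)}")
```

Running such a check on a schedule and forwarding anything other than "ok" to an alerting channel turns a future outage into a planned maintenance task.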

For more information, see the Quuppa Monitoring section below.

Regular Updates

In general, it is good practice to plan systematic updates of all components in the environment to the latest available software and firmware releases. This is important for getting support from the vendors in the event of an incident or error. This applies to all components, including applications, middleware, IT infrastructure (servers, switches and other infrastructure) and Locators. Creating a test environment (quality assurance environment) could also be considered, so that updates and patches can always be tested before they are installed in a customer production environment.

Quuppa provides regular software and firmware updates for its products, which introduce new features as well as the latest bug fixes. It is important to keep the Quuppa products up-to-date in order to have the latest fixes to all known bugs. Customers and Quuppa Partners are recommended to purchase software maintenance (or a license that includes software maintenance) and make a plan for regular maintenance for system updates.

3. Tags

When Quuppa Tag downtime occurs, it is typically caused by the battery reaching the end of its lifetime. For critical systems, the partner may consider attaching two (or more) tags to the same tracked item in order to avoid downtime if one tag fails. A pair of tags is typically used, for example, in live sports TV broadcasts to provide redundancy and higher system availability.

Tag voltage levels are provided through QPE for Quuppa tags.
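Voltage readings can support proactive battery replacement. The sketch below only shows the threshold check on readings that have already been collected; how the readings are fetched from QPE is not shown, and the tag identifiers, the data layout and the threshold value are hypothetical and should be replaced with values appropriate for the tag model in use.

```python
from typing import Dict, List


def tags_needing_attention(voltages: Dict[str, float], min_voltage: float) -> List[str]:
    """Return the IDs of tags whose reported voltage is below `min_voltage`.

    `voltages` maps a tag ID to its latest reported voltage. Both the mapping
    and the threshold are illustrative assumptions, not a Quuppa data format.
    """
    return sorted(tag_id for tag_id, volts in voltages.items() if volts < min_voltage)


if __name__ == "__main__":
    readings = {"tag-a": 3.0, "tag-b": 2.4, "tag-c": 2.9}  # hypothetical values
    print(tags_needing_attention(readings, min_voltage=2.6))
```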

4. Locators

In order to avoid downtime or reduced system performance caused by a failed Locator, the project should be planned so that any single Locator can fail without causing tracking blind spots or a significant reduction in system performance. The required Locator density depends on many factors, and it is the Quuppa Partner's responsibility to consider what level of redundancy and performance is sufficient for their use case.

In addition to planning sufficient Locator redundancy in the project plan, the Quuppa Partner could consider whether a small stock of spare Locators is needed near the customer system location and how to arrange the physical replacement if a Locator needs to be changed.
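The "any Locator can fail" requirement can be checked against a coverage plan before installation. The following sketch assumes that the plan is available as a mapping from each Locator to the zones it covers and that the minimum number of Locators per zone is a project-specific input; both the data layout and the names are illustrative assumptions, not output from a Quuppa planning tool.

```python
from typing import Dict, List, Set


def zones_at_risk(coverage: Dict[str, Set[str]], min_locators: int = 1) -> List[str]:
    """Return zones that would fall below `min_locators` if one Locator failed.

    `coverage` maps a Locator ID to the set of zones it covers. The worst case
    for a zone is losing one of the Locators that covers it, so a zone is at
    risk when its Locator count minus one is below the required minimum.
    """
    counts: Dict[str, int] = {}
    for zones in coverage.values():
        for zone in zones:
            counts[zone] = counts.get(zone, 0) + 1
    return sorted(zone for zone, count in counts.items() if count - 1 < min_locators)


if __name__ == "__main__":
    plan = {
        "locator-1": {"hall", "dock"},
        "locator-2": {"hall", "dock"},
        "locator-3": {"dock", "office"},
    }
    print(zones_at_risk(plan, min_locators=1))
```

In this hypothetical plan, only the "office" zone depends on a single Locator and would become a blind spot if that Locator failed.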

5. Quuppa Positioning Engine (QPE)

Redundancy for Large Enterprise Systems

For large enterprise systems, redundancy should be built by utilising DHCP options. In this case, the partner should monitor the QPE and change the QPE IP address in the DHCP server to that of a secondary QPE if the primary QPE is experiencing issues. With this option, there will be a delay of approximately one minute by default before the Locators reconnect to the secondary QPE.

Existing documentation about failover with DHCP servers is available on the Internet.
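The monitoring part of this approach can be sketched as follows. The example assumes that a plain TCP connection attempt is an acceptable health check and that failover should be triggered only after several consecutive failures, to avoid switching on a momentary glitch; both are assumptions, as are the class and function names. The actual change of the QPE address in the DHCP server depends on the DHCP product in use and is therefore not shown.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverDecider:
    """Signal a failover once after N consecutive failed health checks."""

    def __init__(self, failures_before_failover: int = 3) -> None:
        self.threshold = failures_before_failover
        self.consecutive_failures = 0
        self.failed_over = False

    def record(self, healthy: bool) -> bool:
        """Record one check result; return True when failover should start."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if not self.failed_over and self.consecutive_failures >= self.threshold:
            self.failed_over = True
            return True
        return False
```

A scheduler would call `decider.record(is_reachable(primary_host, primary_port))` at a fixed interval and, when the call returns True, update the DHCP server so that the Locators are directed to the secondary QPE. The host and port of the primary QPE are deployment-specific inputs.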

Redundancy for Smaller Local Deployments

In smaller local deployments, two QPE instances may run on the same subnet with Quuppa's -Dfailover parameter enabled. This option allows the QPEs to monitor each other, so that the secondary can start automatically if the primary goes down. This allows for very fast startup times for the secondary (around 1 second). In fact, the option was designed for a live international TV broadcast with very strict requirements on the availability of the data.

6. Quuppa Monitoring

Quuppa Locators can be monitored via the QPE Web Console interface. Additionally, in some cases the Quuppa Partner may have integrated the monitoring information into the software they provide via the API.

7. IT Infrastructure

The IT infrastructure for the system should also be designed using the redundancy principles (e.g. for servers, storage, network components and cabling).

DHCP

In both large deployments and smaller local deployments, the DHCP server requires attention. As the Locators get their IP addresses from the DHCP server, the DHCP server should also have redundancy and a failover capability for the system to be fully redundant.

Network

Typically, a system using Quuppa products will communicate with other systems using the network components in the customer company network infrastructure. Network connection availability is essential for service availability.

Monitoring

Depending on the customer's IT infrastructure environment, there may be possibilities to monitor the applications and IT components of the IT environment. These could include, for example, disk usage monitoring (to prevent disks becoming full and stopping the environment from running), network switch monitoring etc.

Cabling

There are several network topologies that can be used when cabling an IT network and connecting all of the IP devices. Some are good for minimising cable runs (extended star or daisy chain), but they are vulnerable to SPOF. For example, in the case of cable connectivity loss (e.g. defective cable or termination, physical cut or fire), all the equipment that depends on that point will lose its connection. There are several ways to build redundancy into cabled networks, for example using twin-cable runs or creating "circular" network topologies, which allow the data traffic to be easily re-routed when needed.
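A planned cabling layout can be checked for such single points of failure by removing each link in turn and testing whether all devices can still reach each other. The sketch below is a simple brute-force check suitable for small networks; the device names and the representation of the network as a list of links are illustrative assumptions.

```python
from collections import deque
from typing import Iterable, List, Sequence, Tuple

Link = Tuple[str, str]


def is_connected(nodes: Iterable[str], links: Sequence[Link]) -> bool:
    """Return True if every node can reach every other node over the links."""
    node_set = set(nodes)
    if not node_set:
        return True
    adjacency = {node: set() for node in node_set}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(node_set))
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search from an arbitrary node
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == node_set


def critical_links(nodes: Iterable[str], links: Sequence[Link]) -> List[Link]:
    """Return every link whose loss alone would split the network."""
    node_list = list(nodes)
    link_list = list(links)
    return [
        link
        for i, link in enumerate(link_list)
        if not is_connected(node_list, link_list[:i] + link_list[i + 1:])
    ]


if __name__ == "__main__":
    devices = ["switch", "locator-1", "locator-2"]
    daisy_chain = [("switch", "locator-1"), ("locator-1", "locator-2")]
    ring = daisy_chain + [("locator-2", "switch")]
    print("daisy chain:", critical_links(devices, daisy_chain))
    print("ring:", critical_links(devices, ring))
```

In the daisy chain every link is critical, whereas closing the chain into a ring leaves no single link whose loss disconnects a device. A twin-cable run can be modelled by listing the same link twice.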

Physical Separation

It is important to set up backup and failover IT network devices, from server machines to network components. However, it is also important that such devices are not physically located in the same place or right next to each other. For example, in the event of a fire, if the whole server room goes down it is important to have the failover replica physically separated from the main components so that it is not impacted by the destruction.

8. Conclusion

Higher availability for the end customer system can be achieved by adding redundancy, better architecture planning and additional preventive measures, such as running regular maintenance for updates and fixes and setting up different types of monitoring. All of these measures may add costs, which the end customer needs to cover. Therefore, it is essential that the business requirements and the tolerance for planned and unplanned downtime are clarified at the beginning of project planning and taken into account when designing the system architecture.