Design Goals

HyperCloud is a highly integrated, turn-key hardware and software solution for building private clouds.

The design and implementation of HyperCloud is governed by seven key principles, outlined below.

Design Goals

Freedom wherever possible

With its foundation of integrated hardware and software that prioritizes highly performant interoperability at every level, HyperCloud's deployment is a set of fixed, easy-to-replicate steps. This consistency simplifies the initial steps for getting HyperCloud running - but this doesn't mean restriction and lock-in for what you can do with it. HyperCloud aims to provide freedom to the operator wherever possible, combining the best aspects of a turn-key solution and a DIY approach.

Simplicity and elegance

Building and operating a private cloud should be a simple, straightforward, and well-documented process.

Scalability and resilience

The cloud should function equally well whether it is built on a minimal quarter-rack deployment or on dozens of racks. It needs to offer enough resilience that operators feel comfortable committing to quality SLAs (Service Level Agreements) for their tenants and stakeholders.

Sustainability

The data center landscape is changing rapidly, and strict regulation is being introduced around power usage effectiveness (PUE), cooling, and power consumption. We manufacture appliances that assist our customers on their journey to net zero emissions, optimizing for efficient operation and cooling.

Turn-key appliance

We provide a mature and exemplary cloud, delivered as a turn-key appliance. Building and deploying a private cloud should be as easy as installing a home WiFi router. It should be easy to deploy and service from thousands of miles away - both in the rack and at the edge.

Integrated and supported

Every required facet of the cloud solution stack should be provided by HyperCloud, and the whole stack should be fully supported, ideally by a single vendor.

Scalability

Deliver a cloud solution that our customers will be able to operate efficiently at massive scale.

Design Corollary

State is lethal

Distributed systems derive their resilience from their ability to shrink and grow, whereas state leads to complexity and inconsistency, and makes systems impossible to manage at scale. A good share of service failures and delayed recoveries can be blamed on stateful systems and discrete points of failure. Building stateless systems means planning for failure from the outset, which is an operationally hygienic approach.

Hardware matters

The ugliest and most unexpected issues tend to come down to hardware failures in which systems only partially fail - these are also the hardest to debug. Faulty error correction, memory corruption, CPU failure, and kernel panics can all lead to serious problems. We've learned the hard way that building fleets on consistent, task-specific hardware is the way to run infrastructure. Once the task-specific commitment is made, amazing things can be done to improve operator experience.

Open source in production needs a strict ruleset

Open source software is the foundation of server infrastructure worldwide, and consistently produces higher-quality, more secure code. When used correctly, this code is extremely powerful, but most open source software doesn't come with usability guidelines, and in many cases not even with basic documentation. Running open source software to underpin production systems requires a strict approach to quality assurance.

Ownership is critical

Cloud operators who do not have ownership over their own platforms are crippled by an inability to operate, support themselves, and make the changes that their tenants and customers request. Many private cloud acquisition models try to lead operators into surrendering ownership of different facets of their stack, and present this surrender as a boon. This is to be avoided at all costs.

Features must not compromise simplicity

Most cloud operators, even at massive scale, are just looking for a way to manage and provision resources, and to provide resilient services to their stakeholders along with SLAs. At the end of the day, though, it's only software, and there will always be a way to boot from a NIC-mounted ARM chip, directly access memory addresses on remote machines, or use Raspberry Pis as compute hosts. These are all fun and viable things to do, but not at the expense of solution integrity and simplicity. In many cases these are over-engineered answers to commercial questions that were originally very different. Sensible operators need their cloud solutions to be an abstraction inversion for the technology underneath.