In the economics of computing the most expensive resource is still the network. When large datasets are being processed it is almost always cheaper to move computation to the data than the other way round. My friend and former colleague Dave McCrory nicely captured this reality in his concept of Data Gravity. Today's cloud-based computing architectures assume that all data will flow to the center to be processed. Unfortunately, this centralized data-processing model is unlikely to remain economically viable as we face a tsunami of data generated by trillions of connected devices and sensors.
In the brave new world of the 'Internet of Things' (IoT), moving every bit of generated data from edge devices to the center for processing will make little economic sense. A new distributed data-processing architecture is going to be required.
Content Distribution Networks (CDNs) are a common way of efficiently moving data from the center to the edge of the network, but a new generation of Content Aggregation Networks (CANs) may be required to make the processing of IoT data economically viable. Is there a 'CAN' in your future?
CDNs are frequently used to get data out to edge devices with the greatest efficiency and cost effectiveness. A CDN acts as an efficient distribution network for the data accessed by end users — HTML, CSS, scripts, audio, video and so on — all the stuff that ends up being rendered by our browsers and media players. CDNs move large volumes of this data from the center out, through high-capacity, low-cost pipes, to local 'Points of Presence' which are geographically close to the end user. This model allows for economically efficient distribution while greatly improving the end-user experience by reducing the time taken to serve individual requests.
Overcoming the economic challenge of transferring huge volumes of IoT-generated data is likely to need network distribution architectures that work like CDNs in reverse. In effect we're going to need a new generation of 'Content Aggregation Networks' (CANs). A CAN will complement a CDN by optimizing the flow of data from edge devices to cloud-based processing in the center. But what would set a CAN apart from the standard networking technology and topologies that make up today's IP network? In two words: computation and filtering.
In an IoT-connected world it will make little sense to move every bit of data generated by a sensor or edge device to the center for processing. The primary role of a CAN will be to distribute tiered computational processing and filters across the network, including to the edge devices themselves. Just like the supplier network of a large manufacturing company, the CAN will ensure that data gets processed where it makes sense and that only 'value-added' bits are allowed to flow towards the center. In essence a CAN will enable architects to assign a relative economic value to each bit, and allow only those bits which justify the cost of transport to flow upstream.
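To make the idea concrete, here is a minimal sketch of that value-versus-cost decision. All the names and the simple linear cost model are illustrative assumptions, not part of any real CAN product or standard:

```python
# Hypothetical sketch: each batch of readings is assigned a relative
# economic value, and only batches whose value exceeds the estimated
# transport cost are allowed to flow upstream towards the center.

from dataclasses import dataclass


@dataclass
class Batch:
    readings: list          # raw sensor readings in this batch
    value_per_bit: float    # assigned economic value (arbitrary units)
    size_bits: int          # payload size on the wire


def should_forward(batch: Batch, cost_per_bit: float) -> bool:
    """Forward only when the batch's value justifies its transport cost."""
    value = batch.value_per_bit * batch.size_bits
    cost = cost_per_bit * batch.size_bits
    return value > cost


# A routine ambient reading carries little value; an anomaly carries a lot.
routine = Batch(readings=[20.1, 20.2], value_per_bit=0.001, size_bits=512)
anomaly = Batch(readings=[94.7], value_per_bit=0.5, size_bits=256)

upstream = [b for b in (routine, anomaly) if should_forward(b, cost_per_bit=0.01)]
```

In this toy model only the anomaly batch is forwarded; the routine readings never leave the edge. A real CAN would need the cost model itself to be distributed and updated across the network, which is exactly the new mechanism being argued for here.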
In a CAN-enabled architecture an environmental sensor might only send bits across the network when a detected signal crosses some percentage above ambient. The CAN would be responsible for distributing the computation and threshold-filter algorithms to the sensors across the network. Even when the threshold is crossed it might not make economic sense to flow data all the way to the center for processing. Instead the CAN would provision a data-processing node (an edge computing instance) close to the sensor network so that aggregate sensor data can be analyzed locally. Only the analytic product of this local processing might eventually flow to the center.
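A sensor-side filter of the kind described above could look something like the following sketch. The class name, smoothing scheme and parameters are assumptions for illustration only:

```python
# Illustrative sketch of a sensor-side threshold filter: the sensor keeps
# a running estimate of the ambient level and emits a reading only when
# it crosses a configured percentage above that baseline.

class ThresholdFilter:
    def __init__(self, threshold_pct: float, smoothing: float = 0.1):
        self.threshold_pct = threshold_pct  # e.g. 20.0 => fire at 20% above ambient
        self.smoothing = smoothing          # weight for the ambient moving average
        self.ambient = None                 # learned baseline, seeded by first sample

    def observe(self, reading: float):
        """Return the reading if it crosses the threshold, else None."""
        if self.ambient is None:
            self.ambient = reading          # first sample seeds the baseline
            return None
        trigger = self.ambient * (1 + self.threshold_pct / 100)
        if reading > trigger:
            return reading                  # signal: worth sending upstream
        # quiet sample: fold it into the ambient estimate and stay silent
        self.ambient += self.smoothing * (reading - self.ambient)
        return None


f = ThresholdFilter(threshold_pct=20.0)
emitted = [r for r in (10.0, 10.2, 9.9, 14.0, 10.1) if f.observe(r) is not None]
```

Of the five samples, only the 14.0 spike crosses 20% above the learned ambient level and is emitted; everything else stays at the sensor. In a CAN this filter logic, and its parameters, would be pushed out to the devices by the network itself rather than baked into firmware.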
Our current generation of 'Big Data' processing infrastructures, such as Hadoop, use a distributed architecture in which computation is spread across a number of 'compute nodes' which then act on local subsets of the data. From one perspective a CAN could be thought of as a network-wide version of the same architecture. However, a CAN would also be required to initiate edge computing instances on demand and to provide distributed cost-modelling and filtering mechanisms to optimize the upstream flow of data towards the center in a cloud-based computing environment.
The competitive landscape for CANs appears to be wide open at the moment but there’s some indication that this will soon change. The major cloud service providers — Amazon, Google and Microsoft — are likely to see this as a major opportunity but the need to distribute computation to the edge of the network will challenge their existing infrastructure and business models.
The opportunity to move up the value chain with CANs is likely to be far too compelling for Cisco to pass up. Cisco have been looking for a rationale to exploit the distributed computational fabric of their router business for years, and the CAN opportunity may be just the ticket.
The existing Hadoop and 'Big Data' processing players are also likely candidates, seeing CANs as a natural extension of their existing distributed data-processing architectures, but it's difficult to see how any of them could reach the scale required to be relevant unless in partnership with one of the big infrastructure players.
For the highly distributed computational CAN model to achieve global scale we're also going to need a set of standards. Will we see the emergence of a common, interoperable standard for distributing computational algorithms, filters and cost models across highly distributed infrastructures, including to the most microscopic of end-point sensor nodes? I hope so, because without such standards and a new generation of CAN architectures, economically sifting the signal from the noise in a world with a growing deluge of IoT data is likely to be close to impossible.