#95: Software Application Speed - the Content Delivery Network

Continuing the conversation on Software Application Speed, I look at one of the means of improvement - using a Content Delivery Network (CDN).

In this episode I introduce the Content Delivery Network (CDN); how it works, why you would consider it, and the concerns that may come with it.


Published: Wed, 28 Jul 2021 14:53:37 GMT


Hello and welcome back to The Better ROI from Software Development podcast.

In this week's episode, I'm going to continue the conversation on speed, the speed of our software applications. This episode follows on very much from the last episode, where I talked about caching. In this episode, I'm going to talk about Content Delivery Networks, or CDNs.

So what is a CDN?

Wikipedia describes a CDN as being:

"A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially relative to end users. CDNs came into existence in the late 1990s as a means for alleviating the performance bottlenecks of the Internet, even as the Internet was starting to become a mission-critical medium for people and enterprises. Since then, CDNs have grown to serve a large portion of the Internet content today, including web objects (text, graphics and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social media sites"

So if you listened to the last episode on caching, a CDN is very similar in concept. A customer requests something from a website, and that website is set up to route requests through a CDN. If the CDN already has that content, it's a "hit", in very much the same way as with a cache, so the request doesn't have to go all the way back to the original service; the CDN can serve the data itself.

A "hit" improves performance for the customer. It also reduces load on the core service, your primary servers that you have.

So how does it work?

Say, for example, you've got a marketing PDF. Your server will be the origin, you have that marketing PDF served on your server - whether that being in a data center, the cloud or wherever you want to host it.

Your customer then requests that marketing PDF. Because of the way you've configured your site, the customer's browser actually sends the request to the CDN, not to your origin server. The CDN checks whether it already has a copy of your marketing PDF. If it does, it serves it. If it doesn't, it goes back to your origin to get it, serves it to the original requester, and at the same time saves a copy for later.
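The pull-through behaviour just described can be sketched in a few lines of Python. This is a minimal illustration, not how any particular CDN is implemented: `EdgeCache`, `fake_origin` and the `/marketing.pdf` path are hypothetical, and a real CDN also deals with expiry, headers and many concurrent requests.

```python
from typing import Callable, Dict, List, Tuple


class EdgeCache:
    """A toy CDN edge node: serve from local storage on a hit,
    fetch from the origin (and keep a copy) on a miss."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch_from_origin = fetch_from_origin  # hypothetical origin call
        self._store: Dict[str, bytes] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        if path in self._store:
            return self._store[path], "hit"  # no trip back to the origin
        content = self._fetch_from_origin(path)  # miss: go back to the origin
        self._store[path] = content  # save a copy for later requests
        return content, "miss"


# Record every request that reaches the (simulated) origin server.
origin_calls: List[str] = []


def fake_origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"%PDF- marketing brochure"


edge = EdgeCache(fake_origin)
print(edge.get("/marketing.pdf")[1])  # miss
print(edge.get("/marketing.pdf")[1])  # hit
print(len(origin_calls))              # 1
```

Only the first request reaches the origin; every later request for the same PDF is answered by the edge.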

This goes back to the same "hit" and "miss" philosophy that we saw with caching in the last episode. A "hit" dramatically increases the speed for the customer and alleviates load on your origin service. A "miss" slows things down ever so slightly, but negligibly.
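One way to see why the hit rate matters is a simple average-latency model. This is a back-of-envelope sketch under stated assumptions: a hit costs only the trip to the nearby edge, a miss costs that trip plus the trip onward to the origin, and the millisecond figures in the example are purely illustrative, not measurements of any real network.

```python
def expected_latency_ms(hit_ratio: float, edge_ms: float, origin_ms: float) -> float:
    """Average response time when a fraction `hit_ratio` of requests are
    served by the edge and the rest also pay the trip to the origin."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    miss_ms = edge_ms + origin_ms  # a miss still passes through the edge first
    return hit_ratio * edge_ms + (1.0 - hit_ratio) * miss_ms


# Illustrative figures only: 20 ms to a nearby edge, a further 120 ms to a distant origin.
for h in (0.0, 0.5, 0.9):
    print(h, round(expected_latency_ms(h, edge_ms=20, origin_ms=120), 1))
```

The small extra hop on a miss is the "ever so slightly" slower case; as the hit ratio rises, the average moves towards the edge latency.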

In principle, it's very similar to the browser cache, but for a geography. Just as my local browser caches the images, pages and scripts of a website I've visited so I don't have to download them again, the CDN does the same thing for everyone in a geographical area.

So say, for example, your origin is in the U.K., but you have customers in the United States. Because the CDN is geographically distributed, it can cache that content closer to the user. The first person in the United States to request that marketing PDF will effectively populate the cache: that first request has to go from the requester to the CDN and on to the origin in the U.K. Subsequent requests can then be served locally, rather than going all the way back to the origin.

And many CDN providers have points of presence in many locations around the world, so the wider and more geographically dispersed your audience, the more value you are likely to see from using a CDN.

One of the largest CDN providers, CloudFlare, says it covers 200 cities in over 100 countries. That gives a lot of very localised points of presence, avoiding really long round trips for images, content and so on. And they provide that as a service. If you tried to build that yourself, it would be prohibitively expensive to cover that many sites across the globe, whereas you can choose to pay a service like CloudFlare (and there are others) to get that benefit without the outlay of building that infrastructure yourself.

Ultimately, it's up to you as the business owner whether you want to use a CDN. The CDN is in your control: you can choose to use it and to pay for it. Once you get over a certain level of traffic, a CDN will cost you money. I'll go through the pros and cons in more detail in a moment, but primarily you're balancing the cost of operating the CDN, plus the little bit of extra complexity of having it in place, against the reduced load on your origin servers and the faster response to your actual customers. As I say, the more geographically diverse your customers are, the more valuable this is likely to be to you.

OK, let's talk about the pros and cons a bit more.

For the pros, again, it's very similar to the caching I talked about in the last episode: speed for your customer. Because the CDN is, in most cases, geographically closer to the customer than your server, it will be faster.

You also potentially get a level of resilience.

My website, for example, goes through CloudFlare. So if you go to red-folder.com, your request goes to CloudFlare, and from there, only if needed because the content isn't in the CDN, to my origin server. If for any reason my origin server is offline and CloudFlare already has the content cached, it can serve most of my website without ever needing to talk to my origin server. That gives me an added level of resilience.
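The resilience idea can be sketched as an edge cache that keeps serving an old copy when the origin can't be reached. This is a hypothetical illustration, not CloudFlare's implementation: `ResilientEdgeCache`, the 60-second TTL and the simulated clock and origin are all assumptions. It loosely mirrors the `stale-if-error` Cache-Control extension defined in RFC 5861, though whether and how a given CDN honours that directive varies by provider.

```python
import time
from typing import Callable, Dict, Tuple


class ResilientEdgeCache:
    """Toy edge cache with a time-to-live (TTL).

    Fresh copies are served directly. Expired copies are refreshed from the
    origin, but if the origin is unreachable the stale copy is served
    instead of an error.
    """

    def __init__(
        self,
        fetch_from_origin: Callable[[str], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[bytes, float]] = {}  # path -> (content, stored_at)

    def get(self, path: str) -> Tuple[bytes, str]:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0], "fresh hit"
        try:
            content = self._fetch(path)
        except ConnectionError:
            if cached is not None:
                # Availability over consistency: an old copy beats an error page.
                return cached[0], "stale (origin offline)"
            raise  # nothing cached, so the outage reaches the customer
        self._store[path] = (content, now)
        return content, "miss" if cached is None else "refreshed"


# Simulated clock and origin so the scenario can be replayed deterministically.
fake_now = [0.0]
origin_up = [True]


def origin(path: str) -> bytes:
    if not origin_up[0]:
        raise ConnectionError("origin offline")
    return b"<html>home page</html>"


edge = ResilientEdgeCache(origin, ttl_seconds=60, clock=lambda: fake_now[0])
print(edge.get("/")[1])  # miss
fake_now[0] = 120.0      # the cached copy is now older than the TTL
origin_up[0] = False     # and the origin has gone down
print(edge.get("/")[1])  # stale (origin offline)
```

Note the limit of this protection: a page nobody has requested yet has no cached copy, so an origin outage still shows through for it.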

You also get a saving on the cost of your origin service. With CloudFlare in place, a portion of my traffic is handled by CloudFlare rather than hitting my origin servers, which are hosted on Microsoft Azure, so I don't have to pay quite as much for Microsoft Azure.

There are also some growing technical options with CDNs: they're starting to allow more technology at what they call the edge, the point closest to the end customer. Because they have data centres around the world, potentially much closer to the customer than your servers, you have the option of doing some of your processing closer to the customer.

CloudFlare, for example, are starting to make available the ability to run technology and processes closer to the customer - edge computing. And they're not the only ones. A lot of companies are doing this now, and it's well worth looking to see whether that may be useful for you as part of a technological option.

They will also generally provide a level of distributed denial of service (DDoS) protection. I'll probably touch on DDoS in a future episode, but think of it as a security attack in which people flood your website with so many requests that genuine customers can't get through. Imagine you're a small shop. Normally you'd have five people in your shop at any one time. Now imagine someone arranges a massive queue outside your shop, hundreds deep, of people who aren't there to do anything other than browse or get in the way. The customers you want can no longer get into the shop because of that queue, and that's what a DDoS attack is.

You'll commonly find that CDNs are built to help protect you against DDoS attacks. In fact, if you go to the CloudFlare website now, one of the buttons in the top right corner is "are you under attack now?", and they'll help by providing resources and filtering out that rubbish traffic.

And you can expect to see CDNs provide more and more functionality over time, including more security and more ability to run processes at the edge, closer to the customer.

So let's talk about the cons.

Again, I'm going to come back to staleness. Staleness was probably the biggest thing we talked about in the caching episode, and the same is true here. You have to balance availability against consistency.

What's more appropriate, having something available or it being the correct thing, the most up to date thing?

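There's a world of difference between a website showing news articles and one showing current stock prices: a news article that is a few minutes old is usually fine, whereas an out-of-date stock price could be seriously misleading.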
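In practice, that balance is often expressed through the HTTP `Cache-Control` header your origin sends, which tells the CDN (and browsers) how long a copy may be reused. The directives below (`max-age`, `s-maxage`, `stale-while-revalidate`, `no-store`) are standard, but the content categories and the specific durations are hypothetical choices for illustration, and providers differ in which directives they honour.

```python
def cache_control_for(content_type: str) -> str:
    """Return a Cache-Control header value expressing how stale each
    (hypothetical) kind of content is allowed to be at the CDN."""
    policies = {
        # Marketing PDFs rarely change: an hour in browsers, a day at the CDN.
        "marketing_pdf": "public, max-age=3600, s-maxage=86400",
        # News can be a few minutes old; serve stale briefly while refreshing.
        "news_article": "public, max-age=60, s-maxage=300, stale-while-revalidate=60",
        # Live prices must be correct: never store a copy.
        "stock_price": "no-store",
    }
    # Unknown content: fall back to the most conservative option.
    return policies.get(content_type, "no-store")


print(cache_control_for("news_article"))
print(cache_control_for("stock_price"))
```

The business trade-off lives in those numbers: a longer `s-maxage` means more hits and less load on the origin, at the cost of customers potentially seeing older content.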

And again, for me, this is really a business decision, or one that needs to be thought about: where is the balance for you as a business owner, availability or correctness?

Cost is obviously another factor. Personally, I think cost probably does offset itself. Yes, a CDN will cost you money, but you're probably going to save money on your origin service.

It's obviously something you need to look at and calculate properly, but my expectation is that you would save money. I wouldn't necessarily suggest that as the reason to do it, but it does offset some of the cost concerns people may have about going to a third party and adding an extra party into your solution.
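A rough way to "calculate properly" is to compare the origin bandwidth you avoid with what the CDN charges. This is a simplified sketch assuming both are billed per GB plus an optional flat CDN fee; every price in the example is a made-up input rather than a real provider rate, and it ignores savings on origin compute, which the episode also mentions.

```python
def monthly_net_saving(
    total_gb: float,
    hit_ratio: float,
    origin_cost_per_gb: float,
    cdn_cost_per_gb: float,
    cdn_fixed_fee: float = 0.0,
) -> float:
    """Origin bandwidth cost avoided by CDN hits, minus what the CDN charges.
    A positive result means the CDN pays for itself on bandwidth alone."""
    origin_saving = total_gb * hit_ratio * origin_cost_per_gb  # traffic the origin no longer serves
    cdn_cost = total_gb * cdn_cost_per_gb + cdn_fixed_fee  # the CDN carries all traffic
    return origin_saving - cdn_cost


# Hypothetical month: 1,000 GB served, 80% hit ratio, made-up per-GB prices and fee.
print(round(monthly_net_saving(1000, 0.8, 0.08, 0.02, cdn_fixed_fee=20), 2))
```

With a low hit ratio the same formula goes negative, which is one reason tuning staleness and cost are linked.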

And the final con I wanted to talk about this episode was it does increase technical complexity. There's an extra step between the customer and your website. This does take a little bit of extra work, from a technical perspective, to set up and potentially get right, especially if you want to get the balance of that staleness, that availability versus consistency correct.

But you also have to accept you're taking on an additional risk, you are heavily reliant on a third party to be in the route between your customer and you. Your customer will have to go through this third party to get to your website. So what happens if that third party goes offline, stops trading, is unavailable to actually provide the service to you? What happens then?

In most cases, that isn't really too much of a concern. Yes, it's a risk, but if you look at the big CDN providers, their facilities and capabilities for handling traffic and cyber attacks such as DDoS are far stronger than anything you could build on your own.

As such, in most cases they will almost certainly be a more resilient path than the technology you're using to serve your own systems.

But that isn't to say nothing can go wrong. CloudFlare, which I've mentioned, has had problems in the past. If something goes wrong at CloudFlare, it can take out considerable portions of the Internet, because so much of the Internet goes through them.

It can happen. I believe it has happened. And statistically it will happen. It's just how big a risk is that to you as an organisation? My personal view is it isn't big enough to outweigh the benefits. But as a business owner, you need to make that assessment for your own business.

In this episode, I wanted to finish the Software Application Speed mini-series by talking about the Content Delivery Network. The Content Delivery Network builds very much on the idea of caching I talked about in the last episode. It brings speed for your customer and potential cost savings, but at the same time it brings the challenge of staleness and adds technical complexity to how you deliver your website to your customer.

In most cases, a CDN will be the right thing for any customer-facing website. But there are risks, and you do need to understand them and whether they are acceptable for your organisation.

Thank you for taking the time to listen to this podcast. I look forward to speaking to you again next week.