Jare.io, an Instant and Free CDN

  • Palo Alto, CA

CDN stands for Content Delivery Network. Technically, it is a bunch of servers located in different countries and continents. You give them your logo.gif and they give you a URL, which resolves to a different server depending on who is trying to resolve it. As a result, the file is always close to the end user and your website loads much faster than without a CDN. Sounds good, but all CDN providers want money for their service and usually require a rather complex setup and registration procedure. My pet project jare.io is a free CDN that is simple to configure. It utilizes AWS CloudFront resources.

First, let me show how it works and then, if you're interested in the details, I will explain how it's done internally. Say you have this HTML:

<img src="http://www.teamed.io/images/logo.svg"/>

I want this logo.svg to be delivered via a CDN. There are two steps. First, I register my domain at jare.io:

figure

Second, I change my HTML:

<img src="http://cf.jare.io/?u=http://www.teamed.io/images/logo.svg"/>

That's it.

Try it with your own resources and you will see how much faster they load.
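If a site has many static resources, the rewrite from the second step can be scripted. Here is a minimal sketch, assuming the URL shape from the HTML example (`http://cf.jare.io/?u=` followed by the original URL). The helper name `via_jare` is hypothetical and not part of jare.io.

```python
JARE_PREFIX = "http://cf.jare.io/?u="  # URL shape taken from the HTML example


def via_jare(original_url: str) -> str:
    """Return the jare.io address that wraps the given resource URL."""
    if not original_url.startswith(("http://", "https://")):
        raise ValueError("expected an absolute http(s) URL")
    # The original URL is appended as-is, mirroring the example;
    # percent-encoding of special characters is not handled in this sketch.
    return JARE_PREFIX + original_url


print(via_jare("http://www.teamed.io/images/logo.svg"))
# http://cf.jare.io/?u=http://www.teamed.io/images/logo.svg
```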

It's absolutely free, but I ask you to be reasonable. If your traffic is huge, you need your own account in CloudFront or somewhere else. My service is for small projects.

Now for the technical details, if you want to know how this solution works. First, let's discuss what a CDN is and how it works.

URL, DNS, TCP, HTTP

When your browser wants to load an image, it has a URL for that, like in the example above. This is the URL: http://www.teamed.io/images/logo.svg. There are three important parts in this address. First is http, the protocol. Second is www.teamed.io, the host name. Third is the tail, /images/logo.svg, which is the path. To load the image, the browser has to open a socket connecting your computer to the server that has the image. To open a socket, the browser needs to know the IP address of the server.
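The same three parts can be pulled out of the address programmatically. This is just an illustration using Python's standard `urllib.parse` module; a browser does the equivalent internally.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.teamed.io/images/logo.svg")

print(parts.scheme)    # http
print(parts.hostname)  # www.teamed.io
print(parts.path)      # /images/logo.svg
```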

There is no such address in that URL. In order to find the IP address, the browser performs what is called a lookup. It connects to the nearest name server and asks "what is the IP address of www.teamed.io?" The answer usually contains a single IP address:

$ nslookup www.teamed.io
Server:   172.16.0.1
Address:  172.16.0.1#53

Non-authoritative answer:
www.teamed.io canonical name = teamed.github.io.
teamed.github.io  canonical name = github.map.fastly.net.
Name: github.map.fastly.net
Address: 199.27.79.133

The IP address of www.teamed.io is 199.27.79.133, at the time of writing.
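A program can ask the same question through the operating system's resolver. The sketch below uses Python's standard `socket.getaddrinfo`. The function name `lookup` is my own, and the answer for any real host name depends on when and where you run it, so no output is shown.

```python
import socket


def lookup(host: str) -> list[str]:
    """Ask the system resolver for the IPv4 addresses of a host name."""
    infos = socket.getaddrinfo(
        host, 80, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, (ip, port));
    # keep the unique IP addresses in the order they were returned.
    return list(dict.fromkeys(info[4][0] for info in infos))


# Example call (needs network access): lookup("www.teamed.io")
```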

When the address is known, the browser opens a new socket and sends an HTTP request through it:

GET /images/logo.svg HTTP/1.1
Host: www.teamed.io
Accept: image/*

The server responds with an HTTP response:

HTTP/1.1 200 OK
Content-Type: image/svg+xml

[SVG image content goes here, over 1000 bytes]

That is the SVG image we're looking for. The browser renders it on the web page and that's it.
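To make the exchange concrete, here is a rough sketch that composes the request shown above as raw bytes and splits a response into its parts. Both function names are hypothetical, and the parser handles only the simple case from the example, with no chunked encoding and no repeated headers.

```python
def build_request(host: str, path: str) -> bytes:
    """Compose the GET request from the example as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Accept: image/*",
        "",  # the blank line that ends the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw: bytes) -> tuple[int, dict[str, str], bytes]:
    """Split a raw HTTP response into status code, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    status = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers, body


request = build_request("www.teamed.io", "/images/logo.svg")
status, headers, body = parse_response(
    b"HTTP/1.1 200 OK\r\nContent-Type: image/svg+xml\r\n\r\n<svg/>"
)
print(status, headers["content-type"])
```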

The Network of Edge Servers

So far so good, but if the distance between your browser and that IP address is rather large, loading the image will take a lot of time. Well, hundreds of milliseconds. Try to load this image, which is located on a server that is hosted in Prague, Czech Republic (I'm using curl as suggested here):

$ curl -w "@f.txt" -o /dev/null -s \
  http://www.vlada.cz/images/vlada/vlada-ceske-republiky_en.gif
    time_namelookup:  0.005
       time_connect:  0.376
   time_pretransfer:  0.377
 time_starttransfer:  0.566
                    ----------
         time_total:  0.567

I'm trying to do it from Palo Alto, California, which is about half a globe away from Prague. As you can see, it takes over 500ms. That's too much, especially if a web page contains many images. Overall, page loading may take seconds, just because the server is too far away from me. Well, it will inevitably be too far away from some users, no matter where we host it. If we host it here in California, it will be close enough to me and the image will be loaded instantly (less than 50ms). But then it will be too slow for users in Prague.

This problem has no solution if the server generates images or pages on the fly in some unique way and if we can't install a number of servers in different countries and continents. But in most cases, such as our logo example, this is not a problem. This logo doesn't need to be unique for each user. It is a very static resource, which needs to be created only once and be delivered to everybody, without any changes.

So, how about we install a server somewhere here in California and let Californian users connect to it? Let's call it an edge server. When a request for logo.svg comes to the edge server, it will connect to the central server in Prague and load the file. This will happen only once. After that, the edge server will not request the file from the central server. It will return it immediately, from its internal cache.

We need to have many edge servers, preferably in all countries where our users may be located. The first request will take longer, but all others will be much faster because they will be served from the closest edge server.
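The caching behavior of a single edge server fits in a few lines. This is a toy model, not how any real CDN is implemented: the class and function names are made up, and there is no expiration, so a cached file is never refreshed.

```python
from typing import Callable


class EdgeServer:
    """A toy edge server that fetches each path from the origin at most once."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: dict[str, bytes] = {}

    def get(self, path: str) -> bytes:
        if path not in self._cache:
            # Cache miss: the slow round trip to the central server.
            self._cache[path] = self._fetch(path)
        return self._cache[path]


origin_hits = []


def origin(path: str) -> bytes:
    """Stand-in for the central server; records every request it receives."""
    origin_hits.append(path)
    return b"<svg/>"


edge = EdgeServer(origin)
edge.get("/images/logo.svg")  # first request goes to the origin
edge.get("/images/logo.svg")  # second request is served from the cache
print(len(origin_hits))  # 1
```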

Now, the question is how the browser will know which edge server is the closest, right? We simply trick the domain name resolution process. Depending on who is asking, the DNS will give different answers. Let's take cf.jare.io, for example (it is the name of all edge servers responsible for delivering our content in AWS CloudFront, a CNAME for djk1be5eatcae.cloudfront.net). If I'm looking it up from California, I'm getting the following answer:

$ nslookup cf.jare.io
Server:   192.168.1.1
Address:  192.168.1.1#53

Non-authoritative answer:
cf.jare.io  canonical name = djk1be5eatcae.cloudfront.net.
Name: djk1be5eatcae.cloudfront.net
Address: 54.230.141.211

An edge server with IP address 54.230.141.211 is located in San Francisco. This is rather close to me, less than fifty miles. If I do the same operation from a server in Virginia, I get a different response:

$ nslookup cf.jare.io
Server:   172.16.0.23
Address:  172.16.0.23#53

Non-authoritative answer:
cf.jare.io  canonical name = djk1be5eatcae.cloudfront.net.
Name: djk1be5eatcae.cloudfront.net
Address: 52.85.131.217

An edge server with IP address 52.85.131.217 is located in Washington, which is far away from me, but very close to the server I was making the lookup from.

There are thousands of name servers around the world, and they may give different answers for cf.jare.io. Depending on who is asking, the name resolves to a different edge server.
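Conceptually, the trick boils down to a lookup table keyed by where the question comes from. The sketch below hard-codes the two answers from the lookups above. The region labels and the function name are illustrative, and how CloudFront actually picks an edge for a given resolver is not something this model tries to reproduce.

```python
# Answers observed in the two lookups above; the region keys are just labels.
EDGE_BY_REGION = {
    "california": "54.230.141.211",
    "virginia": "52.85.131.217",
}


def resolve_edge(client_region: str) -> str:
    """Return the edge address handed out to a resolver in the given region."""
    try:
        return EDGE_BY_REGION[client_region]
    except KeyError:
        raise LookupError(f"no edge known for region {client_region!r}") from None


print(resolve_edge("california"))  # 54.230.141.211
print(resolve_edge("virginia"))    # 52.85.131.217
```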

AWS CloudFront

CloudFront is one of the simplest CDN solutions. All you have to do to start delivering your content through their edge nodes is to create a "distribution" and configure it. A distribution is basically a connector between content origin and edge servers:

PlantUML SVG diagram

One of the edge servers receives an HTTP request. If it already has that logo.svg in its cache, it immediately returns an HTTP response with its content inside. If its cache is empty, the edge server makes an HTTP request to the central server. This server knows about the "distribution" and its configuration. It makes an HTTP connection to the origin server, which is www.teamed.io, and asks it to return logo.svg. When done, the image is returned to the edge server, where it is cached.

This looks rather simple, but it's not free and it's not that quick to configure. You have to create an account with CloudFront, register your credit card there, and get an approval. Then you have to create a distribution and configure it. You should then create that CNAME in your name server. If you're doing it for a single website, it's not a big deal. If you have a dozen websites, it's a time-consuming operation.

Jare.io, a Middle Man

Jare.io is an extra component in that diagram, which makes your life easier:

PlantUML SVG diagram

Jare.io has a "relay", which acts as an origin server for CloudFront. All requests that arrive at cf.jare.io are dispatched to the relay. The relay decides what to do with them. The decision is based on the information from the HTTP request URI. For example, the request from the browser has this URI path:

/?u=http://www.teamed.io/images/logo.svg

Remember, the request is made to cf.jare.io, which is the address of the edge server. This exact URI arrives at relay.jare.io. The URI contains enough information to make a decision about which file has to be returned. The relay makes a new HTTP request to www.teamed.io and retrieves the image.
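The decision the relay has to make starts with reading the `u` parameter. Here is a minimal sketch of that step using Python's standard `urllib.parse`. The function name `target_of` is hypothetical and this is not the relay's actual code; it only shows that the URI carries everything needed to find the file.

```python
from urllib.parse import parse_qs, urlsplit


def target_of(request_uri: str) -> str:
    """Extract the URL to fetch from a relay request such as /?u=http://..."""
    query = urlsplit(request_uri).query
    values = parse_qs(query).get("u")
    if not values:
        raise ValueError("missing 'u' parameter")
    return values[0]


print(target_of("/?u=http://www.teamed.io/images/logo.svg"))
# http://www.teamed.io/images/logo.svg
```

Note that a target URL with its own `&` would need to be percent-encoded to survive this parsing intact.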

The beauty of this solution is that it's easy. For small websites, it is a free and quick CDN.

By the way, when we query the same image through jare.io (and CloudFront), it comes back much faster:

$ curl -w "@f.txt" -o /dev/null -s \
  http://cf.jare.io/?u=www.vlada.cz/images/vlada/vlada-ceske-republiky_en.gif
    time_namelookup:  0.005
       time_connect:  0.021
   time_pretransfer:  0.021
 time_starttransfer:  0.041
                    ----------
         time_total:  0.041
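Taking the two `time_total` values at face value, the difference is easy to quantify. Keep in mind these are single measurements from one location, and the second one was presumably served from a warm edge cache, so treat the result as a rough indication only.

```python
direct_total = 0.567  # seconds, time_total when fetching from Prague directly
cdn_total = 0.041     # seconds, time_total through cf.jare.io

saved_ms = round((direct_total - cdn_total) * 1000)
speedup = direct_total / cdn_total

print(f"saved {saved_ms} ms per request, about {speedup:.1f}x faster")
# saved 526 ms per request, about 13.8x faster
```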

Most of the work is done by AWS CloudFront, while jare.io is just a relay that makes its configuration more convenient. Besides, it makes it free, because jare.io is sponsored by Teamed.io. In other words, my company will pay for your usage of CloudFront. I would appreciate it if you kept that in mind and didn't use jare.io for traffic-intensive resources.