Scaling the BBC iPlayer to handle demand


Simon Frost | 12:30 UK time, Friday, 2 July 2010

One of the key goals we set ourselves when we developed the new iPlayer was that it would have to be fast to use. We understand that any delay in getting you to the video is frustrating, as the site is just a jumping-off point into TV and Radio content.

But how do we make things fast? Displaying a web page in the browser involves many steps; some we can control, some we can't. We can't control the time the request and response spend travelling over the network, but we can control how long the pages take to generate and how large they are. We also have a degree of control over how long those pages take to render in your browser.

We had our work cut out for us on the new version of iPlayer.

Personalised websites require much more processing power and data storage

The current site uses one back-end service that we pull data from to build the pages. The new site uses many more, and we both post data to them and pull data from them.

This means that every returning user gets a different homepage. There's already a small amount of difference between each homepage on our current site (your recently played) but the new site is driven much more by your favourites, recommendations and friends; they're key parts of the experience and they have to be fast.

We started developing in PHP

The BBC is standardising on PHP as its web tier development tool. Our current site is developed using Perl and Server Side Includes, which are well understood, but our new web tier framework (based on Zend) means that teams can share components and modules. In fact, the team responsible for the social networking functionality develop modules that anyone within the BBC can integrate into their site easily.

This does come at a cost though: the usage of a framework sometimes introduces delay in generating a page as it needs to get hold of resources to do so. In some cases this is necessary, especially if there's an element of personalisation, but in others our web tier is just repeating the same tasks.

All this against a growing demand

The site will have to support a massive amount of page views and users every day, on average 8 million a day for 1.3 million users. Previous versions of the site were able to grow into this demand; we'll have to hit the ground running from day one.

page-views_595.png

This graph shows our growth over the last year in terms of monthly page views.


So how do we do this?

One of the first things we can do is optimise the time it takes to generate the page.

Although changing architectures can be risky, we were confident that the one we moved to would enable us to meet all the challenges. At the heart of page generation is a PHP and customised Zend-based layer called PAL. This system then needs to integrate with our login system, BBC iD, our programme metadata system (Dynamite), our social networking systems, a Key Value data store and a few others. The homepage alone for a logged-in user with friends requires 15 calls across these services. Even if each of those calls takes a few milliseconds, we can spend a second or two just collecting the information required, which would push us well out of our 2.5s target.
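To make the budget concrete, here is a rough sketch of the arithmetic. The 15 calls and the 2.5s target come from the figures above; the per-call timings are hypothetical values chosen purely for illustration, and the sketch assumes the calls are made one after another.

```python
TARGET_MS = 2500          # page-load target quoted above (2.5s)
CALLS_PER_HOMEPAGE = 15   # service calls for a logged-in homepage, quoted above


def sequential_ms(per_call_ms: int, calls: int = CALLS_PER_HOMEPAGE) -> int:
    """Total time spent waiting if the calls run one after another."""
    return per_call_ms * calls


def remaining_budget_ms(per_call_ms: int) -> int:
    """Time left for everything else (rendering, network) within the target.

    A negative result means the target is already missed.
    """
    return TARGET_MS - sequential_ms(per_call_ms)


if __name__ == "__main__":
    # Hypothetical per-call latencies, not measurements from our platform.
    for per_call in (100, 150, 200):
        print(per_call, sequential_ms(per_call), remaining_budget_ms(per_call))
```

Under these assumed numbers, the time spent collecting data grows linearly with the per-call latency, which is why shaving time off each service call (or avoiding the call entirely) matters so much.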

We proved our architecture before we built it

At the start of re-architecting iPlayer, we did what we could to eliminate guesswork. We developed a number of architectures based on our requirements, and then built prototypes of three of them; all built to serve the homepage, which we then tested against some basic volumetrics. This gave us plenty of data about how many requests we could serve a second and CPU loads, which we could then weigh up against other softer factors, like how our dev team could work with it.

We ended up going for the one that offered us a good balance between these factors, as it allowed us to be the most flexible in building pages, rather than constraining what we could do with the site just to squeeze out the extra speed.

We cache a lot

Caching means storing a copy of the data in memory so subsequent requests for that data don't have to do the expensive things such as database queries.

It also allows us to get around any delays introduced by our framework starting up, as there's no such delay when delivering from cache.

Caching has its problems though. The data may have changed in the underlying system (programmes become available to play for example) but the change won't be reflected in our cache. This means we can only cache for seconds or minutes, but with the millions of page views we get, it can still make a crucial difference.
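The trade-off described above can be sketched in a few lines. This is a minimal in-memory illustration, not our production code: the class name `TTLCache`, the method `get_or_compute` and the injectable `clock` are all hypothetical names introduced for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Keeps a value for a fixed number of seconds, then recomputes it."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the behaviour can be tested
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            # Still fresh: skip the expensive work entirely.
            return entry[1]
        # Missing or expired: do the expensive work and remember the result.
        value = compute()
        self._store[key] = (now + self._ttl, value)
        return value
```

While an entry is fresh, any change in the underlying system is invisible to callers, which is exactly why the lifetime has to stay short.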

  • Data caching We cache the data returned from the services. We use for this. Sometimes we share data between pages.
  • HTML caching We also cache the resulting HTML for a short time. When you're hitting a page, it's highly likely you're just seeing the cached page. We use Varnish for this. Caching in this way is nothing new, but Varnish has a few tricks up its sleeve that we use, which I'll explain later.

We broke the page into personalised and standard components

If you look at our homepage, many of those components are the same for everyone, but some are just for you. With traditional page caching in some reverse HTML caches, it's not possible to do this; so we break the page up. The main build of the page is cached; then when the page loads we use XHR and Ajax to load in the personalised components. Varnish gives us the ability to control the caching at a low-level like this. Every time we generate a page or a fragment, we can tell Varnish how long we want to cache it for. The main bulk of the homepage doesn't need caching for long to get some benefit, but your favourites we can cache for longer (although still only for a few minutes), and we know when you add a new favourite so we can clear out the cache and replace it with the new content. This means as you browse the site, the page loads quicker and your experience is smoother.
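As a rough model of the idea, the sketch below keeps shared and personalised fragments under separate keys, gives each kind its own lifetime, and clears a user's favourites fragment when they add a favourite. It is an in-process illustration only and does not show how Varnish itself is configured; the lifetimes in `FRAGMENT_TTL` and every name in the code are assumptions made for the example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical lifetimes in seconds; the post only says "seconds or minutes".
FRAGMENT_TTL = {"homepage_shell": 30, "favourites": 180}


class FragmentCache:
    """Caches rendered HTML per (fragment name, user) with per-fragment TTLs."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, Optional[str]], Tuple[float, str]] = {}

    def get(self, name: str, user: Optional[str], now: float,
            render: Callable[[], str]) -> str:
        key = (name, user)  # user is None for fragments shared by everyone
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        html = render()
        self._store[key] = (now + FRAGMENT_TTL[name], html)
        return html

    def invalidate(self, name: str, user: Optional[str]) -> None:
        self._store.pop((name, user), None)


favourites: Dict[str, List[str]] = {}
cache = FragmentCache()


def favourites_html(user: str, now: float) -> str:
    """Personalised fragment, loaded separately from the main page."""
    def render() -> str:
        items = "".join(f"<li>{p}</li>" for p in favourites.get(user, []))
        return f"<ul>{items}</ul>"
    return cache.get("favourites", user, now, render)


def add_favourite(user: str, programme: str) -> None:
    favourites.setdefault(user, []).append(programme)
    # We know the fragment is now stale, so clear it instead of waiting.
    cache.invalidate("favourites", user)
```

The point of the split is that the shared shell can be served to everyone from one cached copy, while the small personalised pieces are cached per user and refreshed on the actions that change them.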

We use loads of servers

After we've optimised all we can using a single server, we then scale horizontally using multiple servers joined together in a pool. None of our web servers store any state about who you are and what you're doing, so your request can go to any server at any time.

We also serve pages out of two locations (or data centres). This gives us a higher degree of resilience to failures; we can lose an entire data centre and still be able to serve the site.
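A toy model of the pool shows why statelessness matters. The server names, the round-robin policy and the `Pool` class are assumptions made for illustration; the post does not describe how requests are actually balanced.

```python
from itertools import cycle
from typing import Iterator, List, NamedTuple, Set


class Server(NamedTuple):
    name: str
    data_centre: str


def render_page(server: Server, path: str) -> str:
    # Stateless: the output depends only on the request, never on which
    # server handles it, so any server can take any request.
    return f"<html>{path}</html>"


class Pool:
    """Round-robin over servers, skipping any in a failed data centre."""

    def __init__(self, servers: List[Server]) -> None:
        self._servers = servers
        self._ring: Iterator[Server] = cycle(servers)
        self.failed_data_centres: Set[str] = set()

    def pick(self) -> Server:
        # Look at each server at most once before giving up.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if server.data_centre not in self.failed_data_centres:
                return server
        raise RuntimeError("no healthy servers")
```

Because no server holds anything about the user, losing one data centre only shrinks the pool; the remaining servers return the same pages.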

We load tested the site before we launched

We're able to track how the site is used, so this gives us the ability to produce detailed volumetrics of how we think the new site is going to be used. Some of it is estimation, but it's always backed up with data. We can then produce detailed load tests, so we can simulate usage of the site. This enables us to find and resolve any problems we may experience under load, before we go live.
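Here is a sketch of how volumetrics might be turned into load-test targets. The 8 million page views a day is the figure quoted earlier; the peak factor and the page mix are hypothetical placeholders, not our real numbers.

```python
from typing import Dict

SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(page_views_per_day: int) -> float:
    """Average page views per second if traffic were perfectly even."""
    return page_views_per_day / SECONDS_PER_DAY


def load_test_plan(page_views_per_day: int, peak_factor: float,
                   page_mix: Dict[str, float]) -> Dict[str, float]:
    """Target requests per second for each page type at peak."""
    if abs(sum(page_mix.values()) - 1.0) > 1e-9:
        raise ValueError("page mix shares must add up to 1")
    peak = average_rps(page_views_per_day) * peak_factor
    return {page: peak * share for page, share in page_mix.items()}


if __name__ == "__main__":
    # Peak factor and mix are assumptions chosen for illustration only.
    plan = load_test_plan(8_000_000, 3.0,
                          {"homepage": 0.5, "episode": 0.3, "search": 0.2})
    for page, rps in plan.items():
        print(f"{page}: {rps:.1f} requests/second")
```

Traffic is never spread evenly over the day, so the average is only a starting point; the test has to be sized for the estimated peak.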

The end result

We're not 100% there yet (this is a beta after all) but from this sample 24 hours of monitoring data you can see that, apart from a couple of spikes, we're doing well at keeping to our target of 2.5 seconds. (We were also able to track down the spikes to some misbehaving components on the platform).

page-times_595.png

We're currently working hard behind the scenes at making sure we can continue to serve at this speed as usage increases, spreading the load across our infrastructure.

At the end of this though, we hope the result of our efforts is that you won't notice a thing: it'll just work.

Simon Frost is Technical Architect for BBC iPlayer.

Comments

  • Comment number 1.

    I still feel you're missing the wood from the trees here. In the current (non-beta) iPlayer, it takes (typically) 4 clicks/pageloads to get to a Radio 4 Afternoon Play. Now you've taken the insane step of getting rid of per-channel programme lists in the beta, it takes 7 clicks/pageloads to get to the same programme.

    Mad. Completely mad.

    Russ

  • Comment number 2.

    Russ,
    If you know what you want to listen to, have you tried the search box in the top right of the page? The auto suggest is much improved, allowing you to find specific programmes more easily.
    Also, if Afternoon Play is one of your favourites, try the new favourites functionality. This way all episodes available on iPlayer will appear in your favourites carousel.

    D

  • Comment number 3.

    Yes, fair points, D, but:

    - Using the search involves typing, which is unergonomic when the primary interface mechanism is predicated on mouse-clicking. (I'm not criticising the search function per se, but it is a sort of 'last resort' in interface design terms.) Searching also uses more BBC server resources.

    - I do use favourites, and like it a lot, but it operates only when logged in of course. (I usually log in only to post in messageboards/blogs.) How many users are logged in when going to iPlayer - a few percent at the most? I recognise the BBC would like us to be logged in all the time, and user behaviour in this respect will change over time, but at the moment, I would argue the features dependent on being logged in will appeal only to a minority 'enthusiasts' set. The mistake in design strategy in my view is requiring users to be logged in to access basic functionality.

    - The 'For you' feature on the beta console works only sporadically. (Very strange.) And when it does work, and one goes 'off genre', it will lock into that other genre, with no way back. The 'For you' recommendations can be very bizarre, and often bear no relation to the recommendations on the non-console pages. Some of the backend databases are either not talking with each other, or are just spewing out random suggestions.

    If not wanting to type in things into the search box, it's now quicker to avoid iPlayer beta pages completely, and go via non-iPlayer channel pages. For example, 5 or 6 clicks (depending on route taken, and these will become 4 or 5 clicks when the homepage cookies get sorted out properly) from the BBC homepage to an Afternoon Play console. iPlayer beta's 7 clicks is a suicide note. It has just made itself redundant.

    Admittedly, things can be quicker via a logged-in favourites route, but I don't always 'know' what I want to listen to, and have fairly catholic tastes across Radios 3, 4 and 7. This is where the axeing of per-channel programme lists in iPlayer beta is so inexplicable and perplexing. I note no one from the BBC has even mentioned this, let alone attempted to defend the rationale.

    My basic point remains. The above blog explains at length how iPlayer beta has been made more efficient from the BBC's point of view. I would still disagree strongly with that premise. It is also demonstrably more inefficient from this user's point of view.

    Russ

    P.S. On latency aspects, the 2.5s target is interesting. I'm looking at my beta, and the list of new items in my favourites list still hasn't updated on a refreshed page 4 hours (and counting) after I listened to them.

  • Comment number 4.

    Russ - I think you're off topic. This post is about scaling not navigation. Probably best to comment here.

  • Comment number 5.

    I take your point to an extent, Nick, but in my view, navigational aspects are intimately related to architecture, and thus to pageloads, server demands, caching, personalisation, etc, which is what this scaling blog is all about. I'm not sure we can really separate these aspects.

    Russ

  • Comment number 6.

    Russ, you have some valid points about the number of clicks it takes to get to your intended page or program - but that is a usability issue in the site menu design - which is a separate matter from the topic of this post which is about optimising the server performance and providing scalability for huge masses of visitors.

    Although a higher number of clicks does relate to more server load - but that is only marginal and all the things described here are more to do with the stuff that goes on 'under the hood' and behind the scenes to keep the BBC site running smoothly and staying responsive.

    The BBC server management team should be speaking to companies like Facebook and Twitter to see how they have scaled their site to meet an ever growing number of visitors. I recently read that Facebook had customized Memcached to meet its own needs and has provided the fruit of that labour by making it open source, which the BBC should consider taking advantage of if it serves an appropriate need within the server network.

  • Comment number 7.

    Great post about the architecture. I'm pleased the iPlayer is running on open source PHP rather than some ridiculously expensive oracle or Microsoft system :D

  • Comment number 8.

    Russ is on the wrong board here, but he makes excellent points.

  • Comment number 9.

    I think this is a shocking waste of UK tax payers money.

    The BBC should be focusing on core Content and Programming -- not trying to be a technology infrastructure provider.

    Why are you wasting time learning on content infrastructure delivery. Why not leave that to the likes of Apple, Amazon and Google.

  • Comment number 10.

    This is a great article, thanks for sharing! We are a web development company in the UK and we highly recommend Zend Framework and PHP if it suits their requirements. There is a lot of negative press about PHP and I think you have made the right decision choosing PHP on its merits and not looking at how fashionable it is.

  • Comment number 11.

    Oh I forgot to ask, what do you use for the db layer? MySql, Postgres, NoSQL (cassandra, etc) or a combination of the two types?

  • Comment number 12.

    This is a great, informative article. Thanks for posting it.

    I agree with the comment that one should not lose sight of the big picture -- how many mouse clicks and page views it takes for users to accomplish the task.

  • Comment number 13.

    What proportion of users actually want or use the social media features?
    I don't use them. It must cost a fortune catering for the minority who do.

    "We use loads of servers":
    Offer a simple choice up front to switch between a lightweight interface (default) and the fancy social media interface and remember that in a cookie.
    The user can choose between a lightning fast response and a 2 second response -just like iGoogle.
    Then the fact a good proportion are being served cheaper pages saves a large amount of server resource.

    In the current economic climate shouldn't that be a priority?

    Also I guess that while PHP is fine, Zend is too much of an overhead, even with lots of caching.

    I agree with commenter #9, a lot of these are solved problems and you could benefit by partnering with Amazon, Facebook and Google to make selective use of their technologies, e.g. AWS spot instances to manage your peak load, Amazon Dynamo / S3 rather than trying to reinvent reliable Key-Value storage, Akamai ESI for scalable edge server caching.

  • Comment number 14.

    How does your cache-invalidation work, exactly?

  • Comment number 15.

    I'd have a couple of comments however. Things like swfobject.js have no expiration date, nor favicon. Normally not a problem but surely they stack up big time for transfer on the site? Likewise a lot of the css and js has only a day expiration, surely you dont change the site css every day?

    Also the CSS looks a bit bloated, according to the stats I ran

    48.7% of CSS (estimated 64.8kB of 133.1kB) is not used by the current home page.

    7kB of 23.7kB is not used
    775 bytes of 1.5kB is not used
    /iplayer/r23863/style/style.css: 42.6kB of 70.1kB is not used
    14.4kB of 37.8kB is not used

    Also has 14 very inefficient rules, 134 inefficient rules, so they would be worth fixing

    That spread over so many users would certainly stack up to a lot of bandwidth

  • Comment number 16.

    Oh, and minify your HTML to save about 10% of transfer.

    While I'm looking the following external CSS files were included after an external JavaScript file in /iplayer To ensure CSS files are downloaded in parallel, always include external CSS before external JavaScript.

    So , & /iplayer/r23863/style/style.css should come before external JS

    finally and not least

    The following resources have identical contents, but are served from different URLs. Serve these resources from a consistent URL to save 1 request and 9.2KB, per user !

    *
    *

    The following resources have identical contents, but are also served from different URLs. Serve these resources from a consistent URL to save another request and 2.8KiB.

    *
    *

    in fact looking at some of the other images, there is another 16K to be saved for every page load by optimising some of the images that seem to have been added later.

  • Comment number 17.

    I just wish you would introduce buffering... Even 30seconds of buffering would greatly improve my experiences when using Iplayer... its incredibly annoying to have shows stutter with connection issues, especially when I consider how easy it should be to make this a none issue....

  • Comment number 18.

    What made you decide on PHP/Zend rather than Ruby/Rails or Python/Django for example?

  • Comment number 19.

    As a Web Professional, I was astounded to learn that you are currently still running Perl with SSIs - I stopped doing that 15 years ago! And you are switching to php! php, as another purely interpreted scripting language doesn't scale particularly well, and is definitely old-tech for large websites.
    I would have chosen Java Servlets for their speed, elegance, supreme scalability, and resilience (fail-over session handling, for example).
    Still, best of luck!

  • Comment number 20.


    Is this why there is suddenly a frame rate issue with video play back on iPlayer ? It's been happening for a month. Chrome 6.4 Mac OS 10.6.4 - headache inducing flickering hell. One star.

  • Comment number 21.

    UchihaJax wrote:

    "Oh I forgot to ask, what do you use for the db layer? MySql, Postgres, NoSQL (cassandra, etc) or a combination of the two types?"

    We use a combination. MySQL where we need it, such as for programme metadata, and CouchDB for our KV store, where we need fast read/write for more user-focused data.

    peterdragon wrote:

    "Offer a simple choice up front to switch between a lightweight interface (default) and the fancy social media interface and remember that in a cookie.
    The user can choose between a lightning fast response and a 2 second response -just like iGoogle.
    Then the fact a good proportion are being served cheaper pages saves a large amount of server resource."

    An interesting idea, but most of our content is served out of cache anyway and much of the overhead in the request is network latency. You'd also probably need a large number of people to make that change to see any kind of benefit, and in our experience we don't see users doing that kind of activity.

    "Also I guess that while PHP is fine, Zend is too much of an overhead, even with lots of caching."

    Zend gives us the ability to reuse components amongst teams and it's easy to recruit people with the skills we need, both important considerations for us.

    "I agree with commenter #9, a lot of these are solved problems and you could benefit by partnering with Amazon, Facebook and Google to make selective use of their technologies, e.g. AWS spot instances to manage your peak load, Amazon Dynamo / S3 rather than trying to reinvent reliable Key-Value storage, Akamai ESI for scalable edge server caching."

    We're not in this to solve engineering problems that have already been solved.

    Pretty much everything we build is on top of open-source components; where input is provided by all kinds of organisations and companies. As I mentioned above, our KV store is CouchDB from the Apache Organisation.

    As for ESI, it's something we've looked at; Varnish (our caching server) can make use of it. It's just not right for us at the moment.

    Mike K wrote:

    "How does your cache-invalidation work, exactly?"

    Much of the invalidation is time or action-based (e.g. someone adds a new favourite). We don't need to cache for very long to see a massive improvement in performance and saving of resource.

    @Ajax Jones: Thanks, I'll pass this onto the team to take a look.

  • Comment number 22.

    cordas wrote:

    "I just wish you would introduce buffering... Even 30seconds of buffering would greatly improve my experiences when using Iplayer... its incredibly annoying to have shows stutter with connection issues, especially when I consider how easy it should be to make this a none issue...."

    Thanks for your suggestion - actually we do buffer. When you see the buffering symbol it means that your buffer has been depleted (because the rate at which frames are played exceeds the rate at which the connection can replenish them).

    Our media playback team work hard to tune this to give the best possible experience, but at the end of the day this is limited by the performance of your connection.

  • Comment number 23.

    Hi Simon,

    I think @22 is referring to progressive buffering a la Youtube, so someone on a slow connection could let the thing progressively load while paused, then hit play... without the extra load of AIR etc..

    Also, as someone who uses Ubuntu on all his kit, I would like to make a minor observation. You state that the iPlayer infrastructure is built on a lot of open source components - but comments have been made by the open-source and Linux community that the BBC can appear at times to be leeching off the community and not giving anything back.

    It is probably the case that that is not your intention... but it has been noticed that while taking advantage of GNU licences, the Beeb have only ever released stuff under traditional closed-source licences... frustrating developers who only want to help.

    If you have a dual-core Intel CPU, you're in luck - but what about SPARC kit running Linux? Adobe don't have a version of Flash for them...

    EG - you may have written a Froyo end-user client... but what about older Android phones? You have a Symbian client available for the Nokia N97, but what about the N900 (which is faster, btw). Runs Maemo - and the hardware is very similar. So - maybe look into porting the N97 client?

    And with slapping C&D's on anyone who wants to help you extend the reach through developing end-user clients for other architectures (I can understand how the Beeb would get annoyed about different server infrastructure - but end-user client apps designed to talk to the Beeb's kit?!), it does mean that the BBC are seen as leeches.

    Maybe I could suggest a way you can get the community back on-side? How about offering an API into the iPlayer infrastructure so that 3rd party end-user clients could be built . Or maybe look at plugins to native phone media players? Or at least offer something under the LGPL to enable the full-spec iPlayer feeds to be played out on older phones at whatever resolution they can handle?

    Most people are *not* interested in ripping iPlayer streams - we simply want to *play* them on phones which we may be stuck with for up to 2 years. But Linux etc runs on a wide range of architectures - why not leverage all that help... people writing software in their spare time? These people writing playback clients for your streaming services could be likened to folks who, back in the '20s, made crystal sets and used the Beeb's 2LO and 4HY transmissions as feeds to test against.

    It sticks in the craw a bit... but between us, we could go that extra step.

    Thoughts?

  • Comment number 24.

    Buffering.

    "Our media playback team work hard to tune this to give the best possible experience, but at the end of the day this is limited by the performance of your connection."

    Hmm, I have 20MB down 6MB up corporate link, 18.30 hours, one user and only since the new iPlayer does HD video stutter, you may want to reconsider the above statement.

  • Comment number 25.

    I am tired of the excuses. "The poor workman blames his tools."

    Whenever I experience terrible playback performance on the BBC iPlayer, I find I can switch to YouTube and enjoy comfortable viewing and listening. Why is that? Same ISP. Same time of day. Oh, yeah, they use proper buffering and they evidently care about the user experience.

  • Comment number 26.

    What Alex says above in relation to progressive buffering makes sense.

    Simon, your statement just contradicted the statement made by your colleague to Karin earlier in this linked post in relation to not doing buffering at all - which is it, or does the left hand of the development team not know what its right is doing?

    I have a 50Mb corporate line, accessing iPlayer, still getting stutter, still can't get the download to work with this new iPlayer version.

    I have some subtle suggestions as to what the right hand of the BBC iPlayer Development team might be doing, and it might do well to stop and try and refocus on architecting something that works properly!

    Sorry it's harsh, but it's utterly fair. You should rename it the "Dell boy player", it performs like it fell off the back of a lorry in Peckham market.
