Since I joined Buffer last September, it’s been amazing to see how much we’ve grown in just under a year. This is the first time any of us on the team has built something that has reached this level of scale, and we’ve learned so much in doing so. I want to share a general overview of Buffer’s scale and the technology stack we’ve built to support it.
Some Quick Stats about Buffer’s Scale
- 2 backend engineers, Colin and Sunil, work on web/API (really one and a half, because my day-to-day is now primarily focused on hiring and Android)
- 3 million daily API requests (33 requests per second)
- 9 million daily ‘Buffer buttons’ served (100 requests per second)
- 2.6 million daily application web requests (i.e. serving the Buffer popup and bufferapp.com pages)
- 30k daily active users and 147k monthly active users
- 5500 API clients
- 1.6 million social media updates posted per week
The front end of Buffer is built with the Backbone.js MVC framework, and our backend is written in CodeIgniter (PHP) and Django (Python). Our servers all run fairly standard Linux with Apache web servers, and everything is hosted on Amazon Web Services. With fewer than two full-time backend engineers, it makes the most sense not to reinvent anything that’s already been done and instead to focus on developing roadmap features. AWS lets us do everything we’ve wanted without spending much time on ops or hiring an infrastructure engineer. We use Elastic Beanstalk (with EC2 instances and Elastic Load Balancers) to easily configure, deploy and scale all of our services, Route 53 for DNS management, ElastiCache for our Memcache configuration, and Simple Queue Service (quite heavily) for all of our event/message handling and processing. All of our static assets and user-uploaded photos are stored on S3, with CloudFront as our CDN.
Buffer Buttons (Widgets)
We serve our ‘Buffer buttons’ (the button in the floating bar on the left of this post) from 2-4 Apache2 (m1.small) servers that scale up as connections increase. Since the buttons are served on several third-party websites, performance has been extremely important. Button counts are cached in Memcache and refreshed every 15 minutes, which lets us serve buttons faster and reduces reads on our database. Very little work goes into displaying a button, which is why a few small servers can serve them at scale.
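Here’s a minimal Python sketch of the button-count caching, assuming a read-through pattern with a 15-minute expiry. `TTLCache` is a hypothetical in-process stand-in for a Memcache client, and `count_from_db` and the `button_count:` key prefix are illustrative placeholders, not Buffer’s actual (PHP) code.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Cached counts expire after 15 minutes, matching the refresh interval described above.
COUNT_TTL_SECONDS = 15 * 60


class TTLCache:
    """Hypothetical in-process stand-in for Memcache: key -> (value, expiry timestamp)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._store: Dict[str, Tuple[int, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[int]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the next read goes back to the database.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: int, ttl: float) -> None:
        self._store[key] = (value, self._clock() + ttl)


def button_count(url: str, cache: TTLCache, count_from_db: Callable[[str], int]) -> int:
    """Read-through lookup: serve from cache, query the database only on a miss."""
    key = "button_count:" + url  # hypothetical key naming scheme
    count = cache.get(key)
    if count is None:
        count = count_from_db(url)
        cache.set(key, count, COUNT_TTL_SECONDS)
    return count
```

On a miss the database is queried once and the result is reused until it expires, so a popular page embedding the button costs one database read per 15 minutes rather than one per view.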
Buffer API and Web servers
The Buffer API is the backbone of our application and houses most of our logic. Our web app, mobile clients and partners all depend on it, so we’ve built the API with a focus on both speed and availability. The API handles everything from authentication and posting/adding to Buffer to image uploads and all other user actions. We scale between 6 and 15 m1.small servers based on average CPU load and number of connections. One way we keep response times low is by deferring heavy logic and high-latency operations to queues (using Amazon SQS) and processing those events with workers. For example, API response times when adding to a user’s Buffer were slow because we were making an HTTP request to Pusher inline. We reduced response times by moving that external request behind a message queue and processing it with workers.
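The Pusher fix follows a common producer/worker pattern. Here’s a minimal Python sketch, assuming Python’s in-process `queue.Queue` as a stand-in for SQS and a background thread as a stand-in for a worker daemon; `add_to_buffer`, `notification_worker` and the event shape are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List

# Stand-in for an SQS queue; in production this would be a remote queue shared by many servers.
notifications: "queue.Queue[Dict]" = queue.Queue()


def add_to_buffer(user_id: int, text: str, saved: List[Dict]) -> Dict:
    """API handler sketch: do the essential write, enqueue the slow part, return immediately."""
    update = {"user_id": user_id, "text": text}
    saved.append(update)  # hypothetical stand-in for the database write
    # Instead of calling Pusher inline, hand the event off to a worker.
    notifications.put({"event": "update_added", "user_id": user_id})
    return {"success": True}


def notification_worker(push: Callable[[Dict], None], stop: threading.Event) -> None:
    """Worker loop: pull events off the queue and make the slow external call."""
    while not stop.is_set():
        try:
            event = notifications.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            push(event)  # e.g. the HTTP request to Pusher
        finally:
            notifications.task_done()
```

The handler’s latency no longer includes the external HTTP call; the trade-off is that the notification arrives slightly later, once a worker picks it up.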
As mentioned, we use Amazon’s Simple Queue Service heavily to manage and process messages: sending posts to Facebook, Twitter, etc., receiving analytics from services, processing links and internal metrics, and sending push notifications and emails. With SQS, we get passive retries when processing fails, because a message that a worker doesn’t delete becomes visible in the queue again after its visibility timeout. Our workers run on 10-15 m1.small servers, and each worker runs as a daemon managed by supervisord. Our application workers are written in PHP and our metrics workers in Python.
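SQS’s retry behavior comes from its receive/delete model: a received message is hidden for the visibility timeout and is only removed when the consumer explicitly deletes it (`DeleteMessage`). The sketch below models that in memory; `VisibilityQueue` and `work_once` are hypothetical and are not the SQS or AWS SDK API.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Message:
    body: str
    visible_at: float = 0.0  # earliest time the message can be received again
    receive_count: int = 0   # how many times it has been delivered


class VisibilityQueue:
    """In-memory model of SQS-style receive/delete semantics (not the real SQS API)."""

    def __init__(self, visibility_timeout: float):
        self.visibility_timeout = visibility_timeout
        self._messages: Dict[int, Message] = {}
        self._ids = itertools.count()

    def send(self, body: str) -> None:
        self._messages[next(self._ids)] = Message(body)

    def receive(self, now: float) -> Optional[Tuple[int, Message]]:
        for msg_id, msg in self._messages.items():
            if msg.visible_at <= now:
                # Hide the message from other workers until the timeout elapses.
                msg.visible_at = now + self.visibility_timeout
                msg.receive_count += 1
                return msg_id, msg
        return None

    def delete(self, msg_id: int) -> None:
        self._messages.pop(msg_id, None)


def work_once(q: VisibilityQueue, handler: Callable[[str], None], now: float) -> bool:
    """Process at most one message; delete it only if the handler succeeds."""
    received = q.receive(now)
    if received is None:
        return False
    msg_id, msg = received
    try:
        handler(msg.body)
    except Exception:
        # No delete: the message becomes visible again after the timeout and is retried.
        return False
    q.delete(msg_id)
    return True
```

A worker daemon would call something like `work_once` in a loop. If a handler raises (say, on a Twitter API timeout), the message is redelivered after its timeout with no explicit retry logic in the handler. Because standard SQS queues deliver at least once, handlers should be safe to run more than once for the same message.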
Since our core technical team is super small, we really don’t have time to fully manage our MongoDB database configuration. That’s why we use MongoHQ, and our experience with them has been great. While we’re constantly thinking about the optimal setup and our application’s query schema, it’s been great knowing that they act as our devops/DB management team. When we’re considering a configuration change or a new query, we often email them and start a discussion.

Here’s the configuration we’ve set up with them. The Buffer application runs on a three-member replica set on m2.4xlarge EC2 instances, each with 68 GB of RAM and 2000 provisioned IOPS. Two members form the usual primary/secondary setup, which gives us high availability, and we allow reads from the secondary to load-balance queries across both servers. The third member is a secondary held at priority 0 so that it can never become primary. We use it for internal queries and administration, so we don’t run production queries against it. During development, every new query is run and tested manually to make sure it’s optimized (often by adding an index) before it’s used in production. We also cache query results in Memcache, which both reduces load on our database and speeds up responses.
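The replica set layout maps onto a MongoDB replica set configuration document. This Python sketch mirrors that structure (`members[n].priority`, `votes` and `arbiterOnly` are real config fields) using hypothetical set and host names, and adds helpers showing why the layout works: the priority 0 member can never be elected, but it still votes, so three voters need a majority of two and the set survives losing any one member. The metrics set (two data-bearing members plus an arbiter) gives the same majority.

```python
from typing import Dict, List

# Hypothetical set and host names; the structure mirrors a MongoDB replica set config document.
BUFFER_RS_CONFIG: Dict = {
    "_id": "buffer",
    "members": [
        {"_id": 0, "host": "db-a.example.com:27017", "priority": 1},
        {"_id": 1, "host": "db-b.example.com:27017", "priority": 1},
        # Internal/admin member: priority 0 means it can never be elected primary.
        {"_id": 2, "host": "db-admin.example.com:27017", "priority": 0},
    ],
}

# Metrics set: two data-bearing members plus an arbiter (votes but holds no data).
METRICS_RS_CONFIG: Dict = {
    "_id": "metrics",
    "members": [
        {"_id": 0, "host": "metrics-a.example.com:27017"},
        {"_id": 1, "host": "metrics-b.example.com:27017"},
        {"_id": 2, "host": "metrics-arb.example.com:27017", "arbiterOnly": True},
    ],
}


def electable_hosts(config: Dict) -> List[str]:
    """Members that can become primary: priority > 0 (default 1) and not an arbiter."""
    return [
        m["host"]
        for m in config["members"]
        if m.get("priority", 1) > 0 and not m.get("arbiterOnly", False)
    ]


def voting_members(config: Dict) -> int:
    """Members vote by default (votes: 1), including priority 0 members and arbiters."""
    return sum(1 for m in config["members"] if m.get("votes", 1) > 0)


def votes_needed(config: Dict) -> int:
    """Electing a primary requires a strict majority of voting members."""
    return voting_members(config) // 2 + 1


def failures_tolerated(config: Dict) -> int:
    """How many voting members can be lost while a majority is still reachable."""
    return voting_members(config) - votes_needed(config)
```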
We’ve built our own metrics tools to measure the usage of Buffer. Every link click, API request, visitor and pre-conversion page visit is tracked. We used to rely on third-party solutions for this, but as Buffer has grown, we’ve realized that building our own was inevitable. Our custom metrics system lets us store and query raw metrics in the ways that are most useful to us; we’ve essentially built our own Google Analytics, and now we control the data. We use Python to process metrics events, which are stored in a separate MongoDB database managed by MongoHQ. It runs as a replica set of two data-bearing members plus an arbiter, with 500 GB of SSD storage. SSDs are extremely important here because they let us run unindexed queries we never planned for, so we can slice and query our data in new ways much faster than we could on spinning hard disks. MongoDB also gives us a lot of flexibility in how we structure our data, and we can change that structure at any time. Our internal metrics application was built by Michelle using Django.
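Storing raw events is what makes unplanned slicing possible. Here’s a minimal Python sketch of the idea, assuming a hypothetical event shape (`type`, `client`, `domain`, `ts`) and a plain list standing in for the MongoDB collection; in MongoDB itself a similar query would typically be an aggregation pipeline with `$match` and `$group` stages.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Callable, Dict, Hashable, Iterable, List

# Hypothetical raw events, stored as-is rather than pre-aggregated.
EVENTS: List[Dict] = [
    {"type": "api_request", "client": "ios", "ts": 1372636800},
    {"type": "api_request", "client": "android", "ts": 1372640400},
    {"type": "link_click", "domain": "example.com", "ts": 1372644000},
    {"type": "api_request", "client": "ios", "ts": 1372723200},
]


def day_of(ts: int) -> str:
    """UTC calendar day for a Unix timestamp."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")


def slice_events(events: Iterable[Dict], event_type: str,
                 key_fn: Callable[[Dict], Hashable]) -> Counter:
    """Count events of one type, grouped by whatever key_fn extracts."""
    return Counter(key_fn(e) for e in events if e["type"] == event_type)
```

Because nothing is aggregated up front, a new question (API requests per client per day, clicks per domain) is just a different `key_fn` over the same stored events.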
Over the next few months I’ll go in depth on our stack and provide much more detail on the challenges we’ve faced and our thought process in architecting Buffer. I’d love to hear from you if you have any thoughts at all about our tech, and about what you’d like me to cover in future blog posts.