What It Was Like to Scale LinkedIn’s Data Pipeline
It was pretty kafkaesque at times, but overall, very fun and satisfying.
What was it like to build and scale LinkedIn's data pipeline? originally appeared on Quora: the place to gain and share knowledge, empowering people to learn from others and better understand the world.
Scaling LinkedIn's data pipeline was pretty kafkaesque at times, but overall, very fun and satisfying. I joined LinkedIn about seven years ago at a particularly interesting time. We were just beginning to run up against the limits of our monolithic, centralized database and needed to start the transition to a portfolio of specialized distributed systems. To accommodate our growing membership and increasing site complexity, we had already migrated from a monolithic application infrastructure to one based on microservices. This change allowed our search, profile, communications and other platforms to scale more efficiently. It also led to the creation of a second set of mid-tier services that provide API access to data models, along with back-end services that provide consistent access to our databases.
We initially developed several different custom data pipelines for our various streaming and queuing data. The use cases for these platforms ranged from tracking site events like page views to gathering aggregated logs from other services. Other pipelines provided queuing functionality for systems such as InMail messaging. All of these needed to scale along with the site. Rather than maintaining and scaling each pipeline individually, we invested in the development of a single, distributed pub-sub platform. Thus, Kafka was born.
Seven years later, Apache Kafka serves as the central nervous system of LinkedIn and of thousands of other companies worldwide. Quite a ride so far!