From Vendor to In-House: How eBay reimagined its analytics landscape — Part One
For several years, eBay relied on a vendor platform for its analytics functions before recently shifting to an in-house platform. This two-part series explores why eBay decided to move off a critical platform and how this complex shift was achieved.
By: Ishita Majumdar, Medha Samant, Naveen Dhanpal
How did it begin?
eBay’s journey with a commercial data warehousing platform began in 2003 and ended in 2021. Over those years, the platform grew steadily, amassing over 20 petabytes of behavioral and transactional data, including bids, checkouts, listings, users, and accounts. It served thousands of eBay analysts, engineers, product managers, and data scientists, as well as eBay’s business and technology leaders across the globe. It was not only the system of record for eBay’s financial reporting, but also the preferred platform for all advanced analytics and business intelligence. Why did eBay decide to move off this critical platform, and how was this complex shift achieved?
Let’s begin with the why. The key drivers of the decision to move off vendor software and onto open-source solutions were cost, the need to innovate, and increased control over both.
“This has been an invaluable system for years. However, it grew increasingly expensive and posed constraints on eBay’s scope of innovation and expansion. We saw an opportunity to make an important change.” — Ishita Majumdar, VP, eBay Data Analytics Platforms.
Around the same time, eBay’s technology stack was undergoing a major transformation. With a growing focus on security, data governance, and platform reliability and availability, it was imperative for eBay to have full control of its technological innovation.
eBay’s data footprint on the vendor platform grew steadily, and it is common for vendors to change their pricing models frequently. Continuing to grow on a vendor platform would have meant ever-increasing costs. Add the element of unpredictability, and it became apparent that eBay could avoid significant costs by moving off the vendor platform.
Given these key factors, eBay’s Technology Leadership began to explore alternatives, with open-source as the top contender. Though the idea seemed exciting, it was met with a fair amount of skepticism. A new ecosystem would need to be built from the ground up, and it wasn’t going to be easy.
“It is like changing the engine of an airplane full of passengers, mid-air. It is going to be a risk and a challenge. But I believe in my team’s abilities, and I knew it could be done” — Mazen Rawashdeh, eBay Chief Technology Officer.
Six thousand miles away, a small eBay team in Shanghai had quietly been developing an open-source data warehousing platform that would go on to dislodge the 17-year-old vendor system and become eBay’s analytics platform of choice.
What does eBay’s analytics landscape look like?
Before diving into how this monumental shift was accomplished, let’s look at what the landscape comprised. At the foundational level there were two main systems: one supporting large-scale data and batch processing, the other supporting fast interactive querying and analytics. Both systems had thousands of ETL (Extract, Transform, Load) jobs running daily. The datasets in these systems were consumed by thousands of users at all levels of the organization. eBay teammates in search, marketing, shipping, payments, risk, and several other domains consumed and interacted with these datasets directly, every second of the day. Whether a team wanted to execute a simple “select * from” SQL command or build a complex machine learning model, they had to touch the data residing in one of these two systems. The use cases were seemingly endless, and they all had to move to the new platform without disruption.
How did eBay approach this massive undertaking?
The overall objective was broken down into the following goals:
- Build the Hadoop and Spark infrastructure and clusters
- Enable ETL batch processing on Hadoop
- Replicate jobs running on the vendor platform to Hadoop
- Build a dedicated compute cluster for interactive queries
- Build a framework for enabling easy execution of queries
- Migrate users from the vendor platform to the in-house system
These goals were defined while eBay’s overall platform technology was undergoing a large-scale transformation, which helped accelerate and strengthen eBay’s execution of them. Several initiatives contributed directly to the advancement of the new Hadoop-Spark ecosystem, the key ones being the following:
- Custom ODM (Original Design Manufacturer) hardware
- Modernized data centers with a high-resiliency Availability Zone architecture
- Elimination of tech debt by moving from legacy to modernized architecture
- Upgrading applications to container-based architecture
- Software-based security framework
The next part of this series will delve into the details of how eBay achieved these goals and ultimately moved its analytics community off the vendor system.