LEVERAGING CUTTING-EDGE DATABASES FOR BILLION-SCALE CATALOG ATTRIBUTE MANAGEMENT

Authors

  • Suraj Modi, Uber Technologies Inc., USA
  • Preetham Vemasani, Uber Technologies Inc., USA

Keywords:

E-Commerce, Pinot, Real-Time Analytics, Scalability, Columnar Storage

Abstract

As e-commerce grows, platforms must manage billions of catalog entities, each with hundreds of attributes, which strains both scalability and query performance. Traditional relational databases often struggle to query data at this scale efficiently by attribute. Newer database technologies such as Pinot, a distributed OLAP datastore, offer an alternative for handling massive catalog datasets. Pinot combines columnar storage, indexing techniques, and a distributed architecture to enable real-time querying and analytics on petabyte-scale data with sub-second response times. Companies operating at this scale, such as LinkedIn and Uber, have used Pinot to manage and derive insights from very large datasets. Its support for real-time analytics, high query concurrency, and horizontal scaling has made it a critical component of the data infrastructure at many leading companies. As the volume and complexity of catalog data continue to grow, adopting database technologies like Pinot becomes increasingly important for e-commerce platforms that want to stay competitive and deliver responsive user experiences. This paper explores the challenges of billion-scale catalog attribute management, the features and architecture of Pinot, and its real-world applications in powering real-time analytics in e-commerce.
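
The abstract attributes fast attribute queries to columnar storage combined with indexing. As a rough illustration of that general idea only, the Python sketch below keeps one list per attribute and an inverted index from each attribute value to the rows that hold it, so a filter reads only the attributes named in the query. It is a toy in-memory structure under stated assumptions, not Pinot's implementation or API; the class `ColumnarCatalog`, its methods, and the sample attributes are hypothetical names introduced here.

```python
from collections import defaultdict


class ColumnarCatalog:
    """Toy column store (hypothetical, for illustration): one list per
    attribute plus an inverted index per attribute."""

    def __init__(self, attributes):
        # Column-oriented layout: attribute name -> list of values by row id.
        self.columns = {name: [] for name in attributes}
        # Inverted index: attribute name -> value -> row ids holding that value.
        self.inverted = {name: defaultdict(list) for name in attributes}
        self.num_rows = 0

    def add(self, entity):
        """Append one catalog entity (a dict); missing attributes become None."""
        row_id = self.num_rows
        for name, column in self.columns.items():
            value = entity.get(name)
            column.append(value)
            self.inverted[name][value].append(row_id)
        self.num_rows += 1
        return row_id

    def filter(self, **predicates):
        """Return sorted row ids matching every attribute=value predicate.

        Only the indexes of the attributes named in the query are read.
        """
        matches = None
        for name, value in predicates.items():
            rows = set(self.inverted[name].get(value, ()))
            matches = rows if matches is None else matches & rows
        if matches is None:  # no predicates: every row matches
            return list(range(self.num_rows))
        return sorted(matches)

    def project(self, row_ids, name):
        """Read a single attribute column for the given rows."""
        column = self.columns[name]
        return [column[i] for i in row_ids]


catalog = ColumnarCatalog(["brand", "color", "price"])
catalog.add({"brand": "acme", "color": "red", "price": 10})
catalog.add({"brand": "acme", "color": "blue", "price": 12})
catalog.add({"brand": "zenith", "color": "red", "price": 9})

rows = catalog.filter(brand="acme", color="red")
print(rows, catalog.project(rows, "price"))  # prints: [0] [10]
```

The query above intersects two small row-id sets and then reads one column, without scanning whole records. A production system would add compression, partitioning across servers, and more compact index encodings, none of which this sketch attempts.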

Published

2024-06-04

How to Cite

Suraj Modi, & Preetham Vemasani. (2024). LEVERAGING CUTTING-EDGE DATABASES FOR BILLION-SCALE CATALOG ATTRIBUTE MANAGEMENT. INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND INFORMATION TECHNOLOGY (IJRCAIT), 7(1), 64-73. https://ijrcait.com/index.php/home/article/view/IJRCAIT_07_01_007