Assumptions
- We will calculate this number worldwide, for videos uploaded on all devices
- MAU for YouTube is ~2.29B; we will assume DAU to be ~50% of MAU, which is roughly 1B
- Further, we assume that 1 in 1,000 DAUs uploads a 10-minute video every day
- Users can upload videos in any resolution, and YouTube stores each video in multiple resolutions (144p through 1080p)
- YouTube keeps 3 copies of video for redundancy
- We will calculate the yearly storage requirements for YouTube based on the above assumptions
- YouTube viewership (and, we will assume, uploads) has been growing at roughly 40% CAGR for the last 14 years, so we will calculate the yearly storage requirement for the latest year (2020) and then use it to estimate the total storage for the last 14 years
Formula
1. We first calculate the amount of space required in 2020 for all videos uploaded in that year
A. Hours of video uploaded per day - By assumption (3) above, 1 in 1,000 of the 1B DAU upload a 10-minute video every day, which is 1M uploaders.
Length of video per day = 1,000,000 * 10 min = 10,000,000 min = 10,000,000/60 hours per day
≈ 166,666 hours of video per day
2. Amount of space required for 720p - From experience, 1 hour of 720p video takes roughly 100 MB. When uploaded to YouTube, let's assume it is compressed to about 15% of the original size, i.e. 15 MB per hour.
3. Space per hour of video - Extrapolating from (2), here are the assumed sizes per hour of video at each stored resolution:
1080p - 30 MB
720p - 15 MB
360p - 7 MB
240p - 3 MB
144p - 1.5 MB
Adding these up, YouTube requires roughly 60 MB of space per hour of video across all stored resolutions
4. Space required per day for videos uploaded
Based on 1 and 3
We have space required per day = number of hours of video per day * space required per hour of video
= 166,666 hours * 60 MB ≈ 10^7 MB = 10^13 bytes = 10 TB per day
4a. Space required per year for single copy = (4) * 365 = 10 TB * 365 = 3650 TB = 3.65PB
Space required per year for single copy = 3.65PB
Total Space for 2020 = 3.65*3 ~11PB
Space required per year for 3 copies = 11 PB
5. Based on the last assumption, YT has a CAGR of 40%, so each earlier year contributes roughly 1/1.4 of the next year's uploads; the 14-year cumulative total is therefore a geometric series summing to roughly 3-3.5x the latest year, so we can roughly multiply the number in (4a) by 3
Cumulative aggregate space for all videos since 2006 is 11 PB * 3 ≈ 33 PB of storage
6. Cost of storing new videos per year based on the latest data.
We know that hard-disk capacity costs roughly $20/TB. Based on this, YT needs
11 PB * 1,000 TB/PB * $20/TB = $220,000 in 2020
Cost of storing the new content uploaded in 2020 is therefore around $220K. (A quick arithmetic sketch of steps 1-6 follows below.)
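To make the arithmetic in steps 1-6 easy to check, here is a minimal Python sketch using the assumed inputs above (1M daily uploaders, 10-minute uploads, ~60 MB per stored hour, 3 redundant copies, 40% historical growth, $20/TB). The cumulative factor is computed as the exact geometric sum, which comes out slightly above the rough 3x used above.

```python
# Back-of-the-envelope sketch of the estimate above (all inputs are the stated assumptions).
daily_uploaders = 1_000_000           # 1 in 1,000 of ~1B DAU
minutes_per_upload = 10
hours_per_day = daily_uploaders * minutes_per_upload / 60            # ~166,666 hours/day

mb_per_stored_hour = 60               # 1.5 + 3 + 7 + 15 + 30 MB across resolutions, rounded up
daily_tb = hours_per_day * mb_per_stored_hour / 1_000_000            # MB -> TB, ~10 TB/day

yearly_pb_single_copy = daily_tb * 365 / 1_000                       # ~3.65 PB
yearly_pb_3_copies = yearly_pb_single_copy * 3                       # ~11 PB for 2020

# Cumulative storage since 2006: at 40% CAGR each earlier year is ~1/1.4 of the next,
# so the 14-year total is a geometric series (~3.5x the latest year; rounded to 3x above).
cumulative_factor = sum((1 / 1.4) ** k for k in range(14))
cumulative_pb = yearly_pb_3_copies * cumulative_factor               # ~35-40 PB vs ~33 PB above

cost_2020_usd = yearly_pb_3_copies * 1_000 * 20                      # $20/TB -> ~$220K
print(round(daily_tb), round(yearly_pb_3_copies), round(cumulative_pb), round(cost_2020_usd))
```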
Clarifications:
1. Does this include ad videos? Let's say no
2. How much historical data should be included? You decide
3. Is cache / CDN storage included? Yes
Approach:
1. Bottom-up estimation of content uploaded by creators based on Google announcement benchmarks
2. Average out the content uploaded over the last 10 years based on the best-known stats
3. Look at data replication strategies to meet low latency/HA and assign content duplication buffers
Assumptions:
1. 500 hours of video are uploaded to YT every minute → 500 x 60 x 24 = 720K = 0.72 million hours of video uploaded per day (Google I/O announcement, 2019)
2. 40 ZB of data was created globally in 2019 (Statista report)
3. Global digital data creation grew at ~20% CAGR from 2011 to 2021 (Statista report)
Working:
1. Determining the size (per unit of time) of a single uploaded video
a. Resolution = 480p @ 16:9 aspect ratio = 480 x (480 x 16/9) ≈ 400K pixels
b. Color-depth = 16 bits / pixel
c. Size of one frame = 400K x 16 = 6.4 Mbits
d. Frame-rate = 30 frames/second
e. Per second raw video = 6.4 x 30 ~ 200 Mbps
f. Compression with standard video encoding methods = 1:160
g. Total bit-stream rate = 200 Mbps / 160 = 1250 Kbps (it's OK to assume this figure directly during the interview, as these are fairly standard numbers)
h. This is the same as 1250 x 1000 x 3600 / 8 bytes ≈ 0.5 GB / Hour
2. In similar fashion:
a. 720p = 1.5 GB/Hr
b. 1080p (Full HD) = 2.5 GB/Hr
c. 4K = 12 GB/Hr
3. As per the stat published by Google in 2019, 500 hours of video are uploaded every minute
a. This means 500 x 60 = 30K Hours of video uploaded / Hour
b. Which is same as 720K Hours of videos uploaded every day
4. Let's make some assumptions for the split of this content across resolutions (a short sketch of this calculation follows the table)
| Resolution | % of content (based on observation) | Size / hour of video (see 1 and 2 above) | Apportioned uploaded data / day |
| --- | --- | --- | --- |
| 4K | 1% | 12 GB/Hr | 720K x 0.01 x 12 = 86 TB |
| Full HD (1080p) | 19% | 2.5 GB/Hr | 720K x 0.19 x 2.5 = 340 TB |
| 720p (Standard) | 50% | 1.5 GB/Hr | 720K x 0.5 x 1.5 = 540 TB |
| 480p | 30% | 0.5 GB/Hr | 720K x 0.3 x 0.5 = 108 TB |
| Total | | | 1074 TB ≈ 1 PB |
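As a cross-check on steps (1)-(4), here is the short sketch referred to above: it re-derives the per-hour size from the pixel-count x color-depth x frame-rate / compression assumptions and then applies the assumed resolution mix to the 720K hours uploaded per day. The exact per-hour values differ slightly from the rounded ones in the table, but the daily total still lands near ~1 PB.

```python
# Sketch of steps (1)-(4); all parameters are the assumptions stated above.

def gb_per_hour(height, aspect=16 / 9, bits_per_pixel=16, fps=30, compression=160):
    """Raw pixels * color depth * frame rate, divided by compression, as GB per hour of video."""
    pixels = height * height * aspect
    bits_per_second = pixels * bits_per_pixel * fps / compression
    return bits_per_second * 3600 / 8 / 1e9          # bits/s -> GB/hour

print(round(gb_per_hour(480), 2))                    # ~0.55 GB/hr, i.e. the ~0.5 GB/hr used above

hours_per_day = 500 * 60 * 24                        # 720K hours uploaded per day (Google I/O 2019)

# (share of uploads, assumed GB/hour) per resolution, as in the table above
mix = {"4K": (0.01, 12), "1080p": (0.19, 2.5), "720p": (0.50, 1.5), "480p": (0.30, 0.5)}
daily_tb = sum(hours_per_day * share * gb_hr / 1_000 for share, gb_hr in mix.values())
print(round(daily_tb), "TB/day")                     # ~1,076 TB ≈ 1 PB uploaded per day
```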
1. 1 PB of data is uploaded per day in 2019
2. This means ~350 PB of data uploaded over the full year of 2019 (1 PB x 365 days)
3. Quick Sanity Check at this point
a. 40 ZB of Global digital data was created in 2019
b. So the total YT content created in 2019 (~0.35 EB) is roughly a 1:120K fraction of the ~40,000 EB of digital content created globally, which seems to be the right order of magnitude given factors like the % of YT users who are creators, etc.
4. Between 2012 and 2021, global digital data grew at 20% CAGR
5. Let's assume the same growth pattern for the data uploaded to YT. Further, let's assume that YT retains data for 10 years. With that, we can derive the data uploaded in each year of the 2012-2021 period using the formula:
End Value = Begin Value x (1 + CAGR/100)^n
(First derive the begin value for 2012 by setting the 2019 end value to 350 and n = 7, then use this begin value to derive every other year; a short sketch of this back-projection follows the table below.)
| Year | Data Uploaded (PB) |
| --- | --- |
| 2012 | 98 |
| 2013 | 117 |
| 2014 | 141 |
| 2015 | 169 |
| 2016 | 203 |
| 2017 | 243 |
| 2018 | 292 |
| 2019 | 350 |
| 2020 | 420 |
| 2021 | 504 |
| Total | ~2500 |
This means that over the last 10 years, ~2.5 EB of data has been uploaded to YT and must be stored.
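Here is the short back-projection sketch mentioned above: it derives each year's uploads from the 350 PB estimate for 2019 at 20% CAGR (reproducing the table), sums 2012-2021, and repeats the order-of-magnitude sanity check against the ~40 ZB of global data created in 2019.

```python
# Back-project yearly YT uploads from the 2019 estimate using the assumed 20% CAGR.
pb_2019 = 350
cagr = 0.20
pb_2012 = pb_2019 / (1 + cagr) ** 7                  # Begin Value = End Value / (1 + CAGR)^n, ~98 PB

uploads = {year: round(pb_2012 * (1 + cagr) ** (year - 2012)) for year in range(2012, 2022)}
print(uploads)                                       # matches the table above
print(sum(uploads.values()), "PB, i.e. ~2.5 EB uploaded over 2012-2021")

# Sanity check against total global data creation in 2019 (~40 ZB = 40,000,000 PB).
global_pb_2019 = 40e6
print(round(global_pb_2019 / pb_2019), ": 1 ratio")  # ~115K:1, close to the ~120K:1 above
```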
1. Now let us look deeper into how YT will store this data, as it needs distributed and often redundant storage to ensure low-latency, high-speed streaming.
2. For that, let's look at a high-level abstraction of a possible layout of YT content storage.
The important thing to note is that the video uploaded by the creator community is stored in a blob store like GCP Cloud Storage. It is then processed - split into chunks for streaming, pre-processed for multiple devices, resolutions, and bandwidth options, compressed, and then partially / on-demand replicated to a globally distributed CDN so that it is optimized for viewing by a huge global consumer community of 30M DAUs.
1. We assume that this processing + global replication (not full replication, since there will be cache misses) will lead to at least 3x more data than what's originally uploaded
2. This means total storage = blob storage + PoP-distributed CDN cache (Google + 3rd party)
3. This sums up to 4 x 2500 PB = 10 EB of data (original uploads plus 3x derived/replicated copies; a small sketch follows below)
4. This will be the total storage requirement of YT
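Finally, the small sketch promised above for the last step, under the assumption that derived renditions plus CDN replication add roughly 3x on top of the original uploads:

```python
# Total footprint = original blob-store uploads + assumed 3x for renditions / CDN replication.
uploaded_eb = 2.5                  # ~2.5 EB uploaded to YT over 2012-2021 (from the table above)
replication_multiplier = 1 + 3     # originals + assumed 3x derived/replicated copies
print(uploaded_eb * replication_multiplier, "EB of total storage")   # 10.0 EB
```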