
Estimate the total storage capacity for all videos on YouTube.

Asked at Google
67.2k views
Answers (2)

Assumptions

  1. We will calculate this number worldwide, for videos uploaded from all devices
  2. MAU for YouTube is 2.291B. We will assume DAU is roughly 50% of MAU, about 1.15B, which we round to 1B
  3. Further, we assume that 1 in 1,000 DAU uploads one 10-minute video every day
  4. Users can upload videos in any resolution, and YouTube stores each video in multiple resolutions up to 1080p
  5. YouTube keeps 3 copies of each video for redundancy
  6. We will calculate the yearly storage requirements for YouTube based on the above assumptions
  7. YouTube viewership has been growing at roughly 40% CAGR for the last 14 years, so we will calculate the storage required for the latest year (2020) and then use that to estimate the total amount of storage for the last 14 years

Formula

1. We first calculate the amount of space required in 2020 for all videos uploaded in that year

A. Length of video uploaded per day - By assumption (3) above, 1B/1000 = 1M users each upload a 10-minute video every day.

Length of video per day = 1,000,000 * 10 min = 10,000,000 mins = 10,000,000/60 hours per day

= 166,666 hours of video per day 
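To make step 1 easy to re-run with different inputs, here is a minimal Python sketch. The function and constant names are illustrative only, and the inputs are just assumptions 2 and 3 above, not published YouTube figures.

```python
# Assumed inputs from the answer above (not official YouTube data).
DAU = 1_000_000_000          # daily active users, rounded from ~50% of MAU
DAU_PER_UPLOADER = 1_000     # 1 in 1,000 DAU uploads a video each day
MINUTES_PER_UPLOAD = 10      # each uploader posts one 10-minute video


def upload_hours_per_day(dau=DAU,
                         dau_per_uploader=DAU_PER_UPLOADER,
                         minutes_per_upload=MINUTES_PER_UPLOAD):
    """Hours of new video uploaded per day under the stated assumptions."""
    uploaders = dau / dau_per_uploader
    return uploaders * minutes_per_upload / 60


if __name__ == "__main__":
    print(f"{upload_hours_per_day():,.0f} hours of video per day")
```

Changing any one input (for example a 1-in-500 upload rate) scales the result linearly, which is handy when the interviewer pushes back on an assumption.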

2. Amount of space required for 720p - From experience, I assume 1 hour of 720p video takes about 100 MB. When it is uploaded to YouTube, let's assume it is compressed to 15% of the original size, which is 15 MB

3. Space per hour of video - Following assumption 4, here are the assumed sizes per hour of video for each stored resolution

1080p - 30MB

720p- 15MB

360p - 7MB

240p - 3MB

144p - 1.5MB

Adding these up (56.5 MB), we find that YouTube requires roughly 60 MB of space to store 1 hour of video across all resolutions

4. Space required per day for videos uploaded

Based on steps 1 and 3

We have space required per day = number of hours of video per day * space required per hour of video

= 166,666 * 60 MB ≈ 10M MB = 10^13 bytes = 10 TB

4a. Space required per year for single copy = (4) * 365 = 10 TB * 365 = 3650 TB =  3.65PB

Space required per year for single copy =  3.65PB

5. Further, by assumption 5, YouTube stores 3 copies of this data

Total space for 2020 = 3.65 PB * 3 ≈ 11 PB

Space required per year for 3 copies = 11 PB
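Steps 3 to 5 can be sketched in a few lines of Python. This is only a sketch under the answer's own assumptions: the per-resolution sizes, the 166,666 hours/day figure and the 3 copies all come from the text above, decimal units are used (1 TB = 1,000,000 MB), and the names are made up for illustration.

```python
# Assumed MB per hour of stored video at each resolution (from step 3).
MB_PER_HOUR = {"1080p": 30, "720p": 15, "360p": 7, "240p": 3, "144p": 1.5}

HOURS_PER_DAY = 166_666   # from step 1
COPIES = 3                # redundancy assumption
MB_PER_TB = 1_000_000     # decimal units
TB_PER_PB = 1_000


def daily_storage_tb(hours_per_day, mb_per_hour):
    """Single-copy storage added per day, in TB."""
    return hours_per_day * mb_per_hour / MB_PER_TB


exact_mb_per_hour = sum(MB_PER_HOUR.values())   # 56.5 MB
rounded_mb_per_hour = 60                        # rounded as in the answer

daily_tb = daily_storage_tb(HOURS_PER_DAY, rounded_mb_per_hour)
yearly_single_copy_pb = daily_tb * 365 / TB_PER_PB
yearly_all_copies_pb = yearly_single_copy_pb * COPIES

if __name__ == "__main__":
    print(f"per day: {daily_tb:.2f} TB")
    print(f"per year, one copy: {yearly_single_copy_pb:.2f} PB")
    print(f"per year, {COPIES} copies: {yearly_all_copies_pb:.2f} PB")
```

Using the unrounded 56.5 MB/hour instead of 60 lowers every figure by about 6%, which does not change the order of magnitude.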

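5a. Based on assumption 7, YT has grown at a CAGR of 40%. With that growth rate, the yearly volumes form a geometric series whose 14-year total is about 3.5 times the latest year, so we can roughly multiply the 11 PB from step 5 by 3 to get the cumulative aggregate space for 14 years

Cumulative aggregate space for all videos from 2006 is 11 PB * 3 ~ 33 PB of storage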
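The multiplier used for the 14-year total can be checked with a short geometric-series sketch. It assumes constant 40% yearly growth in uploaded volume (the answer applies the viewership CAGR to storage, which is itself an assumption), and the function name is hypothetical.

```python
def cumulative_multiple(growth_rate, years):
    """Sum of `years` annual volumes as a multiple of the latest year.

    Assumes each earlier year was smaller by a constant factor
    of (1 + growth_rate).
    """
    return sum((1 + growth_rate) ** -k for k in range(years))


LATEST_YEAR_PB = 11   # 2020 storage for 3 copies, from step 5

multiple = cumulative_multiple(0.40, 14)
cumulative_pb = LATEST_YEAR_PB * multiple

if __name__ == "__main__":
    print(f"multiple of latest year: {multiple:.2f}")
    print(f"cumulative storage: {cumulative_pb:.1f} PB")
```

The exact multiple comes out a little under 3.5, so the rounded factor of 3 slightly understates the total (roughly 38 PB versus 33 PB), which is well within the precision of this kind of estimate.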

6. Cost of storing new videos per year based on the latest data.

We assume that hard-disk capacity costs roughly $20/TB. Based on this, YT needs

11 PB * 1000 TB/PB * $20/TB = $220,000 in 2020

Cost of storing new content in 2020 is around $220k
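The cost step is a one-line conversion, sketched here so the $/TB figure can be swapped out. The $20/TB price is the answer's assumption for raw disk capacity only (no servers, power or networking), and the function name is illustrative.

```python
TB_PER_PB = 1_000  # decimal units


def storage_cost_usd(petabytes, usd_per_tb=20.0):
    """Raw disk cost for a given capacity at an assumed $/TB price."""
    return petabytes * TB_PER_PB * usd_per_tb


if __name__ == "__main__":
    print(f"${storage_cost_usd(11):,.0f} for new content in 2020")
```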

2 likes   |  
badge Bronze PM

 

Clarifications:

1.       Includes ad videos? Let's say no

2.       How much historic data is chosen – you decide

3.       Cache / CDN included – Yes

Approach:

1.       Bottom-up estimation of content uploaded by creators based on Google announcement benchmarks

2.       Average out the content in the last 10 years based on best known stats

3.       Look at data replication strategies to meet low latency/HA and assign content duplication buffers

Assumptions:

 

1.       500 hours of video are uploaded to YT every minute → 500 x 60 x 24 = 720K = 0.72 million hours of video uploaded per day (Google I/O announcement, 2019)

2.       40 ZB of data was created globally in 2019 (Statista report)

3.       Global digital data creation grew at 20% CAGR between 2011 and 2021 (Statista report)

 

Working:

 

1.       Determining the size (per unit of time) of a single uploaded video

a.       Resolution = 480p @ 16:9 aspect ratio = 480 x (480 x 16/9) ≈ 400K pixels

b.       Color-depth = 16 bits / pixel

c.       Size of one frame = 400K x 16 = 6.4 Mbits

d.       Frame-rate = 30 frames/second

e.       Per second raw video = 6.4 Mbits x 30 ≈ 200 Mbps

f.        Compression with standard video encoding methods = 1:160

g.       Total bit-stream rate = 200 Mbps / 160 = 1250 Kbps (it's OK to assume this figure directly during an interview, as these are fairly standard numbers)

h.       This is the same as 1250 x 1000 x 3600 / 8 bytes ≈ 0.56 GB / hour, which we round to 0.5 GB / hour
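The pixel-to-GB/hour chain in step 1 can be written as a small Python function. This is a sketch under the answer's assumptions (16 bits per pixel, 30 frames per second, 1:160 compression, decimal GB); the frame widths are assumed 16:9 values and the function name is made up for illustration.

```python
def gb_per_hour(width, height, bits_per_pixel=16, fps=30,
                compression_ratio=160):
    """Approximate encoded size of one hour of video, in decimal GB."""
    raw_bits_per_second = width * height * bits_per_pixel * fps
    encoded_bits_per_second = raw_bits_per_second / compression_ratio
    bytes_per_hour = encoded_bits_per_second * 3600 / 8
    return bytes_per_hour / 1e9


# Assumed 16:9 frame sizes for each resolution.
FRAME_SIZES = {
    "480p": (853, 480),
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}

if __name__ == "__main__":
    for name, (w, h) in FRAME_SIZES.items():
        print(f"{name}: {gb_per_hour(w, h):.2f} GB/hour")
```

With the same compression ratio at every resolution, this gives values in the same ballpark as the rounded per-hour sizes used in this answer (0.5, 1.5, 2.5 and 12 GB/hour); real encoders compress differently per resolution, so treat the ratio as a rough knob.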

2.       In similar fashion:

a.       720p = 1.5 GB/Hr

b.       1080p (Full HD) = 2.5GB/Hr

c.       4k = 12GB / Hr

3.       According to a stat published by Google in 2019, 500 hours of video are uploaded every minute

a.       This means 500 x 60 = 30K Hours of video uploaded / Hour

b.       Which is same as 720K Hours of videos uploaded every day

4.       Let's make some assumptions for the split of this content across resolutions

| Resolution | % of content (based on observation) | Size / hour of video (see 1 and 2 above) | Apportioned uploaded data / day |
|---|---|---|---|
| 4K | 1% | 12 GB / Hr | 720K x 0.01 x 12 ≈ 86 TB |
| Full-HD (1080p) | 19% | 2.5 GB / Hr | 720K x 0.19 x 2.5 = 342 TB |
| 720p (Standard) | 50% | 1.5 GB / Hr | 720K x 0.5 x 1.5 = 540 TB |
| 480p | 30% | 0.5 GB / Hr | 720K x 0.3 x 0.5 = 108 TB |
| Total | 100% | | ~1076 TB ≈ 1 PB |
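The apportioning across resolutions can be reproduced with a short sketch. The shares and per-hour sizes are the answer's assumptions (based on observation, not published data), decimal units are used (1 TB = 1,000 GB), and the names are illustrative.

```python
HOURS_PER_DAY = 500 * 60 * 24   # 500 hours uploaded per minute -> 720,000

# resolution -> (share of uploaded hours, assumed GB per hour)
RESOLUTION_MIX = {
    "4K": (0.01, 12.0),
    "1080p": (0.19, 2.5),
    "720p": (0.50, 1.5),
    "480p": (0.30, 0.5),
}


def daily_upload_tb(hours_per_day, mix):
    """Uploaded data per day for each resolution, in TB."""
    return {name: hours_per_day * share * gb_per_hour / 1_000
            for name, (share, gb_per_hour) in mix.items()}


per_resolution_tb = daily_upload_tb(HOURS_PER_DAY, RESOLUTION_MIX)
total_tb = sum(per_resolution_tb.values())

if __name__ == "__main__":
    for name, tb in per_resolution_tb.items():
        print(f"{name}: {tb:.0f} TB/day")
    print(f"total: {total_tb:.0f} TB/day, roughly 1 PB")
```

Since 720p carries half of the hours, the total is most sensitive to the 720p share and size; shifting a few percent towards 4K moves the result faster than any other change.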

 

 

1.       About 1 PB of data was uploaded per day in 2019

2.       This means ~350 PB of data in 2019 (1 PB x 365, rounded down)

3.       Quick sanity check at this point

a.       40 ZB of global digital data was created in 2019

b.       So YT content uploaded in 2019 compared with all digital content created globally is 0.35 EB : 40,000 EB, or roughly 1 : 115K, which seems to be the right order of magnitude considering factors like the percentage of YT creators

4.       Between 2012 and 2021, global digital data grew at 20% CAGR

5.       Let's assume the same pattern for the data uploaded to YT. Further, let's assume that YT retains data for 10 years. With that, we can deduce the data for the 10-year period 2012 – 2021 with the formula:

End Value = Begin Value x (1 + CAGR/100)^n

(Here, first derive the begin value for 2012 by setting the end value for 2019 to 350 and n = 7, then use that value to derive the numbers for all other years.)

| Year | Data Uploaded (PB) |
|---|---|
| 2012 | 98 |
| 2013 | 117 |
| 2014 | 141 |
| 2015 | 169 |
| 2016 | 203 |
| 2017 | 243 |
| 2018 | 292 |
| 2019 | 350 |
| 2020 | 420 |
| 2021 | 504 |
| Total | ~2500 |
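The yearly figures follow from the CAGR formula anchored on 2019, and can be regenerated with this sketch. It assumes 350 PB uploaded in 2019 and a constant 20% yearly growth applied to YT uploads (borrowed from the global data-creation figure); the function name is hypothetical.

```python
def yearly_uploads_pb(anchor_year=2019, anchor_pb=350.0, cagr=0.20,
                      first_year=2012, last_year=2021):
    """Data uploaded per year (PB), scaling the anchor year by the CAGR."""
    return {year: anchor_pb * (1 + cagr) ** (year - anchor_year)
            for year in range(first_year, last_year + 1)}


uploads = yearly_uploads_pb()
total_pb = sum(uploads.values())

if __name__ == "__main__":
    for year, pb in uploads.items():
        print(f"{year}: {pb:.0f} PB")
    print(f"total: {total_pb:.0f} PB, about {total_pb / 1000:.1f} EB")
```

Note that each yearly number is the data uploaded in that year, not a running total; the 10-year total is the sum of the column.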

 

 

This means that over the last 10 years about 2.5 EB of data was uploaded to YT and stored.

1.       Now let us look deeper into how YT stores this data, since it needs distributed and often redundant storage to ensure low latency and high-speed streaming.

2.       For that, let's look at a high-level abstraction of a possible layout of YT content storage.

It is important to note that video uploaded by the creator community is stored in a blob store like GCP Cloud Storage. It is then processed: split into chunks for streaming, pre-processed for multiple devices, resolutions and bandwidth options, compressed, and then partially / on-demand replicated to a globally distributed CDN so that it is optimized for viewing across a huge global consumer community of 30M DAUs

1.       We assume that this processing, compression and global replication (not full, as there will be cache misses) will lead to at least 3X more data than what's originally uploaded

2.       This means total storage = Blob Storage  + PoP distributed CDN cache (Ggl + 3rd party)

3.       This sums up to (1 + 3) x 2500 PB = 4 x 2500 PB = 10 EB of data.

4.       This will be the total storage requirement of YT
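The final roll-up is the original uploads plus the extra data from processing and replication, sketched below. The 3X overhead is this answer's assumption rather than a measured figure, and the function name is illustrative.

```python
def total_storage_eb(uploaded_pb=2500, extra_multiple=3):
    """Blob storage (1x the uploads) plus CDN/processing overhead, in EB.

    `extra_multiple` is the assumed additional data, expressed as a
    multiple of what was originally uploaded.
    """
    return uploaded_pb * (1 + extra_multiple) / 1_000


if __name__ == "__main__":
    print(f"{total_storage_eb():.0f} EB total")
```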

 

5 likes   |  
1 Feedback
badge Bronze PM

End Value = Begin Value (1 + CAGR/100)^n

(here first derive begin value for 2012 by setting end value for 2019 as 350 and n = 7 and then use this value to derive all other year number.)

 

 

How can we put the end value as 350? 350 is just the growth number (the 20% addition) and not the overall value across all the years.

0