
How many queries per second does Gmail get?

Asked at Google
14.2k views
Answers (2)

Platinum PM
  1. CLARIFY:
    1. Do you want to focus on US or global? You choose. (Will go global.)
    2. Web v. mobile? Can include both. 
    3. Is it OK to focus on the email function, as it is the main functionality of Gmail (i.e. not include Meet or Hangouts which are accessible from the Gmail web interface)? Yes. 
    4. Is it OK to ignore any query requests from APIs or automated services? Yes. 
  2. BACKGROUND: Gmail is a free email service developed by Google. Users can access email online via web or mobile. Google's mission is to make information accessible, and email, as a form of information exchange, fits nicely with this mission. Gmail is one of the most popular email services in the US. There are over 1.5B active Gmail accounts worldwide. 
  3. QUERY TYPES: Before we get into the equation, I'd like to establish possible ways Gmail can get a query:
    1. Received
    2. Read
    3. Write
    4. Draft Save
    5. Send Email
    6. Discard Email
    7. Starred / Marked as Read / Mark as Spam or Phishing
    8. File Email into Folder  
  4. EQUATION: Total Gmail queries / second = (received + read + write + draft save + send email + discard email + starred / marked as read / marked as spam + file email queries per day) / seconds in a day
  5. BREAKDOWN UNKNOWNS:
      1. Daily Active Gmail Users:
        1. Assume there are 1.5B active users on Gmail. 
        2. On a given day, assume 70% are actively using Gmail. 
        3. 1.5B * .7 = 1.05B, or roughly 1.1B daily active accounts on Gmail
      2. Received and Read Emails:
        1. We can classify accounts into 3 groups based on the number of emails they receive. My own Gmail account receives a lot of email every day, so I'd put myself on the high end. Let's assume the majority of the population is a medium Gmail reader. 
        2. Let's assume that users on average read ~75% of the emails that come through their inbox, across all groups.
        3. Received and read emails per group:
          | Type of Account | # Emails / Day | % of Gmail Population | Total # of Gmail Accounts | Total # Emails Received | Total # Emails Read |
          |---|---|---|---|---|---|
          | Low | 5 | 25% | 1.1B * .25 = 275M | 5 * 275M = 1.375B | 1.375B * .75 = 1.03B |
          | Medium | 10 | 50% | 1.1B / 2 = 550M | 10 * 550M = 5.5B | 5.5B * .75 = 4.13B |
          | High | 20 | 25% | 1.1B * .25 = 275M | 20 * 275M = 5.5B | 5.5B * .75 = 4.13B |
        4. Total Received Emails: 1.375B + 5.5B + 5.5B = 12.4B
        5. Total Read Emails: 1.03B + 4.13B + 4.13B = 9.3B
      3. Write Emails: We can classify writing emails in two ways: responding and writing new emails. Let's assume that users respond to a portion of the emails that come through their inbox, and break down the accounts in the same way. 
        1. Responding: We use the same groups and assume a response rate for each group. 
          | Type of Account | # Emails / Day | # of Email Responses | Total # Emails Read | Total # of Email Responses |
          |---|---|---|---|---|
          | Low | 5 | 1/5 | 1.03B | .2 * 1.03B = 206M |
          | Medium | 10 | 3/10 | 4.13B | .3 * 4.13B = 1.24B |
          | High | 20 | 5/20 | 4.13B | .25 * 4.13B = 1.03B |
        2. Total Email Responses: 206M + 1.24B + 1.03B = 2.5B 
        3. Writing New: Again with the same groups, let's assume that each account writes some new emails each day. 
          | Type of Account | # New Emails Written / Day | Total # of Gmail Accounts | Total # of New Emails Written |
          |---|---|---|---|
          | Low | 1 | 275M | 1 * 275M = 275M |
          | Medium | 5 | 550M | 5 * 550M = 2.75B |
          | High | 10 | 275M | 10 * 275M = 2.75B |
        4. Total New Written Emails: 275M + 2.75B + 2.75B = 5.8B
        5. Total Written Emails (New + Responses): 2.5B + 5.8B = 8.3B emails
      4. Draft Saved: Let's say that 20% of the emails that are written (responses and new) are saved as drafts / not sent immediately. (I assume this based on my own behavior of saving emails.)
        1. 8.3B emails written * .2 = 1.7B emails
      5. Sent Emails: Of the emails written, let's assume that 90% will be sent. (I assume this based on my own behavior of sending emails.) 
        1. 8.3B emails written * .9 = 7.5B emails
      6. Discard Emails: There are two types of messages a user can discard: received emails, or written emails / drafts. 
        1. Written Emails: We stated above that 90% of emails written will be sent, which means 10% will not be sent. Let's assume that 80% of those never sent will be discarded.
          1. 8.3B emails written * .1 * .8 = 664M
        2. Received Emails: We stated above that there are 12.4B received emails. Let's assume 25% of those emails are deleted. 
          1. 12.4B emails received * .25 = 3.1B
        3. Total Discard Emails: 664M + 3.1B = 3.8B
      7. Starred / Mark as Read / Marked as Spam / Phishing: Let's split this bucket.
        1. Starred: Let's assume 10% of the emails received will be starred:
          1. 12.4B emails received * .1 = 1.24B
        2. Mark as Read: We stated earlier that an individual reads 75% of the messages in their inbox, which means they don't read 25%. Let's assume they mark 15% of those as read.
          1. 12.4B emails received * .25 * .15 = 465M
        3. Marked as Spam / Phishing: Let's say another 10% of the emails users don't read are marked as spam / phishing.
          1. 12.4B emails received * .25 * .1 = 310M
        4. Total Starred / Mark as Read / Marked as Spam / Phishing: 1.24B + 465M + 310M = 2.0B
      8. File Emails: Let's also assume that 10% of emails received will be filed. 
        1. 12.4B emails received * .1 = 1.24B
  6. TOTAL QUERIES / SECOND: 
    1. Total Queries / Day:
      | Query Type | # / Day |
      |---|---|
      | Received | 12.4B |
      | Read | 9.3B |
      | Write | 8.3B |
      | Draft Save | 1.7B |
      | Send Email | 7.5B |
      | Discard Email | 3.8B |
      | Starred / Marked as Read / Mark as Spam or Phishing | 2.0B |
      | File Email into Folder | 1.24B |
      | Total | 46.2B |
    2. Total Queries / Second:
      1. 86,400 seconds in a day
      2. 46.2B queries in a day / 86,400 = ~535K queries / second
  7. CAVEATS:
    1. We counted received emails and sent emails separately, assuming there's a separate query for each action. Some received emails come from non-Gmail addresses; others come from Gmail addresses. If a separate receive query occurs ONLY for emails received from non-Gmail addresses, we would have to adjust the equation.
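
The breakdown relies on one recurring pattern: multiply a per-account daily rate by the size of each usage group, add the groups, and divide a daily total by the seconds in a day. Below is a minimal Python sketch of that pattern, applied to the new-emails-written step only. The names `ACCOUNTS`, `segment_total`, and `per_second` are illustrative and not part of the original answer; the group sizes are the rounded figures used in the breakdown.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Rounded group sizes from the breakdown: ~1.1B daily active accounts split 25/50/25.
ACCOUNTS = {"low": 275e6, "medium": 550e6, "high": 275e6}


def segment_total(per_account_rate):
    """Sum (daily rate per account * number of accounts) over the usage groups."""
    return sum(per_account_rate[group] * ACCOUNTS[group] for group in ACCOUNTS)


def per_second(per_day):
    """Convert a daily count to an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# New emails written per account per day, as assumed in the breakdown.
new_emails_per_account = {"low": 1, "medium": 5, "high": 10}
new_emails_per_day = segment_total(new_emails_per_account)

print(f"New emails written per day: {new_emails_per_day / 1e9:.1f}B")
print(f"Average queries/second from new emails alone: {per_second(new_emails_per_day):,.0f}")
```

The same two helpers can be reused for any other query type by swapping in that type's per-account rates. The per-second figure is a daily average, not a peak.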
 
 
 
Platinum PM

A Google estimation interview question like this can be approached in several ways. My approach would be as follows...

Gmail is the number 2 Email Client after iPhone with 27% market share.

I would first define what a query means:

The user opens Gmail, which sends a query to the server. If the Emails are already in the cache, they are loaded directly; otherwise the server sends back the top X Emails (depending on how many Emails are shown on the first page).

The user does five main things in Gmail:

  1. Read: When the user clicks on an Email, it is loaded from the cache if it is already there; otherwise it comes from the server. Let's assume it comes from the server.
  2. Write: The user either answers an Email or writes a new one, and then sends it.
  3. Draft: After writing an Email, the user does not send it and saves it as a draft instead. In this case a write query is sent to the server to save the draft in the database.
  4. Search: The user searches their Emails.
  5. Delete: The user deletes Emails, either one by one (multiple requests to the server) or by selecting several and deleting them with one click (one query to the server).

There are two ways to group the users:

  1. Based on age: I assume people aged 20-50 receive more Emails because they have more responsibilities in life, e.g. parents booking a hotel for the family or university students submitting exercises. Users under 20 are also more likely to use FB or messenger apps than Email.
  2. Based on their use case for Gmail: people who use Gmail only privately vs. people who use it for both work and private life. I assume the second group makes many more read, write, and search queries, as they receive more Emails and need to react to several of them.

We also have three groups of Emails in Gmail:

  1. Inbox
  2. Promotion
  3. Social

I assume most people read and answer Emails in Inbox. Promotion Emails are often ignored (not even opened), and a lower percentage of Social Emails are opened; they mostly don't need to be answered. 

 

Facts and Assumptions:

Gmail has 1.2 bio users.

I assume on a daily basis, 80% of users are active ---> 80% X 1.2 bio = 0.96 bio ~ 1 bio

  • Group 1: I assume 20% of DAUs use Gmail for both work and private life. 20%X1bio = 200mio
  • Group 2: I assume 80% of DAUs use Gmail just for private life: 80%X1bio = 800mio

I assume 80% of group 2 are aged 20-50 (80% X 800 mio = 640 mio). I assume the other 20% (<20 or >50) make 30% fewer queries because, as I mentioned, users under 20 use FB or messenger apps more than Email, and users over 50 have less reason to use Email and are less used to it.

 

Group1: private + work

  1. Search query: 1 time a day ==> 1 req/day
  2. Read
    • Inbox: 20 work Emails, 10 private Emails. The user opens 95% of work Emails and 90% of private Emails. ==> 20 * 95% + 10 * 90% = 28 ~ 30 req/day 
    • Promotion: User receives 20 Emails a week, opens 10 Emails every week. => 20 * 50% / 7 ~ 1.5 req/day
    • Social: User receives 21 Emails a week, opens 7 every week. => 21 * 33% / 7 = 1 req/day
  3. Write: User initiates 5 writes a day, and answers 5 Emails a day => 10 req/day
  4. Draft: User saves 2 drafts a week => 2 / 7 ~ 0 req/day
  5. Delete: User deletes 30% of total Emails received each day.  => 5 req/day

 

Total req/day = 1 + 30 + 1.5 + 1 + 10 + 0 + 5 = 48.5 ~ 50 req/day

 

Group2: private (age 20-50)

  1. Search query: 1 time a day ==> 1 req/day
  2. Read
    • Inbox: 10 private Emails. The user opens 90% of private Emails. ==> 10 * 90% = 9 req/day 
    • Promotion: User receives 20 Emails a week, opens 10 Emails every week. => 20 * 50% / 7 ~ 1.5 req/day
    • Social: User receives 21 Emails a week, opens 7 every week. => 21 * 33% / 7 = 1 req/day
  3. Write: User initiates 3 writes a day, and answers 3 Emails a day => 6 req/day
  4. Draft: User saves 2 drafts a week => 2/7 ~ 0 req/day
  5. Delete: User deletes 30% of total Emails received each day.  => 3 req/day

 

 

Total req/day (age 20-50) = 1 + 9 + 1.5 + 1 + 6 + 0 + 3 = 21.5 ~ 20 req/day

Total req/day age<20 or age>50 (20% of group2)= 20 * 70%= 14 req/day
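
The per-user budgets above can be tallied with a short Python sketch. The dictionary keys are illustrative labels, the values are the per-day request counts assumed in the answer, and the rounding to 50 and 20 follows the answer's own rounding.

```python
# Requests per user per day, by activity, as assumed in the answer.
group1 = {"search": 1, "read_inbox": 30, "read_promotion": 1.5, "read_social": 1,
          "write": 10, "draft": 0, "delete": 5}
group2 = {"search": 1, "read_inbox": 9, "read_promotion": 1.5, "read_social": 1,
          "write": 6, "draft": 0, "delete": 3}

group1_total = sum(group1.values())  # rounded up to 50 in the answer
group2_total = sum(group2.values())  # rounded down to 20 in the answer

# Users under 20 or over 50 are assumed to make 70% of the rounded group 2 figure.
group2_other_ages = 0.70 * 20

print(f"Group 1: {group1_total} req/day")
print(f"Group 2 (age 20-50): {group2_total} req/day")
print(f"Group 2 (age <20 or >50): {group2_other_ages:.0f} req/day")
```

Keeping the unrounded sums visible makes it easier to see how much each rounding step moves the final estimate.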

 

Total queries = Group 1 + Group 2 (age 20-50) + Group 2 (age <20 or age >50)

Total queries = 200mio X 50 req/day + 800mio X 80% X 20 + 800mio X 20% X 14 = (10k + 12.8k + 2.2k) mio req/day = 25 bio req/day
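
As a check on the sum above, here is a minimal Python sketch that weights each segment's daily request count by its share of the 1 bio daily active users. The `segments` structure and its labels are illustrative; the shares and request counts are the ones assumed in the answer.

```python
DAU = 1_000_000_000  # daily active users, rounded to 1 bio as in the answer

# (label, share of DAU, requests per user per day)
segments = [
    ("group 1: work + private", 0.20, 50),
    ("group 2: private, age 20-50", 0.80 * 0.80, 20),
    ("group 2: private, age <20 or >50", 0.80 * 0.20, 14),
]

total_req_per_day = sum(DAU * share * req for _, share, req in segments)

for label, share, req in segments:
    print(f"{label}: {DAU * share * req / 1e9:.2f} bio req/day")
print(f"Total: {total_req_per_day / 1e9:.1f} bio req/day")
```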

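To convert the number of req per day to the number of req/sec, I use a shortcut:

  • calculate the number of req/month (M) = req/day * 30
  • number of req/sec = M * 400 / 1 bio (a 30-day month has about 2.6 mio seconds, and 1 / 2.6 mio is roughly 400 / 1 bio)

number of req/sec = 25 bio req/day * 30 * 400 / 1 bio = 300k req/sec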
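
The shortcut can be compared with a direct division by the 86,400 seconds in a day. This is a small Python sketch with illustrative function names; it only restates the two conversions, not anything specific to Gmail.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def exact_req_per_sec(req_per_day):
    """Direct conversion: divide by the number of seconds in a day."""
    return req_per_day / SECONDS_PER_DAY


def shortcut_req_per_sec(req_per_day):
    """Mental-math shortcut from the answer: monthly volume * 400 / 1 bio."""
    req_per_month = req_per_day * 30
    return req_per_month * 400 / 1e9


req_per_day = 25e9
exact = exact_req_per_sec(req_per_day)
approx = shortcut_req_per_sec(req_per_day)
relative_gap = (approx - exact) / exact

print(f"exact:    {exact:,.0f} req/sec")
print(f"shortcut: {approx:,.0f} req/sec")
print(f"shortcut overshoots by {relative_gap:.1%}")
```

The shortcut lands within a few percent of the direct division, which is well inside the uncertainty of the underlying assumptions.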

---------------------------------

Conclusion:

  • We divided Gmail users based on their usage of Gmail (work+private vs private) and based on age.
  • We divided the type of queries to Search, Write, Read, Draft, Delete
  • We divided the types of Emails to three types: Inbox, Promotion, Social
  • We concluded that there are 300 k req/sec to Gmail.

 

1 Feedback
Platinum PM
Hi Pegah,
Thanks for the structured and well-crafted answer. I think it's easy to follow your thinking structure. I really like how you took into consideration the different user behaviors of people in different age groups. A couple of small tweaks you can make:
- Consider edge cases that require more queries (syncing between multiple devices)  
- Add a sanity check in the end to make sure your estimate is reasonable
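
One possible form of such a sanity check, sketched in Python: back out the implied requests per active user per day from the final estimate and confirm it falls between the lowest and highest per-user budgets in the answer. The variable names are illustrative; the inputs are the answer's own figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60

estimated_req_per_sec = 300_000   # final estimate from the answer
daily_active_users = 1e9          # 1 bio DAU, as rounded in the answer

implied_req_per_user_per_day = estimated_req_per_sec * SECONDS_PER_DAY / daily_active_users

# Per-user budgets from the answer: 14 (lightest segment) to 50 (heaviest segment).
lightest, heaviest = 14, 50
is_plausible = lightest <= implied_req_per_user_per_day <= heaviest

print(f"Implied load: {implied_req_per_user_per_day:.1f} req/user/day")
print(f"Within the {lightest}-{heaviest} range: {is_plausible}")
```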