- CLARIFY:
- Do you want to focus on US or global? You choose. (Will go global.)
- Web v. mobile? Can include both.
- Is it OK to focus on the email function, as it is the main functionality of Gmail (i.e. not include Meet or Hangouts which are accessible from the Gmail web interface)? Yes.
- Is it OK to ignore any query requests from APIs or automated services? Yes.
- BACKGROUND: Gmail is a free email service developed by Google. Users can access email online via web or mobile. Google's mission is to promote the accessibility of information, and email, as a form of information exchange, fits nicely with this mission. Gmail is one of the most popular email services in the US. There are over 1.5B active Gmail accounts worldwide.
- QUERY TYPES: Before we get into the equation, I'd like to establish possible ways Gmail can get a query:
- Received
- Read
- Write
- Draft Save
- Send Email
- Discard Email
- Starred / Marked as Read / Mark as Spam or Phishing
- File Email into Folder
- EQUATION: Total Gmail queries / second = (Received queries + read queries + write queries + draft save queries + send email queries + discard email queries + starred / marked queries + file email queries per day) / seconds in a day
- BREAKDOWN UNKNOWNS:
- Gmail Daily Active User Population:
- Assume there are 1.5B active users on Gmail.
- In a given day, assume 70% are actively using Gmail.
- 1.5B * .7 = ~1.1B active accounts on Gmail per day
- Received and Read Emails:
- We can classify accounts into 3 groups based on the number of emails they receive per day. Based on my own Gmail account, which receives a lot of emails every day, I'd classify myself on the high end. Let's assume the majority of the population is a medium Gmail reader.
- Let's assume that users on average read ~75% of the emails that come through their inbox across the board.

| Type of Account | # Emails / Day | % of Gmail Population | Total # of Gmail Accounts | Total # Emails Received | Total # Emails Read |
| --- | --- | --- | --- | --- | --- |
| Low | 5 | 25% | 1.1B * .25 = 275M | 5 * 275M = 1.4B | 1.4B * .8 = 1.1B |
| Medium | 10 | 50% | 1.1B / 2 = 550M | 10 * 550M = 5.5B | 5.5B * .8 = 4.4B |
| High | 20 | 25% | 1.1B * .25 = 275M | 5 * 275M = 1.4B | 1.4B * .8 = 1.1B |

- Total Received Emails: 1.4B + 5.5B + 1.4B = 8.3B
- Total Read Emails: 1.1B + 4.4B + 1.1B = 6.6B
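
The segment arithmetic above can be sketched in Python. This is only an illustrative sketch, not the answer's own method: `Segment` and `segment_totals` are hypothetical names, and the 80% read rate is the one used in the table (the prose says ~75%). Because the sketch applies the stated 20 emails / day to the High segment, that row comes out to 5.5B received rather than the 1.4B shown in the table, so its totals are higher than the 8.3B and 6.6B carried forward in the rest of this answer.

```python
from dataclasses import dataclass

ACTIVE_USERS = 1.1e9  # daily active accounts, as rounded in the answer
READ_RATE = 0.8       # read rate used in the table (assumption)


@dataclass
class Segment:
    name: str
    emails_per_day: int  # emails received per account per day
    share: float         # fraction of daily active accounts


SEGMENTS = [
    Segment("Low", 5, 0.25),
    Segment("Medium", 10, 0.50),
    Segment("High", 20, 0.25),
]


def segment_totals(active_users, segments, read_rate):
    """Return (name, accounts, received, read) per segment."""
    rows = []
    for seg in segments:
        accounts = active_users * seg.share
        received = accounts * seg.emails_per_day
        rows.append((seg.name, accounts, received, received * read_rate))
    return rows


rows = segment_totals(ACTIVE_USERS, SEGMENTS, READ_RATE)
total_received = sum(r[2] for r in rows)
total_read = sum(r[3] for r in rows)

for name, accounts, received, read in rows:
    print(f"{name:<6} accounts={accounts / 1e6:.0f}M "
          f"received={received / 1e9:.2f}B read={read / 1e9:.2f}B")
print(f"total received={total_received / 1e9:.2f}B "
      f"total read={total_read / 1e9:.2f}B")
```

Changing `emails_per_day`, `share`, or `READ_RATE` shows how sensitive the received and read totals are to each assumption.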
- Write Emails: We can classify writing emails in two ways: responding and writing new emails. Let's assume that users respond to a portion of the emails that come through their inbox. Let's break down the accounts in the same way.
- Responding: We can reuse the same three groups and assume a response rate for each group.

| Type of Account | # Emails Read / Day | # of Email Responses | Total # Emails Read | Total # of Email Responses |
| --- | --- | --- | --- | --- |
| Low | 5 | 1/5 | 1.1B | .2 * 1.1B = 220M |
| Medium | 10 | 3/10 | 4.4B | .3 * 4.4B = 1.3B |
| High | 20 | 5/20 | 1.1B | .25 * 1.1B = 275M |

- Total Email Responses: 220M + 1.3B + 275M = 1.8B
- Writing New: Again with the same groups, let's assume that each group writes new emails each day.

| Type of Account | # New Emails Written / Day | Total # of Gmail Accounts | Total # of New Emails Written |
| --- | --- | --- | --- |
| Low | 1 | 275M | 1 * 275M = 275M |
| Medium | 5 | 550M | 5 * 550M = 2.75B |
| High | 10 | 275M | 10 * 275M = 2.75B |

- Total New Written Emails: 275M + 2.75B + 2.75B = 5.8B
- Total Written Emails (New + Responses): 1.8B + 5.8B = 7.6B emails
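
As a quick check on the written-email arithmetic, here is a minimal Python sketch. It takes the per-group read totals and account counts from the tables above as given inputs; the dictionary names are hypothetical and chosen only for illustration.

```python
# Emails read per day by group, taken from the table above
READ = {"Low": 1.1e9, "Medium": 4.4e9, "High": 1.1e9}
# Assumed share of read emails that get a response
RESPONSE_RATE = {"Low": 1 / 5, "Medium": 3 / 10, "High": 5 / 20}
# Daily active accounts by group
ACCOUNTS = {"Low": 275e6, "Medium": 550e6, "High": 275e6}
# Assumed new emails written per account per day
NEW_PER_DAY = {"Low": 1, "Medium": 5, "High": 10}

responses = sum(READ[g] * RESPONSE_RATE[g] for g in READ)
new_emails = sum(ACCOUNTS[g] * NEW_PER_DAY[g] for g in ACCOUNTS)
written = responses + new_emails

print(f"responses  ~ {responses / 1e9:.1f}B")   # about 1.8B
print(f"new emails ~ {new_emails / 1e9:.1f}B")  # about 5.8B
print(f"written    ~ {written / 1e9:.1f}B")     # about 7.6B
```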
- Draft Saved: Let's say that 20% of the emails that are written (responses and new), are draft saved / not sent immediately. (I assume based on my behavior of saving emails.)
- 7.6B emails written * .2 = 1.5B emails
- Sent Emails: Of the emails written, let's assume that 90% of them will be sent. (I assume based on my behavior of sending emails.)
- 7.6B emails written * .9 = 6.8B emails
- Discard Emails: There are two types of messages a user can discard: received or written emails / drafts.
- Written Emails: We stated above that 90% of emails written will be sent, which means 10% will not be sent. Let's assume that of those never sent 80% will be discarded.
- 7.6B emails written * .1 * .8 = 608M
- Received Emails: We stated above that there are 8.3B received emails. Let's assume 25% of those emails are deleted.
- 8.3B emails received * .25 = ~2B
- Total Discard Emails: 608M + 2B = 2.6B
- Starred / Mark as Read / Marked as Spam / Phishing: Let's split this bucket.
- Starred: Let's assume 10% of the emails received will be starred:
- 8.3B emails received * .1 = 830M
- Mark as Read: We stated earlier that an individual reads 75% of the messages in their inbox, which means 25% they don't read. Let's assume they mark 15% of those as read.
- 8.3B emails received * .25 * .15 = 311M
- Marked as Spam / Phishing: Let's say another 10% of the emails users don't read are marked as spam / phishing.
- 8.3B emails received * .25 * .1 = 208M
- Total Starred / Mark as Read / Marked as Spam / Phishing: 830M + 311M + 208M = 1.3B
- File Emails: Let's also assume that 10% of emails received will be filed.
- 8.3B emails received * .1 = 830M
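
The follow-on actions (draft saves, sends, discards, stars, marks, and filing) are all simple percentages of the 8.3B received and 7.6B written emails. A minimal Python sketch, using the rates assumed above and a hypothetical helper name `derived_actions`:

```python
RECEIVED = 8.3e9  # emails received per day, from the earlier step
WRITTEN = 7.6e9   # emails written per day, from the earlier step


def derived_actions(received, written):
    """Apply the assumed percentages to the received and written totals."""
    unread = received * 0.25  # 75% of received emails are assumed read
    return {
        "draft_save": written * 0.20,
        "send": written * 0.90,
        "discard_written": written * 0.10 * 0.80,
        "discard_received": received * 0.25,
        "starred": received * 0.10,
        "mark_as_read": unread * 0.15,
        "spam_or_phishing": unread * 0.10,
        "filed": received * 0.10,
    }


actions = derived_actions(RECEIVED, WRITTEN)
for name, count in actions.items():
    print(f"{name:<17} {count / 1e6:>8,.0f}M")
```

Keeping every rate in one function makes it easy to see which assumption moves the total the most.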
- TOTAL QUERIES / SECOND:
- Total Queries / Day:

| Query Type | # / Day |
| --- | --- |
| Received | 8.3B |
| Read | 6.6B |
| Write | 7.6B |
| Draft Save | 1.5B |
| Send Email | 6.8B |
| Discard Email | 2.6B |
| Starred / Marked as Read / Mark as Spam or Phishing | 1.3B |
| File Email into Folder | 830M |
| Total | 35.5B |

- Total Queries / Second:
- 86,400 seconds in day
- 35.5B queries in day / 86,400 = ~411K queries / second
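
The final conversion can be sketched as follows. The daily figures are the rounded totals from the steps above; the variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Rounded daily query counts by type, from the steps above
DAILY_QUERIES = {
    "received": 8.3e9,
    "read": 6.6e9,
    "write": 7.6e9,
    "draft_save": 1.5e9,
    "send": 6.8e9,
    "discard": 2.6e9,
    "star_mark_spam": 1.3e9,
    "file": 830e6,
}

total_per_day = sum(DAILY_QUERIES.values())
queries_per_second = total_per_day / SECONDS_PER_DAY

print(f"total / day    ~ {total_per_day / 1e9:.1f}B")
print(f"total / second ~ {queries_per_second / 1e3:.0f}K")  # about 411K
```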
- CAVEATS:
- We counted received emails and sent emails separately, assuming there is a separate query for each action. Some received emails come from non-Gmail addresses and others come from Gmail addresses. If a separate receive query occurs ONLY for emails from non-Gmail senders, we would have to adjust the equation.
A Google estimation interview question like this can be approached in several ways. My approach would be as follows...
Gmail is the number 2 Email Client after iPhone with 27% market share.
I would define what query means:
When the user opens Gmail, the client makes a query to the server. If the Emails are already in the cache, they are loaded directly; otherwise the server sends back the top X Emails (depending on how many Emails are shown on the first page).
The user does five main things in Gmail:
- Reads Emails: When the user clicks on an Email, it is loaded from the cache if it is already there; otherwise it comes from the server. Let's imagine it is coming from the server.
- Writes Emails: The user either answers an Email or writes a new one, then sends it.
- Draft: After the user is done writing the Email, they do not send it and instead save it as a draft. In this case a write query is sent to the server to save the draft in the database.
- Search: User searches in his Emails.
- Delete: User deletes Emails. User can delete Emails one by one (multiple requests to server), or select several and delete with one click (one query to server).
There are two ways to group the users:
- based on age. I assume people in the 20-50 age group receive more Emails because they have more responsibilities in life, e.g. being parents or university students, submitting university exercises, booking hotels for the family, etc. Users under 20 are also more likely to use FB or messenger apps than Email.
- based on their use case for Gmail: people who use Gmail only for private use cases, people who use Gmail for both work and private life. I assume the second group has a lot more read, write and search queries as they receive more Emails and they need to react to several of them.
We also have three groups of Emails in Gmail:
- Inbox
- Promotion
- Social
I assume most people read and answer Emails in Inbox. Promotion Emails are often ignored (not even opened), a lower percentage of Social Emails are opened, and they mostly don't need to be answered.
Facts and Assumptions:
Gmail has 1.2 bio users.
I assume on a daily basis, 80% of users are active ---> 80% X 1.2bio = 0.96bio ~ 1bio DAUs
- Group 1: I assume 20% of DAUs use Gmail for both work and private life. 20%X1bio = 200mio
- Group 2: I assume 80% of DAUs use Gmail just for private life: 80%X1bio = 800mio
I assume 80% of group 2 are aged 20-50 (80% X 800mio = 640mio). I assume the other 20% (<20 or >50) make 30% fewer queries because, as I mentioned, users under 20 use FB or messenger apps more than Email, and users over 50 have less reason to use Email and are less used to it.
Group1: private + work
- Search query: 1 time a day ==> 1 req/day
- Read:
- Inbox: 20 work Emails, 10 private Emails. The user opens 95% of work Emails and 90% of private Emails. ==> 20 * 95% + 10 * 90% = 28 ~ 30 req/day
- Promotion: User receives 20 Emails a week, opens 10 Emails every week. => 20 * 50% / 7 ~ 1.5 req/day
- Social: User receives 21 Emails a week, opens 7 every week => 7 / 7 = 1 req/day
- Write: User initiates 5 writes a day, and answers 5 Emails a day => 10 req/day
- Draft: User saves 2 drafts a week => 2 / 7 ~ 0 req/day
- Delete: User deletes 30% of total Emails received each day. => 5 req/day
Total req/day = 1 + 30 + 1.5 + 1 + 10 + 0 + 5 ~ 50 req/day
Group 2: private, age 20-50
- Search query: 1 time a day ==> 1 req/day
- Read:
- Inbox: 10 private Emails. The user opens 90% of private Emails. ==> 10 * 90% = 9 req/day
- Promotion: User receives 20 Emails a week, opens 10 Emails every week. => 20 * 50% / 7 ~ 1.5 req/day
- Social: User receives 21 Emails a week, opens 7 every week => 7 / 7 = 1 req/day
- Write: User initiates 3 writes a day, and answers 3 Emails a day => 6 req/day
- Draft: User saves 2 drafts a week => 2/7 ~ 0 req/day
- Delete: User deletes 30% of total Emails received each day. => 3 req/day
Total req/day = 1 + 9 + 1.5 + 1 + 6 + 0 + 3 ~ 20 req/day
Total req/day for age <20 or age >50 (20% of group 2) = 20 * 70% = 14 req/day
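
The per-user tallies above, and the population-weighted total they feed into, can be sketched in Python. This is a rough sketch under the same assumptions; the dictionary and variable names are hypothetical. Summing the unrounded per-user items gives slightly different tallies than the rounded 50 and 20 req/day, and dividing by the exact 86,400 seconds gives roughly 290k req/sec, which rounds to the same order of magnitude as the ~300k estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Unrounded per-user daily requests for each action
GROUP1 = {  # work + private
    "search": 1, "read_inbox": 20 * 0.95 + 10 * 0.90,
    "read_promotion": 10 / 7, "read_social": 7 / 7,
    "write": 10, "draft": 2 / 7, "delete": 5,
}
GROUP2 = {  # private only, age 20-50
    "search": 1, "read_inbox": 10 * 0.90,
    "read_promotion": 10 / 7, "read_social": 7 / 7,
    "write": 6, "draft": 2 / 7, "delete": 3,
}
group1_exact = sum(GROUP1.values())
group2_exact = sum(GROUP2.values())

# Rounded per-user figures used in the answer, with user counts
SEGMENTS = [
    ("group 1", 200e6, 50),
    ("group 2, age 20-50", 800e6 * 0.8, 20),
    ("group 2, age <20 or >50", 800e6 * 0.2, 14),
]
total_per_day = sum(users * req for _, users, req in SEGMENTS)
req_per_second = total_per_day / SECONDS_PER_DAY

print(f"group 1 exact tally ~ {group1_exact:.1f} req/day")
print(f"group 2 exact tally ~ {group2_exact:.1f} req/day")
print(f"total ~ {total_per_day / 1e9:.1f} bio req/day")
print(f"rate  ~ {req_per_second / 1e3:.0f}k req/sec")
```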
Total queries = Group 1 + Group 2 (age 20-50) + Group 2 (age <20 or >50)
Total queries = 200mio X 50 req/day + 800mio X 80% X 20 req/day + 800mio X 20% X 14 req/day = (10k + 12.8k + 2.2k) mio req/day = 25 bio req/day
To convert req/day to req/sec, divide by the 86,400 seconds in a day. As a shortcut, 1 / 86,400 is roughly 30 * 400 / 1 bio:
number of req/sec = 25 bio req/day * 30 * 400 / 1 bio = 300k req/sec
Conclusion: