
How would you decrease the amount of storage needed for storing e-mail messages for Gmail?  

Asked at Google
2.6k views
Answers (1)

Platinum PM

Clarifying questions

Before brainstorming solutions to reduce Gmail's storage, may I ask some clarifying questions?

  • Are there any constraints for the solutions? (Assuming the answer is no)

 

Causes

I think there are several reasons why Gmail's storage needs have grown significantly over time.

 

  • The first reason is that emails in Gmail are kept forever and never deleted. As far as I know, Gmail markets itself on never removing users' mail for any reason.

 
  • The second reason is the large volume of spam and low-value mail (promotions, social network notifications, ...) sent every hour, which far exceeds the volume of important mail. Every day, people receive a myriad of such emails from social networks (Facebook, Twitter, LinkedIn, ...), from the websites and apps they registered for with their Gmail account, and from brands promoting their products.

 
  • Many Gmail users rarely use their accounts. Many people create a Gmail account only to register for another service, or abandon accounts they created as children.

 
  • A large amount of media (photos, videos, ...) is uploaded directly to Gmail storage. The same media file is often stored in several different mails. This unnecessarily occupies a huge amount of space, because media files are much larger than text.

 

Solution

With these causes in mind, here are some solutions that come to mind:

 

 

Limit the storage time of spam

  • Description: The system limits the lifetime of unread spam emails to 2 years. This assumes that, after reviewing the data with DS, we see that most people do not go back to spam mail older than 2 years.
  • Pros: Saves storage.
  • Cons: Legitimate emails that were mistakenly put into the spam category are deleted too. Mitigation: improve the accuracy of the spam detector.

Handle overlapping (duplicate) media

  • Description: We detect and remove duplicate media in our database. For example, if 5 pictures are exactly the same according to our detection module, the system keeps only one and deletes the others. Every email that contained a deleted picture then points to the remaining copy.
  • Pros: Saves storage.
  • Cons: The detection module must be precise, so that photos that merely look similar are not removed.

Compress all mail of inactive users

  • Description: The system applies compression algorithms to the emails of users who have not used Gmail for a specific period of time, for example 2 years. Compression can save a lot of storage space.
  • Pros: Saves storage.
  • Cons: Compressed data takes longer to extract and query. We can mitigate this by starting the extraction when a returning user signs in and has to verify their identity after a long absence.

 

In my view, we can combine all of these methods to reduce redundant data in Gmail storage.

 

The first solution we should apply is limiting the storage time of spam, because its downside is small and we can test it by soft-deleting mail and watching the feedback from customers. If something goes wrong, we can restore the mail we deleted.
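The soft-delete idea can be sketched roughly as follows. This is only an illustration, not how Gmail works: the `Mail` record, the function names, and the 30-day grace period are all assumptions made up for the example. The 2-year limit is the one proposed in the answer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SPAM_RETENTION = timedelta(days=2 * 365)  # the 2-year limit proposed in the answer
GRACE_PERIOD = timedelta(days=30)         # assumed window in which mail can be restored


@dataclass
class Mail:
    mail_id: str
    is_spam: bool
    is_read: bool
    received_at: datetime
    soft_deleted_at: Optional[datetime] = None


def soft_delete_expired_spam(mails, now):
    """Mark unread spam older than the retention limit; return the marked ids."""
    marked = []
    for m in mails:
        expired = now - m.received_at > SPAM_RETENTION
        if m.is_spam and not m.is_read and expired and m.soft_deleted_at is None:
            m.soft_deleted_at = now
            marked.append(m.mail_id)
    return marked


def restore(mail):
    """Undo a soft delete, for example after a customer complaint."""
    mail.soft_deleted_at = None


def purge(mails, now):
    """Drop mail whose grace period has passed; return the mail that remains."""
    return [
        m for m in mails
        if m.soft_deleted_at is None or now - m.soft_deleted_at <= GRACE_PERIOD
    ]
```

Storage is only reclaimed in `purge`, so anything marked by mistake can still be brought back with `restore` during the grace period.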

 

Next, handling overlapping data would help Google save a lot of the space used for email-related data. But we should make sure that our overlap detection module performs correctly.
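One simple way to detect media that is exactly the same is to key each file by a hash of its content. The sketch below assumes this approach; the `AttachmentStore` class and its methods are hypothetical names for illustration. It only catches byte-identical files, so a resized or re-encoded copy of the same photo would be stored separately.

```python
import hashlib


class AttachmentStore:
    """Keeps each distinct file once and tracks which mails reference it."""

    def __init__(self):
        self._blobs = {}  # sha256 hex digest -> file bytes
        self._refs = {}   # sha256 hex digest -> set of mail ids using the file

    def put(self, mail_id, data):
        """Store data for a mail and return its key; duplicates share one copy."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self._blobs:
            self._blobs[key] = data
            self._refs[key] = set()
        self._refs[key].add(mail_id)
        return key

    def get(self, key):
        return self._blobs[key]

    def release(self, mail_id, key):
        """Drop one mail's reference; free the file when nobody uses it."""
        refs = self._refs[key]
        refs.discard(mail_id)
        if not refs:
            del self._blobs[key]
            del self._refs[key]

    def stored_bytes(self):
        return sum(len(blob) for blob in self._blobs.values())
```

Because the key depends on every byte of the file, two different photos will not be merged by accident, which addresses the precision concern at the cost of missing near-duplicates.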

 

Finally, for compression, we can run an A/B test to observe the behavior of inactive users before rolling it out at large scale.
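The compression step for inactive users could look something like the sketch below. It uses zlib from the Python standard library purely as an example algorithm; the function names are made up, and the 2-year threshold is the example value from the answer.

```python
import zlib
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(days=2 * 365)  # example threshold from the answer


def is_inactive(last_login, now):
    """True if the user has not signed in for longer than the limit."""
    return now - last_login > INACTIVITY_LIMIT


def compress_mailbox(messages):
    """Compress each raw message (bytes) with the highest zlib level."""
    return [zlib.compress(message, 9) for message in messages]


def decompress_mailbox(blobs):
    """Restore the original messages, e.g. when the user signs in again."""
    return [zlib.decompress(blob) for blob in blobs]
```

Text-heavy mail tends to shrink well, while attachments that are already compressed (JPEG photos, most video) gain little, so in practice this would complement the media solutions rather than replace them.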

 

 

1 Feedback

@Hung Nguyen: Great answer, with a well-structured line of thinking.

You touched upon the right points. You brought up the large number of media files as a problem.
It is indeed a real problem and one of the biggest reasons.

As a solution to that problem, you proposed handling overlapping/duplicate media. I think there could be more solutions for media files.

I solved this problem in real life. I wrote the prototype code back in 2014 and it was shipped to production. My Outlook inbox was getting full because I had no control over incoming emails. People were sending big documents as attachments.

I wrote an Outlook plugin using Visual Studio. It saved every e-mail attachment into a folder on my local computer, and that folder was synced by cloud storage. I created one folder for each sender. Under each sender, I created a subfolder for each email thread, named after the subject of the email.

After saving the attachment to that folder, I was able to remove the attachment from the e-mail and add the hard-coded path c:\users\<mayname>\<my-cloud-storage>\E-mail attachments\ <Sender-Name> to the e-mail instead.
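The folder layout described above (one folder per sender, one subfolder per subject) can be illustrated with a short sketch. This is not the original Outlook plugin code; it uses Python's standard email library instead, and `safe_name`, `attachment_dir`, and `save_attachments` are hypothetical helper names.

```python
import re
from email.message import EmailMessage
from email.utils import parseaddr
from pathlib import Path


def safe_name(text):
    """Replace characters that Windows does not allow in file or folder names."""
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", text).strip(" .")
    return cleaned or "unnamed"


def attachment_dir(root, msg):
    """Build <root>/<sender>/<subject> for a message."""
    name, addr = parseaddr(msg.get("From", ""))
    return Path(root) / safe_name(name or addr) / safe_name(msg.get("Subject", ""))


def save_attachments(root, msg):
    """Write every attachment of msg into its folder; return the saved paths."""
    folder = attachment_dir(root, msg)
    saved = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        if data is None:  # e.g. a nested multipart attachment; skipped here
            continue
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / safe_name(part.get_filename() or "attachment")
        target.write_bytes(data)
        saved.append(target)
    return saved
```

If the root folder is one that a cloud storage client already syncs, saving the file there is enough to get it uploaded, which is what made version 1 work without any storage API.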

The above was version 1 of my PoC.

My company made a cloud storage product. I learned how to use its API to upload files programmatically, and I built version 2, which uploads directly to cloud storage.

Was the problem completely solved? No. When I send e-mails with attachments, that also consumes space, because they are stored in my Sent folder. The recipient's inbox also fills up because of my email, and there can be more than one recipient.

So I created version 3 of my PoC, which strips the attachments from all my outgoing email, uploads them to cloud storage, and places just the link in the email.
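The version 3 idea, replacing outgoing attachments with links, might be sketched like this. Again, this is not the plugin's real code: it uses Python's standard email library, and the `upload` callable stands in for whatever cloud storage API is available (it takes a file name and bytes and is assumed to return a shareable URL).

```python
from email.message import EmailMessage


def replace_attachments_with_links(msg, upload):
    """Return a plain-text copy of msg whose attachments are replaced by links.

    upload(filename, data) is a hypothetical callable that stores the bytes
    somewhere and returns a URL for them.
    """
    links = []
    for part in msg.iter_attachments():
        data = part.get_payload(decode=True)
        if data is None:  # nested multipart attachments are skipped here
            continue
        links.append(upload(part.get_filename() or "attachment", data))

    # Keep the plain-text body of the original message, if there is one.
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""

    slim = EmailMessage()
    for header in ("From", "To", "Cc", "Subject"):
        if msg[header] is not None:
            slim[header] = str(msg[header])
    if links:
        body = body.rstrip("\n") + "\n\nAttachments:\n" + "\n".join(links) + "\n"
    slim.set_content(body)
    return slim
```

The file is then uploaded once, while the Sent folder and every recipient's inbox hold only a short link, however many recipients there are.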

I still have those attachment folders, and they weigh more than 35 GB (the trash of over 6 years).
Our cloud storage product shipped this feature in the form of an Outlook plugin that very same year.

I pitched this idea to the head of product along with my PoC, and he said that if people's data lives in our cloud storage, they are less likely to churn.

From 2017 onwards, we saw Microsoft Outlook start doing the same: when you send a document over email, it gives you the option to attach it as a link or to attach the original document.

So that is my real-life story.

I would pitch creating connectors like the one I described above. I should be able to plug in my Dropbox account or Google Drive account.
 
