
You're the CTO of a company. Design a program that provides unique IDs upon requests from a client. This program will be used by Facebook and Google and needs to scale.

Asked at Google
Answers (2)

1. First, check why such a program is needed: is it to map a block of data, a user, or a URL to a unique ID? Is there an expiration date for the ID and, if so, can it be set manually? Say yes. Should the ID always be random, or can it be user provided? Say no.

Let's assume a user enters some data and gets a unique ID in return. Since this has to scale, our system needs to take care of both the amount of data stored and the time taken to return the generated value.

2. Requirements:

  • Functional:
    • Need to return a unique ID
    • Need to be able to set the expiration date
    • Need to be able to map the input data to the ID when needed
  • Non-functional:
    • The unique ID and its mapping must always be available
    • Minimal latency in returning the unique ID
  • Extended:
    • Ability to run analytics on how many times and where the unique ID is used
3. Command line:
  • Input: api_dev, "data", user_name, expiry_date
  • Output: api_dev, unique_id
4. Top-level design:
Client -> Application Server -> Encoding / Key Generation Service -> Database
The same path is followed in reverse to return the unique ID.
 
5. Estimates :
Let's say usage is 500M requests per day. The per-second requirement is then 500M / (24*60*60) ≈ 5,800 queries per second.
If 500M unique IDs are created per day and we would like to keep them for 5 years, the storage space would be 500M*365*5*50 bytes (50 bytes per unique ID) ≈ 45.6 TB.
***Metadata not considered here
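
The arithmetic behind these estimates can be checked with a few lines of Python. This is only a sketch: the function name `estimate` and its parameters are made up for illustration, and the inputs are the answer's own assumptions (500M IDs per day, 5 years, 50 bytes per ID, no metadata).

```python
SECONDS_PER_DAY = 24 * 60 * 60

def estimate(ids_per_day, years, bytes_per_id):
    """Return (average requests per second, total storage in decimal terabytes)."""
    qps = ids_per_day / SECONDS_PER_DAY
    total_bytes = ids_per_day * 365 * years * bytes_per_id
    return qps, total_bytes / 1e12

# Assumed inputs taken from the estimate above.
qps, storage_tb = estimate(ids_per_day=500_000_000, years=5, bytes_per_id=50)
print(f"{qps:,.0f} requests/s, {storage_tb:.1f} TB")  # 5,787 requests/s, 45.6 TB
```

Note that this is an average rate; peak traffic would be higher.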
 
6. How to create the unique ID ?
 It could be a simple SHA or MD5 checksum (for example sha512sum or md5sum) of the text entered, although identical input then always yields the same ID. We could also pre-create the unique IDs and store them in the DB; once an ID is handed out, it is marked as used.
Cleanup of unique IDs is based on the expiry date provided. If no date is provided, we could default to 5 years and run a service that regularly looks for unique IDs older than 5 years and cleans them up. However, since the program would be used by Google or FB, we need to be cautious about the TTL.
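
A minimal sketch of the pre-created ID pool with expiry-based cleanup, assuming an in-memory stand-in for the DB. The class `IdPool`, its method names, and the 16-character hex IDs are hypothetical choices for illustration, not part of the answer.

```python
import secrets
from datetime import datetime, timedelta

DEFAULT_TTL = timedelta(days=5 * 365)  # the 5-year default mentioned above

class IdPool:
    """In-memory stand-in for the table of pre-created IDs (hypothetical)."""

    def __init__(self, size):
        # Pre-create random IDs; the set silently drops any duplicate.
        self.unused = set()
        while len(self.unused) < size:
            self.unused.add(secrets.token_hex(8))
        self.used = {}  # unique ID -> expiry datetime

    def allocate(self, now, expiry=None):
        """Hand out an unused ID and mark it as used until its expiry."""
        new_id = self.unused.pop()  # raises KeyError if the pool is empty
        self.used[new_id] = expiry if expiry is not None else now + DEFAULT_TTL
        return new_id

    def cleanup(self, now):
        """Delete IDs whose expiry has passed; return how many were removed."""
        expired = [i for i, exp in self.used.items() if exp <= now]
        for i in expired:
            del self.used[i]
        return len(expired)

now = datetime(2024, 1, 1)
pool = IdPool(size=3)
long_lived = pool.allocate(now)                                # default 5-year TTL
short_lived = pool.allocate(now, expiry=now + timedelta(days=1))
removed = pool.cleanup(now + timedelta(days=2))                # only the 1-day ID expires
```

In a real deployment the cleanup would run as a periodic job against the database rather than over a dictionary.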
 
7. Database sharding / partitioning: since the data mapped to each unique ID has to be stored in the DB and the DB needs to scale, consider 2 options here:
  • Range-based partitioning: for example, IDs starting with Aa-Jj go to one partition, with the following ranges defined in a similar fashion
  • Hash-based partitioning: create a hash of the text entered and use it to choose the partition; the record is stored with the unique ID as its key
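
Hash-based partitioning can be sketched as follows. The shard count, the choice of SHA-256, and the function name `shard_for` are assumptions for illustration only.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical number of database partitions

def shard_for(key, num_shards=NUM_SHARDS):
    """Pick a partition by hashing the key, so keys spread roughly evenly."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shard = shard_for("abcdef1")  # the same key always maps to the same shard
```

With a plain modulo, changing the number of shards remaps most keys; consistent hashing is a common way to limit that movement.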
8. Cache: assuming that certain unique IDs are used regularly, cache the 20% most-used IDs on the cache servers. Eviction is Least Recently Used (LRU): the least recently used ID is removed first.
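
A small sketch of the LRU eviction policy described above, using Python's `collections.OrderedDict`. The class name `LRUCache` and the tiny capacity are illustrative; a production system would more likely rely on the eviction settings of a cache server.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` entries and evicts the least recently used one."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry

cache = LRUCache(capacity=2)
cache.put("id_a", 1)
cache.put("id_b", 2)
cache.get("id_a")     # id_a is now the most recently used
cache.put("id_c", 3)  # evicts id_b, the least recently used
```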
 
9. Load balancers: since this is a highly scalable system, we need load balancers between the clients and the application servers, and between the application servers and the database servers, to handle the load of incoming requests and outgoing unique IDs effectively.
 
10. Analytics: based on the agreed-upon policies, analytics could be run on how many unique IDs are generated, when they are used, from which geographical locations, and similar questions.
1 Feedback

Things you did well 

  • You asked a good set of clarifying questions
  • Structure: easy to follow with FR, NFR, and extended requirements
  • Stated the APIs
  • Capacity requirements 
  • Database optimization, caching and eviction policy
Areas of improvement 
  • Clarifying Qs: ask more questions to understand the requirements
    • Is the unique ID a randomly generated alphanumeric ID?
    • Do we need to cap the number of IDs requested by a user?
    • Should an ID be retrievable?
  • APIs: mention readable APIs such as get_id(...). These are easy for the interviewer to follow
  • Creation of IDs: mention an ID length that gives enough combinations for your 5-year estimates, using a mix of letters, digits, and (optionally) special characters
  • Database choice: mention which DB (SQL/NoSQL) you would prefer for storing the user ID, expiration_date (optional), and unique IDs
    • State the data model (how many tables, which attributes) to help the interviewer follow
  • Cache crash: if you generate unique IDs in advance, place them in the cache, and mark them as used in the DB, state that a cache crash is not a concern for ID generation, because the DB still holds IDs in excess to use
Overall, adding more clarifying questions would help to narrow the scope. Hope it helps.

Clarifying questions:

1. Are we creating this program so that big tech companies like FB/Google can outsource all login/credentials logic and access it via APIs, plug and play? Yes

2. Are there any regional issues, such as login failures, that explain why we are tasked with this instead of them handling it within their own architecture? You can make that assumption.

3. For applications like these, which are accessed by billions of users, high availability, high scalability, and low latency seem like the most important non-functional requirements. Does that sound right? Yes


Defining Functional Reqs & Non Functional Reqs:

FR:

- Ability to generate unique user IDs

- Need to be able to map data quickly in case the user needs to retrieve a forgotten user ID

NFR: 

- High Availability

- Low Latency

- Ability to run analytics (to understand the volume of users, peak times, to make decisions on scalability)

Customer Journey Flow & Architecture Flow (I would draw this system design on whiteboard or Google docs if I can)

1. The user enters a unique ID on any client (mobile, desktop).
2. LB layer: the load balancer distributes requests across the available servers, based on the data structure set.
3. App server: the app server, which handles requests such as getID, first checks the cache (we can use Redis here) to see whether the ID has already been created, so that it can return an error to the user and protect the DB from being queried too often.
4. KeyGen API: if the ID does not exist yet, the key generation service, which handles CreateID functions, is invoked and uses a hash key logic to create a unique ID.
5. Database layer: the ID is stored in the database. We will use Cassandra, a wide-column NoSQL store that suits a finite set of queries and ever-increasing data (for apps like FB/Google).
6. Caching layer: we will add a Redis cache that tracks the user IDs already generated, to maintain uniqueness and protect the DB from repeated queries.
7. Analytics tracking layer: we can also have an email services API, which notifies users of the success or failure of ID creation, and an analytics services API, which pushes data via Kafka to a Hadoop cluster to perform analytics as needed.
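
The check-cache, generate, store, cache path in this flow can be sketched in a few lines. This is a toy model only: a Python set and dict stand in for Redis and Cassandra, a simple counter stands in for the key generation service, and the names `IdService` and `create_id` are hypothetical.

```python
import itertools

class IdService:
    """Toy model of the create-ID path; not a real Redis or Cassandra client."""

    def __init__(self):
        self.cache = set()                  # stand-in for the Redis cache of taken IDs
        self.db = {}                        # stand-in for the database: user ID -> key
        self._counter = itertools.count(1)  # stand-in for the key generation service

    def create_id(self, requested_id):
        # Check the cache first so repeat requests never reach the database.
        if requested_id in self.cache:
            raise ValueError(f"{requested_id!r} already exists")
        # On a cache miss, confirm against the database before creating.
        if requested_id in self.db:
            self.cache.add(requested_id)
            raise ValueError(f"{requested_id!r} already exists")
        key = next(self._counter)           # invoke key generation
        self.db[requested_id] = key         # store in the database
        self.cache.add(requested_id)        # remember the ID in the cache
        return key

service = IdService()
first = service.create_id("query122")
second = service.create_id("name123")
try:
    service.create_id("query122")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```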

 

 

 

Hashkey Generator Logic:

We can assign a 7-character unique hashkey code so that uniqueness is maintained and the scheme scales:

We can pick a 7-character code that is incremented each time an ID is created and stored. With 62 alphanumeric characters (a-z, A-Z, 0-9) per position, this allows up to 62^7 ≈ 3.5 trillion unique IDs.
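
One way to realize such an incrementing code is to encode a counter in base 62. The sketch below assumes the code draws on the 62 alphanumeric characters, which is what yields roughly 3.5 trillion combinations at length 7; the function name `encode` and the character ordering are illustrative choices.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase  # 62 characters
CODE_LENGTH = 7
MAX_IDS = len(ALPHABET) ** CODE_LENGTH  # 62**7 = 3,521,614,606,208

def encode(counter):
    """Turn a counter value into a fixed-width, 7-character base-62 code."""
    if not 0 <= counter < MAX_IDS:
        raise ValueError("counter is outside the 7-character range")
    chars = []
    for _ in range(CODE_LENGTH):
        counter, remainder = divmod(counter, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))

print(encode(0))            # 0000000
print(encode(62))           # 0000010
print(encode(MAX_IDS - 1))  # ZZZZZZZ
```

Sequential codes are easy to guess; if that matters, the counter could be shuffled or combined with random bits, at the cost of extra uniqueness checks.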

Our database will have tables like this (sample):

 id         uuid (hashkey)   expiry date
 query122   abcdef1          10/10/2033 & timestamp
 name123    abcdef2          1/12/2032 & timestamp
 

Scalability:

How many unique ID requests can we get per second? (The assumption is that we keep each ID for 10 years.)

Assuming around 400 million users a day:

X = 400M / (60 x 60 x 24)

X ≈ 4,600 requests per second

Storage of these users for 10 years = 400M x 365 x 10 x (storage unit of one unique ID = 100 bytes)

= 400M x 365 x 10 x 100 bytes

≈ 146 TB (about 0.15 PB) would be needed for us to run this model.
