How would you implement the sync feature of Google Drive app or Google Docs? How would you design the DB for G-drive?
- Real time syncing of documents/edits
- Conflict resolution (when 2 people edit the same sentence in a G-Doc)
- Speed of retrieval of the documents
- Traceability of who made the changes
- Real time sync capabilities with conflict resolution
Understanding of the feature:
When a file on Google Drive changes, or the contents (or settings) of a Google Doc change, we want to reflect that change to all consumers of the resource.
Let's take Google Docs as the main scenario and proceed. The same principle applies to Drive too.
3 stakeholders to this problem:
- User who is making changes (A)
- Resource store (typically a database) (R)
- User who is consuming changes (can be the same user who is also making changes) (B)
Goals of each stakeholder:
- Users
  - Make changes and have them persist
    - Either by explicit instruction or by auto-save
  - See changes of other users
    - Whenever they happen
  - Resolve conflicts if any
- Resource store
  - Maintain a steady state copy
  - Maintain transient copies for collaboration
  - Make the transient copy the steady copy once collaboration is complete
Flow:
- A creates a file.
  - No steady copy exists, so R creates a transient copy and sends it to A.
- A starts editing.
  - R writes edits to the transient copy.
- A shares the document with B.
  - When B accesses the file, R sends the transient copy.
- Now both A and B are editing.
  - R keeps writing edits to the transient copy.
- Both of them finish editing and close the document.
  - After some time (or some other condition), R replaces the steady copy with the transient copy and deletes the transient copies.
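The flow above can be sketched roughly as follows. This is a minimal toy model, not a real storage layer: the class name `ResourceStore` and its methods are hypothetical, edits are modeled as whole-text replacement for brevity, and the "after some time (or other condition)" step is simplified to "as soon as the last editor closes the document".

```python
from dataclasses import dataclass, field


@dataclass
class ResourceStore:
    """Toy model of R: steady copies 'on disk', transient copies in memory."""
    steady: dict[str, str] = field(default_factory=dict)
    transient: dict[str, str] = field(default_factory=dict)
    editors: dict[str, set[str]] = field(default_factory=dict)

    def open(self, doc_id: str, user: str) -> str:
        # First opener: seed the transient copy from the steady copy,
        # or start empty when the file is brand new.
        if doc_id not in self.transient:
            self.transient[doc_id] = self.steady.get(doc_id, "")
        self.editors.setdefault(doc_id, set()).add(user)
        return self.transient[doc_id]

    def edit(self, doc_id: str, user: str, new_text: str) -> None:
        # Only users who currently have the document open may write.
        if user not in self.editors.get(doc_id, set()):
            raise PermissionError(f"{user} has not opened {doc_id}")
        self.transient[doc_id] = new_text

    def close(self, doc_id: str, user: str) -> None:
        users = self.editors.get(doc_id, set())
        users.discard(user)
        # Simplification: promote the transient copy immediately once
        # nobody has the document open any more.
        if not users and doc_id in self.transient:
            self.steady[doc_id] = self.transient.pop(doc_id)
            self.editors.pop(doc_id, None)


store = ResourceStore()
store.open("doc-1", "A")                  # A creates the file
store.edit("doc-1", "A", "hello")
store.open("doc-1", "B")                  # B receives the transient copy
store.edit("doc-1", "B", "hello world")
store.close("doc-1", "A")
store.close("doc-1", "B")                 # last close promotes the copy
```

A production version would promote on a timer or idle condition rather than on the last close, so that a crash of the in-memory store loses as little as possible.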
Architecture:
- Steady state copies can be stored in a normal database (on disk). Write and read performance are important but not critical.
  - This database will have a dirty bit that indicates whether the file is being edited.
- The transient DB can be an in-memory database offering high read and write performance.
- Importantly, we cannot stream the whole transient copy for every single change. It makes sense to send only the updates.
  - An update should contain, at the very least:
    - Coordinated time
    - Content
    - Offsets
    - Author
  - So we will need a system that:
    - Receives updates and modifies the transient copy
      - If there is a conflict: sends a message to the original author and doesn't update the transient copy
      - Else: sends the update to all users
  - Extra: Occasionally check whether each user's copy matches the transient copy, to verify that all updates were persisted properly.
- Finally, when all users close the doc, we replace the steady copy with transient, turn off the dirty flag and delete transient copies.
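Here is one way the update record and the broker could look. Treat it as a sketch under stated assumptions rather than how Google Docs works: `Update`, `UpdateBroker`, the `base_version` field and the simple offset-shifting rule are all hypothetical additions. A conflict here means "your edit overlaps an edit you had not seen yet"; real collaborative editors typically use more elaborate schemes (operational transformation or CRDTs) instead of rejecting the edit.

```python
from dataclasses import dataclass, replace
import hashlib


@dataclass(frozen=True)
class Update:
    timestamp: float    # coordinated time of the edit
    author: str
    start: int          # offsets: the edit replaces text[start:end] ...
    end: int
    content: str        # ... with this content
    base_version: int   # assumption: document version the author was editing


class UpdateBroker:
    def __init__(self, text: str, users: set[str]) -> None:
        self.text = text                      # the transient copy
        self.users = users
        self.applied: list[Update] = []       # applied[i] turned version i into i + 1
        self.outbox: list[tuple[str, str, Update]] = []  # (recipient, kind, update)

    @property
    def version(self) -> int:
        return len(self.applied)

    def receive(self, update: Update) -> bool:
        if not 0 <= update.base_version <= self.version:
            raise ValueError("unknown base version")
        start, end = update.start, update.end
        # Replay every update the author had not seen when making this edit.
        for seen in self.applied[update.base_version:]:
            if seen.end <= start:
                # The unseen edit lies entirely before ours: shift our offsets.
                shift = len(seen.content) - (seen.end - seen.start)
                start, end = start + shift, end + shift
            elif end <= seen.start:
                continue  # the unseen edit lies entirely after ours
            else:
                # Ranges overlap: tell only the author, leave the copy alone.
                self.outbox.append((update.author, "conflict", update))
                return False
        applied = replace(update, start=start, end=end, base_version=self.version)
        self.text = self.text[:start] + update.content + self.text[end:]
        self.applied.append(applied)
        for user in sorted(self.users):       # broadcast to all users
            self.outbox.append((user, "update", applied))
        return True

    def checksum(self) -> str:
        # Clients can compare this with a hash of their local copy.
        return hashlib.sha256(self.text.encode("utf-8")).hexdigest()


broker = UpdateBroker("hello world", {"A", "B"})
broker.receive(Update(1.0, "A", 0, 5, "HELLO", base_version=0))
broker.receive(Update(1.1, "B", 6, 11, "there", base_version=0))          # no overlap
conflicted = not broker.receive(Update(1.2, "A", 6, 11, "WORLD", base_version=1))
```

The `checksum` method covers the "Extra" point: a client whose hash differs from the broker's has missed an update and can re-fetch the transient copy.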
Additional challenges:
- Scale:
  - If we have 100 users editing the same file
    - We may have to implement a message queuing system for updates
    - From a product PoV, I'd rather limit simultaneous opens to, say, 10 users.
- Network connectivity:
  - An update may be received long after it was originally made.
    - Conflict resolution should probably handle this the same way as the regular case.
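To make the scale points concrete, here is a small sketch of an editing session that caps simultaneous editors and queues updates in arrival order. The names `EditSession`, `join`, `submit` and `drain` are hypothetical, and the queue is an in-process `deque` standing in for a real message queue. Because updates are processed in the order they arrive, a late update is handled by the same path as any other.

```python
from collections import deque
from typing import Callable

MAX_EDITORS = 10  # the cap suggested above; the exact number is a product decision


class EditSession:
    def __init__(self, max_editors: int = MAX_EDITORS) -> None:
        self.max_editors = max_editors
        self.editors: set[str] = set()
        self.pending: deque[tuple[str, str]] = deque()  # (author, payload)

    def join(self, user: str) -> bool:
        """Admit the user unless the session is already full."""
        if user in self.editors:
            return True
        if len(self.editors) >= self.max_editors:
            return False
        self.editors.add(user)
        return True

    def submit(self, author: str, payload: str) -> None:
        """Queue an update; it is processed later, in arrival order."""
        if author not in self.editors:
            raise PermissionError(f"{author} is not in this session")
        self.pending.append((author, payload))

    def drain(self, handler: Callable[[str, str], None]) -> int:
        """Hand queued updates to the handler one by one; return the count."""
        count = 0
        while self.pending:
            handler(*self.pending.popleft())
            count += 1
        return count
```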
So to summarize, there will be 2 main components:
- Transient DB system to ensure high performance of reads and writes
- Update broker that resolves conflicts, modifies transient DB and broadcasts updates to users.
I noticed there aren't many Google technical questions answered here, so thought I'd take a stab at this. Would love feedback as I'm new to technical interview questions.
Clarify:
By sync feature, do we mean the ability to keep document changes up to date, so that if two users are on the same document and one of them makes a change, that change is visible to the other? → Yes
Do we need to consider how much storage cost this will require? → Yes
Key features of Google docs sync:
Create your own document
Save your document
Multiple people can edit the same document at once
Syncs are automatically merged and merge conflicts are handled
Additional features like comments, formatting, etc.
Qualities:
Simple editing features
Fast
Handle conflicts
Seamless
Back-of-envelope estimation of total data size and key bottlenecks:
Figure out total storage needed for Google Docs:
Total storage = # of Google Docs users × # of docs created per user per year × size of document × 5 years of storage
Did some math and estimated total storage at around 250 TB of data
Figure out # of Google Docs open at any time:
Around 12M active documents per half hour
2M of those have 4-5 users editing at once
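The storage formula can be written out as a quick calculation. Note that the answer above does not state its inputs, so the user count, docs per user and document size below are placeholder assumptions picked only to show how such inputs multiply out to a figure of that order; they are not the author's numbers. The 2M shared documents and the 4-5 editors per document do come from the estimate above.

```python
def storage_bytes(users: int, docs_per_user_per_year: int,
                  bytes_per_doc: int, years: int) -> int:
    """Total storage = users x docs per user per year x doc size x years."""
    return users * docs_per_user_per_year * bytes_per_doc * years


# Placeholder inputs (assumptions, not figures from the answer above).
ASSUMED_USERS = 50_000_000
ASSUMED_DOCS_PER_USER_PER_YEAR = 10
ASSUMED_BYTES_PER_DOC = 100_000   # roughly 100 KB per document
YEARS = 5                          # retention period used in the formula

total_tb = storage_bytes(ASSUMED_USERS, ASSUMED_DOCS_PER_USER_PER_YEAR,
                         ASSUMED_BYTES_PER_DOC, YEARS) / 1e12

# Concurrent editors on shared documents: 2M docs with 4-5 users each.
shared_docs = 2_000_000
editors_low, editors_high = shared_docs * 4, shared_docs * 5

print(f"storage: {total_tb:.0f} TB")
print(f"editors on shared docs: {editors_low:,} to {editors_high:,}")
```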
High level architecture:
When a document is opened, it is fetched from the database and put into some type of cache, or another place that is easier to edit than disk, so that it is faster to access and change
Architect a synchronization service that makes sure the client-side version of the document is not out of date with the most recent version
Since multiple clients can be editing at the same time, consider enforcing that all requests for a document go through the same server so that there aren't race conditions on the data itself
A proxy could figure out whether requests are for the same document, then make sure they're working off the same cache
Create some type of queue so that changes are addressed in order
Store the user session to figure out which user is making which edits
Maybe separate out the read and write functionality into different services so you can scale them independently. The system needs to handle both high read and high write load.
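One simple way a proxy could send every request for the same document to the same server is to hash the document id. This is only a sketch: `server_for` and the server names are hypothetical, and plain modulo hashing is used for brevity. A stable hash from `hashlib` is used instead of Python's built-in `hash()`, which is randomized per process for strings.

```python
import hashlib


def server_for(doc_id: str, servers: list[str]) -> str:
    """Pick the server that owns a document, stable across processes."""
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["sync-0", "sync-1", "sync-2"]    # hypothetical server names
first = server_for("doc-123", servers)
second = server_for("doc-123", servers)     # same document, same server
```

The trade-off is that adding or removing a server remaps most documents; consistent hashing is the usual way to reduce that churn.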
DB tables:
User table
    User id
Documents table
    Document information
    Last updated
Updates table
    Document id
    User doing the update
    Update time
User permissions table
    User id
    Document ids
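As a rough illustration, the tables above could be expressed with SQLite from Python's standard library. The column names, the `title` column standing in for "document information", the surrogate `update_id`, and the choice of one permission row per (user, document) pair are my assumptions, not part of the original outline.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY
);
CREATE TABLE documents (
    document_id  INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,       -- stands in for "document information"
    last_updated TEXT NOT NULL
);
CREATE TABLE updates (
    update_id   INTEGER PRIMARY KEY,  -- surrogate key (assumption)
    document_id INTEGER NOT NULL REFERENCES documents(document_id),
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    update_time TEXT NOT NULL
);
CREATE TABLE user_permissions (
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    document_id INTEGER NOT NULL REFERENCES documents(document_id),
    PRIMARY KEY (user_id, document_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1)")
conn.execute("INSERT INTO documents VALUES (1, 'Notes', '2024-01-01T00:00:00Z')")
conn.execute("INSERT INTO user_permissions VALUES (1, 1)")
conn.execute(
    "INSERT INTO updates (document_id, user_id, update_time) "
    "VALUES (1, 1, '2024-01-01T00:05:00Z')"
)


def can_access(user_id: int, document_id: int) -> bool:
    """True when a permission row exists for this user and document."""
    row = conn.execute(
        "SELECT 1 FROM user_permissions WHERE user_id = ? AND document_id = ?",
        (user_id, document_id),
    ).fetchone()
    return row is not None
```

The updates table doubles as the audit trail for who changed what and when.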
Ways to scale/optimize:
I already mentioned some of these in the high level architecture piece.
Another thing to consider is how we store the documents across servers. Maybe use indexing to make it easier to find the document in question.