You're a PM on the Amazon eCommerce platform. You recently released a new feature in the item search functionality and found that searches increased by 10%, but the search results page now takes 2 seconds longer to load. What would you do?
0 votes
in Problem Solving by (12 points) | 755 views

2 Answers

+3 votes

Ask clarifying questions to fully understand the data

  1. Is this data for the app, the website, or both? Ans: It is for the website.
  2. Was this feature launched globally? Ans: Yes
  3. Is this performance data from the global launch? Ans: Yes
  4. Is the 10% increase a per-user increase or an increase in the total number of searches? Ans: The total number of searches
  5. Is this 2s increase impacting any consumer/business metric? Ans: Yes, search exits have increased and people are leaving the site more often.
  6. What does the 2-second increase mean in percentage terms? Ans: Almost 100%, i.e. the SRP load time has roughly doubled.

Understand the goal of the feature

The goal of the feature was to help people reach the product they want faster than before and hence increase the average order size per customer.


Let's look at the user funnel

User lands on the site ------x%----> Performs a search ----y%------> Lands on the SRP (search results page) -----z%---> Clicks on a result


So, as I understand it:

x has increased by about 10%, since total searches are up 10% and site traffic is assumed unchanged (which is good)

y has decreased because of the performance degradation (search exits have increased)

z has increased because of improved search relevance


So we will focus on figuring out and fixing why y has decreased.




If x*y*z has decreased, meaning a smaller percentage of visitors click on a search result than before, then I would roll back the feature, since fewer people are progressing through the funnel. The assumption here is that nothing else is impacted by the rollback.
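The funnel check above can be sketched in Python. The stage counts, the `Funnel` class and the `should_roll_back` helper are hypothetical illustrations, not real Amazon data or tooling; the sketch assumes we can count users at each funnel stage before and after the launch.

```python
from dataclasses import dataclass


@dataclass
class Funnel:
    """Hypothetical user counts at each funnel stage for one time window."""
    visitors: int   # users who landed on the site
    searches: int   # users who performed a search
    srp_views: int  # users who waited for the SRP to load (did not exit)
    clicks: int     # users who clicked on a search result

    def rates(self) -> tuple[float, float, float]:
        x = self.searches / self.visitors   # site -> search
        y = self.srp_views / self.searches  # search -> SRP
        z = self.clicks / self.srp_views    # SRP -> click
        return x, y, z

    def overall(self) -> float:
        # x * y * z is algebraically the same as clicks / visitors
        x, y, z = self.rates()
        return x * y * z


def should_roll_back(before: Funnel, after: Funnel) -> bool:
    """Roll back if a smaller share of visitors now reaches a search-result click."""
    return after.overall() < before.overall()


# Illustrative numbers only: searches +10%, y falls from 0.9 to 0.8, z rises from 0.5 to 0.55.
before = Funnel(visitors=100_000, searches=40_000, srp_views=36_000, clicks=18_000)
after = Funnel(visitors=100_000, searches=44_000, srp_views=35_200, clicks=19_360)
print(should_roll_back(before, after))  # False: about 0.1936 of visitors click vs 0.18 before
```

With these made-up numbers the gains in x and z outweigh the drop in y, so the funnel as a whole still improves. If y fell further, say to 0.7, the same check would recommend a rollback.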


But to continue the analysis, I will assume that x*y*z has increased, i.e. more people are progressing through the funnel overall, because the gains in x and z outweigh the drop in y.


Let's focus on why performance has degraded, leading to a decrease in y.


First, let's develop some hypotheses, and then we will gather more data points to analyze the issue.


Search time = time for the request to reach the backend search server + server time to perform the search + time for the response to reach the client + time required by the client to display the SRP


Each of these components can be measured with tools or internal logging, so I would measure each one and compare it with the previous implementation.
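A minimal sketch of that comparison, assuming we already have one latency figure per component (for example, the median in milliseconds) for the old and new implementations. The component names and numbers are hypothetical; they are chosen so the total moves from 2 s to 4 s, matching the roughly 100% increase from the clarifying questions.

```python
# Hypothetical names for the four terms of the search time formula.
COMPONENTS = ("request_network", "server_search", "response_network", "client_render")


def total_latency(breakdown: dict[str, float]) -> float:
    """Search time is the sum of the four components."""
    return sum(breakdown[name] for name in COMPONENTS)


def latency_regressions(
    before: dict[str, float], after: dict[str, float], threshold_ms: float = 100.0
) -> list[tuple[str, float]]:
    """Components whose latency grew by more than threshold_ms, largest increase first."""
    deltas = [(name, after[name] - before[name]) for name in COMPONENTS]
    regressed = [(name, delta) for name, delta in deltas if delta > threshold_ms]
    return sorted(regressed, key=lambda item: item[1], reverse=True)


# Illustrative medians in milliseconds, not real measurements.
old_ms = {"request_network": 150, "server_search": 800, "response_network": 150, "client_render": 900}
new_ms = {"request_network": 150, "server_search": 2500, "response_network": 200, "client_render": 1150}

print(total_latency(old_ms), total_latency(new_ms))  # 2000 4000
print(latency_regressions(old_ms, new_ms))  # [('server_search', 1700), ('client_render', 250)]
```

In this made-up breakdown the search server dominates, so point 2 in the list below would be investigated first. Comparing a high percentile such as p90 alongside the median helps catch regressions that only affect slow requests.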


  1. Time for the request to reach the backend search server - Suppose the comparison shows that this time has increased. Possible reasons are:
    1. The deployment strategy for the new feature changed, which increased the time. Action item: Evaluate whether we can go back to the previous deployment strategy. Deployment strategy here means the underlying hardware, infrastructure, routes (both internal and external), etc. for the search service.
    2. Check whether search queries from a region with below-par performance have increased, skewing the average time upward. Action item: Evaluate whether this shift in search queries is permanent and, if so, plan a deployment catering to that region.
    3. Check whether the performance of a particular region has degraded significantly. Action item: Plan a deployment catering to that region.
    4. Check whether underlying infrastructure, such as an undersea cable, has been cut, forcing requests onto longer alternate routes. This is less likely, since the regression appeared right after we made our change, which points to an internal cause. I wouldn't spend much time on this analysis.
  2. Server time to perform a search - This is the time the backend needs to actually perform the search.
    1. Check whether the number of items to be searched has increased significantly. (Assumption: No)
    2. Check whether the throughput of the new search algorithm has decreased. Action item: If so, evaluate optimizations to the algorithm and, if needed, add more hardware to improve throughput.
  3. Time for the response to reach the client - Same analysis as point #1.
  4. Time required by the client to display the SRP
    1. Has the SRP UI changed? Action item: If yes, evaluate the front-end page size and other front-end parameters such as page load time, and optimize the front-end code. The Chrome DevTools Audits panel (now called Lighthouse) can be used for this. If the UI has not changed, this analysis is not needed, but we need to make sure we are returning the same number of results as before; otherwise, reduce the number of search results and implement pagination if it is not already in place.
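Sub-points 2 and 3 under point 1 (a traffic shift toward a slow region vs. a region itself getting slower) can be told apart with a simple mix decomposition: keep each region's latency at its old value to see how much of the change comes from the traffic shift alone, and attribute the remainder to per-region latency changes. This is a standard two-step decomposition, sketched under the assumption that we have each region's share of searches and its mean latency before and after launch; the region names and numbers are hypothetical.

```python
def mean_latency(share: dict[str, float], latency_ms: dict[str, float]) -> float:
    """Traffic-weighted mean latency; share maps region -> fraction of searches (sums to 1)."""
    return sum(share[region] * latency_ms[region] for region in share)


def decompose_change(
    share_old: dict[str, float],
    latency_old: dict[str, float],
    share_new: dict[str, float],
    latency_new: dict[str, float],
) -> tuple[float, float, float]:
    """Return (total change, mix effect, per-region effect) in milliseconds."""
    total = mean_latency(share_new, latency_new) - mean_latency(share_old, latency_old)
    # Mix effect: new traffic shares weighted by the OLD per-region latencies.
    mix = mean_latency(share_new, latency_old) - mean_latency(share_old, latency_old)
    # Whatever is left is due to latency changes within regions.
    per_region = total - mix
    return total, mix, per_region


# Hypothetical regions and numbers: APAC gains traffic share AND gets slower.
share_old = {"NA": 0.6, "EU": 0.3, "APAC": 0.1}
share_new = {"NA": 0.5, "EU": 0.3, "APAC": 0.2}
latency_old = {"NA": 1000.0, "EU": 1200.0, "APAC": 3000.0}
latency_new = {"NA": 1000.0, "EU": 1200.0, "APAC": 3500.0}

total, mix, per_region = decompose_change(share_old, latency_old, share_new, latency_new)
print(round(total), round(mix), round(per_region))  # 300 200 100
```

In this illustration two-thirds of the increase comes from traffic shifting toward the slower region and one-third from that region getting slower. The split depends on which period's values are held fixed, so it is a diagnostic aid rather than an exact attribution.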
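For the throughput action item under point 2, a rough way to see how a throughput drop translates into hardware is to divide peak query load by what one server can sustain. This back-of-the-envelope sketch assumes search load spreads evenly across identical servers and ignores queueing effects; the function name and all figures are hypothetical.

```python
def servers_needed(peak_qps: int, per_server_qps: int, headroom_pct: int = 20) -> int:
    """Servers required to serve peak_qps with headroom_pct spare capacity.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    if per_server_qps <= 0:
        raise ValueError("per_server_qps must be positive")
    required = peak_qps * (100 + headroom_pct)  # peak load scaled by 100 to stay in integers
    capacity = per_server_qps * 100
    return -(-required // capacity)  # ceil(required / capacity)


# Hypothetical: the new algorithm drops per-server throughput from 50 to 30 QPS.
print(servers_needed(3_000, 50))  # 72
print(servers_needed(3_000, 30))  # 120
```

A 40% drop in per-server throughput means roughly 67% more servers for the same peak load, which is why the action item weighs algorithm optimizations alongside more hardware.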
0 votes
* First of all, this should have been caught while testing this functionality.
* Second, are my customers being impacted by this delayed response? I will check this by looking at the customer data.
* I mean, 2 seconds is not that much time (you blink and 2 seconds are over), so this can be ignored.
* If my new feature is not affecting any customers, then there is no need to do anything.
* There might also be a connection issue with the DB: there may be many simultaneous hits on this feature on the server, so the page takes time to respond.
* I will optimize my server load here and also put a load balancer in place to handle the requests.
by (16 points)