Ask clarifying questions to fully understand the data
- Is this data for the app, the website, or both? Ans: It is for the website
- Was this feature launched globally? Ans: Yes
- Is this performance data for the global launch? Ans: Yes
- Is the 10% increase a per-user increase or an increase in the overall number of searches? Ans: Overall number of searches
- Is this 2s increase impacting any consumer/business metric? Ans: Yes, search exits have increased and people are leaving the site more often
- In percentage terms, what does the 2-second increase represent? Ans: Almost 100%, i.e. search time has roughly doubled
Understand the goal of the feature
The goal of the feature was to help people get to the product they want faster than before, and thereby increase the average order size per customer.
Let's look at the user funnel
User lands on the site --x%--> Performs a search --y%--> Lands on the SRP (search results page) --z%--> Clicks on a result
As I understand it:
x has increased, since overall searches are up 10% (which is good; this assumes site traffic is roughly unchanged)
y has decreased because of the degradation in performance
z has increased because of the improvement in search relevance
So we will focus on figuring out and fixing why 'y' has decreased
If x*y*z has decreased, meaning a smaller percentage of visitors are clicking on search results than before, I would roll back the feature, since fewer people are progressing through the funnel. The assumption here is that nothing else is impacted by the roll-back.
To continue the analysis, though, I will assume that more people are progressing through the funnel, since x and z have increased.
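The roll-back check above can be sketched as a comparison of end-to-end funnel conversion before and after the launch. This is only an illustration: the rates are made-up placeholders rather than data from the case, and `Funnel` and `should_roll_back` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Funnel:
    """Step-to-step conversion rates, each a fraction between 0 and 1."""
    x: float  # landed on the site -> performed a search
    y: float  # performed a search -> landed on the SRP (did not exit while waiting)
    z: float  # landed on the SRP -> clicked on a result

    def end_to_end(self) -> float:
        # Fraction of site visitors who end up clicking a search result.
        return self.x * self.y * self.z


def should_roll_back(before: Funnel, after: Funnel) -> bool:
    # Roll back only if fewer visitors overall reach a result click.
    return after.end_to_end() < before.end_to_end()


# Placeholder numbers for illustration only.
before = Funnel(x=0.50, y=0.90, z=0.40)
after = Funnel(x=0.55, y=0.80, z=0.45)

print(f"before: {before.end_to_end():.4f}")
print(f"after:  {after.end_to_end():.4f}")
print("roll back" if should_roll_back(before, after) else "keep the feature and fix latency")
```

With these placeholder rates, the gains in x and z outweigh the drop in y, so the feature stays and the work shifts to recovering y.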
Let's focus on why performance has degraded, leading to a decrease in 'y'.
First, let's develop some hypotheses, and then we will get some more data points to analyze the issue
Search Time = Time for the request to reach the back-end search server + Server time to perform the search + Time for the response to reach the client + Time required by the client to display the SRP
Each of these components is measurable with tools or internal logging, so I would measure them and compare them with the previous implementation.
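As a rough sketch of that comparison, the snippet below breaks the total increase down by component and ranks the components by how much each contributes. All timings are placeholder values in milliseconds, chosen only so that the total goes from about 2s to about 4s as in the case; the component keys are hypothetical names for the four terms above.

```python
# Component names mirror the Search Time decomposition; the timings are
# placeholder values in milliseconds, not measurements from the case.
COMPONENTS = ("request_network", "server_search", "response_network", "client_render")

before_ms = {"request_network": 150, "server_search": 1200,
             "response_network": 250, "client_render": 400}
after_ms = {"request_network": 160, "server_search": 2900,
            "response_network": 300, "client_render": 640}


def latency_deltas(before, after):
    """Return (component, increase in ms) pairs, largest increase first."""
    deltas = {name: after[name] - before[name] for name in COMPONENTS}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


total_before = sum(before_ms[c] for c in COMPONENTS)
total_after = sum(after_ms[c] for c in COMPONENTS)
total_increase = total_after - total_before

print(f"total: {total_before} ms -> {total_after} ms (+{total_increase} ms)")
for name, delta in latency_deltas(before_ms, after_ms):
    # Share of the overall increase attributable to this component.
    share = delta / total_increase if total_increase else 0.0
    print(f"{name:18s} +{delta:5d} ms  ({share:.0%} of the increase)")
```

Ranking by contribution tells us which of the hypotheses below to investigate first; in this made-up data the server-side search time dominates.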
- Time for the request to reach the back-end search server: Assume that, after comparison, we find this time has increased. Possible reasons are:
  - The deployment strategy changed with the new feature, which increased the time. Deployment strategy here means the underlying hardware, infrastructure, and routes (internal as well as external) for the search service. Action item: Evaluate whether we can use the same deployment strategy as before.
  - Search queries have increased from a region where performance is already below par, skewing the average time upward. Action item: Evaluate whether this change in search queries is permanent and, if so, plan a deployment catering to that region.
  - The performance of a particular region has degraded significantly. Action item: Plan a deployment catering to that region.
  - Underlying infrastructure has failed (for example, an undersea cable has snapped), forcing requests onto alternate routes that take longer. This is unlikely, since the degradation appeared after we made our changes, which points to something internal. I wouldn't spend too much time on this analysis.
- Server time to perform the search: This is the time the back end needs to actually perform the search.
  - Check whether the number of items to be searched has increased significantly. (Assumption: No)
  - Check whether the throughput of the new search algorithm has decreased. Action item: If yes, evaluate optimizations to the algorithm and add more hardware if needed to improve throughput.
- Time for the response to reach the client: Same analysis as for the request path in the first item.
- Time required by the client to display the SRP:
  - Has the UI of the SRP changed? Action item: If yes, evaluate the front-end page size and other front-end parameters such as page load time, and optimize the front-end code. Chrome's Audits tool (Lighthouse) can be used for this evaluation. If the UI has not changed, this analysis is not needed, but we still need to make sure we are returning the same number of results as before; otherwise, reduce the number of search results and implement pagination if that is not done already.
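The two regional hypotheses under the first item (queries shifting toward a slow region versus a region itself getting slower) can look identical in the global average. One way to separate them, sketched here under the assumption that per-region query share and mean network time are logged, is to split the change in the average into a mix effect and a rate effect. The region names and numbers are placeholders, and `decompose` is a hypothetical helper.

```python
# Hypothetical per-region data: (share of search queries, mean network time in ms).
# All values are placeholders for illustration; both periods must list the same regions.
before = {"NA": (0.50, 100), "EU": (0.30, 150), "APAC": (0.20, 400)}
after = {"NA": (0.40, 100), "EU": (0.30, 150), "APAC": (0.30, 450)}


def average_latency(regions):
    """Query-weighted average latency across regions."""
    return sum(share * latency for share, latency in regions.values())


def decompose(before, after):
    """Split the change in average latency into (mix effect, rate effect)."""
    mix = 0.0   # change caused by query share moving between regions
    rate = 0.0  # change caused by regions themselves getting slower or faster
    for region in before:
        share_b, lat_b = before[region]
        share_a, lat_a = after[region]
        mix += (share_a - share_b) * lat_b
        rate += share_a * (lat_a - lat_b)
    return mix, rate


mix, rate = decompose(before, after)
change = average_latency(after) - average_latency(before)
print(f"average change: {change:+.1f} ms")
print(f"  from query mix shifting between regions: {mix:+.1f} ms")
print(f"  from regions getting slower or faster:   {rate:+.1f} ms")
```

The two effects add up to the total change in the average. A large mix effect supports the "more queries from a slow region" hypothesis, while a large rate effect points to a genuine regional degradation.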