
Tell me about a situation where things did not go as planned. What did you learn from it?

Asked at Microsoft
29.7k views
Answers (3)

Platinum PM

S: I launched a payment verification portal for finance users, but the team was still using the old system and the adoption rate was low.

T: I had to understand the root cause and find ways to increase the adoption rate.

A: I spoke to the finance team and learned that they were not confident in the new system because training had not been done properly. The system had also been built in a highly restrictive way that did not account for the exceptions happening on the ground, so payments got stuck and were not processed, which hurt the end-user experience. My learning was to spend more time understanding the exception flows, and to always start with a small rollout without special cases, incorporating feedback while scaling up gradually.

R: I reverted the rollout and relaunched client by client, starting with clients that had no specific exceptions. Meanwhile, I kept adding the exceptions as client-level configuration, which made future rollouts easier. As a result, system adoption rose to over 90% within the next month.

Silver PM

I love this question because when it goes well, it goes really well. And when it goes poorly it's pretty amazing as well.* The good news is that having your answer go poorly is really more a sign that you are unprepared for the interview, and all of you are preparing so this should not happen to you!

 

General framework for failure questions:

  1. Give context. What is the product / business you were working on?
  2. Talk about what failed. If you are an engineer, talk about how much money the outage cost the company; it's OK if it was pretty catastrophic, as long as you talk about how complex the system is. If you are a PM, pick a more curated failure, not the time that you bet all your engineers on a failing product, since that may call your judgement into question. Ideally, PMs should pick an execution failure, to avoid giving the interviewer any reason to believe that you lack good judgement or strategic thinking skills.
  3. Explain what you did when you learned you were failing. Ideally this was calm and structured and has a good ending.
  4. Tell me what you learned. 
  5. Share what you'd do differently next time.
 

An example from a PM lens:

  1. [context] I was the PM for user onboarding in a clinical research platform. The platform contained different clinical research studies that users could sign up to participate in. When onboarding to a study, users could read about the study in their native language and decide if they wanted to consent to participate in the research. From a user perspective, correct and available translations are critical to deciding if you want to participate. Correct and available translations were critical to the business from a regulatory perspective, since participants must willingly provide their consent in a language they are fluent in. 
  2. [what failed] Engineering pushed a change that knocked out all non-English content translations for over 24 hours, meaning that users who were not fluent in English were only able to read study information in English on the platform, for 24 hours. This outage invalidated the consents for users who signed up for a clinical study during that 24 hour period, since we could not ensure that their consent was truly informed. The failure happened for three reasons: 
    1. new feature code broke the existing translations service
    2. there was no testing in place to stop engineering from pushing that bad code
    3. there was no process in place to ensure test planning and test coverage
  3. [what I did during the failure] I worked with the engineer who discovered the outage and gathered the cross-functional team who needed to understand the outage and prepare additional countermeasures (i.e., reaching out to the clinical research sites to let them know that there had been an outage of translations). I stayed in close contact with the operational team throughout the outage and helped to create content that we could send to recover those users once we were sure the solution was in place and robust. After we found the root cause, I helped the testing team write system-level UI tests to catch this issue in future testing if it ever regressed. I also worked with the engineering manager to improve the engineering team's processes around test planning, and I added a missing step to our launch planning process to ensure that all new features have adequate integration and system test coverage. In the end, we recovered quickly and were able to get over 80% of the lost users re-consented in their native languages within one week.
  4. [learning] I learned three key things from this experience. 
    1. as a PM, it's my responsibility to ensure that my features are being adequately tested by engineering to protect and ensure the user experience.
    2. I can help ensure that my engineering teams have process and hygiene around test planning and test coverage.
    3. put users first, always. We had many users thank us for reaching out to get them re-consented in their native language, and it was an opportunity to build our relationship with them, since for many of them it was their first time interacting with our platform.
  5. [what I'd do differently] In general, I now take a more process-driven approach to ensuring adequate testing is in place. Since I am the one responsible for signing off on feature launches, I have implemented better checklists and processes that ensure that we don't miss key testing, and I'm happy to say that we have not had a test-coverage related outage in the platform since that one.
 

The same example from the engineering lens:

  1. [context] I worked on language translations for a clinical research platform. The platform contained different clinical research studies that users could participate in, and users could read about the studies in their native language to decide if they wanted to participate in the research. Correct translations were critical to the business from a regulatory perspective, since participants must willingly provide their consent with all of the information available to them, and they were critical to users because they need to fully understand the clinical research they are electing to participate in.

  2. [what failed] I pushed a change that knocked out all non-English content translations on the platform meaning that non-native English speakers/readers were only able to read information in English on the platform. The outage lasted around 24 hours before I caught it and we resolved the issue. This outage invalidated the consents for everyone who signed up for a clinical study in English in that period of time, since we could not ensure that their consent was truly informed. The failure happened for two reasons: (1) I pushed code that broke the translations service, and (2) there was no testing in place to stop me from pushing that bad code.
  3. [what I did during the failure] I regularly tested my own code, end to end, in Spanish since it's my second language. While working on a new feature, I did some testing and noticed that, despite provisioning my environment to show me Spanish content, all the content was in English. To verify that the translations were broken, I double-checked my environment setup first, then I immediately rolled back production to a safe build where I had previously verified that Spanish translations were working. Then, I started a post-mortem doc and began the technical investigation into what went wrong. I also informed our PM so they could gather the cross-functional team who needed to understand the outage and prepare additional countermeasures (i.e., reaching out to the clinical research sites to let them know that there had been an outage of translations). Once we found the root cause, I worked with the engineer whose translations service had broken to fix the issue. We tested the change together and verified that it was working before we rolled out the change (and all other changes that had been rolled back). I worked with that engineer to write additional integration tests to cover the failure path, and I worked with the manual test engineer to write additional manual tests which would catch any language outage.
  4. [learning] I learned three key things from this experience. 
    1. as an engineer, it's my responsibility to ensure that my features are being adequately tested, through unit, integration and manual system level tests. 
    2. I can help to ensure that my team has good process and hygiene around writing and maintaining automated integration tests that touch multiple services
    3. bringing in key stakeholders like the clinical operations staff allowed us to solve the operational symptoms in parallel to fixing the technical issue, minimizing overall user confusion and loss for the outage.
  5. [what I'd do differently] Primarily, I would ensure that as I was launching new features, I would work more closely with the dedicated testers to ensure that they are covering key E2E test cases before we launched my feature. My team didn't have strong process around this, and so I missed it. The entire issue could have been caught either by better integration or even unit testing for the service that I broke, or by our manual testing efforts if tests had covered this use case. Today, when I design and plan to launch new features, I reserve significantly more time for test plan development, including unit and integration tests that I write myself, and also system-level manual tests that our testing team owns and maintains. In general, I take a more process-driven approach to ensuring adequate testing is in place.
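As a rough illustration of the kind of automated check that could have caught this outage, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `TRANSLATIONS` table, `get_content`, and `find_untranslated` are hypothetical stand-ins, not the platform's real code. The idea is simply to fail a test when a non-English locale silently shows the English text.

```python
# Hypothetical stand-in for the platform's translated content (assumed data).
TRANSLATIONS = {
    "en": {"consent_title": "Consent to participate"},
    "es": {"consent_title": "Consentimiento para participar"},
}


def get_content(key: str, locale: str) -> str:
    """Return content for a locale, falling back to English when it is missing."""
    return TRANSLATIONS.get(locale, {}).get(key, TRANSLATIONS["en"][key])


def find_untranslated(locales: list[str], keys: list[str]) -> list[tuple[str, str]]:
    """List (locale, key) pairs where a non-English locale shows the English text."""
    missing = []
    for locale in locales:
        if locale == "en":
            continue
        for key in keys:
            if get_content(key, locale) == get_content(key, "en"):
                missing.append((locale, key))
    return missing


def test_non_english_locales_are_translated():
    # Fails if any supported locale has silently fallen back to English.
    assert find_untranslated(["en", "es"], ["consent_title"]) == []
```

A real check would need an allow-list for strings that are legitimately identical across languages, such as product names.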
 
*The feeling of amazing applies only to the interviewer. This sounds kind of evil, but look, I do 5 interviews a week, let me enjoy a crash and burn once in a while. Also, you are here and learning how to answer the question, so you are going to do fine!
Platinum PM

 

@erika

I ask this question or a version of it in every interview.

Your framework is perfect for the answer.  As an interviewee, use the framework, be prepared, and don’t step on landmines.  We all make mistakes; we learn more from our mistakes and missteps than we do from our successes.  Admitting a mistake isn’t a weakness – not learning from it, not handling it, and not being honest about it is…

I was invited to a breakfast interview when I was VP of product.  The interview was for a sales role: the VP of sales told me that he was ready to extend an offer but wanted me to have breakfast with him and the candidate.  We had breakfast, we chit-chatted, I asked a few background questions, I was just getting to know the guy, and then I asked, “Tell me about your biggest mistake and how you dealt with it.”  What we got was a story about covering up a mistake: it all worked out, and no one needed to know.  There was a point when the VP of sales looked across the table at me... we both knew that this candidate wasn’t getting the offer.

Essentially, he had messed up an online promo that was going to cost the company $$$$$.  He worked with one of his vendors to cover the promo.  At the end of the month the numbers came in without an issue because the vendor covered it.  But what he did not do was communicate with his management about the mistake and how he was going to resolve it.  He thought that fixing it himself was enough.  The candidate made it to breakfast with the two VPs but blew a question that requires a little prep and some honesty.

My biggest mistake is similar – I sent the wrong contract to the wrong contact.  One was a legacy contract for $ while the other was for the same service but we were charging $$$$.  One of my jobs was to move the legacy contracts up to $$ and let them know that over the next few renewals we would be moving them up to list price. 

The customer who was paying $$$$ saw that someone else was paying $.  They were really upset. 

How I dealt with it: I told my boss.  I was honest with the customer that the lower-dollar-value contract had been signed years ago, when the company was a startup.  When we were in startup mode, the founders accepted any dollars they could.

What I learned – slow down before you press send and check your attachments.  Always be honest, work with your management to resolve issues. 

 

Platinum PM

In my previous company, one of the key aspects of the product was integrating with external third-party payment mechanisms for auto-renewing subscriptions. A few months into one of these integrations, after we had scaled, we suddenly started getting complaints from users that they were being charged multiple times. From both a user and a product perspective, this was a big problem.

I worked with engineering to find the root cause and realized that, because of the increase in scale, the partner system was not able to send charging notifications properly; some were delayed and some never came, even though the user had been charged. As a result, our system kept sending charging requests, and users were charged multiple times.

As the first step, I communicated this to the partner to make sure they understood the root cause, which was not on our side. Second, since users were being impacted, I decided to stop sending any charging requests for that partner. In the meantime, we quickly put in a hack so that a charging request would not be sent again for the length of the charging period. These two things solved the immediate issues. Then I went back to the drawing board and rewrote the requirements, which engineering worked on to solve the issue permanently.

This is what I learned from this:

  1. Before the launch, always figure out the scale you expect and make sure all internal and external systems are able to handle it. From that point on, I incorporated what I called PSR as part of the requirements.
  2. In the end, as PM you are responsible for consumers, so always plan for failure and worst-case scenarios in the external systems as well. From then on, I always gave requirements and use cases for failures in third-party systems.
  3. I had our contractual terms modified so that we would not be legally liable for issues in the partner systems. This was made more explicit.
  4. Last but not least, we created a process to make sure that we scale slowly and don't increase traffic massively. We also put in relevant alerts for such scenarios.
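To make the stopgap concrete, here is a minimal Python sketch of a guard that refuses to send a second charging request for the same subscription within one charging period, even when no notification has arrived from the partner. It is only an illustration under assumptions: `ChargeGuard`, `should_send`, and the 30-day period are hypothetical names and values, not the actual implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ChargeGuard:
    """Blocks repeat charge requests within one charging period (hypothetical helper)."""

    period: timedelta
    # Maps subscription id -> time of the last charge request we sent.
    last_request: dict[str, datetime] = field(default_factory=dict)

    def should_send(self, subscription_id: str, now: datetime) -> bool:
        last = self.last_request.get(subscription_id)
        if last is not None and now - last < self.period:
            # A request already went out this period; do not retry,
            # even if the partner's notification never arrived.
            return False
        self.last_request[subscription_id] = now
        return True


guard = ChargeGuard(period=timedelta(days=30))  # assumed 30-day charging period
start = datetime(2024, 1, 1)
print(guard.should_send("sub-1", start))                       # first request: True
print(guard.should_send("sub-1", start + timedelta(days=1)))   # retry in period: False
print(guard.should_send("sub-1", start + timedelta(days=30)))  # next period: True
```

The trade-off is that a charge that genuinely failed is not retried until the next period, which is why it works as a stopgap rather than as the permanent fix.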

 
