
Explain the data pipeline for the last AI project you worked on. What were the top challenges in getting data, and how did you resolve them?

Asked at Google
1k views
Answers (2)

Gold PM

1. Understand the question and clarify

  • Can this be any type of AI project? Yes.
  • Is there a specific AI topic to focus on, such as training the model, or can I cover the entire project at a high level? Your choice.

2. Product description

This is a consumer app that assesses the severity of acne from a selfie image and provides a prognosis and a treatment recommendation and/or a referral to a specialist.

3. Key attributes

A) Train a model to rate the severity of acne through supervised learning

B) Provide a severity rating after analyzing the selfie

C) Provide a recommendation for treatment

D) Provide a daily prognosis if daily images are uploaded

4. Goal

Return the severity of acne when a selfie is uploaded.

5. Prioritized attributes

A - High impact, high cost to develop the model

B - High impact, medium cost if A is developed

C - High impact, low cost if A and B are in place

6. Design

a. Data pipeline:

i. Data collection

Collected clinical pictures of acne, with patient approval, for training the model.

Labeled the data by distributing the pictures to 15 physicians with 5-15 years of experience, who tagged each picture with a severity rating.

ii. Data processing and cleaning

Created Python scripts to normalize the picture quality and the physician ratings.

iii. Storage and management

Stored the pictures in secure blob storage with restricted access.

iv. Model configuration and training

Applied a multi-model approach: first training a classification CNN model, then applying facial landmark detection, and finally applying a one-eye OpenCV model.

v. Fine-tuning

Applied supervised fine-tuning (SFT) to fine-tune the model.

b. Image labeler app: labels images so that the model can continue training on customer-uploaded images.

c. Deployment: deployed through Azure container and Kubernetes services, making the service available to the mobile app through a web API.
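
The rating normalization mentioned in step a.ii could look roughly like the sketch below. The answer does not describe the actual scripts, so this assumes that each physician gave a numeric score per image and that "normalizing" means z-scoring each physician's scores before averaging them per image. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_ratings(ratings):
    """ratings: list of (physician_id, image_id, score) tuples (assumed layout).

    Returns {image_id: mean z-score across the physicians who rated it}.
    """
    by_physician = defaultdict(list)
    for physician, _, score in ratings:
        by_physician[physician].append(score)

    # Per-physician mean and spread, so strict and lenient raters become comparable.
    stats = {p: (mean(s), pstdev(s)) for p, s in by_physician.items()}

    by_image = defaultdict(list)
    for physician, image, score in ratings:
        mu, sigma = stats[physician]
        # A physician who gave every image the same score carries no ranking signal.
        z = 0.0 if sigma == 0 else (score - mu) / sigma
        by_image[image].append(z)

    return {image: mean(zs) for image, zs in by_image.items()}


example = [
    ("dr_a", "img1", 1), ("dr_a", "img2", 3),  # dr_a tends to rate low
    ("dr_b", "img1", 2), ("dr_b", "img2", 4),  # dr_b tends to rate high
]
normalized = normalize_ratings(example)
```

In this toy input both physicians agree on the ordering of the two images even though their raw scores differ, and the normalized labels reflect only that ordering.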

7. Challenges

a. The training data was limited to fewer than 5,000 images.

b. Image quality was poor because of bad capture environments, and the human labels were noisy.

8. How we resolved them

a. Worked with the data scientist to augment the CNN model with spatial sensitivity.

b. Converted the classification model to a regression model.
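
One way to picture the move from classification to regression is the sketch below. It assumes severity grades are ordered integers from 0 to 4, which the answer does not state, and the helper names are hypothetical. The point is that a regression target lets the error measure how far a prediction is from the true grade, instead of treating every wrong grade as equally wrong.

```python
import math


def score_to_grade(score, max_grade=4):
    """Map a continuous regression output to the nearest grade in [0, max_grade]."""
    nearest = math.floor(score + 0.5)  # round half up, unlike Python's round()
    return min(max_grade, max(0, nearest))


def mean_absolute_error(predicted, actual):
    """Average distance between predicted scores and true grades."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


true_grades = [3, 3]
near_miss = [2.0, 4.0]  # each prediction is one grade away
far_miss = [0.0, 0.0]   # each prediction is three grades away

near_error = mean_absolute_error(near_miss, true_grades)
far_error = mean_absolute_error(far_miss, true_grades)
```

A plain accuracy metric would score both sets of predictions as 0 out of 2 correct, while the regression error separates the near misses from the far ones.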

9. Trade-offs

a. Augmenting the model increased the noise in some instances.

b. The multi-model approach hurt the model's performance because each image was filtered through several stages.

10. Summary

We used AI to provide a sensitive medical prognosis of acne: an uploaded selfie is classified and given a severity rating, so that a young adult can decide whether an OTC treatment or a physician visit is needed. We applied a multi-model approach to achieve a high-accuracy model.

 

 


In my last AI project, we developed a business matching platform using machine learning for an open innovation services company. The goal was to provide corporates, startup ecosystems, investors, and startups personalized suggestions on who to connect with.

Data Pipeline Overview:

  • Data Collection: We gathered data from multiple sources, including Crunchbase and Pitchbook, and scraped public news, articles, and reports on companies, people, funding, etc. We also used Salesforce CRM, where Innovation Advisors recorded their own perspective and information on the scouted tech startups and corporate innovation challenges. These entries required proper tagging and labeling. Integrating these diverse data sources into a cohesive dataset was our primary challenge.

  • Data Processing and Cleaning: Using Python scripts for preprocessing, we addressed missing values, normalized data, and extracted features. Ensuring data quality was vital for the model's accuracy.

  • Data Storage and Management: The processed data was stored on a cloud-based platform integrated with Firebase. We used OAuth to restrict access to authenticated users, ensuring data confidentiality and security. We also set up regular backups and data redundancy strategies to prevent data loss.

  • Data Analysis and Modeling: We adopted a supervised learning approach to develop our recommendation engine. This involved training our models on historical data, where the outcomes of previous successful business matches were used to predict future connections. We combined content-based and collaborative filtering techniques to analyze company profiles, startup pitches, and user interactions. This approach was key in addressing the cold start problem and enhancing the relevance of our suggestions.

  • Deployment and Monitoring: The model was deployed on a scalable cloud infrastructure designed to accommodate increasing users and data points. Continuous monitoring, with both automated tools and manual audits by our data science team, ensured ongoing accuracy and performance.
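
The combination of content-based and collaborative filtering described above can be sketched as follows. This is only an illustration under stated assumptions: profiles are reduced to tag sets, both similarities use Jaccard overlap, and the blend weight `n / (n + k)` is a hypothetical choice, not the project's actual formula. It shows why a hybrid helps with cold start: with no interaction history the score falls back to content similarity alone.

```python
def jaccard(a, b):
    """Overlap between two collections, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(content_sim, collab_sim, n_interactions, k=5):
    """Blend two similarity scores.

    The weight on the collaborative signal grows with the amount of
    interaction history; k (hypothetical) controls how quickly.
    """
    w = n_interactions / (n_interactions + k)
    return (1 - w) * content_sim + w * collab_sim


# Content signal: shared profile tags between a corporate and a startup.
content_sim = jaccard({"ai", "health"}, {"ai", "fintech"})

# Collaborative signal: overlap of users who engaged with both parties.
collab_sim = jaccard({"u1", "u2", "u3"}, {"u2", "u3", "u4"})

cold_start = hybrid_score(content_sim, collab_sim, n_interactions=0)
established = hybrid_score(content_sim, collab_sim, n_interactions=20)
```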

Top Challenges and Resolutions:

  • Data Integration: For integrating data from various sources like Crunchbase, Pitchbook, and Salesforce CRM, we utilized Google Cloud's ELT capabilities. This approach allowed us to efficiently manage large volumes of data, ensuring seamless integration and processing within our cloud-based infrastructure.

  • Real-Time Data Updates: Ensuring timely updates of our recommendations was a priority. We achieved this through efficient data processing methods, allowing for prompt updates of the platform's recommendations without necessarily implementing a real-time stream-processing framework.

  • User Feedback Incorporation: We initially overlooked the integration of user feedback into the model. Subsequently, we introduced a mechanism for users to star rate the relevance of our suggestions and write feedback, which was crucial for the iterative improvement of our recommendation algorithms.
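
As a rough sketch of how star ratings might feed back into a supervised model, the snippet below turns ratings into binary training labels. The thresholds (4 stars and up is a good match, 2 and below is a bad one, 3 is dropped as ambiguous) and the tuple layout are assumptions made for illustration; the answer does not say how the feedback was actually used.

```python
def feedback_to_labels(feedback, positive_from=4, negative_up_to=2):
    """feedback: list of (user_id, suggestion_id, stars) tuples (assumed layout).

    Returns (user_id, suggestion_id, label) tuples with label 1 for a relevant
    suggestion and 0 for an irrelevant one.
    """
    labels = []
    for user_id, suggestion_id, stars in feedback:
        if stars >= positive_from:
            labels.append((user_id, suggestion_id, 1))
        elif stars <= negative_up_to:
            labels.append((user_id, suggestion_id, 0))
        # Ratings in between are ambiguous, so they are left out of training.
    return labels


new_labels = feedback_to_labels([
    ("u1", "s1", 5),
    ("u1", "s2", 3),
    ("u2", "s1", 1),
])
```

The resulting labels could then be appended to the historical match data before the next retraining run.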

Reflection:

This project highlighted the importance of agile data management and flexible architecture, especially in dynamic business environments. Close collaboration with the Innovation Advisors provided invaluable domain insights, enhancing our data labeling and feature engineering efforts. This experience deepened my understanding of developing secure, user-centric AI solutions that leverage supervised learning to deliver timely, impactful results.
