Explain the data pipeline for the last AI project you worked on. What were the top challenges in getting data, and how did you resolve them?
1. Understand the question and clarify
- Can this be any type of AI project? Yes.
- Is there a specific AI-related area to focus on, such as training the model, or can I cover the entire project at a high level? Your choice.
2. Product description
This is a consumer app that assesses the severity of acne from a selfie image and provides a prognosis and a recommendation for treatment and/or a referral to a specialist.
3. Key attributes
A) Train a model to rate the severity of acne through supervised training
B) Provide a severity rating after analyzing the selfie
C) Provide a recommendation for treatment
D) Provide a daily prognosis if daily images are uploaded
4. Goal
Rate the severity of acne when a selfie is uploaded
5. Prioritized attributes
A - High impact, high cost to develop the model
B - High impact, medium cost once A is developed
C - High impact, low cost once A and B are in place
6. Design
a. Data pipeline:
i. Data collection
Collected clinical pictures of acne, with patient approval, for training the model.
Labeled the severity of each picture by distributing the pictures to 15 physicians with 5-15 years of experience.
ii. Data processing and cleaning
Created Python scripts to normalize the picture quality and the physician ratings.
iii. Storage and management
Stored the pictures in secure blob storage with restricted access.
iv. Model configuration and training
Applied a multi-model approach: first training a CNN classification model, then applying facial landmark detection, and finally applying a one eye OpenCV model.
v. Fine-tuning
Applied supervised fine-tuning (SFT) to fine-tune the model.
b. Image labeler app: labels customer-uploaded images so that the model can continue training on them.
c. Deployment: deployed through Azure containers and Kubernetes Service to make the service available to the mobile app through a web API.
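The answer does not say how the physician ratings in step ii were normalized. One possible approach, sketched here purely as an assumption, is to z-score each physician's ratings (to remove differences in how strict each rater is) and then take the median across physicians as the consensus label. The function name, the input layout, and the rater and image IDs are all hypothetical.

```python
from statistics import mean, median, pstdev


def normalize_ratings(ratings):
    """Turn raw per-physician scores into one consensus score per image.

    ratings: dict mapping rater -> {image_id: raw severity score}
    Returns: dict mapping image_id -> median of the per-rater z-scores.
    """
    per_image = {}
    for rater, scores in ratings.items():
        values = list(scores.values())
        mu = mean(values)
        # Guard against a rater who gave every image the same score.
        sigma = pstdev(values) or 1.0
        for image_id, score in scores.items():
            per_image.setdefault(image_id, []).append((score - mu) / sigma)
    return {image_id: median(zs) for image_id, zs in per_image.items()}


# Hypothetical example: two physicians, one consistently stricter than the other.
example = {
    "dr_a": {"img1": 1, "img2": 3},
    "dr_b": {"img1": 2, "img2": 4},
}
consensus = normalize_ratings(example)
```

In this example both physicians agree on the ordering of the two images even though their raw scores differ, so the consensus scores come out the same for both raters after normalization.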
7. Challenges
a. The number of training images was limited: fewer than 5,000
b. Image quality was poor due to bad environments, and the human labels were noisy
8. How I resolved them
a. Worked with the data scientist to augment the CNN model with spatial sensitivity
Converted the classification model to a regression model
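As a rough illustration of what converting classification to regression can look like, the sketch below averages several physicians' ordinal grades into one continuous training target and maps a continuous prediction back to a grade. The 0-4 grade scale and both function names are assumptions for illustration; the answer does not state the actual scale or method.

```python
def regression_target(grades):
    """Average the physicians' ordinal grades into one continuous target."""
    return sum(grades) / len(grades)


def to_grade(score, lowest=0, highest=4):
    """Map a continuous prediction to the nearest grade, clamped to the scale."""
    return int(min(highest, max(lowest, round(score))))


# Hypothetical example: three physicians grade the same image 2, 3, and 3.
target = regression_target([2, 3, 3])
grade = to_grade(target)
```

One reason to prefer a regression target with noisy labels is that disagreement between raters is kept as a fractional value instead of being forced into a single class, and a prediction that is off by one grade is penalized less than one that is off by three.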
9. Trade-offs
a. Augmenting increased the noise in some instances
b. The multi-model approach impacted the performance of the model, as input is filtered through several stages
10. Summary
Used AI to provide a sensitive medical prognosis of acne: an uploaded selfie image is classified and given a severity rating, so that young adults can decide whether an OTC treatment or a physician visit is needed. Applied a multi-model approach to achieve a high-accuracy model.
In my last AI project, we developed a business matching platform using machine learning for an open innovation services company. The goal was to provide corporates, startup ecosystems, investors, and startups personalized suggestions on who to connect with.
Data Pipeline Overview:
Data Collection: We gathered data from multiple sources, including Crunchbase and Pitchbook, and scraped public news, articles, and reports on companies, people, funding, etc. We also used Salesforce CRM, where Innovation Advisors recorded their own perspective and information on the scouted tech startups and corporate innovation challenges. These entries required proper tagging and labeling. Integrating these diverse data sources into a cohesive dataset was our primary challenge.
Data Processing and Cleaning: Using Python scripts for preprocessing, we addressed missing values, normalized data, and extracted features. Ensuring data quality was vital for the model's accuracy.
Data Storage and Management: The processed data was stored on a cloud-based platform integrated with Firebase. We used OAuth to restrict access to authenticated users, ensuring data confidentiality and security. We also set up regular backups and data redundancy strategies to prevent data loss.
Data Analysis and Modeling: We adopted a supervised learning approach to develop our recommendation engine. This involved training our models on historical data, where the outcomes of previous successful business matches were used to predict future connections. We combined content-based and collaborative filtering techniques to analyze company profiles, startup pitches, and user interactions. This approach was key in addressing the cold start problem and enhancing the relevance of our suggestions.
Deployment and Monitoring: The model was deployed on a scalable cloud infrastructure designed to accommodate increasing users and data points. Continuous monitoring, with both automated tools and manual audits by our data science team, ensured ongoing accuracy and performance.
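The combination of content-based and collaborative filtering described in the modeling step can be sketched as a weighted hybrid. This is a minimal illustration, not the platform's actual model: the similarity measures, the smoothing constant `k`, and all function names are assumptions. The weight shifts toward the collaborative signal as a user accumulates interactions, so a brand-new user (the cold start case) is scored on profile content alone.

```python
from math import sqrt


def cosine(a, b):
    """Content similarity between two equal-length profile feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Collaborative signal: overlap between two sets of past connections."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hybrid_score(content_sim, collab_sim, n_interactions, k=5):
    """Blend both signals; with no interaction history, rely on content only."""
    weight = n_interactions / (n_interactions + k)
    return weight * collab_sim + (1 - weight) * content_sim


# Hypothetical example: a new user versus a user with five past interactions.
cold_start = hybrid_score(content_sim=0.8, collab_sim=0.0, n_interactions=0)
established = hybrid_score(content_sim=0.8, collab_sim=0.2, n_interactions=5)
```

A weighted blend is only one way to build a hybrid recommender; switching between the two models or feeding both signals into a supervised ranker are common alternatives.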
Top Challenges and Resolutions:
Data Integration: For integrating data from various sources like Crunchbase, Pitchbook, and Salesforce CRM, we utilized Google Cloud's ELT capabilities. This approach allowed us to efficiently manage large volumes of data, ensuring seamless integration and processing within our cloud-based infrastructure.
Real-Time Data Updates: Ensuring timely updates of our recommendations was a priority. We achieved this through efficient data processing methods, allowing for prompt updates of the platform's recommendations without necessarily implementing a real-time stream-processing framework.
User Feedback Incorporation: We initially overlooked the integration of user feedback into the model. Subsequently, we introduced a mechanism for users to star rate the relevance of our suggestions and write feedback, which was crucial for the iterative improvement of our recommendation algorithms.
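One way the star ratings could feed back into a supervised model is by converting them into labeled examples for the next training run. The sketch below assumes a 1-5 star scale and hypothetical thresholds (4 stars and up counts as a relevant match, 2 stars and below as not relevant, 3 stars is skipped as ambiguous); none of these details come from the project itself.

```python
def feedback_to_labels(feedback, positive_from=4, negative_up_to=2):
    """Convert star ratings into binary relevance labels for retraining.

    feedback: iterable of (user_id, suggestion_id, stars) tuples
    Returns: list of (user_id, suggestion_id, label), where label is 1 or 0.
    """
    labels = []
    for user_id, suggestion_id, stars in feedback:
        if stars >= positive_from:
            labels.append((user_id, suggestion_id, 1))
        elif stars <= negative_up_to:
            labels.append((user_id, suggestion_id, 0))
        # Middling ratings are skipped because they carry little signal.
    return labels


# Hypothetical example with made-up user and suggestion IDs.
new_examples = feedback_to_labels([
    ("u1", "s1", 5),
    ("u1", "s2", 3),
    ("u2", "s1", 1),
])
```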
Reflection:
This project highlighted the importance of agile data management and flexible architecture, especially in dynamic business environments. Close collaboration with the Innovation Advisors provided invaluable domain insights, enhancing our data labeling and feature engineering efforts. This experience deepened my understanding of developing secure, user-centric AI solutions that leverage supervised learning to deliver timely, impactful results.