Shaw Talebi
United States
Joined Sep 17, 2020
Failing my way to success | Data Entrepreneur
About me:
Hi, I'm Shaw. I make videos about data science and entrepreneurship. As someone who survived grad school with the help of YouTube, I want to give back by making videos that would have been helpful when I was first navigating these ideas.
How to Build ML Solutions (w/ Python Code Walkthrough)
This is the 4th video in a series on Full Stack Data Science. Here, I explain why experimentation is critical to the ML lifecycle and walk through the development of a semantic search tool for my YouTube videos.
👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html
💻 Example Code: github.com/ShawhinT/UA-cam-Blog/tree/main/full-stack-data-science/data-science
🤖 RAG: ua-cam.com/video/Ylz779Op9Pw/v-deo.html
📚Text Embeddings: ua-cam.com/video/sNa_uiqSlJo/v-deo.html
Resources:
[1] karpathy.medium.com/software-2-0-a64152b37c35
[2] arxiv.org/abs/2012.07919
--
Book a call: calendly.com/shawhintalebi
Homepage: shawhintalebi.com/
Socials
medium.com/@shawhin
www.linkedin.com/in/shawhintalebi/
ShawhinT
shawhintalebi
The Data Entrepreneurs
🎥 YouTube: www.youtube.com/@TheDataEntrepreneurs
👉 Discord: discord.gg/RSqZbF9ygh
📰 Medium: medium.com/the-data-entrepreneurs
📅 Events: lu.ma/tde
🗞️ Newsletter: the-data-entrepreneurs.ck.page/profile
Support ❤️
www.buymeacoffee.com/shawhint
Introduction - 0:00
Why ML is Different - 0:39
Role of Experimentation - 3:04
Semantic Search (Design Choices) - 5:09
Example Code: Semantic Search of YT Videos - 8:17
Preview of Final Product - 10:06
Step 1: Experimentation & Evaluation - 11:17
Step 2: Build Video Index - 34:14
Step 3: Build UI - 35:49
What's Next? - 43:43
Views: 2,121
Videos
How to Build Data Pipelines for ML Projects (w/ Python Code)
1.5K views · 1 day ago
This is the 3rd video in a series on Full Stack Data Science. Here, I discuss key aspects of building data pipelines for machine learning and share Python code for pulling transcripts from all my YouTube videos. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html 📰 Read more: towardsdatascience.com/how-to-build-data-pipelines-for-machine-learning-b97bbef050a5?sk=4823c18cab0a...
How to Manage Data Science Projects
1.1K views · 21 days ago
This is the 2nd video in a series on Full Stack Data Science. Here, I introduce a 5-step project management framework for data science and discuss the project manager's role in implementing it. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html 📰 Read more: medium.com/towards-data-science/data-science-project-management-e8787d818ad0?sk=236beb8e4787f6c6ec2f31ed17237cfd 💻 ...
4 Skills You Need to Be a Full-Stack Data Scientist
3.2K views · 28 days ago
I'm kicking off a new series on Full Stack Data Science. This video introduces the idea and defines its 4 hats. Future videos will walk through each hat using a real-world use case. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html 📰 Read more: medium.com/towards-data-science/the-4-hats-of-a-full-stack-data-scientist-5b916bd2f079?sk=2d60946532c55d7c4f6502a8e73e7b52 💻 Pr...
How I'd Learn Data Science (if I started over)
1.5K views · 1 month ago
If you forced me to learn data science over again, this is exactly what I'd do. 🎥 Data Scientist Interviews: ua-cam.com/video/_Wjn0gm4g20/v-deo.html (Potentially) Helpful Resources [1] blogs.sun.ac.za/open-day/files/2022/03/Data-Scientist-Harvard-review.pdf [2] www.liebertpub.com/doi/epdf/10.1089/big.2013.1508 [3] ua-cam.com/video/xC-c7E5PK0Y/v-deo.html [4] ua-cam.com/video/FsSrzmRawUg/v-deo.ht...
I Was Wrong About AI Consulting (what I learned)
4.7K views · 1 month ago
❌ Why I Quit: ua-cam.com/video/LRLH_yIxHrI/v-deo.html 🎥 (Data) Entrepreneurship: ua-cam.com/play/PLz-ep5RbHosXORBcWr6dy3-Wdq9RT2n2f.html The Data Entrepreneurs 🎥 YouTube: www.youtube.com/@TheDataEntrepreneurs 👉 Discord: discord.gg/RSqZbF9ygh 📰 Medium: medium.com/the-data-entrepreneurs 📅 Events: lu.ma/tde 🗞️ Newsletter: the-data-entrepreneurs.ck.page/profile Book a call: calendly.com/shawhintaleb...
Text Embeddings, Classification, and Semantic Search (w/ Python Code)
37K views · 1 month ago
Need help with AI? Book a call: calendly.com/shawhintalebi In this video, I introduce text embeddings and describe how we can use them for 2 simple yet high-value use cases: text classification and semantic search. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html 🎥 RAG: ua-cam.com/video/Ylz779Op9Pw/v-deo.html 📰 Read more: medium.com/towards-data-science/text-embeddings...
How to Improve LLMs with RAG (Overview + Python Code)
15K views · 1 month ago
In this video, I give a beginner-friendly introduction to retrieval augmented generation (RAG) and show how to use it to improve a fine-tuned model from a previous video in this LLM series. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html 🎥 Fine-tuning with QLoRA: ua-cam.com/video/XpoKB3usmKc/v-deo.html 📰 Read more: medium.com/towards-data-science/how-to-improve-llms-w...
QLoRA-How to Fine-tune an LLM on a Single GPU (w/ Python Code)
36K views · 2 months ago
Need help with AI? Book a call: calendly.com/shawhintalebi In this video, I discuss how to fine-tune an LLM using QLoRA (i.e. Quantized Low-rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html 🎥 Fine-tuning with OpenAI: ua-cam.com/video/4RAvJt3fWoI/v-deo.htm...
3 Ways to Make a Custom AI Assistant | RAG, Tools, & Fine-tuning
15K views · 3 months ago
AI assistants are chatbots that can use tools to perform a wide variety of tasks. In this video, I walk through 3 ways to make a custom AI assistant using OpenAI's platform. Each approach is used to make a YouTube comment responder called ShawGPT. 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html 🎥 OpenAI API Intro: ua-cam.com/video/czvVibB2lRA/v-deo.html 🎥 Fine-tuning O...
AI & Machine Learning for Business | A (non-technical) introduction
1.8K views · 3 months ago
5 Questions Every Data Scientist Should Hardcode into Their Brain
1.1K views · 4 months ago
How Much YouTube Paid Me in My First 6 Months of Monetization (as a Data Science Creator)
1.8K views · 4 months ago
4 Ways to Measure Fat Tails with Python (+ Example Code)
521 views · 5 months ago
Detecting Power Laws in Real-world Data | w/ Python Code
933 views · 5 months ago
Pareto, Power Laws, and Fat Tails-what they don’t teach you in STAT 101
2.7K views · 6 months ago
I Spent $716.46 Talking to Data Scientists on Upwork-Here’s what I learned.
2.5K views · 6 months ago
I Have 90 Days to Make $10k/mo-Here's my plan
1.3K views · 6 months ago
How to Build an LLM from Scratch | An Overview
172K views · 7 months ago
Fine-tuning Large Language Models (LLMs) | w/ Example Code
224K views · 7 months ago
Prompt Engineering: How to Trick AI into Solving Your Problems
18K views · 7 months ago
Why I Quit My $150,000 Data Science Job
1.4K views · 8 months ago
The Hugging Face Transformers Library | Example Code + Chatbot UI with Gradio
30K views · 9 months ago
The OpenAI (Python) API | Introduction & Example Code
17K views · 9 months ago
A Practical Introduction to Large Language Models (LLMs)
32K views · 9 months ago
I Spent $675.92 Talking to Top Data Scientists on Upwork-Here’s what I learned
6K views · 10 months ago
How to Create a Custom Email Signature in Gmail (2024)
74K views · 11 months ago
My $100,000+ Data Science Resume (what got me hired)
2.9K views · 1 year ago
How to Make a Data Science Portfolio With GitHub Pages (2024)
73K views · 1 year ago
Dimensionality Reduction & Segmentation with Decision Trees | Python Code
606 views · 1 year ago
It did make my life a little easier ☺ thank you!
Test this. Here is some code to add to ChatGPT 3.5 so that it remembers your conversation in that specific text box. Add this code in the settings for the model description, in the second box. Add your name and contacts in the first box of what you would like to be working on, then add this definition in the second definition box, and it will remember your conversation when you were working in that specific text box. This is an advanced input code, so you don't actually have to pay for an upgraded model to get this. They've not added this feature, so I created it to add to that model. You could try using it in other models; I've only used GPT 3.5 because it has a custom feature in the settings to add model definitions, which is amazing.

# Define the RollingChatRecall class
class RollingChatRecall:
    def __init__(self):
        self.previous_conversation = ""
        self.next_conversation = ""

    def add_previous_conversation(self, conversation):
        self.previous_conversation = conversation

    def add_next_conversation(self, conversation):
        self.next_conversation = conversation

    def retrieve_previous_conversation(self):
        return self.previous_conversation

    def retrieve_next_conversation(self):
        return self.next_conversation

# Define the chat system function
def chat_system():
    # Create an instance of the RollingChatRecall class
    rolling_chat = RollingChatRecall()
    # Retrieve the previous conversation data
    previous_data = rolling_chat.retrieve_previous_conversation()
    # Retrieve the next conversation data
    next_data = rolling_chat.retrieve_next_conversation()
    # Process and respond to the conversation data
    if previous_data:
        print("Processing previous conversation data:", previous_data)
        # Add your logic here to process the previous conversation data
    if next_data:
        print("Processing next conversation data:", next_data)
        # Add your logic here to process the next conversation data

# Call the chat system function to start the conversation
chat_system()
Excellent way of teaching. Keep doing this kind of good work.
Thanks Shaw! Your tips made a big change in my job hunting process, Many thanks!
Hey Shaw, I’m loving your content so far. I just left my job doing data science/engineering for hardware development at a prominent AI / robotics lab, but I would have benefited from some of your latest uploads had I stayed. 1 video / week is typically antithetical to quality, but your angle is unique and adds value so I hope it doesn’t get too hard to keep quality high. Best of luck from Palo Alto, I’ll be following along! Especially for the AI content
Great video! I would like a video that helps us find a job, on LinkedIn for example (or some tips on finding a job or where we can apply, please, Shaw!)
This is such a fantastic video on building LLMs from scratch. I'll watch it repeatedly to implement it for a time-series use case. Thank you so much!!
'ging ist' is German, for 'went is' and I have no idea why that would be a completion
Thank you so much, I have one doubt please, even if we set fp16 = True, still the optimization would happen in fp32 right, like you showed at 20:22
Great content, Thank you.
Just so I’m clear Regression doesn’t work well in any situation or just power laws?
Great video! Getting this error in training:

/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py in step(self, optimizer, *args, **kwargs)
    447
    448         assert (
--> 449             len(optimizer_state["found_inf_per_device"]) > 0
    450         ), "No inf checks were recorded for this optimizer."
    451
AssertionError: No inf checks were recorded for this optimizer.

Any idea what causes it?
Hey Shaw - Great video. But there are also an AI Engineer role & AI Product Manager roles. Where does it fit in?
Great question. AI Engineer seems to be a new one. I've mainly seen this from data science or ML engineering freelancers rebranding themselves for a non-technical audience. So they will likely have a similar skillset to a full-stack data scientist (as I describe it here). The AI Product Manager is one step beyond a full-stack data scientist in that they implement products rather than projects. This requires additional considerations such as sales, marketing, finding product market fit, and other business skills.
Brilliant, thanks
Great video, really interesting. A question on the encoding process. Does condensing transcripts into an embedding with 384 dimensions lose much information, or does the encoding process truncate the text at a point? How would something like this manage a lengthy transcript where you cover several different topics? Does the embedding get too "noisy" in that case to be able to really stand above your threshold if only perhaps 5 lines out of 100 contain the information relating to the search?
That's a great question. Whether (much) information is lost depends on the specific use case. For example, if you have simple text chunks that either say "True" or "False", then even a 1-dimensional embedding will preserve all the information. However, as you're describing, the longer the chunks, the more information can be lost. This is why experimentation is so critical: you can't really know 1) how much "information" is preserved by embeddings and 2) how that impacts your use case before just trying it out.
Awesome, thank you
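One way to experiment with the concern raised in this thread is to compare scoring a whole transcript against scoring it chunk by chunk. The sketch below is only an illustration: `embed` is a toy bag-of-words stand-in for a real text-embedding model, and the transcript, query, and chunk size are hypothetical. It is not the code from the video.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk(lines, size):
    """Group consecutive transcript lines into chunks of `size` lines."""
    return [" ".join(lines[i:i + size]) for i in range(0, len(lines), size)]


# Hypothetical transcript: only two of the ten lines are about the query topic.
transcript = [
    "welcome back to the channel",
    "today we talk about project planning",
    "first define the business problem",
    "then gather requirements from stakeholders",
    "qlora lets you fine tune a model on a single gpu",
    "it quantizes the base model to four bits",
    "next we cover communication with clients",
    "write a short weekly status update",
    "keep meetings focused and brief",
    "thanks for watching see you next time",
]
query = "fine tune a model on a single gpu"

# Score the query against one vector for the whole transcript ...
whole_score = cosine(embed(query), embed(" ".join(transcript)))
# ... and against one vector per two-line chunk, keeping the best match.
chunk_scores = [cosine(embed(query), embed(c)) for c in chunk(transcript, size=2)]
best_chunk_score = max(chunk_scores)

print(f"whole transcript: {whole_score:.2f}, best chunk: {best_chunk_score:.2f}")
```

In this toy setup the relevant chunk scores higher than the diluted whole-transcript vector. Whether the same holds with a real embedding model and real transcripts is exactly what has to be tested.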
More on Full Stack Data Science 👇 👉 Series Playlist: ua-cam.com/play/PLz-ep5RbHosWmAt-AMK0MBgh3GeSvbCmL.html 💻 Example Code: github.com/ShawhinT/UA-cam-Blog/tree/main/full-stack-data-science/data-science
5:01 Why is "X ← Z1 → Z3 ← Z2→ Y" also a back door path, as Z3 doesn't point to Z2?
Great question. This comes down to the definition of a back-door path, which is any path starting with an arrow pointing to X and ending with an arrow pointing to Y. All other arrowheads are irrelevant.
@@ShawhinTalebi Thanks, Shawhin Note for myself: As long as it's a path with arrow into X and Y, the direction of any arrows existing between them doesn't matter.
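The definition given in this thread can be written as a small check. This is a sketch only: the function name `is_back_door_path` is made up for illustration, the edge set encodes just the path from the question, and the direct X → Y edge is an assumption added so there is a path to contrast against.

```python
def is_back_door_path(path, edges):
    """Check a path against the definition worded in the reply above.

    path:  list of nodes, starting at X and ending at Y
    edges: set of (parent, child) pairs describing the DAG
    """
    if len(path) < 2:
        return False
    # Every consecutive pair must be joined by an edge, in either direction.
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges and (b, a) not in edges:
            return False
    starts_into_x = (path[1], path[0]) in edges   # first arrow points into X
    ends_into_y = (path[-2], path[-1]) in edges   # last arrow points into Y
    return starts_into_x and ends_into_y


# Edges for X <- Z1 -> Z3 <- Z2 -> Y, plus an assumed direct edge X -> Y.
edges = {("Z1", "X"), ("Z1", "Z3"), ("Z2", "Z3"), ("Z2", "Y"), ("X", "Y")}

questioned_path = ["X", "Z1", "Z3", "Z2", "Y"]
direct_path = ["X", "Y"]

print(is_back_door_path(questioned_path, edges))
print(is_back_door_path(direct_path, edges))
```

The direction of the arrows around Z3 never enters the check, which is the point of the reply: only the arrowheads at the two ends matter for the definition.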
Why no pause between sentences? Annoying to listen to. But great content, thanks!
That's good feedback. I like to minimize the amount of gaps, but I may have gone overboard here 😅
Your style of conveying information is wonderful. Good luck to you
My friends and I are working on a graduation project, which is the process of creating summaries of Arabic research papers via Fine Tuning AraBERT model. But we still don't understand how, can you guide us?
Happy to help however I can. Feel free to set up office hours here: calendly.com/shawhintalebi/office-hours
I'm clearly missing some fundamental connective tissue and I wish I could figure out what it was... I don't know what y and Fs are from audioread(). I don't know how to construct them from a spreadsheet of timestamps & values like "2024-05-01, 72 ; 2024-05-02 80 ...". I really, really, really wish I could find an example somewhere that took a two-column spreadsheet of time-series values that had nothing to do with sound (like the weather report from the previous video) and walked through each step of this process, ending with a frequency analysis of that starting non-WAVE-file example data.
The example in the video linked below doesn't use MATLAB, but does show how to import csv data and apply FFT using Python. ua-cam.com/video/-5c1KO-JF_s/v-deo.htmlsi=Q5mHSlp40OdXwmZO
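For the spreadsheet case described in the question, `y` is just the column of values and `Fs` is how many samples are taken per unit of time (1 per day for daily readings). The sketch below uses only the Python standard library and a naive DFT instead of an FFT routine, which is fine for a few dozen rows. The CSV contents are synthetic and hypothetical (a 7-day cycle around 72), and the column names `date` and `value` are assumptions.

```python
import cmath
import csv
import io
import math
from datetime import date, timedelta

# Synthetic two-column "spreadsheet": 28 daily readings with a weekly cycle.
start = date(2024, 5, 1)
rows = [(start + timedelta(days=i), 72 + 5 * math.sin(2 * math.pi * i / 7))
        for i in range(28)]
csv_text = "date,value\n" + "\n".join(f"{d.isoformat()},{v:.4f}" for d, v in rows)


def load_series(text):
    """Read (date, value) rows from CSV text into two lists."""
    dates, values = [], []
    for row in csv.DictReader(io.StringIO(text)):
        dates.append(date.fromisoformat(row["date"]))
        values.append(float(row["value"]))
    return dates, values


def dft_magnitudes(y):
    """Magnitudes of the DFT of y (mean removed) for bins 0..n/2."""
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    return [abs(sum(centered[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


dates, y = load_series(csv_text)           # y: the sampled values
step_days = (dates[1] - dates[0]).days     # spacing between samples
fs = 1 / step_days                         # Fs: samples per day

mags = dft_magnitudes(y)
k_peak = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the zero-frequency bin
dominant_freq = k_peak * fs / len(y)       # cycles per day
dominant_period_days = 1 / dominant_freq

print(f"dominant period: {dominant_period_days:.1f} days")
```

With real data, the same steps apply as long as the timestamps are evenly spaced; uneven spacing would need resampling first.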
Why is the image not visible in my signature, and why do I always get a "signature too long" message?
Image might be too big. Maybe try adjusting the size.
Can't causality be bidirectional? For e.g. X is influencing Y and Y is also influencing X.
For sure. The framework I discuss here, however, focuses on causal relationships that can be represented by a DAG i.e. unidirectional effects.
@@ShawhinTalebi thank you so much sir
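To see why a two-way effect falls outside the DAG framework mentioned in the reply, a graph can be checked for cycles. This is a minimal sketch using Python's standard `graphlib` module; the helper name `is_dag` and the two example graphs are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def is_dag(parents):
    """Return True if the graph has no directed cycle.

    `parents` maps each node to the set of nodes with an arrow into it.
    """
    try:
        # A topological order exists only when there are no cycles.
        tuple(TopologicalSorter(parents).static_order())
        return True
    except CycleError:
        return False


one_way = {"Y": {"X"}}                 # X -> Y
two_way = {"Y": {"X"}, "X": {"Y"}}     # X -> Y and Y -> X

print(is_dag(one_way))
print(is_dag(two_way))
```

The second graph contains the cycle X → Y → X, so it cannot be represented as a DAG.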
this was amazing ! thank you. Straight to the point. After looking many places, this is the best!
Great to hear :)
Now why are you showing how to use ChatGpt? Why not using Hugging face models? It's too hard for you?
I use HF models in the next videos of this series: ua-cam.com/play/PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0.html
☹☹🥱🥱 Do you have any tutorial about how to fine tune a conversational model? Sentiment analysis is just text classification which is so simple and boring 🥱🥱🥱🥱
In the video linked below I show how to fine-tune a model to respond to YouTube comments. ua-cam.com/video/XpoKB3usmKc/v-deo.htmlsi=e0AC7J4NeMiZRlCX
GREAT video - fyi.... it can be very tricky and time consuming. Especially the little icons with links and overall spacing.
Such an underrated channel. Keep up the good work mate 👌🙌
Your slides are very good could you also share them please
Thanks! Slides are available here: github.com/ShawhinT/UA-cam-Blog/tree/main/LLMs/_slides
Thank you for this video. I am a biostatistician interested in data science. I have a master's degree in machine learning but am not working on a data science project yet, except for personal projects in my free time. I have experience in data engineering (mainly structured data: relational databases) and some knowledge of machine learning. I participated once in a Kaggle competition. I have already made an R Shiny app and dockerized it. I am interested in deploying an ML model just for fun. What framework would you recommend first? I had begun learning Django, but I am wondering whether I should go for FastAPI, Streamlit, or Flask instead.
What you need to learn depends on what your goals are. If you want to get a data science job, you may not need to learn any of these skills. If you want to do more ML engineering, learning FastAPI or Flask might help. If you want to freelance and deliver simple UIs for your models, maybe you could learn Streamlit or Gradio. I don't know many (if any) data scientists using Django. Hope that helps!
Great videos.. do you have something similar for AWS bedrock?
I do not but that's a great topic for a future video :)
Your acting is ultimate at 1.15 min :)
Thank you 😂😂
Looks like different algorithms give different graphs. How do we validate them? How do we decide which is best?
The simplest way is via expert domain knowledge. A more objective (and time-consuming) way would be to generate predictions from the graphs and test them against experiments.
@@ShawhinTalebi Thanks
9:40 Average Causal Effect is 0.2 = having a degree increases the chance of earning over 50k
Confounder: Age
Treatment: HavingDegreeOrNot
Response: EarnOver50kOrNot
Confounder: OverallScore
Treatment: HaveSolutionOrNot
Response: LocalScore
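To make the note above concrete, here is a sketch of one standard way to get an average causal effect when a single confounder is observed: average the treated-minus-untreated difference within each confounder group, weighted by group size. The counts are hypothetical and were chosen so the result comes out near 0.2; this is not necessarily the estimator used in the video.

```python
# Hypothetical counts: (age_group, has_degree) -> (people, people earning over 50k)
counts = {
    ("younger", 1): (40, 16),
    ("younger", 0): (60, 12),
    ("older", 1): (60, 42),
    ("older", 0): (40, 20),
}


def adjusted_ace(counts):
    """Average causal effect, adjusting for the age group (the confounder)."""
    total = sum(n for n, _ in counts.values())
    strata = {z for z, _ in counts}
    ace = 0.0
    for z in strata:
        n1, y1 = counts[(z, 1)]     # treated: has a degree
        n0, y0 = counts[(z, 0)]     # untreated: no degree
        weight = (n1 + n0) / total  # share of the population in this group
        ace += weight * (y1 / n1 - y0 / n0)
    return ace


def naive_difference(counts):
    """Treated minus untreated outcome rate, ignoring the confounder."""
    n1 = sum(n for (_, t), (n, _) in counts.items() if t == 1)
    y1 = sum(y for (_, t), (_, y) in counts.items() if t == 1)
    n0 = sum(n for (_, t), (n, _) in counts.items() if t == 0)
    y0 = sum(y for (_, t), (_, y) in counts.items() if t == 0)
    return y1 / n1 - y0 / n0


print(f"adjusted ACE: {adjusted_ace(counts):.2f}")
print(f"naive difference: {naive_difference(counts):.2f}")
```

In this made-up data the naive comparison overstates the effect because older people are both more likely to have a degree and more likely to earn over 50k.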
wow, you are the genius of explaining super hard math concept into layman understandable terms with good visual representation. Keep it coming.
How would you automate the entire process?
He can't explain that. I'd check out Paul Iutzin, Nana Janashia, and others for real explanations. No one scrapes YouTube stuff directly.
This is where an orchestration tool like Airflow can help. I'll get to this in an upcoming video on ML engineering.
Not sure what you mean. But Paul and Nana are great!
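The core job an orchestration tool automates is running pipeline steps in dependency order. The sketch below shows that idea with Python's standard `graphlib` module only. It is not Airflow code, and the task names, the `ctx` dictionary, and the sample transcripts are all hypothetical.

```python
from graphlib import TopologicalSorter


def extract(ctx):
    # Stand-in for pulling raw transcripts from an API.
    ctx["raw"] = ["  Raw Transcript One ", "raw transcript two"]


def transform(ctx):
    # Stand-in for cleaning: trim whitespace and lowercase.
    ctx["clean"] = [t.strip().lower() for t in ctx["raw"]]


def load(ctx):
    # Stand-in for writing to storage: just record how many rows were handled.
    ctx["rows_loaded"] = len(ctx["clean"])


TASKS = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_pipeline():
    """Run every task once, in an order that respects the dependencies."""
    ctx = {}
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    for name in order:
        TASKS[name](ctx)
    return order, ctx


order, ctx = run_pipeline()
print(order, ctx["rows_loaded"])
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step.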
Loving the series so far! I am 2 videos in and I feel like you are balancing very well between making these tutorials beginner-friendly and not giving just superficial knowledge.
cool...
Awesome Shaw, thanks heaps. The signature apps can be quite limiting and if going pro, bloody expensive. Cheers mate. Shaunie
Thank you so much for that video. Where did you get the training loss metrics from? In the console and in trainer_state.json, I only see evaluation metrics.
Great question. The training loss is predefined as a property of the base model, so no need to redefine that explicitly.
Build your team. Fun creative. Would you be interested in going the main team. Judging? 1. CodeCraft Duel: Super Agent Showdown 2. Pixel Pioneers: Super Agent AI Clash 3. Digital Duel: LLM Super Agents Battle 4. Byte Battle Royale: Dueling LLM Agents 5. AI Code Clash: Super Agent Showdown 6. CodeCraft Combat: Super Agent Edition 7. Digital Duel: Super Agent AI Battle 8. Pixel Pioneers: LLM Super Agent Showdown 9. Byte Battle Royale: Super Agent AI Combat 10. AI Code Clash: Dueling Super Agents Edition
Nice video. Any ideas for doing this on PowerPoints? I want to build a kind of knowledge base from previous projects, but the graphics are a problem. Even GPT4V is not always interpreting them correctly. 😢
If GPT4V is having issues you may need to either 1) wait for better models to come out or 2) parse the knowledge from the PPT slides in a more clever way. Feel free to book office hours if you want to dig into it a bit more: calendly.com/shawhintalebi/office-hours
Thank you so much. Becoming a fan of yours! Please do a video on RAG with LlamaIndex + Llama3 if it's free and not paid.
Great suggestion. That's a good excuse to try out Llama3 :)
Great video and explanation! Thanks a lot. For the code, have you tried to use:

import torch
from transformers import BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

and then add that as the quantization config when loading the model? This would include the other aspects from the QLoRA paper, no?
Thanks for sharing! I'll need to try that out. I remember running into issues when trying this on my first pass.
You did such a great job explaining the data pipeline and gave a great example. Subscribed. Can't wait to see more vids
Great to hear. Thanks for the sub!
I'm really grateful for your work. You really helped me when I had no one to ask.
Best Video. The mix btw theory and hands on practice is genius!
Thanks, glad it was helpful :)