Transformers, explained: Understand the model behind GPT, BERT, and T5

  • Published 2024. 04. 27.
  • Dale’s Blog → goo.gle/3xOeWoK
    Classify text with BERT → goo.gle/3AUB431
    Over the past five years, Transformers, a neural network architecture, have completely transformed state-of-the-art natural language processing. Want to translate text with machine learning? Curious how an ML model could write a poem or an op-ed? Transformers can do it all. In this episode of Making with ML, Dale Markowitz explains what transformers are, how they work, and why they’re so impactful. Watch to learn how you can start using transformers in your app!
    Chapters:
    0:00 - Intro
    0:51 - What are transformers?
    3:18 - How do transformers work?
    7:41 - How are transformers used?
    8:35 - Getting started with transformers
    Watch more episodes of Making with Machine Learning → goo.gle/2YysJRY
    Subscribe to Google Cloud Tech → goo.gle/GoogleCloudTech
    #MakingwithMachineLearning #MakingwithML
    product: Cloud - General; fullname: Dale Markowitz; re_ty: Publish;
  • Science & Technology

Comments • 350

  • @Omikoshi78
    @Omikoshi78 1 year ago +56

    The ability to break down a complex topic is such an underrated superpower. Amazing job.

  • @robchr
    @robchr 2 years ago +215

    Transformers! More than meets the eye.

  • @rohanchess8332
    @rohanchess8332 10 months ago +43

    How did you condense so many pieces of information in such a short time? This video is on a next level, I loved it!

  • @tongluo9860
    @tongluo9860 1 year ago +219

    Great explanation of the key concept of position encoding and self attention. Amazing you get the gist covered in less than 10 minutes.

    • @patpearce8221
      @patpearce8221 1 year ago +1

      @Dino Sauro tell me more...

    • @patpearce8221
      @patpearce8221 1 year ago

      @Dino Sauro thanks for the heads up

    • @an-dr6eu
      @an-dr6eu 1 year ago +3

      She has one of the wealthiest companies on earth providing her resources. First-hand access to engineers, researchers, top-notch communicators and marketing employees.

    • @michaellavelle7354
      @michaellavelle7354 11 months ago +2

      @an-dr6eu True, but this young lady talks a mile a minute from memory. She knows it cold regardless of the resources at Google.

  • @ansumansamal3767
    @ansumansamal3767 2 years ago +205

    Where is optimus prime?

  • @dpj670
    @dpj670 1 year ago +9

    This is awesome. This has been one of the best overall breakdowns I've found. Thank you!!

  • @dylan_curious
    @dylan_curious 1 year ago +15

    This is such an informative video about transformers in machine learning! It's amazing how a type of neural network architecture can do so much, from translating text to generating computer code. I appreciate the clear explanations of the challenges with using recurrent neural networks for language analysis, and how transformers have overcome these limitations through innovations like positional encodings and self-attention. It's also fascinating to hear about BERT, a popular transformer-based model that has become a versatile tool for natural language processing in many different applications. The tips on where to find pretrained transformer models and the popular transformers Python library are super helpful for anyone looking to start using transformers in their own app. Thanks for sharing this video!

  • @rajqsl5525
    @rajqsl5525 4 months ago +1

    You have the gift of making things simple to understand. Keep up the good work 🙏

  • @maayansharon280
    @maayansharon280 1 year ago +21

    This is a GREAT explanation! Please lower the background music next time; it could really help. Thanks again! Awesome video.

  • @luis96xd
    @luis96xd 1 year ago +5

    Amazing video! Nice explanation and examples 😄👍
    I would like to see more videos like this, and practical ones.

  • @PaperTools
    @PaperTools 1 year ago +26

    Dale you are so good at explaining this tech, thank you!

  • @erikengheim1106
    @erikengheim1106 1 month ago +1

    Thanks, you did a great job. I already spent some time looking at different videos to capture the high-level idea of what transformers are about, and yours is the clearest explanation. I actually do have an educational background in neural networks but don't go around remembering every detail or the state of the art today, so somebody removing all the unnecessary technical details like you did here is very useful.

  • @trushatalati5596
    @trushatalati5596 1 year ago +7

    This is a really awesome video! Thank you so much for simplifying the concepts.

  • @noureldinosamas2978
    @noureldinosamas2978 1 year ago +166

    Amazing video! 🎉 You explained the difficult concepts of Transformers so clearly and made them easy to understand. Thanks for all your hard work!🙌👍

    • @pumbo_nv
      @pumbo_nv 9 months ago +4

      Are you serious? The concepts were not really explained. Just a summary of what they do but not how they work behind the scenes.

    • @axscs1178
      @axscs1178 3 months ago

      No.

  • @Jewish5783
    @Jewish5783 1 year ago +1

    i really enjoyed the concepts you explained. simple to understand

  • @mfatal
    @mfatal 1 year ago +4

    Love the content and thanks for the great video! (one thing that might help is lower the background music a bit, I found myself stopping the video because I thought another app was playing music)

  • @bondsmagi
    @bondsmagi 2 years ago +68

    Love how you simplified it. Thank you

    • @luxraider5384
      @luxraider5384 1 year ago

      It's so simplified that you can't understand anything

  • @reddyvarinaresh7924

    I loved it. Very simple, clear explanation.

  • @shravanacharya4376

    So easy and clear to understand. Thanks

  • @TallesAiran
    @TallesAiran 1 year ago +6

    I love how you simplify something so complex. Thank you so much, Dale, the explanation was perfect.

  • @MaxKar97
    @MaxKar97 17 days ago

    Nice amount of info imparted in this video. Very clear info on what Transformers are and what made them so great.

  • @SeanTechStories
    @SeanTechStories 1 year ago +1

    That's a really good high-level explanation!

  • @user-wr4yl7tx3w
    @user-wr4yl7tx3w 1 year ago +4

    Wow, this is so well explained.

  • @todayu
    @todayu 1 year ago +1

    This was a really, really awesome breakdown 👏🏾

  • @JohnCorrUK
    @JohnCorrUK 1 year ago +1

    Excellent presentation and explanation of concepts

  • @rembautimes8808
    @rembautimes8808 2 months ago

    This is a very well produced video. Credits to the presenter and those involved in production with the graphics

  • @touchwithbabu
    @touchwithbabu 1 year ago

    Fantastic! Thanks for simplifying the concept.

  • @JayantKochhar
    @JayantKochhar 1 year ago

    Positional Encoding, Attention and Self Attention. That's it! Really well summarized.

  • @junepark1003
    @junepark1003 4 months ago

    This is one of the best vids I've watched on this topic!

  • @CarlosRodriguez-mv8qi

    Charm, intelligence and clarity! Thanks!

  • @Daniel-iy1ed
    @Daniel-iy1ed 1 year ago

    Thank you so much. I really needed this video, other videos were just confusing

  • @labsanta
    @labsanta 1 year ago +48

    Takeaways:
    A transformer is a type of neural network architecture that is used in natural language processing. Unlike recurrent neural networks (RNNs), which analyze language by processing words one at a time in sequential order, transformers use a combination of positional encodings, attention, and self-attention to efficiently process and analyze large sequences of text.
    Key terms: neural networks, convolutional neural networks (for image analysis), recurrent neural networks (RNNs), positional encodings, attention, self-attention.
    Neural networks: A type of model used for analyzing complicated data, such as images, videos, audio, and text.
    Convolutional neural networks: A type of neural network designed for image analysis.
    Recurrent neural networks (RNNs): A type of neural network used for text analysis that processes words one at a time in sequential order.
    Positional encodings: A method of storing information about word order in the data itself, rather than in the structure of the network.
    Attention: A mechanism used in neural networks to selectively focus on parts of the input.
    Self-attention: A type of attention applied within a single input sequence, so that the representation of each word is computed by weighing the other words in the same input.
    Neural networks are like a computerized version of a human brain, that uses algorithms to analyze complex data.
    Convolutional neural networks are used for tasks like identifying objects in photos, similar to how a human brain processes vision.
    Recurrent neural networks are used for text analysis, and are like a machine trying to understand the meaning of a sentence in the same order as a human would.
    Positional encodings are like adding a number to each word in a sentence to remember its order, like indexing a book.
    Attention is like a spotlight that focuses on specific parts of the input, like a person paying attention to certain details in a conversation.
    Self-attention is like being able to pay attention to multiple parts of the input at the same time, like listening to multiple conversations at once.
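    The two mechanisms summarized above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code of any model mentioned in the video: the sinusoidal encoding follows the scheme from the original Transformer paper (the video itself only describes "adding a number to each word"), and the function names, the toy sizes, and the random stand-in embeddings and weight matrices are all assumptions made for the example.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings, one row per position (assumed scheme)."""
    positions = np.arange(seq_len)[:, None]   # shape (seq_len, 1)
    dims = np.arange(d_model)[None, :]        # shape (1, d_model)
    # Each (even, odd) pair of dimensions shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates          # shape (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions use cosine
    return pe

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # how strongly each word attends to each other word
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

# Toy setup: 4 "words", 8-dimensional embeddings (random stand-ins, not real embeddings).
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
embeddings = rng.normal(size=(seq_len, d_model))
x = embeddings + positional_encoding(seq_len, d_model)  # word order is now stored in the data itself
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
output, weights = self_attention(x, w_q, w_k, w_v)
print(weights.round(2))  # 4x4 matrix of attention weights
```

    Row i of the printed matrix shows how much word i attends to every word in the sequence, including itself. In a trained model the weight matrices are learned rather than random.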

    • @an-dr6eu
      @an-dr6eu 1 year ago

      Great, you learned how to copy paste

    • @yumyum_99
      @yumyum_99 1 year ago +10

      @an-dr6eu first step to becoming a programmer

    • @JohnCorrUK
      @JohnCorrUK 1 year ago +3

      @an-dr6eu your comment comes across as somewhat 'catty' 😢

  • @sun-ship
    @sun-ship 1 month ago

    Easiest-to-understand explanation I've heard so far

  • @walterppk1989
    @walterppk1989 2 years ago +21

    Hi Google! First of all, thank you for this wonderful video. I'm working on a multiclass (single-label) supervised learning problem that uses BERT for transfer learning. I've got about 10 classes and a couple hundred thousand examples. Any tips on best practices (which BERT variants to use, what order of magnitude of dropout to use, if any)? I know I could do a hyperparameter search, but that'd probably cost more time and money than I'm comfortable with (for a prototype), so I'm looking to make the most of my local Nvidia 3080.

  • @barbara1943
    @barbara1943 4 months ago

    Very interesting, informative, this added perspective to a hyped-up landscape. I'll admit, I'm new to this, but when I hear "pretrained transformer" I didn't even think about BERT. I appreciate getting the view from 10,000 feet.

  • @bingochipspass08
    @bingochipspass08 2 years ago

    Very well explained.. This really is a high level view of what Transformers are, but it's probably enough to just get your toes wet in the field!

  • @RobShuttleworth
    @RobShuttleworth 2 years ago +9

    The visuals are very helpful. Thanks.

  • @akashrawat217
    @akashrawat217 1 year ago

    Such a simple yet revolutionary 💡idea

  • @hallucinogen22
    @hallucinogen22 3 months ago

    thank you! I'm just starting to learn about gpt and this was quite helpful, though I will have to watch it again :)

  • @sorbethyena3828
    @sorbethyena3828 2 years ago +2

    Informative! Thank you

  • @DeanRGAnderson
    @DeanRGAnderson 1 year ago +1

    This is an excellent video introduction for transformers.

  • @EranM
    @EranM 1 year ago +4

    I knew little about transformers before this video. I know little about transformers after this video. But I guess in order to know some, we'll need a 2-3 hour video.

  • @josedamiansanchez9874

    Amazing explanation!

  • @danielchen2616
    @danielchen2616 1 year ago

    Thanks for your hard work. This video is very helpful!!!

  • @harshadfx
    @harshadfx 8 months ago +1

    I have more respect for Google after watching this video. Not only did they provide their engineers with the funding to do research, but they also let other companies like OpenAI use said research. And they are opening up the knowledge for the general public with these video series.

  • @shailendraburman
    @shailendraburman 2 years ago +1

    Simply loved it!

  • @NicolasHart
    @NicolasHart 3 months ago

    so super helpful for my thesis, thank u

  • @bobdillan5761
    @bobdillan5761 1 year ago +1

    super well done. Thanks for this!

  • @jokeysmurf123
    @jokeysmurf123 1 month ago

    wow, what a great summary! thanks!!!

  • @rodeoswing
    @rodeoswing 5 months ago +1

    Great video for people who are curious but don’t really want to (or can’t) understand how transformers actually work.

  • @Mariouigi
    @Mariouigi 1 year ago

    crazy how things have changed so much

  • @mohankiranp
    @mohankiranp 6 months ago

    Very well explained. This video is a must-watch for anyone who wants to demystify the latest LLM technology. Wondering if this could be made into a more generic video with a quick high-level intro on neural networks for those who aren't in the field. I bet there are millions out there who want to get a basic understanding of how ChatGPT/Bard/Claude work without an in-depth technical deep dive.

  • @ZeeshanAli-ck3ue

    very well explained.👍

  • @xiongjiedai8405

    Very good lecture, thanks!

  • @ayo4757
    @ayo4757 1 year ago +1

    Soo cool! Great work

  • @anshulchaurasia8762

    Simplest Explanation ever

  • @JG27Korny
    @JG27Korny 4 months ago

    Very informative video. Thank you!

  • @janeerin6918
    @janeerin6918 6 months ago +1

    OMG the BEST transformers video EVER!

  • @zacharythomas5046

    Thanks! This is a great intro video!

  • @ganbade200
    @ganbade200 2 years ago +6

    You have no idea how much time I potentially have saved just by reading your blog and watching this video to get me up to speed quickly on this. "Liked" this video. Thanks

  • @VaibhavPatil-rx7pc

    The most excellent explanation I have ever seen. Recommending this link to everyone.

  • @myt97
    @myt97 1 year ago

    Great video. Thank you!

  • @AleksandarKamburov

    Positional encoding = time, attention = context, self attention = thumbprint (knowledge)... looks like a good start for AGI 😀

  • @shivangsharma599

    Super Explanation!!

  • @takeizy
    @takeizy 1 year ago

    Very impressive video. Thanks for the way you shared information via this video.
    Regarding 05:05 in your video timeline: how did you create such a video, please?

  • @gammacubed
    @gammacubed 4 months ago

    Amazing video, thank you so much!

  • @probablygrady
    @probablygrady 10 months ago

    phenomenal video

  • @amimegh
    @amimegh 1 year ago

    NICE SUPERB PRESENTATION

  • @k-c
    @k-c 1 year ago +1

    This is probably the first time after the 90's I have the same "internet wild west" kinda feeling. The genie is out of the bottle baby.

  • @robertabitbol6454
    @robertabitbol6454 1 year ago +1

    You have actually given the BEST explanation of Neural Machine Translation that I have read so far, but you are missing a few elements.

    • @robertabitbol6454
      @robertabitbol6454 1 year ago +1

      But your explanations, your analyses and your delivery are excellent. You're definitely a great communicator and teacher.

    • @robertabitbol6454
      @robertabitbol6454 1 year ago

      Actually Google and others have an algo they're not interested in sharing, and I pretty much know what it is. I am working with my programmer on the coding of my new app, the revolutionary Universal Sentence Builder and the Universal Dictionary. I keep adding and changing stuff to simplify the concept, and I keep pushing the programming of my Sentence Analyser app to a later date. It is, like most of my apps, a simple (and brilliant) concept coded with very few lines of code.

    • @robertabitbol6454
      @robertabitbol6454 1 year ago

      You know, Alfred Hitchcock always adapted his script to the screen without changing anything, not even a comma, while Francis Ford Coppola (The Godfather) did the opposite: they say that his script was like a newspaper that had new content every day. Well, I am more like Coppola with my apps. I change stuff all the time and I usually make my programmers go crazy. It's a good sign. :-) Mind you, I don't know if one can do like Hitchcock with an app and come up with a definitive version once and for all. This would be quite an achievement!

    • @robertabitbol6454
      @robertabitbol6454 1 year ago

      In the case of my Universal Sentence Builder, the main task was to process the data entered by the user, and we've been at it since July 2022. :-) Either I am dumb or it is a complex task. Actually it is the latter, for I started with French, this language being the most complex in the world. The good news is I am sure I will be imitated, but you can rest assured that my imitators will also have a jolly hard time with French :-)

  • @RonaldMorrissetteJr
    @RonaldMorrissetteJr 11 months ago +1

    When I saw this title, I was hoping to better understand the mathematical workings of transformers such as matrices and the like. Maybe you could do a follow-up video explaining mathematically how transformers work.
    thank you for your time

  • @maxkhan4485
    @maxkhan4485 1 year ago

    Thanks! Great video.

  • @user-or7ji5hv8y
    @user-or7ji5hv8y 2 years ago +2

    Great video.

  • @intekhabsayed4316
    @intekhabsayed4316 1 month ago

    Good(Pro) Explanation.

  • @theguythatcoment
    @theguythatcoment 1 year ago +2

    do transformers learn the internal representation one language at a time or all of them at the same time? I remember that Chomsky said that there's no underlying structure to language and that for every rule you try to make you'll always find an edge case that contradicts the rule.

  • @gerardovalencia805
    @gerardovalencia805 2 years ago +2

    Thank you

  • @hom01
    @hom01 1 year ago

    this is brilliant

  • @arpitrawat1203
    @arpitrawat1203 2 years ago +6

    Very well explained. Thank you.

  • @badrinair
    @badrinair 1 year ago

    Thank you for sharing

  • @TechNewsReviews
    @TechNewsReviews 7 months ago

    woww, she's good at explaining things

  • @ludologian
    @ludologian 11 months ago

    When I was a kid, I knew the trouble with translation was due to translating words literally, without contextual or sequential awareness. I knew it was important to distinguish between synonyms. I imagined a button that generates the translation output; then you could highlight the words that don't make sense or need improvement and regenerate the translation. This type of NLP probably existed before I programmed my first hello world (15+ years ago)!

  • @massimobuonaiuto8753

    great video, thanks!

  • @GubeTube19
    @GubeTube19 1 year ago

    10/10. Very helpful

  • @Christakxst
    @Christakxst 1 year ago

    Thanks, that was very interesting

  • @EduardoOviedoBlanco

    Great content 👍

  • @wiclcoocoo
    @wiclcoocoo 25 days ago

    a very nice video. thanks

  • @KulbirAhluwalia
    @KulbirAhluwalia 1 year ago +3

    From 5:28, shouldn't it be the following:
    "when the model outputs the word “économique,” it’s attending heavily to both the input words “European” and “Economic.” "?
    For européenne, I see that it is attending only to European. Please let me know if I am missing something here. Thanks for the great video.

  • @MichaelToop
    @MichaelToop 1 year ago

    Great video. Thx.

  • @cassianocominetti7784
    @cassianocominetti7784 3 months ago

    Amazing!

  • @WalterReade
    @WalterReade 2 years ago +4

    Nicely done. Very helpful. Thanks!

  • @hughesadam87
    @hughesadam87 1 year ago

    Thank you!

  • @aGj2fiebP3ekso7wQpnd1Lhd

    Fantastic video

  • @IceMetalPunk
    @IceMetalPunk 1 year ago +16

    The invention of transformers seems to have jump-started a revolutionary acceleration in machine learning! Between the models you mentioned here, plus the way transformers are combined with other network architectures in DALL-E 2, OpenAI Jukebox, PaLM, Chinchilla/Flamingo, Gato -- it seems like adding a transformer to any model produces bleeding-edge, state-of-the-art-or-better performance on basically any task.
    Barring any major architecture innovations in the future, I wonder if transformers end up being the key we need to reach human levels of broad-range performance after all 🤔

    • @IceMetalPunk
      @IceMetalPunk 1 year ago +2

      @Dino Sauro They're certainly not dead, since they're still being incorporated into the bleeding edge AIs. But technology is always evolving, building upon one idea to create the next. If you're hoping for a "final architecture" that will be the best and never replaced by anything else, you're out of luck.
      While I respect Professor Marcus, his ideas about the requirements for AGI strongly imply that intelligent design is required for true intelligence to emerge, and I think evolution contradicts that view.

    • @IceMetalPunk
      @IceMetalPunk 1 year ago +1

      @Dino Sauro Um... Okay, friend, whatever you say. Have a nice life.

    • @tanweeralam1650
      @tanweeralam1650 1 year ago

      I think you are right...we just saw its use in ChatGPT...and I think ChatGPT is just a glimpse of what future holds and how it will affect the IT, EV and Industrial Automation Industry.
      Am I right? You wanna add something to it?

    • @IceMetalPunk
      @IceMetalPunk 1 year ago +1

      @tanweeralam1650 I agree. ChatGPT, though, is really just GPT-3 with a larger input layer, and human-guided reinforcement learning on top of it. Which is a step in the right direction for sure, but not as huge a development as a lot of people are touting it to be.
      From what I can tell, there are three issues that need to be solved before transformer-based (or transformer-incorporating) AIs can reach truly human levels of intelligent behavior.
      (1) They need to be bigger. If we think of the model parameter size as analogous to brain synapses, there are about a quadrillion synapses in a human brain, which is orders of magnitude more than the biggest current transformers. For instance, the largest single transformer model is 207 billion parameters, and the largest transformer-incorporating language model is 1.75 trillion parameters. On the other hand, such models don't need to allocate parameters for things like body maintenance, reproduction, etc., so it's not a 1-to-1 correspondence, but I think it's a good estimate for the order of magnitude we need to reach before we get to human levels of sapience. That said, models keep getting bigger, so I have no doubt we'll achieve this within the next decade at most.
      (2) Multimodality is important. A lot of "common sense" understanding that AIs seem to lack can likely be attributed to their lack of variety in types of input they can learn from. If you only learn from text, it's a lot harder to learn what the described concepts actually *mean.* On the other hand, a model that can learn from text, images, video, audio, and other forms of data should be able to learn much more accurate representations of the world. And of course, there's a TON of research into multimodal learning right now, so we'll get there pretty soon, too, I think.
      (3) The third obstacle I think is the hardest: continual learning. (From what I can tell, by the way, "continual learning" is synonymous with "incremental online learning". Let me know if there are any important differences between the two.) An AI without this can learn from a *ton* of data, but once it does, it stops learning and everything it knows is set in stone. In effect, this means every interaction with such an AI "resets" it, and so you might get inconsistent behaviors as slightly different initial conditions of an interaction can lead to very different outputs when previous similar interactions are not incorporated into the model's weights (which, in this context, can be thought of as its "long term memory"). This also means the AIs can't form consistent opinions, since any opinion they might espouse in one conversation is immediately forgotten for the next.
      Continual learning techniques already exist for smaller networks, but they are not at all efficient enough to practically apply to these very large language models of many billions of parameters or more. Which is a shame, because I'd speculate that larger models would be less prone to retroactive interference -- "catastrophic forgetting" -- than smaller ones, if we could efficiently incrementally train them.
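      The size gap described in point (1) is simple arithmetic, sketched below. The figures are the commenter's own estimates as quoted in the comment and are not independently verified here; the variable names are hypothetical.

```python
import math

# Figures as quoted in the comment above (the commenter's estimates, not verified values).
human_synapses = 1e15             # "about a quadrillion synapses"
largest_single_model = 207e9      # "207 billion parameters"
largest_language_model = 1.75e12  # "1.75 trillion parameters"

gap_single = human_synapses / largest_single_model
gap_language = human_synapses / largest_language_model

for name, gap in [("single transformer", gap_single),
                  ("transformer-incorporating model", gap_language)]:
    # log10 of the ratio gives the gap in orders of magnitude.
    print(f"{name}: about {gap:,.0f}x smaller ({math.log10(gap):.1f} orders of magnitude)")
```

      On these numbers the gap is between roughly two and a half and four orders of magnitude, which matches the comment's "orders of magnitude" wording.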

    • @tanweeralam1650
      @tanweeralam1650 1 year ago

      @IceMetalPunk I did understand your first 2 points and agree with them...but I want to slightly differ on your 3rd point.
      I don't understand...Why would the AI stop learning?? Due to its storage space, processing power exhaustion, or for what reason? What you said may be a POSSIBILITY...But its other side also exists...it may just continue learning more and more and make its system better.
      To have human-like intelligence...I don't think it will achieve that in the next 30-40 yrs...beyond that timeline...I can't say. And frankly there is NO NEED to have AIs so advanced. Up to a certain extent...AIs should develop and humans MUST BE able to control them. Always.
      And can you say whether programs like ChatGPT (I mean its advanced form) will be able to replace search engines like Google in the future?? Also, how will AI/ML affect the IT industry as a whole, and also the EV and industrial automation industries (e.g. the industry where companies like Siemens and Honeywell operate)??

  • @younessnaim1849

    Beyond the great content and delivery, I loved your French accent ... ;)

  • @directorblue
    @directorblue 1 year ago

    Well done

  • @samsont81
    @samsont81 1 year ago

    You are amazing!

  • @softcoda
    @softcoda 1 year ago

    Wowww….thanks for clarifying my confusion.

  • @johnbarbuto5387
    @johnbarbuto5387 1 year ago +3

    An excellent video. I wonder if you can comment on "living the life" of a transformers user. For example, in another video by another YouTuber I heard the sentiment that being an AI person in this era means constant - really constant - study. That may not be the lifestyle that everybody wants to adopt. I'm a retired neurologist and vice president of the faculty club at my state university. What interests me these days is how students "should" be educated in this era. And, at the end of the day, one of the critical aspects of that is matching individual human brains - with their individual proclivities - with the endless career opportunities of this era. So, I'm trying to gather perspectives (aka "data") on that topic. Maybe you could make some kind of video about it. Please do!

    • @LimabeanStudios
      @LimabeanStudios 1 year ago

      I think the most important thing is that students are simply encouraged to use these tools. It's pretty hard to get a realistic grasp of the capabilities without really pushing the systems. The idea about needing to do constant research is interesting, and I think it's something that a person CAN do (the rest of my life probably lmao) but I think simply adopting the tools is all that will effectively matter. It's too early to be much more specific sadly. When it comes to younger education then we definitely need to be putting more focus on skills and behaviors instead of knowledge.

  • @Maisonier
    @Maisonier 1 year ago

    Amazing video, thank you ... can you use transformers to detect patterns in random data that is supposedly unpredictable, like weather or stocks?

    • @Happypast
      @Happypast 1 year ago

      the unpredictability of stuff like weather and stocks has to do with the fundamental underlying nature of those phenomena so I would bet no.

  • @amortalbeing
    @amortalbeing 5 months ago

    Thanks a lot.