CodeEmporium
  • 332 Videos
  • 6,880,069 views
Hyperparameters - EXPLAINED!
Let's talk about hyperparameters and how they are used in neural networks and deep learning!
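For a concrete picture before watching: hyperparameters are the settings you fix before training, as opposed to the parameters the model learns. A minimal sketch in plain Python; the values and the toy regression below are illustrative assumptions, not code from the video or the linked repository:

    # Hyperparameters: chosen by hand BEFORE training, not learned from data.
    learning_rate = 0.1   # step size for each gradient update (illustrative value)
    num_epochs = 50       # number of passes over the data (illustrative value)

    # Toy 1-D linear regression: learn w so that y ~= w * x.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [2.0, 4.0, 6.0, 8.0]  # generated with w = 2.0

    w = 0.0  # model parameter: learned, unlike the hyperparameters above
    for _ in range(num_epochs):
        # gradient of mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad

    print(f"learned w ~= {w:.3f}")  # approaches 2.0 when learning_rate is well chosen

Changing learning_rate from 0.1 to 0.5 makes this same loop diverge, which is exactly why hyperparameter choice matters.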
ABOUT ME
⭕ Subscribe: krplus.net/uCodeEmporium
📚 Medium Blog: medium.com/@dataemporium
💻 Github: github.com/ajhalthor
👔 LinkedIn: www.linkedin.com/in/ajay-halthor-477974bb/
RESOURCES
[1] Code for Deep Learning 101 playlist: github.com/ajhalthor/deep-learning-101
PLAYLISTS FROM MY CHANNEL
⭕ Deep Learning 101: krplus.net/p/PLTl9hO2Oobd_NwyY_PeSYrYfsvHZnHGPU.html
⭕ Natural Language Processing 101: krplus.net/p/PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE.html
⭕ Reinforcement Learning 101: krplus.net/p/PLTl9hO2Oobd9kS--NgVz0EPNyEmygV1Ha.html&si=AuThDZJwG19cgTA8
Views: 802

Videos

Transfer Learning - EXPLAINED!
2.5K views • 2 months ago
Embeddings - EXPLAINED!
3.8K views • 2 months ago
Building your first Neural Network
3.2K views • 3 months ago
Deep Q-Networks Explained!
12K views • 4 months ago
Q-learning - Explained!
8K views • 5 months ago
Bellman Equation - Explained!
10K views • 6 months ago
ChatGPT: Zero to Hero
3.7K views • 7 months ago
Llama - EXPLAINED!
22K views • 8 months ago
Convolution in NLP
4.1K views • 9 months ago
Word Embeddings - EXPLAINED!
11K views • 10 months ago

Comments

  • @hajrawaheed9636 · 21 hours ago

    Great work indeed. It helped clear up a lot of things, especially the part where softmax is used for the decoder output. So the first row will output the target language's first word. But in scenarios where two source words resonate with one target-language word, how is softmax handled there? Can you please help me figure this out?

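A note on the softmax question above: in a standard encoder-decoder translator, each decoder position emits one softmax over the whole target vocabulary; attention may blend several source words into that position's representation, but the output is still a single probability distribution. A minimal sketch with a made-up five-word vocabulary (all names and values here are illustrative):

    import numpy as np

    def softmax(logits):
        # subtract the max for numerical stability; the result sums to 1
        e = np.exp(logits - np.max(logits))
        return e / e.sum()

    # Hypothetical logits for ONE decoder position. Even if attention blended
    # two source words into this position, there is still exactly one softmax.
    vocab = ["<eos>", "ich", "bin", "ein", "student"]
    logits = np.array([0.1, 2.0, 0.3, 0.5, 1.2])

    probs = softmax(logits)
    print(dict(zip(vocab, probs.round(3))))  # greedy decoding picks the argmax
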
  • @Ha-mb4yy · 1 day ago

    Really good explanation

  • @punk3900 · 1 day ago

    Did you know at the time how revolutionary this would be?

  • @AakashKumarDhal

    Answer for Quiz 2 is Option 'B': Frank was updating Q-values based on observed rewards from simulated episodes.

  • @axelolafsson7312

    this video is great

  • @blackseastorm61 · 3 days ago

    Understood nothing about how this model works. Oversimplification and storytelling make it disconnected from how the real thing works. Now I know: an AE reduces the input data to a smaller vector, and a VAE can generate blurry images. What I don't know: what is happening to the input data and the dataset, and what this pooling intuition is for.

  • @melodyzhan1942 · 3 days ago

    Thank you so much for making such great videos. It really helps someone new to DS quickly understand all the concepts. I appreciate you explaining with actual code and going through each step!

  • @Userforeverneverever

    For the algo

  • @Tyokok · 3 days ago

    Dear Sir, if I may ask 2 questions here: 1) At 7:25, how did you remove y_i as if it's independent? y_i can have opposite signs, so how can it be removed like a 1? 2) At 7:58, in the matrix representation, why do you convert p(x_i) in a different way? Or does it really not matter, since you will substitute beta_i into the sigmoid function at each iteration? Many thanks!

  • @SkittlesWrap · 4 days ago

    Straight to the point. Nice and super clean explanation for non-linear activation functions. Thanks!

  • @yashwanths6529 · 4 days ago

    Thanks, a really helpful resource for me! Keep rocking, Ajay.

  • @PRUTHVIRAJRGEEB · 4 days ago

    This is exactly what I was looking for: an end-to-end explanation clearly showing the steps involved. Thanks a ton, man! ❤

  • @Abdullahkbc · 5 days ago

    Hey, I don't get what you mean at 6:29. Why do you convert every single character rather than each word? I think embeddings are for tokens/words rather than characters. Could you please make this clear?

  • @akshiti3402 · 5 days ago

    are you sure you're not gay?

  • @loplop88 · 6 days ago

    so underrated!

  • @matinfazel8240 · 6 days ago

    Thanks for the tutorial

  • @nemeziz_prime · 6 days ago

    Quiz 2: A, B, C

  • @danish5326 · 6 days ago

    Am I the only one to notice that Vsauce reference at 02:28?

  • @pohacurry · 7 days ago

    this aged well

  • @spiky8932 · 7 days ago

    I am just astonished, man. I am a 1st-year student from Bengaluru who stumbled upon your channel while learning about the AI buzzwords, and then I found out that you're a Kannadiga as well. Great, man! Although I am overwhelmed by the videos you make, I am just so happy that a guy from here can reach such great heights. You are truly an inspiration, brother!

  • @coderide · 7 days ago

    Poor man's 3blue1brown, but nice explanation ❤

  • @StudyGoalTensionEnjoylifelove

    Thank you, sir... a like from China.

  • @Abdullahkbc · 7 days ago

    I don't get how the token is selected in top-k sampling. Is it chosen randomly from among the top k?

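On the top-k question above: the usual procedure is to keep only the k highest-probability tokens, renormalize their probabilities so they sum to 1, and then sample from that truncated distribution, so the pick is random but probability-weighted. A minimal sketch (the distribution below is made up):

    import numpy as np

    def top_k_sample(probs, k, rng=np.random.default_rng(0)):
        top_ids = np.argsort(probs)[-k:]          # ids of the k most probable tokens
        top_probs = probs[top_ids]
        top_probs = top_probs / top_probs.sum()   # renormalize to sum to 1
        # weighted random draw WITHIN the top k; everything else has probability 0
        return int(rng.choice(top_ids, p=top_probs))

    probs = np.array([0.50, 0.20, 0.15, 0.10, 0.05])  # model's next-token distribution
    print(top_k_sample(probs, k=3))  # always one of tokens 0, 1, 2
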
  • @ouroboros7388 · 7 days ago

    Thanos snap hehe

  • @isaidhs · 7 days ago

    gold

  • @justchary · 9 days ago

    this is very good. thank you!

  • @ikhsansdq · 9 days ago

    A great and simple explanation; it's very helpful for me. But do you think this "multi-head attention" could be used for time-series forecasting? And if so, what type of attention would it be?

  • @ananyamishra382 · 9 days ago

    Could you also make a video about various ways to test the output of transformer models?

  • @FabioDBB · 10 days ago

    Great explanation sir! Thx a lot!

  • @sanjaisrao484 · 10 days ago

    Thank you for the amazing explanation.

  • @sneha_more · 11 days ago

    Great video!

  • @shivamsharma9206 · 11 days ago

    B

  • @adrielomalley · 11 days ago

    My Meta Ray-Bans

  • @altrastorique7877 · 12 days ago

    I have struggled to find a good explanation of transformers, and your videos are just amazing. Please keep releasing new content about AI.

  • @harshitdtu7479 · 12 days ago

    7:52

  • @user-kd7xd2gb5s · 12 days ago

    I love your shit, man. This was so useful I actually understood this ML shit, and now I can be Elon Musk up in this LLM shit.

  • @ghostrider9084 · 12 days ago

    Sir, are you a Kannadiga? Wow!! Do you work in the USA, sir, or are you pursuing a degree?

  • @abhinavraj1580 · 12 days ago

    Very useful video

  • @the-tankeur1982 · 13 days ago

    I hate you for making those noises. I want to learn; comedy is something I would pass on.

  • @zhezhe3351 · 13 days ago

    Good video! There is a small typo on the summary page about on-policy.

  • @arjunbali2079 · 13 days ago

    @codebasics and @CodeEmporium are the best channels to learn high-level concepts. They must collaborate!

  • @krishnavinukonda1882

    This is the best. Thanks!

  • @karannchew2534 · 14 days ago

    Xi ∈ ℝ^D: "Xi" represents a specific customer, where "i" is an index referring to a particular customer. "∈" denotes membership, meaning "Xi" belongs to, or is an element of, the set that follows. "ℝ^D" is the set of D-dimensional vectors of real numbers, where "D" is the dimensionality of the feature space. This indicates that each customer is represented as a vector of real numbers with D dimensions. Each dimension might correspond to a specific feature or attribute of the customer, such as age, income, spending habits, etc. So the statement "Xi ∈ ℝ^D" means that each customer "Xi" is represented as a vector of real numbers with D dimensions.

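To make the notation described above concrete, here is a customer vector with D = 3; the feature names and numbers are invented for illustration:

    import numpy as np

    # x_i in R^D with D = 3; hypothetical feature order: [age, annual income, monthly spend]
    x_i = np.array([34.0, 72_000.0, 410.0])

    print(x_i.shape)  # (3,) -- one real number per dimension, so D = 3
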
  • @ah_bb8267 · 14 days ago

    No way LSTM existed in 1991.

  • @shanli7257 · 14 days ago

    Fantastic video, Ajay! I just have two questions: 1. How many attention heads are optimal? For example, if there are 10 words in a sentence, is 10 a good number of attention heads? 2. Do more attention layers correspond to better performance?

  • @kiranbade9481 · 14 days ago

    well explained brother

  • @cormackjackson9442

    Such an awesome video! Can't believe I hadn't made the connection between ridge and Lagrangians; it literally has a lambda in it, lol!

    • @cormackjackson9442 · 15 days ago

      With the lasso intuition, in the stepwise function you get for theta, how do you get the conditions on the right, i.e. y_i < lambda/2? I thought that perhaps, instead of writing theta < 0, you are just using the implied relationship between y_i and lambda. E.g., if theta < 0, then |theta| = -theta, which after optimising gives theta = y - lambda/2, i.e. y = lambda/2 + theta; but then I get the opposite conditions from yours... i.e., since theta is negative in this case, wouldn't that give y = lambda/2 + theta < lambda/2?

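For reference on the lasso exchange above, the textbook one-dimensional soft-thresholding result, assuming the scalar objective (y - theta)^2 + lambda*|theta| (a 1/2 or 1/n scaling in the video would only rescale lambda):

    \min_{\theta}\ (y-\theta)^2 + \lambda|\theta|
    \quad\Longrightarrow\quad
    \hat{\theta}(y) =
    \begin{cases}
      y - \lambda/2 & \text{if } y > \lambda/2,\\
      0             & \text{if } |y| \le \lambda/2,\\
      y + \lambda/2 & \text{if } y < -\lambda/2.
    \end{cases}
    % For \theta > 0: |\theta| = \theta, so -2(y-\theta) + \lambda = 0 gives
    % \theta = y - \lambda/2, consistent with \theta > 0 only when y > \lambda/2.
    % For \theta < 0: |\theta| = -\theta, so -2(y-\theta) - \lambda = 0 gives
    % \theta = y + \lambda/2, consistent with \theta < 0 only when y < -\lambda/2.
    % The conditions on the right are consistency checks on y for each assumed
    % sign of \theta, which is where y > \lambda/2 (rather than < ) comes from.
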
  • @karannchew2534 · 15 days ago

    "Control other effect through randomisation"

  • @songjiangliu · 17 days ago

    very good quality video!

  • @newginsam670 · 17 days ago

    Bro, TBH, no words to appreciate such a well-structured video in such a short time, and the explanation was easily understandable even for people with less knowledge. Thanks for the video, man.