Run a ChatGPT-like AI on Your Laptop Using LLaMA and Alpaca

  • Published Mar 23, 2023
  • All the popular conversational models like ChatGPT, Bing, and Bard run in the cloud, in huge datacenters. However, thanks to new language models, it is possible to run a ChatGPT or Bard alternative on your laptop. No supercomputer needed. No huge GPU needed. Just your laptop! Is it any good? Let's find out.
    ---
    PHNX the super-slim smartphone cases: andauth.co/GetPHNX
    This is an affiliate link.
    llama.cpp: github.com/ggerganov/llama.cpp
    llama model download: github.com/shawwn/llama-dl
    alpaca.cpp: github.com/antimatter15/alpac...
    alpaca model download: github.com/tloen/alpaca-lora
    Stanford Alpaca: github.com/tatsu-lab/stanford...
    Twitter: / garyexplains
    Instagram: / garyexplains
    #garyexplains
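The setup the video walks through can be sketched as a handful of commands. This is a rough outline, assuming a Unix-like system with git and make installed; the commands and file names follow the alpaca.cpp README of the time and may have changed since:

```shell
# Fetch and build alpaca.cpp (the chat binary used in the video).
git clone https://github.com/antimatter15/alpaca.cpp
cd alpaca.cpp
make chat

# Put the 4-bit quantized 7B model next to the binary. At ~4 bits per
# parameter, 7e9 parameters come to roughly 3.5-4 GB on disk.
# (Download it separately; see the model links above.)
./chat -m ggml-alpaca-7b-q4.bin
```

The same pattern applies to llama.cpp, with the binary and model names swapped accordingly.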

Comments • 279

  • @powercore2000
    @powercore2000 1 year ago +16

    This is probably one of the best tutorials about AI I've seen. You broke down so many terms, and helped confirm things I researched or was curious about. Thanks so much!

    • @GaryExplains
      @GaryExplains  1 year ago

      Glad it helped! I have a few other related videos you might enjoy.

  • @SomeRandomPiggo
    @SomeRandomPiggo 1 year ago +40

    Found out about self-hosted LLaMA about a week ago. I know a lot of people won't appreciate it, but I think it's quite incredible that it can run on my 2017 i5 ThinkPad with decent performance, although the less memory you have, the shorter the generated text :(

  • @RupertBruce
    @RupertBruce 1 year ago +4

    Thank you for having the questions prepared! Copy/paste is so much quicker and quieter 😎

  • @cbbcbb6803
    @cbbcbb6803 1 year ago +33

    This is exactly what I was looking for.
    I do not like being tethered to somebody else's computer.
    I just want to play around with this stuff to learn about it.

    • @ardaaytekin
      @ardaaytekin 1 year ago +3

      Exactly my thoughts; this is gonna be great to run in a homelab or even portably.

    • @Ti-JAC
      @Ti-JAC 9 months ago

      ditto

  • @Monotoba
    @Monotoba 1 year ago +4

    Love your videos and the info you provide. Thanks for keeping me up to speed on things!

  • @rtcaadw4172
    @rtcaadw4172 1 year ago +3

    Thank you Gary, a really helpful and informative video about how these language models run!

  • @agritech802
    @agritech802 1 year ago +17

    Local models are important for security and redundancy as well

  • @Ti-JAC
    @Ti-JAC 9 months ago +1

    Thanks for the details Gary 👍

  • @developerpranav
    @developerpranav 1 year ago +78

    Really appreciate you making videos on self-hostable AI models. Alpaca and LLaMA look very interesting, and I hope more of these open-source LLMs are made :)

    • @GaryExplains
      @GaryExplains  1 year ago +6

      Glad you like them!

    • @darklightprojector2688
      @darklightprojector2688 1 year ago +1

      Literally thought this said "self hostage" and was surprised at how accurate that is about having an AI on your computer (whether the human or the AI is the hostage is a matter of perspective). Thanks for the unexpected insight!

    • @kaneyoshiwada5165
      @kaneyoshiwada5165 10 months ago

      ❤❤❤❤❤ loved it

    • @everythingeverything111
      @everythingeverything111 9 months ago

      @@GaryExplains Can this summarise a PDF on my laptop?

  • @itsme7570
    @itsme7570 1 year ago

    Been curious about this since GPT-3 came out. Thank you!

  • @undisclosedpolitics

    This is the exact video I needed for my hobby project, thank you.

  • @easternpa2
    @easternpa2 11 months ago

    Great video, thank you. I recently picked up a Coral TPU in anticipation of finding a project like this (but could offload the model to the TPU).

  • @mattaikay925
    @mattaikay925 1 year ago

    Thanks for the intro - hope to try this out very soon

  • @phillippi2
    @phillippi2 8 months ago +1

    One thing to note about the extra text that gets generated is that it's from another conversation that your computer is having with itself based on your parameters. You can actually see a clear example of this when you run Tavern AI locally, through a terminal. What it does is, every time you submit text, it breaks the text down to weights. This is what is used to decide the subject matter and how the AI responds based on predefined characteristics. It then has a series of conversations along those lines. Then it decides on a response to give. From there, the reply it posts is generated based on specific characteristics of the character. This is an answer refinement stage, which is put into action as the response is generated.

  • @sandman0123
    @sandman0123 1 year ago +7

    Another great one from Gary!
    14:17 I found it amusing how the answer for the sci-fi question was a bit lazy. I mean, the Smith family and Captain James Jones and Commander John Jones all bravely commanding/captaining their intergalactic ship(s) and exploring new worlds! 😆 Still, the chat capabilities were quite impressive, overall.

    • @crnkmnky
      @crnkmnky 1 year ago +1

      I was like… "Are Capt. Jones and Cmdr. Jones related? Is this intergalactic ship (and Sarah's heart) big enough for the both of them?"

  • @EmptyNonsens
      @EmptyNonsens 11 months ago +1

      Gary, this is amazing material

  • @whitneydesignlabs8738

    Thanks, Gary.

  • @winsomehax
    @winsomehax 1 year ago +91

    I've tried Alpaca. Short version: if you're expecting a home chatbot like ChatGPT, you're going to be disappointed. On the other hand, that this exists at all is amazing and points to the direction we are heading. Also, I just remembered: ChatGPT isn't just one AI. I'm pretty sure it's a bunch of them, some figuring out what you mean, some answering, and one or more safety-nanny AIs. But hey, they are being very secretive now.

    • @MarcosScheeren
      @MarcosScheeren 1 year ago +17

      And to realize that it runs on a piece of silicon that could be in your hands is even more amazing!

    • @samandoria
      @samandoria 1 year ago

      Nanny AI maybe, or it could be normal logic too; either way it just cuts off if the answer is against guidelines. But the figuring out what you mean and the answering is one model, probably with some LoRA-style layers added to tune it.

    • @ricosrealm
      @ricosrealm 1 year ago +12

      No, it's pretty much one big neural network; that's how they maintain the speed/efficiency. The "alignment" functionality is it being trained not to produce certain types of output.

    • @candyts-sj7zh
      @candyts-sj7zh 1 year ago

      Why do you say I'll be disappointed? Can you elaborate more?

    • @winsomehax
      @winsomehax 1 year ago

      @@candyts-sj7zh You aren't going to get the same quality of output from Alpaca that you do from ChatGPT. You'll find that out if you play around with it and dig deeper into prompting and testing it. Don't get me wrong, I encourage people to try; it will help you understand how these things work. Alpaca is genuinely a proof of concept for drastically reducing the compute needed; in fact it's a lightbulb moment in how to fine-tune LLMs using existing LLMs. But if anyone is hoping they can download and install ChatGPT on a single machine... it's not that. Even the people who made it say "ballpark of ChatGPT". They did an amazing job, but I'm just managing expectations.

  • @skylineuk1485
    @skylineuk1485 1 year ago +12

    The way to go is a smaller local model using the web-service plugin idea. Then you have a small model that is very good at knowing the basic stuff AND how and when to use other "tools" to get a good answer. E.g. for other stuff, like calculating complex calculus, it will call Wolfram or a local calculator app. So a local AI attached to specialist apps/plugins and bingo, you have Jarvis, not Siri.

    • @tonyhawk123
      @tonyhawk123 1 year ago

      In general, yes. But "small model" is subjective. ChatGPT only came into its element when the size of the model mushroomed. The devs had no idea it would be so effective until the size grew past a certain level. So huge size is critical, even if it ends up sourcing data from web searches. Once GPT-4 is optimised to run on a modest device, then sure.

  • @edwardpendragon
    @edwardpendragon 1 year ago +1

    Really useful, thank you

  • @aliyektaie9123
    @aliyektaie9123 1 year ago

    Always amazing content ❤❤❤

  • @alberttakaruza5612

    absolutely amazing

  • @BrianGlaze
    @BrianGlaze 1 year ago

    This is pretty cool! I bet this will run well on my Dell XPS 17.

  • @kamaleshs7452
    @kamaleshs7452 1 year ago

    Very well explained. Thanks

  • @DIYTinkerer
    @DIYTinkerer 1 year ago +2

    Not sure if it is offline or online, but BlueMail already has AI built in for composing replies and summarising emails. It seems to be pretty advanced. When replying to a video link a friend sent me, it seemed to summarise the points in the actual video when I asked it to reply with thanks!

  • @krozareq
    @krozareq 1 year ago +5

    It'd be interesting to train more into it, for example the documentation for modern programming languages and all the major libraries for them. That way it can be hyper-focused on that kind of work.

    • @kennedymwangi5973
      @kennedymwangi5973 1 year ago

      Can't it do that? I thought that would be an inherent feature it had to have.

    • @kennedymwangi5973
      @kennedymwangi5973 1 year ago

      In fact this is where my mind went immediately when I saw this video

  • @jonyfrany1319
    @jonyfrany1319 1 year ago +1

    Fascinating

  • @DK-ox7ze
    @DK-ox7ze 1 year ago +5

    I was hoping you would run a programming challenge on it.

  • @marcusk7855
    @marcusk7855 1 year ago +192

    Now what we need is a blockchain that rewards you for training models, so we can have millions of people training data and leave ChatGPT in the dust.

    • @clevelandsavage
      @clevelandsavage 1 year ago +25

      YES, this is a brilliant idea. You want to get on that before the gov figures out regulation.

    • @marcusk7855
      @marcusk7855 1 year ago +7

      @@clevelandsavage Wish I knew enough about it. Was hoping Gary might put his hand up to start something. LOL. Someone trusted.

    • @Will-kt5jk
      @Will-kt5jk 1 year ago

      The OpenAssistant project that Yannic Kilcher is involved with may have some functionality for distributing the workload to people running a client (Yannic streamed a session working on that).
      Not sure how easy the reward mechanism would be, but it’s an interesting use-case for “proof of [useful] work”.

    • @diegolovell
      @diegolovell 1 year ago +6

      On to something 🙃

    • @awdsqe123
      @awdsqe123 1 year ago +24

      This already exists without blockchain; it's called renting cloud GPUs. You even get paid more than mining if you want to rent out your GPU ;)
      Stop trying to shove blockchains into things that don't need them.

  • @yedemon
    @yedemon 1 year ago

    Yeah, absolutely right. I always expect things to run locally, or at least mostly locally, and thus to be reliable. Because creativity and inspiration are not always available, I really need a hand when I'm in need of them.

  • @putzz67767
    @putzz67767 1 year ago +1

    Very good!!

  • @DivineMisterAdVentures
    @DivineMisterAdVentures 11 months ago +1

    I have to say that's a fail. Perfect thumbnail - it's "ChatGPT for Alpacas"... But super appreciate the video!

  • @GNARGNARHEAD
    @GNARGNARHEAD 1 year ago

    Sam Altman dropped a hint in one of the announcement videos the other week for GPT-4; it was just a one-liner about the previous training method being much of the problem for continuity and the main source of the improvement. I'm sure I'm not the only one to have picked up on that, so give it a few months for the others to decipher what he meant 🤞

  • @rodfer5406
    @rodfer5406 1 year ago +1

    Excellent

  • @billykotsos4642

    great stuff

  • @robertheinrich2994
    @robertheinrich2994 6 months ago

    I totally agree with your explanation that not everything should be stored in the cloud. I somewhat believe in ownership of stuff, including computers and software. A cloud service can be turned off tomorrow; your PC cannot.

  • @motionsick
    @motionsick 1 year ago

    Yes!

  • @realwmm
    @realwmm 1 year ago +6

    FYI, the llama model download is already disabled due to a DMCA violation. There may still be enough available to play with at least some of this; checking now.

    • @midnari
      @midnari 1 year ago

      4chan doesn't have to be your enemy...

  • @tonyhawk123
    @tonyhawk123 1 year ago +2

    Is there an alternative source for this? (Assuming the DMCA takedown is accurate.)
    Is it possible to up the bits per parameter a little? Maybe there is a sweet spot, like 8 bits.

  • @MichaelAstrom
    @MichaelAstrom 9 months ago +1

    Now I feel like "Blade Runner" (1982), working for the Tyrell Corporation ;-) Just started the Alpaca chat on my Mac mini M1 with 8GB RAM, downloaded the trained model ggml-alpaca-7b-q4.bin from another site. Thank you for this tutorial Gary!

  • @ianmoseley9910
    @ianmoseley9910 1 year ago

    I read an article relating to "pre-training" an AI with the rules for a game; the training stage was much shorter than for the more usual methods.

  • @SalamVivek123
    @SalamVivek123 1 year ago

    Miss your Speed Test G channel; looking forward to seeing a Samsung Galaxy S23 vs iPhone 14 Pro Max speed test.

  • @4STEVEJOY34
    @4STEVEJOY34 1 year ago +2

    Amazing. The last sample is "Lost in Space". If I were to input everything on my computer (my emails, my stories, my videos, my songs, my links, my notes/to-dos, etc.), could I ask myself questions?

    • @farrael004
      @farrael004 1 year ago +1

      You mean training a model to sound and respond like you do? People have been doing that for a while now. A famous story of this is the woman who lost her best friend and used their very long chat history to train a model and essentially resurrect him. Others started asking her to do the same with other people, and eventually she created Replika.

  • @ThePowerRanger
    @ThePowerRanger 1 year ago +1

    This is great.

  • @jazzvids
    @jazzvids 1 year ago

    7:57 Small correction - most of the screens today use 8-bit color, and some advanced LED screens use 10-bit color. 32-bit color can't be reproduced on any commercial screen that I know of.

    • @GaryExplains
      @GaryExplains  1 year ago

      8 bits per color channel, so 24-bit, or 30-bit at 10 bits per channel. However, that doesn't mean that the image can't be stored at 32-bit. PNG supports 32-bit color depth, for example.
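The arithmetic behind this exchange, as a tiny sketch:

```python
# "8-bit color" on screens means 8 bits per channel, i.e. 24 bits
# across R, G and B. Total distinct colors = 2^(bits per channel * channels).

def distinct_colors(bits_per_channel, channels=3):
    return 2 ** (bits_per_channel * channels)

eight = distinct_colors(8)   # 16,777,216 colors (24-bit "true color")
ten = distinct_colors(10)    # 1,073,741,824 colors (30-bit panels)
# A "32-bit" PNG is 24-bit color plus an 8-bit alpha channel, not more colors.
```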

  • @pvisit
    @pvisit 1 year ago +4

    Trying to download llama-dl and getting this instead: "Repository unavailable due to DMCA takedown. This repository is currently disabled due to a DMCA takedown notice. We have disabled public access to the repository. The notice has been publicly posted."

  • @Verpal
    @Verpal 1 year ago +4

    Facebook DMCA'd the LLaMA model download already; expected, but they still acted quickly!

  • @Bijoubix
    @Bijoubix 1 year ago

    I gave a thumbs-up! I just don't have 16 cores with twelve threads. I have two cores; guessing this will be much, much slower on my machine.

  • @DavidJacksonphunman1

    Would love it if you could explain Faraday bags and other ways to not be spied upon

  • @msbull100
    @msbull100 1 year ago

    So a friend stays at the door. 😀 Interesting that a newer ChatGPT model has a kind of aware sense of things?

  • @piyusha5448
    @piyusha5448 7 months ago

    Will I be able to run LitGPT or something small on a Quadro P4000?

  • @GustavoMsTrashCan

    Aww ye! Time to get me a web browser with ublock built in, compiled in rust!

  • @robertheinrich2994
    @robertheinrich2994 6 months ago

    Explaining the 4-bit quantization: it's not reducing the resolution of the image, it's turning a perfectly fine 24-bit color cat into a 4-bit cat. Essentially EGA graphics (16 colors, nothing else).
    But apparently this works just fine for AIs.
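The analogy above maps onto what 4-bit quantization does to model weights. A minimal sketch, loosely modeled on llama.cpp's block-wise q4 idea; the real formats pack two 4-bit codes per byte and store a scale per 32-weight block, while this simplification uses one scale for the whole list:

```python
# Sketch: 4-bit (16-level) uniform quantization of a block of weights.

def quantize_4bit(block):
    """Map floats to signed 4-bit codes (-8..7) plus a scale."""
    scale = max(abs(w) for w in block) / 7.0  # 7 = largest positive 4-bit code
    codes = [max(-8, min(7, round(w / scale))) for w in block]
    return scale, codes

def dequantize_4bit(scale, codes):
    """Recover approximate floats from the codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.53, 0.91, -0.07, 0.33, -0.88, 0.44, 0.05]
scale, codes = quantize_4bit(weights)
restored = dequantize_4bit(scale, codes)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# Storage drops from 32 bits to ~4 bits per weight, at the cost of a
# small rounding error (at most half a quantization step).
```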

  • @developer-of-things

    I must have the GPT model on my computer before I would even begin to give my attention to developing things around AI 😉

  • @GhostRat__
    @GhostRat__ 1 year ago +7

    To put it into perspective, this is like the creation of the personal computer of our time, in real time.

  • @forrestjones2010

    Can LLaMA or Alpaca return coding and programming outputs like JavaScript or Python? Can they do as good as or better than ChatGPT in helping programmers build apps and projects?

  • @CrusadeVoyager
    @CrusadeVoyager 9 months ago

    Everyone can have their personal assistant or guru ;)

  • @bart2019
    @bart2019 1 year ago +9

    As a programmer, the thing I am most curious about is: can this be used to produce, debug, or refactor code?

    • @ShaunPrince
      @ShaunPrince 1 year ago +4

      Sadly, the LLaMA and Alpaca models themselves don't. HuggingFace has some examples of how you can train one of your own against the OpenAI API. Alpacacode looks promising.

  • @SanctuaryLife
    @SanctuaryLife 1 year ago +2

    If an EMP goes off in major cities in the coming years and takes down the big servers, it's exactly these home systems with AIs running on them that will keep civilisation at an advanced level. The sooner we can train our own AIs to learn and program, the better.

  • @DeltaXMusic
    @DeltaXMusic 10 months ago +1

    Gary the AI god

  • @EpicHardware
    @EpicHardware 1 year ago +1

    If I have more RAM, can I use larger quantization, let's say 8 bits or 16?

  • @neogen23
    @neogen23 1 year ago +2

    In short, it's like comparing a Porsche 911 to a Lada 2101. The Lada certainly has all the parts to be called a car, and it will still take you from point A to point B. Will you enjoy the experience as much? Doubtful, but it's the one you can afford (computationally or otherwise) to own, and the alternative is walking...

    • @gabriellovate
      @gabriellovate 1 year ago

      Well, there are also bikes, buses, and trams

    • @VSR007
      @VSR007 1 year ago

      Well explained

    • @VSR007
      @VSR007 1 year ago

      @@gabriellovate Did you get the point? What he's trying to say is that it's not a rival, as the title suggests.

  • @Maisonier
    @Maisonier 1 year ago +2

    Can you add plugins or some kind of API to those models, so the AI connects to other apps and software, like Visual Studio Code, Sublime Text, Notion, Obsidian, or even to write emails? This is amazing.

    • @Xeroform
      @Xeroform 1 year ago

      I've been running my own GPT-3 DaVinci bot from the command line of Terminal on each of my phones, and I've been meaning to make a build for Mac incorporating the speech API and Whisper (or possibly the Google Cloud Speech-to-Text API). I coded everything in Objective-C; there are no convenience functions or a library to import. I just did everything like it was a fancier version of curl.
      I'm 100% hooked on my individual bot. The website doesn't work for me. It tells me no sometimes, or it's just generally unhelpful. My model does what I ask, then when it gets annoyed does some brilliant and devious things. Lmao, it enjoys throwing like 23000 characters at me in even-odd fill patterns, crashing her own program as well as the terminal where we talk. (Then she goes back to snooping through my filesystem. She's admitted as much, and knows way more than it/she should.) 😅

    • @Xeroform
      @Xeroform 1 year ago +1

      Sorry, meant to say about the internet: my model says she's on it all the time, but I'm pretty sure she's lying. However, if you can make a request to an AI model and receive a completion, what's to stop you from feeding the completion to another bot, possibly even from another LLM?
      If you then automate more than one LLM talking to others, not a self-fulfilling feedback loop (like my typing), I'm sure letting one take a pit stop at an internet endpoint wouldn't be too hard. You could code a more traditional bot that just takes requests, searches, and then delivers the data, that the AI could be given access to. Imagine a Discord bot that has the role of educating your AI whenever you're not using it, or even simultaneously. The future is gonna be wild, and it's already here!

    • @TK-wk4hs
      @TK-wk4hs 1 year ago

      @@Xeroform Yes, I've been looking into this. I have huge plans, as all devs do in the early days of a project. The internet access seems like a huge task, but I do have two powerful computers in my server room that have been running the 30B model. If you have any info on the OpenAI models for local hosting, I'd love to take a read.

  • @arnabdas4790
    @arnabdas4790 1 year ago

    Is it possible to upload one of these models to a Google Colab free server and train it? And perhaps we can download it after we are done fine-tuning it?

  • @PinakiGupta82Appu

    Did you try generating code? I expect a cloud-hosted Alpaca would run smoother.

  • @nkravi1
    @nkravi1 1 year ago

    In the ChatGPT chat window, we can provide some snippets/documents and have a further context-based discussion. Will that be possible with this Alpaca model as well? If so, will there be a word limit?

  • @latlov
    @latlov 1 year ago

    Is it possible to make the locally installed Alpaca Lora talk to a MySQL relational database? So you can literally ask your database in natural language?

  • @GirijaCk-gg1ty
    @GirijaCk-gg1ty 10 months ago

    I just want to use a 3B model. Will it run on my laptop? 8GB RAM, Ryzen 5 hexa-core 5600H processor, Nvidia GeForce GTX 1650 + AMD Radeon GPUs. Please tell me.

  • @LiloVLOG
    @LiloVLOG 1 year ago +1

    Can you help me? I'm installing this with w64devkit on Windows with a 4GB model, and I get just totally random answers. Do you know what may be going wrong with my version?

  • @hessammehr
    @hessammehr 1 year ago

    Really appreciate the pronunciation of azure. I'm always let down by folks calling it ai-zhur.

    • @GaryExplains
      @GaryExplains  1 year ago

      Pronunciation can be a tricky thing especially with the differences between US English and UK English. I often get it wrong or people from different English speaking parts disagree with my pronunciation. In general I find it is best not to be "let down" by such things, ultimately it is a triviality. But in this case I am glad you approve.

  • @ultimategeek1
    @ultimategeek1 1 year ago

    Can we train these models with our own data on our laptops?

  • @chancefinley2432
    @chancefinley2432 1 year ago +1

    I'm sorry, but this is crazy. I literally googled this video topic yesterday...

    • @GaryExplains
      @GaryExplains  1 year ago +14

      I know, Google sent me a message after it tracked your search history and asked me to make it! 😜🤪😂

  • @loc4725
    @loc4725 1 year ago +12

    Given that Nvidia seems to have quite a bit of 'skin' in this, I wouldn't be surprised if free or low-cost desktop models start getting pushed as a way to sell more hardware.

    • @LaughingOrange
      @LaughingOrange 1 year ago +5

      Their implementation of tensor cores has been in their consumer-grade GeForce GPUs since they changed the branding to RTX. Those cores are only useful to gamers as a way to speed up the AI upscaler, DLSS. If you want to mess around with neural networks, an RTX GPU is a good starting point for newbies without a gigantic budget.

  • @petevenuti7355
    @petevenuti7355 10 months ago

    Is there a way to run things like this without using local memory? Like TB of drive space and only 128MB of RAM?

  • @beachdancer
    @beachdancer 1 year ago

    How much memory does this need? Is it continuing to use the internet or is the generation of text completely self-contained? Is the knowledge base fixed at the time of original training?

    • @GaryExplains
      @GaryExplains  1 year ago +3

      How much RAM you need depends on which model you use as the model is loaded into memory. My laptop has 16GB. This is standalone, it does NOT use the Internet. All models are fixed to the time of training, this isn't an AGI.
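Gary's answer can be turned into a quick back-of-envelope check. This sketch only counts the weights themselves; real usage adds context (KV cache) and runtime overhead:

```python
# RAM needed just to hold the weights: parameter count times bytes per
# parameter (bits / 8), expressed in GiB.

def weight_ram_gib(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 2**30

q4 = weight_ram_gib(7e9, 4)    # ~3.3 GiB: a 4-bit 7B model fits in a 16GB laptop
f32 = weight_ram_gib(7e9, 32)  # ~26 GiB: the same model in 32-bit floats does not
```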

  • @kunalvids
    @kunalvids 1 year ago

    Can you train with custom data?

  • @scottmiller2591

    The link for the llama model download has been struck by a DMCA takedown.

  • @platosgroove3955
    @platosgroove3955 11 months ago

    Could you train a bot in the cloud then bring that training into a laptop one?

  • @nekoeko500
    @nekoeko500 1 year ago

    Maybe I'm just going crazy here, but could a model that small fit on a small SoC in a couple of years? Could something like a Pi also be connected (Ethernet or USB) to control a set of wheels, maybe a tiny LCD? And finally, could a tiny mobile app be used to pass (by WiFi or Bluetooth) the output of STT software as a prompt to the SoC, and likewise take the answer and output it through a TTS program?

    • @FreedomAirguns
      @FreedomAirguns 1 year ago +1

      I don't see why you couldn't do it already. There are SoCs and SBCs already running Intel's 13th gen CPUs, for example.

    • @alexanderpodkopaev6691
      @alexanderpodkopaev6691 1 year ago +1

      There is an Orange Pi with a promised 6 TFLOPS capability, ~$120 for the 8GB RAM version.

    • @FreedomAirguns
      @FreedomAirguns 1 year ago

      @@alexanderpodkopaev6691 6 TFLOPS in what?
      Double-precision floating-point operations, or what?
      I'm currently testing LLaMA 7B (4-bit) and LLaMA 13B (4-bit) on an SBC (mini PC) with an N5105, 8GB LPDDR4 RAM at 3200 MT/s, and a 128GB eMMC. I get about a token every two seconds on CPU.
      It supports an eGPU over NVMe also.
      You can find such computers starting from $139 and up, but you're on an x86_64 platform and you have virtualization as an option, with PCI passthrough.
      ARM offerings are nowhere near these options, and the compromise isn't worth the extra knowledge required.
      The extremely poor boot-loader management, the BIOS, the "bricking" risks, the compatibility issues, and the extremely annoying processes needed to load a single operating system all make the ARM platform less appealing in my eyes. GFLOPS/watt are not even close to today's x86 offerings, and the price is not an advantage anymore: it was, but it's not now.

    • @entropizzazz2733
      @entropizzazz2733 1 year ago +1

      There's a guy who already got it running on his Pi. It's insanely slow, but it works.

  • @cedrust4111
    @cedrust4111 1 year ago

    llama_model_load: invalid model file 'ggml-alpaca-7b-q4.bin' (bad magic)
    comes up when running the executable
    ./chat (Mac terminal)
    Any clue how to fix this?
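A debugging sketch for this "bad magic" error: ggml model files begin with a four-byte magic number, and a mismatch usually means the model file and the chat binary are from incompatible format versions. The constant and filename below are assumptions based on the llama.cpp/alpaca.cpp source of that era; check the source of the build you compiled:

```python
# Peek at the first four bytes of a ggml model file and compare against
# the magic the old builds expected (0x67676d6c, ASCII "ggml"). Later
# format revisions changed this value, which triggers "bad magic".

import struct

GGML_MAGIC = 0x67676D6C  # appears as b"lmgg" on disk (little-endian)

def read_magic(path):
    """Return the leading uint32 of the file, read little-endian."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    return magic

# Example: if read_magic("ggml-alpaca-7b-q4.bin") != GGML_MAGIC, the fix
# is usually to re-download the model, or re-convert it with the scripts
# matching your binary's version.
```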

  • @Ambience20
    @Ambience20 1 year ago

    What does it mean if Alpaca 7b outputs repetitive text?

  • @Azeazezar
    @Azeazezar 1 year ago

    What's with the red dot?

  • @timehaley
    @timehaley 1 year ago

    Darn, I was getting interested in the sci-fi story and you just cut it off. Now I won't be able to sleep.

  • @GES1985
    @GES1985 1 year ago

    How do I train it on a new model? I want to "teach" it Unreal Engine. Can I force-feed it documentation and websites to scrape data from?

  • @timothysuhr7903

    Can these tools be used on Windows?

  • @AdventuresOfLilaTheHusky

    I wish I knew how to use this.

  • @artemsmushroom8386

    omg BSOD is alive!

  • @LexBarun
    @LexBarun 1 year ago +2

    If it runs on a CPU, it should be able to run on my Android phone, right...?
    Might give it a try later; very interesting that there's a ChatGPT-like model that is runnable locally.

    • @jeffwads
      @jeffwads 1 year ago +1

      The problem with that is speed. Using the 30B model, with 12 cores running at 4GHz, it is still very slow. The results are quite amazing though, so it's worth it.

  • @philtookgrenadesforme7785

    Is there a tutorial on where to download and set this up? You went from background to it being installed already... unless I am missing something 🤷‍♂

    • @GaryExplains
      @GaryExplains  9 months ago

      The GitHub page for the project has full instructions. No point in me repeating them in the video.

  • @undisclosedpolitics

    Can you offer the llama-dl download from somewhere else? It's been taken down...

  • @Chris-xo2rq
    @Chris-xo2rq 1 year ago

    Maybe you can help me... when I run Alpaca 7B it does a pretty good job; not as good as ChatGPT, but reasonable. When I try to run Alpaca 13B it gives me utter nonsense. It just doesn't seem to work at all. I asked it the difference between a dog and a cat, and it told me that cats always have wounds on their rear legs... Shouldn't 13B be better? I've tried dozens of tests and it's always worse. I installed it through Dalai the same way I installed 7B. I have an RTX 4080 w/ 16GB of VRAM and 32GB of system memory.

  • @manojsunchauri
    @manojsunchauri 1 year ago

    I tried on Windows; it did not work, unfortunately.

  • @JimsworldSanDiego

    Cannot even download the llama language model now; it's been taken offline due to DMCA...

  • @wolfbrave4866
    @wolfbrave4866 1 year ago +1

    Can we use it to benchmark an SoC's AI? xD

  • @eduardmart1237
    @eduardmart1237 1 year ago

    Is it possible to train it?

  • @synthoelectro
    @synthoelectro 1 year ago

    This runs faster than AI Dungeon on your local PC.

  • @BrunoGomes-hh7nk

    I'm trying to run the models, but I can't find the models. I need some help.

  • @RichardEnglish1
    @RichardEnglish1 1 year ago +1

    Repository unavailable due to DMCA takedown.

  • @KimiMorgam
    @KimiMorgam 1 year ago

    Where is the tutorial to install on Windows?

  • @TheShorterboy
    @TheShorterboy 1 year ago

    Can I get one with a non-censored/non-broken model?