Generative AI is gradually becoming more accessible. But for the moment, these models are owned exclusively by private companies and run on remote servers. What if we could run a tool like ChatGPT on our own computers? Developer Simon Willison accomplished this by running LLaMA, Meta's language model, on his laptop.
The trend is unquestionably towards artificial intelligence for automatic text generation: ChatGPT, the new Bing, Snapchat's My AI, the You search engine and soon Google Bard. For now, everything runs on servers, in the cloud. But within a few years, by optimizing the models and increasing the capacity of our machines, we could run them smoothly on our own computers. This is already possible, not with the most famous language model, GPT-3, but with Meta's LLaMA: after its model files leaked, some developers managed to run LLaMA on their own computers.
The leak of LLaMA, Meta's language model
That's what changed everything, according to open source developer Simon Willison: the leak of the model files for LLaMA, the language model released by Meta a few weeks ago. The ambition is considerable: up to 65 billion parameters are taken into account for text generation. Furthermore, Meta claims that its model outperforms GPT-3 on most benchmarks.
Where Meta wanted to differentiate itself from its competitors is in the openness of its model. It was made available to researchers, although under certain strict conditions, such as using it solely for research purposes, non-commercial of course. But as Simon Willison writes, a few days after its publication, LLaMA's model files were posted on the Internet. It is now too late for Meta: its baby is in the wild and can be run locally.
Running a text generation tool like ChatGPT on your computer: it's possible
So Simon Willison ran the model on his own MacBook and couldn't suppress a thrill: "When my laptop started spitting text at me, I really had a feeling the world was about to change, once again." Anyone who thought it would still be years before we could run a model of this caliber locally was wrong: "that future is already here."
According to the developer, speaking of this AI, our priority should be "to find the most constructive ways to use it." He adds: "Assuming Facebook doesn't relax the license terms, LLaMA will likely be more of a proof of concept that local language models can be used on consumer hardware than a new base model for people to use in the future." In any case, "the race is on to launch the first fully open language model that gives users ChatGPT-like capabilities on their own devices."
In his blog post, the developer explains how he achieved this. Simon Willison points to the llama.cpp project by Georgi Gerganov, another open source developer, based in Sofia, Bulgaria, who published code on GitHub to "run the model using 4-bit quantization on a MacBook." Simon Willison clarifies that "4-bit quantization is a technique for reducing the size of models so that they can run on less powerful hardware. It also reduces the size of the models on disk: 4 GB for the 7B model and just under 8 GB for the 13B model," "B" here being the number of parameters used by LLaMA, expressed in billions.
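To make the idea concrete, here is a minimal Python sketch of block-wise 4-bit quantization. It illustrates the general technique only, not llama.cpp's actual format: the function names, block size and rounding rule are assumptions for the sake of the example.

```python
def quantize_4bit(weights, block_size=32):
    """Quantize a list of floats to 4-bit integer indices,
    storing one float scale factor per block of weights."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        # Scale so the largest weight maps near the edge of the
        # signed 4-bit range [-8, 7]; fall back to 1.0 for all-zero blocks.
        scale = max(abs(w) for w in block) / 7 or 1.0
        indices = [max(-8, min(7, round(w / scale))) for w in block]
        blocks.append((scale, indices))
    return blocks

def dequantize_4bit(blocks):
    """Reconstruct approximate float weights from quantized blocks."""
    return [idx * scale for scale, indices in blocks for idx in indices]

weights = [0.12, -0.53, 0.07, 0.91, -0.33, 0.48]
approx = dequantize_4bit(quantize_4bit(weights))
# Each weight now needs 4 bits instead of 32, roughly an 8x size
# reduction (plus a small per-block overhead for the scale factors).
```

The reconstructed weights are close to, but not exactly, the originals; that loss of precision is the price paid for fitting a multi-billion-parameter model into a few gigabytes.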
The dangers of open source or locally executed language models
However, while this is a great thing for the open source world and for limiting the control of these language models by a handful of companies, this "release" carries some dangers. Simon Willison even listed ways this technology could be used to do damage:
- Spam generation
- Automated romance scams
- Trolling and hate speech
- Fake news and misinformation
- Radicalization via bubble effects
Furthermore, these ChatGPT alternatives can still make up false statements just as fluently as they write factually correct sentences. This is also where the strength of the tech giants lies: by providing access to their language models through virtual machines, an API, or conversational tools, they can add layers of moderation. Through fine-tuning and constant improvement, they can control the way users interact with these artificial intelligences. By running them locally, you lose those layers of control.
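The moderation layer in question can be pictured as a wrapper around the raw model. The sketch below is purely illustrative, and is an assumption about the general shape of such a filter, not any provider's real implementation; the blocklist and function names are invented for the example.

```python
# Illustrative only: a trivial keyword filter standing in for the much
# more sophisticated moderation that hosted services actually apply.
BLOCKED_TOPICS = ["romance scam", "hate speech"]

def moderated_generate(generate_fn, prompt):
    """Wrap a raw text generator with a simple pre/post moderation check."""
    if any(topic in prompt.lower() for topic in BLOCKED_TOPICS):
        return "[request refused by moderation layer]"
    output = generate_fn(prompt)
    if any(topic in output.lower() for topic in BLOCKED_TOPICS):
        return "[response withheld by moderation layer]"
    return output

# A locally run model is generate_fn called directly, without this
# wrapper: that is exactly the layer of control that disappears.
```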
Only big digital companies control language models
Finally, there are only a handful of major language models: among the most important are OpenAI's GPT-4, Google's LaMDA and Meta's LLaMA. While Microsoft doesn't fully control OpenAI, it has invested around $10 billion in the company. Recently announced, GPT-4 is a major new version of GPT that takes many more parameters into account, for even better performance and higher accuracy. While the phrase "tech giants" may seem simplistic, it is indeed these few private companies that control the most popular language models. The potential dangers of this control are great: these models are in fact black boxes that we know little about.
There are other initiatives, including in France, where Hugging Face and its model BLOOM are leading the way. As Le Figaro reports, the company was founded by three Frenchmen and is valued at $2 billion. BLOOM is a language model designed as open source and in collaboration with the CNRS: it was "trained for 117 days on the French public supercomputer Jean Zay, and already downloaded for free by more than 65,000 professionals." The ambition is clear: Hugging Face dreams of a Franco-European BLOOM to compete with ChatGPT. If this type of AI is still reserved for large companies, it is also because its development requires significant funding.
Running generative AI is expensive
This week, Microsoft introduced new virtual machines on Azure, its cloud platform, purpose-built for AI. But to make it all work, Microsoft uses Nvidia H100 GPUs: an infrastructure valued at hundreds of millions of dollars, which ultimately makes Nvidia the big winner in this race that all the giants have joined. This race could even lead to a new shortage of graphics cards in the coming months, according to some forecasts.
As Simon Willison writes in his article, automatic text generation models are even larger than image generation models such as Stable Diffusion, Midjourney or DALL-E 2. For him, "even if you could get the GPT-3 model, you wouldn't be able to run it on commodity hardware; these models typically require multiple A100-class GPUs, each of which retails for over $8,000."
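A back-of-the-envelope calculation shows why. Assuming roughly 2 bytes per parameter at 16-bit precision (a common figure, used here as an assumption for illustration), the weights alone of a GPT-3-scale model far exceed any consumer GPU, while a 4-bit LLaMA 7B lands close to the 4 GB quoted above.

```python
def model_size_gb(n_params_billions, bits_per_param):
    """Approximate size of a model's weights in gigabytes."""
    bytes_per_param = bits_per_param / 8
    return n_params_billions * 1e9 * bytes_per_param / 1e9

# GPT-3 scale: 175 billion parameters at 16 bits -> 350 GB of weights,
# which is why multiple A100-class GPUs are needed just to hold them.
print(model_size_gb(175, 16))  # 350.0

# LLaMA 7B quantized to 4 bits -> 3.5 GB, close to the ~4 GB on-disk
# figure mentioned above, and small enough for a laptop.
print(model_size_gb(7, 4))  # 3.5
```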
In addition to a great deal of computing power, skilled teams are also needed to create these models. Even for Simon Willison, while there are dozens of models, none manages to tick all of the following boxes:
- Easy to run on your own hardware
- Powerful enough to be useful (equivalent to GPT-3 performance or better)
- Open enough to be modified