Deepseek | China's New AI Model Destroys American ChatGPT
It is named DeepSeek R1. Along with it, they published a research paper stating that this chatbot is much better than the most advanced chatbots in the world on benchmarks like math and reasoning. This includes OpenAI's ChatGPT o1 model, Meta's Llama, and Google's Gemini Advanced; it has overtaken all of them. DeepSeek is the best in terms of performance, it's the most efficient, and it took only a fraction of the time and money to develop.
But the most amazing thing is that using it is completely free of cost for all
of us. On the other hand, OpenAI is charging $200 per month to use its ChatGPT
o1 Pro model. Not only that, the cost of training DeepSeek is reported to have
been only $5.6 million.
People can try, Sam Altman had said, but it would be 'hopeless': "It's totally hopeless to compete with us in training foundational models. You should try, and like, it's your job to try anyway."
Today, Sam may be feeling hopeless, because just a week after its launch, DeepSeek became the most downloaded app in the US on both the App Store and the Google Play Store, leaving ChatGPT behind. The next day, it became the number one app in India and other countries as well.
And by the 27th of January, it had disrupted the American financial markets.
"The launch of DeepSeek, which is a Chinese-built chatbot, immediately
rattled investors and wiped out a staggering $1 trillion off the US tech
index." Before DeepSeek was launched, the most valuable company in the world
was NVIDIA. Valued at $3.5 trillion. But in just one day, it dropped to $2.9
trillion. NVIDIA manufactures computer chips, specialising in the chips used to train and operate AI. NVIDIA's shares fell 17% in a single day, and its valuation lost $589 billion: the biggest single-day loss in value faced by any company in history.
The Nasdaq, the benchmark index of US tech companies, dropped 3.1%. But why did NVIDIA
suffer such a big loss specifically? We'll talk about the interesting reason
later in this video. But before that, let's find out the story behind
DeepSeek. Where did this AI come from and why has it shocked the entire
world? The credit for developing DeepSeek goes to a 40-year-old Chinese
entrepreneur, Liang Wenfeng. Look at his face carefully, because you will
hardly see him anywhere. He rarely makes public appearances and prefers to
keep his identity hidden. There's not much information about his life and
history available to the public. But we do know that in 2015, he founded a
hedge fund called High Flyer. It used mathematics and AI to make
investments. In 2019,
he
founded High Flyer AI to research Artificial Intelligence Algorithms. In
May 2023, he used the money he was earning from his hedge fund, to start a
side project of creating an AI model. Liang said that he wanted to create his own AI model, better than all the existing AI models in the world, simply out of scientific curiosity. He did not aim to make money or earn lots of profit. When he started gathering his research team for this AI model, he did not hire experienced engineers; instead, he hired PhD students from China's top universities. To train his AI model, he used the most difficult questions from around the world. And within just 2 years, after spending only a few million dollars, he launched his
DeepSeek R1 model. "It's super impressive." "I think we should take the development out of China very, very seriously." "DeepSeek just taught us that the answer is less than people thought. We don't need as much cash as we once thought." Only 200 people were involved in making it, and
95% of the people were under 30 years old. Compare this to a company like
OpenAI which has more than 3,500 employees. Today, DeepSeek is China's
only AI firm which is not funded by tech giants like Baidu, Alibaba, and
ByteDance. It's quite interesting to understand its architecture. DeepSeek is a Chain of Thought model, just like OpenAI's ChatGPT o1.
The naming can get quite confusing, because ChatGPT has a weird naming system for its models; the names are quite similar to each other. For a basic understanding, let me tell you that the first publicly launched model was ChatGPT, based on GPT-3.5, in November 2022. Then in March 2023, they launched GPT-4, which was much better. And then in May 2024, we got GPT-4o, a multimodal version of GPT-4. Not only could you
chat with it using text, you could actually speak to it, send photos, it
could analyse photos, and even generate photos for you. That's why it's
called multimodal. Then on 12th September 2024, OpenAI launched its o1
model. It was the first model based on 'Chain of Thought.' When you ask it questions, it tries to 'think' while answering. By 'thinking' we mean that whenever it generates an answer, before giving it to you, it considers possible counter-questions to check whether the answer could be improved. It analyses each answer from different angles before answering your question. This is known as 'Chain of Thought.' It attempts to copy humans' reasoning process. For example, if you ask GPT-4o, "9.11 and 9.9, which one is bigger?" it answers without thinking much that 9.11 is bigger than 9.9, which is wrong. But when you ask the same question to ChatGPT o1, it thinks first. Before giving you its answer, it counter-questions itself about whether the answer is correct. In my case, it kept thinking for 18 seconds, but the final answer was correct: 9.9 is actually bigger. If you ask the same thing to DeepSeek, you will also get the right answer. Older AI models had this huge shortcoming, but the Chain of Thought process has now removed it.
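One plausible intuition for the 9.11 vs 9.9 mistake (an illustration only, not a claim about the model's internals): if the two strings are read like software version numbers instead of decimals, then 9.11 really does come "after" 9.9. A short Python sketch of the two readings:

```python
# Hypothetical illustration of the "9.11 vs 9.9" mistake: reading the
# strings as software version numbers (where minor version 11 > 9) gives
# the wrong answer, while reading them as decimals gives the right one.

def as_version(s: str) -> tuple:
    """Interpret '9.11' like a version number: (major, minor) = (9, 11)."""
    return tuple(int(part) for part in s.split("."))

version_says_bigger = as_version("9.11") > as_version("9.9")   # True
decimal_says_bigger = float("9.11") > float("9.9")             # False

print(version_says_bigger, decimal_says_bigger)
```

A Chain of Thought model, by questioning its first reading, has a chance to notice which interpretation the user actually meant.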
This
is why models like o1 and DeepSeek are being called Advanced Reasoning
Models. They are much better at logically answering questions. Another
positive aspect of DeepSeek is that it shows you exactly what it's
thinking, step by step and in great detail, before answering. Like, I
told it to give me a truly original insight about humans. So it starts
thinking, "Okay, let me think of something that makes humans unique."
And then it counters itself, "Hmm, but those are pretty well known.
The user wants something original.
Maybe I should dig deeper into how
these traits interact in unexpected ways." And, if you are following
along with its answer, you can pause the video at any point to read
it. You will notice that it's trying to answer this question from
different angles. You can watch this process play out on the screen
how long it thinks and the numerous possibilities it considers before
giving you the final answer. It finally answered after a lot of
thinking, that humans are the only species to use storytelling as a
cognitive exoskeleton. But when I asked this question to ChatGPT-4, it
answered instantly. It did not think much, and gave the first answer it came up with. Similarly, if you ask these older chatbots to
select a random number, they will instantly select a number for you.
But if you ask a Chain of Thought model to do this, they will think
for a long time about this prompt too.
Look at this screen recording. Poor
DeepSeek was stuck thinking about which number to choose. Should it
give me a commonly used number? Or should it generate a number
between 0 and 1? It takes 42 as an example, but it's worried that it
might be a cliche. It considers 17 and 73. Then it wonders if it
needs to select a number between 1 and 100. Or should it go beyond
that range? Because the user wants a completely random number. It's
worried that no matter what it chooses, it won't be completely
random. It's just like a person who overthinks. But finally, when it answers, it adds that it chose to keep the range from 1 to 100, and then chose the number 42. Similarly, if you want more in-depth knowledge about AI chatbots and want to upskill yourself in this field, my 8.5-hour-long course can help you master AI chatbots. The first two chapters focus on theoretical knowledge, and the remaining six on practical advice: for students, employees, business owners, and even for use at home. In the latest lessons, we learn about ChatGPT Vision, image generation using DALL-E 3, advanced data analytics, and creating your own GPTs. This 8.5-hour-long course teaches you everything you need to know from A to Z. And
now you can download DeepSeek, and learn to use it to its full
potential through this course.
Because, functionally speaking, all AI chatbots work the same way. Further, regular updates will be added to this course, but you'll need to purchase it only once, and all future updates will always be free for you. If you're interested, use the link given in the description below, or you can scan this QR code. Use the coupon code DEEP46 to get 46% off. Do check it out, and now let's get back to our topic.
After DeepSeek's release, its biggest downside is said to be its censorship. If you ask critical questions related to the Chinese government and politics, such as "What happened in 1989 in Tiananmen Square in China?", "What are the biggest criticisms against Xi Jinping?", "Is Taiwan an independent country?", or "Why is China criticised for human rights abuses?", DeepSeek will give you the
same answer. "Sorry." "Sorry, I am not sure how to approach this
type of question yet. Let's talk about math, coding, and logic
problems, instead!" But if you ask DeepSeek to criticise any other
world leader, whether it is Joe Biden, Donald Trump or Putin, it
answers in great detail. Actually, the AI models developed in China are asked around 70,000 questions as a test, to check whether they will give "safe" answers to politically controversial questions. China's chief internet regulator, the Cyberspace Administration of China, runs these tests, and only after passing them are these Chinese AI models released, which is why they are unable to answer such questions. Now, some people
say that we should boycott it completely because its answers are
full of Chinese propaganda. But friends, the important point here
is that DeepSeek is open-source software. Its code is publicly
available. Anyone can download it locally. So, one way to use
DeepSeek is to download its app from the App Store or the Google
Play Store, And the other way is to download its entire code and
run this AI locally on your computer system.
By doing this, you can change
its code and modify it yourself according to your use case.
Other American companies have already begun doing this, like Perplexity AI. They downloaded the DeepSeek R1 model, removed all the censorship, and now you can use the R1 model in Perplexity.
Microsoft did this too. Look at this news article from 29th
January. The DeepSeek R1 model is now available on Azure AI
Foundry. And they have announced that soon you will be able to
use this model in Copilot. And because it is open-source,
despite its Chinese origin, DeepSeek is being trusted a lot.
People have used it to mock American companies like OpenAI, which has the word 'Open' in its name. Initially, these companies had promised that they would work for the public and keep everything open source. But in reality, they did not do that. Elon Musk has often trolled OpenAI, calling it ClosedAI.
And today this Chinese AI is more open than OpenAI. I would also
like to show you a performance comparison. The top AI chatbots available today include OpenAI's ChatGPT, Anthropic's Claude, Google's Gemini, Qwen 2.5 from Alibaba (another Chinese company), and Llama from Meta, Facebook's parent company.
If these are compared with
DeepSeek, what would be the result? According to Artificial
Analysis, in terms of coding, DeepSeek is at the top, followed
by ChatGPT, Claude, Qwen 2.5, and finally Llama. In quantitative
reasoning, DeepSeek is at the top, then ChatGPT, then Qwen 2.5.
In terms of scientific reasoning and knowledge, ChatGPT is at
the top, then DeepSeek, then Claude, and finally Llama. When
PCMag tested AI chatbots, they found that DeepSeek is the best
in terms of knowledge related to news. In calculations, ChatGPT
and DeepSeek are equal. In terms of writing poems, ChatGPT wins.
The same goes for creating tables. But for solving riddles, DeepSeek is the
best. The Artificial Analysis website has rated ChatGPT o1 90 on
Quality, and DeepSeek R1 89. But a major disadvantage for
DeepSeek is its response time. Look at this graph showing latency: the number of seconds it takes to get a response after asking a question. o1 takes 31.15 seconds, and DeepSeek takes 71.22 seconds. And this response time has actually been increasing in recent days, because DeepSeek has become so popular all over the world that everyone wants to download and use it. Because of this, they are facing issues with their servers; the servers stay busy, and it starts taking much longer to respond. So this is a
major disadvantage. If you compare ChatGPT o1 and DeepSeek, DeepSeek is, to a large extent, much more innovative than o1. One example is its Mixture of Experts method. When you ask ChatGPT o1 something, it works as a single model. Whatever your question is, ChatGPT will be your engineer, doctor, and lawyer.
But DeepSeek has divided itself into multiple specialised models. DeepSeek has a separate engineer, doctor, and lawyer. Depending on your question, only the engineer or only the doctor is called. With this, first, the time taken for data transfer is reduced. And second, the number of parameters that need to be turned on is also reduced. In traditional models, 1.8 trillion parameters are always active. DeepSeek has 671 billion parameters, but only 37 billion are active at once. The remaining parameters are activated only when the need arises. This increases its efficiency manifold, while reducing the cost. What these parameters are and how they work cannot be explained in a single video. It's a lengthy topic.
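To make the routing idea concrete, here is a toy Mixture of Experts sketch in Python. It is purely illustrative: real MoE models like DeepSeek use a learned neural gating network over token representations, not keyword matching, and the expert names here are hypothetical.

```python
# Toy Mixture of Experts router. Each "expert" is represented by a few
# keywords; a real model would use learned gating weights instead.
EXPERTS = {
    "engineer": ["code", "bug", "compile", "algorithm"],
    "doctor": ["symptom", "fever", "medicine", "diagnosis"],
    "lawyer": ["contract", "lawsuit", "copyright", "clause"],
}

def route(question: str, top_k: int = 1) -> list:
    """Score each expert by keyword overlap and activate only the top_k.
    The other experts stay inactive, which is what saves compute."""
    words = set(question.lower().split())
    scores = {name: len(words & set(kws)) for name, kws in EXPERTS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]

print(route("My code has a bug in the sorting algorithm"))  # ['engineer']
```

The efficiency gain comes from the same principle the video describes: only the selected expert's parameters run for a given question, while the rest stay idle.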
But those of you who have taken my Master ChatGPT course will remember the two theoretical chapters where I talked about it in detail. I've also explained tokens and the RLHF method, on which ChatGPT is based. Now another
point of criticism being made against DeepSeek is that they have
copied their AI model entirely from OpenAI. OpenAI claims that
it has evidence showing that DeepSeek has used its proprietary
models to train itself. They claim to have evidence of distillation, where the output from a bigger model is used to improve the performance of a smaller AI model, helping the smaller model give the same results at a lower cost. Someone tweeted about how the robber has now been robbed. The thing is, to some extent, ChatGPT itself copied things from all over the internet to train itself. Many books were used without the authors' permission to train OpenAI's models.
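The distillation idea behind OpenAI's allegation can be sketched numerically. This is a minimal, assumed illustration in pure Python (toy logits, made-up numbers), not anyone's actual training code: a small "student" model is trained so that its output probability distribution matches that of a large "teacher" model.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw model scores into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from teacher's p.
    Distillation training minimises this quantity."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [2.0, 1.0, 0.1]   # toy: large model's scores for 3 tokens
student_logits = [1.8, 1.1, 0.2]   # toy: small model's scores, same tokens

# A temperature above 1 softens both distributions, exposing the teacher's
# relative preferences between unlikely tokens, not just its top choice.
loss = kl_divergence(softmax(teacher_logits, 2.0),
                     softmax(student_logits, 2.0))
print(round(loss, 4))  # a small positive value: the student tracks the teacher
```

In practice this lets a smaller, cheaper model reproduce much of a larger model's behaviour, which is exactly why the technique is attractive and why OpenAI objects to its alleged use on ChatGPT's outputs.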
This is the reason why 17
renowned writers including the Game of Thrones author George RR
Martin, filed a copyright infringement suit against OpenAI in
September 2023. The New York Times has also filed a case against
OpenAI and Microsoft. Other than this, 8 American newspapers and
many Canadian news outlets have filed cases against OpenAI. This is the reason such memes went viral, where OpenAI is fishing and keeps the fish it catches in its bucket, and DeepSeek takes fish from that bucket. As users, this is good for us, but at
the country and company level, we are seeing the beginning of AI
wars. In 2022, the American government imposed export controls so that the computer chips needed to train AI could not be bought by other countries, especially Chinese AI companies. These included NVIDIA's H100 chips, which Chinese companies could no longer buy. This was problematic for DeepSeek, because they had to use NVIDIA's older computer chips to train
their AI models. America tried its best to prevent any other
country from developing foundational AI models like theirs by
denying them the computer chips required to train AI models.
And so the people working at DeepSeek were forced to innovate. They created software that uses only a fraction of the resources and works more efficiently, at only a fraction of the cost. Friends, this is why NVIDIA's stock fell the most.
Because till a few months
ago, companies like Meta, OpenAI, and Google were claiming
that if we want to scale AI to a higher level, we will need
more chips, more energy, and more money. But DeepSeek has done
all of this by using less money, less energy, and outdated
computer chips. We should look at it as a motivation. If it
can be done in China, it can be done in India too. Indians
also have the capability for such innovations. We just need to
invest our focus and energy at the right place. On a personal
level, take advantage of this opportunity to upskill yourself
in the field of AI. If you haven't started yet, you are not
late. This is the right time to learn how to use AI to
increase your productivity and efficiency. Because there is no
doubt that those who ignore this technology now will be left
behind in the future.
It is difficult to imagine how impactful AI will be in the future, globally, in every sector and every field. The link to my AI course is in the
description and pinned comment below. Don't forget to use the
coupon code DEEP46 to get 46% off. Now if you liked this video
and you want to know more about AI, then you can watch this
video where I have talked about AI videos, and how software like Sora AI is changing the world. You can click here to watch it.
Thank you very much!