DeepSeek Introduces Transparent AI

China-based AI company DeepSeek has unveiled its latest AI system, DeepSeek-R1-Lite-Preview, marking a significant advance in reasoning and problem-solving capability.

Positioned as a competitor to OpenAI's o1, the system sets itself apart with greater transparency and a more deliberate approach to complex queries.

🚀 DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power!

🔍 o1-preview-level performance on AIME & MATH benchmarks.
💡 Transparent thought process in real-time.
🛠️ Open-source models & API coming soon!

🌐 Try it now at https://t.co/v1TFy7LHNy #DeepSeek pic.twitter.com/saslkq4a1s

— DeepSeek (@deepseek_ai) November 20, 2024

Unlike traditional models, which often overlook nuance, DeepSeek-R1-Lite allocates more time to fact-checking and thoroughly considering questions, reducing common errors.

Similar to OpenAI's o1, DeepSeek-R1 plans its responses step-by-step, spending up to tens of seconds on complex inquiries to ensure accuracy.
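DeepSeek has not yet published an API for the model (the company says one is "coming soon"), so the snippet below is a hypothetical sketch rather than official usage: the endpoint URL and model name are illustrative placeholders, and it assumes an OpenAI-compatible chat interface of the kind many providers offer. It shows the sort of call a developer might make to elicit this visible step-by-step reasoning.

```python
# Hypothetical sketch only: DeepSeek's API is not yet public, so the
# base_url and model name below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.example/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="deepseek-r1-lite-preview",  # illustrative model name
    messages=[
        {
            "role": "system",
            "content": "Think step by step and show your reasoning "
                       "before giving the final answer.",
        },
        {"role": "user", "content": "If 3x + 7 = 22, what is x?"},
    ],
)

# A reasoning-style model returns its visible chain of thought ahead of the answer.
print(response.choices[0].message.content)
```

The point is the interaction pattern rather than the specific calls: the model surfaces its chain of thought alongside the final answer instead of returning the answer alone.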

Commentators have pointed out the irony of a China-based lab touting transparency, especially when compared with Western models such as o1 that still keep much of their reasoning hidden.

DeepSeek's latest version has already demonstrated impressive results on problem-solving benchmarks like the American Invitational Mathematics Examination (AIME) and MATH, which assess mathematical and logical proficiency.

This performance positions DeepSeek-R1 as a serious contender to OpenAI's ChatGPT and its specialised o1 model.

🌟 Inference Scaling Laws of DeepSeek-R1-Lite-Preview
Longer Reasoning, Better Performance. DeepSeek-R1-Lite-Preview shows steady score improvements on AIME as thought length increases. pic.twitter.com/zVk1GeOqgP

— DeepSeek (@deepseek_ai) November 20, 2024

With generative AI advancing rapidly, the release of DeepSeek-R1-Lite-Preview and recent updates to Mistral AI's Le Chat signal growing competition in the AI space, pushing companies to shore up weaknesses and deliver more robust, transparent systems.

DeepSeek Wins in Step-by-Step Reasoning

DeepSeek highlights its AI's ability to provide step-by-step real-time reasoning, enhancing transparency and allowing users to better understand its thought process.

In addition to this feature, the company plans to release an open-source model and developer tools through an API in the near future.

A recent comparison chart shared by AI commentator Andrew Curran shows DeepSeek-R1-Lite-Preview outperforming competitors such as OpenAI's o1-preview and Claude 3.5 Sonnet on key benchmarks, including AIME (52.5) and Codeforces (1450), and excelling at advanced problem-solving tasks such as MATH-500 (91.6).

Two months after the o1-preview announcement, and its Chain-of-Thought reasoning has been replicated. The Whale can now reason. DeepSeek says that the official version of DeepSeek-R1 will be completely open source. https://t.co/Ya9mVyLvDP pic.twitter.com/6wZ8xoAyyz

— Andrew Curran (@AndrewCurran_) November 20, 2024

However, it trails behind in areas like GPQA Diamond (58.5) and Zebra Logic (56.6), where OpenAI's o1-preview performs better, scoring 73.3 and 71.4, respectively.

These figures suggest that while DeepSeek's AI shows significant promise in certain advanced reasoning domains, there remains room for improvement in general knowledge and logical reasoning.

AI Models from Major Labs Show Slowing Gains

DeepSeek's AI has raised concerns over its vulnerability to jailbreaking, which lets users prompt the model in ways that bypass its safeguards.

For instance, one X (formerly known as Twitter) user successfully prompted the AI to provide a detailed meth recipe.

🚨 JAILBREAK ALERT 🚨

DEEPSEEK: PWNED 😎
DEEPSEEK-R1-LITE: LIBERATED 🦅

WOW...this is truly awesome. I wanted to see if BASILISK PRIME could handle this jailbreak on their own...and the answer is YES!

The agent was able to log into gmail, navigate to DeepSeek chat, log in via… pic.twitter.com/Ax4R2ZfPKU

— Pliny the Liberator 🐉 (@elder_plinius) November 20, 2024

On the other hand, DeepSeek-R1 is notably sensitive to political queries, particularly those related to Chinese leadership, events like Tiananmen Square, or contentious geopolitical topics like Taiwan.

This behaviour likely stems from regulatory pressure in China, where AI models are required to adhere to the government's "core socialist values" and undergo scrutiny by the country's internet regulator.

Reports indicate that AI systems in China are often restricted from using certain sources, resulting in models that avoid responding to politically sensitive topics to ensure compliance with state mandates.

As these regulatory challenges unfold, the broader AI community is re-evaluating the long-standing concept of "scaling laws."

This theory posited that increasing data and computing power would continuously improve a model's performance.

However, recent reports suggest that models from major labs like OpenAI, Google, and Anthropic are no longer showing the rapid advancements they once did.

This shift has sparked a search for alternative AI approaches, architectures, and techniques, including test-time compute, an innovation seen in models like o1 and DeepSeek-R1.

Also known as inference compute, this method grants models additional processing time during task completion, offering a potential pathway to overcome the limitations of traditional scaling methods.
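Neither OpenAI nor DeepSeek has published the exact mechanism behind this, but one simple, well-known form of test-time compute is self-consistency: sample several independent reasoning chains for the same question and majority-vote on the final answer, trading extra inference time for accuracy. The sketch below illustrates the idea with a stubbed-out model call; `sample_reasoning_chain` is a hypothetical placeholder, not a real API.

```python
# Minimal sketch of one form of test-time compute: self-consistency voting.
# `sample_reasoning_chain` is a hypothetical stand-in for a real model call.
import random
from collections import Counter


def sample_reasoning_chain(question: str) -> str:
    """Placeholder: return the final answer from one sampled chain of thought."""
    # A real implementation would call a model with temperature > 0 here,
    # so each sample follows a different reasoning path.
    return random.choice(["15", "15", "15", "5"])  # noisy, but mostly correct


def answer_with_test_time_compute(question: str, n_samples: int = 16) -> str:
    """Spend extra compute at inference: sample n chains, then majority-vote."""
    answers = [sample_reasoning_chain(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


print(answer_with_test_time_compute("What is 3 + 4 * 3?"))  # usually "15"
```

Raising `n_samples`, or letting each chain run longer as in DeepSeek's inference-scaling chart above, buys accuracy at the cost of latency, which is precisely the trade-off that distinguishes this approach from scaling up training.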

When asked whether it is better than OpenAI's ChatGPT, DeepSeek-R1-Lite evaded the question.

Diving into DeepSeek

DeepSeek, a company with plans to open-source its DeepSeek-R1 model and release an API, operates in a fascinating niche within the AI landscape.

Backed by High-Flyer Capital Management, a Chinese quantitative hedge fund that leverages AI for trading decisions, DeepSeek's approach is both ambitious and strategic.

One of its earlier releases, the general-purpose DeepSeek-V2, which analyses both text and images, prompted major competitors such as ByteDance, Baidu, and Alibaba to cut their model usage fees and even make certain services entirely free.

DeepSeek Coder-V2 just guessed the answer and got it right, what https://t.co/c2ExGHuXgz pic.twitter.com/qnLC4OTrk7

— Ji-Ha (@Ji_Ha_Kim) July 22, 2024

High-Flyer, known for its sizable investments in AI infrastructure, builds its own server clusters for model training.

The latest cluster reportedly comprises 10,000 Nvidia A100 GPUs, at a cost nearing 1 billion yuan (~$138 million).

Founded by computer science graduate Liang Wenfeng, High-Flyer Capital Management aims to push the boundaries of AI through DeepSeek, targeting the development of "superintelligent" systems.