This free “reasoning” AI model outperforms the OpenAI model, without the $20 monthly fee

The NovaSky team, a "collaborative initiative led by students and advisors at UC Berkeley's Sky Computing Lab," has done what seemed impossible just a few months ago: it created a high-performance AI reasoning model for less than $450 in training costs.

Unlike traditional LLMs that simply predict the next word in a sentence, so-called “reasoning models” are designed to understand a problem, analyze different approaches to solving it, and execute the best solution. This makes these models harder to train and configure, because they have to “think” through the entire problem-solving process rather than just predicting the most likely response based on their training data.

That's why a subscription to ChatGPT Pro, which runs OpenAI's most advanced reasoning models, costs $200 per month; OpenAI says that training and running these models is expensive.

NovaSky's new model, dubbed Sky-T1, is comparable to OpenAI's first reasoning model, known as o1 (also known as Strawberry), which was released in September 2024 and costs users $20 per month. Sky-T1, by contrast, is a 32-billion-parameter model capable of running natively on a home PC, provided you have a beefy 24GB GPU such as the RTX 4090 or the older 3090 Ti. And it's free.

We're not talking about a watered-down version, either. Sky-T1-32B-Preview achieved 43.3% accuracy on AIME2024 math problems, beating OpenAI o1's 40%. On LiveCodeBench-Medium, it scored 56.8% against 54.9% for the o1 preview. The model maintains strong performance across other benchmarks as well, reaching 82.4% on Math500 problems, where the o1 preview scored 81.4%.

The timing couldn't be more interesting. The race in reasoning AI has intensified recently, with OpenAI's o3 turning heads by outperforming humans on general intelligence benchmarks, sparking debate about whether we are witnessing the beginnings of artificial general intelligence. Meanwhile, China's DeepSeek v3 made waves last year by outperforming OpenAI while using fewer resources, all while being open source.

But Berkeley's approach is different. Instead of going after raw power, the team focused on making a powerful reasoning model accessible to the masses at the lowest possible cost, building a model that is easy to configure and run on local computers without the need for expensive corporate hardware.

“Remarkably, Sky-T1-32B-Preview was trained for less than $450, demonstrating that it is possible to replicate high-level reasoning capabilities affordably and efficiently. All code is open source,” NovaSky said in its blog post.

Currently, OpenAI does not provide free access to its reasoning models, although it does provide free access to less sophisticated ones.

The prospect of fine-tuning a reasoning model to excel in a specific domain for less than $500 is particularly compelling for developers, as such specialized models can outperform more powerful general-purpose models in targeted domains. This cost-effective specialization opens new possibilities for focused applications across scientific fields.

The team trained the model in just 19 hours using Nvidia H100 GPUs, following what they call a "recipe" that most developers should be able to replicate. Curating the training data appears to have been key to its success.
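For rough context on that budget, a back-of-the-envelope sketch shows how 19 hours of H100 time can land under $450. Both the GPU count (8) and the $3-per-GPU-hour cloud rate below are assumptions for illustration; the article only reports the 19-hour duration and the sub-$450 total.

```python
# Back-of-the-envelope training cost estimate.
# ASSUMPTIONS (not from the article): 8 GPUs, $3.00 per GPU-hour.
GPU_COUNT = 8              # hypothetical number of H100s
HOURS = 19                 # training duration reported by NovaSky
RATE_PER_GPU_HOUR = 3.00   # assumed on-demand cloud price in USD

total_cost = GPU_COUNT * HOURS * RATE_PER_GPU_HOUR
print(f"Estimated training cost: ${total_cost:.2f}")  # prints $456.00
```

Under these assumed numbers the estimate comes out near the quoted sub-$450 figure, which suggests the claimed budget is plausible for a short run on a small cluster.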

“Our final data contains 5K coding data from APPs and TACO, and 10K mathematical data from AIME, MATH, and Olympiads subsets of the NuminaMATH dataset. In addition, we retain 1,000 scientific and puzzle data from STILL-2,” NovaSky said.
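Tallying the counts quoted above shows how small and skewed toward math the fine-tuning mix is (the category labels below are paraphrases of the quote, not official dataset names):

```python
# Fine-tuning data mix, using the counts NovaSky quotes above.
data_mix = {
    "coding (APPs + TACO)": 5_000,
    "math (AIME, MATH, NuminaMATH Olympiad subsets)": 10_000,
    "science & puzzles (STILL-2)": 1_000,
}

total = sum(data_mix.values())
print(f"Total training examples: {total}")  # prints 16000
for name, count in data_mix.items():
    print(f"  {name}: {count} ({count / total:.0%})")
```

Roughly 16,000 examples in total, with math making up about 62% of the mix, a strikingly small dataset compared to what frontier labs are assumed to use.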

The dataset was diverse enough to help the model reason flexibly across different types of problems. NovaSky used QwQ-32B-Preview, another open-source AI model, to generate the training data, which it then used to fine-tune the open-source Qwen2.5-32B-Instruct. The result was a powerful new model with reasoning capabilities, which became Sky-T1.

One of the key findings of the team's work is that bigger is still better when it comes to reasoning models. Their experiments with smaller versions at 7 billion and 14 billion parameters showed only modest gains. The sweet spot turned out to be 32 billion parameters: large enough to avoid repetitive output, but not so large as to be impractical.

If you want your own version of a model that outperforms OpenAI's o1, you can download Sky-T1 from Hugging Face. If your GPU isn't powerful enough but you still want to try it, there are quantized versions ranging from 8-bit down to 2-bit, so you can trade precision for size and test the next best thing on your potato PC.
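The reason quantization matters here is simple arithmetic: the memory needed just to hold a 32-billion-parameter model's weights scales with the bit width. A minimal sketch (weights only; in practice activations and the KV cache add several more gigabytes on top):

```python
# Approximate memory footprint of the weights of a 32B-parameter model
# at different quantization levels. Weights only: real-world usage also
# needs room for activations and the KV cache.
PARAMS = 32e9  # 32 billion parameters

def weight_gb(bits_per_param: float) -> float:
    """Gigabytes needed to store the weights at a given bit width."""
    return PARAMS * bits_per_param / 8 / 1e9

for bits in (16, 8, 4, 2):
    fits = "fits" if weight_gb(bits) <= 24 else "exceeds"
    print(f"{bits:>2}-bit: {weight_gb(bits):5.1f} GB ({fits} a 24GB GPU)")
```

At 16-bit precision the weights alone need about 64GB, far beyond a consumer card; only at around 4-bit (roughly 16GB) do they fit comfortably on a 24GB RTX 4090, which is why the low-bit variants exist at all.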

Just be careful: the developers warn that such levels of quantization are "not recommended for most purposes."

Edited by Andrew Hayward



