All HF Hub posts

m-ric
posted an update 1 day ago
📜 Old-school RNNs can actually rival fancy transformers!

Researchers from Mila and Borealis AI have just shown that simplified versions of the good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
โถ Removed dependencies on previous hidden states in the gates
โท Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
โธ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

โšก๏ธ As a result, you can use a โ€œparallel scanโ€ algorithm to train these new, minimal RNNs, in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences

🔥 The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers and Mamba.

And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

🤔 Why does this matter?

By showing that much simpler models can reach similar performance, this work challenges the narrative that we need ever more sophisticated architectures to make progress!

💬 François Chollet wrote in a tweet about this paper:

"The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)"

"Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape."

It's Rich Sutton's Bitter Lesson striking again: you don't need fancy hand-crafted architectures, just scale up your model and data!

Read the paper 👉 Were RNNs All We Needed? (2410.01201)
MoritzLaurer
posted an update 2 days ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper, I found it surprisingly easy to discover new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zero-shot models, even though I didn't know much about fine-tuning at the time.
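For anyone who hasn't tried it, the pipeline boils down to a few lines (a minimal example; facebook/bart-large-mnli is just one commonly used NLI base model):

```python
from transformers import pipeline

# NLI-based zero-shot classification: any label set, no task-specific training.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new EU regulation imposes strict reporting duties on large platforms.",
    candidate_labels=["politics", "sports", "technology"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```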

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times within a few months. Seeing my research being useful to people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic collaborations and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
clem
posted an update about 22 hours ago
Very few people realize that most of the successful AI startups became successful because they focused on open science and open source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open source), Runway & Stability (Stable Diffusion), Cohere, Mistral, and of course Hugging Face!

The reasons are not just altruistic: sharing your science and your models pushes you to build AI faster (key in a fast-moving domain like AI), attracts the best scientists and engineers, and generates much more visibility, usage, and community contributions than staying 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!

More startups and companies should release research and open-source AI; it's not just good for the world, it also increases their probability of success!
TuringsSolutions
posted an update 2 days ago
Hyperdimensional Computing + Neural Network, tell your friends. To my knowledge, this is a completely novel implementation of HDC + neural networks. It would be a direct competitor to Transformers: it is off the charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place), and it is far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise so far: adding an attention mechanism to the model does almost nothing, maybe a 1% performance increase. Weirdest thing. I guess Attention Is Not All You Need?


I made a GitHub repository for my Hyperdimensional Computing Neural Network: https://github.com/RichardAragon/HyperDimensionalComputingNeuralNetwork
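For the curious, here is a minimal sketch of the general HDC + neural network recipe (random bipolar hypervectors for symbols, bundling to compose them, and a small neural head on top). This illustrates the concept only; it is not the code from the repo:

```python
import numpy as np
import torch
import torch.nn as nn

DIM = 10_000  # HDC representations are typically ~10k dimensions
rng = np.random.default_rng(0)

# Each symbol (e.g., token) gets a fixed random bipolar hypervector.
vocab = ["the", "cat", "sat", "on", "mat"]
codebook = {w: rng.choice([-1.0, 1.0], size=DIM) for w in vocab}

def encode(tokens):
    """Bundle a token sequence: elementwise sum, then sign-threshold."""
    return np.sign(np.sum([codebook[t] for t in tokens], axis=0))

# A small neural head maps hypervectors to class logits; composition is
# done by cheap HDC operations rather than by attention.
head = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.tensor(encode(["the", "cat", "sat"]), dtype=torch.float32)
print(head(x))  # logits for a 2-class toy task
```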


I made a YouTube video showcasing the model and some of my experiments with it: https://youtu.be/Eg51o519zVM
zamal
posted an update 2 days ago
🚀 New Model Release: zamal/Molmo-7B-GPTQ-4bit 🚀

Hello lovely community,

The zamal/Molmo-7B-GPTQ-4bit model is now available for all! It has been heavily quantized, reducing its size by almost six times. It now occupies significantly less disk space and VRAM, making it perfect for deployment on resource-constrained devices without compromising performance.

Now we get:
Efficient performance: maintains high accuracy despite the aggressive quantization.
Reduced size: the model is nearly six times smaller, optimizing storage and memory usage.
Versatile application: ideal for integrating a powerful visual language model into various projects, particularly multimodal RAG chains.
Check it out!
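A loading sketch with transformers is below (hedged: Molmo ships custom modeling code, so the exact processor and generation calls may differ, and "example.jpg" is a placeholder; the model card is the authoritative reference):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "zamal/Molmo-7B-GPTQ-4bit"

# Molmo uses custom modeling code, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # place the 4-bit weights on the available GPU(s)
)

# "example.jpg" is a placeholder image path.
inputs = processor.process(
    images=[Image.open("example.jpg")],
    text="Describe this image.",
)
# Generation goes through the custom code (generate_from_batch in the
# upstream Molmo release); see the model card for the exact call.
```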

Jaward
posted an update 1 day ago
New hobby: creating AI research paper art, lol. Using PyMuPDF to extract text and add a background, then animating with Runway :) Code coming soon…
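In the meantime, the extraction step is roughly this simple with PyMuPDF (a minimal sketch; "paper.pdf" is a placeholder path):

```python
import fitz  # PyMuPDF

doc = fitz.open("paper.pdf")  # placeholder: any research paper PDF
for page in doc:
    text = page.get_text()  # plain text of the page
    print(text[:200])       # peek at the first 200 characters
```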
louisbrulenaudet
posted an update 2 days ago
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on Hugging Face 🤗

These new models aim to offer an open-source alternative for in-domain semantic search over large text corpora and will improve RAG systems and context addition for large language models.

Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @intfloat ), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.

A big thank you to Microsoft for Startups for giving me access to state-of-the-art infrastructure to train these models, and to @julien-c , @clem 🤗, @thomwolf and the whole HF team for the inference endpoint API and the generous provision of Meta Llama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and loss functions ❤️

Models are available on my personal HF page, in the Lemone-embed collection: louisbrulenaudet/lemone-embed-66fdc24000df732b395df29b
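Usage follows the standard sentence-transformers pattern; a sketch below, where the model id is a placeholder for whichever model you pick from the collection:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id: substitute any model from the Lemone-embed collection.
model = SentenceTransformer("louisbrulenaudet/lemone-embed-pro")

queries = ["Quelles sont les conditions d'exonération de TVA ?"]
documents = ["Article 261 du CGI : sont exonérées de la TVA ..."]

q_emb = model.encode(queries)
d_emb = model.encode(documents)
print(model.similarity(q_emb, d_emb))  # cosine similarity by default
```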
m-ric
posted an update 2 days ago
🇨🇳⛵️ 出海 ("sailing abroad"): Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent, excellent DeepSeek-V2.5, or the Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI, and Anthropic models are not available in China, local companies are fighting over the market. And it's a really good market: AI has much higher penetration there than in the rest of the world, among both companies and individual users!

💰 But after DeepSeek heavily cut its prices in May 2024, competition spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must obtain a license from a local censor (the Cyberspace Administration of China), which for instance requires models to refuse to answer certain questions about the CCP. Although this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, versus $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundation models to model + application, for instance PopAI from [01.AI](http://01.ai/), which has millions of users and high profitability.

⛏️ They are also trying to drill down into specific industries, but these niches too are getting crowded.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for international markets, "sailing abroad" (出海), the expression consecrated by Zheng He's legendary voyages in the early 15th century.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
YerbaPage
posted an update 2 days ago
We propose MGDebugger, a hierarchical bottom-up LLM code debugger 🔥 that can fix bugs from low-level syntax errors to high-level algorithmic flaws.

It achieves an ⭐️ 18.9% improvement in accuracy over seed generations on HumanEval and a ⭐️ 97.6% repair success rate on HumanEvalFix.

Paper available at https://arxiv.org/abs/2410.01215.
Code and demo available at https://github.com/YerbaPage/MGDebugger.
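A rough skeleton of the hierarchical bottom-up loop, for intuition only (illustrative, not the actual implementation; `run_tests` and `llm_fix` are stand-ins to be filled in):

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """A node in the decomposition tree: a code unit plus its tests."""
    code: str
    tests: list
    children: list = field(default_factory=list)

def run_tests(code: str, tests: list) -> bool:
    """Stand-in: execute the code against its tests, report pass/fail."""
    raise NotImplementedError

def llm_fix(code: str, tests: list) -> str:
    """Stand-in: ask an LLM to repair the code given the failing tests."""
    raise NotImplementedError

def debug_bottom_up(unit: Unit) -> str:
    """Fix leaf subfunctions first, splice the fixes back into the parent,
    then repair the parent's own composition logic."""
    for child in unit.children:
        fixed = debug_bottom_up(child)
        unit.code = unit.code.replace(child.code, fixed)
    while not run_tests(unit.code, unit.tests):
        unit.code = llm_fix(unit.code, unit.tests)
    return unit.code
```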