All HF Hub posts

m-ric
posted an update 1 day ago
📜 Old-school RNNs can actually rival fancy transformers!

Researchers from Mila and Borealis AI have just shown that simplified versions of the good old Recurrent Neural Networks (RNNs) can match the performance of today's transformers.

They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014). They stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
โถ Removed dependencies on previous hidden states in the gates
โท Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
โธ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

โšก๏ธ As a result, you can use a โ€œparallel scanโ€ algorithm to train these new, minimal RNNs, in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences

🔥 The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers and Mamba.

And for Language Modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

🤔 Why does this matter?

By showing that much simpler models can reach similar performance, this work challenges the narrative that we need ever more sophisticated architectures to make progress!

💬 François Chollet wrote in a tweet about this paper:

"The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)"

"Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape."

It's Rich Sutton's Bitter Lesson striking again: you don't need fancy hand-crafted architectures, just scale up your model and data!

Read the paper 👉 Were RNNs All We Needed? (2410.01201)
MoritzLaurer
posted an update 2 days ago
#phdone - I defended my PhD yesterday! A key lesson: it is amazing how open science and open source can empower beginners with limited resources:

I first learned about instruction-based classifiers like BERT-NLI 3-4 years ago, through the @HuggingFace ZeroShotClassificationPipeline. Digging deeper, I found it surprisingly easy to discover new datasets, newer base models, and reusable fine-tuning scripts on the HF Hub to create my own zero-shot models, even though I didn't know much about fine-tuning at the time.
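For anyone who hasn't tried it, the pipeline boils down to a few lines (a minimal example; facebook/bart-large-mnli is just one commonly used NLI base model):

```python
from transformers import pipeline

# NLI-based zero-shot classification: any label set, no task-specific training.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The new EU regulation imposes strict reporting duties on large platforms.",
    candidate_labels=["politics", "sports", "technology"],
)
print(result["labels"][0], result["scores"][0])  # top label and its score
```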

Thanks to the community effect of the Hub, my models were downloaded hundreds of thousands of times within a few months. Seeing my research being useful to people motivated me to improve and upload newer models. Leaving my contact details in the model cards led to academic collaborations and consulting contracts (and eventually my job at HF).

That's the power of open science & open source: learning, sharing, improving, collaborating.

I mean every word in my thesis acknowledgments (screenshot). I'm very grateful to my supervisors @vanatteveldt @CasAndreu @KasperWelbers for their guidance; to @profAndreaRenda and @CEPS_thinktank for enabling me to work part-time during the first year; to @huggingface for creating awesome tools and an awesome platform; and to many others who are not active on social media.

Links to the full thesis and the collection of my most recent models are below.

PS: If someone happens to speak Latin, let me know if my diploma contains some hidden Illuminati code or something :D
clem
posted an update about 22 hours ago
Very few people realize that most of the successful AI startups became successful because they focused on open science and open source for at least their first few years. To name but a few: OpenAI (GPT and GPT-2 were open source), Runway & Stability (Stable Diffusion), Cohere, Mistral, and of course Hugging Face!

The reasons are not just altruistic: sharing your science and your models pushes you to build AI faster (key in a fast-moving domain like AI), attracts the best scientists and engineers, and generates much more visibility, usage, and community contributions than staying 100% closed-source. The same applies to big tech companies, as we're seeing with Meta and Google!

More startups and companies should release research and open-source AI; it's not just good for the world, it also increases their probability of success!
TuringsSolutions
posted an update 2 days ago
Hyperdimensional Computing + Neural Network, tell your friends. To my knowledge, this is a completely novel implementation of HDC + neural networks. It would be a direct competitor to Transformers: it is off the charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place), and it is far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise so far: adding an attention mechanism to the model does almost nothing, maybe a 1% performance increase. Weirdest thing. I guess Attention Is Not All You Need?


I made a GitHub repository for my Hyperdimensional Computing Neural Network: https://github.com/RichardAragon/HyperDimensionalComputingNeuralNetwork
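For the curious, here is a minimal sketch of the general HDC + neural network recipe (random bipolar hypervectors for symbols, bundling to compose them, and a small neural head on top). This illustrates the concept only; it is not the code from the repo:

```python
import numpy as np
import torch
import torch.nn as nn

DIM = 10_000  # HDC representations are typically ~10k dimensions
rng = np.random.default_rng(0)

# Each symbol (e.g., token) gets a fixed random bipolar hypervector.
vocab = ["the", "cat", "sat", "on", "mat"]
codebook = {w: rng.choice([-1.0, 1.0], size=DIM) for w in vocab}

def encode(tokens):
    """Bundle a token sequence: elementwise sum, then sign-threshold."""
    return np.sign(np.sum([codebook[t] for t in tokens], axis=0))

# A small neural head maps hypervectors to class logits; composition is
# done by cheap HDC operations rather than by attention.
head = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 2))

x = torch.tensor(encode(["the", "cat", "sat"]), dtype=torch.float32)
print(head(x))  # logits for a 2-class toy task
```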


I made a YouTube video showcasing the model and some of my experiments with it: https://youtu.be/Eg51o519zVM
zamal
posted an update 2 days ago
🚀 New Model Release: zamal/Molmo-7B-GPTQ-4bit 🚀

Hello lovely community,

The zamal/Molmo-7B-GPTQ-4bit model is now available for all! It has been heavily quantized, reducing its size by almost six times. It now occupies significantly less disk space and VRAM, making it perfect for deployment on resource-constrained devices without compromising performance.

Now we get:
Efficient performance: maintains high accuracy despite the aggressive quantization.
Reduced size: the model is nearly six times smaller, optimizing storage and memory usage.
Versatile application: ideal for integrating a powerful visual language model into various projects, particularly multimodal RAG chains.
Check it out!
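A loading sketch with transformers is below (hedged: Molmo ships custom modeling code, so the exact processor and generation calls may differ, and "example.jpg" is a placeholder; the model card is the authoritative reference):

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "zamal/Molmo-7B-GPTQ-4bit"

# Molmo uses custom modeling code, hence trust_remote_code=True.
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",  # place the 4-bit weights on the available GPU(s)
)

# "example.jpg" is a placeholder image path.
inputs = processor.process(
    images=[Image.open("example.jpg")],
    text="Describe this image.",
)
# Generation goes through the custom code (generate_from_batch in the
# upstream Molmo release); see the model card for the exact call.
```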

Jaward
posted an update 1 day ago
New hobby: creating AI research paper art, lol. Using PyMuPDF to extract text and add a background, then animating with Runway :) Code coming soon…
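In the meantime, the extraction step is roughly this simple with PyMuPDF (a minimal sketch; "paper.pdf" is a placeholder path):

```python
import fitz  # PyMuPDF

doc = fitz.open("paper.pdf")  # placeholder: any research paper PDF
for page in doc:
    text = page.get_text()  # plain text of the page
    print(text[:200])       # peek at the first 200 characters
```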
louisbrulenaudet
posted an update 2 days ago
My biggest release of the year: a series of 7 specialized embedding models for information retrieval within tax documents is now available for free on Hugging Face 🤗

These new models aim to offer an open-source alternative for in-domain semantic search over large text corpora and will improve RAG systems and context addition for large language models.

Trained on more than 43 million tax tokens derived from semi-synthetic and raw-synthetic data, enriched by various methods (in particular MSFT's evol-instruct by @intfloat ), and corrected by humans, this project is the fruit of hundreds of hours of work and is the culmination of a global effort to open up legal technologies that has only just begun.

A big thank you to Microsoft for Startups for giving me access to state-of-the-art infrastructure to train these models, and to @julien-c , @clem 🤗, @thomwolf and the whole HF team for the inference endpoint API and the generous provision of Meta Llama-3.1-70B. Special thanks also to @tomaarsen for his invaluable advice on training embedding models and loss functions ❤️

Models are available on my personal HF page, in the Lemone-embed collection: louisbrulenaudet/lemone-embed-66fdc24000df732b395df29b
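Usage follows the standard sentence-transformers pattern; a sketch below, where the model id is a placeholder for whichever model you pick from the collection:

```python
from sentence_transformers import SentenceTransformer

# Placeholder id: substitute any model from the Lemone-embed collection.
model = SentenceTransformer("louisbrulenaudet/lemone-embed-pro")

queries = ["Quelles sont les conditions d'exonération de TVA ?"]
documents = ["Article 261 du CGI : sont exonérées de la TVA ..."]

q_emb = model.encode(queries)
d_emb = model.encode(documents)
print(model.similarity(q_emb, d_emb))  # cosine similarity by default
```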
m-ric
posted an update 2 days ago
🇨🇳⛵️ 出海 ("sailing abroad"): Chinese AI is expanding globally

Fact: Chinese LLMs are heavily underrated; see for instance the recent, excellent DeepSeek-V2.5, or the Qwen models.

Luckily for us, @AdinaY just wrote an excellent blog post explaining the Chinese AI ecosystem!

My key takeaways:

Since Google, OpenAI, and Anthropic models are not available in China, local companies are fighting over the market. And it's a really good market: AI has much higher penetration there than in the rest of the world, among both companies and individual users!

💰 But after DeepSeek heavily cut its prices in May 2024, competition spiraled into a price war, creating a cut-throat environment with unsustainably low prices.

📋 On top of this, local regulation is stringent: models must obtain a license from a local censor (the Cyberspace Administration of China), which for instance requires models to refuse to answer certain questions about the CCP. Although this is certainly simpler to implement than certain conditions of the European AI Act.

💸 If this wasn't enough, VC investment in AI is drying up: by mid-2024, Chinese AI startups had raised approximately $4.4 billion, versus $55B for US startups in Q2 2024 alone.

📱 To reach profitability, companies have shifted from foundation models to model + application, for instance PopAI from [01.AI](http://01.ai/), which has millions of users and high profitability.

⛏️ They are also trying to drill down into specific industries, but these niches too are getting crowded.

➡️ Since their home market is becoming both too crowded and inhospitable, Chinese companies are now going for international markets, "sailing abroad" (出海), the expression consecrated by Zheng He's legendary voyages in the early 15th century.

There, they'll have to adapt to different infrastructures and regulations, but they have bright prospects for growth!

Read her post 👉 https://huggingface.co/blog/AdinaY/chinese-ai-global-expansion
YerbaPage
posted an update 2 days ago
We propose MGDebugger, a hierarchical bottom-up LLM code debugger 🔥 that can fix bugs from low-level syntax errors to high-level algorithmic flaws.

It achieves an ⭐️ 18.9% improvement in accuracy over seed generations on HumanEval and a ⭐️ 97.6% repair success rate on HumanEvalFix.

Paper available at https://arxiv.org/abs/2410.01215.
Code and demo available at https://github.com/YerbaPage/MGDebugger.
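A rough skeleton of the hierarchical bottom-up loop, for intuition only (illustrative, not the actual implementation; `run_tests` and `llm_fix` are stand-ins to be filled in):

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    """A node in the decomposition tree: a code unit plus its tests."""
    code: str
    tests: list
    children: list = field(default_factory=list)

def run_tests(code: str, tests: list) -> bool:
    """Stand-in: execute the code against its tests, report pass/fail."""
    raise NotImplementedError

def llm_fix(code: str, tests: list) -> str:
    """Stand-in: ask an LLM to repair the code given the failing tests."""
    raise NotImplementedError

def debug_bottom_up(unit: Unit) -> str:
    """Fix leaf subfunctions first, splice the fixes back into the parent,
    then repair the parent's own composition logic."""
    for child in unit.children:
        fixed = debug_bottom_up(child)
        unit.code = unit.code.replace(child.code, fixed)
    while not run_tests(unit.code, unit.tests):
        unit.code = llm_fix(unit.code, unit.tests)
    return unit.code
```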