Power of Truly Open Source AI. OLMo 7B. Nomic Embed. HuggingChat Assistants. Eagle 7B. Objective Driven AI. Markov Chains & LMs. Transformer Circuits. Time-LLM. Exphormer. SymbolicAI. MambaTab.
The Power of Truly Open Source AI. The spin doctors of some big closed-AI companies have been busy inflating the “AGI is here soon, AGI will be an existential risk” bubble. But thankfully that bubble is deflating quickly, and even backfiring on them.
In the meantime, the open source AI community keeps stubbornly releasing truly open source, efficient, smallish yet powerful AI models that match or beat the closed AI models from big companies.
The reaction from these big closed AI companies: “Oh! Open source AI models are dangerous, we need to regulate open source AI. And btw: we’re dropping the trousers on pricing for our closed models.” A recent report from Stanford HAI thoroughly debunks the myths about dangerous open source AI and the exaggerations coming from the closed AI companies.
Truly open source AI research and models are the only way forward to advance AI.
A new, truly open source language model. Two days ago, the Allen Institute for AI (AI2) released OLMo 7B, a truly open source SOTA language model trained with Databricks Mosaic Model Training. OLMo is released under the Apache 2.0 license and comes with:
- Full training data used, training code, training logs, and training metrics
- Full model weights and 500+ model checkpoints
- Fine-tuning code and adapted models
Check out the blogpost, repo & tech report here: How to Get Started with OLMo SOTA truly open source LM.
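If you just want to poke at OLMo from Python, the standard transformers pattern should do it. A rough sketch, assuming the allenai/OLMo-7B Hub id from the release and that you follow the repo’s install instructions (the checkpoint shipped with its own integration code at launch):

```python
# pip install transformers  (plus AI2's OLMo integration package, per the repo -- assumption)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMo-7B"  # Hub id assumed from the release announcement
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Truly open source language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```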
A new, truly open source text embedding model. Also a few days ago, Nomic AI released Nomic Embed, a truly open source text embedding model that is SOTA on two main benchmarks. Nomic Embed has an 8192-token context length and beats OpenAI’s text-embedding-3-small. The model is released under the Apache 2.0 license and comes with the full training code, training data and model weights. Check out the blogpost, repo and tech report here: Introducing Nomic Embed: A Truly Open Text Embedding Model.
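As a quick taste, Nomic Embed can be driven through sentence-transformers. A minimal sketch, assuming the nomic-ai/nomic-embed-text-v1 checkpoint on the Hub and its task-prefix convention (check the model card for the exact prefixes):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True)

# Nomic Embed expects a task prefix on each input, e.g. search_document / search_query
docs = ["search_document: OLMo is a truly open source 7B language model."]
query = ["search_query: which open source LMs were released recently?"]

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)
print(doc_emb @ query_emb.T)  # cosine similarity, since embeddings are normalized
```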
Want to learn more about Nomic Embed? Check out this vid from the folks at LangChain: How to build a long context RAG app with OSS components from scratch using Nomic Embed 8k, Mistral-instruct 32k and Ollama.
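For the impatient, the bare bones of such a local RAG setup look roughly like this. A sketch, not the exact pipeline from the video, assuming a running Ollama server with the mistral and nomic-embed-text models pulled, plus langchain-community and chromadb installed:

```python
# pip install langchain langchain-community chromadb
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

# Tiny toy corpus; in the video this is real document chunks
docs = ["Eagle 7B is an attention-free RWKV-v5 model.",
        "Nomic Embed supports 8192-token inputs."]
vectorstore = Chroma.from_texts(docs, embedding=OllamaEmbeddings(model="nomic-embed-text"))
retriever = vectorstore.as_retriever()

prompt = ChatPromptTemplate.from_template(
    "Answer using this context:\n{context}\n\nQuestion: {question}")
llm = ChatOllama(model="mistral")

question = "Which model is attention-free?"
context = "\n".join(d.page_content for d in retriever.invoke(question))
answer = (prompt | llm | StrOutputParser()).invoke({"context": context, "question": question})
print(answer)
```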
And speaking of text embedding models, Salesforce Research just released the SFR-Embedding-Mistral model, now SOTA on the MTEB benchmark. The model was trained on top of two open source models: E5-mistral-7b-instruct and Mistral-7B-v0.1.
A new, fully open source SOTA multilingual model based on an RNN. Last week, a team of independent researchers backed by Stability AI and EleutherAI released Eagle 7B. The model beats all 7B open source models on the main multilingual benchmarks, and it’s super compute-efficient. The beauty of this model is that it’s an attention-free, linear transformer built on the RWKV-v5 architecture, which is based on an RNN. Check out the blogpost, repo, and demo here: Eagle 7B: Soaring past Transformers with 1 Trillion Tokens Across 100+ Languages (RWKV-v5).
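To make the “attention-free, RNN-style” point concrete: instead of a T×T attention matrix, these models carry a fixed-size state from token to token. Below is a deliberately toy, linear-attention-style recurrence, purely illustrative and not the actual RWKV-v5 code:

```python
# Toy RNN-style token mixing: O(T) time, constant-size state, no attention matrix.
import numpy as np

def recurrent_mix(keys, values, decay=0.9):
    """Process tokens one at a time with a fixed-size running key-value summary."""
    d_k, d_v = keys.shape[1], values.shape[1]
    state = np.zeros((d_k, d_v))                       # replaces the T x T attention matrix
    outputs = []
    for k_t, v_t in zip(keys, values):
        state = decay * state + np.outer(k_t, v_t)     # fold the new token into the state
        outputs.append(k_t @ state)                    # read out with the current key
    return np.stack(outputs)

T, d = 6, 4
out = recurrent_mix(np.random.randn(T, d), np.random.randn(T, d))
print(out.shape)  # (6, 4) -- one output per token
```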
Yesterday, Hugging Face released HuggingChat Assistants (blogpost, demo), a nice alternative to closed-model chat assistants that uses 6 top open source models. It’s still rather basic, but the idea is for the open source community to build out the several powerful features already planned.
This is such a cool open source AI project! ADeus: An Open-Source AI Wearable Device for less than $100 (repo, sw/hw list). It uses Ollama, Supabase and a Coral AI microcontroller (soon to be replaced by a Raspberry Pi Zero). Check out the intro vid:
Have a nice week.
10 Link-o-Troned
- Yann LeCun – Objective Driven AI: The Future of AI (video & slides)
- Markov Chains are the Original Language Models (see the toy sketch after this list)
- From Naive RAG to Advanced Agents
- The Ever-Growing Power of Small Models
- Four Approaches to ML Model Fitting: Gradient Flow
- [now open] AI Grant Batch 3 – Up to $2.5M
- [free e-book] ML for High-Risk Apps (469 pages)
- Hallucinating Law: Disturbing LLM Errors in Legal Tasks
- The Best Solution Write-ups from Kaggle 2023 Winners
- Anthropic – Ideas on Transformer Circuits & ML Interpretability
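Re the Markov chains link above: a word-level bigram “language model” really does fit in a dozen lines. A toy sketch:

```python
# Tiny word-level Markov chain LM: count bigrams, then sample the next word
# from the empirical next-word distribution. Toy corpus, purely illustrative.
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()

transitions = defaultdict(list)
for current_word, next_word in zip(corpus, corpus[1:]):
    transitions[current_word].append(next_word)

word = "the"
generated = [word]
for _ in range(8):
    word = random.choice(transitions[word])   # sample next word given the current one
    generated.append(word)
print(" ".join(generated))
```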
the ML Pythonista
- Programming Foundation Models with DSPy Explained
- A Simple Implementation of Mamba Selective State Spaces in PyTorch (see the sketch after this list)
- Phinetuning 2.0: How to Fine-tune Phi-2 with Synth Data & QLoRA
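Re the Mamba link above: the core idea is a state space recurrence whose parameters depend on the input (the “selective” part). Here’s a deliberately simplified, sequential PyTorch sketch of that recurrence, not the fused parallel-scan implementation from the linked repo:

```python
# Bare-bones selective SSM (Mamba-style) recurrence, with a toy discretization.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToySelectiveSSM(nn.Module):
    def __init__(self, d_model: int, d_state: int = 16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(d_model, d_state))   # negative = decaying state
        self.to_B = nn.Linear(d_model, d_state)                 # input-dependent B_t
        self.to_C = nn.Linear(d_model, d_state)                 # input-dependent C_t
        self.to_dt = nn.Linear(d_model, d_model)                # input-dependent step size

    def forward(self, x):                                       # x: (T, D)
        T, D = x.shape
        h = torch.zeros(D, self.A.shape[1])                     # hidden state per channel
        ys = []
        for t in range(T):
            dt = F.softplus(self.to_dt(x[t]))                   # (D,)
            A_bar = torch.exp(dt[:, None] * self.A)             # (D, N) discretized decay
            B_t, C_t = self.to_B(x[t]), self.to_C(x[t])         # (N,), (N,)
            h = A_bar * h + dt[:, None] * x[t][:, None] * B_t   # selective state update
            ys.append(h @ C_t)                                   # (D,) readout
        return torch.stack(ys)                                   # (T, D)

y = ToySelectiveSSM(d_model=8)(torch.randn(32, 8))
print(y.shape)  # torch.Size([32, 8])
```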
Deep & Other Learning Bits
- DeepMind – Transfer Learning for Text Diffusion Models
- Google Exphormer: Scaling Transformers for Graph-Structured Data
- Google – A Decoder-only Foundation Model for Time-series Forecasting
AI/DL ResearchDocs
- Time-LLM: SOTA Time Series Forecasting by Reprogramming LLMs
- SymbolicAI: Combining Probabilistic Programming and GenAI
- MambaTab: SOTA Model for Tabular Tasks with S-SSM (No Transformers)
MLOps Untangled
- MLOps: From Jupyter to Prod. (blog, vid, repo)
- MLOps at The Crossroads and New Tools
- Auto Signature Recognition MLOps Pipeline on AWS at CapGemini
data v-i-s-i-o-n-s
- Friends Don’t Let Friends Make Bad Graphs
- Dep Tree – Visualise the Entropy of Your Code Base in 3D
- [free e-book] Handbook of Graphs and Networks in People Analytics
AI startups -> radar
- Nabla – An AI-Copilot for Doctors
- Kode – A No-code Platform for AI Enterprise Apps
- Cohere Health – AI for Automating Health Plan Authorisations
ML Datasets & Stuff
- AutoMathText Dataset – 200 GB of Mathematical Texts
- OpenHermes-2.5 – 1 Million Chat Conversations (see the loading sketch after this list)
- Dolma Dataset – 3 trillion tokens from web, academic pubs, code, books
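A quick way to peek at these is the Hugging Face datasets library. A sketch, with the Hub repo ids below assumed from the release announcements; some of these need a config name and/or license acceptance, so check each dataset card:

```python
# pip install datasets
from datasets import load_dataset

# OpenHermes-2.5: ~1M chat samples, small enough to load normally
hermes = load_dataset("teknium/OpenHermes-2.5", split="train")
print(hermes[0])

# Dolma is ~3T tokens, so stream it instead of downloading everything
# (you may need to pass a config/version name -- see the dataset card)
dolma = load_dataset("allenai/dolma", split="train", streaming=True)
print(next(iter(dolma)))
```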
Postscript, etc
Keep up with the very latest in AI / Machine Learning research, projects & repos. A weekly digest packed with AI / ML insights & updates that you won’t find elsewhere.
Submit your suggestions, feedback, posts and links to: