AI2 Tulu-3 surpasses DeepSeek V3 🤖, Mistral Small 3 3️⃣, Figure on humanoid robot safety 🦺

TLDR

Together With Dataiku

TLDR AI 2025-01-31

See How Dataiku Drove 80% Time Savings on Manual Processes ⏰⬇️ (Sponsor)

Yes, we said it. Customer interviews & financial analysis found that a composite organization experienced benefits of $23.5 million over three years and an ROI of 413% with Dataiku. Plus, 80% time savings on manual processes, reduced costs, and improved decision making on key business activities. 

We call that a Return on AI! Get the full study & learn more here.

🚀

Headlines & Launches

AI2's New Model Surpasses DeepSeek V3 (5 minute read)

AI2's Tulu-3, a 405B parameter open-weight language model, surpasses DeepSeek V3 and even OpenAI's GPT-4o on key benchmarks.

Mistral Small 3 (6 minute read)

Mistral has released a 24B model that achieves strong performance, especially on multilingual data. Its size strikes a balance between capability and ease of deployment.

Figure AI details plan to improve humanoid robot safety in the workplace (4 minute read)

Figure AI is establishing the Center for the Advancement of Humanoid Safety to address gaps in safety for robots in workplaces. Led by former Amazon Robotics safety engineer Rob Gruendel, the initiative will focus on testing and certifying robots to industrial safety standards. The company aims to provide transparency with quarterly updates on testing processes and improvements.

🧠

Research & Innovation

AI-Powered Bioacoustic Monitoring (19 minute read)

Acoupi is an open-source Python framework that simplifies the deployment of AI-based bioacoustic monitoring on low-cost devices. It integrates recording, processing, and real-time messaging.

3D Occupancy Prediction (9 minute read)

SliceOcc introduces a novel vertical slice representation for 3D semantic occupancy prediction in dense indoor environments. It achieves state-of-the-art performance using an RGB camera-based model.

Explainable Query Optimization (21 minute read)

Reqo is a new query optimization model that leverages Bi-GNN and probabilistic ML to improve cost estimation accuracy. It introduces an explainability technique that highlights the contribution of query subgraphs.

🧑‍💻

Engineering & Resources

Finally, an ad spot techies won't skip (Sponsor)

Tech inboxes are crowded, but TLDR stands out, and so do the ads inside. With over 5 million software devs, founders and other tech decision-makers tuning in daily, this is where ideas, tools, and products get noticed. Learn more about advertising with TLDR.

Bypassing LLM Guardrails with VIRUS (GitHub Repo)

VIRUS is a method designed for generating adversarial data that can bypass moderation systems and disrupt the safety alignment of large language models.

Rigging Chatbot Arena Rankings (GitHub Repo)

Researchers demonstrate that crowdsourced voting on Chatbot Arena can be manipulated to boost or lower model rankings using strategic rigging techniques, impacting the leaderboard's reliability.
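
As a rough illustration of why a pairwise-vote leaderboard is sensitive to a small block of coordinated votes, the sketch below applies a plain Elo update to two hypothetical models. The Elo rule, the K-factor of 32, the starting ratings, and the vote counts are all assumptions chosen for illustration; they are not taken from the linked repository, and the leaderboard's real rating method may differ.

```python
# Illustrative only: assumes a plain Elo update with K = 32. This is an
# assumption for the sketch, not the leaderboard's actual rating method.

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a: float, r_b: float, a_wins: bool, k: float = 32.0):
    """Return the new (A, B) ratings after one head-to-head vote."""
    delta = k * ((1.0 if a_wins else 0.0) - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def apply_votes(r_a: float, r_b: float, votes):
    """Apply a sequence of votes (True means A wins) in order."""
    for a_wins in votes:
        r_a, r_b = elo_update(r_a, r_b, a_wins)
    return r_a, r_b


# Hypothetical data: 20 honest votes split evenly between two models,
# then the same votes followed by 10 coordinated votes for model A.
honest = [True, False] * 10
rigged = honest + [True] * 10

print("honest:", apply_votes(1000.0, 1000.0, honest))
print("rigged:", apply_votes(1000.0, 1000.0, rigged))
```

Under these assumptions the honest votes leave the two ratings close together, while the ten extra votes open a clear gap in favor of model A.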
Qwen2.5-VL Cookbooks (GitHub Repo)

Qwen2.5-VL, a new vision-language model, has a companion set of cookbooks that show how to use the model for a variety of tasks.

🎁

Miscellaneous

Artificial intelligence is transforming middle-class jobs. Can it also help the poor? (4 minute read)

Generative AI adoption is surging globally, with 66% of leaders prioritizing AI skills over traditional experience. However, access disparities hinder adoption in developing regions, where only a small percentage can leverage GenAI due to limited digital infrastructure. Addressing infrastructure and education gaps is crucial to prevent AI from widening global inequalities.

A New Way to Test AI for Sentience: Make It Confront Pain (6 minute read)

Researchers from Google DeepMind and LSE conducted a study using a text-based game to explore AI "sentience," testing LLMs by having them choose between options with varying pain and pleasure associations. Findings revealed some models prioritized avoiding pain over scoring points, suggesting a potential framework for assessing AI consciousness.

AI's coding promises, and OpenAI's longevity push (6 minute read)

The second wave of AI coding is advancing, allowing models to prototype, test, and debug code, potentially moving developers into more supervisory roles. OpenAI has entered longevity science with a model that designs proteins to transform cells into stem cells, claiming results surpassing human efforts. Cleaner jet fuels from alternative sources are gaining momentum, promising significant emission reductions and prompting industry shifts.

Quick Links

Omi raises $2m to build the future of AI wearables (1 minute read)

Omi has raised $2M to develop an AI wearable meant to enhance its wearer's thinking and productivity.

AI isn't very good at history (3 minute read)

A new benchmark, Hist-LLM, revealed that leading LLMs like GPT-4, Llama, and Gemini struggle with high-level historical questions.

Oscar hopeful 'The Brutalist' used AI during production (2 minute read)

The filmmakers of 'The Brutalist', driven by budget constraints, used AI from Respeecher to enhance actors' Hungarian pronunciation and generate architectural drawings.

Thanks for reading,
Andrew Tan & Andrew Carr


