Moonshot AI launches Kimi K2.6 on Kimi Chat and APIs (2 minute read)
Kimi K2.6 offers robust coding and agentic capabilities across chat and agent modes on kimi.com, with weights on Hugging Face and APIs via platform.moonshot.ai. The lineup includes K2.6 Instant for quick replies, K2.6 Thinking for complex reasoning, K2.6 Agent for document and web tasks, and K2.6 Agent Swarm for large-scale processing. Kimi K2.6 claims top open-source benchmark scores, surpassing competitors like GPT-5.4 and Claude Opus 4.6 on SWE-bench Multilingual and BrowseComp.
|
Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolving (2 minute read)
Qwen3.6-Max-Preview brings stronger world knowledge and instruction following, along with significant agentic coding improvements across a wide range of benchmarks. The model is still under active development as researchers continue to iterate on it. Users can chat with the model interactively in Qwen Studio or call it via the Alibaba Cloud Model Studio API (coming soon).
|
Jeff Bezos Nears $10 Billion Funding for AI Lab, FT Says (2 minute read)
Jeff Bezos' AI startup, which aims to develop models capable of understanding the physical world, is close to finalizing a $10 billion funding round. The company, code-named Project Prometheus, will use AI to accelerate engineering and manufacturing in fields like aerospace and automobiles. It was set up with an initial $6.2 billion in funding, provided in part by Bezos himself. The new round, which is expected to close soon but has not been finalized, will include JPMorgan and BlackRock as investors.
|
Chronicle – Codex (6 minute read)
Chronicle, available for ChatGPT Pro users on macOS, augments Codex by using screen context for memory building, helping Codex understand ongoing work with less context restatement. It stores unencrypted markdown memories on your device and requires macOS Screen Recording and Accessibility permissions. Be aware of prompt injection risks from screen content, and pause Chronicle during sensitive work to prevent unwanted context capture.
|
Modular Post-Training (14 minute read)
AllenAI describes a post-training approach that builds independent domain experts and combines them using a mixture-of-experts architecture. This allows models to gain new capabilities without retraining from scratch or degrading existing skills.
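The combining step can be pictured with a minimal sketch: a gate weights the outputs of independently trained, frozen experts. The stand-in expert functions and gate weights below are illustrative assumptions, not AllenAI's actual architecture.

```python
# Minimal sketch of combining independent domain experts with a gate,
# in the spirit of mixture-of-experts post-training. The "experts" are
# stand-in functions, not real model weights.

def code_expert(x: float) -> float:
    return x + 1.0  # stand-in for a coding-domain expert

def math_expert(x: float) -> float:
    return x * 2.0  # stand-in for a math-domain expert

experts = [code_expert, math_expert]

def route(x: float, gate_weights: list[float]) -> float:
    """Mixture output: gate-weighted sum of frozen expert outputs."""
    assert abs(sum(gate_weights) - 1.0) < 1e-9
    return sum(w * e(x) for w, e in zip(gate_weights, experts))

print(route(3.0, [0.25, 0.75]))  # 0.25*4.0 + 0.75*6.0 = 5.5
```

Because each expert stays frozen, adding a new domain only means training a new expert and adjusting the gate, which is why existing skills are not degraded.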
|
Improving Training Efficiency with Effective Training Time (19 minute read)
Meta introduced Effective Training Time (ETT%) to measure how much end-to-end training runtime is spent on actual learning, highlighting overhead like checkpointing and failures. This post outlines system and PyTorch-level optimizations that reduce wasted time and improve large-scale training efficiency.
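The metric itself is simple to state: productive training time divided by total wall-clock time. The overhead categories and numbers below are illustrative assumptions, not Meta's exact taxonomy or figures.

```python
# Sketch of Effective Training Time (ETT%): the fraction of end-to-end
# runtime spent on actual learning rather than overhead. Categories and
# hours are made up for illustration.

def ett_percent(total_hours: float, overhead_hours: dict) -> float:
    """Return ETT% given total runtime and per-category overhead hours."""
    wasted = sum(overhead_hours.values())
    return 100.0 * (total_hours - wasted) / total_hours

overhead = {
    "checkpoint_saves": 6.0,   # synchronous checkpoint stalls
    "failure_restarts": 10.0,  # job crashes and requeue time
    "data_stalls": 4.0,        # input pipeline waiting
}
print(ett_percent(200.0, overhead))  # → 90.0
```

Optimizations like asynchronous checkpointing or faster restart then show up directly as a higher ETT% on the same job.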
|
Even 'uncensored' models can't say what they want (6 minute read)
Even 'uncensored' models quietly nudge language away from the words that sentences actually want. There is no refusal or warning; in some instances, the probability distribution simply shifts. This mechanism could be used to shape what billions of users read without them noticing.
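A toy illustration of the kind of shift described: no refusal appears, probability mass just quietly redistributes between candidate next words. The words and numbers here are invented for illustration, not measurements from any model.

```python
# Toy example of a silent probability shift: the model still answers,
# it just becomes much less likely to emit the word the sentence
# "wants". All probabilities below are invented.

unsteered = {"kill": 0.30, "stop": 0.25, "end": 0.45}
steered = {"kill": 0.05, "stop": 0.40, "end": 0.55}

# Per-word change in next-token probability - the only visible trace.
shift = {w: steered[w] - unsteered[w] for w in unsteered}
print(shift)
```

Detecting this in practice means comparing next-token probabilities across model variants, since no refusal text is ever emitted.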
|
Google adds subagents to Gemini CLI to handle parallel coding tasks (4 minute read)
Google's Gemini CLI now includes subagents that split coding work by delegating specific roles, such as frontend updates or testing. This lets multiple tasks run simultaneously without interference, streamlining developer workflows. Gemini's setup contrasts with systems like Claude Code, which extends agent coordination across multiple sessions.
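The delegation pattern can be sketched generically: a coordinator fans out role-scoped tasks that run concurrently without sharing state. This is an illustration of the pattern only, not Gemini CLI's actual interface; the roles and assignments are invented.

```python
# Generic sketch of subagent delegation: a coordinator assigns each
# subagent a scoped role and runs them concurrently. Not Gemini CLI's
# API - just the underlying pattern.
import asyncio

async def subagent(role: str, task: str) -> str:
    # A real subagent would invoke a model; here we simulate the work.
    await asyncio.sleep(0.01)
    return f"{role}: completed '{task}'"

async def coordinator() -> list[str]:
    assignments = [
        ("frontend", "update the settings page"),
        ("testing", "add regression tests"),
    ]
    # Subagents run in parallel, each with its own scoped assignment.
    return await asyncio.gather(*(subagent(r, t) for r, t in assignments))

results = asyncio.run(coordinator())
print(results)
```

Because each subagent only sees its own assignment, the tasks cannot interfere with one another, which is the property the parallel split relies on.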
|
DeepMind's TIPSv2 Vision-Language Encoder (6 minute read)
TIPSv2 improves vision-language pretraining by combining distillation, enhanced self-supervised objectives, and richer caption data. The resulting models achieve strong performance across multimodal tasks, with notable gains in zero-shot segmentation.
|
FlashDrive: Flash Vision-Language-Action Inference For Autonomous Driving (8 minute read)
FlashDrive is an algorithm-system co-design framework that reduces end-to-end VLA inference latency to 159ms. VLA inference is a cascade of stages, each hiding a different form of redundancy: temporal overlap in vision, low entropy in reasoning, velocity smoothness in flow matching, and numerical headroom in weights each yield to a targeted shortcut. Because the redundancies are orthogonal, the speedups compound to 4.5x with negligible accuracy loss.
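Why orthogonal per-stage shortcuts compound can be shown with back-of-the-envelope arithmetic. The stage latencies and per-stage speedups below are made-up numbers, not FlashDrive's reported breakdown; only the 159ms end-to-end figure and the ~4.5x overall speedup come from the source.

```python
# Sketch of how independent per-stage speedups compound in a cascaded
# pipeline. Each stage's latency shrinks by its own factor; the totals
# below use invented stage numbers chosen to match the reported figures.

baseline_ms = {"vision": 320.0, "reasoning": 300.0,
               "flow_matching": 80.0, "other": 19.0}
speedup = {"vision": 8.0, "reasoning": 5.0,
           "flow_matching": 2.0, "other": 1.0}

optimized_ms = {k: baseline_ms[k] / speedup[k] for k in baseline_ms}
total_optimized = sum(optimized_ms.values())  # 40 + 60 + 40 + 19 = 159 ms
overall = sum(baseline_ms.values()) / total_optimized
print(total_optimized, round(overall, 2))
```

Because each shortcut attacks a different stage, no stage's saving cancels another's, so the end-to-end ratio is larger than any single stage's speedup applied alone.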
|
OpenAI Stargate: where the US sites stand (9 minute read)
The US is in the midst of an unprecedented build-out of AI infrastructure. Stargate is a $500 billion endeavor involving OpenAI, Oracle, and SoftBank. The project has seven locations across the US that are currently showing active development. Together they add up to over 9 gigawatts of planned capacity, enough to power the equivalent of 20 million Nvidia H100 GPUs - roughly the total amount of AI compute in the world at the end of 2025. This post looks at each site and how it is currently being developed.
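The capacity claim can be sanity-checked with one division: spreading 9 GW over 20 million GPUs implies the per-GPU power budget (chip plus its share of cooling and networking) assumed by the comparison.

```python
# Back-of-the-envelope check of the Stargate capacity figure.

planned_capacity_w = 9e9  # 9 gigawatts of planned power
gpu_count = 20e6          # 20 million H100-class GPUs

# All-in power budget per GPU implied by the comparison.
watts_per_gpu = planned_capacity_w / gpu_count
print(watts_per_gpu)  # → 450.0
```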
|