This AI Compressed 'All Human Cooking' Into 2 Megabytes

Decrypt

Josef Chen says he compressed all of human cooking into two megabytes. That's a bold claim. It also checks out.


Chen, co-founder and CEO of London food AI startup KAIKAKU.AI, published a paper on arXiv this week with researcher Jakub Radzikowski. It presents Epicure, a set of three AI models trained on 4.14 million recipes pulled from 11 datasets across seven languages. The result: a map of 1,790 ingredients, each described by 300 numbers, that fits under a typical email attachment limit with room to spare.


"4.1M recipes. 7 languages. 1,790 ingredients. 300 dimensions," Chen wrote on X. "All of human cooking compressed into 2 megabytes."



It's not storing recipes


Before you imagine a two-megabyte USB stick jammed with stir-fry instructions: the model doesn't store a single recipe. The two megabytes are closer to a coordinate table than a cookbook.


Think of it as a map. Every ingredient gets a precise location based on how it behaves across millions of real dishes worldwide. The math is blunt: 1,790 ingredients × 300 numbers per ingredient × 4 bytes each = 2,148,000 bytes, which is about 2.15 MB in decimal units or 2.05 MiB in binary ones. Those numbers encode which ingredients appear together, which share flavor compounds, and which belong to the same culinary tradition. Once the model learns all that from the recipes, the recipes can go. The knowledge lives in the coordinates.
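For anyone who wants to check the arithmetic, here is a quick sketch. It assumes each number is stored as a 32-bit float (4 bytes), which is what the article's math implies; the constant names are ours, not the paper's.

```python
# Back-of-the-envelope size of the ingredient table.
# Assumption: every value is a 4-byte (32-bit) float.
NUM_INGREDIENTS = 1_790
DIMENSIONS = 300
BYTES_PER_VALUE = 4

total_bytes = NUM_INGREDIENTS * DIMENSIONS * BYTES_PER_VALUE
size_mb = total_bytes / 1_000_000      # decimal megabytes
size_mib = total_bytes / (1024 ** 2)   # binary mebibytes

print(f"{total_bytes:,} bytes = {size_mb:.2f} MB = {size_mib:.2f} MiB")
# prints: 2,148,000 bytes = 2.15 MB = 2.05 MiB
```

Either way you count it, the table lands at roughly two megabytes.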


This is essentially the same trick word2vec pulled on language back in 2013, when Google researchers showed that you could do arithmetic with meaning. Epicure does that for food. Take beef, point it toward America, and you'll get bread, lettuce, maybe beer. Point it toward Southeast Asia and the model stops thinking about burgers and grills and starts thinking about soy sauce, ginger, and sesame oil.
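To make "pointing" an ingredient at a cuisine concrete, here is a minimal sketch of the general idea using plain vector addition and cosine similarity. Everything in it is invented for illustration: the three-dimensional vectors, the axis meanings, the `steer` helper, and the `strength` parameter. Epicure's real vectors have 300 dimensions, and its own steering operator is a rotation rather than simple addition.

```python
import math

# Toy vectors, made up for this example (not taken from Epicure).
# Hypothetical axes: [meatiness, "American" signal, "Southeast Asian" signal]
TOY_VECTORS = {
    "beef":      [0.9, 0.3, 0.3],
    "bread":     [0.1, 0.9, 0.0],
    "lettuce":   [0.0, 0.8, 0.1],
    "soy sauce": [0.1, 0.0, 0.9],
    "ginger":    [0.0, 0.1, 0.8],
}
AMERICAN = [0.0, 1.0, 0.0]          # hypothetical cuisine direction
SOUTHEAST_ASIAN = [0.0, 0.0, 1.0]   # hypothetical cuisine direction


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def steer(seed, direction, strength=2.0, top_k=2):
    """Push the seed vector along a direction, then list its nearest neighbors."""
    query = [s + strength * d for s, d in zip(TOY_VECTORS[seed], direction)]
    others = [name for name in TOY_VECTORS if name != seed]
    ranked = sorted(others, key=lambda n: cosine(query, TOY_VECTORS[n]), reverse=True)
    return ranked[:top_k]


print(steer("beef", AMERICAN))          # ['bread', 'lettuce']
print(steer("beef", SOUTHEAST_ASIAN))   # ['soy sauce', 'ginger']
```

The same seed ingredient returns a different neighborhood depending on which direction it is nudged in, which is the behavior the article describes.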





The steering works through an operator the paper calls SLERP rotation (SLERP is short for spherical linear interpolation). Take a seed ingredient, say chicken, and rotate it mathematically toward a cuisine direction. At 30 degrees you start seeing Tex-Mex territory. At 60 degrees, chicken and beef converge on the same Mexican pantry: corn tortilla, salsa, monterey jack, poblano pepper. The angle is a dial between "stay near this ingredient" and "land somewhere new."
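For the curious, rotating one vector toward another by a fixed angle can be sketched in a few lines. This is a generic spherical rotation in the plane spanned by the two vectors, written under our own assumptions; the function names are hypothetical and the paper's exact operator may differ in its details.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def rotate_toward(seed, target, degrees):
    """Rotate the unit seed vector by `degrees` toward the target direction.

    The rotation stays in the plane spanned by seed and target, so 0 degrees
    returns the seed itself and larger angles move steadily toward the target.
    """
    a = normalize(seed)
    b = normalize(target)
    dot = sum(x * y for x, y in zip(a, b))
    # Part of the target that is orthogonal to the seed.
    perp = [y - dot * x for x, y in zip(a, b)]
    if math.sqrt(sum(p * p for p in perp)) < 1e-12:
        raise ValueError("seed and target are parallel; no rotation plane")
    perp = normalize(perp)
    theta = math.radians(degrees)
    return [math.cos(theta) * x + math.sin(theta) * p for x, p in zip(a, perp)]


# Toy 3-d example with invented vectors: a seed ingredient and a cuisine direction.
chicken = [1.0, 0.0, 0.0]
cuisine = [0.0, 1.0, 0.0]
for angle in (0, 30, 60):
    print(angle, [round(x, 3) for x in rotate_toward(chicken, cuisine, angle)])
```

Note that this sketch does not stop at the target: an angle larger than the gap between the two vectors rotates past it. A nearest-neighbor lookup on the rotated vector would then produce the ingredient lists.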


Epicure comes in three versions, and picking the right one depends on what you're actually asking. Cooc learns from recipe co-occurrence: what shows up together in real dishes. Chem learns from flavor chemistry: which ingredients share aroma compounds, according to the FlavorDB chemical database. Core blends the two.
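The raw signal behind a co-occurrence model is easy to picture. The sketch below only counts how often pairs of ingredients share a recipe, using three invented recipes; it is not how Epicure is trained, which the article does not detail, and the function name is ours.

```python
from collections import Counter
from itertools import combinations

# Three made-up recipes, each reduced to a set of ingredients.
TOY_RECIPES = [
    {"chocolate", "vanilla", "almond"},
    {"chocolate", "vanilla", "butter"},
    {"chicken", "corn tortilla", "salsa"},
]


def cooccurrence_counts(recipes):
    """Count how many recipes contain each unordered pair of ingredients."""
    counts = Counter()
    for recipe in recipes:
        # Sorting makes each pair appear in one canonical (alphabetical) order.
        for a, b in combinations(sorted(recipe), 2):
            counts[(a, b)] += 1
    return counts


counts = cooccurrence_counts(TOY_RECIPES)
print(counts[("chocolate", "vanilla")])   # 2
print(counts[("chicken", "chocolate")])   # 0
```

Scale those counts up to millions of recipes and the pairs that keep turning up together are the ones a model like Cooc would place close to each other.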


Ask Cooc what pairs with chocolate and you may get dessert-pantry companions: cocoa powder, vanilla, almond. Ask Chem and you get flavor-chemistry peers: toffee, fudge, ganache.


Same ingredient, different question. A chef looking for a substitute has different needs than a chef mapping flavor compatibility.


Why this isn't ChatGPT for food


Epicure has no general knowledge, no language generation, and no ability to hallucinate an ingredient it's never seen. It knows 1,790 ingredients. That's the whole world, as far as this model is concerned. What it gives up in breadth it gains in reliability—unlike recipe chatbots that will confidently suggest poison as a cooking ingredient if you push them the wrong way.


The previous state of the art here was FlavorGraph, a 2021 model that combined chemical data with the English-only Recipe1M+ dataset. Epicure brings in a multilingual corpus more than four times larger and cleans the vocabulary for efficiency.


Practical uses aren't hard to picture. A chef asks what the East Asian equivalent of a Mediterranean ingredient looks like. A food product developer asks what minimally processed swap lands in the same flavor zone as an additive. A recipe app needs a coherent substitution when an ingredient is missing from the pantry. That last one is the gap where purpose-built small models quietly outperform the big generalist ones.
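The pantry case is simple enough to sketch. The example below uses invented two-dimensional vectors and a hypothetical `substitute` helper: it picks whichever pantry item sits closest to the missing ingredient and refuses to guess about anything outside its vocabulary. It illustrates the idea and is not Epicure's actual interface.

```python
import math

# Toy vectors, made up for this example (not taken from Epicure).
PANTRY_VECTORS = {
    "butter":    [0.9, 0.1],
    "olive oil": [0.8, 0.3],
    "lemon":     [0.1, 0.9],
    "lime":      [0.2, 0.95],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def substitute(missing, pantry):
    """Return the pantry item closest to the missing ingredient, or None."""
    if missing not in PANTRY_VECTORS:
        # Closed vocabulary: an unknown ingredient is an error, not a guess.
        raise KeyError(f"unknown ingredient: {missing!r}")
    candidates = [p for p in pantry if p in PANTRY_VECTORS and p != missing]
    if not candidates:
        return None
    target = PANTRY_VECTORS[missing]
    return max(candidates, key=lambda p: cosine(target, PANTRY_VECTORS[p]))


print(substitute("lemon", ["butter", "lime"]))        # lime
print(substitute("butter", ["lemon", "olive oil"]))   # olive oil
```

Because the lookup only ever returns something from the known table, it cannot invent an ingredient, which is the reliability trade-off described earlier.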


The Epicure paper is a research release. The trained models are live on Hugging Face, and an interactive ingredient map is publicly accessible at epicure.kaikaku.ai. The team also released an MCP (Model Context Protocol) integration for AI agents. The full training code has not been released at this time.

