
Meta|Jul 12, 2025 04:51
Many people now say that AI "needs to be verifiable" and "traceable to its sources", but honestly, most models simply can't do it. A few days ago I saw that @OpenledgerHQ released a mechanism that supports proof of attribution through Infini-gram, and after studying it carefully I found it quite interesting.
When we use AI, we ask it a question and it answers fluently, but we can't know how that sentence came about: was it "guessed", or was it actually seen somewhere in the training data? It's like asking someone a question and they give you an answer, insisting "I think it's right" for every sentence, but never citing a source.
The Infini-gram system recently proposed by OpenLedger tries to solve this by "marking" the source of every token the model outputs.
Simply put, traditional language models use n-gram techniques:
1️⃣ Uni-gram looks at individual words
2️⃣ Bi-gram looks at pairs of adjacent words
3️⃣ Tri-gram looks at triples of adjacent words
These fixed windows provide some context, but very little: the model answers based on short, local word associations and ignores how the current question fits into the logic of the whole conversation.
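The n-gram idea above can be sketched in a few lines of Python (a generic illustration of the technique, not OpenLedger's code):

```python
def ngrams(tokens, n):
    """Return every contiguous n-token window from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "how do you detect a bot wallet".split()
print(ngrams(tokens, 1)[:2])  # uni-grams: [('how',), ('do',)]
print(ngrams(tokens, 2)[:2])  # bi-grams:  [('how', 'do'), ('do', 'you')]
print(ngrams(tokens, 3)[:2])  # tri-grams: [('how', 'do', 'you'), ('do', 'you', 'detect')]
```

With a fixed n of 1, 2, or 3, each window captures only a few adjacent words, which is exactly the limitation described above.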
Infini-gram takes a different approach. Rather than a fixed window, it uses something like exact "symbol matching": every fragment the model outputs is compared against all matching spans in the training set, to see where it was learned and whose contribution it traces back to.
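A naive version of that matching idea can be sketched as follows. The corpus contents and document names here are made up for illustration; this is a brute-force scan, not OpenLedger's implementation:

```python
def contains(haystack, needle):
    """True if token list `needle` occurs contiguously in `haystack`."""
    n = len(needle)
    return any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1))

def longest_match(span, corpus):
    """Return (length, doc_id) for the longest prefix of `span` that
    appears verbatim in some corpus document; (0, None) if nothing matches."""
    for n in range(len(span), 0, -1):
        for doc_id, doc in corpus.items():
            if contains(doc, span[:n]):
                return n, doc_id
    return 0, None

# Hypothetical toy corpus: one researcher's comment plus an unrelated doc.
corpus = {
    "dune_comment": "this address trades many dex contracts in a short time".split(),
    "other_doc": "suffix arrays allow fast substring lookups".split(),
}
print(longest_match("trades many dex contracts".split(), corpus))
# → (4, 'dune_comment')
```

A verbatim match tells you which training document the output fragment traces back to; scanning every document per query is of course far too slow at real corpus scale, which is where the indexing below comes in.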
For example, if you ask the model, "How do you determine whether a wallet is a bot?"
A typical model would just tell you: "This type of address typically trades across multiple DEX contracts very frequently within a short time."
Infini-gram, on top of that, can tell you the basis for the judgment: the sentence closely matches comments written by a public-data researcher in a Dune dashboard or a GitHub repo, and it can even locate the exact line.
The technology behind it is actually quite hardcore: an ∞-gram framework built on suffix arrays. Essentially, it pre-indexes every segment of the training set, so at output time it does a direct lookup, with no need to rerun the model or rely on gradient computations. That means it is fast, stable, and reproducible.
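The suffix-array trick can be shown with a toy index over a tiny token sequence. This is a simplification (real ∞-gram engines build suffix arrays over trillions of tokens with optimized construction), but the lookup logic is the same:

```python
def build_suffix_array(tokens):
    """Sort all suffix start positions lexicographically.
    O(n^2 log n) toy build; production systems use linear-time algorithms."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])

def count_occurrences(tokens, sa, query):
    """Count verbatim occurrences of `query` with two binary searches,
    so each lookup costs O(|query| * log n) instead of a full corpus scan."""
    m = len(query)
    lo, hi = 0, len(sa)
    while lo < hi:                      # leftmost suffix whose m-prefix >= query
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] < query:
            lo = mid + 1
        else:
            hi = mid
    left = lo
    hi = len(sa)
    while lo < hi:                      # leftmost suffix whose m-prefix > query
        mid = (lo + hi) // 2
        if tokens[sa[mid]:sa[mid] + m] <= query:
            lo = mid + 1
        else:
            hi = mid
    return lo - left

corpus_tokens = "a b a b c a b".split()
sa = build_suffix_array(corpus_tokens)
print(count_occurrences(corpus_tokens, sa, "a b".split()))  # → 3
```

Because all matching suffixes sit in one contiguous run of the sorted array, a nonzero count immediately proves the model's output span exists verbatim in the indexed corpus, with no model rerun and no gradients involved.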
For users, you can tell whether the model's answer is "original" or "adapted" from a source.
For data contributors, you can receive rightful attribution, and even economic incentives.
For regulators, it provides an "interpretable" interface.
OpenLedger is not trying to make models smarter, but more accountable: every sentence can answer "why do I say this, and where did I learn it?"
In my view, the Proof of Attribution system proposed by OpenLedger is a crucial step toward "trusted AI", and it may become core infrastructure for data ownership and contribution traceability.