
Colin Wu|Oct 19, 2025 01:08
I have noticed a problem when using GPT: its accuracy is noticeably lower on new events (so for now it still cannot replace media and news KOLs), and it even cites plenty of obviously incorrect information, including posts from traffic-farming accounts as sources. Why does this happen, and how can it be dealt with?
Analysis: A large model's core capability comes from its pretraining corpus (books, papers, encyclopedias, open web pages, etc.), and there is a time lag between data collection, cleaning, training, and release.
Old information has already been widely collected and cross-corroborated by the time the model is trained, so there is plenty of redundant evidence and stable statistical associations form inside the model. New information has not yet entered the training set, so the model can only "guess" from its existing world knowledge; if you do not let it verify online, it is more likely to fabricate plausible-sounding answers (commonly known as "hallucinations").
First-round reports on major breaking events often contradict one another: social media posts are abundant, primary evidence is scarce, and retractions and corrections are frequent. During training the model learns the distribution of linguistic consensus, not a discriminator of factual truth. Where consensus has not yet formed, the model tends to "average" across versions or adopt early but incorrect claims.
In the early stage, new information is most easily amplified by traffic-driven KOLs and reposting chains. If the model (or a browsing plugin) retrieves these high-engagement but low-credibility posts, it gets misled by popularity. In the Chinese-language context there may also be translation errors, confusion between entities with the same name (abbreviated personal or institutional names), screenshots stripped of context, and old news recirculated as new, all of which make retrieval and verification harder.
A language model's essence is to predict the most likely sequence of words given the context. It is very good at semantic fluency, but not inherently reliable at fact-checking. When the question is vague or leading (e.g., "Is X because of Y?"), the model tends to assemble a "plausible explanation" along linguistic structure, writing inference as if it were a factual statement.
To optimize for response speed and coverage, many systems do not enable the heavier pipeline of strong retrieval, multi-source comparison, and evidence scoring by default. Even when browsing is allowed, some implementations perform only a single round of search with a single cited source, with no cross-validation or time-consistency checks, so the answer "has a citation but is still unreliable".
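To make that contrast concrete, here is a minimal sketch in Python of the kind of cross-validation and time-consistency check described above. It is not any vendor's actual pipeline; the Source structure, field names, and thresholds are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Source:
    domain: str            # e.g. a regulator site vs. an anonymous repost account
    published_at: datetime
    supports_claim: bool

def passes_cross_check(sources: list[Source],
                       event_time: datetime,
                       min_independent: int = 2,
                       freshness: timedelta = timedelta(hours=48)) -> bool:
    # A single-round, single-source pipeline effectively stops at "bool(sources)".
    # Here we instead require several independent domains that support the claim
    # and were published after the event, within a freshness window.
    fresh = [s for s in sources
             if s.supports_claim
             and event_time <= s.published_at <= event_time + freshness]
    return len({s.domain for s in fresh}) >= min_independent
```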
By contrast, older content such as history, encyclopedias, classic papers, and established technical documentation has a clear information structure, few conflicts, and has been repeatedly reproduced and corrected. It appears many times in the training set and mutually corroborates, so the model's probability distribution over it is very stable and its accuracy is naturally higher.
Practical steps to bring the error rate down to an acceptable level. At the prompting and constraint level: explicitly require the model to answer only after verification and to provide at least 2 independent authoritative sources plus timestamps; specify a source priority (official announcement/filing > regulator website > frontline media > author verification > original social media post, the last for leads only); enforce a time filter, e.g., "only cite sources updated within the past 48 hours and state the publication time (UTC+8/UTC+9)"; ask for a confidence level (high/medium/low) on the key points and conclusions before the details; and for rumor-related questions, require the model to first classify the claim as Confirmed/Likely/Rumor/Disputed and explain the basis for that judgment.
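One way to apply these constraints is to bake them into a reusable system prompt. The sketch below is only an illustration: the wording, the function name, and the message format are assumptions, to be adapted to whatever chat interface you actually use.

```python
# A minimal sketch of a system prompt encoding the constraints listed above.
VERIFICATION_PROMPT = """You are assisting with breaking-news verification.
Rules:
1. Answer only after verification. Cite at least 2 independent authoritative
   sources, each with a publication timestamp.
2. Source priority: official announcement/filing > regulator website >
   frontline media > author verification > original social media post
   (social posts may be used as leads only, never as sole evidence).
3. Only cite sources updated within the past 48 hours and state the
   publication time (UTC+8/UTC+9).
4. Before giving details, state a confidence level (high/medium/low) for each
   key point and conclusion.
5. For rumor-related questions, first classify the claim as
   Confirmed / Likely / Rumor / Disputed and explain the basis for the label.
"""

def build_messages(user_question: str) -> list[dict]:
    """Prepend the constraint prompt; pass the result to your chat client."""
    return [
        {"role": "system", "content": VERIFICATION_PROMPT},
        {"role": "user", "content": user_question},
    ]
```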
Verification workflow:
1. Identify the key claims in the conclusion (which facts need verification: person, event, amount, time, location).
2. Multi-source retrieval (at least 3 sources, prioritizing domain authorities; for Chinese-language news, cross-reference English-language and regulator websites).
3. Go to the primary source rather than second-hand interpretation (press release / SEC filing / company announcement / on-chain transaction hash).
4. Align the timeline (when the event occurred vs. when it was published or corrected, stating explicitly "as of YYYY-MM-DD HH:mm JST").
5. Label uncertainties and gaps (what is still developing or comes from only a single source).
6. Generate a summary plus citation block (each citation placed right after the conclusion it supports).
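To show how the six steps fit together, here is a minimal pipeline skeleton in Python. The Claim/Citation structures, the confidence thresholds, and the retrieve_sources stub are hypothetical placeholders, not a reference implementation of any particular tool.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Citation:
    url: str
    published_at: datetime
    primary: bool  # press release, filing, on-chain hash, etc.

@dataclass
class Claim:
    text: str  # person / event / amount / time / location
    citations: list[Citation] = field(default_factory=list)

    @property
    def confidence(self) -> str:
        # Step 5: label uncertainty based on the evidence actually collected.
        primaries = sum(c.primary for c in self.citations)
        if primaries >= 1 and len(self.citations) >= 3:
            return "high"
        if len(self.citations) >= 2:
            return "medium"
        return "low (single source or still developing)"

def retrieve_sources(claim_text: str) -> list[Citation]:
    # Steps 2-3 placeholder: real code would query several search/news APIs,
    # prefer domain authorities, and cross-reference English/regulator sites.
    return []

def verify(claims: list[Claim], as_of: datetime) -> str:
    # Steps 1-6 in order; retrieval itself is stubbed out above.
    lines = [f"As of {as_of:%Y-%m-%d %H:%M} JST"]
    for claim in claims:                                    # step 1: key claims
        claim.citations = retrieve_sources(claim.text)      # steps 2-3
        claim.citations.sort(key=lambda c: c.published_at)  # step 4: timeline
        cites = "; ".join(f"{c.url} ({c.published_at:%m-%d %H:%M})"
                          for c in claim.citations)
        # Step 6: each citation follows the conclusion it supports.
        lines.append(f"- {claim.text} [confidence: {claim.confidence}] {cites}")
    return "\n".join(lines)
```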