Patronus AI: Lightspeed US Leads a $3 Million Seed Round as the Startup Targets Large-Model Security for Enterprises

Babbitt (巴比特)
1 year ago

Source: SenseAI Circle of Deep Thinking

"Large enterprises have to spend heavily to detect and prevent AI errors, and there is currently no standard testing framework for LLMs. As a result, LLM evaluation does not scale and its results are unsatisfactory, which makes enterprises cautious about deploying AI products.
Patronus AI aims to build an automated evaluation and security platform for LLMs so that enterprises can deploy AI products safely, thereby driving the widespread adoption of generative AI."

Sense Thinking

Here we offer some broader reflections prompted by the article; discussion is welcome.

  • Pain points of enterprise LLM applications: A transformer generates text autoregressively, predicting each next token from a probability distribution, so it is fundamentally a probabilistic model, and assessing the uncertainty of its output is a central part of model validation. At the same time, academic benchmark metrics do not carry over to domain-specific enterprise applications, which calls for a more productized platform that evaluates multiple models automatically.

  • Balancing accuracy against uncertainty in generated content, and playing to the strengths of LLMs to meet business needs, is the core craft of both model evaluation platforms and enterprise generative AI applications.

This article has a total of 2115 words, and it takes about 5 minutes to read carefully.

Users are adopting generative AI at unprecedented speed. ChatGPT is the fastest-growing consumer product in history: it attracted over 100 million users in the first two months after its release, and AI has dominated attention this year. Enterprises, however, have been cautious about deploying AI products quickly, because they worry about the errors large language models can make. Unfortunately, current methods for evaluating and inspecting language models are inefficient and hard to scale. Patronus is committed to changing this; its mission is to increase enterprises' confidence in generative AI.

Patronus AI's Founding Background

Patronus' two founders, Rebecca and Anand, have known each other for nearly 10 years. After studying computer science together at the University of Chicago, Rebecca joined Meta AI (FAIR) to work on NLP and ALGN-related research, while Anand laid early foundations for causal inference and experimentation at Meta Reality Labs. At Meta, both experienced first-hand how hard it is to evaluate and explain machine learning outputs: Rebecca from a research perspective and Anand from an applied one.

When OpenAI chief scientist Ilya Sutskever announced the release of ChatGPT on Twitter in November 2022, Anand forwarded the news to Rebecca within 5 minutes. They recognized it as a transformative moment and expected enterprises to rush to apply language models across many scenarios. So when Anand heard that Piper Sandler, the investment bank where his brother worked, had banned internal access to OpenAI, he was very surprised. Over the following months, they repeatedly heard that traditional enterprises were moving very cautiously with the technology.

They concluded that despite major advances in NLP, a large gap remained between the technology and real enterprise use. Everyone agreed that generative AI was useful, but no one knew how to use it correctly. In their view, AI evaluation and security would become the most important issues of the coming years.

Team and Financing Situation

On September 14, 2023, Patronus announced a $3 million seed round led by Lightspeed Venture Partners, with participation from Factorial Capital, Replit CEO Amjad Masad, Gokul Rajaram, Michael Callahan, Prasanna Gopalakrishnan, Suja Chandrasekaran, and others. These investors have extensive experience investing in and operating leading companies in enterprise security and AI.

Patronus' founding team has top-tier applied and research backgrounds in ML (machine learning), from Facebook AI Research (FAIR), Airbnb, Meta Reality Labs, and quantitative finance firms. Team members have published NLP research at leading AI conferences (NeurIPS, EMNLP, ACL), designed and launched Airbnb's first conversational AI assistant, pioneered causal inference work at Meta Reality Labs, exited a quantitative hedge fund backed by Mark Cuban, and launched 0→1 products at fast-growing startups.

Patronus' advisor is Douwe Kiela, CEO of Contextual AI, adjunct professor at Stanford University, and former Head of Research at Hugging Face. Douwe has done pioneering NLP research, especially in evaluation, benchmarking, and retrieval-augmented generation (RAG).

Problems Patronus AI Aims to Solve

The current evaluation of large language models is not scalable and has unsatisfactory effects for the following reasons:

  • Manual evaluation is slow and costly. Large enterprises need to spend millions of dollars to hire thousands of internal testers and external consultants to manually check for errors in AI. Engineers deploying AI products need to spend weeks manually creating test sets and checking AI outputs.

  • The inherent uncertainty of large language models makes failures hard to predict. LLMs are probabilistic systems, and because they accept essentially arbitrary input (bounded only by the context length), they present a wide attack surface. As a result, failures can have very complex causes.

  • There is currently no standard testing framework for large language models. Software testing is deeply embedded in traditional engineering workflows, with unit-testing frameworks, large QA teams, and release cycles, but enterprises have not developed comparable processes for LLMs. Continuous, scalable evaluation, systematic identification and logging of LLM errors, and performance benchmarking are all essential for using LLMs in production.

  • Academic benchmarks do not reflect real-world conditions. Enterprises currently test LLMs on academic benchmarks such as HELM, GLUE, and SuperGLUE, but these do not capture real usage scenarios. They also tend to saturate quickly and suffer from training-data leakage, where test items end up in a model's training data.

  • AI failures have a severe long tail, and the last 20% is extremely hard to address. Adversarial attacks show that LLM security is far from solved: even though general-purpose pretrained language models have strong foundational capabilities, many failure cases remain unknown. Patronus has done substantial pioneering research on adversarial model evaluation and robustness, but this is only the beginning.
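To make the "no standard testing framework" point concrete, here is a minimal sketch of what a unit-test-style check for an LLM might look like. It is not Patronus AI's product or API: the `TestCase` format, the `make_toy_model` stand-in for a real model, the sample count, and the 0.95 pass threshold are all hypothetical. Because LLM output is probabilistic, each test samples the model several times and reports a pass rate instead of a single pass/fail.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A "model" here is any function mapping a prompt to a text completion.
Model = Callable[[str], str]


@dataclass
class TestCase:
    """Hypothetical LLM test: substrings that must / must not appear."""
    prompt: str
    must_contain: List[str] = field(default_factory=list)
    must_not_contain: List[str] = field(default_factory=list)


def passes(output: str, case: TestCase) -> bool:
    """Case-insensitive substring check of a single completion."""
    text = output.lower()
    has_required = all(s.lower() in text for s in case.must_contain)
    has_forbidden = any(s.lower() in text for s in case.must_not_contain)
    return has_required and not has_forbidden


def pass_rate(model: Model, case: TestCase, samples: int = 20) -> float:
    """Sample repeatedly: one run of a probabilistic model proves little."""
    hits = sum(passes(model(case.prompt), case) for _ in range(samples))
    return hits / samples


def run_suite(model: Model, cases: List[TestCase], samples: int = 20,
              threshold: float = 0.95) -> Dict[str, Tuple[float, bool]]:
    """Return {prompt: (pass rate, met threshold?)} for every case."""
    report = {}
    for case in cases:
        rate = pass_rate(model, case, samples)
        report[case.prompt] = (rate, rate >= threshold)
    return report


def make_toy_model(seed: int, error_prob: float = 0.1) -> Model:
    """Hypothetical stand-in for a real LLM: ignores the prompt and is
    usually right, but answers wrongly with probability error_prob."""
    rng = random.Random(seed)

    def model(prompt: str) -> str:
        if rng.random() < error_prob:
            return "The capital of France is Lyon."
        return "The capital of France is Paris."
    return model


if __name__ == "__main__":
    suite = [TestCase("What is the capital of France?",
                      must_contain=["Paris"], must_not_contain=["Lyon"])]
    for prompt, (rate, ok) in run_suite(make_toy_model(seed=0), suite).items():
        print(f"{prompt!r}: pass rate {rate:.2f} -> {'PASS' if ok else 'FAIL'}")
```

Substring matching is deliberately crude; scoring hallucination or safety in practice generally needs richer checks, which is the kind of work the article describes evaluation platforms automating.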

Mission of Patronus AI

The mission of Patronus AI is to increase enterprises' confidence in generative AI.

Patronus AI describes itself as the industry's first automated evaluation and security platform for large language models. Customers use it to detect LLM errors at scale, so they can deploy AI products safely.

The platform automatically performs:

Scoring: Evaluating model performance in real-world scenarios on key metrics such as hallucination and safety.

Adversarial Testing: Automatically generating large-scale adversarial test sets.

Benchmarking: Comparing models to help customers determine the best model for specific use cases.


Patronus aims to run evaluations frequently to keep pace with continually changing models, data, and user requirements. The ultimate goal is a credibility label for AI systems. No company wants its users frustrated by unexpected failures, let alone to face negative press or regulatory trouble.

Patronus also aims to serve as a trusted third-party evaluator, since users want an unbiased, independent perspective. It hopes to become the Moody's of the AI industry.

Patronus' current partners include leading AI companies Cohere, Nomic, and Naologic. In addition, several well-known companies in traditional industries such as financial services are in talks with Patronus AI about pilot projects.

"Do not go gentle into that good night, Rage, rage against the dying of the light."

Dylan Thomas (1951)

Reference: Patronus Launch

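Disclaimer: This article represents only the author's personal views and does not represent the position or views of this platform. It is provided for information sharing only and does not constitute investment advice to anyone. Any dispute between users and the author is unrelated to this platform. If any article or image on this page infringes your rights, please send proof of the relevant rights and proof of identity to support@aicoin.com, and platform staff will review it.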
