Hugging Face reveals “unauthorized access” to AI model hosting platform

Hugging Face has disclosed a data breach affecting its Spaces platform, where developers can create, share, and host Artificial Intelligence (AI) models and resources.

In an announcement posted on the community’s website, the company said it detected unauthorized access to its Spaces platform, “specifically related to Spaces secrets”.

“As a consequence, we have suspicions that a subset of Spaces’ secrets could have been accessed without authorization,” the notification states.

Migrating to fine-grained access tokens

To address the problem, the team did what was expected: it revoked a number of Hugging Face tokens present in the secrets and notified affected individuals of the change. It also reported the incident to law enforcement agencies as well as to data protection authorities.

Unfortunately, Hugging Face did not say how many people might have been affected by the breach.

Beyond those directly notified, Hugging Face advised all users to refresh any keys or tokens they might have, and to consider switching to fine-grained access tokens, which it now treats as the default.
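One practical way to make that kind of token refresh painless is to keep the token out of the codebase entirely and read it from the environment, so rotating it only means updating the stored secret. The sketch below is illustrative, not Hugging Face's own code; it uses the `HF_TOKEN` environment variable, which the `huggingface_hub` library recognizes by convention.

```python
import os

def get_hf_token() -> str:
    """Load a Hugging Face access token from the environment.

    Reading the token from HF_TOKEN (rather than hardcoding it) means
    a revoked or rotated token can be replaced without any code change.
    """
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; create a fine-grained access token "
            "in your Hugging Face account settings and export it."
        )
    return token
```

In a Space, the same idea applies: store the token as a Spaces secret so it is injected as an environment variable at runtime, never committed to the repository.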

“We are working with outside cyber security forensic specialists, to investigate the issue as well as review our security policies and procedures,” the notification concluded.

Hugging Face is a company and an open-source community that focuses on natural language processing (NLP) and machine learning. It is known for its transformative work in making state-of-the-art NLP models accessible and user-friendly. As such, it is a frequent target for threat actors looking to compromise AI models.

In response, Hugging Face has made “significant improvements to the security of the Spaces infrastructure” over the past few days, “including completely removing org tokens (resulting in increased traceability and audit capabilities), implementing key management service (KMS) for Spaces secrets, robustifying and expanding our system’s ability to identify leaked tokens and proactively invalidate them, and more generally improving our security across the board.”

The company also plans to completely deprecate “classic” read and write tokens as soon as fine-grained access tokens reach feature parity.
