
Microsoft AI Researchers Accidentally Expose 38 Terabytes of Confidential Data

by Norman Scott

Microsoft recently faced a significant security incident in which 38 terabytes of private data were exposed through a misconfiguration. The leak originated in a GitHub repository belonging to the company’s AI research division and happened accidentally during the publication of a bucket of open-source training data. The exposed data included secrets, keys, passwords, and internal Teams messages from two former employees.

The repository, named “robust-models-transfer,” has since been taken down. Before its removal, it contained source code and machine learning models accompanying a 2020 research paper. The leak was traced to an overly permissive Shared Access Signature (SAS) token, an Azure feature for sharing storage data that is difficult to track and revoke once issued. The token was misconfigured, granting full control over the entire storage account instead of read-only access to the intended files.
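For illustration, the difference comes down to how the token is generated. The sketch below uses Azure’s Python SDK (azure-storage-blob); the account name and key are placeholders, and the permission sets are illustrative rather than the exact configuration of the leaked token:

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import (
    AccountSasPermissions,
    ResourceTypes,
    generate_account_sas,
)

# Read-only, short-lived token scoped to listing and reading blobs:
# roughly what a team sharing training data would intend to publish.
safe_sas = generate_account_sas(
    account_name="examplestorageacct",    # placeholder account name
    account_key="<storage-account-key>",  # placeholder key
    resource_types=ResourceTypes(container=True, object=True),
    permission=AccountSasPermissions(read=True, list=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

# The misconfiguration amounts to something like this: every permission
# enabled and a far-off expiry, so anyone holding the URL can read,
# overwrite, or delete data for years.
risky_sas = generate_account_sas(
    account_name="examplestorageacct",
    account_key="<storage-account-key>",
    resource_types=ResourceTypes(service=True, container=True, object=True),
    permission=AccountSasPermissions(
        read=True, write=True, delete=True, list=True,
        add=True, create=True, update=True, process=True,
    ),
    expiry=datetime.now(timezone.utc) + timedelta(days=365 * 10),
)
```

Because an account SAS is validated purely against its cryptographic signature, Azure keeps no server-side record of tokens that have been issued, which is exactly why they are so hard to track and revoke.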

The security issue was reported to Microsoft on June 22, 2023. The company’s investigation concluded that no customer data was exposed and that no other internal services were put at risk. Microsoft took immediate action by revoking the SAS token and blocking external access to the storage account, resolving the problem two days after the responsible disclosure.
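Revoking an account SAS is indirect: the token is self-contained, so the only way to invalidate it is to rotate the storage account key it was signed with. Below is a hypothetical sketch of that remediation step using the azure-mgmt-storage management SDK; the subscription, resource group, and account names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

# Regenerating key1 immediately invalidates every SAS token signed
# with it, including any that have leaked.
client.storage_accounts.regenerate_key(
    "<resource-group>",   # placeholder resource group
    "<storage-account>",  # placeholder storage account name
    {"key_name": "key1"},
)
```

Note that rotating a key also invalidates every other token signed with it, so legitimate consumers must be reissued fresh SAS URLs afterwards.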

To prevent similar exposures in the future, Microsoft has expanded its secret scanning service to cover overly permissive SAS tokens. It has also identified and fixed a bug in its scanning system that caused the specific SAS URL in the repository to be incorrectly dismissed as a false positive. The Wiz researchers who reported the issue emphasized that account SAS tokens should be treated as being as sensitive as the account keys they are signed with.
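A SAS token’s scope, permissions, and lifetime are readable directly from its URL query string, which is what makes this kind of scanning feasible. The following is a simplified, hypothetical sketch of such a check; the thresholds and the example URL are illustrative, not Microsoft’s actual scanning rules:

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

# Permission flags in the 'sp' parameter that go beyond read access:
# write, delete, add, create, update.
WRITE_PERMISSIONS = set("wdacu")

def audit_sas_url(url: str) -> list[str]:
    """Return a list of findings for an Azure SAS URL."""
    params = parse_qs(urlparse(url).query)
    findings = []

    # 'srt' only appears on account-level SAS tokens, which cover the
    # whole storage account rather than a single container or blob.
    if "srt" in params:
        findings.append("account-level SAS: scope spans the entire storage account")

    perms = set(params.get("sp", [""])[0])
    if perms & WRITE_PERMISSIONS:
        findings.append(f"token is not read-only (sp={''.join(sorted(perms))})")

    expiry = params.get("se", [""])[0]
    if expiry:
        exp = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if (exp - datetime.now(timezone.utc)).days > 365:
            findings.append(f"expiry set more than a year out ({expiry})")

    return findings

# A made-up account SAS URL with full permissions and a distant expiry.
url = ("https://example.blob.core.windows.net/?sv=2021-08-06&ss=b&srt=sco"
       "&sp=rwdlacup&se=2051-10-06T00:00:00Z&sig=REDACTED")
for finding in audit_sas_url(url):
    print("FLAG:", finding)
```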

This is not the first incident involving misconfigured Azure storage accounts. In July 2022, JUMPSEC Labs demonstrated a scenario in which threat actors could exploit such accounts to pivot into enterprise on-premises environments. These incidents underscore the growing need for additional security measures as data scientists and engineers handle ever larger volumes of data in AI projects.

Microsoft’s response to this incident, including its prompt remediation and the improvements to its security scanning services, showcases the company’s commitment to addressing such vulnerabilities. As technology advances and AI solutions become more prevalent, it is crucial for companies to prioritize data security to protect sensitive information.
