Uncovering Remote Code Execution Vulnerabilities in AI/ML Libraries (2026)

Imagine a world where the very tools designed to make AI smarter could be weaponized against us. That's the reality exposed by recent discoveries in popular AI/ML libraries. Despite the potential for remote code execution (RCE), the affected libraries underpin models with millions of downloads, and the danger isn't limited to overtly malicious code: the risk is built into how these libraries handle model metadata.

Researchers at Palo Alto Networks uncovered vulnerabilities in three prominent open-source AI/ML Python libraries: NVIDIA's NeMo, Salesforce's Uni2TS, and Apple's FlexTok. These libraries, published on GitHub, are integral to developing and deploying AI models across various applications. The core problem lies in how they process model files containing malicious metadata, which can lead to remote code execution.

NeMo, a PyTorch-based framework by NVIDIA, is designed for creating diverse AI/ML models and complex systems. Uni2TS, a PyTorch library by Salesforce, powers their Moirai model for time series analysis. FlexTok, developed by Apple and the Swiss Federal Institute of Technology, enables AI models to process images efficiently. These libraries are not just niche tools; they are used in popular models on HuggingFace, with tens of millions of downloads collectively.

The vulnerabilities stem from the way these libraries use metadata to configure models and pipelines. A shared third-party library, Hydra, instantiates classes using this metadata. In vulnerable versions, the provided data is executed as code without proper validation. This oversight allows attackers to embed arbitrary code in model metadata, which is then executed when the model is loaded.
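
To make the mechanics concrete, here is a minimal sketch (not Hydra's actual code) of how _target_-style metadata turns into a function call: the dotted path in the config names a callable, which is imported and invoked with the remaining keys as arguments.

```python
import importlib

def naive_instantiate(config: dict):
    """Simplified model of Hydra-style instantiation: the '_target_'
    string names a callable, which is imported and called with the
    remaining config keys as keyword arguments. Nothing is validated,
    so the config fully controls what gets executed."""
    module_name, _, attr = config["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_name), attr)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return target(**kwargs)

# A benign config instantiates an ordinary class:
print(naive_instantiate({"_target_": "collections.Counter"}))

# But an attacker-supplied config can name any importable callable,
# e.g. "builtins.exec", turning model metadata into arbitrary code.
```

This is deliberately reductive, but it captures the core flaw: the string in the metadata, not the application, decides what code runs.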

But why is this a big deal? Because it means an attacker could potentially take control of systems running these models, leading to data breaches, system compromises, or worse. As of December 2025, no malicious exploits have been detected in the wild, but the potential for harm is significant.

Palo Alto Networks responsibly disclosed these vulnerabilities to the affected vendors in April 2025. Here's how they responded:
- NVIDIA issued CVE-2025-23304, rated High severity, and released a fix in NeMo version 2.3.2.
- Apple and the Swiss Federal Institute of Technology updated FlexTok in June 2025 to address the issues.
- Salesforce issued CVE-2026-22584, also rated High severity, and deployed a fix on July 31, 2025.

These vulnerabilities were discovered using Prisma AIRS, a tool capable of identifying models exploiting these flaws and extracting their payloads. Palo Alto Networks customers are further protected through Cortex Cloud’s Vulnerability Management and the Unit 42 AI Security Assessment, which help mitigate AI-related risks.

The Bigger Picture: AI/ML Model Formats
AI/ML pipelines rely on saving complex internal states, such as learned weights and architecture definitions, as model artifacts. Python libraries traditionally used the pickle module to serialize these artifacts, but this approach is inherently risky due to its ability to execute arbitrary code. Newer formats like HuggingFace's safetensors aim to mitigate these risks by only storing model weights and metadata in JSON.
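
The pickle risk described above can be demonstrated in a few lines: any object can define __reduce__, and pickle invokes the callable it returns at load time. This sketch substitutes a harmless eval expression for a real payload:

```python
import pickle

class MaliciousPayload:
    # pickle calls the callable returned by __reduce__ during loading.
    # Here it is builtins.eval with a harmless arithmetic expression,
    # but an attacker could substitute os.system or any other callable.
    def __reduce__(self):
        return (eval, ("__import__('math').sqrt(16)",))

blob = pickle.dumps(MaliciousPayload())
result = pickle.loads(blob)   # deserialization alone runs eval()
print(result)                 # 4.0 -- proof that code ran on load
```

This is why loading a pickled model artifact from an untrusted source is equivalent to running untrusted code.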

However, even these safer formats aren't foolproof. Security researchers at JFrog have demonstrated vulnerabilities in applications using these formats through techniques like XSS and path traversal. This highlights the ongoing challenge of securing AI/ML systems.

Technical Deep Dive
The vulnerabilities in NeMo, Uni2TS, and FlexTok all involve the hydra.utils.instantiate() function, which creates instances of classes from configuration data. Attackers exploit this by supplying malicious callables, such as builtins.exec(), as the instantiation target to achieve RCE. Hydra has since added a warning to its documentation and a block-list mechanism, but the block list can be bypassed using implicit imports.

NeMo, for instance, integrates with HuggingFace and supports loading models from the platform. The vulnerability lies in the restore_from() and from_pretrained() functions, which load model configurations without sanitization. NVIDIA addressed it by introducing a safe_instantiate function that validates _target_ values against an allow list of prefixes and verifies that the imported object's module and name match the declared target.
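
An allow-list guard of this kind can be sketched as follows. The names, prefixes, and checks here are hypothetical and simplified for illustration, not NVIDIA's actual safe_instantiate implementation:

```python
import importlib

# Illustrative prefixes only; a real guard would enumerate the
# specific modules the framework expects to instantiate.
ALLOWED_PREFIXES = ("nemo.", "torch.", "collections.")

def safe_instantiate(config: dict):
    """Reject any _target_ outside the permitted prefixes, then verify
    the resolved object really was defined in the module its dotted
    path claims, to catch re-exported aliases of dangerous callables."""
    target = config["_target_"]
    if not target.startswith(ALLOWED_PREFIXES):
        raise ValueError(f"blocked target: {target}")
    module_name, _, attr = target.rpartition(".")
    obj = getattr(importlib.import_module(module_name), attr)
    if getattr(obj, "__module__", None) != module_name:
        raise ValueError(f"{target} resolved outside {module_name}")
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return obj(**kwargs)

safe_instantiate({"_target_": "collections.Counter"})  # permitted
# safe_instantiate({"_target_": "builtins.exec"})      # raises ValueError
```

The second check matters because a prefix test alone can be defeated by a module inside the allow list that re-exports a dangerous builtin.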

Uni2TS uses the safetensors format but relies on a config.json file for model configurations. The library leverages Hydra's instantiate() function to decode specific arguments, making it vulnerable to RCE. Salesforce's fix includes an allow list and strict validation to ensure only permitted modules are executed.

FlexTok also uses safetensors and extends PyTorchModelHubMixin. It decodes metadata using ast.literal_eval(), which is safer than eval() but still susceptible to certain attacks. Apple and the Swiss Federal Institute of Technology resolved the issue by parsing configurations with YAML and adding an allow list of classes for Hydra's instantiate() function.
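
The difference between ast.literal_eval() and eval() is easy to see in isolation: literal_eval accepts only Python literals (strings, numbers, tuples, lists, dicts, sets, booleans, None) and raises on anything that would require real evaluation, though it can still be abused in other ways, for example with resource-exhausting inputs:

```python
import ast

# Parsing literal-only metadata succeeds:
config = ast.literal_eval("{'patch_size': 16, 'channels': [3, 64]}")
print(config["patch_size"])   # 16

# Anything beyond a literal -- a call, an import, an attribute
# access -- is rejected rather than executed:
try:
    ast.literal_eval("__import__('os').system('id')")
except ValueError:
    print("rejected: not a literal")
```

This blocks code execution outright, which is why the remaining FlexTok risk came from what the parsed values were later fed into, not from the parsing step itself.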

Controversial Take: The Proliferation of Supporting Libraries
While newer formats and updates address specific vulnerabilities, the rapid proliferation of supporting libraries creates a vast attack surface. As of October 2025, over a hundred Python libraries were identified on HuggingFace, nearly half of which use Hydra. This ecosystem's complexity makes it challenging to ensure security across all components.

Thought-Provoking Question: Are We Sacrificing Security for Innovation?
As AI/ML continues to advance, the pressure to innovate often outweighs security considerations. Should developers prioritize security over speed and functionality? How can the community balance these competing demands? Share your thoughts in the comments—let’s spark a discussion on the future of secure AI development.

Protection and Mitigation
Palo Alto Networks offers several solutions to protect against these threats:
- Prisma AIRS identifies vulnerable models and extracts payloads.
- Cortex Cloud’s Vulnerability Management helps identify and remediate vulnerabilities in cloud environments.
- Unit 42 AI Security Assessment assists organizations in reducing AI adoption risks and strengthening governance.

If you suspect a compromise, contact the Unit 42 Incident Response team immediately. Together, we can navigate the complexities of AI security and build a safer digital future.
