A high-severity remote code execution (RCE) vulnerability has been found in the Apache Parquet Java library, specifically in the parquet-avro module. Tracked as CVE-2025-46762, the flaw can allow attackers to execute arbitrary code during schema parsing, posing a significant risk to data platforms and analytics pipelines that rely on Parquet files.
What is Apache Parquet Java
Apache Parquet is a columnar storage file format optimized for large-scale data processing. It’s widely used across big data ecosystems including Apache Spark, Apache Hive, and Presto, for its efficient storage, compression, and performance on analytical workloads.
The Apache Parquet Java library provides read/write support for Parquet files in Java-based environments. One of its modules, parquet-avro, bridges Parquet with the Apache Avro data serialization system, allowing users to serialize and deserialize Parquet files using Avro schemas.
CVE-2025-46762
When the parquet-avro module is used with either the “specific” or “reflect” Avro data model, schema parsing can lead to execution of arbitrary code. This happens when the Avro schema embedded in a Parquet file’s metadata is deserialized. Essentially, if the file metadata contains malicious class references and those classes fall under the trusted packages, the Java runtime may execute them, resulting in a full RCE.
While version 1.15.1 attempted to introduce trusted package restrictions, its default configuration still leaves the door open to exploitation.
Mitigation of CVE-2025-46762
Apache recommends two mitigation strategies for CVE-2025-46762, either of which fully resolves the issue:
- Upgrade to version 1.15.2 or later
The updated version properly restricts schema deserialization to only explicitly trusted packages.
- Set the system property
-Dorg.apache.parquet.avro.SERIALIZABLE_PACKAGES=""
This disables all trusted packages, preventing execution of classes from file metadata in 1.15.1.
If you’re using the generic model in Avro, you are not exposed, but upgrading is still encouraged for consistency and future-proofing.
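Where the JVM command line cannot be changed, the same property could in principle be set from application code. The following is a minimal sketch, not an official recipe: the class name ParquetAvroHardening and the method disableTrustedPackages are hypothetical, and it assumes the call runs before any parquet-avro class is loaded, since the library may read the property only once. The -D flag above remains the safer route.

```java
// Hypothetical startup helper; only the property name comes from the advisory.
final class ParquetAvroHardening {

    // System property named in the mitigation guidance for 1.15.1.
    static final String SERIALIZABLE_PACKAGES =
            "org.apache.parquet.avro.SERIALIZABLE_PACKAGES";

    // Sets the trusted-package list to the empty string.
    // Assumption: this is called before parquet-avro reads the property.
    static void disableTrustedPackages() {
        System.setProperty(SERIALIZABLE_PACKAGES, "");
    }

    public static void main(String[] args) {
        disableTrustedPackages();
        // Echo the effective value so it can be checked in startup logs.
        System.out.println(SERIALIZABLE_PACKAGES + "=\""
                + System.getProperty(SERIALIZABLE_PACKAGES) + "\"");
    }
}
```

Calling disableTrustedPackages() as the first statement of the application's entry point keeps the window in which the default applies as small as possible.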
Conclusion
Apache Parquet is deeply embedded in modern data platforms. Whether you’re running Spark jobs, streaming analytics, or cloud-native data lakes, CVE-2025-46762 underscores a critical risk in deserializing untrusted input, a common attack vector in Java ecosystems.
Organizations that use Parquet as a data exchange format must audit their usage of the parquet-avro module, particularly if they rely on the specific or reflect models for Avro serialization.
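As a rough aid for such an audit, the exposure conditions described in this article can be expressed as a small check. This is an illustrative sketch only: the class ParquetAvroExposureCheck and its methods are hypothetical, it assumes plain numeric versions such as 1.15.1 (with an optional suffix after a hyphen), and it treats every release before 1.15.2 as affected, with the empty-property mitigation counted only on 1.15.1.

```java
// Hypothetical audit helper encoding the conditions described in the article.
final class ParquetAvroExposureCheck {

    // Parses "major.minor.patch" (ignoring any "-suffix") into three integers.
    static int[] parse(String version) {
        String[] parts = version.split("-")[0].split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < 3 && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Returns a negative, zero, or positive value, like Integer.compare.
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < 3; i++) {
            if (x[i] != y[i]) {
                return Integer.compare(x[i], y[i]);
            }
        }
        return 0;
    }

    // serializablePackages is the value of the SERIALIZABLE_PACKAGES
    // system property, or null when the property is not set.
    static boolean isExposed(String version, String dataModel,
                             String serializablePackages) {
        // The generic model is not exposed.
        if ("generic".equalsIgnoreCase(dataModel)) {
            return false;
        }
        // 1.15.2 and later carry the fix.
        if (compareVersions(version, "1.15.2") >= 0) {
            return false;
        }
        // On 1.15.1, an empty trusted-package list is a sufficient mitigation.
        boolean emptied = serializablePackages != null
                && serializablePackages.trim().isEmpty();
        if (compareVersions(version, "1.15.1") == 0 && emptied) {
            return false;
        }
        return true;
    }
}
```

For example, isExposed("1.15.1", "reflect", null) reports exposure, while the same call with an empty string as the last argument does not.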
Source: hxxps[://]seclists[.]org/oss-sec/2025/q2/103
