Database driver Parquet

mirror of https://github.com/dbeaver/dbeaver.git synced 2026-04-25 05:56:14 +03:00

Table of contents

Parquet Files driver connection settings
Features and capabilities

Advanced SQL query capabilities
Structuring Parquet files with a schema

How to create a DDL file

Folder structure
Internal database
Additional features

Supported compression formats

Note: This feature is available in Lite, Enterprise, and Ultimate editions only.

This guide provides instructions on how to set up and use Parquet files with DBeaver. The Parquet driver allows you to work with Parquet data as if it were in a database. You can retrieve data and apply filters, sorting, and other operations, even combining data from multiple files.

Important: When using the Parquet driver, all connected Parquet files are read-only. To make changes, you need to update the original files outside DBeaver.

Before you start, you need to create a connection in DBeaver and select the appropriate Parquet driver. If you haven’t done this, see our Database Connection article.

Parquet Files driver connection settings

This section describes how to set up a connection using the Parquet driver. The connection settings page requires the following fields:

Field	Description
Connect by (Path/URL)	Choose whether to connect using a local host path or a URL.
File paths	Specify the location of the Parquet file(s). Available actions: Edit (modify an existing file or folder selection), Add (add a new file or folder), and Remove (remove a selected file or folder from the list). When editing or adding, choose one of: File (select a single file), Folder (choose a directory containing multiple Parquet files; for details, see the Folder structure section), or Remote (access a remote folder via Cloud Storage; available only in Ultimate and Team editions).
Driver name	This field will be auto-filled based on your selected driver type.
Driver settings	If there are any specific driver settings, configure them here.

For details on driver properties, see File-based driver properties.

Tip: When you select a folder, DBeaver scans the directory up to two levels deep for Parquet files and organizes the files into schemas based on their directory structure. For more information, see Folder structure.

Features and capabilities

Advanced SQL query capabilities

The Parquet driver supports both simple and complex SQL queries, and handles them differently:

Simple queries (e.g., SELECT * FROM table): Data is read directly from the Parquet file.
Complex queries (e.g., using WHERE, JOIN, ORDER BY, GROUP BY): The first time a complex query is executed, the driver imports the entire Parquet file into an internal database to enable advanced SQL functions. Subsequent queries run faster because the data is already in the internal database.
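
To make the distinction concrete, here is a minimal sketch of one query of each kind. It assumes a file named employees.parquet exposed as a table called employees with the columns used in the DDL example in this guide (id, name, age, department); the aliases headcount and average_age are illustrative only.

```sql
-- Simple query: per this guide, data is read directly from the Parquet file.
SELECT * FROM employees;

-- Complex query: WHERE, GROUP BY, and ORDER BY make this a "complex" query,
-- so on its first run the driver imports the file into the internal database.
-- Table and column names are assumptions taken from the DDL example.
SELECT department,
       COUNT(*) AS headcount,
       AVG(age) AS average_age
FROM employees
WHERE age IS NOT NULL
GROUP BY department
ORDER BY headcount DESC;
```

Running the second query again should be faster than the first run, because the import step is skipped.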

Structuring Parquet files with a schema

To control how DBeaver reads Parquet files, you can define a schema using a DDL (Data Definition Language) file.

How to create a DDL file

Create a file named after your Parquet file with the .ddl extension appended, and place it in the same directory (e.g., employees.parquet.ddl for employees.parquet).
Write a schema using the CREATE TABLE statement:

CREATE TABLE employees
(
    id         INTEGER,
    name       TEXT NOT NULL,
    age        INTEGER,
    department TEXT
);

You can also use the WITH clause to set a data range by adding firstRow and rowCount to your CREATE TABLE statement:

WITH (firstRow = 2, rowCount = 100)

firstRow - row number to start reading from (default: 1)
rowCount - maximum number of rows to read
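
Combining the two snippets above, a complete employees.parquet.ddl file might look like the following sketch. The guide does not state exactly where the WITH clause belongs, so placing it after the column list is an assumption, as is the use of SQL comments inside the file.

```sql
-- employees.parquet.ddl
-- Assumption: the WITH clause follows the closing parenthesis of the column list.
CREATE TABLE employees
(
    id         INTEGER,
    name       TEXT NOT NULL,
    age        INTEGER,
    department TEXT
)
WITH (firstRow = 2, rowCount = 100); -- start at row 2, read at most 100 rows
```

If the table does not pick up the schema, the file may contain something the driver treats as an error. Removing the comments and checking the WITH clause placement is a reasonable first step.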

Tip: You can also set firstRow and rowCount in the connection properties. DDL file settings take priority.

Important: If the DDL file contains errors, DBeaver will ignore it.

Folder structure

When working with a folder containing multiple Parquet files, DBeaver organizes them as follows:

Folder structure	Schema in DBeaver
Root files	`Default` schema
Subfolder files	Schema named after the subfolder
Files in deeper folders	Ignored

If your folder looks like this:

Data/
├── employees.parquet
├── sales.parquet
└── Reports/
    ├── monthly.parquet
    └── yearly.parquet

DBeaver will create:

Default schema: employees, sales
Reports schema: monthly, yearly

Tip: To focus on specific files, consider selecting individual files or folders when configuring the connection.
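
With the example layout above, files in the Reports subfolder could then be queried through their schema. This sketch assumes the driver accepts the usual schema.table qualification and that the table names match the file names without the extension. Neither detail is confirmed by this guide.

```sql
-- Assumption: tables in a subfolder are addressed as <schema>.<table>.
SELECT * FROM Reports.monthly;

SELECT COUNT(*) AS row_count
FROM Reports.yearly;
```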

Internal database

When you execute a complex query (for example, one that uses WHERE, JOIN, GROUP BY, or ORDER BY) on a Parquet file for the first time, the Parquet driver imports the data into a temporary internal SQLite database.

By default, this internal database is stored on disk only for the current session and is cleared when DBeaver restarts. To speed up queries on the same file in future sessions, set the internalDbFilePath option in the Driver properties tab (e.g., C:\User\database.db) so that the processed data is reused.

Additional features

DBeaver provides additional features that are compatible with the Parquet driver but not exclusive to it:

Category	Feature
Data Transfer	Data Export
Data Visualization	Visual Query Builder
Data Visualization	Charts

Supported compression formats

DBeaver supports the following compression formats for Parquet files:

Format	Supported out of the box	Notes
Zstandard (`zstd`)	Yes	Works without additional setup
Snappy	No	Add the `snappy-java` JAR to the driver library
`LZ4_RAW`	No	Add the `aircompressor` JAR to the driver library

For details on adding JAR files to the driver library, see Driver Manager.
