Skip to main content
Skip to main content

Overview

Differences between updating data in ClickHouse and OLTP databases

When it comes to handling updates, ClickHouse and OLTP databases diverge significantly due to their underlying design philosophies and target use cases. For example, PostgreSQL, a row-oriented, ACID-compliant relational database, supports robust and transactional update and delete operations, ensuring data consistency and integrity through mechanisms like Multi-Version Concurrency Control (MVCC). This allows for safe and reliable modifications even in high-concurrency environments.

Conversely, ClickHouse is a column-oriented database optimized for read-heavy analytics and high throughput append-only operations. While it does natively support in-place updates and delete, they must be used carefully to avoid high I/O. Alternatively, tables can be restructured to convert delete and update into appended operations where they are processed asynchronously and/or at read time, thus reflecting the focus on high-throughput data ingestion and efficient query performance over real-time data manipulation.

Methods to update data in ClickHouse

There are several ways to update data in ClickHouse, each with its own advantages and performance characteristics. You should select the appropriate method based on your data model and the amount of data you intend to update.

For both operations, if the number of submitted mutations constantly exceeds the number of mutations that are processed in the background over some time interval, the queue of non-materialized mutations that have to be applied will continue to grow. This will result in the eventual degradation of SELECT query performance.

In summary, update operations should be issued carefully, and the mutations queue should be tracked closely using the system.mutations table. Do not issue updates frequently as you would in OLTP databases. If you have a requirement for frequent updates, see ReplacingMergeTree.

MethodSyntaxWhen to use
Update mutationALTER TABLE [table] UPDATEUse when data must be updated to disk immediately (e.g. for compliance). Negatively affects SELECT performance.
Lightweight updatesUPDATE [table] SET ... WHEREUse for updating small amounts of data (up to ~10% of table). Creates patch parts for immediate visibility without rewriting entire columns. Adds overhead to SELECT queries but has predictable latency. Currently experimental.
On-the-fly updatesALTER TABLE [table] UPDATEEnable using SET apply_mutations_on_fly = 1;. Use when updating small amounts of data. Rows are immediately returned with updated data in all subsequent SELECT queries but are initially only internally marked as updated on disk.
ReplacingMergeTreeENGINE = ReplacingMergeTreeUse when updating large amounts of data. This table engine is optimized for data deduplication on merges.
CollapsingMergeTreeENGINE = CollapsingMergeTree(Sign)Use when updating individual rows frequently, or for scenarios where you need to maintain the latest state of objects that change over time. For example, tracking user activity or article stats.

Update mutations

Update mutations can be issued through a ALTER TABLE ... UPDATE command e.g.

These are extremely IO-heavy, rewriting all the partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that match the WHERE expression. There is no atomicityAtomicity ensures that a transaction (a series of database operations) is treated as a single, indivisible unit. This means that either all operations within the transaction occur, or none do. An example of an atomic transaction is transferring money from one bank account to another. If either step of the transfer fails, the transaction fails, and the money stays in the first account. Atomicity ensures no money is lost or created. to this process - partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. are substituted for mutated partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. as soon as they are ready, and a SELECT query that starts executing during a mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. will see data from partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that have already been mutated along with data from partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. that have not been mutated yet. Users can track the state of the progress via the systems.mutations table. These are I/O intense operations and should be used sparingly as they can impact clusterA collection of nodes (servers) that work together to store and process data. SELECT performance.

Read more about update mutations.

Lightweight updates

Lightweight updates are a ClickHouse feature that updates rows using "patch partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key." - special data partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. containing only the updated columns and rows, rather than rewriting entire columns like traditional mutations. The Lightweight UPDATEA lightweight update in ClickHouse is an experimental feature that allows you to update rows in a table using standard SQL UPDATE syntax, but instead of rewriting entire columns or data parts (as with traditional mutations), it creates "patch parts" containing only the updated columns and rows. These updates are immediately visible in SELECT queries through patch application, but the physical data is only updated during subsequent merges. Key characteristics:

  • Uses the standard UPDATE syntax and creates patch partsA physical file (or directory) on disk that stores a portion of the table's data. This is different from a partition, which is a logical division of a table's data that is created using a partition key. immediately without waiting for merges
  • Updated values are immediately visible in SELECT queries through patch application, but physically materialized only during subsequent merges
  • Designed for small updates (up to ~10% of table) with predictable latency
  • Adds overhead to SELECT queries that need to apply patches, but avoids rewriting entire columns

For more details see "The Lightweight UPDATE Statement"

On-the-fly Updates

On-the-fly updates provide a mechanism to update rows such that they are updated immediately, and subsequent SELECT queries will automatically return with the changed values (this incurs an overhead and will slow queries). This effectively addresses the atomicityAtomicity ensures that a transaction (a series of database operations) is treated as a single, indivisible unit. This means that either all operations within the transaction occur, or none do. An example of an atomic transaction is transferring money from one bank account to another. If either step of the transfer fails, the transaction fails, and the money stays in the first account. Atomicity ensures no money is lost or created. limitation of normal mutations. We show an example below:

Note that for on-the-fly updates, a mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. is still used to update the data; it is just not materialized immediately and applied during SELECT queries. It will still be applied in the background as an asynchronous process and incurs the same heavy overhead as a mutationA mutation in ClickHouse refers to an operation that modifies or deletes existing data in a table, typically using commands like ALTER TABLE ... UPDATE or ALTER TABLE ... DELETE. Mutations are implemented as asynchronous background processes that rewrite entire data parts affected by the change, rather than modifying rows in place. and thus is an I/O intense operation that should be used sparingly. The expressions that can be used with this operation are also limited (see here for details).

Read more about on-the-fly updates.

CollapsingMergeTree

Stemming from the idea that updates are expensive but inserts can be leveraged to perform updates, the CollapsingMergeTree table engineTable engines in ClickHouse determine how data is written, stored and accessed. MergeTree is the most common table engine, and allows quick insertion of large amounts of data which get processed in the background. can be used together with a sign column as a way to tell ClickHouse to update a specific row by collapsing (deleting) a pair of rows with sign 1 and -1. If -1 is inserted for the sign column, the whole row will be deleted. If 1 is inserted for the sign column, ClickHouse will keep the row. Rows to update are identified based on the sorting keyIn ClickHouse, a sorting key defines the physical order of rows on disk. If you do not specify a primary key, ClickHouse uses the sorting key as the primary key. If you specify both, the primary key must be a prefix of the sorting key. used in the ORDER BY () statement when creating the table.

Note

The approach above for updating requires users to maintain state client side. While this is most efficient from ClickHouse's perspective, it can be complex to work with at scale.

We recommend reading the documentation for CollapsingMergeTree for a more comprehensive overview.

More resources