<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://duckdb.org/feed.xml" rel="self" type="application/atom+xml" /><link href="https://duckdb.org/" rel="alternate" type="text/html" /><updated>2026-03-09T22:43:01+00:00</updated><id>https://duckdb.org/feed.xml</id><title type="html">DuckDB</title><subtitle>DuckDB is an in-process SQL database management system focused on analytical query processing. It is designed to be easy to install and easy to use. DuckDB has no external dependencies. DuckDB has bindings for C/C++, Python, R, Java, Node.js, Go and other languages.</subtitle><author><name>GitHub User</name><email>your-email@domain.com</email></author><entry><title type="html">Announcing DuckDB 1.5.0</title><link href="https://duckdb.org/2026/03/09/announcing-duckdb-150.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.5.0" /><published>2026-03-09T00:00:00+00:00</published><updated>2026-03-09T00:00:00+00:00</updated><id>https://duckdb.org/2026/03/09/announcing-duckdb-150</id><content type="html" xml:base="https://duckdb.org/2026/03/09/announcing-duckdb-150.html"><![CDATA[<p>We are proud to release DuckDB v1.5.0, codenamed “Variegata” after the Paradise shelduck (<em>Tadorna variegata</em>), endemic to New Zealand.</p>

<p>In this blog post, we cover the most important updates in this release to support, features, and extensions. As always, there is more: for the complete release notes, see the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">release page on GitHub</a>.</p>

<blockquote>
  <p>To install the new version, please visit the <a href="/install/">installation page</a>. Note that it can take a few days to release some extensions (e.g., the <a href="/docs/current/core_extensions/ui.html">UI</a>) and client libraries (e.g., Go, R, Java) due to the extra changes and review rounds required.</p>
</blockquote>

<p>With this release, two DuckDB release lines are available: v1.4 (LTS) and v1.5 (current).
The next release – planned for September – will ship a major version, DuckDB 2.0.</p>

<h2 id="new-features">New Features</h2>

<h3 id="command-line-client">Command Line Client</h3>

<p>For users who work with DuckDB in the terminal, the highlight of the new release is a reworked CLI client with a new color scheme, dynamic prompts, a pager, and many other convenience features.</p>

<h4 id="color-scheme">Color Scheme</h4>

<p>We shipped a <a href="/docs/current/clients/cli/friendly_cli.html">new color palette</a> and harmonized it with the documentation. The color palette is available in both dark mode and light mode. Both use two shades of gray, and five colors for keywords, strings, errors, functions and numbers. You can find the color palette in the <a href="/design/manual/#color-palette">Design Manual</a>.</p>

<p>You can customize the color scheme using the <code class="language-plaintext highlighter-rouge">.highlight_colors</code> dot command:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="py">.highlight_colors</span> <span class="n">column_name</span> <span class="n">darkgreen</span> <span class="n">bold_underline</span>
<span class="py">.highlight_colors</span> <span class="n">numeric_value</span> <span class="n">red</span> <span class="n">bold</span>
<span class="py">.highlight_colors</span> <span class="n">string_value</span> <span class="n">purple2</span>
<span class="k">FROM</span> <span class="n">ducks</span><span class="p">;</span>
</code></pre></div></div>

<p><img src="/images/blog/v150/cli-colors-example-light.png" alt="DuckDB CLI light mode" class="lightmode-img" />
<img src="/images/blog/v150/cli-colors-example-dark.png" alt="DuckDB CLI dark mode" class="darkmode-img" /></p>

<h4 id="dynamic-prompts-in-the-cli">Dynamic Prompts in the CLI</h4>

<p>DuckDB v1.5.0 introduces dynamic prompts for the CLI (<a href="https://github.com/duckdb/duckdb/pull/19579">PR #19579</a>). By default, these show the database and schema that you are currently connected to:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">duckdb</span>
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'my_database.duckdb'</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">my_database</span><span class="p">;</span>
<span class="gp">my_database</span> <span class="n">D</span> <span class="k">CREATE</span> <span class="k">SCHEMA</span> <span class="n">my_schema</span><span class="p">;</span>
<span class="gp">my_database</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">my_schema</span><span class="p">;</span>
<span class="gp">my_database.my_schema</span> <span class="n">D</span> <span class="p">...</span>
</code></pre></div></div>

<p>These prompts can be configured using bracket codes to set a maximum length, run a custom query, use different colors, and more (<a href="https://github.com/duckdb/duckdb/pull/19579">#19579</a>).</p>

<h4 id="tables-and-describe"><code class="language-plaintext highlighter-rouge">.tables</code> and <code class="language-plaintext highlighter-rouge">DESCRIBE</code></h4>

<p>To show the columns of an individual table, use the <a href="/docs/current/sql/statements/describe.html"><code class="language-plaintext highlighter-rouge">DESCRIBE</code> statement</a>:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">DESCRIBE</span> <span class="n">ducks</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────────────────┐
│        ducks         │
│                      │
│ id           integer │
│ name         varchar │
│ extinct_year integer │
└──────────────────────┘
</code></pre></div></div>

<p>The <a href="/docs/current/clients/cli/dot_commands.html"><code class="language-plaintext highlighter-rouge">.tables</code> dot command</a> lists the attached catalogs, the schemas and tables in them, and the columns in each table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/numbers1.db'</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="py">.tables</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code> ────────────── animals_db ───────────────
 ───────────────── main ──────────────────
┌─────────────────┐┌──────────────────────┐
│      swans      ││        ducks         │
│                 ││                      │
│ id      integer ││ id           integer │
│ name    varchar ││ name         varchar │
│ species varchar ││ extinct_year integer │
│ color   varchar ││                      │
│ habitat varchar ││        5 rows        │
│                 │└──────────────────────┘
│     3 rows      │
└─────────────────┘
  numbers1
 ── main ──
┌──────────┐
│   tbl    │
│          │
│ i bigint │
│          │
│  2 rows  │
└──────────┘
</code></pre></div></div>

<h4 id="accessing-the-last-result-using-_">Accessing the Last Result Using <code class="language-plaintext highlighter-rouge">_</code></h4>

<p>You can access the last result of a query inline using the underscore character <code class="language-plaintext highlighter-rouge">_</code>. This is not only convenient but also makes it unnecessary to re-run potentially long-running queries:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="k">ATTACH</span> <span class="s1">'https://blobs.duckdb.org/data/animals.db'</span> <span class="k">AS</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">USE</span> <span class="n">animals_db</span><span class="p">;</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">ducks</span> <span class="k">WHERE</span> <span class="n">extinct_year</span> <span class="k">IS</span> <span class="k">NOT</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="err">┌───────┬──────────────────┬──────────────┐</span>
<span class="err">│</span>  <span class="n">id</span>   <span class="err">│</span>       <span class="n">name</span>       <span class="err">│</span> <span class="n">extinct_year</span> <span class="err">│</span>
<span class="err">│</span> <span class="n">int32</span> <span class="err">│</span>     <span class="n">varchar</span>      <span class="err">│</span>    <span class="n">int32</span>     <span class="err">│</span>
<span class="err">├───────┼──────────────────┼──────────────┤</span>
<span class="err">│</span>     <span class="mi">1</span> <span class="err">│</span> <span class="n">Labrador</span> <span class="n">Duck</span>    <span class="err">│</span>         <span class="mi">1878</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">3</span> <span class="err">│</span> <span class="n">Crested</span> <span class="n">Shelduck</span> <span class="err">│</span>         <span class="mi">1964</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">5</span> <span class="err">│</span> <span class="n">Pink</span><span class="o">-</span><span class="n">headed</span> <span class="n">Duck</span> <span class="err">│</span>         <span class="mi">1949</span> <span class="err">│</span>
<span class="err">└───────┴──────────────────┴──────────────┘</span>
<span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">_</span><span class="p">;</span>
<span class="err">┌───────┬──────────────────┬──────────────┐</span>
<span class="err">│</span>  <span class="n">id</span>   <span class="err">│</span>       <span class="n">name</span>       <span class="err">│</span> <span class="n">extinct_year</span> <span class="err">│</span>
<span class="err">│</span> <span class="n">int32</span> <span class="err">│</span>     <span class="n">varchar</span>      <span class="err">│</span>    <span class="n">int32</span>     <span class="err">│</span>
<span class="err">├───────┼──────────────────┼──────────────┤</span>
<span class="err">│</span>     <span class="mi">1</span> <span class="err">│</span> <span class="n">Labrador</span> <span class="n">Duck</span>    <span class="err">│</span>         <span class="mi">1878</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">3</span> <span class="err">│</span> <span class="n">Crested</span> <span class="n">Shelduck</span> <span class="err">│</span>         <span class="mi">1964</span> <span class="err">│</span>
<span class="err">│</span>     <span class="mi">5</span> <span class="err">│</span> <span class="n">Pink</span><span class="o">-</span><span class="n">headed</span> <span class="n">Duck</span> <span class="err">│</span>         <span class="mi">1949</span> <span class="err">│</span>
<span class="err">└───────┴──────────────────┴──────────────┘</span>
</code></pre></div></div>

<h4 id="pager">Pager</h4>

<p>Last but not least, the CLI now has a pager! It is triggered when there are more than 50 rows in the results.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">memory</span> <span class="n">D</span> <span class="py">.maxrows</span> <span class="mi">100</span>
<span class="gp">memory</span> <span class="n">D</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">100</span><span class="p">);</span>
</code></pre></div></div>

<p>You can navigate on Linux and Windows using <code class="language-plaintext highlighter-rouge">Page Up</code> / <code class="language-plaintext highlighter-rouge">Page Down</code>. On macOS, use <code class="language-plaintext highlighter-rouge">Fn</code> + <code class="language-plaintext highlighter-rouge">Up</code> / <code class="language-plaintext highlighter-rouge">Down</code>. To exit the pager, press <code class="language-plaintext highlighter-rouge">Q</code>.</p>

<p>The initial implementation of the pager was provided by <a href="https://github.com/tobwen"><code class="language-plaintext highlighter-rouge">tobwen</code></a> in <a href="https://github.com/duckdb/duckdb/pull/19004">#19004</a>.</p>

<h3 id="peg-parser">PEG Parser</h3>

<p>DuckDB v1.5 ships an experimental parser based on PEG (Parsing Expression Grammar). The new parser enables better suggestions and improved error messages, and allows extensions to extend the grammar. The PEG parser is currently disabled by default but you can opt in using:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="n">enable_peg_parser</span><span class="p">();</span>
</code></pre></div></div>

<p>The PEG parser is already used for generating suggestions. You can cycle through the options using <code class="language-plaintext highlighter-rouge">TAB</code>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="gp">animals_db</span> <span class="n">D</span> <span class="k">FROM</span> <span class="n">ducks</span> <span class="k">WHERE</span> <span class="n">habitat</span> <span class="k">IS</span> 
<span class="k">IS</span>           <span class="k">ISNULL</span>       <span class="k">ILIKE</span>        <span class="gs">IN</span>           <span class="k">INTERSECT</span>    <span class="k">LIKE</span>
</code></pre></div></div>

<p>We are planning to make the switch to the new parser in the upcoming DuckDB release.</p>

<blockquote>
  <p>As a tradeoff, the parser has a slight performance overhead; however, this is in the range of milliseconds and thus negligible for analytical queries. For more details on the rationale for using a PEG parser and benchmark results, please refer to the <a href="/library/runtime-extensible-parsers/">CIDR 2026 paper</a> by Hannes and Mark, or their <a href="/2024/11/22/runtime-extensible-parsers.html">blog post</a> summarizing the paper.</p>
</blockquote>

<h3 id="variant-type"><code class="language-plaintext highlighter-rouge">VARIANT</code> Type</h3>

<p>DuckDB now natively supports the <a href="https://github.com/duckdb/duckdb/pull/18609"><code class="language-plaintext highlighter-rouge">VARIANT</code> type</a>, inspired by <a href="https://docs.snowflake.com/en/sql-reference/data-types-semistructured">Snowflake's semi-structured <code class="language-plaintext highlighter-rouge">VARIANT</code> data type</a> and available <a href="https://github.com/apache/parquet-format/blob/master/VariantEncoding.md">in Parquet since 2025</a>. Unlike the <a href="/docs/current/data/json/json_type.html">JSON type</a>, which is physically stored as text, <code class="language-plaintext highlighter-rouge">VARIANT</code> stores typed, binary data. Each row in a <code class="language-plaintext highlighter-rouge">VARIANT</code> column is self-contained with its own type information. This leads to better compression and query performance. Here are a few examples of using <code class="language-plaintext highlighter-rouge">VARIANT</code>.</p>

<p>Store different types in the same column:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">events</span> <span class="p">(</span><span class="n">id</span> <span class="nb">INTEGER</span><span class="p">,</span> <span class="n">data</span> <span class="n">VARIANT</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">events</span> <span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">42</span><span class="p">::</span><span class="n">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'hello world'</span><span class="p">::</span><span class="n">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">]::</span><span class="n">VARIANT</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="p">{</span><span class="s1">'name'</span><span class="p">:</span> <span class="s1">'Alice'</span><span class="p">,</span> <span class="s1">'age'</span><span class="p">:</span> <span class="mi">30</span><span class="p">}::</span><span class="n">VARIANT</span><span class="p">);</span>

<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">events</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬────────────────────────────┐
│  id   │            data            │
│ int32 │          variant           │
├───────┼────────────────────────────┤
│     1 │ 42                         │
│     2 │ hello world                │
│     3 │ [1, 2, 3]                  │
│     4 │ {'name': Alice, 'age': 30} │
└───────┴────────────────────────────┘
</code></pre></div></div>
<p>Check the underlying type of each row:</p>
<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">id</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">variant_typeof</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="k">AS</span> <span class="n">vtype</span>
<span class="k">FROM</span> <span class="n">events</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬────────────────────────────┬───────────────────┐
│  id   │            data            │       vtype       │
│ int32 │          variant           │      varchar      │
├───────┼────────────────────────────┼───────────────────┤
│     1 │ 42                         │ INT32             │
│     2 │ hello world                │ VARCHAR           │
│     3 │ [1, 2, 3]                  │ ARRAY(3)          │
│     4 │ {'name': Alice, 'age': 30} │ OBJECT(name, age) │
└───────┴────────────────────────────┴───────────────────┘
</code></pre></div></div>

<p>You can extract fields from nested variants using the dot notation or the <code class="language-plaintext highlighter-rouge">variant_extract</code> function:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="n">data.name</span> <span class="k">FROM</span> <span class="n">events</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="c1">-- or </span>
<span class="k">SELECT</span> <span class="n">variant_extract</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="s1">'name'</span><span class="p">)</span> <span class="k">AS</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">events</span> <span class="k">WHERE</span> <span class="n">id</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
</code></pre></div></div>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────┐
│  name   │
│ variant │
├─────────┤
│ Alice   │
└─────────┘
</code></pre></div></div>

<p>DuckDB also supports reading <code class="language-plaintext highlighter-rouge">VARIANT</code> types from Parquet files, including <em>shredding</em> (where parts of the semi-structured data are stored as regular typed columns).</p>
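<p>For illustration, a <code class="language-plaintext highlighter-rouge">VARIANT</code> column can be round-tripped through Parquet. The following sketch reuses the <code class="language-plaintext highlighter-rouge">events</code> table from above; the file name is arbitrary:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Write the table, including its VARIANT column, to a Parquet file
COPY events TO 'events.parquet' (FORMAT parquet);
-- Read it back: the data column retains its VARIANT type
SELECT id, variant_typeof(data) AS vtype
FROM read_parquet('events.parquet');
</code></pre></div></div>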

<h3 id="read_duckdb-function"><code class="language-plaintext highlighter-rouge">read_duckdb</code> Function</h3>

<p>The <code class="language-plaintext highlighter-rouge">read_duckdb</code> table function reads DuckDB databases without requiring you to attach them first. This can be more ergonomic – for example, it allows globbing. You can read the <a href="#appendix-example-dataset">example</a> <code class="language-plaintext highlighter-rouge">numbers</code> databases as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">min</span><span class="p">(</span><span class="n">i</span><span class="p">),</span> <span class="nf">max</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="k">FROM</span> <span class="nf">read_duckdb</span><span class="p">(</span><span class="s1">'numbers*.db'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────┬────────┐
│ min(i) │ max(i) │
│ int64  │ int64  │
├────────┼────────┤
│      1 │      5 │
└────────┴────────┘
</code></pre></div></div>

<h3 id="azure-writes">Azure Writes</h3>

<p>You can now <a href="/docs/current/core_extensions/azure.html#writing-to-azure-blob-storage">write to Azure Blob Storage or ADLSv2 storage</a> using the <code class="language-plaintext highlighter-rouge">COPY</code> statement:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Write query results to a Parquet file on Blob Storage</span>
<span class="k">COPY</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">my_table</span><span class="p">)</span>
<span class="k">TO</span> <span class="s1">'az://my_container/path/output.parquet'</span><span class="p">;</span>

<span class="c1">-- Write a table to a CSV file on ADLSv2 Storage</span>
<span class="k">COPY</span> <span class="n">my_table</span>
<span class="k">TO</span> <span class="s1">'abfss://my_container/path/output.csv'</span><span class="p">;</span>
</code></pre></div></div>

<h3 id="odbc-scanner">ODBC Scanner</h3>

<p>We are now shipping an ODBC scanner extension. This allows you to query remote databases through their ODBC drivers, as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">LOAD</span><span class="n"> odbc_scanner</span><span class="p">;</span>
<span class="k">SET</span> <span class="k">VARIABLE</span> <span class="n">conn</span> <span class="o">=</span> <span class="n">odbc_connect</span><span class="p">(</span><span class="s1">'Driver={Oracle Driver};DBQ=//127.0.0.1:1521/XE;UID=scott;PWD=tiger;'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">odbc_query</span><span class="p">(</span><span class="nf">getvariable</span><span class="p">(</span><span class="s1">'conn'</span><span class="p">),</span> <span class="s1">'SELECT SYSTIMESTAMP FROM dual;'</span><span class="p">);</span>
</code></pre></div></div>

<p>In the coming weeks, we'll publish the documentation page and release a follow-up post on the ODBC scanner.
In the meantime, please refer to the <a href="https://github.com/duckdb/odbc-scanner/blob/main/README.md">project's README</a>.</p>

<h2 id="major-changes">Major Changes</h2>

<h3 id="lakehouse-updates">Lakehouse Updates</h3>

<p>All of DuckDB’s supported Lakehouse formats have received some updates for v1.5.</p>

<h4 id="ducklake">DuckLake</h4>

<p>The main <a href="https://ducklake.select/">DuckLake</a> change for DuckDB v1.5 is updating the DuckLake specification to v0.4.
We are aiming for this to be the same specification that ships with DuckLake 1.0, which will be released in April.
Its main highlights include:</p>

<ul>
  <li>Macro support.</li>
  <li>Sorted tables.</li>
  <li>Deletion inlining and addition of partial delete files.</li>
  <li>Internal rework of DuckLake options.</li>
</ul>

<p>We'll announce more details about these features in the blog post for DuckLake v1.</p>

<h4 id="delta-lake">Delta Lake</h4>

<p>For the <a href="/docs/current/core_extensions/delta.html">Delta Lake extension</a>, the team has focused on improving support for writes via <a href="/docs/current/core_extensions/unity_catalog.html">Unity Catalog</a>, idempotent Delta writes, and table <code class="language-plaintext highlighter-rouge">CHECKPOINT</code>s.</p>

<h4 id="iceberg">Iceberg</h4>

<p>For the <a href="/docs/current/core_extensions/iceberg/overview.html">Iceberg extension</a>, the team is working on a larger release for v1.5.1. For v1.5.0, the main feature is the addition of table properties in the <code class="language-plaintext highlighter-rouge">CREATE TABLE</code> statement:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">test_create_table</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INTEGER</span><span class="p">)</span>
<span class="k">WITH</span> <span class="p">(</span>
    <span class="s1">'format-version'</span> <span class="o">=</span> <span class="s1">'2'</span><span class="p">,</span> <span class="c1">-- elevated to the table's format version when the table is created</span>
    <span class="s1">'location'</span> <span class="o">=</span> <span class="s1">'s3://path/to/data'</span><span class="p">,</span> <span class="c1">-- elevated to the table's location when the table is created</span>
    <span class="s1">'property1'</span> <span class="o">=</span> <span class="s1">'value1'</span><span class="p">,</span>
    <span class="s1">'property2'</span> <span class="o">=</span> <span class="s1">'value2'</span>
<span class="p">);</span>
</code></pre></div></div>

<p>Other minor additions have been made to enable passing <code class="language-plaintext highlighter-rouge">EXTRA_HTTP_HEADERS</code> when attaching to an Iceberg catalog, which has unlocked support for <a href="https://cloud.google.com/biglake">Google’s BigLake</a>.</p>
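<p>As a rough sketch only – the endpoint, catalog name, header names, and the exact option syntax below are assumptions, so please consult the Iceberg extension documentation for the authoritative form – attaching with extra headers could look like:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Hypothetical example: endpoint, project, and header values are placeholders
ATTACH 'my_catalog' AS iceberg_catalog (
    TYPE iceberg,
    ENDPOINT 'https://biglake.googleapis.com/iceberg/v1/restcatalog',
    EXTRA_HTTP_HEADERS MAP {'x-goog-user-project': 'my-project'}
);
</code></pre></div></div>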

<blockquote>
  <p>Both Delta and DuckLake have implemented the <a href="#variant-type"><code class="language-plaintext highlighter-rouge">VARIANT</code> type</a>. Iceberg’s <code class="language-plaintext highlighter-rouge">VARIANT</code> type will ship in the v1.5.1 release with some other features that are specific to the Iceberg v3 specification.</p>
</blockquote>

<h3 id="network-stack">Network Stack</h3>

<p>The default backend for the <a href="/docs/current/core_extensions/httpfs/overview.html">httpfs extension</a> has changed from <a href="https://github.com/yhirose/cpp-httplib"><code class="language-plaintext highlighter-rouge">httplib</code></a> to <a href="https://curl.se/"><code class="language-plaintext highlighter-rouge">curl</code></a>. As <code class="language-plaintext highlighter-rouge">curl</code> is one of the most popular and well-tested open-source projects, we expect it to provide long-term stability and security for DuckDB. Regardless of the HTTP library used, <code class="language-plaintext highlighter-rouge">openssl</code> remains the backing SSL library, and options such as <code class="language-plaintext highlighter-rouge">http_timeout</code>, <code class="language-plaintext highlighter-rouge">http_retries</code>, etc. work as before.</p>

<p>Our community has been <a href="https://github.com/duckdb/duckdb/issues/20977">testing the new network stack</a> for the last few weeks. Still, if you encounter any issues, please submit them to the <a href="https://github.com/duckdb/duckdb-httpfs"><code class="language-plaintext highlighter-rouge">duckdb-httpfs</code> repository</a>.</p>
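<p>The existing settings carry over unchanged to the new backend; for example (the values here are illustrative, not recommendations):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Illustrative values; see the httpfs extension page for defaults and units
SET http_timeout = 30000;
SET http_retries = 5;
</code></pre></div></div>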

<details>
  <summary>
If you are interested in more details, click here.
</summary>
  <p>For technical reasons, <code class="language-plaintext highlighter-rouge">httplib</code> is still the library we use for downloading the <code class="language-plaintext highlighter-rouge">httpfs</code> extension. When <code class="language-plaintext highlighter-rouge">httpfs</code> is loaded with the (now default) <code class="language-plaintext highlighter-rouge">curl</code> backend, subsequent extension installations go through <code class="language-plaintext highlighter-rouge">https://</code>, with the default endpoint for core extensions pointing to <a href="https://extensions.duckdb.org"><code class="language-plaintext highlighter-rouge">https://extensions.duckdb.org</code></a>.</p>

  <p>All core and community extensions are cryptographically signed, so installing them through <code class="language-plaintext highlighter-rouge">http://</code> does not pose a security risk. However, some users reported issues with <code class="language-plaintext highlighter-rouge">http://</code> extension installs in environments with firewalls.</p>
</details>

<h3 id="lambda-syntax">Lambda Syntax</h3>

<p>Up to DuckDB v1.2, the syntax for defining lambda expressions used the arrow notation <code class="language-plaintext highlighter-rouge">x -&gt; x + 1</code>. While this was a nice syntax, it clashed with the JSON extract operator (<code class="language-plaintext highlighter-rouge">-&gt;</code>) due to operator precedence and led to error messages that some users found difficult to troubleshoot. To work around this, we introduced a new, Python-style <a href="/2025/05/21/announcing-duckdb-130.html#lambda-function-syntax">lambda syntax in v1.3</a>, <code class="language-plaintext highlighter-rouge">lambda x: x + 1</code>.</p>

<p>While DuckDB v1.5 supports both styles of writing lambda expressions, using the deprecated arrow syntax now emits a warning:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="nf">list_transform</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">],</span> <span class="n">x</span> <span class="o">-&gt;</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-console highlighter-rouge"><div class="highlight"><pre class="highlight"><code>WARNING:
Deprecated lambda arrow (-&gt;) detected. Please transition to the new lambda syntax, i.e., lambda x, i: x + i, before DuckDB's next release.
</code></pre></div></div>
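<p>For reference, the equivalent query in the new syntax, which does not produce a warning:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT list_transform([1, 2, 3], lambda x: x + 1);
</code></pre></div></div>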

<p>You can use the <code class="language-plaintext highlighter-rouge">lambda_syntax</code> configuration option to change this behavior, either suppressing the warning or turning it into an error:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Suppress the warning</span>
<span class="k">SET</span> <span class="py">lambda_syntax</span> <span class="o">=</span> <span class="s1">'ENABLE_SINGLE_ARROW'</span><span class="p">;</span>
<span class="c1">-- Turn the deprecation warning into an error</span>
<span class="k">SET</span> <span class="py">lambda_syntax</span> <span class="o">=</span> <span class="s1">'DISABLE_SINGLE_ARROW'</span><span class="p">;</span>
</code></pre></div></div>

<p>DuckDB 2.0 will disable the single arrow syntax by default; it will only be available if you explicitly opt in.</p>

<h3 id="spatial-extension">Spatial Extension</h3>

<p>The <a href="/docs/current/core_extensions/spatial/overview.html">spatial extension</a> ships several important changes.</p>

<h4 id="breaking-change-flipping-of-axis-order">Breaking Change: Flipping of Axis Order</h4>

<p>Most functions in <code class="language-plaintext highlighter-rouge">spatial</code> operate in Cartesian space and are unaffected by axis order, e.g., whether the <code class="language-plaintext highlighter-rouge">X</code> and <code class="language-plaintext highlighter-rouge">Y</code> axes represent “longitude” and “latitude” or the other way around. But there are some functions where this matters, and where the assumption, counterintuitively, is that all input geometries use (x = latitude, y = longitude). These are:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">ST_Distance_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Perimeter_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Area_Spheroid</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_Distance_Sphere</code></li>
  <li><code class="language-plaintext highlighter-rouge">ST_DWithin_Spheroid</code></li>
</ul>

<p>Additionally, <code class="language-plaintext highlighter-rouge">ST_Transform</code> expects that the input geometries are in the axis order defined by the source coordinate reference system, which for, e.g., <code class="language-plaintext highlighter-rouge">EPSG:4326</code> is also (x = latitude, y = longitude).</p>

<p>This has been a long-standing source of confusion and numerous issues, as other databases, formats and GIS systems tend to always treat <code class="language-plaintext highlighter-rouge">X</code> as “easting”, “left-right” or “longitude”, and <code class="language-plaintext highlighter-rouge">Y</code> as “northing”, “up-down” or “latitude”.</p>

<p>We are changing how this currently works in DuckDB to be consistent with how other systems operate, which should cause less confusion for new users in the future. However, to avoid silently breaking existing workflows that have adapted to this quirk (e.g., by using <code class="language-plaintext highlighter-rouge">ST_FlipCoordinates</code>), we are rolling out this change gradually via a new <code class="language-plaintext highlighter-rouge">geometry_always_xy</code> setting:</p>

<ul>
  <li>In DuckDB v1.5, setting <code class="language-plaintext highlighter-rouge">geometry_always_xy = true</code> enables the new behavior (x = longitude, y = latitude). Without it, affected functions emit a warning.</li>
  <li>In DuckDB v2.0, the warning will become an error. Set <code class="language-plaintext highlighter-rouge">geometry_always_xy = false</code> to preserve the old behavior.</li>
  <li>In DuckDB v2.1, <code class="language-plaintext highlighter-rouge">geometry_always_xy = true</code> will become the default.</li>
</ul>

<p>So to summarize, nothing is changing by default in this release, but to avoid being affected by this change in the future, set <code class="language-plaintext highlighter-rouge">geometry_always_xy</code> explicitly now. Set it to <code class="language-plaintext highlighter-rouge">true</code> to opt into the new behavior, or <code class="language-plaintext highlighter-rouge">false</code> to keep the existing one.</p>
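<p>For example, based on the setting described above, you can pin the behavior explicitly in your configuration:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Opt in to the new behavior: x = longitude, y = latitude
SET geometry_always_xy = true;
-- Or keep the legacy behavior past the v2.0 transition
SET geometry_always_xy = false;
</code></pre></div></div>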

<h3 id="geometry-rework">Geometry Rework</h3>

<h4 id="geometry-becomes-a-built-in-type"><code class="language-plaintext highlighter-rouge">GEOMETRY</code> Becomes a Built-In Type</h4>

<p>The <code class="language-plaintext highlighter-rouge">GEOMETRY</code> type has been moved from the <code class="language-plaintext highlighter-rouge">spatial</code> extension into core DuckDB!</p>

<p>Geospatial data is no longer niche. The Parquet standard now treats <code class="language-plaintext highlighter-rouge">GEOMETRY</code> as a first-class column type, and open table formats like Apache Iceberg and DuckLake are moving in the same direction. Many widely used data formats and systems also have geospatial counterparts—GeoJSON, PostGIS, GeoPandas, GeoPackage/Spatialite, and more.</p>

<p>DuckDB already offers extensions that integrate with many of these formats and systems. But there’s a structural problem: as long as <code class="language-plaintext highlighter-rouge">GEOMETRY</code> lives inside the <code class="language-plaintext highlighter-rouge">spatial</code> extension, other extensions that want to read or write geospatial data must either depend on <code class="language-plaintext highlighter-rouge">spatial</code>, implement their own incompatible geometry representation, or force users to handle the conversions themselves.</p>

<p>By moving <code class="language-plaintext highlighter-rouge">GEOMETRY</code> into DuckDB’s core, extensions can now produce and consume geometry values natively, without depending on <code class="language-plaintext highlighter-rouge">spatial</code>. While the <code class="language-plaintext highlighter-rouge">spatial</code> extension still provides most of the functions for working with geometries, the type itself becomes a shared foundation that the entire ecosystem can build on. We’ve already added <code class="language-plaintext highlighter-rouge">GEOMETRY</code> support to the Postgres scanner and GeoArrow conversion for Arrow import and export. Geometry support in additional extensions is coming soon.</p>

<p>This change also enables deeper integration with DuckDB’s storage engine and query optimizer, unlocking new compression techniques, query optimizations, and CRS awareness capabilities that were not possible when <code class="language-plaintext highlighter-rouge">GEOMETRY</code> only existed as an extension type. This is all documented in the new <a href="/docs/current/sql/data_types/geometry.html">geometry page</a> in the documentation, but we will highlight some below.</p>

<h4 id="improved-storage-wkb-and-shredding">Improved Storage: WKB and Shredding</h4>

<p>Geometry values are now stored using the industry-standard little-endian <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary">Well-Known Binary (WKB)</a> encoding, replacing the custom format used by the <code class="language-plaintext highlighter-rouge">spatial</code> extension. However, we are still experimenting with the in-memory representation we want to use in the execution engine, so you should still use the conversion functions (e.g., <code class="language-plaintext highlighter-rouge">ST_AsWKT</code>, <code class="language-plaintext highlighter-rouge">ST_AsWKB</code>, <code class="language-plaintext highlighter-rouge">ST_GeomFromText</code>, <code class="language-plaintext highlighter-rouge">ST_GeomFromWKB</code>) when moving data in and out of DuckDB.</p>
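<p>For example, a minimal round trip through the conversion functions mentioned above:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Parse WKT into a GEOMETRY value, then serialize it back out as WKB
SELECT ST_AsWKB(ST_GeomFromText('POINT (1 2)'));
</code></pre></div></div>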

<p>We’ve also implemented a new storage technique specialized for <code class="language-plaintext highlighter-rouge">GEOMETRY</code>. When a geometry column contains values that all share the same type and vertex dimensions, DuckDB can additionally apply "shredding": rather than storing opaque blobs, the column is decomposed into primitive <code class="language-plaintext highlighter-rouge">STRUCT</code>, <code class="language-plaintext highlighter-rouge">LIST</code>, and <code class="language-plaintext highlighter-rouge">DOUBLE</code> segments that compress far more efficiently. This can reduce on-disk size by roughly 3x for uniform geometry columns such as point clouds. Shredding is applied automatically for uniform row groups of a certain size, but can be configured via the <code class="language-plaintext highlighter-rouge">geometry_minimum_shredding_size</code> configuration option.</p>
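<p>The shredding threshold can be changed like any other configuration option; note that the value below is purely illustrative:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Illustrative value; see the configuration documentation for the exact semantics
SET geometry_minimum_shredding_size = 10000;
</code></pre></div></div>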

<h4 id="geometry-statistics-and-query-optimization">Geometry Statistics and Query Optimization</h4>

<p>Geometry columns now track per-row-group statistics, including the bounding box and the set of geometry types and vertex dimensions present. The query optimizer can use these to skip row groups that cannot match a query's spatial predicates, similar to min/max pruning for numeric columns. The <code class="language-plaintext highlighter-rouge">&amp;&amp;</code> (bounding box intersection) operator is the first to benefit; broader support across <code class="language-plaintext highlighter-rouge">spatial</code> functions is in progress.</p>
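<p>A sketch of the kind of query that can benefit from this pruning (the table and column names are hypothetical):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- Row groups whose bounding box cannot intersect the query window are skipped
SELECT count(*)
FROM buildings
WHERE geom &amp;&amp; ST_GeomFromText('POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))');
</code></pre></div></div>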

<h4 id="coordinate-reference-system-support">Coordinate Reference System Support</h4>

<p>The <code class="language-plaintext highlighter-rouge">GEOMETRY</code> type now accepts an optional CRS parameter (e.g., <code class="language-plaintext highlighter-rouge">GEOMETRY('OGC:CRS84')</code>), making CRS part of the type system rather than implicit metadata. Spatial functions enforce CRS consistency across their inputs, catching a common class of silent errors that arises when mixing geometries from different coordinate systems. Only a couple of CRSs are built in by default, but loading the <code class="language-plaintext highlighter-rouge">spatial</code> extension registers over 7,000 CRSs from the EPSG dataset. While CRS support is still a bit experimental, we are planning to develop it further to support e.g., custom CRS definitions.</p>
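<p>As a sketch, with hypothetical table and column names, a CRS-annotated column can be declared like this:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- The CRS is now part of the column's type
CREATE TABLE cities (name VARCHAR, location GEOMETRY('OGC:CRS84'));
</code></pre></div></div>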

<h3 id="optimizations">Optimizations</h3>

<h4 id="non-blocking-checkpointing">Non-Blocking Checkpointing</h4>

<p>During checkpointing, it's now possible to run concurrent reads (<a href="https://github.com/duckdb/duckdb/pull/19867">#19867</a>), writes (<a href="https://github.com/duckdb/duckdb/pull/20052">#20052</a>), insertions with indexes (<a href="https://github.com/duckdb/duckdb/pull/20160">#20160</a>) and deletes (<a href="https://github.com/duckdb/duckdb/pull/20286">#20286</a>). The rework of checkpointing benefits concurrent RW workloads and increases the TPC-H throughput score on SF100 from 246,115.60 to 287,122.97, a <strong>17% improvement</strong>.</p>

<h4 id="aggregates">Aggregates</h4>

<p>Aggregate functions received several optimizations. For example, the <code class="language-plaintext highlighter-rouge">last</code> aggregate function was optimized by community member <a href="https://github.com/xe-nvdk"><code class="language-plaintext highlighter-rouge">xe-nvdk</code></a> to iterate from the end of each vector batch instead of the beginning. In synthetic benchmarks, this results in a <a href="https://github.com/duckdb/duckdb/pull/20567">40% speedup</a>.</p>

<!-- markdownlint-disable MD001 -->

<h2 id="distribution">Distribution</h2>

<h4 id="python-pip">Python Pip</h4>

<p>You can install the DuckDB CLI on any platform where pip is available:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">pip </span>install duckdb-cli
</code></pre></div></div>

<p>You can then launch DuckDB in your virtual environment using:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">duckdb</span>
</code></pre></div></div>

<p>Both DuckDB v1.4 and v1.5 are supported. We are working on shipping extensions as extras using the <code class="language-plaintext highlighter-rouge">duckdb[extension_name]</code> syntax – stay tuned!</p>

<h4 id="windows-install-script-beta">Windows Install Script (Beta)</h4>

<p>On Windows, you can now use an install script:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code>powershell <span class="nt">-NoExit</span> iex <span class="o">(</span>iwr <span class="s2">"https://install.duckdb.org/install.ps1"</span><span class="o">)</span>.Content
</code></pre></div></div>

<p>Please note that this is currently in the beta stage. If you have any feedback, please <a href="https://github.com/duckdb/duckdb/issues">let us know</a>.</p>

<h4 id="cli-for-linux-with-musl-libc">CLI for Linux with musl libc</h4>

<p>We are distributing CLI clients that work with <a href="/docs/stable/dev/building/linux.html">musl libc</a> (e.g., for Alpine Linux, commonly used in Docker images). The archives are available <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">on GitHub</a>.</p>

<p>Note that the musl libc CLI client requires <code class="language-plaintext highlighter-rouge">libstdc++</code>. To install this package, run:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">apk </span>add libstdc++
</code></pre></div></div>

<h4 id="extension-sizes">Extension Sizes</h4>

<p>We reworked our build system to make the extension binaries smaller! The DuckLake extension's size was reduced by ~30%, from 17 MB to 12 MB. For smaller extensions such as Excel, the reduction is more than 60%, from 9 MB to 3 MB.</p>

<!-- markdownlint-enable MD001 -->

<h2 id="summary">Summary</h2>

<p>These were a few highlights – but there are many more features and improvements in this release.
There have been over 6500 commits by close to 100 contributors since v1.4. The full <a href="https://github.com/duckdb/duckdb/releases/tag/v1.5.0">release notes can be found on GitHub</a>. We would like to thank our community for providing detailed issue reports and feedback. And again, our special thanks go to external contributors!</p>

<p>PS: If you visited this blog post through a direct link – we also rolled out a new <a href="/">landing page</a>!</p>

<!-- markdownlint-disable MD040 -->

<h2 id="appendix-example-dataset">Appendix: Example Dataset</h2>

<details>
  <summary>
See the code that creates the example databases.
</summary>
  <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'numbers1.db'</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'numbers2.db'</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'animals.db'</span><span class="p">;</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">numbers1.tbl</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">numbers2.tbl</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">6</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">animals.ducks</span> <span class="k">AS</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Labrador Duck'</span><span class="p">,</span> <span class="mi">1878</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Mallard'</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Crested Shelduck'</span><span class="p">,</span> <span class="mi">1964</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="s1">'Wood Duck'</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="s1">'Pink-headed Duck'</span><span class="p">,</span> <span class="mi">1949</span><span class="p">)</span>
<span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">extinct_year</span><span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">animals.swans</span> <span class="k">AS</span>
<span class="k">FROM</span> <span class="p">(</span><span class="k">VALUES</span>
    <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'Aurora'</span><span class="p">,</span> <span class="s1">'Mute Swan'</span><span class="p">,</span> <span class="s1">'White'</span><span class="p">,</span> <span class="s1">'European lakes and rivers'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'Midnight'</span><span class="p">,</span> <span class="s1">'Black Swan'</span><span class="p">,</span> <span class="s1">'Black'</span><span class="p">,</span> <span class="s1">'Australian wetlands'</span><span class="p">),</span>
    <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'Tundra'</span><span class="p">,</span> <span class="s1">'Tundra Swan'</span><span class="p">,</span> <span class="s1">'White'</span><span class="p">,</span> <span class="s1">'Arctic and subarctic regions'</span><span class="p">)</span>
<span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">id</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">species</span><span class="p">,</span> <span class="n">color</span><span class="p">,</span> <span class="n">habitat</span><span class="p">);</span>

<span class="k">DETACH</span> <span class="n">numbers1</span><span class="p">;</span>
<span class="k">DETACH</span> <span class="n">numbers2</span><span class="p">;</span>
<span class="k">DETACH</span> <span class="n">animals</span><span class="p">;</span>
</code></pre></div>  </div>
</details>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[We are releasing DuckDB version 1.5.0, codenamed “Variegata”. This release comes with a friendly CLI (a new, more ergonomic command line client), support for the `VARIANT` type, a built-in `GEOMETRY` type, along with many other features and optimizations. The v1.4.0 LTS line (“Andium”) will keep receiving updates until its end-of-life in September 2026.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-0.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-5-0.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.4.4 LTS</title><link href="https://duckdb.org/2026/01/26/announcing-duckdb-144.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.4.4 LTS" /><published>2026-01-26T00:00:00+00:00</published><updated>2026-01-26T00:00:00+00:00</updated><id>https://duckdb.org/2026/01/26/announcing-duckdb-144</id><content type="html" xml:base="https://duckdb.org/2026/01/26/announcing-duckdb-144.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.4.4, the fourth patch release in <a href="/2025/09/16/announcing-duckdb-140.html">DuckDB's 1.4 LTS line</a>.
The release ships bugfixes, performance improvements and security patches. You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.4">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="fixes">Fixes</h2>

<p>This version ships a number of performance improvements and bugfixes.</p>


<h3 id="correctness">Correctness</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/20008"><code class="language-plaintext highlighter-rouge">#20008</code> Unexpected Result when Using Utility Function ALIAS</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20410"><code class="language-plaintext highlighter-rouge">#20410</code> ANTI JOIN produces wrong results with materialized CTEs</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20156"><code class="language-plaintext highlighter-rouge">#20156</code> Streaming window unions produce incorrect results</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20413"><code class="language-plaintext highlighter-rouge">#20413</code> ASOF joins with <code class="language-plaintext highlighter-rouge">predicate</code> fail with different errors for FULL, RIGHT, SEMI, and ANTI join types</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20090"><code class="language-plaintext highlighter-rouge">#20090</code> mode() produces corrupted UTF-8 strings in parallel execution</a></li>
</ul>

<h3 id="crashes-and-internal-errors">Crashes and Internal Errors</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb-python/issues/127"><code class="language-plaintext highlighter-rouge">#20468</code> Segfault in Hive partitioning with NULL values</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20086"><code class="language-plaintext highlighter-rouge">#20086</code> Incorrect results when using positional joins and indexes</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/20415"><code class="language-plaintext highlighter-rouge">#20415</code> C API data creation causes segfault</a></li>
</ul>

<h3 id="performance">Performance</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/pull/20252"><code class="language-plaintext highlighter-rouge">#20252</code> Optimize prepared statement parameter lookups</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20284"><code class="language-plaintext highlighter-rouge">#20284</code> dbgen: use TaskExecutor framework to respect the <code class="language-plaintext highlighter-rouge">threads</code> setting</a></li>
</ul>

<h3 id="miscellaneous">Miscellaneous</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/20233"><code class="language-plaintext highlighter-rouge">#20233</code> Function chaining not allowed in QUALIFY</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20339"><code class="language-plaintext highlighter-rouge">#20339</code> Use UTF-16 console output in Windows shell</a></li>
</ul>

<h2 id="conclusion">Conclusion</h2>

<p>This post was a short summary of the changes in v1.4.4. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.4">full release notes on GitHub</a>.
We would like to thank our contributors for providing detailed issue reports and patches.
In the coming month, we'll release DuckDB v1.5.0.
We'll also keep v1.4 LTS updated until mid-September. We'll announce the release date of v1.4.5 in the <a href="/release_calendar.html">release calendar</a> in the coming months.</p>

<blockquote>
  <p>Earlier today, we pushed an incorrect tag that was visible for a few minutes.
No binaries or extensions were available under this tag and we replaced it as soon as we noticed the issue.
Our apologies for the erroneous release.</p>
</blockquote>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[Today we are releasing DuckDB 1.4.4 with bugfixes and performance improvements.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-4-lts.jpg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-4-lts.jpg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing Vortex Support in DuckDB</title><link href="https://duckdb.org/2026/01/23/duckdb-vortex-extension.html" rel="alternate" type="text/html" title="Announcing Vortex Support in DuckDB" /><published>2026-01-23T00:00:00+00:00</published><updated>2026-01-23T00:00:00+00:00</updated><id>https://duckdb.org/2026/01/23/duckdb-vortex-extension</id><content type="html" xml:base="https://duckdb.org/2026/01/23/duckdb-vortex-extension.html"><![CDATA[<p>I think it is worth starting this intro by talking a little bit about the established format for columnar data. Parquet has done some amazing things for analytics. If you go back to the days when CSV was the best alternative, then you know how important Parquet is. However, even though the specification has evolved over time, Parquet has some design constraints. A particular limitation is that it is block-compressed: engines need to decompress whole pages before they can do further operations like filtering or decoding values. For a while, <a href="https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-retrospective.html?#fileformats">researchers and private companies</a> have been working on alternatives that improve on some of Parquet’s shortcomings. Vortex, from the SpiralDB team, is one of them.</p>

<h2 id="what-is-vortex">What is Vortex?</h2>

<p><a href="https://vortex.dev/">Vortex</a> is an extensible, open source format for columnar data. It was created to handle heterogeneous compute patterns and different data modalities. But what does this mean?</p>

<blockquote>
  <p>The project was donated to the Linux Foundation by the <a href="https://spiraldb.com/post/vortex-a-linux-foundation-project">SpiralDB</a> team in August 2025.</p>
</blockquote>

<p>Vortex provides different layouts and encodings for different data types. Some of the most notable are <a href="/library/alp/">ALP</a> for floating point encoding or <a href="/2022/10/28/lightweight-compression.html">FSST</a> for string encoding. This lightweight compression strategy keeps data sizes down while allowing one of Vortex’s most important features: compute functions. By knowing the encoded layout of the data, Vortex is able to run arbitrary expressions on compressed data. This allows a Vortex reader to execute, for example, filter expressions within storage segments without decompressing data.</p>

<p>We mentioned heterogeneous compute to emphasize that Vortex was designed with the idea of having optimized layouts for different data types, including vectors, large text or even image or audio, but also to maximize CPU or GPU saturation. The idea is that decompression is deferred all the way to the GPU or CPU, enabling what Vortex calls “late materialization”. The <a href="/library/fastlanes/">FastLanes</a> encoding, a project originating at CWI (like DuckDB), is one of the main drivers behind this feature.</p>

<p>Vortex also supports dynamically loaded libraries (similar to DuckDB extensions) that provide new encodings for specific types as well as specific compute functions, e.g., for geospatial data. Another very interesting feature is embedding WebAssembly into the file, which allows the reader to benefit from compute kernels shipped with the file itself.</p>

<p>Besides DuckDB, other engines such as DataFusion, Spark and Arrow already offer integration with Vortex.</p>

<blockquote>
  <p>For more information, check out the <a href="https://spiraldb.com/post/vortex-a-linux-foundation-project">Vortex documentation</a>.</p>
</blockquote>

<h2 id="the-duckdb-vortex-extension">The DuckDB Vortex Extension</h2>

<p>DuckDB is, as its name says, a database, but it is also widely used as an engine to query many different data sources. Through core or community extensions, DuckDB can integrate with:</p>

<ul>
  <li>Databases like Snowflake, BigQuery or PostgreSQL.</li>
  <li>Lakehouse formats like Delta, Iceberg or DuckLake.</li>
  <li>File formats, most notably JSON, CSV, Parquet and most recently Vortex.</li>
</ul>

<blockquote>
  <p>The community has gotten very creative, though, so these days you can even read YAML and Markdown with DuckDB using <a href="/community_extensions/">community extensions</a>.</p>
</blockquote>

<p>All this is possible due to the DuckDB <a href="/docs/stable/extensions/overview.html">extension system</a>, which makes it relatively easy to implement logic to interact with different file formats or external systems.</p>

<p>The SpiralDB team built a <a href="https://github.com/vortex-data/duckdb-vortex">DuckDB extension</a>. Together with the <a href="https://duckdblabs.com/">DuckDB Labs</a> team, we have made the extension available as a <a href="/docs/stable/core_extensions/overview.html">core DuckDB extension</a>, so that the community can enjoy Vortex as a first-class citizen in DuckDB.</p>

<h3 id="example-usage">Example Usage</h3>

<p>Installing and using the Vortex extension is very simple:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> vortex</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> vortex</span><span class="p">;</span>
</code></pre></div></div>

<p>Then, you can easily use it to read and write, similar to other extensions such as Parquet.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">read_vortex</span><span class="p">(</span><span class="s1">'my.vortex'</span><span class="p">);</span>

<span class="k">COPY</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="k">TO</span> <span class="s1">'my.vortex'</span> <span class="p">(</span><span class="k">FORMAT</span> <span class="k">vortex</span><span class="p">);</span>
</code></pre></div></div>

<h3 id="why-vortex-and-duckdb">Why Vortex and DuckDB?</h3>

<p>Vortex claims to do well primarily at three use cases:</p>

<ul>
  <li>Traditional SQL analytics: Through late decompression and compute expressions on compressed data, Vortex can filter down data within the storage segment, reducing IO and memory consumption.</li>
  <li>Machine learning pre-processing pipelines: By supporting a wide variety of encodings for different data types, Vortex claims to be effective at reading and writing data, whether it is audio, text, images or vectors.</li>
  <li>AI model training: Encodings such as FastLanes allow for very efficient copying of data to the GPU. Vortex aims to be able to copy data directly from S3 object storage to the GPU.</li>
</ul>

<p>The promise of more efficient IO and memory use through late decompression is a good reason to try DuckDB and Vortex for SQL analytics. Moreover, if you are looking to run analytics on unified datasets that serve multiple use cases, including pre-processing pipelines and AI training, Vortex may be a good candidate, since it is designed to fit all of these use cases well.</p>

<h3 id="performance-experiment">Performance Experiment</h3>

<p>For those hungry for numbers, we ran the TPC-H benchmark at scale factor 100 with DuckDB to understand how Vortex performs as a storage format compared to Parquet. We tried to make the benchmark as fair as possible. These are the parameters:</p>

<ul>
  <li>Run on Mac M1 with 10 cores &amp; 32 GB of memory.</li>
  <li>The benchmark runs each query 5 times and the average is used for the final report.</li>
  <li>The DuckDB connection is closed after each query to make runs “colder” and prevent DuckDB’s caching (particularly with Parquet) from influencing the results. OS page caching does influence subsequent runs, but we decided to acknowledge this factor and still keep the first run.</li>
  <li>Each TPC-H table is a single file, which means that lineitem files for Parquet and Vortex are quite large (both around 20 GB). This allows us to ignore the effect of globbing and having many small files.</li>
  <li>Data files used for the benchmark are generated with <a href="https://github.com/clflushopt/tpchgen-rs">tpchgen-rs</a> and are copied out using DuckDB’s Parquet and Vortex extensions.</li>
  <li>We compared Vortex against Parquet v1 and v2. The v2 specification allows for considerably faster reading than the v1 specification but many writers do not support this, so we thought it was worth including both.</li>
</ul>
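<p>As an illustration of this setup, the single-file tables can be exposed to the unmodified TPC-H queries through views, one per format. This is only a sketch; the file names here are hypothetical:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- run the queries against the Vortex copy of lineitem ...
CREATE OR REPLACE VIEW lineitem AS
    SELECT * FROM read_vortex('lineitem.vortex');

-- ... or against the Parquet copy, without changing the query text
CREATE OR REPLACE VIEW lineitem AS
    SELECT * FROM read_parquet('lineitem.parquet');
</code></pre></div></div>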

<p><strong>The results are very good.</strong> With Vortex, the TPC-H benchmark runs 18% faster than Parquet v2 and 35% faster than Parquet v1 (comparing geometric means, which is the recommended approach).</p>
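<p>For reference, the geometric mean can be computed in DuckDB itself as the exponential of the mean of the log-transformed timings. The sketch below assumes a hypothetical <code class="language-plaintext highlighter-rouge">results</code> table with one timing per query and format:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT
    format,
    exp(avg(ln(seconds))) AS geometric_mean,
    avg(seconds) AS arithmetic_mean
FROM results
GROUP BY format;
</code></pre></div></div>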

<p>Another interesting result is the standard deviation across runs. There was a considerable difference between the first (and coldest) run of each query and subsequent runs in Parquet, while Vortex performed very well across all runs with a much smaller standard deviation.</p>

<p><img src="/images/blog/duckdb-vortex/tpch_summary.png" alt="summary" /></p>

<table>
  <thead>
    <tr>
      <th style="text-align: left">Format</th>
      <th style="text-align: right">Geometric Mean (s)</th>
      <th style="text-align: right">Arithmetic Mean (s)</th>
      <th style="text-align: right">Avg Std Dev (s)</th>
      <th style="text-align: right">Total Time (s)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: left">parquet_v1</td>
      <td style="text-align: right">2.324712</td>
      <td style="text-align: right">2.875722</td>
      <td style="text-align: right">0.145914</td>
      <td style="text-align: right">63.265881</td>
    </tr>
    <tr>
      <td style="text-align: left">parquet_v2</td>
      <td style="text-align: right">1.839171</td>
      <td style="text-align: right">2.288013</td>
      <td style="text-align: right">0.182962</td>
      <td style="text-align: right">50.336281</td>
    </tr>
    <tr>
      <td style="text-align: left">vortex</td>
      <td style="text-align: right">1.507675</td>
      <td style="text-align: right">1.991289</td>
      <td style="text-align: right">0.078893</td>
      <td style="text-align: right">43.808349</td>
    </tr>
  </tbody>
</table>

<blockquote>
  <p>The times did vary across different runs of the same benchmark, and subsequent runs have yielded similar results with slight variations. The differences between Parquet v2 and Vortex have consistently been around 12-18% in geometric means and around 8-14% in total times. Benchmarking is very hard!</p>
</blockquote>

<!-- markdownlint-disable MD040 MD046 -->

<details>
  <summary>
Click here to see a more detailed breakdown of the benchmark results.
</summary>

  <p>This figure shows the results per query, including the standard deviation error bar.<br />
<img src="/images/blog/duckdb-vortex/tpch_rowgram.png" alt="mean_per_query" /><br />
The following is a summary of the dataset sizes in GB. Note that both Parquet v1 and v2 use the default compression of the DuckDB Parquet writer, which is Snappy. Vortex does not use any general-purpose compression here but still keeps the data sizes competitive.</p>

  <table>
    <thead>
      <tr>
        <th style="text-align: left">Table</th>
        <th style="text-align: left">parquet_v1</th>
        <th style="text-align: left">parquet_v2</th>
        <th style="text-align: left">vortex</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td style="text-align: left">customer</td>
        <td style="text-align: left">1.15</td>
        <td style="text-align: left">0.99</td>
        <td style="text-align: left">1.06</td>
      </tr>
      <tr>
        <td style="text-align: left">lineitem</td>
        <td style="text-align: left">21.15</td>
        <td style="text-align: left">16.02</td>
        <td style="text-align: left">18.14</td>
      </tr>
      <tr>
        <td style="text-align: left">nation</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
      </tr>
      <tr>
        <td style="text-align: left">orders</td>
        <td style="text-align: left">6.02</td>
        <td style="text-align: left">4.54</td>
        <td style="text-align: left">5.03</td>
      </tr>
      <tr>
        <td style="text-align: left">part</td>
        <td style="text-align: left">0.59</td>
        <td style="text-align: left">0.47</td>
        <td style="text-align: left">0.54</td>
      </tr>
      <tr>
        <td style="text-align: left">partsupp</td>
        <td style="text-align: left">4.07</td>
        <td style="text-align: left">3.33</td>
        <td style="text-align: left">3.72</td>
      </tr>
      <tr>
        <td style="text-align: left">region</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
        <td style="text-align: left">0.00</td>
      </tr>
      <tr>
        <td style="text-align: left">supplier</td>
        <td style="text-align: left">0.07</td>
        <td style="text-align: left">0.06</td>
        <td style="text-align: left">0.07</td>
      </tr>
      <tr>
        <td style="text-align: left"><strong>total</strong></td>
        <td style="text-align: left">33.06</td>
        <td style="text-align: left">25.40</td>
        <td style="text-align: left">28.57</td>
      </tr>
    </tbody>
  </table>

</details>

<!-- markdownlint-enable MD040 MD046 -->

<h2 id="conclusion">Conclusion</h2>

<p>Vortex is a very interesting alternative to established columnar formats like Parquet. Its focus on lightweight compression encodings, late decompression and being able to run compute expressions on compressed data makes it very interesting for a wide range of use cases. With regard to DuckDB, we see that Vortex is already very performant for analytical queries, where it is on par or better than Parquet v2 on the TPC-H benchmark queries.</p>

<blockquote>
  <p>Vortex has been <a href="https://docs.vortex.dev/specs/file-format">backwards compatible</a> since version 0.36.0, which was released more than 6 months ago. Vortex is now at version 0.56.0.</p>
</blockquote>]]></content><author><name>Guillermo Sanchez, SpiralDB Team</name></author><category term="benchmark" /><summary type="html"><![CDATA[Vortex is a new columnar file format with a very promising design. SpiralDB and DuckDB Labs have partnered to give you a very fast experience while reading and writing Vortex files!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/vortex.svg" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/vortex.svg" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">DuckDB on LoongArch</title><link href="https://duckdb.org/2026/01/06/duckdb-on-loongarch-morefine.html" rel="alternate" type="text/html" title="DuckDB on LoongArch" /><published>2026-01-06T00:00:00+00:00</published><updated>2026-01-06T00:00:00+00:00</updated><id>https://duckdb.org/2026/01/06/duckdb-on-loongarch-morefine</id><content type="html" xml:base="https://duckdb.org/2026/01/06/duckdb-on-loongarch-morefine.html"><![CDATA[<p>It’s not every day that a new CPU architecture arrives on your desk. I grew up on the <a href="https://en.wikipedia.org/wiki/I486">Intel 486</a> back in the early 90s. I also still remember AMD releasing its <a href="https://en.wikipedia.org/wiki/X86-64#History">64-bit x86 extension</a> in 2000. Then not a lot happened until Apple released the ARM-based M1 architecture in 2020. But today is the day again (for me), with the long-awaited arrival of the “MOREFINE M700S” in our office.</p>

<p><img src="/images/blog/loongarch/morefine-computer.jpg" width="800" /></p>

<p>The M700S contains a Loongson CPU. Also called “LoongArch” or “Godson” processors, this CPU was developed in China <a href="https://www.tomshardware.com/pc-components/cpus/chinese-chipmaker-loongson-wins-case-over-rights-to-mips-architecture-companys-new-cpu-architecture-heavily-resembles-existing-mips">based</a> on the (somewhat esoteric) <a href="https://en.wikipedia.org/wiki/MIPS_architecture">MIPS architecture</a>. This is part of a plan to become technologically self-sufficient as part of the government-funded <a href="https://en.wikipedia.org/wiki/Made_in_China_2025">Made in China 2025</a> plan.</p>

<p>It is probably safe to assume that – given the ongoing trade shenanigans – the Loongson will become much more popular in China as time goes on. DuckDB already sees quite a lot of usage from China, so naturally we want to make sure that DuckDB runs well on the Loongson. Thankfully, one of our community members has already opened a <a href="https://github.com/duckdb/duckdb/pull/19962">pull request</a> with two minimal changes to allow DuckDB to compile. We became curious.</p>

<p>We purchased the M700S on (where else?) <a href="https://nl.aliexpress.com/item/1005008047862187.html?spm=a2g0o.order_list.order_list_main.5.685479d21SDmQG&amp;gatewayAdapt=glo2nld">AliExpress</a> for around 500 EUR. Besides the Loongson 8-core 3A6000 CPU it contains 16 GB of main memory and a 256 GB solid-state disk.</p>

<p><img src="/images/blog/loongarch/morefine-aliexpress-listing.png" width="800" /></p>

<p>Once plugged in and booted up, things feel pretty normal besides the loud fan that seems to be always on. On the screen, a variant of Debian called <a href="https://www.loongson.cn/EN/system/loongnix">Loongnix</a> boots up. The GUI seems to be KDE-based and comes with a custom browser “LBrowser”, which is a fork of Chromium. Because it was not obvious, we document it here: the default <code class="language-plaintext highlighter-rouge">root</code> password is <code class="language-plaintext highlighter-rouge">M700S</code>. There is also a user account <code class="language-plaintext highlighter-rouge">m700s</code> with the same password.</p>

<p><img src="/images/blog/loongarch/loongnix.jpg" width="800" /></p>

<p>Overall, the software seems a little dated, even after running <code class="language-plaintext highlighter-rouge">apt upgrade</code>: the Linux kernel seems to be version 4.19, which was released back in 2018, and which has been EOL for a year now. The GCC version is 8.3, which similarly came out in 2019.</p>

<p>With the <a href="https://github.com/duckdb/duckdb/pull/19962">aforementioned patch</a>, we managed to compile DuckDB 1.4.3 on Loongnix. There was one small issue where the CMake file <code class="language-plaintext highlighter-rouge">append_metadata.cmake</code> was not compatible with the older CMake version (3.13.4) available on Loongnix. But simply replacing that file with an empty one allowed us to complete the build. Of course we could also have updated CMake, but life is short. Once completed, we ran DuckDB’s extensive unit test suite (<code class="language-plaintext highlighter-rouge">make allunit</code>) to confirm that our build runs correctly on the Loongson CPU. Results looked good.</p>

<p>For performance comparison, we re-used the methodology from our <a href="https://duckdb.org/2025/01/17/raspberryi-pi-tpch">previous blog post</a> that ran DuckDB on a Raspberry Pi. In short, we run the 22 TPC-H benchmark queries on “Scale Factor” 100 and 300, which in DuckDB format is a 25 GB and 78 GB database file, respectively. We compare those numbers with the nearest computer, which is my day-to-day MacBook Pro with an M3 Max CPU. For fairness, we limit DuckDB to 14 GB of RAM on both platforms. The reported timings are “hot” runs, meaning we re-ran the query set and took the timings from the second run.</p>
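<p>This setup can be sketched in a few SQL statements. The database file name below is hypothetical, and the TPC-H queries are available via DuckDB's <code class="language-plaintext highlighter-rouge">tpch</code> extension:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- open the pre-generated TPC-H database file
ATTACH 'tpch-sf100.duckdb' AS db (READ_ONLY);
USE db;

-- cap memory identically on both platforms
SET memory_limit = '14GB';

INSTALL tpch;
LOAD tpch;

-- run TPC-H query 1; the reported timing is the second ("hot") run
PRAGMA tpch(1);
</code></pre></div></div>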

<p>Here are the results, and they are not great. We start with aggregated timings:</p>

<table>
  <thead>
    <tr>
      <th>SF</th>
      <th>System</th>
      <th style="text-align: right">Geometric mean</th>
      <th style="text-align: right">Sum</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SF100</td>
      <td>MacBook</td>
      <td style="text-align: right">0.6</td>
      <td style="text-align: right">16.9</td>
    </tr>
    <tr>
      <td>SF100</td>
      <td>MOREFINE</td>
      <td style="text-align: right">6.1</td>
      <td style="text-align: right">192.8</td>
    </tr>
    <tr>
      <td>SF300</td>
      <td>MacBook</td>
      <td style="text-align: right">2.8</td>
      <td style="text-align: right">78.8</td>
    </tr>
    <tr>
      <td>SF300</td>
      <td>MOREFINE</td>
      <td style="text-align: right">27.3</td>
      <td style="text-align: right">791.6</td>
    </tr>
  </tbody>
</table>

<p>We can see that the MacBook is around <em>ten times faster</em> than the MOREFINE on this benchmark, both in the geometric mean of runtimes as well as in the sum.
If you are interested in the individual query runtimes, you can find them below.</p>
<details>
  <summary>
Click here to see the individual query runtimes.
</summary>
  <div>
<table>
<thead>
<tr>
<th style="text-align: right;">Q</th>
<th style="text-align: right;">SF100/MacBook</th>
<th style="text-align: right;">SF100/MOREFINE</th>
<th style="text-align: right;">SF300/MacBook</th>
<th style="text-align: right;">SF300/MOREFINE</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: right;">1</td>
<td style="text-align: right;">1.247</td>
<td style="text-align: right;">7.363</td>
<td style="text-align: right;">4.528</td>
<td style="text-align: right;">26.475</td>
</tr>
<tr>
<td style="text-align: right;">2</td>
<td style="text-align: right;">0.117</td>
<td style="text-align: right;">1.058</td>
<td style="text-align: right;">0.474</td>
<td style="text-align: right;">4.101</td>
</tr>
<tr>
<td style="text-align: right;">3</td>
<td style="text-align: right;">0.697</td>
<td style="text-align: right;">8.563</td>
<td style="text-align: right;">2.759</td>
<td style="text-align: right;">32.432</td>
</tr>
<tr>
<td style="text-align: right;">4</td>
<td style="text-align: right;">0.570</td>
<td style="text-align: right;">7.348</td>
<td style="text-align: right;">2.331</td>
<td style="text-align: right;">27.185</td>
</tr>
<tr>
<td style="text-align: right;">5</td>
<td style="text-align: right;">0.631</td>
<td style="text-align: right;">8.498</td>
<td style="text-align: right;">3.217</td>
<td style="text-align: right;">34.462</td>
</tr>
<tr>
<td style="text-align: right;">6</td>
<td style="text-align: right;">0.180</td>
<td style="text-align: right;">1.236</td>
<td style="text-align: right;">1.395</td>
<td style="text-align: right;">13.225</td>
</tr>
<tr>
<td style="text-align: right;">7</td>
<td style="text-align: right;">0.620</td>
<td style="text-align: right;">7.702</td>
<td style="text-align: right;">3.119</td>
<td style="text-align: right;">37.411</td>
</tr>
<tr>
<td style="text-align: right;">8</td>
<td style="text-align: right;">0.640</td>
<td style="text-align: right;">5.593</td>
<td style="text-align: right;">3.611</td>
<td style="text-align: right;">29.914</td>
</tr>
<tr>
<td style="text-align: right;">9</td>
<td style="text-align: right;">1.906</td>
<td style="text-align: right;">30.560</td>
<td style="text-align: right;">6.670</td>
<td style="text-align: right;">99.884</td>
</tr>
<tr>
<td style="text-align: right;">10</td>
<td style="text-align: right;">0.923</td>
<td style="text-align: right;">11.755</td>
<td style="text-align: right;">4.036</td>
<td style="text-align: right;">40.412</td>
</tr>
<tr>
<td style="text-align: right;">11</td>
<td style="text-align: right;">0.102</td>
<td style="text-align: right;">1.037</td>
<td style="text-align: right;">0.709</td>
<td style="text-align: right;">4.444</td>
</tr>
<tr>
<td style="text-align: right;">12</td>
<td style="text-align: right;">0.535</td>
<td style="text-align: right;">6.422</td>
<td style="text-align: right;">2.918</td>
<td style="text-align: right;">31.501</td>
</tr>
<tr>
<td style="text-align: right;">13</td>
<td style="text-align: right;">1.847</td>
<td style="text-align: right;">21.185</td>
<td style="text-align: right;">6.394</td>
<td style="text-align: right;">74.081</td>
</tr>
<tr>
<td style="text-align: right;">14</td>
<td style="text-align: right;">0.408</td>
<td style="text-align: right;">5.616</td>
<td style="text-align: right;">3.240</td>
<td style="text-align: right;">26.613</td>
</tr>
<tr>
<td style="text-align: right;">15</td>
<td style="text-align: right;">0.252</td>
<td style="text-align: right;">2.652</td>
<td style="text-align: right;">1.906</td>
<td style="text-align: right;">17.454</td>
</tr>
<tr>
<td style="text-align: right;">16</td>
<td style="text-align: right;">0.273</td>
<td style="text-align: right;">3.108</td>
<td style="text-align: right;">0.879</td>
<td style="text-align: right;">11.480</td>
</tr>
<tr>
<td style="text-align: right;">17</td>
<td style="text-align: right;">0.805</td>
<td style="text-align: right;">5.184</td>
<td style="text-align: right;">4.655</td>
<td style="text-align: right;">28.469</td>
</tr>
<tr>
<td style="text-align: right;">18</td>
<td style="text-align: right;">1.538</td>
<td style="text-align: right;">15.492</td>
<td style="text-align: right;">7.619</td>
<td style="text-align: right;">71.845</td>
</tr>
<tr>
<td style="text-align: right;">19</td>
<td style="text-align: right;">0.779</td>
<td style="text-align: right;">9.143</td>
<td style="text-align: right;">4.379</td>
<td style="text-align: right;">39.111</td>
</tr>
<tr>
<td style="text-align: right;">20</td>
<td style="text-align: right;">0.441</td>
<td style="text-align: right;">4.993</td>
<td style="text-align: right;">3.234</td>
<td style="text-align: right;">25.967</td>
</tr>
<tr>
<td style="text-align: right;">21</td>
<td style="text-align: right;">1.996</td>
<td style="text-align: right;">23.231</td>
<td style="text-align: right;">9.503</td>
<td style="text-align: right;">96.452</td>
</tr>
<tr>
<td style="text-align: right;">22</td>
<td style="text-align: right;">0.441</td>
<td style="text-align: right;">5.036</td>
<td style="text-align: right;">1.237</td>
<td style="text-align: right;">18.709</td>
</tr>
</tbody>
</table>

</div>
</details>

<p>It is always exciting to get DuckDB running on a new platform. Of course, we have built DuckDB to be ultra-portable and agnostic to hardware environments while still delivering excellent performance. So it was not too surprising that getting DuckDB running on the MOREFINE with its new-ish CPU was not that difficult. However, performance on the standard TPC-H benchmark was not that impressive, with the MacBook being around ten times faster than the MOREFINE.</p>

<p>Of course, there are many opportunities for improvement. For starters, the <code class="language-plaintext highlighter-rouge">gcc</code> toolchain on LoongArch is likely far less mature than its x86/ARM counterparts, so advances there could make a big difference. The same applies to IO performance, which we have not measured separately. But hey, the “glass half full” department could also rightfully claim that the Loongson CPU can complete TPC-H SF300!</p>

<p>One could also argue that a MacBook Pro is much more expensive than the 500 EUR MOREFINE. However, a recent M4 Mac Mini with the same memory and storage specs costs around 700 EUR, not that much more all things considered. It will run circles around the MOREFINE. And it will not constantly annoy you with its fan.</p>
After elaborating on the DuckDB ecosystem changes required to unlock this capability, we demonstrate our approach to interacting with an Iceberg REST Catalog.
It's browser-only, no extra setup required.</p>

<h2 id="interaction-models-for-iceberg-catalogs">Interaction Models for Iceberg Catalogs</h2>

<p><img src="/images/blog/iceberg-wasm/iceberg-analytics-today-dark.svg" alt="Iceberg analytics today" class="darkmode-img" />
<img src="/images/blog/iceberg-wasm/iceberg-analytics-today-light.svg" alt="Iceberg analytics today" class="lightmode-img" /></p>

<p><em>Iceberg</em> is an <em>open table format,</em> which allows you to capture a mutable database table as a set of static files on object storage (such as AWS S3).
<em>Iceberg catalogs</em> allow you to track and organize Iceberg tables.
For example, <a href="https://iceberg.apache.org/rest-catalog-spec/">Iceberg REST Catalogs</a> provide these functionalities through a REST API.</p>

<p>There are two common ways to interact with Iceberg catalogs:</p>

<ul>
  <li>The <em>client–server model,</em> where the compute part of the operation is delegated to a managed infrastructure (such as the cloud). Users can interact with the server by installing a local client or using a lightweight client such as a browser.</li>
  <li>The <em>client-is-the-server model,</em> where the user first installs the relevant libraries, and then performs queries directly on their machine.</li>
</ul>

<p>Iceberg engines follow these interaction models: they are either run natively in managed compute infrastructure or they are run locally by the user.
Let's see how things look with DuckDB in the mix!</p>

<h2 id="iceberg-with-duckdb">Iceberg with DuckDB</h2>

<p><img src="/images/blog/iceberg-wasm/iceberg-with-duckdb-dark.svg" alt="Iceberg with DuckDB" class="darkmode-img" />
<img src="/images/blog/iceberg-wasm/iceberg-with-duckdb-light.svg" alt="Iceberg with DuckDB" class="lightmode-img" /></p>

<p>DuckDB supports both Iceberg interaction models.
In the <em>client–server model,</em> DuckDB runs on the server to read the Iceberg datasets.
From the user's point of view, the choice of engine is transparent, and DuckDB is just one of many engines that the server could use in the background.
The <em>client-is-the-server</em> model is more interesting:
here, users <a href="/install/">install a DuckDB client locally</a>
and use it through its SQL interface to query Iceberg catalogs.
For example:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">SECRET</span> <span class="n">test_secret</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="k">S3</span><span class="p">,</span> 
    <span class="k">KEY_ID</span> <span class="s1">'</span><span class="ge">AKIAIOSFODNN7EXAMPLE</span><span class="s1">'</span><span class="p">,</span>
    <span class="k">SECRET</span> <span class="s1">'</span><span class="ge">wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY</span><span class="s1">'</span>
<span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'</span><span class="ge">warehouse</span><span class="s1">'</span> <span class="k">AS</span> <span class="n">db</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="n">ICEBERG</span><span class="p">,</span>
    <span class="k">ENDPOINT_URL</span> <span class="s1">'</span><span class="ge">https://your-iceberg-endpoint</span><span class="s1">'</span>
<span class="p">);</span>

<span class="k">SELECT</span> <span class="nf">sum</span><span class="p">(</span><span class="n">value</span><span class="p">)</span>
<span class="k">FROM</span> <span class="n">db.table</span>
<span class="k">WHERE</span> <span class="n">other_column</span> <span class="o">=</span> <span class="s1">'</span><span class="ge">some_value</span><span class="s1">'</span><span class="p">;</span>
</code></pre></div></div>

<p>The client-is-the-server model unlocks <a href="https://youtu.be/YQEUkFWa69o?t=3085">empowered clients</a>, which can operate directly on the data.</p>

<blockquote>
  <p>You can discover the full DuckDB-Iceberg extension feature set, including insert and update capabilities, in our <a href="/2025/11/28/iceberg-writes-in-duckdb.html">earlier blog post</a>.</p>
</blockquote>

<h2 id="iceberg-with-duckdb-in-the-browser">Iceberg with DuckDB in the Browser</h2>

<p>While setting up a local DuckDB installation is quite simple, opening a browser tab is even quicker.
Therefore, we asked ourselves: could we support the <em>client-is-the-server</em> model directly from within a browser tab?
This could provide a zero-setup, no-infrastructure, properly serverless option for interacting with Iceberg catalogs.</p>

<p><img src="/images/blog/iceberg-wasm/duckdb-iceberg-with-duckdb-wasm-dark.svg" alt="Iceberg with DuckDB-Wasm" class="darkmode-img" />
<img src="/images/blog/iceberg-wasm/duckdb-iceberg-with-duckdb-wasm-light.svg" alt="Iceberg with DuckDB-Wasm" class="lightmode-img" /></p>

<p>Luckily, DuckDB has a client that can run in any browser!
<a href="/docs/stable/clients/wasm/overview.html">DuckDB-Wasm</a> is a WebAssembly port of DuckDB, which <a href="/2023/12/18/duckdb-extensions-in-wasm.html">supports loading of extensions</a>.</p>

<p>Interacting with an Iceberg REST Catalog requires a number of functionalities: the ability to talk to a REST API over HTTP(S), the ability to read and write <code class="language-plaintext highlighter-rouge">avro</code> and <code class="language-plaintext highlighter-rouge">parquet</code> files on object storage, and, finally, the ability to negotiate authentication to access those resources on behalf of the user. All of this must be done from within a browser, without calling any native components.</p>

<p>To support these functionalities, we implemented the following high-level changes:</p>

<ul>
  <li>In the core <code class="language-plaintext highlighter-rouge">duckdb</code> codebase, we redesigned HTTP interactions, so that extensions and clients have a uniform interface to the networking stack. (<a href="https://github.com/duckdb/duckdb/pull/17464">PR</a>)</li>
  <li>In <code class="language-plaintext highlighter-rouge">duckdb-wasm</code>, we implemented such an interface, which in this case is a wrapper around the available JavaScript network stack. (<a href="https://github.com/duckdb/duckdb-wasm/pull/2056">PR</a>)</li>
  <li>In <code class="language-plaintext highlighter-rouge">duckdb-iceberg</code>, we routed all networking through the common HTTP interface, so that native DuckDB and DuckDB-Wasm execute the same logic. (<a href="https://github.com/duckdb/duckdb-iceberg/pull/576">PR</a>)</li>
</ul>

<p><strong>The result is that you can now query Iceberg with DuckDB running directly in a browser!</strong> You can now access the same Iceberg catalog using the <em>client–server</em> model, the <em>client-is-the-server</em> model, or fully serverless from the isolation of a browser tab!</p>

<h2 id="welcome-to-serverless-iceberg-analytics">Welcome to Serverless Iceberg Analytics</h2>

<p>Check out our demo of serverless Iceberg analytics using the <a href="/visualizer/?iceberg" class="button yellow">DuckDB Table Visualizer</a></p>

<video muted="" controls="" loop="" width="700">
  <source src="https://blobs.duckdb.org/videos/iceberg-wasm-demo.mp4" type="video/mp4" />
</video>

<blockquote>
  <p>The current credentials in the demo are provided via a throwaway account with minimal permissions. If you enter your own credentials and share a link, you will be sharing your credentials.</p>
</blockquote>

<h2 id="access-your-own-data">Access Your Own Data</h2>

<p>By substituting your own S3 Tables bucket ARN and credentials with the policy <a href="https://us-east-1.console.aws.amazon.com/iam/home?region=us-east-2#/policies/details/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAmazonS3TablesReadOnlyAccess"><code class="language-plaintext highlighter-rouge">AmazonS3TablesReadOnlyAccess</code></a>, you can also access your own catalog, metadata and data.
Computations are fully local, and the credentials and warehouse ID are only sent to the catalog endpoint specified in your <code class="language-plaintext highlighter-rouge">ATTACH</code> command.
Inputs are translated to SQL, and added to the hash segment of the URL.</p>

<p>This means that:</p>

<ul>
  <li>no sensitive data is handled or sent to <code class="language-plaintext highlighter-rouge">duckdb.org</code></li>
  <li>computations are local, fully in your browser</li>
  <li>you can use the familiar SQL interface with the same code snippets that can run everywhere DuckDB runs</li>
  <li>if you edit the credentials and share the resulting link, you will be sharing the new credentials</li>
</ul>

<p>As of today, this works with <a href="/docs/stable/core_extensions/iceberg/amazon_s3_tables.html">Amazon S3 Tables</a>. This has been implemented through a collaboration with the Amazon S3 Tables team.
To learn more about S3 Tables, how to get started and their feature set, you can take a look at their <a href="https://aws.amazon.com/s3/features/tables/">product page</a> or <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables.html">documentation</a>.
A demo of DuckDB querying S3 Tables from a browser was presented at AWS re:Invent 2025 – <a href="https://www.youtube.com/watch?v=Pi82g0YGklU&amp;t=2603s">see the presentation</a>.</p>
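<p>For Amazon S3 Tables specifically, attaching looks roughly as follows. This is a sketch following the extension's documentation; the ARN is a placeholder:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- pick up AWS credentials from the environment / config files
CREATE SECRET (
    TYPE s3,
    PROVIDER credential_chain
);

-- attach the S3 Tables bucket as an Iceberg catalog
ATTACH 'arn:aws:s3tables:us-east-1:123456789012:bucket/your-bucket' AS s3_tables_db (
    TYPE iceberg,
    ENDPOINT_TYPE s3_tables
);

SHOW ALL TABLES;
</code></pre></div></div>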

<h2 id="conclusion">Conclusion</h2>

<p>The DuckDB-Iceberg extension is now supported in DuckDB-Wasm, and it can read from and write to Iceberg REST Catalogs.
Users can now access Iceberg data from within a browser, without having to install or manage any compute nodes!</p>

<p>If you would like to provide feedback or file issues, please reach out to us on either the <a href="https://github.com/duckdb/duckdb-wasm">DuckDB-Wasm</a> or <a href="https://github.com/duckdb/duckdb-iceberg">DuckDB-Iceberg</a> repository. If you are interested in using any part of this within your organization, feel free to <a href="https://duckdblabs.com/contact/">reach out</a>.</p>]]></content><author><name>Carlo Piovesan, Tom Ebergen, Gábor Szárnyas</name></author><category term="deep dive" /><summary type="html"><![CDATA[DuckDB is the first end-to-end interface to Iceberg REST Catalogs within a browser tab. You can now read and write tables in Iceberg catalogs without needing to manage any infrastructure – directly from your browser!]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/iceberg-in-the-browser.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/iceberg-in-the-browser.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.4.3 LTS</title><link href="https://duckdb.org/2025/12/09/announcing-duckdb-143.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.4.3 LTS" /><published>2025-12-09T00:00:00+00:00</published><updated>2025-12-09T00:00:00+00:00</updated><id>https://duckdb.org/2025/12/09/announcing-duckdb-143</id><content type="html" xml:base="https://duckdb.org/2025/12/09/announcing-duckdb-143.html"><![CDATA[<p>In this blog post, we highlight a few important fixes in DuckDB v1.4.3, the third patch release in <a href="/2025/09/16/announcing-duckdb-140.html">DuckDB's 1.4 LTS line</a>.
You can find the complete <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.3">release notes on GitHub</a>.</p>

<p>To install the new version, please visit the <a href="/install/">installation page</a>.</p>

<h2 id="fixes">Fixes</h2>

<p>This version ships a number of performance improvements and bugfixes.</p>

<h3 id="correctness">Correctness</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/18782"><code class="language-plaintext highlighter-rouge">#18782</code> The ART index reported an incorrect “rows affected” count</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19313"><code class="language-plaintext highlighter-rouge">#19313</code> Corner case: a <code class="language-plaintext highlighter-rouge">HAVING</code> clause without a <code class="language-plaintext highlighter-rouge">GROUP BY</code> returned an incorrect result</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19517"><code class="language-plaintext highlighter-rouge">#19517</code> <code class="language-plaintext highlighter-rouge">JOIN</code> with a <code class="language-plaintext highlighter-rouge">LIKE</code> pattern resulted in columns being incorrectly included</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19924"><code class="language-plaintext highlighter-rouge">#19924</code> The optimizer incorrectly removed the <code class="language-plaintext highlighter-rouge">ORDER BY</code> from aggregates</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/19970"><code class="language-plaintext highlighter-rouge">#19970</code> Fixed updates on indexed tables with DICT_FSST compression</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20009"><code class="language-plaintext highlighter-rouge">#20009</code> Fixed updates with DICT_FSST compression</a></li>
</ul>

<h3 id="crashes-and-internal-errors">Crashes and Internal Errors</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/19469"><code class="language-plaintext highlighter-rouge">#19469</code> Potential error in the constraint violation message when checking foreign key constraints</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19754"><code class="language-plaintext highlighter-rouge">#19754</code> Race condition could trigger a segfault in the encryption key cache</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/20044"><code class="language-plaintext highlighter-rouge">#20044</code> Fixed edge case in index deletion code path</a></li>
</ul>

<h3 id="performance">Performance</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/18997"><code class="language-plaintext highlighter-rouge">#18997</code> Macro binding had slow performance for unbalanced trees</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/19901"><code class="language-plaintext highlighter-rouge">#19901</code> Memory management has been improved during WAL replay in the presence of indexes</a></li>
  <li>The <a href="/docs/stable/core_extensions/vortex.html"><code class="language-plaintext highlighter-rouge">vortex</code> extension</a> ships significant performance improvements for writing Vortex files</li>
</ul>

<h3 id="miscellaneous">Miscellaneous</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/19575"><code class="language-plaintext highlighter-rouge">#19575</code> Invalid Unicode error with <code class="language-plaintext highlighter-rouge">LIKE</code> expressions</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19916"><code class="language-plaintext highlighter-rouge">#19916</code> The default time zone of DuckDB-Wasm had an offset inverted from what it should be</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19884"><code class="language-plaintext highlighter-rouge">#19884</code> Copying to Parquet with a prepared statement did not work</a></li>
</ul>

<h2 id="azure-blob-storage-writes">Azure Blob Storage Writes</h2>

<p>The <a href="/docs/stable/core_extensions/azure.html"><code class="language-plaintext highlighter-rouge">azure</code> extension</a> can now <a href="https://github.com/duckdb/duckdb-azure/pull/131">write to Azure Blob Storage</a>.
This unlocks several other Azure and Fabric features, including using <a href="https://learn.microsoft.com/en-us/fabric/onelake/onelake-overview">OneLake</a> instances.</p>
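As a sketch of what this enables, writing query results straight to a container could look as follows. The container name and connection string below are placeholders; the secret setup follows the extension's <code class="language-plaintext highlighter-rouge">CREATE SECRET</code> syntax:

```sql
INSTALL azure;
LOAD azure;

-- Placeholder credentials: substitute your own connection string
CREATE SECRET azure_secret (
    TYPE azure,
    CONNECTION_STRING '<your_connection_string>'
);

-- Write query results directly to a blob in Azure Blob Storage
COPY (SELECT 42 AS answer)
    TO 'azure://my-container/answer.parquet';
```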

<h2 id="windows-arm64">Windows Arm64</h2>

<p>With this release, we are introducing beta support for Windows Arm64 by distributing native DuckDB extensions and Python wheels.</p>

<h3 id="extension-distribution">Extension Distribution</h3>

<p>On Windows Arm64, you can now natively install core extensions, including complex ones like <a href="/docs/stable/core_extensions/spatial/overview.html"><code class="language-plaintext highlighter-rouge">spatial</code></a>:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">duckdb</span>
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PRAGMA</span> <span class="py">platform</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────────┐
│   platform    │
│    varchar    │
├───────────────┤
│ windows_arm64 │
└───────────────┘
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> spatial</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> spatial</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="nf">ST_Area</span><span class="p">(</span><span class="nf">ST_GeomFromText</span><span class="p">(</span>
        <span class="s1">'POLYGON((0 0, 4 0, 4 3, 0 3, 0 0))'</span>
    <span class="p">))</span> <span class="k">AS</span> <span class="n">area</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌────────┐
│  area  │
│ double │
├────────┤
│  12.0  │
└────────┘
</code></pre></div></div>

<h3 id="python-wheel-distribution">Python Wheel Distribution</h3>

<p>We now distribute Python wheels for Windows Arm64 for Python 3.11+. This means that you can take, for example, a Copilot+ PC, install the native Arm64 Python interpreter, and run:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">pip </span>install <span class="nb">duckdb</span>
</code></pre></div></div>

<p>This installs the <code class="language-plaintext highlighter-rouge">duckdb</code> package using the binary distributed through <a href="https://pypi.org/project/duckdb/">PyPI</a>.
Then, you can use it as follows:</p>

<div class="language-batch highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">python</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Python 3.13.9
    (tags/v3.13.9:8183fa5, Oct 14 2025, 14:51:39)
    [MSC v.1944 64 bit (ARM64)] on win32

&gt;&gt;&gt; import duckdb
&gt;&gt;&gt; duckdb.__version__
'1.4.3'
</code></pre></div></div>

<blockquote>
  <p>Currently, many Python installations that you'll find on Windows Arm64 computers use the x86_64 (AMD64) Python distribution and run through Microsoft's <a href="https://learn.microsoft.com/en-us/windows/arm/apps-on-arm-x86-emulation">Prism emulator</a>. For example, if you install Python through the Windows Store, you will get the Python AMD64 installation. To understand which platform your Python installation is using, observe the Python CLI's first line (e.g., <code class="language-plaintext highlighter-rouge">Python 3.13.9 ... (ARM64)</code>).</p>
</blockquote>
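Rather than reading the CLI banner, you can also check this programmatically. Below is a minimal sketch using only the Python standard library; the printed value depends on your machine, with a native Windows Arm64 interpreter reporting `ARM64` and an emulated one reporting `AMD64`:

```python
import platform
import struct

# Which CPU architecture this interpreter was built for:
# 'ARM64' for a native Windows Arm64 build, 'AMD64' for an
# x86_64 build running under the Prism emulator.
machine = platform.machine()

# Pointer size confirms whether the interpreter is 64-bit.
bits = struct.calcsize("P") * 8

print(f"{machine} ({bits}-bit)")
```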

<h3 id="odbc-driver">ODBC Driver</h3>

<p>We are now shipping a native ODBC driver for Windows Arm64.
Head to the <a href="https://duckdb.org/install/?platform=windows&amp;environment=odbc">ODBC Windows installation page</a> to try it out!</p>

<h2 id="conclusion">Conclusion</h2>

<p>This post was a short summary of the changes in v1.4.3. As usual, you can find the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.3">full release notes on GitHub</a>.
We would like to thank our contributors for providing detailed issue reports and patches.
Stay tuned for DuckDB v1.4.4 and v1.5.0, both released <a href="/release_calendar.html">early next year</a>!</p>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[Today we are releasing DuckDB 1.4.3. Along with bugfixes, we are shipping native extensions and Python support for Windows Arm64.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-3-lts.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-3-lts.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Writes in DuckDB-Iceberg</title><link href="https://duckdb.org/2025/11/28/iceberg-writes-in-duckdb.html" rel="alternate" type="text/html" title="Writes in DuckDB-Iceberg" /><published>2025-11-28T00:00:00+00:00</published><updated>2025-11-28T00:00:00+00:00</updated><id>https://duckdb.org/2025/11/28/iceberg-writes-in-duckdb</id><content type="html" xml:base="https://duckdb.org/2025/11/28/iceberg-writes-in-duckdb.html"><![CDATA[<p>Over the past several months, the DuckDB Labs team has been hard at work on the <a href="/docs/stable/core_extensions/iceberg/overview.html">DuckDB-Iceberg extension</a>, with <em>full read support</em> and <em>initial write support</em> released in <a href="/2025/09/16/announcing-duckdb-140.html">v1.4.0</a>.
Today, we are happy to announce that delete and update support for Iceberg v2 tables is available in <a href="/2025/11/12/announcing-duckdb-142.html">v1.4.2</a>!</p>

<p>The Iceberg open table format has become extremely popular in the past two years, with many databases announcing support for the format <a href="https://softwareengineeringdaily.com/2024/03/07/iceberg-at-netflix-and-beyond-with-ryan-blue/">originally developed at Netflix</a>. This past year, the DuckDB team has made Iceberg integration a <a href="/roadmap.html">priority</a>, and today we are taking another step in that direction. In this blog post, we describe the current feature set of DuckDB-Iceberg in DuckDB v1.4.2.</p>

<h2 id="getting-started">Getting Started</h2>

<p>To experiment with the new DuckDB-Iceberg features, you will need to connect to your favorite Iceberg REST Catalog. There are many ways to do so: please have a look at the <a href="/docs/stable/core_extensions/iceberg/iceberg_rest_catalogs.html">Connecting to REST Catalogs</a> page for catalogs like <a href="https://polaris.apache.org/">Apache Polaris</a> or <a href="https://lakekeeper.io/">Lakekeeper</a>, and the <a href="/docs/stable/core_extensions/iceberg/amazon_s3_tables.html">Connecting to S3 Tables</a> page if you would like to connect to <a href="https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-tables.html">Amazon S3 Tables</a>.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'</span><span class="ge">warehouse_name</span><span class="s1">'</span> <span class="k">AS</span> <span class="n">iceberg_catalog</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="k">iceberg</span><span class="p">,</span>
    <span class="ge">other options</span>
<span class="p">);</span>
</code></pre></div></div>

<h2 id="inserts-deletes-and-updates">Inserts, Deletes and Updates</h2>

<p>Support for creating tables and inserting to tables was already added in DuckDB v1.4.0: you can use standard DuckDB SQL syntax to insert data into your Iceberg table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">iceberg_catalog.default.simple_table</span> <span class="p">(</span>
    <span class="n">col1</span> <span class="nb">INTEGER</span><span class="p">,</span>
    <span class="n">col2</span> <span class="nb">VARCHAR</span>
<span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">iceberg_catalog.default.simple_table</span>
    <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'hello'</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'duckdb is great'</span><span class="p">);</span>
</code></pre></div></div>

<p>You can also use any DuckDB table scan function to insert data into an Iceberg table:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">iceberg_catalog.default.more_data</span>
    <span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">read_parquet</span><span class="p">(</span><span class="s1">'path/to/parquet'</span><span class="p">);</span>
</code></pre></div></div>

<p>Starting with v1.4.2, the standard SQL syntax also works for deletes and updates:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span>
<span class="k">WHERE</span> <span class="n">col1</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>

<span class="k">UPDATE</span><span class="n"> iceberg_catalog.default.simple_table</span>
<span class="k">SET</span> <span class="n">col1</span> <span class="o">=</span> <span class="n">col1</span> <span class="o">+</span> <span class="mi">5</span>
<span class="k">WHERE</span> <span class="n">col1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>

<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span><span class="p">;</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬─────────────────┐
│ col1  │      col2       │
│ int32 │     varchar     │
├───────┼─────────────────┤
│     3 │ duckdb is great │
│     6 │ hello           │
└───────┴─────────────────┘
</code></pre></div></div>

<blockquote>
  <p>The Iceberg write support currently has two limitations:</p>

  <p>Write support is limited to <em>tables that are not partitioned and not sorted</em>. Attempting to perform update, insert, or delete operations on partitioned or sorted tables using DuckDB-Iceberg will result in an error.</p>

  <p>DuckDB-Iceberg only writes positional deletes for <code class="language-plaintext highlighter-rouge">DELETE</code> and <code class="language-plaintext highlighter-rouge">UPDATE</code> statements. Copy-on-write functionality is not yet supported.</p>
</blockquote>

<h2 id="functions-for-table-properties">Functions for Table Properties</h2>

<p>Currently, DuckDB-Iceberg only supports <em>merge-on-read semantics</em>. Within <a href="https://iceberg.apache.org/spec/#table-metadata-fields">Iceberg Table Metadata</a>, table properties can be used to describe what form of deletes or updates are allowed. DuckDB-Iceberg respects the <code class="language-plaintext highlighter-rouge">write.update.mode</code> and <code class="language-plaintext highlighter-rouge">write.delete.mode</code> table properties for updates and deletes. If a table has these properties and they are not <code class="language-plaintext highlighter-rouge">merge-on-read</code>, DuckDB will throw an error and the <code class="language-plaintext highlighter-rouge">UPDATE</code> or <code class="language-plaintext highlighter-rouge">DELETE</code> will not be committed. DuckDB v1.4.2 introduces three new functions to add, remove, and view table properties for an Iceberg table:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">set_iceberg_table_properties</code></li>
  <li><code class="language-plaintext highlighter-rouge">iceberg_table_properties</code></li>
  <li><code class="language-plaintext highlighter-rouge">remove_iceberg_table_properties</code></li>
</ul>

<p>You can use them as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- to set table properties</span>
<span class="k">CALL</span> <span class="nf">set_iceberg_table_properties</span><span class="p">(</span><span class="n">iceberg_catalog.default.simple_table</span><span class="p">,</span> <span class="p">{</span>
    <span class="s1">'write.update.mode'</span><span class="p">:</span> <span class="s1">'merge-on-read'</span><span class="p">,</span>
    <span class="s1">'write.file.size'</span><span class="p">:</span> <span class="s1">'100000kb'</span>
<span class="p">});</span>
<span class="c1">-- to read table properties</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">iceberg_table_properties</span><span class="p">(</span><span class="n">iceberg_catalog.default.simple_table</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────────────────┬───────────────┐
│        key        │     value     │
│      varchar      │    varchar    │
├───────────────────┼───────────────┤
│ write.update.mode │ merge-on-read │
│ write.file.size   │ 100000kb      │
└───────────────────┴───────────────┘
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- to remove table properties</span>
<span class="k">CALL</span> <span class="nf">remove_iceberg_table_properties</span><span class="p">(</span>
    <span class="n">iceberg_catalog.default.simple_table</span><span class="p">,</span>
    <span class="p">[</span><span class="s1">'some.other.property'</span><span class="p">]</span>
<span class="p">);</span>
</code></pre></div></div>

<h2 id="iceberg-table-metadata">Iceberg Table Metadata</h2>

<p>DuckDB-Iceberg also allows you to view the metadata of your Iceberg tables using the <code class="language-plaintext highlighter-rouge">iceberg_metadata()</code> and <code class="language-plaintext highlighter-rouge">iceberg_snapshots()</code> functions.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">iceberg_metadata</span><span class="p">(</span><span class="n">iceberg_catalog.default.table_1</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────────────────┬──────────────────────┬──────────────────┬─────────┬──────────────────┬─────────────────────────────────────────────────────────────┬─────────────┬──────────────┐
│    manifest_path     │ manifest_sequence_…  │ manifest_content │ status  │     content      │                         file_path                           │ file_format │ record_count │
│       varchar        │        int64         │     varchar      │ varchar │     varchar      │                          varchar                            │   varchar   │    int64     │
├──────────────────────┼──────────────────────┼──────────────────┼─────────┼──────────────────┼─────────────────────────────────────────────────────────────┼─────────────┼──────────────┤
│ s3://warehouse/def…  │                    1 │ DATA             │ ADDED   │ EXISTING         │ s3://&lt;storage_location&gt;/simple_table/data/019a6ecc-9e9e-7…  │ parquet     │            3 │
│ s3://warehouse/def…  │                    2 │ DELETE           │ ADDED   │ POSITION_DELETES │ s3://&lt;storage_location&gt;/simple_table/data/d65b1db8-9fa8-4…  │ parquet     │            1 │
│ s3://warehouse/def…  │                    3 │ DELETE           │ ADDED   │ POSITION_DELETES │ s3://&lt;storage_location&gt;/simple_table/data/8d1b92dc-5f6e-4…  │ parquet     │            1 │
│ s3://warehouse/def…  │                    3 │ DATA             │ ADDED   │ EXISTING         │ s3://&lt;storage_location&gt;/simple_table/data/019a6ecf-5261-7…  │ parquet     │            1 │
└──────────────────────┴──────────────────────┴──────────────────┴─────────┴──────────────────┴─────────────────────────────────────────────────────────────┴─────────────┴──────────────┘
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">iceberg_snapshots</span><span class="p">(</span><span class="n">iceberg_catalog.default.simple_table</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────────────┬─────────────────────┬─────────────────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ sequence_number │     snapshot_id     │      timestamp_ms       │                                                manifest_list                                                 │
│     uint64      │       uint64        │        timestamp        │                                                   varchar                                                    │
├─────────────────┼─────────────────────┼─────────────────────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│               1 │ 1790528822676766947 │ 2025-11-10 17:24:55.075 │ s3://&lt;storage_location&gt;/simple_table/data/snap-1790528822676766947-f09658c4-ca52-4305-943f-6a8073529fef.avro │
│               2 │ 6333537230056014119 │ 2025-11-10 17:27:35.602 │ s3://&lt;storage_location&gt;/simple_table/data/snap-6333537230056014119-316d09bc-549d-46bc-ae13-a9fab5cbf09b.avro │
│               3 │ 7452040077415501383 │ 2025-11-10 17:27:52.169 │ s3://&lt;storage_location&gt;/simple_table/data/snap-7452040077415501383-93dee94e-9ec1-45fa-aec2-13ef434e50eb.avro │
└─────────────────┴─────────────────────┴─────────────────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

</code></pre></div></div>

<h2 id="time-travel">Time Travel</h2>

<p>Time travel is also possible via snapshot ids or timestamps using the <code class="language-plaintext highlighter-rouge">AT (VERSION =&gt; ...)</code> or <code class="language-plaintext highlighter-rouge">AT (TIMESTAMP =&gt; ...)</code> syntax.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- via snapshot id</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span> <span class="k">AT</span> <span class="p">(</span>
	<span class="k">VERSION</span> <span class="o">=&gt;</span> <span class="ge">snapshot_id</span>
<span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬─────────────────┐
│ col1  │      col2       │
│ int32 │     varchar     │
├───────┼─────────────────┤
│     1 │ hello           │
│     3 │ duckdb is great │
└───────┴─────────────────┘
</code></pre></div></div>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- via timestamp</span>
<span class="k">SELECT</span> <span class="o">*</span>
<span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span> <span class="k">AT</span> <span class="p">(</span>
    <span class="nb">TIMESTAMP</span> <span class="o">=&gt;</span> <span class="s1">'2025-11-10 17:27:45.602'</span>
<span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┬─────────────────┐
│ col1  │      col2       │
│ int32 │     varchar     │
├───────┼─────────────────┤
│     1 │ hello           │
│     3 │ duckdb is great │
└───────┴─────────────────┘
</code></pre></div></div>

<h2 id="viewing-requests-to-the-iceberg-rest-catalog">Viewing Requests to the Iceberg REST Catalog</h2>

<p>You may also be curious about which requests DuckDB makes to the Iceberg REST Catalog.
To see them, enable HTTP <a href="/docs/stable/operations_manual/logging/overview.html">logging</a>, run your workload, then select from the <code class="language-plaintext highlighter-rouge">HTTP</code> logs.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">enable_logging</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="n">request.type</span><span class="p">,</span> <span class="n">request.url</span><span class="p">,</span> <span class="n">response.status</span>
<span class="k">FROM</span> <span class="nf">duckdb_logs_parsed</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────────────┐
│  type   │                                                                             url                          │       status       │
│ varchar │                                                                           varchar                        │      varchar       │
├─────────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────┤
│ GET     │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default                     │ NULL               │
│ HEAD    │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default/tables/simple_table │ NULL               │
│ GET     │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default/tables/simple_table │ NULL               │
│ GET     │ https://&lt;storage_endpoint&gt;/data/snap-5943683398986255948-c2217dde-6036-4e07-88f2-…                       │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/f8c95b93-7b6b-4a24-8557-b98b553723d4-m0.avro                             │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/214a7988-da39-4dac-aa3a-4a73d3ead405-m0.avro                             │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/019a7244-c6e8-7bc9-9dd4-7249fcb04959.parquet                             │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/019a7244-fcb5-7308-96ec-1c9e32509eab.parquet                             │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/7f14bb06-f57a-42b4-ba7f-053a65152759-m0.avro                             │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/71f8b43d-51e7-40e7-be88-e8d869836ecd-deletes.parq…                       │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/64f6c6e2-2f54-470e-b990-b201bc615042-m0.avro                             │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/4e54afed-6dd8-4ba0-88fb-16f972ac1d91-deletes.parq…                       │ PartialContent_206 │
├─────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┤
│ 12 rows                                                                                                                       3 columns │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
</code></pre></div></div>

<p>Here we can see calls to the Iceberg REST Catalog, followed by calls to the storage endpoint. The first three calls to the Iceberg REST Catalog are to verify the schema still exists and to get the latest <code class="language-plaintext highlighter-rouge">metadata.json</code> of the DuckDB-Iceberg table. Next, it queries the manifest list, manifest files, and eventually the files with data and deletes. The data and delete files are stored locally in a cache to speed up subsequent reads.</p>

<h2 id="transactions">Transactions</h2>

<p>DuckDB is an ACID-compliant database that supports <a href="/docs/stable/sql/statements/transactions.html">transactions</a>.
DuckDB-Iceberg has been built with this in mind. Within a transaction, the following conditions hold for Iceberg tables.</p>

<ol>
  <li>The first time a table is read in a transaction, its snapshot information is stored in the transaction and will remain consistent within that transaction.</li>
  <li>Updates, inserts and deletes will only be committed to an Iceberg table when the transaction is committed (i.e., <code class="language-plaintext highlighter-rouge">COMMIT</code>).</li>
</ol>

<p>Point #1 is important for read performance. If you run analytics on an Iceberg table and do not need the latest version of the table for every query, running your analytics in a transaction prevents DuckDB from fetching the latest snapshot each time.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- truncate the logs</span>
<span class="k">CALL</span> <span class="n">truncate_duckdb_logs</span><span class="p">();</span>
<span class="k">CALL</span> <span class="nf">enable_logging</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
<span class="k">BEGIN</span><span class="p">;</span>
<span class="c1">-- first read gets latest snapshot information</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span><span class="p">;</span>
<span class="c1">-- subsequent read reads from local cached data</span>
<span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span><span class="p">;</span>
<span class="c1">-- get logs</span>
<span class="k">SELECT</span> <span class="n">request.type</span><span class="p">,</span> <span class="n">request.url</span><span class="p">,</span> <span class="n">response.status</span>
<span class="k">FROM</span> <span class="nf">duckdb_logs_parsed</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────┬─────────────────────────────────────────────────────────────────────────────────────────────────────────────┬────────────────────┐
│  type   │                                                  url                                                        │       status       │
│ varchar │                                                varchar                                                      │      varchar       │
├─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────┤
│ GET     │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default                        │ NULL               │
│ HEAD    │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default/tables/simple_table    │ NULL               │
│ GET     │ https://&lt;catalog_endpoint&gt;/iceberg/v1/&lt;warehouse&gt;/iceberg-testing/namespaces/default/tables/simple_table    │ NULL               │
│ GET     │ https://&lt;storage_endpoint&gt;/data/snap-5943683398986255948-c2217dde-6036-4e07-88f2-1…                         │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/f8c95b93-7b6b-4a24-8557-b98b553723d4-m0.avro                                │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/214a7988-da39-4dac-aa3a-4a73d3ead405-m0.avro                                │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/019a7244-c6e8-7bc9-9dd4-7249fcb04959.parquet                                │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/019a7244-fcb5-7308-96ec-1c9e32509eab.parquet                                │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/7f14bb06-f57a-42b4-ba7f-053a65152759-m0.avro                                │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/71f8b43d-51e7-40e7-be88-e8d869836ecd-deletes.parquet                        │ PartialContent_206 │
│ GET     │ https://&lt;storage_endpoint&gt;/data/64f6c6e2-2f54-470e-b990-b201bc615042-m0.avro                                │ OK_200             │
│ GET     │ https://&lt;storage_endpoint&gt;/data/4e54afed-6dd8-4ba0-88fb-16f972ac1d91-deletes.parquet                        │ PartialContent_206 │
├─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────┴────────────────────┤
│ 12 rows                                                                                                                          3 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
</code></pre></div></div>

<p>Here we see all the same requests we saw in the previous section. However, now we are in a transaction, which means the second time we read from <code class="language-plaintext highlighter-rouge">iceberg_catalog.default.simple_table</code>, we do not need to query the REST Catalog for table updates. This means DuckDB-Iceberg performs no extra requests when reading a table a second time, significantly improving performance.</p>
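<p>The last point in the list above can be observed the same way: writes issued inside a transaction are only committed to the Iceberg table at <code class="language-plaintext highlighter-rouge">COMMIT</code>. A minimal sketch, reusing the table from the example above (the inserted values assume a hypothetical two-integer-column schema):</p>

```sql
BEGIN;
-- staged in the transaction, not yet visible to other readers
INSERT INTO iceberg_catalog.default.simple_table VALUES (1, 10);
INSERT INTO iceberg_catalog.default.simple_table VALUES (2, 20);
-- the staged changes are committed to the Iceberg table here
COMMIT;
```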

<h2 id="conclusion-and-future-work">Conclusion and Future Work</h2>

<p>With these features, DuckDB-Iceberg now has a solid foundation of support for Iceberg tables, which enables users to unlock the analytical power of DuckDB on their Iceberg data. There is still more work to come, and the Iceberg table specification has many more features the DuckDB team would like to support in DuckDB-Iceberg. If any feature is a priority for your analytical workloads, please reach out to us in the <a href="https://github.com/duckdb/duckdb-iceberg">DuckDB-Iceberg GitHub repository</a> or <a href="https://duckdblabs.com/contact/">get in touch</a> with our engineers.</p>

<p>Below is a list of improvements planned for the near future (in no particular order):</p>

<ul>
  <li>Performance improvements</li>
  <li>Updates / deletes / inserts to partitioned tables</li>
  <li>Updates / deletes / inserts to sorted tables</li>
  <li>Schema evolution</li>
  <li>Support for Iceberg v3 tables, focusing on binary deletion vectors and row lineage tracking</li>
</ul>]]></content><author><name>{&quot;twitter&quot; =&gt; &quot;the_Tmonster&quot;, &quot;picture&quot; =&gt; &quot;/images/blog/authors/tom_ebergen.jpg&quot;}</name></author><category term="deep dive" /><summary type="html"><![CDATA[We shipped a number of features and improvements to the DuckDB-Iceberg extension: insert, update, and delete statements are all supported now.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/iceberg-writes.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/iceberg-writes.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Data-at-Rest Encryption in DuckDB</title><link href="https://duckdb.org/2025/11/19/encryption-in-duckdb.html" rel="alternate" type="text/html" title="Data-at-Rest Encryption in DuckDB" /><published>2025-11-19T00:00:00+00:00</published><updated>2025-11-19T00:00:00+00:00</updated><id>https://duckdb.org/2025/11/19/encryption-in-duckdb</id><content type="html" xml:base="https://duckdb.org/2025/11/19/encryption-in-duckdb.html"><![CDATA[<blockquote>
  <p>If you would like to use encryption in DuckDB, we recommend using the latest stable version, v1.4.2. For more details, see the <a href="/2025/11/12/announcing-duckdb-142.html#vulnerabilities">latest release blog post</a>.</p>
</blockquote>

<p>Many years ago, we read the excellent “<a href="https://en.wikipedia.org/wiki/The_Code_Book">Code Book</a>” by <a href="https://en.wikipedia.org/wiki/Simon_Singh">Simon Singh</a>. Did you know that <a href="https://en.wikipedia.org/wiki/Mary,_Queen_of_Scots">Mary, Queen of Scots</a>, used an <a href="https://en.wikipedia.org/wiki/Caesar_cipher">encryption method harking back to Julius Caesar</a> to encrypt her more saucy letters? But alas: the cipher was broken and the contents of the letters got her <a href="https://en.wikipedia.org/wiki/Execution_of_Mary,_Queen_of_Scots">executed</a>.</p>

<p>These <a href="https://en.wikipedia.org/wiki/Crypto_Wars">days</a>, strong encryption is a commodity in both software and hardware. Modern CPUs <a href="https://developer.arm.com/documentation/ddi0602/2025-09/SIMD-FP-Instructions/AESE--AES-single-round-encryption-">come with specialized cryptography instructions</a>, and operating systems small and big contain <a href="https://www.heartbleed.com/">mostly</a>-robust cryptography software like OpenSSL.</p>

<p>Since databases store arbitrary information, it is clear that many, if not most, datasets of any value should not be plainly available to everyone. Even if stored on tightly controlled hardware like a cloud virtual machine, there have been <a href="https://haveibeenpwned.com/">many cases</a> of files being lost through various privilege escalations. Unsurprisingly, compliance frameworks like the common <a href="https://secureframe.com/hub/soc-2/what-is-soc-2">SOC 2</a> “highly recommend” encrypting data when it is stored on storage media like hard drives.</p>

<p>However, database systems and encryption have a somewhat problematic track record. Even PostgreSQL, self-proclaimed “The World's Most Advanced Open Source Relational Database”, has very <a href="https://www.postgresql.org/docs/current/encryption-options.html">limited options</a> for data encryption. SQLite, the world’s “<a href="https://www.sqlite.org/mostdeployed.html">Most Widely Deployed and Used Database Engine</a>”, does not support data encryption out of the box; its encryption extension is <a href="https://sqlite.org/com/see.html">a $2000 add-on</a>.</p>

<p>DuckDB has supported <a href="https://parquet.apache.org/docs/file-format/data-pages/encryption/">Parquet Modular Encryption</a> <a href="https://duckdb.org/docs/stable/data/parquet/encryption">for a while</a>. This feature allows reading and writing Parquet files with encrypted columns. However, while Parquet files are great and <a href="https://materializedview.io/p/nimble-and-lance-parquet-killers">reports of their impending death</a> are greatly exaggerated, they cannot – for example – be updated in place, a pretty basic feature of a database management system.</p>

<p>Starting with DuckDB 1.4.0, DuckDB supports <em>transparent data encryption</em> of data-at-rest using industry-standard AES encryption.</p>

<blockquote>
  <p>DuckDB's encryption does not yet meet the official <a href="https://csrc.nist.gov/projects/cryptographic-standards-and-guidelines">NIST requirements</a>.
Please follow issue <a href="https://github.com/duckdb/duckdb/issues/20162"><code class="language-plaintext highlighter-rouge">#20162</code> “Store and verify tag for canary encryption”</a> to track our progress towards NIST-compliance.</p>
</blockquote>

<h2 id="some-basics-of-encryption">Some Basics of Encryption</h2>

<p>There are many different ways to encrypt data, some more secure than others. In database systems and elsewhere, the standard is the <a href="https://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Advanced Encryption Standard</a> (AES), which is a block cipher algorithm standardized by <a href="https://en.wikipedia.org/wiki/National_Institute_of_Standards_and_Technology">US NIST</a>. AES is a symmetric encryption algorithm, meaning that the <em>same</em> key is used for both encryption and decryption of data.</p>

<p>For this reason, most systems choose to only support <em>randomized</em> encryption, meaning that identical plaintexts will always yield different ciphertexts (if used correctly!). The most commonly used and recommended industry-standard algorithm is AES in <a href="https://en.wikipedia.org/wiki/Galois/Counter_Mode">Galois/Counter Mode</a> (AES-GCM). On top of its ability to randomize encryption, AES-GCM also <em>authenticates</em> data by calculating a tag to ensure the data has not been tampered with.</p>

<p>DuckDB v1.4 supports encryption at rest using the AES-GCM-256 and AES-CTR-256 (counter mode) ciphers. AES-CTR is a simpler and faster variant of AES-GCM, but less secure, since it does not provide authentication by calculating a tag. The 256 refers to the size of the key in bits, meaning that DuckDB currently only supports 32-byte keys.</p>

<p>GCM and CTR both take as input (1) a plaintext, (2) an initialization vector (IV) and (3) an encryption key. The plaintext is the data that a user wants to encrypt. The IV is a unique bytestream, usually 16 bytes, that ensures identical plaintexts get encrypted into different ciphertexts. A <em>number used once</em> (nonce) is a bytestream, usually 12 bytes, that together with a 4-byte counter constructs the IV. Note that the IV needs to be unique for every encrypted block, but it does not necessarily have to be random. Reusing the same IV is problematic, since an attacker could XOR the two ciphertexts and extract both messages. The tag in AES-GCM is calculated after all blocks are encrypted, much like a checksum, but it adds an integrity check that securely authenticates the entire ciphertext.</p>

<h2 id="implementation-in-duckdb">Implementation in DuckDB</h2>

<p>Before diving deeper into how we actually implemented encryption in DuckDB, we’ll explain some things about the DuckDB file format.</p>

<p>DuckDB has one <strong>main database header</strong> which stores data that enables it to correctly load and verify a DuckDB database. At the start of each main database header, the magic bytes (“DUCKDB”) are stored and read upon initialization to verify whether the file is a valid DuckDB database file. The magic bytes are followed by four 8-byte flags that can be set for different purposes.</p>

<p>When a database is encrypted in DuckDB, the main database header remains plaintext at all times, since it contains <em>no sensitive data</em> about the contents of the database file.
Upon initializing an encrypted database, DuckDB sets the first bit of the first flag to indicate that the database is encrypted. After setting this bit, additional metadata required for encryption is stored: (1) the <em>database identifier</em>, (2) 8 bytes of additional metadata, e.g., for the encryption cipher used, and (3) the encrypted canary.</p>

<p>The <em>database identifier</em> is used as a “salt” and consists of 16 randomly generated bytes created upon initialization of each database. A salt is typically used to ensure uniqueness, i.e., it makes sure that identical input keys or passwords are transformed into <em>different</em> derived keys. The 8 bytes of metadata comprise the key derivation function (first byte), the usage of additional authenticated data (second byte), the encryption cipher (third byte), and the key length (fifth byte). After the metadata, the main header stores the encrypted canary, which is used to check whether the input key is correct.</p>

<h3 id="encryption-key-management">Encryption Key Management</h3>

<p>To encrypt data in DuckDB, you can use practically <em>any</em> plaintext or base64-encoded string as a key, but we recommend using a secure 32-byte base64 key. Users are themselves responsible for key management and thus for using a secure key. Instead of directly using the plain key provided by the user, DuckDB always derives a more secure key by means of a key derivation function (KDF), which reduces or extends the input key to a secure 32-byte key. Once the input key has been verified by deriving the secure key and decrypting the canary, the derived key is managed in a <em>secure</em> encryption key cache. This cache manages encryption keys for the current DuckDB context and ensures that the derived encryption keys are never swapped to disk by locking their memory. To strengthen security even further, the original input keys are immediately wiped from memory once they have been transformed into secure derived keys.</p>
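<p>For example, a key generated out-of-band (e.g., with <code class="language-plaintext highlighter-rouge">openssl rand -base64 32</code>) can be passed directly to <code class="language-plaintext highlighter-rouge">ATTACH</code>. The key and file name below are of course only placeholders:</p>

```sql
-- the key is a placeholder 32-byte base64 string;
-- generate your own and never hard-code real keys in scripts
ATTACH 'secure.duckdb' AS secure_db (
    ENCRYPTION_KEY 'MDEyMzQ1Njc4OWFiY2RlZjAxMjM0NTY3ODlhYmNkZWY='
);
```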

<h3 id="duckdb-block-structure">DuckDB Block Structure</h3>

<p>After the main database header, DuckDB stores two 4KB database headers that contain more information about, e.g., the block (header) size and the storage version used. Apart from the main database header, which stays plaintext, <em>all</em> remaining headers and blocks are encrypted when encryption is used.</p>

<p>Blocks in DuckDB are by default 256KB, but their size is configurable. At the start of each <em>plaintext</em> block there is an 8-byte block header, which stores an 8-byte checksum. The checksum is a simple calculation that is often used in database systems to check for any corrupted data.</p>

<div>
    <img src="/images/blog/encryption/plaintext-block-light.svg" alt="Plaintext block" class="lightmode-img" />
    <img src="/images/blog/encryption/plaintext-block-dark.svg" alt="Plaintext block" class="darkmode-img" />
</div>

<p>For encrypted blocks, however, the block header consists of 40 bytes instead of 8. In addition to the checksum, it contains a 16-byte <em>nonce/IV</em> and, optionally, a 16-byte <em>tag</em>, depending on which encryption cipher is used. The nonce and tag are stored in plaintext, but the checksum is encrypted for better security. Note that the block header always needs to be 8-byte aligned to calculate the checksum.</p>

<div>
    <img src="/images/blog/encryption/encrypted-block-light.svg" alt="Encrypted block" class="lightmode-img" />
    <img src="/images/blog/encryption/encrypted-block-dark.svg" alt="Encrypted block" class="darkmode-img" />
</div>

<h3 id="write-ahead-log-encryption">Write-Ahead-Log Encryption</h3>

<p>The write-ahead log (WAL) in database systems is a crash recovery mechanism that ensures <em>durability</em>. It is an append-only file used when the database crashes or is closed abruptly before all changes have been written to the main database file. The WAL makes sure these changes can be replayed from the last checkpoint, which is a consistent snapshot of the database at a certain point in time. When a checkpoint is enforced, which happens in DuckDB by either (1) closing the database or (2) reaching a certain storage threshold, the WAL gets written into the main database file.</p>
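<p>A checkpoint can also be triggered manually with the <code class="language-plaintext highlighter-rouge">CHECKPOINT</code> statement, which flushes the WAL into the main database file:</p>

```sql
-- merge the WAL into the main database file
CHECKPOINT;
```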

<p>In DuckDB, you can force the creation of a WAL by setting</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">PRAGMA</span> <span class="py">disable_checkpoint_on_shutdown</span><span class="p">;</span>
<span class="k">PRAGMA</span> <span class="py">wal_autocheckpoint</span> <span class="o">=</span> <span class="s1">'1TB'</span><span class="p">;</span>
</code></pre></div></div>

<p>This disables checkpointing when the database is closed, meaning that the WAL does not get merged into the main database file. In addition, setting <code class="language-plaintext highlighter-rouge">wal_autocheckpoint</code> to a high threshold avoids intermediate checkpoints, so the WAL persists. For example, we can create a persistent WAL file by first setting the above pragmas, then attaching an encrypted database, and then creating a table into which we insert 3 values.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'encrypted.db'</span> <span class="k">AS</span> <span class="n">enc</span> <span class="p">(</span>
    <span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">,</span>
    <span class="k">ENCRYPTION_CIPHER</span> <span class="s1">'GCM'</span>
<span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">enc.test</span> <span class="p">(</span><span class="n">a</span> <span class="nb">INTEGER</span><span class="p">,</span> <span class="n">b</span> <span class="nb">INTEGER</span><span class="p">);</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">enc.test</span> <span class="k">VALUES</span> <span class="p">(</span><span class="mi">11</span><span class="p">,</span> <span class="mi">22</span><span class="p">),</span> <span class="p">(</span><span class="mi">13</span><span class="p">,</span> <span class="mi">22</span><span class="p">),</span> <span class="p">(</span><span class="mi">12</span><span class="p">,</span> <span class="mi">21</span><span class="p">)</span>
</code></pre></div></div>

<p>If we now close the DuckDB process, we can see that there is a <code class="language-plaintext highlighter-rouge">.wal</code> file shown: <code class="language-plaintext highlighter-rouge">encrypted.db.wal</code>. But how is the WAL created internally?</p>

<p>Before new entries (inserts, updates, deletes) are written to the database, they are logged and appended to the WAL. Only <em>after</em> the logged entries are flushed to disk is a transaction considered committed. A plaintext WAL entry has the following structure:</p>

<div>
    <img src="/images/blog/encryption/plaintext-wal-entry-light.svg" alt="Plaintext block" class="lightmode-img" />
    <img src="/images/blog/encryption/plaintext-wal-entry-dark.svg" alt="Plaintext block" class="darkmode-img" />
</div>

<p>Since the WAL is append-only, we encrypt each WAL entry individually. For AES-GCM this means that we append a nonce and a tag to each entry. The structure in which we do this is depicted below. When we serialize an encrypted entry to the encrypted WAL, we first store the length in plaintext, because we need to know how many bytes to decrypt. The length is followed by a nonce, which in turn is followed by the encrypted checksum and the encrypted entry itself. After the entry, a 16-byte tag is stored for verification.</p>

<div>
    <img src="/images/blog/encryption/encrypted-wal-entry-light.svg" alt="Plaintext block" class="lightmode-img" />
    <img src="/images/blog/encryption/encrypted-wal-entry-dark.svg" alt="Plaintext block" class="darkmode-img" />
</div>

<p>Encrypting the WAL is triggered by default when an encryption key is given for any (un)encrypted database.</p>

<h3 id="temporary-file-encryption">Temporary File Encryption</h3>

<p>Temporary files are used to store intermediate data that is often necessary for large, out-of-core operations such as <a href="/2025/09/24/sorting-again.html">sorting</a>, large joins and <a href="https://duckdb.org/2021/10/13/windowing">window functions</a>. This data could contain sensitive information and can, in case of a crash, remain on disk. To protect this leftover data, DuckDB automatically encrypts temporary files too.</p>

<h4 id="the-structure-of-temporary-files">The Structure of Temporary Files</h4>

<p>There are three different types of temporary files in DuckDB: (1) temporary files that have the same layout as a regular 256KB block, (2) compressed temporary files and (3) temporary files that exceed the standard 256KB block size. The former two are suffixed with <code class="language-plaintext highlighter-rouge">.tmp</code>, while the latter is suffixed with <code class="language-plaintext highlighter-rouge">.block</code>. To keep track of the size of <code class="language-plaintext highlighter-rouge">.block</code> temporary files, they are always prefixed with their length. As opposed to regular database blocks, temporary files do not contain a checksum for detecting data corruption, since calculating a checksum is somewhat expensive.</p>

<h4 id="encrypting-temporary-files">Encrypting Temporary Files</h4>

<p>Temporary files are encrypted (1) <strong>automatically</strong> when you attach an encrypted database, or (2) when you use the setting <code class="language-plaintext highlighter-rouge">SET temp_file_encryption = true</code>. In the latter case, the main database file is plaintext, but the temporary files are encrypted. To encrypt temporary files, DuckDB internally generates <em>temporary keys</em>. This means that when the database crashes, the temporary keys are lost as well; the temporary files can then no longer be decrypted and are essentially garbage.</p>
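<p>For instance, to protect spilled intermediate data even for an otherwise plaintext database (the file name below is illustrative):</p>

```sql
-- temporary files get encrypted with internally generated keys,
-- while the attached database file itself stays plaintext
SET temp_file_encryption = true;
ATTACH 'plain.duckdb' AS plain;
```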

<p>To force DuckDB to produce temporary files, a simple trick is to set the memory limit low: temporary files are created once the memory limit is exceeded. For example, we can create a new encrypted database, load it with TPC-H data (SF 1), and set the memory limit to 1 GB. If we then perform a large join, we force DuckDB to spill intermediate data to disk:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="py">memory_limit</span> <span class="o">=</span> <span class="s1">'1GB'</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'tpch_encrypted.db'</span> <span class="k">AS</span> <span class="n">enc</span> <span class="p">(</span>
    <span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">,</span>
    <span class="k">ENCRYPTION_CIPHER</span> <span class="s1">'cipher'</span>
<span class="p">);</span>
<span class="k">USE</span> <span class="n">enc</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">dbgen</span><span class="p">(</span><span class="k">sf</span> <span class="o">=</span> <span class="mi">1</span><span class="p">);</span>

<span class="k">ALTER</span> <span class="k">TABLE</span> <span class="n">lineitem</span>
    <span class="k">RENAME</span> <span class="k">TO</span> <span class="n">lineitem1</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">lineitem2</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="n">lineitem1</span><span class="p">;</span>
<span class="k">CREATE</span> <span class="k">OR</span> <span class="k">REPLACE</span> <span class="k">TABLE</span> <span class="n">ans</span> <span class="k">AS</span>
    <span class="k">SELECT</span> <span class="n">l1</span><span class="p">.</span><span class="o">*</span> <span class="p">,</span> <span class="n">l2</span><span class="p">.</span><span class="o">*</span>
    <span class="k">FROM</span> <span class="n">lineitem1</span> <span class="n">l1</span>
    <span class="k">JOIN</span> <span class="n">lineitem2</span> <span class="n">l2</span> <span class="k">USING</span> <span class="p">(</span><span class="n">l_orderkey</span> <span class="p">,</span> <span class="n">l_linenumber</span><span class="p">);</span>
</code></pre></div></div>

<p>This sequence of commands will result in encrypted temporary files being written to disk. Once the query completes or when the DuckDB shell is exited, the temporary files are automatically cleaned up. In case of a crash however, it may happen that temporary files will be left on disk and need to be cleaned up manually.</p>

<h2 id="how-to-use-encryption-in-duckdb">How to Use Encryption in DuckDB</h2>

<p>In DuckDB, you can (1) encrypt an existing database, (2) initialize a new, empty encrypted database or (3) reencrypt a database. For example, let's create a new database, load it with TPC-H data at scale factor 1, and then encrypt it.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> tpch</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> tpch</span><span class="p">;</span>
<span class="k">ATTACH</span> <span class="s1">'encrypted.duckdb'</span> <span class="k">AS</span> <span class="k">encrypted</span> <span class="p">(</span><span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">);</span>
<span class="k">ATTACH</span> <span class="s1">'unencrypted.duckdb'</span> <span class="k">AS</span> <span class="n">unencrypted</span><span class="p">;</span>
<span class="k">USE</span> <span class="n">unencrypted</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">dbgen</span><span class="p">(</span><span class="k">sf</span> <span class="o">=</span> <span class="mi">1</span><span class="p">);</span>
<span class="k">COPY</span> <span class="k">FROM</span> <span class="k">DATABASE</span> <span class="n">unencrypted</span> <span class="k">TO</span> <span class="k">encrypted</span><span class="p">;</span>
</code></pre></div></div>

<p>There is no trivial way to prove that a database is encrypted, but correctly encrypted data should look like random noise and have high entropy. So, to check whether a database is actually encrypted, we can use tools that calculate the entropy or visualize the binary, such as <a href="https://github.com/lsauer/entropy">ent</a> and <a href="https://github.com/sharkdp/binocle">binocle</a>.</p>

<p>Running <code class="language-plaintext highlighter-rouge">ent encrypted.duckdb</code> after executing the above chunk of SQL reports an entropy of 7.99999 bits per byte. Doing the same for the plaintext (unencrypted) database results in 7.65876 bits per byte. Note that the plaintext database also has high entropy, but this is due to compression.</p>

<p>Let’s now visualize both the plaintext and the encrypted data with binocle. For the visualization, we created both a plaintext DuckDB database with TPC-H data at scale factor 0.001 and an encrypted one:</p>

<details>
  <summary>
Click here to see the entropy of a plaintext database
</summary>
  <div>
    <img src="https://blobs.duckdb.org/images/duckdb-plaintext-database.png" width="800" />
</div>
</details>

<details style="margin-top: 15px">
  <summary>
Click here to see the entropy of an encrypted database
</summary>
  <div>
    <img src="https://blobs.duckdb.org/images/duckdb-encrypted-database.png" width="800" />
</div>
</details>

<p>In these figures, we can clearly observe that the encrypted database file seems completely random, while the plaintext database file shows some clear structure in its binary data.</p>

<p>To decrypt an encrypted database, we can use the following SQL:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'encrypted.duckdb'</span> <span class="k">AS</span> <span class="k">encrypted</span> <span class="p">(</span><span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">);</span>
<span class="k">ATTACH</span> <span class="s1">'new_unencrypted.duckdb'</span> <span class="k">AS</span> <span class="n">unencrypted</span><span class="p">;</span>
<span class="k">COPY</span> <span class="k">FROM</span> <span class="k">DATABASE</span> <span class="k">encrypted</span> <span class="k">TO</span> <span class="n">unencrypted</span><span class="p">;</span>
</code></pre></div></div>

<p>And to reencrypt an existing database, we can simply copy the old encrypted database to a new one:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'encrypted.duckdb'</span> <span class="k">AS</span> <span class="k">encrypted</span> <span class="p">(</span><span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">);</span>
<span class="k">ATTACH</span> <span class="s1">'new_encrypted.duckdb'</span> <span class="k">AS</span> <span class="n">new_encrypted</span> <span class="p">(</span><span class="k">ENCRYPTION_KEY</span> <span class="s1">'xxxx'</span><span class="p">);</span>
<span class="k">COPY</span> <span class="k">FROM</span> <span class="k">DATABASE</span> <span class="k">encrypted</span> <span class="k">TO</span> <span class="n">new_encrypted</span><span class="p">;</span>
</code></pre></div></div>

<p>The default encryption cipher is AES-GCM, which is recommended since it also authenticates data by calculating a tag. Depending on the use case, you can also use AES-CTR, which is faster than AES-GCM since it skips calculating a tag after encrypting all data. You can specify the CTR cipher as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'encrypted.duckdb'</span> <span class="k">AS</span> <span class="k">encrypted</span> <span class="p">(</span>
    <span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">,</span>
    <span class="k">ENCRYPTION_CIPHER</span> <span class="s1">'CTR'</span>
<span class="p">);</span>
</code></pre></div></div>

<p>To check which of the attached databases are encrypted, run:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span> <span class="nf">duckdb_databases</span><span class="p">();</span>
</code></pre></div></div>

<p>This will show which databases are encrypted, and which cipher is used:</p>

<div class="monospace_table"></div>

<table>
  <thead>
    <tr>
      <th>database_name</th>
      <th>database_oid</th>
      <th>path</th>
      <th>…</th>
      <th>encrypted</th>
      <th>cipher</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>encrypted</td>
      <td>2103</td>
      <td>encrypted.duckdb</td>
      <td>…</td>
      <td>true</td>
      <td>GCM</td>
    </tr>
    <tr>
      <td>unencrypted</td>
      <td>2050</td>
      <td>unencrypted.duckdb</td>
      <td>…</td>
      <td>false</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>memory</td>
      <td>592</td>
      <td>NULL</td>
      <td>…</td>
      <td>false</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>system</td>
      <td>0</td>
      <td>NULL</td>
      <td>…</td>
      <td>false</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>temp</td>
      <td>1995</td>
      <td>NULL</td>
      <td>…</td>
      <td>false</td>
      <td>NULL</td>
    </tr>
  </tbody>
</table>

<!-- markdownlint-disable MD036 -->
<p><em>5 rows —  10 columns (5 shown)</em>
<!-- markdownlint-enable MD036 --></p>

<h2 id="implementation-and-performance">Implementation and Performance</h2>

<p>Here at DuckDB, we strive for a good out-of-the-box experience with zero external dependencies and a small footprint. Encryption and decryption, however, are usually performed by fairly heavy external libraries such as OpenSSL. We would much prefer not to rely on external libraries or statically link huge codebases just so that people can use encryption in DuckDB without additional steps. This is why we actually implemented encryption <em>twice</em> in DuckDB: once with the (excellent) <a href="https://github.com/Mbed-TLS/mbedtls">Mbed TLS</a> library and once with the ubiquitous OpenSSL library.</p>

<p>DuckDB already shipped parts of Mbed TLS because we use it to verify RSA extension signatures. However, for maximum compatibility we disable Mbed TLS's hardware acceleration, which has a performance impact. Furthermore, Mbed TLS is not particularly hardened against attacks such as timing attacks. OpenSSL, on the other hand, contains heavily vetted and hardware-accelerated code for performing AES operations, which is why we can also use it for encryption.</p>

<p>In DuckDB Land, OpenSSL is part of the <code class="language-plaintext highlighter-rouge">httpfs</code> extension. Once you load that extension, encryption <em>automatically</em> switches to using OpenSSL. After we shipped encryption in DuckDB 1.4.0, security experts found <a href="https://github.com/duckdb/duckdb/security/advisories/GHSA-vmp8-hg63-v2hp">issues with the random number generator</a> we used in Mbed TLS mode. Even though these would be difficult to actually exploit, we <em>disabled writing</em> to encrypted databases in Mbed TLS mode as of DuckDB 1.4.1. Instead, DuckDB now (version 1.4.2+) tries to auto-install and auto-load the <code class="language-plaintext highlighter-rouge">httpfs</code> extension whenever a write is attempted. We might be able to revisit this in the future, but for now this seems the safest path forward that still allows high compatibility for reading. OpenSSL mode has always used a cryptographically secure random number generator, so that mode is unaffected.</p>

<p>Encrypting and decrypting database files is an additional step in writing tables to disk, so we would naturally assume that there is some performance impact. Let’s investigate the performance impact of DuckDB’s new encryption feature with a very basic experiment.</p>

<p>We first create two DuckDB database files, one encrypted and one unencrypted. We use the TPC-H benchmark generator again to create the table data, particularly the (somewhat tired) <code class="language-plaintext highlighter-rouge">lineitem</code> table.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> httpfs</span><span class="p">;</span>
<span class="k">INSTALL</span><span class="n"> tpch</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> tpch</span><span class="p">;</span>

<span class="k">ATTACH</span> <span class="s1">'unencrypted.duckdb'</span> <span class="k">AS</span> <span class="n">unencrypted</span><span class="p">;</span>
<span class="k">CALL</span> <span class="nf">dbgen</span><span class="p">(</span><span class="k">sf</span> <span class="o">=</span> <span class="mi">10</span><span class="p">,</span> <span class="k">catalog</span> <span class="o">=</span> <span class="s1">'unencrypted'</span><span class="p">);</span>

<span class="k">ATTACH</span> <span class="s1">'encrypted.duckdb'</span> <span class="k">AS</span> <span class="k">encrypted</span> <span class="p">(</span><span class="k">ENCRYPTION_KEY</span> <span class="s1">'asdf'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="k">encrypted</span><span class="n">.lineitem</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="n">unencrypted.lineitem</span><span class="p">;</span>
</code></pre></div></div>

<p>Now we use DuckDB’s neat <code class="language-plaintext highlighter-rouge">SUMMARIZE</code> command three times: once on the unencrypted database, once on the encrypted database using Mbed TLS, and once on the encrypted database using OpenSSL. We set a very low memory limit to force more reading from and writing to disk.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SET</span> <span class="py">memory_limit</span> <span class="o">=</span> <span class="s1">'200MB'</span><span class="p">;</span>
<span class="py">.timer</span> <span class="n">on</span>

<span class="k">SUMMARIZE</span> <span class="n">unencrypted.lineitem</span><span class="p">;</span>
<span class="k">SUMMARIZE</span> <span class="k">encrypted</span><span class="n">.lineitem</span><span class="p">;</span>

<span class="k">LOAD</span><span class="n"> httpfs</span><span class="p">;</span> <span class="c1">-- use OpenSSL</span>
<span class="k">SUMMARIZE</span> <span class="k">encrypted</span><span class="n">.lineitem</span><span class="p">;</span>
</code></pre></div></div>

<p>Here are the results on a fairly recent MacBook: <code class="language-plaintext highlighter-rouge">SUMMARIZE</code> on the unencrypted table took about 5.4 seconds. Using Mbed TLS, this went up to around 6.2 seconds. However, when enabling OpenSSL, the end-to-end time went straight back to 5.4 seconds. How is this possible? Is decryption not expensive? First, there is a lot more happening in query processing than reading blocks from storage, so the impact of decryption is not all that large, even with a slow implementation. Second, when using hardware acceleration in OpenSSL, the overall overhead of encryption and decryption becomes almost negligible.</p>

<p>But just running summarization is overly simplistic. Real™ database workloads include modifications to data: insertions of new rows, updates and deletions of existing rows, and so on. Also, multiple clients will be updating and querying at the same time. So we resurrected the full TPC-H “Power” test from our previous blog post “<a href="https://duckdb.org/2024/09/25/changing-data-with-confidence-and-acid">Changing Data with Confidence and ACID</a>”. We slightly tweaked the <a href="https://github.com/duckdb/duckdb-tpch-power-test/blob/main/benchmark.py">benchmark script</a> to enable the new database encryption. For this experiment, we used the OpenSSL encryption implementation due to the issues outlined above. We observe “Power@Size” and “Throughput@Size”: the former measures raw sequential query performance, while the latter measures multiple parallel query streams in the presence of updates.</p>

<p>When running on the same MacBook with DuckDB 1.4.1 and a “scale factor” of 100, we get a Power@Size metric of 624,296 and a Throughput@Size metric of 450,409 <em>without</em> encryption.</p>

<p>When we enable encryption, the results are almost unchanged, confirming the observation from the small microbenchmark above. However, at this ratio of available memory to benchmark size, we are not stressing temporary file encryption. So we re-ran everything with an 8 GB memory limit. We confirmed constant reading from and writing to disk in this configuration by observing operating system statistics. For the unencrypted case, the Power@Size metric predictably went down to 591,841 and Throughput@Size went down to 153,690. With encryption enabled, we observed a slight performance decrease, with a Power@Size of 571,985 and a Throughput@Size of 145,353. That difference is small, however, and likely not relevant in real operational scenarios.</p>

<h2 id="conclusion">Conclusion</h2>

<p>With the new encrypted database feature, we can now safely pass around DuckDB database files with all the information inside them completely opaque to prying eyes. This allows for some interesting new deployment models for DuckDB. For example, we could now put an encrypted DuckDB database file on a Content Delivery Network (CDN), and a fleet of DuckDB instances could attach to this file read-only using the decryption key. This elegantly allows efficient distribution of private background data, similarly to encrypted Parquet files, but with many more features such as multi-table storage. Using DuckDB with encrypted storage can also simplify threat modeling when, for example, running DuckDB on cloud providers. While in the past access to DuckDB storage would have been enough to leak data, we can now relax paranoia regarding storage a little, especially since temporary files and the WAL are also encrypted. Best of all, there is almost no performance overhead to using encryption in DuckDB, especially with the OpenSSL implementation.</p>
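<p>As a sketch of that CDN deployment model, a reader instance could attach the shared, encrypted file read-only. Note that the URL, key, and table name below are hypothetical:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LOAD httpfs; -- enables remote access and the OpenSSL-based encryption

ATTACH 'https://cdn.example.com/shared.duckdb' AS shared (
    READ_ONLY,
    ENCRYPTION_KEY 'key-distributed-out-of-band'
);

FROM shared.background_data;
</code></pre></div></div>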

<p>We are very much looking forward to what you are going to do with this feature, and please <a href="https://github.com/duckdb/duckdb/issues/new">let us know</a> if you run into any issues.</p>]]></content><author><name>Lotte Felius, Hannes Mühleisen</name></author><category term="deep dive" /><summary type="html"><![CDATA[DuckDB v1.4 ships database encryption capabilities. In this blog post, we dive into the implementation details of the encryption, show how to use it and demonstrate its performance implications.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/encryption-in-duckdb.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/encryption-in-duckdb.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Announcing DuckDB 1.4.2 LTS</title><link href="https://duckdb.org/2025/11/12/announcing-duckdb-142.html" rel="alternate" type="text/html" title="Announcing DuckDB 1.4.2 LTS" /><published>2025-11-12T00:00:00+00:00</published><updated>2025-11-12T00:00:00+00:00</updated><id>https://duckdb.org/2025/11/12/announcing-duckdb-142</id><content type="html" xml:base="https://duckdb.org/2025/11/12/announcing-duckdb-142.html"><![CDATA[<p>In this blog post, we highlight a few important fixes and convenience improvements in DuckDB v1.4.2, the second patch release in <a href="/2025/09/16/announcing-duckdb-140.html">DuckDB's 1.4 LTS line</a>. To see the complete list of updates, please consult the <a href="https://github.com/duckdb/duckdb/releases/tag/v1.4.2">release notes on GitHub</a>.</p>

<p>While this is a patch release, we are shipping some small features. In LTS releases, these can come in two forms:</p>

<ol>
  <li>We add small opt-in features, such as <a href="#accessing-the-profilers-output-from-the-logger">accessing the profiler's output from the logger</a> in this release. These features have been highly requested by the community, and we are confident that they will not cause any issues for people upgrading to the latest release. In fact, using them carefully can help detect and understand performance regressions.</li>
  <li>Some of DuckDB's extensions that are marked as <a href="/docs/stable/core_extensions/overview.html">“experimental”</a> are shipping full-fledged features. For example, this is how we have rolled out support for <a href="#iceberg-improvements">Iceberg deletes and updates</a>. Extensions are opt-in by nature, so if you stick to core DuckDB and its stable extensions, changes in the experimental extensions will not affect the stability of your installation.</li>
</ol>

<blockquote>
  <p>To install the new version, please visit the <a href="/install/">installation page</a>. Note that it can take a few hours to days for some client libraries (e.g., R, Rust) to be released due to the extra changes and review rounds required.</p>
</blockquote>

<h2 id="features-and-improvements">Features and Improvements</h2>

<h3 id="iceberg-improvements">Iceberg Improvements</h3>

<p>Similarly to the <a href="/2025/10/07/announcing-duckdb-141.html#iceberg-improvements">v1.4.1 release blog post</a>, we can start with some good news for our Iceberg users: DuckDB v1.4.2 ships a number of improvements for the <a href="/docs/stable/core_extensions/iceberg/overview.html"><code class="language-plaintext highlighter-rouge">iceberg</code> extension</a>. Insert, update, and delete statements are all supported now:</p>

<!-- markdownlint-disable MD040 -->

<details>
  <summary>
Click to see the SQL code sample for Iceberg updates.
</summary>
  <div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">ATTACH</span> <span class="s1">'</span><span class="ge">warehouse_name</span><span class="s1">'</span> <span class="k">AS</span> <span class="n">iceberg_catalog</span> <span class="p">(</span>
    <span class="k">TYPE</span> <span class="k">iceberg</span><span class="p">,</span>
    <span class="ge">other options</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">iceberg_catalog.default.simple_table</span>
    <span class="p">(</span><span class="n">col1</span> <span class="nb">INTEGER</span><span class="p">,</span> <span class="n">col2</span> <span class="nb">VARCHAR</span><span class="p">);</span>

<span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">iceberg_catalog.default.simple_table</span>
    <span class="k">VALUES</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="s1">'hello'</span><span class="p">),</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="s1">'world'</span><span class="p">),</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="s1">'duckdb is great'</span><span class="p">);</span>

<span class="k">DELETE</span> <span class="k">FROM</span> <span class="n">iceberg_catalog.default.simple_table</span>
<span class="k">WHERE</span> <span class="n">col1</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>

<span class="k">UPDATE</span><span class="n"> iceberg_catalog.default.simple_table</span>
<span class="k">SET</span> <span class="n">col1</span> <span class="o">=</span> <span class="n">col1</span> <span class="o">+</span> <span class="mi">5</span>
<span class="k">WHERE</span> <span class="n">col1</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
</code></pre></div>  </div>
</details>

<!-- markdownlint-enable MD040 -->

<p>We will publish a separate blog post on these improvements shortly. Stay tuned!</p>

<h3 id="logger-and-profiler-improvements">Logger and Profiler Improvements</h3>

<h4 id="time-http-requests">Time HTTP Requests</h4>

<p>The logger can now log the duration of HTTP requests (<a href="https://github.com/duckdb/duckdb/pull/19691"><code class="language-plaintext highlighter-rouge">#19691</code></a>).
For example, if we query the Dutch railway tariffs table as a Parquet file (<a href="https://blobs.duckdb.org/tariffs.parquet"><code class="language-plaintext highlighter-rouge">tariffs.parquet</code></a>),
we can see multiple HTTP requests: a <code class="language-plaintext highlighter-rouge">HEAD</code> request and three <code class="language-plaintext highlighter-rouge">GET</code> requests:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CALL</span> <span class="nf">enable_logging</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">railway_tariffs</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="s1">'https://blobs.duckdb.org/tariffs.parquet'</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="n">request.type</span><span class="p">,</span> <span class="n">request.url</span><span class="p">,</span> <span class="n">request.duration_ms</span>
<span class="k">FROM</span> <span class="nf">duckdb_logs_parsed</span><span class="p">(</span><span class="s1">'HTTP'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌─────────┬──────────────────────────────────────────┬─────────────┐
│  type   │                   url                    │ duration_ms │
│ varchar │                 varchar                  │    int64    │
├─────────┼──────────────────────────────────────────┼─────────────┤
│ HEAD    │ https://blobs.duckdb.org/tariffs.parquet │         177 │
│ GET     │ https://blobs.duckdb.org/tariffs.parquet │         103 │
│ GET     │ https://blobs.duckdb.org/tariffs.parquet │         176 │
│ GET     │ https://blobs.duckdb.org/tariffs.parquet │         182 │
└─────────┴──────────────────────────────────────────┴─────────────┘
</code></pre></div></div>

<h4 id="accessing-the-profilers-output-from-the-logger">Accessing the Profiler's Output from the Logger</h4>

<p>The logger can now also access the profiler's output (<a href="https://github.com/duckdb/duckdb/pull/19572"><code class="language-plaintext highlighter-rouge">#19572</code></a>).
This means that if both the profiler and the logger are enabled, you can log information such as the execution time of queries:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- Enable profiling to JSON file</span>
<span class="c1">-- This is necessary to make sure that queries are profiled</span>
<span class="k">PRAGMA</span> <span class="py">profiling_output</span> <span class="o">=</span> <span class="s1">'profiling_output.json'</span><span class="p">;</span>
<span class="k">PRAGMA</span> <span class="py">enable_profiling</span> <span class="o">=</span> <span class="s1">'json'</span><span class="p">;</span>

<span class="c1">-- Enable logging to an in-memory table</span>
<span class="k">CALL</span> <span class="nf">enable_logging</span><span class="p">();</span>

<span class="c1">-- Run some queries</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">small</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1_000_000</span><span class="p">);</span>
<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">large</span> <span class="k">AS</span> <span class="k">FROM</span> <span class="nf">range</span><span class="p">(</span><span class="mi">1_000_000_000</span><span class="p">);</span>

<span class="k">PRAGMA</span> <span class="py">disable_profiling</span><span class="p">;</span>

<span class="k">SELECT</span> <span class="n">query_id</span><span class="p">,</span> <span class="n">type</span><span class="p">,</span> <span class="n">metric</span><span class="p">,</span> <span class="n">value</span><span class="p">::</span><span class="nb">DECIMAL</span><span class="p">(</span><span class="mi">15</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="k">AS</span> <span class="n">value</span>
<span class="k">FROM</span> <span class="nf">duckdb_logs_parsed</span><span class="p">(</span><span class="s1">'Metrics'</span><span class="p">)</span>
<span class="k">WHERE</span> <span class="n">metric</span> <span class="o">=</span> <span class="s1">'CPU_TIME'</span><span class="p">;</span>
</code></pre></div></div>

<p>You can see in the output that the first <code class="language-plaintext highlighter-rouge">CREATE</code> statement took about 3 milliseconds, while the second one took 3.3 seconds.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌──────────┬─────────┬──────────┬───────────────┐
│ query_id │  type   │  metric  │     value     │
│  uint64  │ varchar │ varchar  │ decimal(15,3) │
├──────────┼─────────┼──────────┼───────────────┤
│        8 │ Metrics │ CPU_TIME │         0.003 │
│        9 │ Metrics │ CPU_TIME │         3.267 │
└──────────┴─────────┴──────────┴───────────────┘
</code></pre></div></div>

<h4 id="profiler-metrics">Profiler Metrics</h4>

<p>The profiler now supports <a href="/docs/stable/dev/profiling.html#metrics">several new metrics</a>.
These allow you to get a deeper understanding of where execution time is spent in queries.</p>
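<p>As a sketch of how to opt into individual metrics via custom profiling settings (the selection below is just an example; see the profiling documentation for the full list of metric names, including the newly added ones):</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PRAGMA enable_profiling = 'json';
PRAGMA profiling_output = 'profile.json';
-- collect only the listed metrics (example selection)
PRAGMA custom_profiling_settings = '{"CPU_TIME": "true", "OPERATOR_TIMING": "true", "OPERATOR_CARDINALITY": "true"}';

FROM range(1_000_000);
</code></pre></div></div>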

<h3 id="performance-improvements">Performance Improvements</h3>

<p>DuckDB v1.4.2 also ships some small performance improvements:</p>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/pull/19477"><code class="language-plaintext highlighter-rouge">#19477</code> DuckDB now buffers WAL index deletes, not only appends</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/19644"><code class="language-plaintext highlighter-rouge">#19644</code> Detaching from a database is now faster</a></li>
</ul>

<h3 id="vortex-support">Vortex Support</h3>

<p>DuckDB now supports the <a href="https://vortex.dev/">Vortex file format</a> through the <a href="/docs/stable/core_extensions/vortex.html"><code class="language-plaintext highlighter-rouge">vortex</code> extension</a> on Linux and macOS.</p>

<p>To use <code class="language-plaintext highlighter-rouge">vortex</code>, first install and load the extension:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">INSTALL</span><span class="n"> vortex</span><span class="p">;</span>
<span class="k">LOAD</span><span class="n"> vortex</span><span class="p">;</span>
</code></pre></div></div>

<p>Then, you can write Vortex files as follows:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">COPY</span> <span class="p">(</span><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">generate_series</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span> <span class="n">t</span><span class="p">(</span><span class="n">i</span><span class="p">))</span>
<span class="k">TO</span> <span class="s1">'my.vortex'</span> <span class="p">(</span><span class="k">FORMAT</span> <span class="k">vortex</span><span class="p">);</span>
</code></pre></div></div>

<p>And read them using the <code class="language-plaintext highlighter-rouge">read_vortex</code> function:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="o">*</span> <span class="k">FROM</span> <span class="nf">read_vortex</span><span class="p">(</span><span class="s1">'my.vortex'</span><span class="p">);</span>
</code></pre></div></div>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>┌───────┐
│   i   │
│ int64 │
├───────┤
│     0 │
│     1 │
│     2 │
└───────┘
</code></pre></div></div>

<h2 id="fixes">Fixes</h2>

<p>We fixed several vulnerabilities, crashes, internal errors, incorrect results, and regressions.
We also fixed several issues discovered by our <a href="https://github.com/duckdb/duckdb-fuzzer/">fuzzer</a>.</p>

<h3 id="vulnerabilities">Vulnerabilities</h3>

<p>We fixed <a href="https://github.com/duckdb/duckdb/security/advisories/GHSA-vmp8-hg63-v2hp">four vulnerabilities</a> in DuckDB's <a href="/docs/stable/sql/statements/attach.html#database-encryption">database encryption</a>:</p>

<ol>
  <li>DuckDB could fall back to an insecure random number generator (<code class="language-plaintext highlighter-rouge">pcg32</code>) to generate cryptographic keys or IVs.</li>
  <li>When clearing keys from memory, the compiler could optimize away the <code class="language-plaintext highlighter-rouge">memset()</code> call and leave sensitive data on the heap.</li>
  <li>By modifying the database header, an attacker could downgrade the encryption mode from GCM to CTR to bypass integrity checks.</li>
  <li>The return value of the OpenSSL <code class="language-plaintext highlighter-rouge">rand_bytes()</code> call was not checked.</li>
</ol>

<p>See the <a href="https://github.com/duckdb/duckdb/security/advisories/GHSA-vmp8-hg63-v2hp">security advisory</a> for more details.</p>

<p>If you are using database encryption, you are strongly encouraged to update to DuckDB v1.4.2.</p>

<blockquote>
  <p>We would like to thank <a href="https://github.com/SalusaSecondus">Greg Rubin</a> and <a href="https://github.com/skosowski">Sławek Kosowski</a> for reporting these vulnerabilities!</p>
</blockquote>

<h3 id="crashes-and-internal-errors">Crashes and Internal Errors</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/19238"><code class="language-plaintext highlighter-rouge">#19238</code> <code class="language-plaintext highlighter-rouge">MERGE INTO</code> Iceberg table with <code class="language-plaintext highlighter-rouge">TIMESTAMPTZ</code> columns crashes</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19355"><code class="language-plaintext highlighter-rouge">#19355</code> Unknown expression type invalidates database</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19357"><code class="language-plaintext highlighter-rouge">#19357</code> Expected unified vector format of type <code class="language-plaintext highlighter-rouge">VARCHAR</code>, but found type <code class="language-plaintext highlighter-rouge">INT32</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19455"><code class="language-plaintext highlighter-rouge">#19455</code> <code class="language-plaintext highlighter-rouge">MERGE INTO</code> failed: logical operator type mismatch</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19498"><code class="language-plaintext highlighter-rouge">#19498</code> Window function crash with <code class="language-plaintext highlighter-rouge">pdqsort_loop</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19700"><code class="language-plaintext highlighter-rouge">#19700</code> RLE select bug</a></li>
</ul>

<h3 id="incorrect-results">Incorrect Results</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/17757"><code class="language-plaintext highlighter-rouge">#17757</code> UUID Comparison in aggregation filter broken on Linux</a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19327"><code class="language-plaintext highlighter-rouge">#19327</code> Wrong result for <code class="language-plaintext highlighter-rouge">DISTINCT</code> and <code class="language-plaintext highlighter-rouge">LEFT JOIN</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb/issues/19377"><code class="language-plaintext highlighter-rouge">#19377</code> Array with values shows null depending on query</a></li>
</ul>

<h3 id="regressions">Regressions</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/issues/19333"><code class="language-plaintext highlighter-rouge">#19333</code> DuckDB hangs when using <code class="language-plaintext highlighter-rouge">ATTACH IF NOT EXISTS</code> on subsequent connections to databases that have previously attached a database file</a></li>
</ul>

<h3 id="storage">Storage</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb/pull/19424"><code class="language-plaintext highlighter-rouge">#19424</code> Fix issue in MetadataManager triggered when doing concurrent reads while checkpointing</a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/19527"><code class="language-plaintext highlighter-rouge">#19527</code> Ensure that DuckDB outputs the expected <code class="language-plaintext highlighter-rouge">STORAGE_VERSION</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb/pull/19543"><code class="language-plaintext highlighter-rouge">#19543</code> Error when setting <code class="language-plaintext highlighter-rouge">force_compression = 'zstd'</code> in an in-memory database</a></li>
</ul>

<h3 id="issues-discovered-by-the-fuzzer">Issues Discovered by the Fuzzer</h3>

<ul>
  <li><a href="https://github.com/duckdb/duckdb-fuzzer/issues/3389"><code class="language-plaintext highlighter-rouge">duckdb-fuzzer#3389</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb-fuzzer/issues/4208"><code class="language-plaintext highlighter-rouge">duckdb-fuzzer#4208</code></a></li>
  <li><a href="https://github.com/duckdb/duckdb-fuzzer/issues/4296"><code class="language-plaintext highlighter-rouge">duckdb-fuzzer#4296</code></a></li>
</ul>]]></content><author><name>The DuckDB team</name></author><category term="release" /><summary type="html"><![CDATA[Today we are releasing DuckDB 1.4.2, the second patch release of our LTS edition. The new release ships several bugfixes and performance optimizations. We also fixed vulnerabilities in DuckDB's database encryption, and introduced some (opt-in) logger/profiler features that help users understand performance, and full write support through the Iceberg extension.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-2-lts.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/duckdb-release-1-4-2-lts.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Relational Charades: Turning Movies into Tables</title><link href="https://duckdb.org/2025/10/27/movies-in-databases.html" rel="alternate" type="text/html" title="Relational Charades: Turning Movies into Tables" /><published>2025-10-27T00:00:00+00:00</published><updated>2025-10-27T00:00:00+00:00</updated><id>https://duckdb.org/2025/10/27/movies-in-databases</id><content type="html" xml:base="https://duckdb.org/2025/10/27/movies-in-databases.html"><![CDATA[<p style="text-align: right"><i>“Your scientists were so preoccupied with whether they could,<br /> they didn't stop to think if they should.”</i><br />
   Dr. Ian Malcolm, Jurassic Park (1993)</p>

<p>Here at team DuckDB, we <em>love</em> tables. Tables are a timeless, elegant abstraction that precedes literature by <a href="https://www.youtube.com/watch?v=-wCzn9gKoUk">about a thousand years</a>. Relational tables specifically can represent <em>any</em> kind of information imaginable. But just because something <em>can</em> be done does not mean it is a great idea to do so. Can we build a rocket propelled by a nuclear chain reaction that irradiates the land it flies over? <a href="https://en.wikipedia.org/wiki/Project_Pluto">Yes</a>. Should we? Probably not.</p>

<h2 id="disclaimer">Disclaimer</h2>

<p>Array-like data such as images and videos is a <em>textbook example</em> of something that <a href="https://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay">might not benefit</a> from being stored in a database. While of course any binary data can be added to tables as <code class="language-plaintext highlighter-rouge">BLOB</code>s, there is not that much added value in it. Sure, it's harder to lose the image compared to the industry-standard solution of storing a file name that points to the image. But there are not that many meaningful operations that can be performed on BLOBs other than store and load. Without adding some <a href="https://pluralistic.net/2025/09/27/econopocalypse/">overhyped</a> AI tech, you can't even ask the database <a href="https://xkcd.com/1425/">what the picture shows</a>.</p>

<p>Array data also has its own world of highly specialized file formats and compression algorithms. Just think of the ubiquitous <a href="https://en.wikipedia.org/wiki/MPEG-4">MPEG-4 standard</a> for storing movies. These are approximate (lossy, not exact) formats designed around human perception models, which is why they can avoid storing things people do not notice. They achieve impressive compression rates, with a two-hour "Full" HD movie compressing to about 2 GB using MPEG-4.</p>
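<p>For scale, here is a back-of-the-envelope calculation of the uncompressed size of such a movie, assuming 24-bit RGB pixels at 25 frames per second and ignoring audio. You can run it in DuckDB itself:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SELECT
    1920 * 1080   AS pixels_per_frame,
    25 * 60 * 120 AS frames, -- 25 fps for two hours
    1920 * 1080 * 3 * 25 * 60 * 120 / 1e9 AS raw_gigabytes; -- 3 bytes per pixel
</code></pre></div></div>

<p>That comes out to roughly 1,120 GB of raw pixel data, so compressing it down to about 2 GB is a reduction of more than 500×.</p>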

<h2 id="ignoring-the-disclaimer">Ignoring the Disclaimer</h2>

<p>But what would it feel like to turn a movie into a table? (Very) deep down, a movie is just a series of fast-moving pictures ("frames"), typically shown at around 25 frames per second. At that speed, our monkey-brain cannot distinguish the separate images anymore and is fooled into thinking that we are watching smooth movement. Side note for the younger generations: a strip of pictures was the way we <a href="https://en.wikipedia.org/wiki/Film_stock#/media/File:ButterflyDancebis.jpg">shipped around movies</a> for more than 100 years.</p>

<p>So a series of pictures it is. Each picture can be further deconstructed into a two-dimensional array (a "matrix") of points, so-called "pixels". Every pixel in turn consists of three numbers, one each for the intensity of red, green and blue, or <a href="https://en.wikipedia.org/wiki/RGB_color_model">RGB</a> for short. Note that we're ignoring the audio tracks in this post, but in principle they would work the exact same way, just with a different kind of intensity.</p>
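<p>In NumPy terms (the representation video libraries typically hand back), a frame is then a height × width × 3 array of bytes. A toy sketch with a made-up 2×3 "frame" shows how flattening interleaves the channels, which is exactly what the conversion code later exploits:</p>

```python
import numpy as np

# A tiny 2x3 "frame": height x width x 3 (R, G, B), one byte per channel.
# The pixel values are arbitrary, chosen only for illustration.
frame = np.array(
    [[[255, 0, 0], [0, 255, 0], [0, 0, 255]],
     [[10, 20, 30], [40, 50, 60], [70, 80, 90]]],
    dtype=np.uint8,
)

v = frame.flatten()   # row-major: R, G, B, R, G, B, ...
r = v[0::3]           # every third byte, starting at offset 0, is a red value
print(r.tolist())     # [255, 0, 0, 10, 40, 70]
```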

<p>As an added complexity, the relational model (famously) <a href="https://www.reddit.com/r/Database/comments/1l7tbrc/why_is_inherent_order_in_rdbms_so_neglected/">does not require</a> an absolute order of records. So all the various offsets have to be made explicit so as not to lose information. This of course greatly increases the size of our data set. We end up with a table that looks like this:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: right">i</th>
      <th style="text-align: right">y</th>
      <th style="text-align: right">x</th>
      <th style="text-align: right">r</th>
      <th style="text-align: right">g</th>
      <th style="text-align: right">b</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">1</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">1</td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">1</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">2</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">6</td>
      <td style="text-align: right">2</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">3</td>
      <td style="text-align: right">8</td>
      <td style="text-align: right">9</td>
      <td style="text-align: right">4</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">4</td>
      <td style="text-align: right">9</td>
      <td style="text-align: right">10</td>
      <td style="text-align: right">5</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">5</td>
      <td style="text-align: right">11</td>
      <td style="text-align: right">12</td>
      <td style="text-align: right">8</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">6</td>
      <td style="text-align: right">11</td>
      <td style="text-align: right">12</td>
      <td style="text-align: right">8</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">7</td>
      <td style="text-align: right">11</td>
      <td style="text-align: right">12</td>
      <td style="text-align: right">8</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">8</td>
      <td style="text-align: right">9</td>
      <td style="text-align: right">10</td>
      <td style="text-align: right">5</td>
    </tr>
    <tr>
      <td style="text-align: right">0</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">9</td>
      <td style="text-align: right">9</td>
      <td style="text-align: right">10</td>
      <td style="text-align: right">5</td>
    </tr>
  </tbody>
</table>

<p>We have the time offset or frame number <code class="language-plaintext highlighter-rouge">i</code>, we have <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> for the pixel position in the frame, and <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">g</code> and <code class="language-plaintext highlighter-rouge">b</code> for the color components red, green and blue. Quite involved.</p>

<p>But now the movie is just a single table. If only we had a conventional, guaranteed total order of rows, we could in theory skip all columns except <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">g</code> and <code class="language-plaintext highlighter-rouge">b</code>, because with a known resolution all other columns can be inferred. This is, incidentally, also how actual movie data files are stored, ignoring compression. That is another reason why relational tables are maybe not the best place for a movie to live, but if <a href="https://en.wikipedia.org/wiki/Law_of_the_instrument">all you have is a hammer</a>… We could also have used some more modern SQL features such as nested fields (a <code class="language-plaintext highlighter-rouge">LIST</code> in DuckDB), but let's keep it to a table that even <a href="https://en.wikipedia.org/wiki/IBM_System_R">System R</a> could have dealt with. In addition, having explicit offsets means we do not need nebulous conventions or additional metadata to know in which axis order the array data was serialized.</p>
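<p>To make the "all other columns can be inferred" argument concrete, here is a small sketch (plain Python, with a hypothetical helper name) that recovers the explicit columns from nothing but a row's position and the known resolution:</p>

```python
# With a guaranteed total row order and a known resolution, (i, y, x) are
# pure functions of the flat row position n -- only r, g, b carry information.
DIM_X, DIM_Y = 720, 392  # resolution of the movie used in this post

def offsets(n, dim_x=DIM_X, dim_y=DIM_Y):
    """Recover frame number and pixel position from a flat row index."""
    i, rest = divmod(n, dim_x * dim_y)  # which frame
    y, x = divmod(rest, dim_x)          # which row and column within it
    return i, y, x

print(offsets(0))       # (0, 0, 0): first pixel of the first frame
print(offsets(282240))  # (1, 0, 0): 720 * 392 rows later, frame two starts
```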

<h2 id="experiments">Experiments</h2>

<p>To investigate this daft idea further (for <a href="https://www.ru.nl/personen/muhleisen-h">Science</a>!), we convert the 1963 classic "<a href="https://en.wikipedia.org/wiki/Charade_(1963_film)">Charade</a>", a "romantic screwball comedy mystery film" starring Audrey Hepburn and Cary Grant, to a DuckDB table. This movie was picked because it is <em>accidentally in the Public Domain</em> due to a screw-up in the wording of the copyright notice (no, really). Because of this, you can actually <a href="https://archive.org/details/Charade19631280x696">freely download this movie</a> from the Internet Archive.</p>

<p><img src="/images/blog/movies/charade-poster.jpg" width="400" /></p>

<p>Since we're just creating a table, we will use DuckDB's native storage format. Here is the <em>complete</em> code snippet we used to convert the movie. In fact, this code should be generic enough to convert anything that <code class="language-plaintext highlighter-rouge">ffmpeg</code> can read into a table, in case you want to try this at home on your own movies.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">imageio</span>
<span class="kn">import</span> <span class="nn">duckdb</span>

<span class="c1"># setup movie reading
</span><span class="n">vid</span> <span class="o">=</span> <span class="n">imageio</span><span class="p">.</span><span class="n">get_reader</span><span class="p">(</span><span class="s">"Charade-1963.mp4"</span><span class="p">,</span> <span class="s">"ffmpeg"</span><span class="p">)</span>
<span class="n">dim_x</span> <span class="o">=</span> <span class="n">vid</span><span class="p">.</span><span class="n">get_meta_data</span><span class="p">()[</span><span class="s">'size'</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span>
<span class="n">dim_y</span> <span class="o">=</span> <span class="n">vid</span><span class="p">.</span><span class="n">get_meta_data</span><span class="p">()[</span><span class="s">'size'</span><span class="p">][</span><span class="mi">1</span><span class="p">]</span>
<span class="n">rows_per_frame</span> <span class="o">=</span> <span class="n">dim_y</span> <span class="o">*</span> <span class="n">dim_x</span>

<span class="c1"># setup a DuckDB database and table
</span><span class="n">con</span> <span class="o">=</span> <span class="n">duckdb</span><span class="p">.</span><span class="n">connect</span><span class="p">()</span>
<span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"ATTACH 'charade.duckdb' AS m (STORAGE_VERSION 'latest'); USE m;"</span><span class="p">)</span>
<span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"CREATE TABLE movie (i BIGINT, y USMALLINT, x USMALLINT, r UTINYINT, g UTINYINT, b UTINYINT)"</span><span class="p">)</span>

<span class="c1"># those offsets don't change between frames, so pre-compute them
</span><span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"CREATE TEMPORARY TABLE y AS SELECT unnest(list_sort(repeat(range(?), ?))) y"</span><span class="p">,</span> <span class="p">[</span><span class="n">dim_y</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">])</span>
<span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"CREATE TEMPORARY TABLE x AS SELECT unnest(repeat(range(?), ?)) x"</span><span class="p">,</span> <span class="p">[</span><span class="n">dim_x</span><span class="p">,</span> <span class="n">dim_y</span><span class="p">])</span>

<span class="c1"># loop over each frame in the movie and insert the pixel data
</span><span class="k">for</span> <span class="n">i_idx</span><span class="p">,</span> <span class="n">im</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">vid</span><span class="p">):</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">im</span><span class="p">.</span><span class="n">flatten</span><span class="p">()</span>
    <span class="n">r</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span>
    <span class="n">g</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span>
    <span class="n">b</span> <span class="o">=</span> <span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span>

    <span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'''INSERT INTO movie 
        FROM repeat(?, ?) i -- frame offset 
        POSITIONAL JOIN   y -- temp table
        POSITIONAL JOIN   x -- temp table
        POSITIONAL JOIN   r -- numpy scan
        POSITIONAL JOIN   g -- numpy scan
        POSITIONAL JOIN   b -- numpy scan
        '''</span><span class="p">,</span> <span class="p">[</span><span class="n">i_idx</span><span class="p">,</span> <span class="n">rows_per_frame</span><span class="p">])</span>
</code></pre></div></div>

<p>This script makes use of not just one, but (at least) <em>two</em> cool DuckDB features. First, we use so-called <a href="/docs/stable/clients/c/replacement_scans.html">replacement scans</a> to directly query the NumPy arrays <code class="language-plaintext highlighter-rouge">r</code>, <code class="language-plaintext highlighter-rouge">g</code>, and <code class="language-plaintext highlighter-rouge">b</code>. Note that those have not been created as tables in DuckDB nor registered in any way, but they are referenced by name in the <code class="language-plaintext highlighter-rouge">INSERT</code>. What happens here is that DuckDB inspects the Python context for the missing "tables" and finds objects with those names that it can read. The other cool feature is the <a href="/docs/stable/sql/query_syntax/from.html#positional-joins"><code class="language-plaintext highlighter-rouge">POSITIONAL JOIN</code></a>, which lets us stack multiple tables horizontally by position without running an actual (expensive) <code class="language-plaintext highlighter-rouge">JOIN</code>. This way, we assemble all the columns we need for a single frame in a bulk <code class="language-plaintext highlighter-rouge">INSERT</code>, which executes quite efficiently.</p>

<p>The movie file we have has a frame rate of 25 frames per second at a (DVD-ish) resolution of 720×392 pixels. The total runtime is 01:53:02.56, which comes down to 169 563 individual frames. Because we have a row for each pixel, we end up with 169 563 × 720 × 392 rows, or 47 857 461 120. 47 billion rows! Finally, <a href="https://motherduck.com/blog/big-data-is-dead/">Big Data</a>! When stored as a DuckDB database, however, the file size is "only" around 200 GB. Totally doable on a laptop!</p>

<p>DuckDB's <a href="/2022/10/28/lightweight-compression.html">lightweight compression</a> performs quite well here, given that a naive binary format would have to store at least 15 bytes per row. If we multiply that by the row count (47 billion, remember), we end up at around 700 GB of storage for this hypothetical naive format.</p>
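<p>Both the row count and the naive-format estimate are simple arithmetic, with the 15 bytes coming straight from the schema (8 for the <code class="language-plaintext highlighter-rouge">BIGINT</code> frame number, 2 + 2 for the pixel coordinates, 1 + 1 + 1 for the channels):</p>

```python
# 169,563 frames of 720 x 392 pixels, one row per pixel
rows = 169_563 * 720 * 392
print(f"{rows:,} rows")  # 47,857,461,120 rows

# Naive fixed-width encoding: BIGINT(8) + 2x USMALLINT(2) + 3x UTINYINT(1)
bytes_per_row = 8 + 2 + 2 + 1 + 1 + 1
naive_gb = rows * bytes_per_row / 1e9
print(f"~{naive_gb:.0f} GB uncompressed")  # ~718 GB, vs. ~200 GB in DuckDB
```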

<p>Of course, by turning the data into a relational table we materialize a bunch of previously implicit information, due to the lack of ordering in relations. If we just stored the raw pixel bytes, for example as an implicitly ordered series of BMP (bitmap) files, we would end up with the same number of bytes as there are rows, times three, or 133 GB. Even <em>including materializing all the offsets</em>, the DuckDB file still manages to end up at a comparable size (200 GB). And of course, comparing the size of the table with the MPEG-4 version of the movie is not entirely fair, because MPEG-4 is a <em>lossy</em> compression format. Databases can't just randomly decide to compromise on the numerical accuracy of the tables they store!</p>

<p>To prove that the transformation is accurate, let's try to turn the table data for one random frame back into a human-consumable picture: we will retrieve the corresponding rows from DuckDB, and use some Python magic to turn them back into a PNG image file:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">duckdb</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">import</span> <span class="nn">PIL.Image</span>

<span class="n">frame</span> <span class="o">=</span> <span class="mi">48000</span>

<span class="n">con</span> <span class="o">=</span> <span class="n">duckdb</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span><span class="s">'charade.duckdb'</span><span class="p">,</span> <span class="n">read_only</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
<span class="n">dim_y</span><span class="p">,</span> <span class="n">dim_x</span> <span class="o">=</span> <span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"SELECT max(y) + 1 dim_y, max(x) + 1 dim_x FROM movie WHERE i=0"</span><span class="p">).</span><span class="n">fetchone</span><span class="p">()</span>

<span class="n">res</span> <span class="o">=</span> <span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"SELECT r, g, b FROM movie WHERE i = ? ORDER BY y, x"</span><span class="p">,</span> <span class="p">[</span><span class="n">frame</span><span class="p">]).</span><span class="n">fetchnumpy</span><span class="p">()</span>
<span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">dim_y</span> <span class="o">*</span> <span class="n">dim_x</span> <span class="o">*</span> <span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">uint8</span><span class="p">)</span>
<span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s">'r'</span><span class="p">]</span>
<span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s">'g'</span><span class="p">]</span>
<span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">res</span><span class="p">[</span><span class="s">'b'</span><span class="p">]</span>

<span class="n">img</span> <span class="o">=</span> <span class="n">PIL</span><span class="p">.</span><span class="n">Image</span><span class="p">.</span><span class="n">fromarray</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">dim_y</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="mi">3</span><span class="p">)))</span>
<span class="n">img</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'frame.png'</span><span class="p">)</span>
</code></pre></div></div>

<p><img src="/images/blog/movies/frame.png" width="800" /></p>

<p>And voilà, a wonderful frame with Audrey and Cary appears. This trick can also be used to create a sequence of pictures and write them to an MPEG-4 file again using, for example, the <code class="language-plaintext highlighter-rouge">moviepy</code> library.</p>

<p>But now that we have a table, we can have some fun with it. Let's do some basic exploration first: we start with <code class="language-plaintext highlighter-rouge">DESCRIBE</code>, which basically tells us the schema. We knew this of course.</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">DESCRIBE</span> <span class="n">movie</span><span class="p">;</span>
</code></pre></div></div>

<div class="monospace_table"></div>

<table>
  <thead>
    <tr>
      <th>column_name</th>
      <th>column_type</th>
      <th>null</th>
      <th>key</th>
      <th>default</th>
      <th>extra</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>i</td>
      <td>BIGINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>y</td>
      <td>USMALLINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>x</td>
      <td>USMALLINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>r</td>
      <td>UTINYINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>g</td>
      <td>UTINYINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
    <tr>
      <td>b</td>
      <td>UTINYINT</td>
      <td>YES</td>
      <td>NULL</td>
      <td>NULL</td>
      <td>NULL</td>
    </tr>
  </tbody>
</table>

<p>No surprises there. How many rows are there?</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span> <span class="n">movie</span> <span class="k">SELECT</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th style="text-align: right">count_star()</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">47857461120</td>
    </tr>
  </tbody>
</table>

<p>Ah yes, 47 billion. What are the numeric properties of the columns? DuckDB has this neat <code class="language-plaintext highlighter-rouge">SUMMARIZE</code> statement that computes single-pass summary statistics on a table (or arbitrary query).</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SUMMARIZE</span> <span class="n">movie</span><span class="p">;</span>
</code></pre></div></div>

<p>This one is admittedly a bit of a flex: DuckDB can compute elaborate summary statistics on all 47 billion rows in ca. 20 minutes on a MacBook. Here are the results:</p>

<table>
  <thead>
    <tr>
      <th style="text-align: right">column_name</th>
      <th>column_type</th>
      <th style="text-align: right">min</th>
      <th style="text-align: right">max</th>
      <th style="text-align: right">approx_unique</th>
      <th style="text-align: right">avg</th>
      <th style="text-align: right">std</th>
      <th style="text-align: right">q25</th>
      <th style="text-align: right">q50</th>
      <th style="text-align: right">q75</th>
      <th style="text-align: right">count</th>
      <th style="text-align: right">null_percentage</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">i</td>
      <td>BIGINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">169562</td>
      <td style="text-align: right">150076</td>
      <td style="text-align: right">84781.0</td>
      <td style="text-align: right">48948.621846957954</td>
      <td style="text-align: right">42429</td>
      <td style="text-align: right">84751</td>
      <td style="text-align: right">127137</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
    <tr>
      <td style="text-align: right">y</td>
      <td>USMALLINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">391</td>
      <td style="text-align: right">430</td>
      <td style="text-align: right">195.5</td>
      <td style="text-align: right">113.16028455346597</td>
      <td style="text-align: right">98</td>
      <td style="text-align: right">196</td>
      <td style="text-align: right">294</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
    <tr>
      <td style="text-align: right">x</td>
      <td>USMALLINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">719</td>
      <td style="text-align: right">840</td>
      <td style="text-align: right">359.5</td>
      <td style="text-align: right">207.84589644146592</td>
      <td style="text-align: right">180</td>
      <td style="text-align: right">359</td>
      <td style="text-align: right">540</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
    <tr>
      <td style="text-align: right">r</td>
      <td>UTINYINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">255</td>
      <td style="text-align: right">252</td>
      <td style="text-align: right">65.32575855816732</td>
      <td style="text-align: right">44.85627602555231</td>
      <td style="text-align: right">27</td>
      <td style="text-align: right">54</td>
      <td style="text-align: right">96</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
    <tr>
      <td style="text-align: right">g</td>
      <td>UTINYINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">249</td>
      <td style="text-align: right">249</td>
      <td style="text-align: right">56.79713844669577</td>
      <td style="text-align: right">37.03562456032193</td>
      <td style="text-align: right">28</td>
      <td style="text-align: right">44</td>
      <td style="text-align: right">77</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
    <tr>
      <td style="text-align: right">b</td>
      <td>UTINYINT</td>
      <td style="text-align: right">0</td>
      <td style="text-align: right">255</td>
      <td style="text-align: right">252</td>
      <td style="text-align: right">43.249715985643995</td>
      <td style="text-align: right">38.39218963268899</td>
      <td style="text-align: right">16</td>
      <td style="text-align: right">28</td>
      <td style="text-align: right">61</td>
      <td style="text-align: right">47857461120</td>
      <td style="text-align: right">0.00</td>
    </tr>
  </tbody>
</table>

<p>Since we're basically storing a lot of colors, just how many different combinations of red, green and blue are there, DuckDB?</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span> <span class="p">(</span><span class="k">FROM</span> <span class="n">movie</span> <span class="k">SELECT</span> <span class="k">DISTINCT</span> <span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span>
<span class="k">SELECT</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">);</span>
</code></pre></div></div>

<p>Any seasoned data engineer would rightfully caution you against running a <code class="language-plaintext highlighter-rouge">DISTINCT</code> on this many rows. Too many production outages have been caused by overflowing aggregations. But thanks to DuckDB's <a href="/2024/03/29/external-aggregation.html">larger-than-memory aggregate hash table</a>, we can confidently issue this query. We even get a nice progress bar and (since v1.4.0) a surprisingly accurate estimate of how long the query will take.</p>

<table>
  <thead>
    <tr>
      <th style="text-align: right">count_star()</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">826568</td>
    </tr>
  </tbody>
</table>

<p>So roughly 800 thousand distinct colors. Computing this took about 2 minutes in the end. But what are the frequencies of those colors? Let's compute a histogram of the 10 most-used colors!</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span> <span class="n">movie</span>
<span class="k">SELECT</span> <span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="nf">count</span><span class="p">(</span><span class="o">*</span><span class="p">)</span> <span class="k">AS</span> <span class="n">ct</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="k">ALL</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">ct</span> <span class="k">DESC</span>
<span class="k">LIMIT</span> <span class="mi">10</span><span class="p">;</span>
</code></pre></div></div>

<table>
  <thead>
    <tr>
      <th style="text-align: right">r</th>
      <th style="text-align: right">g</th>
      <th style="text-align: right">b</th>
      <th style="text-align: right">ct</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="text-align: right">17</td>
      <td style="text-align: right">20</td>
      <td style="text-align: right">15</td>
      <td style="text-align: right">106521429</td>
    </tr>
    <tr>
      <td style="text-align: right">23</td>
      <td style="text-align: right">25</td>
      <td style="text-align: right">15</td>
      <td style="text-align: right">93004303</td>
    </tr>
    <tr>
      <td style="text-align: right">23</td>
      <td style="text-align: right">25</td>
      <td style="text-align: right">13</td>
      <td style="text-align: right">85552738</td>
    </tr>
    <tr>
      <td style="text-align: right">13</td>
      <td style="text-align: right">22</td>
      <td style="text-align: right">15</td>
      <td style="text-align: right">81734796</td>
    </tr>
    <tr>
      <td style="text-align: right">22</td>
      <td style="text-align: right">24</td>
      <td style="text-align: right">13</td>
      <td style="text-align: right">76560295</td>
    </tr>
    <tr>
      <td style="text-align: right">24</td>
      <td style="text-align: right">26</td>
      <td style="text-align: right">15</td>
      <td style="text-align: right">75376896</td>
    </tr>
    <tr>
      <td style="text-align: right">15</td>
      <td style="text-align: right">19</td>
      <td style="text-align: right">8</td>
      <td style="text-align: right">74285763</td>
    </tr>
    <tr>
      <td style="text-align: right">23</td>
      <td style="text-align: right">24</td>
      <td style="text-align: right">19</td>
      <td style="text-align: right">72904497</td>
    </tr>
    <tr>
      <td style="text-align: right">22</td>
      <td style="text-align: right">24</td>
      <td style="text-align: right">12</td>
      <td style="text-align: right">69269099</td>
    </tr>
    <tr>
      <td style="text-align: right">24</td>
      <td style="text-align: right">26</td>
      <td style="text-align: right">16</td>
      <td style="text-align: right">62230136</td>
    </tr>
  </tbody>
</table>

<p>The most common colors here seem to be dark shades of grey. Makes sense! Keep in mind that MPEG-4 compression is lossy and will probably have produced some odd colors as rounding artifacts.</p>

<p>But we can also have some more fun. We have an analytical database system. How about we compute the average frame for every thousand frames and stitch the results back into a movie? It's just a big aggregation. We first create the actual averages:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">averages</span> <span class="k">AS</span>
    <span class="k">FROM</span> <span class="n">movie</span>
    <span class="k">SELECT</span>
        <span class="n">i</span> <span class="o">//</span> <span class="mi">1000</span> <span class="k">AS</span> <span class="n">idx</span><span class="p">,</span>
        <span class="n">y</span><span class="p">,</span>
        <span class="n">x</span><span class="p">,</span>
        <span class="nf">avg</span><span class="p">(</span><span class="n">r</span><span class="p">)::</span><span class="nb">UTINYINT</span> <span class="k">AS</span> <span class="n">r</span><span class="p">,</span>
        <span class="nf">avg</span><span class="p">(</span><span class="n">g</span><span class="p">)::</span><span class="nb">UTINYINT</span> <span class="k">AS</span> <span class="n">g</span><span class="p">,</span>
        <span class="nf">avg</span><span class="p">(</span><span class="n">b</span><span class="p">)::</span><span class="nb">UTINYINT</span> <span class="k">AS</span> <span class="n">b</span>
<span class="k">GROUP</span> <span class="k">BY</span> <span class="k">ALL</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">idx</span><span class="p">,</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">;</span>
</code></pre></div></div>

<p>Then, we use Python again to turn this <code class="language-plaintext highlighter-rouge">averages</code> table into a movie:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># some setup omitted
</span>
<span class="c1"># fetch a bunch of frames in bulk
</span><span class="n">res</span> <span class="o">=</span> <span class="n">con</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">"SELECT r, g, b FROM averages ORDER BY idx, y, x"</span><span class="p">).</span><span class="n">fetchnumpy</span><span class="p">()</span>

<span class="c1"># split the rgb arrays by frame again
</span><span class="n">r_splits</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="s">'r'</span><span class="p">],</span> <span class="n">num_frames</span><span class="p">)</span>
<span class="n">g_splits</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="s">'g'</span><span class="p">],</span> <span class="n">num_frames</span><span class="p">)</span>
<span class="n">b_splits</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">split</span><span class="p">(</span><span class="n">res</span><span class="p">[</span><span class="s">'b'</span><span class="p">],</span> <span class="n">num_frames</span><span class="p">)</span>

<span class="c1"># generate pictures
</span><span class="n">image_files</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">num_frames</span><span class="p">):</span>
    <span class="n">v</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">dim_y</span> <span class="o">*</span> <span class="n">dim_x</span> <span class="o">*</span> <span class="mi">3</span><span class="p">,</span> <span class="n">dtype</span><span class="o">=</span><span class="n">np</span><span class="p">.</span><span class="n">uint8</span><span class="p">)</span>
    <span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">r_splits</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
    <span class="n">v</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">g_splits</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
    <span class="n">v</span><span class="p">[</span><span class="mi">2</span><span class="p">:</span><span class="nb">len</span><span class="p">(</span><span class="n">v</span><span class="p">):</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="n">b_splits</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
    <span class="n">image_files</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">reshape</span><span class="p">((</span><span class="n">dim_y</span><span class="p">,</span> <span class="n">dim_x</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span> <span class="n">order</span><span class="o">=</span><span class="s">'C'</span><span class="p">))</span>

<span class="c1"># write movie file
</span><span class="n">clip</span> <span class="o">=</span> <span class="n">moviepy</span><span class="p">.</span><span class="n">video</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">ImageSequenceClip</span><span class="p">.</span><span class="n">ImageSequenceClip</span><span class="p">(</span><span class="n">image_files</span><span class="p">,</span> <span class="n">fps</span><span class="o">=</span><span class="mi">25</span><span class="p">)</span>
<span class="n">clip</span><span class="p">.</span><span class="n">write_videofile</span><span class="p">(</span><span class="s">'averages.mp4'</span><span class="p">)</span>
</code></pre></div></div>

<p>There is some wrangling here because we want to retrieve the whole frame dataset in bulk rather than run a query for every single frame. We then use NumPy to split the arrays into frames and stitch the RGB channels together into the three-dimensional array that image libraries expect. This does not achieve any business purpose, but the results are kind of funny. Here is average frame #68, with apologies to the actors:</p>
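<p>The channel interleave can be sketched on a toy frame. Here is a minimal, self-contained example (the array values are made up) showing that the strided assignment used in the loop is equivalent to stacking the channels along the last axis:</p>

```python
import numpy as np

# Toy 2x2 "frame": flat per-channel arrays in row-major (y, x) order,
# as fetchnumpy() would hand them back (the values here are made up).
dim_y, dim_x = 2, 2
r = np.array([255, 0, 0, 10], dtype=np.uint8)
g = np.array([0, 255, 0, 20], dtype=np.uint8)
b = np.array([0, 0, 255, 30], dtype=np.uint8)

# Strided assignment, as in the loop above: r, g, b interleaved per pixel.
v = np.zeros(dim_y * dim_x * 3, dtype=np.uint8)
v[0::3], v[1::3], v[2::3] = r, g, b
frame = v.reshape((dim_y, dim_x, 3), order="C")

# np.stack along the last axis produces the same array in one step.
stacked = np.stack([r, g, b], axis=-1).reshape((dim_y, dim_x, 3))
assert (frame == stacked).all()

print(frame[0, 0])  # top-left pixel: [255 0 0], i.e. pure red
```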

<p><img src="/images/blog/movies/average_frame_68.png" width="800" /></p>

<p>We can also stitch all the averages together to make a somewhat twitchy average movie:</p>

<details>
  <summary>
Click here to see the twitchy movie generated from “Charade”:
</summary>
  <p><img src="https://blobs.duckdb.org/data/movie-averages.gif" width="800" /></p>
</details>

<p>For some added fun, we could even write a SQL query that turns a frame into an HTML table with one-pixel cells. Below is the result; let's hope your browser can render it, and let's thank Cloudflare again for <a href="/foundation/#technical-sponsors">sponsoring our traffic</a>. Here is the somewhat unholy query to generate it:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span> <span class="s1">'&lt;html&gt;&lt;body&gt;&lt;table style="padding:0px; margin: 0px; border-collapse: collapse;"&gt;'</span><span class="p">;</span>
<span class="k">FROM</span> <span class="n">movie</span>
<span class="k">SELECT</span>
    <span class="k">IF</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="s1">'&lt;tr&gt;'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span> <span class="o">||</span>
    <span class="nf">printf</span><span class="p">(</span><span class="s1">'&lt;td style="background-color: #%02x%02x%02x; height: 1px; width: 1px"; &gt;&lt;/td&gt;'</span><span class="p">,</span> <span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="o">||</span>
    <span class="k">IF</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="mi">719</span><span class="p">,</span> <span class="s1">'&lt;/tr&gt;'</span><span class="p">,</span> <span class="s1">''</span><span class="p">)</span>
<span class="k">WHERE</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">48000</span>
<span class="k">ORDER</span> <span class="k">BY</span> <span class="n">y</span><span class="p">,</span> <span class="n">x</span><span class="p">;</span>
<span class="k">SELECT</span> <span class="s1">'&lt;/table&gt;&lt;/body&gt;&lt;/html&gt;'</span><span class="p">;</span>
</code></pre></div></div>

<p>You can see the result in <a href="https://blobs.duckdb.org/data/movies-table.html"><code class="language-plaintext highlighter-rouge">movies-table.html</code></a> (keep in mind that it's 20 MB and renders each movie pixel as a table cell!).</p>
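<p>For illustration, the same pixels-to-table transformation can be sketched in plain Python (on made-up toy pixels, not the actual movie table), mirroring the <code class="language-plaintext highlighter-rouge">printf</code> and row-boundary logic of the query:</p>

```python
# Toy (y, x, r, g, b) pixels for a 1x2 "frame" (the values are made up).
pixels = [
    (0, 0, 255, 0, 0),   # red pixel
    (0, 1, 0, 0, 255),   # blue pixel
]
dim_x = 2

parts = ['<table style="padding:0px; margin: 0px; border-collapse: collapse;">']
for y, x, r, g, b in pixels:
    if x == 0:                 # open a row at the left edge, like IF(x = 0, ...)
        parts.append("<tr>")
    # one 1x1 cell per pixel, colored via the same #rrggbb hex format
    parts.append(
        f'<td style="background-color: #{r:02x}{g:02x}{b:02x}; '
        f'height: 1px; width: 1px"></td>'
    )
    if x == dim_x - 1:         # close the row at the right edge
        parts.append("</tr>")
parts.append("</table>")
html = "".join(parts)
print(html)
```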

<h2 id="conclusion">Conclusion</h2>

<p>You can probably tell that this post is not entirely serious. Fun was had. But what did we learn? A few things: first, basically anything can be represented as a table, even an obscure 1963 movie. In the grand scheme of things, it is probably not a great idea: there are amazing open-source libraries like <code class="language-plaintext highlighter-rouge">ffmpeg</code> and apps like <code class="language-plaintext highlighter-rouge">VLC</code> to deal with movie files, and the same goes for their array cousins that contain music or plain images. Despite the massive blow-up into billions of rows of data, DuckDB handled this pretty well, both in its data format and in its execution engine. Here at team DuckDB, our <a href="https://www.youtube.com/watch?v=TsWNMwH1NyM">mission</a> is to raise your overall confidence wrangling data of all shapes and sizes, and we hope this post contributes to that. And to finish up: do pay attention to your copyright notices!</p>]]></content><author><name>{&quot;twitter&quot; =&gt; &quot;hfmuehleisen&quot;, &quot;picture&quot; =&gt; &quot;/images/blog/authors/hannes_muehleisen.jpg&quot;}</name></author><summary type="html"><![CDATA[You can store and even process videos in DuckDB. In this post, we show you how.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://duckdb.org/images/blog/thumbs/movies-in-databases.png" /><media:content medium="image" url="https://duckdb.org/images/blog/thumbs/movies-in-databases.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>