I’ve self-hosted an n8n instance to automate video uploads to YouTube. The idea was pretty simple: when I put an mp4 file in a specific dir on my NAS, n8n would pick it up, along with a metadata.json file. The metadata file would contain things like the title and description, and n8n would use it to upload the video to YouTube.
I started building this automation and pretty soon had something resembling the finished workflow, so I started testing it out. However, I ran into a problem. I had a Read/Write Files from Disk node in my workflow, and whenever I manually executed it, it took way too long to finish - a couple of minutes. My first thought was: OK, the video files are pretty large, and we’re likely loading them into memory. Then again, it shouldn’t take that long, as the video I was testing with was only about 200MB. I tried playing around with it a bit more but couldn’t get it to work properly. My spare time ran out, so I left it as is, planning to continue when I next had time for it.
A few days passed, and I noticed that the S3 bucket I use for off-site backups had started rapidly increasing in size. My backup script takes a bunch of directories on my NAS that I consider critical - things like my database dumps, configuration files and similar - and backs them up to an S3 bucket using kopia. The backup is usually pretty small: the database dumps are around a hundred MB, and the other files take a similar amount of space. It was very unusual to see the backup gaining a couple of gigabytes a day, so I went to investigate.
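Conceptually, the backup boils down to a handful of kopia calls along these lines (the paths here are made up for the example, and the S3 repository is assumed to already be connected):

# snapshot the directories considered critical (example paths)
kopia snapshot create /nas/backups/db-dumps
kopia snapshot create /nas/backups/config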
I looked at the backup and noticed that my database dump had gone from 104 MB one day to 2.37 GB the next. I quickly determined that the culprit was the public.execution_data table in the n8n database, by first running this query:
SELECT
datname AS database_name,
pg_size_pretty(pg_database_size(datname)) AS size
FROM
pg_database
ORDER BY
pg_database_size(datname) DESC;
This helped me determine which database was the culprit. I then executed the following on the relevant database:
SELECT
schemaname || '.' || relname AS table_name,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM
pg_catalog.pg_statio_user_tables
ORDER BY
pg_total_relation_size(relid) DESC;
This listed public.execution_data as the one to blame.
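If you already suspect a specific table, you can also spot-check its total size (including indexes and TOAST) directly:

SELECT pg_size_pretty(pg_total_relation_size('public.execution_data'));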
How so? Well… apparently, when you use the Read/Write Files from Disk node in n8n, the run’s data - including the binary file - is saved as-is in that table by default, meaning that every execution stored the entire mp4 file in the database. This is also why the node took so long to execute.
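If you want to see this for yourself, a query along these lines lists the heaviest executions. Note that the column names (executionId, data) are an assumption about the n8n schema and may differ between versions:

-- largest rows in execution_data, by the size of the serialized run data
SELECT
    "executionId",
    pg_size_pretty(octet_length(data)::bigint) AS data_size
FROM
    public.execution_data
ORDER BY
    octet_length(data) DESC
LIMIT 10;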
To alleviate this, you should set the env var N8N_DEFAULT_BINARY_DATA_MODE to filesystem, as per the n8n docs. Removing the relevant workflow runs from n8n and running a Postgres vacuum shrunk the database dump back to around 100 MB. The step also now takes seconds, not minutes, to complete.
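The setting itself is just an environment variable on the n8n container or process, for example:

# in the n8n container's environment (docker compose, .env, systemd unit, etc.)
N8N_DEFAULT_BINARY_DATA_MODE=filesystem

For what it’s worth, the dump shrinks as soon as the rows are deleted; a plain VACUUM only marks the freed space as reusable inside Postgres, while getting the disk space itself back takes a VACUUM FULL, which locks the table while it rewrites it.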
Having this data stored in the database allows n8n runs to be repeatable, but it doesn’t play very nicely if you’re working with larger files, such as video. I haven’t tinkered with it much yet, but I’d guess that using the filesystem mode means that if you move the file on the filesystem, you’re no longer able to fully reproduce a workflow run in n8n. However, if you’re self-hosting, working with large files and don’t really care about the ability to reproduce a run, it’s probably best to use the filesystem mode.