Add functionality to index grib files as a grib file is written by ColemanTom · Pull Request #494 · ecmwf/eccodes

ColemanTom · 2026-06-17T03:29:02Z

Description

Previous workflow:

create grib file
save it
create index from that grib file

New workflow:

create message for grib file
save message
add message details to index file
repeat for remaining messages in file

The GRIB file data no longer needs to be handled twice, which saves significant time for this pattern.
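The bookkeeping behind the new workflow can be sketched with plain stdio: capture the output position just before each message is written, then record it. This is only an illustration of the pattern, not the ecCodes implementation; `entry_t`, `write_and_record`, and the fixed-size table are hypothetical names introduced here.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical index entry: where a message starts and how long it is. */
typedef struct {
    long   offset;  /* byte offset of the message in the output file */
    size_t length;  /* message length in bytes */
} entry_t;

/* Append one message to `out` and record its location in entries[*count].
 * Returns 0 on success, -1 on a write error or when the table is full. */
int write_and_record(FILE* out, const void* msg, size_t len,
                     entry_t* entries, size_t capacity, size_t* count)
{
    if (*count >= capacity) return -1;

    long offset = ftell(out);              /* position before the write */
    if (offset < 0) return -1;

    if (fwrite(msg, 1, len, out) != len) return -1;

    entries[*count].offset = offset;       /* index the message only once it is written */
    entries[*count].length = len;
    (*count)++;
    return 0;
}
```

Calling `write_and_record` once per message yields the offsets in a single pass, so the output file never has to be re-read to build the index.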

NOTE: I only tested writing a GRIB file on the fly. In theory it should work for BUFR too. I have confirmed that BUFR indexes made before and after this change are identical.

Test cases

I ran both workflows above with the new build, and the output index files are bit-identical. I also compared against an index created using 2.36.0, and the results were still bit-identical.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* off_t */
#include <eccodes.h>

#define MAX_PARAMS 100

int matches(long paramId, int n, long list[]) {
    for (int i = 0; i < n; i++) {
        if (paramId == list[i]) return 1;
    }
    return 0;
}

int main(int argc, char* argv[]) {
    const char *input = NULL, *output = NULL, *index_keys = NULL;
    char  *output_idx = NULL;

    long params[MAX_PARAMS];
    int n_params = 0;
    int idx_err = 0;
    codes_index* idx = NULL;   /* stays NULL unless --index is given */

    /* Parse args */
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--input") == 0 && i + 1 < argc) {
            input = argv[++i];
        } else if (strcmp(argv[i], "--output") == 0 && i + 1 < argc) {
            output = argv[++i];
        } else if (strcmp(argv[i], "--params") == 0) {
            while (i + 1 < argc && strncmp(argv[i + 1], "--", 2) != 0) {
                if (n_params >= MAX_PARAMS) {
                    fprintf(stderr, "Too many params (max %d)\n", MAX_PARAMS);
                    return 1;
                }
                params[n_params++] = atol(argv[++i]);
            }
        } else if (strcmp(argv[i], "--index") == 0 && i + 1 < argc) {
            index_keys = argv[++i];
        }
    }

    if (!input || !output || n_params == 0) {
        fprintf(stderr, "Usage: --input file --output file --params id1 id2 ... [--index keys]\n");
        return 1;
    }

    FILE* fin = fopen(input, "rb");
    if (!fin) {
        perror("Input file");
        return 1;
    }

    FILE* fout = fopen(output, "wb");
    if (!fout) {
        perror("Output file");
        fclose(fin);
        return 1;
    }

    if (index_keys) {
        size_t len = strlen(output) + 5;   /* room for ".idx" and the terminating NUL */
        output_idx = malloc(len);
        if (!output_idx) {
            fprintf(stderr, "Memory allocation failed\n");
            fclose(fin);
            fclose(fout);
            return 1;
        }
        strcpy(output_idx, output);
        strcat(output_idx, ".idx");
        idx = codes_index_new(NULL, index_keys, &idx_err);
        if (!idx) {
            fprintf(stderr, "Index create failed: %s\n", codes_get_error_message(idx_err));
            /* non-fatal: continue without indexing */
        }
    }

    int err = 0;
    codes_handle* h = NULL;

    while (1) {
        h = codes_handle_new_from_file(NULL, fin, PRODUCT_GRIB, &err);

        if (!h) {
            if (err == CODES_SUCCESS) break;  /* EOF */
            fprintf(stderr, "Read error: %s\n", codes_get_error_message(err));
            break;
        }

        long paramId;
        if (codes_get_long(h, "paramId", &paramId) == CODES_SUCCESS) {

            if (matches(paramId, n_params, params)) {
                const void* buffer = NULL;
                size_t size = 0;

                if (codes_get_message(h, &buffer, &size) == CODES_SUCCESS) {
                    if (buffer && size > 0) {
                        /* Record where this message starts before writing it */
                        off_t offset = idx ? ftello(fout) : 0;
                        if (fwrite(buffer, 1, size, fout) != size) {
                            perror("Output file");
                        } else if (idx) {
                            codes_index_add_message(idx, h, output, offset);
                        }
                    }
                }
            }
        }

        codes_handle_delete(h);
    }

    if (idx) {
        fflush(fout);   /* ensure all bytes are visible before index is used */
        codes_index_write(idx, output_idx);   /* optional: persist the index */
        codes_index_delete(idx);
    }
    free(output_idx);   /* free(NULL) is a no-op when --index was not given */

    fclose(fin);
    fclose(fout);

    return 0;
}

I ran this and collected timings with

#!/bin/bash
set -euo pipefail

./a.out \
    --input ~/tmp/ACCESS-G_2025051218_008.model.grb2 \
    --output separate.grib2 \
    --params 132 130 156 133 260238 157 247

grib_index_build -k paramId,level -o separate.grib2.ix separate.grib2

and

#!/bin/bash
set -euo pipefail

./index.out \
    --input ~/tmp/ACCESS-G_2025051218_008.model.grb2 \
    --output combined.grib2 \
    --params 132 130 156 133 260238 157 247 \
    --index paramId,level
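
The bit-identical check described under Test cases can be sketched as a byte-by-byte comparison. This is a minimal illustration, not the tool actually used for the PR; `files_identical` is a hypothetical helper name.

```c
#include <stdio.h>

/* Compare two files byte by byte.
 * Returns 1 if identical, 0 if they differ, -1 if either cannot be opened. */
int files_identical(const char* path_a, const char* path_b)
{
    FILE* a = fopen(path_a, "rb");
    FILE* b = fopen(path_b, "rb");
    int result = 1;

    if (!a || !b) {
        result = -1;
    } else {
        int ca, cb;
        do {
            ca = fgetc(a);
            cb = fgetc(b);
            if (ca != cb) {        /* also catches one file ending early */
                result = 0;
                break;
            }
        } while (ca != EOF);
    }

    if (a) fclose(a);
    if (b) fclose(b);
    return result;
}
```

For the two scripts above, the call would be `files_identical("separate.grib2.ix", "combined.grib2.idx")`, since the test program appends `.idx` to the output name. The standard `cmp` utility performs the same check from the shell.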

I also tested the Fortran bindings, and they work too:

program grib_filter_index
  use eccodes
  implicit none

  integer            :: fin, fout, idx_id
  integer            :: igrib, iret, i
  integer(kind=8)    :: out_offset, total_length
  integer(kind=4)    :: paramId
  character(len=512) :: input_file, output_file, output_idx
  character(len=256) :: index_keys
  character(len=512) :: arg
  integer            :: nargs, iarg, n_params
  integer, parameter :: MAX_PARAMS = 100
  integer(kind=4)    :: params(MAX_PARAMS)
  logical            :: do_index, matched

  input_file  = ''
  output_file = ''
  index_keys  = ''
  n_params    = 0
  do_index    = .false.

  ! --- Parse command line ---
  nargs = command_argument_count()
  iarg  = 1
  do while (iarg <= nargs)
    call get_command_argument(iarg, arg)
    select case (trim(arg))
    case ('--input')
      iarg = iarg + 1;  call get_command_argument(iarg, input_file)
    case ('--output')
      iarg = iarg + 1;  call get_command_argument(iarg, output_file)
    case ('--index')
      iarg = iarg + 1;  call get_command_argument(iarg, index_keys)
      do_index = .true.
    case ('--params')
      do
        if (iarg >= nargs) exit
        call get_command_argument(iarg + 1, arg)
        if (arg(1:2) == '--') exit
        iarg = iarg + 1
        if (n_params >= MAX_PARAMS) then
          write(0,*) 'Too many params (max', MAX_PARAMS, ')'
          stop 1
        end if
        n_params = n_params + 1
        read(arg, *) params(n_params)
      end do
    end select
    iarg = iarg + 1
  end do

  if (len_trim(input_file) == 0 .or. len_trim(output_file) == 0 .or. n_params == 0) then
    write(0,*) 'Usage: --input f --output f --params id1 id2 ... [--index keys]'
    stop 1
  end if

  ! --- Open files ---
  call grib_open_file(fin,  trim(input_file),  'r', iret)
  if (iret /= GRIB_SUCCESS) then
    write(0,*) 'Cannot open input:', trim(input_file)
    stop 1
  end if

  call grib_open_file(fout, trim(output_file), 'w', iret)
  if (iret /= GRIB_SUCCESS) then
    write(0,*) 'Cannot open output:', trim(output_file)
    stop 1
  end if

  ! --- Create empty index if requested ---
  if (do_index) then
    output_idx = trim(output_file) // '.idx'
    call grib_index_new(idx_id, trim(index_keys), iret)
    if (iret /= GRIB_SUCCESS) then
      write(0,*) 'Warning: index create failed:', iret, ' - continuing without index'
      do_index = .false.
    end if
  end if

  out_offset = 0_8

  ! --- Main loop ---
  do
    call grib_new_from_file(fin, igrib, iret)
    if (iret == GRIB_END_OF_FILE) exit
    if (iret /= GRIB_SUCCESS) then
      write(0,*) 'Read error:', iret
      exit
    end if

    call grib_get(igrib, 'paramId', paramId)

    matched = .false.
    do i = 1, n_params
      if (paramId == params(i)) then
        matched = .true.
        exit
      end if
    end do

    if (matched) then
      call grib_get(igrib, 'totalLength', total_length)

      if (do_index) then
        call grib_index_add_message(idx_id, igrib, trim(output_file), out_offset, iret)
        if (iret /= GRIB_SUCCESS) &
          write(0,*) 'Warning: index_add_message failed:', iret
      end if

      call grib_write(igrib, fout)
      out_offset = out_offset + total_length
    end if

    call grib_release(igrib)
  end do

  ! --- Write index and clean up ---
  if (do_index) then
    call grib_index_write(idx_id, trim(output_idx), iret)
    if (iret /= GRIB_SUCCESS) write(0,*) 'Warning: index write failed:', iret
    call grib_index_release(idx_id)
    write(*,*) 'Index written to:', trim(output_idx)
  end if

  call grib_close_file(fin)
  call grib_close_file(fout)
  write(*,*) 'Written', out_offset, 'bytes to', trim(output_file)

end program grib_filter_index

Timing details

I ran the two workflows mentioned above on three different disk types: /dev/shm, an NFS disk and a Lustre disk (no flash storage). In all three cases the new approach was faster, cutting the average run time by roughly 42% to 62%. The number of messages being indexed is 490.

Run #	/dev/shm Separate	/dev/shm Combined	NFS Separate	NFS Combined	Lustre Separate	Lustre Combined
1	11.03	4.7	26.15	10.06	12.79	7.32
2	10.72	5.23	25.97	11.18	14.16	8.19
3	11.23	6	30.34	10.6	14.03	8.32
4	10.81	5.23	27.47	9.67	13.92	8.22
Average	10.9475	5.29	27.4825	10.3775	13.725	8.0125
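
As a worked check on the averages in the table above, the speedup of the combined workflow is the separate time divided by the combined time. The helper names `speedup` and `print_speedups` are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Speedup of the combined workflow relative to the separate one. */
double speedup(double separate, double combined)
{
    return separate / combined;
}

/* Print the speedup for each disk type, using the table's averages. */
void print_speedups(void)
{
    const char*  disk[]     = { "/dev/shm", "NFS", "Lustre" };
    const double separate[] = { 10.9475, 27.4825, 13.725 };
    const double combined[] = { 5.29, 10.3775, 8.0125 };

    for (int i = 0; i < 3; i++)
        printf("%-8s %.2fx\n", disk[i], speedup(separate[i], combined[i]));
}
```

With these averages the ratios come out at roughly 2.1x on /dev/shm, 2.6x on NFS and 1.7x on Lustre.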

What has not been done?

There are no Python bindings yet. I think they would be worth having, but I have not looked at where they are set up. I believe that is in python-eccodes, but I need to confirm.

Contributor Declaration

By opening this pull request, I affirm the following:

All authors agree to the Contributor License Agreement.
The code follows the project's coding standards.
I have performed self-review and added comments where needed.
I have added or updated tests to verify that my changes are effective and functional.
I have run all existing tests and confirmed they pass.

github-actions Bot added the contributor label Jun 17, 2026

ColemanTom wants to merge 1 commit into ecmwf:develop from ColemanTom:index_at_grib_write_time
