Change log
This change log covers all releases and lists the new features and breaking changes in each. Best viewed here.
Unreleased
New features
Breaking changes:
Deprecations:
Bug fixes
Grain 0.2.13 (October 15, 2025)
New features
Adds reseed_each_epoch option to MapDataset.repeat that allows replaying the first epoch exactly if set to False (True by default).
Introduces grain.experimental.RebatchIterDataset for efficient rebatching.
Migrates data loader to use dataset API under the hood.
Improves first-fit packing speed by up to 12x.
Adds best-fit packing implementation which reduces padding in benchmarks by over 27% compared to first-fit.
Adds max_sequences_per_bin to packing transformations to limit the number of sequences packed into a single bin.
Introduces grain.experimental.RepeatIterDataset.
Adds custom batching function support to grain.DataLoader.
Adds grain.experimental.FlatMapTransform support to grain.DataLoader.
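The packing entries above can be illustrated with a toy comparison. The sketch below is not Grain's implementation: `pack` and `padding` are hypothetical helpers, and each sequence is represented only by its length. It shows the general difference between the two strategies (first-fit takes the first bin with room, best-fit takes the bin that would be left with the least free space) and how a cap such as max_sequences_per_bin limits the number of sequences per bin.

```python
def pack(lengths, bin_size, strategy="first_fit", max_sequences_per_bin=None):
    """Toy packer (hypothetical, not Grain's code): returns bins as lists of lengths."""
    bins = []
    for length in lengths:
        if length > bin_size:
            raise ValueError(f"sequence of length {length} exceeds bin size {bin_size}")
        # Indices of bins that have room and are under the per-bin sequence cap.
        candidates = [
            i for i, b in enumerate(bins)
            if sum(b) + length <= bin_size
            and (max_sequences_per_bin is None or len(b) < max_sequences_per_bin)
        ]
        if not candidates:
            bins.append([length])  # no bin fits, so open a new one
        elif strategy == "first_fit":
            bins[candidates[0]].append(length)
        else:
            # best_fit: choose the bin left with the least free space afterwards.
            best = min(candidates, key=lambda i: bin_size - sum(bins[i]) - length)
            bins[best].append(length)
    return bins


def padding(bins, bin_size):
    """Total number of unused (padded) positions across all bins."""
    return sum(bin_size - sum(b) for b in bins)


lengths = [4, 7, 3, 6]
first = pack(lengths, bin_size=10, strategy="first_fit")
best = pack(lengths, bin_size=10, strategy="best_fit")
capped = pack(lengths, bin_size=10, max_sequences_per_bin=1)
```

For this input, first-fit needs three bins with 10 padded positions in total, best-fit needs two bins with no padding, and the capped run places every sequence in its own bin. The figures are specific to this toy input and say nothing about the benchmark numbers quoted above.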
Breaking changes:
SliceMapDataset updated to use the full index relative to the parent dataset, instead of index % len(self).
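One way to read this entry is sketched below. This is an interpretation under stated assumptions, not Grain's source: `ToyParent`, `old_lookup`, and `new_lookup` are hypothetical names, and the parent is assumed to return different elements in later epochs (for example because it reshuffles). Under that assumption, wrapping the index with index % len(self) always reads from the parent's first epoch, whereas a full index relative to the parent reaches the matching epoch.

```python
class ToyParent:
    """Hypothetical parent dataset that tags each element with its epoch."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, index):
        epoch, position = divmod(index, self.n)
        return (epoch, position)


def old_lookup(parent, start, step, length, index):
    # Assumed old behavior: wrap the index into the slice, then map to the parent.
    return parent[start + (index % length) * step]


def new_lookup(parent, start, step, length, index):
    # Assumed new behavior: keep the epoch and offset into the parent accordingly.
    epoch, position = divmod(index, length)
    return parent[epoch * len(parent) + start + position * step]


parent = ToyParent(10)
# Slice [2:8:2] selects parent positions 2, 4, 6, so the slice has length 3.
start, step, length = 2, 2, 3
first_epoch = new_lookup(parent, start, step, length, 0)
old_second_epoch = old_lookup(parent, start, step, length, 4)
new_second_epoch = new_lookup(parent, start, step, length, 4)
```

In this toy setup both lookups agree within the first epoch, but index 4 resolves to the parent's epoch 0 under the wrapped index and to epoch 1 under the full index.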
Deprecations:
Graduates grain.experimental.apply_transformations to grain.{MapDataset|IterDataset}.apply. The experimental API will soon be deprecated.
Bug fixes
Fixes memory leak on ThreadPrefetchDatasetIterator deletion.
Grain 0.2.12 (August 21, 2025)
New features:
Adds Windows build.
Allows passing read_kwargs to ParquetIterDataset for configuring parquet file reading.
ThreadPrefetchDatasetIterator now supports non-Grain iterators that support checkpointing.
Introduces API for device prefetch: grain.experimental.device_put() for easy CPU and device prefetching.
Introduces API for autotuning: given user-provided RAM restrictions and a specific IterDataset, finds the number of processes for mp_prefetch and the buffer size for PrefetchDatasetIterator.
Allows passing reader_options to ArrayRecordDataSource for configuring array record file reading.
Introduces grain.experimental.batch_and_pad for padding a partial batch to avoid dropping batch remainder data.
Grain interleave optimization: allows creating more threads to start iterators and prefetch elements in parallel.
Allows alternative slicing of the data for MultiprocessPrefetchIterDataset. The new slicing lets each worker process read unique file shards, which improves performance.
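The idea behind the batch_and_pad entry above, padding the final partial batch instead of dropping it, can be sketched in plain Python. The helper names `toy_batch_and_pad` and `toy_batch_drop_remainder` and the `pad_value` parameter are assumptions made for illustration; they do not reflect the signature or behavior of grain.experimental.batch_and_pad.

```python
def toy_batch_drop_remainder(items, batch_size):
    """Batches items and drops a trailing partial batch (hypothetical helper)."""
    full = len(items) - len(items) % batch_size
    return [list(items[i:i + batch_size]) for i in range(0, full, batch_size)]


def toy_batch_and_pad(items, batch_size, pad_value=0):
    """Batches items and pads a trailing partial batch (hypothetical helper)."""
    batches = []
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        # Fill the last batch up to batch_size so no element is discarded.
        batch += [pad_value] * (batch_size - len(batch))
        batches.append(batch)
    return batches


data = [1, 2, 3, 4, 5]
dropped = toy_batch_drop_remainder(data, batch_size=2)
padded = toy_batch_and_pad(data, batch_size=2)
```

With five elements and a batch size of 2, the dropping variant loses the element 5, while the padding variant keeps it in a third batch filled with the pad value.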
Breaking changes:
Upgrades array_record and protobuf.
Deprecations:
Bug fixes
Grain 0.2.11 (July 2, 2025)
New features:
Automatic publishing of releases to PyPI via GitHub Actions.
Nightly builds.
Introduced changelog.