External Tool Utilities

These tools help TXPipe interface with external libraries, including scipy, h5py, mpi4py, and NaMaster.

txpipe.utils.fitting.fit_straight_line(x, y, y_err=None)[source]

Use scipy to fit a straight line to data, with optional error bars on y.

Parameters:
  • x (array) – x coordinates of the data

  • y (array) – y coordinates of the data

  • y_err (array or float) – optional, default=None. Errors on y, given as 1D standard deviations.

Returns:

  • m (float) – gradient

  • c (float) – intercept
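
A minimal usage sketch (the synthetic data and noise level here are illustrative):

    import numpy as np
    from txpipe.utils.fitting import fit_straight_line

    # Synthetic data: y = 3x + 1.5 with small Gaussian scatter
    x = np.linspace(0.0, 10.0, 50)
    y = 3.0 * x + 1.5 + np.random.normal(0.0, 0.2, size=x.size)

    # Fit with a constant 1-sigma error bar on each y value
    m, c = fit_straight_line(x, y, y_err=0.2)
    # m and c should be close to 3.0 and 1.5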

class txpipe.utils.hdf_tools.BatchWriter(group, col_dtypes, offset, max_size=1000000)[source]

This class batches together writes to an HDF5 file to minimize contention when many processes write to the same file simultaneously using MPI.
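
The batching idea can be sketched with plain numpy and an h5py dataset; the helper below is invented purely for illustration and is not the class's actual implementation:

    import numpy as np

    def write_in_batches(dataset, pieces, offset=0, max_size=1_000_000):
        # Accumulate small arrays in memory and flush them to the HDF5
        # dataset in large contiguous writes instead of many tiny ones.
        buffer = []
        buffered = 0
        for piece in pieces:
            buffer.append(piece)
            buffered += len(piece)
            if buffered >= max_size:
                chunk = np.concatenate(buffer)
                dataset[offset:offset + chunk.size] = chunk
                offset += chunk.size
                buffer, buffered = [], 0
        if buffer:
            chunk = np.concatenate(buffer)
            dataset[offset:offset + chunk.size] = chunk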

txpipe.utils.hdf_tools.create_dataset_early_allocated(group, name, size, dtype)[source]

Create an HDF5 dataset, allocating the full space for it at the start of the process. This can make it faster to write data incrementally from multiple processes. The dataset is also not pre-filled, saving more time.

Parameters:
  • group (h5py.Group) – the parent for the dataset

  • name (str) – name for the new dataset

  • size (int) – The size of the new dataset (which must be 1D)

  • dtype (str) – Data type; one of f4, f8, i4, i8
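
A usage sketch (the file, group, and column names are illustrative):

    import h5py
    from txpipe.utils.hdf_tools import create_dataset_early_allocated

    with h5py.File("example.hdf5", "w") as f:
        group = f.create_group("shear")
        # Reserve the full 1D space up front; values are not pre-filled
        create_dataset_early_allocated(group, "g1", size=1_000_000, dtype="f8")
        # Later writes can fill arbitrary ranges incrementally
        group["g1"][:100] = 0.0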

txpipe.utils.hdf_tools.h5py_shorten(group, name, n)[source]

Trim an HDF5 column down to length n.

Parameters:
  • group (h5py.Group) – The group containing the dataset

  • name (str) – Name of the dataset to shorten

  • n (int) – The new length of the dataset
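
A usage sketch (the file, group, and column names are illustrative):

    import h5py
    from txpipe.utils.hdf_tools import h5py_shorten

    with h5py.File("example.hdf5", "a") as f:
        group = f["shear"]
        # Keep only the first 5000 entries of the column
        h5py_shorten(group, "g1", 5000)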

txpipe.utils.hdf_tools.load_complete_file(f)[source]

Read all the information in an HDF5 file or group into a nested dictionary.

Using this on large files will quickly run out of memory!

Only use it on small test data.

Parameters:

f (h5py.File or h5py.Group) – The file or group to be walked through

Returns:

output – Nested dictionary with all file content.

Return type:

dict
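
A usage sketch (the file name and group layout are illustrative):

    import h5py
    from txpipe.utils.hdf_tools import load_complete_file

    with h5py.File("small_test_file.hdf5", "r") as f:
        data = load_complete_file(f)

    # data is a nested dict mirroring the group/dataset structure,
    # e.g. data["shear"]["g1"] for a dataset stored at /shear/g1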

txpipe.utils.hdf_tools.repack(filename)[source]

Run an in-place HDF5 repack operation on the file.
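
A usage sketch (the file name is illustrative):

    from txpipe.utils.hdf_tools import repack

    # Rewrite the file in place, e.g. after deleting or shortening datasets
    repack("example.hdf5")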

txpipe.utils.mpi_utils.mpi_reduce_large(data, comm, max_chunk_count=1073741824, root=0, op=None, debug=False)[source]

Use MPI reduce in-place on an array, even a very large one.

MPI Reduce is a reduction operation that, for example, sums arrays from different processes onto a single process.

The standard reduce fails whenever the size of the array is greater than 2**31, due to an integer overflow. This version detects that case and divides the array into chunks, running the reduction on each one separately.

This specific call does in-place reduction, so that the root process overwrites its own array with the result. This minimizes memory usage.

The default is to sum the arrays and collect the result at process zero, but both can be overridden.

Parameters:
  • data (array) – can be any shape but must be contiguous

  • comm (MPI communicator) –

  • max_chunk_count (int) – Optional, default=2**30. Maximum number of items to send at once

  • root (int) – Optional, default=0. Rank of process to receive final result

  • op (MPI operation) – Optional, default=None. The MPI operation to apply, e.g. MPI.PROD or MPI.MAX; if None, MPI.SUM is used

  • debug (bool) – Optional, default=False. Whether to print out information from each rank
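
A minimal sketch, intended to be launched with mpirun and assuming mpi4py is installed:

    import numpy as np
    from mpi4py import MPI
    from txpipe.utils.mpi_utils import mpi_reduce_large

    comm = MPI.COMM_WORLD
    # A contiguous array with different contents on each rank
    data = comm.rank * np.ones(10, dtype="f8")

    # In-place sum over all ranks; rank 0's copy of data holds the result
    mpi_reduce_large(data, comm)
    if comm.rank == 0:
        print(data)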

class txpipe.utils.nmt_utils.MyNmtBin(*args: Any, **kwargs: Any)[source]

class txpipe.utils.nmt_utils.MyNmtBinFlat(*args: Any, **kwargs: Any)[source]