Reading and visualising data¶

[2]:

import numpy as np
import tools21cm as t2c

import warnings
warnings.filterwarnings("ignore")

Different simulations codes write their output in different formats. It is same for observations, which will differ based of the observation facility and research group. One has to define a function that is specific to that case.

In order to manipulate and analyse data with tools21cm, we want the data to be read in as numpy array.

Reading data¶

Here we read the ionisation fraction data cube produced with the C2Ray code. For the density field, we will consider the gridded density field created by an N-body, CubeP3M, which were used by C2Ray code as input.

We provide few simulation output for test: https://doi.org/10.5281/zenodo.3953639

[3]:

path_to_datafiles = './data/'
z = 7.059

[4]:

t2c.set_sim_constants(244) # This line is only useful while working with C2Ray simulations.

x_file = t2c.XfracFile(path_to_datafiles+'xfrac3d_7.059.bin')
d_file = t2c.DensityFile(path_to_datafiles+'7.059n_all.dat')

xfrac = x_file.xi
dens  = d_file.cgs_density

The above function set_sim_constants is useful only for C2Ray simulation outputs. This function takes as its only parameter the box side in cMpc/h and sets simulations constants.

See here for more data reading functions.

Visualising the data¶

You can of course plot the data you read using your favorite plotting software. For example, if you have matplotlib installed.

[5]:

import matplotlib.pyplot as plt

[6]:

box_dims = 244/0.7 # Length of the volume along each direction in Mpc.

dx, dy = box_dims/xfrac.shape[1], box_dims/xfrac.shape[2]
y, x = np.mgrid[slice(dy/2,box_dims,dy),
                slice(dx/2,box_dims,dx)]

[7]:

plt.rcParams['figure.figsize'] = [16, 6]
plt.suptitle('$z={0:.1f},~x_v=${1:.2f}'.format(z,xfrac.mean()), size=18)

plt.subplot(121)
plt.title('Density contrast slice')
plt.pcolormesh(x, y, dens[0]/dens.mean()-1)
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar()

plt.subplot(122)
plt.title('Ionisation fraction slice')
plt.pcolormesh(x, y, xfrac[0])
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar()

plt.show()

21 cm brightness temperature¶

We can construct the 21 cm brightness temperature from the density field and ionisation fraction field using calc_dt. Due to the absence of zero baseline, the mean signal will be subtracted from each frequency channel. One can use subtract_mean_signal to add this effect.

[8]:

dT = t2c.calc_dt(xfrac, dens, z)
print('Mean of first channel: {0:.4f}'.format(dT[0].mean()))

Mean of first channel: 11.8259

[9]:

dT_subtracted = t2c.subtract_mean_signal(dT, 0)
print('Mean of first channel: {0:.4f}'.format(dT_subtracted[0].mean()))

Mean of first channel: -0.0000

[10]:

plt.rcParams['figure.figsize'] = [6, 5]

plt.title('21 cm signal')
plt.pcolormesh(x, y, dT_subtracted[0,:,:])
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar(label='mK')

plt.show()

21 cm power spectrum¶

One of the most interesting metric to analyse this field is the power spectrum. Here we estimate the spherically average power spectrum using power_spectrum_1d function.

The function needs the length of the input_array in Mpc (or Mpc/h) through box_dims parameters. This is used to calculate the wavenumbers (k). The unit of the output k values will be 1/Mpc (or h/Mpc). If the input_array has unequal length in each direction, then one can provide box_dims with a list containing the lengths in each direction.

[11]:

box_dims = 244/0.7 # Length of the volume along each direction in Mpc.

[12]:

ps, ks = t2c.power_spectrum_1d(dT_subtracted, kbins=15, box_dims=box_dims)

[13]:

plt.rcParams['figure.figsize'] = [7, 5]

plt.title('Spherically averaged power spectrum.')
plt.loglog(ks, ps*ks**3/2/np.pi**2)
plt.xlabel('k (Mpc$^{-1}$)')
plt.ylabel('P(k) k$^{3}$/$(2\pi^2)$')

plt.show()

Redshift-space distortions¶

The 21 cm signal will be modified while mapping from real space to redshift space due to peculiar velocities (Mao et al. 2012).

The VelocityFile function is used to read the velocity files produced by CubeP3M. We need the velocities in km/s as a numpy array of shape (3,nGridx,nGridy,nGridyz), where the first axis represent the velocity component along x, y and z spatial direction. The get_kms_from_density attribute gives such a numpy array.

[14]:

v_file = t2c.VelocityFile(path_to_datafiles+'7.059v_all.dat')

[15]:

kms = v_file.get_kms_from_density(d_file)

The get_distorted_dt function will distort the signal.

[16]:

dT_rsd = t2c.get_distorted_dt(dT, kms, z,
                              los_axis=0,
                              velocity_axis=0,
                              num_particles=20)

Spherically averaged power spectrum of the 21 cm signal with RSD.

[17]:

ps_rsd, ks_rsd = t2c.power_spectrum_1d(dT_rsd, kbins=15, box_dims=box_dims)

[18]:

plt.rcParams['figure.figsize'] = [7, 5]

plt.title('Spherically averaged power spectrum.')
plt.loglog(ks, ps*ks**3/2/np.pi**2, label='no RSD')
plt.loglog(ks_rsd, ps_rsd*ks_rsd**3/2/np.pi**2, linestyle='--', label='RSD')

plt.xlabel('k (Mpc$^{-1}$)')
plt.ylabel('P(k) k$^{3}$/$(2\pi^2)$')
plt.legend()

plt.show()

We see in the above figure that the spherically averaged power spectrum has changed after RSD is implemented.

However, a better marker of RSD in 21 cm signal is the power spectrum’s \(\mu (\equiv k_\parallel/k)\) dependence (Jensen et al. 2013). The power spectrum of 21 cm signal with RSD will have the following dependence (Barkana & Loeb 2005),

\(P(k,\mu) = P_0 + \mu^2P_2 +\mu^4P_4\).

We can calculate \(P(k,\mu)\) using power_spectrum_mu function.

[19]:

Pk, mubins, kbins, nmode = t2c.power_spectrum_mu(
    dT,
    los_axis=0,
    mubins=8,
    kbins=15,
    box_dims=box_dims,
    exclude_zero_modes=True,
    return_n_modes=True,
    absolute_mus=False,
)

[20]:

Pk_rsd, mubins_rsd, kbins_rsd, nmode_rsd = t2c.power_spectrum_mu(
    dT_rsd,
    los_axis=0,
    mubins=8,
    kbins=15,
    box_dims=box_dims,
    exclude_zero_modes=True,
    return_n_modes=True,
    absolute_mus=False,
)

[21]:

plt.rcParams['figure.figsize'] = [7, 5]
ii = 8

plt.title('$k={0:.2f}$'.format(kbins[ii]))
plt.plot(mubins, Pk[:,ii], label='no RSD')
plt.plot(mubins_rsd, Pk_rsd[:,ii], linestyle='--', label='RSD')

plt.xlabel('$\mu$')
plt.ylabel('$P(k,\mu)$')
plt.legend()

plt.show()

Bubble size distribution¶

The bubble (HII regions) size distribution is an intersting probe of the reionization process (Giri et al. 2018).

Tools21cm contains three methods to determine the size distributions, which are Friends-of-friends, Spherical average and mean free path approach.

In this tutorial, we will take the ionisation fraction field and assume all the pixels with value \(>0.5\) as ionised.

[22]:

xHII = xfrac>0.5

boxsize = 244/0.7 # in Mpc

Mean free path (e.g. Mesinger & Furlanetto 2007)

[23]:

r_mfp, dn_mfp = t2c.mfp(xHII, boxsize=boxsize, iterations=1000000)

MFP method applied on 3D data (ver 1.0)
Initialising random rays...
...done
Estimating ray lengths...

 48%|███████████████████▋                     | 120/250 [00:06<00:07, 18.24it/s]

...done

Program runtime: 0.126151 minutes.
The output contains a tuple with three values: r, rdP/dr
The curve has been normalized.

[24]:

plt.rcParams['figure.figsize'] = [7, 5]

plt.semilogx(r_mfp, dn_mfp)

plt.xlabel('$R$ (Mpc)')
plt.ylabel('$R\mathrm{d}P/\mathrm{d}R$')
plt.title('Mean free path method')

plt.show()

Spherical average (e.g. Zahn et al. 2007)

[25]:

r_spa, dn_spa = t2c.spa(xHII, boxsize=boxsize, nscales=20)

100%|███████████████████████████████████████████| 20/20 [01:30<00:00,  4.51s/it]

[26]:

plt.rcParams['figure.figsize'] = [7, 5]

plt.semilogx(r_spa, dn_spa)

plt.xlabel('$R$ (Mpc)')
plt.ylabel('$R\mathrm{d}P/\mathrm{d}R$')
plt.title('Spherical Average method')
plt.show()

Friends of friends (e.g. Iliev et al. 2006)

[27]:

labelled_map, volumes = t2c.fof(xHII)
fof_dist = t2c.plot_fof_sizes(volumes, bins=30, boxsize=boxsize)

Program runtime: 0.015371 minutes.
The output is a tuple containing output-map and volume-list array respectively.
The output is Size, Size**2 dP/d(Size), lowest value

[28]:

plt.rcParams['figure.figsize'] = [7, 5]

plt.step(fof_dist[0], fof_dist[1])
plt.xscale('log')
plt.yscale('log')
plt.ylim(fof_dist[2],1)

plt.xlabel('$V$ (Mpc$^3$)')
plt.ylabel('$V^2\mathrm{d}P/\mathrm{d}V$')
plt.title('Friends of friends method')

plt.show()

Telescope noise calculation.¶

See Ghara et al. 2016 and Ghara et al. 2016 for more information

uv-coverage¶

We start with calculation of the uv-coverage for SKA1-Low configuration.

[29]:

uv, Nant = t2c.get_uv_daily_observation(ncells=dT.shape[0], # The number of cell used to make the image
                                        z=z,                # Redhsift of the slice observed
                                        filename=None,      # If None, it uses the SKA-Low 2016 configuration.
                                        total_int_time=6.0, # Observation per day in hours.
                                        int_time=10.0,      # Time period of recording the data in seconds.
                                        boxsize=boxsize,    # Comoving size of the sky observed
                                        declination=-30.0,
                                        verbose=True)

Making uv map from daily observations.

100%|███████████████████████████████████████| 2159/2159 [04:09<00:00,  8.65it/s]

...done

We suggest that you save the uv map as it is computationally expensive. Expecially when computed for an array of redshifts.

[30]:

np.save(path_to_datafiles+'uv_map.npy', uv)
np.save(path_to_datafiles+'Nant.npy', Nant)

[31]:

plt.rcParams['figure.figsize'] = [5, 5]

plt.title(r'$z=%.3f$ $\nu_{obs}=%.2f$ MHz' %(z, t2c.z_to_nu(z)))
plt.pcolormesh(x, y, np.log10(np.fft.fftshift(uv)))
plt.xlabel('u [$Mpc^-1$]'), plt.ylabel('v [$Mpc^-1$]');

Here we plot the antennas location of the SKA1-Low configuration.

[32]:

from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset

ska_ant = t2c.SKA1_LowConfig_Sept2016()

fig, ax = plt.subplots(figsize=(5, 5))
plt.plot(ska_ant[:,0], ska_ant[:,1], '.')
x1, x2, y1, y2 = 116.2, 117.3, -26.45, -27.25
ax.set_xlim(x1, x2)
ax.set_ylim(y1, y2)
ax.grid(b=True, alpha=0.5)

axins = inset_axes(ax, 1, 1, loc=4, bbox_to_anchor=(0.2, 0.2))
plt.plot(ska_ant[:,0], ska_ant[:,1], ',')
x1, x2, y1, y2 = 116.75, 116.78, -26.815, -26.833
axins.set_xlim(x1, x2)
axins.set_ylim(y1, y2)
axins.grid(b=True, alpha=0.5)
mark_inset(ax, axins, loc1=4, loc2=2, fc="none", ec="0.5")
axins.axes.xaxis.set_ticks([]);
axins.axes.yaxis.set_ticks([]);

Creating the noise cube¶

[33]:

noise_cube = t2c.noise_cube_coeval(ncells=dT.shape[0],
                                   z=z,
                                   depth_mhz=None,   #If None, estimates the value such that the final output is a cube.
                                   obs_time=1000,
                                   filename=None,
                                   boxsize=boxsize,
                                   total_int_time=6.0,
                                   int_time=10.0,
                                   declination=-30.0,
                                   uv_map=uv,
                                   N_ant=Nant,
                                   verbose=True,
                                   fft_wrap=False)

Creating the noise cube...

100%|████████████████████████████████████████| 250/250 [00:02<00:00, 120.25it/s]

[34]:

plt.rcParams['figure.figsize'] = [16, 6]

plt.suptitle('$z=%.3f$ $x_v=%.3f$' %(z, xfrac.mean()), size=18)
plt.subplot(121)
plt.title('noise cube slice')
plt.pcolormesh(x, y, noise_cube[0])
plt.colorbar(label='$\delta T^{noise}$ [mK]')
plt.subplot(122)
plt.title('noise distribution')
plt.hist(noise_cube.flatten(), bins=150, histtype='step');
plt.xlabel('$\delta T^{noise}$ [mK]'), plt.ylabel('$N_{sample}$');

Create realistic 21-cm observational images¶

To create realistict observational images we combined the 21-cm filed with the noise cube and smooth the data to the resolution corresponding to maximum baseline of 2 km. The smoothing works for any kind of data (e.g. xHII, dT, etc.) and also for single dish. In the former case the max_baseline should be substituted to D/122, where D is the antenna disk diameter.

[35]:

dT2 = dT_subtracted + noise_cube

dT_smooth = t2c.smooth_coeval(cube=dT2,    # Data cube that is to be smoothed
                              z=z,                  # Redshift of the coeval cube
                              box_size_mpc=boxsize, # Box size in cMpc
                              max_baseline=2.0,     # Maximum baseline of the telescope
                              ratio=1.0,            # Ratio of smoothing scale in frequency direction
                              nu_axis=2)            # frequency axis

100%|████████████████████████████████████████| 250/250 [00:00<00:00, 268.96it/s]

[36]:

plt.rcParams['figure.figsize'] = [15, 6]

plt.subplot(121)
plt.title('$\delta T_b$ at simulation resolution')
plt.pcolormesh(x, y, dT[0])
plt.colorbar(label='$\delta T_b$ [mK]')
plt.subplot(122)
plt.title('smoothed $\delta T_b + \delta T^{noise}_b$')
plt.pcolormesh(x, y, dT_smooth[0])
plt.colorbar(label='$\delta T_b$ [mK]')

[36]:

<matplotlib.colorbar.Colorbar at 0x7efefecf6370>

Image segmenation with Tools21cm¶

The Identification of HII regions in noisy 21-cm interferometric images is not trivial. Tools21cm contains two methods to identify (segmentation) ionised/neutral regions: the super-pixel method Giri et al. 2018 and the deep learning algorithm SegU-Net Bianco et al. 2022.

In this tutorial, we will take the ionisation fraction field and assume all the pixels with value \(>0.5\) as ionised.

The superpixel algorithm¶

We start by calculating the superpixel labels

[37]:

%%time
labels = t2c.slic_cube(cube=dT_smooth,
                       n_segments=5000,
                       compactness=0.1,
                       max_iter=20,
                       sigma=0,
                       min_size_factor=0.5,
                       max_size_factor=3,
                       cmap=None)

Estimating superpixel labels using SLIC...
The output contains the labels with 4911 segments
CPU times: user 3min 42s, sys: 375 ms, total: 3min 42s
Wall time: 3min 43s

Make the superpixel cube and plot a slice.

[38]:

superpixel_map = t2c.superpixel_map(dT_smooth, labels)

plt.rcParams['figure.figsize'] = [18, 5]
plt.subplot(131)
plt.pcolormesh(x, y, superpixel_map[0], cmap='jet')
plt.subplot(132)
plt.pcolormesh(x, y, dT_smooth[0], cmap='jet')

Estimating the superpixel mean map...

100%|████████████████████████████████████| 4911/4911 [00:00<00:00, 51612.08it/s]

...done
Constructing the superpixel map...

100%|███████████████████████████████████████| 4911/4911 [01:16<00:00, 64.19it/s]

[38]:

<matplotlib.collections.QuadMesh at 0x7efefecf3b50>

Stitch the superpixels using the superpixel PDFs and plot a slice from identified ionized region cube. Then smooth the true ionisation fraction field to compare.

[39]:

xHII_stitch = t2c.stitch_superpixels(data=dT_smooth,
                                     labels=labels,
                                     bins='knuth',
                                     binary=True,
                                     on_superpixel_map=True)

mask_xHI = t2c.smooth_coeval(xfrac, z, box_size_mpc=boxsize, max_baseline=2.0, nu_axis=2) < 0.5

Estimating the superpixel mean map...

100%|████████████████████████████████████| 4911/4911 [00:00<00:00, 57067.18it/s]

...done
Constructing the superpixel map...

100%|███████████████████████████████████████| 4911/4911 [01:21<00:00, 60.60it/s]
100%|████████████████████████████████████████| 250/250 [00:01<00:00, 194.97it/s]

[40]:

from sklearn.metrics import matthews_corrcoef

phicoef_sup = matthews_corrcoef(mask_xHI.flatten(), 1-xHII_stitch.flatten())

plt.rcParams['figure.figsize'] = [12, 6]
plt.subplot(121)
plt.title('Super-Pixel: $r_{\phi}=%.3f$' %phicoef_sup)
plt.pcolormesh(x, y, 1-xHII_stitch[0], cmap='jet')
plt.contour(mask_xHI[0], colors='lime', extent=[0, boxsize, 0, boxsize])

[40]:

<matplotlib.contour.QuadContourSet at 0x7eff00837ee0>

SegU-Net¶

Here, we load the best performing network and calculate the nubmer of manipulation on the cube to evaluate the pixel-uncertainty. Additionally, we can get pixel-error maps

tta = 0 : super-fast (approximatelly 7 sec), it tends to be a few percent less accurate (within 2% differences), no pixel-error map (no TTA manipulation)
tta = 1 : fast (about 17 sec), accurate, with pixel-error map (3 samples)
tta = 2 : slow (about 10 min), accurate, with pixel-error map (~100 samples)

[42]:

seg = t2c.segmentation.segunet21cm(tta=2, verbose=True)

tot number of (unique) manipulation we can do on a cube: 28
 Loaded model: /home/michele/codes/tools21cm/src/tools21cm/input_data/segunet_03-11T12-02-05_128slice_ep35.h5

In the current version of Tools21cm, the network operates on data cubes with a mesh size of \(128^3\). In future updates, we plan to extend to cover the mesh size of any given input. Moreover, SegU-Net is optimised for cube data with and spatial resolution of \(2\,Mpc\). However, it deals well with simulation with smaller resolution.

For the reason mentioned above, here we cut the differential brightness cube and Super-Pixel output in order to match SegU-Net input shape.

[43]:

dT_cut = dT_smooth[:128,:128,:128]
mask_xHI2 = mask_xHI[:128,:128,:128]

xHI_seg, xHI_seg_err = seg.prediction(x=dT_cut)
phicoef_seg = matthews_corrcoef(mask_xHI2.flatten(), xHI_seg.flatten())

xHI_sup = xHII_stitch[:128,:128,:128]
phicoef_sup = matthews_corrcoef(mask_xHI2.flatten(), 1-xHI_sup.flatten())

100%|███████████████████████████████████████████| 28/28 [19:38<00:00, 42.09s/it]

[46]:

fig, axs = plt.subplots(figsize=(12,6), ncols=3, sharey=True, sharex=True)
(ax0, ax1, ax2) = axs

ax0.set_title('Super-Pixel ($r_{\phi}=%.3f$)' %phicoef_sup)
ax0.imshow(1-xHI_sup[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
ax0.contour(mask_xHI2[0], colors='lime', extent=[0, boxsize, 0, boxsize])
ax0.set_xlabel('x [Mpc]')

ax1.set_title('SegU-Net ($r_{\phi}=%.3f$)' %phicoef_seg)
ax1.imshow(xHI_seg[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
ax1.contour(mask_xHI2[0], colors='lime', extent=[0, boxsize, 0, boxsize])
ax1.set_xlabel('x [Mpc]')

ax2.set_title('SegUNet Pixel-Error')
im = ax2.imshow(xHI_seg_err[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
fig.colorbar(im, label=r'$\sigma_{std}$', ax=ax2, pad=0.02, cax=fig.add_axes([0.905, 0.25, 0.02, 0.51]))
ax2.set_xlabel('x [Mpc]')

plt.subplots_adjust(hspace=0.1, wspace=0.01)
for ax in axs.flat: ax.label_outer()