Reading and visualising data

[2]:
import numpy as np
import tools21cm as t2c

import warnings
warnings.filterwarnings("ignore")

Different simulation codes write their output in different formats. The same is true for observations, whose format will differ depending on the observing facility and research group. One therefore has to define a reading function specific to each case.

In order to manipulate and analyse data with tools21cm, the data should be read in as a numpy array.
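For instance, a minimal custom reader for a raw binary snapshot could look like the sketch below (the file name, data type and mesh size are hypothetical and will depend on your own simulation code):

[ ]:
import numpy as np

def read_raw_cube(filename, ngrid=250, dtype=np.float32):
    # Sketch: read ngrid**3 values of the given dtype from a raw binary file
    # (assumed layout) and reshape them into a 3D numpy array.
    data = np.fromfile(filename, dtype=dtype)
    return data.reshape((ngrid, ngrid, ngrid))

# my_cube = read_raw_cube('./data/my_simulation_output.bin')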

Reading data

Here we read the ionisation fraction data cube produced with the C2Ray code. For the density field, we will use the gridded density field created by the N-body code CubeP3M, which was used as input by C2Ray.

We provide a few simulation outputs for testing: https://doi.org/10.5281/zenodo.3953639

[3]:
path_to_datafiles = './data/'
z = 7.059
[4]:
t2c.set_sim_constants(244) # This line is only useful while working with C2Ray simulations.

x_file = t2c.XfracFile(path_to_datafiles+'xfrac3d_7.059.bin')
d_file = t2c.DensityFile(path_to_datafiles+'7.059n_all.dat')

xfrac = x_file.xi
dens  = d_file.cgs_density

The above function set_sim_constants is useful only for C2Ray simulation outputs. It takes as its only parameter the box side in cMpc/h and sets the simulation constants.

See here for more data reading functions.

Visualising the data

You can of course plot the data you read using your favorite plotting software. For example, if you have matplotlib installed:

[5]:
import matplotlib.pyplot as plt
[6]:
box_dims = 244/0.7 # Length of the volume along each direction in Mpc.

dx, dy = box_dims/xfrac.shape[1], box_dims/xfrac.shape[2]
y, x = np.mgrid[slice(dy/2,box_dims,dy),
                slice(dx/2,box_dims,dx)]
[7]:
plt.rcParams['figure.figsize'] = [16, 6]
plt.suptitle('$z={0:.1f},~x_v=${1:.2f}'.format(z,xfrac.mean()), size=18)

plt.subplot(121)
plt.title('Density contrast slice')
plt.pcolormesh(x, y, dens[0]/dens.mean()-1)
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar()

plt.subplot(122)
plt.title('Ionisation fraction slice')
plt.pcolormesh(x, y, xfrac[0])
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar()

plt.show()
../_images/examples_tutorials_11_0.png

21 cm brightness temperature

We can construct the 21 cm brightness temperature from the density field and the ionisation fraction field using calc_dt. Because interferometers lack the zero baseline, the mean signal in each frequency channel is not measured. One can use subtract_mean_signal to mimic this effect.

[8]:
dT = t2c.calc_dt(xfrac, dens, z)
print('Mean of first channel: {0:.4f}'.format(dT[0].mean()))
Mean of first channel: 11.8259
[9]:
dT_subtracted = t2c.subtract_mean_signal(dT, 0)
print('Mean of first channel: {0:.4f}'.format(dT_subtracted[0].mean()))
Mean of first channel: -0.0000
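Conceptually, subtract_mean_signal removes the mean of each slice perpendicular to the chosen line-of-sight axis. A minimal numpy sketch of this, assuming los_axis=0 as above:

[ ]:
# Sketch: manual per-channel mean removal, assuming each dT[i] is one frequency channel.
dT_manual = dT - dT.mean(axis=(1, 2), keepdims=True)
print(np.allclose(dT_manual, dT_subtracted))  # expected: True, if the assumption holds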
[10]:
plt.rcParams['figure.figsize'] = [6, 5]

plt.title('21 cm signal')
plt.pcolormesh(x, y, dT_subtracted[0,:,:])
plt.xlabel('$x$ (Mpc)')
plt.ylabel('$y$ (Mpc)')
plt.colorbar(label='mK')

plt.show()
../_images/examples_tutorials_16_0.png

21 cm power spectrum

One of the most interesting summary statistics for analysing this field is the power spectrum. Here we estimate the spherically averaged power spectrum using the power_spectrum_1d function.

The function needs the length of the input_array in Mpc (or Mpc/h) through the box_dims parameter. This is used to calculate the wavenumbers (k). The output k values will then be in units of 1/Mpc (or h/Mpc). If the input_array has unequal lengths along its directions, box_dims can instead be given as a list containing the length along each direction.
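For a non-cubic volume, the call would look like this (a sketch with a hypothetical random cube and hypothetical side lengths):

[ ]:
# Sketch: a hypothetical non-cubic volume of 352 x 352 x 176 Mpc.
toy_cube = np.random.randn(128, 128, 64)
ps_toy, ks_toy = t2c.power_spectrum_1d(toy_cube, kbins=15, box_dims=[352., 352., 176.])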

[11]:
box_dims = 244/0.7 # Length of the volume along each direction in Mpc.
[12]:
ps, ks = t2c.power_spectrum_1d(dT_subtracted, kbins=15, box_dims=box_dims)
[13]:
plt.rcParams['figure.figsize'] = [7, 5]

plt.title('Spherically averaged power spectrum.')
plt.loglog(ks, ps*ks**3/2/np.pi**2)
plt.xlabel('k (Mpc$^{-1}$)')
plt.ylabel('P(k) k$^{3}$/$(2\pi^2)$')

plt.show()
../_images/examples_tutorials_21_0.png

Redshift-space distortions

The 21 cm signal is modified when mapped from real space to redshift space because of peculiar velocities (Mao et al. 2012).

The VelocityFile class is used to read the velocity files produced by CubeP3M. We need the velocities in km/s as a numpy array of shape (3, nGridx, nGridy, nGridz), where the first axis represents the velocity components along the x, y and z spatial directions. The get_kms_from_density method returns such an array.

[14]:
v_file = t2c.VelocityFile(path_to_datafiles+'7.059v_all.dat')
[15]:
kms = v_file.get_kms_from_density(d_file)
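A quick sanity check of the array layout described above (a sketch):

[ ]:
# Sketch: the first axis should hold the three velocity components (km/s),
# the remaining axes the spatial grid.
print(kms.shape)   # expected: (3, nGridx, nGridy, nGridz)
assert kms.shape[0] == 3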

The get_distorted_dt function will distort the signal.

[16]:
dT_rsd = t2c.get_distorted_dt(dT, kms, z,
                              los_axis=0,
                              velocity_axis=0,
                              num_particles=20)

We now compute the spherically averaged power spectrum of the 21 cm signal with RSD.

[17]:
ps_rsd, ks_rsd = t2c.power_spectrum_1d(dT_rsd, kbins=15, box_dims=box_dims)
[18]:
plt.rcParams['figure.figsize'] = [7, 5]

plt.title('Spherically averaged power spectrum.')
plt.loglog(ks, ps*ks**3/2/np.pi**2, label='no RSD')
plt.loglog(ks_rsd, ps_rsd*ks_rsd**3/2/np.pi**2, linestyle='--', label='RSD')

plt.xlabel('k (Mpc$^{-1}$)')
plt.ylabel('P(k) k$^{3}$/$(2\pi^2)$')
plt.legend()

plt.show()
../_images/examples_tutorials_31_0.png

We see in the above figure that the spherically averaged power spectrum changes once RSD are included.

However, a better marker of RSD in the 21 cm signal is the power spectrum’s \(\mu (\equiv k_\parallel/k)\) dependence (Jensen et al. 2013). The power spectrum of the 21 cm signal with RSD has the following dependence (Barkana & Loeb 2005),

\(P(k,\mu) = P_0 + \mu^2P_2 +\mu^4P_4\).

We can calculate \(P(k,\mu)\) using the power_spectrum_mu function.

[19]:
Pk, mubins, kbins, nmode = t2c.power_spectrum_mu(
    dT,
    los_axis=0,
    mubins=8,
    kbins=15,
    box_dims=box_dims,
    exclude_zero_modes=True,
    return_n_modes=True,
    absolute_mus=False,
)
[20]:
Pk_rsd, mubins_rsd, kbins_rsd, nmode_rsd = t2c.power_spectrum_mu(
    dT_rsd,
    los_axis=0,
    mubins=8,
    kbins=15,
    box_dims=box_dims,
    exclude_zero_modes=True,
    return_n_modes=True,
    absolute_mus=False,
)
[21]:
plt.rcParams['figure.figsize'] = [7, 5]
ii = 8

plt.title('$k={0:.2f}$'.format(kbins[ii]))
plt.plot(mubins, Pk[:,ii], label='no RSD')
plt.plot(mubins_rsd, Pk_rsd[:,ii], linestyle='--', label='RSD')

plt.xlabel('$\mu$')
plt.ylabel('$P(k,\mu)$')
plt.legend()

plt.show()
../_images/examples_tutorials_35_0.png
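As an illustration of the decomposition above, the moments \(P_0\), \(P_2\) and \(P_4\) at a single k bin can be estimated with a least-squares fit in \(\mu^2\). This is only a sketch: it assumes Pk_rsd is indexed as (mu, k), as in the plot above, and that enough \(\mu\) bins are populated.

[ ]:
ii = 8
mu2 = mubins_rsd**2
good = np.isfinite(Pk_rsd[:, ii])                   # guard against empty mu bins
# np.polyfit returns the highest-order coefficient first: [P4, P2, P0]
P4, P2, P0 = np.polyfit(mu2[good], Pk_rsd[good, ii], 2)
print('P0={0:.2f}, P2={1:.2f}, P4={2:.2f}'.format(P0, P2, P4))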

Bubble size distribution

The bubble (HII region) size distribution is an interesting probe of the reionization process (Giri et al. 2018).

Tools21cm contains three methods to determine the size distribution: the friends-of-friends, spherical-average and mean-free-path approaches.

In this tutorial, we will take the ionisation fraction field and treat all pixels with value \(>0.5\) as ionised.

[22]:
xHII = xfrac>0.5

boxsize = 244/0.7 # in Mpc

Mean free path (e.g. Mesinger & Furlanetto 2007)

[23]:
r_mfp, dn_mfp = t2c.mfp(xHII, boxsize=boxsize, iterations=1000000)
MFP method applied on 3D data (ver 1.0)
Initialising random rays...
...done
Estimating ray lengths...
 48%|███████████████████▋                     | 120/250 [00:06<00:07, 18.24it/s]
...done

Program runtime: 0.126151 minutes.
The output contains a tuple with three values: r, rdP/dr
The curve has been normalized.
[24]:
plt.rcParams['figure.figsize'] = [7, 5]

plt.semilogx(r_mfp, dn_mfp)

plt.xlabel('$R$ (Mpc)')
plt.ylabel('$R\mathrm{d}P/\mathrm{d}R$')
plt.title('Mean free path method')

plt.show()
../_images/examples_tutorials_41_0.png

Spherical average (e.g. Zahn et al. 2007)

[25]:
r_spa, dn_spa = t2c.spa(xHII, boxsize=boxsize, nscales=20)
100%|███████████████████████████████████████████| 20/20 [01:30<00:00,  4.51s/it]
[26]:
plt.rcParams['figure.figsize'] = [7, 5]

plt.semilogx(r_spa, dn_spa)

plt.xlabel('$R$ (Mpc)')
plt.ylabel('$R\mathrm{d}P/\mathrm{d}R$')
plt.title('Spherical Average method')
plt.show()
../_images/examples_tutorials_44_0.png

Friends of friends (e.g. Iliev et al. 2006)

[27]:
labelled_map, volumes = t2c.fof(xHII)
fof_dist = t2c.plot_fof_sizes(volumes, bins=30, boxsize=boxsize)
Program runtime: 0.015371 minutes.
The output is a tuple containing output-map and volume-list array respectively.
The output is Size, Size**2 dP/d(Size), lowest value
[28]:
plt.rcParams['figure.figsize'] = [7, 5]

plt.step(fof_dist[0], fof_dist[1])
plt.xscale('log')
plt.yscale('log')
plt.ylim(fof_dist[2],1)

plt.xlabel('$V$ (Mpc$^3$)')
plt.ylabel('$V^2\mathrm{d}P/\mathrm{d}V$')
plt.title('Friends of friends method')

plt.show()
../_images/examples_tutorials_47_0.png

Telescope noise calculation

See Ghara et al. 2016 and Ghara et al. 2016 for more information

uv-coverage

We start by calculating the uv-coverage for the SKA1-Low configuration.

[29]:
uv, Nant = t2c.get_uv_daily_observation(ncells=dT.shape[0], # The number of cells used to make the image
                                        z=z,                # Redshift of the slice observed
                                        filename=None,      # If None, it uses the SKA-Low 2016 configuration.
                                        total_int_time=6.0, # Observation per day in hours.
                                        int_time=10.0,      # Time period of recording the data in seconds.
                                        boxsize=boxsize,    # Comoving size of the sky observed
                                        declination=-30.0,
                                        verbose=True)
Making uv map from daily observations.
100%|███████████████████████████████████████| 2159/2159 [04:09<00:00,  8.65it/s]
...done

We suggest that you save the uv map, as it is computationally expensive to produce, especially when computed for an array of redshifts.

[30]:
np.save(path_to_datafiles+'uv_map.npy', uv)
np.save(path_to_datafiles+'Nant.npy', Nant)
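On a later run, the cached arrays can then be loaded instead of recomputing them, for example:

[ ]:
import os

# Sketch: load the cached uv map and antenna numbers if they exist.
if os.path.isfile(path_to_datafiles+'uv_map.npy'):
    uv   = np.load(path_to_datafiles+'uv_map.npy')
    Nant = np.load(path_to_datafiles+'Nant.npy')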
[31]:
plt.rcParams['figure.figsize'] = [5, 5]

plt.title(r'$z=%.3f$ $\nu_{obs}=%.2f$ MHz' %(z, t2c.z_to_nu(z)))
plt.pcolormesh(x, y, np.log10(np.fft.fftshift(uv)))
plt.xlabel('u [Mpc$^{-1}$]'), plt.ylabel('v [Mpc$^{-1}$]');
../_images/examples_tutorials_53_0.png

Here we plot the antenna locations of the SKA1-Low configuration.

[32]:
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from mpl_toolkits.axes_grid1.inset_locator import mark_inset

ska_ant = t2c.SKA1_LowConfig_Sept2016()

fig, ax = plt.subplots(figsize=(5, 5))
plt.plot(ska_ant[:,0], ska_ant[:,1], '.')
x1, x2, y1, y2 = 116.2, 117.3, -26.45, -27.25
ax.set_xlim(x1, x2)
ax.set_ylim(y1, y2)
ax.grid(b=True, alpha=0.5)

axins = inset_axes(ax, 1, 1, loc=4, bbox_to_anchor=(0.2, 0.2))
plt.plot(ska_ant[:,0], ska_ant[:,1], ',')
x1, x2, y1, y2 = 116.75, 116.78, -26.815, -26.833
axins.set_xlim(x1, x2)
axins.set_ylim(y1, y2)
axins.grid(b=True, alpha=0.5)
mark_inset(ax, axins, loc1=4, loc2=2, fc="none", ec="0.5")
axins.axes.xaxis.set_ticks([]);
axins.axes.yaxis.set_ticks([]);
../_images/examples_tutorials_55_0.png

Creating the noise cube

[33]:
noise_cube = t2c.noise_cube_coeval(ncells=dT.shape[0],
                                   z=z,
                                   depth_mhz=None,   #If None, estimates the value such that the final output is a cube.
                                   obs_time=1000,
                                   filename=None,
                                   boxsize=boxsize,
                                   total_int_time=6.0,
                                   int_time=10.0,
                                   declination=-30.0,
                                   uv_map=uv,
                                   N_ant=Nant,
                                   verbose=True,
                                   fft_wrap=False)
Creating the noise cube...
100%|████████████████████████████████████████| 250/250 [00:02<00:00, 120.25it/s]
[34]:
plt.rcParams['figure.figsize'] = [16, 6]

plt.suptitle('$z=%.3f$ $x_v=%.3f$' %(z, xfrac.mean()), size=18)
plt.subplot(121)
plt.title('noise cube slice')
plt.pcolormesh(x, y, noise_cube[0])
plt.colorbar(label='$\delta T^{noise}$ [mK]')
plt.subplot(122)
plt.title('noise distribution')
plt.hist(noise_cube.flatten(), bins=150, histtype='step');
plt.xlabel('$\delta T^{noise}$ [mK]'), plt.ylabel('$N_{sample}$');
../_images/examples_tutorials_58_0.png

Create realistic 21-cm observational images

To create realistic observational images we combine the 21-cm field with the noise cube and smooth the data to the resolution corresponding to a maximum baseline of 2 km. The smoothing works for any kind of data (e.g. xHII, dT, etc.) and also for single-dish observations, in which case the max_baseline should be replaced by D/122, where D is the antenna dish diameter.

[35]:
dT2 = dT_subtracted + noise_cube

dT_smooth = t2c.smooth_coeval(cube=dT2,    # Data cube that is to be smoothed
                              z=z,                  # Redshift of the coeval cube
                              box_size_mpc=boxsize, # Box size in cMpc
                              max_baseline=2.0,     # Maximum baseline of the telescope
                              ratio=1.0,            # Ratio of smoothing scale in frequency direction
                              nu_axis=2)            # frequency axis
100%|████████████████████████████████████████| 250/250 [00:00<00:00, 268.96it/s]
[36]:
plt.rcParams['figure.figsize'] = [15, 6]

plt.subplot(121)
plt.title('$\delta T_b$ at simulation resolution')
plt.pcolormesh(x, y, dT[0])
plt.colorbar(label='$\delta T_b$ [mK]')
plt.subplot(122)
plt.title('smoothed $\delta T_b + \delta T^{noise}_b$')
plt.pcolormesh(x, y, dT_smooth[0])
plt.colorbar(label='$\delta T_b$ [mK]')
[36]:
<matplotlib.colorbar.Colorbar at 0x7efefecf6370>
../_images/examples_tutorials_61_1.png

Image segmentation with Tools21cm

The identification of HII regions in noisy 21-cm interferometric images is not trivial. Tools21cm contains two methods to identify (segment) ionised/neutral regions: the super-pixel method (Giri et al. 2018) and the deep-learning algorithm SegU-Net (Bianco et al. 2022).

In this tutorial, we will again take the ionisation fraction field and treat all pixels with value \(>0.5\) as ionised.

The superpixel algorithm

We start by calculating the superpixel labels.

[37]:
%%time
labels = t2c.slic_cube(cube=dT_smooth,
                       n_segments=5000,
                       compactness=0.1,
                       max_iter=20,
                       sigma=0,
                       min_size_factor=0.5,
                       max_size_factor=3,
                       cmap=None)
Estimating superpixel labels using SLIC...
The output contains the labels with 4911 segments
CPU times: user 3min 42s, sys: 375 ms, total: 3min 42s
Wall time: 3min 43s

Make the superpixel cube and plot a slice.

[38]:
superpixel_map = t2c.superpixel_map(dT_smooth, labels)

plt.rcParams['figure.figsize'] = [18, 5]
plt.subplot(131)
plt.pcolormesh(x, y, superpixel_map[0], cmap='jet')
plt.subplot(132)
plt.pcolormesh(x, y, dT_smooth[0], cmap='jet')
Estimating the superpixel mean map...
100%|████████████████████████████████████| 4911/4911 [00:00<00:00, 51612.08it/s]
...done
Constructing the superpixel map...
100%|███████████████████████████████████████| 4911/4911 [01:16<00:00, 64.19it/s]
[38]:
<matplotlib.collections.QuadMesh at 0x7efefecf3b50>
../_images/examples_tutorials_66_5.png

Stitch the superpixels using the superpixel PDFs and plot a slice of the identified ionised-region cube. Then smooth the true ionisation fraction field to compare.

[39]:
xHII_stitch = t2c.stitch_superpixels(data=dT_smooth,
                                     labels=labels,
                                     bins='knuth',
                                     binary=True,
                                     on_superpixel_map=True)

mask_xHI = t2c.smooth_coeval(xfrac, z, box_size_mpc=boxsize, max_baseline=2.0, nu_axis=2) < 0.5
Estimating the superpixel mean map...
100%|████████████████████████████████████| 4911/4911 [00:00<00:00, 57067.18it/s]
...done
Constructing the superpixel map...
100%|███████████████████████████████████████| 4911/4911 [01:21<00:00, 60.60it/s]
100%|████████████████████████████████████████| 250/250 [00:01<00:00, 194.97it/s]
[40]:
from sklearn.metrics import matthews_corrcoef

phicoef_sup = matthews_corrcoef(mask_xHI.flatten(), 1-xHII_stitch.flatten())

plt.rcParams['figure.figsize'] = [12, 6]
plt.subplot(121)
plt.title('Super-Pixel: $r_{\phi}=%.3f$' %phicoef_sup)
plt.pcolormesh(x, y, 1-xHII_stitch[0], cmap='jet')
plt.contour(mask_xHI[0], colors='lime', extent=[0, boxsize, 0, boxsize])
[40]:
<matplotlib.contour.QuadContourSet at 0x7eff00837ee0>
../_images/examples_tutorials_69_1.png

SegU-Net

Here, we load the best-performing network and calculate the number of manipulations that can be performed on the cube to evaluate the pixel uncertainty. Additionally, we can get pixel-error maps:

  • tta = 0 : super-fast (approximately 7 sec), it tends to be a few percent less accurate (within 2% difference), no pixel-error map (no TTA manipulation)

  • tta = 1 : fast (about 17 sec), accurate, with pixel-error map (3 samples)

  • tta = 2 : slow (about 10 min), accurate, with pixel-error map (~100 samples)

[42]:
seg = t2c.segmentation.segunet21cm(tta=2, verbose=True)
tot number of (unique) manipulation we can do on a cube: 28
 Loaded model: /home/michele/codes/tools21cm/src/tools21cm/input_data/segunet_03-11T12-02-05_128slice_ep35.h5

In the current version of Tools21cm, the network operates on data cubes with a mesh size of \(128^3\). In future updates, we plan to extend it to cover the mesh size of any given input. Moreover, SegU-Net is optimised for cube data with a spatial resolution of \(2\,Mpc\). However, it also deals well with simulations of finer resolution.

For the reason mentioned above, here we cut the differential brightness temperature cube and the super-pixel output in order to match the SegU-Net input shape.

[43]:
dT_cut = dT_smooth[:128,:128,:128]
mask_xHI2 = mask_xHI[:128,:128,:128]

xHI_seg, xHI_seg_err = seg.prediction(x=dT_cut)
phicoef_seg = matthews_corrcoef(mask_xHI2.flatten(), xHI_seg.flatten())

xHI_sup = xHII_stitch[:128,:128,:128]
phicoef_sup = matthews_corrcoef(mask_xHI2.flatten(), 1-xHI_sup.flatten())
100%|███████████████████████████████████████████| 28/28 [19:38<00:00, 42.09s/it]
[46]:
fig, axs = plt.subplots(figsize=(12,6), ncols=3, sharey=True, sharex=True)
(ax0, ax1, ax2) = axs

ax0.set_title('Super-Pixel ($r_{\phi}=%.3f$)' %phicoef_sup)
ax0.imshow(1-xHI_sup[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
ax0.contour(mask_xHI2[0], colors='lime', extent=[0, boxsize, 0, boxsize])
ax0.set_xlabel('x [Mpc]')

ax1.set_title('SegU-Net ($r_{\phi}=%.3f$)' %phicoef_seg)
ax1.imshow(xHI_seg[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
ax1.contour(mask_xHI2[0], colors='lime', extent=[0, boxsize, 0, boxsize])
ax1.set_xlabel('x [Mpc]')

ax2.set_title('SegUNet Pixel-Error')
im = ax2.imshow(xHI_seg_err[0], origin='lower', cmap='jet', extent=[0, boxsize, 0, boxsize])
fig.colorbar(im, label=r'$\sigma_{std}$', ax=ax2, pad=0.02, cax=fig.add_axes([0.905, 0.25, 0.02, 0.51]))
ax2.set_xlabel('x [Mpc]')

plt.subplots_adjust(hspace=0.1, wspace=0.01)
for ax in axs.flat: ax.label_outer()
../_images/examples_tutorials_75_0.png