Date Tags skimage

In this post, we’ll see a way to segment nuclei in a confocal microscopy image stack using skimage

First import all we need

As I go on with the analysis, I usually add in the first cell all the import statements, rather than having them spread across the notebook. Once development is done, this will go into a script, and those import will already be sorted out.

I’m using the tifffile plugin (see http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html) for the I/O.

In [3]:
import skimage.io as io
io.use_plugin('tifffile')

from skimage.filter import threshold_otsu, threshold_adaptive, rank
from skimage.morphology import label
from skimage.measure import regionprops
from skimage.feature import peak_local_max
from scipy import ndimage
from skimage.morphology import disk, watershed
import pandas as pd

from mpl_toolkits.axes_grid1 import make_axes_locatable
from scipy.spatial import distance as dist
import scipy.cluster.hierarchy as hier

Load the image

Also, we set the pixel sizes for later. In a wonderfull world, this will be a OME XML file and we would read the metadata carrefully collected by the biologist who acquired the image directely from the file. Of course this is rarely the case.

In [5]:
image_stack = io.imread('../files/test_nuclei_stack.tif')
z_size, x_size, y_size = image_stack.shape

z_scale = 1.5 # µm per plane
xy_scale = 0.71 # µm per pixel

Look at the images

In [7]:
nrows = np.int(np.ceil(np.sqrt(z_size)))
ncols = np.int(z_size // nrows + 1)

fig, axes = plt.subplots(nrows, ncols, figsize=(3*ncols, 3*nrows))
for n in range(z_size):
    i = n // ncols
    j = n % ncols
    axes[i, j].imshow(image_stack[n, ...],
                      interpolation='nearest', cmap='gray')
    
## Remove empty plots 
for ax in axes.ravel():
    if not(len(ax.images)):
        fig.delaxes(ax)
fig.tight_layout()

Any analysis should start with a look at the histogram

In [8]:
fig, ax = plt.subplots(figsize=(8, 4))

ax.hist(image_stack.flatten(), log=True,
        bins=4096, range=(0, 4096))

_ = ax.set_title('Min value: %i \n'
                 'Max value: %i \n'
                 'Image shape: %s \n'
                 % (image_stack.min(),
                    image_stack.max(),
                    image_stack.shape))

Some brief comments on the histogram above. We see that the full 12 bits dynamic range is occupied, which is good. Yet, the much higher peak at the maximum value (4095) tells us some of the signal is saturated, which is not so good, we’ll see why later on. Lastly, the background doesn’t look like the expected background noise for fluorescence imagery, which should follow a Poisson distribution: $P(I) = I / \sigma_I^2 \exp{-I^2/\sigma_I^2} dI$, like the left graph bellow. What we see in the above is the result of an automated background correction, like in the roght graph bellow. So we don’t really deal with raw images…

In [11]:
intensities = np.arange(0, 4096)
background = 1000
dI = 1
proba = (intensities/background**2) * np.exp( - intensities**2 / background**2) * dI
proba /= proba.sum()
n_pixels = 512*512
proba *= n_pixels
proba = proba.astype(np.int)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(9, 4))
ax0.step(intensities, proba, 'k')
ax0.set_title('Full histogram')
ax0.set_ylabel('Bin count')
ax0.set_xlabel('Intensity')

ax1.step(intensities[background:], proba[background:], 'k')
ax1.set_title('Background corrected histogram')
ax1.set_xlabel('Intensity')


fig.tight_layout()

Detection parameters

We usually need some settings, derived from what we now of the objects we are trying to segment. We’ll also normaly do some kind of smoothing, which will be specified as the pixel size of the filter kernel. So here we go:

In [12]:
smooth_size = 5 # pixels
min_radius = 4
max_radius = 20

Overview of the segmentation strategy:

In a nutt shell the idea is to (1) threshold, then (2) label the regions above threshold, then (3) segment the labeled regions. We’ll do this on each image plane. After that, we’ll cluster the resulting objects across the $z$ axis

Per plane segmentation

Of course this is always the difficult part. Here we use the threshold_otsu function from skimage. We compute an overall threshold over a maximum intensitiy projection along the $z$ axis, and use this threshold to label each stack individually. To ease the segmentation, we filter the image by first smoothing with a median filter and then a local contrast enhancement.

In [13]:
## Computing threshold on the maximum intensity projection with `threshold_otsu`
max_int_proj = image_stack.max(axis=0)
thresh_global = threshold_otsu(max_int_proj)

smoothed_stack = np.zeros_like(image_stack)
labeled_stack = smoothed_stack.copy()
## Labeling for each z plane:
for z, frame in enumerate(image_stack):
    smoothed = rank.median(frame, disk(smooth_size))
    smoothed = rank.enhance_contrast(smoothed, disk(smooth_size))
    smoothed_stack[z] = smoothed
    im_max = smoothed.max()
    thresh = thresh_global
    # thresh = threshold_otsu(smoothed)
    if im_max < thresh_global:
        labeled_stack[z] = np.zeros(smoothed.shape, dtype=np.int32)
    else:
        binary = smoothed > thresh
        #binary = threshold_adaptive(smoothed, block_size=smooth_size)
        
        distance = ndimage.distance_transform_edt(binary)
        local_maxi = peak_local_max(distance, min_distance=2*min_radius,
                                    indices=False, labels=smoothed)
        markers = ndimage.label(local_maxi)[0]
        labeled_stack[z] = watershed(-distance, markers, mask=binary)
/home/guillaume/python3/lib/python3.3/site-packages/scikit_image-0.10dev-py3.3-linux-x86_64.egg/skimage/filter/rank/generic.py:63: UserWarning: Bitdepth of 11 may result in bad rank filter performance due to large number of bins.
  "performance due to large number of bins." % bitdepth)

Result of the segmentation

In [14]:
fig, axes = plt.subplots(nrows, ncols*2, figsize=(3*ncols, 1.5*nrows))

for z in range(z_size):
    i = z // ncols
    j = z % ncols * 2
    axes[i, j].imshow(smoothed_stack[z, ...], interpolation='nearest', cmap='gray')
    axes[i, j+1].imshow(labeled_stack[z, ...], interpolation='nearest', cmap='Dark2')
    
    axes[i, j].set_xticks([])
    axes[i, j].set_yticks([])
    axes[i, j+1].set_xticks([])
    axes[i, j+1].set_yticks([])

## Remove empty plots 
for ax in axes.ravel():
    if not(len(ax.images)):
        fig.delaxes(ax)

        
fig.tight_layout()

a closer look:

In [15]:
fig, axes = plt.subplots(1, 2, figsize=(8, 16))

z = z_size // 2
axes[0].imshow(smoothed_stack[z, ...], interpolation='nearest', cmap='gray')
axes[1].imshow(labeled_stack[z, ...], interpolation='nearest', cmap='Dark2')
Out[15]:
<matplotlib.image.AxesImage at 0x7fd55d9b2110>

Computing the properties of the labeled regions:

We use skimage.measure handy function regionprops. For convinience, we store the computed properties in a pandas DataFrame object (this is particullarly usefulll if you have more images or more timepoints, and want to later on manipulate the collected data.

In [20]:
properties = []
columns = ('x', 'y', 'z', 'I', 'w')
indices = []
for z, frame in enumerate(labeled_stack):
    f_prop = regionprops(frame.astype(np.int),
                         intensity_image=image_stack[z])
    for d in f_prop:
        radius = (d.area / np.pi)**0.5
        if (min_radius  < radius < max_radius):
            properties.append([d.weighted_centroid[0],
                               d.weighted_centroid[1],
                               z, d.mean_intensity * d.area,
                               radius])
            indices.append(d.label)
if not len(indices):
    all_props = pd.DataFrame([], index=[])
indices = pd.Index(indices, name='label')
properties = pd.DataFrame(properties, index=indices, columns=columns)
properties['I'] /= properties['I'].max()

Here is what we collected:

In [21]:
properties.head()
Out[21]:
x y z I w
label
6 71.249178 40.507531 5 0.394966 5.808687
1 70.186552 39.377994 6 0.854559 8.156394
2 82.987933 50.992160 6 0.171050 4.583498
1 70.022599 38.560650 7 0.990137 8.758578
3 83.256457 50.622169 7 0.316493 5.262410

5 rows × 5 columns

Let’s look at what we collected, plotted over the segmented image

In [22]:
fig, axes = plt.subplots(nrows, ncols, figsize=(3*ncols, 3*nrows))

for z in range(z_size):
    plane_props = properties[properties['z'] == z]
    if not(plane_props.shape[0]) :
        continue
    i = z // ncols
    j = z % ncols
    axes[i, j].imshow(labeled_stack[z, ...],
                      interpolation='nearest', cmap='Dark2')
    axes[i, j].set_xticks([])
    axes[i, j].set_yticks([])
    x_lim = axes[i, j].get_xlim()
    y_lim = axes[i, j].get_ylim()    
    
    
    axes[i, j].scatter(plane_props['y'], plane_props['x'],  
                       s=plane_props['I']*200, alpha=0.4)
    axes[i, j].scatter(plane_props['y'], plane_props['x'],
                       s=40, marker='+', alpha=0.4)
    axes[i, j].set_xlim(x_lim)
    axes[i, j].set_ylim(y_lim)    
 

## Remove empty plots 
for ax in axes.ravel():
    if not(len(ax.images)):
        fig.delaxes(ax)

        
fig.tight_layout()

Bellow is another way to look at the segmented nuclei positions, this time over intensity projections of the original image.

In [23]:
fig = plt.figure(figsize=(12, 12))
colors = plt.cm.jet(properties.index.astype(np.int32))

# xy projection:
ax_xy = fig.add_subplot(111)
ax_xy.imshow(image_stack.max(axis=0), cmap='gray')
ax_xy.scatter(properties['y'],
              properties['x'], c=colors, alpha=1)

divider = make_axes_locatable(ax_xy)
ax_zx = divider.append_axes("top", 2, pad=0.2, sharex=ax_xy)
ax_zx.imshow(image_stack.max(axis=1), aspect=z_scale/xy_scale, cmap='gray')
ax_zx.scatter(properties['y'],
              properties['z'], c=colors, alpha=1)
ax_yz = divider.append_axes("right", 2, pad=0.2, sharey=ax_xy)
ax_yz.imshow(image_stack.max(axis=2).T, aspect=xy_scale/z_scale, cmap='gray')
ax_yz.scatter(properties['z'],
              properties['x'], c=colors, alpha=1)
plt.draw()

Clustering

Now that we have detected the nuclei in each plane, we want to regroup them across $z$ so that we have only one 3D position for each of the 4 (or is it 5?) nuclei.

We do so by applying a hierarchical clustering whith scipy.cluster module of the positions in the $(x, y)$ plane. As a parameter for the clustering, we quite naturally use the maximum radius we defined earlier.

In [24]:
positions = properties[['x', 'y']].copy()

dist_mat = dist.squareform(dist.pdist(positions.values))
link_mat = hier.linkage(dist_mat)
cluster_idx = hier.fcluster(link_mat, max_radius,
                            criterion='distance')
properties['new_label'] = cluster_idx
properties.set_index('new_label', drop=True, append=False, inplace=True)
properties.index.name = 'label'
properties = properties.sort_index()

Now the detected elements are regrouped by label:

In [25]:
properties.head()
Out[25]:
x y z I w
label
1 57.060095 52.985963 18 0.255932 5.556623
1 57.433341 52.431252 17 0.357211 5.698035
1 55.415167 54.268103 16 0.301234 5.232079
1 55.324279 54.091615 15 0.297468 5.382026
2 44.683366 49.372282 13 0.290805 4.982787

5 rows × 5 columns

Next we want to get the center of each cluster. To do so we run a weighted average of the positions for each cluster, using the measured intensity as weight.

In [26]:
def df_average(df, weights_column):
    '''Computes the average on each columns of a dataframe, weighted
    by the values of the column `weight_columns`.
    
    Parameters:
    -----------
    df: a pandas DataFrame instance
    weights_column: a string, the column name of the weights column 
    
    Returns:
    --------
    
    values: pandas DataFrame instance with the same column names as `df`
        with the weighted average value of the column
    '''
    
    values = df.copy().iloc[0]
    norm = df[weights_column].sum()
    for col in df.columns:
        try:
            v = (df[col] * df[weights_column]).sum() / norm
        except TypeError:
            v = df[col].iloc[0]
        values[col] = v
    return values
In [27]:
cell_positions = properties.groupby(level='label').apply(df_average, 'I')

So here is what we where looking for:

In [28]:
cell_positions
Out[28]:
x y z I w
label
1 56.335143 53.412561 16.471683 0.307242 5.474775
2 44.899683 49.921097 11.522186 0.419746 5.945169
3 82.425622 50.970181 11.551490 0.364361 5.626174
4 70.343025 37.816205 8.129519 0.794513 7.884338

4 rows × 5 columns

The final result

Let’s plot all that

In [29]:
fig = plt.figure(figsize=(12, 12))
colors = plt.cm.jet(properties.index.astype(np.int32))

# xy projection:
ax_xy = fig.add_subplot(111)
ax_xy.imshow(image_stack.max(axis=0), cmap='gray')
ax_xy.scatter(properties['y'],
              properties['x'],
              c=colors, alpha=0.2)


ax_xy.scatter(cell_positions['y'],
              cell_positions['x'],
              c='r', s=50, alpha=1.)


divider = make_axes_locatable(ax_xy)
ax_yz = divider.append_axes("top", 2, pad=0.2, sharex=ax_xy)
ax_yz.imshow(image_stack.max(axis=1), aspect=z_scale/xy_scale, cmap='gray')
ax_yz.scatter(properties['y'],
              properties['z'],
              c=colors, alpha=0.2)

ax_yz.scatter(cell_positions['y'],
              cell_positions['z'],
              c='r', s=50, alpha=1.)


ax_zx = divider.append_axes("right", 2, pad=0.2, sharey=ax_xy)
ax_zx.imshow(image_stack.max(axis=2).T, aspect=xy_scale/z_scale, cmap='gray')
ax_zx.scatter(properties['z'],
              properties['x'],
              c=colors, alpha=0.2)

ax_zx.scatter(cell_positions['z'],
              cell_positions['x'],
              c='r', s=50, alpha=1.)

plt.draw()

A brief final comment: The detection looks fine althou we only have 4 nuclei, and it well looks like (especially in the right pannel) two nuclei where merged in the process. I don’t really see how to avoid that with this method, maybe by searching for two local intensity maxima along the $z$ axis in each cluster (I’ve tried it), but this is not very noise resitant. Maybe a random walker segmentation in 3D could avoir this, but I must say I don’t really see how to implement that strategy. One of the difficulty with a random walker here is that as the data is saturated you don’t have well defined local maxima, but instead big regions at maximum value.


Comments

comments powered by Disqus