3D nuclei segmentation and HCR spot counting¶

In [1]:
#basic packages
import numpy as np
import pandas as pd
import glob
import os
from scipy import stats

#segmentation
import cellpose
from cellpose import models, core

#working with images
import tifffile
import nd2
import skimage.measure as measure
from skimage import morphology
from skimage.measure import label, regionprops
from skimage.morphology import disk
from skimage.feature import peak_local_max
from scipy.ndimage import gaussian_filter, gaussian_laplace

#visualization
import napari
import plotly.express as px  
import matplotlib.pyplot as plt
In [2]:
from cellpose import core
use_GPU = core.use_gpu()
print('>>> GPU activated? %d'%use_GPU)
>>> GPU activated? 1
In [3]:
imageloc = 'S:/micro/jbz/kd2200/mck/20230213_AT2_IMARE-118118_Blastocyst_Mutants/'
saveloc = imageloc + '3dSegmentation/'
dataloc = saveloc + 'Data/'
maskloc = saveloc + 'masks/'
plotloc = saveloc + 'Graphs/'
images = glob.glob(imageloc + '*.tif')

if not os.path.exists(saveloc): os.makedirs(saveloc)
if not os.path.exists(dataloc): os.makedirs(dataloc)
if not os.path.exists(maskloc): os.makedirs(maskloc)
if not os.path.exists(plotloc): os.makedirs(plotloc)
In [4]:
model = models.Cellpose(gpu=True, model_type='cyto')

Make Masks¶

This will work through a folder of similarly sized z stack tiff files and segment the nuclei with Cellpose. The masks will be saved in the specfied folder.

In [ ]:
for image in images:
    img= tifffile.imread(image)
    basename = image.split('\\')[-1][:-4]
    dapi = img[:,3,:,:] 
    smoothed = gaussian_filter(dapi, sigma=[1,2,2]) #a little bit of smoothing helps reduce the texture in the DAPI signal
    rslt3d = model.eval(dapi, channels=[0,0], diameter=60, z_axis=0, anisotropy=4, 
                    do_3D=True, batch_size=64, min_size=10000)[0]
    tifffile.imwrite(saveloc+basename + '_mask.tif', rslt3d.astype(np.uint16)) 

Examine some of the masks and see how well the parameters worked. insert one file name into the "basename" variable and check it out in napari.

In [ ]:
viewer=napari.Viewer()
In [ ]:
 
In [ ]:
 

Manually edit the masks¶

Cellpose does not perfectly segment the 3d dapi. However, it does a fairly good job. This section will open each image as you specify in "basename" and let you edit the mask in napari viewer. Napari has tutorials of this on youtube. This step can take a lot of time and slow down the analysis.

In [ ]:
basename = 'MUT_well01_Cdx2_Tomato_Ezr_blast04'
mask=tifffile.imread(maskloc+basename+'_mask.tif')
img = tifffile.imread(imageloc+basename+'.tif')
viewer.layers.clear()
viewer.add_image(img[:,0:4,:,:], channel_axis=1, colormap=['magenta','yellow','cyan','gray'], blending='additive', scale=[4,1,1])
viewer.add_labels(mask, opacity=0.4, scale = [4,1,1])

Here, work in napari viewer¶

take a look at the mask and make adjustments as necessary. Use the next cell to save your adjustments

In [ ]:
mask = viewer.layers['mask'].data #retrieve the edited mask data from napari
tifffile.imwrite(saveloc+basename+'_mask.tif', mask.astype(np.uint16)) 
In [ ]:
#Dilate Masks.
#I have a pixel width of 4 for dilation. Adjust as necessary for your measurements
dmask = np.zeros(mask.shape)
for i in range(mask.shape[0]):
    dmask[i] = morphology.dilation(mask[i], disk(4))
    
dmask = dmask.astype('int')
viewer.add_labels(dmask)
In [ ]:
tifffile.imwrite(saveloc+basename+'_mask.tif', mask.astype(np.uint16))
tifffile.imwrite(saveloc+basename+'_Dilatedmask.tif', dmask.astype(np.uint16))
In [ ]:
 

HCR dots¶

We have found that finding HCR dots and counting gives a more reliable measure of a cells expression level of a gene over an intensity measurement. Change of labeling "goodness" between samples, autofluorescence, different z-depths into the tissue can all change the intensity. Counting local maxima gives a more consistent threshold between samples.

In [ ]:
def find_peaks(img, thresh):
    gaussian_blur = gaussian_filter(img, sigma=[2,5,5])
    back_sub = img - 0.9*gaussian_blur
    LoG = -gaussian_laplace(back_sub/1000.0, sigma=[0.1,0.5,0.5]) #the dividide is b/c a really bright thing will throw off the visualization of the log
    peaks = peak_local_max(LoG, min_distance=1, threshold_abs=thresh)#peaks is in order of like intensity or something. not xyz
    return peaks

def getPeaksSum(mask, img, peaks):
    dots = np.zeros(img.shape)
    dots[peaks[:,0],peaks[:,1],peaks[:,2]] =1  
    img_props = measure.regionprops(mask,dots)
    x=[]
    a=[]
    labs = []

    x = [i['mean_intensity'] for i in img_props]
    a= [i['area'] for i in img_props]
    labs = [i['label'] for i in img_props]
    df = pd.DataFrame(data={'meanIntensity':x, 'area':a, 'labels':labs})
    df['sum'] = df['meanIntensity']*df['area']
    return df

def colorarray(column):
    return (column - column.min())/(column - column.min()).max()

peaks of HCR signal for each channel.¶

I have not batch processed through a list of images. This will work for one image at a time.

A different threshold is set for each channel, but the same threshold is used across all samples

In [ ]:
peaksCdx = find_peaks(img[:,0,:,:],11)
peaksTom = find_peaks(img[:,1,:,:], 15)
peaksEzr = find_peaks(img[:,2,:,:],15)

Take a look at the found peaks.

In [ ]:
viewer.layers.clear()
viewer.add_image(img[:,0:4,:,:], channel_axis=1, colormap=['magenta','yellow','cyan','gray'], blending='additive')

viewer.add_points(peaksCdx, size=2)
viewer.add_points(peaksTom, size=2, face_color='blue' )
viewer.add_points(peaksEzr, size=2, face_color='magenta' )

save the peaks data

In [ ]:
dfpeakscdx2 = pd.DataFrame(data = peaksCdx, columns = ['z','x','y'])
dfpeakscdx2.to_csv(saveloc+basename+'_cdx2peaks.csv')
dfpeaksezr = pd.DataFrame(data = peaksEzr, columns = ['z','x','y'])
dfpeaksezr.to_csv(saveloc+basename+'_Ezrpeaks.csv')
dfpeakstom = pd.DataFrame(data = peaksTom, columns = ['z','x','y'])
dfpeakstom.to_csv(saveloc+basename+'_Tompeaks.csv')
del dfpeakscdx2, dfpeaksezr, dfpeakstom

Count how many peaks appear inside individual cells from the dilated mask image

In [ ]:
Dots = pd.merge(getPeaksSum(dmask, img[:,0,:,:], peaksCdx), getPeaksSum(dmask,img[:,1,:,:], peaksTom), left_on='labels', right_on='labels')
Dots = pd.merge(Dots, getPeaksSum(dmask,img[:,2,:,:], peaksEzr), left_on='labels', right_on='labels')
Dots = Dots.rename(columns = {'meanIntensity_x':'Cdx2_mean','sum_x':'Cdx2',
                             'meanIntensity_y':'Tom_mean','sum_y':'Tom',
                             'meanIntensity':'Ezr_mean','sum':'Ezr'})
Dots = Dots.drop(columns = ['area_x','area_y'])
In [ ]:
Dots.head()
In [ ]:
Dots.to_csv(dataloc+basename+'_IntensityByDots.csv')

Categorization¶

at this point, i'm assuming you have gone through each of the files and made csv documents for each blastocyst in a folder.

The next part opens all the _IntensityByDots.csv files in a folder, compiles them and starts to make some graphs to help you find thresholds and categorize the cells as inner cell mass or trophoectoderm.

In [5]:
sortdict = {'ICM':0,'Tomato ICM':1,'Cdx2':2,'Tomato+Cdx2':3}
In [6]:
files = glob.glob(dataloc + '*IntensityByDots.csv')
df = pd.DataFrame(columns = ['Cdx2_mean','labels','Cdx2','Tom_mean',
                             'Tom','Ezr_mean','area','Ezr','TomOn','Cdx2On','Type','ColorTab'])

for f in files:
    basename = f.split('\\')[-1][:-20]
    df1 = pd.read_csv(f)
    df1 = df1.drop(columns = ['Unnamed: 0'])
    df1['file'] = basename
    df1['Category'] = basename.split('_')[0]
    df1['sample'] = int(basename.split('_')[-1][-2:])
    
    df = pd.concat([df,df1]).reset_index(drop=True)


df = df.sort_values(by=['file','labels'])
C:\Users\mck\AppData\Local\Temp\ipykernel_10528\3219655425.py:13: FutureWarning: In a future version, object-dtype columns with all-bool values will not be included in reductions with bool_only=True. Explicitly cast to bool dtype instead.
  df = pd.concat([df,df1]).reset_index(drop=True)
In [7]:
df = df[df['area']>500]
# I have included this size filter just in case there are any 
#tiny fragments of a DAPI still in any of the files that shouldn't count as a cell
In [8]:
fig = px.scatter(df, x='Tom',y='Cdx2', color = 'Ezr', hover_data=['labels'])
fig.update_traces(
    marker=dict(size=8))
#fig.write_image(plotloc+ basename+'Scatter_ByHCRDots.png')
fig.show()
In [9]:
px.histogram(df, x='Tom')
In [10]:
px.histogram(df, x='Cdx2')

pick threshold and categorize cells

In [11]:
# make sure everyone is on the same threshold
Cdx2Thres = 5
TomThres = 15
df['TomOn']= df['Tom']>TomThres
df['Cdx2On'] =df['Cdx2']>Cdx2Thres


df['Type'] = 'ICM'
df['ColorTab'] =0

temp = df[(df['TomOn']) & (df['Cdx2On']==False)].copy()
df.loc[temp.index, 'Type']='Tomato ICM'
df.loc[temp.index,'ColorTab'] = 1

temp = df[(df['TomOn']) & (df['Cdx2On'])].copy()
df.loc[temp.index,'Type']='Tomato+Cdx2'
df.loc[temp.index,'ColorTab'] = 3

temp = df[(df['TomOn']==False) & (df['Cdx2On'])].copy()
df.loc[temp.index, 'Type']='Cdx2'
df.loc[temp.index,'ColorTab'] = 2

df.to_csv(dataloc+'AllCellData.csv')
In [12]:
df.head()
Out[12]:
Cdx2_mean labels Cdx2 Tom_mean Tom Ezr_mean area Ezr TomOn Cdx2On ... ColorTab file Category sample Cdx2_intensity Tom_intensity Ezr_intensity z x y
0 0.002147 1 108.0 0.000040 2.0 0.000994 50307 50.0 False True ... 2 MUT_well01_Cdx2_Tomato_Ezr_blast01 MUT 1.0 NaN NaN NaN NaN NaN NaN
1 0.001815 2 60.0 0.000121 4.0 0.001210 33062 40.0 False True ... 2 MUT_well01_Cdx2_Tomato_Ezr_blast01 MUT 1.0 NaN NaN NaN NaN NaN NaN
2 0.002188 3 127.0 0.000052 3.0 0.002154 58042 125.0 False True ... 2 MUT_well01_Cdx2_Tomato_Ezr_blast01 MUT 1.0 NaN NaN NaN NaN NaN NaN
3 0.001234 4 72.0 0.000069 4.0 0.001388 58346 81.0 False True ... 2 MUT_well01_Cdx2_Tomato_Ezr_blast01 MUT 1.0 NaN NaN NaN NaN NaN NaN
4 0.003208 5 169.0 0.000152 8.0 0.001158 52681 61.0 False True ... 2 MUT_well01_Cdx2_Tomato_Ezr_blast01 MUT 1.0 NaN NaN NaN NaN NaN NaN

5 rows × 21 columns

In [ ]:
 

Visualization in Napari¶

Making colorful masks indicating transcript levels or category of cell

In [ ]:
 
In [ ]:
basename = 'WT_well01_Cdx2_Tomato_Ezr_blast04'
viewer.layers.clear()
img = tifffile.imread(imageloc + basename+'.tif')
mask = tifffile.imread(maskloc + basename + '_mask.tif')
df = pd.read_csv(dataloc + basename + '_IntensityByDots.csv')
In [ ]:
viewer.layers.clear()
viewer.add_image(img[:,0:4,:,:], channel_axis=1, colormap=['magenta','yellow','cyan','gray'], blending='additive')
In [ ]:
colors = plt.cm.get_cmap('viridis')(colorarray(df.Tom))
colormap = dict(zip(df.labels.to_numpy(),colors))
label_layer = viewer.add_labels(mask, name = 'TomCount')
label_layer.color = colormap

colors = plt.cm.get_cmap('viridis')(colorarray(df.Cdx2))
colormap = dict(zip(df.labels.to_numpy(),colors))
label_layer = viewer.add_labels(mask, name = 'Cdx2Count')
label_layer.color = colormap
In [ ]:
colors = plt.cm.get_cmap('viridis')(df.TomOn*255)
colormap = dict(zip(df.labels.to_numpy(),colors))
label_layer = viewer.add_labels(dmask, name = 'TomOn')
label_layer.color = colormap

colors = plt.cm.get_cmap('viridis')(df.Cdx2On*255)
colormap = dict(zip(df.labels.to_numpy(),colors))
label_layer = viewer.add_labels(mask, name = 'Cdx2On')
label_layer.color = colormap
In [ ]:
 

Statistical Graphs¶

In [13]:
x=df.groupby(['file', 'Type','Category'])['Ezr'].mean()
r = x.reset_index(level=[0])#.rename(columns={'letters':'count'})
r=r.reset_index(drop=False)

r['sort']=pd.Categorical(r['Type'], sortdict)
r = r.sort_values(by = 'Category', ascending=True)
r = r.sort_values(by='sort')

#r=r.sort_values(by=['Category'])
fig=px.box(r, x='Category', y='Ezr',color='Type', points='all', template='plotly_white',
          hover_data = ['file'], width=800, height=500, )
fig.update_layout(
    font_size=20,
    xaxis_title="Average per cell type per blastocyst",
    yaxis_title="Ezr Count",
    showlegend=True)
fig.show()

#fig.write_html(plotloc + 'Ezrnumbers2.html')
#fig.write_image(plotloc + "Ezrnumbers2.pdf")
In [14]:
rvs1 = r[(r['Type']=='Cdx2') & (r['Category']=='MUT')].Ezr
rvs2 = r[(r['Type']=='Tomato+Cdx2') & (r['Category']=='MUT')].Ezr
stats.ttest_ind(rvs1, rvs2)
Out[14]:
Ttest_indResult(statistic=-4.1323042565999035, pvalue=0.003288011129478401)
In [15]:
rvs1 = r[(r['Type']=='Cdx2') & (r['Category']=='WT')].Ezr
rvs2 = r[(r['Type']=='Tomato+Cdx2') & (r['Category']=='WT')].Ezr
stats.ttest_ind(rvs1, rvs2)
Out[15]:
Ttest_indResult(statistic=-0.5653393628276684, pvalue=0.584303070702666)
In [16]:
x=df.groupby(['file', 'Type','Category'])['Cdx2'].mean()
r = x.reset_index(level=[0])#.rename(columns={'letters':'count'})
r=r.reset_index(drop=False)
r['sort']=pd.Categorical(r['Type'], sortdict)
r=r.sort_values(by='Category',ascending=True)
r = r.sort_values(by='sort')

fig=px.box(r, x='Category', y='Cdx2',color='Type', points='all', template='plotly_white',
          hover_data = ['file'])
fig.update_layout(
    font_size=18,
    xaxis_title="Average per cell type per blastocyst",
    yaxis_title="Cdx2 Count",
    showlegend=True)
fig.show()
fig.write_html(plotloc + 'Cdx2numbers2.html')
fig.write_image(plotloc + "Cdx2numbers2.pdf")
In [17]:
rvs1 = r[(r['Type']=='Cdx2') & (r['Category']=='MUT')].Cdx2
rvs2 = r[(r['Type']=='Tomato+Cdx2') & (r['Category']=='MUT')].Cdx2
stats.ttest_ind(rvs1, rvs2)
Out[17]:
Ttest_indResult(statistic=0.4214450633217043, pvalue=0.6845257312038829)
In [18]:
rvs1 = r[(r['Type']=='Cdx2') & (r['Category']=='WT')].Cdx2
rvs2 = r[(r['Type']=='Tomato+Cdx2') & (r['Category']=='WT')].Cdx2
stats.ttest_ind(rvs1, rvs2)
Out[18]:
Ttest_indResult(statistic=0.8720736614342207, pvalue=0.40362059166513675)
In [19]:
from plotly.subplots import make_subplots
import plotly.graph_objects as go
def semcathy(array):
    return stats.sem(array)
In [20]:
x=df.groupby(['file', 'Type','Category'])['Cdx2'].mean()
r = x.reset_index(level=[0])
r2=r.groupby(['Type','Category'])['Cdx2'].mean().reset_index(drop=False)
r2e = r.groupby(['Type','Category'])['Cdx2'].apply(semcathy).reset_index(drop=False).rename(columns={'Cdx2':'SEM'})
r2 = pd.merge(r2, r2e, left_on=['Type','Category'],right_on=['Type','Category'])
r2['sort']=pd.Categorical(r2['Type'], sortdict)
r2=r2.sort_values(by='Category',ascending=True)
r2 = r2.sort_values(by='sort')
r2wt = r2[r2['Category']=='WT']
r2mut = r2[r2['Category']=='MUT']
In [21]:
fig = make_subplots(rows=1, cols=2, shared_yaxes=True)
fig.add_trace(go.Bar(x=r2wt['Type'],y=r2wt['Cdx2'],
             error_y=dict(type='data', array = r2wt.SEM.values)), row=1, col=1)
fig.add_trace(go.Bar(x=r2mut['Type'],y=r2mut['Cdx2'],
             error_y=dict(type='data', array = r2mut.SEM.values)), row=1, col=2)
fig.update_layout(template='plotly_white', font_size=18,
    xaxis_title="Average per cell type per blastocyst",
    yaxis_title="Cdx2 Count",
    showlegend=False)
fig.write_image(plotloc + 'Cdx2Bars.pdf')
fig.show()
In [22]:
x=df.groupby(['file', 'Type','Category'])['Ezr'].mean()
r = x.reset_index(level=[0])
r2=r.groupby(['Type','Category'])['Ezr'].mean().reset_index(drop=False)
r2e = r.groupby(['Type','Category'])['Ezr'].apply(semcathy).reset_index(drop=False).rename(columns={'Ezr':'SEM'})
r2 = pd.merge(r2, r2e, left_on=['Type','Category'],right_on=['Type','Category'])
r2['sort']=pd.Categorical(r2['Type'], sortdict)
r2=r2.sort_values(by='Category',ascending=True)
r2 = r2.sort_values(by='sort')
r2wt = r2[r2['Category']=='WT']
r2mut = r2[r2['Category']=='MUT']
In [23]:
fig = make_subplots(rows=1, cols=2, shared_yaxes=True)
fig.add_trace(go.Bar(x=r2wt['Type'],y=r2wt['Ezr'],
             error_y=dict(type='data', array = r2wt.SEM.values)), row=1, col=1)
fig.add_trace(go.Bar(x=r2mut['Type'],y=r2mut['Ezr'],
             error_y=dict(type='data', array = r2mut.SEM.values)), row=1, col=2)
fig.update_layout(template='plotly_white', font_size=18,
    xaxis_title="Average per cell type per blastocyst",
    yaxis_title="Ezr Count",
    showlegend=False)
fig.write_image(plotloc + 'EzrBars.pdf')
fig.show()
In [ ]:
 
In [ ]:
 
In [ ]: