\documentclass[12pt]{iopart}
%Uncomment next line if AMS fonts required
\usepackage{iopams}
\usepackage{amssymb}
\usepackage{amsfonts}
\usepackage{graphics}
\usepackage{amsthm}
%\usepackage{amsmath}
\usepackage{graphicx}
\usepackage{pifont}
\usepackage[]{algorithm2e}
\usepackage{pdflscape}
\usepackage{floatpag,mwe}
\usepackage{changepage}
\usepackage{adjustbox}
\usepackage{parskip}
\usepackage{harvard}
\bibliographystyle{dcu}
\begin{document}
\title[Sparse coupled logistic regression for dynamic FC mapping]{Sparse coupled logistic regression to estimate co-activation and modulatory influences of brain regions}
\author{Thomas A. W. Bolton$^{1,2}$ \& Dimitri Van De Ville$^{1,2}$}
\address{$^1$ Institute of Bioengineering, \'{E}cole Polytechnique F\'{e}d\'{e}rale de Lausanne (EPFL), Lausanne, Switzerland \\ $^2$ Department of Radiology and Medical Informatics, University of Geneva (UNIGE), Geneva, Switzerland}
\ead{thomas.bolton@epfl.ch}
\vspace{10pt}
\begin{indented}
\item[]November 2019
\end{indented}
\begin{abstract}
Dynamic analyses of resting-state functional magnetic resonance imaging data typically describe the brain either as a set of co-activating networks, or through causal couplings between individual regions. Here, we introduce a sparse coupled logistic regression framework that jointly estimates, for each brain region, co-activation coefficients (same-time influences from other regions) and causal modulatory coefficients (influences from the previous time point). An $\ell_1$ regularisation scheme with a dedicated trade-off parameter balances the sparsity of the two coefficient sets. We validate the framework on simulated data containing networks, independently evolving regions and hub areas, and apply it to resting-state recordings from the Human Connectome Project.
\end{abstract}
\vspace{2pc}
\noindent{\it Keywords}: dynamic functional connectivity, effective connectivity, logistic regression, $\ell_1$ regularisation
% Uncomment for Submitted to journal title message
\submitto{\JNE}
% Uncomment if a separate title page is required
%\maketitle
% For two-column output uncomment the next line and choose [10pt] rather than [12pt] in the \documentclass declaration
%\ioptwocol
\clearpage
%%%%%%%%%%% INTRODUCTION
\section{Introduction}
Understanding the structural wiring of the brain at its most global scale, and how information flows between remote processing centres, is essential to shed light on higher-order behaviours involving multi-modal integration, and on associated brain disorders. When it comes to functional magnetic resonance imaging (fMRI), the mapping of brain function is commonly performed from resting-state (RS) recordings through the computation of \textit{functional connectivity} (FC), that is, the statistical interdependence between different time courses reflective of regional activity~\cite{Friston1994b}, as assessed from an array of possible measures~\cite{Smith2010}. This approach has revealed the presence of a set of RS networks (RSNs)~\cite{Damoiseaux2006,Power2010,Yeo2011}, whose properties are critical landmarks of brain function and cognition~\cite{Bressler2010,VanDenHeuvel2010}.
Over the past decade, it has become increasingly clear that quantifying FC between two brain regions as a single scalar for a full scanning session is an overly simplistic approach that does not characterise the numerous reconfigurations occurring at the time scale of seconds~\cite{Chang2010}. Accordingly, many methodological pipelines have been developed to investigate time-resolved FC and map brain function dynamically (see~\cite{Preti2017,Lurie2018} for contemporary reviews).
The best-known family of dynamic approaches simplifies the originally voxel-wise fMRI data into a state-level representation: first, FC is computed over successive temporal sub-windows, and the data concatenated across the full subject population at hand is subjected to hard clustering to yield a set of dynamic FC (dFC) states~\cite{Allen2014,Damaraju2014}. Because spatial Independent Component Analysis (ICA) is typically performed prior to clustering, each state stands for a set of RSNs showing specific correlational relationships.
In other analytical schemes, whole-brain voxel-wise activity~\cite{Liu2013b}, or activity transients~\cite{Karahanoglu2015}, undergo clustering instead of FC patterns; in this case, each of the retrieved centroids directly stands for an RSN. If temporal ICA is applied after spatial ICA, temporally mutually independent RSNs are retrieved~\cite{Smith2012}. Finally, the use of a hidden Markov model (HMM) also enables the derivation of RSNs, represented as (sparse) FC patterns~\cite{Eavani2013,Vidaurre2017} or vectors of activation~\cite{Chen2016d}.
In all the above cases, there is the underlying assumption that the raw fMRI data can be downscaled to a set of RSNs, and that the dynamics of brain function should be understood from this simplified starting point. Recent results, however, question the validity of this assumption: for instance, some brain regions do not remain attached to the same network throughout a scanning session, but instead adjust their modular allegiance over time in a way that relates to cognitive performance~\cite{Chen2016,Pedersen2018b}. In addition, brain regions or networks also morph spatially over time~\cite{Kiviniemi2011,Kottaram2018,Iraji2019}.
To capture these spatially more subtle reconfigurations, novel methodologies have attempted to operate at the regional scale, and the assessment of \textit{causal} relationships (\textit{i.e.}, from time $t$ to $t+1$) between distinct areas has shown particular merit as an alternative conceptualisation of RS functional brain dynamics, be it through autoregressive models~\cite{Liegeois2017,Lennartz2018} or Ornstein-Uhlenbeck processes~\cite{Gilson2016}.
At present, there are thus two conceptually discrepant ways to view RS dFC: on the one hand, expressing it as sets of simultaneously activating regions that make networks, and on the other hand, viewing it as effective connectivity between individual areas. It remains to be determined which of these two viewpoints offers the best representation of RS dynamics, and whether they describe overlapping or distinct facets of the data.
In this work, we attempt to progress in answering these questions by developing a novel methodological framework that jointly estimates sets of co-activations and causal couplings between individual brain regions. A dedicated parameter also modulates the trade-off in data fitting between these two viewpoints.
\clearpage
%%%%%%%%%%% METHODS
\section{Materials and Methods}
\subsection{Mathematical framework}
Let us denote the activity of a region $r$ (out of $R$ in total) at time $t$ as $h_t^{(r)}$. We hypothesise two possible states of activity: \textit{baseline} ($h_t^{(r)}=0$) or \textit{active} ($h_t^{(r)}=+1$). Each region may interact with all the other areas $s\neq r$ in two ways: (1) showing simultaneous activity (that is, episodes of co-activation), or (2) being causally modulated. To jointly describe these two phenomena, we characterise the probability that a region $r$ switches between activity states with a logistic regression~\cite{Friedman2010}:
\begin{equation}
\left\{
\begin{array}{ll}
\mathcal{P}(h_{t+1}^{(r)}=+1 | h_{t}^{(r)}=0, \mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}}) = \frac{1}{1 + e^{-(\alpha_A^{(r)} + \boldsymbol{\gamma}_{A}^{(r)\top} \mathbf{h}_{t+1}^{\mathbf{(-r)}} + \boldsymbol{\beta}_{A}^{(r)\top} \mathbf{h}_{t}^{\mathbf{(-r)}})}}\\
\mathcal{P}(h_{t+1}^{(r)}=0 | h_{t}^{(r)}=+1, \mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}}) = \frac{1}{1 + e^{-(\alpha _D^{(r)}+ \boldsymbol{\gamma}_{D}^{(r)\top} \mathbf{h}_{t+1}^{\mathbf{(-r)}} + \boldsymbol{\beta}_{D}^{(r)\top} \mathbf{h}_{t}^{\mathbf{(-r)}})}}
\end{array}.
\right.
\end{equation}
The baseline-to-active transition is modelled by the first equation, while the return to baseline from an active state is governed by the second. Associated coefficients are respectively written with the $\cdot_{A}$ and $\cdot_{D}$ subscripts. In what follows, for the sake of clarity, we will omit these subscripts and only show one set of equations, as the formulations are strictly equivalent for both types of transitions.
If all other regions are at a baseline level of activity at the start ($\mathbf{h}_{t}^{\mathbf{(-r)}}=\mathbf{0}$) and end ($\mathbf{h}_{t+1}^{\mathbf{(-r)}}=\mathbf{0}$) of the transition, only the scalar coefficient $\alpha^{(r)}$ plays a role in shaping the transition likelihood. The vector $\boldsymbol{\gamma}^{(r)}\in\mathbb{R}^{R-1}$ contains the co-activation coefficients for all regions $s\neq r$: positive-valued coefficients will enhance the likelihood of the transition of interest if $h_{t+1}^{(s)}=+1$ (that is, if regions $r$ and $s$ are co-active at time $t+1$). Negative-valued coefficients will, likewise, reduce the transition probability. The reasoning is similar for the vector $\boldsymbol{\beta}^{(r)}\in\mathbb{R}^{R-1}$, except that a modulatory effect is then exerted if $h_t^{(s)}=+1$ (\textit{i.e.}, region $s$ is active before the transition, resulting in a causal modulation instead of a co-activation).
If the above pair of equations is considered for each brain region, the resulting coefficients can be arranged in two types of matrices, where the $r$\textsuperscript{th} column contains the influences onto region $r$ (diagonal elements are left empty): one reflects co-activations, and will be termed $\mathbf{\Gamma}$; the other symbolises causal modulations, and will be referred to as $\mathbf{B}$. $\mathbf{\Gamma}$ and $\mathbf{B}$ can respectively be interpreted as equivalents of the functional and effective connectomes. An overview of our framework is provided in Figure~\ref{JNE_F1}.
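For illustration, the transition probability defined by the logistic equations above can be sketched as follows (a minimal Python sketch with illustrative names, not an excerpt of our actual implementation):

```python
import numpy as np

def transition_probability(alpha, gamma, beta, h_other_t, h_other_t1):
    """Probability that region r undergoes the transition of interest
    between times t and t+1, given the binary states of the other regions.

    alpha : float            intrinsic transition coefficient
    gamma : (R-1,) array     co-activation coefficients (same-time influences)
    beta  : (R-1,) array     causal coefficients (influences from time t)
    h_other_t, h_other_t1 : (R-1,) binary arrays of the other regions' states
    """
    x = alpha + gamma @ h_other_t1 + beta @ h_other_t
    return 1.0 / (1.0 + np.exp(-x))
```

With all other regions at baseline, the probability reduces to the sigmoid of $\alpha^{(r)}$ alone; a positive co-activation coefficient for a region active at $t+1$ increases it, as described above.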
%%%%%%%%%%%%%
\begin{figure}[h!]
\centering
\includegraphics[width=1.0\textwidth]{Figures/JNE_FIG1.eps}
\caption[Overview of the framework]{\textbf{Overview of the framework.} \textbf{(A)} Example activity time courses for a set of 14 regions; each can transition between a baseline state of activity (symbolised by a grey circle) and an active state (red circle). The green, salmon and blue underlays highlight the regions that belong to the same RSN, and thus exhibit similar transition dynamics. Regions 10 to 12 evolve according to their own dynamics, independently of all the others. As for regions 13 and 14, they are \textit{hubs} that belong to two networks at a time (as rendered by the mixed colour underlay), and thus turn active as soon as one of their affiliated networks does. \textbf{(B)} Coefficient matrices associated with the example presented in {(A)} for co-activations (top row) and causal modulations (bottom row). The left column pertains to the transition from the baseline to the active state: a positive-valued coefficient at position $(l,r)$ means that when region $l$ is active, it enhances the likelihood of a transition for region $r$ at the same time point (for co-activations) or one time point later (for causal modulations). The middle column similarly characterises transitions from the active to the baseline state; thus, modulations that enhance the overall activity of an area are here reflected by negative-valued coefficients (\textit{i.e.}, the probability of a decrease in activity is lowered). The right column yields total influences summed across both transition types. \textbf{(C)} To solve the framework, optimal regularisation parameters $\lambda$ and $\xi$ are first determined by extracting local maxima of the full likelihood across regions and transition types (top box). Then, co-activation and causal coefficients are computed for each region $r$ and transition type (middle box). Finally, the likelihood of switching activity state can be compared with and without an external region's influence, to compute a pair-wise probabilistic modulation coefficient (bottom box). R: region. N: network.}
\label{JNE_F1}
\end{figure}
%%%%%%%%%%%%%
The concomitant modelling of co-activations and causal modulations enables the joint derivation of the two sets of coefficients. Given that the resting brain is often described as a series of RSNs~\cite{Damoiseaux2006,Power2010,Yeo2011}, we expect $\boldsymbol{\Gamma}$ to only contain a sparse subset of non-null entries. Similarly, only a restricted number of areas or networks are expected to causally modulate each other~\cite{Christoff2016,Bolton2017b}. To fit these neurobiological priors, while also enabling convergence of the framework with fewer data points, we added an $\ell_1$ regularisation term:
\begin{equation}
\xi ||\boldsymbol{\gamma}^{(r)}||_1 + (1-\xi) ||\boldsymbol{\beta}^{(r)}||_1 < \rho \quad \forall \quad r = 1,...,R.
\end{equation}
In the above, the parameter $\rho$ controls the extent of regularisation cast on all coefficients (it is associated with an inversely proportional parameter $\lambda$ in the optimisation problem detailed below). The parameter $\xi$ balances the extent to which the co-activation and causal sets are regularised: if $\xi=0$, regularisation only operates on causal coefficients, while if $\xi=1$, only co-activation coefficients are made sparse. This respectively amounts to a description of regional brain dynamics where co-activations, or causal influences, dominate.
\subsection{Implementation}
Solving the above set of coupled logistic regression equations requires that the activity levels of all regions be known. To binarise the input time courses, we individually z-score each, and set to 1/0 the time points with a value above/below 0. While binarisation may discard part of the information present in the original data, it has been used in recently developed methodological pipelines~\cite{Kang2019}. In the discussion, we touch upon possibilities to make the framework amenable to more than two states of activity.
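The binarisation step can be sketched as follows (illustrative Python, assuming the time courses are arranged as a time-by-region array):

```python
import numpy as np

def binarise(time_courses):
    """Z-score each region's time course, then set time points above 0
    to 1 (active) and the others to 0 (baseline).

    time_courses : (T, R) array, T time points for R regions.
    """
    z = (time_courses - time_courses.mean(axis=0)) / time_courses.std(axis=0)
    return (z > 0).astype(int)
```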
After defining the activation states, initial parameter estimates can be computed. Co-activation and modulatory coefficients are all set to 0, and intrinsic transition probabilities are estimated by a standard HMM approach~\cite{Rabiner1989}.
Following~\cite{Friedman2010}, in a regularised logistic regression, one attempts to solve the following:
\begin{equation}
\min_{\alpha^{(r)},\boldsymbol{\gamma}^{(r)},\boldsymbol{\beta}^{(r)}} -\mathcal{L}^{(r)}(\alpha^{(r)},\boldsymbol{\gamma}^{(r)},\boldsymbol{\beta}^{(r)}) + \lambda (\xi ||\boldsymbol{\gamma}^{(r)}||_1 + (1-\xi) ||\boldsymbol{\beta}^{(r)}||_1),
\end{equation}
where $r$ is the assessed region, and the log-likelihood is approximated as:
\begin{equation}
\mathcal{L}^{(r)}(\alpha^{(r)},\boldsymbol{\gamma}^{(r)},\boldsymbol{\beta}^{(r)}) = - \frac{1}{2|\mathcal{T}|} \sum_{t\in\mathcal{T}} \omega_{t} (z_{t} - \alpha^{(r)} - \boldsymbol{\gamma}^{(r)\top} \mathbf{h}_{t+1}^{\mathbf{(-r)}} - \boldsymbol{\beta}^{(r)\top} \mathbf{h}_{t}^{\mathbf{(-r)}})^2 + C.
\end{equation}
The ensemble $\mathcal{T}$ contains all the data points for which the probed region is in the start state of interest at time $t$ (\textit{e.g.}, baseline for the baseline-to-active transitions), and $C$ is a constant. If we denote the probability of the transition of interest as $p(\alpha^{(r)},\boldsymbol{\gamma}^{(r)},\boldsymbol{\beta}^{(r)},\mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}})$, the parameters $\omega_t$ and $z_t$ depend on the current estimates of the coefficients---which we denote with a tilde---as:
\begin{equation}
\left\{
\begin{array}{ll}
\omega_t = p(\tilde{\alpha}^{(r)},\boldsymbol{\tilde{\gamma}}^{(r)},\boldsymbol{\tilde{\beta}}^{(r)},\mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}})-p(\tilde{\alpha}^{(r)},\boldsymbol{\tilde{\gamma}}^{(r)},\boldsymbol{\tilde{\beta}}^{(r)},\mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}})^2 \\
z_{t} = \tilde{\alpha}^{(r)} + \boldsymbol{\tilde{\gamma}}^{(r)\top} \mathbf{h}_{t+1}^{\mathbf{(-r)}} + \boldsymbol{\tilde{\beta}}^{(r)\top} \mathbf{h}_{t}^{\mathbf{(-r)}} + \frac{y_t - p(\tilde{\alpha}^{(r)},\boldsymbol{\tilde{\gamma}}^{(r)},\boldsymbol{\tilde{\beta}}^{(r)},\mathbf{h}_{t}^{\mathbf{(-r)}}, \mathbf{h}_{t+1}^{\mathbf{(-r)}})}{\omega_t}
\end{array}.
\right.
\end{equation}
The variable $y_t$ encodes whether there was a change in activity level from time $t$ to $t+1$ (respectively, $y_t = 1$ or $y_t = 0$). Coefficients are iteratively estimated by a coordinate-wise descent algorithm, following~\cite{Friedman2007}: the initial estimates outlined above are used at the maximal regularisation level $\lambda_{MAX}$, and individual coefficients are successively re-estimated in random order (note that for $\alpha^{(r)}$ coefficients, which do not enter the $\ell_1$ regularisation term, soft shrinkage is not required). The process continues until the change across two iterations falls below a defined tolerance threshold $\epsilon$. The next regularisation level is then considered, using warm restarts to speed up computations (\textit{i.e.}, the estimates obtained at the end of a regularisation cycle are used as initial values for the following one).
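The coordinate-wise updates can be sketched as follows (a schematic Python rendition of the generic soft-thresholded update of~\cite{Friedman2007,Friedman2010}, with illustrative names, not our actual code):

```python
import numpy as np

def soft_threshold(x, thr):
    """Soft-shrinkage operator applied to the l1-penalised coefficients."""
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def coordinate_update(j, X, z, w, coef, lam):
    """One coordinate-wise update of the penalised weighted least-squares
    problem arising from the quadratic approximation.

    X    : (T, p) design matrix (here, the other regions' binary states)
    z, w : working responses and weights from the current estimates
    coef : current coefficient vector (updated in place)
    lam  : effective l1 penalty applied to this coefficient
    """
    r = z - X @ coef + X[:, j] * coef[j]   # partial residual excluding j
    num = np.mean(w * X[:, j] * r)
    den = np.mean(w * X[:, j] ** 2)
    coef[j] = soft_threshold(num, lam) / den
    return coef
```

At strong regularisation the numerator falls below the threshold and the coefficient is exactly zeroed, which produces the sparse coefficient sets sought here.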
In all the analyses performed in this work, we considered a regularisation path with $\lambda\in[10000,0.02]$ (206 logarithmically distributed values), compared five levels of trade-off between co-activation and causal coefficients ($\xi=\{0,0.25,0.5,0.75,1\}$), and used a tolerance $\epsilon=10^{-40}$.
\subsection{Validation of the framework on simulated data}
We first sought to validate our pipeline on simulated data containing cross-regional causal modulations as well as co-activations. To do so, we considered parameters resembling those of the assessed experimental data (see the following section) as much as possible. We simulated activity time courses for $R=45$ regions, for a total of $S=135$ subjects and $T=1190$ time points per subject.
To design our simulations in accordance with the RS literature~\cite{Yeo2011}, we considered the presence of $N=7$ separate RSNs, each of which could contain between 4 and 7 areas. Time courses for all regions belonging to the same network were similar (prior to the addition of noise). In addition, we also included a set of areas evolving according to their own, independent dynamics; since in such a setting, no co-activation or causal coefficients should be retrieved, these regions can be regarded as a negative control. Furthermore, a few regions were also set as \textit{hubs} that jointly belong to two networks, and activate as soon as one of the networks turns on. Figure~\ref{JNE_F2}C (top left matrix) shows the ground truth co-activation relationships between the set of simulated regions.
Each simulated dynamics was associated with a probability of switching from the baseline to the active state, selected uniformly in the [0.2,0.5] interval. Similarly, the probability of transitioning from the active to the baseline state was selected uniformly in the [0.7,0.9] interval. Causal modulations were introduced between a subset of networks, as summarised in Figure~\ref{JNE_F2}C (top right matrix): when a modulating network turned active, it could enhance the activity of the modulated network (both by enhancing the likelihood of a 0 to +1 transition, and reducing that of a +1 to 0 one), as symbolised by a positive-valued causal coefficient, or decrease that activity, as reflected by a negative-valued element. We used a shift in transition probability $\Delta P=0.6$.
Finally, all time courses were corrupted with Gaussian noise of standard deviation $\sigma=2$; indicative time courses for a simulated subject are presented in Figure~\ref{JNE_F2}A, where the noise level is such that cross-regional relationships cannot be inferred by eye.
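The generation of one network's activity, as described above, amounts to a two-state Markov chain and can be sketched as follows (illustrative Python; in our simulations, the two transition probabilities are drawn from [0.2,0.5] and [0.7,0.9], respectively):

```python
import numpy as np

def simulate_network(T, p_on, p_off, rng):
    """Binary activity of one network as a two-state Markov chain.

    p_on  : probability of a baseline-to-active transition
    p_off : probability of an active-to-baseline transition
    """
    h = np.zeros(T, dtype=int)
    for t in range(1, T):
        if h[t - 1] == 0:
            h[t] = int(rng.random() < p_on)
        else:
            h[t] = int(rng.random() >= p_off)
    return h
```

All regions of a network then share this time course, Gaussian noise being added afterwards.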
To assess the ability of the framework to recover the ground truth, we computed Pearson's spatial correlation coefficient between ground truth and estimated coefficients, separately for the co-activation and causal sets, and contrasted these similarity measures to the evolution of the log-likelihood of the data. In addition, we examined whether the information contained in the co-activation coefficients was sufficient to re-order the regions into their underlying networks, by computing Ward's linkage on probabilistic co-activation values (see Figure~\ref{JNE_F1}C, bottom box).
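The re-ordering of regions into networks via Ward's linkage can be sketched as follows (illustrative Python using SciPy; the conversion of co-activation values into distances is one possible choice):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_regions(coactivation, n_networks):
    """Group regions into networks from a symmetric matrix of
    (probabilistic) co-activation values via Ward's linkage.
    Larger co-activation means closer, so similarity is first
    converted into a distance."""
    d = np.max(coactivation) - coactivation
    np.fill_diagonal(d, 0.0)
    Z = linkage(squareform(d, checks=False), method='ward')
    return fcluster(Z, t=n_networks, criterion='maxclust')
```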
\subsection{Application of the framework to experimental fMRI data}
We applied our framework to experimental RS fMRI data from the \textit{Human Connectome Project}~\cite{VanEssen2013}. We considered one scanning session of $T=1190$ time points for $S=135$ subjects. The data was acquired at a fast TR of 720 ms, at a spatial resolution of 2 $\times$ 2 $\times$ 2 mm\textsuperscript{3}; additional acquisition details can be found elsewhere~\cite{Smith2013}.
We started from the publicly available minimally preprocessed data. Each voxel-wise time course was detrended, and constant, linear and quadratic trends were regressed out at the same time as a Discrete Cosine Transform basis (cutoff frequency: 0.01 Hz). We chose not to perform global signal regression, since it remains a debated preprocessing step~\cite{Murphy2017}, and in light of recent results showing extensive relationships between spatio-temporal motion patterns and human behaviour~\cite{Bolton2019c}, also decided not to include individual motion time course regressors (note that motion is handled by conservative scrubbing at a later stage of the pipeline---see below).
Voxel-wise time courses were averaged into 90 regions of interest defined from the AAL atlas~\cite{TzourioMazoyer2002}; although more accurate parcellation schemes have been introduced~\cite{Glasser2016,Schaefer2017}, they involve a larger number of brain regions and would thus require more data than available for accurate estimation. As the main goal of the present report is the introduction of our framework, rather than its application to neurobiologically relevant questions, we opted to operate at the smaller AAL scale.
As a final preprocessing step, scrubbing was performed at a framewise displacement threshold~\cite{Power2012} of 0.3 mm, and discarded frames were re-estimated by cubic spline interpolation.
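The re-estimation of scrubbed frames can be sketched as follows (illustrative Python using SciPy's cubic spline):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def interpolate_scrubbed(ts, keep):
    """Re-estimate scrubbed (high-motion) frames by cubic spline
    interpolation through the retained frames.

    ts   : (T,) regional time course
    keep : (T,) boolean mask, True for frames below the FD threshold
    """
    t = np.arange(len(ts))
    spline = CubicSpline(t[keep], ts[keep])
    out = ts.copy()
    out[~keep] = spline(t[~keep])
    return out
```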
To assess the reproducibility of our findings, we separately applied our framework to each hemisphere of the brain; in each case, co-activations and causal modulations were thus estimated between $R=45$ separate areas.
\clearpage
\section{Results}
\subsection{Validation of the framework on simulated data}
Figure~\ref{JNE_F2} displays the results of our simulations.
%%%%%%%%%%%%%
\begin{figure}[h!]
\centering
\includegraphics[width=1.0\textwidth]{Figures/JNE_FIG2.eps}
\caption[Results on simulated data]{\textbf{Results on simulated data.} \textbf{(A)} AAA.}
\label{JNE_F2}
\end{figure}
%%%%%%%%%%%%%
\subsection{Application of the framework to experimental fMRI data}
\clearpage
\section{Discussion}
Future work could model innovations instead of changes between baseline and active states: the probability of undergoing an innovation would then be considered. An advantage is that results from different datasets could then be bridged more readily, even if acquired at different TRs: indeed, one could consider whether an innovation occurred at time $t-1$, $t-2$, $t-3$...
Here, we make the assumption that the 0 to +1 and +1 to 0 transitions mirror each other (that is, if a region modulates another, it will boost one transition and decrease the other). Our framework already enables the investigation of more subtle interplays, simply by keeping the $\cdot_A$ and $\cdot_D$ cases separate instead of combining them. For instance, a given network may only modulate another when the latter is at baseline.
We considered the AAL atlas owing to the limited amount of available data, but the framework can, in theory, readily be applied to a larger number of regions. In fact, our simulations indicated that the regularisation component of the framework provided limited benefit at the present scale; working on more regions with a similar number of data points may make it a more valuable component.
The hope is that in follow-up work, this approach will enable the study of brain disorders; for this purpose, bootstrapping could be conducted on both subject populations, and coefficient distributions (or probabilistic modulations) compared statistically. The assessment of behavioural differences is also of interest, but harder to achieve: tailored ways to derive subject-level estimates despite the low amount of available data would then need to be developed.
This approach is inspired by a former SCHMM pipeline in which network time courses were modelled. The differences are that here, only two states are considered, and networks no longer need to be defined \textit{a priori} (the primary motivation for the present work). In addition, the regularisation strategy is more elegant owing to the two parameters and to the biological relevance of $\xi$.
There is the option of appending an elastic net term to the current regularisation strategy, which may help remove the noise otherwise remaining in the case closest to the ground truth: indeed, more coefficients would be non-null (and closer in value) at stronger regularisation levels. This, however, introduces another parameter to optimise. All the scripts used in this work can be accessed online at [LINK].
An open question is the trade-off between co-activation and causal relationships observed in the real data.
More speculative ideas for future work include: (1) applying this framework to hyperscanning data to probe co-activation and causal links between two interacting subjects, and (2) applying it between concomitantly acquired fMRI and EEG data (an advantage of hidden states as a representational approach); in the latter case, an equivalent temporal resolution between modalities would need to be defined.
\clearpage
\bibliography{papers_library}
\end{document}
