%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Thomas Bolton
% Date: 23/10/2019
% Description:
% This script runs motion and behavior analyses on an extended set of
% subjects from the Human Connectome Project, in an updated fashion to
% answer the comments from the Reviewers
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% 1. Loading some of the data and setting paths

% SS contains the parameters for convergence of the probabilistic PCA
% algorithm
load SS

% Adds paths to functions and utilities
addpath(genpath(fullfile(pwd,'GlobalUtilities')));
addpath(genpath(fullfile(pwd,'Functions')));
addpath(genpath('/Users/TiBiUan/Desktop/Github/IBI_Application/HCP_General'));
addpath(genpath('/Users/TiBiUan/Desktop/Github/MOT_ANA/Behavioral_Data_Extended'));

% Do we want to plot stuff or not
is_plot = 1;

%% 4. Flagging scrubbed time points

% Threshold to use for scrubbing (i.e., separating the scrubbed frames
% from the non-scrubbed ones)
T_scrubbing = 0.5;

n_subjects = size(FD_ALL,2);

% Will mask which frames to keep or not to keep for a given subject
TemporalMask_All = [];

for s = 1:n_subjects
    TemporalMask_All(:,s) = GetTemporalMask(FD_ALL(:,s),T_scrubbing,1);
end

%% 5. Computing processed motion information (spatiotemporal features)

% Number of time bins that we want to split the motion data into (6 is the
% recommended value)
n_TB = 2;

% Computation of the motion parameters: Motion_Matrix will contain the
% results for session 1
[Motion_Matrix,Motion_Matrix_Pos,Motion_Matrix_Neg,Space_Labels,...
    Time_Labels,Session_Labels] = MOT_ANA_ComputeParameters(MM,TemporalMask_All,n_TB);

%% 6. Conversion into "relative to population" values

% Note that we use both session data in this step: indeed, we are
% interested in keeping them in the same space
Motion_Matrix = Motion_Matrix(idx_motion&idx_behavior,:);
n_subjects = size(Motion_Matrix,1);

[MOTION_VAR] = MOT_ANA_Reftopop(Motion_Matrix,'Popref');

%% 7. Low-dimensional representation of the spatiotemporal motion data:
% determination of the optimal number of clusters for the combined data

% 10 nearest neighbours for spectral clustering
n_NN = 10;

% Construction of the adjacency matrix reflecting my graph, using
% cosine distance (similar results are obtained using Euclidean
% distance): note that the graph is constructed on both session data points
W = construct_knn_graph(MOTION_VAR,n_NN,'cosine');

% Then, I compute the first three dimensions of my new space in which I
% express my data: first I compute the normalized Laplacian of my graph,
% and second, I do an eigenvalue decomposition on this. X, Y and Z are the
% second, third and fourth eigenvectors resulting from the process (that
% is, the "low frequency information" along the graph that links similar
% subjects together)
[X,Y,Z,E1,E2,E3] = compute_non_linear_dim_reduction(W);
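% ----------------------------------------------------------------------
% Hedged sketch (an assumption, not the actual helper code, which is not
% shown here): compute_non_linear_dim_reduction is assumed to follow the
% standard Laplacian eigenmap recipe described in the comment above.
% Names ending in "_sketch" are illustrative only; the adjacency matrix
% is symmetrized for safety before building the normalized Laplacian.
W_sketch = (full(W) + full(W)')/2;                       % symmetric adjacency
d_sketch = sum(W_sketch,2);                              % node degrees
Dis_sketch = diag(1./sqrt(d_sketch));                    % D^(-1/2)
L_sketch = eye(size(W_sketch,1)) - Dis_sketch*W_sketch*Dis_sketch;
[EVec_sketch,EVal_sketch] = eig(L_sketch);
[~,ord_sketch] = sort(diag(EVal_sketch),'ascend');
% The first eigenvector is trivial; the next three carry the low-frequency
% structure along the graph used as the X, Y and Z coordinates
X_sketch = EVec_sketch(:,ord_sketch(2));
Y_sketch = EVec_sketch(:,ord_sketch(3));
Z_sketch = EVec_sketch(:,ord_sketch(4));
% ----------------------------------------------------------------------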
% Of course, we want to find the optimal data parcellation, and so
% we must derive an optimal number of clusters to use. For this, we
% resort to Consensus Clustering (Monti et al., Machine Learning, 2003),
% run here on the combined low-dimensional data
[Consensus] = CAP_ConsensusClustering({[X,Y,Z]'},2:50,'items',0.8,100,'cosine');

% Consensus clustering matrices for K ranging from 2 to 50
% if is_plot
%     figure;
%     set(gcf,'color','w');
%
%     for j = 1:36
%
%         subplot(6,6,j);
%         set(gca,'Box');
%         axis square
%
%         imagesc(squeeze(Consensus(:,:,j)));
%
%     end
%
% end

% Calculates the quality metrics to determine the optimal value of
% K: the proportion of ambiguous clustering (PAC) is used
[~,Lorena] = ComputeClusteringQuality(Consensus,2:50);

% Assessment of the results
if is_plot
    figure;
    bar(Lorena);
end

%% 8. Analysis and displays for the chosen, whole data

% Based on the output metrics, the optimal cluster number is
n_clusters = 7;

% Clustering is performed with this optimal value
[IDX] = kmeans([X,Y,Z],n_clusters,'Distance',...
    'cosine','Replicates',10000,'Start','sample');

% Figure 1: representation of movers in nonlinearly reduced space
% (paper figure and representation as cluster members)
if is_plot

    % Plotting of the whole data
    %[F1,F2,F3,F4,F5,F6] = MOTANA_OMFGBBQPlotSimple(MOTION_VAR,X,Y,Z,n_subjects,1.5,500,1,Time_Labels,Space_Labels,'RGB',IDX,[]);
    [F1,F2,F3,F4,F5,F6] = MOTANA_OMFGBBQPlotSimple_UCLA(MOTION_VAR,X,Y,Z,size(MOTION_VAR,1),1.5,500,3,Time_Labels,Space_Labels,'Groups',IDX,[]);
    [F1,F2,F3,F4,F5,F6] = MOTANA_OMFGBBQPlotSimple_UCLA(MOTION_VAR,X,Y,Z,size(MOTION_VAR,1),1.5,500,3,Time_Labels,Space_Labels,'Groups',Diag,[]);
end

C_UCLA = zeros(n_clusters,6*n_TB);

for k = 1:n_clusters
    C_UCLA(k,:) = mean(MOTION_VAR(IDX==k,:));
end

C_UCLA_K4 = zeros(n_clusters,6*n_TB);

for k = 1:n_clusters
    C_UCLA_K4(k,:) = mean(MOTION_VAR_UCLA(IDX_K4==k,:));
end

%% 9. Summary of motion along time and space

% Observation: we find similar clusters, although K=4 is a less "clear"
% pick in the latter dataset case
MOT_ANA_SummariseMotion(MOTION_VAR,IDX,Time_Labels,Space_Labels);

[RealCorrs,NullCors] = MOT_ANA_CompareClusters(C_HCP,C_UCLA,'MSE');
[RealCorrs,NullCors] = MOT_ANA_CompareClusters(C_HCP,C_UCLA_K4,'MSE');

%% 12. Preparation of the behavioural data (normalization of each column)

Output = Behav(idx_motion&idx_behavior,:);

% Normalize each column (keeping missing values as NaN)
O2 = Output - repmat(nanmean(Output,1),n_subjects,1);
O2 = O2 ./ repmat(nanstd(O2,1),n_subjects,1);

% Percentage of missing values
P_missing = sum(sum(isnan(O2)))/size(O2,1)/size(O2,2)*100;

PCA = O2;

%% 15. Univariate assessment of the derived anthropometric and
% behavioral features' link to motion (whole dataset and separate assessments)

% We compute the Fisher score (and related null distribution when shuffling
% behavior and motion entries with respect to each other) for each feature
% of interest
[F,Fnull] = MOT_LinkBehavMotion(PCA(:,[1:4,6:9,12:17,28,29]),IDX);

% Bonferroni correction for the number of univariate assessments and
% subsequent derivation of significance thresholds (alpha is expressed
% in percent)
alpha = 5/size(PCA(:,[1:4,6:9,12:17,28,29]),2);
Fthresh = squeeze(prctile(Fnull,100-alpha,2));
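% ----------------------------------------------------------------------
% Hedged sketch (an assumption, not the actual MOT_LinkBehavMotion code):
% for a single behavioral feature, a permutation null of this kind can be
% built by shuffling the cluster assignments; a one-way ANOVA F statistic
% is used below as a stand-in for the Fisher score. Names ending in
% "_sketch" are illustrative only.
feat_sketch = PCA(:,1);                        % example feature (NaNs ignored by anova1)
n_perm_sketch = 1000;
Fnull_sketch = zeros(n_perm_sketch,1);

for p_it = 1:n_perm_sketch
    perm_IDX = IDX(randperm(length(IDX)));     % break the feature/cluster link
    [~,tbl_sketch] = anova1(feat_sketch,perm_IDX,'off');
    Fnull_sketch(p_it) = tbl_sketch{2,5};      % F statistic of the shuffled data
end

Fthresh_sketch = prctile(Fnull_sketch,100-alpha);  % Bonferroni-corrected threshold
% ----------------------------------------------------------------------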
% Another attempt, using the proposal by Holmes and Nichols (Human Brain
% Mapping, 2001): we compute a distribution based on the maximum of the
% null scores across features (within each permutation), and then
% threshold it at 5%
tmp_F = max(Fnull,[],1);
F_Holmes = prctile(tmp_F,95);

% Yet another attempt using FDR correction
for i = 1:size(PCA(:,[1:4,6:9,12:17,28,29]),2)
    tmp = Fnull(i,:);
    tmpF = F(i);
    pval(i) = sum(tmp > tmpF)/length(tmp);
end

% FDR correction
[h, crit_p] = fdr_bh(pval,0.05);

% Determines the threshold level matching crit_p
for i = 1:size(PCA(:,[1:4,6:9,12:17,28,29]),2)
    tmp = Fnull(i,:);
    F_FDR(i) = prctile(tmp,100 - 100*crit_p);
end

% Plotting of univariate results
figure;
bar(F);
hold on;

for i = 1:size(PCA(:,[1:4,6:9,12:17,28,29]),2)
    plot([i-0.4,i+0.4],[Fthresh(i),Fthresh(i)],'color',[0.3 0.3 0.3],'LineWidth',1.5);
    plot([i-0.4,i+0.4],[F_FDR(i),F_FDR(i)],'color',[0.6 0.6 0.6],'LineWidth',1.5);
end

%%

% Indices of the significant variables
idx_uni = [1,7];

% For each, post-hoc comparison across groups
for i = 1:length(idx_uni)

    tmp = PCA(:,idx_uni(i));

    [p_12(i),~,STATS_12{i}] = ranksum(tmp(IDX == 1),tmp(IDX == 2));
    [p_13(i),~,STATS_13{i}] = ranksum(tmp(IDX == 1),tmp(IDX == 3));
    [p_14(i),~,STATS_14{i}] = ranksum(tmp(IDX == 1),tmp(IDX == 4));
    [p_23(i),~,STATS_23{i}] = ranksum(tmp(IDX == 2),tmp(IDX == 3));
    [p_24(i),~,STATS_24{i}] = ranksum(tmp(IDX == 2),tmp(IDX == 4));
    [p_34(i),~,STATS_34{i}] = ranksum(tmp(IDX == 3),tmp(IDX == 4));

    Delta_12(i) = nanmean(tmp(IDX == 1)) - nanmean(tmp(IDX == 2));
    Delta_13(i) = nanmean(tmp(IDX == 1)) - nanmean(tmp(IDX == 3));
    Delta_14(i) = nanmean(tmp(IDX == 1)) - nanmean(tmp(IDX == 4));
    Delta_23(i) = nanmean(tmp(IDX == 2)) - nanmean(tmp(IDX == 3));
    Delta_24(i) = nanmean(tmp(IDX == 2)) - nanmean(tmp(IDX == 4));
    Delta_34(i) = nanmean(tmp(IDX == 3)) - nanmean(tmp(IDX == 4));

    tmpf = figure;
    set(gcf,'color','w');
    imagesc([0,Delta_12(i),Delta_13(i),Delta_14(i);...
        0,0,Delta_23(i),Delta_24(i);...
        0,0,0,Delta_34(i);...
        0,0,0,0]);
    set(gca,'Box','off');
    CM = cbrewer('div','RdBu',1000);
    colormap(flipud(CM));
    tmp_scale = max(abs([Delta_12(i),Delta_13(i),Delta_14(i),Delta_23(i),...
        Delta_24(i),Delta_34(i)]));
    caxis([-tmp_scale,tmp_scale]);
    axis off
    axis square
    colorbar
    print(tmpf,'-depsc',['UNIV_',num2str(i)]);
    close(tmpf);
end

%% 16. Combination of both sides with PLS

[U,V,EV,sprob,BS_U,BS_V,Ub_vect,Vb_vect] = RunPLS(MOTION_VAR,PCA(:,[1:4,6:9,12:17,28,29]));
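% ----------------------------------------------------------------------
% Hedged sketch (an assumption about RunPLS, whose code is not shown
% here): a behavioral PLS of this kind typically amounts to an SVD of the
% motion/behavior cross-covariance matrix, with the left singular vectors
% playing the role of motion saliences (U) and the right singular vectors
% that of behavioral saliences (V). Names ending in "_sketch" are
% illustrative only; complete cases are used to avoid NaN propagation.
Behav_sel_sketch = PCA(:,[1:4,6:9,12:17,28,29]);
ok_sketch = ~any(isnan(Behav_sel_sketch),2);
R_sketch = MOTION_VAR(ok_sketch,:)' * Behav_sel_sketch(ok_sketch,:);
[U_sketch,S_sketch,V_sketch] = svd(R_sketch,'econ');
EV_sketch = diag(S_sketch).^2 / sum(diag(S_sketch).^2);  % explained covariance per component
% ----------------------------------------------------------------------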
%% 17. PLS-related plots

% Here, we plot the results of the PLS analysis

% How many components do we want to plot?
n_comps = 4;

% Latent scores for motion (how strongly a component is expressed in a
% subject)
Lm = MOTION_VAR*U;
Lm = Lm(:,1:n_comps);

% Latent scores for behavior
Lb = PCA(:,[1:4,6:9,12:17,28,29])*V;
Lb = Lb(:,1:n_comps);

for j = 1:n_comps

    % Computing the time and space summaries of the 6*n_TB motion saliences
    for t = 1:n_TB
        Summary_time_PLS(j,t) = mean(U(Time_Labels(t,:),j));
    end

    for p = 1:6
        Summary_position_PLS(j,p) = mean(U(Space_Labels(p,:),j));
    end

    % Same for my bootstrapping folds
    for k = 1:1000
        for t = 1:n_TB
            Summary_time_boot(j,t,k) = mean(squeeze(Ub_vect(k,Time_Labels(t,:),j)));
        end

        for p = 1:6
            Summary_position_boot(j,p,k) = mean(squeeze(Ub_vect(k,Space_Labels(p,:),j)));
        end
    end

    % Computes standard deviation
    V_STD = squeeze(std(Vb_vect,[],1));
    U_STDt = squeeze(std(Summary_time_boot,[],3));
    U_STDs = squeeze(std(Summary_position_boot,[],3));

    % Computes z-scores
    U_zt = Summary_time_PLS./U_STDt;
    U_zs = Summary_position_PLS./U_STDs;
    V_z = V./V_STD;

    if is_plot
        figure;
        set(gcf,'color','w');
        bar(1:size(V,1),V_z(:,j));
        set(gca,'Box','off');

        figure;
        set(gcf,'color','w');
        subplot(2,1,1);
        bar(1:n_TB,U_zt(j,:));
        set(gca,'Box','off');
        subplot(2,1,2);
        bar(1:6,U_zs(j,:));
        set(gca,'Box','off');
    end
end

%% 18. Understanding how components relate to each other

% For age
for i = 1:n_comps

    figure;
    set(gcf,'color','w');
    scatter(Age2,Lm(:,i),'k.');
    set(gca,'Box','off');

    figure;
    set(gcf,'color','w');
    scatter(Age2,Lb(:,i),'k.');
    set(gca,'Box','off');

    [R_age(i,1), p_age(i,1)] = AssessAgeEffect(Age2,Lm(:,i));
    [R_age(i,2), p_age(i,2)] = AssessAgeEffect(Age2,Lb(:,i));

    [R_age_abs(i,1), p_age_abs(i,1)] = AssessAgeEffect(Age2,abs(Lm(:,i)));
    [R_age_abs(i,2), p_age_abs(i,2)] = AssessAgeEffect(Age2,abs(Lb(:,i)));
end

% For gender
for i = 1:n_comps

    figure;
    tmp_scale = max(abs(Lm(:,i)));
    tmp_scale2 = max(abs(Lb(:,i)));

    subplot(1,2,1);
    boxplot(Lm(Gender2==0,i));
    ylim([-tmp_scale,tmp_scale]);
    subplot(1,2,2);
    boxplot(Lm(Gender2==1,i));
    ylim([-tmp_scale,tmp_scale]);

    figure;
    subplot(1,2,1);
    boxplot(Lb(Gender2==0,i));
    ylim([-tmp_scale2,tmp_scale2]);
    subplot(1,2,2);
    boxplot(Lb(Gender2==1,i));
    ylim([-tmp_scale2,tmp_scale2]);

    tmpf = figure;
    COL{1} = [0.5 0.5 0.5];
    COL{2} = [0 0 0];
    MakeViolin(Lb(Gender2==0,i)',gca,{''},'',COL,1,1);
    ylim([-8 8]);
    print(tmpf,'-depsc',['VIOLIN_GENDER']);
    close(tmpf);

    [t_gender(i,1), p_gender(i,1)] = AssessGenderEffect(Gender2,Lm(:,i));
    [t_gender(i,2), p_gender(i,2)] = AssessGenderEffect(Gender2,Lb(:,i));

    [t_gender_abs(i,1), p_gender_abs(i,1)] = AssessGenderEffect(Gender2,abs(Lm(:,i)));
    [t_gender_abs(i,2), p_gender_abs(i,2)] = AssessGenderEffect(Gender2,abs(Lb(:,i)));
end

% For framewise displacement
FD_Data2 = FD_Data;
FD_Data2(:,isnan(sum(PCA,2)),:) = [];

TemporalMaskAll2 = TemporalMask_All;
TemporalMaskAll2(:,isnan(sum(PCA,2)),:) = [];

% FD is size 1199 x 224; I want to compute the mean FD across non-scrubbed
% frames
for s = 1:length(F1b)
    for ses = 1:4
        FD_s(s,ses) = mean(FD_Data2(~TemporalMaskAll2(:,s,ses),s,ses));
        FD_ns(s,ses) = mean(FD_Data2(:,s,ses));
    end
end

for i = 1:n_comps
    for ses = 1:4

        figure;
        set(gcf,'color','w');
        scatter(FD_s(:,ses),Lm(:,i),'k.');
        set(gca,'Box','off');

        figure;
        set(gcf,'color','w');
        scatter(FD_s(:,ses),Lb(:,i),'k.');
        set(gca,'Box','off');

        [R_FDs(i,ses,1), p_FDs(i,ses,1)] = AssessAgeEffect(FD_s(:,ses),Lm(:,i));
        [R_FDs(i,ses,2), p_FDs(i,ses,2)] = AssessAgeEffect(FD_s(:,ses),Lb(:,i));

        [R_FDs_abs(i,ses,1), p_FDs_abs(i,ses,1)] = AssessAgeEffect(FD_s(:,ses),abs(Lm(:,i)));
        [R_FDs_abs(i,ses,2), p_FDs_abs(i,ses,2)] = AssessAgeEffect(FD_s(:,ses),abs(Lb(:,i)));
    end
end
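% ----------------------------------------------------------------------
% Hedged sketch (assumptions; AssessAgeEffect and AssessGenderEffect are
% not shown in this script): the calls above are assumed to reduce to a
% correlation between the covariate and the latent scores, and to a
% two-sample test between gender groups, along the lines of:
[R_age_sketch,p_age_sketch] = corr(Age2(:),Lm(:,1),'rows','complete');
[~,p_gender_sketch,~,stats_gender_sketch] = ttest2(Lm(Gender2==0,1),Lm(Gender2==1,1));
% ----------------------------------------------------------------------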
%% 20. Another prediction attempt

% n_subtrain = floor(0.8*936);
% n_subtest = 936-n_subtrain;
%
% idx = randperm(936);
%
% for n = 1:1
%
%     Mot_train = MOTION_VAR_trim(idx(1:n_subtrain),:);
%     Behav_train = PCA_trim(idx(1:n_subtrain),:);
%
%     Mot_test = MOTION_VAR_trim(idx(n_subtrain+1:end),:);
%     Behav_test = PCA_trim(idx(n_subtrain+1:end),:);
%
%     [U,V,EV,sprob,BS_U,BS_V,Ub_vect,Vb_vect] = RunPLS(Mot_train,Behav_train);
%
%     % Size 4 * 951
%     LM_test = U(:,1:4)' * Mot_test';
%
%     % EV(1)*
% end

Data_ANOVA = [Lm(:,1)',Lm(:,2)',Lm(:,4)',Lb(:,1)',Lb(:,2)',Lb(:,4)'];
Diag_ANOVA = repmat(Diag',1,6);
Motvsbehav_ANOVA = [ones(1,3*245),2*ones(1,3*245)];
Comp_ANOVA = repmat([ones(1,245),2*ones(1,245),3*ones(1,245)],1,2);

[~,Tab,Stats] = anovan(Data_ANOVA,{Diag_ANOVA',Motvsbehav_ANOVA',Comp_ANOVA'},'model','full');

% How many times do we run the non-parametric case
n_np = 10000;

for null = 1:n_np

    tmp_idx = randperm(length(Diag_ANOVA));
    tmp_idx2 = randperm(length(Diag_ANOVA));
    tmp_idx3 = randperm(length(Diag_ANOVA));

    [~,Tab,Stats] = anovan(Data_ANOVA,{Diag_ANOVA(tmp_idx)',Motvsbehav_ANOVA(tmp_idx2)',Comp_ANOVA(tmp_idx3)'},'model','full','display','off');

    Tab = Tab(:,6);
    Tab_null(:,null) = [Tab{2:8}];
end

F_threshold = prctile(Tab_null,99,2);
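% ----------------------------------------------------------------------
% Hedged sketch (an addition for illustration, not part of the original
% analysis): the observed F values of the seven anovan terms can be
% compared against their permutation distributions to obtain empirical
% p-values, alongside the 99th-percentile threshold computed above.
[~,Tab_obs] = anovan(Data_ANOVA,{Diag_ANOVA',Motvsbehav_ANOVA',Comp_ANOVA'},'model','full','display','off');
F_obs_sketch = [Tab_obs{2:8,6}]';                             % observed F per term
p_perm_sketch = mean(Tab_null >= repmat(F_obs_sketch,1,n_np),2);
% ----------------------------------------------------------------------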