Codetrainee

Semi quantitative proteomics enables mapping of global neutrophil dynamics following influenza virus infection

Source code and dataset availability

Source code is hosted on GitHub.

Prerequisite

We first loaded the libraries required for the analysis. The scripts used in the analysis can be found in data/Rscripts. Packages and their versions are specified in a plain-text file and are also listed at the bottom of the page.

source("data/Rscripts/load_libraries.R")
source("data/Rscripts/utilities.R")

The theme for figures generated by ggplot2 was set to the dust style from the ggthemr package.

ggthemr("dust")

To keep the output tidy, figures were centered and messages and warnings were suppressed by default. Code was also hidden for simplicity.

knitr::opts_chunk$set(fig.align = "center",
                      message   = FALSE,
                      warning   = FALSE,
                      cache     = TRUE,
                      echo      = FALSE,
                      results   = "hide")

All data were stored under the folders below and can be found via the Source code and dataset availability section.

The raw files generated by MaxQuant contain many NA values, which have to be removed before downstream analyses. To survive the filtering step, a protein had to meet the following requirements:

  • be detected in at least one comparison group.
  • be quantified in all three independent replicates.
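
One way to read the two requirements together is that, for at least one comparison group, a value must be present in all three replicates. The sketch below applies that reading to a toy matrix. The column naming scheme, the values and the helper `keep_protein` are hypothetical and are not the customised functions in data/Rscripts/utilities.R.

```r
# Toy log2-ratio matrix: rows are proteins, columns are <group>_<replicate>.
# Column names and values are made up for illustration.
ratios <- matrix(
  c( 1.2,  1.1,  0.9,   NA,  0.3,   NA,   # P1: complete in group A only
      NA,  0.5,  0.4,  0.2,   NA,  0.1,   # P2: incomplete in both groups
     0.8,  0.7,  0.9, -1.0, -1.2, -0.9),  # P3: complete in both groups
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3"),
                  c("A_1", "A_2", "A_3", "B_1", "B_2", "B_3"))
)

# Comparison group of each column (strip the replicate suffix).
groups <- sub("_[0-9]+$", "", colnames(ratios))

# TRUE if at least one group has a value in every replicate (hypothetical helper).
keep_protein <- function(x, groups) {
  any(tapply(!is.na(x), groups, all))
}

keep     <- apply(ratios, 1, keep_protein, groups = groups)
filtered <- ratios[keep, , drop = FALSE]
rownames(filtered)
```

In this toy example P2 is dropped because neither group is complete across the three replicates.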

Analysis of neutrophil proteome at homeostasis or following a lethal IAV infection

Three experiments were included in the analysis pipeline. The first two were combined to identify neutrophil-derived markers in a 1000 pfu PR8 influenza infection compared with PBS-treated controls. The third experiment compared the neutrophil proteome across different doses of influenza infection and a bacterial infection.

Raw data cleaning

We first loaded the datasets from the two experiments and removed the potential contaminants flagged by MaxQuant.
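
As a rough illustration of this step, MaxQuant marks flagged entries in its proteinGroups table with a "+" in the "Potential contaminant" column. The sketch below uses a toy data frame in place of the real file, and the protein identifiers are made up.

```r
# Toy stand-in for a MaxQuant proteinGroups.txt table. When the file is read
# with read.delim(), the "Potential contaminant" column becomes
# Potential.contaminant. Identifiers are hypothetical.
protein_groups <- data.frame(
  Protein.IDs           = c("PROT_A", "CON__PROT_B", "PROT_C"),
  Potential.contaminant = c("", "+", ""),
  stringsAsFactors      = FALSE
)

# %in% treats NA as "not flagged", so rows with missing flags are kept.
flagged <- protein_groups$Potential.contaminant %in% "+"
clean   <- protein_groups[!flagged, , drop = FALSE]
clean$Protein.IDs
```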

The datasets were annotated as below. Missing values were filtered out using customised functions.

Experiment 1 Biological replicate 1
Labelling     Sample
Light (L)     1000 PFU PR8 BM neutrophil
Medium (M)    1000 PFU PR8 Blood neutrophil
Heavy (H)     1000 PFU PR8 BAL neutrophil

Experiment 1 Biological replicate 2
Labelling     Sample
Light (L)     1000 PFU PR8 BAL neutrophil
Medium (M)    1000 PFU PR8 BM neutrophil
Heavy (H)     1000 PFU PR8 Blood neutrophil

Experiment 1 Biological replicate 3
Labelling     Sample
Light (L)     1000 PFU PR8 Blood neutrophil
Medium (M)    1000 PFU PR8 BAL neutrophil
Heavy (H)     1000 PFU PR8 BM neutrophil

Experiment 2 Biological replicate 1
Labelling     Sample
Light (L)     PBS BM neutrophil
Medium (M)    PBS Blood neutrophil
Heavy (H)     1000 PFU PR8 BM neutrophil

Experiment 2 Biological replicate 2
Labelling     Sample
Light (L)     1000 PFU PR8 BM neutrophil
Medium (M)    PBS BM neutrophil
Heavy (H)     PBS Blood neutrophil

Experiment 2 Biological replicate 3
Labelling     Sample
Light (L)     PBS Blood neutrophil
Medium (M)    1000 PFU PR8 BM neutrophil
Heavy (H)     PBS BM neutrophil

Proteins that passed the filtering step were subjected to statistical tests for differential abundance. A protein with a fold change greater than 2 and a p value less than 0.05 in at least one comparison group was considered to have a significant change in abundance. With all data cleaned and processed, the next step was visualization, to illustrate the proteomic profiles clearly.
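
The thresholds can be illustrated on toy data. The text does not name the statistical test, so the sketch below assumes a one-sample t-test of the log2 ratios against 0 across the three replicates. Both the test and the values are assumptions made for illustration.

```r
# Toy log2 ratios for one comparison group, three replicates per protein.
log2_ratio <- rbind(
  up   = c( 2.0,  2.2,  1.9),
  down = c(-1.5, -1.7, -1.6),
  flat = c( 0.1, -0.2,  0.05),
  weak = c( 1.5,  0.2,  2.9)   # large mean but inconsistent replicates
)

# One-sample t-test against 0 (assumption: the source does not name the test).
p_value  <- apply(log2_ratio, 1, function(x) t.test(x, mu = 0)$p.value)
mean_lfc <- rowMeans(log2_ratio)

# A fold change > 2 in either direction corresponds to |log2 FC| > 1.
significant <- abs(mean_lfc) > 1 & p_value < 0.05
significant
```

Only `up` and `down` pass both thresholds. `weak` has a mean fold change above 2 but fails the p value cut-off because its replicates disagree.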

An additional comparison between 1000 pfu PR8 blood and PBS control blood neutrophils was also calculated in silico using the equation \[\frac{1000 \: pfu \: PR8 \: blood}{PBS \: blood} = \frac{1000 \: pfu \: PR8 \: blood}{1000 \: pfu \: PR8 \: BM} \times \frac{1000 \: pfu \: PR8 \: BM}{PBS \: blood}\] where \(\frac{1000 \: pfu \: PR8 \: blood}{1000 \: pfu \: PR8 \: BM}\) is a sample comparison obtained from Experiment 1 and \(\frac{1000 \: pfu \: PR8 \: BM}{PBS \: blood}\) is a sample comparison obtained from Experiment 2.
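
A small worked example of this calculation, using made-up ratios for two hypothetical proteins, shows that multiplying the ratios is the same as adding them on the log2 scale:

```r
# Toy ratios (not taken from the dataset) for two hypothetical proteins.
pr8_blood_vs_pr8_bm <- c(P1 = 4.0, P2 = 0.5)    # as measured in Experiment 1
pr8_bm_vs_pbs_blood <- c(P1 = 0.5, P2 = 0.25)   # as measured in Experiment 2

# The shared 1000 pfu PR8 BM term cancels when the ratios are multiplied.
pr8_blood_vs_pbs_blood <- pr8_blood_vs_pr8_bm * pr8_bm_vs_pbs_blood

# Equivalent on the log2 scale: the product becomes a sum.
log2_combined <- log2(pr8_blood_vs_pr8_bm) + log2(pr8_bm_vs_pbs_blood)
all.equal(log2(pr8_blood_vs_pbs_blood), log2_combined)
```

Because the derived ratio combines measurements from two separate experiments, it carries the variability of both.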

Visualization of identified proteins in independent experiments and biological replicates

Before moving on, we checked the proteins identified in each replicate and visualized their overlap with a Venn diagram. As shown, a total of 1372 proteins survived the filtering step and were used for subsequent analyses.
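
The count at the centre of such a Venn diagram is the intersection of the per-replicate protein sets. A minimal sketch with made-up identifiers:

```r
# Toy protein identifiers per replicate (made up for illustration).
replicates <- list(
  rep1 = c("A", "B", "C", "D"),
  rep2 = c("B", "C", "D", "E"),
  rep3 = c("C", "D", "E", "F")
)

# Proteins present in every replicate: the centre of the Venn diagram.
shared <- Reduce(intersect, replicates)
length(shared)
```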

Overall impression of proteome dataset

Reproducibility between biological replicates was examined by PCA, with the 1000 pfu PR8 blood vs PBS blood comparison included. The PCA plot shows that the 1000 pfu PR8 blood vs PBS blood comparison was similar to the 1000 pfu PR8 BM vs PBS BM comparison. Likewise, the 1000 pfu PR8 blood vs 1000 pfu PR8 BM comparison was similar to the PBS blood vs PBS BM comparison.
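
For readers unfamiliar with the mechanics, the sketch below runs a PCA of this kind with base R's `prcomp` on simulated data. The matrix, the group names and the choice not to scale are assumptions for illustration, not the settings used for the figure.

```r
set.seed(1)
# Toy matrix: 50 proteins (rows) x 6 samples (columns); names are hypothetical.
x <- matrix(rnorm(50 * 6), nrow = 50,
            dimnames = list(NULL,
                            paste0(rep(c("groupA", "groupB"), each = 3),
                                   "_rep", 1:3)))
# Shift group B for the first 10 proteins so that the two groups differ.
x[1:10, 4:6] <- x[1:10, 4:6] + 3

# prcomp expects observations in rows, so transpose to samples x proteins.
pca <- prcomp(t(x), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on PC1 and PC2, ready for a scatter plot.
scores <- pca$x[, 1:2]
```

Replicates that plot close together in `scores` have similar ratio profiles, which is what the reproducibility check looks for.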

As the 1000 pfu PR8 blood vs PBS blood comparison was calculated in silico, it was temporarily removed from the following analyses unless otherwise indicated. We redrew the PCA plot with this group removed.

A heatmap of all proteins was also generated to give an overall impression of the expression profiles. The first one included the in silico calculated group.

The next heatmap excluded the in silico calculated group.

We then submitted the identified proteins to the DAVID database to investigate which cellular compartments they belong to.

Analysis of proteins with significant abundance changes

Proteins with significant abundance changes in at least one comparison group were extracted. The resulting expression matrix of 155 proteins was passed to fviz_nbclust with the wss method to determine the optimal number of clusters for visualization.
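
The wss method is the "elbow" approach: compute the total within-cluster sum of squares for a range of cluster numbers and look for the point where adding clusters stops helping. The sketch below reproduces the idea with base R on simulated data. The use of k-means as the clustering function is an assumption, as the text does not say which one was passed to fviz_nbclust.

```r
set.seed(42)
# Toy matrix: 60 proteins x 4 comparison groups, built from 3 known clusters.
centers <- rbind(c(2, 2, 0, 0), c(-2, 0, 2, 0), c(0, -2, -2, 2))
x <- centers[rep(1:3, each = 20), ] + matrix(rnorm(60 * 4, sd = 0.3), ncol = 4)

# Total within-cluster sum of squares for k = 1..8 (k-means is an assumption).
wss <- sapply(1:8, function(k) kmeans(x, centers = k, nstart = 25)$tot.withinss)

# Look for the "elbow" where adding clusters stops reducing WSS much.
plot(1:8, wss, type = "b", xlab = "Number of clusters k", ylab = "Total WSS")

# With factoextra attached, a comparable elbow plot is drawn by:
# fviz_nbclust(x, kmeans, method = "wss")
```

Choosing the elbow is a visual judgment, so the selected number of clusters is a reasonable choice rather than a uniquely correct one.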

Based on the figure, nine clusters best separated the differences in protein expression.

The analyses above considered the proteins from all comparison groups together. To identify the proteins that changed in each comparison group, we visualized the data with volcano plots. The dataset was first cleaned for plotting, and the volcano plots are shown below.

We were also interested in the pathways in which these proteins are involved. We therefore filtered the proteins and annotated them as down- or upregulated. Each set was used to query the Reactome database separately, and the retrieved pathway data were visualized.
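
The annotation step can be sketched as follows, reusing the thresholds stated earlier (fold change greater than 2 and p value less than 0.05). The data frame, its column names and its values are hypothetical.

```r
# Toy results for one comparison group (values are made up).
res <- data.frame(
  protein = c("P1", "P2", "P3", "P4"),
  log2_fc = c(1.8, -2.1, 0.4, 1.5),
  p_value = c(0.01, 0.003, 0.20, 0.30),
  stringsAsFactors = FALSE
)

# Label each protein: fold change > 2 means |log2 FC| > 1, with p < 0.05.
res$direction <- "ns"
res$direction[res$log2_fc >  1 & res$p_value < 0.05] <- "up"
res$direction[res$log2_fc < -1 & res$p_value < 0.05] <- "down"

# One identifier list per direction, e.g. for separate pathway queries.
by_direction <- split(res$protein, res$direction)
by_direction
```

The same `direction` column can drive the point colours in a volcano plot of `log2_fc` against `-log10(p_value)`.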

In blood, many ribosome-related proteins were downregulated, and they would dominate the final output. To address this, we separated them first and then proceeded to visualization.

We first checked the proteins without ribosome “contamination”.

The ribosome-related proteins were then visualized.

Finally, we used an alluvial plot to show the protein abundance changes from PBS BM to 1000 pfu PR8 BM, and then to Blood and BAL.

iBAQ intensity ranking

The intensity data were extracted from the iBAQ values, and only the proteins identified in both experiments were used. Ranking all proteins by their iBAQ intensity is a good alternative for visualizing their abundance in a cell when no absolute copy number is available.
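
A minimal sketch of such a ranking, using made-up iBAQ values and hypothetical protein names:

```r
# Toy iBAQ intensities (made-up values) for proteins found in both experiments.
ibaq <- c(P1 = 5.2e8, P2 = 3.1e6, P3 = 9.7e9, P4 = 4.4e7)

ranked <- data.frame(
  protein    = names(ibaq),
  log10_ibaq = log10(ibaq),
  rank       = rank(-ibaq, ties.method = "min"),  # 1 = most abundant
  stringsAsFactors = FALSE
)

# Order from most to least abundant, ready for a rank vs intensity plot.
ranked <- ranked[order(ranked$rank), ]
ranked$protein
```

Plotting `log10_ibaq` against `rank` gives the familiar S-shaped abundance curve, on which proteins of interest can be labelled.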

Analysis of BAL neutrophil proteome changes following different doses of influenza infection or bacterial infection

Raw data cleaning

The analysis above revealed that the neutrophil proteome changed during influenza infection. To investigate whether this response is pathogen-specific or dose-dependent, we performed another proteomics experiment with the design shown below:

Experiment 3 Biological replicate 1
Labelling     Sample
Light (L)     100 PFU PR8 BAL neutrophil
Medium (M)    1000 PFU PR8 BAL neutrophil
Heavy (H)     3 × 10^6 CFU Pseudomonas aeruginosa BAL neutrophil

Experiment 3 Biological replicate 2
Labelling     Sample
Light (L)     1000 PFU PR8 BAL neutrophil
Medium (M)    3 × 10^6 CFU Pseudomonas aeruginosa BAL neutrophil
Heavy (H)     100 PFU PR8 BAL neutrophil

Experiment 3 Biological replicate 3
Labelling     Sample
Light (L)     3 × 10^6 CFU Pseudomonas aeruginosa BAL neutrophil
Medium (M)    100 PFU PR8 BAL neutrophil
Heavy (H)     1000 PFU PR8 BAL neutrophil

We first cleaned this dataset in the same way. Proteins with significant abundance changes were also extracted.

Overall impression of datasets

We then performed PCA to examine the protein abundance pattern.

Analysis of proteins with significant abundance changes

Lastly, we drew a heatmap of all proteins with a significant abundance change in at least one group.

The number of clusters was first determined with the fviz_nbclust function using the wss method.

Based on the plot, three clusters were the best option for visualizing the dataset.

State of the machine

## sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17763)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_Australia.1252  LC_CTYPE=English_Australia.1252
## [3] LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
## [5] LC_TIME=English_Australia.1252
##
## attached base packages:
##  [1] parallel  stats4    grid      stats     graphics  grDevices utils
##  [8] datasets  methods   base
##
## other attached packages:
##  [1] blogdown_0.12         ggpubr_0.2            corrplot_0.84
##  [4] scatterpie_0.1.2      ggthemr_1.1.0         org.Mm.eg.db_3.7.0
##  [7] AnnotationDbi_1.44.0  IRanges_2.16.0        S4Vectors_0.20.1
## [10] Biobase_2.42.0        BiocGenerics_0.28.0   customLayout_0.3.0
## [13] glue_1.3.1            ggalluvial_0.9.1      RColorBrewer_1.1-2
## [16] egg_0.4.2             cowplot_0.9.4         data.table_1.12.2
## [19] factoextra_1.0.5      gridExtra_2.3         ComplexHeatmap_1.20.0
## [22] eulerr_5.1.0          ggthemes_4.2.0        ggrepel_0.8.1
## [25] magrittr_1.5          forcats_0.4.0         stringr_1.4.0
## [28] dplyr_0.8.0.1         purrr_0.3.2           readr_1.3.1
## [31] tidyr_0.8.3           tibble_2.1.1          ggplot2_3.1.1
## [34] tidyverse_1.2.1
##
## loaded via a namespace (and not attached):
##  [1] nlme_3.1-137        fs_1.3.1            lubridate_1.7.4
##  [4] bit64_0.9-7         httr_1.4.0          tools_3.5.1
##  [7] backports_1.1.4     R6_2.4.0            DBI_1.0.0
## [10] lazyeval_0.2.2      colorspace_1.4-1    GetoptLong_0.1.7
## [13] withr_2.1.2         processx_3.3.1      tidyselect_0.2.5
## [16] bit_1.1-14          compiler_3.5.1      cli_1.1.0
## [19] rvest_0.3.3         pacman_0.5.1        xml2_1.2.0
## [22] bookdown_0.10       labeling_0.3        scales_1.0.0
## [25] callr_3.2.0         digest_0.6.18       rmarkdown_1.12
## [28] pkgconfig_2.0.2     htmltools_0.3.6     rlang_0.3.4
## [31] GlobalOptions_0.1.0 readxl_1.3.1        rstudioapi_0.10
## [34] RSQLite_2.1.1       shape_1.4.4         generics_0.0.2
## [37] farver_1.1.0        jsonlite_1.6        Rcpp_1.0.1
## [40] munsell_0.5.0       clipr_0.6.0         whisker_0.3-2
## [43] stringi_1.4.3       yaml_2.2.0          MASS_7.3-50
## [46] plyr_1.8.4          blob_1.1.1          crayon_1.3.4
## [49] lattice_0.20-35     haven_2.1.0         circlize_0.4.6
## [52] hms_0.4.2           polylabelr_0.1.0    ps_1.3.0
## [55] knitr_1.22          pillar_1.4.0        rjson_0.2.20
## [58] reprex_0.2.1        evaluate_0.13       modelr_0.1.4
## [61] tweenr_1.0.1        cellranger_1.1.0    gtable_0.3.0
## [64] polyclip_1.10-0     assertthat_0.2.1    xfun_0.6
## [67] ggforce_0.2.2       broom_0.5.2         rvcheck_0.1.3
## [70] memoise_1.1.0