A two-component HIV-1-induced lineage tracing (HILT) system
To study infected and latent cell populations, we developed a genetic system to mark HIV-1 infected cells irreversibly. The approach uses a two-component, Cre-lox-based recombination system to report on the HIV-1 infection status of cells. The first component is a lentiviral construct (Lenti-RG) carrying a “red-to-green” Cre-lox-based cassette (Fig. 1A). This cassette contains the dsRed gene flanked by two LoxP sites, which blocks eGFP translation until exposure to Cre recombinase triggers GFP expression44. The second component is an HIV-1 clone, called HIV-1 NL-CreI, in which Cre recombinase is expressed in place of nef and Nef expression is restored with an internal ribosome entry site (IRES) (Fig. 1B). It has been employed here in a version carrying either an NL4-3 envelope (X4-tropic) or a clade B transmitted founder envelope from HIV RHPA (R5-tropic). To test the efficiency of the two-component system, we transduced the Jurkat T-cell line with the Lenti-RG virus and analyzed the cells for dsRed expression using flow cytometry (Fig. 1C). Upon transduction with the Lenti-RG vector, 39% of cells were positive for dsRed expression (Fig. 1D). The transduced cells were then infected with HIV-1 NL-CreI (X4-tropic), and a phenotypic switch from red to green was observed in 3% of the total Lenti-RG-transduced population. The red-to-green switch was blocked by the reverse transcriptase inhibitor azidothymidine (AZT), indicating that productive HIV infection is required for the phenotypic switch (Fig. 1D). During the acute infection of Jurkat cells, GFP was upregulated relatively early, whereas the expected decrease in dsRed expression was not apparent at 2 days post-infection. In a separate infection analyzed at 3 days post-infection, dsRed was downregulated in the GFP-positive fraction (Fig. S1), suggesting that the decay of dsRed occurs after GFP is upregulated.

The red-to-green switch enriches cells that carry HIV-1 proviral DNA
We next tested whether the switched GFP+ cells were enriched for HIV-1 NL-CreI proviral DNA. We infected Lenti-RG-transduced Jurkat cells with HIV-1 NL-CreI and sorted 5000 cells each from the unmarked, dsRed+/GFP- (marked targets), and GFP+ (switched) populations (Fig. S1A). Genomic DNA extracted from the sorted populations was measured directly for HIV-Cre DNA content by a qPCR assay detecting the cre gene. Two different cre-specific PCR primer-probe sets (Table S1) were used to measure the relative enrichment of HIV-1 NL-CreI in unmarked, marked (dsRed+/GFP-), and switched (GFP+) samples in comparison to the control genomic locus, RNaseP. We observed a >100-fold enrichment of HIV-1 NL-CreI proviral DNA in DNA isolated from sorted GFP+ cells in comparison to sorted dsRed+/GFP- or sorted unmarked cells (Fig. 1E). These data support the idea that the HILT system effectively enriches for rare HIV-1 DNA+ infected cells.
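The relative enrichment described above can be illustrated with a comparative Ct calculation. This is a minimal sketch, assuming that the cre signal is normalized to RNaseP within each sample (the delta-delta Ct method) and that amplification efficiency is close to 2 per cycle; the exact quantification formula is not stated in the text, and the function name and Ct values are hypothetical.

```python
def fold_enrichment(ct_cre_sample, ct_ref_sample,
                    ct_cre_control, ct_ref_control,
                    efficiency=2.0):
    """Relative cre abundance in a sample versus a control population.

    Assumption: comparative Ct (delta-delta Ct) method with RNaseP as the
    reference locus; `efficiency` is the per-cycle amplification factor.
    """
    delta_sample = ct_cre_sample - ct_ref_sample      # cre vs RNaseP, sample
    delta_control = ct_cre_control - ct_ref_control   # cre vs RNaseP, control
    return efficiency ** (-(delta_sample - delta_control))


# Hypothetical Ct values: GFP+ (switched) cells vs dsRed+/GFP- (marked) cells.
print(fold_enrichment(25.0, 24.0, 32.0, 24.0))
```

With these illustrative values, cre is detected seven cycles earlier (relative to RNaseP) in the switched cells, which corresponds to a 128-fold enrichment under the stated assumptions.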
The HILT-switched cells infected in humanized mice exhibit enrichment of HIV DNA
To measure the enrichment of HIV-1 proviral DNA in HILT-marked, GFP+ cells in humanized mice, we used HIV NL-CreI (RHPA) to infect 3 huPBL mice engrafted with Lenti-RG transduced human peripheral blood mononuclear cells (PBMC) and, for comparison, infected 3 mice with a replication-competent GFP-expressing HIV, HIV GFP (RHPA) (Fig. 1F). Infection of humanized huPBL mice with the HIV RHPA Cre-I virus resulted in high viral titers of up to 10^7 copies/ml at peak viremia and showed a similar range of infectivity to the HIV-GFP virus (Fig. 1G). At 21 dpi, following peak viremia, splenocytes from these mice were collected and sorted into GFP+, dsRed+/GFP-, and double negative (DN) fractions (Fig. 1H). DNA extracted from the sorted cells was assayed by DNA PCR for enrichment of HIV DNA in the GFP+ fraction. A 1600-fold enrichment for HIV-Cre DNA was observed in sorted GFP+ cells compared to dsRed+/GFP- cells. We note that the dsRed+/GFP- fraction was not enriched in HIV DNA compared to the unmarked fraction, indicating that GFP provides a strong indicator of infection with an intact cre-expressing provirus (Fig. 1I). These results support the conclusion that, in humanized mice, the switched GFP+ cells are highly enriched in HIV NL-CreI provirus. The low background DNA PCR signal in the unswitched cells also indicates that we are not missing a large percentage of silent integration events in which an integrated provirus expresses too little cre to activate the red-to-green switch.
Humanized mice with dsRed expression in immune lineages (HILT mice)
Human CD34+ hematopoietic stem cell engrafted mouse models (huHSC) support the study of long-term HIV-1 infection as well as latency during antiretroviral therapy (ART)42,45,46. A strength of this model is the development of diverse hematopoietic cell lineages, including naïve and memory T-cell subsets that are susceptible to HIV. Human cord blood-derived CD34+ cells were transduced with Lenti-RG as described by Wang et al.47 and injected into 0- to 3-day-old neonates of NSG (NOD.Cg-Prkdcscid Il2rgtm1Wjl/SzJ) mice (Fig. 2A). At 12–16 weeks, diverse human immune lineages had developed from the transduced CD34+ cells (Fig. 2B). On average, 28.75% of cells were huCD45 cells, and 29.54% of these were CD3+ T-cells. The distribution of CD4 and CD8 in the CD3 compartment was 37.9% and 18.62%, respectively (Fig. 2B, n = 183). Engrafted human cells were found in the peripheral blood, spleen, lung, and bone marrow (Fig. S3A, B, C, D).

A Newborn NSG mice are injected with Lenti-RG transduced cord blood-derived HuCD34+ hematopoietic stem cells and are engrafted with human immune cell lineages in 16 weeks. B Percentages of total peripheral blood cells in reconstituted NSG mice measuring huCD45, CD3, CD4, or CD8, respectively, at week 16 (n = 184 mice, data represented as mean +/− SD). C Frequency of dsRed+ cells in huCD45, CD3, CD4 and CD8 cells (n = 184 mice, data represented as mean +/− SD). D The percentage of engrafted CD4+ dsRed+ T cells throughout the lifetime of the mice (n = 14 mice, data represented as mean +/− SD), with each individual mouse represented by the same fill color and a different symbol at each time point. E Experimental timeline of engraftment and infection in days. 17 well-engrafted mice were infected with HIV-Cre. At 12 dpi, mice were either treated with antiretroviral therapy (ART) or left untreated, and plasma viremia was determined for each mouse over the course of the infection. F Plasma viremia (copies/ml), as determined by RT-qPCR assay specific for HIV-1, for infected mice (n = 7) and for infected and treated mice (n = 10). The red dotted line depicts the limit of detection (~300 copies/ml) for the RT-qPCR assay. The olive-green shaded box shows the duration of ART treatment. G Change in CD4/CD8 ratio over time following infection in infected and ART-treated mice. The black dotted line represents the fractional deviation from the starting CD4/CD8 ratio, set to 1 for each animal. H Percentage switching normalized to total transduced (dsRed) targets in peripheral blood for infected HILT mice (n = 6, bars indicate mean +/− SD). I Plasma viremia (copies/ml) for infected mice as determined by RT-qPCR assay specific for HIV-1 (infected mice, n = 6; uninfected mice, n = 3). The dotted line indicates the background detection level of the assay.
Transduction of the immune lineages with Lenti-RG was detected by measuring the dsRed-positive population in the huCD45, CD3, CD4, and CD8 compartments in the peripheral blood (Fig. 2C). dsRed+ cells were observed in all the lineages. Since CD4+ T-cells are the major targets for HIV-1 infection, we selected well-engrafted mice with more than 20% dsRed+ CD4 T-cells for challenge with HIV-1 NL-CreI. Lineage-marked, dsRed+ cells were detected over time within the CD4 compartment in engrafted mice (Fig. 2D), suggesting variability of dsRed expression. Additionally, we observed uniform dsRed expression in various organs beyond peripheral blood, indicating a consistent distribution of dsRed throughout each organ (Fig. S3H).
HILT mice recapitulate the features of HIV-1 pathogenesis
To validate that HILT mice can be used to study long-term HIV-1 infection as well as latency, we infected HILT mice with HIV NL-CreI (RHPA), treated a subset with ART, and monitored plasma viremia over time (Fig. 2E). To study HIV-1 persistence in HILT mice, 17 mice were infected in 3 independent experiments. Seven of the 17 mice were followed for acute infection, while ten were ART-treated starting at 12 dpi. The seven acutely infected mice exhibited a peak viral load at around 15 dpi with a mean of 1.5 × 10^5 copies/ml. By day 25, the viral load had fallen to a mean of 6.6 × 10^3 copies/ml. Two mice in the cohort were followed for up to 105 days and showed sustained plasma viremia throughout this period (Fig. 2F).
Ten of the 17 infected mice were ART-treated starting at 12 dpi, and their plasma viral load was followed over time. Viral loads were suppressed in treated animals about ten days after treatment started, with viremia falling below the limit of detection of the assay (Fig. 2F). At 49 dpi, the mice that were taken off treatment exhibited a rebound in viremia (Fig. 2F). The viral rebound was confirmed in 3 surviving mice out of the 10 that were treated. The HIV-1 NL-CreI infected HILT mice show key features of infection, including sustained virus infection, suppression by ART, and viral rebound after ART interruption, suggesting that they are suitable for studying HIV-1 persistence.
HIV-1 NL-CreI infection led to CD4 T-cell depletion, as measured by changes in the CD4/CD8 ratio from the baseline (Fig. 2G). The ratio dropped below the baseline (set to 1 at 0 dpi) as the infection progressed, reflecting an overall decrease in the CD4 population and an increase in the CD8 population. In contrast, the mean CD4/CD8 ratio in uninfected mice did not fall below the baseline and trended towards greater than one throughout the time course (Fig. S3J-L).
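The baseline normalization used for the CD4/CD8 ratio can be sketched as follows. This is an illustrative example only: the function name and the percentages are hypothetical, and the only assumption carried over from the text is that each animal's ratio at 0 dpi is set to 1.

```python
def normalized_cd4_cd8(cd4_pct, cd8_pct):
    """Return CD4/CD8 ratios over time, scaled so the first time point is 1.

    `cd4_pct` and `cd8_pct` are per-time-point percentages for one animal,
    with index 0 taken as the 0 dpi baseline.
    """
    ratios = [cd4 / cd8 for cd4, cd8 in zip(cd4_pct, cd8_pct)]
    baseline = ratios[0]
    return [ratio / baseline for ratio in ratios]


# Hypothetical percentages at three time points for one mouse.
cd4 = [40.0, 30.0, 20.0]
cd8 = [20.0, 20.0, 40.0]
print(normalized_cd4_cd8(cd4, cd8))
```

Values below 1 indicate a ratio that has fallen relative to that animal's own starting point, which makes animals with different engraftment levels directly comparable.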
GFP expression is induced by infection with HIV-Cre in HILT mice
While establishing the features of HIV-1 infection in HILT mice, we also examined whether the infection could be followed in these mice using the red-to-green switch as an in vivo proxy for HIV infection status. For this study, six well-engrafted mice with more than 20% dsRed expression in the CD4+ compartment (Fig. S3H, I) were infected in 4 independent experiments to determine the red-to-green switch in vivo. The infected mice were bled to determine plasma viremia, and the HIV-Cre-activated red-to-green switch was monitored in their peripheral blood (Fig. 2H, I). Switched green cells were observed in peripheral blood during acute infection and arose with plasma viremia. The average plasma viremia at 14-15 dpi (terminal time point) was 3.7 × 10^5 copies/ml. Mice were euthanized at 15 dpi, and spleen, lung, and bone marrow were harvested to measure the levels of switched cells. The red-to-green switch was observed in CD3+CD8- cells in peripheral blood, spleen, lungs, and bone marrow at 15 dpi (Fig. S3H). In contrast, a red-to-green switch was not observed in the CD8+ T-cell compartment of the infected mice, confirming the high specificity of switching due to infection in the CD4+ T-cell compartment (Fig. S3H). The red-to-green switch was not observed in any of the uninfected mice. On average, 0.071%, 0.073%, 0.076%, and 0.13% of CD3+CD8- cells switched in peripheral blood, spleen, lung, and bone marrow, respectively (Fig. S3I, Table S2). In summary, HILT mice reveal HIV infection through the red-to-green switch mechanism.
Single-cell RNA sequencing identifies HIV-1 transcripts in HILT mice
Human CD4+ T-cells are composed of diverse subsets and activation states that can be partly resolved by their distinct transcriptional states32,48. To assess the transcriptomic states of infected CD4 T-cells and their persistence at the single-cell level in the HILT-marked huHSC mice, scRNA-seq was performed on four types of sorted CD3+CD8- splenocytes: (1) dsRed+, (2) unmarked, (3) GFP+, and (4) mixed cells (dsRed+ and GFP+) from three acutely infected mice, two 10-day-treated mice, one 29-day-treated mouse, and two uninfected mice (Fig. S4A, B).
After sequencing, reads were analyzed with the Cell Ranger pipeline48 using default parameters and aligned to our custom reference genome, which incorporated six regions of the HIV clone NL-CreI (RHPA), defined by the HIV splice donors (D1 and D4) and acceptors (A1 and A7) (Fig. 3A)49. Across these datasets from infected and treated mice, we observed a wide range of HIV transcript counts, up to over 1000, in a total of 989 cells (Fig. 3B). In this analysis we grouped HIV RNA+ cells based on the number of HIV transcripts detected, defining HIV RNA+ low cells as containing 2-10 HIV UMIs and HIV RNA+ high cells as containing ≥11 HIV UMIs. In total, 479 HIV RNA+ low and 510 HIV RNA+ high cells were observed throughout our dataset. Nearly all of the HIV RNA+ cells were from acutely infected mice, with 315, 369, and 301 HIV RNA+ cells in the three acute mice and 4 HIV RNA+ cells in one treated mouse (T1) (Fig. 4E). No HIV RNA+ cells were detected in the other treated or uninfected mice (T2, T3, U1, and U2).
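The UMI thresholds defined above translate directly into a per-cell classification rule. The following is a minimal sketch: the thresholds (2-10 and ≥11 HIV UMIs) come from the text, while the function name, the labelling of cells with fewer than 2 HIV UMIs as "HIV RNA-", and the example counts are assumptions made for illustration.

```python
from collections import Counter

LOW_MIN = 2    # HIV RNA+ low: 2-10 HIV UMIs (from the text)
HIGH_MIN = 11  # HIV RNA+ high: >= 11 HIV UMIs (from the text)


def classify_hiv_rna(hiv_umis):
    """Assign an HIV RNA status to one cell from its HIV UMI count."""
    if hiv_umis >= HIGH_MIN:
        return "HIV RNA+ high"
    if hiv_umis >= LOW_MIN:
        return "HIV RNA+ low"
    # Assumption: cells with 0 or 1 HIV UMI are treated as negative.
    return "HIV RNA-"


# Hypothetical per-cell HIV UMI counts.
umi_counts = [0, 1, 2, 10, 11, 1200]
print(Counter(classify_hiv_rna(n) for n in umi_counts))

# Consistency check on the totals reported in the text.
assert 479 + 510 == 989            # low + high
assert 315 + 369 + 301 + 4 == 989  # three acute mice + treated mouse T1
```

The two closing checks confirm that the per-group and per-mouse counts quoted in the text both sum to the reported 989 HIV RNA+ cells.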

A Segmented HIV Cre I genome with HIV transcript and overlap annotations (blue and red) used in the 10x Genomics Cell Ranger Count custom reference. Locations of splice donors (D1 and D4) and acceptors (A1 and A7) within the genome of HIV Cre I are determined based on splice sites within the HIV-1 genome53. B Representative scatter plot of total HIV UMIs detected in cells across acutely infected and treated mice in proportion to total cellular UMI counts. Cells with ≥2 transcripts were defined as HIV RNA+ (n = 989): HIV RNA+ low cells had 2 to 10 HIV transcripts per cell (n = 479) and HIV RNA+ high cells had ≥11 transcripts (n = 510). C–G UMAP of all cells (n = 40,782), colored by identified cell type (C), predicted cell type (D), HIV RNA status (E), experimental condition (F), and mouse ID (G) (mouse ID: A1-A3 acutely infected mice, T1-T3 ART-treated mice, U1-U2 uninfected mice). H Bar plots representing HIV-1 RNA+ high and HIV RNA+ low cells across identified CD4 T cell subsets. The black dotted line shows the average distribution of HIV RNA+ cells (two-sided Fisher exact test: p = 0.0005).

Distribution of HIV-1 RNA across diverse CD4 T-cell clusters in acutely infected HILT mice and GFP+ cells are enriched among clusters within ART-treated mice (A) UMAP of cells from acutely infected mice A1-3 (n = 17,346) highlighting cell types. B UMAP highlighting the location of HIV RNA+ cells with high or low transcript detection (n = 985). C UMAP of cells for representative ART-treated mice T1/T2 (n = 8503) highlighting identified cell types. D UMAP highlighting the location of HILT-marked GFP+, putative latently infected cells (n = 37). E Number of HIV RNA+ cells isolated from each mouse in the pooled analysis, acutely infected (A1-A3), ART-treated (T1-T3), and uninfected mice (U1-U2). F Proportion of cell types within acutely infected and ART-treated mice datasets (two-sided Fisher exact test, p = 0.0005). The black dotted lines show the average percentage GFP+ across all T cells. G Distribution of HIV RNA+ cells within identified cell types within the acute dataset (two-sided Fisher exact test: p = 0.0005). H The deviation from the mean % HIV RNA+ cells among all cells, represented as a log2-fold change for each cluster. I Distribution of HILT-marked GFP+ cells within identified cell types (two-sided Fisher exact test: p = 0.0025). The black dotted lines show the average percentage of GFP+ across all T cells. Within the ART-treated mice, the percentage of GFP+ (putative latent) is represented for each cluster. J The deviation from the mean % GFP+ cells among all cells for each cluster represented as a log2-fold change.
We profiled 47,850 CD4 T-cells from eight mice, including 19,986 from acutely infected, 10,391 from treated, and 17,473 from uninfected mice, regardless of the presence or absence of HIV transcripts in the cells (Table S3). Batch effects were corrected using fastMNN50. After individual QC, 40,782 cells were integrated, clustered, and visualized based on a manual annotation, predicted CD4 T-cell type, the number of HIV transcripts, experimental conditions, and individual mouse IDs, respectively, using Uniform Manifold Approximation and Projection for Dimension Reduction (UMAP) (Fig. 3C–G). Unsupervised clustering revealed fourteen clusters that were annotated based on the expression of marker genes selected from literature (Fig. 3C, Table S4)51,52,53. We identified helper CD4 T-cells, memory cells, naïve cells, proliferating cells, and regulatory T-cells (Tregs) and subsets within these larger groups. Using multimodal reference mapping to a reference PBMC dataset54, we validated those designations and predicted nine distinct CD4 T-cell types (Fig. 3D). This reference mapping approach grouped the T cell types into a smaller number of predicted cell phenotypes.
HIV RNA+ cells were dispersed across most predicted CD4 T-cell types (Fig. 3D, E, Fig. S4D). HIV RNA+ cells were found in all predicted CD4 T-cell subtypes except naïve and γδT cells, which were low in abundance (CD4 Naïve n = 97, γδT n = 104) and only found in five and four of all eight mice, respectively (Fig. S4D, Table S5). When comparing the UMAP distribution of cells by conditions (Fig. 3F, Fig. S5) or by individual engrafted mice (Fig. 3G), the overall distribution of cell types was broadly maintained.
We also examined the distribution of HIV RNA+ cells in manually annotated cell types and observed that both HIV RNA+ high and HIV RNA+ low groups were present and dispersed throughout most CD4 T-cells (Fig. 3H). We observed HIV RNA+ cells in all subtypes except Th22/Th9 memory cells and Th2 cells. We note that both of these cell types were scarce within our datasets (Th22/Th9 memory cells n = 146, Th2 n = 155) and were found in only three and four of the eight mice, respectively. We identified the highest percentage of HIV RNA+ cells (4.8%) in Th1 Proliferating CD4 T-cells (Fig. 3H, Table S6, Fisher test p = 0.0005). These results are consistent with the preference of HIV for active replication in activated, proliferating T cells, though low and high RNA expression is observed across a broad distribution of T cell lineages.
HIV-1 transcripts and transcriptionally latent GFP+ cells are enriched in T-cell clusters and present across heterogeneous CD4 T-cell clusters in both acutely infected and ART-treated mice
A focused analysis was conducted on cells from acutely infected mice (A1-A3). By narrowing our analysis to these datasets, we mapped the distribution of HIV RNA+ cells across cell types within acute infection (Fig. 4A, B). In mouse #A1, flow sorting enriched for GFP+ and dsRed+/GFP- cells, which were processed together and labeled as mixed cells (Fig. S4A, B). In mice #A2 and #A3, GFP+ cells and dsRed+/GFP- cells were processed as separate samples but sequenced together. In total, we analyzed 17,346 cells distributed across 14 identified cell types.
By annotating cells with their HIV RNA expression status, we assessed whether high or low HIV expression was associated with distinct CD4 T-cell types. HIV RNA+ cells were distributed throughout the UMAP (Fig. 4B). We observed both HIV RNA+ high and HIV RNA+ low cells in a majority of identified cell types, except for the low-abundance Th2 and Th22/Th9 Memory cells, indicating that most, if not all, CD4 T-cell states can support active HIV infection (Fig. 4F). HIV RNA+ cells were detected almost exclusively in the acutely infected mice, and not in the ART-treated or uninfected mice (Fig. 4E). The percentage of HIV RNA+ cells within the cell types was significantly heterogeneous (Fisher test, p = 0.0005), ranging from 8.5% in Th1 Proliferating CD4 T-cells to 1.3% in cytolytic proliferating CD4 T-cells (Fig. 4F, Table S7). We evaluated the representation of acutely infected cells within each subcluster and found increased representation in proliferating CD4, Th1 proliferating CD4, and Tfh memory cells, and decreased representation within cytolytic proliferating CD4, two naïve cell clusters, and cytolytic memory T cell clusters (Fig. 4G, H).
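The over- or under-representation of HIV RNA+ cells per cluster can be expressed as a log2-fold deviation from the overall frequency. The sketch below is one plausible way to compute such a value; it assumes that the reference is the pooled percentage of HIV RNA+ cells across all clusters (the text does not specify whether a pooled or per-cluster mean was used), and the function name, cluster names, and counts are hypothetical.

```python
import math


def log2_deviation(positive_by_cluster, total_by_cluster):
    """log2(cluster fraction positive / pooled fraction positive) per cluster.

    Assumption: the reference is the pooled fraction over all cells.
    Clusters with no positive cells are reported as -inf.
    """
    pooled = sum(positive_by_cluster.values()) / sum(total_by_cluster.values())
    deviations = {}
    for cluster, n_cells in total_by_cluster.items():
        fraction = positive_by_cluster.get(cluster, 0) / n_cells
        if fraction > 0:
            deviations[cluster] = math.log2(fraction / pooled)
        else:
            deviations[cluster] = float("-inf")
    return deviations


# Hypothetical counts for three clusters of 100 cells each.
positives = {"cluster_a": 4, "cluster_b": 1, "cluster_c": 1}
totals = {"cluster_a": 100, "cluster_b": 100, "cluster_c": 100}
print(log2_deviation(positives, totals))
```

In this illustration the pooled frequency is 2%, so a cluster at 4% has a deviation of about +1 and a cluster at 1% a deviation of about -1.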
Within cells collected from our 15 dpi acutely infected datasets, HIV RNA transcripts were detected in dsRed+ HILT-marked CD4+ T-cells with a wide range of detected HIV-specific transcripts (Fig. 3B). Surprisingly, we did not detect HIV RNA in GFP+ sorted cells from those same datasets. We interpret this as indicating that at 15 days post-infection, replicating viruses no longer express functional Cre recombinase and that the replicating virus carries non-functional cre. We note, however, that since we observed robust enrichment of HIV-Cre DNA within GFP+ cells from acutely infected humanized mice (Fig. 1I), we also conclude that HILT-marked GFP+ cells are highly enriched for HIV proviruses that are functionally defined by having expressed an active Cre enzyme and then persisted in a transcriptionally latent state.
Next, we examined HILT-marked, GFP+ cells in ART-treated mice at 10 and 29 days post-ART initiation, when plasma viremia was undetectable. Flow-sorted GFP+ cells were isolated from two ART-treated mice (T1, T2) (Fig. S4A, B) and sequenced alongside dsRed+/GFP- sorted cells from the same mouse. Within the 8503 cells analyzed, we observed up to 2.3% GFP+ cells (n = 37) across a majority of the thirteen clusters. Notably, no Th2 cells were identified within the ART-treated datasets (Fig. 4D, F, I, J). This observation could be explained by the low number of GFP+ cells as well as the rarity or lack of some of these observed cell types within treated mice (Fig. 4C–E). Furthermore, we observed a significant difference in the distribution of these cells across the cell types (Fisher, p = 0.0025), with enrichment of GFP+ cells within Tregs, cytolytic memory, and Tfh/Th22 cells (Fig. 4F, I, J, Table S7) and lower representation in proliferating cells, naïve CD4-1, and Tfh memory cells (Fig. 4H, J).
Notably, some of these cell types were significantly more abundant within this dataset in direct comparison to the acute datasets (A1-A3) (Fisher test, p = 0.0005) (Fig. 4F, Table S7). The proportions of proliferating cell types (Proliferating CD4 and Th1-Proliferating), Tregs (Tregs, IL10RA+ Tregs), and certain CD4 T-helper cell types (Th2, Th1/Th17 Memory, and Th22/Th9 Memory cells) were significantly increased within the acutely infected mice. Interestingly, we found the inverse effect for Cytolytic Memory and Tfh Memory proportions as well as naïve CD4 T-cells. Furthermore, the proportion of Tfh/Th22 cells was not significantly changed between the two conditions. When interpreting these results, however, we note that the total number of cells analyzed from the acutely infected mice was almost twice that in the ART-treated dataset. These results suggest that cells infected with HIV-Cre, and cells with the GFP+ HILT phenotype that marks them as transcriptionally latent, are depleted in some cell types and enriched in others, and can persist in a broad range of CD4 T-cell types in our humanized mouse system.
Acute and ART-treated datasets reveal distinct transcriptional patterns in overlapping pathways
We next examined datasets of acutely infected mice to identify transcriptional signatures associated with high HIV-1 RNA expression. A differential gene expression (DGE) analysis between two populations (HIV RNA+ high vs. HIV RNA-) was conducted initially on all CD4 T-cell subsets, independent of cell type designation, within the three acutely infected mouse datasets (A1-A3). The results for mouse #A3 are displayed as a volcano plot (Figs. 5A, S6A, B). We observed many highly downregulated RNA-binding proteins in HIV RNA+ high cells, including HNRNPs and SRS proteins involved in RNA metabolism. Notably, HNRNPs have distinct interactions in viral synthesis and HIV-1 mRNA splicing55,56, while several genes of the SRS family are associated with the downregulation of HIV-1 replication through control of alternative splicing of the viral genome55,57,58,59,60. In comparison, genes such as JUN, MDM2, and BAX displayed a positive log-fold change across all acutely infected mice (Figs. 5A, S6A, B), reflecting alterations in T-cell activation (JUN), proliferation (MDM2), and apoptosis (BAX). Notably, BAX, a well-studied proapoptotic protein, is activated by Vpr61,62.

A Volcano Plot for DGE analysis of HIV RNA+ high vs HIV RNA- cells in 15 dpi acutely infected mouse ID A3. Red-colored dots represent any differentially expressed gene <0.05 FDR, while blue color dots represent genes with <0.05 FDR found in all three datasets of acutely infected mice (common genes). B Enrichr gene set enrichment analysis of common genes within acutely infected mice displaying top 5 gene sets by -log p-value (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction). C IPA analysis of the top 5 enriched canonical pathways by p-value (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction) and colored by z-score. D Volcano Plot for HILT marked dsRed-/GFP+ cells vs HILT marked dsRed+/GFP- cells within 10-days and 29-days ART-treated mice T1/T2. Red dots represent any differentially expressed gene <0.05 FDR threshold, while green color dots represent genes <0.05 FDR found in all mice. E Enrichr gene set enrichment analysis of common genes within ART-treated mice displaying top 5 gene sets by -log p-value (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction). F IPA shows the top 5 enriched canonical pathways by p-value (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction) and colored by z-score in ART-treated mice. Pathway comparison analysis of total CD4 cells and subsets of CD4 cells sorted by -log pvalue (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction) (G) and z-score (H). HIV RNA+ high cells (n = 508) from acutely infected mice vs uninfected dsRed+/GFP- cells (n = 7504) from uninfected mice U1/U2 with HILT-marked GFP+ cells vs HILT marked dsRed+/GFP- cells within 10-days and 29-days ART-treated mice T1/T2 were used.
We then focused on the top 145 common significant DEGs found in all three acute datasets in the Enrichr gene set enrichment analysis63,64,65. HSPA8 and HNRNPA2B1 were connected to gene sets relating to mRNA splicing and regulation of the apoptotic process. JUN and PHLDA3, which were upregulated in Fig. 5A, were also identified as part of the apoptotic regulation gene set and the positive regulation of cell cycle G2/M phase transition gene set (Fig. 5B). To examine predicted pathway perturbations that resemble pathways changed during acute infection, we employed Ingenuity Pathway Analysis (IPA) (Fig. 5C). The most significant was the Kinetochore metaphase signaling pathway, which is associated with cell cycle regulation. In addition, EIF2 Signaling, Granzyme A Signaling, Sirtuin Signaling, and G2/M DNA Damage checkpoint regulation in the cell cycle were among the top pathways by p-value and were upregulated during acute infection.
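The statistics reported for the gene set enrichment analysis (one-sided Fisher's exact test with Benjamini-Hochberg correction) can be sketched in a few lines. This is a generic illustration of those two procedures, not a reproduction of the Enrichr or IPA implementations; the function names, the background size of 20,000 genes, the gene set size, and the overlap count are all hypothetical.

```python
from math import comb


def enrichment_p_value(overlap, set_size, n_degs, background):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Probability of seeing at least `overlap` gene-set members among
    `n_degs` genes drawn from `background` genes, of which `set_size`
    belong to the gene set.
    """
    max_overlap = min(set_size, n_degs)
    tail = sum(
        comb(set_size, k) * comb(background - set_size, n_degs - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / comb(background, n_degs)


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 145 DEGs, a 200-gene set, 10 shared genes,
# and an assumed background of 20,000 genes.
p = enrichment_p_value(overlap=10, set_size=200, n_degs=145, background=20000)
print(p, benjamini_hochberg([p, 0.04, 0.03]))
```

The choice of background strongly affects the resulting p-value, so the assumed 20,000 genes should be replaced by whatever gene universe is appropriate for the dataset at hand.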
We analyzed GFP+ cells surviving ART treatment for 10 or 29 days, all of which were HIV RNA-negative. DGE analysis compared GFP+ and GFP- cells within the treated mice dataset (#T1-T2). Upregulation of MHC Class-I genes (HLA-A, HLA-B) and downregulation of Class-II MHC gene (HLA-DQA) were observed. Additionally, ribosomal genes associated with translation (e.g., RPS and RPL) exhibited high log fold change in GFP+ cells (Fig. 5D), linked with cytoplasmic translation and macromolecule biosynthetic processes in transcriptionally repressed cells. In ART-treated mice, gene set enrichment analysis of 117 genes (Fig. 5E) revealed significant pathways related to biosynthesis of molecules in translation and antigen processing and presentation via MHC class-I, with higher significance compared to our analysis of acute infection.
IPA of the ART-treated dataset revealed a high degree of overlap with the top pathways in the acutely infected mice. The EIF2 Signaling pathway was upregulated in acutely infected cells and downregulated in GFP+ cells, whereas the Sirtuin pathway, Oxidative phosphorylation, and Signaling by Rho family GTPases showed positive z-scores with higher p-values (Fig. 5F). To determine the extent to which the initial differential expression analysis is influenced by differences in the infection rate among T cell subpopulations, we examined whether the same pathways are affected within the CD4 T-cell subsets containing the highest number of HIV RNA+ high cells. In this analysis, we examined two predicted cell types from reference map annotation and three manually annotated Seurat cell clusters (Fig. 5G, H). Comparative analysis of HIV RNA+ high cells vs. uninfected cells showed that the top pathways remained highly significant across all T-cell subsets, with p-values of much greater significance than within the acute mice themselves (Fig. 5C). This indicates that the main pathways enriched in acutely infected cells are common to different T cell subsets. Across T cell subsets, the EIF2 signaling pathway was consistently the most significant, with high -log p-values and positive z-scores across subsets except in the treated condition (Fig. 5G, H). Oxidative phosphorylation and Sirtuin signaling pathways were also consistently dysregulated across subsets, indicating their involvement in both acutely infected and transcriptionally latent infected cells.
Top DGE pathways in acutely infected humanized mice resemble those identified in acutely infected human patient-derived cells
To assess the comparability of humanized mouse T-cells to those in humans, we validated our pathway analysis results against a study by Collora et al., 202234, which used ECCITE-Seq to profile 267 HIV-1 RNA+ cells from six human patient samples. Implementing their DGE results within our analysis, we observed remarkably similar pathways with high statistical significance when compared to total CD4 T-cell populations from acutely infected humanized mice (Fig. 6A, B). The top pathways, including EIF2 Signaling, Oxidative Phosphorylation, Sirtuin Signaling, and Protein Ubiquitination, were consistently top-ranked according to p-value. Our model system yielded p-values of much greater significance, due to higher numbers of HIV RNA+ cells from our acutely infected in vivo samples (Fig. 6A). Despite this difference, the z-score directionality of top pathways remained consistent, further highlighting the utility of the HILT system (Fig. 6B).

IPA pathway comparison analysis between HIV-1 RNA+ cells of the Collora et al., 2022 dataset and HIV-1 RNA+ cells from total CD4 T-cells in our dataset, represented as heatmaps showing -log p-value (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction) (A) and z-score (B). C Comparison of top canonical pathways ranked by -log p-value across acutely infected mice A1-A3 and ART-treated mice T1/T2 (one-sided Fisher’s exact test, Benjamini-Hochberg multiple testing correction). Expression log ratios represented in heatmaps of selected genes within EIF2 signaling (D), Sirtuin Signaling (E), and Protein Ubiquitination Pathway (F) across DEGs of HIV RNA+ high vs. HIV RNA- cells within three acute datasets (A1-A3) and GFP+ vs dsRed+/GFP- cells of two ART-treated mouse datasets (T1/T2). Negative expression log ratios are colored blue and positive expression log ratios are colored yellow.
Acutely infected T cells and latently infected T cells perturb similar differentially expressed pathways but frequently modulate them in opposite directions
We next conducted a focused comparative pathway analysis directly comparing all three acute and treated datasets separately (Fig. 6C, Fig. S7A). In our comparison analysis, we again found the EIF2 signaling and Sirtuin pathways, with higher significance in HIV RNA+ high cells in acute infection and in GFP+ cells in treated mice. When we examined the expression of chosen genes associated with three selected pathways (EIF2, Sirtuin, and protein ubiquitination), some of the genes were differentially expressed in acute versus ART-treated conditions (Fig. 6D–F). In the EIF2 pathway, BCL2, RPS27L, and AGO2 were highly downregulated in treated GFP+ cells, whereas HNRPA1, EIF2S1, EIF3M, PPP1CA, and PTBP1 were downregulated in HIV RNA+ high cells in acute infection (Fig. 6D). In the Sirtuin pathway, JUN, GADD45A, NAMPT, XPC, and BAX showed diminished expression in the treated condition (Fig. 6E). In protein ubiquitination, which is related to the cell cycle, genes such as MDM2, DNAJB9, and USP9Y showed enhanced expression in acute infection, while expression of these genes was suppressed in GFP+ cells of treated mice (Fig. 6F). Besides these top three pathways, we also looked at genes associated with Kinetochore metaphase signaling, Coronavirus pathogenesis, and inhibition of ARE-mediated mRNA degradation pathways (Fig. S7 B–D). Overall, it is reassuring that the top gene expression changes associated with acute infection are driven in the opposite direction in latency. We suggest that a comparison of these datasets elucidates key transcriptional differences that may predispose cells towards active versus latent HIV infection.
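The comparison of direction between conditions can be illustrated by checking, gene by gene, whether the expression log ratio changes sign between the acute and the treated comparison. This is a minimal sketch: the function name is hypothetical, the numeric values are invented for illustration only, and GENE_X is a placeholder rather than a gene from the dataset.

```python
def opposite_direction_genes(acute_lfc, treated_lfc):
    """Genes present in both comparisons whose log ratios have opposite signs."""
    shared = acute_lfc.keys() & treated_lfc.keys()
    return sorted(g for g in shared if acute_lfc[g] * treated_lfc[g] < 0)


# Gene names MDM2 and BAX are taken from the text; all values are invented.
acute = {"MDM2": 1.2, "BAX": 0.8, "GENE_X": -0.5}
treated = {"MDM2": -0.9, "BAX": -0.4, "GENE_X": -0.2}
print(opposite_direction_genes(acute, treated))
```

A sign-based check of this kind ignores effect size and significance, so in practice it would be applied only to genes that pass the FDR threshold in both comparisons.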