WEN Lingdu, WANG Zihong, ZHANG Guoming, LAI Xi, YANG Hongyu
Objective To evaluate an oral squamous cell carcinoma (OSCC) diagnostic model built by applying principal component analysis (PCA) to differentially expressed genes in OSCC, and to provide a reference for clinical diagnosis and treatment. Methods RNA-seq expression data for OSCC and normal control samples were obtained from The Cancer Genome Atlas (TCGA) database and normalized, and differentially expressed genes (DEGs) were identified with R software. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed on the DEGs to identify their main biological characteristics. Of the DEG expression data, 70% were randomly assigned to the training set and 30% to the test set. PCA was then applied to the training set to extract the principal components (PCs) related to the diagnosis of OSCC and to construct a PCA model. Receiver operating characteristic (ROC) curves of the PCA model were drawn for the training set and the test set, and the area under the curve (AUC) was calculated to evaluate the accuracy of the PCA model in diagnosing OSCC. Results The RNA-seq expression data obtained from the TCGA database comprised 330 OSCC samples and 32 normal control samples. With false discovery rate (FDR) <0.001 and |log2 fold change| (|log2FC|) >4 as thresholds, 159 downregulated and 248 upregulated DEGs were identified. These were mainly enriched in cellular components such as intermediate filament and melanosomal membrane and in pigment- and salivation-related biological processes, and were mainly involved in the salivary secretion and tyrosine metabolism pathways (P.adjust<0.05 and Q<0.05).
The DEGs were proposed as tumor markers for OSCC. PCA of the training set showed that the proportions of variance explained by PC1, PC2 and PC3 [which included submaxillary gland androgen regulated protein 3B (SMR3B), proline rich 27 (PRR27), histatin 3 (HTN3), statherin (STATH), cystatin D (CST5), BPI fold containing family A member 2 (BPIFA2), proline rich protein HaeIII subfamily 2 (PRH2), keratin 35 (KRT35), histatin 1 (HTN1) and amylase alpha 1B (AMY1B)] were 0.873, 0.100 and 0.023, respectively, giving a cumulative proportion of 0.996. The PCA diagnostic model of OSCC was then constructed by combining the eigenvectors of these three components. The ROC curves showed that the AUC values of the PCA model were 0.852 in the training set and 0.844 in the test set, both higher than those of the individual genes. Conclusion The OSCC diagnostic model based on the expression levels of SMR3B, PRR27, HTN3, STATH, CST5, BPIFA2, PRH2, KRT35, HTN1 and AMY1B, constructed from DEGs with the PCA method, offers a diagnostic advantage over single-gene markers. This study provides a theoretical basis for the early genetic diagnosis of OSCC and for the application of the PCA model in clinical diagnosis.
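
The workflow summarized above (split the samples, fit PCA on the training set, score samples, and evaluate with the AUC) can be illustrated with a minimal Python sketch. It is not the authors' R pipeline. The expression values are simulated, the three-gene panel is hypothetical, the split is stratified by class, only PC1 is used as the diagnostic score (the study combined three components), and the sign of PC1 is oriented on the training set so that tumors score higher. All function names are illustrative.

```python
import random


def center(rows, means=None):
    """Subtract per-gene means (computed from rows unless supplied)."""
    if means is None:
        means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows], means


def covariance(centered):
    """Sample covariance matrix of the genes (columns)."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_pc(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov by power iteration."""
    p = len(cov)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigval = sum(v[i] * cov[i][j] * v[j] for i in range(p) for j in range(p))
    return v, eigval


def project(centered, v):
    """Score each sample as its projection onto the component v."""
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


def auc(scores, labels):
    """AUC as the fraction of (tumor, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, mean, label, rng, genes=3):
    """Simulated expression values (assumption: Gaussian, sd 1.5)."""
    return [([rng.gauss(mean, 1.5) for _ in range(genes)], label) for _ in range(n)]


def run_demo(seed=0):
    rng = random.Random(seed)
    train, test = [], []
    # Stratified 70/30 split (assumption) so both classes appear in each set.
    for group in (simulate(60, 2.0, 1, rng), simulate(20, 6.0, 0, rng)):
        rng.shuffle(group)
        cut = len(group) * 7 // 10
        train += group[:cut]
        test += group[cut:]

    x_train, y_train = [s[0] for s in train], [s[1] for s in train]
    x_test, y_test = [s[0] for s in test], [s[1] for s in test]

    # Fit on the training set only; reuse its means for the test set.
    c_train, means = center(x_train)
    cov = covariance(c_train)
    pc1, eigval = first_pc(cov)
    explained = eigval / sum(cov[i][i] for i in range(len(cov)))

    # The sign of a principal component is arbitrary: orient it on training data.
    if auc(project(c_train, pc1), y_train) < 0.5:
        pc1 = [-w for w in pc1]

    c_test, _ = center(x_test, means)
    return {
        "pc1": pc1,
        "explained_ratio": explained,
        "train_auc": auc(project(c_train, pc1), y_train),
        "test_auc": auc(project(c_test, pc1), y_test),
    }


if __name__ == "__main__":
    print(run_demo())
```

Fitting the centering means and the component on the training set alone, and then reusing them on the test set, keeps the test AUC an out-of-sample estimate. In the study, the same comparison gave AUC values of 0.852 and 0.844.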