Current research


Transcriptomic evolution of tumors

Our recently published paper showed that patient prognosis can be reflected by the transcriptome ploidy of a tumor. A rarely studied yet intriguing question is how the transcriptome (phenotype) differs between the evolutionary branches of a tumor. Answering it may identify treatment- and metastasis-specific tumor subclones and inform better drug development.

Developing fast tools for tumor sequencing data analysis

Given the large size of next-generation sequencing data, commonly used tools for mutation calling and deconvolution can take weeks, if not months, to process a patient cohort. Fast tools that exploit multi-core CPUs, GPUs and TPUs are urgently needed to significantly reduce this time cost, which would accelerate both research and clinical practice.

Pathological image data analysis

Histopathological examination of stained tissue images (e.g., H&E) remains a gold standard for cancer diagnosis and staging. These images typically have single-cell (nucleus-level) resolution. The morphology of tumor cells and the spatial distribution of tumor-infiltrating immune cells are thought to reflect the tumor genotype and the patient's prognosis. Machine learning, and deep learning in particular, can be applied to analyze these images.


Previous research


3/2018 – 8/2019 Chongqing, P. R. China

Deep Learning-Based Natural Language Processing (YuCun Big Data Technology Co., Ltd.). I led the algorithm development team for natural language processing. We used deep convolutional networks, recurrent networks (e.g., LSTM) and recently developed models (e.g., BERT, GPT) to classify documents, extract summaries from documents and recognize named entities (including company names, person names and addresses).

10/2013 – 1/2018 University of Birmingham, the United Kingdom

Amino Acid Contact & Distance Prediction, Protein 3D Structure Prediction (PhD). Amino acid contacts and distances were predicted using deep neural networks (feedforward networks of up to 8 layers in the initial version, and convolutional networks of up to ~200 layers in an updated version). Protein 3D structures were then predicted with the Rosetta suite, using the predicted contacts and distances as constraints.

7/2014 – 7/2015 University of Birmingham, the United Kingdom

Ligand Matching (PhD). Applied the idea of scaffold hopping to search the entire PDB for ligands functionally similar to each of several template ligands. Potential hits were identified by comparing the distance matrix of the scaffold atoms in each template with the distance matrices of atoms in protein-bound ligands.
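A minimal Python sketch of the distance-matrix comparison described above. It assumes 3D atom coordinates have already been parsed from PDB files, scores a candidate by the root-mean-square difference between distance matrices (dRMSD), and finds the atom correspondence by brute force. The function names, the dRMSD score and the exhaustive search are illustrative choices, not the original project code (requires Python 3.8+ for `math.dist`).

```python
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Pairwise Euclidean distances between atoms (invariant to rotation/translation)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def drmsd(d1: List[List[float]], d2: List[List[float]]) -> float:
    """Root-mean-square difference over the upper triangles of two equal-size distance matrices."""
    n = len(d1)
    if n < 2 or len(d2) != n:
        raise ValueError("need two distance matrices of the same size, at least 2x2")
    diffs = [(d1[i][j] - d2[i][j]) ** 2 for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(diffs) / len(diffs))


def best_match(template: Sequence[Point], ligand: Sequence[Point]) -> Tuple[float, Tuple[int, ...]]:
    """Try every ordered choice of len(template) ligand atoms and keep the lowest dRMSD.

    Exhaustive (hypothetical) search strategy: only feasible for small scaffolds.
    """
    d_template = distance_matrix(template)
    best_score, best_idx = math.inf, ()
    for idx in itertools.permutations(range(len(ligand)), len(template)):
        score = drmsd(d_template, distance_matrix([ligand[i] for i in idx]))
        if score < best_score:
            best_score, best_idx = score, idx
    return best_score, best_idx


# Hypothetical example: a 3-atom template and a translated copy inside a 4-atom ligand.
template = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
ligand = [(10.0, 10.0, 10.0), (5.0, 5.0, 5.0), (6.5, 5.0, 5.0), (5.0, 6.5, 5.0)]
print(best_match(template, ligand))
```

Because distance matrices do not change under rotation or translation, no structural superposition is needed. The permutation search grows combinatorially with ligand size, so a PDB-wide search would need pruning or a smarter correspondence step.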

10/2013 – 10/2014 University of Birmingham, the United Kingdom

Building a Cross-reference Database Joining PDB, UniProt and Pfam (PhD). A three-way cross-reference database was built so that protein structures can be searched by domain or sequence, and protein domains can be searched by structure or sequence. It was built with Java, Python and MySQL, using a dynamic programming algorithm.
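The entry does not say how dynamic programming was used. One plausible use, assumed here purely for illustration, is global sequence alignment (Needleman-Wunsch) to map UniProt sequence positions onto the residues observed in a PDB chain. The sketch uses a simple match/mismatch/gap scoring scheme rather than a substitution matrix such as BLOSUM62, and `align_map` is a hypothetical helper.

```python
from typing import Dict


def align_map(ref: str, obs: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> Dict[int, int]:
    """Global alignment (Needleman-Wunsch); returns {ref index: obs index} for aligned pairs."""
    n, m = len(ref), len(obs)

    def pair(i: int, j: int) -> int:
        return match if ref[i - 1] == obs[j - 1] else mismatch

    # score[i][j] = best score for aligning ref[:i] with obs[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,  # gap in obs
                              score[i][j - 1] + gap)  # gap in ref

    # Traceback from the bottom-right corner, recording aligned (non-gap) pairs.
    mapping: Dict[int, int] = {}
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair(i, j):
            mapping[i - 1] = j - 1
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return mapping


# Hypothetical example: a PDB chain that resolves only part of the UniProt sequence.
print(align_map("MKTAYIAKQR", "TAYIAKQ"))
```

Global alignment is used here because both sequences are assumed to describe the same protein; a local alignment (Smith-Waterman) would be the usual choice when only a domain is expected to match.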

09/2010 – 07/2013 Peking University, P. R. China

Biomedical Image Processing (Master). Analyzed magnetic resonance imaging (MRI) scans and microscopic images of mitochondria in collaboration with Peking University First Hospital and the Institute of Molecular Medicine at Peking University. Methods used in this project included Principal Component Analysis, Independent Component Analysis, the reaction-diffusion equation, K-Means and Fuzzy C-Means clustering, among others.
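To illustrate one of the listed methods, the sketch below implements Fuzzy C-Means on 1-D values such as pixel intensities. Applying it to intensities for segmentation, the evenly spaced initial centers and the fuzzifier m = 2 are assumptions made for this example, not details of the original project.

```python
from typing import List, Tuple


def fuzzy_c_means(values: List[float], c: int = 2, m: float = 2.0,
                  max_iter: int = 100, tol: float = 1e-6) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy C-Means on 1-D data; returns (cluster centers, memberships u[point][cluster])."""
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    lo, hi = min(values), max(values)
    # Evenly spaced initial centers across the data range (an illustrative choice).
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    eps = 1e-12  # avoids division by zero when a point sits exactly on a center
    p = 2.0 / (m - 1.0)
    u: List[List[float]] = []
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)); each row sums to 1.
        u = []
        for x in values:
            d = [abs(x - v) + eps for v in centers]
            u.append([1.0 / sum((d[k] / d[j]) ** p for j in range(c)) for k in range(c)])
        # Center update: weighted mean of the data with weights u_ik^m.
        new_centers = []
        for k in range(c):
            w = [row[k] ** m for row in u]
            new_centers.append(sum(wi * x for wi, x in zip(w, values)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u


# Hypothetical intensities: a dark and a bright population of pixels.
intensities = [0.1, 0.2, 0.15, 0.9, 0.95, 0.85]
centers, memberships = fuzzy_c_means(intensities)
print(sorted(centers))
```

Unlike K-Means, each pixel receives a graded membership in every cluster, which can help at blurred object boundaries.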