About Glycan Fragment DB
We developed the Glycan Fragment DB (GFDB) by searching the entire PDB, identifying PDB structures with biologically relevant carbohydrate moieties, and classifying the PDB glycan structures by their primary sequence and glycosidic linkages. Figure 1 gives a schematic view of the hierarchical building procedure we developed for the GFDB: starting from a known glycan with sugars 1-2-(3-5)-4 (green box), the procedure extracts the fragments enclosed by each red box. As of August 2012, the GFDB contains 5,360 PDB entries that have at least one carbohydrate molecule, with 20,467 glycan chains in total. Among those glycan chains, 11,735 (57%) are N-linked, 788 (4%) are O-linked, and the remaining 7,944 (39%) exist as ligands. For the glycan structures with more than 2 carbohydrates, the hierarchical fragmentation identified a total of 81,370 fragment structures with 4,267 unique glycan sequences; a unique glycan sequence has more than 2 carbohydrates and is defined by its carbohydrate sequence and glycosidic linkages. Figure 2 shows the number of unique glycan sequences as a function of glycan chain length with (red) and without (black) the hierarchical fragmentation, illustrating that the hierarchical search extracts more unique sequences. Further examination reveals that the unique glycan sequences in the GFDB cover 86% of the glycan sequences in the KEGG glycan database when up to two missing glycosidic linkages are allowed: 13% with no gap, 41% with one linkage missing, and 32% with two linkages missing.
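To make the idea of hierarchical fragmentation more concrete, the following is a minimal Python sketch, not the GFDB implementation. It assumes that a fragment is any connected set of more than 2 residues within a parent glycan, and it assumes a topology for the example glycan 1-2-(3-5)-4 in which sugar 2 is bonded to sugars 1, 3, and 4, and sugar 3 is bonded to sugar 5. The names `LINKAGES`, `is_connected`, and `enumerate_fragments` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Assumed topology for the example glycan "1-2-(3-5)-4" (an illustration only):
# sugar 2 is bonded to sugars 1, 3, and 4; sugar 3 is bonded to sugar 5.
LINKAGES = [(1, 2), (2, 3), (3, 5), (2, 4)]


def is_connected(residues, linkages):
    """Return True if the residues form a single connected piece."""
    residues = set(residues)
    start = next(iter(residues))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for a, b in linkages:
            # Find the partner of `node` in this linkage, if any.
            if a == node:
                other = b
            elif b == node:
                other = a
            else:
                continue
            if other in residues and other not in seen:
                seen.add(other)
                stack.append(other)
    return seen == residues


def enumerate_fragments(linkages, min_size=3):
    """List every connected residue subset with at least `min_size` residues."""
    residues = sorted({r for pair in linkages for r in pair})
    fragments = []
    for size in range(min_size, len(residues) + 1):
        for subset in combinations(residues, size):
            if is_connected(subset, linkages):
                fragments.append(subset)
    return fragments


fragments = enumerate_fragments(LINKAGES)
for fragment in fragments:
    print(fragment)
```

Under these assumptions the five-sugar example yields eight fragments, including the full glycan itself. The sketch identifies fragments by residue index only; grouping them into unique glycan sequences, as described above, would additionally require comparing carbohydrate types and glycosidic linkage types. The brute-force subset search grows exponentially with chain length and is meant only for small illustrative inputs.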