c d

📌 C—D

cadherin calcium channel Calvin cycle CAMK capsid carbohydrate carbohydrate metabolism carbon carbon fixation (carbon assimilation) carbonyl group carboxylic acid catabolism catalysis catalytic triad catenation catenin cell adhesion cell adhesion molecule cell disruption cell signaling cell surface receptor cell unroofing cellular respiration cellulose central dogma of molecular biology ceramide chemical bond chemical element chemical groups chemical polarity chemokine chemotaxis ChIP-exo chloroplast DNA cholesterol chromatin chromatin immunoprecipitation chromomere chromosomal deletion syndrome chromosome abnormality chromosome jumping circular RNA cis-natural antisense transcript cloning vector coding region codons coenzyme A cofactor (biochemistry) collagen complementary DNA complementarity (molecular biology) condensation reaction conjugate acid connexin connexon conserved sequence covalent bond (molecular bond) coverage (genetics) cross-link cyclin cytochrome c cytokine

Dalton (unit) degradosome deletion (genetics) denaturation (biochemistry) deoxyribonuclease deoxyribose C5H10O4 dephosphorylation deprotonation dipeptide direct DNA damage directionality (molecular biology) disaccharide disulfides, organic DNA DNA adduct DNA, antisense DNA-binding domain DNA-binding protein DNA, chloroplast DNA codon table DNA computing DNA condensation DNA damage (naturally occurring) DNA damage theory of aging DNA digital data storage DNA-directed RNA interference DNA ends (Sticky and blunt ends) DNA ligase DNA methylation DNA methyltransferase DNA microarray DNA mismatch repair DNA polymerase DNA primase DNA profiling DNA repair DNA replication — Cell cycle regulation DNA replication — Elongation DNA replication, eukaryotic DNA Replication fork — Leading strand, lagging strand DNA replication — Initiation DNA replication — Okazaki fragments DNA replication — Pre-replication complex DNA replication — Replication through nucleosomes DNA replication — Termination DNA replisome DNA sense DNA sequencer DNA sequencing DNA sequencing theory DNA virus DNase I hypersensitive site double-stranded RNA viruses duplex sequencing

Cadherins (named for "calcium-dependent adhesion") are a type of cell adhesion molecule (CAM) that is important in the formation of adherens junctions to bind cells with each other. Cadherins are a class of type-1 transmembrane proteins. They are dependent on calcium (Ca2+) ions to function, hence their name. Cell-cell adhesion is mediated by extracellular cadherin domains, whereas the intracellular cytoplasmic tail associates with numerous adaptor and signaling proteins, collectively referred to as the cadherin adhesome. (W)

Principal interactions of structural proteins at a cadherin-based adherens junction. Actin filaments are linked to α-actinin and to the membrane through vinculin. The head domain of vinculin associates with E-cadherin via α-, β-, and γ-catenins. The tail domain of vinculin binds to membrane lipids and to actin filaments.

The diagram shows a cell-cell junction called adherens or zonula adherens. It also contains the main proteins involved in it.

Domain organization of different types of cadherins.

Domain organization of different types of cadherins, showing unique features of protocadherins: the extracellular domain is longer and the intracellular domain lacks attachment to the cytoskeleton.

Cartoon representation of a repeating unit in the extracellular E-cadherin ectodomain of the mouse, from PDB entry 3Q2V, rendered using PyMOL (DeLano Scientific).

Rendered by Jieming Chen in PyMOL, using the PDB file from Harrison OJ, Jin X, Hong S, Bahna F, Ahlsen G, Brasch J, Wu Y, Vendome J, Felsovalyi K, Hampton CM, Troyanovsky RB, Ben-Shaul A, Frank J, Troyanovsky SM, Shapiro L, Honig B. Structure. 2011 Feb 9;19(2):244-56.
calcium channel
A calcium channel is an ion channel that shows selective permeability to calcium ions. The term is sometimes used synonymously with voltage-gated calcium channel, although there are also ligand-gated calcium channels. (W)

Depiction of binding sites of various antagonistic drugs in the L-type calcium channel.

Calvin cycle

The Calvin cycle, light-independent reactions, biosynthetic phase, dark reactions, or photosynthetic carbon reduction (PCR) cycle of photosynthesis are the chemical reactions that convert carbon dioxide and other compounds into glucose. These reactions occur in the stroma, the fluid-filled area of a chloroplast outside the thylakoid membranes. These reactions take the products (ATP and NADPH) of light-dependent reactions and perform further chemical processes on them. The Calvin cycle uses the ATP and the reducing power of NADPH from the light-dependent reactions to produce sugars for the plant to use. These substrates are used in a series of reduction-oxidation reactions to produce sugars in a step-wise process. There is no direct reaction that converts CO2 to a sugar because all of the energy would be lost to heat. There are three phases to the light-independent reactions, collectively called the Calvin cycle: carbon fixation, reduction reactions, and ribulose 1,5-bisphosphate (RuBP) regeneration. (W)
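The ATP and NADPH bookkeeping for the cycle can be sketched in a few lines. The per-CO2 costs used here (3 ATP and 2 NADPH for each CO2 fixed) are the usual textbook stoichiometry, not figures given in the entry above, and the function name is purely illustrative.

```python
# Assumed textbook stoichiometry (not stated in the entry above):
# fixing one CO2 through the Calvin cycle costs 3 ATP and 2 NADPH.
ATP_PER_CO2 = 3
NADPH_PER_CO2 = 2


def calvin_cost(carbons_in_product: int) -> dict:
    """Return the CO2, ATP and NADPH needed to build a product
    containing the given number of fixed carbon atoms."""
    return {
        "CO2": carbons_in_product,
        "ATP": ATP_PER_CO2 * carbons_in_product,
        "NADPH": NADPH_PER_CO2 * carbons_in_product,
    }


# A three-carbon sugar phosphate versus a six-carbon sugar such as glucose.
print(calvin_cost(3))
print(calvin_cost(6))
```

Under these assumptions the cost scales linearly with the number of carbons fixed, which is why a six-carbon sugar needs twice the ATP and NADPH of a three-carbon one.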

The internal structure of a chloroplast.

Overview of the Calvin cycle and carbon fixation.
CAMK, also written as CaMK, is an abbreviation for the Ca2+/calmodulin-dependent protein kinase class of enzymes. CAMKs are activated by increases in the concentration of intracellular calcium ions (Ca2+) and calmodulin. When activated, the enzymes transfer phosphates from ATP to defined serine or threonine residues in other proteins, so they are serine/threonine-specific protein kinases. Activated CAMK is involved in the phosphorylation of transcription factors and therefore, in the regulation of expression of responding genes. CAMK also works to regulate the cell life cycle (i.e. programmed cell death), rearrangement of the cell's cytoskeletal network, and mechanisms involved in the learning and memory of an organism. (W)

Figure 1: Diagram of how CAMK II becomes active in the presence of calcium or calmodulin.

Figure 2: Graphic illustration of the crude domains of Calcium/calmodulin-dependent protein kinase 1.

Figure 3: Image of CAMK 2A which is a form of Calcium/calmodulin-dependent kinase in its crystalline form.
Cartoon representation of the molecular structure of protein registered with 1hkx code.

A capsid is the protein shell of a virus, enclosing its genetic material. It consists of several oligomeric (repeating) structural subunits made of protein called protomers. The observable 3-dimensional morphological subunits, which may or may not correspond to individual proteins, are called capsomeres. The proteins making up the capsid are called capsid proteins or viral coat proteins (VCP). The capsid and the genome it encloses are together called the nucleocapsid.

Capsids are broadly classified according to their structure. The majority of the viruses have capsids with either helical or icosahedral structure. Some viruses, such as bacteriophages, have developed more complicated structures due to constraints of elasticity and electrostatics. The icosahedral shape, which has 20 equilateral triangular faces, approximates a sphere, while the helical shape resembles the shape of a spring, taking the space of a cylinder but not being a cylinder itself. The capsid faces may consist of one or more proteins. For example, the foot-and-mouth disease virus capsid has faces consisting of three proteins named VP1–3. (W)

Schematic of a cytomegalovirus.

Illustration of geometric model changing between two possible capsids. A similar change of size has been observed as the result of a single amino-acid mutation.

Icosahedral capsid of an adenovirus.

Virus capsid T-numbers.

(A) Spherical capsids of various sizes are composed of 12 pentamers (represented as darkened pentagons) and a variable number of hexamers. (B) Quasi-equivalence posits that one may produce a pentamer from a hexamer by removing one subunit and its environment (the shaded triangular region) and joining the unpaired interfaces. This operation imposes pentameric dihedral angle values (“endo angles”) onto its neighboring hexameric angles, which, if unchallenged, propagate through the hexamers (depicted by arrows) in what we call endo angle propagation. (W)
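The relation between a T-number and the pentamer and hexamer counts described above can be sketched as follows. The formula T = h² + hk + k² and the count of 60T subunits come from the standard Caspar-Klug description of icosahedral capsids, which the caption does not spell out; the function names and the (h, k) parameterization are assumptions made for illustration.

```python
def t_number(h: int, k: int) -> int:
    """Triangulation number for lattice steps h and k (Caspar-Klug convention)."""
    return h * h + h * k + k * k


def capsid_composition(h: int, k: int) -> dict:
    """Idealized capsomere counts for an icosahedral capsid with the given (h, k)."""
    t = t_number(h, k)
    return {
        "T": t,
        "subunits": 60 * t,        # 60 copies per unit of T
        "pentamers": 12,           # always 12, one per icosahedral vertex
        "hexamers": 10 * (t - 1),  # the remaining subunits, six at a time
    }


for h, k in [(1, 0), (1, 1), (2, 0), (2, 1)]:
    print((h, k), capsid_composition(h, k))
```

The subunit total is consistent with the capsomere counts: 12 pentamers contribute 60 subunits and 10(T − 1) hexamers contribute 60(T − 1), giving 60T in all.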

The prolate structure of a typical head on a bacteriophage.

3D model of a helical capsid structure of a virus.
A carbohydrate is a biomolecule consisting of carbon (C), hydrogen (H) and oxygen (O) atoms, usually with a hydrogen-oxygen atom ratio of 2:1 (as in water) and thus with the empirical formula Cm(H2O)n (where m may be different from n). This formula holds true for monosaccharides. Some exceptions exist; for example, deoxyribose, a sugar component of DNA, has the empirical formula C5H10O4. The carbohydrates are technically hydrates of carbon; structurally it is more accurate to view them as aldoses and ketoses. (W)
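A quick way to see which sugars fit the Cm(H2O)n pattern is to check the 2:1 hydrogen-to-oxygen ratio directly. The sketch below uses a hypothetical helper name; the molecular formulas for glucose, ribose and lactose are standard values added here for comparison, while the deoxyribose formula is the one quoted above.

```python
def fits_hydrate_formula(c: int, h: int, o: int) -> bool:
    """True if C_c H_h O_o can be written as Cm(H2O)n with m = c and n = o,
    i.e. hydrogen and oxygen occur in a 2:1 ratio."""
    return c > 0 and o > 0 and h == 2 * o


sugars = {
    "glucose C6H12O6": (6, 12, 6),
    "ribose C5H10O5": (5, 10, 5),
    "lactose C12H22O11": (12, 22, 11),   # m = 12 differs from n = 11
    "deoxyribose C5H10O4": (5, 10, 4),   # the exception named in the text
}

for name, (c, h, o) in sugars.items():
    print(name, fits_hydrate_formula(c, h, o))
```

Deoxyribose fails the check because it has one oxygen fewer than ribose while keeping the same number of hydrogens.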

Lactose is a disaccharide found in animal milk. It consists of a molecule of D-galactose and a molecule of D-glucose bonded by a β-1→4 glycosidic linkage.

carbohydrate metabolism

Carbohydrate metabolism is the whole of the biochemical processes responsible for the metabolic formation, breakdown, and interconversion of carbohydrates in living organisms.

Carbohydrates are central to many essential metabolic pathways. Plants synthesize carbohydrates from carbon dioxide and water through photosynthesis, allowing them to store energy absorbed from the sunlight internally. When animals and fungi consume plants, they use cellular respiration to break down these stored carbohydrates to make energy available to cells. Both animals and plants temporarily store the released energy in the form of high-energy molecules, such as ATP, for use in various cellular processes.

Although humans consume a variety of carbohydrates, digestion breaks down complex carbohydrates into a few simple monomers (monosaccharides) for metabolism: glucose, fructose, and galactose. Glucose constitutes about 80% of the products and is the primary structure that is distributed to cells in the tissues, where it is broken down or stored as glycogen. In aerobic respiration, the main form of cellular respiration used by humans, glucose and oxygen are metabolized to release energy, with carbon dioxide and water as byproducts. Most of the fructose and galactose travel to the liver, where they can be converted to glucose.

Some simple carbohydrates have their own enzymatic oxidation pathways, as do only a few of the more complex carbohydrates. The disaccharide lactose, for instance, requires the enzyme lactase to be broken into its monosaccharide components, glucose and galactose. (W)

Diagram of the relationship between the processes of carbohydrate metabolism, including glycolysis, gluconeogenesis, glycogenesis, glycogenolysis, fructose metabolism, and galactose metabolism.
Carbon (from Latin: carbo "coal") is a chemical element with the symbol C and atomic number 6. It is nonmetallic and tetravalent — making four electrons available to form covalent chemical bonds. It belongs to group 14 of the periodic table. Three isotopes occur naturally, 12C and 13C being stable, while 14C is a radionuclide, decaying with a half-life of about 5,730 years. Carbon is one of the few elements known since antiquity.

Carbon is the 15th most abundant element in the Earth's crust, and the fourth most abundant element in the universe by mass after hydrogen, helium, and oxygen. Carbon's abundance, its unique diversity of organic compounds, and its unusual ability to form polymers at the temperatures commonly encountered on Earth enable this element to serve as a common element of all known life. It is the second most abundant element in the human body by mass (about 18.5%) after oxygen. (W)
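The half-life of about 5,730 years quoted above for 14C translates into a simple exponential decay law. The sketch below assumes ideal first-order decay with no calibration corrections; the function names are illustrative.

```python
import math

HALF_LIFE_C14_YEARS = 5730.0  # approximate value quoted in the entry


def fraction_remaining(years: float) -> float:
    """Fraction of the original 14C left after the given number of years."""
    return 0.5 ** (years / HALF_LIFE_C14_YEARS)


def age_from_fraction(fraction: float) -> float:
    """Elapsed time in years for a sample retaining the given 14C fraction."""
    return -HALF_LIFE_C14_YEARS * math.log2(fraction)


print(fraction_remaining(5730))   # one half-life
print(age_from_fraction(0.25))    # two half-lives
```

One half-life leaves half of the original 14C, and a sample retaining a quarter of it is two half-lives old under this idealized model.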

Some allotropes of carbon: a) diamond; b) graphite; c) lonsdaleite; d–f) fullerenes (C60, C540, C70); g) amorphous carbon; h) carbon nanotube.

Correlation between the carbon cycle and formation of organic compounds. In plants, carbon dioxide formed by carbon fixation can join with water in photosynthesis (green) to form organic compounds, which can be used and further converted by both plants and animals.

Diagram of the carbon cycle. The black numbers indicate how much carbon is stored in various reservoirs, in billions of tonnes ("GtC" stands for gigatonnes of carbon; figures are circa 2004). The purple numbers indicate how much carbon moves between reservoirs each year. The sediments, as defined in this diagram, do not include the ≈70 million GtC of carbonate rock and kerogen.

📹 Geometry of carbon bonds (VIDEO)

📹 Geometry of carbon bonds (LINK)

Tetrahedral and trigonal planar bond geometries, conformation and rotation.


carbon fixation (carbon assimilation)

Carbon fixation or carbon assimilation is the process by which inorganic carbon (particularly in the form of carbon dioxide) is converted to organic compounds by living organisms. The organic compounds are then used to store energy and as building blocks for other important biomolecules. The most prominent example of carbon fixation is photosynthesis; another form known as chemosynthesis can take place in the absence of sunlight.

Organisms that grow by fixing carbon are called autotrophs, which include photoautotrophs (which use sunlight), and lithoautotrophs (which use inorganic oxidation). Heterotrophs are not themselves capable of carbon fixation but are able to grow by consuming the carbon fixed by autotrophs. "Fixed carbon", "reduced carbon", and "organic carbon" may all be used interchangeably to refer to various organic compounds. (W)

Cyanobacteria such as these carry out photosynthesis. Their emergence foreshadowed the evolution of many photosynthetic plants, which oxygenated Earth's atmosphere.

carbonyl group

In organic chemistry, a carbonyl group is a functional group composed of a carbon atom double-bonded to an oxygen atom: C=O. It is common to several classes of organic compounds, as part of many larger functional groups. A compound containing a carbonyl group is often referred to as a carbonyl compound. (W)

A compound containing a carbonyl group (C=O).

A carbonyl compound.
carboxylic acid
A carboxylic acid is an organic compound that contains a carboxyl group (C(=O)OH) attached to an R-group. The general formula of a carboxylic acid is R–COOH, with R referring to the alkyl group. Carboxylic acids occur widely. Important examples include the amino acids and fatty acids. Deprotonation of a carboxylic acid gives a carboxylate anion. (W)

Structure of a carboxylic acid.

Carboxylate Anion.

3D structure of a carboxylic acid.


Catabolism is the set of metabolic pathways that breaks down molecules into smaller units that are either oxidized to release energy or used in other anabolic reactions. Catabolism breaks down large molecules (such as polysaccharides, lipids, nucleic acids, and proteins) into smaller units (such as monosaccharides, fatty acids, nucleotides, and amino acids, respectively). Catabolism is the breaking-down aspect of metabolism, whereas anabolism is the building-up aspect.

Cells use the monomers released from breaking down polymers to either construct new polymer molecules or degrade the monomers further to simple waste products, releasing energy. Cellular wastes include lactic acid, acetic acid, carbon dioxide, ammonia, and urea. The creation of these wastes is usually an oxidation process involving a release of chemical free energy, some of which is lost as heat, but the rest of which is used to drive the synthesis of adenosine triphosphate (ATP). This molecule acts as a way for the cell to transfer the energy released by catabolism to the energy-requiring reactions that make up anabolism. (Catabolism is seen as destructive metabolism and anabolism as constructive metabolism). Catabolism, therefore, provides the chemical energy necessary for the maintenance and growth of cells. Examples of catabolic processes include glycolysis, the citric acid cycle, the breakdown of muscle protein in order to use amino acids as substrates for gluconeogenesis, the breakdown of fat in adipose tissue to fatty acids, and oxidative deamination of neurotransmitters by monoamine oxidase. (W)

Simplified diagram of catabolism of proteins, carbohydrates and fats.

Catalysis is the process of increasing the rate of a chemical reaction by adding a substance known as a catalyst, which is not consumed in the catalyzed reaction and can continue to act repeatedly. Because of this, only very small amounts of catalyst are required to alter the reaction rate in most cases.

In general, chemical reactions occur faster in the presence of a catalyst because the catalyst provides an alternative reaction pathway with a lower activation energy than the non-catalyzed mechanism. In catalyzed mechanisms, the catalyst usually reacts to form a temporary intermediate, which then regenerates the original catalyst in a cyclic process. A substance which provides a mechanism with a higher activation energy does not decrease the rate because the reaction can still occur by the non-catalyzed route. An added substance which does reduce the reaction rate is not considered a catalyst but a reaction inhibitor. Catalysis may be classified as either homogeneous or heterogeneous. A homogeneous catalyst is one whose molecules are dispersed in the same phase (usually gaseous or liquid) as the reactant's molecules. A heterogeneous catalyst is one whose molecules are not in the same phase as the reactant's, which are typically gases or liquids that are adsorbed onto the surface of the solid catalyst. Enzymes and other biocatalysts are often considered as a third category. (W)
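To get a feel for how much a lower activation energy speeds up a reaction, one can use the Arrhenius equation, k = A·exp(−Ea/RT). The entry above does not mention Arrhenius, so this is an added illustration: it assumes the catalyzed and non-catalyzed pathways share the same pre-exponential factor A, and the activation energies in the example are made-up values.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def rate_enhancement(ea_uncatalyzed_kj: float, ea_catalyzed_kj: float,
                     temperature_k: float = 298.15) -> float:
    """Ratio k_cat / k_uncat from the Arrhenius equation, assuming both
    pathways have the same pre-exponential factor (a simplification)."""
    delta_ea_j = (ea_uncatalyzed_kj - ea_catalyzed_kj) * 1000.0
    return math.exp(delta_ea_j / (R * temperature_k))


# Hypothetical numbers: a catalyst lowering the barrier from 75 to 50 kJ/mol.
print(f"{rate_enhancement(75.0, 50.0):.3g}")
```

With these illustrative numbers the catalyzed route is faster by a factor on the order of 10^4 at room temperature, which shows why even small amounts of catalyst can dominate the observed rate.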

catalytic triad

A catalytic triad is a set of three coordinated amino acids that can be found in the active site of some enzymes. Catalytic triads are most commonly found in hydrolase and transferase enzymes (e.g. proteases, amidases, esterases, acylases, lipases and β-lactamases). An Acid-Base-Nucleophile triad is a common motif for generating a nucleophilic residue for covalent catalysis. The residues form a charge-relay network to polarise and activate the nucleophile, which attacks the substrate, forming a covalent intermediate which is then hydrolysed to release the product and regenerate free enzyme. The nucleophile is most commonly a serine or cysteine amino acid, but occasionally threonine or even selenocysteine. The 3D structure of the enzyme brings together the triad residues in a precise orientation, even though they may be far apart in the sequence (primary structure).

As well as divergent evolution of function (and even the triad's nucleophile), catalytic triads show some of the best examples of convergent evolution. Chemical constraints on catalysis have led to the same catalytic solution independently evolving in at least 23 separate superfamilies. Their mechanism of action is consequently one of the best studied in biochemistry. (W)

The enzyme TEV protease contains an example of a catalytic triad of residues (red) in its active site. The triad consists of an aspartate (acid), histidine (base) and serine (nucleophile). The substrate (black) is bound by the binding site to orient it next to the triad. (PDB: 1LVM).

In chemistry, catenation is the bonding of atoms of the same element into a series, called a chain. A chain or a ring shape may be open if its ends are not bonded to each other (an open-chain compound), or closed if they are bonded in a ring (a cyclic compound). (W)
Catenins are a family of proteins found in complexes with cadherin cell adhesion molecules of animal cells. The first two catenins that were identified became known as α-catenin and β-catenin. α-Catenin can bind to β-catenin and can also bind filamentous actin (F-actin). β-Catenin binds directly to the cytoplasmic tail of classical cadherins. Additional catenins such as γ-catenin and δ-catenin have been identified. The name "catenin" was originally selected ('catena' means 'chain' in Latin) because it was suspected that catenins might link cadherins to the cytoskeleton. (W)

Interactions of structural proteins at cadherin-based adherens junction. The exact means by which cadherins are linked to actin filaments is still under investigation.

The diagram shows a cell-cell junction called adherens or zonula adherens. It also contains the main proteins involved in it.

Figure 1. β-catenin at cell-to-cell contacts of P19 embryonal carcinoma cells.

Mouse P19 embryonal carcinoma cells immunostained to show the location of beta-catenin at cell-to-cell contacts. Image by User:JWSchmidt.
cell adhesion
Cell adhesion is the process by which cells interact and attach to neighbouring cells through specialised molecules of the cell surface. This process can occur either through direct contact between cell surfaces such as cell junctions or indirect interaction, where cells attach to surrounding extracellular matrix, a gel-like structure containing molecules released by cells into spaces between them. Cell adhesion occurs from the interactions between cell-adhesion molecules (CAMs), transmembrane proteins located on the cell surface. Cell adhesion links cells in different ways and can be involved in signal transduction for cells to detect and respond to changes in the surroundings. Other cellular processes regulated by cell adhesion include cell migration and tissue development in multicellular organisms. Alterations in cell adhesion can disrupt important cellular processes and lead to a variety of diseases, including cancer and arthritis. Cell adhesion is also essential for infectious organisms, such as bacteria or viruses, to cause diseases. (W)

Schematic of cell adhesion.

Overview diagram of different types of cell junctions present in epithelial cells, including cell–cell junctions and cell–matrix junctions.

Adherens junction showing homophilic binding between cadherins and how catenin links them to actin filaments.

Gap junctions showing connexons and connexins.

Hemidesmosome diagram showing the interaction between integrins and laminin, including how integrins are linked to keratin intermediate filaments.
cell adhesion molecule

Cell adhesion molecules (CAMs) are a subset of cell adhesion proteins located on the cell surface involved in binding with other cells or with the extracellular matrix (ECM) in the process called cell adhesion. In essence, cell adhesion molecules help cells stick to each other and to their surroundings. Cell adhesion is a crucial component in maintaining tissue structure and function. In fully developed animals, these molecules play an integral role in creating force and movement and consequently ensure that organs are able to execute their functions. In addition to serving as "molecular glue", cell adhesion is important in affecting cellular mechanisms of growth, contact inhibition, and apoptosis. Oftentimes aberrant expression of CAMs will result in pathologies ranging from frostbite to cancer.

Combined with cell junctions and ECM, CAMs help hold animal cells together. (W)

cell disruption
Cell disruption is a method or process for releasing biological molecules from inside a cell.


The production of biologically interesting molecules using cloning and culturing methods allows the study and manufacture of relevant molecules. Except for excreted molecules, cells producing molecules of interest must be disrupted. This page discusses various methods. Another method of disruption is called cell unroofing. (W)

Laboratory cell disruptor.
A cell disruptor-Genie.
cell signaling
In biology, cell signaling, or cell-cell communication, governs the basic activities of cells and coordinates multiple-cell actions. A signal is an entity that codes or conveys information. Biological processes are complex molecular interactions that involve a lot of signals. The ability of cells to perceive and correctly respond to their microenvironment is the basis of development, tissue repair, and immunity, as well as normal tissue homeostasis. Errors in signaling interactions and cellular information processing may cause diseases such as cancer, autoimmunity, and diabetes. By understanding cell signaling, clinicians may treat diseases more effectively and, theoretically, researchers may develop artificial tissues. (W)

Notch-mediated juxtacrine signal between adjacent cells.

Transmembrane receptor working principle.

Overview of signal transduction pathways.

Key components of a signal transduction pathway (MAPK/ERK pathway shown).

cell surface receptor

Cell surface receptors (membrane receptors, transmembrane receptors) are receptors that are embedded in the plasma membrane of cells. They act in cell signaling by receiving (binding to) extracellular molecules. They are specialized integral membrane proteins that allow communication between the cell and the extracellular space. The extracellular molecules may be hormones, neurotransmitters, cytokines, growth factors, cell adhesion molecules, or nutrients; they react with the receptor to induce changes in the metabolism and activity of a cell. In the process of signal transduction, ligand binding affects a cascading chemical change through the cell membrane. (W)

The seven-transmembrane α-helix structure of a G-protein-coupled receptor.

The 7TM helices of bovine rhodopsin. Based on PDB 1hzx and the Heller/Schaefer/Schulten lipid bilayer coordinates.

A schematic of a transmembrane receptor.
E = extracellular space P = plasma membrane I = intracellular space

External reactions and internal reactions for signal transduction.

Three conformation states of the acetylcholine receptor.
cell unroofing
Cell unroofing is any of various methods to isolate and expose the cell membrane of cells. Unlike the more common membrane extraction protocols performed with multiple steps of centrifugation (whose goal is to separate the membrane fraction from a cell lysate), cell unroofing aims to tear off and preserve patches of the plasma membrane in order to perform in situ experiments using microscopy and biomedical spectroscopy. (W)

The most common processes of cell unroofing. (left) Sandwich of two cells between two coverslips. (right) A lateral flux of medium breaks the cells open.

cellular respiration

Cellular respiration is a set of metabolic reactions and processes that take place in the cells of organisms to convert chemical energy from oxygen molecules or nutrients into adenosine triphosphate (ATP), and then release waste products. The reactions involved in respiration are catabolic reactions, which break large molecules into smaller ones, releasing energy because weak high-energy bonds, in particular in molecular oxygen, are replaced by stronger bonds in the products. Respiration is one of the key ways a cell releases chemical energy to fuel cellular activity. The overall reaction occurs in a series of biochemical steps, some of which are redox reactions. Although cellular respiration is technically a combustion reaction, it clearly does not resemble one when it occurs in a living cell because of the slow, controlled release of energy from the series of reactions. (W)

Cellular respiration including glycolysis, Krebs cycle (AKA citric acid cycle), and the electron transport chain (L)

After glycolysis in the cytoplasm, pyruvate enters the mitochondrion and is converted to acetyl CoA, which feeds the Krebs cycle. The cycle releases CO2 and yields 2 ATP, NADH, and FADH2. From there the NADH and FADH2 deliver their electrons to the electron transport chain, with NADH entering at the complex labeled NADH reductase. As the electrons move along the chain, H+ ions are pumped across the inner mitochondrial membrane. The return flow of these hydrogen ions drives the phosphorylation of ADP, for an end result of 32 ATP. O2 provides most of the energy for the process and combines with protons and the electrons to make water. Lastly, ATP leaves through the ATP channel and out of the mitochondria.

Efficiency of ATP production

The table below describes the reactions involved when one glucose molecule is fully oxidized into carbon dioxide. It is assumed that all the reduced coenzymes are oxidized by the electron transport chain and used for oxidative phosphorylation. (W)

Step | Coenzyme yield | ATP yield | Source of ATP
Glycolysis preparatory phase | - | −2 | Phosphorylation of glucose and fructose 6-phosphate uses two ATP from the cytoplasm.
Glycolysis pay-off phase | - | 4 | Substrate-level phosphorylation
Glycolysis pay-off phase | 2 NADH | 3 or 5 | Oxidative phosphorylation: each NADH produces net 1.5 ATP (instead of the usual 2.5) due to NADH transport over the mitochondrial membrane
Oxidative decarboxylation of pyruvate | 2 NADH | 5 | Oxidative phosphorylation
Krebs cycle | - | 2 | Substrate-level phosphorylation
Krebs cycle | 6 NADH | 15 | Oxidative phosphorylation
Krebs cycle | 2 FADH2 | 3 | Oxidative phosphorylation
Total yield | - | 30 or 32 ATP | From the complete oxidation of one glucose molecule to carbon dioxide and oxidation of all the reduced coenzymes.
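The totals in the table can be reproduced with a short tally. The per-coenzyme yields used below (2.5 ATP per NADH, 1.5 ATP per cytoplasmic NADH in the lower-yield case, and 1.5 ATP per FADH2) are read off the table rows themselves; the variable and function names are illustrative.

```python
ATP_PER_NADH = 2.5    # usual yield per NADH, from the table
ATP_PER_FADH2 = 1.5   # implied by 2 FADH2 -> 3 ATP in the table


def total_atp(atp_per_cytoplasmic_nadh: float) -> float:
    """Net ATP per glucose for a given yield of the 2 NADH made in glycolysis."""
    steps = [
        -2,                             # glycolysis preparatory phase
        4,                              # glycolysis pay-off, substrate-level
        2 * atp_per_cytoplasmic_nadh,   # 2 NADH from glycolysis
        2 * ATP_PER_NADH,               # 2 NADH from pyruvate decarboxylation
        2,                              # Krebs cycle, substrate-level
        6 * ATP_PER_NADH,               # 6 NADH from the Krebs cycle
        2 * ATP_PER_FADH2,              # 2 FADH2 from the Krebs cycle
    ]
    return sum(steps)


print(total_atp(1.5))  # lower yield: NADH transport costs energy
print(total_atp(2.5))  # full yield per cytoplasmic NADH
```

The two results correspond to the "30 or 32 ATP" range in the last row; the whole difference comes from how the two glycolytic NADH are counted.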

Comparison of aerobic respiration and most known fermentation types in eukaryotic cells.

Stoichiometry of aerobic respiration and most known fermentation types in a eukaryotic cell. Numbers in circles indicate counts of carbon atoms in molecules: C6 is glucose (C6H12O6), C1 is carbon dioxide (CO2). The mitochondrial outer membrane is omitted.

📹 Cellular Respiration — Citric Acid Cycle / blausen (LINK)


Under aerobic conditions, cellular respiration occurs within mitochondria. Pyruvic acid is transported into the mitochondrial matrix and metabolized into carbon dioxide and water in reactions termed the citric acid cycle or Krebs cycle. A small amount of ATP (two molecules) and a number of reduced electron carriers (six NADH and two FADH2) are produced by the oxidation of pyruvic acid in the Krebs cycle. Reduced electron carriers donate high-energy electrons to proteins of the Electron Transport System (ETS) embedded in the inner mitochondrial membrane. As electrons pass between ETS proteins, their loss of energy is coupled to pumping hydrogen ions from the matrix into the intermembrane space, creating an electrochemical gradient. Water is formed when oxygen, the final electron acceptor, binds hydrogen ions and low-energy electrons from the ETS. The electrochemical gradient is used to produce 32 molecules of ATP per molecule of glucose in the process of chemiosmosis.


Cellulose is an organic compound with the formula (C6H10O5)n, a polysaccharide consisting of a linear chain of several hundred to many thousands of β(1→4) linked D-glucose units. Cellulose is an important structural component of the primary cell wall of green plants, many forms of algae and the oomycetes. Some species of bacteria secrete it to form biofilms. Cellulose is the most abundant organic polymer on Earth. The cellulose content of cotton fiber is 90%, that of wood is 40–50%, and that of dried hemp is approximately 57%. (W)

Cellulose, a linear polymer of D-glucose units (two are shown) linked by β(1→4)-glycosidic bonds.

Three-dimensional structure of cellulose.

Cellulose under a microscope.

The arrangement of cellulose and other polysaccharides in a plant cell wall.

A triple strand of cellulose showing the hydrogen bonds (cyan lines) between glucose strands.
central dogma of molecular biology
The central dogma of molecular biology is an explanation of the flow of genetic information within a biological system. It is often stated as "DNA makes RNA, and RNA makes protein", although this is not its original meaning.

It was first stated by Francis Crick in 1957, then published in 1958:

" The Central Dogma. This states that once "information" has passed into protein it cannot get out again. In more detail, the transfer of information from nucleic acid to nucleic acid, or from nucleic acid to protein may be possible, but transfer from protein to protein, or from protein to nucleic acid is impossible. Information means here the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein."

— Francis Crick, 1958

and re-stated in a Nature paper published in 1970:

"The central dogma of molecular biology deals with the detailed residue-by-residue transfer of sequential information. It states that such information cannot be transferred back from protein to either protein or nucleic acid".

— Francis Crick

Information flow in biological systems.

Overview of the central dogma of molecular biology.

An overview of the (basic) central dogma of molecular biochemistry with all enzymes labeled.

Unusual flows of information highlighted in green.

Ceramides are a family of waxy lipid molecules. A ceramide is composed of sphingosine and a fatty acid. Ceramides are found in high concentrations within the cell membrane of eukaryotic cells, since they are component lipids that make up sphingomyelin, one of the major lipids in the lipid bilayer. Contrary to previous assumptions that ceramides and other sphingolipids found in cell membrane were purely supporting structural elements, ceramide can participate in a variety of cellular signaling: examples include regulating differentiation, proliferation, and programmed cell death (PCD) of cells.

The word ceramide comes from the Latin cera (wax) and amide. Ceramide is a component of vernix caseosa, the waxy or cheese-like white substance found coating the skin of newborn human infants. (W)

Ceramide. R represents the alkyl portion of a fatty acid.

General structures of sphingolipids.
chemical bond

A chemical bond is a lasting attraction between atoms, ions or molecules that enables the formation of chemical compounds. The bond may result from the electrostatic force of attraction between oppositely charged ions as in ionic bonds or through the sharing of electrons as in covalent bonds. (W)

Examples of Lewis dot-style representations of chemical bonds between carbon (C), hydrogen (H), and oxygen (O). Lewis dot diagrams were an early attempt to describe chemical bonding and are still widely used today.
chemical element

In chemistry an element is a species of atom having the same number of protons in its atomic nuclei (that is, the same atomic number, or Z). For example, the atomic number of oxygen is 8, so the element oxygen describes all atoms which have 8 protons.

In total, 118 elements have been identified. The first 94 occur naturally on Earth, and the remaining 24 are synthetic elements. There are 80 elements that have at least one stable isotope and 38 that have exclusively radionuclides, which decay over time into other elements. Iron is the most abundant element (by mass) making up Earth, while oxygen is the most common element in the Earth's crust. (W)

Abundance (atom fraction) of the chemical elements in Earth's upper continental crust as a function of atomic number. The rarest elements in the crust (shown in yellow) are rare due to a combination of factors: all but one are the densest siderophiles (iron-loving) elements in the Goldschmidt classification, meaning they have a tendency to mix well with metallic iron, depleting them by being relocated deeper into the Earth's core. Their abundance in meteoroids is higher. Additionally, tellurium has been depleted by preaccretional sorting in the nebula via formation of volatile hydrogen telluride.

Elements in our galaxy (W)

Element      Parts per million by mass
Hydrogen     739,000
Helium       240,000
Oxygen       10,400
Carbon       4,600
Neon         1,340
Iron         1,090
Nitrogen     960
Silicon      650
Magnesium    580
Sulfur       440
Potassium    210
Nickel       100
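
To make the parts-per-million figures easier to read, the sketch below converts them to percent by mass and adds up the listed elements. The values are copied from the table above; the variable and function names are only illustrative.

```python
# Galactic abundances from the table above, in parts per million by mass.
GALAXY_PPM_BY_MASS = {
    "Hydrogen": 739_000,
    "Helium": 240_000,
    "Oxygen": 10_400,
    "Carbon": 4_600,
    "Neon": 1_340,
    "Iron": 1_090,
    "Nitrogen": 960,
    "Silicon": 650,
    "Magnesium": 580,
    "Sulfur": 440,
    "Potassium": 210,
    "Nickel": 100,
}


def ppm_to_percent(ppm):
    """Convert parts per million to percent (1% = 10,000 ppm)."""
    return ppm / 10_000


# Sum of the listed elements; the remainder up to 1,000,000 ppm is everything else.
listed_total_ppm = sum(GALAXY_PPM_BY_MASS.values())

if __name__ == "__main__":
    for element, ppm in GALAXY_PPM_BY_MASS.items():
        print(f"{element:<10} {ppm_to_percent(ppm):8.3f} %")
    print(f"listed elements together: {ppm_to_percent(listed_total_ppm):.3f} %")
```

Hydrogen and helium alone account for 979,000 ppm, or 97.9% of the mass.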


chemical groups

BIOLOGY — A Global Approach, 2018 (L)

chemical polarity

In chemistry, polarity is a separation of electric charge leading to a molecule or its chemical groups having an electric dipole moment, with a negatively charged end and a positively charged end.

Polar molecules must contain polar bonds due to a difference in electronegativity between the bonded atoms. A polar molecule with two or more polar bonds must have a geometry which is asymmetric in at least one direction, so that the bond dipoles do not cancel each other.

Polar molecules interact through dipole–dipole intermolecular forces and hydrogen bonds. Polarity underlies a number of physical properties including surface tension, solubility, and melting and boiling points. (W)

A water molecule, a commonly used example of polarity. Two charges are present with a negative charge in the middle (red shade), and a positive charge at the ends (blue shade).  
Chemokines (Greek -kinos, movement) are a family of small cytokines, or signaling proteins secreted by cells. Their name is derived from their ability to induce directed chemotaxis in nearby responsive cells; they are chemotactic cytokines. (W)

Small cytokines (intecrine/chemokine), interleukin-8 like.
Solution structure of IL-8 as published in the Protein Data Bank (PDB: 1IL8).

Chemotaxis (from chemo- + taxis) is the movement of an organism in response to a chemical stimulus. Somatic cells, bacteria, and other single-cell or multicellular organisms direct their movements according to certain chemicals in their environment. This is important for bacteria to find food (e.g., glucose) by swimming toward the highest concentration of food molecules, or to flee from poisons (e.g., phenol). In multicellular organisms, chemotaxis is critical to early development (e.g., movement of sperm towards the egg during fertilization) and subsequent phases of development (e.g., migration of neurons or lymphocytes) as well as in normal function and health (e.g., migration of leukocytes during injury or infection). In addition, it has been recognized that mechanisms that allow chemotaxis in animals can be subverted during cancer metastasis. The aberrant chemotaxis of leukocytes and lymphocytes also contributes to inflammatory diseases such as atherosclerosis, asthma, and arthritis.

Positive chemotaxis occurs if the movement is toward a higher concentration of the chemical in question; negative chemotaxis if the movement is in the opposite direction. Chemically prompted kinesis (randomly directed or nondirectional) can be called chemokinesis. (W)

Capillary tube assay for chemotaxis. Motile prokaryotes sense chemicals in their environment and change their motility accordingly. Absent chemicals, movement is completely random. When an attractant or repellent is present, runs become longer and tumbles become less frequent. The result is net movement towards or away from the chemical (i.e., up or down the chemical gradient). The net movement can be seen in the beaker, where the bacteria accumulate around the origin of the attractant, and away from the origin of the repellent.

The process of chemotaxis can be demonstrated using a capillary tube assay (shown above). The motile prokaryotes can sense chemicals in their environment and change their motility accordingly. When no chemicals are present, movement is completely random. When a repellent or attractant chemical is present, the motility changes; runs become longer and tumbles become less frequent so that the net movement towards or away from the chemical can be achieved. The net movement can be seen in the beaker, where the bacteria accumulate around the attractant, and away from the repellent.
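
The run-and-tumble behaviour described above can be sketched as a biased random walk. This is a toy one-dimensional model, not a description of any real bacterium: the attractant is assumed to increase with x, and the tumble probabilities (0.1 when heading up the gradient, 0.5 when heading down) are arbitrary values chosen for illustration.

```python
import random


def run_and_tumble(steps, tumble_up=0.1, tumble_down=0.5, seed=0):
    """Simulate a 1D run-and-tumble walk and return the final position.

    The attractant concentration is assumed to increase with x. A cell
    heading up the gradient tumbles with probability tumble_up per step,
    a cell heading down tumbles with probability tumble_down.
    """
    rng = random.Random(seed)
    position = 0
    direction = rng.choice((-1, 1))
    for _ in range(steps):
        position += direction  # run one unit in the current direction
        p_tumble = tumble_up if direction > 0 else tumble_down
        if rng.random() < p_tumble:
            direction = rng.choice((-1, 1))  # tumble: pick a random new direction
    return position


if __name__ == "__main__":
    biased = run_and_tumble(2000, tumble_up=0.1, tumble_down=0.5, seed=1)
    unbiased = run_and_tumble(2000, tumble_up=0.3, tumble_down=0.3, seed=1)
    print("with attractant gradient:", biased)
    print("without gradient:", unbiased)
```

Because runs up the gradient last longer on average, the cell drifts toward the attractant even though every tumble picks a direction at random. With equal tumble probabilities the walk has no preferred direction.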

Correlation of swimming behaviour and flagellar rotation.
ChIP-exo is a chromatin immunoprecipitation based method for mapping the locations at which a protein of interest (transcription factor) binds to the genome. It is a modification of the ChIP-seq protocol, improving the resolution of binding sites from hundreds of base pairs to almost one base pair. It uses exonucleases to degrade strands of the protein-bound DNA in the 5'-3' direction to within a small number of nucleotides of the protein binding site. The nucleotides of the exonuclease-treated ends are determined using some combination of DNA sequencing, microarrays, and PCR. These sequences are then mapped to the genome to identify the locations on the genome at which the protein binds. (W)

ChIP-exo workflow. (W)
chloroplast DNA
Chloroplasts have their own DNA, often abbreviated as cpDNA. It is also known as the plastome when referring to genomes of other plastids. Its existence was first proven in 1962. The first complete chloroplast genome sequences were published in 1986, Nicotiana tabacum (tobacco) by Sugiura and colleagues and Marchantia polymorpha (liverwort) by Ozeki et al. Since then, hundreds of chloroplast DNAs from various species have been sequenced, but they are mostly those of land plants and green algae; glaucophytes, red algae, and other algae groups are extremely underrepresented, potentially introducing some bias in views of "typical" chloroplast DNA structure and content. (W)

Gene map of chloroplast DNA from Nicotiana tabacum. Segments with labels on the inside reside on the B strand of DNA, segments with labels on the outside are on the A strand. Notches indicate introns. (W)

The 154 kb chloroplast DNA map of a model flowering plant (Arabidopsis thaliana: Brassicaceae) showing genes and inverted repeats. (W)

The plastid genome of a model flowering plant (Arabidopsis thaliana: Brassicaceae).

Chloroplast DNA replication via multiple D loop mechanisms. Adapted from Krishnan NM, Rao BJ's paper "A comparative approach to elucidate chloroplast genome replication."


Cholesterol (from the Ancient Greek chole- (bile) and stereos (solid), followed by the chemical suffix -ol for an alcohol) is an organic molecule. It is a sterol (or modified steroid), a type of lipid. Cholesterol is biosynthesized by all animal cells and is an essential structural component of animal cell membranes.

Cholesterol also serves as a precursor for the biosynthesis of steroid hormones, bile acid and vitamin D. Cholesterol is the principal sterol synthesized by all animals. In vertebrates, hepatic cells typically produce the greatest amounts. It is absent among prokaryotes (bacteria and archaea), although there are some exceptions, such as Mycoplasma, which require cholesterol for growth.

François Poulletier de la Salle first identified cholesterol in solid form in gallstones in 1769. However, it was not until 1815 that chemist Michel Eugène Chevreul named the compound "cholesterine". (W)

Chemical structure of cholesterol.

Ball-and-stick model of cholesterol.
Chromatin is a complex of DNA and protein found in eukaryotic cells. Its primary function is packaging long DNA molecules into more compact, denser structures. This prevents the strands from becoming tangled and also plays important roles in reinforcing the DNA during cell division, preventing DNA damage, and regulating gene expression and DNA replication. During mitosis and meiosis, chromatin facilitates proper segregation of the chromosomes in anaphase; the characteristic shapes of chromosomes visible during this stage are the result of DNA being coiled into highly condensed chromatin. (W)

The major structures in DNA compaction: DNA, the nucleosome, the 10 nm "beads-on-a-string" fibre, the 30 nm chromatin fibre and the metaphase chromosome.

Basic units of chromatin structure.

The histone octamer is formed from two subunits each of the four core histones. The four H3 and H4 subunits form a tightly packed tetramer that associates with two H2A/H2B dimers to form each octamer. About 147 bp of DNA is wrapped around the octamer to form a nucleosome. Nucleosomes can be arrayed in the loosely packed beads-on-a-string form of chromatin, but are generally more tightly packaged into the 30-nm fiber. Fiber formation requires histone tails and additional proteins, neither of which is shown here.
chromatin immunoprecipitation

Chromatin immunoprecipitation (ChIP) is a type of immunoprecipitation experimental technique used to investigate the interaction between proteins and DNA in the cell. It aims to determine whether specific proteins are associated with specific genomic regions, such as transcription factors on promoters or other DNA binding sites, and possibly defining cistromes. ChIP also aims to determine the specific location in the genome that various histone modifications are associated with, indicating the target of the histone modifiers.

Briefly, the conventional method is as follows:

  1. DNA and associated proteins on chromatin in living cells or tissues are crosslinked (this step is omitted in Native ChIP).
  2. The DNA-protein complexes (chromatin-protein) are then sheared into ~500 bp DNA fragments by sonication or nuclease digestion.
  3. Cross-linked DNA fragments associated with the protein(s) of interest are selectively immunoprecipitated from the cell debris using an appropriate protein-specific antibody.
  4. The associated DNA fragments are purified and their sequence is determined. Enrichment of specific DNA sequences represents regions on the genome that the protein of interest is associated with in vivo. (W)
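
Step 4 speaks of enrichment of specific DNA sequences. One common way to express it, sketched below under stated assumptions, is the ratio of a region's share of reads in the immunoprecipitated (IP) sample to its share in an input control. The region names, the read counts, the pseudocount, and the function name are all hypothetical; real ChIP analyses use dedicated peak-calling tools.

```python
def fold_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Return per-region fold enrichment of IP reads over input reads.

    Each count is divided by its library total so that samples sequenced to
    different depths are comparable. The pseudocount (an assumption of this
    sketch) avoids division by zero for regions with no input reads.
    """
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    if ip_total == 0 or input_total == 0:
        raise ValueError("both samples need at least one read")
    enrichment = {}
    for region, ip_reads in ip_counts.items():
        ip_fraction = (ip_reads + pseudocount) / ip_total
        input_fraction = (input_counts.get(region, 0) + pseudocount) / input_total
        enrichment[region] = ip_fraction / input_fraction
    return enrichment


if __name__ == "__main__":
    # Hypothetical read counts for two regions.
    ip = {"promoter_A": 90, "region_B": 10}
    control = {"promoter_A": 50, "region_B": 50}
    for region, value in fold_enrichment(ip, control).items():
        print(f"{region}: {value:.2f}x")
```

A value well above 1 suggests the protein of interest is associated with that region; a value below 1 suggests it is depleted there.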

ChIP-sequencing workflow.
A diagram illustrating the principles of Chromatin Immunoprecipitation Sequencing (ChIP-sequencing), where the location of binding by a specific protein to DNA is investigated.

A chromomere, also known as an idiomere, is one of the serially aligned beads or granules of a eukaryotic chromosome, resulting from local coiling of a continuous DNA thread. Chromomeres are regions of chromatin that have been compacted through localized contraction. In areas of chromatin where transcription is absent, condensation of DNA and protein complexes results in the formation of chromomeres. They are visible on a chromosome during the prophase of meiosis and mitosis. Giant banded (polytene) chromosomes result from replication of the chromosomes and synapsis of homologs without cell division, a process called endomitosis. These chromosomes consist of more than 1000 copies of the same chromatid that are aligned and produce alternating dark and light bands when stained. The dark bands are the chromomeres.

It is unknown when chromomeres first appear on the chromosome. Chromomeres can be observed best when chromosomes are highly condensed. The chromomeres are present during the leptotene phase of prophase I of meiosis, when they appear as dense granules. During the zygotene phase of prophase I, the chromomeres of homologs align with each other to form homologous rough pairing (homology searching). These chromomeres help provide a unique identity for each homologous pair.

There are more than 2000 chromomeres on 20 chromosomes of maize. (W)

Polytene chromosome of Drosophila. (b) Displays the chromomeric and interchromomeric bands of the chromosome.

Phase-contrast image of Drosophila melanogaster polytene chromosomes. The end of the X-chromosome is marked with an arrow. Chromocentre is in the upper right corner. B shows a magnification of chromomere and interchromomere bands.
chromosomal deletion syndrome

Chromosomal deletion syndromes result from deletion of parts of chromosomes. Depending on the location, size, and whom the deletion is inherited from, there are a few known different variations of chromosome deletions. Chromosomal deletion syndromes typically involve larger deletions that are visible using karyotyping techniques. Smaller deletions result in microdeletion syndromes, which are detected using fluorescence in situ hybridization (FISH).

Examples of chromosomal deletion syndromes include 5p-Deletion (cri du chat syndrome), 4p-Deletion (Wolf-Hirschhorn syndrome), Prader–Willi syndrome, and Angelman syndrome. (W)

An example of chromosomal deletions.
chromosome abnormality
A chromosomal disorder, chromosomal anomaly, chromosomal aberration, or chromosomal mutation is a missing, extra, or irregular portion of chromosomal DNA. It can be from an atypical number of chromosomes or a structural abnormality in one or more chromosomes. Chromosome mutation was formerly used in a strict sense to mean a change in a chromosomal segment, involving more than one gene. The term "karyotype" refers to the full set of chromosomes from an individual; this can be compared to a "normal" karyotype for the species via genetic testing. A chromosome anomaly may be detected or confirmed in this manner. Chromosome anomalies usually occur when there is an error in cell division following meiosis or mitosis. There are many types of chromosome anomalies. They can be organized into two basic groups, numerical and structural anomalies. (W)

The three major single-chromosome mutations: deletion (1), duplication (2) and inversion (3).

The two major two-chromosome mutations: insertion (1) and translocation (2).
chromosome jumping

Chromosome jumping is a tool of molecular biology that is used in the physical mapping of genomes. It is related to several other tools used for the same purpose, including chromosome walking.

Chromosome jumping is used to bypass regions difficult to clone, such as those containing repetitive DNA, that cannot be easily mapped by chromosome walking, and is useful in moving along a chromosome rapidly in search of a particular gene.

Chromosome jumping allows more rapid movement through the genome compared to other techniques, such as chromosome walking, and can be used to generate genomic markers with known chromosomal locations.

Chromosome jumping enables two ends of a DNA sequence to be cloned without the middle section. Genomic DNA may be partially digested using restriction endonucleases and with the aid of DNA ligase, the fragments are circularized. From a known sequence, a primer is designed to sequence across the circularised junction. This primer is used to jump 100 kb-300 kb intervals: a sequence 100 kb away would have come near the known sequence on circularisation. Thus, sequences not reachable by chromosome walking can be sequenced. Chromosome walking can be used from the new jump position (in either direction) to look for gene-like sequences, or additional jumps can be used to progress further along the chromosome. (W)

circular RNA

Circular RNA (or circRNA) is a type of single-stranded RNA which, unlike linear RNA, forms a covalently closed continuous loop. In circular RNA the 3' and 5' ends normally present in an RNA molecule have been joined together. This feature confers numerous properties to circular RNA, many of which have only recently been identified.

Many types of circular RNA arise from otherwise protein-coding genes. Some circular RNAs have been shown to code for proteins. Some types of circular RNA have recently shown potential as gene regulators. The biological function of most circular RNAs is unclear.

Because circular RNAs do not have 5' or 3' ends, they are resistant to exonuclease-mediated degradation and are presumably more stable than most linear RNAs in cells. Circular RNA has been linked to some diseases such as cancer. (W)

CircRNA biogenesis. A. mRNA splicing, with alternative splice variants. All mRNAs have cap and polyA tail. B. CircRNA formation via backsplicing.

Circular RNA biogenesis from pre-mRNA via a backsplicing event. A. Canonical splicing of mRNA with different splice variants. All mature mRNAs have a cap (at the 5' end, dark violet half-circle) and a poly-A tail (adenine nucleotides at the 3' end). B. circRNA formation: circular RNAs can have one or more exons, sometimes retaining an intron (dark line near 2 exon). (W)

Pre-mRNA to mRNA splicing.

Pre-mRNA is the first form of RNA created through transcription in protein synthesis. The pre-mRNA lacks structures that the messenger RNA (mRNA) requires. First, all introns have to be removed from the transcribed RNA through a process known as splicing. Before the RNA is ready for export, a poly(A) tail is added to the 3’ end of the RNA and a 5’ cap is added to the 5’ end.
cis-natural antisense transcript
Natural antisense transcripts (NATs) are a group of RNAs encoded within a cell that have transcript complementarity to other RNA transcripts. They have been identified in multiple eukaryotes, including humans, mice, yeast and Arabidopsis thaliana. This class of RNAs includes both protein-coding and non-coding RNAs. Current evidence has suggested a variety of regulatory roles for NATs, such as RNA interference (RNAi), alternative splicing, genomic imprinting, and X-chromosome inactivation. NATs are broadly grouped into two categories based on whether they act in cis or in trans. Trans-NATs are transcribed from a different location than their targets and usually have complementarity to multiple transcripts with some mismatches. MicroRNAs (miRNA) are an example of trans-NATs that can target multiple transcripts with a few mismatches. Cis-natural antisense transcripts (cis-NATs) on the other hand are transcribed from the same genomic locus as their target but from the opposite DNA strand and form perfect pairs. (W)

Figure 1: Orientations of cis-NATs within the genome.

Transcription collision model for expression inhibition.

Figure 3: Aberrant transcription of antisense transcripts can result in inhibition of oncogenes and allow cell to continue past cell cycle check points. Putative new oncogenes and tumor suppressor genes can be found by looking for upregulated antisense transcripts in cancer cells.
cloning vector

A cloning vector is a small piece of DNA that can be stably maintained in an organism, and into which a foreign DNA fragment can be inserted for cloning purposes. The cloning vector may be DNA taken from a virus, the cell of a higher organism, or it may be the plasmid of a bacterium. The vector therefore contains features that allow for the convenient insertion or removal of a DNA fragment to or from the vector, for example by treating the vector and the foreign DNA with a restriction enzyme that cuts the DNA. DNA fragments thus generated contain either blunt ends or overhangs known as sticky ends, and vector DNA and foreign DNA with compatible ends can then be joined together by molecular ligation. After a DNA fragment has been cloned into a cloning vector, it may be further subcloned into another vector designed for more specific use.

There are many types of cloning vectors, but the most commonly used ones are genetically engineered plasmids. Cloning is generally first performed using Escherichia coli, and cloning vectors in E. coli include plasmids, bacteriophages (such as phage λ), cosmids, and bacterial artificial chromosomes (BACs). Some DNA, however, cannot be stably maintained in E. coli, for example very large DNA fragments, and other organisms such as yeast may be used. Cloning vectors in yeast include yeast artificial chromosomes (YACs). (W)

Schematic representation of the pBR322 plasmid, one of the first plasmids widely used as a cloning vector.

The pUC plasmid has a high copy number, contains a multiple cloning site (polylinker), a gene for ampicillin antibiotic selection, and can be used for blue-white screen.

coding region
The coding region of a gene, also known as the CDS (from coding sequence), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to non-coding regions over different species and time periods can provide a significant amount of important information regarding gene organization and evolution of prokaryotes and eukaryotes. This can further assist in mapping the human genome and developing gene therapy. (W)

Transcription: RNA Polymerase (RNAP) uses a template DNA strand and begins coding at the promoter sequence (green) and ends at the terminator sequence (red) in order to encompass the entire coding region into the product mRNA (teal). [I have a doubt if the 5' and 3' end are shown incorrectly in this figure].

The coding region (teal) is flanked by untranslated regions, the 5' cap, and the poly(A) tail which together form the mature mRNA.

Examples of the various forms of point mutations that may exist within coding regions. Such alterations may or may not have phenotypic changes, depending on whether or not they code for different amino acids during translation.

The Crick, Brenner, Barnett and Watts-Tobin experiment first demonstrated that codons consist of three DNA bases. Marshall Nirenberg and Heinrich J. Matthaei were the first to reveal the nature of a codon in 1961.

They used a cell-free system to translate a poly-uracil RNA sequence (i.e., UUUUU...) and discovered that the polypeptide that they had synthesized consisted of only the amino acid phenylalanine. They thereby deduced that the codon UUU specified the amino acid phenylalanine.

This was followed by experiments in Severo Ochoa's laboratory that demonstrated that the poly-adenine RNA sequence (AAAAA...) coded for the polypeptide poly-lysine and that the poly-cytosine RNA sequence (CCCCC...) coded for the polypeptide poly-proline. Therefore, the codon AAA specified the amino acid lysine, and the codon CCC specified the amino acid proline. Using various copolymers most of the remaining codons were then determined.
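
The homopolymer experiments above amount to reading an RNA three bases at a time and looking each triplet up in a table. The sketch below is a minimal illustration, not a full genetic code: the table holds only the three codons named above plus AUG (methionine) and the three stop codons UAG (amber), UAA (ochre) and UGA (opal). The function names and the "???" placeholder for codons missing from this small table are assumptions of the example.

```python
from itertools import product

# Partial codon table: only the codons discussed in this entry.
CODON_TABLE = {
    "UUU": "Phe",   # poly-U gave poly-phenylalanine
    "AAA": "Lys",   # poly-A gave poly-lysine
    "CCC": "Pro",   # poly-C gave poly-proline
    "AUG": "Met",   # also the usual start codon
    "UAG": "Stop",  # amber
    "UAA": "Stop",  # ochre
    "UGA": "Stop",  # opal
}


def translate(rna):
    """Translate an RNA string codon by codon until a stop codon or the end."""
    rna = rna.upper()
    peptide = []
    usable_length = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        residue = CODON_TABLE.get(rna[i:i + 3], "???")
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


def all_codons():
    """Every possible triplet of the four RNA bases (4**3 = 64)."""
    return ["".join(triplet) for triplet in product("ACGU", repeat=3)]


if __name__ == "__main__":
    print(translate("UUUUUUUUU"))     # poly-U template
    print(translate("AUGAAACCCUAG"))  # Met-Lys-Pro, then amber stop
    print(len(all_codons()), "possible codons")
```

Four bases taken three at a time give the 64 codons whose assignments the later experiments worked out.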

Subsequent work by Har Gobind Khorana identified the rest of the genetic code. Shortly thereafter, Robert W. Holley determined the structure of transfer RNA (tRNA), the adapter molecule that facilitates the process of translating RNA into protein. This work was based upon Ochoa's earlier studies, yielding the latter the Nobel Prize in Physiology or Medicine in 1959 for work on the enzymology of RNA synthesis.

Extending this work, Nirenberg and Philip Leder revealed the code's triplet nature and deciphered its codons. In these experiments, various combinations of mRNA were passed through a filter that contained ribosomes, the components of cells that translate RNA into protein. Unique triplets promoted the binding of specific tRNAs to the ribosome. Leder and Nirenberg were able to determine the sequences of 54 out of 64 codons in their experiments. Khorana, Holley and Nirenberg received the 1968 Nobel for their work.

The three stop codons were named by discoverers Richard Epstein and Charles Steinberg. "Amber" was named after their friend Harris Bernstein, whose last name means "amber" in German. The other two stop codons were named "ochre" and "opal" in order to keep the "color names" theme. (W)

A (corrected) chart showing the relationship between codons and amino acids.

RNA codon table

The codon AUG both codes for methionine and serves as an initiation site: the first AUG in an mRNA's coding region is where translation into protein begins.[46] The other start codons listed by GenBank are rare in eukaryotes and generally code for Met/fMet.[47]
The historical basis for designating the stop codons as amber, ochre and opal is described in an autobiography by Sydney Brenner[48] and in a historical article by Bob Edgar.[49]

coenzyme A

Coenzyme A (CoA, SHCoA, CoASH) is a coenzyme, notable for its role in the synthesis and oxidation of fatty acids, and the oxidation of pyruvate in the citric acid cycle. All genomes sequenced to date encode enzymes that use coenzyme A as a substrate, and around 4% of cellular enzymes use it (or a thioester) as a substrate. In humans, CoA biosynthesis requires cysteine, pantothenate (vitamin B5), and adenosine triphosphate (ATP). (W)

Structure of coenzyme A.


cofactor (biochemistry)

A cofactor is a non-protein chemical compound or metallic ion that is required for an enzyme's activity as a catalyst, a substance that increases the rate of a chemical reaction. Cofactors can be considered "helper molecules" that assist in biochemical transformations. The rates at which these happen are characterized in an area of study called enzyme kinetics. Cofactors typically differ from ligands in that they often derive their function by remaining bound. (W)

The succinate dehydrogenase complex showing several cofactors, including flavin, iron-sulfur centers, and heme.
Collagen is the main structural protein in the extracellular matrix found in the body's various connective tissues. As the main component of connective tissue, it is the most abundant protein in mammals, making up from 25% to 35% of the whole-body protein content. Collagen consists of amino acids bound together to form a triple helix of elongated fibril known as a collagen helix. It is mostly found in connective tissue such as cartilage, bones, tendons, ligaments, and skin. (W)

Tropocollagen molecule: three left-handed procollagens (red, green, blue) join to form a right-handed triple helical tropocollagen.

complementary DNA

In genetics, complementary DNA (cDNA) is DNA synthesized from a single-stranded RNA (e.g., messenger RNA (mRNA) or microRNA (miRNA)) template in a reaction catalyzed by the enzyme reverse transcriptase. cDNA is often used to clone eukaryotic genes in prokaryotes. When scientists want to express a specific protein in a cell that does not normally express that protein (i.e., heterologous expression), they will transfer the cDNA that codes for the protein to the recipient cell. In molecular biology, cDNA is also generated to analyze transcriptomic profiles in bulk tissue, single cells, or single nuclei in assays such as microarrays and RNA-seq.

cDNA is also produced naturally by retroviruses (such as HIV-1, HIV-2, simian immunodeficiency virus, etc.) and then integrated into the host's genome, where it creates a provirus.

The term cDNA is also used, typically in a bioinformatics context, to refer to an mRNA transcript's sequence, expressed as DNA bases (GCAT) rather than RNA bases (GCAU). (W)
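
The two senses of cDNA mentioned above can be told apart with a short sketch. The first function rewrites an mRNA sequence in DNA letters (the bioinformatics usage). The second gives the strand that reverse transcriptase would synthesize against the mRNA template, which is its reverse complement. Function names and the example sequence are illustrative only.

```python
# Base pairing used when DNA is synthesized against an RNA template.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mrna_as_cdna_notation(mrna):
    """Write an mRNA sequence with DNA bases (U replaced by T)."""
    return mrna.upper().replace("U", "T")


def first_strand_cdna(mrna):
    """Return the DNA strand complementary to the mRNA, written 5' to 3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


if __name__ == "__main__":
    mrna = "AUGGCU"
    print(mrna_as_cdna_notation(mrna))  # ATGGCT
    print(first_strand_cdna(mrna))      # AGCCAT
```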

Output from a cDNA microarray used in testing.

complementarity (molecular biology)

In molecular biology, complementarity describes a relationship between two structures each following the lock-and-key principle. In nature complementarity is the base principle of DNA replication and transcription as it is a property shared between two DNA or RNA sequences, such that when they are aligned antiparallel to each other, the nucleotide bases at each position in the sequences will be complementary, much like looking in the mirror and seeing the reverse of things. This complementary base pairing allows cells to copy information from one generation to another and even find and repair damage to the information stored in the sequences.

The degree of complementarity between two nucleic acid strands may vary, from complete complementarity (each nucleotide is across from its opposite) to no complementarity (each nucleotide is not across from its opposite) and determines the stability of the sequences to be together. Furthermore, various DNA repair functions as well as regulatory functions are based on base pair complementarity. In biotechnology, the principle of base pair complementarity allows the generation of DNA hybrids between RNA and DNA, and opens the door to modern tools such as cDNA libraries. While most complementarity is seen between two separate strings of DNA or RNA, it is also possible for a sequence to have internal complementarity resulting in the sequence binding to itself in a folded configuration. (W)
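
The idea of a degree of complementarity can be made concrete with a small sketch. It assumes two DNA strands of equal length, both written 5' to 3', and counts the positions that form Watson-Crick pairs when the strands are aligned antiparallel. The hydrogen-bond counts (two for A-T, three for G-C) are the standard values; the function names are made up for this example.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
HYDROGEN_BONDS = {"A": 2, "T": 2, "G": 3, "C": 3}  # bonds in the pair each base forms


def reverse_complement(seq):
    """Return the fully complementary strand, written 5' to 3'."""
    return "".join(DNA_PAIRS[base] for base in reversed(seq.upper()))


def complementarity(top, bottom):
    """Fraction of positions that pair when the strands are aligned antiparallel.

    Both strands are given 5' to 3' and must have the same length.
    """
    top, bottom = top.upper(), bottom.upper()
    if len(top) != len(bottom) or not top:
        raise ValueError("strands must be non-empty and of equal length")
    matches = sum(DNA_PAIRS[a] == b for a, b in zip(top, reversed(bottom)))
    return matches / len(top)


def duplex_hydrogen_bonds(seq):
    """Hydrogen bonds in a fully paired duplex formed by seq and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in seq.upper())


if __name__ == "__main__":
    strand = "ATGC"
    partner = reverse_complement(strand)
    print(partner)                           # GCAT
    print(complementarity(strand, partner))  # 1.0
    print(complementarity("AAAA", "AAAA"))   # 0.0
    print(duplex_hydrogen_bonds(strand))     # 10
```

A sequence whose two halves are reverse complements of each other scores 1.0 against itself, which is the internal complementarity that lets a single strand fold into a hairpin.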

Complementarity between two antiparallel strands of DNA. The top strand goes from the left to the right and the lower strand goes from the right to the left lining them up.

Left: the nucleotide base pairs that can form in double-stranded DNA. Between A and T there are two hydrogen bonds, while there are three between C and G. Right: two complementary strands of DNA.

Match up between two DNA bases (guanine and cytosine) showing hydrogen bonds (dashed lines) holding them together.

Match up between two DNA bases (adenine and thymine) showing hydrogen bonds (dashed lines) holding them together.

A sequence of RNA that has internal complementarity which results in it folding into a hairpin.

A sequence of RNA showing hairpins (far right and far upper left), and internal loops (lower left structure).

condensation reaction

A condensation reaction is a class of organic addition reaction that typically proceeds in a step-wise fashion to produce the addition product, usually in equilibrium, and a water molecule (hence named condensation). The reaction may otherwise involve the functional groups of the molecule, and formation of a small molecule such as ammonia, ethanol, or acetic acid instead of water. It is a versatile class of reactions that can occur in acidic or basic conditions or in the presence of a catalyst. This class of reactions is a vital part of life as it is essential to the formation of peptide bonds between amino acids and the biosynthesis of fatty acids. (W)

Idealized scheme showing condensation of two amino acids to give a peptide bond.
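
The bookkeeping of a condensation can be sketched with simple formula arithmetic: add the two reactants and subtract the small molecule that is released. The example below uses glycine (C2H5NO2) condensing with itself to give glycylglycine; the helper names are illustrative assumptions.

```python
from collections import Counter

GLYCINE = Counter({"C": 2, "H": 5, "N": 1, "O": 2})
WATER = Counter({"H": 2, "O": 1})

def condense(a: Counter, b: Counter, leaving: Counter = WATER) -> Counter:
    """Atom counts of the product when a and b join and `leaving` is lost."""
    product = a + b            # new Counter with summed atom counts
    product.subtract(leaving)  # remove the atoms of the small molecule
    return +product            # unary plus drops any zero counts

def formula(atoms: Counter) -> str:
    """Render atom counts as a formula string, elements in alphabetical order."""
    return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(atoms.items()))

print(formula(condense(GLYCINE, GLYCINE)))  # C4H8N2O3 (glycylglycine)
```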
conjugate acid

A conjugate acid, within the Brønsted–Lowry acid–base theory, is a chemical compound formed when an acid donates a proton (H+) to a base—in other words, it is a base with a hydrogen ion added to it, as in the reverse reaction it loses a hydrogen ion. On the other hand, a conjugate base is what is left over after an acid has donated a proton during a chemical reaction. Hence, a conjugate base is a species formed by the removal of a proton from an acid, as in the reverse reaction it is able to gain a hydrogen ion. Because some acids are capable of releasing multiple protons, the conjugate base of an acid may itself be acidic.

In summary, this can be represented as the following chemical reaction:

Acid + Base ⇌ Conjugate Base + Conjugate Acid

Johannes Nicolaus Brønsted and Martin Lowry introduced the Brønsted–Lowry theory, which proposed that any compound that can transfer a proton to any other compound is an acid, and the compound that accepts the proton is a base. A proton is a nuclear particle with a unit positive electrical charge; it is represented by the symbol H+ because it constitutes the nucleus of a hydrogen atom, that is, a hydrogen cation.

A cation can be a conjugate acid, and an anion can be a conjugate base, depending on which substance is involved and which acid–base theory is the viewpoint. The simplest anion which can be a conjugate base is the solvated electron whose conjugate acid is the atomic hydrogen. (W)

Reaction of NH4 to NH3.
Connexins (Cx) (TC# 1.A.24), or gap junction proteins, are structurally related transmembrane proteins that assemble to form vertebrate gap junctions. An entirely different family of proteins, the innexins, form gap junctions in invertebrates. Each gap junction is composed of two hemichannels, or connexons, which consist of homo- or heterohexameric arrays of connexins, and the connexon in one plasma membrane docks end-to-end with a connexon in the membrane of a closely opposed cell. The hemichannel is made of six connexin subunits, each of which consist of four transmembrane segments. Gap junctions are essential for many physiological processes, such as the coordinated depolarization of cardiac muscle, proper embryonic development, and the conducted response in microvasculature. For this reason, mutations in connexin-encoding genes can lead to functional and developmental abnormalities. (W)

Connexin-26 dodecamer + 12 Ca (orange), human.
Connexin-26 dodecamer. A gap junction, composed of twelve identical connexin proteins, six in the membrane of each cell. Each of these six units is a single polypeptide which passes the membrane four times (referred to as four-pass transmembrane proteins).

The diagram shows a gap junction and its main element, the connexon, together with the structure of the connexin.

Life cycle and protein associations of connexins. Connexins are synthesized on ER-bound ribosomes and inserted into the ER cotranslationally. This is followed by oligomerization between the ER and trans-Golgi network (depending on the connexin type) into connexons, which are then delivered to the membrane via the actin or microtubule networks. Connexons may also be delivered to the plasma membrane by direct transfer from the rough ER. Upon insertion into the membrane, connexons may remain as hemichannels or dock with compatible connexons on adjacent cells to form gap junctions. Newly delivered connexons are added to the periphery of pre-formed gap junctions, while the central "older" gap junction fragments are degraded by internalization of a double-membrane structure called an annular junction into one of the two cells, where subsequent lysosomal or proteasomal degradation occurs, or in some cases the connexons are recycled to the membrane (indicated by dashed arrow). During their life cycle, connexins associate with different proteins, including (1) cytoskeletal components such as microtubules, actin, and the actin-binding proteins α-spectrin and drebrin, (2) junctional molecules including adherens junction components such as cadherins, α-catenin, and β-catenin, as well as tight junction components such as ZO-1 and ZO-2, (3) enzymes such as kinases and phosphatases, which regulate assembly, function, and degradation, and (4) other proteins such as caveolin. This image was prepared by Hanaa Hariri for Dbouk et al., 2009.

In biology, a connexon, also known as a connexin hemichannel, is an assembly of six proteins called connexins that form the pore for a gap junction between the cytoplasm of two adjacent cells. This channel allows for bidirectional flow of ions and signaling molecules. The connexon is the hemichannel supplied by a cell on one side of the junction; two connexons from opposing cells normally come together to form the complete intercellular gap junction channel. However, in some cells, the hemichannel itself is active as a conduit between the cytoplasm and the extracellular space, allowing the transfer of ions and small molecules smaller than 1–2 kDa. Little is known about this function of connexons besides the new evidence suggesting their key role in intracellular signaling.

Connexons made of the same type of connexins are considered homomeric, while connexons made of differing types of connexins are heteromeric. (W)

Connexon and connexin structure.

The diagram shows a gap junction and its main element, the connexon, together with the structure of the connexin.

conserved sequence

In evolutionary biology, conserved sequences are identical or similar sequences in nucleic acids (DNA and RNA) or proteins across species (orthologous sequences), or within a genome (paralogous sequences), or between donor and receptor taxa (xenologous sequences). Conservation indicates that a sequence has been maintained by natural selection.

A highly conserved sequence is one that has remained relatively unchanged far back up the phylogenetic tree, and hence far back in geological time. Examples of highly conserved sequences include the RNA components of ribosomes present in all domains of life, the homeobox sequences widespread amongst Eukaryotes, and the tmRNA in Bacteria. The study of sequence conservation overlaps with the fields of genomics, proteomics, evolutionary biology, phylogenetics, bioinformatics and mathematics. (W)

A multiple sequence alignment of five mammalian histone H1 proteins. Sequences are the amino acids for residues 120-180 of the proteins. Residues that are conserved across all sequences are highlighted in grey. Below each site (i.e., position) of the protein sequence alignment is a key denoting conserved sites (*), sites with conservative replacements (:), sites with semi-conservative replacements (.), and sites with non-conservative replacements ( ).
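
The conservation key under such an alignment can be approximated in a few lines. The sketch below marks only fully conserved columns with "*"; distinguishing conservative (:) from semi-conservative (.) replacements requires amino-acid similarity groups and is deliberately left out. The three short sequences are made-up toy data, not the histone H1 alignment from the figure.

```python
def conservation_line(alignment: list[str]) -> str:
    """Return a key line with '*' under columns identical in every sequence.

    Columns containing a gap ('-') or more than one residue get a space.
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    marks = []
    for column in zip(*alignment):
        residues = set(column)
        conserved = len(residues) == 1 and "-" not in residues
        marks.append("*" if conserved else " ")
    return "".join(marks)

toy_alignment = ["KKAAP", "KRAAP", "KKASP"]  # hypothetical aligned fragments
for seq in toy_alignment:
    print(seq)
print(conservation_line(toy_alignment))  # "* * *"
```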

A sequence logo for the LexA-binding motif of gram-positive bacteria. As the adenosine at position 5 is highly conserved, it appears larger than other characters.

This image from the ECR browser shows the result of aligning different vertebrate genomes to the human genome at the conserved OTX2 gene. Top: Gene annotations of exons and introns of the OTX2 gene. For each genome, sequence similarity (%) compared to the human genome is plotted. Tracks show the zebrafish, dog, chicken, western clawed frog, opossum, mouse, rhesus macaque and chimpanzee genomes. The peaks show regions of high sequence similarity across all genomes, showing that this sequence is highly conserved.
covalent bond (molecular bond)

A covalent bond, also called a molecular bond, is a chemical bond that involves the sharing of electron pairs between atoms. These electron pairs are known as shared pairs or bonding pairs, and the stable balance of attractive and repulsive forces between atoms, when they share electrons, is known as covalent bonding. For many molecules, the sharing of electrons allows each atom to attain the equivalent of a full outer shell, corresponding to a stable electronic configuration. In organic chemistry, covalent bonds are much more common than ionic bonds. (W)

A covalent bond forming H2 (right) where two hydrogen atoms share the two electrons.

Lewis and MO diagrams of an individual 2e bond and 3e bond.

Early concepts in covalent bonding arose from this kind of image of the molecule of methane. Covalent bonding is implied in the Lewis structure by indicating electrons shared between atoms.

📂 Covalent bond

Covalent bond (W)

A covalent bond, also called a molecular bond, is a chemical bond that involves the sharing of electron pairs between atoms. These electron pairs are known as shared pairs or bonding pairs, and the stable balance of attractive and repulsive forces between atoms, when they share electrons, is known as covalent bonding. For many molecules, the sharing of electrons allows each atom to attain the equivalent of a full outer shell, corresponding to a stable electronic configuration. In organic chemistry, covalent bonds are much more common than ionic bonds.

Covalent bonding includes many kinds of interactions, including σ-bonding, π-bonding, metal-to-metal bonding, agostic interactions, bent bonds, and three-center two-electron bonds. The term covalent bond dates from 1939. The prefix co- means jointly, associated in action, partnered to a lesser degree, etc.; thus a "co-valent bond", in essence, means that the atoms share "valence", such as is discussed in valence bond theory.

In the molecule H2, the hydrogen atoms share the two electrons via covalent bonding. Covalency is greatest between atoms of similar electronegativities. Thus, covalent bonding does not necessarily require that the two atoms be of the same elements, only that they be of comparable electronegativity. Covalent bonding that entails sharing of electrons over more than two atoms is said to be delocalized.


The term covalence in regard to bonding was first used in 1919 by Irving Langmuir in a Journal of the American Chemical Society article entitled "The Arrangement of Electrons in Atoms and Molecules". Langmuir wrote that "we shall denote by the term covalence the number of pairs of electrons that a given atom shares with its neighbors."

The idea of covalent bonding can be traced several years before 1919 to Gilbert N. Lewis, who in 1916 described the sharing of electron pairs between atoms. He introduced the Lewis notation or electron dot notation or Lewis dot structure, in which valence electrons (those in the outer shell) are represented as dots around the atomic symbols. Pairs of electrons located between atoms represent covalent bonds. Multiple pairs represent multiple bonds, such as double bonds and triple bonds. An alternative form of representation, not shown here, has bond-forming electron pairs represented as solid lines.

Lewis proposed that an atom forms enough covalent bonds to form a full (or closed) outer electron shell. In the diagram of methane shown here, the carbon atom has a valence of four and is, therefore, surrounded by eight electrons (the octet rule), four from the carbon itself and four from the hydrogens bonded to it. Each hydrogen has a valence of one and is surrounded by two electrons (a duet rule) – its own one electron plus one from the carbon. The numbers of electrons correspond to full shells in the quantum theory of the atom; the outer shell of a carbon atom is the n = 2 shell, which can hold eight electrons, whereas the outer (and only) shell of a hydrogen atom is the n = 1 shell, which can hold only two.

While the idea of shared electron pairs provides an effective qualitative picture of covalent bonding, quantum mechanics is needed to understand the nature of these bonds and predict the structures and properties of simple molecules. Walter Heitler and Fritz London are credited with the first successful quantum mechanical explanation of a chemical bond (molecular hydrogen) in 1927. Their work was based on the valence bond model, which assumes that a chemical bond is formed when there is good overlap between the atomic orbitals of participating atoms.


coverage (genetics)
Coverage (or depth) in DNA sequencing is the number of unique reads that include a given nucleotide in the reconstructed sequence. Deep sequencing refers to the general concept of aiming for a high number of unique reads of each region of a sequence. (W)

An overlap of the product of three sequencing runs, with the read depth at each point indicated.
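
Read depth as pictured above can be computed by counting, for every position, how many reads span it. In this sketch the reads are given as 0-based, half-open (start, end) intervals on the reference; the interval representation, the function name, and the three example reads are assumptions for illustration.

```python
def depth_per_position(reads: list[tuple[int, int]], length: int) -> list[int]:
    """Count how many reads cover each position of a sequence of `length`."""
    depth = [0] * length
    for start, end in reads:
        # Clip each read to the bounds of the reference before counting.
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    return depth

reads = [(0, 6), (3, 9), (5, 10)]          # three hypothetical sequencing runs
depth = depth_per_position(reads, 10)
print(depth)                               # [1, 1, 1, 2, 2, 3, 2, 2, 2, 1]
print(sum(depth) / len(depth))             # mean coverage: 1.7
```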

In chemistry and biology a cross-link is a bond that links one polymer chain to another. These links may take the form of covalent bonds or ionic bonds and the polymers can be either synthetic polymers or natural polymers (such as proteins).

In polymer chemistry "cross-linking" usually refers to the use of cross-links to promote a change in the polymers' physical properties.

When "crosslinking" is used in the biological field, it refers to the use of a probe to link proteins together to check for protein–protein interactions, as well as other creative cross-linking methodologies.

Although the term is used to refer to the "linking of polymer chains" for both sciences, the extent of crosslinking and specificities of the crosslinking agents vary greatly. As with all science, there are overlaps, and the following delineations are a starting point to understanding the subtleties. (W)

Vulcanization is an example of cross-linking. Schematic presentation of two "polymer chains" (blue and green) cross-linked after the vulcanization of natural rubber (polyisoprene) with sulfur (n = 0, 1, 2, 3 …).
Cyclins are a family of proteins that control the progression of a cell through the cell cycle by activating cyclin-dependent kinase (CDK) enzymes, or groups of enzymes, required for progression through the cell cycle. (W)

Expression of human cyclins through the cell cycle.

cytochrome c

Three-dimensional structure of cytochrome c (green) with a heme molecule coordinating a central iron atom (orange). PDB id, 1HRC, Bushnell et al., “High-resolution three-dimensional structure of horse heart cytochrome c.” J Mol Biol. 1990 Jul 20;214(2):585-95. PubMed PMID: 2166170.
The cytochrome complex, or cyt c, is a small hemeprotein found loosely associated with the inner membrane of the mitochondrion. It belongs to the cytochrome c family of proteins and plays a major role in cell apoptosis. Cytochrome c is highly water-soluble, unlike other cytochromes, and is an essential component of the electron transport chain, where it carries one electron. It is capable of undergoing oxidation and reduction as its iron atom converts between the ferrous and ferric forms, but does not bind oxygen. It transfers electrons between Complexes III (Coenzyme Q – Cyt C reductase) and IV (Cyt C oxidase). In humans, cytochrome c is encoded by the CYCS gene. (W)

Cytokines are a broad and loose category of small proteins (~5–20 kDa) important in cell signaling. Cytokines are peptides and cannot cross the lipid bilayer of cells to enter the cytoplasm. Cytokines have been shown to be involved in autocrine, paracrine and endocrine signaling as immunomodulating agents. Their definite distinction from hormones is still part of ongoing research.

Cytokines include chemokines, interferons, interleukins, lymphokines, and tumour necrosis factors, but generally not hormones or growth factors (despite some overlap in the terminology). Cytokines are produced by a broad range of cells, including immune cells like macrophages, B lymphocytes, T lymphocytes and mast cells, as well as endothelial cells, fibroblasts, and various stromal cells; a given cytokine may be produced by more than one type of cell. They act through cell surface receptors and are especially important in the immune system; cytokines modulate the balance between humoral and cell-based immune responses, and they regulate the maturation, growth, and responsiveness of particular cell populations. Some cytokines enhance or inhibit the action of other cytokines in complex ways. They are different from hormones, which are also important cell signaling molecules. Hormones circulate in higher concentrations, and tend to be made by specific kinds of cells. Cytokines are important in health and disease, specifically in host immune responses to infection, inflammation, trauma, sepsis, cancer, and reproduction.

The word comes from Greek: cyto, from Greek "κύτος" kytos "cavity, cell" + kines, from Greek "κίνησις" kinēsis "movement". (W)

3D medical animation still showing secretion of cytokines.


Dalton (unit)

The dalton or unified atomic mass unit (symbols: Da or u) is a unit of mass widely used in physics and chemistry. It is defined as 1/12 of the mass of an unbound neutral atom of carbon-12 in its nuclear and electronic ground state and at rest. The atomic mass constant, denoted m_u, is defined identically, giving m_u = m(¹²C)/12 = 1 Da.

This unit is commonly used in physics and chemistry to express the mass of atomic-scale objects, such as atoms, molecules, and elementary particles, both for discrete instances and multiple types of ensemble averages. For example, an atom of helium-4 has a mass of 4.0026 Da. This is an intrinsic property of the isotope, and all helium-4 atoms have the same mass. Acetylsalicylic acid (aspirin), C9H8O4, has an average mass of approximately 180.157 Da. However, there are no acetylsalicylic acid molecules with this mass. The two most common masses of individual acetylsalicylic acid molecules are 180.04228 Da and 181.04565 Da.

The molecular masses of proteins, nucleic acids, and other large polymers are often expressed with the units kilodaltons (kDa), megadaltons (MDa), etc. Titin, one of the largest known proteins, has a molecular mass of between 3 and 3.7 megadaltons. The DNA of chromosome 1 in the human genome has about 249 million base pairs, each with an average mass of about 650 Da, or roughly 160 GDa total. (W)
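
The chromosome 1 estimate is a single multiplication, sketched below with the two rounded figures quoted above (249 million base pairs, about 650 Da each). Their product comes to about 162 GDa. The kilogram conversion uses the CODATA 2018 value of the dalton; the variable names are illustrative.

```python
DALTON_IN_KG = 1.66053906660e-27   # CODATA 2018 value of 1 Da in kilograms

BASE_PAIRS_CHR1 = 249_000_000      # approximate length of human chromosome 1
MASS_PER_BASE_PAIR_DA = 650        # approximate average mass of one base pair

total_da = BASE_PAIRS_CHR1 * MASS_PER_BASE_PAIR_DA
total_gda = total_da / 1e9         # giga = 10**9

print(total_da)                    # 161850000000
print(total_gda)                   # 161.85
print(total_da * DALTON_IN_KG)     # same mass expressed in kilograms
```

Both inputs are rounded averages, so only the first two significant figures of the result are meaningful.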


The degradosome is a multiprotein complex present in most bacteria that is involved in the processing of ribosomal RNA and the degradation of messenger RNA and is regulated by non-coding RNA. It contains the proteins RNA helicase B, RNase E and polynucleotide phosphorylase.

The store of cellular RNA in cells is constantly fluctuating. For example, in Escherichia coli, the lifetime of a messenger RNA is between 2 and 25 minutes; in other bacteria it might last longer. Even in resting cells, RNA is degraded in a steady state, and the nucleotide products of this process are later reused for fresh rounds of nucleic acid synthesis. RNA turnover is very important for gene regulation and quality control.

All organisms have various tools for RNA degradation, for instance ribonucleases, helicases, 3'-end nucleotidyltransferases (which add tails to transcripts), 5'-end capping and decapping enzymes and assorted RNA-binding proteins that help to model RNA for presentation as substrate or for recognition. Frequently, these proteins associate into stable complexes in which their activities are coordinated or cooperative. Many of these RNA metabolism proteins are represented in the components of the multi-enzyme RNA degradosome of Escherichia coli, which is constituted by four basic components: the hydrolytic endo-ribonuclease RNase E, the phosphorolytic exo-ribonuclease PNPase, the ATP-dependent RNA helicase (RhlB) and the glycolytic enzyme enolase.

The RNA degradosome was discovered in two different laboratories while they were working on the purification and characterization of E. coli RNase E and the factors that could influence the activity of the RNA-degrading enzymes, specifically PNPase. It was found while two of its major components were being studied. (W)

Basic structure of the RNA degradosome. The structure has been drawn symmetrically; however, it is a dynamic structure, so the noncatalytic region of RNase E would form a random coil, and each of these coils would act independently of the others.

This picture shows RNA's degradation process with the specific phases.
deletion (genetics)

In genetics, a deletion (also called gene deletion, deficiency, or deletion mutation) (sign: Δ) is a mutation (a genetic aberration) in which a part of a chromosome or a sequence of DNA is left out during DNA replication. Any number of nucleotides can be deleted, from a single base to an entire piece of chromosome.

The smallest single base deletion mutations occur by a single base flipping in the template DNA, followed by template DNA strand slippage, within the DNA polymerase active site.

Deletions can be caused by errors in chromosomal crossover during meiosis, which causes several serious genetic diseases. Deletions that do not occur in multiples of three bases can cause a frameshift by changing the 3-nucleotide protein reading frame of the genetic sequence. Deletions are characteristic of eukaryotic organisms, including humans, rather than of prokaryotic organisms such as bacteria. (W)
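
The frameshift rule above comes down to whether the number of deleted bases is divisible by three. A minimal sketch, using a made-up 12-base coding sequence and illustrative helper names:

```python
def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases from seq beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]

def causes_frameshift(deleted_length: int) -> bool:
    """A deletion shifts the reading frame unless its length is a multiple of 3."""
    return deleted_length % 3 != 0

def codons(seq: str) -> list[str]:
    """Split seq into complete triplets from position 0; drop any remainder."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

gene = "ATGGCCAAATAG"                 # hypothetical: ATG GCC AAA TAG
print(codons(gene))                   # ['ATG', 'GCC', 'AAA', 'TAG']
print(codons(delete(gene, 3, 1)))     # ['ATG', 'CCA', 'AAT'] (frame shifted)
print(codons(delete(gene, 3, 3)))     # ['ATG', 'AAA', 'TAG'] (frame kept)
print(causes_frameshift(1), causes_frameshift(3))  # True False
```

In the single-base deletion every downstream codon changes and the stop codon TAG is lost, while the three-base deletion removes exactly one codon and leaves the rest intact.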

Deletion on a chromosome.
denaturation (biochemistry)

Denaturation is a process in which proteins or nucleic acids lose the quaternary structure, tertiary structure, and secondary structure which is present in their native state, by application of some external stress or compound such as a strong acid or base, a concentrated inorganic salt, an organic solvent (e.g., alcohol or chloroform), radiation or heat. If proteins in a living cell are denatured, this results in disruption of cell activity and possibly cell death. Protein denaturation is also a consequence of cell death. Denatured proteins can exhibit a wide range of characteristics, from conformational change and loss of solubility to aggregation due to the exposure of hydrophobic groups. Denatured proteins lose their 3D structure and therefore cannot function.

Protein folding is key to whether a globular or membrane protein can do its job correctly; it must be folded into the right shape to function. However, hydrogen bonds, which play a big part in folding, are rather weak and thus easily affected by heat, acidity, varying salt concentrations, and other stressors which can denature the protein. This is one reason why homeostasis is physiologically necessary in many life forms.

This concept is unrelated to denatured alcohol, which is alcohol that has been mixed with additives to make it unsuitable for human consumption. (W)

Fried egg: the protein (egg white) undergoes denaturation (coagulation) when energy is supplied in the form of heat (frying).

The effects of temperature on enzyme activity. Top - increasing temperature increases the rate of reaction (Q10 coefficient). Middle - the fraction of folded and functional enzyme decreases above its denaturation temperature. Bottom - consequently, an enzyme's optimal rate of reaction is at an intermediate temperature.

Functional proteins have four levels of structural organization: 1) Primary structure: the linear sequence of amino acids in the polypeptide chain 2) Secondary structure: hydrogen bonds between peptide groups of the chain, forming an alpha helix or beta sheet 3) Tertiary structure: the folded three-dimensional arrangement of alpha helices and beta sheets 4) Quaternary structure: the three-dimensional structure of multiple polypeptides and how they fit together.

(Top) The protein albumin in the egg white undergoes denaturation and loss of solubility when the egg is cooked. (Bottom) Paperclips provide a visual analogy to help with the conceptualization of the denaturation process.
A deoxyribonuclease (DNase, for short) is an enzyme that catalyzes the hydrolytic cleavage of phosphodiester linkages in the DNA backbone, thus degrading DNA. Deoxyribonucleases are one type of nuclease, a generic term for enzymes capable of hydrolyzing phosphodiester bonds that link nucleotides. A wide variety of deoxyribonucleases are known, which differ in their substrate specificities, chemical mechanisms, and biological functions. (W)

Crystals of a DNase protein.
deoxyribose C5H10O4
Deoxyribose, or more precisely 2-deoxyribose, is a monosaccharide with idealized formula H−(C=O)−(CH2)−(CHOH)3−H. Its name indicates that it is a deoxy sugar, meaning that it is derived from the sugar ribose by loss of an oxygen atom. (W)

🛑 deoxyribose

  • A monosaccharide.
  • A precursor of DNA.
  • A DNA nucleotide = deoxyribose + organic base (adenine (A), thymine (T), guanine (G) or cytosine (C), attached to the 1′ ribose carbon).


Numbered ribose carbons on cytidine.



The structure of the chain form of 2-deoxy-D-ribose as verified at CAS Common Chemistry.

A model of D-deoxyribose in chain form.

Chemical structure of D-deoxyribose.

A model of D-deoxyribose.

📂 Deoxyribose

Deoxyribose C5H10O4 (W)

Deoxyribose, or more precisely 2-deoxyribose, is a monosaccharide with idealized formula H−(C=O)−(CH2)−(CHOH)3−H. Its name indicates that it is a deoxy sugar, meaning that it is derived from the sugar ribose by loss of an oxygen atom. Since the pentose sugars arabinose and ribose only differ by the stereochemistry at C2′, 2-deoxyribose and 2-deoxyarabinose are equivalent, although the latter term is rarely used because ribose, not arabinose, is the precursor to deoxyribose.


Deoxyribose was discovered in 1929 by Phoebus Levene.


Several isomers exist with the formula H−(C=O)−(CH2)−(CHOH)3−H, but in deoxyribose all the hydroxyl groups are on the same side in the Fischer projection. The term "2-deoxyribose" may refer to either of two enantiomers: the biologically important D-2-deoxyribose and the rarely encountered mirror image L-2-deoxyribose. D-2-deoxyribose is a precursor to the nucleic acid DNA. 2-Deoxyribose is an aldopentose, that is, a monosaccharide with five carbon atoms and having an aldehyde functional group.

In aqueous solution, deoxyribose primarily exists as a mixture of three structures: the linear form H−(C=O)−(CH2)−(CHOH)3−H and two ring forms, deoxyribofuranose ("C3′-endo"), with a five-membered ring, and deoxyribopyranose ("C2′-endo"), with a six-membered ring. The latter form is predominant (whereas the C3′-endo form is favored for ribose).

Chemical equilibrium of deoxyribose in solution


Biological importance

As a component of DNA, 2-deoxyribose derivatives have an important role in biology. The DNA (deoxyribonucleic acid) molecule, which is the main repository of genetic information in life, consists of a long chain of deoxyribose-containing units called nucleotides, linked via phosphate groups. In the standard nucleic acid nomenclature, a DNA nucleotide consists of a deoxyribose molecule with an organic base (usually adenine, thymine, guanine or cytosine) attached to the 1′ ribose carbon. The 5′ hydroxyl of each deoxyribose unit is replaced by a phosphate (forming a nucleotide) that is attached to the 3′ carbon of the deoxyribose in the preceding unit.

The absence of the 2′ hydroxyl group in deoxyribose is apparently responsible for the increased mechanical flexibility of DNA compared to RNA, which allows it to assume the double-helix conformation, and also (in the eukaryotes) to be compactly coiled within the small cell nucleus. The double-stranded DNA molecules are also typically much longer than RNA molecules. The backbones of RNA and DNA are structurally similar, but RNA is single-stranded and made from ribose as opposed to deoxyribose.

Other biologically important derivatives of deoxyribose include mono-, di-, and triphosphates, as well as 3′-5′ cyclic monophosphates.


Deoxyribose is generated from ribose 5-phosphate by enzymes called ribonucleotide reductases. These enzymes catalyse the deoxygenation process.



Dephosphorylation is the removal of a phosphate (PO₄³⁻) group from an organic compound by hydrolysis. It is a reversible post-translational modification. Dephosphorylation and its counterpart, phosphorylation, activate and deactivate enzymes by detaching or attaching phosphoric esters and anhydrides. A notable occurrence of dephosphorylation is the conversion of ATP to ADP and inorganic phosphate.

Dephosphorylation employs a type of hydrolytic enzyme, or hydrolase, which cleaves ester bonds. The prominent hydrolase subclass used in dephosphorylation is phosphatase. Phosphatase removes phosphate groups by hydrolysing phosphoric acid monoesters into a phosphate ion and a molecule with a free hydroxyl (-OH) group.

The reversible phosphorylation-dephosphorylation reaction occurs in every physiological process, making proper function of protein phosphatases necessary for organism viability. Because protein dephosphorylation is a key process involved in cell signalling, protein phosphatases are implicated in conditions such as cardiac disease, diabetes, and Alzheimer's disease. (W)

Crystallographic structure of human phosphatase and tensin homolog (PTEN). The active site of the blue N-terminal phosphatase domain is shown in yellow. The C-terminal C2 domain is shown in red.


Deprotonation (or dehydronation) is the removal (transfer) of a proton (or hydron, or hydrogen cation), (H+) from a Brønsted–Lowry acid in an acid-base reaction. The species formed is the conjugate base of that acid. The complementary process, when a proton is added (transferred) to a Brønsted–Lowry base, is protonation (or hydronation). The species formed is the conjugate acid of that base. (W)

Deprotonation of acetic acid by a hydroxide ion.
A dipeptide is an organic compound derived from two amino acids. The constituent amino acids can be the same or different. When different, two isomers of the dipeptide are possible, depending on the sequence. Several dipeptides are physiologically important, and some are both physiologically and commercially significant. A well known dipeptide is aspartame, an artificial sweetener. (W)
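
The statement that two different amino acids give two sequence isomers can be checked by enumerating the possible orders. The sketch below writes each dipeptide from N-terminus to C-terminus using three-letter codes; the function name is an illustrative assumption.

```python
from itertools import permutations

def sequence_isomers(first: str, second: str) -> list[str]:
    """List the distinct dipeptides that contain one of each residue."""
    # A set removes the duplicate that arises when both residues are the same.
    return sorted({"-".join(order) for order in permutations((first, second))})

print(sequence_isomers("Ala", "Gly"))  # ['Ala-Gly', 'Gly-Ala']
print(sequence_isomers("Gly", "Gly"))  # ['Gly-Gly']
```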

Glycylglycine is the simplest dipeptide.
direct DNA damage

Direct DNA damage can occur when DNA directly absorbs a UVB photon, or for numerous other reasons. UVB light causes adjacent thymine bases in a genetic sequence to bond together into pyrimidine dimers, a disruption in the strand that replication enzymes cannot copy. It causes sunburn and triggers the production of melanin.

Other names for the "direct DNA damage" are:

Due to the excellent photochemical properties of DNA, this nature-made molecule is damaged by only a tiny fraction of the absorbed photons. DNA transforms more than 99.9% of the photons into harmless heat (but the damage from the remaining < 0.1% is still enough to cause sunburn). The transformation of excitation energy into harmless heat occurs via a photochemical process called internal conversion. In DNA, this internal conversion is extremely fast, and therefore efficient. This ultrafast (subpicosecond) internal conversion is a powerful photoprotection provided by single nucleotides. However, the Ground-State Recovery is much slower (picoseconds) in G·C−DNA duplexes and hairpins. It is presumed to be even slower for double-stranded DNA in conditions of the nucleus. The absorption spectrum of DNA shows a strong absorption for UVB radiation and a much lower absorption for UVA radiation. Since the action spectrum of sunburn is indistinguishable from the absorption spectrum of DNA, it is generally accepted that the direct DNA damages are the cause of sunburn. While the human body reacts to direct DNA damages with a painful warning signal, no such warning signal is generated from indirect DNA damage. (W)

directionality (molecular biology)

Directionality, in molecular biology and biochemistry, is the end-to-end chemical orientation of a single strand of nucleic acid. In a single strand of DNA or RNA, the chemical convention of naming carbon atoms in the nucleotide sugar-ring means that there will be a 5′-end (usually pronounced "five prime end"), which frequently contains a phosphate group attached to the 5′ carbon of the ribose ring, and a 3′-end (usually pronounced "three prime end"), which typically is unmodified from the ribose -OH substituent. In a DNA double helix, the strands run in opposite directions to permit base pairing between them, which is essential for replication or transcription of the encoded information. (W)

A furanose (sugar-ring) molecule with carbon atoms labeled using standard notation. The 5′ end is upstream; the 3′ end is downstream. DNA and RNA are synthesized in the 5′ to 3′ direction.

In the DNA segment shown, the 5′ to 3′ directions are down the left strand and up the right strand.
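The antiparallel arrangement described above can be illustrated with a short Python sketch. The function name `reverse_complement` and the example sequence are illustrative choices, not taken from the source; the sketch assumes an unambiguous DNA alphabet (A, C, G, T).

```python
# Watson-Crick partners for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand_5_to_3: str) -> str:
    """Return the partner strand, also written in the 5'->3' direction."""
    # Complement each base, then reverse, because the partner strand
    # runs in the opposite direction.
    return strand_5_to_3.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    top = "ATGCCG"
    # Show the duplex with the bottom strand written 3'->5' under the top one.
    print("5'-" + top + "-3'")
    print("3'-" + top.translate(COMPLEMENT) + "-5'")
    print("partner read 5'->3':", reverse_complement(top))
```

Applying the function twice returns the original strand, which is a quick sanity check that both strands are being read in their own 5′ to 3′ direction.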

disaccharide

A disaccharide (also called a double sugar or bivose) is the sugar formed when two monosaccharides (simple sugars) are joined by glycosidic linkage. Like monosaccharides, disaccharides are soluble in water. Three common examples are sucrose, lactose, and maltose. (W)

Sucrose, a disaccharide formed from condensation of a molecule of glucose and a molecule of fructose.

disulfides, organic

Symmetrical disulfides are compounds of the formula R2S2. Most disulfides encountered in organosulfur chemistry are symmetrical disulfides. Unsymmetrical disulfides (also called heterodisulfides) are compounds of the formula RSSR'. They are less common in organic chemistry, but most disulfides in nature are unsymmetrical. (W)

cystine, crosslinker in many proteins.

lipoic acid, a vitamin.

Ph2S2, a common organic disulfide.


DNA 📤 DNA Learning Center
📤 OpenStax

Deoxyribonucleic acid (DNA) is a molecule composed of two polynucleotide chains that coil around each other to form a double helix carrying genetic instructions for the development, functioning, growth and reproduction of all known organisms and many viruses. DNA and ribonucleic acid (RNA) are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides), nucleic acids are one of the four major types of macromolecules that are essential for all known forms of life. (W)

(a) Each DNA nucleotide is made up of a sugar, a phosphate group, and a base. (b) Cytosine and thymine are pyrimidines. Guanine and adenine are purines. (L)

DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine.

The difference between the ribose found in RNA and the deoxyribose found in DNA is that ribose has a hydroxyl group at the 2' carbon.

A eukaryote contains a well-defined nucleus, whereas in prokaryotes, the chromosome lies in the cytoplasm in an area called the nucleoid.

These figures illustrate the compaction of the eukaryotic chromosome.

Each chromosome consists of one continuous thread-like molecule of DNA coiled tightly around proteins, and contains a portion of the 6,400,000,000 basepairs (DNA building blocks) that make up your DNA. (L)

DNA adduct
In molecular genetics, a DNA adduct is a segment of DNA bound to a cancer-causing chemical. The formation of an adduct could be the start of a cancerous cell, or carcinogenesis. In scientific experiments, DNA adducts are used as biomarkers of exposure: they are measured to quantify, for comparison, the amount of carcinogen to which the subject organism (for example rats or other living animals) has been exposed. Under experimental conditions, such DNA adducts are induced by known carcinogens, a commonly used one being DMBA (7,12-dimethylbenz(a)anthracene). For example, the term "DMBA-DNA adduct" in a scientific journal refers to a piece of DNA that has DMBA attached to it. The presence of such an adduct indicates prior exposure to a potential carcinogen, but does not by itself indicate the presence of cancer in the subject animal. (W)

A metabolite of benzo[a]pyrene forms an intercalated DNA adduct, at center.

DNA damaged by carcinogenic 2-aminofluorene.

Structures of DNA damaged by the carcinogenic aromatic amine 2-aminofluorene (AF). Left: AF in the B-DNA major groove, the predominant structure at a mutational coldspot. Right: AF inserted into the helix with displacement of the damaged guanine, the predominant structure at a mutational hotspot. Color code: AF: blue; AF-damaged guanine: yellow; cytosine partner to damaged guanine: gray.

Molecular Understanding of Mutagenicity
Brian E. Hingerty, Oak Ridge National Laboratory; Suse Broyde, New York University; Dinshaw J. Patel, Memorial Sloan Kettering Cancer Center

Research Objectives: To elucidate why certain DNA base sequences are mutational hotspots when damaged by carcinogenic environmental chemicals.

Computational Approach: Molecular mechanics calculations in combination with data from NMR experiments in the form of distances between hydrogens on the carcinogen-damaged DNA molecule are employed to produce molecular views of the damaged DNA that are in agreement with the data. The computations are carried out with the molecular mechanics program DUPLEX on the Cray C90.

Accomplishments: The aromatic amines are a category of environmental carcinogens present in tobacco smoke, automobile exhaust, dyes and other industrial products, and broiled meats and fish. These substances, when activated biochemically, can bind to DNA and subsequently cause a mutation when the DNA replicates. Such mutations are widely believed to be the initiating event in carcinogenesis by these substances. Often, the target base in the DNA to which the carcinogen binds is guanine (G). Interestingly, it has been found that a carcinogen-bound guanine may be highly mutagenic (a hotspot) or weakly or non-mutagenic, depending on what the neighbor bases are. One example of such a sequence that has been of considerable interest comes from the E. coli bacterium. It is known as the NarI sequence and contains the bases G1-G2-C-G3, where C is the base cytosine. Surprisingly, G3 is a mutational hotspot when bound by certain aromatic amine carcinogens while G1 and G2 are not. The underlying reason for this difference has been a mystery and is of great importance because it is a paradigm for mutational hotspots, such as those in the p53 gene, which is found mutated in many human tumors. We have elucidated the structure of a DNA duplex containing the NarI sequence linked at G1, G2, or G3 with a model aromatic amine carcinogen known as 2-aminofluorene (AF), using a combination of high-resolution NMR solution studies and molecular mechanics computations. These studies have revealed a striking difference in structure when the carcinogen damage is at G3, compared to G1 or G2. When the AF is at G1 or G2, it resides preponderantly in the major groove of an unperturbed B-DNA double helix. However, when the AF is at G3, it resides half the time in a position where it is inserted into the helix, causing the damaged guanine to be displaced from its normal helix-inserted position. It is plausible that this structural distortion, if also present during DNA replication in the cell, could be responsible for the failure of the DNA to replicate normally when the hotspot is damaged, leading to the mutagenic consequence.

Significance: This work is the first delineation of structural distinctions between mutagenic hotspots and coldspots, revealing how subtle differences in base sequence can produce remarkable differences in structure that can explain the hotspot phenomenon.

Publications:
Mao B., Gu Z., Hingerty B. E., Broyde S. and Patel D. J. N. d. Solution structure of the aminofluorene [AF]-intercalated conformer of the syn [AF]-C8-dG adduct opposite dC in a DNA duplex. Biochemistry, In Press.
Mao B., Gu Z., Hingerty B. E., Broyde S. and Patel D. J. N. d. Solution structure of the aminofluorene [AF]-external conformer of the anti [AF]-C8-dG adduct opposite dC in a DNA duplex. Biochemistry, In Press.

DNA, antisense

The DNA sense strand looks like the messenger RNA (mRNA) transcript, and can therefore be used to read the expected codon sequence that will ultimately be used during translation (protein synthesis) to build an amino acid sequence and then a protein. For example, the sequence "ATG" within a DNA sense strand corresponds to an "AUG" codon in the mRNA, which codes for the amino acid methionine. However, the DNA sense strand itself is not used as the template for the mRNA; it is the DNA antisense strand which serves as the source for the protein code, because, with bases complementary to the DNA sense strand, it is used as a template for the mRNA. Since transcription results in an RNA product complementary to the DNA template strand, the mRNA is complementary to the DNA antisense strand.

Hence, a base triplet 3′-TAC-5′ in the DNA antisense strand (complementary to the 5′-ATG-3′ of the DNA sense strand) is used as the template which results in a 5′-AUG-3′ base triplet in the mRNA. The DNA sense strand will have the triplet ATG, which looks similar to the mRNA triplet AUG but will not be used to make methionine because it will not be directly used to make mRNA. The DNA sense strand is called a "sense" strand not because it will be used to make protein (it won't be), but because it has a sequence that corresponds directly to the RNA codon sequence. By this logic, the RNA transcript itself is sometimes described as "sense". (W)
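The relationship between the sense strand, the antisense (template) strand, and the mRNA can be sketched in a few lines of Python. The function names `transcribe` and `sense_to_mrna` are hypothetical helpers introduced only for illustration; the example reuses the ATG/TAC/AUG triplets from the text.

```python
# Pairing rules used during transcription: each template DNA base
# specifies the complementary RNA base (A in the template gives U).
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Build the mRNA (5'->3') from an antisense strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())


def sense_to_mrna(sense_5_to_3: str) -> str:
    """Shortcut: the mRNA matches the sense strand with U in place of T."""
    return sense_5_to_3.upper().replace("T", "U")


if __name__ == "__main__":
    antisense = "TAC"  # written 3'->5'
    sense = "ATG"      # written 5'->3'
    print(transcribe(antisense), sense_to_mrna(sense))
```

Both routes give the same codon, which is why the sense strand can be read directly for the expected codon sequence even though it is not the template.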

Schematic showing how antisense DNA strands can interfere with protein translation.
DNA-binding domain
A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity to DNA. Some DNA-binding domains may also include nucleic acids in their folded structure. (W)

Example of a DNA-binding domain in the context of a protein. The N-terminal DNA-binding domain (labeled) of Lac repressor is regulated by a C-terminal regulatory domain (labeled). The regulatory domain binds an allosteric effector molecule (green). The allosteric response of the protein is communicated from the regulatory domain to the DNA binding domain through the linker region.

Annotated structure of LacI dimer bound to operator DNA and OPNF based on PDB structure 1efa. Created by user SocratesJedi using UCSF Chimera on 2011-10-27. Molecular graphics images were produced using the UCSF Chimera package from the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco (supported by NIH P41 RR001081).

Crystallographic structure (PDB: 1R4O) of a dimer of the zinc finger containing DBD of the glucocorticoid receptor (top) bound to DNA (bottom). Zinc atoms are represented by grey spheres and the coordinating cysteine sidechains are depicted as sticks.

DNA-binding protein

DNA-binding proteins are proteins that have DNA-binding domains and thus have a specific or general affinity for single- or double-stranded DNA. Sequence-specific DNA-binding proteins generally interact with the major groove of B-DNA, because it exposes more functional groups that identify a base pair. However, there are some known minor groove DNA-binding ligands such as netropsin, distamycin, Hoechst 33258, pentamidine, DAPI and others. (W)

Cro protein complex with DNA.

Interaction of DNA (orange) with histones (blue). These proteins' basic amino acids bind to the acidic phosphate groups on DNA.

The lambda repressor helix-turn-helix transcription factor bound to its DNA target.

DNA, chloroplast

Chloroplasts have their own DNA, often abbreviated as cpDNA. It is also known as the plastome when referring to genomes of other plastids. Its existence was first proven in 1962. The first complete chloroplast genome sequences were published in 1986: Nicotiana tabacum (tobacco) by Sugiura and colleagues and Marchantia polymorpha (liverwort) by Ozeki et al. Since then, hundreds of chloroplast DNAs from various species have been sequenced, but they are mostly those of land plants and green algae; glaucophytes, red algae, and other algal groups are extremely underrepresented, potentially introducing some bias in views of "typical" chloroplast DNA structure and content. (W)

Gene map of tobacco chloroplast DNA. Data was taken from this paper, and input into an OpenDocument spreadsheet to calculate the degree and radian measures, and to generate the SVG data.
Each DNA segment is identified with a path id in the source SVG (albeit not labeled with a visible text object).
Segments with labels on the outside are located on the A strand, segments with labels on the inside are located on the B strand. Segments narrower than the surrounding ones (the notches) indicate introns. Unlabeled faded segments represent open reading frames. (W)

The 154 kb chloroplast DNA map of a model flowering plant (Arabidopsis thaliana: Brassicaceae) showing genes and inverted repeats.

Chloroplast genome map of Arabidopsis thaliana (154,478 bp; NCBI accession number NC_000932.1; Sato S., Nakamura Y., Kaneko T., Asamizu E. and Tabata S., 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Research 6 (5), 283-290). The 136 genes are color coded: small subunit ribosomal proteins (rps, yellow), large subunit ribosomal proteins (rpl, orange), hypothetical chloroplast open reading frames (ycf, lemon), protein-coding genes either involved in photosynthetic reactions (green) or in other functions (red), ribosomal RNAs (rrn, blue), and transfer RNAs (trn, black). Introns are in grey. Some genes consist of 5′ and 3′ portions. Strand 1 and 2 genes are transcribed clockwise and counterclockwise, respectively.

The innermost circle provides the boundaries of the large and small single copy regions (LSC and SSC, violet) separated by a pair of inverted repeats (IRa and IRb, black). (W)

Chloroplast DNA replication via multiple D loop mechanisms. Adapted from Krishnan NM, Rao BJ's paper "A comparative approach to elucidate chloroplast genome replication."

DNA codon table

List of genetic codes (LINK)

The standard genetic code is traditionally represented as an RNA codon table because, when proteins are made in a cell by ribosomes, it is mRNA that directs protein synthesis. The mRNA sequence is determined by the sequence of genomic DNA. In this context, the standard genetic code is referred to as "translation table 1." However, with the rise of computational biology and genomics, most genes are now discovered at the DNA level, so a DNA codon table is becoming increasingly useful. The DNA codons in such tables occur on the sense DNA strand and are arranged in a 5' → 3' direction. Genetic codes show both symmetrical and asymmetrical characteristics.

There are 64 different codons in DNA and in the tables below; all but three specify an amino acid. These three other codons, deemed stop codons, have specific names: UAG is amber, UGA is opal (sometimes also called umber), and UAA is ochre. Also called "termination" or "nonsense" codons, these sequences signal the release of the nascent polypeptide from the ribosome. Another three codons, which specify an amino acid, are called start codons. The most common start codon is AUG, which is read as methionine. Alternative start codons, depending on the organism, include "GUG" or "UUG"; these codons normally represent valine and leucine, respectively, but as start codons they are translated as methionine or formylmethionine. These start codons, along with other elements such as initiation factors, initiate translation.

The first table, the standard table, can be used to translate nucleotide triplets into the corresponding amino acid or the appropriate signal if it is a start or stop codon. The second table, appropriately called the inverse, does the opposite: it can be used to deduce a possible triplet code if the amino acid order is known. (W)

The three consecutive DNA bases, called nucleotide triplets or codons, are translated into amino acids (GCA to Alanine, AGA to arginine, GAT to Aspartic acid, AAT to asparagine, and TGT to cysteine in this example).
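The standard and inverse tables described above can be sketched in Python. This is a minimal illustration, assuming the standard genetic code (translation table 1) written in the conventional compact form, with bases ordered T, C, A, G and `*` marking stop codons; the names `CODON_TABLE`, `INVERSE_TABLE` and `translate` are illustrative, not part of any library.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard table: DNA codon (sense strand, 5'->3') -> one-letter amino acid.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Inverse table: amino acid (or '*' for stop) -> all codons that encode it.
INVERSE_TABLE: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    INVERSE_TABLE.setdefault(aa, []).append(codon)


def translate(dna: str) -> str:
    """Translate a sense-strand DNA sequence codon by codon.

    Any trailing bases that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    # The five codons from the caption above: GCA AGA GAT AAT TGT.
    print(translate("GCAAGAGATAATTGT"))
    print(INVERSE_TABLE["*"])
```

The inverse table returns every candidate codon for an amino acid, which reflects why a protein sequence alone determines only a set of possible DNA sequences rather than a single one.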

Standard DNA codon table


A The codon ATG both codes for methionine and serves as an initiation site: the first ATG in an mRNA's coding region is where translation into protein begins. The other start codons listed by GenBank are rare in eukaryotes and generally code for Met/fMet.
B The historical basis for designating the stop codons as amber, ochre and opal is described in the autobiography of Sydney Brenner and in a historical article by Bob Edgar.
DNA computing
DNA computing is an emerging branch of computing which uses DNA, biochemistry, and molecular biology hardware, instead of the traditional silicon-based computer technologies. Research and development in this area concerns theory, experiments, and applications of DNA computing. Although the field originally started with the demonstration of a computing application by Len Adleman in 1994, it has now been expanded to several other avenues such as the development of storage technologies, nanoscale imaging modalities, synthetic controllers and reaction networks, etc. (W)

DNA condensation

DNA condensation refers to the process of compacting DNA molecules in vitro or in vivo. Mechanistic details of DNA packing are essential for its functioning in the process of gene regulation in living systems. Condensed DNA often has surprising properties, which one would not predict from classical concepts of dilute solutions. Therefore, DNA condensation in vitro serves as a model system for many processes of physics, biochemistry and biology. In addition, DNA condensation has many potential applications in medicine and biotechnology.

DNA diameter is about 2 nm, while the length of a stretched single molecule may be up to several dozen centimetres, depending on the organism. Many features of the DNA double helix contribute to its large stiffness, including the mechanical properties of the sugar-phosphate backbone, electrostatic repulsion between phosphates (DNA bears on average one elementary negative charge per 0.17 nm of the double helix), stacking interactions between the bases of each individual strand, and strand-strand interactions. DNA is one of the stiffest natural polymers, yet it is also one of the longest molecules. This means that at large distances DNA can be considered as a flexible rope, and on a short scale as a stiff rod. Like a garden hose, unpacked DNA would randomly occupy a much larger volume than when it is orderly packed. Mathematically, for a non-interacting flexible chain randomly diffusing in 3D, the end-to-end distance would scale as the square root of the polymer length. For real polymers such as DNA, this gives only a very rough estimate; what is important is that the space available for the DNA in vivo is much smaller than the space that it would occupy if it diffused freely in solution. To cope with volume constraints, DNA can pack itself under appropriate solution conditions with the help of ions and other molecules. Usually, DNA condensation is defined as "the collapse of extended DNA chains into compact, orderly particles containing only one or a few molecules".(1) This definition applies to many situations in vitro and is also close to the definition of DNA condensation in bacteria as "adoption of relatively concentrated, compact state occupying a fraction of the volume available".(2) In eukaryotes, the DNA size and the number of other participating players are much larger, and a DNA molecule forms millions of ordered nucleoprotein particles, the nucleosomes, which is just the first of many levels of DNA packing.

  1. Bloomfield, VA (1997). "DNA condensation by multivalent cations". Biopolymers. 44 (3): 269–82. doi:10.1002/(SICI)1097-0282(1997)44:3<269::AID-BIP6>3.0.CO;2-T. PMID 9591479.
  2. Zimmerman, SB; Murphy, LD (1996). "Macromolecular crowding and the mandatory condensation of DNA in bacteria". FEBS Letters. 390 (3): 245–8. doi:10.1016/0014-5793(96)00725-9. PMID 8706869.
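The square-root scaling mentioned above can be made concrete with a rough Python sketch of an ideal (non-interacting) chain. The numbers are assumptions for illustration only: a rise of 0.34 nm per base pair for B-DNA and a Kuhn segment of about 100 nm (roughly twice the commonly quoted persistence length of about 50 nm). As the text notes, this gives only a very rough estimate for real DNA.

```python
import math

RISE_PER_BP_NM = 0.34    # assumed B-DNA rise per base pair
KUHN_LENGTH_NM = 100.0   # assumed effective rigid-segment length


def contour_length_nm(base_pairs: float) -> float:
    """Length of the fully stretched molecule."""
    return base_pairs * RISE_PER_BP_NM


def ideal_coil_size_nm(base_pairs: float) -> float:
    """Root-mean-square end-to-end distance of an ideal chain: b * sqrt(N)."""
    n_segments = contour_length_nm(base_pairs) / KUHN_LENGTH_NM
    return KUHN_LENGTH_NM * math.sqrt(n_segments)


if __name__ == "__main__":
    for bp in (1_000_000, 4_000_000):
        print(f"{bp} bp: contour {contour_length_nm(bp):.0f} nm, "
              f"ideal coil {ideal_coil_size_nm(bp):.0f} nm")
```

Quadrupling the number of base pairs quadruples the contour length but only doubles the estimated coil size, which is the square-root behaviour described in the paragraph.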


Basic units of genomic organization in bacteria and eukaryotes.

Genomic DNA, depicted as a grey line, is negatively supercoiled in both bacteria and eukaryotes. However, the negatively supercoiled DNA is organized in the plectonemic form in bacteria, whereas it is organized in the toroidal form in eukaryotes. Nucleoid associated proteins (NAPs), shown as colored spheres, restrain half of the plectonemic supercoils, whereas almost all of the toroidal supercoils are induced as well as restrained by nucleosomes (colored orange), formed by wrapping of DNA around histones.

Different levels of DNA condensation. (1) Single DNA strand. (2) Chromatin strand (DNA with histones). (3) Chromatin during interphase with centromere. (4) Condensed chromatin during prophase. (Two copies of the DNA molecule are now present) (5) Chromosome during metaphase.
DNA damage (naturally occurring)

DNA damage is distinctly different from mutation, although both are types of error in DNA. DNA damage is an abnormal chemical structure in DNA, while a mutation is a change in the sequence of standard base pairs. DNA damages cause changes in the structure of the genetic material and prevent the replication mechanism from functioning properly.

DNA damage and mutation have different biological consequences. While most DNA damages can undergo DNA repair, such repair is not 100% efficient. Un-repaired DNA damages accumulate in non-replicating cells, such as cells in the brains or muscles of adult mammals, and can cause aging. (Also see DNA damage theory of aging.) In replicating cells, such as cells lining the colon, errors occur upon replication past damages in the template strand of DNA or during repair of DNA damages. These errors can give rise to mutations or epigenetic alterations. Both of these types of alteration can be replicated and passed on to subsequent cell generations. These alterations can change gene function or regulation of gene expression and possibly contribute to progression to cancer.

Throughout the cell cycle there are various checkpoints to ensure the cell is in good condition to progress to mitosis. The three main checkpoints are at G1/S, at G2/M, and at the spindle assembly checkpoint regulating progression through anaphase. The G1 and G2 checkpoints involve scanning for damaged DNA. During S phase the cell is more vulnerable to DNA damage than during any other part of the cell cycle. The G2 checkpoint checks for damaged DNA and for completeness of DNA replication. DNA damage is an alteration in the chemical structure of DNA, such as a break in a strand of DNA, a base missing from the backbone of DNA, or a chemically changed base such as 8-OHdG. DNA damage can occur naturally or via environmental factors. The DNA damage response (DDR) is a complex signal transduction pathway which recognizes when DNA is damaged and initiates the cellular response to the damage. (W)

DNA damage in non-replicating cells, if not repaired and allowed to accumulate, can lead to aging. DNA damage in replicating cells, if not repaired, can lead to either apoptosis or cancer.

Initiation of DNA demethylation at a CpG site. In adult somatic cells DNA methylation typically occurs in the context of CpG dinucleotides (CpG sites), forming 5-methylcytosine-pG, or 5mCpG. Reactive oxygen species (ROS) may attack guanine at the dinucleotide site, forming 8-hydroxy-2'-deoxyguanosine (8-OHdG) and resulting in a 5mCp-8-OHdG dinucleotide site. The base excision repair enzyme OGG1 targets 8-OHdG and binds to the lesion without immediate excision. OGG1, present at a 5mCp-8-OHdG site, recruits TET1, and TET1 oxidizes the 5mC adjacent to the 8-OHdG. This initiates demethylation of 5mC.
DNA damage theory of aging
The DNA damage theory of aging proposes that aging is a consequence of the accumulation of unrepaired, naturally occurring DNA damage. Damage in this context is a DNA alteration that has an abnormal structure. Although both mitochondrial and nuclear DNA damage can contribute to aging, nuclear DNA is the main subject of this analysis. Nuclear DNA damage can contribute to aging either indirectly (by increasing apoptosis or cellular senescence) or directly (by increasing cell dysfunction). (W)
DNA digital data storage

DNA digital data storage is the process of encoding and decoding binary data to and from synthesized strands of DNA.

While DNA as a storage medium has enormous potential because of its high storage density, its practical use is currently severely limited because of its high cost and very slow read and write times.

In June 2019, scientists reported that all 16 GB of text from Wikipedia's English-language version had been encoded into synthetic DNA. (W)
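The idea of encoding and decoding binary data as DNA can be sketched with a toy scheme in Python. The two-bits-per-base mapping below (00→A, 01→C, 10→G, 11→T) is an assumption chosen for simplicity and is not the scheme used in any particular project; practical systems typically add error correction and avoid sequences that are hard to synthesize or read.

```python
# Toy mapping: every pair of bits becomes one base (illustrative assumption).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a base sequence, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Recover the original bytes from a sequence produced by encode()."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"DNA")
    print(strand, decode(strand))
```

At two bits per base, each byte needs four bases, which gives a feel for the density that makes DNA attractive as a storage medium.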

DNA-directed RNA interference

DNA-directed RNA interference (ddRNAi) is a gene-silencing technique that utilizes DNA constructs to activate an animal cell's endogenous RNA interference (RNAi) pathways. DNA constructs are designed to express self-complementary double-stranded RNAs, typically short-hairpin RNAs (shRNA), that once processed bring about silencing of a target gene or genes. Any RNA, including endogenous mRNAs or viral RNAs, can be silenced by designing constructs to express double-stranded RNA complementary to the desired mRNA target.

This mechanism has great potential as a novel therapeutic to silence disease-causing genes. Proof-of-concept has been demonstrated across a range of disease models, including viral diseases such as HIV, hepatitis B or hepatitis C, or diseases associated with altered expression of endogenous genes such as drug-resistant lung cancer, neuropathic pain, advanced cancer and retinitis pigmentosa. (W)

Figure 1: ddRNAi mechanism of action.

Figure 2: ddRNAi DNA constructs.
Aligned ddRNAi construct designs for the production of therapeutic shRNAs

DNA ends (Sticky and blunt ends)

DNA ends refer to the properties of the ends of DNA molecules, which may be sticky or blunt depending on the enzyme that cuts the DNA. Restriction enzymes belong to a larger class of enzymes called nucleases, which comprises exonucleases and endonucleases. Exonucleases remove nucleotides from the ends of a DNA molecule, whereas endonucleases cut at specific positions within the DNA.

The concept is used in molecular biology, especially in cloning or when subcloning insert DNA into vector DNA. Such ends may be generated by restriction enzymes that cut the DNA: a staggered cut generates two sticky ends, while a straight cut generates blunt ends.

Single-stranded DNA molecules

A single-stranded non-circular DNA molecule has two non-identical ends, the 3' end and the 5' end (usually pronounced "three prime end" and "five prime end"). The numbers refer to the numbering of carbon atoms in the deoxyribose, which is a sugar forming an important part of the backbone of the DNA molecule. In the backbone of DNA the 5' carbon of one deoxyribose is linked to the 3' carbon of another by a phosphodiester bond. The 5' carbon of this deoxyribose is again linked to the 3' carbon of the next, and so forth.

Variations in double-stranded molecules

When a molecule of DNA is double stranded, as DNA usually is, the two strands run in opposite directions. Therefore, one end of the molecule will have the 3' end of strand 1 and the 5' end of strand 2, and vice versa at the other end. However, the fact that the molecule is double stranded allows numerous different variations.

Blunt ends

The simplest DNA end of a double-stranded molecule is called a blunt end. Blunt ends are also called non-cohesive ends. In a blunt-ended molecule both strands terminate in a base pair. Blunt ends are not always desired in biotechnology, since when using a DNA ligase to join two molecules into one, the yield is significantly lower with blunt ends. When performing subcloning, blunt ends also have the disadvantage of potentially inserting the insert DNA in the opposite orientation to the one desired. On the other hand, blunt ends are always compatible with each other. Here is an example of a small piece of blunt-ended DNA:



Overhangs and sticky ends

Non-blunt ends are created by various overhangs. An overhang is a stretch of unpaired nucleotides in the end of a DNA molecule. These unpaired nucleotides can be in either strand, creating either 3' or 5' overhangs. These overhangs are in most cases palindromic.

The simplest case of an overhang is a single nucleotide. This is most often adenosine and is created as a 3' overhang by some DNA polymerases. Most commonly this is used in cloning PCR products created by such an enzyme. The product is joined with a linear DNA molecule with a 3' thymine overhang. Since adenine and thymine form a base pair, this facilitates the joining of the two molecules by a ligase, yielding a circular molecule. Here is an example of an A-overhang:


Longer overhangs are called cohesive ends or sticky ends. They are most often created by restriction endonucleases when they cut DNA. Very often they cut the two DNA strands four base pairs from each other, creating a four-base 5' overhang in one molecule and a complementary 5' overhang in the other. These ends are called cohesive since they are easily joined back together by a ligase.

For example, these two "sticky" ends are compatible:


They can form complementary base pairs in the overhang region:


Also, since different restriction endonucleases usually create different overhangs, it is possible to create a plasmid by excising a piece of DNA (using a different enzyme for each end) and then joining it to another DNA molecule with ends trimmed by the same enzymes. Since the overhangs have to be complementary in order for the ligase to work, the two molecules can only join in one orientation. This is often highly desirable in molecular biology. (W)
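The compatibility rule for sticky ends can be expressed as a short Python check. This is a sketch under stated assumptions: both overhangs are single-stranded 5' overhangs written 5'→3', and two ends are treated as compatible when one overhang is the reverse complement of the other. The function names are illustrative, and the example overhangs AATT and GATC are those left by the well-known enzymes EcoRI and BamHI, respectively.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def overhangs_compatible(overhang_a: str, overhang_b: str) -> bool:
    """True if two 5' overhangs (each written 5'->3') can base-pair fully."""
    return overhang_a.upper() == reverse_complement(overhang_b)


def is_palindromic(overhang: str) -> bool:
    """A palindromic overhang pairs with an identical overhang."""
    return overhangs_compatible(overhang, overhang)


if __name__ == "__main__":
    print(overhangs_compatible("AATT", "AATT"))  # two EcoRI-style ends
    print(overhangs_compatible("AATT", "GATC"))  # EcoRI-style vs BamHI-style
    print(is_palindromic("ACGG"))
```

Because a palindromic overhang pairs with itself, a fragment cut at both ends by one such enzyme can ligate in either orientation; using a different enzyme for each end, as described above, removes that ambiguity.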

DNA ligase

DNA ligase is a specific type of enzyme, a ligase, that facilitates the joining of DNA strands together by catalyzing the formation of a phosphodiester bond. It plays a role in repairing single-strand breaks in duplex DNA in living organisms, but some forms (such as DNA ligase IV) may specifically repair double-strand breaks (i.e. a break in both complementary strands of DNA). Single-strand breaks are repaired by DNA ligase using the complementary strand of the double helix as a template, with DNA ligase creating the final phosphodiester bond to fully repair the DNA.

DNA ligase is used in both DNA repair and DNA replication (see Mammalian ligases). In addition, DNA ligase has extensive use in molecular biology laboratories for recombinant DNA experiments (see Research applications). Purified DNA ligase is used in gene cloning to join DNA molecules together to form recombinant DNA. (W)

DNA damage, due to environmental factors and normal metabolic processes inside the cell, occurs at a rate of 1,000 to 1,000,000 molecular lesions per cell per day. A special enzyme, DNA ligase (shown here in color), encircles the double helix to repair a broken strand of DNA. DNA ligase is responsible for repairing the millions of DNA breaks generated during the normal course of a cell's life. Without molecules that can mend such breaks, cells can malfunction, die, or become cancerous. DNA ligases catalyse the crucial step of joining breaks in duplex DNA during DNA repair, replication and recombination, and require either adenosine triphosphate (ATP) or nicotinamide adenine dinucleotide (NAD+) as a cofactor. Shown here is DNA ligase I repairing chromosomal damage. The three visible protein structures are as follows. The DNA-binding domain (DBD) is bound to the DNA minor groove both upstream and downstream of the damaged area. The OB-fold domain (OBD) unwinds the DNA slightly over a span of six base pairs and is generally involved in nucleic acid binding. The adenylation domain (AdD) contains enzymatically active residues that join the broken nucleotides together by catalyzing the formation of a phosphodiester bond between a phosphate and a hydroxyl group. It is likely that all mammalian DNA ligases (Ligases I, III, and IV) have a similar ring-shaped architecture and are able to recognize DNA in a similar manner. (See: Nature Article 2004, PDF).

A pictorial example of how a ligase works (with sticky ends).

The image demonstrates how ligase (yellow oval) catalyzes the joining of two DNA fragments. The ligase joins the two fragments of DNA to form a longer strand of DNA by "pasting" them together.
DNA methylation

DNA methylation is a biological process by which methyl groups are added to the DNA molecule. Methylation can change the activity of a DNA segment without changing the sequence. When located in a gene promoter, DNA methylation typically acts to repress gene transcription. In mammals, DNA methylation is essential for normal development and is associated with a number of key processes including genomic imprinting, X-chromosome inactivation, repression of transposable elements, aging, and carcinogenesis.

Two of DNA's four bases, cytosine and adenine, can be methylated. Cytosine methylation is widespread in both eukaryotes and prokaryotes, even though the rate of cytosine DNA methylation can differ greatly between species: 14% of cytosines are methylated in Arabidopsis thaliana, 4% to 8% in Physarum, 7.6% in Mus musculus, 2.3% in Escherichia coli, 0.03% in Drosophila, 0.006% in Dictyostelium and virtually none (0.0002 to 0.0003%) in Caenorhabditis or fungi such as Saccharomyces cerevisiae and S. pombe (but not N. crassa). Adenine methylation has been observed in bacterial, plant, and recently in mammalian DNA, but has received considerably less attention. (W)
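
The percentages above are the share of cytosines that carry a methyl group. As a minimal sketch, the Python function below computes that share for a single strand, given a set of methylated positions. The function name, the position-set representation, and the example mark are assumptions made for illustration and do not come from any cited dataset.

```python
def percent_methylated_cytosines(sequence, methylated_positions):
    """Return the percentage of cytosines in `sequence` that are methylated.

    `methylated_positions` holds 0-based indices of methylated bases
    (a hypothetical representation chosen for this sketch).
    """
    seq = sequence.upper()
    cytosines = {i for i, base in enumerate(seq) if base == "C"}
    if not cytosines:
        return 0.0
    marked = set(methylated_positions)
    for pos in marked:
        if pos not in cytosines:
            raise ValueError(f"position {pos} is not a cytosine")
    return 100.0 * len(marked) / len(cytosines)


# Example: a 12-base strand with one methylated cytosine at index 5.
example = percent_methylated_cytosines("accgcCGgcgcc", {5})
```

The strand has seven cytosines, so one methylated cytosine gives 100/7, or roughly 14%.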

DNA methylation.

Representation of a DNA molecule that is methylated. The two white spheres represent methyl groups. They are bound to two cytosine nucleotide molecules that make up the DNA sequence.

This image shows a DNA molecule that is methylated on both strands on the center cytosine. DNA methylation plays an important role for epigenetic gene regulation in development and cancer. [Details: The picture shows the crystal structure of a short DNA helix with sequence "accgcCGgcgcc", which is methylated on both strands at the center cytosine. The structure was taken from the Protein Data Bank (accession number 329D), rendering was performed with VMD (Visual Molecular Dynamics rendering program) and post-processing was done in Photoshop.]

Typical DNA methylation landscape in mammals.
DNA methyltransferase
In biochemistry, the DNA methyltransferase (DNA MTase, DNMT) family of enzymes catalyze the transfer of a methyl group to DNA. DNA methylation serves a wide variety of biological functions. All the known DNA methyltransferases use S-adenosyl methionine (SAM) as the methyl donor. (W)

3D structure of protein deposited with PDB code 2ar0.
DNA microarray
A DNA microarray (also commonly known as DNA chip or biochip) is a collection of microscopic DNA spots attached to a solid surface. Scientists use DNA microarrays to measure the expression levels of large numbers of genes simultaneously or to genotype multiple regions of a genome. Each DNA spot contains picomoles (10⁻¹² moles) of a specific DNA sequence, known as a probe (or reporter or oligo). A probe can be a short section of a gene or other DNA element that is used to hybridize a cDNA or cRNA (also called anti-sense RNA) sample (called the target) under high-stringency conditions. Probe-target hybridization is usually detected and quantified by detection of fluorophore-, silver-, or chemiluminescence-labeled targets to determine the relative abundance of nucleic acid sequences in the target. The original nucleic acid arrays were macro arrays approximately 9 cm × 12 cm, and the first computerized image-based analysis was published in 1981. The DNA microarray was invented by Patrick O. Brown. (W)

Hybridization of the target to the probe.

Diagram of typical dual-colour microarray experiment.

Examples of levels of application of microarrays. Within the organisms, genes are transcribed and spliced to produce mature mRNA transcripts (red). The mRNA is extracted from the organism and reverse transcriptase is used to copy the mRNA into stable ds-cDNA (blue). In microarrays, the ds-cDNA is fragmented and fluorescently labelled (orange). The labelled fragments bind to an ordered array of complementary oligonucleotides, and measurement of fluorescent intensity across the array indicates the abundance of a predetermined set of sequences. These sequences are typically specifically chosen to report on genes of interest within the organism's genome.
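
To make the idea of comparing fluorescence intensities concrete, here is a minimal Python sketch for a dual-colour experiment. Expressing each spot as a log2 ratio of the two channels is a common convention but is an assumption here, since the text above does not specify how intensities are compared; the gene names and intensity values are invented for illustration.

```python
import math


def log2_ratio(red_intensity, green_intensity):
    """Return log2(red / green) for one spot.

    Positive values mean more signal in the red channel, negative values
    more signal in the green channel, and 0 means equal signal.
    """
    if red_intensity <= 0 or green_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(red_intensity / green_intensity)


# Hypothetical spot intensities: (red channel, green channel).
spots = {
    "geneA": (200.0, 100.0),
    "geneB": (50.0, 200.0),
    "geneC": (120.0, 120.0),
}
ratios = {name: log2_ratio(red, green) for name, (red, green) in spots.items()}
```

A real analysis would also need background subtraction and normalization, which this sketch leaves out.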
DNA mismatch repair

DNA mismatch repair (MMR) is a system for recognizing and repairing erroneous insertion, deletion, and mis-incorporation of bases that can arise during DNA replication and recombination, as well as repairing some forms of DNA damage.

Mismatch repair is strand-specific. During DNA synthesis the newly synthesised (daughter) strand will commonly include errors. In order to begin repair, the mismatch repair machinery distinguishes the newly synthesised strand from the template (parental) strand. In gram-negative bacteria, transient hemimethylation distinguishes the strands (the parental strand is methylated and the daughter strand is not). However, in other prokaryotes and eukaryotes, the exact mechanism is not clear. It is suspected that, in eukaryotes, newly synthesized lagging-strand DNA transiently contains nicks (before being sealed by DNA ligase) and provides a signal that directs mismatch proofreading systems to the appropriate strand. This implies that these nicks must also be present in the leading strand, and evidence for this has recently been found. Recent work has shown that nicks are sites for RFC-dependent loading of the replication sliding clamp PCNA, in an orientation-specific manner, such that one face of the donut-shaped protein is juxtaposed toward the 3'-OH end at the nick. Loaded PCNA then directs the action of the MutLα endonuclease to the daughter strand in the presence of a mismatch and MutSα or MutSβ. (W)
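
The strand-specific logic above (trust the template, correct the daughter) can be sketched as follows. This is a simplified illustration that handles only mis-incorporated bases, not insertions or deletions; the function names and the convention that index i of the daughter pairs with index i of the template are assumptions made for this sketch.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def find_mismatches(template, daughter):
    """Return the indices at which the daughter base does not pair with
    the template base. Both strands are given position-by-position, so
    index i of the daughter sits opposite index i of the template."""
    if len(template) != len(daughter):
        raise ValueError("strands must have equal length in this sketch")
    return [i for i, (t, d) in enumerate(zip(template, daughter)) if PAIR[t] != d]


def repair_daughter(template, daughter):
    """Correct the daughter strand using the template as the reference,
    mirroring the strand specificity described above."""
    repaired = list(daughter)
    for i in find_mismatches(template, daughter):
        repaired[i] = PAIR[template[i]]
    return "".join(repaired)


template = "ATGCGT"
daughter = "TACGTA"          # hypothetical error opposite the G at index 4
errors = find_mismatches(template, daughter)
fixed = repair_daughter(template, daughter)
```

The key design point is that the template is never modified: without a way to tell the strands apart, a repair system could just as easily fix the wrong base.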

Diagram of DNA mismatch repair pathways. The first column depicts mismatch repair in eukaryotes, while the second depicts repair in most bacteria. The third column shows mismatch repair specifically in E. coli.

Micrograph showing loss of staining for MLH1 in colorectal adenocarcinoma, in keeping with deficient DNA mismatch repair (left of image), and benign colorectal mucosa (right of image).

Cartoon representation of the molecular structure of the protein registered under PDB code 1h7u.
DNA mismatch repair protein, C-terminal domain.
DNA polymerase

A DNA polymerase is a member of a family of enzymes that catalyze the synthesis of DNA molecules from nucleoside triphosphates, the molecular precursors of DNA. These enzymes are essential for DNA replication and usually work in groups to create two identical DNA duplexes from a single original DNA duplex. During this process, DNA polymerase "reads" the existing DNA strands to create two new strands that match the existing ones. These enzymes catalyze the chemical reaction

deoxynucleoside triphosphate + DNAₙ ⇌ pyrophosphate + DNAₙ₊₁.

DNA polymerase adds nucleotides to the three prime (3')-end of a DNA strand, one nucleotide at a time. Every time a cell divides, DNA polymerases are required to duplicate the cell's DNA, so that a copy of the original DNA molecule can be passed to each daughter cell. In this way, genetic information is passed down from generation to generation. (W)

Structure of Homo sapiens DNA polymerase beta, pdb file 7ICG. A bound DNA is also indicated.

DNA polymerase moves along the old strand in the 3'–5' direction, creating a new strand having a 5'–3' direction.
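
The directionality described above can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the function name is invented, the template is supplied already written in the 3' to 5' direction, and proofreading is ignored.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_new_strand(template_3_to_5):
    """Read a template written 3'->5' and return the new strand written 5'->3'.

    Each loop step adds one complementary nucleotide to the 3' end of the
    growing strand, as a polymerase would, one nucleotide at a time.
    """
    new_strand = []
    for base in template_3_to_5:
        new_strand.append(PAIR[base])  # extend the 3' end by one nucleotide
    return "".join(new_strand)


template = "TACGGT"                      # written 3'->5'
new_strand = synthesize_new_strand(template)
pyrophosphates_released = len(new_strand)  # one per nucleotide added
```

Because the template is read 3' to 5' and the product grows at its 3' end, the two strands come out antiparallel without any explicit reversal step.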

DNA polymerase with proofreading ability.

DNA primase

DNA primase is an enzyme involved in the replication of DNA and is a type of RNA polymerase. Primase catalyzes the synthesis of a short RNA (or DNA in some organisms) segment called a primer complementary to a ssDNA (single-stranded DNA) template. After this elongation, the RNA piece is removed by a 5' to 3' exonuclease and refilled with DNA. (W)

Asymmetry in the synthesis of leading and lagging strands, with role of DNA primase shown.

Steps in DNA synthesis, with role of DNA primase shown.
DNA profiling

DNA profiling (also called DNA fingerprinting) is the process of determining an individual's DNA characteristics. DNA analysis intended to identify a species, rather than an individual, is called DNA barcoding.

DNA profiling is a forensic technique in criminal investigations, comparing criminal suspects' profiles to DNA evidence so as to assess the likelihood of their involvement in the crime. It is also used in parentage testing, to establish immigration eligibility, and in genealogical and medical research. DNA profiling has also been used in the study of animal and plant populations in the fields of zoology, botany, and agriculture. (W)

  1. A cell sample is taken, usually a cheek swab or blood test.
  2. DNA is extracted from the sample.
  3. Cleavage of DNA by restriction enzyme: the DNA is broken into small fragments.
  4. Small fragments are amplified by the polymerase chain reaction, which results in many more fragments.
  5. DNA fragments are separated by electrophoresis.
  6. The fragments are transferred to an agar plate.
  7. On the agar plate, specific DNA fragments are bound to a radioactive DNA probe.
  8. The agar plate is washed free of excess probe.
  9. An x-ray film is used to detect a radioactive pattern.
  10. The DNA is compared to other DNA samples.
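
The cleavage, separation, and comparison steps above can be sketched computationally. The Python below is a minimal illustration only: the function names, the recognition site GAATTC with a cut one base into the site, and the sample sequence are assumptions chosen for the example, and amplification and probing are left out.

```python
from typing import List


def digest(dna: str, site: str, cut_offset: int) -> List[str]:
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments


def banding_pattern(fragments: List[str]) -> List[int]:
    """Fragment lengths sorted longest first, a stand-in for the order in
    which bands would be read off after electrophoresis."""
    return sorted((len(f) for f in fragments), reverse=True)


def profiles_match(pattern_a: List[int], pattern_b: List[int]) -> bool:
    """Two samples match in this sketch when their patterns are identical."""
    return pattern_a == pattern_b


sample = "AAGAATTCTTTTGAATTCCC"            # hypothetical sequence
fragments = digest(sample, "GAATTC", 1)
pattern = banding_pattern(fragments)
```

Samples that differ in the number of repeats between two cut sites would yield different fragment lengths, which is what makes the pattern informative.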

Variations of VNTR allele lengths in 6 individuals.
DNA repair

DNA repair is a collection of processes by which a cell identifies and corrects damage to the DNA molecules that encode its genome. In human cells, both normal metabolic activities and environmental factors such as radiation can cause DNA damage, resulting in as many as 1 million individual molecular lesions per cell per day. Many of these lesions cause structural damage to the DNA molecule and can alter or eliminate the cell's ability to transcribe the gene that the affected DNA encodes. Other lesions induce potentially harmful mutations in the cell's genome, which affect the survival of its daughter cells after it undergoes mitosis. As a consequence, the DNA repair process is constantly active as it responds to damage in the DNA structure. When normal repair processes fail, and when cellular apoptosis does not occur, irreparable DNA damage may occur, including double-strand breaks and DNA crosslinkages (interstrand crosslinks or ICLs). This can eventually lead to malignant tumors, or cancer, as per the two-hit hypothesis.

The rate of DNA repair is dependent on many factors, including the cell type, the age of the cell, and the extracellular environment. A cell that has accumulated a large amount of DNA damage, or one that no longer effectively repairs damage incurred to its DNA, can enter one of three possible states:

  1. an irreversible state of dormancy, known as senescence
  2. cell suicide, also known as apoptosis or programmed cell death
  3. unregulated cell division, which can lead to the formation of a tumor that is cancerous

The DNA repair ability of a cell is vital to the integrity of its genome and thus to the normal functionality of that organism. Many genes that were initially shown to influence life span have turned out to be involved in DNA damage repair and protection.

The 2015 Nobel Prize in Chemistry was awarded to Tomas Lindahl, Paul Modrich, and Aziz Sancar for their work on the molecular mechanisms of DNA repair processes. (W)

DNA damage resulting in multiple broken chromosomes.

Double-strand break repair pathway models.

(Left panel): Canonical C-NHEJ. The heterodimer Ku80-Ku70 binds to DNA ends, which then recruits DNA-PKcs. In subsequent steps, several proteins including Artemis, polynucleotide kinase (PNK), and members of the polymerase X family process the DNA ends. In the last step, ligase IV associated with its co-factors Xrcc4 and Cernunnos/XLF joins the ends.

(Right panel): Resection as a common initiation step for HR and A-EJ at DSB. 53BP1, RIF1 and Ku70-80 heterodimer protect DSB ends from resection and HR and A-EJ actions. The CDK1/2-dependent phosphorylation of CtIP and EXO1 favors the initiation of resection and extension, respectively. Recently, REV7/MAD2L2 was described as an inhibitor of resection and HR, although its role in A-EJ inhibition was not directly studied and remains hypothetical. A short ssDNA resection allows for A-EJ but not homologous recombination, while a long ssDNA resection allows for both A-EJ and HR; however, HR requires the presence of homologous sequences. Recently, POLQ polymerase was shown to inhibit HR and to promote A-EJ at DSBs. A-EJ results in repair that is error-prone and is associated with deletions at the repair junctions with frequent use of microhomologies that are distant from the DSB.

Alternative-EJ: Parp1 plays a role in the initiation process, and it has been proposed that a single-strand DNA resection reveals complementary microhomologies (two to four nucleotides or more in length) that can anneal, with gap-filling completing the end-joining. A-EJ is always associated with deletions at the junctions and can involve microhomologies (MMEJ or microhomologies-mediated EJ) that are distant from the DSB. Subsequently, Xrcc1 and ligase III (which can be substituted by ligase I) complete the A-EJ process.

Homologous recombination: The first step, which is the initiation of resection, involves the removal of ~50–100 bases of DNA from the 5' end by the MRN complex (Mre11-Rad50-Nbs1) in conjunction with CtIP. The second step, resection extension, is carried out by two alternate pathways involving either the 5' to 3' exonuclease EXO1 or the helicase-topoisomerase complex BLM-TOPIIIα-RMI1-2 in concert with the nuclease CtIP/DNA2. WRN helicase has also been shown to act with CtIP and to stimulate resection in human cells.

DNA ligase, shown above repairing chromosomal damage, is an enzyme that joins broken nucleotides together by catalyzing the formation of an internucleotide ester bond between the phosphate backbone and the deoxyribose nucleotides.


Stephen P. Bell (MIT/HHMI) | 1a: Chromosomal DNA Replication | The DNA Replication Fork



Stephen P. Bell (MIT/HHMI) | 1b: Chromosomal DNA Replication | Initiation of DNA Replication



Stephen P. Bell (MIT/HHMI) | 2: Single-Molecule Studies of Eukaryotic DNA Replication



DNA Replication | MIT 7.01SC Fundamentals of Biology



DNA replication — Cell cycle regulation

DNA replication is a tightly orchestrated process that is controlled within the context of the cell cycle. Progress through the cell cycle and in turn DNA replication is tightly regulated by the formation and activation of pre-replicative complexes (pre-RCs) which is achieved through the activation and inactivation of cyclin-dependent kinases (Cdks, CDKs). Specifically it is the interactions of cyclins and cyclin dependent kinases that are responsible for the transition from G1 into S-phase.

During the G1 phase of the cell cycle there are low levels of CDK activity. This low level of CDK activity allows for the formation of new pre-RCs but is not sufficient for DNA replication to be initiated by the newly formed pre-RCs. During the remaining phases of the cell cycle there are elevated levels of CDK activity. This high level of CDK activity is responsible for initiating DNA replication as well as inhibiting the formation of new pre-RCs. Once DNA replication has been initiated the pre-RC is broken down. Because CDK levels remain high during the S, G2, and M phases of the cell cycle, no new pre-RCs can be formed. This all helps to ensure that no new round of initiation can occur until cell division is complete.
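
The two-state logic above (low CDK permits pre-RC formation, high CDK triggers initiation and blocks new pre-RCs) can be sketched as a small state machine. This is a deliberately coarse illustration: treating CDK activity as a simple low/high switch per phase, and following a single origin, are assumptions made for this sketch.

```python
CDK_HIGH_PHASES = {"S", "G2", "M"}  # phases with elevated CDK activity


def run_cell_cycles(n_cycles):
    """Count how often one origin fires over `n_cycles` passes through
    G1, S, G2 and M."""
    licensed = False  # True while a pre-RC sits on the origin
    firings = 0
    for _ in range(n_cycles):
        for phase in ("G1", "S", "G2", "M"):
            cdk_high = phase in CDK_HIGH_PHASES
            if not cdk_high:
                # Low CDK: a pre-RC can form, but it cannot fire yet.
                licensed = True
            elif licensed:
                # High CDK: the pre-RC fires once and is then broken down.
                firings += 1
                licensed = False
            # High CDK without a pre-RC: nothing can form or fire.
    return firings


firings_in_three_cycles = run_cell_cycles(3)
```

Because forming and firing a pre-RC require opposite CDK states, the origin in this model fires exactly once per cycle.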

In addition to cyclin-dependent kinases, a new round of replication is thought to be prevented through the downregulation of Cdt1. This is achieved via degradation of Cdt1 as well as through the inhibitory actions of a protein known as geminin. Geminin binds tightly to Cdt1 and is thought to be the major inhibitor of re-replication. Geminin first appears in S-phase and is degraded at the metaphase-anaphase transition, possibly through ubiquitination by the anaphase-promoting complex (APC).

Various cell cycle checkpoints are present throughout the course of the cell cycle that determine whether a cell will progress through division entirely. Importantly in replication the G1, or restriction, checkpoint makes the determination of whether or not initiation of replication will begin or whether the cell will be placed in a resting stage known as G0. Cells in the G0 stage of the cell cycle are prevented from initiating a round of replication because the minichromosome maintenance proteins are not expressed. Transition into the S-phase indicates replication has begun. (W)

DNA replication — Elongation

The formation of the pre-replicative complex (pre-RC) marks the potential sites for the initiation of DNA replication. Consistent with the minichromosome maintenance complex encircling double stranded DNA, formation of the pre-RC does not lead to the immediate unwinding of origin DNA or the recruitment of DNA polymerases. Instead, the pre-RC that is formed during the G1 of the cell cycle is only activated to unwind the DNA and initiate replication after the cells pass from the G1 to the S phase of the cell cycle.

Once the initiation complex is formed and the cells pass into the S phase, the complex then becomes a replisome. The eukaryotic replisome complex is responsible for coordinating DNA replication. Replication on the leading and lagging strands is performed by DNA polymerase ε and DNA polymerase δ. Many replisome factors including Claspin, And1, replication factor C clamp loader and the fork protection complex are responsible for regulating polymerase functions and coordinating DNA synthesis with the unwinding of the template strand by Cdc45-Mcm-GINS complex. As the DNA is unwound the twist number decreases. To compensate for this the writhe number increases, introducing positive supercoils in the DNA. These supercoils would cause DNA replication to halt if they were not removed. Topoisomerases are responsible for removing these supercoils ahead of the replication fork.
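
The compensation between twist and writhe follows from the relation Lk = Tw + Wr, where the linking number Lk cannot change in a topologically closed stretch of DNA. The Python sketch below works through the arithmetic. It assumes a helical repeat of 10.5 base pairs per turn for relaxed B-form DNA and a fully constrained domain; both are simplifying assumptions, and the function name is invented for illustration.

```python
BP_PER_TURN = 10.5  # assumed helical repeat of relaxed B-form DNA


def writhe_compensation(unwound_bp, bp_per_turn=BP_PER_TURN):
    """Return (delta_twist, delta_writhe) for a topologically closed domain.

    With the linking number fixed, Lk = Tw + Wr implies that any change
    in twist is offset by an equal and opposite change in writhe.
    """
    delta_twist = -unwound_bp / bp_per_turn  # unwinding removes helical turns
    delta_writhe = -delta_twist              # positive supercoils compensate
    return delta_twist, delta_writhe


# Unwinding 105 bp removes 10 helical turns in this model.
delta_tw, delta_wr = writhe_compensation(105)
```

In this model every 10.5 bp unwound adds one positive supercoil, which gives a sense of how quickly torsional strain would build up without topoisomerases.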

The replisome is responsible for copying the entire genomic DNA in each proliferative cell. The base pairing and chain formation reactions, which form the daughter helix, are catalyzed by DNA polymerases. These enzymes move along single-stranded DNA and allow for the extension of the nascent DNA strand by "reading" the template strand and allowing for incorporation of the proper purine nucleobases, adenine and guanine, and pyrimidine nucleobases, thymine and cytosine. Activated free deoxyribonucleotides exist in the cell as deoxyribonucleotide triphosphates (dNTPs). These free nucleotides are added to an exposed 3'-hydroxyl group on the last incorporated nucleotide. In this reaction, a pyrophosphate is released from the free dNTP, generating energy for the polymerization reaction and exposing the 5' monophosphate, which is then covalently bonded to the 3' oxygen. Additionally, incorrectly inserted nucleotides can be removed and replaced by the correct nucleotides in an energetically favorable reaction. This property is vital to proper proofreading and repair of errors that occur during DNA replication. (W)

A depiction of telomerase progressively elongating telomeric DNA.

Eukaryotic replisome complex and associated proteins. A loop occurs in the lagging strand.

DNA replication, eukaryotic

Eukaryotic DNA replication is a conserved mechanism that restricts DNA replication to once per cell cycle. Eukaryotic DNA replication of chromosomal DNA is central for the duplication of a cell and is necessary for the maintenance of the eukaryotic genome.

DNA replication is the action of DNA polymerases synthesizing a DNA strand complementary to the original template strand. To synthesize DNA, the double-stranded DNA is unwound by DNA helicases ahead of polymerases, forming a replication fork containing two single-stranded templates. Replication processes permit the copying of a single DNA double helix into two DNA helices, which are divided into the daughter cells at mitosis. The major enzymatic functions carried out at the replication fork are well conserved from prokaryotes to eukaryotes, but the replication machinery in eukaryotic DNA replication is a much larger complex, coordinating many proteins at the site of replication, forming the replisome.

The replisome is responsible for copying the entirety of genomic DNA in each proliferative cell. This process allows for the high-fidelity passage of hereditary/genetic information from parental cell to daughter cell and is thus essential to all organisms. Much of the cell cycle is built around ensuring that DNA replication occurs without errors.

In G1 phase of the cell cycle, many of the DNA replication regulatory processes are initiated. In eukaryotes, the vast majority of DNA synthesis occurs during S phase of the cell cycle, and the entire genome must be unwound and duplicated to form two daughter copies. During G2, any damaged DNA or replication errors are corrected. Finally, one copy of the genomes is segregated to each daughter cell at mitosis or M phase. These daughter copies each contain one strand from the parental duplex DNA and one nascent antiparallel strand.

This mechanism is conserved from prokaryotes to eukaryotes and is known as semiconservative DNA replication. Semiconservative replication takes place at a fork-like DNA structure, the replication fork, where the DNA helix is open, or unwound, exposing unpaired DNA nucleotides for recognition and base pairing so that free nucleotides can be incorporated into double-stranded DNA. (W)

DNA Replication fork — Leading strand, lagging strand

The replication fork is a structure that forms within the long helical DNA during DNA replication. It is created by helicases, which break the hydrogen bonds holding the two DNA strands together in the helix. The resulting structure has two branching "prongs", each one made up of a single strand of DNA. These two strands serve as the template for the leading and lagging strands, which will be created as DNA polymerase matches complementary nucleotides to the templates; the templates may be properly referred to as the leading strand template and the lagging strand template.

Scheme of the replication fork. a: template, b: leading strand, c: lagging strand, d: replication fork, e: primer, f: Okazaki fragments.


DNA is read by DNA polymerase in the 3′ to 5′ direction, meaning the nascent strand is synthesized in the 5' to 3' direction.
Since the leading and lagging strand templates are oriented in opposite directions at the replication fork, a major issue is how to achieve synthesis of nascent (new) lagging strand DNA, whose direction of synthesis is opposite to the direction of the growing replication fork.

Many enzymes are involved in the DNA replication fork.

DNA replication or DNA synthesis is the process of copying a double-stranded DNA molecule. This process, like many other biological processes, is paramount to all life as we know it.

Leading strand

The leading strand is the strand of nascent DNA which is synthesized in the same direction as the growing replication fork. This sort of DNA replication is continuous.

Lagging strand

The lagging strand is the strand of nascent DNA whose direction of synthesis is opposite to the direction of the growing replication fork. Because of its orientation, replication of the lagging strand is more complicated as compared to that of the leading strand. As a consequence, the DNA polymerase on this strand is seen to "lag behind" the other strand.

The lagging strand is synthesized in short, separated segments. On the lagging strand template, a primase "reads" the template DNA and initiates synthesis of a short complementary RNA primer. A DNA polymerase extends the primed segments, forming Okazaki fragments. The RNA primers are then removed and replaced with DNA, and the fragments of DNA are joined together by DNA ligase. (W)
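
The discontinuous synthesis described above lends itself to a small bookkeeping sketch: how many fragments, primers, and ligation events a stretch of lagging-strand template would need. The function name and the default fragment length of 200 nucleotides are assumptions for illustration, and the primers themselves are not modeled.

```python
def plan_okazaki_fragments(template_len, fragment_len=200):
    """Split a lagging-strand template of `template_len` nucleotides into
    consecutive fragment lengths of at most `fragment_len` each."""
    if template_len < 0 or fragment_len <= 0:
        raise ValueError("template_len must be >= 0 and fragment_len > 0")
    return [
        min(fragment_len, template_len - start)
        for start in range(0, template_len, fragment_len)
    ]


fragments = plan_okazaki_fragments(950)
primers_needed = len(fragments)              # one RNA primer per fragment
nicks_to_seal = max(len(fragments) - 1, 0)   # joins made by DNA ligase
```

For a 950-nucleotide template this gives five fragments, five primers, and four nicks for DNA ligase to seal, whereas a continuously synthesized leading strand would need a single primer.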

DNA replication — Initiation

For a cell to divide, it must first replicate its DNA. DNA replication is an all-or-none process; once replication begins, it proceeds to completion. Once replication is complete, it does not occur again in the same cell cycle. This is made possible by dividing initiation into two separate steps: formation of the pre-replication complex and its later activation. (W)

Overview of the steps in DNA replication.

Duplication of the DNA begins with origin unwinding, followed by the synthesis of RNA primers (jagged lines) on both DNA strands. DNA polymerase delta or epsilon extends these primers by adding new DNA (green lines) only in a 5' to 3' direction. On the leading strands, this results in the continuous synthesis of long DNA molecules. Lagging strands, in contrast, are synthesized discontinuously: primers are placed on the template every ~200 nucleotides and extended to form short Okazaki fragments. For simplicity, this diagram does not show the replacement of primers with DNA or the synthesis of telomeres at the chromosome ends.

Role of initiators for initiation of DNA replication.

There are AT-rich and initiator-loading sequences on replicators. Initiators bind the initiator-loading sequences, and the AT-rich sequences are then unwound. Initiators recruit other proteins involved in DNA replication.

DNA replication — Okazaki fragments

Okazaki fragments are short sequences of DNA nucleotides (approximately 150 to 200 base pairs long in eukaryotes) which are synthesized discontinuously and later linked together by the enzyme DNA ligase to create the lagging strand during DNA replication. They were discovered in the 1960s by the Japanese molecular biologists Reiji and Tsuneko Okazaki, along with the help of some of their colleagues.

During DNA replication, the double helix is unwound and the complementary strands are separated by the enzyme DNA helicase, creating what is known as the DNA replication fork. Following this fork, DNA primase and then DNA polymerase begin to act in order to create a new complementary strand. Because these enzymes can only work in the 5’ to 3’ direction, the two unwound template strands are replicated in different ways. One strand, the leading strand, undergoes a continuous replication process since its template strand has 3’ to 5’ directionality, allowing the polymerase assembling the leading strand to follow the replication fork without interruption. The lagging strand, however, cannot be created in a continuous fashion because its template strand has 5’ to 3’ directionality, which means the polymerase must work backwards from the replication fork. This causes periodic breaks in the process of creating the lagging strand. The primase and polymerase move in the opposite direction of the fork, so the enzymes must repeatedly stop and start again while the DNA helicase breaks the strands apart. Once the fragments are made, DNA ligase connects them into a single, continuous strand. The entire replication process is considered "semi-discontinuous" since one of the new strands is formed continuously and the other is not.

During the 1960s, Reiji and Tsuneko Okazaki conducted experiments involving DNA replication in the bacterium Escherichia coli. Before this time, it was commonly thought that replication was a continuous process for both strands, but the discoveries involving E. coli led to a new model of replication. The scientists found there was a discontinuous replication process by pulse-labeling DNA and observing changes that pointed to non-contiguous replication. (W)

Asymmetry in the synthesis of leading and lagging strands.


Many enzymes are involved in the DNA replication fork.

Synthesis of Okazaki fragments.

Primase adds RNA primers onto the lagging strand, which allows synthesis of Okazaki fragments from 5' to 3'. However, primase creates RNA primers at a much lower rate than that at which DNA polymerase synthesizes DNA on the leading strand. DNA polymerase on the lagging strand also has to be continually recycled to construct Okazaki fragments following RNA primers. This makes the speed of lagging strand synthesis much lower than that of the leading strand. To solve this, primase acts as a temporary stop signal, briefly halting the progression of the replication fork during DNA replication. This molecular process prevents the leading strand from overtaking the lagging strand.

DNA replication — Pre-replication complex

A pre-replication complex (pre-RC) is a protein complex that forms at the origin of replication during the initiation step of DNA replication. Formation of the pre-RC is required for DNA replication to occur. Complete and faithful replication of the genome ensures that each daughter cell will carry the same genetic information as the parent cell. Accordingly, formation of the pre-RC is a very important part of the cell cycle. (W)

A simplified schematic of the loading of the eukaryotic pre-replication complex.

DNA replication — Replication through nucleosomes

Eukaryotic DNA must be tightly compacted in order to fit within the confined space of the nucleus. Chromosomes are packaged by wrapping 147 nucleotides around an octamer of histone proteins, forming a nucleosome. The nucleosome octamer includes two copies of each histone H2A, H2B, H3, and H4. Due to the tight association of histone proteins to DNA, eukaryotic cells have proteins that are designed to remodel histones ahead of the replication fork, in order to allow smooth progression of the replisome. There are also proteins involved in reassembling histones behind the replication fork to reestablish the nucleosome conformation.

Several histone chaperones are known to be involved in nucleosome assembly after replication. The FACT complex has been found to interact with the DNA polymerase α-primase complex, and subunits of the FACT complex interact genetically with replication factors. The FACT complex is a heterodimer that does not hydrolyze ATP but is able to facilitate "loosening" of histones in nucleosomes. How the FACT complex relieves the tight association of histones with DNA so that they can be removed remains unanswered. (W)

Depiction of replication through histones. Histones are removed from DNA by the FACT complex and Asf1. Histones are reassembled onto newly replicated DNA after the replication fork by CAF-1 and Rtt106.

DNA replication — Termination

Termination of eukaryotic DNA replication requires different processes depending on whether the chromosomes are circular or linear. Unlike linear molecules, circular chromosomes are able to replicate the entire molecule. However, the two DNA molecules will remain linked together. This issue is handled by decatenation of the two DNA molecules by a type II topoisomerase. Type II topoisomerases are also used to separate linear strands as they are intricately folded into a nucleosome within the cell.

As previously mentioned, linear chromosomes face another issue that is not seen in circular DNA replication. Because an RNA primer is required to initiate DNA synthesis, the lagging strand is at a disadvantage in replicating the entire chromosome. While the leading strand can use a single RNA primer to extend the 5' terminus of the replicating DNA strand, lagging strand synthesis relies on multiple RNA primers, creating Okazaki fragments. This poses a problem because DNA polymerase can only add to the 3' end of a DNA strand. The 3'-5' action of DNA polymerase along the parent strand leaves a short single-stranded DNA (ssDNA) region at the 3' end of the parent strand when the Okazaki fragments have been repaired. Since replication occurs in opposite directions at opposite ends of parent chromosomes, each strand is a lagging strand at one end. Over time this would result in progressive shortening of both daughter chromosomes. This is known as the end replication problem.

The end replication problem is handled in eukaryotic cells by telomere regions and telomerase. Telomeres extend the 3' end of the parental chromosome beyond the 5' end of the daughter strand. This single-stranded DNA structure can act as an origin of replication that recruits telomerase. Telomerase is a specialized DNA polymerase that consists of multiple protein subunits and an RNA component. The RNA component of telomerase anneals to the single-stranded 3' end of the template DNA and contains 1.5 copies of the telomeric sequence. Telomerase contains a protein subunit that is a reverse transcriptase called telomerase reverse transcriptase or TERT. TERT synthesizes DNA until the end of the template telomerase RNA and then disengages. This process can be repeated as many times as needed with the extension of the 3' end of the parental DNA molecule. This 3' addition provides a template for extension of the 5' end of the daughter strand by lagging strand DNA synthesis. Regulation of telomerase activity is handled by telomere-binding proteins. (W)

A depiction of telomerase progressively elongating telomeric DNA.

DNA replisome

The replisome is a complex molecular machine that carries out replication of DNA. The replisome first unwinds double-stranded DNA into two single strands. For each of the resulting single strands, a new complementary sequence of DNA is synthesized. The net result is formation of two new double-stranded DNA sequences that are exact copies of the original double-stranded DNA sequence.

In terms of structure, the replisome is composed of two replicative polymerase complexes, one of which synthesizes the leading strand, while the other synthesizes the lagging strand. The replisome is composed of a number of proteins including helicase, RFC, PCNA, gyrase/topoisomerase, SSB/RPA, primase, DNA polymerase III, RNase H, and ligase. (W)

A representation of the structures of the replisome during DNA replication.

DNA sense

Because of the complementary nature of base-pairing between nucleic acid polymers, a double-stranded DNA molecule will be composed of two strands with sequences that are reverse complements of each other. To help molecular biologists specifically identify each strand individually, the two strands are usually differentiated as the "sense" strand and the "antisense" strand. An individual strand of DNA is referred to as positive-sense (also positive (+) or simply sense) if its nucleotide sequence corresponds directly to the sequence of an RNA transcript which is translated or translatable into a sequence of amino acids (provided that any thymine bases in the DNA sequence are replaced with uracil bases in the RNA sequence). The other strand of the double-stranded DNA molecule is referred to as negative-sense (also negative (-) or antisense), and is reverse complementary to both the positive-sense strand and the RNA transcript. It is actually the antisense strand that is used as the template from which RNA polymerases construct the RNA transcript, but the complementary base-pairing by which nucleic acid polymerization occurs means that the sequence of the RNA transcript will look identical to the sense strand, apart from the RNA transcript's use of uracil instead of thymine.
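
The relationship between the sense strand, the antisense strand, and the RNA transcript can be checked with a few lines of Python. This is a minimal sketch: the example sequence and the function names are hypothetical, and only the four unambiguous uppercase bases are handled.

```python
# Complement table for the four DNA bases (uppercase only; ambiguity codes are not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA strand written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe_from_template(antisense: str) -> str:
    """Return the RNA (5'->3') that base-pairs with an antisense strand given 5'->3'."""
    return reverse_complement(antisense).replace("T", "U")


sense = "ATGGCCTAA"                    # hypothetical sense (coding) strand
antisense = reverse_complement(sense)  # the antisense (template) strand
rna = transcribe_from_template(antisense)

print(antisense)  # TTAGGCCAT
print(rna)        # AUGGCCUAA
```

The transcript matches the sense strand with U in place of T, even though the antisense strand served as the template.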

Sometimes the phrases coding strand and template strand are encountered in place of sense and antisense, respectively, and in the context of a double-stranded DNA molecule the usage of these terms is essentially equivalent. However, the coding/sense strand need not always contain a code that is used to make a protein; both protein-coding and non-coding RNAs may be transcribed.

The terms "sense" and "antisense" are relative only to the particular RNA transcript in question, and not to the DNA strand as a whole. In other words, either DNA strand can serve as the sense or antisense strand. Most organisms with sufficiently large genomes make use of both strands, with each strand functioning as the template strand for different RNA transcripts in different places along the same DNA molecule. In some cases, RNA transcripts can be transcribed in both directions (i.e. on either strand) from a common promoter region, or be transcribed from within introns on either strand. (W)

DNA sequencer

A DNA sequencer is a scientific instrument used to automate the DNA sequencing process. Given a sample of DNA, a DNA sequencer is used to determine the order of the four bases: G (guanine), C (cytosine), A (adenine) and T (thymine). This is then reported as a text string, called a read. Some DNA sequencers can also be considered optical instruments, as they analyze light signals originating from fluorochromes attached to nucleotides.

The first automated DNA sequencer, invented by Lloyd M. Smith, was introduced by Applied Biosystems in 1987. It used the Sanger sequencing method, a technology which formed the basis of the “first generation” of DNA sequencers and enabled the completion of the Human Genome Project in 2001. First-generation DNA sequencers are essentially automated electrophoresis systems that detect the migration of labelled DNA fragments. These sequencers can therefore also be used for genotyping genetic markers where only the lengths of DNA fragments need to be determined (e.g. microsatellites, AFLPs). (W)

A row of DNA sequencing machines on SteelSentry tables (3730xl DNA Analyzer machines from Applied Biosystems).

DNA sequencing

DNA sequencing is the process of determining the nucleic acid sequence – the order of nucleotides in DNA. It includes any method or technology that is used to determine the order of the four bases: adenine, guanine, cytosine, and thymine. The advent of rapid DNA sequencing methods has greatly accelerated biological and medical research and discovery.

Knowledge of DNA sequences has become indispensable for basic biological research, and in numerous applied fields such as medical diagnosis, biotechnology, forensic biology, virology and biological systematics. Comparing healthy and mutated DNA sequences can help diagnose diseases, including various cancers, and characterize antibody repertoires, and can be used to guide patient treatment. Having a quick way to sequence DNA allows for faster and more individualized medical care to be administered, and for more organisms to be identified and cataloged.

The speed of sequencing attained with modern DNA sequencing technology has been instrumental in determining the complete DNA sequences, or genomes, of numerous species, including the human genome and the genomes of many animals, plants, and microbes.

The first DNA sequences were obtained in the early 1970s by academic researchers using laborious methods based on two-dimensional chromatography. Following the development of fluorescence-based sequencing methods with a DNA sequencer, DNA sequencing has become easier and orders of magnitude faster. (W)

An example of the results of automated chain-termination DNA sequencing.

The 5,386 bp genome of bacteriophage φX174. Each coloured block represents a gene.

The map shows the complete circular single-stranded DNA genome (5,386 bp) of Enterobacteria phage ΦX174 (accession NC_001422). This DNA genome was the first one ever sequenced (Fred Sanger and colleagues, 1977). The genome contains 11 genes (A, A*, B-H, J, K). Genes B, K, and E overlap genes A, C, and D. Gene A* is nested within the larger A gene, with internal translation initiation in the same reading frame as gene A.

Genomic DNA is fragmented into random pieces and cloned as a bacterial library. DNA from individual bacterial clones is sequenced and the sequence is assembled by using overlapping DNA regions.

Multiple, fragmented sequence reads must be assembled together on the basis of their overlapping areas.
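
The idea of assembling reads from their overlapping regions can be illustrated with a toy greedy merger. This is a sketch under strong assumptions (error-free reads, all on the same strand, no repeats, no read contained inside another); production assemblers use far more robust methods. The function names and example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest suffix-prefix overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; remaining contigs stay separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGCGTAC", "GTACCTGA", "CTGAAGTT"]  # hypothetical overlapping reads
contigs = greedy_assemble(reads)
print(contigs)  # ['ATGCGTACCTGAAGTT']
```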

An Illumina HiSeq 2500 sequencer (photo taken at the Max Planck Genomzentrum Cologne, Germany).

A BGI MGISEQ-2000RS sequencer.

Library preparation for the SOLiD platform.

Two-base encoding scheme. In two-base encoding, each unique pair of bases on the 3' end of the probe is assigned one out of four possible colors. For example, "AA" is assigned to blue, "AC" is assigned to green, and so on for all 16 unique pairs. During sequencing, each base in the template is sequenced twice, and the resulting data are decoded according to this scheme.
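
The two-base encoding described above can be sketched in Python. The sketch assumes the commonly described mapping in which each base is given a 2-bit value and the color of a dinucleotide is the XOR of the two values; this reproduces the "AA" (color 0, blue) and "AC" (color 1, green) assignments in the caption, but the numeric labels and function names are illustrative rather than taken from the source.

```python
# Assumed 2-bit values for the bases; the color of a base pair is the XOR of the two values.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_colors(seq: str) -> list[int]:
    """Convert a base sequence into colors, one per adjacent pair of bases."""
    return [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Recover the bases from a known first base and the list of colors."""
    bases = [first_base]
    for color in colors:
        bases.append(BITS_TO_BASE[BASE_TO_BITS[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because every base contributes to two adjacent colors, a single miscalled color changes all downstream bases on decoding, whereas a true single-base variant changes two adjacent colors. That is the sense in which each base is interrogated twice.
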

DNA sequencing theory

DNA sequencing theory is the broad body of work that attempts to lay analytical foundations for determining the order of specific nucleotides in a sequence of DNA, otherwise known as DNA sequencing. The practical aspects revolve around designing and optimizing sequencing projects (known as "strategic genomics"), predicting project performance, troubleshooting experimental results, characterizing factors such as sequence bias and the effects of software processing algorithms, and comparing various sequencing methods to one another. In this sense, it could be considered a branch of systems engineering or operations research. The permanent archive of work is primarily mathematical, although numerical calculations are often conducted for particular problems too. DNA sequencing theory addresses physical processes related to sequencing DNA and should not be confused with theories of analyzing resultant DNA sequences, e.g. sequence alignment. Publications sometimes do not make a careful distinction, but the latter are primarily concerned with algorithmic issues. Sequencing theory is based on elements of mathematics, biology, and systems engineering, so it is highly interdisciplinary. The subject may be studied within the context of computational biology. (W)
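
As one illustration of the kind of project-performance prediction mentioned above, the sketch below uses the classical idealized model in which reads land uniformly at random on the genome (often associated with Lander and Waterman). The source text does not name this model; it is used here as an assumption, and the project numbers are hypothetical.

```python
import math


def coverage_stats(num_reads: int, read_len: int, genome_len: int) -> tuple[float, float]:
    """Return (mean coverage c, expected fraction of bases covered at least once).

    Assumes reads are placed uniformly at random, so the chance that a given
    base is missed by every read is approximately exp(-c).
    """
    c = num_reads * read_len / genome_len
    return c, 1.0 - math.exp(-c)


# Hypothetical project: 10,000 reads of 100 bases on a 1,000,000-base genome.
c, covered = coverage_stats(10_000, 100, 1_000_000)
print(f"coverage = {c:.1f}x, expected fraction covered = {covered:.3f}")
```

Real projects deviate from this idealization because of sequence bias and other factors that the theory also tries to characterize.
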

DNA virus

A DNA virus is a virus that has a genome made of deoxyribonucleic acid (DNA) that is replicated by a DNA polymerase. They can be divided between those that have two strands of DNA in their genome, called double-stranded DNA (dsDNA) viruses, and those that have one strand of DNA in their genome, called single-stranded DNA (ssDNA) viruses. dsDNA viruses primarily belong to two realms: Duplodnaviria and Varidnaviria, and ssDNA viruses are almost exclusively assigned to the realm Monodnaviria, which also includes dsDNA viruses. Additionally, many DNA viruses are unassigned to higher taxa. Viruses that have a DNA genome that is replicated through an RNA intermediate by a reverse transcriptase are separately considered reverse transcribing viruses and are assigned to the kingdom Pararnavirae in the realm Riboviria.

DNA viruses are ubiquitous worldwide, especially in marine environments where they form an important part of marine ecosystems, and infect both prokaryotes and eukaryotes. They appear to have multiple origins, as viruses in Monodnaviria appear to have emerged from archaeal and bacterial plasmids on multiple occasions, though the origins of Duplodnaviria and Varidnaviria are less clear. Prominent disease-causing DNA viruses include herpesviruses, papillomaviruses, and poxviruses. (W)

Orthopoxvirus particles.

Illustrated sample of Duplodnaviria virions.
Computer-generated illustration showing a sample of virion morphological diversity in the realm Duplodnaviria. From left to right: Siphoviridae (Caudovirales), Myoviridae (Caudovirales), Podoviridae (Caudovirales), Herpesviridae (Herpesvirales).

A ribbon diagram of the MCP of Pseudoalteromonas virus PM2, with the two jelly roll folds colored in red and blue

The major capsid protein P2 from the bacteriophage PM2, illustrating the double jelly roll fold found in the major capsid proteins of double-stranded DNA viruses of the PRD1-adenovirus lineage. The two distinct jelly roll domains are colored red and blue, with intervening loops and helices colored orange. Rendered from chain H of PDB ID 2W0C: Abrescia NG, Grimes JM, Kivelä HM, Assenberg R, Sutton GC, Butcher SJ, Bamford JK, Bamford DH, Stuart DI. Insights into virus evolution and membrane biogenesis from the structure of the marine lipid-containing bacteriophage PM2. Molecular Cell. 2008 Sep 5;31(5):749-61. DOI 10.1016/j.molcel.2008.06.026.

DNase I hypersensitive site

In genetics, DNase I hypersensitive sites (DHSs) are regions of chromatin that are sensitive to cleavage by the DNase I enzyme. In these specific regions of the genome, chromatin has lost its condensed structure, exposing the DNA and making it accessible. This makes the DNA more susceptible to degradation by enzymes such as DNase I. These accessible chromatin zones are functionally related to transcriptional activity, since this remodeled state is necessary for the binding of proteins such as transcription factors.

Since the discovery of DHSs 30 years ago, they have been used as markers of regulatory DNA regions. These regions have been shown to map many types of cis-regulatory elements including promoters, enhancers, insulators, silencers and locus control regions. A high-throughput measure of these regions is available through DNase-Seq. (W)

DNase I hypersensitive sites within chromatin.

double-stranded RNA viruses

Double-stranded RNA viruses (dsRNA viruses) are a polyphyletic group of viruses that have double-stranded genomes made of ribonucleic acid. The double-stranded genome is used to transcribe a positive-strand RNA by the viral RNA-dependent RNA polymerase (RdRp). The positive-strand RNA may be used as messenger RNA (mRNA) which can be translated into viral proteins by the host cell's ribosomes. The positive-strand RNA can also be replicated by the RdRp to create a new double-stranded viral genome.

Double-stranded RNA viruses are classified in two separate phyla, Duplornaviricota and Pisuviricota (specifically class Duplopiviricetes), both of which are in the kingdom Orthornavirae and realm Riboviria. The two groups do not share a common dsRNA virus ancestor; double-stranded RNA viruses evolved two separate times from positive-strand RNA viruses. In the Baltimore classification system, dsRNA viruses belong to Group III.

Virus group members vary widely in host range (animals, plants, fungi, and bacteria), genome segment number (one to twelve), and virion organization (T-number, capsid layers, or turrets). Double-stranded RNA viruses include the rotaviruses, known globally as a common cause of gastroenteritis in young children, and bluetongue virus, an economically significant pathogen of cattle and sheep. The family Reoviridae is the largest and most diverse dsRNA virus family in terms of host range. (W)

Electron micrograph of rotaviruses. The bar = 100 nm.
Note the wheel-like appearance of some of the rotavirus particles. The observation of such particles gave the virus its name ('rota' being the Latin word meaning wheel). Source: Cell culture. Method: Negative-stain Transmission Electron Microscopy.

duplex sequencing

Duplex sequencing is a library preparation and analysis method for next-generation sequencing (NGS) platforms that employs random tagging of double-stranded DNA to detect mutations with higher accuracy and a lower error rate. The method uses degenerate molecular tags in addition to sequencing adapters to recognize reads originating from each strand of DNA. The generated sequencing reads are then analyzed using two methods: single-strand consensus sequence (SSCS) assembly and duplex consensus sequence (DCS) assembly. Duplex sequencing can theoretically detect mutations with frequencies as low as 5 × 10⁻⁸, which is more than 10,000-fold more accurate than conventional next-generation sequencing methods. (W)

Figure 1) Duplex sequencing library preparation workflow: Two adapter oligos go through several steps (annealing, synthesis, dT-tailing) to generate double-stranded unique tags with 3'-dT overhangs. The duplex tag adapters are then ligated to the double-stranded DNA templates. Finally, Illumina sequencing adapters are inserted into the tagged DNA fragments to form the final libraries containing DS adapters, Illumina sequencing adapters and template DNA.

Figure 2) Duplex sequencing overview: Duplex-tagged libraries containing sequencing adapters are amplified, yielding two types of products, each of which originates from a single strand of DNA. After the PCR products are sequenced, the generated reads are divided into tag families based on the genomic position, the duplex tags and the neighboring sequencing adapter. Please note that sequence tag α is the reverse complement of sequence tag β and vice versa.
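
The two consensus steps can be sketched in Python. This is a simplified illustration rather than the published pipeline. It assumes reads are equal-length, already aligned, and oriented to the same reference strand. Each read is assumed to carry a shared tag identifier plus a strand label ('ab' or 'ba'), instead of the complementary α/β tags described above. The thresholds `min_fraction` and `min_family_size` are hypothetical defaults.

```python
from collections import Counter, defaultdict


def consensus(reads: list[str], min_fraction: float = 0.7) -> str:
    """Per-position majority call; 'N' where no base reaches min_fraction of the reads."""
    calls = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        calls.append(base if count / len(column) >= min_fraction else "N")
    return "".join(calls)


def duplex_consensus(tagged_reads, min_family_size: int = 3):
    """Build SSCSs per (tag, strand) family, then DCSs where both strands agree."""
    families = defaultdict(list)
    for tag, strand, seq in tagged_reads:
        families[(tag, strand)].append(seq)

    # Single-strand consensus: one sequence per sufficiently large tag family.
    sscs = {key: consensus(seqs) for key, seqs in families.items()
            if len(seqs) >= min_family_size}

    # Duplex consensus: keep a base only if the two strand consensuses agree.
    dcs = {}
    for (tag, strand), seq_ab in sscs.items():
        if strand == "ab" and (tag, "ba") in sscs:
            seq_ba = sscs[(tag, "ba")]
            dcs[tag] = "".join(x if x == y else "N" for x, y in zip(seq_ab, seq_ba))
    return sscs, dcs


reads = (
    [("T1", "ab", "ACGTAC")] * 3 + [("T1", "ab", "ACCTAC")]  # one read with an error
    + [("T1", "ba", "ACGTAC")] * 3
    + [("T2", "ab", "ACGTTC")] * 3                            # change seen on one strand only
    + [("T2", "ba", "ACGTAC")] * 3
)
sscs, dcs = duplex_consensus(reads)
print(dcs)  # {'T1': 'ACGTAC', 'T2': 'ACGTNC'}
```

In this toy data the isolated read error in family T1 is voted out at the SSCS step. The change present on only one strand of T2 is masked at the DCS step. This is the principle behind the method's low error rate.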