Novel AI-based software allows quick, reliable imaging of proteins in cells

Washington, US: "TomoTwin paves the way for automated identification and localization of proteins directly in their cellular environment, expanding the potential of cryo-ET," says co-first author Gavin Rice. Cryo-electron tomography (cryo-ET) offers the ability to reveal the basis of life and the origins of diseases by deciphering how biomolecules work within a cell.

In a cryo-ET experiment, scientists use a transmission electron microscope to create 3D images, known as tomograms, of a cellular volume containing complex biomolecules. To obtain a more detailed image of each protein, they average as many copies as feasible, much as photographers take the same picture at several exposures and combine them into one well-exposed image. Before the proteins can be averaged, they must be correctly identified and located in the tomogram. "Scientists can obtain hundreds of tomograms daily," Rice says, "but we lacked tools to fully identify the molecules within them."
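
The benefit of averaging can be illustrated with a toy example. The sketch below is not part of any cryo-ET pipeline: it assumes a made-up one-dimensional "particle" profile, already aligned copies, and independent Gaussian noise, and all function names are hypothetical. It only shows that the error of the average shrinks as more noisy copies are combined.

```python
import random

def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to each sample."""
    return [s + rng.gauss(0.0, noise_sd) for s in signal]

def average_copies(copies):
    """Average already aligned copies sample by sample."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]

def rms_error(estimate, truth):
    """Root-mean-square difference between an estimate and the true signal."""
    squared = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return (squared / len(truth)) ** 0.5

rng = random.Random(0)                         # fixed seed for repeatability
true_signal = [0.0, 1.0, 2.0, 1.0, 0.0] * 20   # hypothetical 1D profile

single = noisy_copy(true_signal, 1.0, rng)
averaged = average_copies(
    [noisy_copy(true_signal, 1.0, rng) for _ in range(400)]
)

print("error of one copy:      ", rms_error(single, true_signal))
print("error of 400-copy mean: ", rms_error(averaged, true_signal))
```

Under these assumptions the noise in the average falls roughly with the square root of the number of copies, which is why picking more particles correctly leads to a sharper final image.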

Hand-picking

So far, researchers have used algorithms based on templates of already known molecular structures to search for matches in the tomograms, but these tend to be error-prone. Identifying molecules by hand is another option; it ensures high-quality picking but takes days to weeks per dataset.
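
The general idea behind template-based searching can be sketched in one dimension. This is an illustration only, assuming a noise-free toy signal and normalized cross-correlation as the similarity score; the function names are hypothetical, and real tools work on 3D volumes and must also search over orientations.

```python
import math

def normalized_cross_correlation(window, template):
    """Pearson correlation between a window and a template of equal length."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    dw = [w - mean_w for w in window]
    dt = [t - mean_t for t in template]
    denom = math.sqrt(sum(x * x for x in dw) * sum(x * x for x in dt))
    if denom == 0.0:
        return 0.0  # flat window or template: correlation is undefined
    return sum(a * b for a, b in zip(dw, dt)) / denom

def match_template(signal, template):
    """Slide the template along the signal; return (best_offset, best_score)."""
    width = len(template)
    scores = [
        normalized_cross_correlation(signal[i:i + width], template)
        for i in range(len(signal) - width + 1)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

template = [0.0, 1.0, 3.0, 1.0, 0.0]   # hypothetical known shape
signal = [0.1, 0.0, 0.2, 0.0, 1.0, 3.0, 1.0, 0.0, 0.1, 0.0]

offset, score = match_template(signal, template)
print(offset, score)
```

In a crowded, noisy tomogram many windows score nearly as well as the true location, which is one way to see why such searches tend to produce false matches.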

Another possibility would be to use a form of supervised machine learning. These tools can be very accurate but currently lack usability, as they require manually labelling thousands of examples to train the software for each new protein, an almost impossible task for small biological molecules in a crowded cellular environment.

TomoTwin

The newly developed software TomoTwin overcomes many of these obstacles. It learns to pick molecules of similar shape within a tomogram and maps them to a geometric space: the system is rewarded for placing similar proteins near each other and penalised otherwise. In the resulting map, researchers can isolate and accurately identify the different proteins and use this to locate them inside the cell. "One advantage of TomoTwin is that we provide a pre-trained picking model," says Rice. Because the training step is removed, the software can even run on local computers, where processing a tomogram usually takes 60-90 minutes; on the MPI supercomputer Raven, runtime drops to 15 minutes per tomogram.
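
The "rewarded for placing similar proteins near each other and penalised otherwise" idea can be sketched with a triplet loss, one common objective for this kind of training. The article does not say which objective TomoTwin uses, so treat the loss, the margin value, the two-dimensional toy coordinates, and the function names below as assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two points in the embedding space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Penalty that is zero once the same-protein example (positive) sits
    closer to the anchor than the different-protein example (negative)
    by at least `margin`; otherwise it grows with the violation."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)

def nearest_reference(point, references):
    """Label a point with the name of the closest reference embedding."""
    return min(references, key=lambda name: euclidean(point, references[name]))

anchor = (0.0, 0.0)

# Good layout: the similar protein is near, the different one is far.
good = triplet_loss(anchor, positive=(0.1, 0.0), negative=(3.0, 0.0))

# Bad layout: the similar protein is far, the different one is near.
bad = triplet_loss(anchor, positive=(3.0, 0.0), negative=(0.1, 0.0))

print(good, bad)  # the bad layout is penalised, the good one is not

# Once the space is organised, a new point can be labelled by proximity.
references = {"protein_A": (0.0, 0.0), "protein_B": (5.0, 5.0)}
print(nearest_reference((0.4, -0.2), references))
```

Minimising such a penalty over many examples pulls similar shapes into tight clusters, which is what makes it possible to isolate each protein in the resulting map without retraining for every new target.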

TomoTwin allows researchers to pick dozens of tomograms in the time it takes to manually pick a single one, thereby increasing data throughput and the averaging rate, which yields a better image. The software can currently locate globular proteins or protein complexes larger than 150 kilodaltons in cells; in the future, the Raunser group aims to include membrane proteins, filamentous proteins, and proteins of smaller sizes.