2 January 2024

Optimising retrosynthetic analysis for small molecule discovery


Molecules are the building blocks of discovery in chemistry, and retrosynthetic analysis often serves as the guide that takes chemists from questions to answers. It allows researchers to take a complex molecule and unravel it into simpler parts, providing a roadmap for chemists to craft new compounds. However, this essential journey has always had its challenges.


While undeniably powerful, traditional retrosynthesis has a reputation for being both time-consuming and labour-intensive. It involves hours of meticulous research and planning, the complexity of which leads to bottlenecks and the slowing down of innovation in the lab. So, is there a better way to optimise retrosynthesis that offers insights without compromising on speed?

Chemists typically rely on databases of known compounds, so the first step is to search for molecules similar to the target.

But what if the molecules are new? The main problem arises when there are no close analogues in the database, in which case planning a route can take as long as several days. Not only must chemists come up with ideas for how to build the molecule, but they also need to check whether each of those ideas is plausible based on literature examples.
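To make the analogue search concrete, here is a minimal sketch of how a similarity lookup might work. It assumes each molecule has already been reduced to a set of structural features (a "fingerprint"), and it uses the Tanimoto coefficient, a common similarity measure in cheminformatics. The feature sets, names and threshold are hypothetical and chosen purely for illustration; this is not a description of how any particular database or platform works.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity: shared features divided by total distinct features."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_analogues(query, database, threshold=0.7):
    """Return (name, score) pairs scoring at or above the threshold, best first."""
    scored = [(name, tanimoto(query, features)) for name, features in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature sets; in practice a cheminformatics toolkit would generate these.
database = {
    "known_A": frozenset({1, 2, 3, 4}),
    "known_B": frozenset({1, 2, 7, 8}),
    "known_C": frozenset({9, 10}),
}
query = frozenset({1, 2, 3, 5})

print(closest_analogues(query, database, threshold=0.5))  # one reasonably close analogue
print(closest_analogues(query, database, threshold=0.9))  # empty list: no close analogue
```

An empty result at a strict threshold corresponds to the "new molecule" case, where there is no close analogue to borrow a route from.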

Machine learning could be the answer. This blog explores the use and benefits of machine learning models such as SmartChemistry® as an aid to retrosynthesis, and how it can revolutionise the practice and accelerate small molecule discovery.

The current state of retrosynthesis

While traditional retrosynthesis has served as the bedrock of organic chemistry for decades, it isn’t without its drawbacks.


The foremost challenge is the sheer complexity of many molecules. Their intricate molecular structures can resemble a labyrinth, and deciphering them manually is incredibly time-intensive. Chemists might spend hours, or even weeks, mapping out retrosynthetic routes for just one compound.

Knowledge dependency

Retrosynthesis relies heavily on the chemist’s knowledge and experience. The ability to identify the most efficient synthetic pathways is often acquired through years of practice, making it a skill that can be challenging to impart quickly to newcomers.

Limited exploration, knowledge and experience

Due to time constraints and the vast number of possible synthetic pathways, chemists using traditional retrosynthesis often explore only a fraction of the potential routes. This limited exploration can hinder innovation in molecule discovery. Traditional retrosynthesis also relies on existing routes to similar analogues, so the method falls short when a chemist lacks the relevant knowledge or experience.

As the landscape of chemistry research evolves, so do the expectations placed on modern laboratories. Researchers today are under pressure to deliver results faster and more efficiently. This imperative for efficient exploration, coupled with the increasing complexity of molecules being synthesised, has illuminated the need for a paradigm shift in retrosynthetic analysis.

The role of machine learning in retrosynthetic analysis

Machine learning is already transforming industries worldwide, and chemistry is no exception. At its core, machine learning involves the development of algorithms that enable computers to learn patterns and make predictions or decisions based on data. In chemistry, it represents a convergence of scientific expertise and computational innovation.

Machine learning can also extend beyond computational analytics to empower chemists directly in the laboratory. From drug discovery to materials science, it can streamline processes and offer insights that act as a catalyst for innovation.

Optimising retrosynthesis with machine learning

One of the most promising applications of machine learning in chemistry is its role in retrosynthesis. Machine learning algorithms can be harnessed to enhance and optimise retrosynthetic analysis in a variety of ways.

Reaction prediction

Machine learning models, often trained on vast datasets of chemical reactions, can predict the outcomes of reactions with remarkable accuracy. This capability enables chemists to anticipate which reactions are likely to succeed and which may not, saving valuable time and resources.

Route planning

Planning the synthesis of complex molecules involves considering multiple pathways and deciding on the most efficient route. Machine learning algorithms can rapidly explore a multitude of possibilities, evaluating factors like reaction efficiency and cost, to suggest optimal synthetic routes.
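As a rough illustration of what evaluating routes on efficiency and cost could look like, the sketch below ranks candidate routes by cost per unit of final product. The step yields, costs and route names are hypothetical, and the scoring rule (total step cost divided by overall yield) is one simple assumption among many possible choices, not the method used by any specific tool.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Step:
    name: str
    expected_yield: float  # fraction between 0 and 1 (assumed to come from a predictive model)
    cost: float            # arbitrary cost units for reagents, solvents and time


def overall_yield(route):
    """Overall yield of a linear route is the product of its step yields."""
    return prod(step.expected_yield for step in route)


def cost_per_unit_product(route):
    """Total step cost divided by overall yield; infinite if the route yields nothing."""
    total_yield = overall_yield(route)
    if total_yield == 0:
        return float("inf")
    return sum(step.cost for step in route) / total_yield


def rank_routes(routes):
    """Return route names ordered from cheapest to most expensive per unit of product."""
    return sorted(routes, key=lambda name: cost_per_unit_product(routes[name]))


# Hypothetical candidate routes to the same target molecule.
routes = {
    "route_a": [Step("coupling", 0.9, 10), Step("reduction", 0.8, 20), Step("deprotection", 0.9, 10)],
    "route_b": [Step("cyclisation", 0.5, 10), Step("oxidation", 0.6, 15)],
}

for name in rank_routes(routes):
    print(name, round(overall_yield(routes[name]), 3), round(cost_per_unit_product(routes[name]), 1))
```

In this example the longer route comes out ahead because its higher overall yield outweighs the extra step, which is the kind of trade-off that is easy to miss when comparing routes by eye.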

Reaction condition optimisation

Machine learning can assist in fine-tuning reaction conditions within experiments. By analysing experimental data and reaction outcomes, algorithms can recommend precise conditions such as temperature, pressure and catalysts to maximise yield and minimise byproducts. They can even filter out unrealistic recommendations - for example, if the ‘optimal’ temperature were so high that it would be unachievable, the algorithm could recognise the limitation and remove it from its recommendations.
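The filtering idea can be sketched in a few lines. The example below assumes a model has already proposed candidate conditions with a predicted yield, and it simply discards any candidate that exceeds what the lab's equipment can reach before ranking the rest. The limits, catalysts and predicted yields are hypothetical values used only to show the logic.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    temperature_c: float
    pressure_bar: float
    catalyst: str
    predicted_yield: float  # fraction between 0 and 1, assumed to come from a model


# Hypothetical equipment limits for the lab in question.
MAX_TEMPERATURE_C = 250.0
MAX_PRESSURE_BAR = 10.0


def is_achievable(condition):
    """A condition is achievable only if it stays within the equipment limits."""
    return (condition.temperature_c <= MAX_TEMPERATURE_C
            and condition.pressure_bar <= MAX_PRESSURE_BAR)


def recommend(candidates, top_n=3):
    """Drop unachievable candidates, then return the best remaining by predicted yield."""
    feasible = [c for c in candidates if is_achievable(c)]
    return sorted(feasible, key=lambda c: c.predicted_yield, reverse=True)[:top_n]


candidates = [
    Condition(900.0, 1.0, "Pd/C", 0.95),  # best on paper, but far beyond the temperature limit
    Condition(80.0, 1.0, "Pd/C", 0.82),
    Condition(120.0, 5.0, "Raney Ni", 0.78),
]

for condition in recommend(candidates):
    print(condition)
```

Here the 900 °C suggestion never reaches the chemist, even though it has the highest predicted yield.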

Retrosynthesis acceleration

Machine learning can expedite the retrosynthetic analysis process by rapidly generating potential synthetic routes, allowing chemists to focus on refining and implementing these routes in the lab.

Collectively, these applications promise to revolutionise the way chemists approach retrosynthesis. By integrating machine learning tools into their workflows, researchers can reduce the trial-and-error aspect of synthetic planning, drastically reduce time-consuming tasks, and make more informed decisions about the reactions they are pursuing.

Benefits of machine learning in retrosynthesis

The above explains how machine learning can be used to optimise retrosynthetic practices, but how can this actually benefit chemists in the lab? Incorporating machine learning into retrosynthetic workflows offers a multitude of advantages, reshaping the landscape of chemical research and development.

Speed and efficiency

Machine learning accelerates the traditionally slow process of retrosynthesis by swiftly generating potential synthetic routes based on extensive datasets and learned patterns. Because of this, what once took days or weeks can now be achieved in a fraction of the time, enabling labs to increase their output.

Resource optimisation

The wastage of reagents and other materials can be a significant drawback in chemical synthesis. This is largely due to the trial-and-error nature of manual retrosynthesis, but it can be reduced by implementing machine learning algorithms that help chemists focus on the most promising pathways. This not only saves money but also aligns with sustainability practices and helps reduce the lab’s environmental footprint. Even to make one gram of a molecule, a chemist uses at least one litre of chemicals, including starting materials, reagents and solvents, so avoiding wastage by minimising the risk of failure feels like a move in the right direction.

Scope of discovery

The ability of machine learning to suggest unconventional synthetic routes broadens the possibilities of molecule discovery. Chemists can explore pathways they might not have considered, which could potentially lead to the discovery of entirely new compounds with diverse applications.

Enhanced precision

With less room for human error, machine learning models trained on vast datasets can provide highly precise recommendations. This means that fewer experimental iterations are required to refine the reactions, resulting in higher yields and purities and fewer costly failed experiments.

Data-driven insights

Rather than relying on the chemist’s own knowledge and experience, machine learning models operate solely on the information in front of them. This provides a completely rational basis for selecting synthetic routes and optimising reaction conditions, reducing the element of guesswork or the possibility of unconscious bias.


Whether designing syntheses for a single molecule or an entire library of compounds, machine learning models are able to adapt to the size of the project without compromising. Where a chemist may struggle with time, labour or other constraints imposed by the scope of the job, machine learning algorithms can process both large and small projects with ease.

Navigating the future of machine learning in chemistry

As machine learning continues to make significant improvements in retrosynthetic analysis, it’s imperative to anticipate future developments for the technology and how this might impact chemistry.

We expect to see the quality of data grow in chemistry and, with it, the quality of machine learning algorithms that are trained on this data. This in turn will help to ensure that machine learning models can better provide reasoning for their recommendations, which is crucial for the reproducibility of experiments and ensuring continued learning for the chemists themselves.

While there are still early-stage concerns around machine learning, such as ethical questions surrounding data and privacy, the uptake of machine learning in chemistry is expected to continue. In light of this, governance and regulatory guidelines are likely to emerge as the technology becomes more commonplace, and this will boost the presence of machine learning models in the lab even further as chemists become more comfortable with them.

The use of machine learning in chemistry, and particularly for optimising retrosynthesis, will continue to grow as more organisations see the impressive value and potential it can bring to the lab. For more information about SmartChemistry®, our machine learning experimental insights platform, get in touch with our experts.