Optimization in Sanger Sequencing
Luisa Carpente, Ana Cerdeira-Pena, Silvia Lorenzo-Freire, Ángeles S. Places
TL;DR
The paper addresses how to arrange DNA samples on $96$-well PCR plates for Sanger sequencing under thermocycler constraints, so that as few plates as possible are used and each plate is as full as possible. It introduces an ILP model that gives exact solutions for small problems and a simulated annealing-based heuristic for large-scale instances, validated on real laboratory data. The heuristic achieves better plate utilization and occupancy distribution than the commercial LabWare software and is deployed in the lab as SimPCR. Overall, the work tackles a challenging bin-packing-like problem with practical impact on sequencing time and cost.
Abstract
The main objective of this paper is to solve the optimization problem associated with the classification of DNA samples in PCR plates for Sanger sequencing. To this end, we design an integer linear programming model. Because real instances involve thousands of samples and the linear model can be solved only for small instances, the paper also includes a heuristic for larger problems. The heuristic is based on the simulated annealing technique and obtains satisfactory solutions in a short amount of time. It has been tested with real data and yields better results than some commercial software typically used in (clinical) laboratories. Moreover, the algorithm has already been implemented in the laboratory and is being used successfully.
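To make the simulated annealing idea concrete, the following is a minimal sketch on a simplified, bin-packing-like version of the problem. It is not the authors' model or the SimPCR implementation. The assumptions are ours: samples arrive in indivisible groups given as `(size, program)` pairs, a plate holds at most 96 wells, and a plate may only contain groups with the same thermocycler program. The cost function, the move (reassigning one group to another plate), and the geometric cooling schedule are likewise illustrative choices.

```python
import math
import random

PLATE_CAPACITY = 96  # wells per plate, as stated in the text


def plate_summary(assignment, groups):
    """Map each used plate to [wells used, set of programs on it]."""
    plates = {}
    for (size, program), plate in zip(groups, assignment):
        entry = plates.setdefault(plate, [0, set()])
        entry[0] += size
        entry[1].add(program)
    return plates


def feasible(assignment, groups):
    """Assumed rules: no plate overflows and no plate mixes programs."""
    return all(
        used <= PLATE_CAPACITY and len(programs) == 1
        for used, programs in plate_summary(assignment, groups).values()
    )


def cost(assignment, groups):
    """Sum of (1 - fill^2) over used plates; fewer, fuller plates score lower."""
    return sum(
        1.0 - (used / PLATE_CAPACITY) ** 2
        for used, _ in plate_summary(assignment, groups).values()
    )


def anneal(groups, steps=20000, t_start=1.0, t_end=0.01, seed=0):
    """Return a feasible plate index for every group (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(groups)
    assignment = list(range(n))  # one plate per group: a feasible start
    current = cost(assignment, groups)
    best, best_cost = assignment[:], current
    for step in range(steps):
        # Geometric cooling from t_start towards t_end.
        temperature = t_start * (t_end / t_start) ** (step / steps)
        i, new_plate = rng.randrange(n), rng.randrange(n)
        old_plate = assignment[i]
        if new_plate == old_plate:
            continue
        assignment[i] = new_plate
        if not feasible(assignment, groups):
            assignment[i] = old_plate  # reject infeasible moves outright
            continue
        delta = cost(assignment, groups) - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current += delta
            if current < best_cost:
                best, best_cost = assignment[:], current
        else:
            assignment[i] = old_plate
    return best


# Made-up example data: (number of samples, thermocycler program).
example_groups = [(48, "A"), (48, "A"), (30, "B"), (30, "B"),
                  (36, "B"), (96, "A"), (10, "C")]
example_result = anneal(example_groups)
print("plates used:", len(set(example_result)))
```

Rejecting infeasible moves keeps every intermediate solution valid, at the price of a more restricted search; penalizing violations in the cost function instead is a common alternative. The squared fill term gives the search a gradient towards emptying lightly used plates, which a plain plate count would not provide.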
