Thesis defence

Alexander Gower

Knowledge Models and Inference Frameworks for Scientific Discovery

Overview

Scientific discovery is an active process of designing, testing, and improving theories about the natural world. Automating this process is a grand challenge for 21st-century science. This thesis examines scientific inquiry as it relates to machine learning, offering contributions to knowledge representations and reasoning frameworks, demonstrated in systems biology.

Systems biology is an integrationist approach to biological science, meaning organisms are treated as complex systems whose behaviour is dictated by the interactions of their constituent parts. Eukaryotic organisms are extremely complex, and research progress in systems biology can be slow. Recent advances in robotics and artificial intelligence (AI) offer great opportunities for automating scientific discovery in this field. Using the model organism Saccharomyces cerevisiae (baker’s yeast), this thesis explores: the philosophical motivations for automation in biological research; knowledge models and hypotheses in systems biology; and computational models of metabolism.

The first main contribution is a first-order logic framework for modelling cellular physiology, which enables abduction of hypotheses for improvement of knowledge models, using the automated theorem prover (ATP) iProver. The second contribution is an ontology for describing theory changes and hypotheses in a semantic and storage-efficient manner. The third main contribution is an application of graph neural networks (GNNs) to learn knowledge graph embeddings grounded in empirical data and ontology structures. The final contribution is an end-to-end demonstration of autonomous hypothesis generation and experimentation, with hypotheses modelled using ontology terms to support large language model (LLM) agents and human scientists.

These contributions demonstrate the power of knowledge graphs for autonomous scientific discovery. This thesis also argues that scientific discovery is better modelled as supervised learning, specifically active learning for AI scientists, than as reinforcement learning: mapping concepts from machine learning algorithms onto the domain of scientific discovery produces systems that align with established scientific values, leading to improved theories.
Alexander Gower | Chalmers