SMAD (Statistical Model-based Algorithm Design)

Tetsuo Shibuya
Human Genome Center, Institute of Medical Science, University of Tokyo

SMAD (Statistical Model-based Algorithm Design) is a strategy that we propose to use for designing faster algorithms against inputs that follow some statistical model.

For example, we designed very fast expected linear time searching algorithms [1][2] for the fundamental problem of searching over protein 3-D structure databases, considering that the 3-D structures in the databases follow the famous statistical molecular model called the `freely-jointed chain model'. The algorithm in [1] runs in O(Nm) time for arbirary inputs in the worst case (where N is the database size and m is the query size), but it runs in O(N) time in average against the databases that statistically follows the model. (According to the experiments on the PDB database, the algorithm *actually* runs in linear time and is practically much faster than previous algorithms.)

Reference

[1] Tetsuo Shibuya, Searching Protein 3-D Structures in Linear Time, Proc. 13th Annual International Conference on Research in Computational Molecular Biology (RECOMB 2009), 2009 LNBI 5541, pp 1-15. (Best Paper Award) (PDF file)
[2] Tetsuo Shibuya, Jesper Jansson and Kunihiko Sadakane, Linear-Time Protein 3-D Structure Searching with Insertions and Deletions, 9th Workshop on Algorithms in Bioinformatics (WABI 2009), LNCS, vol. 5724, pp. 310-320, 2009.
[3] Tetsuo Shibuya, Jesper Jansson, and Kunihiko Sadakane, Linear-Time Protein 3-D Structure Searching with Insertions and Deletions, BMC Algorithms for Molecular Biology, to appear.
[4] Tetsuo Shibuya, Searching Protein 3-D Structures in Linear Time, Journal of Computational Biology, to appear.
[5] Tetsuo Shibuya, Searching Protein 3-D Structures in Faster Than Linear Time, Journal of Computational Biology, to appear.