One particularly difficult aspect of developing MosAIc was creating an algorithm that could find similarities not only in color or style but also in meaning and theme, Hamilton said. The researchers examined a deep network's “activations,” or features, for each image in the open-access collections of both museums. They judged how similar two images were by the distance between their activations.
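As a rough illustration of judging similarity by activation distance, the sketch below returns the item whose feature vector lies closest to a query vector. It is not the researchers' code. The article does not say which distance they used, so Euclidean distance is an assumption here. The vectors, image names and the `most_similar` helper are hypothetical toy values standing in for activations read from a layer of a trained network.

```python
from __future__ import annotations

import math

# Hypothetical activation vectors. In practice each would be taken from an
# intermediate layer of a trained network; these toy values are made up.
activations = {
    "painting_a": [0.9, 0.1, 0.3, 0.0],
    "painting_b": [0.8, 0.2, 0.4, 0.1],
    "textile_c": [0.1, 0.9, 0.7, 0.5],
}


def most_similar(query: list[float], collection: dict[str, list[float]]) -> str:
    """Return the name of the item whose activations are closest (Euclidean) to `query`."""
    return min(collection, key=lambda name: math.dist(query, collection[name]))


print(most_similar([0.88, 0.12, 0.33, 0.02], activations))  # painting_a
```

This brute-force scan compares the query against every image, which is exactly the cost a search structure is meant to avoid on large collections.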
The researchers also used a new image-search data structure called a “KNN Tree,” which groups images into a tree-like structure. To find the closest match for an image, the algorithm starts at the tree's “trunk” and follows the most promising “branch” until it finds the closest image. The data structure improves the search by allowing the tree to be “pruned” based on the characteristics of the image.
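The trunk-to-branch search can be sketched with a k-d tree. This classic nearest-neighbor structure is used here only as a stand-in and is not the KNN Tree described by the researchers. The search follows the more promising branch first and skips the other branch whenever it cannot contain a closer point. To mimic pruning on image characteristics, the sketch assumes each image carries a hypothetical metadata tag (such as a collection name) and that each subtree records which tags it contains, so branches with no matching image are never visited. The names `build`, `nearest`, `tag` and the sample data are illustrative.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Optional, Sequence

Item = tuple[tuple[float, ...], str, str]  # (activation vector, image name, hypothetical tag)


@dataclass
class Node:
    point: tuple[float, ...]
    name: str
    tag: str                 # hypothetical metadata, e.g. which collection the image is from
    axis: int                # coordinate this node splits on
    left: Optional[Node] = None
    right: Optional[Node] = None
    tags: set[str] = field(default_factory=set)  # every tag present in this subtree


def build(items: Sequence[Item], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median of one coordinate per level."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    ordered = sorted(items, key=lambda item: item[0][axis])
    mid = len(ordered) // 2
    point, name, tag = ordered[mid]
    node = Node(point, name, tag, axis)
    node.left = build(ordered[:mid], depth + 1)
    node.right = build(ordered[mid + 1:], depth + 1)
    node.tags = {tag}
    for child in (node.left, node.right):
        if child is not None:
            node.tags |= child.tags
    return node


def nearest(root: Optional[Node], query: tuple[float, ...],
            tag: Optional[str] = None) -> tuple[float, Optional[str]]:
    """Return (distance, name) of the closest image, optionally only among images with `tag`."""
    best_dist, best_name = math.inf, None

    def visit(node: Optional[Node]) -> None:
        nonlocal best_dist, best_name
        if node is None:
            return
        # Metadata pruning: skip a subtree that holds no image with the requested tag.
        if tag is not None and tag not in node.tags:
            return
        if tag is None or node.tag == tag:
            d = math.dist(query, node.point)
            if d < best_dist:
                best_dist, best_name = d, node.name
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)  # follow the more promising branch first
        # Distance pruning: every point in `far` is at least |diff| away from the query.
        if abs(diff) < best_dist:
            visit(far)

    visit(root)
    return best_dist, best_name


images = [
    ((0.9, 0.1, 0.3), "portrait_a", "museum_1"),
    ((0.8, 0.2, 0.4), "portrait_b", "museum_2"),
    ((0.1, 0.9, 0.7), "landscape_a", "museum_1"),
    ((0.2, 0.8, 0.6), "landscape_b", "museum_2"),
]
tree = build(images)
query = (0.88, 0.12, 0.33)
print(nearest(tree, query)[1])                   # portrait_a
print(nearest(tree, query, tag="museum_2")[1])   # portrait_b
```

Both kinds of pruning return the same answer as a full scan; they only avoid visiting branches that cannot contain it. Plain k-d trees lose much of their advantage on very high-dimensional vectors, so this toy is meant to show the search pattern, not to suggest how a collection-scale index would be built.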
Hamilton said he hopes the work started with MosAIc can be expanded to other fields, such as the humanities, social sciences and medicine. “These fields are rich in information that has never been processed with these techniques and can be a source of great inspiration for both computer scientists and domain experts,” he said.