SAPIEN
SELF-AWARE PROCESSING INTERFACE & ENGINE

THE CHALLENGE

Enabling system designers to intelligently optimize their systems towards a specific goal.

PROJECT DETAILS

In a meta-compressor application that learns to dynamically choose the best compression algorithm and settings for each file type, SAPIEN yielded an average improvement of ~150% in compression ratio over a basket of standard compression programs, using only 15 lines of Java code.
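A meta-compressor of this kind can be sketched in a few lines. The sketch below is hypothetical and uses only the standard `java.util.zip.Deflater` settings as stand-in candidates; SAPIEN's actual API is not shown in the source, and a learned model would replace the exhaustive trial here.

```java
import java.util.zip.Deflater;

// Hypothetical meta-compressor sketch: try candidate Deflater levels and
// keep the one with the smallest output. In SAPIEN, a learned model would
// predict the best setting per file type instead of trying all of them.
public class MetaCompressorSketch {
    // Compress data at the given Deflater level and return the output size.
    static int compressedSize(byte[] data, int level) {
        Deflater d = new Deflater(level);
        d.setInput(data);
        d.finish();
        byte[] buf = new byte[data.length * 2 + 64];
        int total = 0;
        while (!d.finished()) {
            total += d.deflate(buf);
        }
        d.end();
        return total;
    }

    // Return the Deflater level (1..9) giving the smallest output.
    static int bestLevel(byte[] data) {
        int best = 1, bestSize = Integer.MAX_VALUE;
        for (int level = 1; level <= 9; level++) {
            int size = compressedSize(data, level);
            if (size < bestSize) {
                bestSize = size;
                best = level;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        byte[] sample = "abababababababababababab".repeat(40).getBytes();
        System.out.println("best level: " + bestLevel(sample));
    }
}
```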

Development of SAPIEN has been funded by DARPA through Phase I and Phase II SBIRs.

OUR APPROACH

SAPIEN provides the tools to add self-awareness for optimization to a system: the system monitors its own performance, SAPIEN builds models of that performance, suggests new configurations likely to improve it, and continually learns from the results.  SAPIEN combines fast, industrial-strength machine learning and optimization implementations.
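The monitor/model/suggest/learn cycle can be illustrated with a minimal sketch. The class and method names below are hypothetical, not SAPIEN's API, and a simple hill-climb on one integer knob stands in for the model-driven suggestion step.

```java
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Minimal sketch of a self-tuning loop: suggest a configuration, measure
// its performance, keep it if it improves on the best seen so far.
// Names are hypothetical; this is not SAPIEN's actual API.
public class SelfTuningLoop {
    static int tune(ToDoubleFunction<Integer> measure, int start, int steps, long seed) {
        Random rng = new Random(seed);
        int current = start;
        double best = measure.applyAsDouble(current);
        for (int i = 0; i < steps; i++) {
            int candidate = current + (rng.nextBoolean() ? 1 : -1); // suggest
            double observed = measure.applyAsDouble(candidate);     // monitor
            if (observed > best) {                                  // learn
                best = observed;
                current = candidate;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Toy performance surface peaking at knob = 7.
        ToDoubleFunction<Integer> perf = k -> -(k - 7) * (k - 7);
        System.out.println("tuned knob: " + tune(perf, 0, 200, 42L));
    }
}
```

Because the loop always retains the best-so-far configuration, it can be stopped at any point, which is the essence of the anytime behavior described below.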

SAPIEN’s model building component is a proprietary general-purpose decision tree implementation that combines the Hoeffding Trees (Domingos and Hulten) and Random Forests (Breiman) algorithms with Multi-Task Learning (Caruana) and Automatic Model Calibration, yielding a unique and powerful combination of speed, scalability, accuracy, and robustness.  SAPIEN’s optimization component is a proprietary model-driven implementation of Extremal Optimization (Boettcher) that supports arbitrary user-defined utility functions as well as hard and soft constraints.
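Extremal Optimization itself is simple to sketch: rather than scoring whole solutions, it repeatedly identifies the component with the worst local fitness and replaces only that component. The toy problem and all names below are illustrative; this is the textbook algorithm (Boettcher), not SAPIEN's proprietary model-driven variant.

```java
import java.util.Arrays;
import java.util.Random;

// Hedged sketch of Extremal Optimization: find the worst-fitness component
// of the current state and mutate it at random. Toy problem: match a target
// vector, where a component is "worst" if it mismatches the target.
public class ExtremalOptimizationSketch {
    static int[] optimize(int[] target, int alphabet, int iterations, long seed) {
        Random rng = new Random(seed);
        int n = target.length;
        int[] state = new int[n];
        for (int i = 0; i < n; i++) state[i] = rng.nextInt(alphabet);
        for (int iter = 0; iter < iterations; iter++) {
            // Locate a worst component (any mismatched position).
            int worst = -1;
            for (int i = 0; i < n; i++) {
                if (state[i] != target[i]) { worst = i; break; }
            }
            if (worst < 0) break;                 // all components optimal
            state[worst] = rng.nextInt(alphabet); // mutate only the worst
        }
        return state;
    }

    public static void main(String[] args) {
        int[] target = {1, 3, 0, 2, 2, 1};
        int[] result = optimize(target, 4, 10_000, 7L);
        System.out.println(Arrays.toString(result));
    }
}
```

In the model-driven setting described above, a learned model (rather than direct measurement) would estimate each component's fitness, and user-defined utility functions and constraints would shape which mutations are acceptable.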

The algorithms used provide a number of qualities desirable to system designers:

  • Speed: All SAPIEN algorithms run in linear time and are written in highly optimized Java code for maximum performance
  • Scalability: Data streams with millions of rows and tens of thousands of columns are handled with ease, without needing to fit in main memory
  • Multi-task Learning and Optimization: SAPIEN can model and optimize multiple aspects of system performance simultaneously, even when those aspects are non-linearly correlated
  • Anytime Learning and Optimization: The model building and optimization algorithms can be stopped at any time to produce best-so-far results
  • Heterogeneous data support: Floating point, integer, and string data can intermix freely, with no normalization or preprocessing required