
A leading NSF-funded marine science center needed to connect coarse-grained ocean biogeochemical models to fine-grained microbial metabolic models, a prediction problem with no established machine learning solution. Available data was fragmented across six incompatible sources spanning 40 years of oceanographic time series, field metabolomics, omics data, and laboratory experiments.
- Developed custom data integration pipelines unifying heterogeneous oceanographic sources, and engineered depth-binned, cast-aggregated feature vectors suitable for ML training
- Proposed a model architecture for multi-output regression from dissolved organic carbon (DOC) composition to individual metabolite concentrations
- Proposed a model architecture for cross-modal prediction linking phytoplankton community composition to bacterial transporter expression profiles
- Delivered a fully reproducible data pipeline and ML architecture roadmap, directly informing the client's NSF presentation and next data collection campaign
- Identified binding data constraints that shaped prioritization of subsequent modeling phases
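
To make the "depth-binned, cast-aggregated" feature engineering above concrete, here is a minimal sketch of one way it could look. It is illustrative only and is not the pipeline delivered to the client: the bin edges, the field names (`cast_id`, `depth_m`, `doc_umol_kg`), and the choice of the mean as the aggregate are all assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical depth bin edges in metres; the actual edges are not stated in the case study.
DEPTH_BIN_EDGES = [0, 50, 200, 1000]


def depth_bin(depth_m, edges=DEPTH_BIN_EDGES):
    """Return the index of the half-open bin [edges[i], edges[i+1]) holding depth_m, or None."""
    for i in range(len(edges) - 1):
        if edges[i] <= depth_m < edges[i + 1]:
            return i
    return None


def cast_feature_vectors(samples, variable, edges=DEPTH_BIN_EDGES):
    """Aggregate bottle samples into one fixed-length vector per cast.

    Each sample is a dict with 'cast_id', 'depth_m' and the measured variable.
    Each vector holds the mean value per depth bin; empty bins stay None so
    that missing data remains visible to downstream modeling.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s in samples:
        b = depth_bin(s["depth_m"], edges)
        if b is not None and s.get(variable) is not None:
            grouped[s["cast_id"]][b].append(s[variable])
    n_bins = len(edges) - 1
    return {
        cast: [mean(bins[b]) if b in bins else None for b in range(n_bins)]
        for cast, bins in grouped.items()
    }


# Made-up example values, for illustration only.
samples = [
    {"cast_id": "A", "depth_m": 5, "doc_umol_kg": 70.0},
    {"cast_id": "A", "depth_m": 25, "doc_umol_kg": 66.0},
    {"cast_id": "A", "depth_m": 500, "doc_umol_kg": 45.0},
    {"cast_id": "B", "depth_m": 100, "doc_umol_kg": 55.0},
]
vectors = cast_feature_vectors(samples, "doc_umol_kg")
```

Every cast ends up with a vector of the same length regardless of how many depths were sampled, which is what makes casts from different sources comparable as ML training rows. Leaving empty bins as `None` rather than imputing them keeps gaps in coverage explicit.
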
By resolving fragmented legacy data and architecting two novel ML prediction pipelines, ISC gave the client a credible technical foundation to secure continued NSF funding and focus their next field campaign where it matters most.