Machine Learning Insights

A Machine Learning (ML) insight trains a custom model on your ground-truth mineral samples and applies it to satellite imagery, turning a small set of sampled points into a wall-to-wall map of predicted mineral concentration or anomaly class across your area of interest. Unlike spectral indices such as NDVI or NDWI, an ML insight learns the relationship between your specific mineral assays and the spectral response in your specific geological setting. It only works if you have a representative set of labeled samples.

When to Use an ML Insight

Use an ML insight when all of the following are true:

You have ground-truth samples with measured mineral concentrations (Au_PPB, Cu_PPM, Fe_PCT, etc.) tied to GPS coordinates.
The samples cover a representative range of values (not all zeros, not all the same).
The samples are distributed across the AOI (not clustered in one corner).
You want to extrapolate from those points to every pixel in the AOI.

If you have no ground-truth data, use a spectral index or PCA instead. ML cannot learn without labels.
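The checklist above can be approximated with a quick pre-flight check before you upload samples. The sketch below is illustrative only: the `Sample` type, the `readiness_report` function, and the 3×3 coverage grid are hypothetical and not part of the platform. The 50 non-zero threshold mirrors the figure given in the troubleshooting section.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    lon: float
    lat: float
    value: float  # measured concentration, e.g. Au_PPB


def readiness_report(samples, aoi_bounds, min_nonzero=50, grid=3):
    """Summarise whether a sample set looks usable for training.

    aoi_bounds is (min_lon, min_lat, max_lon, max_lat).
    min_nonzero and grid are illustrative assumptions, not platform settings.
    """
    min_lon, min_lat, max_lon, max_lat = aoi_bounds
    values = [s.value for s in samples]
    nonzero = sum(1 for v in values if v > 0)

    # Count how many cells of a grid x grid partition of the AOI hold a sample.
    occupied = set()
    for s in samples:
        col = int((s.lon - min_lon) / (max_lon - min_lon) * grid)
        row = int((s.lat - min_lat) / (max_lat - min_lat) * grid)
        # Clamp so points on or outside the boundary land in an edge cell.
        col = min(max(col, 0), grid - 1)
        row = min(max(row, 0), grid - 1)
        occupied.add((row, col))

    return {
        "enough_nonzero": nonzero >= min_nonzero,
        "has_value_range": len(set(values)) > 1,
        "cell_coverage": len(occupied) / (grid * grid),
    }
```

A low `cell_coverage` (for example, one occupied cell out of nine) is a hint that the samples are clustered in one part of the AOI.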

Regression vs Classification: Which Mode?

The platform offers two output modes. Pick by what kind of answer you want.

You want to know…	Pick
The predicted concentration at each pixel (e.g. “this pixel is ~120 ppb Au”)	Regression (recommended)
Whether a pixel is anomalous or not (e.g. “low / medium / high”)	Classification
An output that matches how a geologist thinks (a continuous gradient)	Regression
You explicitly want N discrete buckets for a prospectivity map	Classification

Default: Regression. For continuous mineral concentration targets (Au_PPB, Cu_PPM, etc.), regression measures variance explained directly and gives you predictions in real units (ppb, ppm). Classification has to bin the target into arbitrary buckets, which loses information when the underlying signal is smooth.

In internal smoke tests on a Spanish gold-prospect AOI, regression on Sentinel-2 reached R² = 0.645 (i.e. the model explains ~65% of the variance in Au_PPB); the classification path on the same data peaked around 41% accuracy with auto-picked bins.

Step-by-Step Guide

1. Open the Classifier Panel

In the workspace, open Insights → New Insight → Machine Learning.

2. Pick Your AOI, Mineral, and Date

AOI: any camp or drill-area you have set up.
Mineral: the column from your ground-truth samples to learn (Au_PPB, Cu_PPM, etc.). The samples must already be uploaded under that mineral name.
Date: the satellite imagery acquisition date. The platform composites a ±30 / 15-day window around this date, so pick a date close to when your samples were collected (or when the surface state was representative).

3. Choose the Instrument (Recommended Public Satellites)

Instrument	Resolution	Bands	Best for
Sentinel-2 (S2)	10 m	12	Most regression cases — best balance of resolution and spectral information. Recommended starting point.
EMIT	60 m	286 (hyperspectral)	Classification on heterogeneous mineralogy. Underperforms S2 on continuous regression because of resolution.
EnMAP	30 m	224 (hyperspectral)	Specialised mineral mapping when you need fine spectral discrimination AND can tolerate the coarser pixel size.

Rule of thumb: start with Sentinel-2 unless you have a specific reason to use a hyperspectral sensor. The 10 m pixel size matters more than the extra bands for most exploration targets.

4. Algorithm

Option	What it does
Auto (recommended)	Trains Random Forest and CART in parallel and shows both results on the run detail panel. The card with the higher validation score gets a gold border. Picking the winner is up to you.
Random Forest	Best single algorithm for regression in internal smoke tests.
CART	A single decision tree. Slightly better than RF on the classification path in some cases; tends to overfit on regression.

Cost of auto: ~2× wall time, since two algorithms are trained instead of one. Worth it for the comparison; the only downside is waiting a couple more minutes.

5. Imagery Masks

By default the platform enables two masks per instrument:

Vegetation
Clouds

You can deselect them, but on clear AOIs they are effectively a no-op, and on cloudy dates they can save the run from contaminating the training samples with clouds or canopy. Leave them on unless you have a specific reason.

6. Advanced Options (optional)

Maximum Number of Classes (classification only). Only visible when you pick Classification mode. The system auto-picks the actual number of bins from the data, capped at this value. Default 5 is fine.

Validation Strategy (spatial cross-validation). Hidden inside the Advanced section, default OFF. Splits training samples into spatial blocks for validation, instead of the default random 80/20 holdout.

Leave OFF unless you know your samples are uniformly distributed across the AOI. On AOIs where samples cluster around drill paths (most exploration sites), spatial CV makes every fold worse — in internal smoke tests, enabling it on a clustered AOI dropped validation R² from 0.645 to roughly 0.
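To make the difference between the two validation strategies concrete, here is a minimal sketch of a random 80/20 holdout next to a block-based spatial split. The function names and the `block_size` parameter are hypothetical; how the platform actually builds its blocks is not documented here.

```python
import random


def random_holdout(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and hold out a fraction for validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n_samples * test_fraction))
    return idx[n_test:], idx[:n_test]  # (train, validation)


def spatial_block_folds(points, block_size):
    """Group samples by the square block they fall in; each block is one fold.

    points is a list of (x, y) coordinates in any projected unit.
    """
    blocks = {}
    for i, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks.setdefault(key, []).append(i)

    folds = []
    for key, val_idx in sorted(blocks.items()):
        # Train on every sample that lies outside the held-out block.
        train_idx = [i for k, members in blocks.items() if k != key for i in members]
        folds.append((train_idx, val_idx))
    return folds
```

With clustered samples, a held-out block can contain most of one cluster, so the model is validated on ground it has barely seen. That is consistent with the score drop described above.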

7. Submit

Click Train. The insight appears immediately with asset_status: exporting and a progress ring. Training typically takes 2–4 minutes for a single-algorithm run, 4–6 minutes for auto.

You can navigate away. The progress ring keeps updating because the workspace polls for status, and you will receive a notification when training reaches done or failed.
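If you script against the platform rather than watching the progress ring, the same wait can be sketched as a polling loop. Everything here is an assumption apart from the status values `exporting`, `done`, and `failed`: `fetch_status` is a hypothetical callable standing in for whatever client call returns the insight's `asset_status`.

```python
import time


def wait_for_training(fetch_status, poll_seconds=5.0, timeout_seconds=900.0,
                      sleep=time.sleep):
    """Poll until the status is 'done' or 'failed', or give up after a timeout.

    fetch_status is a hypothetical zero-argument callable returning the
    current asset_status string.
    """
    waited = 0.0
    while True:
        status = fetch_status()
        if status in ("done", "failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"still '{status}' after {waited:.0f} s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The `sleep` parameter is injectable only so the loop can be exercised without real waiting.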

Reading the Results

Click on the trained insight to open the Run Detail panel.

For Regression Runs

You will see a metrics block with:

R² (coefficient of determination) is the fraction of variance explained on the validation set. 0 means the model is no better than always predicting the mean; 1 means a perfect fit. 0.5+ is operationally useful for exploration; 0.65+ is strong. A negative value means the model is worse than always predicting the mean; do not deploy it.
RMSE (Root Mean Squared Error) in the target’s units (e.g. ppb). The typical prediction error, weighted toward large errors.
MAE (Mean Absolute Error) in the target’s units. The typical prediction error, treating all errors equally. Usually a more honest “average error” number than RMSE.
95% PI half-width (Prediction Interval, conformal) in the target’s units. Read this as: “in 95% of cases, the true value falls within ±this distance of the predicted value, assuming the pixel is similar to the training data.” This is your honest uncertainty number.

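Example for an Au_PPB regression: R² = 0.645, RMSE = 453 ppb, MAE = 76 ppb, PI ±174 ppb. The typical absolute error is 76 ppb, and about 95% of true values are expected to fall within ±174 ppb of the prediction. RMSE is far larger than MAE here, which points to a few very large errors: squaring weights them heavily.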
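For readers who want to see how these four numbers relate, the sketch below computes them from validation targets and predictions. It is not the platform's implementation: the function name is hypothetical, and the interval uses a simple split-conformal rule (a high quantile of absolute validation residuals), which is an assumption about what "conformal" means here.

```python
import math


def regression_metrics(y_true, y_pred, coverage=0.95):
    """Return R², RMSE, MAE and a split-conformal PI half-width.

    Assumes y_true is not constant (otherwise R² is undefined).
    """
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    abs_res = sorted(abs(r) for r in residuals)

    # Split-conformal rank: the ceil((n + 1) * coverage)-th smallest absolute
    # residual. Capping at n is a simplification for small validation sets.
    rank = min(n, math.ceil((n + 1) * coverage))

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs_res) / n,
        "pi_half_width": abs_res[rank - 1],
    }


# One large miss out of four predictions: RMSE (5.0) is double MAE (2.5).
metrics = regression_metrics([0, 10, 20, 30], [0, 10, 20, 40])
```

The toy data shows the same pattern as the example above on a small scale: a single large error pulls RMSE well above MAE.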

If you ran auto mode, you will see two cards side by side (Random Forest + CART). The winner (higher validation R²) gets a gold border. If R² values are close, prefer the model with the tighter PI — it is more certain about each pixel.
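The tie-break described above can be written down as a small rule. This is a sketch of one possible reading, not platform behavior: the `pick_model` function and the `r2_tolerance` that defines "close" are assumptions, and the CART numbers in the usage lines are made up for illustration.

```python
def pick_model(cards, r2_tolerance=0.02):
    """Pick the best R², unless a rival within r2_tolerance has a tighter PI.

    cards is a list of dicts with 'name', 'r2' and 'pi_half_width' keys.
    r2_tolerance is an illustrative threshold, not a platform setting.
    """
    best = max(cards, key=lambda c: c["r2"])
    close = [c for c in cards if best["r2"] - c["r2"] <= r2_tolerance]
    return min(close, key=lambda c: c["pi_half_width"])


cards = [
    {"name": "Random Forest", "r2": 0.645, "pi_half_width": 174.0},
    {"name": "CART", "r2": 0.640, "pi_half_width": 150.0},  # hypothetical values
]
choice = pick_model(cards)["name"]
```

With these inputs the two R² values differ by 0.005, so the rule falls through to the tighter interval.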

For Classification Runs

You will see:

Validation accuracy: fraction of validation samples whose predicted class matches the true class. For 5 balanced classes, random guessing would score 20%; aim for 50%+.
Confusion matrix: rows are true classes, columns are predicted classes. Diagonal cells are correct predictions; off-diagonal cells show which classes the model confuses.
Per-class precision / recall / F1: useful for spotting that the model is good on the bulk class but failing on the rare anomaly class.
Class distribution: how many samples fell into each auto-picked bin.
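The classification metrics listed above all derive from the confusion matrix. The following sketch shows the arithmetic with the same orientation (rows are true classes, columns are predicted classes); the function names are illustrative and the toy labels are invented.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples per (true class, predicted class) pair."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[k][k] for k in range(len(matrix))) / total


def per_class_scores(matrix):
    """Precision, recall and F1 for each class (0.0 when undefined)."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append({"precision": precision, "recall": recall, "f1": f1})
    return scores


# Toy labels: class 0 is the bulk class, class 2 is a rare anomaly class.
cm = confusion_matrix([0, 0, 0, 0, 1, 1, 2], [0, 0, 0, 1, 1, 0, 0], 3)
```

In this toy case overall accuracy is 4/7, yet recall for class 2 is zero, which is the "good on the bulk class, failing on the rare class" pattern that per-class scores expose.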

The auto-pick selects the smallest number of classes (between 2 and your max) that still explains the data’s variance.
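The exact criterion behind the auto-pick is not documented here. One plausible reading, sketched below purely as an assumption, is to sort the target values into equal-count bins and choose the smallest bin count whose between-bin variance reaches a chosen share of the total variance. The function names, the equal-count binning, and the 0.9 threshold are all hypothetical.

```python
def variance_explained(values, k):
    """Fraction of total variance captured by k equal-count bins."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    total = sum((v - mean) ** 2 for v in ordered)
    if total == 0:
        return 1.0  # constant target: any binning explains everything
    between = 0.0
    for b in range(k):
        chunk = ordered[b * n // k:(b + 1) * n // k]
        if chunk:
            chunk_mean = sum(chunk) / len(chunk)
            between += len(chunk) * (chunk_mean - mean) ** 2
    return between / total


def auto_pick_classes(values, max_classes=5, threshold=0.9):
    """Smallest k in [2, max_classes] reaching the threshold, else max_classes."""
    for k in range(2, max_classes + 1):
        if variance_explained(values, k) >= threshold:
            return k
    return max_classes
```

Under this reading, a clearly two-level target needs only 2 classes, while a smoothly varying target needs more bins before the bins capture most of the variance.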

Viewing the Layer on the Map

Once training completes and the layer is exported, the insight becomes visible as a map layer in your workspace (a few minutes after the training metrics appear).

Regression: a smooth gradient over the operational range. The gradient is clipped to the 95th percentile of non-zero training values (e.g. 0–200 ppb for an AOI whose extreme outlier was 21800 ppb). This is intentional, so anomalies saturate at the top color instead of crushing the rest of the map into a single shade.
Classification: discrete colored zones, one color per class.
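The clipping of the regression gradient can be illustrated with a short sketch. It assumes a nearest-rank 95th percentile over the non-zero training values, which matches the description above but may differ in detail from what the platform computes; the function names and sample values are hypothetical.

```python
import math


def display_range(training_values, percentile=95.0):
    """Return (low, high) for the color stretch from non-zero training values."""
    nonzero = sorted(v for v in training_values if v > 0)
    if not nonzero:
        return (0.0, 0.0)
    # Nearest-rank percentile: the smallest value with at least
    # `percentile` percent of the data at or below it.
    rank = max(1, math.ceil(percentile / 100.0 * len(nonzero)))
    return (0.0, float(nonzero[rank - 1]))


def to_color_fraction(value, value_range):
    """Map a predicted value to a 0..1 position on the color ramp, clipped."""
    low, high = value_range
    if high <= low:
        return 0.0
    return min(1.0, max(0.0, (value - low) / (high - low)))


# Twenty ordinary samples plus one extreme outlier, as in the example above.
values = [0, 0] + list(range(1, 21)) + [21800]
stretch = display_range(values)
```

Here the outlier does not set the top of the ramp: it maps to the top color along with every other value above the clipped maximum, and the remaining values keep a usable gradient.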

Below the legend you will see:

The numeric range (e.g. “0 — 201 ppb”) read from the layer’s stretch.
The prediction interval if regression (e.g. “±174 ppb (95% PI)”).

The pixel inspector (click any pixel) shows the predicted value in real units.

Tips and Troubleshooting

My validation score is very low (R² ~ 0 or accuracy ~ random).

Most common causes:

Not enough mineral samples (< 50 non-zero values).
All samples in the same corner of the AOI. Check the validation-strategy section above — leave spatial CV off.
Wrong instrument (try Sentinel-2 if you started with EMIT).
The mineral signal is genuinely not detectable from satellite at this AOI/date. Try a different date or accept that this target does not work.

Training takes too long.

Default budget is 20 hyperparameter trials. Auto mode runs two algorithms, so ~40 trials total. If the AOI has thousands of samples and 50 spectral bands, you may see 5+ minutes — that is normal. The progress ring updates as phases complete.

The map is too saturated or too flat.

The gradient is auto-clipped to the operational range, so extreme anomalies look the same (top color). This is the correct behavior for anomaly detection. To see absolute concentration, use the pixel inspector.

EMIT gives a worse regression score than Sentinel-2.

Counter-intuitive but typical: EMIT has far more spectral bands than Sentinel-2 (286 versus 12) but only 60 m resolution. For continuous targets, the resolution loss dominates the spectral gain. Stick with S2 for regression on most AOIs.

Auto mode says RF wins but CART has a tighter PI.

Pick by what matters to you. Higher R² = better average fit. Tighter PI = more certain about each individual prediction. For drill-target prioritisation, the tighter PI usually wins.
