Training data provenance
Query use case
Do we trust the providers (origins) of all training data used by an AI system, checked against a whitelist of provider emails?
Schemas used
Pseudo code
FUNCTION ai_system_providers_trusted_with_whitelist(AI_System_ID, Whitelist_Emails)
    // Step 1: Retrieve provider UUIDs associated with the AI system
    SET Provider_UUIDs = get list of providers contributing data to AI_System_ID
    // Step 2: Retrieve provider email addresses
    SET Provider_Emails = map provider UUIDs to their identity email addresses
    // Step 3: Check if all provider emails are in the whitelist
    IF Provider_Emails is a subset of Whitelist_Emails THEN
        RETURN True
    ELSE
        RETURN False
    END IF
END FUNCTION
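The pseudocode above maps fairly directly onto a set-subset check. The following is a minimal Python sketch, not the actual query implementation. It assumes two hypothetical in-memory lookups, `providers_by_system` and `email_by_provider`, standing in for whatever store holds the provenance records; neither name comes from the original query.

```python
from typing import Dict, Iterable, Set


def ai_system_providers_trusted_with_whitelist(
    ai_system_id: str,
    whitelist_emails: Iterable[str],
    providers_by_system: Dict[str, Set[str]],  # hypothetical: AI system ID -> provider UUIDs
    email_by_provider: Dict[str, str],         # hypothetical: provider UUID -> identity email
) -> bool:
    # Step 1: retrieve provider UUIDs associated with the AI system.
    provider_uuids = providers_by_system.get(ai_system_id, set())

    # Step 2: map provider UUIDs to their identity email addresses.
    # Assumption: a provider without a known email is treated as untrusted.
    provider_emails = set()
    for provider_uuid in provider_uuids:
        email = email_by_provider.get(provider_uuid)
        if email is None:
            return False
        provider_emails.add(email)

    # Step 3: every provider email must appear in the whitelist.
    return provider_emails <= set(whitelist_emails)
```

Note that in this sketch an AI system with no recorded providers passes the check, because the empty set is a subset of any whitelist. Whether that is the desired outcome is a policy decision the pseudocode leaves open.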
Explanation
- Find relevant data sources:
  - Retrieve the configuration verification credential (ConfigVcId) for the AI system.
  - Extract the weights verification credential (WeightsVcId) used in training.
  - Ensure that the WeightsVcId is classified as "Weights".
  - Trace back to the training system that produced these weights.
  - Identify the datapack used in the training process.
- Extract the list of Data Verification Credentials (DataVcIds) used in training from the datapack.
- Determine the providers who contributed this data:
  - For each DataVcId, check its attestations and extract provider UUIDs where the attestation type is "provided".
- Map provider UUIDs to their email identities.
- Check if all provider emails exist in the whitelist and return True only if every provider is trusted.
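The traversal described above can be sketched in Python as follows. This is an illustrative model only: the `Store`, `Credential`, and `Attestation` classes, their field names, and the two functions are assumptions made for the example and are not taken from the schemas behind the real query. As a simplification, the datapack is represented directly as the list of DataVcIds keyed by training system.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class Attestation:
    attestation_type: str  # e.g. "provided"
    provider_uuid: str


@dataclass
class Credential:
    classification: str = ""   # e.g. "Weights"
    weights_vc_id: str = ""    # set on a configuration credential
    produced_by: str = ""      # training system that produced the weights
    attestations: List[Attestation] = field(default_factory=list)


@dataclass
class Store:
    config_vc_by_system: Dict[str, str]                # AI system ID -> ConfigVcId
    credentials: Dict[str, Credential]                 # credential ID -> credential
    datapack_by_training_system: Dict[str, List[str]]  # training system -> DataVcIds
    email_by_provider: Dict[str, str]                  # provider UUID -> email


def trace_provider_emails(store: Store, ai_system_id: str) -> Set[str]:
    # Find the configuration credential, then the weights credential it references.
    config_vc_id = store.config_vc_by_system[ai_system_id]
    weights_vc_id = store.credentials[config_vc_id].weights_vc_id
    weights_vc = store.credentials[weights_vc_id]
    if weights_vc.classification != "Weights":
        raise ValueError(f"{weights_vc_id} is not classified as 'Weights'")

    # Trace back to the training system and the DataVcIds in its datapack.
    data_vc_ids = store.datapack_by_training_system[weights_vc.produced_by]

    # Collect provider UUIDs from attestations of type "provided".
    provider_uuids = {
        attestation.provider_uuid
        for data_vc_id in data_vc_ids
        for attestation in store.credentials[data_vc_id].attestations
        if attestation.attestation_type == "provided"
    }

    # Map provider UUIDs to emails; a missing mapping raises KeyError in this sketch.
    return {store.email_by_provider[uuid] for uuid in provider_uuids}


def all_providers_trusted(store: Store, ai_system_id: str, whitelist: Iterable[str]) -> bool:
    # True only if every provider email appears in the whitelist.
    return trace_provider_emails(store, ai_system_id) <= set(whitelist)
```

Failing loudly on a missing credential or email mapping is a design choice for the sketch; a production query might instead return False or report which link in the provenance chain is missing.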
Query
ai_system_providers_trusted_with_whitelist(AiSystemId, Whitelist)