This page displays the full internal data-ingestion-policy.md for transparency. The same rules apply network-wide.
# Mulembe Nation – Data Ingestion Policy

**Version:** 1.0 | **Effective:** 2026-05-13 | **Owner:** Kevin Usagi Ememwa

This document governs what data may be ingested into the Mulembe Nation context graph (wp_mn_* tables). All ingestion scripts and editorial workflows must comply.

## 1. Legitimate Sources (ingest without additional consent)

- Public records: gazette notices, court filings, parliamentary hansard, audit reports, budget documents.
- Published journalism: articles, fact-checks, documentaries (with attribution).
- Academic papers and openly licensed datasets: Wikidata, Afrobarometer, V-Dem, KNBS, World Bank.
- Public promotional materials: press releases, official social media of public figures, event flyers shared in public spaces.
- Community content that is explicitly **public**: public Facebook/Instagram pages, public YouTube videos, open-access community newsletters.

## 2. Requires-Consent Sources

- Any content from **private groups** (WhatsApp groups requiring invitation, Telegram private channels, private Facebook groups).
- Personal communications (direct messages, emails, SMS).
- Unreleased community data not yet made public.
- Material where the original poster has a reasonable expectation of privacy.

For these sources, explicit, documented consent from the original contributor is required before any part of the content can be ingested into the main graph.

Exception: the wp_mn_observation_log may store raw content for internal editorial review without consent, but promotion to the graph requires consent.

## 3. Special Category Data (Kenya DPA Section 44)

The following require explicit consent or a journalism/public interest exemption with documented editorial justification:

- Religious beliefs or affiliation (except where publicly self-identified as part of an official role).
- Health data (including medical conditions).
- Ethnic or tribal origins (except as publicly self-identified).
- Political opinions (except as expressed in public forums or official candidate profiles).

Processing these categories for the graph must be logged in wp_mn_observation_log with consent_status and, if using journalism exemption, a note in editorial_notes.

## 4. Anonymisation & Aggregation

- Before promoting a community observation to the graph, remove any personally identifiable information (PII) unless the person is a public figure acting in a public capacity.
- Aggregate location data to at least county level unless the source is a public record.
- For sensitive events, delay promotion for 30 days unless the event is already widely reported.

## 5. Unsolicited Information (WhatsApp, social media, etc.)

- **Public groups/channels:** treat content as public material. Still, avoid storing private individuals' names unless they are public figures or the information is already published elsewhere.
- **Private groups:** do not ingest raw content directly. Instead, use the wp_mn_observation_log to store anonymised summaries or thematic tags. Promotion to the graph requires explicit consent from the original author or clear editorial justification under the journalism exemption.
- **Right to be forgotten:** Any individual may request removal of their personal data from the graph (excluding public records). A simple email to privacy@mulembenation.co.ke initiates the process. We will respond within 30 days.

## 6. Retention Limits

- Observations in wp_mn_observation_log must be either promoted to the graph or discarded within **90 days**.
- Deleted observations are permanently removed after 180 days.

## 7. Compliance & Audit

- All ingestion scripts write to wp_mn_verification_queue first – never directly to main tables.
- The data_source column on each entity/relationship/metric must contain a verifiable citation.
- Quarterly audits will be run to ensure no private sources have leaked.

*This policy is version-controlled in the mulembe-core Git repository. Changes require editorial approval and a version bump.*
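
For script authors, the consent rules in sections 1 to 3 and the 90-day limit in section 6 can be sketched as a small promotion gate. This is an illustrative sketch only, not the code in mulembe-core: the source-type keys, the `Observation` fields other than `editorial_notes`, and the function names are all hypothetical. It takes the strict reading of section 2 (consent is always required for requires-consent sources) and does not model the public self-identification exceptions, anonymisation, or the 30-day delay for sensitive events.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class SourceTier(Enum):
    LEGITIMATE = "legitimate"              # section 1
    REQUIRES_CONSENT = "requires_consent"  # section 2


# Hypothetical labels: the policy names the categories, not these keys.
SOURCE_TIERS = {
    "public_record": SourceTier.LEGITIMATE,
    "published_journalism": SourceTier.LEGITIMATE,
    "open_dataset": SourceTier.LEGITIMATE,
    "public_promotional": SourceTier.LEGITIMATE,
    "public_community": SourceTier.LEGITIMATE,
    "private_group": SourceTier.REQUIRES_CONSENT,
    "personal_communication": SourceTier.REQUIRES_CONSENT,
    "unreleased_community_data": SourceTier.REQUIRES_CONSENT,
    "expectation_of_privacy": SourceTier.REQUIRES_CONSENT,
}

RETENTION_DAYS = 90  # section 6: promote or discard within 90 days


@dataclass
class Observation:
    source_type: str
    logged_on: date
    has_documented_consent: bool = False
    special_category: bool = False  # section 3 data (religion, health, ...)
    editorial_notes: str = ""       # journalism-exemption justification, if any


def may_promote(obs: Observation) -> bool:
    """Return True if the observation may be promoted to the graph."""
    # Assumption: unknown source types fall into the stricter tier.
    tier = SOURCE_TIERS.get(obs.source_type, SourceTier.REQUIRES_CONSENT)
    if tier is SourceTier.REQUIRES_CONSENT and not obs.has_documented_consent:
        return False
    if obs.special_category:
        has_exemption = bool(obs.editorial_notes.strip())
        return obs.has_documented_consent or has_exemption
    return True


def discard_deadline(obs: Observation) -> date:
    """Last day on which the observation may remain undecided in the log."""
    return obs.logged_on + timedelta(days=RETENTION_DAYS)


def is_overdue(obs: Observation, today: date) -> bool:
    return today > discard_deadline(obs)


if __name__ == "__main__":
    obs = Observation("private_group", date(2026, 5, 13))
    print(may_promote(obs))       # no documented consent
    print(discard_deadline(obs))  # 90 days after logging
```

A gate like this would run before anything leaves wp_mn_observation_log. Defaulting unrecognised source types to the requires-consent tier is a design choice in this sketch, made so that a mislabelled source fails closed rather than open.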