
Expert opinion

Data Obfuscation Framework in Microsoft Fabric Lakehouse (Encryption & Masking)

Abdourahmane Bah | Keyrus NORAM

Handling sensitive data in your Lakehouse? Here’s how we dynamically secure PII before it reaches the Bronze layer, using encryption, partial masking, and full masking.

When ingesting raw data into Microsoft Fabric Lakehouse, it’s critical to protect sensitive information such as names, emails, and healthcare identifiers before it flows into downstream layers.

Implementing dynamic data obfuscation is crucial for protecting Personally Identifiable Information (PII) and for meeting regulatory requirements.

However, applying obfuscation within Fabric Lakehouse comes with challenges because of current platform limitations. At the time of writing, Fabric Lakehouse does not support native row-level or column-level security, so users can still read full datasets through Fabric notebooks or the Lakehouse browser.

In contrast, platforms such as Databricks offer built-in functions like is_member() for enforcing dynamic, user-based access controls. Fabric Lakehouse does not yet provide an equivalent, which makes fine-grained security controls more complex to implement.

Architecture Overview

To enhance the Medallion Architecture in Microsoft Fabric, we introduced a Landing Zone for secure obfuscation before data reaches the Bronze layer.

  1. Raw data is first ingested into the Landing Zone (restricted workspace).

  2. Obfuscation logic (encryption, partial masking, full masking, and other data-security actions) is applied using notebooks.

  3. Once secured, the data is exposed to the Bronze Lakehouses via shortcuts, so sensitive data never leaves the secured zone unprotected.

Each environment (DEV, QA, PROD) has its own Landing Zone and Bronze Lakehouse.

  • Raw data enters temporary schemas (e.g., temp) in the Landing Zone.

  • Data is retained in the temporary schemas only until obfuscation has been applied successfully. If the process fails, the temporary tables should be dropped so that no non-obfuscated data remains in the Landing Zone.

  • Obfuscated data is moved to clean schemas and then exposed to the Bronze layer through shortcuts.

  • Temporary tables are deleted post-obfuscation. This approach ensures consistent, secure data handling across all environments.

  • Access to the Landing Zone is restricted and should be granted only to users who are authorized to view unmasked data.
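The temporary-to-clean flow above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not the framework's actual code: the function name promote_obfuscated_table and the clean schema name are hypothetical, run_sql stands in for whatever executes a SQL string (spark.sql in a Fabric notebook), and creating the Bronze shortcut is left out because that step is configured in Fabric itself. The point it illustrates is the try/finally: the temporary table is dropped whether or not the obfuscated copy succeeds.

```python
from typing import Callable


def promote_obfuscated_table(
    run_sql: Callable[[str], object],
    table: str,
    select_list: str,
    temp_schema: str = "temp",
    clean_schema: str = "clean",  # hypothetical name for the "clean" schema
) -> None:
    """Copy the obfuscated projection of temp.<table> into clean.<table>,
    then drop temp.<table> whether the copy succeeded or failed.

    Schema, table, and column names are interpolated directly into SQL,
    so they must come from trusted pipeline configuration, not user input.
    """
    source = f"{temp_schema}.{table}"
    target = f"{clean_schema}.{table}"
    try:
        # select_list applies the obfuscation, e.g. "encrypt(name) AS name, email"
        run_sql(f"CREATE OR REPLACE TABLE {target} AS SELECT {select_list} FROM {source}")
    finally:
        # Runs on success and on failure, so unmasked rows never linger in the
        # Landing Zone; an exception raised by the CREATE still propagates.
        run_sql(f"DROP TABLE IF EXISTS {source}")
```

In a Fabric notebook this might be called as promote_obfuscated_table(spark.sql, "patients", "encrypt(name) AS name, email"), where "patients" is a placeholder table and the SELECT list uses the SQL functions registered in the code that follows.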

Data Obfuscation Methods:

PySpark Code Example

Data Encryption Functions

Install the ‘cryptography’ package in your Fabric workspace: https://cryptography.io/en/latest/

from cryptography.fernet import Fernet
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

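# Set up the encryption key (it should be stored securely)

# Replace 'mykey' with a real Fernet key (e.g., from a Key Vault or secure store)
encryption_key = 'mykey'

# Fail fast on the driver: Fernet() raises ValueError if the key is not
# 32 url-safe base64-encoded bytes, instead of failing later inside the UDFs
Fernet(encryption_key)

# Broadcast the key across Spark workers
broadcast_key = spark.sparkContext.broadcast(encryption_key)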

# Encryption function
def encrypt(data: str) -> str:
    """
    Encrypts a given string using the Fernet key.
    Returns the encrypted Base64-encoded token. Fernet tokens include a random IV
    and a timestamp, so encrypting the same value twice yields different tokens.
    """
    if data is None:
        return None
    data = str(data)
    cipher = Fernet(broadcast_key.value)
    return cipher.encrypt(data.encode()).decode()

# Decryption function
def decrypt(encrypted_data: str) -> str:
    """
    Decrypts a previously encrypted Base64-encoded token using the Fernet key.
    """
    if encrypted_data is None:
        return None
    cipher = Fernet(broadcast_key.value)
    return cipher.decrypt(encrypted_data.encode()).decode()

# Key generator (optional, one-time use)
def get_new_key() -> bytes:
    """
    Generates a new Fernet key (to be stored securely, e.g., in a Key Vault).
    """
    return Fernet.generate_key()


# Register UDFs for Spark DataFrame and SQL use
encrypt_udf = udf(encrypt, StringType())
decrypt_udf = udf(decrypt, StringType())

spark.udf.register("encrypt", encrypt, StringType())
spark.udf.register("decrypt", decrypt, StringType())
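Because Fernet tokens embed a random IV and a timestamp, the encrypt function above returns a different token every time it sees the same value. That hides which rows share a value, but it also means encrypted columns cannot serve as join or GROUP BY keys. Where a stable token is needed, one common alternative (a different technique, not part of the framework described here) is a keyed hash such as HMAC-SHA256: deterministic for a given key, but one-way, so it cannot be reversed the way decrypt reverses encrypt. A minimal sketch using only the Python standard library; pseudonymize is a hypothetical name:

```python
import hashlib
import hmac


def pseudonymize(value, key: bytes):
    """Return a deterministic, one-way token for a PII value (HMAC-SHA256).

    The same value and key always produce the same 64-character hex token,
    so the column stays usable for joins, but the original value cannot be
    recovered from the token.
    """
    if value is None:
        return None
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()
```

It could be registered like the other functions, e.g. spark.udf.register("pseudonymize", lambda v: pseudonymize(v, hmac_key), StringType()), where hmac_key is a hypothetical separate secret loaded from the same secure store as the Fernet key.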

Partial Masking Function

def partial_mask(column_value, mask):
    """
    Obfuscates a string by replacing the middle characters with a given mask.

    :param column_value: The original string.
    :param mask: The replacement characters (e.g., "*****").
    :return: The obfuscated string with only the first and last characters visible.
    """
    if not column_value:
        return column_value

    column_value = str(column_value)
    if len(column_value) <= 2:
        return mask

    return f"{column_value[0]}{mask}{column_value[-1]}"


# Register as Spark UDF
partial_mask_udf = udf(lambda col: partial_mask(col, "*****"), StringType())

# Optional: Register in Spark SQL if needed
spark.udf.register("partial_mask", lambda col: partial_mask(col, "*****"), StringType()) 
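Because partial_mask is plain Python, variants for specific PII types are easy to build and unit-test without a Spark session. As an illustration, a hypothetical mask_email helper could mask only the local part of an address and keep the domain, which often remains useful for analytics. The sketch repeats partial_mask so that it runs on its own:

```python
def partial_mask(column_value, mask="*****"):
    # Same logic as the partial_mask function above: keep first and last character.
    if not column_value:
        return column_value
    column_value = str(column_value)
    if len(column_value) <= 2:
        return mask
    return f"{column_value[0]}{mask}{column_value[-1]}"


def mask_email(value, mask="*****"):
    """Hypothetical helper: partially mask the local part of an email, keep the domain."""
    if not value:
        return value
    local, sep, domain = str(value).rpartition("@")
    if not sep:
        # No "@" present: fall back to masking the whole string.
        return partial_mask(value, mask)
    return f"{partial_mask(local, mask)}@{domain}"
```

Since mask has a default, the helper can be wrapped the same way as partial_mask, e.g. udf(mask_email, StringType()).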

Full Masking

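Replace the whole value with a constant for irreversible full masking. In Spark SQL, for example: CASE WHEN column_name IS NOT NULL THEN '******' END AS column_name (NULLs stay NULL; every other value becomes '******').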
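With three methods available, it can help to declare per column which one applies and generate the SELECT list from that declaration, so onboarding a new table only needs a rules entry. A minimal sketch, assuming the encrypt and partial_mask SQL functions registered above; build_select_list, SQL_TEMPLATES, and the rule names "encrypt", "partial", and "full" are hypothetical:

```python
from typing import Dict, List

# Hypothetical rule names mapped to Spark SQL expressions; encrypt and
# partial_mask are the functions registered with spark.udf.register.
SQL_TEMPLATES = {
    "encrypt": "encrypt({c}) AS {c}",
    "partial": "partial_mask({c}) AS {c}",
    "full": "CASE WHEN {c} IS NOT NULL THEN '******' END AS {c}",
}


def build_select_list(columns: List[str], rules: Dict[str, str]) -> str:
    """Build a Spark SQL SELECT list applying one obfuscation method per column.

    Columns without a rule pass through unchanged. Names are wrapped in
    backticks so columns with spaces or reserved words still parse.
    """
    # A typo in a rule key would otherwise leave a PII column unmasked silently.
    unknown = set(rules) - set(columns)
    if unknown:
        raise ValueError(f"Rules reference unknown columns: {sorted(unknown)}")
    parts = []
    for col in columns:
        quoted = f"`{col}`"
        method = rules.get(col)
        if method is None:
            parts.append(quoted)
        elif method in SQL_TEMPLATES:
            parts.append(SQL_TEMPLATES[method].format(c=quoted))
        else:
            raise ValueError(f"Unknown obfuscation method {method!r} for column {col!r}")
    return ", ".join(parts)
```

The result could then be passed to spark.sql, e.g. spark.sql(f"SELECT {build_select_list(df.columns, rules)} FROM temp.patients"), where df, rules, and the table name are placeholders.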

Example: Input vs. Output

Below is an example showing how obfuscation transforms real data.

Input Table:

Obfuscated Output Table:

Conclusion:

Dynamic data obfuscation is a critical step in securing sensitive information at the earliest stage of the data pipeline. By introducing a controlled obfuscation layer before exposing data to the Bronze Lakehouse in Microsoft Fabric, we not only reduce the risk of data leakage but also improve compliance with privacy regulations.

This framework is scalable, environment-agnostic, and ensures that only protected data flows downstream in your Lakehouse architecture.

Connect with Abdourahmane Bah on LinkedIn.

