Posts

Databricks Q&A

On May 5th, I had the pleasure of presenting “Data Cleansing using Databricks” (https://www.meetup.com/ohio-north-database-training/events/314363881/). During the meeting, many good questions were raised. Listed below are the answers to these questions:

What are jobs in Databricks?
Jobs are workloads that can be scheduled, managed, and automated without manual intervention. Workloads can be notebooks, SQL queries, or pipelines running on a cluster.

How do jobs compare to other automation tools?

What is a “Delta Live Table”?
Delta Live Tables (DLT) is a Databricks feature that makes it much easier to build and run data pipelines, either batch or streaming. Simply write the transformations in SQL or Python, and DLT takes care of setting up the infrastructure, tracking dependencies, handling errors, and enforcing data quality rules through its built-in expectations. ...
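In Python, DLT attaches expectations to a table with decorators such as `@dlt.expect`, `@dlt.expect_or_drop`, and `@dlt.expect_or_fail`, which respectively keep invalid records (while recording the violation), drop them, or fail the update. The plain-Python sketch below only emulates those three behaviors on a list of dicts so the semantics can be seen outside Databricks; it is not the DLT API, and the `apply_expectation` helper, the `ExpectationFailed` exception, the mode names, and the sample rows are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


class ExpectationFailed(Exception):
    """Hypothetical error raised when a row violates an expectation in 'fail' mode."""


def apply_expectation(
    rows: List[Row],
    name: str,
    predicate: Callable[[Row], bool],
    on_violation: str = "warn",  # hypothetical modes mirroring expect / expect_or_drop / expect_or_fail
) -> Tuple[List[Row], int]:
    """Return (rows_to_keep, violation_count) under the chosen violation policy."""
    if on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown mode: {on_violation}")
    kept: List[Row] = []
    violations = 0
    for row in rows:
        if predicate(row):
            kept.append(row)
            continue
        violations += 1
        if on_violation == "fail":
            # Like expect_or_fail: stop on the first invalid record.
            raise ExpectationFailed(f"expectation '{name}' violated by {row}")
        if on_violation == "warn":
            # Like expect: keep the invalid record but count the violation.
            kept.append(row)
        # "drop": like expect_or_drop, the invalid record is not kept.
    return kept, violations


def valid_id(row: Row) -> bool:
    """Rule: every record must have a non-null id."""
    return row["id"] is not None


rows = [{"id": 1, "email": "a@x.com"}, {"id": None, "email": "b@x.com"}]
kept, violations = apply_expectation(rows, "valid_id", valid_id, "warn")
print(len(kept), violations)  # 2 1
kept, violations = apply_expectation(rows, "valid_id", valid_id, "drop")
print(len(kept), violations)  # 1 1
```

In an actual DLT pipeline the same rule would be written as a SQL boolean constraint on the table definition, for example `@dlt.expect_or_drop("valid_id", "id IS NOT NULL")`, and DLT reports the violation counts in its pipeline metrics.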

6 Ways to Prevent Sensitive Data Leaks in AI/ML Applications

1. Utilize Dynamic Data Masking
This is a built-in feature in SQL Server 2016 and later. Learn more here.

2. Use Proper Prompt Engineering
Provide detailed instructions to keep the LLM on track and make sure it adheres to them. Include specifics such as the length of the response and what to include and exclude in the response.

3. Utilize Content Safety
This is a feature available in Microsoft Foundry for many models.

4. Use Identity Features in Azure OpenAI
Azure OpenAI can prevent data leaks by replacing insecure static keys with dynamic, role-based authentication. In addition, by leveraging Microsoft Entra ID and Managed Identities, organizations can enforce strict "zero-trust" access controls that ensure only authorized users or applications can interact with sensitive AI resources. Learn more here.

5. Replace sensitive data columns with foreign key ...
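To make item 1 concrete: in SQL Server a mask is attached to a column with `ALTER TABLE ... ALTER COLUMN ... ADD MASKED WITH (FUNCTION = '...')`, and users without the UNMASK permission then see masked values in query results. The Python sketch below builds that statement and emulates what the built-in `email()` and `partial()` masking functions return, so it is easy to see what a downstream AI/ML consumer would receive. The emulation is an approximation for illustration only (the database engine does the real masking), and the table and column names are hypothetical.

```python
def masked_column_ddl(table: str, column: str, mask_function: str) -> str:
    """Build the T-SQL statement that attaches a dynamic data mask to a column."""
    return (
        f"ALTER TABLE {table} ALTER COLUMN {column} "
        f"ADD MASKED WITH (FUNCTION = '{mask_function}');"
    )


def email_mask(value: str) -> str:
    """Emulate email(): expose the first character, then a fixed aXXX@XXXX.com pattern."""
    return value[:1] + "XXX@XXXX.com"


def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Emulate partial(prefix, padding, suffix): expose both ends, pad the middle."""
    if len(value) <= prefix + suffix:
        # Conservative simplification (not the engine's exact rule): hide short values entirely.
        return padding
    tail = value[len(value) - suffix:] if suffix else ""
    return value[:prefix] + padding + tail


# Hypothetical table/column names.
print(masked_column_ddl("dbo.Customers", "Email", "email()"))
# ALTER TABLE dbo.Customers ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()');
print(email_mask("jane.doe@contoso.com"))              # jXXX@XXXX.com
print(partial_mask("555-123-4567", 0, "XXX-XXX-", 4))  # XXX-XXX-4567
```

If an LLM application queries the database through a login that lacks UNMASK, only values like these reach the prompt, which limits what the model can leak.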

Databricks Q&A

On May 5th, I had the pleasure of presenting “Data Cleansing using Databricks” (https://www.meetup.com/ohio-north-database-training/events/314363881/). During the meeting, many good questions were raised. Listed below are the answers to these questions:

What are jobs in Databricks?
Jobs are workloads that can be scheduled, managed, and automated without manual intervention. Workloads can be notebooks, SQL queries, or pipelines running on a cluster.

How do jobs compare to other automation tools?

Feature    Jobs                          Workflows                          Delta Live Tables
Purpose    Automate & schedule tasks     Orchestrate multi-step pipelines   Declarative ETL pipelines
Best for   ETL, ML training, batch jobs  Complex DAGs                       Data quality + streaming/batch ETL
Com...
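To connect the comparison above to the Jobs API: a job is defined by a JSON payload listing its tasks, and a multi-step workflow (DAG) is expressed by having tasks depend on one another. The sketch below builds such a payload for the Databricks Jobs API 2.1 `jobs/create` endpoint using only the standard library; the field names are taken from that API, while the helper functions, job name, notebook paths, cluster ID, and schedule are hypothetical placeholders. It only builds and prints the JSON and does not contact a workspace.

```python
import json
from typing import Dict, List, Optional


def notebook_task(task_key: str, notebook_path: str, cluster_id: str,
                  depends_on: Optional[List[str]] = None) -> Dict:
    """Build one task entry; depends_on is what turns a task list into a DAG."""
    task = {
        "task_key": task_key,
        "notebook_task": {"notebook_path": notebook_path},
        "existing_cluster_id": cluster_id,
    }
    if depends_on:
        task["depends_on"] = [{"task_key": key} for key in depends_on]
    return task


def job_payload(name: str, tasks: List[Dict], cron: str, timezone: str = "UTC") -> Dict:
    """Build a jobs/create request body with a Quartz cron schedule."""
    return {
        "name": name,
        "tasks": tasks,
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
            "pause_status": "UNPAUSED",
        },
        "max_concurrent_runs": 1,
    }


CLUSTER_ID = "0000-000000-hypothetical"  # hypothetical placeholder
payload = job_payload(
    name="nightly-data-cleansing",  # hypothetical job name
    tasks=[
        notebook_task("ingest", "/Workspace/etl/ingest", CLUSTER_ID),
        notebook_task("cleanse", "/Workspace/etl/cleanse", CLUSTER_ID, depends_on=["ingest"]),
    ],
    cron="0 0 6 * * ?",  # Quartz syntax: every day at 06:00
)
print(json.dumps(payload, indent=2))
```

A single-task payload corresponds to the "Jobs" column; adding `depends_on` edges between several tasks gives the multi-step orchestration described under "Workflows".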

5 Advantages of Granite 4.1 LLMs

Granite 4.1 is IBM’s new family of dense, decoder-only LLMs (3B, 8B, and 30B parameters) trained on ~15 trillion tokens with a five-phase pre-training pipeline, followed by supervised fine-tuning (SFT) on 4.1M curated examples. The family of models is released under Apache 2.0.

Granite 4.1 models consistently match or outperform larger competitors, with lower hardware requirements:
- The 30B model outperforms Google’s Gemma-4-31B-it.
- The 8B model beats Gemma-4-26B-A4B-it.
- The dense architecture ensures predictable latency and stable token usage.

Enterprise-Grade Predictable Inference
Granite 4.1 is designed for real-world business workloads where speed, cost, and determinism matter.
- Strong instruction-following and tool-calling without long chains of thought.
- Dense models avoid the variability of MoE (Mixture-of-Experts) routing.
- FP8 quantization options reduce memory footprint while preserving accuracy.

High-...
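The FP8 point can be checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count × bytes per parameter, so moving from a 16-bit format (BF16/FP16, 2 bytes) to FP8 (1 byte) halves it. The sketch below applies that formula to the nominal 3B, 8B, and 30B sizes. It counts weights only (no KV cache, activations, or runtime overhead), and the nominal sizes are rounded, so real deployments need more memory than these figures.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


for label, params in [("3B", 3e9), ("8B", 8e9), ("30B", 30e9)]:
    bf16 = weight_memory_gb(params, 16)
    fp8 = weight_memory_gb(params, 8)
    print(f"{label}: ~{bf16:.0f} GB in BF16 vs ~{fp8:.0f} GB in FP8")
# 3B: ~6 GB in BF16 vs ~3 GB in FP8
# 8B: ~16 GB in BF16 vs ~8 GB in FP8
# 30B: ~60 GB in BF16 vs ~30 GB in FP8
```

For example, by this estimate the 30B model's weights drop from about 60 GB to about 30 GB in FP8.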

AIProjectClient vs. Azure OpenAI Client

Overview
AIProjectClient is the unified Azure AI Foundry project SDK. It covers full Azure AI Foundry project management, plus agents, datasets, indexes, evaluations, and OpenAI client generation.
The Azure OpenAI client is for direct model inference (chat, embeddings, images) using OpenAI-compatible endpoints.

Capabilities
AIProjectClient provides control in the following areas:
1. Full-Lifecycle Agents: Create, update, delete, and run Azure AI Agents. The Azure OpenAI client cannot do this.
2. Datasets & File Management: Upload documents, create datasets, and use them in agents or evaluations.
3. Search Indexes: Create and manage RAG indexes inside your project.
4. Evaluations: Rules, taxonomies, evaluators, and insights for model quality.
5. Unified OpenAI Client: AIProjectClient can generate a fully configured OpenAI client (`get_openai_client()`), so there is no need to manage separate credentials. ...
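The difference in setup can be sketched in Python. Both hypothetical factory functions below authenticate with Microsoft Entra ID through `DefaultAzureCredential` rather than API keys. The first goes through the project with `AIProjectClient` and `get_openai_client()`; the second builds a standalone `AzureOpenAI` client from the `openai` package for direct inference. This assumes the `azure-ai-projects` (2.x), `azure-identity`, and `openai` packages are installed; the imports sit inside the functions so the module loads even where those packages are absent, and nothing here is called until you supply real endpoints.

```python
def make_project_openai_client(project_endpoint: str):
    """Route through the Foundry project: one credential, project-configured OpenAI client."""
    from azure.ai.projects import AIProjectClient
    from azure.identity import DefaultAzureCredential

    project = AIProjectClient(endpoint=project_endpoint, credential=DefaultAzureCredential())
    # The project client also exposes agents, datasets, indexes, and evaluations.
    return project, project.get_openai_client()


def make_direct_azure_openai_client(azure_endpoint: str, api_version: str):
    """Call a model deployment directly, without the project management features."""
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider
    from openai import AzureOpenAI

    # Entra ID token provider instead of a static API key.
    token_provider = get_bearer_token_provider(
        DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
    )
    return AzureOpenAI(
        azure_endpoint=azure_endpoint,
        azure_ad_token_provider=token_provider,
        api_version=api_version,
    )
```

A project endpoint typically looks like `https://<resource>.services.ai.azure.com/api/projects/<project>`; treat that shape and the API version you pass as values to confirm for your own resource.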

MS Foundry Developer Migration Checklist

Microsoft Foundry recently implemented a variety of changes across several areas. To help navigate these changes, a Developer Migration Checklist is provided below.

1. Migrate to the Unified SDK 2.0
[ ] Replace all uses of the old azure-ai-agents package with azure-ai-projects 2.0.
[ ] Update code to use AIProjectClient for model inference, agents, evaluations, memory, and tracing.
[ ] Remove legacy preview flags and update any custom tool or MCP integrations.
[ ] Validate that all tool schemas and agent configurations work under the new client.

2. Move Agents to the Foundry Agent Service (GA)
[ ] Migrate existing agent deployments to the new Agent Service runtime.
[ ] Update agent code to use the OpenAI Responses-compatible interface.
[ ] Reconfigure private networking, Entra RBAC, and tracing endpoints.
[ ] Test agent behavior in the updated Agent Playground and tracing...
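For the first checklist item, one quick way to find code that still depends on the old package is to scan the source for imports of `azure.ai.agents`. The sketch below uses Python's standard `ast` module to report those imports with their line numbers; the function names are hypothetical, and it only finds static import statements (not dynamic `importlib` usage or entries in requirements files).

```python
import ast
from pathlib import Path
from typing import List, Tuple

LEGACY_MODULE = "azure.ai.agents"


def _is_legacy(module_name: str) -> bool:
    """True for azure.ai.agents itself or any of its submodules."""
    return module_name == LEGACY_MODULE or module_name.startswith(LEGACY_MODULE + ".")


def find_legacy_imports(source: str) -> List[Tuple[int, str]]:
    """Return sorted (line_number, module) pairs for imports of the legacy package."""
    hits: List[Tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            hits.extend((node.lineno, a.name) for a in node.names if _is_legacy(a.name))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if _is_legacy(node.module):
                hits.append((node.lineno, node.module))
            elif node.module == "azure.ai" and any(a.name == "agents" for a in node.names):
                # Covers the "from azure.ai import agents" form.
                hits.append((node.lineno, LEGACY_MODULE))
    return sorted(hits)


def scan_project(root: str) -> None:
    """Print every legacy import found in .py files under root."""
    for path in Path(root).rglob("*.py"):
        for lineno, module in find_legacy_imports(path.read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: imports {module}")
```

Running `scan_project(".")` from the repository root gives a to-do list for the "Replace all uses" item; each hit is a candidate for moving to `AIProjectClient`.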

What is Query Boosting, Weighting, and Thresholding?

Query Boosting means increasing the importance of certain terms or fields in a search query so they influence the ranking more strongly. Not all parts of a query are equally important. For example:
- In a product search, matching the title might matter more than matching the description.
- In a document search, matching a keyword might matter more than matching the body text.

For example, if you search for:
title:"machine learning"^3 description:"machine learning"
the "^3" multiplies the score contribution of the title match by 3, while the description match keeps the default boost of 1, so title matches weigh much more heavily in the ranking.

Weighting is the general idea of assigning different levels of importance to features, fields, or signals during ranking or scoring. Boosting is a type of weighting, but weighting can apply to:
- Query terms
- Document fields
- Machine-learning features
- User behavior signals (clicks, recency, popul...
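The boosting idea above can be sketched as a weighted sum: each field that matches the query adds its boost to the document's score, and a minimum score serves as a simple threshold. This is a deliberately simplified stand-in for a real engine (Lucene-style engines multiply a relevance score such as BM25 by the boost instead of counting a match as 1), and reading "thresholding" as a minimum-score cutoff is an assumed, common interpretation. The documents, field names, and `min_score` parameter are hypothetical.

```python
from typing import Dict, List, Tuple

Doc = Dict[str, str]


def boosted_score(doc: Doc, phrase: str, boosts: Dict[str, float]) -> float:
    """Sum the boost of every field whose text contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return sum(boost for field, boost in boosts.items() if phrase in doc.get(field, "").lower())


def search(docs: List[Doc], phrase: str, boosts: Dict[str, float],
           min_score: float = 0.0) -> List[Tuple[float, str]]:
    """Rank documents by boosted score, dropping any below min_score (the threshold)."""
    scored = [(boosted_score(d, phrase, boosts), d["id"]) for d in docs]
    kept = [s for s in scored if s[0] >= min_score]
    return sorted(kept, key=lambda s: s[0], reverse=True)


docs = [
    {"id": "A", "title": "Intro to Machine Learning", "description": "A beginner course"},
    {"id": "B", "title": "Data Engineering 101", "description": "Covers machine learning pipelines"},
    {"id": "C", "title": "Machine learning at scale", "description": "Machine learning in production"},
    {"id": "D", "title": "SQL basics", "description": "Queries and joins"},
]
# Mirrors title:"machine learning"^3 description:"machine learning"
boosts = {"title": 3.0, "description": 1.0}
print(search(docs, "machine learning", boosts, min_score=1.0))
# [(4.0, 'C'), (3.0, 'A'), (1.0, 'B')]
```

Document A outranks B only because of the title boost, and D is cut by the threshold; raising `min_score` to 3.0 would also drop B.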