Semantic Kernel: Fundamentals
What is Semantic Kernel?
Semantic Kernel (SK) is an open-source software development kit from Microsoft that lets developers combine conventional programming with AI. AI models provide the intelligence but, on their own, cannot do the heavy lifting that programming languages traditionally handle; traditional programming can do the work but lacks the reasoning. Combined, they achieve things that were not possible before. In this context, Semantic Kernel is simply the interface between the two, a kind of operating system for AI-powered applications.
In this story, our goal is to give the reader a fundamental understanding of how to create an AI application with Semantic Kernel. Taking the excellent article from Dev·edium as a reference and making it Pythonista-friendly, we will explore key concepts along the way, such as semantic/native functions and semantic memory (storage and recall).
The resulting application will be a retrieval-augmented one (RAG): it must generate an article for a given topic, using the contents of a URL provided by the user as reference material.
Since SK is very effective for building RAG applications, let's elaborate on this topic a bit further.
RAG
Retrieval-Augmented Generation (RAG) is a method used in AI applications that combines the benefits of pre-trained language models with the benefits of information retrieval systems.
In a typical language model, the model generates responses based on patterns it learned during training. However, it doesn’t have the ability to pull in specific pieces of information from a large corpus of documents. RAG addresses this limitation by first retrieving relevant documents or “knowledge snippets” from a large corpus based on the input query, and then feeding these documents into the language model to inform the generation process. This allows the model to generate responses that are not only fluent and coherent, but also grounded in specific pieces of information from the corpus.
This approach is particularly useful for steering the model in the direction the user prefers during generation.
To give an example, a traditional prompt may look like:
"Write an engaging and informative article about the impact of artificial intelligence on healthcare. The article should discuss the current applications of AI in healthcare, potential future developments, and the challenges and ethical considerations involved. Please ensure the article is written in a clear and accessible style, suitable for a general audience."
while the retrieval-augmented one can look like:
"Given the following examples:
- Example 1: 'Artificial intelligence is revolutionizing diagnostics with algorithms that can detect diseases such as cancer more accurately than human doctors.'
- Example 2: 'AI-powered predictive analytics can help healthcare providers anticipate patient needs and intervene earlier to prevent health crises.'
- Example 3: 'Ethical considerations in AI for healthcare include issues of privacy, bias in algorithms, and the need for human oversight.'
Write an engaging and informative article about the impact of artificial intelligence on healthcare. The article should discuss the current applications of AI in healthcare, potential future developments, and the challenges and ethical considerations involved. Please ensure the article is written in a clear and accessible style, suitable for a general audience."
where the examples are automatically retrieved based on a similarity metric before the prompt is sent to the model. This helps produce better results. Notice that the RAG method is all about prompting.
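To make the mechanics concrete, here is a minimal, framework-agnostic sketch of the retrieval step in Python. It is an illustration only: the word-overlap similarity is a toy stand-in for the embedding-based metrics used in practice, and none of the names below come from Semantic Kernel.
def similarity(query: str, snippet: str) -> float:
    # Toy stand-in for an embedding-based similarity: word overlap (Jaccard).
    q, s = set(query.lower().split()), set(snippet.lower().split())
    return len(q & s) / len(q | s) if q | s else 0.0

def build_rag_prompt(query: str, snippets: list[str], top_k: int = 3) -> str:
    # Retrieve the most similar snippets and prepend them to the task prompt.
    ranked = sorted(snippets, key=lambda s: similarity(query, s), reverse=True)
    examples = "\n".join(f"- Example: '{s}'" for s in ranked[:top_k])
    return f"Given the following examples:\n{examples}\n\n{query}"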
Fine-Tuning
On the other hand, we have fine-tuning, a rival technique to RAG. Fine-tuning is a process in machine learning where a pre-trained model (a model that has already been trained on a large dataset) is adapted to a new, similar task. Fine-tuning means continuing the training process with the new dataset, allowing the model to adjust its weights, in other words its learned features, to the new task.
The idea behind fine-tuning is that the model has already learned useful features from the original task that can be applied to the new task. By continuing the training process with the new data, these features can be adapted to make the model perform well on the new task.
It is important to note that fine-tuning is a very costly operation.
Check this article to get more insights on how a typical fine-tuning can be done.
RAG vs Fine-Tuning?
The RAG method is important to master because, over the course of 2023, more and more evidence suggested that RAG has no trouble surpassing fine-tuning in performance. Consider the comparison between a generalist GPT-4 and Med-PaLM 2, which is intensively trained on medical data: with the right prompting, GPT-4 achieves much better results in many respects than Google's specifically trained Med-PaLM 2.
Read the full article here or watch the video on the topic to explore further.
Now that we have a good understanding of RAG, and knowing that Semantic Kernel is very good at implementing such applications, let's get our hands on it.
Plugins And Functions
Plugins and Functions are two key abstractions in SK. A prompt is treated as a Function that can be invoked through Semantic Kernel, while a Plugin, sometimes referred to as a Skill, is a group of related Functions (prompts) that collectively accomplish a shared task.
Semantic Functions
Semantic Functions are invocable prompts. In Semantic Kernel powered applications, the desired functionality can be modeled as a combination of functions from one or more plugins. With the topic as input and the generated article as output, our desired functionality here is article generation.
Let's look inside the functions block, assuming that all of these functions are semantic ones (prompts) and that we execute them in a cascade: feeding the result of one function as input to the next.
Our semantic functions (prompts) are TopicType, Title, Subtitle, TableOfContents, Chapter and Calibrate. TopicType classifies the topic: a discussion, a claim or a fact, for example. This information is then used to create the title and subtitle. We then use all of this information to create the table of contents. Based on what we have obtained so far, the chapters are generated, and finally the result is calibrated to our needs.
The GenerateContent plugin/skill, which contains all of the abovementioned functions, looks like this:
Skills/
└── GenerateContent/
├── TopicType/
│ ├── config.json
│ └── skprompt.txt
├── Title/
│ ├── config.json
│ └── skprompt.txt
├── Subtitle/
│ ├── config.json
│ └── skprompt.txt
├── TableOfContents/
│ ├── config.json
│ └── skprompt.txt
├── Chapter/
│ ├── config.json
│ └── skprompt.txt
└── Calibrate/
├── config.json
└── skprompt.txt
A Semantic Function is composed of two files: config.json and skprompt.txt.
├── GenerateContent/
│ └── TopicType
│ ├── config.json
│ └── skprompt.txt
The first file, config.json, defines LLM-related settings, the Semantic Function's parameters, and descriptions:
{
"schema": 1,
"type": "completion",
"description": "Find the type of the sentence provided",
"completion": {
"max_tokens": 256,
"temperature": 0,
"top_p": 0,
"presence_penalty": 0,
"frequency_penalty": 0
},
"input": {
"parameters": [
{
"name": "input",
"description": "Input for this semantic function.",
"defaultValue": ""
}
]
},
"default_backends": []
}
In the second file, skprompt.txt, the function's capability is described in natural language. Like ordinary functions, Semantic Functions also have parameters, represented by {{$parameter_name}} placeholders.
TopicType function (skprompt.txt)
THE FOLLOWING SENTENCE IMPLIES A FACT:
+++++
Many employees demand to spend more of their working hours in home-office
+++++
THE FOLLOWING SENTENCE IMPLIES A CLAIM:
+++++
The number of people who work from home has increased dramatically in recent years and this will have terrible consequences for society.
+++++
WHAT DOES THEN THE FOLLOWING SENTENCE IMPLY:
+++++
{{$input}}
+++++
YOUR RESPONSE MUST BE:
- 1 OR 2 WORDS
- STRAIGHTFORWARD AND EASY TO UNDERSTAND
Let us assume the topic that we will feed the system with is:
“Many employees demand to spend more of their working hours in home-office. Discuss chances and risks with respect to the required IT-infrastructure.”
This is, at the same time, our input to the TopicType function. When this function is called, the output, namely the response of the LLM, is Discussion.
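As a side note, skills do not have to live on disk. Assuming a kernel that already has a completion service attached (we set one up in Preparing Services below), the same prompt can be registered inline; a sketch using the 0.x Python API, where topic_type_prompt is simply the skprompt.txt content held in a string and topic is the topic string above:
# Assumes `kernel` is configured and `topic_type_prompt` holds the prompt text.
topic_type_fn = kernel.create_semantic_function(
    prompt_template=topic_type_prompt,
    function_name="TopicType",
    skill_name="GenerateContent",
    description="Find the type of the sentence provided",
    max_tokens=256,
    temperature=0,
)
print(topic_type_fn(topic).result)  # -> Discussion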
Native functions
Working with prompt templates is great, but what if we need to manipulate data before or after feeding it to a semantic function, or retrieve/save information? Or we might want to perform math operations, at which even the most powerful LLMs are not good. This is where native functions come in. They are plain C# or Python functions decorated with @sk_function.
In our app, we can devise a native function that generates a random integer between bounds supplied by the user. The LLM can then be asked to embed that many references/citations within the article. Our native function would look like this:
import random

from semantic_kernel.orchestration.sk_context import SKContext
from semantic_kernel.skill_definition import sk_function


class Citations:
    @sk_function(
        description="Determine the number of citations",
        name="CitationsNumber",
        input_description="Bounds (min, max) for the number of citations",
    )
    def generate_random_integer(self, context: SKContext) -> str:
        # Draw a random integer between the user-provided bounds.
        return str(random.randint(int(context["min"]), int(context["max"])))
Native functions are very useful because they can be called before or after semantic functions, nested inside them, and they share the same context.
We will see them in action in the next story. To see some examples, check the official Microsoft documentation. Find another nice one, incorporating the Bing connector, here.
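Although a full demonstration is deferred to the next story, registering and invoking the skill is short enough to sketch here; a hedged example using the 0.x Python API and the kernel configured in the next section (the min/max values are arbitrary):
# Register the native skill with the kernel and pick out its function.
citations_skill = kernel.import_skill(Citations(), skill_name="Citations")
citations_number = citations_skill["CitationsNumber"]

# Feed the bounds through the context and invoke the function.
context = kernel.create_new_context()
context["min"] = "2"
context["max"] = "6"
print(citations_number(context=context).result)  # e.g. "4"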
Preparing Services
To speak to LLMs, we need to prepare our kernel and add AI services.
Semantic Kernel provides LLM connectors for Hugging Face, OpenAI and Azure OpenAI. In this story, we use Azure OpenAI.
import os

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import (
    AzureChatCompletion,
    AzureTextEmbedding,
)
We use AzureChatCompletion and AzureTextEmbedding to add AI and embedding services.
kernel = sk.Kernel()
azure_chat_service = AzureChatCompletion(
deployment_name="gpt-35-turbo-16k",
endpoint=os.getenv("OPENAI_ENDPOINT"),
api_key=os.getenv("OPENAI_API_KEY"),
)
azure_embedding_service = AzureTextEmbedding(
deployment_name="text-embedding-ada-002",
endpoint=os.getenv("OPENAI_ENDPOINT"),
api_key=os.getenv("OPENAI_API_KEY"),
)
kernel.add_chat_service("azure_chat_completion", azure_chat_service)
kernel.add_text_embedding_generation_service("ada", azure_embedding_service)
The AI service handles communication with the LLM, and the embedding service produces embeddings of texts so that vector search can be performed over them. Vector search capability is necessary in RAG applications because no LLM today has a context length into which we can fit thousands of pages.
Now it is time to load the kernel with our skill:
generateContent = kernel.import_semantic_skill_from_directory(
"ai/skills", "generateContent"
)
Cascade Calls
Let’s now see our kernel in action.
Our topic is:
# Main input
topic="Many employees demand to spend more of their working hours in home-office. Discuss chances and risks with respect to the required IT-infrastructure."
Referring to Figure 1, we will call the TopicType, Title, Subtitle and TableOfContents functions in order. First, extract the function so that we can call it. There are several ways of doing this:
TopicType = kernel.skills.data["generatecontent"]["topictype"]
TopicType = kernel.skills.get_function("generatecontent", "topictype")
TopicType = generateContent['TopicType']
Then we simply call it with our input:
topic_type = TopicType(topic)
The result of this call is Discussion, wrapped in a context object.
Notice that Semantic Kernel doesn't intrinsically offer error handling in case the response from the LLM is not healthy, so the developer must implement it. The most straightforward check would be something like:
import sys

if topic_type.error_occurred:
    sys.exit("Error occurred, stopping program.")
I will recommend a validation class in the next story to cover it.
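Until then, a small helper avoids repeating the check after every call; a minimal sketch (the helper name is mine, and last_error_description is assumed to be available on the returned context, as in SK 0.x Python):
def ensure_ok(result, step: str) -> str:
    # Abort with a readable message if the LLM call failed, otherwise return the text.
    if result.error_occurred:
        sys.exit(f"{step} failed: {result.last_error_description}")
    return result.result

topic_type_text = ensure_ok(topic_type, "TopicType")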
For now, we continue with our chain. Next up is title generation:
Title = generateContent['Title']
Title prompt looks like:
GENERATE AN ARTICLE TITLE BASED ON {{$topic_type}} BELOW
TITLE MUST BE:
- CATCHY AND INFORMATIVE
- OPTIMIZED FOR SEARCH ENGINES
- LENGTH IS 60-70 CHARACTERS
- STRAIGHTFORWARD AND EASY TO UNDERSTAND
GENERATE ESSAY TITLE BASED ON {{$topic_type}}:
+++++
{{$input}}
+++++
Unlike TopicType, the Title function expects two inputs: $topic_type and $input, the input being the topic itself. That means we cannot use the above pattern, TopicType(topic); we need a different way of injecting our inputs. This is where context variables come into play:
context_variables = sk.ContextVariables()
context_variables['topic_type'] = topic_type.result
context_variables['input'] = topic
title = Title(variables=context_variables)
The rendered title is: ‘“Balancing the Pros and Cons of Home-Office: Assessing IT Infrastructure for Remote Work”’
We could have injected a context object instead:
context = kernel.create_new_context()
context["topic_type"] = topic_type.result
context["input"] = topic
title = Title(context=context)
Bear in mind that injecting a context object is the better practice, as we will explore in the following stories. It is also worth mentioning that context_variables is simply equal to context.variables.
The list of accepted argument types can be found in the source code.
Then, to create the subtitle, we use the Subtitle prompt:
GENERATE EXACTLY ONE ARTICLE SUBTITLE FOR THE {{$topic_type}} BELOW:
TITLE MUST BE:
- CLEAR AND CONCISE
- INTERESTING AND ENGAGING TO THE READER
- INCLUDE KEYWORDS RELATED TO THE ARTICLE POST
- UNIQUE AND NOT GENERIC
GENERATE THE SUBTITLE FOR TITLE:
+++++
{{$title}}
+++++
context_variables['title'] = title.result
SubTitle = generateContent['Subtitle']
sub_title = SubTitle(variables=context_variables)
The rendered subtitle is: ‘“Unveiling the Secrets of Ancient Egyptian Hieroglyphics: Decoding the Language of the Pharaohs”’
Similarly, we obtain the table of contents
context_variables['sub_title'] = sub_title.result
TableOfContents = generateContent['TableOfContents']
tableOfContents = TableOfContents(variables=context_variables)
by using the TableOfContents prompt:
GENERATE A TABLE OF CONTENT FOR THE ARTICLE WITH TITLE AND SUBTITLE BELOW
THE TABLE OF CONTENT MUST BE:
- PROVIDE A CLEAR, CONCISE OVERVIEW OF THE BLOG'S CONTENT
- STRUCTURED IN A LOGICAL WAY THAT REFLECTS THE FLOW OF THE BLOG
- INCLUDE ALL MAIN SECTIONS AND SUBSECTIONS IN THE BLOG
- USE CLEAR, UNDERSTANDABLE LANGUAGE THAT GIVES THE READER AN IDEA OF WHAT EACH SECTION WILL
GENERATE THE TABLE OF CONTENT FOR BLOG WITH TITLE AND SUBTITLE:
+++++
TITLE:{{$input}}
SUBTITLE:{{$sub_title}}
+++++
PROVIDE YOUR ANSWER IN JSON FORMAT:
[
{
"chapter": "chapter name",
"topics": [
"Topic 1", "Topic 2", .... , "Topic n"
]
}
]
The returned tableOfContents variable contains JSON in the expected format:
[
{
"chapter": "Introduction",
"topics": [
"The rise of remote work",
"Importance of IT infrastructure for remote work",
"Purpose of the blog"
]
},
{
"chapter": "Pros of Home-Office",
"topics": [
"Flexibility and work-life balance",
"Increased productivity",
"Cost savings"
]
},....
.....
.....
Up to this point, execution takes 19 seconds.
Now we can iterate over it, extracting the chapter title and subtopics:
import json

rendered_article_list = [context_variables['title']]
table_of_contents_deserialized = json.loads(tableOfContents.result)
Chapter = generateContent["Chapter"]
for chapter in table_of_contents_deserialized:
    context_variables['chapter'] = chapter['chapter']
    context_variables["sub_topics"] = "\n".join(f"- {element}" for element in chapter["topics"])
    generated_chapter = Chapter(variables=context_variables)
    rendered_article_list.append(generated_chapter.result)
and use the Chapter prompt to create each chapter:
Chapter function
FOR THE ARTICLE BASED ON THE INFORMATION BELOW:
+++++
TOPIC:{{$topic}}
TITLE:{{$title}}
SUBTITLE:{{$sub_title}}
+++++
THE NARRATIVE CONTENT MUST
- PROVIDE ORIGINAL AND VALUABLE CONTENT
- BE CLEAR AND ENGAGING
- GIVE EXAMPLES IF NEEDED
- NOT CONTAIN TITLES, SUBTITLES, BULLET POINTS, NUMBERED LISTS
- CONTAIN ONLY PARAGRAPH OR PARAGRAPHS
WRITE THE CHAPTER AS A CONTINUOUS PIECE OF PROSE, WITHOUT USING ANY SUBHEADINGS, BULLET POINTS, OR NUMBERED LISTS
{{$chapter}}
USING SUBTOPICS:
{{$sub_topics}}
The rendered article is then a list of generated chapters.
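Stitching them into one document is then a simple join; a small sketch (the separator is a stylistic choice):
# Join the title and the generated chapters into a single article string.
rendered_article = "\n\n".join(rendered_article_list)
print(rendered_article[:500])  # preview the beginning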
The Calibrate function is purposefully omitted; it will be demonstrated in the next stories.
Find the code here.
Nested Calls
There is another way of calling semantic functions that avoids the cascade of preparations and calls: Semantic Kernel supports injecting another function into a prompt, not only variables. We can then alter our prompts like so:
TopicType:
THE FOLLOWING SENTENCE IMPLIES A FACT:
+++++
Many employees demand to spend more of their working hours in home-office
+++++
THE FOLLOWING SENTENCE IMPLIES A CLAIM:
+++++
The number of people who work from home has increased dramatically in recent years and this will have terrible consequences for society.
+++++
WHAT DOES THEN THE FOLLOWING SENTENCE IMPLY:
+++++
{{$input}}
+++++
YOUR RESPONSE MUST BE:
- 1 OR 2 WORDS
- STRAIGHTFORWARD AND EASY TO UNDERSTAND
Title:
GENERATE AN ESSAY TITLE BASED ON {{generateContent.TopicType $input}} BELOW
TITLE MUST BE:
- CATCHY AND INFORMATIVE
- OPTIMIZED FOR SEARCH ENGINES
- LENGTH IS 60-70 CHARACTERS
- STRAIGHTFORWARD AND EASY TO UNDERSTAND
GENERATE ESSAY TITLE BASED ON {{generateContent.TopicType $input}}:
+++++
{{$input}}
+++++
Subtitle:
GENERATE EXACTLY ONE ARTICLE SUBTITLE FOR THE {{generateContent.TopicType $input}} BELOW:
TITLE MUST BE:
- CLEAR AND CONCISE
- INTERESTING AND ENGAGING TO THE READER
- INCLUDE KEYWORDS RELATED TO THE ARTICLE POST
- UNIQUE AND NOT GENERIC
GENERATE THE SUBTITLE FOR TITLE:
+++++
{{generateContent.Title $input}}
+++++
TableOfContents:
GENERATE A TABLE OF CONTENT FOR THE ARTICLE WITH TITLE AND SUBTITLE BELOW
THE TABLE OF CONTENT MUST BE:
- PROVIDE A CLEAR, CONCISE OVERVIEW OF THE BLOG'S CONTENT
- STRUCTURED IN A LOGICAL WAY THAT REFLECTS THE FLOW OF THE BLOG
- INCLUDE ALL MAIN SECTIONS AND SUBSECTIONS IN THE BLOG
- USE CLEAR, UNDERSTANDABLE LANGUAGE THAT GIVES THE READER AN IDEA OF WHAT EACH SECTION WILL
GENERATE THE TABLE OF CONTENT FOR BLOG WITH TITLE AND SUBTITLE:
+++++
TITLE: {{generateContent.Title $input}}
SUBTITLE: {{generateContent.Subtitle $input}}
+++++
PROVIDE YOUR ANSWER IN JSON FORMAT:
[
{
"chapter": "chapter name",
"topics": [
"Topic 1", "Topic 2", .... , "Topic n"
]
}
]
and only call TableOfContents function:
TableOfContents = generateContent['TableOfContents']
tableOfContents = TableOfContents(topic)
It will do the job, calling all the necessary functions (Title, Subtitle, TopicType) behind the scenes. However, it is worth noting that in this case the calling order is very different from the one in Figure 2, and intermediary variables like title and subtitle are lost.
On the other hand, this takes 9 seconds to execute, about twice as fast as the cascade pattern. Choose the pattern that fits your use case. Find the code here.
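Timings like these naturally depend on the model and deployment; if you want to reproduce the comparison yourself, wrapping each pattern with a simple timer is enough, for example:
import time

start = time.perf_counter()
tableOfContents = TableOfContents(topic)  # nested pattern: inner prompts resolved by SK
print(f"Nested pattern took {time.perf_counter() - start:.1f} s")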
Retrieval Augmentation
I mentioned in the beginning of the story that this was going to be a retrieval augmented application. So let’s make it happen.
First, we need information to support our prompts with. We will use the article here. As the web scraper I will go for bs4, and as the text chunker I will go for LangChain's recursive text splitter:
import requests
from bs4 import BeautifulSoup
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Reference
url = "https://blog-idceurope.com/home-office-is-an-advantage-but-security-risks-remain/"
response = requests.get(url, timeout=2)
if response.status_code == 200 and 'text/html' in response.headers['Content-Type']:
    soup = BeautifulSoup(response.text, 'html.parser')
    text = soup.get_text()
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
    splitted_text = text_splitter.split_text(text)
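Before storing anything, it is worth a quick look at what the splitter produced; a small sanity check (the numbers depend on the page content):
# Inspect how many chunks we got and preview the first one.
print(f"{len(splitted_text)} chunks")
print(splitted_text[0][:200])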
Now that we have our split text, we want to store it in vector form, preferably in a vector database, and fetch parts of it when necessary.
The supported databases can be found here. For Milvus, for instance, loading the memory connector into our kernel looks like this:
from semantic_kernel.connectors.memory.milvus import MilvusMemoryStore
kernel.register_memory_store(memory_store=MilvusMemoryStore())
For the sake of simplicity, we use local memory for embedding storage:
kernel.register_memory_store(memory_store=sk.memory.VolatileMemoryStore())
kernel.import_skill(sk.core_skills.TextMemorySkill())
Now let’s store the embeddings (note that this is where the embedding service is used):
memory_collection_name = "resourceEssay"
for i, chunk in enumerate(splitted_text, start=1):
    await kernel.memory.save_information_async(
        collection=memory_collection_name,
        text=chunk,
        id=str(i),  # store the chunk under a string id
    )
Our kernel's memory object now holds the stored chunks; notice that their embeddings are there as well.
Now kernel object is almost ready for RAG applications. Let’s do a search:
ask = "How does remote working effect competitiveness?"
memories = await kernel.memory.search_async(memory_collection_name, ask, limit=5, min_relevance_score=0.50)
The most relevant chunk is:
‘But it requires the deployment of substantial security measures to limit risks in a digital environment that remains highly threatening.Home Office: A Key to Attracting ProfessionalsOur research reveals that companies are using hybrid and remote working models to strengthen their competitiveness. From the employer point of view, offering a home office opportunity can improve employee satisfaction. In many cases, it also boosts productivity, resulting in better products and services and greater’
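Each hit also carries its relevance score, so the retrieved chunks can be inspected directly; a small sketch, assuming the MemoryQueryResult attributes of the 0.x Python API:
# Print the relevance score alongside a preview of each retrieved chunk.
for memory in memories:
    print(f"{memory.relevance:.2f}  {memory.text[:80]}...")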
One last operation is necessary before we can use the memory for retrieval: stating the relevance score and the collection name to be used during prompt execution. The relevance score is the threshold above which results are taken into account.
To do that, similar to abovementioned patterns, both context_variables and context type can be used:
context_variables["relevance"] = 0.7
context_variables["collection"] = "resourceEssay"
# or
context[sk.core_skills.TextMemorySkill.COLLECTION_PARAM] = "resourceEssay"
context[sk.core_skills.TextMemorySkill.RELEVANCE_PARAM] = 0.7
To convert our traditional schema into a RAG-enabled one (this works the same for both the cascade and the nested pattern), the only change we need to introduce is the recall section:
CONTEXT:
{{Recall $chapter}}
New Chapter function:
FOR THE ARTICLE BASED ON THE INFORMATION BELOW:
+++++
TOPIC:{{$topic}}
TITLE:{{$title}}
SUBTITLE:{{$sub_title}}
CONTEXT:
{{Recall $chapter}}
+++++
THE NARRATIVE CONTENT MUST
- PROVIDE ORIGINAL AND VALUABLE CONTENT
- BE CLEAR AND ENGAGING
- GIVE EXAMPLES IF NEEDED
- NOT CONTAIN TITLES, SUBTITLES, BULLET POINTS, NUMBERED LISTS
- CONTAIN ONLY PARAGRAPH OR PARAGRAPHS
WRITE THE CHAPTER AS A CONTINUOUS PIECE OF PROSE, WITHOUT USING ANY SUBHEADINGS, BULLET POINTS, OR NUMBERED LISTS
{{$chapter}}
USING SUBTOPICS:
{{$sub_topics}}
Upon execution, the kernel handles the information retrieval based on the provided relevance score and embeds the fetched text in the prompt, in a way that is completely transparent to the developer. Find the code here.
One thing to notice is that for the Recall mechanism to work, you must invoke the function through the run_async method, as shown here (a sketch follows after the debug output below). If you run it as before, the recall mechanism will not work:
TOPIC:
TITLE: "The Pros and Cons of Home-Office: Assessing IT Infrastructure Risks and Opportunities"
SUBTITLE: "Unveiling the Secrets of Successful Content Marketing Strategies: Boost Your Online Presence and Drive Customer Engagement"
SUBTOPICS:
- Overview of the blog's content
- Importance of assessing IT infrastructure risks and opportunities in a home-office setup
CHAPTER: Introduction
CONTEXT: EMPTY
This can be confirmed by using this debug prompt (yes, SK currently doesn't provide an easy way to see the executed prompt).
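For reference, here is a hedged sketch of what routing the chapter generation through run_async could look like with the 0.x Python API, using the variable and function names from above:
# Running through the kernel (instead of calling Chapter(...) directly) lets the
# {{Recall ...}} template function reach the registered memory store.
for chapter in table_of_contents_deserialized:
    context_variables["chapter"] = chapter["chapter"]
    context_variables["sub_topics"] = "\n".join(f"- {t}" for t in chapter["topics"])
    generated_chapter = await kernel.run_async(Chapter, input_vars=context_variables)
    rendered_article_list.append(generated_chapter.result)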
Conclusion
Semantic Kernel is an organized way of storing and executing prompts, with many supported concepts and various possible design patterns. It can sit at the center of orchestration for any application that interacts with supported LLMs hosted on Hugging Face, OpenAI or Azure OpenAI.
Thanks to its connectors, SK is also very convenient for RAG applications, as it can intrinsically manage vector operations with minimal effort. It is open source and actively maintained by the community and Microsoft.
All in all, in the middle of the AI hype, Semantic Kernel is one of the skills you may want to add to your inventory in 2024.
In the next story, I will
- recommend a validation pattern for LLM interactions
- recommend a kernel configuration pattern in order to help equip kernel with various skills and connectors
- demonstrate native functions in action and how they can be taken advantage of in complex design patterns
- demonstrate the chaining pattern
- demonstrate orchestration via the function-chaining mechanism by adding an abstraction layer
Stay tuned!
References
Semantic Kernel: A Better Way to Build LLM-Powered Applications | by Dev·edium | Medium