My 2025 AI Journey : Article #4 : Vector Database Overview


Vector Database Overview

SNOW! I love snow. I lived in Kansas City for part of my youth, so snow is part of my happy memories. I also like to tell everyone that I had my first ski lesson in the Swiss Alps (Davos, Switzerland). Anyhow, I said all that to say that I had planned to write this yesterday, but the snow was on the way and my focus was elsewhere. Unfortunately, in my area we only got a wintry mix, and even that is already gone. I’m hoping we will still get a little snow before the winter is over.

While looking for the Vector Database article I had previously read about creating a Chatbot, so I could learn all about it, I found what seemed like tons of AI articles. I have to admit that it was overwhelming and exciting at the same time. I also feel behind in my AI knowledge, but I have to start somewhere.

In my Wednesday article, I started to write about Vector Databases and said I would go into more detail today. Just to reiterate, this is my understanding of a Vector Database right now…

  1. Take some unstructured data. 
  2. Use a Transformer Model to convert it into a vector embedding, which is a numerical representation of that data (a vector).
  3. Add it to the vector database.
  4. Then you can do a ‘similarity search’ on it.
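Here is a minimal sketch of those four steps in Python, as I understand them right now. To keep it runnable without downloading anything, the `embed` function is a toy stand-in that just counts words; it is my own assumption and not a real Transformer Model. The `vector_store` list and the `search` helper are also hypothetical names that stand in for an actual vector database.

```python
import math
from collections import Counter

def embed(text, vocabulary):
    """Toy stand-in for a transformer model: count each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Step 1: take some unstructured data (plain sentences here).
documents = [
    "snow is falling in kansas city",
    "a chatbot answers questions about databases",
    "skiing in the swiss alps on fresh snow",
]

# The toy embedding needs a fixed word list so every vector has the same length.
vocabulary = sorted({word for doc in documents for word in doc.split()})

# Steps 2 and 3: embed each document and add it to the (in-memory) store.
vector_store = [(doc, embed(doc, vocabulary)) for doc in documents]

def search(query, top_k=1):
    """Step 4: similarity search. Return the top_k documents closest to the query."""
    query_vector = embed(query, vocabulary)
    scored = [(cosine_similarity(query_vector, vec), doc) for doc, vec in vector_store]
    scored.sort(reverse=True)
    return [doc for _, doc in scored[:top_k]]

print(search("chatbot databases"))
```

A real setup would swap `embed` for a trained model and `vector_store` for a vector database with an index, but the flow of data stays the same.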

I created this “AI Overview” (with a focus on Chatbot Steps) diagram and will go into more detail on the Vector Database part…

Traditional databases store structured data in rows and columns. Structured data is data organized in a predefined format. Thinking in terms of an Excel spreadsheet, your username can be stored in the first cell (row 1, column 1) in a fixed-length character (char) format; your password can be stored in the second cell (row 1, column 2) in a variable-length character (varchar) format, and so on.
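To make the rows-and-columns idea concrete, here is a small sketch using SQLite, which ships with Python. The `users` table, its column sizes, and the sample values are my own illustrative assumptions. Note that SQLite accepts `CHAR` and `VARCHAR` as type names but treats both as text and does not enforce the length.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A predefined format: every row has a username and a password column.
conn.execute("CREATE TABLE users (username CHAR(20), password VARCHAR(64))")

# Row 1: column 1 holds the username, column 2 holds the password.
# (Illustration only; a real system would store a password hash, not the password.)
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "not-a-real-password"))

# Structured data can be looked up by an exact match on a column.
row = conn.execute(
    "SELECT username, password FROM users WHERE username = ?", ("alice",)
).fetchone()
print(row)

conn.close()
```

The lookup here is an exact match on a known column, which is the main contrast with the similarity search a vector database performs.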

A vector database, on the other hand, takes a massive dataset of unstructured data, information that doesn’t have a set format or structure, and allows you to store, index, and search it. This is done by leveraging embeddings from machine learning models. 

  1. Unstructured data can be text, documents, audio files, video files, images, and emails. Each item can be taken as a whole.
  2. With machine learning models, such as Image Transformer, Audio Transformer, and NLP Text Transformer, an embedding (also known as a vector) is created. Below are a few transformers that can be used:
    1. HuggingFace – sentence transformers
    2. Img2Vec – image transformers
    3. Facebook Kats – time series data transformer
  3. A vector (or embedding) is a mathematical entity with both magnitude and direction. In terms of data, it is an ordered list of numbers that represents the original information. This vector is then stored in the Vector Database. Below are a few Vector Databases that I have heard about (in no particular order):
    1. Milvus
    2. Pinecone
    3. Qdrant 
    4. MongoDB Atlas
  4. Now a Similarity Search can be performed, which means finding the stored vectors that are closest to a query vector.
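Since a vector has both magnitude and direction, "closest" depends on how you measure it. Cosine similarity and Euclidean distance are two measures that vector databases commonly offer. The sketch below uses three made-up vectors (`a`, `b`, and `c` are my own example values, not real embeddings) to show that the two measures can disagree.

```python
import math

def cosine_similarity(a, b):
    """Compare direction only: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # same direction as a, twice the magnitude
c = [3.0, 2.0, 1.0]  # similar magnitude to a, different direction

sim_ab = cosine_similarity(a, b)
sim_ac = cosine_similarity(a, c)
dist_ab = euclidean_distance(a, b)
dist_ac = euclidean_distance(a, c)

print(f"cosine:    a~b = {sim_ab:.3f}, a~c = {sim_ac:.3f}")
print(f"euclidean: a~b = {dist_ab:.3f}, a~c = {dist_ac:.3f}")
```

By cosine similarity, `b` is the best match for `a` because it points in the same direction. By Euclidean distance, `c` is nearer because `b` is twice as long. Which measure is the right one depends on the model that produced the embeddings.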

This is a big-picture view of Vector databases. I have downloaded a few Vector Databases and will download any open-source models/transformers. Also, it looks like SQL can be used with Vector Databases, but I’m not sure about that. As I continue on my journey I will be documenting what I learn.

They say that “the best way to learn is to teach,” so these articles are to help me learn, and I hope others can get something from them as well.

_______________________________________________________
All articles posted Wednesdays & Saturdays by 8PM
