Building on some earlier articles, let's look at how we can use embeddings not just for text but for images. In fact, let's take it a stage further and embrace a multi-modal model / tokenizer called "openai/clip-vit-base-patch32" (from the guys at OpenAI and available for download on Hugging Face) for the embeddings.
These steps are required for the Python-specific code.
```shell
mkdir embedding
cd embedding/
```
Setting up a virtual environment.
```shell
python3 -m venv .
source bin/activate
```
Installing the necessary packages.
```shell
pip install transformers torch pillow numpy fastapi uvicorn
```
These steps are required for the JavaScript-specific code.
```shell
mkdir api
cd api/
```
Installing the necessary packages.
```shell
npm install axios commander compromise cookie-parser cors dotenv express \
  express-es6-template-engine he joi moment multer nodemon pg pg-hstore \
  pgvector sequelize sequelize-auto sequelize-pagination uuid prettier
```
Let's create a throw-away Python script that shows us how to use the model / tokenizer to generate embeddings from an image.
test-image-embedding.py
```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

# Load the CLIP processor (image preprocessing + text tokenizer) and the model.
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")

image_path = "blonde-woman.png"
image = Image.open(image_path)

# Resize and normalise the image into the pixel tensor the model expects.
inputs = processor(images=image, return_tensors="pt")

# Inference only, so skip gradient tracking.
with torch.no_grad():
    image_features = model.get_image_features(**inputs)

# image_features has shape (1, 512): one 512-dimensional embedding per image.
image_features_np = image_features.numpy()
print(image_features_np[0])
```
Let's run the script:
```shell
python test-image-embedding.py
```

The output below is truncated; the full array holds 512 values.

```text
[-5.40132821e-02 -1.93054989e-01 -3.95455211e-03 1.91171229e-01
 -3.75162125e-01 9.22233611e-03 -6.77868351e-03 -1.44027546e-01
 ...
 -2.67660290e-01 -3.00241798e-01 9.22407210e-02 4.66299921e-01
 3.70588720e-01]
```
Here, we're printing the image's embedding: an array of 512 floating-point values. Looking good.
Okay. Let's create an embedding API using FastAPI so that we can call this Python code from our Express RESTful API.
```python
import base64
from io import BytesIO

import torch
from fastapi import FastAPI, HTTPException
from PIL import Image
from pydantic import BaseModel
from transformers import AutoProcessor, AutoModelForZeroShotImageClassification

app = FastAPI()

# Load the model once at start-up rather than on every request.
processor = AutoProcessor.from_pretrained("openai/clip-vit-base-patch32")
model = AutoModelForZeroShotImageClassification.from_pretrained("openai/clip-vit-base-patch32")


class SentenceRequest(BaseModel):
    # Either plain text or a base64-encoded image.
    sentence: str


# A plain `def` (not `async def`) so FastAPI runs the blocking model call
# in its thread pool instead of stalling the event loop.
@app.post("/api/generate-embedding")
def generate_embedding(request: SentenceRequest):
    """
    Generates a CLIP embedding for the given input.

    Args:
        request: A SentenceRequest whose `sentence` is either text or a
            base64-encoded image.

    Returns:
        A JSON object of the form {"embedding": [...512 floats...]}.
    """
    try:
        sentence = request.sentence
        try:
            # If the string decodes to something Pillow can open, it's an image...
            image = Image.open(BytesIO(base64.b64decode(sentence)))
            inputs = processor(images=image, return_tensors="pt")
            is_image = True
        except Exception:
            # ...otherwise treat it as text. CLIP's text encoder accepts at
            # most 77 tokens, so longer prompts are truncated.
            is_image = False
            inputs = processor(text=sentence, return_tensors="pt", padding=True, truncation=True)

        with torch.no_grad():
            if is_image:
                features = model.get_image_features(**inputs)
            else:
                features = model.get_text_features(**inputs)

        return {"embedding": features[0].tolist()}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
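Okay. So what we've done here is create a separate, stand-alone embeddings API on our own internal network, and our "public" API will call it whenever it needs to generate embeddings. The endpoint takes a single `sentence` field that can be either plain text or a base64-encoded image: it first tries to decode and open the value as an image and, if that fails, treats it as text.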
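Deciding "image or text" by attempting `Image.open` works, but any failure silently becomes a text prompt. If you wanted a cheaper, stricter pre-check, one option (an assumption, not what the API above does) is to decode with `validate=True`, which rejects anything outside the base64 alphabet, and then look for well-known image file signatures. The helper name `looks_like_image` and the short signature table are hypothetical; this is a sketch, not a complete file-type detector.

```python
import base64
import binascii
from typing import Optional

# Hypothetical signature table: only a few common formats are listed here.
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"GIF87a": "image/gif",
    b"GIF89a": "image/gif",
}


def looks_like_image(sentence: str) -> Optional[str]:
    """Return a guessed mimetype if `sentence` is base64 for a known image type, else None."""
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # so ordinary prose (spaces, punctuation) fails fast.
        raw = base64.b64decode(sentence, validate=True)
    except (binascii.Error, ValueError):
        return None
    for signature, mimetype in IMAGE_SIGNATURES.items():
        if raw.startswith(signature):
            return mimetype
    return None


png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8).decode("ascii")
print(looks_like_image(png_b64))  # image/png
print(looks_like_image("Blonde woman standing in front of a concrete wall"))  # None
```

Short words that happen to be valid base64 (for example `test`) still decode, but they won't start with an image signature, so they are correctly treated as text.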
Okay. Let's create a CLI tool we can use to generate the embeddings for the SQL files. This tool will take a text description or a base64-encoded image, send it to the embedding API and then print the embedding in a format we can paste straight into PostgreSQL.
```javascript
const { program } = require("commander");
const { post } = require("axios");

program
  .version("0.0.1")
  .description("A command-line tool for generating embeddings");

program
  // <sentence> declares the required positional argument handed to the action.
  .command("generate <sentence>")
  .description(
    "Generates embeddings given a sentence and that sentence might be text or base64 encoded string",
  )
  .action(async (sentence) => {
    // Ask the internal embeddings service for the vector...
    const response = await post(
      "http://image_vector_search_example_embeddings:7474/api/generate-embedding",
      { sentence },
    );
    const embeddings = response.data.embedding;
    // ...and print it as a pgvector literal ready to paste into SQL.
    console.log(`ARRAY[${embeddings.join(", ")}]::vector(512)`);
  });

program.parse(process.argv);
```
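The CLI's only real job beyond the HTTP call is string formatting: join the floats into a PostgreSQL `ARRAY[...]` and cast it to `vector(512)`. Here is the same formatting as a small Python sketch, with a dimension check added (my assumption; the Node tool doesn't check) so a wrong-sized embedding fails before it ever reaches the database. `to_vector_literal` is a hypothetical name.

```python
from typing import Sequence

EMBEDDING_DIMS = 512  # clip-vit-base-patch32 produces 512-dimensional embeddings


def to_vector_literal(embedding: Sequence[float], dims: int = EMBEDDING_DIMS) -> str:
    """Format an embedding the way the CLI does: ARRAY[...]::vector(N)."""
    if len(embedding) != dims:
        raise ValueError(f"expected {dims} values, got {len(embedding)}")
    # repr() gives the shortest string that round-trips each float exactly.
    values = ", ".join(repr(float(x)) for x in embedding)
    return f"ARRAY[{values}]::vector({dims})"


print(to_vector_literal([0.5, -1.25, 2.0], dims=3))
# ARRAY[0.5, -1.25, 2.0]::vector(3)
```

Failing early here is friendlier than letting PostgreSQL reject the row, since pgvector refuses any vector whose length doesn't match the column's declared dimensions.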
You can use it as follows:
```shell
node console/embedding.js
```

```text
Usage: embedding [options] [command]

A command-line tool for generating embeddings

Options:
  -V, --version        output the version number
  -h, --help           display help for command

Commands:
  generate <sentence>  Generates embeddings given a sentence and that sentence might be text or base64 encoded string
  help [command]       display help for command
```
Here is an example using a text description:
```shell
node console/embedding.js generate "Blonde woman standing in front of a concrete wall"
```

The output below is truncated; the full literal holds 512 values.

```text
ARRAY[0.18320761620998383, 0.03161871060729027, -0.24215754866600037, 0.002381939673796296, -0.3815241754055023, ..., -0.06662864983081818, 0.022203583270311356, -0.17256979644298553]::vector(512)
```
Okay. We're now about halfway there. Let's create a simple schema for persisting the images and their embeddings, and then we'll build a simple front-end / back-end to demonstrate the functionality and wrap things up.
We're going to be a bit naughty here and store the image itself as a base64 string in the database. The mimetype column lets us render it back out in an HTML img element as a data URI. But the key thing is the VECTOR(512) column: the model / tokenizer we're using outputs 512-dimensional embeddings, so the column's dimensions must match (pgvector rejects vectors of any other length).
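```sql
-- pgvector provides the VECTOR type and the <=> distance operator.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE images (
  id SERIAL PRIMARY KEY,
  mimetype VARCHAR,
  image TEXT,            -- base64-encoded image data
  embedding VECTOR(512)  -- must match the model's 512 dimensions
);
```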
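Storing the base64 data next to its mimetype is what lets the index view drop each image straight into an `img` element as a data URI of the form `data:<mimetype>;base64,<data>`. A minimal Python sketch of both halves, assuming the mimetype can be guessed from the file extension with the standard `mimetypes` module; `encode_for_storage` and `to_data_uri` are hypothetical names, not part of the app.

```python
import base64
import mimetypes
from typing import Tuple


def encode_for_storage(filename: str, data: bytes) -> Tuple[str, str]:
    """Return (mimetype, base64 string) ready for the images table."""
    mimetype, _ = mimetypes.guess_type(filename)
    if mimetype is None or not mimetype.startswith("image/"):
        raise ValueError(f"cannot tell the image type of {filename!r}")
    return mimetype, base64.b64encode(data).decode("ascii")


def to_data_uri(mimetype: str, image_b64: str) -> str:
    """Build the value used in an <img src="..."> attribute."""
    return f"data:{mimetype};base64,{image_b64}"


# Just the 8-byte PNG signature, standing in for a real file's contents.
mimetype, image_b64 = encode_for_storage("blonde-woman.png", b"\x89PNG\r\n\x1a\n")
print(to_data_uri(mimetype, image_b64))
# data:image/png;base64,iVBORw0KGgo=
```

Note how the encoded PNG signature matches the `iVBORw0KGgo` prefix of the base64 strings in the inserts below; every PNG stored this way starts the same way.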
Let's insert some entries into the database.
I've stripped out the gubbins (the base64 images and the 512-value vectors are heavily truncated), so these are for illustration only. DO NOT TRY INSERTING THEM!
```sql
INSERT INTO images (mimetype, image, embedding)
VALUES ('image/png', 'iVBORw0KGgoAAAANSUhEUgAAAOAAAADgCAIAAACVT/22AAAAwXpUWHRSYXcgcHJvZmlsZSB0eXBlIGV4aWYAAHjabVDbDcMgDPz3FB0BPyBmHNJQqRt0/BrsREnTkzg/ddiG/nm/3mHQpor1jnXLQqV1JqE5zDK+OI1KptbUGEethdTMmIaI9SCzSc5EMJVMjohy7xjD55EwU/h9nxpzuA80jxAAAAABJRU5ErkJggg==', ARRAY[0.23651084303855896, -0.06072381138801575, -0.09976153075695038, 0.07470647990703583, -0.15835444629192352, -0.21511252224445343, -0.3690471351146698, -1.135039210319519, -0.5108910202980042, -0.31614992022514343, -0.09622760117053986, -0.011659342795610428, -0.15897372364997864, 0.07606512308120728]::vector(512));

INSERT INTO images (mimetype, image, embedding)
VALUES ('image/png', 'iVBORw0KGgoAAAANSUhEUgAAAOAAAADgCAIAAACVT/22AAAAwXpUWHRSYXcgcHJvZmlsZSB0eXBlIGV4aWYAAHjabVDbDcMgDPz3FB0BPyBmHNJQqRt0/BrsREnTkzg/ddiG/nm/3mHQpor1jnXLQqV1JqE5zDK+OI1KptbUGEethdTMmIaI9SCzSc5EMJVMjohy7xjD55EwU/h9nxpzuA80jxAAAAABJRU5ErkJggg==', ARRAY[0.23651084303855896, -0.06072381138801575, -0.09976153075695038, 0.07470647990703583, -0.15835444629192352, -0.21511252224445343, -0.3690471351146698, -1.135039210319519, -0.5108910202980042, -0.31614992022514343, -0.09622760117053986, -0.011659342795610428, -0.15897372364997864, 0.07606512308120728]::vector(512));

INSERT INTO images (mimetype, image, embedding)
VALUES ('image/png', 'iVBORw0KGgoAAAANSUhEUgAAAOAAAADgCAIAAACVT/22AAAAwXpUWHRSYXcgcHJvZmlsZSB0eXBlIGV4aWYAAHjabVDbDcMgDPz3FB0BPyBmHNJQqRt0/BrsREnTkzg/ddiG/nm/3mHQpor1jnXLQqV1JqE5zDK+OI1KptbUGEethdTMmIaI9SCzSc5EMJVMjohy7xjD55EwU/h9nxpzuA80jxAAAAABJRU5ErkJggg==', ARRAY[0.23651084303855896, -0.06072381138801575, -0.09976153075695038, 0.07470647990703583, -0.15835444629192352, -0.21511252224445343, -0.3690471351146698, -1.135039210319519, -0.5108910202980042, -0.31614992022514343, -0.09622760117053986, -0.011659342795610428, -0.15897372364997864, 0.07606512308120728]::vector(512));
```
Okay. Let's flesh out the Express App.
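```javascript
// Assumes `app` (Express, with urlencoded body parsing), `upload` (multer with its
// default in-memory storage, so req.file.buffer is populated), `db` (Sequelize)
// and axios's `post` are all set up elsewhere in the app.
app.post("/", upload.single("file"), async (req, res) => {
  // An uploaded file takes priority; otherwise use the text prompt.
  let sentence = "";
  if (!req.file) {
    sentence = req.body.sentence ?? "";
  } else {
    sentence = req.file.buffer.toString("base64");
  }

  let matches = [];
  if (sentence.length > 0) {
    try {
      // Let's generate the appropriate embeddings...
      const response = await post(
        "http://image_vector_search_example_embeddings:7474/api/generate-embedding",
        { sentence },
      );
      const embedding = response.data.embedding;
      const vector = `ARRAY[${embedding.join(", ")}]::vector(512)`;
      const threshold = 0.1;
      const limit = 10;

      // <=> is pgvector's cosine distance, so 1 - distance is the cosine similarity.
      const results = await db.sequelize.query(
        `SELECT id, mimetype, image, embedding, 1 - (embedding <=> ${vector}) AS similarity
         FROM images
         WHERE (1 - (embedding <=> ${vector})) > ${threshold}
         ORDER BY similarity DESC
         LIMIT ${limit}`,
      );
      matches = results[0];
    } catch (err) {
      // Without this, a failed call would leave the request hanging in Express 4.
      console.error(err);
      return res.status(500).send("Search failed");
    }
  }

  res.render("template", {
    locals: { sentence: req.file ? "" : sentence, matches },
    partials: { partial: "/index" },
  });
});
```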
Okay. So this is really the brains of the app, pulling everything together. If an image has been uploaded, it's turned into a base64-encoded string and we get the embedding for it; if it's a text prompt, we get the embedding for the text directly (skipping the base64 encoding step). Either way, it's a string being sent to the embeddings API, which is multi-modal, so it works with both. The code that searches for similar images based on the embeddings is practically a copy / paste from our previous tutorial: it ranks rows by cosine similarity and keeps at most 10 matches scoring above 0.1.
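To sanity-check what that query returns on a handful of rows, here's the same ranking done in memory as a Python sketch: pgvector's `<=>` cosine distance, similarity as `1 - distance`, the `0.1` threshold and the limit of 10. It assumes rows are `(id, embedding)` pairs with non-zero vectors; it illustrates the maths and is not how the app actually runs the search (PostgreSQL does that).

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Same quantity as pgvector's `a <=> b`: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(
    query: Sequence[float],
    rows: List[Tuple[int, Sequence[float]]],
    threshold: float = 0.1,
    limit: int = 10,
) -> List[Tuple[int, float]]:
    """Mirror of the Express query: similarity > threshold, best first, capped at limit."""
    scored = [(row_id, 1.0 - cosine_distance(query, emb)) for row_id, emb in rows]
    matches = [(row_id, sim) for row_id, sim in scored if sim > threshold]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:limit]


rows = [(1, [1.0, 0.0]), (2, [0.6, 0.8]), (3, [-1.0, 0.0])]
print([row_id for row_id, _ in search([1.0, 0.0], rows)])  # [1, 2]
```

Row 3 points the opposite way (similarity -1), so the threshold drops it. Because cosine similarity ignores vector length, the unnormalised CLIP features can be stored and compared as-is.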
And here's the markup inside the index view.
```html
<main>
  <div class="column controls">
    <div class="column">
      <form method="post" action="/">
        <label for="sentence">Search Text</label>
        <input type="text" value="${sentence}" name="sentence" id="sentence" />
        <button>Search</button>
      </form>
    </div>
    <div class="column">
      <form enctype="multipart/form-data" method="post" action="/">
        <label for="file">Search Image</label>
        <input type="file" value="" name="file" id="file" />
        <button>Search</button>
      </form>
    </div>
  </div>
  <div class="row matches">
    ${matches.map((match) => (`
      <div class="column match">
        <img alt="image of person" src="data:${match.mimetype};base64,${match.image}" />
        <p>${match.similarity}</p>
      </div>
    `)).join('')}
  </div>
</main>
```
It isn't the prettiest UI / UX, but you can search either by text or by uploading a similar image. This could be a powerful feature for the right set of requirements.