As we shipped our hand-rolled anti-spam CAPTCHA a few days ago, let's revisit it to see if it's actually any good at disguising the text.
This is what it generates at the moment.
Okay. As I have llama3.2-vision:11b running locally, let's give it our CAPTCHA and see what it can infer.
```
ollama run llama3.2-vision:11b
>>> Send a message (/? for help)
```
With Ollama on the command line, I just need to pass it the path to the file:
```
>>> captcha.png
Added image 'captcha.png'
The information in the question is:

* A mathematical expression is given: 300 + 1
* The task is to evaluate the result of this expression.

## Step 1
First, we need to solve the addition problem.

## Step 2
Adding 1 to 300 gives us 301.

The final answer is: $\boxed{301}$
```
Oops! It solved it immediately. And without any prompting!
Okay... let's add some noise to the CAPTCHA: say, a random scattering of dots.
```js
const numDots = 100;
for (let i = 0; i < numDots; i++) {
  // Generate random coordinates
  const x = Math.random() * 200;
  const y = Math.random() * 100;

  // Set the dot colour (whatever colours.color holds)
  ctx.fillStyle = colours.color;

  // Draw a circle (dot) at the random coordinates
  ctx.beginPath();
  ctx.arc(x, y, 1, 0, 2 * Math.PI); // Radius of 1 pixel
  ctx.fill();
}
```
And let's take a peek.
Let's see how we get on with this CAPTCHA!
```
>>> captcha.png
Added image 'captcha.png'
The information in the question is:

* A mathematical expression is given: 700 + 5
* The task is to evaluate the result of this expression.

## Step 1
First, we need to solve the addition problem.

## Step 2
Adding 5 to 700 gives us 705.

The final answer is: $\boxed{705}$
```
Okay, okay... why don't we automate generating a CAPTCHA and piping it to a model to see whether it can solve it each time? And perhaps create a loop so that, whenever it works out the correct answer, we keep adding more and more random dots until it can't?
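Let's break it down into a few steps:

1. Generate a CAPTCHA with a given number of dots.
2. Send the image to the model and ask it to transcribe the equation.
3. Compare the model's answer with the real equation.
4. If the model got it right, add more dots and go back to step 1; otherwise, stop.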
We'll also need to revisit captcha.js so that we can set the number of dots when we call it, by passing in something like:
```json
{ "dots": 100 }
```
Let's revisit the original code to make it more tweakable.
```js
// captcha.js

// Adding weights parameter...
const generateCaptcha = (weights) => {
  // updating loop...
  const { dots: numDots } = weights;
  for (let i = 0; i < numDots; i++) {
    // Generate random coordinates
    const x = Math.random() * 200;
    const y = Math.random() * 100;

    // Set the dot colour (whatever colours.color holds)
    ctx.fillStyle = colours.color;

    // Draw a circle (dot) at the random coordinates
    ctx.beginPath();
    ctx.arc(x, y, 1, 0, 2 * Math.PI); // Radius of 1 pixel
    ctx.fill();
  }
  // ...
```
The "weights" object is the tweakable bit.
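If you'd like to poke at the weights without a canvas, here's a minimal sketch, assuming Node.js. `DEFAULT_WEIGHTS`, `resolveWeights` and `dotPositions` are hypothetical names; only the `dots` key comes from captcha.js. It fills in a default when no weights are passed, rejects nonsense values, and returns the coordinates the dot loop would draw, with an injectable random function so the output can be pinned down in tests.

```javascript
// Hypothetical defaults: only the `dots` key appears in captcha.js.
const DEFAULT_WEIGHTS = { dots: 100 };

// Merge caller-supplied weights over the defaults and validate them.
function resolveWeights(weights = {}) {
  const merged = { ...DEFAULT_WEIGHTS, ...weights };
  if (!Number.isInteger(merged.dots) || merged.dots < 0) {
    throw new RangeError(`dots must be a non-negative integer, got ${merged.dots}`);
  }
  return merged;
}

// Canvas-free stand-in for the dot loop: returns the (x, y) pairs that
// would be passed to ctx.arc(). `rng` defaults to Math.random but can be
// swapped for a deterministic function.
function dotPositions(weights, width = 200, height = 100, rng = Math.random) {
  const { dots: numDots } = resolveWeights(weights);
  const positions = [];
  for (let i = 0; i < numDots; i++) {
    positions.push({ x: rng() * width, y: rng() * height });
  }
  return positions;
}
```

Keeping the validation in one place means the runner can bump `dots` freely and a typo in the weights object fails loudly rather than silently drawing no dots.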
Okay, it's one thing manually calling Ollama from the command line, but if I want to do it programmatically I should use the RESTful API exposed locally.
I needed to remind myself what that looks like using curl before calling it from code.
```shell
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "stream": false,
    "model": "llama3.2-vision:11b",
    "prompt": "What is the solution to the equation hidden inside the image",
    "images": [""]
  }'
```
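For reference, here's a sketch of the same request in JavaScript using the `fetch()` built into Node 18 and later, rather than axios. It assumes Ollama's default address (`http://localhost:11434`) and the same model; `buildGenerateBody` and `generate` are illustrative names. With `stream: false`, Ollama returns a single JSON object whose `response` field holds the model's text.

```javascript
// Sketch: the curl call above, using the fetch() built into Node 18+.
// Assumes Ollama is listening on its default port, 11434.
const OLLAMA_URL = "http://localhost:11434/api/generate";

// Build the same JSON body the curl example sends.
function buildGenerateBody(imageBase64, prompt, model = "llama3.2-vision:11b") {
  return {
    stream: false, // one JSON object back instead of a stream of chunks
    model,
    prompt,
    images: [imageBase64], // raw base64, without a "data:image/png;base64," prefix
  };
}

// POST the body and return the model's text answer.
async function generate(imageBase64, prompt) {
  const res = await fetch(OLLAMA_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildGenerateBody(imageBase64, prompt)),
  });
  if (!res.ok) {
    throw new Error(`Ollama returned HTTP ${res.status}`);
  }
  const data = await res.json();
  return data.response; // with stream: false the generated text is in `response`
}
```

To try it, read the PNG with `fs.readFileSync("captcha.png").toString("base64")` and pass the result to `generate()` along with a prompt.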
Okay, things are starting to take shape...
Let's pull it all together now by creating captcha-runner.js.
Below is the finished version:
```js
// captcha-runner.js
const fs = require("fs/promises");
const axios = require("axios");
const { generateCaptcha } = require("../library/captcha");

// Write a base64 PNG data URL to disk so each CAPTCHA can be inspected later
async function createPngFromBase64(dataUrl, filename) {
  const base64 = dataUrl.replace("data:image/png;base64,", "");
  await fs.writeFile(filename, Buffer.from(base64, "base64"));
  console.log(`Image saved to ${filename}`);
}

function queryOllama(
  image,
  prompt = "Please transcribe the simple mathematical problem expressed in the captcha. Do not include anything else.",
  model = "llama3.2-vision:11b",
  context = "You are a helpful bot trying to test the effectiveness of our captcha images. Please provide a transcription of the text hidden in the image. No summary or preamble required.",
) {
  return axios
    .post("http://host.docker.internal:11434/api/generate", {
      model: model,
      stream: false,
      // Separate the context and the prompt with a space
      prompt: `${context} ${prompt}`,
      images: [image], // raw base64, no data-URL prefix
    })
    .then((response) => response.data.response)
    .catch((error) => {
      console.error("API call failed:", error.message);
      process.exit(1);
    });
}

let weights = { dots: 100 };

// An async function can await directly, so there's no need to wrap it in new Promise()
const generateCaptchaAndTest = async (weights, counter) => {
  const captcha = generateCaptcha(weights);
  await createPngFromBase64(captcha.image, `captcha-${counter}.png`);
  const response = await queryOllama(
    captcha.image.replace("data:image/png;base64,", ""),
  );
  console.log(`Model Transcription: ${response}`);
  console.log(`Captcha: ${captcha.equation}`);
  return {
    response,
    equation: captcha.equation,
    weights,
    // Trim stray whitespace before comparing with the expected equation
    unsolved: response.trim() !== captcha.equation,
  };
};

async function keepTrying() {
  let counter = 0;
  while (true) {
    try {
      counter += 1;
      const result = await generateCaptchaAndTest(weights, counter);
      console.log(`Weights: ${JSON.stringify(weights)}`);
      console.log(`Unsolved: ${result.unsolved}`);
      if (result.unsolved) {
        console.log("Success!!!");
        break;
      }
      weights.dots += 50;
      console.log(`Incrementing weights: ${JSON.stringify(weights)}`);
    } catch (error) {
      console.error("An error occurred:", error);
      weights.dots += 50;
    }
    // Add a delay to avoid overwhelming the API
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}

keepTrying();
```
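One thing worth flagging: the runner decides success with a strict `!==` string comparison against `captcha.equation`, so extra spacing or a sentence of preamble (like the CLI answers earlier) counts as a miss even when the digits were read correctly. Below is a hypothetical, more forgiving check; `extractExpression` and `isTranscribedCorrectly` are illustrative names, not part of captcha-runner.js. It assumes equations take the `400 - 3 = ?` shape seen in the output, with `+`, `-`, `*`, `/`, `x` or `×` as the operator.

```javascript
// Pull the first "<number> <operator> <number>" out of a string and
// rebuild it with single spaces, e.g. "400-3=?" -> "400 - 3".
function extractExpression(text) {
  const match = text.match(/(\d+)\s*([-+*\/x×])\s*(\d+)/);
  return match ? `${match[1]} ${match[2]} ${match[3]}` : null;
}

// True when the model's answer contains the same expression as the CAPTCHA,
// regardless of spacing or surrounding chatter.
function isTranscribedCorrectly(response, equation) {
  const got = extractExpression(response);
  return got !== null && got === extractExpression(equation);
}
```

It still treats a misread digit as a failure (`30 + 10` against `300 + 10` returns `false`), which is exactly the case we want to catch.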
After changing captcha.js so that it accepts weights when called, we were able to generate a CAPTCHA, test it against the model, and keep incrementing the weights until the model failed.
Example output from the command line:

```
node captcha-runner.js
Image saved to captcha-1.png
Model Transcription: 400 - 3 = ?
Captcha: 400 - 3 = ?
Weights: {"dots":100}
Solved: false
Incrementing weights: {"dots":150}
Image saved to captcha-2.png
Model Transcription: 50 + 3 = ?
Captcha: 50 + 3 = ?
Weights: {"dots":150}
Solved: false
Incrementing weights: {"dots":200}
Image saved to captcha-3.png
Model Transcription: 30 + 10 = ?
Captcha: 300 + 10 = ?
Weights: {"dots":200}
Solved: true
Successfully solved!
```
Wahoo!!! We've got something that the model failed to correctly solve!
Let's take a peek!
To make sure it was bullet-proof, I repeated the process a number of times. Sometimes the LLM performed better, so the starting number of dots crept up with each run. Eventually, however, it settled at around 800 dots.
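The repeated experiment can be sketched as a loop over runs, where each run starts from the dot count the previous one ended on. The `trial(dots)` callback is a hypothetical stand-in for generating a CAPTCHA with that many dots and asking the model to solve it; it should resolve to `true` when the model gets it right. `step` defaults to the 50 used in captcha-runner.js, while `maxDots` is an assumed safety cap so a very capable model can't keep the loop running forever.

```javascript
// Climb from startDots in `step` increments until the model fails
// (trial resolves false) or we pass maxDots. Returns the first dot
// count the model failed on, or a value above maxDots if it never did.
async function findThreshold(trial, startDots, step, maxDots) {
  let dots = startDots;
  while (dots <= maxDots && (await trial(dots))) {
    dots += step;
  }
  return dots;
}

// Repeat the search `runs` times, carrying each run's threshold
// forward as the next run's starting point.
async function repeatSearch(trial, runs, { start = 100, step = 50, maxDots = 5000 } = {}) {
  let startDots = start;
  const thresholds = [];
  for (let run = 0; run < runs; run++) {
    const threshold = await findThreshold(trial, startDots, step, maxDots);
    thresholds.push(threshold);
    startDots = threshold; // never lower than the previous start
  }
  return thresholds;
}
```

With a fake trial such as `async (dots) => dots < 800`, `repeatSearch(fakeTrial, 3)` resolves to `[800, 800, 800]`: once the threshold stops moving between runs, the search has settled.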