Building an AI Application in 10 Minutes: A Step-by-Step Guide

Since the release of ChatGPT and Google Gemini, many apps built on top of these models have launched. The approach is sometimes called AI arbitrage: it takes one of the many tasks a model can perform, refines it with well-crafted prompts, and offers the result to customers as a feature.

This article is a step-by-step guide to integrating the Google Gemini model into a Node.js application. We will use both gemini-pro, which handles text generation, and gemini-pro-vision, which handles image input.

We will first set up the project, and then try it on an example of our own.

Requirements

Node.js version 18+
Google AI Studio account to get our API key
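
The first requirement can be checked from code before anything else runs. The following is a minimal sketch, not part of the original project; the helper name isSupportedNode is hypothetical, and it only compares the major version reported by process.versions.node.

```javascript
// Return true when a Node.js version string such as "18.17.1"
// has a major version greater than or equal to minMajor.
const isSupportedNode = (version, minMajor = 18) => {
  const major = Number.parseInt(version.split(".")[0], 10);
  return major >= minMajor;
};

// Warn early if the current runtime is older than the guide assumes.
if (!isSupportedNode(process.versions.node)) {
  console.warn(`Node.js 18+ is expected, found ${process.versions.node}`);
}
```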

Getting Started

First, create a new Node.js project inside an empty folder:

mkdir gemini-node
cd gemini-node

npm init -y

Now, install the required packages: dotenv for loading the environment file, and the Gemini SDK:

npm install dotenv @google/generative-ai

API Key Setup

In Google AI Studio, click the Get API Key button to create an API key, then save it in an environment file (.env) in the project's root directory:

API_KEY=YOUR_GEMINI_KEY

Create a new lib folder with a config.js file inside it. This file loads the environment file and exports our API key:

const dotenv = require("dotenv");
dotenv.config();

const API_KEY = process.env.API_KEY;

module.exports = { API_KEY };
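
If the .env file is missing or the variable is misspelled, API_KEY is silently undefined and the failure only surfaces on the first API call. One way to catch this earlier is to fail fast when the configuration is loaded. This is a minimal sketch under that assumption; requireEnv is a hypothetical helper, not part of dotenv or the Gemini SDK.

```javascript
// Read a required variable from an environment object (process.env by default)
// and throw a descriptive error when it is missing or blank.
const requireEnv = (name, env = process.env) => {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing ${name}. Add it to your .env file.`);
  }
  return value;
};

// Possible use in lib/config.js, after dotenv.config():
// const API_KEY = requireEnv("API_KEY");
```

With this in place, a missing key stops the script at startup with a clear message instead of producing an authentication error later.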

Gemini SDK Setup

In this step, we will set up our models so we can use them in our code. Inside the lib folder, let's create two files: gemini.js for handling text and gemini-vision.js for handling images.

1- Google Gemini Pro

const { GoogleGenerativeAI } = require("@google/generative-ai");
const { API_KEY } = require("./config");

const googleAI = new GoogleGenerativeAI(API_KEY);
const geminiConfig = {
  temperature: 0.9,
  topP: 1,
  topK: 1,
  maxOutputTokens: 4096,
};

const geminiModel = googleAI.getGenerativeModel({
  model: "gemini-pro",
  // The SDK reads sampling settings from the generationConfig key
  generationConfig: geminiConfig,
});

const generateText = async (prompt) => {
  try {
    const result = await geminiModel.generateContent(prompt);
    const response = result.response;
    return response.text();
  } catch (error) {
    // On failure, log the error; the function then resolves to undefined
    console.error("response error", error);
  }
};

module.exports = { generateText };

We will use the exported function to generate text from a prompt. It can serve as the basis for a chat system, a translation tool, and many other use cases.
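
To illustrate the translation use case, the prompt can be built by a small helper and passed to generateText. This is only a sketch: buildTranslationPrompt and translate are hypothetical names, the prompt wording is an assumption, and the generate argument stands for any function with the same shape as generateText.

```javascript
// Build a prompt that asks the model to translate text into a target language.
const buildTranslationPrompt = (text, targetLanguage) =>
  `Translate the following text into ${targetLanguage}. ` +
  `Reply with the translation only.\n\n${text}`;

// Pass the prompt to a (prompt) => result function such as generateText
// and return whatever that function returns (a promise, for generateText).
const translate = (generate, text, targetLanguage) =>
  generate(buildTranslationPrompt(text, targetLanguage));

// Possible use with the module above:
// const { generateText } = require("./lib/gemini");
// translate(generateText, "Good morning", "French").then(console.log);
```

Passing the generator in as an argument keeps the helper easy to try out without calling the API.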

2- Google Gemini Vision

const { GoogleGenerativeAI } = require("@google/generative-ai");
const { API_KEY } = require("./config");
const fs = require("fs");

const googleAI = new GoogleGenerativeAI(API_KEY);
const geminiConfig = {
  temperature: 0.4,
  topP: 1,
  topK: 32,
  maxOutputTokens: 4096,
};

const geminiModel = googleAI.getGenerativeModel({
  model: "gemini-pro-vision",
  // The SDK reads sampling settings from the generationConfig key
  generationConfig: geminiConfig,
});

const interactWithImage = async (filePath) => {
  try {
    const imageFile = fs.readFileSync(filePath);
    const imageBase64 = imageFile.toString("base64");

    const promptConfig = [
      { text: "Generate a caption from this image" },
      {
        inlineData: {
          mimeType: "image/jpeg",
          data: imageBase64,
        },
      },
    ];

    const result = await geminiModel.generateContent({
      contents: [{ role: "user", parts: promptConfig }],
    });

    return result.response.text();
  } catch (error) {
    // On failure, log the error; the function then resolves to undefined
    console.error("response error", error);
  }
};

module.exports = { interactWithImage };

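We will use the exported function to generate text from an image. Note that both the caption prompt and the image/jpeg MIME type are hard-coded, so the function expects a JPEG file.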
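
Because the MIME type in the function above is fixed, other image formats would be mislabeled. One option is to derive the type from the file extension. The sketch below is an assumption-laden illustration: mimeTypeFor is a hypothetical helper, and the list of extensions is an example rather than the full set of formats the model accepts.

```javascript
const path = require("node:path");

// Example mapping from file extension to MIME type (not exhaustive).
const MIME_TYPES = {
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".png": "image/png",
  ".webp": "image/webp",
};

// Look up the MIME type for a file path, ignoring the case of the extension.
// Throws for extensions that are not in the mapping.
const mimeTypeFor = (filePath) => {
  const extension = path.extname(filePath).toLowerCase();
  const mimeType = MIME_TYPES[extension];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${extension || "(none)"}`);
  }
  return mimeType;
};
```

Inside interactWithImage, the hard-coded value could then be replaced with mimeTypeFor(filePath).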

Building the app

Create a new index.js file and paste this code:

const main = async () => {
  console.log("Hello world!");
};

main();

Now, we can import our exported functions and use them to generate text, either from a question or an image prompt.

The interactWithImage function takes a file path as an argument, so you will need to add an image file (fish.jpg in this example) to your project folder to test it.

Our code becomes:

const { generateText } = require("./lib/gemini");
const { interactWithImage } = require("./lib/gemini-vision");
const path = require("path");

const main = async () => {
  // Text Generation
  let textFromPrompt = await generateText(
    "tell me about bootcamps in a sentence"
  );
  console.log(textFromPrompt);

  // Caption Generation
  const imagePath = path.join(__dirname, "fish.jpg");
  let captionFromImage = await interactWithImage(imagePath);
  console.log(captionFromImage);
};

main();

What's next

Now, you can use Express to build an API instead of a script. You can also research best practices for writing prompts for your use cases.
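
To give a rough idea of what moving from a script to an API involves, here is a minimal sketch that uses Node's built-in http module instead of Express, so it needs no extra package. The /generate route, the prompt query parameter, and the names getPrompt and createGeminiServer are all assumptions made for illustration.

```javascript
const http = require("node:http");

// Extract the prompt query parameter from a URL such as "/generate?prompt=hi".
// Returns null for any other path or when the parameter is absent.
const getPrompt = (requestUrl) => {
  const url = new URL(requestUrl, "http://localhost");
  if (url.pathname !== "/generate") return null;
  return url.searchParams.get("prompt");
};

// Build an HTTP server around any async (prompt) => text function,
// for example the generateText function exported from lib/gemini.js.
const createGeminiServer = (generate) =>
  http.createServer(async (req, res) => {
    const prompt = getPrompt(req.url);
    if (!prompt) {
      res.writeHead(400, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "Use GET /generate?prompt=..." }));
      return;
    }
    try {
      const text = await generate(prompt);
      res.writeHead(200, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ text }));
    } catch (error) {
      res.writeHead(500, { "Content-Type": "application/json" });
      res.end(JSON.stringify({ error: "Generation failed" }));
    }
  });

// Possible use:
// const { generateText } = require("./lib/gemini");
// createGeminiServer(generateText).listen(3000);
```

An Express version would follow the same shape, with the route handler taking the place of the callback passed to http.createServer.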

Here are a few examples of apps you can build:

SEO metadata generator using the Unsplash API
Resume reviewer, by adding an extra script that extracts content from a PDF
Instagram bio writer from an image
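
Ideas such as the Instagram bio writer mostly come down to sending a different prompt alongside the image. As a sketch of how the hard-coded caption prompt could be made configurable, the helper below builds the same parts array used in gemini-vision.js. The name buildImageParts and the example prompt are hypothetical.

```javascript
// Build the parts array for an image request: a text prompt followed by
// the image encoded as base64 inline data.
const buildImageParts = (prompt, imageBuffer, mimeType = "image/jpeg") => [
  { text: prompt },
  {
    inlineData: {
      mimeType,
      data: imageBuffer.toString("base64"),
    },
  },
];

// Possible use inside interactWithImage:
// const parts = buildImageParts(
//   "Write a short Instagram bio inspired by this image",
//   fs.readFileSync(filePath)
// );
// geminiModel.generateContent({ contents: [{ role: "user", parts }] });
```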

Are you interested in building better apps? Join our Web Development bootcamp and learn how to build a fully functional product!
