Since the appearance of ChatGPT and Google Gemini, we have seen many apps launched on top of these models. This approach is sometimes called AI arbitrage: take one of the many tasks AI can do, enhance it with well-crafted prompts, and offer it to customers as a feature.
This article is a step-by-step guide to integrating Google Gemini into a Node.js application. We will use both gemini-pro, which handles text generation, and gemini-pro-vision, which handles image input.
We will start by setting up the project and then use it in an example of our own.
Requirements
- Node.js version 18+
- A Google account with access to Google AI Studio, to get an API key
Getting Started
First, create a new Node.js project inside an empty folder:
mkdir gemini-node
cd gemini-node
npm init -y
Now, install the required packages: dotenv for loading the environment file, and the Gemini SDK:
npm install dotenv @google/generative-ai
API Key Setup
In Google AI Studio, click the Get API key button to create a key, then save it in an environment file (.env) in the project's root directory:
API_KEY=YOUR_GEMINI_KEY
Create a lib folder and, inside it, a config.js file that loads the .env file and exports the API key:
const dotenv = require("dotenv");
dotenv.config();
const API_KEY = process.env.API_KEY;
module.exports = { API_KEY };
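As written, config.js exports undefined when the .env file is missing or the variable is misspelled, and the problem only surfaces later as an SDK error. A variant that fails fast could use a small helper like the one below. requireEnv is a hypothetical name, not part of dotenv; this is a sketch, not the article's method.

```javascript
// Hypothetical helper: return a required environment variable or throw
// a clear error. `env` defaults to process.env but can be replaced (e.g. in tests).
const requireEnv = (name, env = process.env) => {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value.trim();
};

module.exports = { requireEnv };
```

In lib/config.js you would call dotenv.config() first and then write const API_KEY = requireEnv("API_KEY"); so a missing key stops the app at startup instead of at the first request.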
Gemini SDK Setup
In this step, we will set up the models so we can use them in our code. Inside the lib folder, create two files: gemini.js for handling text and gemini-vision.js for handling images.
1- Google Gemini Pro (lib/gemini.js)
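const { GoogleGenerativeAI } = require("@google/generative-ai");
const { API_KEY } = require("./config");
const googleAI = new GoogleGenerativeAI(API_KEY);
// Sampling settings for text generation
const geminiConfig = {
  temperature: 0.9,
  topP: 1,
  topK: 1,
  maxOutputTokens: 4096,
};
// The SDK reads these settings from the `generationConfig` field;
// under any other key name they are ignored
const geminiModel = googleAI.getGenerativeModel({
  model: "gemini-pro",
  generationConfig: geminiConfig,
});
// Send a text prompt and return the model's reply as a string
const generateText = async (prompt) => {
  try {
    const result = await geminiModel.generateContent(prompt);
    const response = result.response;
    return response.text();
  } catch (error) {
    // Logs the error; the function then resolves to undefined
    console.error("response error", error);
  }
};
module.exports = { generateText };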
We will use the exported function to generate text from a prompt. It can serve as the basis for a chat system, text translation, and many other use cases.
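For the chat use case, the SDK's model object also provides startChat(), which keeps the conversation history and sends it along with each sendMessage() call. Below is a minimal sketch, assuming the geminiModel created in lib/gemini.js is passed in; createChat is a hypothetical helper name. The model is injected rather than imported so the helper can be exercised without a real API key.

```javascript
// Hypothetical helper around the SDK's chat API.
// `model` is expected to be the object returned by googleAI.getGenerativeModel()
// (anything exposing startChat()); `history` holds earlier turns, if any.
const createChat = (model, history = []) => {
  const chat = model.startChat({ history });
  // Each call sends one user message; the chat object keeps track of prior turns
  return async (message) => {
    const result = await chat.sendMessage(message);
    return result.response.text();
  };
};

module.exports = { createChat };
```

Inside lib/gemini.js you could export createChat(geminiModel), then call await ask("Suggest a name for a coding bootcamp") followed by await ask("Make it shorter"); the second request is answered with the first turn as context.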
2- Google Gemini Vision (lib/gemini-vision.js)
const { GoogleGenerativeAI } = require("@google/generative-ai");
const { API_KEY } = require("./config");
const fs = require("fs");
const googleAI = new GoogleGenerativeAI(API_KEY);
// Sampling settings for image-to-text generation
const geminiConfig = {
  temperature: 0.4,
  topP: 1,
  topK: 32,
  maxOutputTokens: 4096,
};
// Pass the settings under `generationConfig` so the SDK applies them
const geminiModel = googleAI.getGenerativeModel({
  model: "gemini-pro-vision",
  generationConfig: geminiConfig,
});
// Read a local image and ask the model to caption it
const interactWithImage = async (filePath) => {
  try {
    // Inline image data is sent as a base64-encoded string
    const imageFile = fs.readFileSync(filePath);
    const imageBase64 = imageFile.toString("base64");
    const promptConfig = [
      { text: "Generate a caption from this image" },
      {
        inlineData: {
          mimeType: "image/jpeg", // must match the actual file format
          data: imageBase64,
        },
      },
    ];
    const result = await geminiModel.generateContent({
      contents: [{ role: "user", parts: promptConfig }],
    });
    return result.response.text();
  } catch (error) {
    // Logs the error; the function then resolves to undefined
    console.error("response error", error);
  }
};
module.exports = { interactWithImage };
We will use the exported function to generate text from an image.
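interactWithImage always declares the image as image/jpeg and always uses the same caption prompt. A sketch of a helper that picks the MIME type from the file extension and takes the prompt as an argument follows. The extension table is an assumption based on the image formats the Gemini API documented at the time of writing (PNG, JPEG, WEBP, HEIC, HEIF) and should be checked against the current docs; buildImageParts is a hypothetical name.

```javascript
const fs = require("fs");
const path = require("path");

// Assumed mapping of file extensions to image MIME types accepted by Gemini
const MIME_TYPES = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".heic": "image/heic",
  ".heif": "image/heif",
};

// Hypothetical helper: build the `parts` array for generateContent()
// from a text prompt and a local image file.
const buildImageParts = (filePath, prompt) => {
  const mimeType = MIME_TYPES[path.extname(filePath).toLowerCase()];
  if (!mimeType) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  const data = fs.readFileSync(filePath).toString("base64");
  return [{ text: prompt }, { inlineData: { mimeType, data } }];
};

module.exports = { buildImageParts };
```

With this helper, interactWithImage could pass parts: buildImageParts(filePath, prompt), which also makes it reusable for prompts such as "Write an Instagram bio based on this image".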
Building the app
Create a new index.js file in the project root and paste this code:
const main = async () => {
  console.log("Hello world!");
};
main();
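Now, we can import our exported functions and use them to generate text from either a text prompt or an image.
interactWithImage takes a file path as an argument, so you will need to add an image file (fish.jpg in this example) to the project folder to test it. Because the vision code declares the MIME type as image/jpeg, use a JPEG file.
Our index.js becomes the following; run it with node index.js: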
const { generateText } = require("./lib/gemini");
const { interactWithImage } = require("./lib/gemini-vision");
const path = require("path");
const main = async () => {
  // Text Generation
  const textFromPrompt = await generateText(
    "tell me about bootcamps in a sentence"
  );
  console.log(textFromPrompt);
  // Caption Generation: expects fish.jpg next to index.js
  const imagePath = path.join(__dirname, "fish.jpg");
  const captionFromImage = await interactWithImage(imagePath);
  console.log(captionFromImage);
};
main();
What's next
Next, you can use Express to turn this script into an API. It is also worth researching prompt-writing best practices for your use cases.
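To keep the sketch free of extra dependencies, the version below swaps in Node's built-in http module instead of Express; an Express route would follow the same shape. createPromptServer, the POST /generate route, and the {"prompt"} / {"text"} JSON shapes are all hypothetical choices, and generate stands in for generateText from lib/gemini.js.

```javascript
const http = require("http");

// Send a JSON response with the given status code
const sendJson = (res, status, payload) => {
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify(payload));
};

// Hypothetical API: POST /generate with {"prompt": "..."} returns {"text": "..."}.
// `generate` is any async (prompt) => string function, e.g. generateText.
const createPromptServer = (generate) =>
  http.createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/generate") {
      return sendJson(res, 404, { error: "Not found" });
    }
    let body = "";
    req.setEncoding("utf8");
    req.on("data", (chunk) => {
      body += chunk;
    });
    req.on("end", async () => {
      let prompt;
      try {
        ({ prompt } = JSON.parse(body));
      } catch {
        return sendJson(res, 400, { error: "Body must be a JSON object" });
      }
      if (typeof prompt !== "string" || prompt.trim() === "") {
        return sendJson(res, 400, { error: "A non-empty prompt is required" });
      }
      try {
        sendJson(res, 200, { text: await generate(prompt) });
      } catch (error) {
        console.error("generation failed", error);
        sendJson(res, 500, { error: "Generation failed" });
      }
    });
  });

module.exports = { createPromptServer };
```

Usage might look like createPromptServer(generateText).listen(3000), after which a client sends a POST request to /generate with a JSON body such as {"prompt": "tell me about bootcamps in a sentence"}.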
Here are a few examples of apps you can build:
- SEO metadata generator using the Unsplash API
- Resume reviewer, with an extra script that extracts content from a PDF
- Instagram bio writer from an image
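For an app like the SEO metadata generator, it helps to ask the model for JSON and parse the reply defensively, since models sometimes wrap JSON in a Markdown code fence. A sketch with hypothetical helper names (buildSeoPrompt, parseSeoMetadata) follows; the Unsplash part is left out, and the image description is assumed to be a plain string you already have. The prompt string would be passed to generateText and its reply to parseSeoMetadata.

```javascript
// Hypothetical prompt builder for an SEO metadata generator
const buildSeoPrompt = (description) =>
  [
    "Write SEO metadata for a web page about the following image.",
    'Reply with JSON only, shaped as {"title": string, "description": string}.',
    `Image description: ${description}`,
  ].join("\n");

// Parse the model's reply, tolerating a surrounding Markdown code fence
// (three backticks, optionally followed by "json")
const parseSeoMetadata = (reply) => {
  const cleaned = reply.trim().replace(/^`{3}(?:json)?\s*|\s*`{3}$/g, "");
  const data = JSON.parse(cleaned);
  if (!data || typeof data.title !== "string" || typeof data.description !== "string") {
    throw new Error("Model reply is missing title or description");
  }
  return { title: data.title.trim(), description: data.description.trim() };
};

module.exports = { buildSeoPrompt, parseSeoMetadata };
```

JSON.parse still throws if the model ignores the instruction entirely, so a caller would typically catch that error and retry or fall back.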
Are you interested in building better apps? Join our Web Development bootcamp and learn how to build a fully functional product!
Your Future in Python, JavaScript, and SQL Starts Here: Code Labs Academy, the Best Online Bootcamp.