OK – So, Here We Go Again!

My past post had some flaws and I never got around to fixing them. My bad there.

This post is going to step through building Pentaho BI Server 9.5 on a sparkly new Fedora 37 Server installation and then configuring it to use an existing Postgres installation on another machine on the network – somewhere.

Fedora Server Setup

I’m going to assume that you’ve got Fedora 37 Server up and running and that you can log in as a superuser. I suggest installing the ‘Tools for Headless Management’ group in the installer, as this gives you a nice web interface (Cockpit) to manage the box through.

Watch out for how Fedora Server partitions your drives though – it’ll sometimes leave large chunks of disk unallocated and a stupidly small /home – or no /home at all! I usually run the Pentaho Server inside of a VM and 100GB of disk space is ample.

If you’re installing under VMWare or some other hypervisor then also install the hypervisor tools during the install process.

Postgres Setup

I’m going to assume that you’ve got Postgres installed somewhere, that it’s up and running, and that you’ve got credentials to create users and databases – for example the root Postgres account ‘postgres’. You’ll need to modify ‘postgresql.conf’ and ‘pg_hba.conf’ to allow remote logins – that is, the ability for a user to log in to Postgres from another machine. In this case it will be a user representing the database requirements of the Pentaho server. Remember that you will probably need to open port 5432 on the Postgres server to allow incoming connections as well.
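
If you’ve not done this before, the changes are small. A rough sketch, assuming the Pentaho box sits on a 192.168.1.0/24 network – adjust the subnet, auth method and file paths for your own setup and Postgres version:

# postgresql.conf – let Postgres listen on more than just localhost
listen_addresses = '*'

# pg_hba.conf – allow the Pentaho box (example subnet) to log in over the network
# (on older Postgres installs you may need md5 instead of scram-sha-256)
host    all    svc_pentaho    192.168.1.0/24    scram-sha-256

# open the port on the Postgres host if it runs firewalld
sudo firewall-cmd --permanent --add-port=5432/tcp
sudo firewall-cmd --reload

Restart (or at least reload) Postgres after changing these files.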

Dealing with the Pentaho-bundled SQL scripts is a pain in the ass because you’ll likely want to modify them. I’m going to use pgAdmin to execute the scripts as it gives you better visibility of what’s going on, but you can use whatever client you like. Just make sure you can log in as a powerful user before you go much further.

User Accounts

This walk-through is going to use the following accounts:

  • A user called ‘sysadmin’ exists as a Linux user and has super-user (sudo) capability. We’ll use this account to do the builds, create the pentaho Linux user account, copy files around, etc.
  • An unprivileged Linux user called ‘pentaho’. This is the account that will run and own the Pentaho server.
  • A database account called ‘svc_pentaho’, used to log in to the Pentaho databases on the Postgres box.

We’ll create these accounts as we go along, alongside the steps of building the projects from source code.

Getting Started With Fedora

Log in with your super-user account.

Pentaho 9.5 is compliant with Java 11, so we need to install that along with Maven and Git. Also, I’m not a masochist, so I’m installing nano for editing files:

Install the following:
[sysadmin@localhost ~]$ sudo dnf install maven git nano java-11-openjdk maven-openjdk11 --allowerasing

Create the Pentaho user:

[sysadmin@localhost ~]$ sudo useradd pentaho

Then set the password:

[sysadmin@localhost ~]$ sudo passwd pentaho
Changing password for user pentaho.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Setting Up Java 11

Now we need to set the default Java to Java 11, because right now it’s Java 17. We can use ‘alternatives’ for this.

[sysadmin@pentaho ~]$ sudo alternatives --config java
There are 2 programs which provide 'java'.
  Selection    Command
-----------------------------------------------
*+ 1           java-17-openjdk.x86_64 (/usr/lib/jvm/java-17-openjdk-17.0.5.0.8-1.fc37.x86_64/bin/java)
   2           java-11-openjdk.x86_64 (/usr/lib/jvm/java-11-openjdk-11.0.17.0.8-1.fc37.x86_64/bin/java)

Enter to keep the current selection[+], or type selection number: 2

We can check that this has been set properly by calling Java:

[sysadmin@tigger ~]$ java -version
openjdk version "11.0.17" 2022-10-18
OpenJDK Runtime Environment (Red_Hat-11.0.17.0.8-2.fc36) (build 11.0.17+8)
OpenJDK 64-Bit Server VM (Red_Hat-11.0.17.0.8-2.fc36) (build 11.0.17+8, mixed mode, sharing)

Cool. So far, so good.

Test and Setup Maven

If we call Maven two things happen:

  • We confirm that Maven is going to use Java 11 and
  • It will make us a hidden .m2 folder in our home-folder that we’ll need in a later step

[sysadmin@localhost ~]$ mvn -version
Apache Maven 3.8.4 (Red Hat 3.8.4-3)
Maven home: /usr/share/maven
Java version: 11.0.17, vendor: Red Hat, Inc., runtime: /usr/lib/jvm/java-11-openjdk-11.0.17.0.8-2.fc36.x86_64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "6.0.9-200.fc36.x86_64", arch: "amd64", family: "unix"

Splendid.

Cloning The Pentaho Server Git Repository

Now clone the Git repository for the Pentaho BI server:

[sysadmin@localhost ~]$ git clone https://github.com/pentaho/pentaho-platform
Cloning into 'pentaho-platform'…
remote: Enumerating objects: 182134, done.
remote: Counting objects: 100% (679/679), done.
remote: Compressing objects: 100% (643/643), done.
remote: Total 182134 (delta 420), reused 260 (delta 30), pack-reused 181455
Receiving objects: 100% (182134/182134), 168.87 MiB | 16.82 MiB/s, done.
Resolving deltas: 100% (106085/106085), done.
Updating files: 100% (5800/5800), done.

This will have made a pentaho-platform folder in your sysadmin home folder.

If you are going to set up a cluster, or you want to clone and build Kettle, then repeat the process with the Kettle repo. This isn’t necessary if you just want to build/install Pentaho Server.

[sysadmin@localhost ~]$ git clone https://github.com/pentaho/pentaho-kettle

Cloning into 'pentaho-kettle'…
remote: Enumerating objects: 797903, done.
remote: Counting objects: 100% (21275/21275), done.
remote: Compressing objects: 100% (11346/11346), done.
remote: Total 797903 (delta 10553), reused 20441 (delta 9797), pack-reused 776628
Receiving objects: 100% (797903/797903), 586.14 MiB | 16.60 MiB/s, done.
Resolving deltas: 100% (436053/436053), done.
Updating files: 100% (10376/10376), done.

Now we need to grab Pentaho’s Maven settings.xml and put it into the .m2 folder that Maven made for us. Take care copy/pasting this because of the line-splits – it’s a single command:

[sysadmin@pentaho ~]$ wget https://raw.githubusercontent.com/pentaho/maven-parent-poms/master/maven-support-files/settings.xml --directory-prefix=.m2
--2022-12-24 15:54:01-- https://raw.githubusercontent.com/pentaho/maven-parent-poms/master/maven-support-files/settings.xml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)… 185.199.109.133, 185.199.111.133, 185.199.108.133, …
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 2684 (2.6K) [text/plain]
Saving to: ‘.m2/settings.xml’
settings.xml 100%[===========================================>] 2.62K --.-KB/s in 0s
2022-12-24 15:54:02 (5.61 MB/s) - ‘.m2/settings.xml’ saved [2684/2684]

Excellent.

We can now start the build.

Building With Maven

Note: Please see my post about syslog4j. I’m updating this on Christmas Eve and the problem still exists, so you’ll need to edit the pom.xml file as per my post.

To start the build, cd into the project:

[sysadmin@localhost ~]$ cd pentaho-platform

Now call Maven:
[sysadmin@localhost ~]$ mvn clean install -DskipTests

…and now we wait…

If you get disk-full / out-of-disk-space issues, it’s probably because the Fedora installer didn’t create a large enough /home partition for you. In my experience it’s just easier to rerun the installer and start over.
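
A quick check before you go that far – see how much room /home actually has:

[sysadmin@localhost ~]$ df -h /home

If /home is tiny you’ll fill it fast; the local Maven repository alone ends up taking a good few gigabytes.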

If you get errors relating to ‘syslog4j’ being unavailable, read my post.

If you get ‘bad gateway’ errors, restart the build. Something has gone wrong downloading artifacts from Hitachi – most likely their server was busy and got bored of talking to you. This most often happens on the larger zip files, such as the ‘pentaho-big-data-plugin’, because it’s a 500MB+ download. You’ll get a message like this:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-dependency-plugin:3.1.0:unpack (unpack-plugins) on project pentaho-solutions: Unable to find/resolve artifact.: Could not transfer artifact pentaho:pentaho-big-data-plugin:zip:9.5.0.0-20221224.204358-124 from/to pentaho-public (https://repo.orl.eng.hitachivantara.com/artifactory/pnt-mvn/): transfer failed for https://repo.orl.eng.hitachivantara.com/artifactory/pnt-mvn/pentaho/pentaho-big-data-plugin/9.5.0.0-SNAPSHOT/pentaho-big-data-plugin-9.5.0.0-20221224.204358-124.zip, status: 502 Bad Gateway -> [Help 1]

You can restart the build from where it failed by making note of the line in the error message where the module is specified:

[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <args> -rf :pentaho-solutions

So, in this example we can restart with:

mvn clean install -DskipTests -rf :pentaho-solutions

After An Eternity of Waiting…

If all goes well you should have a build report that looks like this:

[INFO] Installing /home/sysadmin/pentaho-platform/assemblies/pentaho-server/target/pentaho-server-ce-9.5.0.0-SNAPSHOT.zip to /home/sysadmin/.m2/repository/pentaho/pentaho-server-ce/9.5.0.0-SNAPSHOT/pentaho-server-ce-9.5.0.0-SNAPSHOT.zip
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for pentaho-solutions 9.5.0.0-SNAPSHOT:
[INFO] 
[INFO] pentaho-solutions .................................. SUCCESS [05:52 min]
[INFO] pentaho-server-manual-ce ........................... SUCCESS [ 55.194 s]
[INFO] pentaho-server-ce .................................. SUCCESS [01:41 min]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  08:31 min
[INFO] Finished at: 2022-12-24T17:00:53-07:00
[INFO] ------------------------------------------------------------------------

Hooray!

The build is in:

/home/sysadmin/.m2/repository/pentaho/pentaho-server-ce/9.5.0.0-SNAPSHOT/pentaho-server-ce-9.5.0.0-SNAPSHOT.zip

Copy the file to the home folder of ‘pentaho’. You’ll need to sudo the copy.

[sysadmin@pentaho ~]$ sudo cp ~/.m2/repository/pentaho/pentaho-server-ce/9.5.0.0-SNAPSHOT/pentaho-server-ce-9.5.0.0-SNAPSHOT.zip /home/pentaho
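
The copy will land owned by root. It will usually still unzip fine since it’s world-readable, but you can hand ownership to ‘pentaho’ anyway while you’re here:

[sysadmin@pentaho ~]$ sudo chown pentaho:pentaho /home/pentaho/pentaho-server-ce-9.5.0.0-SNAPSHOT.zip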

Now let’s be ‘pentaho’:

Being John Malkovich Pentaho

Switch user to ‘pentaho’:

[sysadmin@pentaho ~]$ sudo su - pentaho
[pentaho@pentaho ~]$ 

Unzip the zip file:

[pentaho@pentaho ~]$ unzip pentaho-server-ce-9.5.0.0-SNAPSHOT.zip

This gives us a ‘pentaho-server’ folder containing our build.

Configuring Pentaho Server

I’m now going to refer to the headings and sub-headings of the instructions for Pentaho 9.4 as found here, rather than retyping them word for word. This documentation is a shambles and is clearly designed for working with the ‘Enterprise’ (i.e. paid-for) version of the Pentaho Server. It contains a lot of stuff we don’t have access to, such as the ‘Pentaho Datamart’. I’ll cite the actual steps that you need to do.

We will be following the archive installation instructions, which means we’ll be using the Tomcat web server supplied in the build rather than an already-existing web server of our own.

Initialize PostgreSQL Pentaho Repository database

We need to create three databases and the svc_pentaho Postgres database user account.

The build contains three SQL script files buried in /home/pentaho/pentaho-server/data/postgresql:

-rw-r--r--. 1 pentaho pentaho 363 Dec 24 15:52 create_jcr_postgresql.sql
-rw-r--r--. 1 pentaho pentaho 5647 Dec 24 15:52 create_quartz_postgresql.sql
-rw-r--r--. 1 pentaho pentaho 356 Dec 24 15:52 create_repository_postgresql.sql

Dealing with the Postgres command-line interface sucks, so connect to your Postgres box as a super-user account (e.g. ‘postgres’) with a tool like pgAdmin or DBeaver, and copy/paste what you need from the scripts as presented below. The original scripts are shown along with modified versions that use a single user account and a password from DinoPass.

create_jcr_postgresql.sql

Original:

drop database if exists jackrabbit;
drop user if exists jcr_user;
CREATE USER jcr_user PASSWORD 'password';
CREATE DATABASE jackrabbit WITH OWNER = jcr_user ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL PRIVILEGES ON DATABASE jackrabbit to jcr_user;

Modified:

CREATE USER svc_pentaho PASSWORD 'smartkite58';
CREATE DATABASE jackrabbit95 WITH OWNER = svc_pentaho ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL PRIVILEGES ON DATABASE jackrabbit95 to svc_pentaho;

create_quartz_postgresql.sql

Original:

--Begin--
-- note: this script assumes pg_hba.conf is configured correctly
-- \connect postgres postgres
drop database if exists quartz;
drop user if exists pentaho_user;
CREATE USER pentaho_user PASSWORD 'password';
CREATE DATABASE quartz WITH OWNER = pentaho_user ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL ON DATABASE quartz to pentaho_user;
--End--

--Begin Connect--
\connect quartz pentaho_user
begin;

drop table if exists qrtz5_job_listeners;
drop table if exists qrtz5_trigger_listeners;
drop table if exists qrtz5_fired_triggers;
drop table if exists qrtz5_paused_trigger_grps;
drop table if exists qrtz5_scheduler_state;
drop table if exists qrtz5_locks;
drop table if exists qrtz5_simple_triggers;
drop table if exists qrtz5_cron_triggers;
drop table if exists qrtz5_blob_triggers;
drop table if exists qrtz5_triggers;
drop table if exists qrtz5_job_details;
drop table if exists qrtz5_calendars;

CREATE TABLE qrtz5_job_details
(
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
JOB_CLASS_NAME VARCHAR(250) NOT NULL,
IS_DURABLE BOOL NOT NULL,
IS_VOLATILE BOOL NOT NULL,
IS_STATEFUL BOOL NOT NULL,
REQUESTS_RECOVERY BOOL NOT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_job_listeners
(
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
JOB_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER),
FOREIGN KEY (JOB_NAME,JOB_GROUP)
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
NEXT_FIRE_TIME BIGINT NULL,
PREV_FIRE_TIME BIGINT NULL,
PRIORITY INTEGER NULL,
TRIGGER_STATE VARCHAR(16) NOT NULL,
TRIGGER_TYPE VARCHAR(8) NOT NULL,
START_TIME BIGINT NOT NULL,
END_TIME BIGINT NULL,
CALENDAR_NAME VARCHAR(200) NULL,
MISFIRE_INSTR SMALLINT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (JOB_NAME,JOB_GROUP)
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_simple_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
REPEAT_COUNT BIGINT NOT NULL,
REPEAT_INTERVAL BIGINT NOT NULL,
TIMES_TRIGGERED BIGINT NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_cron_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
CRON_EXPRESSION VARCHAR(120) NOT NULL,
TIME_ZONE_ID VARCHAR(80),
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_blob_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
BLOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_trigger_listeners
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
TRIGGER_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_calendars
(
CALENDAR_NAME VARCHAR(200) NOT NULL,
CALENDAR BYTEA NOT NULL,
PRIMARY KEY (CALENDAR_NAME)
);

CREATE TABLE qrtz5_paused_trigger_grps
(
TRIGGER_GROUP VARCHAR(200) NOT NULL,
PRIMARY KEY (TRIGGER_GROUP)
);

CREATE TABLE qrtz5_fired_triggers
(
ENTRY_ID VARCHAR(95) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
INSTANCE_NAME VARCHAR(200) NOT NULL,
FIRED_TIME BIGINT NOT NULL,
PRIORITY INTEGER NOT NULL,
STATE VARCHAR(16) NOT NULL,
JOB_NAME VARCHAR(200) NULL,
JOB_GROUP VARCHAR(200) NULL,
IS_STATEFUL BOOL NULL,
REQUESTS_RECOVERY BOOL NULL,
PRIMARY KEY (ENTRY_ID)
);

CREATE TABLE qrtz5_scheduler_state
(
INSTANCE_NAME VARCHAR(200) NOT NULL,
LAST_CHECKIN_TIME BIGINT NOT NULL,
CHECKIN_INTERVAL BIGINT NOT NULL,
PRIMARY KEY (INSTANCE_NAME)
);

CREATE TABLE qrtz5_locks
(
LOCK_NAME VARCHAR(40) NOT NULL,
PRIMARY KEY (LOCK_NAME)
);

INSERT INTO qrtz5_locks values('TRIGGER_ACCESS');
INSERT INTO qrtz5_locks values('JOB_ACCESS');
INSERT INTO qrtz5_locks values('CALENDAR_ACCESS');
INSERT INTO qrtz5_locks values('STATE_ACCESS');
INSERT INTO qrtz5_locks values('MISFIRE_ACCESS');

ALTER TABLE qrtz5_job_listeners OWNER TO pentaho_user;
ALTER TABLE qrtz5_trigger_listeners OWNER TO pentaho_user;
ALTER TABLE qrtz5_fired_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_paused_trigger_grps OWNER TO pentaho_user;
ALTER TABLE qrtz5_scheduler_state OWNER TO pentaho_user;
ALTER TABLE qrtz5_locks OWNER TO pentaho_user;
ALTER TABLE qrtz5_simple_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_cron_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_blob_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_triggers OWNER TO pentaho_user;
ALTER TABLE qrtz5_job_details OWNER TO pentaho_user;
ALTER TABLE qrtz5_calendars OWNER TO pentaho_user;

commit;
--End Connect--

Modified:

CREATE DATABASE quartz95 WITH OWNER = svc_pentaho ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL ON DATABASE quartz95 to svc_pentaho;
--- At this point connect to quartz95 in your client and execute the following:

CREATE TABLE qrtz5_job_details
(
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
JOB_CLASS_NAME VARCHAR(250) NOT NULL,
IS_DURABLE BOOL NOT NULL,
IS_VOLATILE BOOL NOT NULL,
IS_STATEFUL BOOL NOT NULL,
REQUESTS_RECOVERY BOOL NOT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_job_listeners
(
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
JOB_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (JOB_NAME,JOB_GROUP,JOB_LISTENER),
FOREIGN KEY (JOB_NAME,JOB_GROUP)
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
JOB_NAME VARCHAR(200) NOT NULL,
JOB_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
DESCRIPTION VARCHAR(250) NULL,
NEXT_FIRE_TIME BIGINT NULL,
PREV_FIRE_TIME BIGINT NULL,
PRIORITY INTEGER NULL,
TRIGGER_STATE VARCHAR(16) NOT NULL,
TRIGGER_TYPE VARCHAR(8) NOT NULL,
START_TIME BIGINT NOT NULL,
END_TIME BIGINT NULL,
CALENDAR_NAME VARCHAR(200) NULL,
MISFIRE_INSTR SMALLINT NULL,
JOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (JOB_NAME,JOB_GROUP)
REFERENCES qrtz5_JOB_DETAILS(JOB_NAME,JOB_GROUP)
);

CREATE TABLE qrtz5_simple_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
REPEAT_COUNT BIGINT NOT NULL,
REPEAT_INTERVAL BIGINT NOT NULL,
TIMES_TRIGGERED BIGINT NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_cron_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
CRON_EXPRESSION VARCHAR(120) NOT NULL,
TIME_ZONE_ID VARCHAR(80),
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_blob_triggers
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
BLOB_DATA BYTEA NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_trigger_listeners
(
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
TRIGGER_LISTENER VARCHAR(200) NOT NULL,
PRIMARY KEY (TRIGGER_NAME,TRIGGER_GROUP,TRIGGER_LISTENER),
FOREIGN KEY (TRIGGER_NAME,TRIGGER_GROUP)
REFERENCES qrtz5_TRIGGERS(TRIGGER_NAME,TRIGGER_GROUP)
);

CREATE TABLE qrtz5_calendars
(
CALENDAR_NAME VARCHAR(200) NOT NULL,
CALENDAR BYTEA NOT NULL,
PRIMARY KEY (CALENDAR_NAME)
);

CREATE TABLE qrtz5_paused_trigger_grps
(
TRIGGER_GROUP VARCHAR(200) NOT NULL,
PRIMARY KEY (TRIGGER_GROUP)
);

CREATE TABLE qrtz5_fired_triggers
(
ENTRY_ID VARCHAR(95) NOT NULL,
TRIGGER_NAME VARCHAR(200) NOT NULL,
TRIGGER_GROUP VARCHAR(200) NOT NULL,
IS_VOLATILE BOOL NOT NULL,
INSTANCE_NAME VARCHAR(200) NOT NULL,
FIRED_TIME BIGINT NOT NULL,
PRIORITY INTEGER NOT NULL,
STATE VARCHAR(16) NOT NULL,
JOB_NAME VARCHAR(200) NULL,
JOB_GROUP VARCHAR(200) NULL,
IS_STATEFUL BOOL NULL,
REQUESTS_RECOVERY BOOL NULL,
PRIMARY KEY (ENTRY_ID)
);

CREATE TABLE qrtz5_scheduler_state
(
INSTANCE_NAME VARCHAR(200) NOT NULL,
LAST_CHECKIN_TIME BIGINT NOT NULL,
CHECKIN_INTERVAL BIGINT NOT NULL,
PRIMARY KEY (INSTANCE_NAME)
);

CREATE TABLE qrtz5_locks
(
LOCK_NAME VARCHAR(40) NOT NULL,
PRIMARY KEY (LOCK_NAME)
);

INSERT INTO qrtz5_locks values('TRIGGER_ACCESS');
INSERT INTO qrtz5_locks values('JOB_ACCESS');
INSERT INTO qrtz5_locks values('CALENDAR_ACCESS');
INSERT INTO qrtz5_locks values('STATE_ACCESS');
INSERT INTO qrtz5_locks values('MISFIRE_ACCESS');

ALTER TABLE qrtz5_job_listeners OWNER TO svc_pentaho;
ALTER TABLE qrtz5_trigger_listeners OWNER TO svc_pentaho;
ALTER TABLE qrtz5_fired_triggers OWNER TO svc_pentaho;
ALTER TABLE qrtz5_paused_trigger_grps OWNER TO svc_pentaho;
ALTER TABLE qrtz5_scheduler_state OWNER TO svc_pentaho;
ALTER TABLE qrtz5_locks OWNER TO svc_pentaho;
ALTER TABLE qrtz5_simple_triggers OWNER TO svc_pentaho;
ALTER TABLE qrtz5_cron_triggers OWNER TO svc_pentaho;
ALTER TABLE qrtz5_blob_triggers OWNER TO svc_pentaho;
ALTER TABLE qrtz5_triggers OWNER TO svc_pentaho;
ALTER TABLE qrtz5_job_details OWNER TO svc_pentaho;
ALTER TABLE qrtz5_calendars OWNER TO svc_pentaho;

create_repository_postgresql.sql

Original:

drop database if exists hibernate;
drop user if exists hibuser;
CREATE USER hibuser PASSWORD 'password';
CREATE DATABASE hibernate WITH OWNER = hibuser ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL PRIVILEGES ON DATABASE hibernate to hibuser;

Modified:

CREATE DATABASE hibernate95 WITH OWNER = svc_pentaho ENCODING = 'UTF8' TABLESPACE = pg_default;
GRANT ALL PRIVILEGES ON DATABASE hibernate95 to svc_pentaho;

So – to recap, all I’ve done with the three SQL files is:

  • Remove the statements that drop any existing databases/tables
  • Create databases jackrabbit95, hibernate95, quartz95
  • Create one user ‘svc_pentaho’ (instead of using three separate users)
  • Assign ownership of the databases and tables to ‘svc_pentaho’

It’s probably a good idea to check that you can log in as ‘svc_pentaho’ with the password you assigned.
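
For example, from any machine with a Postgres client installed (the host IP here is a placeholder – use your own Postgres server):

psql -h 192.168.1.50 -p 5432 -U svc_pentaho -d hibernate95

You should be prompted for the password and land at a hibernate95=> prompt. If this fails, revisit pg_hba.conf and the firewall before going any further.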

Configure PostgreSQL Pentaho Repository Database

Set up Quartz on PostgreSQL
pentaho-server/pentaho-solutions/system/quartz/quartz.properties

Check that these two lines are as follows:

approx line 300: org.quartz.jobStore.driverDelegateClass = org.quartz.impl.jdbcjobstore.PostgreSQLDelegate
approx line 379: org.quartz.dataSource.myDS.jndiURL = Quartz

Set Hibernate settings for PostgreSQL
pentaho-server/pentaho-solutions/system/hibernate/hibernate-settings.xml

Replace the config-file tag line with:

<config-file>system/hibernate/postgresql.hibernate.cfg.xml</config-file>
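
Depending on the build, postgresql.hibernate.cfg.xml itself may also carry a JDBC URL and the old ‘hibuser’ credentials, so it’s worth opening it and checking the connection properties match your setup. Something along these lines – the host IP is a placeholder, the database and password are the ones created earlier:

<property name="connection.url">jdbc:postgresql://192.168.1.50:5432/hibernate95</property>
<property name="connection.username">svc_pentaho</property>
<property name="connection.password">smartkite58</property>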

Modify Jackrabbit repository information for PostgreSQL

Refer to the Pentaho documentation.

pentaho-server/pentaho-solutions/system/jackrabbit/repository.xml

This section is fiddly to get right the first time, so make a backup of the above file. Chances are you’ll fuck this up at least once, and it’s helpful to be able to fall back to the default file.

repository.xml is made up of multiple sections, and some of the sections contain configuration blocks for MS SQL Server, Oracle, MySQL, H2 and Postgres. For each section where multiple databases are listed, make sure that only the Postgres block is enabled. Remove or comment out the blocks that don’t relate to Postgres. Read up on how to comment out lines in an XML file if you are unsure what you’re doing.

For example, the first section is as follows:

  <!--
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>

  <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
    <param name="tablespace" value="pentaho_tablespace"/>
  </FileSystem>

  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="postgresql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>

  <FileSystem class="org.apache.jackrabbit.core.fs.db.MSSqlFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="mssql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>
  -->

  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
     <param name="path" value="${rep.home}/repository"/>
  </FileSystem>

Notice how all the ‘FileSystem’ blocks are commented out except for the one that uses the ‘LocalFileSystem’ class. Re-comment/edit the file so that the ‘postgresql’ block is the one enabled:

  <!--
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="mysql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>

  <FileSystem class="org.apache.jackrabbit.core.fs.db.OracleFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
    <param name="tablespace" value="pentaho_tablespace"/>
  </FileSystem>
-->
  <FileSystem class="org.apache.jackrabbit.core.fs.db.DbFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="postgresql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>
<!--
  <FileSystem class="org.apache.jackrabbit.core.fs.db.MSSqlFileSystem">
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="mssql"/>
    <param name="schemaObjectPrefix" value="fs_repos_"/>
  </FileSystem>

  <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
     <param name="path" value="${rep.home}/repository"/>
  </FileSystem>
  -->

Do this for all the sections of the file where configs exist for multiple different databases.

Replace the last node (<Journal class="org.apache.jackrabbit.core.journal.MemoryJournal"/>) with the ‘DatabaseJournal’ XML as shown on the Pentaho web-page.
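
For reference, the PostgreSQL journal block looks roughly like this – treat it as a sketch and take the exact block (including whether it sits inside a Cluster element, and the schemaObjectPrefix value) from the Pentaho page:

  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <param name="driver" value="javax.naming.InitialContext"/>
    <param name="url" value="java:comp/env/jdbc/jackrabbit"/>
    <param name="schema" value="postgresql"/>
    <param name="schemaObjectPrefix" value="cl_j_"/>
  </Journal>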

Perform Tomcat-specific connection tasks

Download driver and apply to the Pentaho Server

Pentaho comes with a driver for Postgres, but there’s no harm in giving the Pentaho server a newer version if available.

From https://jdbc.postgresql.org/download/ download the latest JDBC driver and copy it into the Tomcat /lib folder:

[pentaho@pentaho pentaho-server]$ wget https://jdbc.postgresql.org/download/postgresql-42.5.1.jar
--2022-12-24 22:14:30--  https://jdbc.postgresql.org/download/postgresql-42.5.1.jar
Resolving jdbc.postgresql.org (jdbc.postgresql.org)... 72.32.157.228, 2001:4800:3e1:1::228
Connecting to jdbc.postgresql.org (jdbc.postgresql.org)|72.32.157.228|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1046770 (1022K) [application/java-archive]
Saving to: ‘postgresql-42.5.1.jar’

postgresql-42.5.1.jar                                                   100%[====================================================>]   1022K  2.46MB/s    in 0.4s    

2022-12-24 22:14:32 (2.46 MB/s) - ‘postgresql-42.5.1.jar’ saved [1046770/1046770]
[pentaho@pentaho pentaho-server]$ mv postgresql-42.5.1.jar /home/pentaho/pentaho-server/tomcat/lib/

Modify JDBC connection information in the Tomcat XML file

pentaho-server/tomcat/webapps/pentaho/META-INF/context.xml

Edit this file, remove all of the existing <Resource> nodes, and paste in just these three:

	<Resource name="jdbc/Hibernate"
	          auth="Container"
	          type="javax.sql.DataSource"
	          factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory"
	          maxActive="20"
	          minIdle="0"
	          maxIdle="5"
	          initialSize="0"
	          maxWait="10000"
	          username="hibuser"
	          password="password"
	          driverClassName="org.postgresql.Driver"
	          url="jdbc:postgresql://localhost:5432/hibernate"
	          validationQuery="select 1"
	          jdbcInterceptors="ConnectionState"
	          defaultAutoCommit="true"/>
	<Resource name="jdbc/Quartz"
	          auth="Container"
	          type="javax.sql.DataSource"
	          factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory"
	          maxActive="20"
	          minIdle="0"
	          maxIdle="5"
	          initialSize="0"
	          maxWait="10000"
	          username="pentaho_user"
	          password="password"
	          testOnBorrow="true"
	          driverClassName="org.postgresql.Driver"
	          url="jdbc:postgresql://localhost:5432/quartz"
	          validationQuery="select 1"/>
	<Resource name="jdbc/jackrabbit"
	          auth="Container"
	          type="javax.sql.DataSource"
	          factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory"
	          maxActive="20"
	          minIdle="0"
	          maxIdle="5"
	          initialSize="0"
	          maxWait="10000"
	          username="jcr_user"
	          password="password"
	          driverClassName="org.postgresql.Driver"
	          url="jdbc:postgresql://localhost:5432/jackrabbit"
	          validationQuery="select 1"
	          jdbcInterceptors="ConnectionState"
	          defaultAutoCommit="true"/>

For each ‘Resource’ make sure you update the following (there’s a worked example after the list):

  • The username to the Postgres user we made earlier (svc_pentaho)
  • The password to whatever you set
  • Replace ‘localhost’ with the IP of your Postgres server and if necessary the port that the Postgres server is running on
  • Change the database name to the one you created in the Postgres configuration steps (hibernate95, quartz95, jackrabbit95)
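
For example, assuming a Postgres server at 192.168.1.50 (a placeholder – substitute your own host) and the user and databases created earlier, the Hibernate resource ends up like this; ‘Quartz’ and ‘jackrabbit’ get the same treatment with quartz95 and jackrabbit95:

	<Resource name="jdbc/Hibernate"
	          auth="Container"
	          type="javax.sql.DataSource"
	          factory="org.pentaho.di.core.database.util.DecryptingDataSourceFactory"
	          maxActive="20"
	          minIdle="0"
	          maxIdle="5"
	          initialSize="0"
	          maxWait="10000"
	          username="svc_pentaho"
	          password="smartkite58"
	          driverClassName="org.postgresql.Driver"
	          url="jdbc:postgresql://192.168.1.50:5432/hibernate95"
	          validationQuery="select 1"
	          jdbcInterceptors="ConnectionState"
	          defaultAutoCommit="true"/>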

Almost Done…

Well done – you’ve now completed the essential pieces of setting up the Pentaho server.

The Pentaho server generates a number of log files and cached versions of config files when it runs. When troubleshooting start-up problems, these can sometimes get in the way of diagnosing issues.

In particular, the directory pentaho-solutions/system/jackrabbit/repository should be removed every time you edit the configuration file pentaho-solutions/system/jackrabbit/repository.xml. If you don’t remove this folder, the edits you’ve made to repository.xml will not be picked up, as Jackrabbit will use the cached copy.

Edit the Pentaho server startup script (start-pentaho.sh) and add these lines at the start.

rm -f ./tomcat/logs/*
rm -rf ./tomcat/work/*
rm -rf ./tomcat/temp/*
rm -rf ./pentaho-solutions/system/jackrabbit/repository
rm -rf ./pentaho-solutions/system/karaf/caches

You can now start it with:

[pentaho@pentaho pentaho-server]$ ./start-pentaho.sh 

Most of the useful logging can be found in the /tomcat/logs folder.
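
If the server doesn’t come up cleanly, tailing the main Tomcat log while it starts is usually the quickest way to see what’s wrong:

[pentaho@pentaho pentaho-server]$ tail -f tomcat/logs/catalina.out

pentaho.log in the same folder is also worth a look once the webapp has started.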