MindsDB – Machine Learning with MySQL Tables Using PHP

MindsDB – Machine Learning with MySQL Tables Using PHP

Introduction

Machine learning has been around for a long time, and the emphasis on making sense of data was a selling point for many small and medium businesses. My first blush with machine learning was when I was assigned the topic ‘Prediction of Electricity Market Prices‘ for my final year thesis. The prices were downloaded into a CSV and fit into the software Weka (Java) to run a time series forecasting model. Honestly, I wasn’t sure what parameters to tweak or the logic behind the model, but ultimately, it returns a list of ‘future’ prices. All that matters is that the model works, right?

Background

I have been analyzing soccer odds movements for the past 6 years and had coded more than 50 PHP scripts to predict possible outcomes of upcoming matches using odds data stored in MySQL database. Some scripts’ algorithm earned me profits, but I will lose all within the next few days. It is difficult to remind yourself that soccer prediction is not possible when you have a massive database of odds related data. I thought I would think out of the box; hence I wrote my final predictive script this year, but to lose money yet again was the final straw. I took a one-month break and was searching for readily available predictive models on the Internet when I came across MindsDB.

Prerequisites

There are many types of database connectors and programming languages you can use with MindsDB, but we will be using MySQL and PHP in this tutorial.

  • Install Python 3.8.x and pip 22.x.x in Linux environment. Minimum versions required at https://docs.mindsdb.com/setup/self-hosted/pip/linux/
  • A basic understanding of MySQL SELECT statements.
  • tmux to run MindsDB in a single terminal session.
  • A moderately good computer to run train and predict data. My KVM is 4 cores, 8 GB of RAM, an SSD, and it takes ~2 minutes to train a model and 4-5 minutes to run a simulation.
  • What is MindsDB? Read https://docs.mindsdb.com/

Step 1 – Install Python, pip and MindsDB

There are several ways to install Python, but the below manual installation steps are useful when one cannot get the required version to install automatically, when using e.g., dnf

$ sudo dnf install gcc openssl-devel bzip2-devel libffi-devel
$ cd /opt wget https://www.python.org/ftp/python/3.8.13/Python-3.8.13.tgz
$ tar xzf Python-3.8.3.tgz
$ cd Python-3.8.13
$ sudo ./configure --enable-optimizations
$ sudo make altinstall
$ python --version

Some pitfalls exist when multiple versions of Python and pip exist in the system. We need to update .bashrc file to switch one version as default. Below, you can see that the environment has 2 versions of Python, and we need to use v3.8 to run MindsDB.

$ whereis python
python: /usr/bin/python3.6 /usr/bin/python3.6m /usr/lib/python3.6 /usr/lib64/python3.6 /usr/local/bin/python3.8 /usr/local/bin/python3.8-config /usr/local/lib/python3.8 /usr/include/python3.6m /usr/share/man/man1/python.1.gz

$ cat ~/.bashrc
# User specific aliases and functions
alias python='/usr/local/bin/python3.8'
alias pip='/usr/local/bin/pip3.8'
alias pip3='/usr/local/bin/pip3.8'

Follow these steps to install MindsDB. The MindsDB SQL Editor (here) can then be accessed via http://127.0.0.1:47334/ or http://<VM public IP>:47334/ or http://<mindsdb.example.com> with Nginx proxy. The reason we want to run MindsDB in a ⁣tmux session is because we would rather not start/stop MindsDB with SSH every time we intend to use it.

$ pip install mindsdb
$ tmux new -s [session_name]
$ python -m mindsdb

# To return to MindsDB session on another day
$ tmux a -t [session_name]

Step 2 – Prepare Training Data Set

Garbage in, garbage out means that if you use bad data to train a predictor model, then you’ll get bad output predictions. I love to predict soccer outcomes, while others like to predict the rise and fail of bitcoins. Therefore, there is no single set of cleaning instructions that works for all data sets, but I thought I would share my thoughts.

  1. Create a new MySQL table to contain the training data and not train a predictor off a huge production table directly.
  2. We may duplicate entire production data to a training table then do a cleanup, or we can write a PHP script to selectively migrate useful data across using SELECT and INSERT statements.
  3. We can still store data in the training table that we aren’t certain if we will use it later. This is possible because MindsDB allows us to train a model with specific MySQL column fields using SELECT
  4. Ideally, the training data should have a column name where we can split X% of the rows to train the model and the rest (100 – X)% to test the accuracy of the model. E.g., My soccer data table includes match `startTime` column and I use WHERE startTime < '2022-07-29' to train model and WHERE startTime >= '2022-07-29' for simulation tests.

Sample Data Set

idhome_teamaway_teamstartTimeleagueresult_ft1X2result_ouft1X2_01ft1X2_02ft1X2_03
62277NewcastleLiverpool2022-04-30English PremierAU7.004.401.35
62279Aston VillaNorwich2022-04-30English PremierHU1.374.306.50
62426Sporting KCFC Dallas2022-05-01US Soccer LeagueDO2.203.102.85
62486UdineseInter2022-05-02Italian LeagueAO5.003.901.50
Past soccer matches odds: _ft1X2 (Home/Draw/Away) and _ou (Over/Under)

Step 3 – Generate List of Possible Models

Depending on the number of parameters present in the data set, we will end up with 50 or even 100 models. We experiment with different combinations of parameters to train the best predictor. In Step 4, we are going to use function train_model() to train a predictor. It will be easier to keep a models file separate from the main PHP file that connects the training and predicting functions, so we can update and keep track of it easily.

/**
  * Filename - models_train.php
  * A list of models to include into main PHP file
  * Used by function train_model() in Step 4
  */
// Train full-time result models
// Model 'k2' using MySQL table dataset5 and dataset7
if ($_GET['train'] == 'dataset5_ft_k2' || $_GET['train'] == 'dataset7_ft_k2') {
    $sql_train = "ft1X2_01, ft1X2_03";
}
// Model 'k4' using MySQL table dataset5
if ($_GET['train'] == 'dataset5_ft_k4') {
    $sql_train = "ft1X2_01, ft1X2_02, ft1X2_03";
}

// Train total goals over/under models
// Model 'h3' using MySQL table dataset4
if ($_GET['train'] == 'dataset4_ou_h3') {
    $sql_train = "ou_hcap, ft1X2_01, ft1X2_02, ft1X2_03";
}

Step 4 – Create Predictor using PHP

You can call it train or create, predictor or model, but in the simplest term, what we are doing is asking MindsDB to create a predictive model from a MySQL data set.

  1. $modelName (e.g., dataset1_ft1X2_a1): _dataset1– After messing with MindsDB for a few days, we are sure to have a few datasets. This allows us to use data from different SQL tables, while the model parameters are kept constant. _ft1X2– The purpose of the model is ft1X2. I’m predicting full-time Home/Draw/Away and may change to _ou which is total goals Over/Under. _a1– I may use Column 1,3,4 to predict ft1X2 and later train another model _a2 using Column 3,4.
  2. $target (e.g., result_ft1X2): Which column name do you want MindsDB to predict? The result_ft1X2 contains 3 distinct possible win outcomes (H – home, D – draw, A – away)
  3. $train_cutOffDate (e.g., startTime <= ‘2022-09-23’): Every row represent a match, and we can control the amount of data to train the model with startTime. For example, we could use the first 3 months of matches data to train a model and the last 4th month to run a simulation test on accuracy. Or, you can use the first two months to train and the next two months to run a longer simulation test.
  4. $sql_train (e.g., ft1X2_01, ft1X2_03): The parameters (MySQL column names) to be used to train the model. Hypothetically, these 2 odds values will represent the strength difference between the two teams.
  5. $mysqli3: This is essentially the PHP way of connecting to the MySQL database.
/**
  * Filename - main.php
  * Call function train_model() to train the model
  * Usage: http://example.mindsdb.com/main.php?train=dataset1_ft1X2_a1&target=result_ft1X2
  */
if (isset($_GET['target']) == 1) {
    $train_cutOffDate = '2022-09-23 23:59:00'; // inclusive 23:59:00 cutOffDate's matches
    require_once 'models_train.php'; // Step 3

    train_model($_GET['train'], $_GET['target'], $train_cutOffDate, $sql_train, $mysqli3);
}

/**
  * Train NEW model using PHP
  * $modelName - consists of 2 details: tableName_modelName
  * $target - column name to predict
  * $train_cutOffDate - number of data rows used to train model
  * $sql_train - parameters (column names) used to train model
  * $mysqli3 - MySQL connection to training table
  */
function train_model ($modelName, $target, $train_cutOffDate, $sql_train, $mysqli3) {
    $model_details = explode('_', $modelName);

    // Delete existing model
    $drop = "DROP PREDICTOR mindsdb.$modelName";
    if ($result = $mysqli3 -> query($drop)) {
        echo "Returned rows are: " . var_dump($result);
    }
    else {
        echo $mysqli3->error;
    }

    /**
     * Exclude certain data in training
     * Exclude matches not having half-time bets when predicting half-time pick-the-score (htpts)
     */
    if ($target == 'result_htpts') {
        $sql_add = " AND ht1X2_01_o IS NOT NULL"; // for htpts models
    }
    else {
        $sql_add= ''; // for all other models
    }

    /**
     * Train NEW model
     * SANITIZE DATA HERE !!!!
     * Parameters (MySQL columns) used for training should not be NULL unless data set allows
     */
    $parameters = explode(',', $sql_train); // e.g. ft1X2_01_o, ft1X2_02_o, ft1X2_03_o, ou_hcap_o
    $sql_parameter = '';
    foreach ($parameters as $parameter) {
        $sql_parameter .= ' AND '.trim($parameter).' IS NOT NULL';
    }

    // Create MindsDB predictor now!!
    $train = "CREATE PREDICTOR mindsdb.$modelName
    FROM demo
        (SELECT home_team, away_team, type_name, $target, ".$sql_train."
            FROM $model_details[0]
            WHERE $target IS NOT NULL ".$sql_parameter." AND startTime <= '$train_cutOffDate'".$sql_add.")
    PREDICT $target";
    echo '<br>'.$train;

    if ($result = $mysqli3 -> query($train)) {
        echo "<br>Trained model `".$modelName."`";
    }
    else {
        echo $mysqli3->error;
    }
}

Conclusion

MindsDB is a tool for exploring machine learning that is easy to learn and has a strong community support through Slack. It works with a few dozen databases and comes with an in-built query web interface similar to phpMyAdmin. If you are unfamiliar with databases, it can import your data set in CSV format. I have created more than 100 models trying to predict half-time, full-time and total goals to date because different sets of training parameters (MySQL column names) are used for each prediction targets. Likewise, I also manipulate `startTime` which controls the size of my training data set, which is equivalent to the number of matches. In our next MindsDB tutorial, we will write a PHP function to run simulations dynamically and present the output in a webpage.

Leave a Comment

Comments

No comments yet. Why don’t you start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *