Both teams have been iterating well and driving towards Raven becoming a Compute Provider on Ocean Market. We mentioned publishing a range of algorithms from Machine Learning to Federated Analytics and Federated Learning. Logistic Regression is the first we’re giving in-depth walk-throughs on.
The rest of this post is organized as follows:
Section 1 gives an overview of what Logistic Regression is.
Section 2 covers different applications and use cases of Logistic Regression.
Section 3 is an intro to using Logistic Regression for Classification in Ocean’s Compute-to-Data.
Section 4 walks you through how to publish the Iris Flower Dataset so we can run Logistic Regression on it.
Section 5 walks you through how to publish the Logistic Regression Algorithm on Ocean Market.
Section 6 brings everything we learned in Sections 1 through 5 together to run Logistic Regression on Iris via Ocean’s Compute-to-Data.
Section 7 is for the devs! It walks you through everything done in Sections 4, 5, and 6 via console / command line / Python code.
Section 8 concludes.
1. Raven published Logistic Regression, a supervised machine learning algorithm for classification, on Ocean Market
Logistic Regression is a Supervised Learning method used to predict a categorical dependent variable from a set of independent variables. Since the dependent variable is categorical, the output must be a discrete or categorical value. Logistic Regression is similar to Linear Regression, but the two are used differently: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
Some examples of classification problems are: spam or not spam, fraudulent or legitimate credit card charge, dog or cat. Logistic Regression constrains its output to values between 0 and 1: the algorithm transforms the model's raw output using the Logistic Sigmoid Function to return a probability value.
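To make this concrete, here is a minimal sketch of logistic regression trained by gradient descent on a toy one-feature dataset. This is plain illustrative Python, not the algorithm published on Ocean Market; the data points and learning rate are made up for the example (the sigmoid function used here is explained in the next section).

```python
import math

def sigmoid(z):
    # Squash any real number into the (0, 1) range
    return 1.0 / (1.0 + math.exp(-z))

# Toy 1-feature dataset: small values -> class 0 ("Cat"), large -> class 1 ("Dog")
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0  # model parameters
lr = 0.5         # learning rate (arbitrary choice)

# Stochastic gradient descent on the log-loss cost
for _ in range(2000):
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        w -= lr * (p - y) * x  # dCost/dw for a single example
        b -= lr * (p - y)      # dCost/db for a single example

print(sigmoid(w * 0.7 + b) < 0.5)  # low x -> probability below 0.5 (class 0)
print(sigmoid(w * 3.8 + b) > 0.5)  # high x -> probability above 0.5 (class 1)
```

Production code would use a library such as scikit-learn instead, but the update rule above is the core of what any logistic regression trainer does.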
What is the Sigmoid Function?
In Machine Learning, we use the Sigmoid Function to map predictions (any real value) to probabilities (a number between 0 and 1).
Sigmoid Function Graph
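As a quick sketch, the sigmoid 1 / (1 + e^(-z)) can be written and checked in a few lines of Python (the sample inputs are arbitrary):

```python
import math

def sigmoid(z):
    # Maps any real-valued prediction to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # 0.5: the midpoint of the curve
print(sigmoid(6))   # ~0.9975: large positive inputs approach 1
print(sigmoid(-6))  # ~0.0025: large negative inputs approach 0
```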
Decision Boundary
Our classifier (Logistic Regression in this case) gives us a set of outputs/categories/classes. We pass the inputs through a prediction function and it returns a class with a probability score between 0 and 1.
Take Cats and Dogs, for example. We first need to decide on a threshold value: anything with a probability above the threshold, we classify as a Dog; anything below it, we classify as a Cat.
In the graph above, we have chosen the probability threshold to be 0.5. If the prediction function returns a value of 0.7, then we would classify this observation as Dog. If our prediction returned a value of 0.2, then we would classify the observation as Cat.
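The thresholding step above can be sketched in a few lines (a toy example; the 0.5 threshold and the Cat/Dog labels match the discussion above):

```python
THRESHOLD = 0.5  # the probability threshold chosen above

def classify(probability, threshold=THRESHOLD):
    # Above the threshold -> Dog; at or below it -> Cat
    return "Dog" if probability > threshold else "Cat"

print(classify(0.7))  # Dog
print(classify(0.2))  # Cat
```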
2. The applications of Logistic Regression span all use cases where data needs to be classified into different categories.
Here are some real-world use cases that many smart people are working on:
Fraud detection: detecting credit card or banking fraud.
Email spam or ham: classifying an email as spam or ham and routing it to the Inbox or Spam folder.
Sentiment Analysis: analyzing sentiment from reviews or tweets. Many brands and companies use this to improve customer experience.
Image segmentation, recognition and classification: identifying the object in an image and classifying it.
Object detection: detecting and classifying objects in video rather than still images.
Handwriting recognition: recognizing handwritten letters.
Disease (diabetes, cancer etc.) prediction: predicting whether a patient has a disease or not.
3. Using Logistic Regression for Classification in Ocean’s Compute-to-Data
Going through the process of publishing this algorithm on Ocean Market showed how powerful the platform is for no-code machine learning enthusiasts.
At a high level, anyone who already has their wallet set up on Ocean can simply click run compute and get results back with no knowledge of how the backend machine learning works or where the job even happens. Ocean Compute-to-Data has made this a black box. This makes it possible for non-technical people to both buy datasets and train ML models. People who have been leveraging the libraries in Raven's GitHub have certainly been engineers, so we are very excited about all the complementary opportunities for growth in this partnership!
To get to the stage where we can use Compute-to-Data as a black box, we need to publish a dataset (Iris) and an algorithm. Following are two comprehensive walk-throughs covering that. Once those are published, we will put everything together and run Logistic Regression on Iris (there’s a walk-through).
Last, but not least, we'll show you a walk-through on how to do everything via console. The motivation for showing all the steps via console is simply that this is more in line with the workflows AI/ML people already use, and would likely be their workflow here too once they understand what's happening in Ocean Market.
4. Walk-through: Publishing Iris Flower Dataset
Datasets are very important for feeding into Machine Learning algorithms. Logistic Regression certainly needs one. Let’s first walk through how to publish the Iris Flower Dataset to Ocean Market.
These are simple form fields. Title and Description are used all over Ocean Market to reference the dataset to be published.
File is a little more complicated. Instead of uploading a file of the Iris Flower Dataset to Ocean Market directly, we will need to host the file on a secure server of our choosing. Since this is an open-sourced dataset, we can host it on IPFS. Now put the link to the file in the form. Same goes for the Sample file.
iris.csv if you were curious what the dataset looks like.
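If you'd like to sanity-check the file before hosting it, a short sketch like this parses the usual iris.csv layout of four numeric features followed by a species label. The two sample rows and the column names below are illustrative; they can vary between copies of the dataset.

```python
import csv
import io

# Two rows in the usual iris.csv layout (illustrative values)
sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)

rows = list(csv.DictReader(sample))
for row in rows:
    # The four measurements are the model's inputs; species is the label
    features = [float(row[k]) for k in
                ("sepal_length", "sepal_width", "petal_length", "petal_width")]
    print(features, "->", row["species"])
```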
Select “Compute” as the Access Type for this dataset. This means a user isn’t buying the dataset itself for download. They are buying the results of a compute job. Timeout sets how long the buyer can still view the results.
Datatoken Name & Symbol are auto generated (probably by AI). You can cycle through various combinations until you get one you like. Author is typically the dataset publisher.
Tags allow other people to discover your datasets.
Read the Terms & Conditions, click agree, and hit Submit.
4. Confirming Transactions
Clicking submit from the previous step initiates the blockchain interactions. Confirm the transaction. Note that this is creating a Data Token.
After the above transaction is confirmed on the network, a second transaction automatically pops up in MetaMask. It’s a contract interaction. Please confirm the transaction to complete publishing the dataset.
If everything went well, you’ll get a message indicating that the dataset was successfully published. Now click “Go to Data Set”
5. Create Pricing
Initially, the dataset has no price and we need to set one. Click “Create Pricing”
By default, this brings us to configuring the dataset with dynamic pricing. There are many advantages to this, like letting the market decide the value of the data; you can read more detail in section A2 on approaches to pricing data dynamically.
For simplicity and the purposes of this walk-through, let’s click “Fixed” to set our price for accessing this data set.
Let’s set the price to be 1 OCEAN.
The Community Fee is currently hardcoded at 0.1% and can’t be changed. It goes to Ocean DAO for teams to improve the tools, build apps, do outreach, and more. A small fraction is used to burn OCEAN. This fee is collected when downloading or using an asset in a compute job.
The Marketplace Fee is also currently hardcoded at 0.1% and can’t be changed. It goes to the marketplace owner that is hosting and providing the marketplace and is collected when downloading or using an asset in a compute job. In Ocean Market, it is treated as network revenue that goes to the Ocean community.
Now click “Create Pricing”
Confirm the transaction to mint the token.
Confirm another transaction to create the exchange (ability for someone to purchase the dataset).
Confirm another transaction to allow the marketplace to use the minted datatokens.
If everything went well, you’ll get a message indicating that a fixed price was created for the dataset successfully. Click “Reload Page”
After learning how to publish a dataset, one may want to perform some kind of computation on that data. This is the innovation of Ocean’s Compute-to-Data. Users can easily select an algorithm to start a compute job. Let’s walk through how to publish a machine learning algorithm that can use the dataset we previously published. The algorithm of choice? You guessed it, Logistic Regression.
These are simple form fields. Title and Description are used all over Ocean Market to reference the algorithm to be published.
File is a little more complicated. Instead of uploading a file of the Logistic Regression algorithm to Ocean Market directly, we will need to host the file on a secure server of our choosing. We can host it on IPFS. Now put the link to the file in the form.
Docker Image provides compute-to-data with the environment for the algorithm to run on. We’ve written Logistic Regression in Python so we will select “python:latest”.
Timeout is how long buyers are able to download the results of the algorithm again after the initial run.
Datatoken Name & Symbol are auto generated (probably by AI). You can cycle through various combinations until you get one you like.
Click the checkmark in Algorithm Privacy if you would like to keep your machine learning algorithm private. If left unchecked, any user who kicks off a Compute-to-Data job will be able to read and download this algorithm.
Author is typically the algorithm publisher.
Tags allow other people to discover your algorithm.
Read the Terms & Conditions, click agree, and hit Submit.
4. Confirming Transactions
Clicking submit from the previous step initiates the blockchain interactions. Confirm the transaction. Note that this is creating a Data Token.
After the above transaction is confirmed on the network, a second transaction automatically pops up in MetaMask. It’s a contract interaction. Please confirm the transaction to complete publishing the algorithm.
If everything went well, you’ll get a message indicating that the algorithm was successfully published. Now click “Go to Algorithm”
5. Create Pricing
Initially, the algorithm has no price and we need to set one. Click “Create Pricing”
By default, this brings us to configuring the algorithm with dynamic pricing. There are many advantages to this, like letting the market decide the value of the algorithm; you can read more detail in section A2 on approaches to pricing dynamically.
For simplicity and the purposes of this walk-through, let’s click “Fixed” to set our price for accessing this algorithm.
Let’s set the price to be 1 OCEAN.
The Community Fee is currently hardcoded at 0.1% and can’t be changed. It goes to Ocean DAO for teams to improve the tools, build apps, do outreach, and more. A small fraction is used to burn OCEAN. This fee is collected when downloading or using an asset in a compute job.
The Marketplace Fee is also currently hardcoded at 0.1% and can’t be changed. It goes to the marketplace owner that is hosting and providing the marketplace and is collected when downloading or using an asset in a compute job. In Ocean Market, it is treated as network revenue that goes to the Ocean community.
Now click “Create Pricing”
Confirm the transaction to mint the token.
Confirm another transaction to create the exchange (ability for someone to purchase the dataset).
Confirm another transaction to allow the marketplace to use the minted datatokens.
If everything went well, you’ll get a message indicating that a fixed price was created for the algorithm successfully. Click “Reload Page”
6. Allowing The Algorithm to Run on a Dataset
Now we can see that Logistic Regression is published on Ocean Market. However, we can’t actually use it yet because no datasets have allowed this algorithm to run on them.
Let’s now allow Logistic Regression to run on this dataset. Click on “Edit Compute Settings”
Select the algorithm “Logistic Regression v1.0”, which we just published.
Click Submit and confirm the transaction.
If all went well, you’ll see that the dataset was successfully updated to allow Logistic Regression to run on it. Click Close.
Confirm in the Iris Flower Dataset that Logistic Regression v1.0 is now an algorithm allowed to run on the dataset.
7. The Algorithm is Now Live and Usable
Go back to the algorithm that you published on Ocean Market. Now we can see the Logistic Regression algorithm is all ready for a compute job on Ocean Market Mainnet for 1 OCEAN!
The next walk-through will go over how to run a compute-to-data job using Logistic Regression on the Iris Flower Dataset.
6. Walk-Through: Logistic Regression on Iris
Imagine you’re building a hot new app that can detect flowers and display their names & information in a beautiful augmented reality experience for your users. The Louvre is holding a special garden event featuring just the Iris Flower, but you don’t support it yet. What are you going to do?!? You’ll need some sort of Iris classifier.
Luckily, the teams at Raven and Ocean previously published the popular IRIS dataset. Let’s walk through opening the algorithm on Ocean Market, opening the dataset, selecting the algorithm, starting the compute job, confirming the transaction, viewing the history of compute jobs, and getting the results.
To train an algorithm, you need to open the dataset compatible with the algorithm. We have published open-source datasets like Iris Flower Dataset, Wine Quality Dataset, Boston House Pricing Dataset, and MNIST Dataset for you to have a play. For this article, we will use Iris Flower Dataset: https://market.oceanprotocol.com/asset/did:op:9dBC27177aC4A056cE9d834c1c28Ce216C3bb525
IRIS Flower Dataset on Ocean Market. Note Logistic Regression is a supported algorithm for this dataset.
3. Select Algorithm
For this article, we will use Logistic Regression. Select Logistic Regression from the list of algorithms. Otherwise, you can choose whichever allowed algorithm you want for the current dataset.
Select Logistic Regression v1.0. That big “Buy Compute Job” button will be clicked next.
4. Start Compute Job
Finally, click on the “BUY COMPUTE JOB” button to start a compute job with Logistic Regression running on Iris Flower Dataset.
Status indicating the compute job is running.
5. Confirm/Validate Transaction
The next step is to validate the transaction. It will open the MetaMask transaction dialog to confirm the transaction, and the dialog will look like the below screenshot. You can validate the total amount, gas fee, and other charges there. If all looks good, hit confirm.
Metamask Transaction Confirmation
6. View Compute Jobs History
After you have confirmed the transaction, the compute job will take some time to complete. You can see the status of your compute job in the History section under compute jobs.
Once the compute job is finished, click on “SHOW DETAILS” to access the results and log files.
7. Get Results
And voila! A pop up shows a link to view the results of the finished job.
Click Get Results.
Download the log file and the results file to access the predictions. It will contain:
Predicted values
Accuracy
Confusion matrix
Precision, recall, and other scores
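For intuition, here's a sketch of how those numbers are derived from predicted vs. true labels. The labels below are made up for illustration; the actual results file is produced by the published algorithm, which would typically use a library like scikit-learn for these metrics.

```python
# True and predicted labels for a toy 2-class run (illustrative values only)
y_true = ["Cat", "Cat", "Dog", "Dog", "Dog", "Cat"]
y_pred = ["Cat", "Dog", "Dog", "Dog", "Cat", "Cat"]

# Accuracy: fraction of predictions that match the true label
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Confusion matrix: counts of (true class, predicted class) pairs
labels = ["Cat", "Dog"]
confusion = {(t, p): 0 for t in labels for p in labels}
for t, p in zip(y_true, y_pred):
    confusion[(t, p)] += 1

# Precision and recall for the "Dog" class
tp = confusion[("Dog", "Dog")]  # true positives
fp = confusion[("Cat", "Dog")]  # false positives
fn = confusion[("Dog", "Cat")]  # false negatives
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy)   # 4 of 6 correct
print(confusion)
print(precision, recall)
```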
7. Walk-Through: Sections 4, 5, and 6 via Code
This section is for the devs! It walks you through everything done in Sections 4, 5, and 6 via console / command line / Python code. This will be a relatively short section since all the concepts have been covered in Sections 4, 5, and 6.
Here are the steps:
Setup
Publish Iris Flower Dataset
Publish Logistic Regression Algo
Allow the Logistic Regression algorithm to run on the Iris Flower Dataset
Run Logistic Regression on Iris
1. Setup
To get started with these walk-throughs via code, we will first need to run Ocean Barge services and install the Ocean.py library.
Start a new console which will be dedicated to running Ocean Barge services. Note that this will require Docker.
Start another new console window, which we will use for the remainder of the walk-through via code.
# Install the ocean.py library. Install wheel first to avoid errors.
pip install wheel
pip install ocean-lib
Set environment variables.
# Set private keys of two accounts
export TEST_PRIVATE_KEY1=0x5d75837394b078ce97bc289fa8d75e21000573520bfa7784a9d28ccaae602bf8
export TEST_PRIVATE_KEY2=0xef4b441145c1d0f3b4bc6d61d29f5c6e502359481152f869247c7a4244d45209

# Set the address file (only needed for ganache)
export ADDRESS_FILE=~/.ocean/ocean-contracts/artifacts/address.json
2. Publish Iris Flower Dataset

# Specify metadata & service attributes for Iris Flower dataset.
# It's specified using _local_ DDO metadata format; Aquarius will convert it to remote
# by removing `url` and adding `encryptedFiles` field.
# Note: `ocean`, `alice_wallet`, and `DATA_datatoken` come from earlier setup steps
# elided in this excerpt.
DATA_metadata = {
    "main": {
        "type": "dataset",
        "files": [
            {
                "url": "https://www.example.com/path/to/dataset/file",
                "index": 0,
                "contentType": "text/text"
            }
        ],
        "name": "Iris Flower Dataset",
        "author": "Ocean Protocol & Raven Protocol",
        "license": "MIT",
        "dateCreated": "2019-12-28T10:55:11Z"
    }
}
DATA_service_attributes = {
    "main": {
        "name": "DATA_dataAssetAccessServiceAgreement",
        "creator": alice_wallet.address,
        "timeout": 3600 * 24,
        "datePublished": "2019-12-28T10:55:11Z",
        "cost": 1.0,  # <don't change, this is obsolete>
    }
}

# Set up a service provider. We'll use this same provider for ALG
from ocean_lib.data_provider.data_service_provider import DataServiceProvider
provider_url = DataServiceProvider.get_url(ocean.config)  # returns "http://localhost:8030"

# Calc DATA service compute descriptor
from ocean_lib.common.agreements.service_factory import ServiceDescriptor
DATA_compute_service_descriptor = ServiceDescriptor.compute_service_descriptor(
    DATA_service_attributes, provider_url)

# Publish metadata and service info on-chain
DATA_ddo = ocean.assets.create(
    metadata=DATA_metadata,
    publisher_wallet=alice_wallet,
    service_descriptors=[DATA_compute_service_descriptor],
    data_token_address=DATA_datatoken.address)
print(f"DATA did = '{DATA_ddo.did}'")
3. Publish Logistic Regression Algo
From the same Python console as above, we can publish the Logistic Regression Algorithm.
# Calc ALG service access descriptor. We use the same service provider as DATA.
# Note: ALG_metadata and ALG_service_attributes are defined analogously to the
# DATA versions above (elided here).
ALG_access_service_descriptor = ServiceDescriptor.access_service_descriptor(
    ALG_service_attributes, provider_url)
# returns ("algorithm",
#          {"attributes": ALG_service_attributes, "serviceEndpoint": provider_url})

# Publish metadata and service info on-chain
ALG_ddo = ocean.assets.create(
    metadata=ALG_metadata,  # {"main" : {"type" : "algorithm", ..}, ..}
    publisher_wallet=alice_wallet,
    service_descriptors=[ALG_access_service_descriptor],
    data_token_address=ALG_datatoken.address)
print(f"ALG did = '{ALG_ddo.did}'")
4. Allow the Logistic Regression algorithm to run on the Iris Flower Dataset
From the same Python console as above, we need to allow the algorithm to run on a dataset.
from ocean_lib.assets import utils
utils.add_publisher_trusted_algorithm(DATA_ddo, ALG_ddo.did, config.metadata_cache_uri)
ocean.assets.update(DATA_ddo, publisher_wallet=alice_wallet)
5. Run Logistic Regression on Iris
DATA_did = DATA_ddo.did  # for convenience
ALG_did = ALG_ddo.did
# Make sure we operate on the updated and indexed metadata_cache_uri versions
DATA_DDO = ocean.assets.resolve(DATA_did)
ALG_DDO = ocean.assets.resolve(ALG_did)
# Once you get {'ok': True, 'status': 70, 'statusText': 'Job finished'},
# Alice can check the result of the job.
# (`job_id` was returned when the compute job was started; that call is
# elided in this excerpt.)
result = ocean.compute.result_file(DATA_did, job_id, 0, alice_wallet)
8. Conclusion and Next Steps
Now that we’ve gone through publishing a dataset, publishing an algorithm, allowing that algo to run on the dataset, and also kicking off a compute job, we can see how powerful Ocean Compute-to-Data is.
It works great for Machine Learning engineers and Data Scientists alike via the Ocean Python library. Compute-to-Data also works great, through the marketplace UI, for those who don’t read or write Python code.
Publishing algorithms developed by the Raven Protocol team on Ocean Market means that many more people in the world have easy access to what we’re building!
Look out for more algorithms on the marketplace next. The community will be able to fire them up with Compute-to-Data thanks to this amazing collaboration.