BigMLer - A command-line tool for BigML’s API
BigMLer makes BigML even easier.
BigMLer wraps BigML’s API Python bindings to offer a high-level command-line script that lets you easily create and publish datasets and models, create ensembles, make local predictions from multiple models and clusters, and simplify many other machine learning tasks.
BigMLer is open sourced under the Apache License, Version 2.0.
Requirements
BigMLer requires Python 3.8 or higher. Compatibility with Python 2.X was discontinued in version 3.27.2.
BigMLer requires bigml 9.7.1 or higher, which contains the bindings that let you use the BigML platform to create, update, get and delete resources, and also to produce local predictions using the models created in BigML. Most of these models can be used locally with the basic installation, but some additional dependencies are needed to use local Topic Models to produce Topic Distributions. These can be installed using:
pip install bigmler[topics]
The bindings also support local predictions for models generated from images. To use these models, an additional set of libraries needs to be installed using:
pip install bigmler[images]
The external libraries used in this case exist for the majority of recent operating system versions. Still, some of them might need specific compiler versions or DLLs, so their installation may require additional setup effort.
The full set of libraries can be installed using:
pip install bigmler[full]
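The version requirements above can be checked from Python before installing BigMLer. What follows is only a minimal sketch: it assumes the bindings are installed under the distribution name `bigml`, and the helper names `version_tuple` and `check_requirements` are hypothetical, not part of BigMLer or the bindings.

```python
import sys
from importlib import metadata

# Minimum versions taken from the Requirements section.
MIN_PYTHON = (3, 8)
MIN_BIGML = (9, 7, 1)


def version_tuple(version):
    """Turn a dotted version string such as '9.7.1' into a tuple of ints.

    Non-numeric suffixes (for example 'rc1') are ignored. This is a
    simplification, not a full PEP 440 parser.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_requirements():
    """Return a list of problems found; the list is empty if all is fine."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.8 or higher is required")
    try:
        # Assumption: the bindings' distribution name is "bigml".
        installed = metadata.version("bigml")
    except metadata.PackageNotFoundError:
        problems.append("the bigml bindings are not installed")
    else:
        if version_tuple(installed) < MIN_BIGML:
            problems.append("bigml %s is older than 9.7.1" % installed)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

The check does not cover the optional libraries behind the `topics`, `images` and `full` extras, since their names are not listed here.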
BigMLer Installation
To install the latest stable release with pip:
$ pip install bigmler
You can also install the development version of bigmler directly from the Git repository:
$ pip install -e git+https://github.com/bigmlcom/bigmler.git#egg=bigmler
For detailed installation instructions on Windows, see the BigMLer Install and Authentication on Windows section.
Support for local Topic Distributions (Topic Models’ predictions) and local predictions for datasets that include Images will only be available as extras, because the libraries used for that are not usually available in all Operating Systems. If you need to support those, please check the Installation Extras section.
Installation Extras
Local Topic Distributions support can be installed using:
pip install bigmler[topics]
Images local predictions support can be installed using:
pip install bigmler[images]
The full set of features can be installed using:
pip install bigmler[full]
WARNING: Note that installing these extras can require some extra work, as explained in the Requirements section.
BigML Authentication on Unix or Mac OS
All the requests to BigML.io must be authenticated using your username and API key and are always transmitted over HTTPS.
The BigML module will look for your username and API key in the environment variables BIGML_USERNAME and BIGML_API_KEY respectively. You can add the following lines to your .bashrc or .bash_profile to set those variables automatically when you log in:
export BIGML_USERNAME=myusername
export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
Otherwise, you can pass your credentials directly when running the BigMLer script, as follows:
bigmler --train data/iris.csv --username myusername \
--api-key ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
For detailed authentication instructions on Windows, see the BigMLer Install and Authentication on Windows section.
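The two ways of supplying credentials described above can be illustrated with a short Python sketch that builds the command line for a training run. The function name `build_train_command` is hypothetical and not part of BigMLer; only the `--train`, `--username` and `--api-key` flags and the two environment variable names come from this document.

```python
import os


def build_train_command(train_path, env=None, username=None, api_key=None):
    """Build the argument list for a `bigmler --train` call.

    If both credential variables are set in the environment, no
    credential flags are added. Otherwise the explicit username and
    API key are passed on the command line.
    """
    env = os.environ if env is None else env
    command = ["bigmler", "--train", train_path]
    if env.get("BIGML_USERNAME") and env.get("BIGML_API_KEY"):
        return command
    if not (username and api_key):
        raise ValueError(
            "set BIGML_USERNAME and BIGML_API_KEY, or pass both "
            "username and api_key explicitly"
        )
    return command + ["--username", username, "--api-key", api_key]


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    print(build_train_command("data/iris.csv", env={},
                              username="myusername", api_key="my-api-key"))
```

The resulting list can be handed to `subprocess.run`. Note that credentials passed as flags can end up in the shell history and the process list, so the environment variables are usually the safer choice.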
BigMLer Install and Authentication on Windows
To install BigMLer on Windows environments, you’ll need Python installed. The code has been tested with Python 3.10. You can either create a conda environment with that Python version or download it from Python for Windows and install it. In the latter case, you’ll also need to install the pip tool to install BigMLer.
To install pip, first open a command terminal window (type cmd in the input field that appears when you click on Start, then press Enter). Then you can follow the steps described, for example, in this guide to install its latest version.
Finally, to install BigMLer with its basic capabilities, just type
python -m pip install bigmler
and BigMLer should be installed on your computer or in your conda environment. Then issuing
bigmler --version
should show BigMLer version information.
The BigMLer extras for images are usually not available on Windows, because the libraries those models need are rarely available for that operating system. If your machine learning project involves images, we recommend that you choose a Linux-based operating system.
To start using BigMLer to handle your BigML resources, you need to set your BigML credentials for authentication. If you want them to be permanently stored in your system, use:
setx BIGML_USERNAME myusername
setx BIGML_API_KEY ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
Note that setx will not change the environment variables of your current console, so you will need to open a new one to start using them.
Prior Versions Compatibility Issues
BigMLer will accept flags written with an underscore as word separator, like --clear_logs, for compatibility with prior versions. Also, --field-names is accepted, although the more complete --field-attributes flag is preferred. --stat_pruning and --no_stat_pruning are discontinued, and their effects can be achieved by setting the --pruning flag to the statistical or no-pruning values respectively.
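If you maintain scripts written for older versions, the rules above can be applied mechanically. The following Python sketch is an illustration only: `modernize_args` is a hypothetical helper, not BigMLer's own argument handling, and it assumes that the current spelling of a flag is obtained simply by replacing underscores with hyphens.

```python
# Discontinued flags and the --pruning values that replace them,
# as described in the paragraph above.
REPLACEMENTS = {
    "--stat-pruning": ["--pruning", "statistical"],
    "--no-stat-pruning": ["--pruning", "no-pruning"],
}


def modernize_args(args):
    """Rewrite a list of command-line arguments in the current style.

    Only flag names (arguments starting with '--') are changed, so
    values such as file paths keep their underscores.
    """
    modern = []
    for arg in args:
        if arg.startswith("--"):
            # Leave any '=value' part untouched.
            name, sep, value = arg.partition("=")
            arg = name.replace("_", "-") + sep + value
        modern.extend(REPLACEMENTS.get(arg, [arg]))
    return modern


if __name__ == "__main__":
    old = ["--train", "data/my_iris.csv", "--clear_logs", "--stat_pruning"]
    print(" ".join(["bigmler"] + modernize_args(old)))
```

The sketch leaves --field-names as it is, because moving to --field-attributes involves a different file and cannot be done by renaming the flag.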
Running the Tests
The tests are run using pytest. You’ll need to set up your authentication via environment variables, as explained in the authentication section. Some of the tests also need other environment variables: BIGML_ORGANIZATION to test calls made by Organization members, and BIGML_EXTERNAL_CONN_HOST, BIGML_EXTERNAL_CONN_PORT, BIGML_EXTERNAL_CONN_DB, BIGML_EXTERNAL_CONN_USER, BIGML_EXTERNAL_CONN_PWD and BIGML_EXTERNAL_CONN_SOURCE to test external data connectors.
With that in place, you can run the test suite simply by issuing:
$ pytest
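Before launching the suite, it can help to see which of the environment variables listed above are still unset. The sketch below groups them by purpose. The grouping and the name `missing_variables` are assumptions made for illustration; they are not part of the BigMLer test suite.

```python
import os

# Environment variables named in this section, grouped by what they enable.
REQUIRED_GROUPS = {
    "authentication": ["BIGML_USERNAME", "BIGML_API_KEY"],
    "organization": ["BIGML_ORGANIZATION"],
    "external connectors": [
        "BIGML_EXTERNAL_CONN_HOST",
        "BIGML_EXTERNAL_CONN_PORT",
        "BIGML_EXTERNAL_CONN_DB",
        "BIGML_EXTERNAL_CONN_USER",
        "BIGML_EXTERNAL_CONN_PWD",
        "BIGML_EXTERNAL_CONN_SOURCE",
    ],
}


def missing_variables(env=None):
    """Map each group to its unset variables; complete groups are omitted."""
    env = os.environ if env is None else env
    missing = {}
    for group, names in REQUIRED_GROUPS.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            missing[group] = unset
    return missing


if __name__ == "__main__":
    for group, names in missing_variables().items():
        print("%s tests need: %s" % (group, ", ".join(names)))
```

An empty result means every variable mentioned here is set. It does not verify that the values are valid credentials or reachable hosts.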
BigMLer subcommands
In addition to the simple BigMLer command, which covers the main functionality, there are some additional subcommands:
Usual workflows’ subcommands
bigmler connector: Used to generate external connectors to databases. See Connector subcommand.
bigmler source: Used to generate sources from data files. See Source subcommand.
bigmler dataset: Used to generate datasets from data files, sources and transformations on other datasets. See Dataset subcommand.
bigmler cluster: Used to generate clusters and centroids’ predictions. See Cluster subcommand.
bigmler anomaly: Used to generate anomaly detectors and anomaly scores. See Anomaly subcommand.
bigmler sample: Used to generate samples of data from your existing datasets. See Sample subcommand.
bigmler association: Used to generate association rules from your datasets. See Association subcommand.
bigmler logistic-regression: Used to generate logistic regression models and predictions. See Logistic-regression subcommand.
bigmler linear-regression: Used to generate linear regression models and predictions. See Linear-regression subcommand.
bigmler topic-model: Used to generate topic models and topic distributions. See Topic Model subcommand.
bigmler time-series: Used to generate time series and forecasts. See Time Series subcommand.
bigmler deepnet: Used to generate deepnets and their predictions. See Deepnet subcommand.
bigmler fusion: Used to generate fusions and their predictions. See Fusion subcommand.
bigmler pca: Used to generate PCAs and their projections. See PCA subcommand.
bigmler project: Used to generate and manage projects for organization purposes. See Project subcommand.
Management subcommands
bigmler delete: Used to delete the remotely created resources. See Delete subcommand.
bigmler export: Used to generate the code you need to predict locally with no connection to BigML. See Export subcommand.
Reporting subcommands
bigmler report: Used to generate reports for the analyze subcommand showing the ROC curve and evaluation metrics of cross-validations. See Report subcommand.
Model tuning subcommands
bigmler analyze: Used for feature analysis, node threshold analysis and k-fold cross-validation. See Analyze subcommand.
Scripting subcommands
bigmler reify: Used to generate scripts to reproduce the existing resources in BigML. See Reify subcommand.
bigmler execute: Used to create WhizzML libraries or scripts and execute them. See Execute subcommand.
bigmler whizzml: Used to create WhizzML packages of libraries or scripts based on the information of the metadata.json file in the package directory. See Whizzml subcommand.
bigmler retrain: Used to retrain models by adding new data to the existing datasets and building a new model from it. See Retrain subcommand.
BigML Development Mode
The Sandbox environment that could be reached by using the flag --dev has been deprecated. Right now, there’s only one mode to work with BigML: the previous Production Mode, so the flag is no longer available.
Using BigMLer
To run BigMLer you can use the console script directly. The --help option will describe all the available options:
bigmler --help
Alternatively, you can just call bigmler as follows:
python bigmler.py --help
This will display the full list of optional arguments. You can read a brief explanation for each option below.
Building the Documentation
Install the tools required to build the documentation:
$ pip install sphinx
$ pip install sphinx-rtd-theme
To build the HTML version of the documentation:
$ cd docs/
$ make html
Then launch docs/_build/html/index.html in your browser.
Additional Information
For additional information, see the full documentation for the Python bindings on Read the Docs. For more information about BigML’s API, see the BigML developer’s documentation.
Support
Please report problems and bugs to our BigML.io issue tracker.
Discussions about the different bindings take place in the general BigML mailing list.
How to Contribute
Please follow these steps:
Fork the project on GitHub.
Create a new branch.
Commit changes to the new branch.
Send a pull request.
For details on the underlying API, see the BigML API documentation.