pip install pyoptimus

What can Optimus do for you?

  • Easy to write and easy to read.

    Not tech-savvy? Not a problem! Write transformation instructions in plain English.
  • No matter your data size.

    Process a small dataset on your laptop or use a cluster to process big data. You can use Pandas, Dask, cuDF, Dask-cuDF, Vaex, or Spark to process your data.
  • Open-source

    Download and use it anywhere. (Apache 2.0 License)

Easy API

In little more than 10 lines you can remove whitespace and accents from all columns, lowercase all column data, drop a "dummyCol" column, change a date format, sort a column, convert integers to strings, and replace "taco" with "taaaccoo" and "pizza" with "pizzza".

[Image: Optimus transformation example]
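
To make the listed steps concrete, here is a minimal sketch of the same transformations in plain Python, using only the standard library. It is not Optimus code and does not use the Optimus API; the column names (`id`, `name`, `food`, `date`), the sample rows, and the input and output date formats are all assumptions chosen for illustration.

```python
import unicodedata
from datetime import datetime


def remove_accents(text):
    # Decompose characters (NFKD) and drop the combining accent marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def clean_row(row):
    cleaned = {}
    for key, value in row.items():
        if key == "dummyCol":
            continue  # drop the unwanted column
        if isinstance(value, str):
            # trim whitespace, remove accents, lowercase
            value = remove_accents(value.strip()).lower()
        cleaned[key] = value
    # change the date format (assumed: YYYY/MM/DD in, DD-MM-YYYY out)
    parsed = datetime.strptime(cleaned["date"], "%Y/%m/%d")
    cleaned["date"] = parsed.strftime("%d-%m-%Y")
    # convert the integer column to a string
    cleaned["id"] = str(cleaned["id"])
    # replace values
    cleaned["food"] = (
        cleaned["food"].replace("taco", "taaaccoo").replace("pizza", "pizzza")
    )
    return cleaned


# Hypothetical sample data.
rows = [
    {"id": 2, "name": "  José ", "food": "Pizza", "date": "2021/03/05", "dummyCol": "x"},
    {"id": 1, "name": "ÁNA", "food": " taco", "date": "2020/12/31", "dummyCol": "y"},
]

# Clean every row, then sort by the "name" column.
result = sorted((clean_row(r) for r in rows), key=lambda r: r["name"])
```

Doing this by hand takes one function per step; the point of the paragraph above is that Optimus aims to express each step as a single readable call.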

Connect to files and databases

Load and save Excel, CSV, JSON, Parquet, and Avro files locally or remotely. Read from and write to MySQL, Redshift, SQL Server, Postgres, Oracle, Cassandra, and Presto.

[Image: Optimus loading database example]

Advanced features

All you need to handle your data in one place.
  • Outlier Detection

    Easily detect outliers with out-of-the-box functions.
  • Machine Learning

    Apply linear regression, logistic regression or K-means easily.
  • String Clustering

    Cluster similar strings and replace them with a single value.
  • NLP functions

    Stem and lemmatize verbs, tokenize strings, count words, remove diacritics, expand contracted words, and more.
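
As an illustration of the string clustering idea, the sketch below groups spellings that share a normalized "fingerprint" key and maps each one to the most frequent spelling in its group. Key-collision fingerprinting is one common technique for this task; whether Optimus uses exactly this method is an assumption, and the function names `fingerprint` and `cluster_strings` are hypothetical, not part of the Optimus API.

```python
import string
import unicodedata
from collections import Counter, defaultdict


def fingerprint(value):
    # Strip accents, lowercase, drop punctuation, then sort the unique tokens
    # so that word order and minor formatting differences do not matter.
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_accents.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster_strings(values):
    # Group values by fingerprint, then map every member of a group
    # to the group's most frequent spelling.
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    mapping = {}
    for members in groups.values():
        canonical = Counter(members).most_common(1)[0][0]
        for member in members:
            mapping[member] = canonical
    return mapping


# Hypothetical sample column.
cities = ["New York", "new york", "York, New", "New York", "Boston", "boston."]
mapping = cluster_strings(cities)
normalized = [mapping[c] for c in cities]
```

Here the first four entries collapse to "New York", and the two Boston variants collapse to a single spelling.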

  • How is Optimus different from Pandas, Dask, cuDF, etc.?

    Think of Optimus as a universal way to access many of the dataframe technologies available in Python. Optimus can work with Pandas, Dask, Spark, Vaex, cuDF, and Dask-cuDF as backends.

  • Why so many data processing engines?

    Although most dataframe APIs try to mimic Pandas, there are always small differences in the way these dataframes work. With Optimus, we want to let you write your code and then use whatever technology and infrastructure is available to you to process your data.

  • How can Optimus use CPUs and GPUs?

    For CPU, Optimus can use Pandas, Dask, Spark, or Vaex. For GPUs, Optimus relies on cuDF and Dask-cuDF.

  • I'm not sure I understand the difference between Pandas, Dask, cuDF... and Optimus. Can you explain further?

    Optimus focuses on giving you the best tools for all your data processing needs, from data quality and plotting to parsing dates, URLs, and emails and preparing text for NLP.

    Optimus gives you the best performance, so you don't have to reinvent the wheel.
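
The "one API, many engines" idea in the answers above can be sketched with a toy example. This is not how Optimus is implemented: `ListEngine`, `LazyEngine`, `make_engine`, and `run_pipeline` are hypothetical names, and the two toy backends only stand in for real engines such as Pandas or Dask. The point is that the pipeline code stays the same while the engine is chosen by name.

```python
class ListEngine:
    """Toy eager backend: a column is a plain Python list."""

    def load(self, values):
        return list(values)

    def lower(self, column):
        return [v.lower() for v in column]

    def collect(self, column):
        return list(column)


class LazyEngine:
    """Toy lazy backend: work is deferred with generators until collect()."""

    def load(self, values):
        return iter(values)

    def lower(self, column):
        return (v.lower() for v in column)

    def collect(self, column):
        return list(column)


ENGINES = {"list": ListEngine, "lazy": LazyEngine}


def make_engine(name):
    # Pick a backend by name; unknown names fail loudly.
    if name not in ENGINES:
        raise ValueError(f"unknown engine: {name}")
    return ENGINES[name]()


def run_pipeline(engine, values):
    # The same pipeline code runs unchanged on any backend.
    column = engine.load(values)
    return engine.collect(engine.lower(column))


eager_result = run_pipeline(make_engine("list"), ["Taco", "PIZZA"])
lazy_result = run_pipeline(make_engine("lazy"), ["Taco", "PIZZA"])
```

Both calls produce the same result; only the engine behind the pipeline differs.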

