Prepare, explore, visualize and create Machine Learning models for Big Data with the fastest open source library on the planet.
pip install optimuspyspark
What can Optimus do for you
Load and save Excel, CSV, JSON, Parquet, and Avro files. Get data from and insert data into MySQL, Redshift, SQL Server, Postgres, Oracle, Cassandra, and Presto.
# Put your db credentials here (assumes `op` is an existing Optimus instance)
db = op.connect(
    driver="mysql",
    host="165.227.196.70",
    database="optimus",
    user="test",
    password="test")

# Convert a table to a dataframe
db.table_to_df("test_data").table()
In a little more than 10 lines you can remove whitespace and accents in all columns, lowercase all column data, drop a "dummyCol" column, transform a date format, sort a column, convert integers to a "string", and replace "taco" with "taaaccoo" and "pizza" with "pizzza".
# This is a custom function
def func(value, arg):
    return "this was a number"

# Date patterns use "yyyy" (calendar year); "YYYY" would mean the week-based year
new_df = df\
    .rows.sort("rank", "desc")\
    .withColumn('new_age', df.age)\
    .cols.lower(["names", "function"])\
    .cols.date_transform("date arrival", "yyyy/MM/dd", "dd-MM-yyyy")\
    .cols.years_between("date arrival", "dd-MM-yyyy", output_cols="from arrival")\
    .cols.remove_accents("names")\
    .cols.remove_special_chars("names")\
    .rows.drop(df["rank"] > 8)\
    .cols.rename(str.lower)\
    .cols.trim("*")\
    .cols.unnest("japanese name", output_cols="other names")\
    .cols.unnest("last position seen", separator=",", output_cols="pos")\
    .cols.drop(["last position seen", "japanese name", "date arrival", "cybertronian", "nulltype"])
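To make concrete what a few of these cleaning steps do to a single value, here is a minimal plain-Python sketch. It does not use Optimus or Spark; the helper names (`strip_accents`, `strip_special_chars`, `clean_value`) are hypothetical and only approximate what `remove_accents`, `remove_special_chars`, `trim`, and `lower` are described as doing above. The library's exact rules may differ.

```python
import unicodedata


def strip_accents(text):
    # Decompose accented characters (e.g. "é" -> "e" + combining mark),
    # then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_special_chars(text):
    # Keep only letters, digits, and whitespace (an assumed definition
    # of "special characters" for this sketch).
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


def clean_value(text):
    # Hypothetical helper: accents -> special chars -> trim -> lowercase.
    return strip_special_chars(strip_accents(text)).strip().lower()


if __name__ == "__main__":
    for raw in ["  Mégatron! ", "Optim'us", " bumbl#ebée  "]:
        print(repr(raw), "->", repr(clean_value(raw)))
```

In the chained example, each of these steps is applied to a whole column across the dataframe instead of to one string at a time.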
In tandem with Bumblebee, Optimus lets you visualize histograms and frequency plots and check nulls, missing values, and zeros from an easy-to-use interface.
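As a rough illustration of the kind of per-column summary that null, zero, and frequency checks rest on, here is a small plain-Python sketch. It is not Optimus or Bumblebee code; `profile_column` is a hypothetical helper and the sample values are made up for the example.

```python
from collections import Counter


def profile_column(values):
    # Hypothetical helper: summarize one column held as a Python list.
    nulls = sum(1 for v in values if v is None)
    # Count numeric zeros only; booleans are excluded on purpose.
    zeros = sum(
        1
        for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool) and v == 0
    )
    # Frequency of each non-null value, the data behind a frequency plot.
    frequency = Counter(v for v in values if v is not None)
    return {"nulls": nulls, "zeros": zeros, "frequency": frequency}


if __name__ == "__main__":
    rank = [10, 0, None, 10, 7, 0, 10]  # made-up sample column
    summary = profile_column(rank)
    print("nulls:", summary["nulls"])
    print("zeros:", summary["zeros"])
    print("most common:", summary["frequency"].most_common(2))
```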
All you need to handle your data in one place.