diff --git a/.gitignore b/.gitignore index a463abe..566e2e3 100644 --- a/.gitignore +++ b/.gitignore @@ -1,4 +1,5 @@ .DS_Store .vscode +.env uv.lock __pycache__/ diff --git a/concepts/diffs.mdx b/concepts/diffs.mdx index 650f767..963a81d 100644 --- a/concepts/diffs.mdx +++ b/concepts/diffs.mdx @@ -69,7 +69,7 @@ The `..` separator can be used to specify commit ranges, making it easy to compa # Pick Your Tooling -All the functionality below is available through the [🖥️ Command Line](/getting-started/cli), [🦀 Rust Library](https://crates.io/crates/liboxen), [🐍 Python Library](/getting-started/python), as well as the [🌎 Web Interface](https://oxen.ai). This guide will focus on the command line tooling, but the same principles apply to the other interfaces. +All the functionality below is available through the [🖥️ Command Line](/getting-started/command-line/start_repository), [🦀 Rust Library](https://crates.io/crates/liboxen), [🐍 Python Library](/python-api), as well as the [🌎 Web Interface](https://oxen.ai). This guide will focus on the command line tooling, but the same principles apply to the other interfaces. Using the [Oxen.ai Hub](https://oxen.ai) you can quickly visualize and navigate the changes in your datasets with an easy to use interface. Sign up for free 👉 [here](https://oxen.ai/register). diff --git a/concepts/feature-updates.mdx b/concepts/feature-updates.mdx index e179cad..020f0ec 100644 --- a/concepts/feature-updates.mdx +++ b/concepts/feature-updates.mdx @@ -34,7 +34,7 @@ This page will have a complete update with demos, photos, and top feature update - Introduced the option to delete branches from the remote repository. [#80](https://github.com/Oxen-AI/Oxen/pull/80) - Improved workspace management to prevent multiple simultaneous workspaces and ensure consistent handling during file additions. [#80](https://github.com/Oxen-AI/Oxen/pull/80) - Enhanced branch existence checks and streamlined branch deletion. 
[#80](https://github.com/Oxen-AI/Oxen/pull/80) -- Introduced notebook management capabilities, allowing users to start and stop notebook sessions within Oxen repositories via [Python](/getting-started/python). [#79](https://github.com/Oxen-AI/Oxen/pull/79) +- Introduced notebook management capabilities, allowing users to start and stop notebook sessions within Oxen repositories via [Python](/python-api). [#79](https://github.com/Oxen-AI/Oxen/pull/79) - Added a new Python module for notebook operations and exposed these features in the public API. [#79](https://github.com/Oxen-AI/Oxen/pull/79) - Enhanced error handling and user feedback when initializing workspaces and executing SQL queries. [#79](https://github.com/Oxen-AI/Oxen/pull/79) - Improved performance for adding multiple files by batching operations. [#79](https://github.com/Oxen-AI/Oxen/pull/79) @@ -80,7 +80,7 @@ This page will have a complete update with demos, photos, and top feature update - Added Support for Model Deprecation - Allow import of hugging face datasets into Server - Added a `copy prompt` button to entries in the eval list view -- For [Notebooks](/getting-started/notebooks) added branch indicator and now respects branches with slashes +- For Notebooks added branch indicator and now respects branches with slashes - Now able to merge changes on the server where there are no file conflicts (for example: a notebook is staged and a data frame is staged) - GitHub Actions for python windows - Expanded row-view component to make viewing complex content easier @@ -114,7 +114,7 @@ This page will have a complete update with demos, photos, and top feature update - Added a new error handling mechanism for data retrieval methods. 
[#77](https://github.com/Oxen-AI/Oxen/pull/77) #### By Customer Request -- Added how to save a parquet file to a directory, and errors with to_csv in [Python](/getting-started/python) by [Mirascope's](https://www.oxen.ai/Mirascope) request +- Added how to save a parquet file to a directory, and errors with to_csv in [Python](/python-api) by [Mirascope's](https://www.oxen.ai/Mirascope) request - Added method to test if a branch already exists - Create data frame if it does not exist on the first insert @@ -175,11 +175,11 @@ This page will have a complete update with demos, photos, and top feature update #### By Customer Request - [@isaac_pivotal](https://www.oxen.ai/isaac_pivotal) requested the audio player in the UI can now be scrolled through ![audio player](/images/audio_player.png) -- [@isaac_pivotal](https://www.oxen.ai/isaac_pivotal) requested a [Python API](getting-started/python) to upload bytes +- [@isaac_pivotal](https://www.oxen.ai/isaac_pivotal) requested a [Python API](/python-api) to upload bytes - [@isaac_pivotal](https://www.oxen.ai/isaac_pivotal) requested to add a generator for paginating over a file list -- We've added a method to fetch file metadata (hash, etc) through our [Python SDK](/getting-started/python) -- [@Mirascope](https://www.oxen.ai/Mirascope) requested named [workspaces](/concepts/workspaces) -- We've added a method for checking if local file matches remote file in the [Python SDK](/getting-started/python) +- We've added a method to fetch file metadata (hash, etc) through our [Python SDK](/python-api) +- [@Mirascope](https://www.oxen.ai/Mirascope) requested named [workspaces](/getting-started/workspaces) +- We've added a method for checking if local file matches remote file in the [Python SDK](/python-api) ### Fixed Bugs - We now have better handling on commit view in the UI @@ -193,14 +193,14 @@ This page will have a complete update with demos, photos, and top feature update - New [Model Description 
Page](https://www.oxen.ai/ai/models/deepseek-r1)! Click on any model in [The Model Page](https://www.oxen.ai/ai/models) to learn what to use it for, stats, and compare prices. ![Model Description Page](/images/model_description_page.png) -- You can now download your SQL query results within [workspaces](/http-api/workspaces/query_dataframe) and in the [UI](https://oxen.ai)! +- You can now download your SQL query results within [workspaces](/http-api/data-frames/get-data-frame-slice) and in the [UI](https://oxen.ai)! ![download SQL query results](/images/download_sql_query.png) - We cleaned up [Oxen.ai's](https://www.oxen.ai) navigation bar:) -- The [Query Data Frame API](/http-api/workspaces/query_dataframe) now accepts a query id for downloading a SQL queried data frame in a workspace +- The [Query Data Frame API](/http-api/data-frames/get-data-frame-slice) now accepts a query id for downloading a SQL queried data frame in a workspace - As per request (shout out [Paul](https://www.oxen.ai/paul)), we've support PUT to a file with a `based-on` revision passed in - Added --remote to `oxen workspace list` -- Added ability to merge with the [python library](/getting-started/python) +- Added ability to merge with the [python library](/python-api) - By customer request we've slimmed down Python dependencies with FSSpec ### Fixed Bugs @@ -245,13 +245,13 @@ This page will have a complete update with demos, photos, and top feature update - A repo is created when a file is uploaded - We moved error banner in [Model Inference](https://oxen.ai/ai/models) above the dataframe - We've added to the [HTTP API docs](/http-api) -- You can now ask Text2SQL queries throught the workspaces API and [download results](/http-api/workspaces/query_dataframe) +- You can now ask Text2SQL queries through the workspaces API and [download results](/http-api/data-frames/get-data-frame-slice) ### Fixed Bugs - Shout out [Mund](https://www.oxen.ai/MundVetter) for letting us know Model Inference
crashed and wasn't showing the commit message. Fixed! :) - when an ID is `0`, the sidbar will not show `Null` but `0` - The target branch stays the target branch when running an evaluation from the completed evaluation page (it used to use the evaluation ID as the target branch) -- You can now have several [workspaces](/concepts/workspaces) with the same commit ID +- You can now have several [workspaces](/getting-started/workspaces) with the same commit ID - Columns used to sometimes show up twice...not anymore:) @@ -284,7 +284,7 @@ This page will have a complete update with demos, photos, and top feature update - We removed "Versions" Tab from Top Nav - There is now a Commit button directly on the data frame - You now have more advanced commit options for [editable data frames](/features/labeling_data) -- We now list [workspace API](/http-api/workspaces/list_workspaces) so admins can see all workspaces +- We now list [workspace API](/http-api/workspaces/list-workspaces) so admins can see all workspaces - There is now a Create Query API - Auto detect image file type with Rust for images that do not end in .png or .jpg @@ -329,7 +329,7 @@ This page will have a complete update with demos, photos, and top feature update /> - Now able to `oxen add`, `oxen add schema` metadata, and `oxen rm` in a subdirectory -- Enabled pushing, pulling, merging of [subtrees](/features/workspaces#remote-workspaces) +- Enabled pushing, pulling, merging of [subtrees](/getting-started/workspaces#remote-workspaces) - -```bash CLI -oxen init -oxen config --set-remote origin https://hub.oxen.ai/ox/ImageNet-1k -oxen workspace create --name extra_images # Create a workspace on a repository without pulling its contents -oxen workspace add new_images/ --workspace-name extra_images -oxen workspace commit -m "Add new images to dataset" -n extra_images -``` - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/ImageNet-1k") # Host defaults to 
'hub.oxen.ai' -workspace = Workspace(repo, "extra-images") -workspace.add("new_images/") - -status = workspace.status() # View the changes staged to the workspace -print(status.added_files()) - -workspace.commit("Add new images to dataset") -``` - - -They're also useful for doing imports of large amounts of files to a server. When you use `oxen workspace add`, you're not writing any data to your local machine – Oxen processes and uploads the files directly to the remote. This allows you to avoid copying the files to a local repository like you would if you use the `add --> commit --> push` workflow, speeding up the operation and saving space on your machine - - - -```bash CLI -oxen init -oxen config --create-remote --host hub.oxen.ai --scheme https --name ox/ImageNet-1k --add_readme -oxen workspace create # Create a workspace to import the data. This will return a workspace ID -oxen workspace add images/ --workspace-id [WORKSPACE_ID] -oxen workspace commit -m "Import 1 million images" -w [WORKSPACE_ID] -``` - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/ImageNet-1k", files="README.md") -workspace = Workspace(repo) -workspace.add("images/") - -status = workspace.status() -print(status.added_files()) - -workspace.commit("Import 1 million images") -``` - - - -## Instantiating a Workspace - -Create a workspace with a remote repository and a branch name. 
- - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "add-images") -``` - -```bash CLI -oxen workspace create -w my-workspace-name -b add-images -``` - - - -If no branch name is provided, the default branch is used - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo) -``` - -```bash CLI -oxen workspace create -``` - - - -On the server, this will base the workspace on the latest commit of the branch. As such, you can only create workspaces using branches that exist on the remote. - -Optionally, you can provide a name for the workspace - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, name="my-workspace-name") -``` - -```bash CLI -oxen workspace create -n my-workspace-name -``` - - -Named workspaces are persistent. When you commit a named workspace, its `commit_id` will be updated to the newly created commit. Use a named workspace if you want make multiple commits with the same workspace - -## List Workspaces - -You can list the workspaces for a remote using `oxen workspace list` - - - -```bash CLI -oxen workspace list -r my_remote # Defaults to `origin` if no remote is provided -``` - - - -## Adding Files - -Use `oxen workspace add` to stage files to the workspace. 
This uploads the files' contents directly to the server - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "add-images") -workspace.add("/path/to/image.png") -status = workspace.status() -print(status.added_files()) -``` - -```bash CLI -oxen workspace add image.png -w my-workspace-id -oxen workspace status -w my-workspace-id -``` - - - -### Removing staged files - -If you want to remove files staged to a workspace, you can unstage them with `oxen workspace rm --staged`. - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "add-images") -workspace.rm("image.jpg") -status = workspace.status() -print(status.added_files()) # 'images.jpg' will no longer be listed -``` - -```bash CLI -oxen workspace rm --staged image.jpg -w my-workspace-id -``` - - - -### Removing files from the base repo - -On the other hand, you can use workspaces to remove files from the repository they're based on. Stage files for removal with `oxen workspace rm` - - - -```bash CLI -oxen workspace rm image.jpg -w my-workspace-id -``` - - - -## Commit Changes - -When you're finished staging changes, you can commit the workspace to merge them into the base repo. This will create a new commit on the remote branch. 
- - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "add_images") -workspace.commit("adding an image using a workspace", "add_images") -``` - -```bash CLI -oxen workspace commit -m "adding an image" -w my-workspace-id -b add_images -``` - - -If you don't provide a branch, the commit will be made to a new branch on the remote - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "my_branch") -workspace.commit("adding an image using a workspace") # No branch is provided, so this will create a new branch -``` - -```bash CLI -oxen workspace commit -m "adding an image" -w my-workspace-id # No branch is provided, so this will create a new branch -``` - -If a workspace doesn't have a name, it will be deleted on commit. If it's named, it will be updated to point to the newly created commit - - - -🎉 You have now committed data to the remote repo! 
- -Note: If merge conflicts are found, the workspace commit will fail, and you'll have to resolve them to continue diff --git a/docs.json b/docs.json new file mode 100644 index 0000000..e09aa3c --- /dev/null +++ b/docs.json @@ -0,0 +1,261 @@ +{ + "$schema": "https://mintlify.com/docs.json", + "theme": "mint", + "name": "Oxen.ai", + "colors": { + "primary": "#0E87CB", + "light": "#0E87CB", + "dark": "#161616" + }, + "favicon": "/favicon.png", + "navigation": { + "tabs": [ + { + "tab": "Documentation", + "groups": [ + { + "group": "Get Started", + "pages": [ + "getting-started/intro", + { + "group": "⚡️ Models", + "pages": [ + "getting-started/inference", + "examples/inference/chat_completions", + "examples/inference/vision_language_models", + "examples/inference/image_generation", + "examples/inference/image_editing", + "examples/inference/video_generation" + ] + }, + { + "group": "🛢️ Data", + "pages": [ + "getting-started/data", + "examples/data/versioning", + "examples/data/datasets", + "examples/data/workspaces", + "examples/data/performance", + "concepts/diffs", + "concepts/file_metadata" + ] + }, + { + "group": "🛠️ Fine-Tuning", + "pages": [ + "getting-started/fine-tuning", + "examples/fine-tuning/text_generation", + "examples/fine-tuning/chat_completions", + "examples/fine-tuning/image_understanding", + "examples/fine-tuning/image_generation", + "examples/fine-tuning/image_editing", + "examples/fine-tuning/video_generation" + ] + } + ] + }, + { + "group": "Developer Tools", + "pages": [ + "getting-started/install", + { + "group": "💻 Command Line Interface", + "pages": [ + "getting-started/command-line/setup", + "getting-started/command-line/start_repository", + "getting-started/command-line/track_changes", + "getting-started/command-line/branches", + "getting-started/command-line/sync_remote", + "getting-started/command-line/workspaces", + "getting-started/command-line/maintenance", + "getting-started/command-line/debugging" + ] + }, + { + "group": "🐍 Python", 
+ "pages": [ + "python-api/index", + "python-api/clone", + "python-api/data_frame", + "python-api/datasets", + "python-api/df_utils", + "python-api/diff/diff", + "python-api/diff/line_diff", + "python-api/diff/tabular_diff", + "python-api/diff/text_diff", + "python-api/init", + "python-api/oxen_fs", + "python-api/remote_repo", + "python-api/repo", + "python-api/repositories", + "python-api/workspace" + ] + }, + "getting-started/oxen-server" + ] + } + ] + }, + { + "tab": "Inference API", + "groups": [ + { + "group": "Inference API", + "pages": [ + "inference-api/overview", + { + "group": "Quick Starts", + "pages": [ + "inference-api/quickstart/chat", + "inference-api/quickstart/image-generation", + "inference-api/quickstart/video-generation", + "inference-api/quickstart/async-queue" + ] + }, + { + "group": "API Reference", + "pages": [ + "inference-api/reference/chat_completions", + "inference-api/reference/image_generation", + "inference-api/reference/image_editing", + "inference-api/reference/video_generation", + "inference-api/reference/async_queue", + "inference-api/reference/models/overview", + "inference-api/reference/model-references" + ] + }, + { + "group": "Model Walkthroughs", + "pages": [ + "inference-api/reference/models/walkthroughs/overview", + "inference-api/reference/models/walkthroughs/kling_o3_pro_reference_to_video", + "inference-api/reference/models/walkthroughs/kling_o3_pro_video_to_video_edit", + "inference-api/reference/models/walkthroughs/seedance_2_reference_to_video", + "inference-api/reference/models/walkthroughs/topaz_starlight_precise_2_5" + ] + } + ] + } + ] + }, + { + "tab": "Repository API", + "openapi": { + "source": "https://hub.oxen.ai/api/_spec/oxen_server_openapi.json", + "directory": "http-api" + }, + "groups": [ + { + "group": "Repository API", + "pages": [ + "http-api/index", + "http-api/example" + ] + } + ] + }, + { + "tab": "Fine-Tuning API", + "openapi": { + "source": "https://hub.oxen.ai/api/_spec/oxen_hub_api.json", + 
"directory": "fine-tuning-api" + }, + "groups": [ + { + "group": "Fine-Tuning API", + "pages": [ + "fine-tuning-api/overview", + { + "group": "Quick Starts", + "pages": [ + "fine-tuning-api/quickstart/text", + "fine-tuning-api/quickstart/image-generation", + "fine-tuning-api/quickstart/image-editing", + "fine-tuning-api/quickstart/video" + ] + }, + { + "group": "Tutorials", + "pages": [ + "fine-tuning-api/tutorials/01_fine_tuning", + "fine-tuning-api/tutorials/03_fine_tuning_image_generation", + "fine-tuning-api/tutorials/02_fine_tuning_image" + ] + }, + { + "group": "API Reference", + "pages": [ + "fine-tuning-api/reference/text_generation", + "fine-tuning-api/reference/text_chat_messages", + "fine-tuning-api/reference/image_generation", + "fine-tuning-api/reference/image_editing", + "fine-tuning-api/reference/multi_image_editing", + "fine-tuning-api/reference/image_to_text", + "fine-tuning-api/reference/image_to_video", + "fine-tuning-api/reference/text_to_video" + ] + }, + "fine-tuning-api/parameters" + ] + } + ] + } + ], + "global": { + "anchors": [ + { + "anchor": "Documentation", + "href": "https://docs.oxen.ai", + "icon": "book-open-cover" + }, + { + "anchor": "Blog", + "href": "https://blog.oxen.ai", + "icon": "newspaper" + }, + { + "anchor": "GitHub", + "href": "https://github.com/Oxen-AI/Oxen", + "icon": "github" + } + ] + } + }, + "logo": { + "light": "/logo/light.svg", + "dark": "/logo/dark.svg" + }, + "background": { + "color": { + "dark": "#0A0A0A" + } + }, + "navbar": { + "links": [ + { + "label": "Support", + "href": "https://discord.com/invite/s3tBEn7Ptg" + } + ], + "primary": { + "type": "button", + "label": "Sign Up", + "href": "https://oxen.ai/register" + } + }, + "footer": { + "socials": { + "twitter": "https://twitter.com/oxen_ai", + "github": "https://github.com/Oxen-AI/Oxen", + "linkedin": "https://www.linkedin.com/company/oxenai/" + } + }, + "integrations": { + "ga4": { + "measurementId": "G-H6D2C08EKP" + }, + "posthog": { + "apiKey": 
"phc_Do6VnS78puWwBQuPcJ7TkjixBZwsV4Xxc3ABHVNvjmE" + } + } +} diff --git a/getting-started/datasets.mdx b/examples/data/datasets.mdx similarity index 85% rename from getting-started/datasets.mdx rename to examples/data/datasets.mdx index 5c98793..00a6643 100644 --- a/getting-started/datasets.mdx +++ b/examples/data/datasets.mdx @@ -2,19 +2,21 @@ title: '📊 Datasets' --- -In Oxen.ai, datasets are the foundation of improving your models. They are the ground truth to [evaluate](/getting-started/evaluation) your models on. They are the starting point for any model [fine-tuning](/getting-started/fine-tuning) loop. Oxen.ai allows you to version, query, and edit your datasets with an easy to use web interface as well as a command line tools and python library. +In Oxen.ai, datasets are used to structure your unstructured data. Any tabular data file will automatically be turned into a collaborative database that can be versioned and used to organize data, can be queried, or fed in as training data for models. + +Oxen.ai datasets are accessible with an easy to use web interface as well as a command line tools, python, and HTTP library. ## Repositories vs Datasets -Repositories are the top level container for your datasets. Similar to GitHub, they are just a collection of versioned files and directories. +Repositories are the top level container for your datasets. They are a collection of versioned files and directories. Oxen.ai Repo -The difference is that in Oxen.ai, files with the extensions `csv`, `tsv`, `jsonl` and `parquet` come to life. These dataset files can be multi-modal containing links to images, audio, and PDFs. You can query them in natural language, and edit them like a spreadsheet. +Certain files within Oxen with the extensions `csv`, `tsv`, `jsonl` and `parquet` gain super powers. These dataset files can be multi-modal containing links to images, audio, and PDFs. You can query them in natural language, and edit them like a spreadsheet. 
Image Net -Under the hood we turn these raw files into a really lightweight database that can be queried, edited, versioned, and downloaded. +Under the hood we turn these raw files into a lightweight database that can be queried, edited, versioned, and downloaded. ## View Your Dataset @@ -86,38 +88,10 @@ oxen push -To perform **write operations** on datasets, you need to be an editor on the repository and have your username and API key set. You can set your username and API key using the [CLI](/getting-started/cli) or [Python library](/python-api/index). - - -### Configure Your Username - - - -```python Python -from oxen.user import config_user -config_user("Bessie Oxington", "bessie@oxen.ai") -``` - -```bash CLI -oxen config --name "Bessie Oxington" --email "bessie@yourcomany.com" -``` - - - -### Configure Your API Key - - - -```python Python -from oxen.auth import config_auth -config_auth("YOUR_AUTH_TOKEN") -``` - -```bash CLI -oxen config --auth hub.oxen.ai -``` +To perform **write operations** on datasets, you need to be an editor on the repository and have your username and API key set. You can set your username and API key using the [CLI](/getting-started/command-line/setup) or [Python library](/python-api/index). - +Read more about [Authentication & Authorization](/getting-started/auth). + ## Using `fsspec` @@ -178,39 +152,6 @@ df["answer"] = df["answer"].apply(lambda x: x.upper()) df.to_parquet("oxen://openai:gsm8k@main/gsm8k_test_new.parquet") ``` -## Editing Your Dataset - -You can edit your dataset directly from the UI by clicking the pencil icon in the upper right of the dataset viewer. This will open the file in an editor that will allow to add, edit, and delete rows and columns. - -Editing a dataset - -The editor will not commit any changes to the repository until you use the "Commit" button to write a message and save your changes. 
- -Editing a dataset - -## Use LLMs to Augment Your Dataset - -In Oxen.ai, you can generate new columns and rows using LLMs. This is a great way to automatically label your dataset or generate training data for small LLMs from larger models. Click the "Actions" button and select "Run Inference". - -Oxen.ai Evaluation - -Simply select a model, write a prompt, and run the model row by row on the dataset. - -Oxen.ai Evaluation - - -## Query Your Dataset - -To query your dataset, write a question in plain English in the search bar. This will automatically translate the question into a SQL query and apply it to the view of your data. For example, you can look at the distribution of question types by asking: - -``` -What are all the categories sorted by count? -``` - -Where to find Text2SQL - -If the query engine makes a mistake, no worries! You can edit the SQL query to get the results you want. - ## Your Dataset is a Database Datasets look like raw files on the surface, but some of their superpowers come from the fact that Oxen.ai can index them into a [DuckDB](https://duckdb.org/) database on the remote server. This allows you to query your dataset directly with SQL. @@ -302,8 +243,7 @@ for row in results: print(row["prompt"]) ``` -If you don't have an embedding column, you can either use a [Python Notebook](/examples/notebooks/compute_text_embeddings) to compute them or use an [Evaluation](/getting-started/evaluation) to compute them. - +If you don't have an embedding column, you can compute one using an Evaluation on the Oxen.ai platform. ## Rendering Images and Links @@ -315,4 +255,39 @@ In order to enable the rendering, you need to edit the `render` function in the Edit render function -This will save metadata to the repository that will be used to render the images and links. To programmatically set the render function, checkout the [file metadata documentation](/concepts/file_metadata). 
\ No newline at end of file +This will save metadata to the repository that will be used to render the images and links. To programmatically set the render function, check out the [file metadata documentation](/concepts/file_metadata). + +## Using the UI + +### Editing Your Dataset + +You can edit your dataset directly from the UI by clicking the pencil icon in the upper right of the dataset viewer. This will open the file in an editor that will allow you to add, edit, and delete rows and columns. + +Editing a dataset + +The editor will not commit any changes to the repository until you use the "Commit" button to write a message and save your changes. + +Editing a dataset + +### Use LLMs to Augment Your Dataset + +In Oxen.ai, you can generate new columns and rows using LLMs. This is a great way to automatically label your dataset or generate training data for small LLMs from larger models. Click the "Actions" button and select "Run Inference". + +Oxen.ai Evaluation + +Simply select a model, write a prompt, and run the model row by row on the dataset. + +Oxen.ai Evaluation + + +### Query Your Dataset + +To query your dataset, write a question in plain English in the search bar. This will automatically translate the question into a SQL query and apply it to the view of your data. For example, you can look at the distribution of question types by asking: + +``` +What are all the categories sorted by count? +``` + +Where to find Text2SQL + +If the query engine makes a mistake, no worries! You can edit the SQL query to get the results you want.
diff --git a/features/performance.mdx b/examples/data/performance.mdx similarity index 99% rename from features/performance.mdx rename to examples/data/performance.mdx index f04129a..54f3f82 100644 --- a/features/performance.mdx +++ b/examples/data/performance.mdx @@ -1,5 +1,5 @@ --- -title: 🔥 CLI Performance +title: 🔥 Performance --- # 🖼️ 1 Million Files Benchmark diff --git a/examples/data/versioning.mdx b/examples/data/versioning.mdx new file mode 100644 index 0000000..e77d5be --- /dev/null +++ b/examples/data/versioning.mdx @@ -0,0 +1,842 @@ +--- +title: '💾 Version Control' +description: 'Oxen.ai is built on top of a blazing fast data version control system that allows you to version, branch, and share datasets, model weights, and experiments with your team.' +--- + +Oxen's [open source data version control system](https://github.com/Oxen-AI/Oxen) shines at workflows and data sizes where git or git-lfs fall short. The interface is inspired by git, so that it is easy to learn for engineers, but has a few core differences. Oxen is built from the ground up to handle large datasets with many files or large csvs, parquet files, or other large binary blobs like model weights, videos or 3D assets. + +The developer tools come with a [CLI](/examples/data/versioning#versioning-101), [HTTP APIs](/http-api), and [Python library](/python-api) to make it easy to integrate into your workflow. + +## Versioning 101 + +On the surface, `oxen` looks a lot like `git`. Users can add and commit data locally, then push to a remote server. Similar to git, by default oxen will create a local copy of the data on your machine in your `.oxen` directory before pushing to the remote server.
+ + ```bash CLI + oxen init + oxen add lotsa_data/ + oxen commit -m "adding too much data for git" + # Create the remote on hub.oxen.ai (or `oxen create-remote --name /`) + # and wire it up before pushing: + oxen config --set-remote origin https://hub.oxen.ai// + oxen push origin main + ``` + + ```python Python + from oxen import Repo + + repo = Repo(".") + repo.init() + repo.add("lotsa_data/") + repo.commit("adding too much data for git") + repo.set_remote("origin", "https://hub.oxen.ai//") + repo.push() + ``` + + +The first main difference is that `oxen` comes with a remote `oxen-server` that users can sync data to. This server also allows you to upload data directly without making local copies. + + + ```bash CLI + SYNC_DIR=/path/to/data oxen-server start -p 3000 -i 0.0.0.0 + ``` + + +Say we had already pushed a large dataset to the remote server, and simply wanted to add a file to a large dataset like ImageNet with [1 Million Files](/examples/data/performance). You do not want to wait to clone all the files locally just to add yours to the server. + + + + ```python Python + from oxen import RemoteRepo + + # Connect to the remote client + repo = RemoteRepo("my-username/my-repo") + # Add the images to the workspace without committing. + # Pass `dst=` so the files land under `images/` on the remote. + repo.add("images/image_1_000_001.png", dst="images/") + repo.add("images/image_1_000_002.png", dst="images/") + # Commit the remote changes + repo.commit("Adding the 1,000,001st image to the dataset") + ``` + + ```bash CLI + # Assuming you already are in a local repository with a remote configured + # (run `oxen config --set-remote origin ` if you haven't).
+ oxen workspace create --name add-image --branch main + # Stage multiple files into the workspace before committing + oxen workspace add images/image_1_000_001.png --workspace-name add-image + oxen workspace add images/image_1_000_002.png --workspace-name add-image + # Commit the remote changes + oxen workspace commit \ + -m "Adding the 1,000,001st image to the dataset" \ + --workspace-name add-image \ + --branch main + ``` + + +This is just one example of how Oxen.ai enables a more developer-friendly workflow for large datasets. There are also optimizations under the hood such as parallel file transfer, scalable merkle trees, and data deduplication to make Oxen go brrr (or mooo?). + +## Interfaces + +The server exposes a REST API that can be used to interact with data. Oxen.ai's clients include a [command line interface](/getting-started/command-line/start_repository), as well as bindings for [Rust](https://github.com/Oxen-AI/Oxen) 🦀, [Python](/python-api) 🐍, and [HTTP interfaces](/http-api) 🌎 to make it easy to integrate into your workflow. + +## Installation + +Oxen makes versioning your datasets as easy as versioning your code. You can install through Homebrew or pip, or from our [releases page](https://github.com/Oxen-AI/Oxen/releases). + + + +```bash CLI +brew install oxen +``` + +```bash Python +pip install oxenai +``` + + + +## Remote Workflow + +Centralized version control systems like Oxen.ai allow you to have remote-first workflows where you do not need to have a full copy of the data on your local machine. Decentralized version control systems like git by default duplicate all the data to every node in your network. + +Oxen Remote and Local Workflow + +While the decentralized nature of git makes it easy to maintain full copies of the history across many machines, this is not practical for large datasets. Oxen was designed from the ground up to be able to seamlessly switch between local and remote (centralized) workflows.
Only clone what you need, and contribute back to the remote repository when you are done. + + +### Create a Remote Repository + +If you do not already have a remote repository, you can create one with a single `README.md` and initial commit so it is immediately cloneable. + + + +```python Python +from oxen import RemoteRepo + +# RemoteRepo.create is an instance method — construct first, then call create. +# The Python client adds a README.md and initial commit by default. +repo = RemoteRepo("my-user/my-repo-name") +repo.create() +``` + +```bash CLI +# The CLI defaults to an empty repo, so pass --add_readme to include a +# README.md and initial commit (the equivalent of the Python default). +oxen create-remote --name my-user/my-repo-name --add_readme +``` + +```bash cURL +curl -X POST -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos \ + -d '{ + "namespace": "my-user", + "name": "my-repo-name", + "description": "A repository for image classification", + "is_public": true + }' +``` + + + +If you want to create an empty repository — with no `README.md` and no initial commit — pass `empty=True` from Python, or simply omit `--add_readme` from the CLI. + + + +```python Python +from oxen import RemoteRepo + +repo = RemoteRepo("my-user/my-repo-name") +repo.create(empty=True) +``` + +```bash CLI +# The CLI default is an empty repo (no README, no commits). +oxen create-remote --name my-user/my-repo-name +``` + +```bash cURL +# The HTTP API creates a bare empty repository by default — there is no +# `empty` flag because no README is ever added server-side. +curl -X POST -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos \ + -d '{ + "namespace": "my-user", + "name": "my-repo-name" + }' +``` + + + +The reason you may want to start with an empty repository is if you already started a local repository and want to push it to the remote repository. 
This local repository already has a commit history. When pushing to a remote, commit histories must match. Hence we need to start with an empty remote repository without any commits if we want to push a local repository with a commit history. + + +### Add Files + +You can add files to the remote repository by passing the path to the file and the destination directory. This will upload the file to the remote repository and stage it for commit. + + + +```python Python +from oxen import RemoteRepo +repo = RemoteRepo("ox/CatDogBBox") +repo.add("images/000000002754.jpg", dst="images/") +``` + +```bash CLI +# Stage a file into a workspace before committing +oxen workspace add images/000000002754.jpg --workspace-name add-image +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/file/:branch/:dst_dir +# Uploads files via multipart form AND commits in one call. +curl -X PUT -H "Authorization: Bearer $TOKEN" \ + -F "files[]=@images/000000002754.jpg" \ + -F "name=Bessie Oxington" \ + -F "email=bessie@oxen.ai" \ + -F "message=Adding the 1,000,001st image to the dataset" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/file/main/images +``` + + + +### Commit Changes + +You can commit changes to the remote repository by passing a message. + + + +```python Python +repo.commit("Adding the 1,000,001st image to the dataset") +``` + +```bash CLI +oxen workspace commit \ + -m "Adding the 1,000,001st image to the dataset" \ + --workspace-name add-image +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/workspaces/:workspace_id/merge/:branch +# Commits the staged files in the workspace and merges them into the branch. 
+curl -X POST -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/workspaces/$WORKSPACE_ID/merge/main \ + -d '{ + "author": "Bessie Oxington", + "email": "bessie@oxen.ai", + "message": "Adding the 1,000,001st image to the dataset" + }' +``` + + + +### File Exploration + +To see the files in the remote repository you can use `ls`. + + + +```python Python +from oxen import RemoteRepo + +repo = RemoteRepo("ox/CatDogBBox") +print(repo.ls()) +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/dir/:revision/:path +# :revision can be a branch name or commit hash. Pass an empty path for the repo root. +curl -X GET -H "Authorization: Bearer $TOKEN" \ + "https://hub.oxen.ai/api/repos/ox/CatDogBBox/dir/main/" +``` + + + +To view a specific directory you can pass the directory name to the `ls` method. + +Note: the directories are paginated so you will need to use the `page_num` parameter to view the next page of results. +There are also `total_pages`, `page_number`, and `total_entries` attributes that give you information about the pagination. + + + +```python Python +from oxen import RemoteRepo + +repo = RemoteRepo("ox/CatDogBBox") +images_results = repo.ls("images", page_num=1, page_size=10) +print(images_results) +print(images_results.total_pages) +print(images_results.page_number) +print(images_results.total_entries) +``` + +```bash cURL +# Pass `page` and `page_size` as query params for pagination. +curl -X GET -H "Authorization: Bearer $TOKEN" \ + "https://hub.oxen.ai/api/repos/ox/CatDogBBox/dir/main/images?page=1&page_size=10" +``` + + + +### Downloading Data + +You can download individual files and folders if you do not need the entire data repository for your job. 
+ + + +```bash CLI +oxen download ox/CatDogBBox annotations/test.csv +``` + +```python Python +from oxen import RemoteRepo +repo = RemoteRepo("ox/CatDogBBox") +repo.download("annotations/test.csv") +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/file/:revision/:path +# :revision can be a branch name or commit hash +curl -X GET -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/file/main/annotations/test.csv \ + -o ~/Downloads/test.csv +``` + + + +### Checkout a Branch + +If you have data on a separate branch that you want to view, you can check out that branch by passing its name to the `checkout` method. + + + +```python Python +from oxen import RemoteRepo +repo = RemoteRepo("ox/CatDogBBox") +repo.checkout("my-branch-name") +print(repo.ls()) +``` + +```bash CLI +oxen checkout my-branch-name +``` + +```bash cURL +# There is no HTTP "checkout"; branches are referenced by name in the URL of +# subsequent API calls. To verify a branch exists and read its current commit: +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/branches/:branch_name +curl -X GET -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/branches/my-branch-name +``` + + + +### Create a New Branch + +The `checkout` method also allows you to create a new branch if the branch does not exist.
+ + + +```python Python +from oxen import RemoteRepo +repo = RemoteRepo("ox/CatDogBBox") +repo.checkout("my-new-branch-name", create=True) +print(repo.ls()) +``` + +```bash CLI +oxen checkout -b my-new-branch-name +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/branches +curl -X POST -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/branches \ + -d '{ + "from_name": "main", + "new_name": "my-new-branch-name" + }' +``` + + + +### View Branches + +To see all the branches in the remote repository you can use the `branches` method. + + + +```python Python +from oxen import RemoteRepo +repo = RemoteRepo("ox/CatDogBBox") +print(repo.branches()) +``` + +```bash CLI +# List both local and remote branches from inside a local clone. +# To list only remote branches, use: `oxen branch -r origin` +oxen branch -a +``` + +```bash cURL +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/branches +curl -X GET -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/branches +``` + + + +### Workspaces + +Under the hood, the way that we enable remote collaboration is through a concept called a [workspace](/getting-started/workspaces). A workspace can be thought of as an uncommitted working directory that is stored on the server. Just like you can `add` files before committing locally, you can `add` files to a workspace on the remote server before committing. This allows you to build up a set of changes remotely before committing them in bulk. + + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +# The second positional arg to Workspace is the BRANCH the workspace is tied +# to. The optional `workspace_name` gives the workspace a stable identifier +# so you can reattach to it later by name. 
+workspace = Workspace(repo, "main", workspace_name="add-images") +workspace.add("/path/to/image.png") +status = workspace.status() +print(status.added_files()) +# Commits land on the workspace's branch ("main" in this example). +workspace.commit("Adding the 1,000,001st image to the dataset") +``` + +```bash CLI +# Run from inside a local clone of the repo. +# Workspaces can be addressed by name (-n) or by their server-assigned id (-w). +oxen workspace create -n add-images --branch main +oxen workspace add image.png -n add-images +oxen workspace status -n add-images +oxen workspace commit -n add-images -m "Adding the 1,000,001st image to the dataset" --branch main +``` + +```bash cURL +# 1. Get or create a workspace from a base branch +curl -X PUT -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/workspaces/get_or_create \ + -d '{ + "branch_name": "main", + "name": "add-images" + }' + +# 2. Upload and stage a file into the workspace at a destination path +# URL Format: /api/repos/:namespace/:repo_name/workspaces/:workspace_id/files/:dst_path +curl -X POST -H "Authorization: Bearer $TOKEN" \ + -F "file=@/path/to/image.png" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/workspaces/$WORKSPACE_ID/files/images + +# 3. Commit the workspace and merge it into the target branch +curl -X POST -H "Authorization: Bearer $TOKEN" \ + -H "Content-Type: application/json" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/workspaces/$WORKSPACE_ID/merge/main \ + -d '{ + "author": "Bessie Oxington", + "email": "bessie@oxen.ai", + "message": "Adding the 1,000,001st image to the dataset" + }' +``` + + + +The `RemoteRepo.add` method is a shortcut for creating a workspace and adding files to it. It creates an ephemeral workspace, adds the files to it, and deletes the workspace after committing. + +To learn more about workspaces, check out the [workspaces documentation](/getting-started/workspaces).
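The ephemeral-workspace shortcut described above can be pictured with a small in-memory model. This is only a sketch of the idea, not Oxen's implementation: it does not use the `oxen` package, and `ToyRemote`, `add_and_commit`, and every method on them are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRemote:
    """Hypothetical in-memory stand-in for a remote repository."""

    commits: list = field(default_factory=list)     # commit messages, oldest first
    files: dict = field(default_factory=dict)       # committed path -> contents
    workspaces: dict = field(default_factory=dict)  # workspace id -> staged files
    next_id: int = 0

    def create_workspace(self) -> str:
        # Allocate a fresh, empty staging area on the "server".
        self.next_id += 1
        ws_id = f"ws-{self.next_id}"
        self.workspaces[ws_id] = {}
        return ws_id

    def stage(self, ws_id: str, path: str, contents: bytes) -> None:
        # Staged files are not part of the committed state yet.
        self.workspaces[ws_id][path] = contents

    def commit_workspace(self, ws_id: str, message: str, ephemeral: bool = True) -> None:
        # Land every staged file as one commit.
        self.files.update(self.workspaces[ws_id])
        self.commits.append(message)
        if ephemeral:
            del self.workspaces[ws_id]   # throwaway workspace: remove it
        else:
            self.workspaces[ws_id] = {}  # keep the workspace, clear its staging area


def add_and_commit(remote: ToyRemote, files: dict, message: str) -> None:
    """Assumed shape of the shortcut: create, stage, commit, delete."""
    ws_id = remote.create_workspace()
    for path, contents in files.items():
        remote.stage(ws_id, path, contents)
    remote.commit_workspace(ws_id, message, ephemeral=True)


remote = ToyRemote()
add_and_commit(remote, {"images/a.png": b"a", "images/b.png": b"b"}, "Add two images")
print(sorted(remote.files))  # ['images/a.png', 'images/b.png']
print(remote.workspaces)     # {}
```

Both files land in a single commit, and no workspace is left behind afterwards, which is the behavior the paragraph above attributes to the shortcut.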
+ +### Clone a Remote Repository + +Remote repositories are identified by a remote URL. This is the URL that you can use to clone the repository. + + + +```python Python +from oxen import RemoteRepo + +remote_repo = RemoteRepo("my-user/my-repo-name") +remote_repo.create(empty=True) +# `url` is a property, not a method — no parentheses. +print(remote_repo.url) +``` + +```bash CLI +# The remote URL follows the format below. View it with: +oxen config --get-remote +``` + +```bash cURL +# Verify the remote repo exists and fetch its metadata. +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name +curl -X GET -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/my-user/my-repo-name +``` + + + +You can use this URL to clone the repository. + +```python Python +# Local Repository +from oxen import Repo +from oxen import RemoteRepo + +remote_repo = RemoteRepo("my-user/my-repo-name") +remote_repo.create(empty=True) +repo_url = remote_repo.url + +local_repo = Repo("/path/to/local/repo") +local_repo.clone(repo_url) +``` + +Or you can set the remote of an existing local repository to point at the remote repository. + +```python Python +from oxen import Repo +from oxen import RemoteRepo + +remote_repo = RemoteRepo("my-user/my-repo-name") +remote_repo.create(empty=True) + +local_repo = Repo("/path/to/local/repo") +local_repo.set_remote("origin", remote_repo.url) +``` + +## Local Workflow + +Local workflow looks a lot like git. The downside is that you have to duplicate all the data locally. The good news is that oxen is much faster than git for large files and repositories. + +### Initialize User + +Each change you make will be associated with a name and email. Set them before you get started so you know who changed what. The user data is saved by default in `~/.config/oxen/user_config.toml`. 
+ + + +```bash CLI +oxen config --name "Bessie Oxington" --email "bessie@yourcompany.com" +``` + +```python Python +from oxen.user import config_user +config_user("Bessie Oxington", "bessie@oxen.ai") +``` + + + +### Create Repository + +Initialize your first Oxen repository, and commit the first version of your data. + + + +```bash CLI +# Initialize the repository +oxen init +# Write data to a file +printf '%s\n' 'name,age' 'bob,12' 'jane,13' > people.csv +# Stage the data for commit +oxen add people.csv +# Commit the changes with a message +oxen commit -m "Adding my data" +``` + +```python Python +import os +from oxen import Repo + +# Instantiate a Repo object and create the repo directory +repo = Repo("/path/to/data", mkdir=True) +# Initialize the repository +repo.init() +# Write data to a file +data_path = os.path.join(repo.path, "people.csv") +with open(data_path, "w") as f: + f.write("name,age\nbob,12\njane,13") +# Stage the data for commit +repo.add(data_path) +# Commit the changes with a message +repo.commit("Adding my data") +``` + + + +### Create Branch + +It is good practice to create a new branch for changes you make to your data. This will allow you to easily compare the parallel versions of your data over time. + + + +```bash CLI +# Checkout a branch named `modify-data` +oxen checkout -b modify-data +# Overwrite data in existing file +printf '%s\n' 'name,age' 'bob,12' 'jane,13' 'joe,14' > people.csv +``` + +```python Python +import os +from oxen import Repo + +repo = Repo("/path/to/data") +# Create a new branch called `modify-data` +repo.checkout("modify-data", create=True) +# Overwrite data in existing file +data_path = os.path.join(repo.path, "people.csv") +with open(data_path, "w") as f: + f.write("name,age\nbob,12\njane,13\njoe,14") +``` + + + +### Delete Branch + +Once finished with a branch, you can delete it.
+ + + +```bash CLI +# Checkout main branch locally +oxen checkout main +# Delete 'new_branch' locally +oxen branch -d new_branch # may need -D if branch is not merged into main +# Delete branch in remote repo +oxen push origin --delete new_branch +``` + +```python Python +from oxen import Repo + +# Instantiate a Repo object +repo = Repo("/path/to/data") +# Checkout the main branch +repo.checkout("main") +# Delete new_branch. If it has commits not merged into main, oxen will +# refuse the delete; fully merge first, or use the CLI's -D for a force-delete. +repo.branch('new_branch', delete=True) +# Delete remote branch +repo.push('origin', 'new_branch', delete=True) +``` + +```bash cURL +# Deletes the branch on the remote repository. +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/branches/:branch_name +curl -X DELETE -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/branches/new_branch +``` + + + +### Status + +Check the current state of your local repository by using `oxen status`. Instead of printing out every file that was added/modified/removed (which is unsustainable for large repositories), `oxen` summarizes the changes and lets you page through them. + + + +```bash CLI +oxen status +``` + +```python Python +from oxen import Repo + +repo = Repo("/path/to/data") +print(repo.status()) +``` + + + +### Restore Changes + +If you are not happy with the changes you made to your data, you can restore the file to its state in the previous commit with the `oxen restore` command. + + + +```bash CLI +oxen restore people.csv +``` + + + +### Commit Changes + +Once you are happy with the changes you have made to your data, you can commit them to the repository with a new message.
+ + + +```bash CLI +oxen add people.csv +oxen commit -m "Adding Joe to the dataset" +``` + +```python Python +import os +from oxen import Repo + +repo = Repo("/path/to/data") +# Stage the data for commit +data_path = os.path.join(repo.path, "people.csv") +repo.add(data_path) +# Commit the changes with a message +repo.commit("Adding Joe to the dataset") +``` + + + +### View Commit History + +To see the commit history of your repository, you can use the `oxen log` command. + + + +```bash CLI +oxen log +``` + +```python Python +from oxen import Repo + +# Instantiate a Repo object +repo = Repo("/path/to/data") +# Get the commit history +commits = repo.log() +``` + +```bash cURL +# View the commit history of a remote repo at a given revision. +# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/commits/history/:revision +curl -X GET -H "Authorization: Bearer $TOKEN" \ + "https://hub.oxen.ai/api/repos/ox/CatDogBBox/commits/history/main?page=1&page_size=25" +``` + + + +### Checkout Main Branch + +Once you are done making changes to your data, you can return to the main branch with the `oxen checkout` command. + +Never fear: the file has now been reverted to the initial commit, but your changes are saved in the branch you created. + + + +```bash CLI +oxen checkout main +``` + +```python Python +from oxen import Repo + +# Instantiate a Repo object +repo = Repo("/path/to/data") +# Checkout the main branch +repo.checkout("main") +``` + + + +### List Branches + +To see the branches in your repository, you can use the `oxen branch` command. + + + +```bash CLI +oxen branch +``` + +```python Python +from oxen import Repo + +# Instantiate a Repo object +repo = Repo("/path/to/data") +# Get the branches +print(repo.branches()) +``` + +```bash cURL +# List branches in the remote repository.
+# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/branches +curl -X GET -H "Authorization: Bearer $TOKEN" \ + https://hub.oxen.ai/api/repos/ox/CatDogBBox/branches +``` + + + +### Push Data + +Once your data has been committed locally, you can sync it to the `oxen-server`. + +Oxen.ai has a web hub that allows you to collaborate on your data in the cloud. You can create a free account at [https://oxen.ai](https://oxen.ai). + + + +```bash CLI +# Go create repo at https://oxen.ai +# ... +oxen config --set-remote origin https://hub.oxen.ai// +oxen config --auth hub.oxen.ai +oxen push origin main +# to push your other branch simply change the branch name from `main` to `modify-data` +``` + +```python Python +# Go create repo at https://oxen.ai +# ... +# Set where to push the data to (replace and with your remote) +repo.set_remote("origin", "https://hub.oxen.ai//") +# Set your auth token (defaults to hub.oxen.ai host) +oxen.auth.config_auth("YOUR_AUTH_TOKEN") +# Push the changes to the remote +repo.push() +``` + + + +To learn more about setting up authentication and authorization, read our [security documentation here](/getting-started/auth). + +### Clone Data + +Clone your data faster than ever before. Oxen has been optimized to the core to make pulling large datasets as fast as possible. + + + +```bash CLI +oxen clone https://hub.oxen.ai/ox/CatDogBBox +``` + +```python Python +from oxen import Repo + +# Construct a Repo at the local destination, then clone into it. +repo = Repo("/path/to/dst") +repo.clone("https://hub.oxen.ai/ox/CatDogBBox") +``` + + + +### Pull Changes + +Only pull the changes you need. Oxen will only pull the files that have changed since the last time you pulled. 
+ + + +```bash CLI +oxen pull origin main +``` + +```python Python +from oxen import Repo +repo = Repo("/path/to/repo") +repo.pull() +``` + + diff --git a/examples/data/workspaces.mdx b/examples/data/workspaces.mdx new file mode 100644 index 0000000..d58e5f4 --- /dev/null +++ b/examples/data/workspaces.mdx @@ -0,0 +1,302 @@ +--- +title: '📦 Workspaces' +description: 'Workspaces allow you to stage changes to a repository without having to download it locally.' +--- + +A workspace is like a working directory that lives on the Oxen server. You can `add`, `rm`, and modify files in a workspace and then commit those changes in bulk, without ever cloning the repository to your local machine. + +Under the hood, every workspace is pinned to a specific commit on a branch. All staged changes are computed relative to that commit. Staged changes survive server restarts, but they are **not** part of the repository's commit history until you commit the workspace, so they can be deleted without leaving a trace. + +## Why use a workspace? + +Reach for a workspace when you want to: + +- **Edit a repository that's too large to clone.** Stage changes to a 100GB+ repo without copying it to your machine. +- **Bulk-import data without keeping a local copy.** `oxen workspace add` uploads directly to the server, so you don't pay the disk cost of writing every file into a local `.oxen` store first. +- **Batch many changes into a single commit.** Add, remove, and modify dozens of files server-side, then land them as one atomic commit. +- **Let multiple clients contribute to one staged commit.** A named workspace can be written to by several processes or users before anyone commits — useful for workflows like labeling UIs, ingestion services, or agents producing training data. +- **Keep a long-lived staging area across multiple commits.** Named workspaces persist after commit and fast-forward to the new commit, so the same workspace can be reused over and over. 
+ +If you're working in a repo small enough to clone and just want the normal `add → commit → push` flow, you don't need a workspace — see the [Version Control guide](/examples/data/versioning) instead. + +## Quick start + +### Add to an existing repo without cloning it + +Imagine a repository with 1 million images. Instead of cloning the data, init an empty local repo, point it at the remote, and write directly to a workspace. + + + +```bash CLI +oxen init +oxen config --set-remote origin https://hub.oxen.ai/ox/ImageNet-1k +oxen workspace create --name add_image --branch main +# Stage a single file into the images/ directory of the workspace +oxen workspace add /path/to/my_images/image.jpg --directory images/ --workspace-name add_image +# See what's staged +oxen workspace status --workspace-name add_image +# Commit the staged changes to main +oxen workspace commit -m "Add new image to images/ directory" -n add_image -b main +``` + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/ImageNet-1k") # Host defaults to 'hub.oxen.ai' +workspace = Workspace(repo, "extra-images") +workspace.add("new_images/") + +status = workspace.status() +print(status.added_files()) + +workspace.commit("Add new images to dataset") +``` + + + +### Bulk-import data into a fresh repo + +`oxen workspace add` never writes data to your local machine — files stream directly to the remote. This avoids the disk and time cost of `add → commit → push`, which would otherwise copy every file into a local `.oxen` store first. + + + +```bash CLI +oxen init +oxen config --create-remote --host hub.oxen.ai --scheme https --name ox/ImageNet-1k +oxen workspace create # Create a workspace to import the data. 
This will return a workspace ID +oxen workspace add images/ --workspace-id [WORKSPACE_ID] +oxen workspace commit -m "Import 1 million images" -w [WORKSPACE_ID] +``` + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/ImageNet-1k", files="README.md") +workspace = Workspace(repo) +workspace.add("images/") + +status = workspace.status() +print(status.added_files()) + +workspace.commit("Import 1 million images") +``` + + + +Workspaces can also be driven through the [HTTP API](/http-api/workspaces/get-or-create-workspace) — useful when you're building a custom client (labeling UI, ingestion daemon, agent, etc.) that needs to write to a repo without shipping the Oxen CLI or Python SDK. + +## How it works + +Every workspace references a specific commit, the same way a branch does. All your staged operations are recorded as a diff against that base commit. + +When you commit a workspace: + +1. Oxen applies your staged diff on top of the workspace's base commit to produce a new commit. +2. That new commit is added to a target branch on the remote (see [Committing changes](#committing-changes) for how the target is chosen). +3. If the target branch has advanced past the workspace's base commit, Oxen attempts to merge. Conflicts cause the commit to fail and you'll need to resolve them before retrying. + +Because workspaces are commit-scoped, two workspaces created from the same branch at different times can see completely different views of the repo. That's intentional — it gives you isolation while staging — but it also means a long-lived workspace can drift from the branch tip and accumulate conflicts. + +## Creating a workspace + +A workspace is created against a remote repository, and optionally against a specific branch. 
+ + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, "add-images") +``` + +```bash CLI +oxen workspace create -w my-workspace-name -b add-images +``` + + + +If no branch is provided, the default branch (usually `main`) is used. + + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo) +``` + +```bash CLI +oxen workspace create +``` + + + +The workspace is pinned to whatever commit the branch points at when you create it. You can only create workspaces from branches that already exist on the remote. + +### Named vs. unnamed workspaces + +Every workspace has an auto-generated **id**. You can optionally also give it a human-readable **name**. + + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, name="my-workspace-name") +``` + +```bash CLI +oxen workspace create -n my-workspace-name +``` + + + +The name matters because of two behavioral differences: + +| | Unnamed workspace | Named workspace | +| --- | --- | --- | +| Lifetime after commit | Deleted | Persists, fast-forwarded to the new commit | +| Best for | One-shot imports, throwaway staging | Long-lived staging, multi-commit or multi-client workflows | + +Use a **named** workspace when you expect to make multiple commits from the same workspace, or when several processes/users will share it. Use an **unnamed** workspace for one-off imports where you don't need it to stick around. + +### Identifying a workspace in CLI commands + +Most workspace commands need to know *which* workspace you're targeting. You can reference a workspace by either its id or its name: + +- `--workspace-id ` (short: `-w`) — reference by the auto-generated id returned from `oxen workspace create`. 
+- `--workspace-name ` (short: `-n`) — reference by the name you set with `--name` at create time. + +## Listing workspaces + +List the workspaces on a remote with `oxen workspace list`. + +```bash +oxen workspace list -r my_remote # Defaults to `origin` if no remote is provided +``` + +## Adding files + +`oxen workspace add` streams a file's contents directly to the server and stages it on the workspace. + + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, "add-images") +workspace.add("/path/to/image.png") +status = workspace.status() +print(status.added_files()) +``` + +```bash CLI +oxen workspace add image.png -w my-workspace-id +oxen workspace status -w my-workspace-id +``` + + + +### Unstaging a file + +To remove a file you've staged on the workspace (without touching the base repo), unstage it with `oxen workspace rm --staged`. + +```bash CLI +oxen workspace rm --staged image.jpg -w my-workspace-id +``` + +{/* TODO: the Python `Workspace.rm()` method maps to the *non-staged* server endpoint + (it stages a deletion of a file that exists in the base commit). There doesn't appear + to be a Python equivalent of `oxen workspace rm --staged` (unstage a previously + `add`ed file) — calls into `repositories::workspaces::files::unstage` aren't + exposed on the `Workspace` class. Confirm whether this is intentional or an SDK gap, + and either add a `Workspace.unstage(path)` binding or document the workaround + (e.g. delete the workspace and re-create it). */} + +### Deleting a file from the base repo + + +`oxen workspace rm` **without** `--staged` stages a deletion of a file that exists in the base repo. When you commit the workspace, that file will be removed from the branch. Use `--staged` if you only want to unstage a previously added file. 
+ + + + +```bash CLI +oxen workspace rm image.jpg -w my-workspace-id +``` + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, "add-images") +workspace.rm("image.jpg") # Stages a deletion of image.jpg from the base repo +``` + + + +## Committing changes + +Commit a workspace to land its staged changes as a new commit on the remote. + + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, "add_images") +workspace.commit("adding an image using a workspace", "add_images") +``` + +```bash CLI +oxen workspace commit -m "adding an image" -w my-workspace-id -b add_images +``` + + + +{/* TODO: the original doc contradicts itself about what happens when you commit without a target branch: + + - Line 221 of the previous version said: "If you don't provide a branch, the commit will be made to a new branch on the remote" + - The "How it works" section said: "If no branch is provided, the commit will be made to the branch the workspace was created from." + + Which is correct? If a new branch is created when none is provided, what is the new branch named (auto-generated from the workspace name? UUID?)? I've left the "new branch" behavior below for now to match the previous doc, but please confirm and I'll fix both this section and "How it works" accordingly. */} + +If you don't provide a target branch, the commit will be made to a new branch on the remote. 
+ + + +```python Python +from oxen import RemoteRepo +from oxen import Workspace + +repo = RemoteRepo("ox/CatDogBBox") +workspace = Workspace(repo, "my_branch") +workspace.commit("adding an image using a workspace") # No branch is provided, so this will create a new branch +``` + +```bash CLI +oxen workspace commit -m "adding an image" -w my-workspace-id # No branch is provided, so this will create a new branch +``` + + + +After a successful commit: + +- An **unnamed** workspace is deleted. +- A **named** workspace is fast-forwarded to point at the new commit, so you can keep using it. + +If merge conflicts are detected (because the target branch has advanced past the workspace's base commit), the commit will fail. {/* TODO: how should users resolve a workspace merge conflict? Is there an `oxen workspace ...` command for this, or is the recommended fix to discard the workspace and re-create it off the latest commit, then re-stage? A short pointer here would help. */} + diff --git a/examples/fine-tuning/chat_completions.mdx b/examples/fine-tuning/chat_completions.mdx index e59576f..933abf9 100644 --- a/examples/fine-tuning/chat_completions.mdx +++ b/examples/fine-tuning/chat_completions.mdx @@ -61,7 +61,7 @@ Once you have uploaded your dataset, click the "Actions" button and select "Fine Next select your base model, the messages source, whether you'd like to use LoRA or not. We recommend starting with a smaller model like [Qwen3-0.6B](https://www.oxen.ai/ai/models/qwen-qwen3-0-6b) for faster iteration, or a larger model like [Llama 3.1 8B](https://www.oxen.ai/ai/models/meta-llama-3-1-8b-instruct) for better performance on complex conversations. Fine-tune first page Create Repository @@ -116,7 +116,7 @@ This will take you to the file viewer where you can download the model safetenso File Viewer -You can also automatically download the weights with the [oxen cli](/getting-started/cli) or [python library](/getting-started/python). 
+You can also automatically download the weights with the [oxen cli](/getting-started/command-line/start_repository) or [python library](/python-api). diff --git a/examples/fine-tuning/image_generation.mdx b/examples/fine-tuning/image_generation.mdx index cafb625..76e9d4a 100644 --- a/examples/fine-tuning/image_generation.mdx +++ b/examples/fine-tuning/image_generation.mdx @@ -155,7 +155,7 @@ This will take you to the file viewer where you can download the model `.safeten File Viewer -You can also automatically download the weights with the [oxen cli](/getting-started/cli) or [python library](/getting-started/python). +You can also automatically download the weights with the [oxen cli](/getting-started/command-line/start_repository) or [python library](/python-api). diff --git a/examples/fine-tuning/image_understanding.mdx b/examples/fine-tuning/image_understanding.mdx index fdb996f..2148226 100644 --- a/examples/fine-tuning/image_understanding.mdx +++ b/examples/fine-tuning/image_understanding.mdx @@ -13,7 +13,7 @@ Each row in this dataset should have an associated image in the repository store Dataset Format -To upload the dataset you can use the [oxen command line interface](/getting-started/cli). Here's an example of creating a repository from the command line and uploading data: +To upload the dataset you can use the [oxen command line interface](/getting-started/command-line/start_repository). 
Here's an example of creating a repository from the command line and uploading data: ```bash # Navigate to the directory containing your dataset diff --git a/examples/fine-tuning/video_generation.mdx b/examples/fine-tuning/video_generation.mdx index e6dc429..aca84cc 100644 --- a/examples/fine-tuning/video_generation.mdx +++ b/examples/fine-tuning/video_generation.mdx @@ -143,7 +143,7 @@ This will take you to the file viewer where you can download the model safetenso File Viewer -You can also automatically download the weights with the [oxen cli](/getting-started/cli) or [python library](/getting-started/python). +You can also automatically download the weights with the [oxen cli](/getting-started/command-line/start_repository) or [python library](/python-api). diff --git a/examples/index.mdx b/examples/index.mdx deleted file mode 100644 index 2463bc0..0000000 --- a/examples/index.mdx +++ /dev/null @@ -1,31 +0,0 @@ ---- -title: "Table of Contents" ---- - -There are many ways to use Oxen.ai in your workflow. Here are a few examples to get you going. - - - - Use a Notebook to build your own custom 👍/👎 labeling tool - - - - How to compute multimodal embeddings to search an image dataset. - - - - How to compute multimodal embeddings to search an image dataset. - - - - Extract text from PDFs with Vision LLMs - - - - How to train an LLM on your own data. - - - - How to evaluate an LLM on your own data. - - diff --git a/examples/inference/chat_completions.mdx b/examples/inference/chat_completions.mdx index c30ba72..303f0e3 100644 --- a/examples/inference/chat_completions.mdx +++ b/examples/inference/chat_completions.mdx @@ -1,5 +1,5 @@ --- -title: '💬 Chat Completions' +title: '💬 Language Models' description: 'Integrate an LLM into your application through the `/ai/chat/completions` API.' 
--- @@ -465,9 +465,3 @@ The API returns errors as JSON with an `error` object and a standard HTTP status } } ``` - -## Playground - -The [model playground](https://www.oxen.ai/ai/models) lets you test any model interactively before writing code. This is also a great way to test models you've [fine-tuned](/getting-started/fine-tuning) after deploying them. - -Chat Interface diff --git a/examples/notebooks/build_labeling_tool.mdx b/examples/notebooks/build_labeling_tool.mdx deleted file mode 100644 index f805847..0000000 --- a/examples/notebooks/build_labeling_tool.mdx +++ /dev/null @@ -1,131 +0,0 @@ ---- -title: 🏷️ Build a Custom Labeling Tool -description: Rate examples from your dataset and write them back to a data frame before committing. ---- - -When building AI applications, looking at data and labeling successes and failures is an important part of the development process. This is an example of a labeling workflow using the [Oxen Python API](/python-api/data_frame) to fetch rows from a data frame one by one, then writing the results back to the same data frame. We will be labeling the text of an SMS message as "spam" or "ham" depending on the content. The interface is built with native [Marimo](https://marimo.io) UI components. - -Feel free to download the code from [this Notebook](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/labeling_tool.py) and run it in your own repository to follow along. The final result will look like this: - -![User Interface](/images/marimo/labeling/ui.png) - -## Fetching the Rows - -The `RemoteRepo` class along with the `DataFrame` class make it easy to fetch and write data from a data frame in a repository. Specify the namespace, repository name, and path to the data frame in order to fetch data. - -In order to write data back to the data frame, we need to specify a `workspace_name` when instantiating the `DataFrame` class. 
This is because the data frame will be written back to a temporary [workspace](/concepts/workspaces) before being committed. This allows you to see the changes in the UI before writing them to the commit history. - -```python -from oxen import RemoteRepo, DataFrame - -# REPLACE WITH YOUR REPOSITORY -repo_name = "username/repo_name" -path = "data.tsv" -repo = RemoteRepo(repo_name) -remote_df = DataFrame(repo, path, workspace_name="labeling_workflow") -``` - -In order to fetch the rows, we can use the `get_row` method. This will return a `Row` object at the index specified. - -```python -row = remote_df.get_row(0) -``` - -To know the number of rows in the data frame, we can use the `size()` function to determine the width and height of the data frame. - -```python -width, height = remote_df.size() -``` - - -## Iterating through the data frame - -Let's add some helper functions to increment and decrement the index, and get the row at the current index. - -```python -def increment_index(): - set_index(lambda v: v+1) - -def decrement_index() -> int: - set_index(lambda v: max(0, v - 1)) - -def get_row(remote_df, idx): - data = remote_df.get_row(idx) - return data[0] -``` - -## Updating the Rows - -The `label_picker` will call the `update_category` function when the user selects a new label. This function will update the category in the data frame. - -```python -def update_category(remote_df, id, category): - remote_df.update_row(id, {"category": category}) -``` - -## Setting up the UI - -We will keep track of which row is being labeled using the `mo.state` reactive state variable. This sets up a getter and setter for the state variable. - -```python -import marimo as mo - -get_index, set_index = mo.state(0) -``` - -Then we can use a radio button for the categories and a few buttons to move between rows. 
- -```python -# Get the current index from the state variable -idx = get_index() - -# Get the row at the current index -row = remote_df.get_row(idx) - -# Create a radio button for the user to select the label -label_picker = mo.ui.radio( - ["spam", "ham"], - value=data["category"], - on_change=lambda v: update_category(remote_df, data['_oxen_id'], v), -) - -# Create a button to move to the next row -next_button = mo.ui.button(label="next", on_change=lambda _: increment_index()) - -# Create a button to move to the previous row -previous_button = mo.ui.button(label="previous", on_change=lambda _: decrement_index()) - -# Display the UI -mo.vstack([ - mo.md("# Spam or Ham?"), - mo.md(f"Example: {idx}"), - mo.md(f"ID: {data['_oxen_id']}"), - mo.md(f"[View Changes]({remote_df.workspace_url()})"), - mo.ui.text_area(value=data['text'], full_width=True), - label_picker, - mo.hstack([previous_button, next_button], justify="center") -]) -``` - -## Viewing changes - -The changes will be written back to a temporary workspace. We can view the changes by clicking the "View Changes" link in the UI. This is populated with the `workspace_url` method on the `DataFrame` class. - -```python -# https://oxen.ai/{namespace}/{repo_name}/workspaces/{workspace_id}/file/{path} -remote_df.workspace_url() -``` - -Under the hood, we have indexed the data frame into a temporary read/write workspace. You can see the changes in the Oxen Diff UI and confirm them before committing. - -![Oxen Diff](/images/marimo/labeling/diff.png) - -Once you are happy with the changes, you can commit the changes to the data frame and they will be written back to the original data frame and added to the commit history. - -```python -remote_df.commit("Updated the categories from the labeling tool") -``` - -## Full Code - -The full code is available [here](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/labeling_tool.py). 
diff --git a/examples/notebooks/chatbot_data_flywheel.mdx b/examples/notebooks/chatbot_data_flywheel.mdx deleted file mode 100644 index 0cf681b..0000000 --- a/examples/notebooks/chatbot_data_flywheel.mdx +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: 💬 Chatbot Data Flywheel -description: 'How to collect/log data directly from your chatbot into an Oxen.ai dataset' ---- - -Coming soon! \ No newline at end of file diff --git a/examples/notebooks/compute_text_embeddings.mdx b/examples/notebooks/compute_text_embeddings.mdx deleted file mode 100644 index 1f37777..0000000 --- a/examples/notebooks/compute_text_embeddings.mdx +++ /dev/null @@ -1,208 +0,0 @@ ---- -title: 🔎 Compute Text Embeddings -description: 'How to compute vector embeddings for a text dataset on a GPU.' ---- - -Embeddings are a way to represent text in a numerical format as vectors. They are used in a variety of applications, including search and retrieval, clustering, labeling data and anomaly detection. - -Notebooks make it easy and fast to compute embeddings for a dataset on a GPU. If you want to follow along, you can checkout [this notebook](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/compute_text_embeddings.py) and run it in your own Oxen.ai account. When running this example, try an A10 GPU with 4GB of memory and 4 CPU cores. This will allow us to compute over 1,000 embeddings per second 🔥 - -![GPU Selection](/images/marimo/embeddings/gpu.png) - -Of the time of writing, you need a specific version of sentence transformers and transformers for this code to work. - -```bash -pip install transformers==4.51.3 -pip install sentence-transformers -``` - -# Setting Up The Interface - -[Marimo](https://marimo.io) allows you to define UI elements that can be used to define the input repository, dataset, model name and number of rows to compute embeddings for. First lets setup a simple form that allows us to kick off the embedding computation. 
- -![Embeddings UI](/images/marimo/embeddings/ui.png) - -Use the following code in your first cell to setup the UI. - -```python -import marimo as mo - -model_name_ui = mo.ui.text(value="BAAI/bge-large-en-v1.5", full_width=True) -oxen_repo_name = mo.ui.text(value="ox/Simple-Wikipedia-50k", full_width=True) -oxen_dataset_name = mo.ui.text(value="train_0_50000.parquet", full_width=True) -num_rows = mo.ui.number(value=10000) - -run_form = mo.md( - """ - Model Name - {model_name} - Repo Name - {oxen_repo_name} - File Name - {oxen_dataset_name} - Num Rows - {num_rows} - """ -).batch( - oxen_repo_name=oxen_repo_name, - oxen_dataset_name=oxen_dataset_name, - model_name=model_name_ui, - num_rows=num_rows -).form( - submit_button_label="Compute", - bordered=False, - show_clear_button=True, - clear_button_label="Reset" -) - -run_form -``` - -To wait for the button to be clicked, use the `mo.stop` function and check if the `run_form.value` is `None`. - -```python -# If the button is not pressed, stop execution -mo.stop( - run_form.value is None -) -``` - -Then download the data using the values from the form and the [Remote Repo](/python-api/remote_repo) class. - -```python -from oxen import RemoteRepo -import pandas as pd - -repo = RemoteRepo(oxen_repo_name.value) -repo.download(oxen_dataset_name.value, revision="main") -df = pd.read_parquet(oxen_dataset_name.value) -``` - -# Compute Embeddings - -This example will use the `sentence_transformers` library to compute the embeddings with the default model as `BAAI/bge-large-en-v1.5`. Find more information about the model [here](https://huggingface.co/BAAI/bge-large-en-v1.5). - -```python -from sentence_transformers import SentenceTransformer - -model_name = model_name_ui.value -print(f"Loading: {model_name}") -model = SentenceTransformer(model_name, device="cuda") -print(f"Model Loaded: {model_name}") -``` - -Now we can compute the embeddings for the dataset. We will compute them in batches to take full advantage of the GPU. 
In this example, we are just computing the embeddings for the `title` column, but you can compute the embeddings for any text column in the dataset. The embeddings will now be in the `result_df` data frame in a new column called `embedding`. - -```python -# How many embeddings to compute at once -batch_size = 128 - -# Determine how many rows you want to process -rows_to_process = num_rows.value -# Copy the data frame -result_df = df.iloc[:rows_to_process].copy() -computed_embeddings = [] - -# Process the dataframe in batches -with mo.status.progress_bar(total=len(result_df)) as bar: - for i in range(0, len(result_df), batch_size): - if i % 10 == 0: - print(f"Computed {i} embeddings") - - # Get the current batch - batch = result_df.iloc[i:i+batch_size] - - # Extract texts from the batch - texts = batch['title'].tolist() - - # Compute embeddings for the batch - batch_embeddings = model.encode(texts, normalize_embeddings=True) - - # Add the batch embeddings to the overall list - computed_embeddings.extend(batch_embeddings) - bar.update(increment=batch_size) - -# Add embeddings to the data frame -result_df['embedding'] = computed_embeddings -``` - -`mo.status.progress_bar` is used to show a progress bar in the UI as we compute the embeddings. - -![Progress Bar](/images/marimo/embeddings/progress.png) - -You should see the model computing over 1,000 embeddings per second 🔥 - -# Save the Embeddings - -Once you have computed the embeddings, save them to your Oxen.ai repository to share with your team. Oxen.ai will version the embeddings and allow you to track changes so that you can try out different models and configurations without worrying about losing your previous work. 
- -```python -def save_embeddings(df, username="YOUR_USERNAME", repo_name="YOUR_REPO_NAME", filename="embeddings.parquet", branch="main"): - # Connect to the remote repo - repo = RemoteRepo(f"{username}/{repo_name}") - # Checkout the branch - repo.create_checkout_branch(branch) - # Save data to disk - df.to_parquet(filename, index=False) - - # Check if the file exists or has changed on the remote - if not repo.file_exists(filename) or repo.file_has_changes(filename): - # Stage/upload data to remote repository - repo.add(filename) - # Commit data with a message - repo.commit(f"Adding {filename}") - else: - print("File has no changes") -``` - -```python -save_embeddings(result_df, username="YOUR_USERNAME", repo_name="YOUR_REPO_NAME", filename="embeddings.parquet", branch="embeddings") -``` - -# Search Nearest Neighbors - -To check how well the embeddings encode the text, let's build a little search tool. We will use `cosine_similarity` from `sklearn` to build a simple nearest neighbor search. 
- -```python -from sklearn.metrics.pairwise import cosine_similarity -import numpy as np -def embedding_similarity(df, query_embedding, text_column='text', embedding_column='embedding', top_k=5): - # Make sure query_embedding is a 2D array for sklearn's cosine_similarity - if len(query_embedding.shape) == 1: - query_embedding = query_embedding.reshape(1, -1) - - # Stack all embeddings from the DataFrame into a 2D array - embeddings_matrix = np.vstack(df[embedding_column].values) - - # Calculate cosine similarity between query and all embeddings - similarities = cosine_similarity(query_embedding, embeddings_matrix).flatten() - - # Create a results DataFrame with similarities - results_df = df.copy() - results_df['similarity_score'] = similarities - - # Sort by similarity score (descending) and get top_k results - results_df = results_df.sort_values('similarity_score', ascending=False).head(top_k) - - # Keep only text and similarity score for cleaner output - return results_df[[text_column, 'similarity_score']] -``` - -Now we can use the `embedding_similarity` function to search for the nearest neighbors of a query. - -```python -search_term_embedding = model.encode(search_term_ui.value, normalize_embeddings=True) -embedding_similarity(result_df, search_term_embedding, text_column='title') -``` - -Build a text input so that we can enter any term we want and see similar titles. 
- -```python -search_term_ui = mo.ui.text(value="Denver Broncos", full_width=True) -mo.md(f""" -Enter any term to see it's neighbors -{search_term_ui} -""") -``` - -![Nearest Neighbor Search](/images/marimo/embeddings/search.png) \ No newline at end of file diff --git a/examples/notebooks/eval_llm/human_in_the_loop.mdx b/examples/notebooks/eval_llm/human_in_the_loop.mdx deleted file mode 100644 index 7368fd4..0000000 --- a/examples/notebooks/eval_llm/human_in_the_loop.mdx +++ /dev/null @@ -1,238 +0,0 @@ ---- -title: 🕵️‍♂️ Evaluation w/ Human in the Loop -description: 'How to build a human in the loop evaluation workflow.' ---- - -One of the most reliable ways to evaluate an LLM is to have a human in the loop reviewing each input and output pair. Having human eyes not only will catch errors that the LLM missed, but it will also spark ideas for how to improve the model. Once you have a dataset of labeled examples, you can use it to [train a new model](/examples/notebooks/train_llm), or compare the performance of different models. - -This tutorial will show you how to build a simple labeling tool that allows a human to review the output of an LLM, and give a thumbs up or down (👍/👎). All your labeled data will be versioned and stored in an Oxen.ai repository so that you can always go back and see how the model's performance evolved over time and iterate on it with your team. - -![Oxen.ai Data Frame](/images/marimo/human-in-the-loop/oxen-data-frame.png) - -## Example: Asking questions about Oxen.ai's Python Library - -For this example, we will see how well an LLM can answer questions about developer docs. We will use the Oxen.ai [Developer Docs](https://docs.oxen.ai/python-api/remote_repo) as our context. This tutorial will show you how you can prompt an LLM with context, save the outputs, and build an interface to have a human review the output. 
- -Follow along with the [example notebook](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/llm_eval_human_loop.py) by running it in your own Oxen.ai account. - -![Human in the Loop](/images/marimo/human-in-the-loop/ui.png) - -## Creating the Dataset - -The dataset will consist of 10 questions about the `RemoteRepo` Python class. For your use case, a small dataset is better than none, and you can always scale up. Even if it is only a few examples to start, this allows you to setup and kick off your data flywheel. - -```python -data = [ - {"question": "What is the purpose of the `RemoteRepo` class?"}, - {"question": "How is the `RemoteRepo` class different from `Repo`?"}, - {"question": "Point the RemoteRepo to my own oxen server"}, - {"question": "How do I create a new remote repo?"}, - {"question": "How do I add a file to a remote repo?"}, - {"question": "How do I remove a file from a remote repo?"}, - {"question": "How do I update a file in a remote repo?"}, - {"question": "How do I clone a remote repo?"}, - {"question": "How do I push a file to a remote repo?"}, - {"question": "How do I list the files in a remote repo?"}, - {"question": "How do I get the contents of a file in a remote repo?"}, - {"question": "How do I delete a file in a remote repo?"}, -] -``` - -Create a data frame from these questions, leaving a couple columns blank for the LLM's output and the human's labels. - -```python -import pandas as pd - -df = pd.DataFrame(data) -# Add the columns for the LLM's output and the human's labels / reasoning -df["llm_output"] = None -df["human_label"] = None - -df.head() -``` - -![Dataset](/images/marimo/human-in-the-loop/df.png) - -## Using a Model - -For this example, we will be using `gpt-4.1-nano` to see if OpenAI's fast and cheap model can perform the operations we need. - -To start, make a cell at the top of the notebook that allows the user to put in their own OpenAI API_KEY. 
- -```python - import marimo as mo - -api_key = mo.ui.text(kind="password") -mo.vstack([ - mo.md("## Enter Your OpenAI API Key"), - api_key -]) -``` - -![API Key](/images/marimo/human-in-the-loop/api-key.png) - -We can then use the output of this cell to stop execution further down in the notebook until the user has put in their API_KEY. - -```python -mo.stop(not api_key.value) -``` - -## Building the Context - -For updates to developer docs, it is best to assume the model does not yet know the latest information. To help the model, we can provide it with the latest docs as context. - -```python -import requests - -def fetch_github_raw_text(url): - response = requests.get(url) - response.raise_for_status() - return response.text - -url = "https://raw.githubusercontent.com/Oxen-AI/docs/refs/heads/main/python-api/remote_repo.mdx" -docs_context = fetch_github_raw_text(url) -docs_context -``` - -Once we have the context, we can define a simple function to make our LLM call and pass it in. - -```python -def llm(question: str, context: str, model="gpt-4.1-nano") -> str: - response = openai.chat.completions.create( - model=model, - messages=[ - {"role": "system", "content": "You are a developer docs expert. Read the docs and answer the following question. Keep the answers short and sweet."}, - {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"} - ] - ) - - return response.choices[0].message.content -``` - -## Running the Model - -Now that we have our model, and our context, we can use it to answer all the questions. 
- -```python -mo.stop(not api_key.value) -model_name = "gpt-4.1-nano" -with mo.status.progress_bar(total=len(df)) as bar: - for idx, row in df.iterrows(): - # Set the 'llm_column' column to the output from the model - df.at[idx, 'llm_output'] = llm(row['question'], docs_context, model=model_name) - bar.update() - -df.head() -``` - -The `with mo.status.progress_bar(total=len(df)) as bar:` is a Marimo feature that allows you to display a progress bar in the notebook to help you visualize the progress of the loop. This is helpful when you have more than 10 examples and want to know how much longer the loop will take. - -![Progress Bar](/images/marimo/human-in-the-loop/progress-bar.png) - -After we have run the model, the dataset should look like this: - -![Saving Results](/images/marimo/human-in-the-loop/results.png) - -PS: If you want to play with different prompts and models without having to write code, you can use also use the [Oxen.ai Model Inference Playground](https://oxen.ai/ai/models) for this part. - -## Saving the Results - -Before we build our labeling tool, let's save the results to Oxen.ai. - -```python -from oxen import RemoteRepo, DataFrame -# Save to oxen in a file called `results/gpt-4.1-nano.jsonl` -repo = RemoteRepo("ox/Oxen-Docs-RAG", host="dev.hub.oxen.ai") -file_name = f"{model_name}.jsonl" -df.to_json(file_name, orient="records", lines=True, index=False) -output_dir = "results" -path = f"{output_dir}/{file_name}" -if not repo.file_exists(path) or repo.file_has_changes(local_path=file_name, remote_path=path): - repo.add(file_name, dst=output_dir) - repo.commit(f"Got results for labeling from {model_name}") -else: - print("No changes!") - -# Instantiate a DataFrame object to use in our labeling tool -remote_df = DataFrame(repo, path) -``` - -Notice the last line also creates a variable called `remote_df` that we can use in our labeling tool. 
- -## Building a Custom Labeling Tool - -Now that we have the results saved, we can build a simple labeling tool to label the results. We'll need some state to keep track of the current index of the dataframe, and the current row. - -```python -get_index, set_index = mo.state(0) -``` - -Then some functions to get the current row and move between rows. - -```python -def update_label(remote_df, id, value): - remote_df.update_row(id, {"human_label": value}) - increment_index() - -def increment_index(): - set_index(lambda v: v+1) - -def decrement_index() -> int: - set_index(lambda v: max(0, v - 1)) - -def get_row(remote_df, idx): - data = remote_df.get_row(idx) - return data[0] -``` - -Finally, we can build the UI for the labeling tool. - -```python -# Get the current index from the state variable -row_idx = get_index() - -# Get the row at the current index -current_row = get_row(remote_df, row_idx) - -# Create a radio button for the user to select the label -label_picker = mo.ui.radio( - ["👍", "👎"], - value=current_row["human_label"], - on_change=lambda v: update_label(remote_df, current_row['_oxen_id'], v), -) - -# Create a button to move to the next row -next_button = mo.ui.button(label="next", on_change=lambda _: increment_index()) - -# Create a button to move to the previous row -previous_button = mo.ui.button(label="previous", on_change=lambda _: decrement_index()) - -# Display the UI -mo.vstack([ - mo.md(f"# Label {model_name} Responses"), - mo.md(f"Example: {row_idx}"), - mo.md(f"ID: {current_row['_oxen_id']}"), - mo.md(f"[View Changes]({remote_df.workspace_url()})"), - label_picker, - mo.hstack([previous_button, next_button], justify="center"), - mo.md("## Question"), - mo.md(current_row['question']), - mo.md("## LLM Response"), - mo.md(current_row['llm_output']), -]) -``` - -The final output should look like this: - -![Human in the Loop](/images/marimo/human-in-the-loop/ui.png) - -When you click a label with the radio button, the label is saved to the 
dataframe and the index is incremented. You can click the "View Changes" button to see the changes you've made to the dataframe before committing them to the repo. - -If you want to save the changes programmatically, you can use the `remote_df.commit()` method. - -```python -remote_df.commit(f"Labeled responses for {model_name}") -``` - -Take this example as a starting point, and build your own labeling tool to fit your needs. You may want to add a score, or a reason for the label, or even a more complex UI that lives outside of Marimo. If you don't need a custom labeling workflow, feel free to use the built in [DataFrame UI](/features/labeling_data) in Oxen.ai that feels like editing a spreadsheet. - diff --git a/examples/notebooks/eval_llm/llm_as_a_judge.mdx b/examples/notebooks/eval_llm/llm_as_a_judge.mdx deleted file mode 100644 index 0e0e403..0000000 --- a/examples/notebooks/eval_llm/llm_as_a_judge.mdx +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: ⚖️ LLM as a Judge Evaluation -description: 'How to build a LLM as a judge evaluation workflow.' ---- - -Coming soon! \ No newline at end of file diff --git a/examples/notebooks/explore_process_version_data.mdx b/examples/notebooks/explore_process_version_data.mdx deleted file mode 100644 index fac9bec..0000000 --- a/examples/notebooks/explore_process_version_data.mdx +++ /dev/null @@ -1,76 +0,0 @@ ---- -title: 🗺️ Explore, Process, and Version Data -description: Explore, process, and version data with your favorite tools ---- - -Example Notebook: [Explore, Process, and Version Data](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/explore_data.py) - -## Downloading Data - -The [RemoteRepo](/python-api/remote_repo) class allows you to download arbitrary files from a remote repository. To see more options check out the [Python Docs](/python-api/remote_repo). 
- -```python -from oxen import RemoteRepo -repo = RemoteRepo("datasets/GettingStarted") -repo.download("tables/llm_fine_tune.jsonl") -``` - -Or you can directly download a file into a [Pandas DataFrame](https://pandas.pydata.org/) using HTTP or the FSSpec format: - -```python -import pandas as pd -# URL Format: https://hub.oxen.ai/api/repos/{username}/{repo_name}/file/{revision}/{file_path} -url = "https://hub.oxen.ai/api/repos/datasets/GettingStarted/file/main/tables/llm_fine_tune.jsonl" - -# FSSpec Format: oxen://{username}:{repo_name}@{revision}/{path} -url = "oxen://datasets:GettingStarted@main/tables/llm_fine_tune.jsonl" -df = pd.read_json(url, lines=True) -``` - -## Exploring Data - -Use whatever tools you want to explore your data. For example you can use Matplotlib to plot the distribution of the `model` column: - -```python -import matplotlib.pyplot as plt - -category_counts = data_frame['model'].value_counts() -category_counts.plot(kind='bar') -plt.title('Model Distribution') -plt.xlabel('Models') -plt.ylabel('Counts') -plt.gca() -``` - -![Model Distribution](/images/marimo-model-distribution.png) - -## Cleaning Data - -Use pandas to clean or process the data. For example you can remove the `Internal Thoughts` from the `response` column: - -```python -df['response'] = df['response'].replace(to_replace=r'^(Internal Thoughts|\*Internal Thoughts:\*|\*\*Internal Thoughts:\*\*).*', value='', regex=True) -``` - -## Versioning Data - -You can then either write the data back directly with `pandas` or with the `RemoteRepo` class. 
- -With pandas (auto commit message): - -```python -df.write_parquet("oxen://datasets:GettingStarted@main/tables/llm_fine_tune.jsonl", index=False) -``` - -With RemoteRepo and a commit message - -```python -from oxen import RemoteRepo -repo = RemoteRepo("datasets/GettingStarted") -repo.add("llm_fine_tune.jsonl", dst="tables") -repo.commit("Cleaned 'Internal thoughts' string from the start of the response column") -``` - - - - diff --git a/examples/notebooks/generate_synthetic_datasets.mdx b/examples/notebooks/generate_synthetic_datasets.mdx deleted file mode 100644 index ceef46c..0000000 --- a/examples/notebooks/generate_synthetic_datasets.mdx +++ /dev/null @@ -1,146 +0,0 @@ ---- -title: 🧪 Generate Synthetic Datasets -description: Build and version a synthetic dataset to train a model on ---- - -It can be expensive and time-consuming to collect or label data. Synthetic data can either augment your existing data, help filter down a data distribution, or generate a completely new dataset. Be careful, the data generated is not always 100% accurate, but can give you a good jumping off point. You should always ***validate and version*** the data you generate, so that you can track changes and roll back if the data is not what you expected. - -Follow along with the [example notebook](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/synthetic_data.py) by running it in your own Oxen.ai account. - -## Setting Up the Dataset - -In this case, we will be constructing a synthetic dataset of customer support conversations. Let's assume you have *no data* to start with. As long as you know the types of problems your customers are having, you can generate a starting dataset of fake names, roles, problems, and experience levels. - -![Starting Dataset](/images/marimo/synthetic-data/starting-df.png) - -We will be using the `faker` library to generate a starting dataset. 
We can then run this dataset through an LLM to generate prompts and responses as if they were both the customer and the support agent. `Faker` has a lot of built in functionality, such as the ability to generate fake names, addresses, phone numbers, and more. - - -```python -from faker import Faker -from faker.providers import DynamicProvider - -fake = Faker() -``` - -We will be extending it using the `DynamicProvider` interface to create our own. - -```python -# You are a client of company X and you are having trouble... -problem_provider = DynamicProvider( - provider_name="problem", - elements=[ - "verifying your email", - "canceling a subscribtion", - "buying a product", - "finding a product", - "creating an account", - "downgrading a subscription", - ], -) -experience_provider = DynamicProvider( - provider_name="experience", - elements=[ - "beginner", - "intermediate", - "advanced", - ], -) -response_descriptor_provider = DynamicProvider( - provider_name="response_descriptor", - elements=[ - "brief", - "as brief as possible", - "verbose", - "short and sweet", - "concise", - "as detailed as possible", - ], -) -``` - -You can now call these new providers to generate data. - -```python -problem = fake.problem() -experience = fake.experience() -descriptor = fake.response_descriptor() -``` - -Each row of our dataset will now have a uuid, name, role, problem, descriptor, and experience. - -```python -import uuid - -def gen_row(fake: Faker): - role = fake.job() - name = fake.name() # Name just provides some randomness 🤷‍♂️ - problem = fake.problem() - experience = fake.experience() - descriptor = fake.response_descriptor() - return { - "uuid": str(uuid.uuid4()), - "name": name, - "role": role, - "problem": problem, - "descriptor": descriptor, - "experience": experience - } -``` - -Now we can generate a starting dataset of 100 rows. 
- -```python -num_examples = 100 -examples = [gen_row(fake) for _ in range(num_examples)] -df = pd.DataFrame(examples) -df.head() -``` - -![Starting Dataset](/images/marimo/synthetic-data/df.png) - -## Versioning the Dataset - -Now that we have our starting dataset, we should version it, so that we can play around with different prompts and models. You can use the `upload` method to upload a file to your repository with a commit message. - -```python -from oxen.datasets import upload - -file_name = "synthetic_data.parquet" -df.to_parquet(file_name) -upload("YOUR_NAMESPACE/YOUR_REPO_NAME", file_name, "Adding synthetic data") -``` - -Once the data is uploaded, you can view and query the generated dataset from Oxen.ai's [Dataset UI](/getting-started/datasets). - -![Dataset Interface](/images/marimo/synthetic-data/df-ui.png) - -## Running an LLM - -If you want to try out different prompts and models without writing any code, you can use Oxen.ai's [Model Inference](https://oxen.ai/ai/models) feature. Click the 🚀 button on the right of the screen to open the inference UI. - -![Inference UI](/images/marimo/synthetic-data/inference-ui.png) - -In the example above, we are using [DeepSeek-v3](https://www.oxen.ai/ai/models/deepseek-v3) to generate synthetic customer questions about an iPhone with the following prompt: - -```markdown -{uuid} You are a {role} named {name} who is using an iPhone with an intermediate level of experience. Write a {descriptor} question about the product that a customer support agent might answer. Only write the question and nothing else. -``` - -By default, Oxen.ai samples 5 rows from the dataset, so that you can get a sense of how well the model is performing. You will also see an estimated price for how much the inference will cost over the entire dataset. 
- -![Inference Results](/images/marimo/synthetic-data/inference-results.png) - -Once you feel confident in the sample results, you can run the inference on the entire dataset by clicking the `Next ->` button. This will allow you to pick an output branch, file, and write a commit message once the run is complete. - -![Inference Results](/images/marimo/synthetic-data/save-inference.png) - -Sit back, grab a coffee ☕️ and Oxen.ai will run the inference in the background. - -![Inference Results](/images/marimo/synthetic-data/inference-running.png) - -Once the inference is complete, you can view and share the results with your team 🎉 - -![Inference Results](/images/marimo/synthetic-data/df-completed.png) - -You can run the same process again with a new prompt to generate all the responses to the synthetic questions, but we will leave this as an exercise for the reader 🤓 happy generating! diff --git a/examples/notebooks/process_pdfs.mdx b/examples/notebooks/process_pdfs.mdx deleted file mode 100644 index d3afe4e..0000000 --- a/examples/notebooks/process_pdfs.mdx +++ /dev/null @@ -1,6 +0,0 @@ ---- -title: 📑 Process PDFs -description: 'Extract text from PDFs with Vision LLMs' ---- - -Coming soon! diff --git a/examples/notebooks/run_as_script.mdx b/examples/notebooks/run_as_script.mdx deleted file mode 100644 index 9afbaff..0000000 --- a/examples/notebooks/run_as_script.mdx +++ /dev/null @@ -1,117 +0,0 @@ ---- -title: 💻 Running Notebooks as Scripts -description: 'How to automate workflows by running a notebook from the command line.' ---- - -One nice property of Marimo notebooks is that they are a just python code. This means that you can run a notebook as a `script` from the command line. This tutorial will show you best practice for setting up a notebook to parse command line arguments, and be flexible enough to run from the command line or from a "edit mode" in the UI. 
- -## CLI Argument Parsing - -In order to keep your code modular and easy to run, it is best to start by defining a function as the entry point. This will allow us to route to the function in `edit` or `script` mode. For this example, we will be writing a dummy training loop for a model that simply sleeps for a user provided number of epochs. - -```python -import typer -import time - -app = typer.Typer(help="Train a model") # Create the CLI app - -@app.command() # Add function to your CLI app -def train(model_name: str, epochs:int=1): - for i in range(epochs): - time.sleep(1) - print(f"Training: {model_name}, epoch: {i}") -``` - -Marimo allows you to use any of your favorite Python libraries for argument parsing. A nice library for parsing command line arguments is `typer`. The `typer` library allows you to add a decorator to your function to turn it into a command line application. It will automagically infer the command names, types and default values from the function signature. The function is now usable from the command line without affecting it's ability to be used in other parts of your code. - -## Running As A Script - -Marimo makes it easy to detect the execution strategy of the file. Simply add a cell that checks if the `mo.app_meta().mode` is set to `script`. - -```python -import marimo as mo - -if mo.app_meta().mode == "script": - # Run our CLI app - app() -``` - -These small changes make it so you can execute the file from the command line. Run the file like you would normally run a python script. - -```bash -python train.py Qwen/Qwen3-32B --epochs 3 -``` - -You should see the following output: - -```bash -Training: Qwen/Qwen3-32B, epoch: 1 -Training: Qwen/Qwen3-32B, epoch: 2 -Training: Qwen/Qwen3-32B, epoch: 3 -``` - -## Running In Edit Mode - -It is also nice for development of your scripts to be able to kick off the same function in edit mode. For this, using the `mo.ui.run_button` is a nice pattern. 
At the top of your notebook, define a cell with a button in it. - -```python -import marimo as mo - -button = mo.ui.run_button(label="Train model") -button -``` - -Then in a cell below, you can block the execution until the button has been pressed. Below the `mo.stop` method, simply call the same function we used for the CLI. - -```python -mo.stop(not button.value) - -train("Qwen/Qwen3", epochs=3) -``` - -![Run Button](/images/marimo/run-as-a-script/run-button.png) - -Every time you press the button it will run the code that depends on the function defined in the cell. - -## Running on Oxen.ai - -What if you want to train a model, but don't have a powerful enough GPU? Oxen.ai's infrastructure allows you to run these scripts in the cloud on customizable hardware. For example, you may want to write a script to fine-tune a model on an A10G GPU with 8 cpu cores and 8GB of RAM. You could configure this script and kick it off on the command line like so: - -```bash -oxen notebook start -n train.py --cpu-cores 8 --mode script \ - --memory-mb 8192 --gpu "a10g" -- --model Qwen/Qwen3-32B \ - --epochs 2 --learning-rate 0.03 -``` - -Note: Currently, the `train.py` file must be committed and pushed to your Oxen.ai remote repository on https://oxen.ai for this command to work. - -## Stopping the Notebook - -When running jobs on Oxen.ai, you may want to spin down the hardware as soon as a job is finished. There is a convenient helper to stop a notebook in the Oxen Python Library. - -```python -import oxen - -oxen.notebooks.stop() -``` - -This automatically has context into the current notebook that is running and will spin down the notebook when your computation is finished. For example, let's conclude our training function with the spinning down of the Notebook so that we don't incur any unneeded costs. 
- -```python -import typer -import time -import oxen - -app = typer.Typer(help="Train a model") - -@app.command() -def train(model_name: str, epochs:int=1): - for i in range(epochs): - time.sleep(1) - print(f"Training: {model_name}, epoch: {i}") - - # Kill the notebook when it is done training - oxen.notebooks.stop() -``` - -Congratulations! You now have the power of serverless GPU infrastructure at your fingertips. Write scripts that clean data, compute embeddings, train models, evaluate models, or any other computation you can think of. \ No newline at end of file diff --git a/examples/notebooks/train_llm.mdx b/examples/notebooks/train_llm.mdx deleted file mode 100644 index 187d260..0000000 --- a/examples/notebooks/train_llm.mdx +++ /dev/null @@ -1,366 +0,0 @@ ---- -title: 🏋️‍♀️ Fine-Tune an LLM -description: 'How to train an LLM on your own data in a Marimo Notebook.' ---- - - -To quickly get started without writing any code, you can also use the [zero-code fine-tuning interface](/getting-started/fine-tuning) to fine-tune your model on a dataset with a few clicks. - - -## Notebooks for Fine-Tuning - -Oxen.ai gives you the power to write custom code in [Marimo Notebooks](https://marimo.io/) on a powerful GPU in seconds. This is a great place to write custom code and fine-tune your model. You can version your code, data and model weights all in one place, in a single repository. - -## Example: Medical Question Answering - -The domain of medicine is a good example where you might want to fine-tune an LLM. The domain is rich with nuance, and the data often has privacy concerns and cannot be shared publicly. If you want to follow along, you can run this [example notebook](https://www.oxen.ai/datasets/MarimoNotebooks/file/main/fine_tune_llm.py) in your own Oxen.ai account with the same data and model. - -### Configure the Machine - -Make sure to configure your notebook with an A10G GPU and the following dependencies. 
Allocate at least 2 hours, 8 cpu cores and 8GB of memory for the training to complete in a reasonable amount of time. - -``` -pip install transformers torch trl peft bitsandbytes -``` - -![GPU Selection](/images/marimo/train-llm/gpu-selection.png) - -### The Dataset - -The dataset we will be using in this example is the [MedQuAD](https://www.oxen.ai/ox/MedQuAD/file/main/train.parquet) dataset. MedQuAD includes 47,457 medical question-answer pairs created from 12 NIH websites (e.g. cancer.gov, niddk.nih.gov, GARD, MedlinePlus Health Topics). The collection covers 37 question types (e.g. Treatment, Diagnosis, Side Effects) associated with diseases, drugs and other medical entities such as tests. - -![MedQuAD](/images/marimo/train-llm/dataset.png) - -To load the dataset, we can use the `load_dataset` function from the `oxen.datasets` library. This is a wrapper around the Hugging Face [datasets](https://huggingface.co/docs/datasets/en/index) library, and is an easy way to load datasets from the Oxen.ai hub. To have fine-tuning work well, it is a good idea to have at least ~1000-10000 unique examples in your dataset. If you can collect more, that's even better. - -Don't have a dataset yet? Checkout how to generate a [synthetic dataset](/examples/notebooks/generate_synthetic_datasets) from a stronger model to bootstrap your own. - -```python -from oxen.datasets import load_dataset - -# Load dataset from the hub -raw_dataset = load_dataset("ox/MedQuAD", "train.parquet") -raw_dataset = raw_dataset.shuffle() -``` - -We then want to transform this dataset into a format that can be used for training a chatbot. This means mapping the question and answer pairs to a list of messages with roles. This is the format that most LLMs expect for training and inference. - -```python -system_message = """You are a highly trained medical doctor. 
Patients will ask you questions and you will provide and answer in plain english with easy to understand terms.""" - -def create_conversation(sample): - return { - "messages": [ - {"role": "system", "content": system_message}, - {"role": "user", "content": sample["question"]}, - {"role": "assistant", "content": sample["answer"]} - ] - } - -# Convert dataset to OAI messages -dataset = raw_dataset.map(create_conversation, batched=False) - -# Print formatted user prompt -print(json.dumps(dataset["train"][345]["messages"], indent=2)) -``` - -### The Model - -For this example, we will be using the [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct) model. This is a 1.5B parameter model that will be quick to train, and fast for inference. You can even download the weights and run on your laptop if you want. - -To load the model, we can use the `AutoModelForCausalLM` and `AutoTokenizer` classes from the `transformers` library. - -```python -import torch -from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TextStreamer -model_name = "Qwen/Qwen2.5-1.5B-Instruct" -model = AutoModelForCausalLM.from_pretrained( - model_name, - torch_dtype="auto", - device_map="auto" -) -tokenizer = AutoTokenizer.from_pretrained(model_name) -``` - -Before you start training, it is a good idea to get a feel for the model. Start by writing a function to make a prediction given a prompt and system message. - -```python -def predict(tokenizer: AutoTokenizer, model: AutoModelForCausalLM, prompt: str): - system_prompt = "You are a medical professional who is helping a patient. Patients will ask you questions and you will answer them in plain English so that anyone can understand." 
- - messages = [ - {"role": "system", "content": system_prompt}, - {"role": "user", "content": prompt} - ] - text = tokenizer.apply_chat_template( - messages, - tokenize=False, - add_generation_prompt=True - ) - model_inputs = tokenizer([text], return_tensors="pt").to(model.device) - - streamer = TextStreamer(tokenizer) - generated_ids = model.generate( - **model_inputs, - max_new_tokens=1024, - streamer=streamer - ) - generated_ids = [ - output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids) - ] - - response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0] - - return response -``` - -Call the `predict` function with a sample question. - -```python -predict(tokenizer, model, "What are the symptoms of Anencephaly?") -``` - -![Question](/images/marimo/train-llm/question.png) - -Once you have predictions working from a model, it is good practice to have some sort of evaluation in place to see if fine-tuning actually improved the model. For situations where precision is important, you may want to build a [Human in the Loop](/examples/notebooks/eval_llm/human_in_the_loop) pipeline to evaluate the model's predictions. If you want to automate the evaluation process, you can use an [LLM as a Judge](/examples/notebooks/eval_llm/llm_as_a_judge) pipeline to evaluate the model's predictions. - -To learn more about how to evaluate your model, check out Eugene Yan's [blog post](https://eugeneyan.com/writing/eval-process/) on fixing your evaluation process. - -### Parameter Efficient Fine-Tuning - -To make our fine-tuning process more efficient in terms of memory and time, we can use a technique called Parameter Efficient Fine-Tuning. This technique uses a technique called Low-Rank Adaptation (LoRA) to fine-tune the model. If you want to learn more about LoRA, check out the [LoRA paper](https://arxiv.org/abs/2106.09685) or our [Arxiv Dive](https://www.youtube.com/watch?v=_W85WtlfJcU) on the topic. 
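For readers who want the idea behind LoRA in one line, the update can be written out as below. This is the standard formulation from the LoRA paper, not something specific to this notebook; the symbols $W_0$, $A$, $B$, $d$, $k$ are introduced here for illustration, while $r$ and $\alpha$ correspond to the `r` and `lora_alpha` fields of `LoraConfig`.

```latex
% The frozen pretrained weight W_0 is left untouched; only A and B are trained.
h = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)

% Trainable parameters per adapted matrix: r(d + k) instead of dk.
% Illustrative sizes only: d = k = 1024, r = 16 gives
% 16 \cdot 2048 = 32{,}768 trainable values versus 1{,}048{,}576 for the full matrix.
```

With `r=16` and `lora_alpha=16`, the scaling factor $\alpha / r$ is 1, so the low-rank product is added to the frozen weight unscaled.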
- -```python -from peft import LoraConfig - -# Define model init arguments -model_kwargs = dict( - attn_implementation="eager", # Use "flash_attention_2" when running on Ampere or newer GPU - torch_dtype=torch_dtype, # What torch dtype to use, defaults to auto - device_map="auto", # Let torch decide how to load the model -) - -# BitsAndBytesConfig: Enables 4-bit quantization to reduce model size/memory usage -model_kwargs["quantization_config"] = BitsAndBytesConfig( - load_in_4bit=True, - bnb_4bit_use_double_quant=True, - bnb_4bit_quant_type='nf4', - bnb_4bit_compute_dtype=model_kwargs['torch_dtype'], - bnb_4bit_quant_storage=model_kwargs['torch_dtype'], -) - -peft_config = LoraConfig( - lora_alpha=16, - lora_dropout=0.05, - r=16, - bias="none", - target_modules="all-linear", - task_type="CAUSAL_LM", - modules_to_save=["lm_head", "embed_tokens"] # make sure to save the lm_head and embed_tokens as you train the special tokens -) -``` - -This step is optional, but is good to know if you have limited resources. If you do not use parameter efficient fine-tuning, you will need to select a larger GPU for training. - -### Branches for Experiments - -It is rare that you will get a fine-tune perfect on the first try. You must have an experimental mindset and be willing to iterate. In this case we will be simply saving the trained models and results to new branches on the same repository. We will setup an `OxenExperiment` class that will handle creating a new branch, saving the model, and logging the results. - -Branches are light weight in Oxen.ai, and by default will not be downloaded to your local machine when you do a clone. This means you can easily store model weights and other large assets on parallel branches and keep your `main` branch small and manageable. 
- -```python -from datetime import datetime -from pathlib import Path -import os - -class OxenExperiment(): - """ - An experiment helps log the experiment to an oxen repository, - keeps track of the name and creates a corresponding branch to save results to - """ - def __init__(self, repo, model_name, output_dir, experiment_type="SFT"): - self.repo = repo - self.output_dir = output_dir - - # List the existing branches to figure out which experiment this is - branches = repo.branches() - experiment_number = 0 - for branch in branches: - if branch.name.startswith(f"{experiment_type}_"): - experiment_number += 1 - self.experiment_number = experiment_number - # Name the experiment with the experiment number and timestamp - short_model_name = model_name.split('/')[-1] - timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S") - self.name = f"{experiment_type}_{experiment_number}_{timestamp}_{short_model_name}" - # Set the output directory - self.dir = Path(os.path.join(self.output_dir, self.name)) - # Create the output directory if it doesn't exist - os.makedirs(self.dir, exist_ok=True) - - print(f"Creating experiment branch {self.name}") - repo.create_checkout_branch(self.name) -``` - -When you start a training run, you'll see a new branch in the repo with a prefix, number and a timestamp. - -![Branches](/images/marimo/train-llm/branches.png) - -You can navigate to this branch and look in the `models` directory to see the model weights and other assets. - -### Logging and Saving - -Once we have the experiment setup, we will want to reference it during training and log our experiment results. To do this, we will setup an `OxenTrainerCallback` that will be called during training to save the model weights and our metrics. This is a subclass of the `TrainerCallback` class from the `transformers` library, which can be passed into our training loop. 
- -```python -from transformers import TrainerCallback - -class OxenTrainerCallback(TrainerCallback): - def __init__(self, experiment: OxenExperiment, save_every): - self.experiment = experiment - self.save_every = save_every - self.log_file_name = "logs.jsonl" - self.log_file = os.path.join(self.experiment.dir, self.log_file_name) - self.dst_dir = os.path.dirname(self.log_file) - self.workspace = Workspace( - experiment.repo, - branch=f"{experiment.name}", - workspace_name=f"training_run_{experiment.experiment_number}" - ) - self.df = DataFrame( - self.workspace, - self.log_file, - branch=f"{experiment.name}" - ) - super().__init__() - - def on_log(self, args, state, control, logs=None, **kwargs): - print("on_log.logs") - print(logs) - - if "loss" in logs: - # add timestamp to logs - logs['timestamp'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S") - - # save logs to data frame - self.df.insert_row(logs) - - def on_save(self, args, state, control, **kwargs): - print(f"on_save {state.global_step}") - - if state.global_step % self.save_every == 0: - print(f"save every! {state.global_step} dir: {self.experiment.dir}") - # Save the checkpoints to model_dir/model_name/checkpoints/checkpoint_N - checkpoint_dir = os.path.join("checkpoints", f"checkpoint_{state.global_step}") - dst_dir = os.path.join(self.experiment.dir, checkpoint_dir) - self.workspace.add(self.experiment.dir, dst=dst_dir) - is_clean = self.workspace.status().is_clean() - print(f"Is Clean: {is_clean}") - if not self.workspace.status().is_clean(): - self.workspace.commit(f"Saving model step {state.global_step}") - - def on_step_end(self, args, state, control, **kwargs): - print(f"on_step_end {state.global_step}") -``` - -Since we are subclassing the `TrainerCallback` class, we implement the `on_save` and `on_log` methods. The `on_save` method is called when the model is saved to disk, and the `on_log` method is called when the model is trained on a batch, reporting loss and other useful metrics. 
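The control flow of the two hooks can be tried out without a GPU or the `transformers` dependency. The sketch below is a stand-in, not the notebook's callback: `FakeState` and `RecordingCallback` are hypothetical names, and the lists replace the Oxen `DataFrame` and `Workspace` so that only the gating logic (log rows that carry a `loss`, commit every `save_every` steps) is exercised.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FakeState:
    """Hypothetical stand-in for the trainer state; only global_step is used."""
    global_step: int = 0


@dataclass
class RecordingCallback:
    """Mirrors the hook signatures above, but records to lists instead of Oxen."""
    save_every: int
    rows: list = field(default_factory=list)
    commits: list = field(default_factory=list)

    def on_log(self, args, state, control, logs=None, **kwargs):
        # Only keep log entries that report a loss, and stamp them with the time.
        if logs and "loss" in logs:
            stamped = dict(logs)
            stamped["timestamp"] = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            self.rows.append(stamped)

    def on_save(self, args, state, control, **kwargs):
        # Commit only on steps that are a multiple of save_every.
        if state.global_step % self.save_every == 0:
            self.commits.append(f"Saving model step {state.global_step}")


callback = RecordingCallback(save_every=2)
for step in range(1, 5):
    state = FakeState(global_step=step)
    callback.on_log(None, state, None, logs={"loss": 1.0 / step})
    callback.on_save(None, state, None)
print(callback.commits)
```

Unlike the original `on_log`, this version tolerates `logs=None` and copies the dictionary rather than mutating the one the trainer passes in; both are choices made for the sketch.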
- -The most important concepts here are the `Workspace` and `DataFrame` objects from the `oxenai` library. The `Workspace` is a wrapper around the branch that we are currently on. This allows us to write data to the remote branch without committing the changes to the branch. Think of it like your local repo of unstaged changes, but for remote branches. To navigate to your workspaces, use the branch dropdown and then look at the active workspaces for a file. - -![Workspace Selection](/images/marimo/train-llm/workspace_selection.png) - -During training it would be expensive to commit the changes to the branch every step, so instead we use a `Workspace` to write the temporary results, and then can commit the changes to the branch after training is complete. - -```python -from oxen import Workspace, DataFrame -self.workspace = Workspace( - experiment.repo, - branch=f"{experiment.name}", - workspace_name=f"training_run_{experiment.experiment_number}" -) -self.df = DataFrame( - self.workspace, - self.log_file, - branch=f"{experiment.name}" -) -``` - -The DataFrame allows us to write rows and columns to the log file. We can read from this to make plots and analyze the results. When clicking on a data frame's workspace, you can see a preview of the data that is written during the `on_log` method. - -```python -def on_log(self, args, state, control, logs=None, **kwargs): - # add timestamp to logs - logs['timestamp'] = datetime.now().strftime("%Y-%m-%d %H:%M:%S") - - # save logs to data frame - self.df.insert_row(logs) -``` - -![Workspace Logs](/images/marimo/train-llm/workspace_logs.png) - -With all the building blocks in place, we can then chain all of these classes together and specify the `RemoteRepo`, model name, and output directory. 
- -```python -from oxen import RemoteRepo -output_dir = "models/qwen-med" -repo = RemoteRepo("ox/fine-tune-medical-qwen") -experiment = OxenExperiment(repo, model_name, output_dir) -trainer_callback = OxenTrainerCallback(experiment) -``` - -### The Training Loop - -The `trl` library from Hugging Face is an easy to use library for training and fine-tuning models. We can use the `SFTConfig` class to setup our training loop. This determines our batch size, learning rate, number of epochs, and other hyperparameters. - -```python -from trl import SFTConfig - -logging_steps = 1 -args = SFTConfig( - output_dir=experiment.dir, # directory to save and repository id - num_train_epochs=1, # number of training epochs - per_device_train_batch_size=1, # batch size per device during training - gradient_accumulation_steps=4, # number of steps before performing a backward/update pass - gradient_checkpointing=True, # use gradient checkpointing to save memory - optim="adamw_torch_fused", # use fused adamw optimizer - logging_steps=logging_steps, # log every N steps - save_strategy="epoch", # save the weights the end of an epoch - learning_rate=2e-4, # learning rate, based on QLoRA paper - fp16=True if torch_dtype == torch.float16 else False, # use float16 precision - bf16=True if torch_dtype == torch.bfloat16 else False, # use bfloat16 precision - max_grad_norm=0.3, # max gradient norm based on QLoRA paper - warmup_ratio=0.03, # warmup ratio based on QLoRA paper - lr_scheduler_type="constant", # use constant learning rate scheduler -) -``` - -Once you have set up the training arguments, you can then setup the training loop. Pass in the model, training arguments, peft config, the training dataset, and callbacks. 
- -```python -from trl import SFTTrainer - -# Create Trainer object -trainer = SFTTrainer( - model=model, - args=args, - train_dataset=dataset["train"], - peft_config=peft_config, - processing_class=tokenizer, - callbacks=[OxenTrainerCallback(experiment)] -) -``` - -Finally, you can then train the model. - -```python -trainer.train() -``` - -This should take just under 2 hours with the settings above. Once the training is complete, you will be able to download the model weights from the experiment branch and use them for inference. - -### Evaluation - -Just because the fine-tune has completed, does not mean your job is done. Now you must evaluate the model to see if it is any good. With the dataset that we have been using, it is hard to do an exact string match evaluation on outputs to tell if the fine-tuned model is better than the original. - -Instead, we will use an [LLM as a Judge](/examples/notebooks/eval_llm/llm_as_a_judge) pipeline to evaluate the model's predictions. This will allow us to quickly see if the fine-tuned model is better than the original. - diff --git a/features/data_frames.mdx b/features/data_frames.mdx index daccd13..1a710f2 100644 --- a/features/data_frames.mdx +++ b/features/data_frames.mdx @@ -29,7 +29,7 @@ When files are committed to a repository, Oxen automatically detects the format All Oxen data frames can be queried with SQL. When using the UI, we also provide a Text2SQL interface to help you get started. We automatically translate natural language questions into SQL queries and return the results in a tabular format. 
-![Text2SQL](/images/text2sql.png) +![Text2SQL](/images/datasets/text2sql.png) ## Edit Your Data diff --git a/features/embeddings.mdx b/features/embeddings.mdx index d43cc1c..4aade7e 100644 --- a/features/embeddings.mdx +++ b/features/embeddings.mdx @@ -71,7 +71,7 @@ oxen download ox/Simple-Wikipedia-50k title_embeddings.parquet -o embeddings.par ### Create a workspace -In order to use embeddings, you will need to create a [workspace](/concepts/workspaces). Workspaces allow you to query and edit versions of the data without immediately committing your changes. Oxen uses [DuckDB](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html) to store your embeddings and data. +In order to use embeddings, you will need to create a [workspace](/getting-started/workspaces). Workspaces allow you to query and edit versions of the data without immediately committing your changes. Oxen uses [DuckDB](https://duckdb.org/2024/05/03/vector-similarity-search-vss.html) to store your embeddings and data. If you haven't already created an Oxen repository, you should create a new one to get started. @@ -91,7 +91,7 @@ oxen add embeddings.parquet oxen commit -m "Add embeddings" ``` -Now we have our embeddings committed to the repository. We can create a workspace to query the data. A workspace is based off of a branch and links directly to a version of a dataset at a commit. If you want to learn more about workspaces, check out the [workspaces](/concepts/workspaces) page. +Now we have our embeddings committed to the repository. We can create a workspace to query the data. A workspace is based off of a branch and links directly to a version of a dataset at a commit. If you want to learn more about workspaces, check out the [workspaces](/getting-started/workspaces) page. Create a workspace and give it a name. @@ -153,4 +153,4 @@ oxen workspace df get embeddings.parquet \ Workspaces are power tools once you wrap your head around them. 
They allow you to build some really interesting exploratory data analysis, labeling workflows, and search pipelines. Using nearest neighbor search with embeddings is a great way to sift through large datasets, prototype RAG pipelines, and test different embeddings models. -If you want to see the underlying HTTP request that is being made, checkout the [API reference](/http-api/workspaces). +If you want to see the underlying HTTP request that is being made, check out the [API reference](/http-api/data-frames/get-data-frame-slice). diff --git a/features/labeling_data.mdx b/features/labeling_data.mdx index 71c4506..cdd19bd 100644 --- a/features/labeling_data.mdx +++ b/features/labeling_data.mdx @@ -88,7 +88,7 @@ Congratulations! You've just seen how easy it is to edit your datasets without d ## Using the Python Library -The web interface is built on top of [HTTP APIs](/http-api/data_frames) that are also exposed through Oxen.ai's [Python Library](/python-api/data_frame). This makes it easy to interact with data frames programatically and build your own custom labeling tools. Under the hood the dataset will be indexed into DuckDB within a Workspace to make it fast to query and update the data before fully committing it back to your repository. +The web interface is built on top of [HTTP APIs](/http-api/data-frames/get-data-frame-slice) that are also exposed through Oxen.ai's [Python Library](/python-api/data_frame). This makes it easy to interact with data frames programmatically and build your own custom labeling tools. Under the hood the dataset will be indexed into DuckDB within a Workspace to make it fast to query and update the data before fully committing it back to your repository. 
### Indexing a Data Frame diff --git a/fine-tuning-api/02_fine_tuning_image.mdx b/fine-tuning-api/02_fine_tuning_image.mdx index 10d2a29..5522fd1 100644 --- a/fine-tuning-api/02_fine_tuning_image.mdx +++ b/fine-tuning-api/02_fine_tuning_image.mdx @@ -13,7 +13,7 @@ With this guide, you will: - **Run inference** with the deployed model We will use one of the Qwen image-editing models described in -[`Available Fine-Tuning Models`](/fine-tuning-api/03_available_fine_tune_models): +[`Available Fine-Tuning Models`](/fine-tuning-api/overview): - `base_model`: `Qwen/Qwen-Image-Edit` - `script_type`: `image_editing` diff --git a/fine-tuning-api/overview.mdx b/fine-tuning-api/overview.mdx index 395f04c..71be9ef 100644 --- a/fine-tuning-api/overview.mdx +++ b/fine-tuning-api/overview.mdx @@ -109,16 +109,16 @@ Each operation type requires specific data columns: Choose your use case to get started with minimal examples: - + Fine-tune chatbots and Q&A models - + Create custom image styles - + Fine-tune image transformation models - + Generate custom videos diff --git a/fine-tuning-api/tutorials/02_fine_tuning_image.mdx b/fine-tuning-api/tutorials/02_fine_tuning_image.mdx index 10d2a29..5522fd1 100644 --- a/fine-tuning-api/tutorials/02_fine_tuning_image.mdx +++ b/fine-tuning-api/tutorials/02_fine_tuning_image.mdx @@ -13,7 +13,7 @@ With this guide, you will: - **Run inference** with the deployed model We will use one of the Qwen image-editing models described in -[`Available Fine-Tuning Models`](/fine-tuning-api/03_available_fine_tune_models): +[`Available Fine-Tuning Models`](/fine-tuning-api/overview): - `base_model`: `Qwen/Qwen-Image-Edit` - `script_type`: `image_editing` diff --git a/getting-started/auth.mdx b/getting-started/auth.mdx new file mode 100644 index 0000000..813fc1f --- /dev/null +++ b/getting-started/auth.mdx @@ -0,0 +1,49 @@ +# 🔒 Authentication & Authorization + +At Oxen.ai we take your data security and privacy seriously. 
Every action performed on an Oxen.ai repository needs to be authenticated and authorized. You must provide an API_KEY to perform any actions on an Oxen.ai repository. You must also have the correct permissions to perform the action. + +### Public vs Private Repositories + +On Oxen.ai you can choose if you want your repositories to be public or private. Private repositories limit access to the public internet, and can have viewers, editors, or admin roles. + +### Obtain Auth Token + +Before you can write to a remote repository, you must have permissions to do so. Permissions are handled through an `auth_token` that is passed in with the request. + +You can obtain an `API Key` by creating an account on [Oxen.ai](https://oxen.ai) and going to your profile. + +Oxen.ai authentication key + +### Set Auth Token + +To set your auth token, you can either set it through the command line interface or directly in python. + + + +```python Python +from oxen.auth import config_auth +config_auth("YOUR_AUTH_TOKEN") +``` + +```bash CLI +oxen config --auth 'hub.oxen.ai' YOUR_AUTH_TOKEN +``` + + + +This will write the auth token to a file in `~/.config/oxen/auth_config.toml` for future use. If you set up your own [oxen-server](/getting-started/oxen-server) you can generate custom auth tokens there. + +## Setup User + +In order for Oxen to know who is committing and where to sync to by default, you must call [config_user](/python-api/user) and pass in the name and email you would like to use in your commit messages. + +```python +from oxen.user import config_user +config_user("YOUR NAME", "YOUR EMAIL") +``` + +This will save a file in `~/.config/oxen/user_config.toml` that contains your user configuration. 
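Since this change also adds `.env` to `.gitignore`, one plausible pattern is to keep the token out of source files and read it from the environment before calling `config_auth`. The sketch below assumes that pattern; the helper name `configure_oxen_auth` and the variable name `OXEN_API_KEY` are hypothetical, and only `oxen.auth.config_auth` comes from the page above.

```python
import os
from typing import Callable, Optional


def configure_oxen_auth(
    env_var: str = "OXEN_API_KEY",  # hypothetical variable name, pick your own
    configure: Optional[Callable[[str], None]] = None,
) -> bool:
    """Pass the token stored in `env_var` to `configure`.

    Returns False without calling anything when the variable is unset or blank.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        return False
    if configure is None:
        # Imported lazily so the helper can be exercised without oxen installed.
        from oxen.auth import config_auth

        configure = config_auth
    configure(token)
    return True
```

Accepting `configure` as a parameter keeps the helper testable: pass any callable that takes a string to check the behavior without writing to `~/.config/oxen/auth_config.toml`.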
diff --git a/getting-started/batch_inference.mdx b/getting-started/batch_inference.mdx index 1113b48..df695ee 100644 --- a/getting-started/batch_inference.mdx +++ b/getting-started/batch_inference.mdx @@ -9,7 +9,7 @@ In Oxen.ai, a batch inference lets you test a model on a dataset row by row to s Oxen.ai Evaluation -Once the model has run, you can use the [dataset viewer](/getting-started/datasets) to query the results and get a sense of how well your model performs given your data. +Once the model has run, you can use the [dataset viewer](/examples/data/datasets) to query the results and get a sense of how well your model performs given your data. Oxen.ai Evaluation @@ -51,7 +51,7 @@ Now you can grab some coffee, sit back, and watch the model run. Feel free to cl If you need to run a model as part of a larger workflow, you can use the [Oxen.ai API](/http-api) to programmatically run a model on a dataset. -Currently the API is only exposed over HTTP requests and requires a valid [api key](/getting-started/python#obtain-auth-token) in the header. To kick off a model inference job, you can send a POST request to the `/api/repos/:namespace/:repo_name/evaluations/:resource` endpoint. +Currently the API is only exposed over HTTP requests and requires a valid [api key](/python-api#obtain-auth-token) in the header. To kick off a model inference job, you can send a POST request to the `/api/repos/:namespace/:repo_name/evaluations/:resource` endpoint. For example if the file you want to process is at: diff --git a/getting-started/command-line/branches.mdx b/getting-started/command-line/branches.mdx new file mode 100644 index 0000000..462ac40 --- /dev/null +++ b/getting-started/command-line/branches.mdx @@ -0,0 +1,70 @@ +--- +title: '🌿 Branches & Merging' +description: 'Create branches, switch between them, and merge work together.' +--- + +Branches let you take a snapshot of your data, experiment freely, and merge the results back without affecting the original. 
The commands map closely to git. + +## Create a Branch + +Create a new branch and check it out in one step with `oxen checkout -b`. + +```bash +oxen checkout -b feature +``` + +## List Branches + +To list the branches in your repository (highlighting the one you're on), use `oxen branch`. + +```bash +oxen branch +``` + +To see branches that exist on the remotes, add `--remote`. + +```bash +oxen branch --remote +``` + +To delete a branch, use `oxen branch --delete`. This fails if the branch has changes that haven't been merged. + +```bash +oxen branch --delete feature +``` + +Use `-D` to force-delete a branch. + +```bash +oxen branch -D feature +``` + +## Switch Between Branches + +Use `oxen checkout` to switch branches. This restores the working directory to the HEAD commit of the branch you're checking out. + +```bash +oxen checkout main +``` + +You can also check out a specific commit. + +```bash +oxen checkout COMMIT_ID +``` + +## Merge Branches + +Merge another branch into your current branch with `oxen merge`. This creates a merge commit, or fails if there are conflicts to resolve. + +```bash +oxen merge TARGET_BRANCH +``` + +If you're collaborating, you may instead want to open a merge request through the [Oxen.ai web UI](https://oxen.ai). + +Oxen.ai merge request diff --git a/getting-started/command-line/dev_tools.mdx b/getting-started/command-line/debugging.mdx similarity index 97% rename from getting-started/command-line/dev_tools.mdx rename to getting-started/command-line/debugging.mdx index f3aaf91..7882b73 100644 --- a/getting-started/command-line/dev_tools.mdx +++ b/getting-started/command-line/debugging.mdx @@ -1,6 +1,6 @@ --- -title: '🔌 Dev Tools' -description: 'Oxen provides multiple tools to debug and configure the CLI' +title: '🔧 Debugging & Performance' +description: 'Diagnose repository state and tune Oxen for your hardware and network.' 
--- ## Oxen Tree diff --git a/getting-started/command-line/import_data.mdx b/getting-started/command-line/import_data.mdx deleted file mode 100644 index fca4b92..0000000 --- a/getting-started/command-line/import_data.mdx +++ /dev/null @@ -1,106 +0,0 @@ ---- -title: '📥 Import Data' -description: "If you're working with a repository that's hosted remotely, there are several options to download the data." ---- - -## Clone Repository - -There's a few ways to clone an Oxen repository, depending on the level of data transfer you want to incur. The default `oxen clone` with no flags will download the *latest commit* from the `main` branch. - -```bash -oxen clone https://hub.oxen.ai/ox/CatDogBBox -``` -To clone from a specific branch, you can use the `-b` flag. - -```bash -oxen clone https://hub.oxen.ai/ox/CatDogBBox -b my-pets -``` -This creates a new directory `CatDogBBox` in which the files from the latest commit are downloaded, and a `.oxen` folder containing the [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/) for each commit in the branch's history and metadata for the repository. - -### Clone all - -To clone the commit history for every branch, you can use the `--all` flag. - -```bash -oxen clone https://hub.oxen.ai/ox/CatDogBBox --all -``` - -This is useful if you want to migrate a repository to a new remote. - -### Cloning a subtree - -If you're only working with a subset of the repository, you can use the `--filter` and `--depth` flags to limit the clone to a subsection of the full tree. `--filter` selects the directories to be cloned, while `--depth` limits how many subdirectories can be recursed into. - -```bash -oxen clone https://hub.oxen.ai/ox/CatDogBBox --filter annotations --depth 1 -``` - -This clones the subtree of the commit that starts at annotations directory, without recursing into any new subdirectories. Only the files and merkle tree nodes from the `annotations` directory will be pulled locally. 
- -### Remote Mode - -If you're working with a larger repository than you can store on your local device, you can clone the repository in Remote Mode to get the commit merkle trees without downloading the files. - -```bash -oxen clone --remote https://hub.oxen.ai/ox/CatDogBBox -``` - -In remote-mode repositories, you can download individual files or directories with `oxen restore`. - -```bash -oxen restore path/to/file -``` - -This is useful if you want to view the state of the repository locally without waiting for all its files to download. - -## Set Remote - -If create a repo locally, you can use `oxen config --set-remote` to point it to a repository that's hosted remotely. This allows you to `fetch` or `pull` from that repository. - -```bash -oxen config --set-remote origin https://hub.oxen.ai/ox/CatDogBBox -``` -To set a remote, use the URL of the remote repository and specify a name to associate it with on the local repo. By default, cloned repositories will use the name 'origin' for their remote. - -You can use this command to change a local repository's remotes or add new ones. A repo can have multiple remotes, although most oxen commands that interact with a remote repository will default to 'origin' if no remote is specified. - -## Pull Changes - -Once you have a local repo with a remote, you can `pull` the latest changes for a branch. 
This fetches any new commits, downloading their files and commit merkle trees, and then checks out the latest commit in the working directory - -```bash -oxen pull origin main -``` - -If no arguments are provided, the remote defaults to `origin` and branch defaults to `main` - -```bash -oxen pull -``` - -As with `clone`, you can pull all branches with `--all` - -```bash -oxen pull --all -``` - -## Fetch Changes - -If you want to fetch the latest changes without checking them out in the working directory, you can use `oxen fetch` - -```bash -oxen fetch -``` - -## Download Data - -If you want to download specific files or directories with the `oxen download` command. - -```bash -oxen download ox/CatDogBBox test.csv -``` -You can download from a specific revision (either a branch or commit id) with the `--revision` flag - -```bash -oxen download ox/CatDogBBox path/to/folder --revision commit_or_branch_name -``` diff --git a/getting-started/command-line/local_development.mdx b/getting-started/command-line/local_development.mdx deleted file mode 100644 index b4cd509..0000000 --- a/getting-started/command-line/local_development.mdx +++ /dev/null @@ -1,255 +0,0 @@ ---- -title: '🏗️ Local Development' -description: 'Oxen provides numerous tools to work with and version your data from the command line.' ---- - -## Create a repository - -You can initialize an Oxen repository locally with `oxen init` - -```bash -oxen init -``` - -This will create a `.oxen` folder in your working directory, which contains the metadata for the repository. It's also where each commit's [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/) will be stored new files are added and committed. - -## Add files - -Add files to a repository with `oxen add`. This copies the files' contents to the repository's `version store` and stages the changes to the `staged_db`. -You can add files and directories (added recursively) using either absolute paths or their relative paths to the repo root. 
- -```bash -oxen add path/to/file.txt -``` - -```bash -oxen add images/ -``` - -You can also add with glob paths or wildcards. This will add all files matching the glob pattern that aren't ignored in `.oxenignore` - -```bash -oxen add images/f* # Adds all paths starting with an 'f' in the images dir -``` - -```bash -oxen add . # Adds the current directory -``` - -This can be used to stage new, modified, or removed files and directories to the repo. You can view staged changes with `oxen status` - - -``` bash -oxen status -``` - -Output: - -``` -On branch main -> 113b00a451ac07d284b29ce5604d6891 - -Files to be committed: - (use "oxen restore --staged ..." to unstage) - modified: modified_file.txt - new file: new_file.txt - removed: removed_file.txt -``` -Oxen allows you to stage and version many different data types in the same repository, accessing them the same at the level of the CLI. That means you don't have to worry about what kind of data you're working with when you're using `oxen` commands. Under the hood, oxen stores different [File Metadata](/concepts/file_metadata) for different types for additional functionality - -## Commit Changes - -To commit the changes that are staged with a message you can use - -```bash -oxen commit -m "Some informative commit message" -``` - -This creates a new commit on the current branch. If the repository was previosuly empty, this also creates the `main` branch - -Once you've committed a file, a copy of its contents will be stored in the repository's `version store`. By default, the version files will be located in `.oxen/versions/files`. 
File and directory metadata are stored in the [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/), which mimics the structure of the working directory - -## View History - -You can see the history of changes on your current branch by running: - -```bash -oxen log -``` - -Output: - -``` -commit 6b958e268656b0c5 - -Author: Ox -Date: Fri, 21 Oct 2022 16:08:39 -0700 - - adding 10,000 training images - -commit e76dd52a4fc13a6f - -Author: Ox -Date: Fri, 21 Oct 2022 16:05:22 -0700 - - Initialized Repo 🐂 -``` - -## List Branches - -To view the branch you're on, use `oxen branch`. This will list all of the repo's branches, highlighting the current one. - -```bash -oxen branch -``` - -You can also view the branches that exist on a repo's remotes with `oxen branch --remote` - -```bash -oxen branch --remote -``` -To delete a branch, use `oxen branch --delete`. This will fail if the branch is not safe to delete (i.e., it has changes that aren't merged to main) - -```bash -oxen branch --delete feature -``` - -```bash -oxen branch -D feature # use -D to force delete a branch -``` - -## View Status - -To see what data is tracked, staged, modified, removed, or not yet added to the repository you can use `oxen status` - -Note: since we are dealing with large datasets with many files, `status` rolls up the changes and summarizes them for you. - -```bash -oxen status -``` - -Output: - -``` -On branch main -> e76dd52a4fc13a6f - -Directories to be committed - added: images with added 8108 files - -Files to be committed: - new file: images/000000000042.jpg - new file: images/000000000074.jpg - new file: images/000000000109.jpg - new file: images/000000000307.jpg - new file: images/000000000309.jpg - new file: images/000000000394.jpg - new file: images/000000000400.jpg - new file: images/000000000443.jpg - new file: images/000000000490.jpg - new file: images/000000000575.jpg - ... and 8098 others - -Untracked Directories - (use "oxen add ..." 
to update what will be committed) - annotations/ (3 items) -``` - -You can paginate through the changes with the `-s` (skip) and `-l` (limit) params on the status command. Run `oxen status --help` for more info. - -## Restore Files - -If you want to revert changes you've made to a file in the working directory, you can use `oxen restore` to restore the file to its version in the HEAD commit. This works with deleted or modified files - -```bash -oxen restore path/to/file.txt -``` - -You can also use `oxen restore` on directories to recursively restore their files - -If you want to restore to a specific version in your commit history, you can supply the commit id or branch name with the `--source` flag. - -```bash -oxen restore path/to/file.txt --source COMMIT_ID -``` - -As with git, you can also restore files from the `staged_db` using `oxen restore --staged` - -```bash -oxen restore --staged path/to/dir -``` - -## Removing Data - -To stage a file to be removed from the next commit, use the `oxen rm` command. - -```bash -oxen rm path/to/file.txt -``` - -Note: the file must be committed in the history for this to work. If you want to remove a file that has not been committed yet, simple use your /bin/rm command. - -To recursively remove a directory use the `-r` flag. - -```bash -oxen rm -r path/to/dir -``` - -You can also remove entries from the `staged_db` using `oxen rm --staged` - -```bash -oxen rm --staged -r path/to/dir -``` - -## Checkout Revision - -You can create a new branch with `oxen checkout --branch` - -```bash -oxen checkout -b feature -``` - -Once you have multiple branches, you can use `oxen checkout` to safely move between them. 
This will restore the working directory to head commit of the branch you're checking out - -```bash -oxen checkout main -``` -You can also checkout a commit ID - -```bash -oxen checkout COMMIT_ID -``` - -## View Diffs - -Oxen can compute and display the diff between files using the [oxen diff](/concepts/diffs) command - -```bash -oxen diff dataset.csv -``` - -This will compare the file `dataset.csv` in the working directory with its version in the HEAD commit. You can also compare different files, compare files across revisions, and compare different revisions with each other - -## Merge Commits - -You can create a merge commit with `oxen merge`. This will merge the current branch into the target, or fail if there are merge conflicts - -```bash -oxen merge TARGET_BRANCH -``` - -If you're collaborating with on a repo, you may instead want to create a merge request. You can do this through the UI on [Oxen.ai](https://oxen.ai) - -Oxen.ai merge request - -## Prune Commits - -If you've accidentally committed sensitive data or have a bloated commit history with many orphaned files, you can use [oxen prune](/commands/prune) to cleanup the repository. - -```bash -oxen prune -``` - - diff --git a/commands/prune.mdx b/getting-started/command-line/maintenance.mdx similarity index 95% rename from commands/prune.mdx rename to getting-started/command-line/maintenance.mdx index a62abd0..4c62830 100644 --- a/commands/prune.mdx +++ b/getting-started/command-line/maintenance.mdx @@ -1,6 +1,6 @@ --- -title: 'Oxen Prune' -description: 'Remove orphaned nodes and version files from your repository' +title: '🧹 Maintenance' +description: 'Reclaim disk space by removing unreferenced data from your repository.' --- The `oxen prune` command removes orphaned nodes and version files that are not referenced by any commit in your repository. This helps reclaim disk space by cleaning up unreferenced data that accumulates over time. 
@@ -39,7 +39,9 @@ This is useful to see how much space would be freed without making any changes t After running `oxen prune`, you'll see detailed statistics about the operation: -```bash +Output: + +``` Prune Statistics: Nodes: Scanned: 1250 diff --git a/getting-started/command-line/push_changes.mdx b/getting-started/command-line/push_changes.mdx deleted file mode 100644 index 7e76be4..0000000 --- a/getting-started/command-line/push_changes.mdx +++ /dev/null @@ -1,92 +0,0 @@ ---- -title: '⬆️ Push Changes' -description: "After you make changes locally, push them to update the remote" ---- - -## Push Changes - -Once you've committed changes to a local repository, you can push them to a remote with `oxen push` - -```bash -oxen push origin main -``` - -If you don't supply a remote name or branch, they default to `origin` and the current branch respectively. - -```bash -oxen push -``` - -### Resume Push - -If a push is cancelled partway through, you can use the `--missing-files` flag to resume push progress and upload the remaining files - -```bash -oxen push --missing-files -``` -## Set Remote - -You can use `oxen config --set-remote` to set the remote for a repository. This allows you to push changes to that remote, provided you've set the API key and have permission. - -```bash -oxen config --set-remote origin https://hub.oxen.ai/ox/CatDogBBox -``` - -## View Remotes - -The `oxen remote` command allows you to view what remotes you have for a repository - -```bash -oxen remote -``` - -Output: - -``` -origin -``` - -Use the `--verbose` flag to list the remotes with their URLs - -```bash -oxen remote --verbose -``` - -Output: - -``` -origin https://hub.oxen.ai/ox/CatDogBBox -local_dev http://localhost:3000/ox/CatDogBBox -``` - -## Create Remote - -You can also create a remote from the CLI with `oxen create-remote`. 
- -```bash -oxen create-remote --host hub.oxen.ai --scheme https --name ox/SampleRepo -``` - -## Workspaces - -You can also stage changes to a remote by using a [Workspace](/concepts/workspaces). This allows you to skip copying files to a local repository, making it ideal for bulk imports. - -Create a workspace on a branch with `oxen workspace create` - -```bash -oxen workspace create -``` - -You can then add the files with `oxen workspace add`, uploading their contents to the remote and staging them for commit - -```bash -oxen workspace add images --workspace-id 117abd2d-3363-497d-ac93-a5cb3c280234 -``` - -Then, commit the changes with `oxen workspace commit` - -```bash -oxen workspace commit -m "Uploading Images" --workspace-id 117abd2d-3363-497d-ac93-a5cb3c280234 -``` - - diff --git a/getting-started/command-line/config.mdx b/getting-started/command-line/setup.mdx similarity index 88% rename from getting-started/command-line/config.mdx rename to getting-started/command-line/setup.mdx index 0f03c00..fe897bb 100644 --- a/getting-started/command-line/config.mdx +++ b/getting-started/command-line/setup.mdx @@ -1,5 +1,6 @@ --- -title: '⚙️ Configuration & Auth' +title: '⚙️ Setup & Authentication' +description: 'Configure your local Oxen identity and authenticate with a remote.' --- ## Setup User diff --git a/getting-started/command-line/start_repository.mdx b/getting-started/command-line/start_repository.mdx new file mode 100644 index 0000000..8d7a6ba --- /dev/null +++ b/getting-started/command-line/start_repository.mdx @@ -0,0 +1,104 @@ +--- +title: '🚀 Start a Repository' +description: 'Create a new repository, clone an existing one, or download specific files.' +--- + +Most Oxen workflows begin in one of three ways: initializing a fresh local repository, cloning an existing one from a remote, or pulling down specific files without setting up a full repo. + +## Initialize a Local Repository + +Create a new Oxen repository in the current directory with `oxen init`. 
+ +```bash +oxen init +``` + +This creates a `.oxen/` directory in your working directory containing the repository metadata. As you add and commit files, each commit's [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/) is stored under `.oxen/`. + +## Clone a Remote Repository + +There are a few ways to clone an Oxen repository, depending on how much data you want to transfer. The default `oxen clone` with no flags downloads the *latest commit* from the `main` branch. + +```bash +oxen clone https://hub.oxen.ai/ox/CatDogBBox +``` + +This creates a new directory `CatDogBBox` containing the files from the latest commit, plus a `.oxen/` folder with the [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/) for the branch's history. + +### Clone a Specific Branch + +Use the `-b` flag to clone from a branch other than `main`. + +```bash +oxen clone https://hub.oxen.ai/ox/CatDogBBox -b my-pets +``` + +### Clone All Branches + +To clone the commit history for every branch (useful when migrating a repo to a new remote), use `--all`. + +```bash +oxen clone https://hub.oxen.ai/ox/CatDogBBox --all +``` + +### Clone a Subtree + +If you only need a subset of the repository, use `--filter` and `--depth` to limit the clone. `--filter` selects which directories to clone, while `--depth` limits how many levels of subdirectories are recursed into. + +```bash +oxen clone https://hub.oxen.ai/ox/CatDogBBox --filter annotations --depth 1 +``` + +This clones only the subtree starting at the `annotations` directory, without recursing into any new subdirectories. + +### Remote Mode + +If the repository is larger than you can store locally, you can clone it in remote mode to download the commit Merkle trees without the file contents. + +```bash +oxen clone --remote https://hub.oxen.ai/ox/CatDogBBox +``` + +In a remote-mode repository, you can download individual files or directories on demand with `oxen restore`. 
+ +```bash +oxen restore path/to/file +``` + +This is useful for inspecting the state of a repository without waiting for all its files to download. + +## Configure a Remote + +If you initialized a repository locally, you can point it at a remote with `oxen config --set-remote`. This is what enables `oxen push`, `oxen pull`, and `oxen fetch`. + +```bash +oxen config --set-remote origin https://hub.oxen.ai/ox/CatDogBBox +``` + +Specify a remote name (commonly `origin`) and the URL of the remote repository. Cloned repositories already have `origin` set automatically. + +A repo can have multiple remotes — most commands default to `origin` if no remote is specified. + +### Create a Remote from the CLI + +If the remote repository doesn't exist yet, you can create it from the CLI with `oxen create-remote`. + +```bash +oxen create-remote --host hub.oxen.ai --scheme https --name ox/SampleRepo +``` + +You can also create remotes through the [Oxen.ai web UI](https://oxen.ai). + +## Download Specific Files + +If you only need specific files or directories — without cloning the whole repository — use `oxen download`. + +```bash +oxen download ox/CatDogBBox test.csv +``` + +To download from a specific branch or commit, pass `--revision`. + +```bash +oxen download ox/CatDogBBox path/to/folder --revision commit_or_branch_name +``` diff --git a/getting-started/command-line/sync_remote.mdx b/getting-started/command-line/sync_remote.mdx new file mode 100644 index 0000000..6ff484f --- /dev/null +++ b/getting-started/command-line/sync_remote.mdx @@ -0,0 +1,87 @@ +--- +title: '🔄 Sync with a Remote' +description: 'Push, pull, and fetch changes between your local repository and a remote.' +--- + +Once your repository has a remote configured (see [Start a Repository](/getting-started/command-line/start_repository#configure-a-remote)), you can push your work and pull collaborators' changes. + +## Push Changes + +Once you've committed changes locally, push them to a remote with `oxen push`. 
+ +```bash +oxen push origin main +``` + +If you don't supply a remote name or branch, they default to `origin` and the current branch. + +```bash +oxen push +``` + +### Resume a Push + +If a push is cancelled partway through, use `--missing-files` to resume and upload only the remaining files. + +```bash +oxen push --missing-files +``` + +## Pull Changes + +To pull the latest commits for a branch — downloading their files and Merkle trees, then checking out the latest commit — use `oxen pull`. + +```bash +oxen pull origin main +``` + +If no arguments are provided, the remote defaults to `origin` and the branch defaults to the current branch. + +```bash +oxen pull +``` + +As with `clone`, you can pull all branches with `--all`. + +```bash +oxen pull --all +``` + +## Fetch Changes + +To fetch the latest changes without checking them out in the working directory, use `oxen fetch`. + +```bash +oxen fetch +``` + +This is useful when you want to inspect what's new on the remote before deciding whether to merge or check it out. + +## View Configured Remotes + +`oxen remote` lists the remotes configured for your repository. + +```bash +oxen remote +``` + +Output: + +``` +origin +``` + +Use `--verbose` to also see each remote's URL. + +```bash +oxen remote --verbose +``` + +Output: + +``` +origin https://hub.oxen.ai/ox/CatDogBBox +local_dev http://localhost:3000/ox/CatDogBBox +``` + +To add or change a remote's URL, see [Configure a Remote](/getting-started/command-line/start_repository#configure-a-remote) on the Start a Repository page. diff --git a/getting-started/command-line/track_changes.mdx b/getting-started/command-line/track_changes.mdx new file mode 100644 index 0000000..6553669 --- /dev/null +++ b/getting-started/command-line/track_changes.mdx @@ -0,0 +1,164 @@ +--- +title: '📝 Track Changes' +description: 'Stage, commit, inspect, and undo changes in your local repository.' 
+--- + +The day-to-day Oxen workflow follows the same shape as git: stage what's changed, commit it, and inspect the history when you need to. + +## Stage Files + +Add files to a repository with `oxen add`. This copies the files' contents to the repository's version store and stages the changes for commit. You can use absolute paths or paths relative to the repo root. + +```bash +oxen add path/to/file.txt +``` + +```bash +oxen add images/ +``` + +You can also stage matching files with glob patterns and wildcards. This stages everything that matches the pattern and isn't excluded by `.oxenignore`. + +```bash +# Adds all paths starting with an 'f' in the images dir +oxen add images/f* +``` + +```bash +# Adds everything in the current directory +oxen add . +``` + +`oxen add` handles new, modified, and removed files and directories. + +Oxen lets you version any data type — text, images, audio, video, parquet, etc. — in the same repository, and you interact with all of them through the same commands. Under the hood, Oxen stores type-specific [file metadata](/concepts/file_metadata) to power richer features. + +## View Status + +To see what is tracked, staged, modified, removed, or not yet added, use `oxen status`. + +```bash +oxen status +``` + +Output: + +``` +On branch main -> e76dd52a4fc13a6f + +Directories to be committed + added: images with added 8108 files + +Files to be committed: + new file: images/000000000042.jpg + new file: images/000000000074.jpg + new file: images/000000000109.jpg + new file: images/000000000307.jpg + new file: images/000000000309.jpg + new file: images/000000000394.jpg + new file: images/000000000400.jpg + new file: images/000000000443.jpg + new file: images/000000000490.jpg + new file: images/000000000575.jpg + ... and 8098 others + +Untracked Directories + (use "oxen add ..." 
to update what will be committed) + annotations/ (3 items) +``` + +Because Oxen is built for large datasets with many files, `status` rolls up directory-level changes and summarizes them. + +You can paginate through staged files with the `-s` (skip) and `-l` (limit) flags. Run `oxen status --help` for the full list. + +## Commit Changes + +Once changes are staged, commit them with a message. + +```bash +oxen commit -m "Some informative commit message" +``` + +This creates a new commit on the current branch. If the repository was previously empty, this also creates the `main` branch. + +After a commit, a copy of each file's contents lives in the repository's version store (by default `.oxen/versions/files`). File and directory metadata are stored in the [Merkle Tree](https://ghost.oxen.ai/merkle-tree-101/), which mirrors the working directory structure. + +## View History + +Show the commit history of your current branch with `oxen log`. + +```bash +oxen log +``` + +Output: + +``` +commit 6b958e268656b0c5 + +Author: Ox +Date: Fri, 21 Oct 2022 16:08:39 -0700 + + adding 10,000 training images + +commit e76dd52a4fc13a6f + +Author: Ox +Date: Fri, 21 Oct 2022 16:05:22 -0700 + + Initialized Repo 🐂 +``` + +## View Diffs + +Oxen can compute and display diffs between files using the [oxen diff](/concepts/diffs) command. + +```bash +oxen diff dataset.csv +``` + +This compares `dataset.csv` in the working directory with its version in the HEAD commit. You can also diff different files against each other, files across revisions, or whole revisions against each other. See the [diff concepts page](/concepts/diffs) for the full set of options. + +## Restore Files + +To revert changes you've made to a file in the working directory, use `oxen restore`. This restores the file to its version in the HEAD commit, and works on both modified and deleted files. 
+ +```bash +oxen restore path/to/file.txt +``` + +You can also restore directories — `oxen restore` will recursively restore the files inside. + +To restore from a specific commit or branch, pass `--source`. + +```bash +oxen restore path/to/file.txt --source COMMIT_ID +``` + +Like git, you can also unstage files (without changing the working directory) using `--staged`. + +```bash +oxen restore --staged path/to/dir +``` + +## Remove Files + +To stage a file to be removed from the next commit, use `oxen rm`. + +```bash +oxen rm path/to/file.txt +``` + +The file must already be committed for this to work. If you want to remove a file that has not been committed yet, just use your shell's `rm` command. + +To recursively remove a directory, use the `-r` flag. + +```bash +oxen rm -r path/to/dir +``` + +You can also remove entries from the staging area only — without deleting the file from the working directory — using `--staged`. + +```bash +oxen rm --staged -r path/to/dir +``` diff --git a/getting-started/command-line/workspaces.mdx b/getting-started/command-line/workspaces.mdx new file mode 100644 index 0000000..29d4be5 --- /dev/null +++ b/getting-started/command-line/workspaces.mdx @@ -0,0 +1,36 @@ +--- +title: '🗂️ Workspaces' +description: 'Stage and commit changes directly against a remote without a full local clone.' +--- + +A workspace lets you stage changes against a remote branch without first copying its files to a local repository. This makes it ideal for bulk imports, automation, and any case where you don't need a local working copy. + +For the conceptual overview, see [Workspaces](/getting-started/workspaces). For the Python interface, see [`python-api/workspace`](/python-api/workspace). + +## Create a Workspace + +Create a workspace on the current branch with `oxen workspace create`. + +```bash +oxen workspace create +``` + +This returns a workspace ID you'll use for subsequent commands. 
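Because the later workspace commands all take the same ID, it can help to build them in one place. The following is a minimal Python sketch that only assembles the `oxen workspace add` and `oxen workspace commit` invocations documented on this page. The helper names `workspace_commands` and `render` are hypothetical, the workspace ID is a placeholder, and nothing is executed.

```python
import shlex


def workspace_commands(path, message, workspace_id):
    """Build the add and commit commands for an existing workspace."""
    add = ["oxen", "workspace", "add", path, "--workspace-id", workspace_id]
    commit = [
        "oxen", "workspace", "commit",
        "-m", message,
        "--workspace-id", workspace_id,
    ]
    return [add, commit]


def render(commands):
    """Quote each argument list so it can be pasted into a shell."""
    return [shlex.join(cmd) for cmd in commands]


# Placeholder ID; substitute the one returned by `oxen workspace create`.
commands = workspace_commands("images", "Uploading Images", "WORKSPACE_ID")
for line in render(commands):
    print(line)
```

Keeping the commands as argument lists means they could be handed to `subprocess.run` unchanged, without shell-quoting issues in the commit message.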
+ +## Stage Files in a Workspace + +Add files to the workspace with `oxen workspace add`. The file contents are uploaded directly to the remote and staged for commit. + +```bash +oxen workspace add images --workspace-id 117abd2d-3363-497d-ac93-a5cb3c280234 +``` + +## Commit a Workspace + +Once your changes are staged, commit them with `oxen workspace commit`. + +```bash +oxen workspace commit -m "Uploading Images" --workspace-id 117abd2d-3363-497d-ac93-a5cb3c280234 +``` + +The commit lands on the remote branch directly, with no local push step required. diff --git a/getting-started/data.mdx b/getting-started/data.mdx new file mode 100644 index 0000000..82011c8 --- /dev/null +++ b/getting-started/data.mdx @@ -0,0 +1,19 @@ +--- +title: 'Repositories on Oxen.ai' +sidebarTitle: 📖 Overview +description: 'Oxen.ai allows you to version and store your data in repositories. Think of it like git for large data.' +--- + +When using models on Oxen.ai, by default we store the model inputs, outputs, and metadata in a repository. Every piece of data is versioned so you can trace the provenance of your data and models. + +In order to version data at scale, we built an [open source version control system](https://github.com/Oxen-AI/Oxen) that can scale to monorepos with millions of files and terabytes of data. + +## Key Concepts + +* **Repository**: A collection of files and folders that is versioned together. +* **Commit**: A snapshot of a repository at a given time. +* **Branch**: A named pointer to a commit. +* **Dataset**: A tabular file within an Oxen repository that can be indexed and searched, e.g. CSV, JSONL, or Parquet. +* **Workspace**: The equivalent of a working directory on the remote server where files can be added in an uncommitted state. + +Follow along with the [Version Control](/examples/data/versioning) guide to learn how to version your data.
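To make the Key Concepts definitions concrete, here is a toy model of how a branch acts as a named pointer that moves as commits are added. It is an illustrative sketch only and says nothing about how Oxen stores data internally; `ToyRepo` and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    # commit id -> (parent commit id, message)
    commits: dict = field(default_factory=dict)
    # branch name -> commit id the branch currently points at
    branches: dict = field(default_factory=dict)
    current: str = "main"

    def commit(self, commit_id, message):
        """Record a snapshot and move the current branch pointer to it."""
        parent = self.branches.get(self.current)
        self.commits[commit_id] = (parent, message)
        self.branches[self.current] = commit_id

    def checkout_new_branch(self, name):
        """Create a branch at the current commit and switch to it."""
        self.branches[name] = self.branches.get(self.current)
        self.current = name


repo = ToyRepo()
repo.commit("c1", "Initialized Repo")
repo.checkout_new_branch("feature")
repo.commit("c2", "adding training images")
```

After these calls, `main` still points at `c1` while `feature` points at `c2`, whose parent is `c1`. That is the sense in which a branch lets you experiment without affecting the original.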
\ No newline at end of file diff --git a/getting-started/fine-tuning.mdx b/getting-started/fine-tuning.mdx index 0aec451..666625b 100644 --- a/getting-started/fine-tuning.mdx +++ b/getting-started/fine-tuning.mdx @@ -4,13 +4,13 @@ sidebarTitle: 📖 Overview description: 'Oxen.ai allows you to fine-tune text, image, and video models with a few clicks.' --- -Simply [upload your data](/getting-started/datasets), and we will provision GPU infrastructure and run the fine-tune. When it's done, Oxen.ai will save the fine-tuned model weights directly to your repository, and we spin down the GPU for you. No worrying about run away costs or having to manage your own infrastructure. +Simply [upload your data](/examples/data/datasets), and we will provision GPU infrastructure and run the fine-tune. When it's done, Oxen.ai will save the fine-tuned model weights directly to your repository, and we spin down the GPU for you. No worrying about run away costs or having to manage your own infrastructure. Once the fine-tuning process is complete, you can deploy your model to a dedicated endpoint and use the [inference endpoints](/getting-started/inference) to integrate it into your application. -Fine-Tuning Ox +Fine-Tuning Ox -Oxen.ai automatically [versions](/getting-started/versioning) and manages the raw model weights and datasets, so that you can always track the data that was used to train the model, or download the model to run locally. +Oxen.ai automatically [versions](/examples/data/versioning) and manages the raw model weights and datasets, so that you can always track the data that was used to train the model, or download the model to run locally. ## Why Fine-Tune? @@ -99,7 +99,7 @@ If you are fine-tuning an [image](/examples/fine-tuning/image_generation) or [vi Fine-Tuning Samples -Click on the "Info" tab to see the fine-tuning configuration and all the hyper-parameters used. 
This will include a link to the [dataset version](/getting-started/versioning) you used and the raw model weights for downloading and running locally. +Click on the "Info" tab to see the fine-tuning configuration and all the hyper-parameters used. This will include a link to the [dataset version](/examples/data/versioning) you used and the raw model weights for downloading and running locally. Fine-Tuning Samples @@ -142,7 +142,7 @@ For image and video generation, you can use the [playground](https://oxen.ai/ai/ ## Downloading the Model -If you want access to the raw model weights, you can download them from the repository using the Oxen.ai [Python Library](/getting-started/python) or the [CLI](/getting-started/cli). +If you want access to the raw model weights, you can download them from the repository using the Oxen.ai [Python Library](/python-api) or the [CLI](/getting-started/command-line/start_repository). Follow the instructions for [installing oxen](/getting-started/install) if you haven't already. diff --git a/getting-started/inference.mdx b/getting-started/inference.mdx index a9ce103..343f08a 100644 --- a/getting-started/inference.mdx +++ b/getting-started/inference.mdx @@ -6,7 +6,7 @@ description: "Oxen.ai exposes API endpoints and a playground for a variety of mo ## Model API -Oxen.ai's API allows you to start building on top of the [latest and greatest models](https://oxen.ai/ai/models) and deploy fine-tuned models with a single API. If a model is too slow, costly, inaccurate, or if you want full control of the weights, you can use our [one-click interface to fine-tune](/getting-started/fine-tuning) and deploy a custom model using the same interface. +Oxen.ai's API allows you to start building on top of the [latest models](https://oxen.ai/ai/models) and deploy [fine-tuned models](/getting-started/fine-tuning) with a single API. 
If a model is too slow, costly, inaccurate, or if you want full control of the weights, you can use our [one-click interface to fine-tune](/getting-started/fine-tuning) and deploy a custom model using the same interface. ## All your modalities, in one place diff --git a/getting-started/intro.mdx b/getting-started/intro.mdx index 4d8381f..327142c 100644 --- a/getting-started/intro.mdx +++ b/getting-started/intro.mdx @@ -1,71 +1,53 @@ --- title: 🐂 Oxen.ai -description: Infrastructure for collaborating on datasets and fine-tuning open source models. +description: The platform for building AI on your own data. --- -Oxen.ai Moon Ox Hero +import { CTAButton } from "/snippets/cta-button.mdx"; -Oxen.ai gives you the tools to [fine-tune](/getting-started/fine-tuning) open source [models](/getting-started/inference) on your own [datasets](/getting-started/datasets). We believe that your data is your differentiator, and training models on your own data should be easy, fast, and approachable for everyone. +Oxen.ai gives developers, creators, and teams easy access to the latest AI models, plus the data infrastructure to organize, version, collaborate on, and customize the data behind them. -The platform makes it easy to spin up GPU infrastructure to [train models](/examples/notebooks/train_llm) or run [inference](/getting-started/inference) at scale. Never lose track of which [dataset](/getting-started/datasets) trained which [model](/getting-started/fine-tuning) because all files are [version controlled](/getting-started/versioning) in a [repository](/getting-started/versioning). Kick off many experiments in parallel and easily collaborate with your team. +Use 200+ image, video, audio, and language models through one API. Save prompts, reference assets, generations, metadata, and training data into collaborative repositories. Track every change, branch experiments, and fine-tune custom models on your proprietary data. 
-## ✅ Features +Oxen.ai Moon Ox Hero -Oxen.ai allows you to own your model end to end, from curating datasets to fine-tuning models to deploying them at scale. Start by trying out the latest and greatest open source [LLM](/examples/inference/chat_completions), [video generation](/examples/inference/video_generation), or [image editing](/examples/inference/image_editing) models, then graduate to [fine-tuning](/getting-started/fine-tuning) your own models. +Your models are only as good as the data behind them. Oxen helps teams turn raw inputs and generated outputs into organized, reusable, version-controlled assets. -* ⚙️ [Fine-Tuning](/getting-started/fine-tuning) - Train models for many modalities (text, images, videos) - * [💬 Language Models](/examples/fine-tuning/chat_completions) - Generate and understand text - * [👁️ Vision Language Models](/examples/fine-tuning/image_understanding) - Understand image and video data - * [🖼️ Image Generation](/examples/fine-tuning/image_generation) - Generate images from prompts - * [🎨 Image Editing](/examples/fine-tuning/image_editing) - Edit images with prompts - * [🎥 Video Generation](/examples/fine-tuning/video_generation) - Generate videos from prompts -* 📊 [Datasets](/getting-started/datasets) - Build datasets for training, fine-tuning, or evaluating models -* ⚡️ [Inference APIs](/getting-started/inference) - Deploy fine-tuned models to an API endpoint -* 🚀 [Batch Inference](/getting-started/evaluation) - Run your model at scale over large datasets, to label data, generate synthetic data or evaluate model performance +## ⚡️ Use Any Model -### ⚙️ **Fine-Tune Models** +Access 200+ models through a unified API instead of integrating with each provider separately. Build your own product experiences on top of Oxen while keeping model inputs, outputs, and metadata stored in a data repository for auditability, reproducibility, and collaboration. 
-The best models are the ones that understand your context and continue to learn from your data over time. +Oxen.ai has models of every modality (text, images, videos, audio) from the major labs, and more. Explore the [list of supported models](https://oxen.ai/ai/models) to see what you can build. -Go from dataset to model in a few clicks with Oxen.ai's fine-tuning tooling. Select a dataset, define your inputs and outputs, and let Oxen.ai do the grunt work. Oxen saves model weights to it's version store tying model weights to the dataset and code that was used to train them. +View API documentation -Fine-Tuning +## 🌾 Customize Your Models -Once the model has been fine-tuned, you can easily deploy the model behind an inference endpoint and start the evaluation loop over again. +Customization can start simple: managing prompts, context, reference images, and generated outputs. If prompting isn't enough, fine-tune the model weights themselves on your proprietary datasets. In both cases, your data is what makes the model yours. -### 📊 **Build Datasets** +Fine-tune open source models for many modalities (text, images, videos) on your proprietary data. Oxen gives you the tools to version, collaborate, and customize the data behind your models. -Quality datasets are what bring your unique style and differentiation to the model. Collaborate on multi-modal datasets used for training, fine-tuning, or evaluating models. Backed by Oxen.ai's [version control](https://github.com/Oxen-AI/Oxen), you'll never worry about remembering what data a model was trained or evaluated on. +Learn how to train your own model -Learn how to interface with datasets in the Oxen.ai [python library](/getting-started/python) or more about supported dataset types and formats [here](/getting-started/datasets). +## 💾 Version Your Data -Image Net +Track the provenance of prompts, images, videos, audio, text, labels, metadata, generations, and training examples in repositories built for large datasets. 
Oxen gives you git-like version control for data that can scale to terabytes. -### ⚡ **Model Inference** +We built the version control system to be [blazing fast](/examples/data/performance), [open source](https://github.com/Oxen-AI/Oxen), and extensible for anyone to build upon. It can be used to version any type of data, not just machine learning datasets. It scales up to monorepos with [millions of files and terabytes of data](/examples/data/performance). -Whether you are making your first LLM call or need to deploy a fine-tuned model, [Oxen.ai](http://Oxen.ai) gives you the flexibility to swap models through a unified [model inference API](/getting-started/inference). The API is OpenAI compatible and supports a variety of foundation models as well as fine-tunable models. See the list of [supported models](https://oxen.ai/ai/models) to get started. +Get started with versioning -Oxen.ai Chat Window +## 🤝 Collaborate With Your Team -The while calling inference API is a great place to start, the real power of Oxen.ai is being able to take an open source model and [fine-tune](/getting-started/fine-tuning) it on your own data, optimizing it for accuracy, speed, or quality. Once it's fine-tuned, you can deploy it to the same interface in minutes. No DevOps or MLOps experience required. +Built on top of the [open source](https://github.com/Oxen-AI/Oxen) Oxen version control system, Oxen.ai gives your team a [web hub](https://oxen.ai) to work with your data, prompts, and generations at scale. Browse datasets, review generated outputs, and experiment across branches. Every contribution is versioned, so you can see who changed what, when, and why, and discard changes when an experiment doesn't pan out. -### 🚀 **Run Models at Scale** +Sign up for free -Find the best model and prompt for your use case. Leverage your own datasets to build custom evaluations. Evaluation results are versioned and saved as datasets in the repository for easy performance tracking over time. 
+For teams with stricter requirements around data residency, compliance, or IP, Oxen.ai offers private deployments in your VPC or fully on-prem. Your proprietary data, prompts, and model weights stay in your environment, while your team gets the same collaborative tooling. Reach out to [hello@oxen.ai](mailto:hello@oxen.ai) to learn more about private deployments. -Run Models on Datasets +## 🤖 Own Your AI -### 💾 **Version Control** - -The through line of Oxen.ai is that all model weights, datasets, and code are versioned and can be stored in a single repository. This makes it easy to track changes, compare models, and share datasets with your team. You can interact with the repository through the [command line interface](/getting-started/cli), [python library](/getting-started/python), or web interface. - -We built the version control system to be [blazing fast](/features/performance), [open source](https://github.com/Oxen-AI/Oxen), and extensible for anyone to build upon. It can be used to version any type of data, not just machine learning datasets. It scales up to monorepos with [millions of files and terabytes of data](/features/performance). - -## 🔒 Own Your AI - -At Oxen.ai, we believe you should **own your AI, don't rent it**. Owning your AI means that you can easily differentiate and extend a model's capabilities. You are not reliant on what the model was originally trained on. This means you can create models that are smarter in your domain, higher quality, more consistent, and better than the competition. - -For image or video generation, you may differentiate by bringing your own unique style to a model and make sure the generations are consistent. For language models, optimize for speed, cost, privacy, or custom domain knowledge. What ever type of model you are using, you should have the flexibility to train and deploy the model anywhere. +At Oxen.ai, we believe you should **own your AI**. Owning your AI means making the model uniquely yours. 
For image and video generation, that might mean consistent style, characters, products, or brand identity. For language models, it might mean better accuracy, lower cost, stronger privacy, or deeper domain expertise. It also means owning the data behind the model. Your prompts, reference images, generations, labels, and training data live in a versioned repository, and you can read or write any of it through the [Python API](/python-api/index), [HTTP API](/http-api/index), [command line](/getting-started/install), or [open source server](/getting-started/oxen-server). No matter what kind of model you are building with, you should be able to train it, version it, deploy it, and improve it on your terms. ## 🌾 Why Build Oxen? @@ -77,4 +59,4 @@ Oxen is the tool we wish we had to abstract away the infrastructure and focus on ## 🐂 Why the name Oxen? -“Oxen” comes from the fact that we take care of the grunt work of the infrastructure for you. Oxen love will plow, maintain, and version your data and models like a good farmer tends to their fields 🌾. During the agricultural revolution, the oxen pulling plows offloaded work and helped people specialize and start working on other important societal tasks. Let Oxen take care of the heavy infrastructure work so you can focus on solving the higher-level problems that matter to your product. \ No newline at end of file +"Oxen" comes from the fact that we take care of the grunt work of the infrastructure for you. Oxen love will plow, maintain, and version your data and models like a good farmer tends to their fields 🌾. During the agricultural revolution, the oxen pulling plows offloaded work and helped people specialize and start working on other important societal tasks. Let Oxen take care of the heavy infrastructure work so you can focus on solving the higher-level problems that matter to your product. 
\ No newline at end of file diff --git a/getting-started/python.mdx b/getting-started/python.mdx deleted file mode 100644 index 5759d77..0000000 --- a/getting-started/python.mdx +++ /dev/null @@ -1,299 +0,0 @@ ---- -title: '🐍 Python' -description: 'Learn how to get started with the oxenai python package.' ---- - -## Install - -``` -pip install oxenai -``` - -## Clone Repository - -Clone a repository from the [Oxen Hub](https://oxen.ai) or a your own [oxen-server](/getting-started/oxen-server). Detailed documentation for the [clone](/python-api/clone) method can be found in the [API Documentation](/python-api). - -```python -import oxen -oxen.clone("ox/SpanishToEnglish") -``` - -This will create a directory called `SpanishToEnglish` in your current working directory and download the latest version of the repository. - -### Private Repositories - -Not all repositories are public. If you are trying to clone a private repository, you will need to [configure auth](/python-api/auth) before you can clone. - -If you try to clone a repository you do not have access to, and you have not configured auth, you will see the following error: - -```bash -ValueError: oxen authentication token not found, obtain one from your administrator and configure with: - -oxen config --auth -``` - -### Obtain Auth Token - -Before you can push to a remote repository, you must have permissions to do so. Permissions are handled through an `auth_token` that is passed in with the request. - -You can obtain an `auth_token` by creating an account on [Oxen.ai](https://oxen.ai) and going to your profile. - -Oxen.ai authentication key - -### Set Auth Token - -To set your auth token, you can either set it through the command line interface or directly in python. 
- - - -```python Python -from oxen.auth import config_auth -config_auth("YOUR_AUTH_TOKEN") -``` - -```bash CLI -oxen config --auth 'hub.oxen.ai' YOUR_AUTH_TOKEN -``` - - - -This will write the auth token to a file in `~/.config/oxen/auth_config.toml` for future use. If you set up your own [oxen-server](/getting-started/oxen-server) you can generate custom auth tokens there. - -## Setup User - -In order for Oxen to know who is committing and where to sync to by default, you must call [config_user](/python-api/user) and pass in the name and email you would like to use in your commit messages. - -```python -from oxen.user import config_user -config_user("YOUR NAME", "YOUR EMAIL") -``` - -This will save a file in `~/.config/oxen/user_config.toml` that contains your user configuration. - -## Initialize Local Repository - -If you are creating a new repository from scratch, you can initialize it with the [init](/python-api/init) method. - -We will be using a fictional repository called `CatsVsDogs` for this example. - -```python -import oxen -import os - -# Create an empty directory named CatsVsDogs -directory = "CatsVsDogs" -os.makedirs(directory) - -# Initialize the Oxen Repository -repo = oxen.init(directory) -``` - -This will create a `.oxen` directory to keep track of changes as you make them. - -## Load Existing Repository - -Use the [repo](/python-api/repo) class to interact with a repository that has already been initialized. - -```python -from oxen import Repo - -# Load the repository from the CatsVsDogs directory -repo = Repo("CatsVsDogs") -# Check the status of the repository -print(repo.status()) -``` - -## Add Files - -Now let's create a README.md file and [add](/python-api/repo#add) it to the local staging area. This will not commit the changes to the repository, but it will prepare them to be committed. - -```python -# ... 
continue from previous example - -# Create a README.md file -filename = os.path.join(repo.path, "README.md") -with open(filename, "w") as f: - f.write("# Cats vs. Dogs\n\nWhich is it? We will be using machine learning to find out!") - -# Add the README.md file to the staging area -repo.add(filename) - -# Confirm that the file has been staged -print(repo.status()) -``` - -## Commit Changes - -Now that we have added the README.md file to the staging area, we can commit the changes to the repository. - -```python -# ... continue from previous example - -# Commit the changes to the repository -repo.commit("Adding README.md") -``` - -## Diff Changes - -Oxen.ai has powerful diff tools built in that allow you to see the changes to files between commits, branches, and more. - -```python -result = oxen.diff("README.md") -print(result.get()) -``` - -To learn more about diffs checkout the [diff](/concepts/diffs) documentation or the [Python API Documentation](/python-api/diff/diff). - -## Push To Remote - -It's one thing to version your data locally, but where the real power comes in is when you can share your data with others. Oxen repositories can be pushed to a remote repository hosted on [Oxen Hub](https://oxen.ai) or your own [oxen-server](/getting-started/oxen-server). - -There are a few steps when pushing to a remote for the first time. - -1. [Create Remote](/python-api/remote_repo#create_repo) -2. [Point Local to Remote](/python-api/repo#set_remote) -3. [Push Changes](/python-api/repo#push) - -### Create Remote - -Before you can push to a remote repository, you must create it. This can be done with the [create_repo](/python-api/remote_repo#create_repo) method. - -```python -from oxen.remote_repo import create_repo - -# Create a remote repository -remote_name = "MyNamespace/CatsVsDogs" -remote_repo = create_repo(remote_name) -``` - -### Point Local to Remote - -Now that we have created the remote repository, we need to point our local repository to sync to it. 
This can be done with the [set_remote](/python-api/repo#set_remote) method. - -```python -from oxen import Repo - -# Load the local repository -repo = Repo("CatsVsDogs") - -# Point the local repository to the remote -repo.set_remote("origin", remote_repo.url()) -``` - -### Push Changes - -Now that we have created the remote repository and pointed our local repository to it, we can [push](/python-api/repo#push) our changes to the remote repository. - -```python -# Push the changes to the remote repository -repo.push() -``` - -### Full Push Example - -The end to end workflow from scratch looks like this: - -```python -from oxen import Repo -from oxen.remote_repo import create_repo -from oxen.auth import config_auth - -# 0. Load the local repository -repo = Repo("CatsVsDogs") - -# 1. Configure Authentication -config_auth("YOUR_AUTH_TOKEN") - -# 2. Create a remote repository -remote_name = "MyNamespace/CatsVsDogs" -repo = create_repo(remote_name) - -# 3. Point the local repository to the remote -repo.set_remote("origin", repo.url) - -# 4. Push the changes to the remote repository -repo.push() -``` - -## Pull Data - -Now that we have pushed our changes to the remote repository, we can [pull](/python-api/repo#pull) them down to another machine. - -```python -import oxen -import os - -repo_path = "CatsVsDogs" -if os.path.exists(repo_path): - # if you already have a local copy of the repository, you can load it - repo = oxen.Repo(repo_path) -else: - # if you don't have a local copy of the repository, you can clone it - repo = oxen.clone("ox/CatsVsDogs") - -# Pull the latest changes from the remote repository -repo.pull() -``` - -## OxenFS (fsspec Integration) - -OxenFS allows you to conveniently read and write files through a Pythonic file interface. 
- -```python -import oxen - -fs = oxen.OxenFS("openai", "gsm8k") -with fs.open("gsm8k_test.parquet") as f: - content = f.read() -``` - -It also integrates directly with third-party libraries like Pandas like this: -```python -df = pd.read_parquet("oxen://openai:gsm8k@main/gsm8k_test.parquet") -``` - -See the full documentation for [OxenFS](/python-api/oxen_fs). - -## Branching - -Branching is a powerful feature of Oxen that allows you to create a named version of your data without affecting the original version. This is useful when you want to experiment with your changes affecting the original version. - -### Create Branch - -To create a new branch, use the [Repo.checkout](/python-api/repo#checkout) method. - -```python -from oxen import Repo - -repo = Repo("CatsVsDogs") -repo.checkout("add-dogs", create=True) -``` - -This both creates the branch and checks it out (the command line equivalent of `oxen checkout -b add-dogs`). - -### List Branches - -To list all of the branches in a repository, use the [Repo.branches](/python-api/repo#branches) method. - -```python -from oxen import Repo - -repo = Repo("CatsVsDogs") -print(repo.branches()) -``` - -Output: - -``` -[Branch(name=add-dogs, commit_id=3168391af834ac18), Branch(name=main, commit_id=3168391af834ac18)] -``` - -As you can see there should be a `main` branch and a `add-dogs` branch, each tied to a commit id. The commit ids will be the same at this point, because the branches have not diverged in content. - -## Next Steps - -Now that you have learned the basics of Oxen, the rest of the workflow is very similar to git. You can dive deeper into the [API Documentation](/python-api) to learn more about the methods available to you. 
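The `oxen://` URL passed to Pandas in the OxenFS section above appears to pack the same pieces that `OxenFS("openai", "gsm8k")` takes as arguments. Below is a minimal sketch of pulling them apart, assuming the layout `oxen://namespace:repo@revision/path` inferred from that single example. `parse_oxen_url` and `OxenPath` are hypothetical names, not part of the `oxen` package.

```python
from typing import NamedTuple


class OxenPath(NamedTuple):
    """Pieces of an oxen:// URL (hypothetical container, not an oxen class)."""
    namespace: str
    repo: str
    revision: str
    path: str


def parse_oxen_url(url: str) -> OxenPath:
    """Split a URL of the assumed form oxen://namespace:repo@revision/path."""
    prefix = "oxen://"
    if not url.startswith(prefix):
        raise ValueError(f"not an oxen:// URL: {url!r}")
    # Everything up to the first "/" identifies the repository and revision.
    # Limitation: a revision that itself contains "/" is not handled here.
    location, _, path = url[len(prefix):].partition("/")
    repo_part, _, revision = location.partition("@")
    namespace, sep, repo = repo_part.partition(":")
    if not (sep and namespace and repo):
        raise ValueError(f"expected namespace:repo in {url!r}")
    # `revision` is an empty string when the URL has no "@revision" part.
    return OxenPath(namespace, repo, revision, path)


if __name__ == "__main__":
    print(parse_oxen_url("oxen://openai:gsm8k@main/gsm8k_test.parquet"))
```

The parsing is done by hand rather than with `urllib.parse.urlsplit` because `urlsplit` lowercases the host portion, which would silently alter a mixed-case branch name.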
diff --git a/getting-started/versioning.mdx b/getting-started/versioning.mdx deleted file mode 100644 index 978a0f7..0000000 --- a/getting-started/versioning.mdx +++ /dev/null @@ -1,649 +0,0 @@ ---- -title: '💾 Version Control' -description: 'Oxen.ai is built on top of a blazing fast data version control system that allows you to version, branch, and share datasets, model weights, and experiments with your team.' ---- - -Oxen's [open source data version control system](https://github.com/Oxen-AI/Oxen) shines at workflows and data sizes where git or git-lfs fall short. The interface is inspired by git, so that it is easy to learn, but has a few core differences. Oxen is built from the ground up to handle large datasets with many files or large csvs, parquet files, or other large binary blobs like model weights. - -## Versioning 101 - -The first thing you need to know about Oxen.ai is that it has both remote and local workflows. Remote workflows allow you to add files directly to the remote without pulling any data locally. Say we wanted to add a file to a dataset like ImageNet with [1 Million Files](/features/performance), you do not want to wait to clone all the files locally just to add yours. - - - ```python Python - from oxen import RemoteRepo - - # Connect your client - repo = RemoteRepo("my-username/my-repo") - # Upload the image - repo.add("images/image_1_000_001.png") - # Commit to the main branch - repo.commit("Adding the 1,000,001st image to the dataset") - ``` - - -This is just one example of how Oxen.ai enables a more developer friendly workflow for large datasets. There are also optimizations under the hood such as parallel file transfer, scalable merkle trees, and data deduplication to make Oxen go brrr (or mooo?). - -## Client and Server - -The open source version control tools come with a server to sync data to and a client that can interact with data locally and remotely. 
The client and server share a common core library that is written in Rust and is used to quickly sync data between the two. - -The server exposes a REST API that can be used to interact with data. Oxen.ai's clients include a [command line interface](/getting-started/cli), as well as bindings for [Rust](https://github.com/Oxen-AI/Oxen) 🦀, [Python](/getting-started/python) 🐍, and [HTTP interfaces](/http-api) 🌎 to make it easy to integrate into your workflow. - -## Installation - -Oxen makes versioning your datasets as easy as versioning your code. You can install through homebrew or pip or from our [releases page](https://github.com/Oxen-AI/Oxen/releases). - - - -```bash CLI -brew install oxen -``` - -```bash Python -pip install oxenai -``` - - - -## Remote vs Local Workflow - -In the world of version control, there are two main paradigms: centralized and decentralized. Centralized version control systems allow you to have remote first workflows where you do not need to have a fully copy of the data on your local machine. Decentralized version control systems like git by default duplicate all the data to every node in your network. - -Oxen Remote and Local Workflow - -While the decentralized nature of git makes it easy to maintain full copies of the history across many machines, this is not practical for large datasets. Oxen was designed from the ground up to be able to seamlessly switch between local (decentralized) and remote (centralized) workflows. Only clone what you need, and contribute back to the remote repository when you are done. - -## Remote Workflow - -To get started with the remote workflow, you need to setup an `oxen-server`. Oxen.ai provides both an open source server and a hosted solution that can be used to sync data between your local machine and the cloud. To try the hosted solution, you can create a free account at [https://oxen.ai](https://oxen.ai). 
- -To learn how to setup the open source server, check out the [server documentation](/getting-started/oxen-server). - -### Remote Repository - -If a remote repository already exists, you simply have to pass in the namespace/name of the remote repository you want to connect to. - - - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo("ox/CatDogBBox") -``` - - - -This is a cheap operation that just sets up the pointer to the remote repository. It does not download any data. - - -### Create a Remote Repository - -If you do not already have a remote repository, you can create one directly from Pyhton. You may want to start with an empty remote repository and add your data later. - - - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo.create("my-user/my-repo-name") -``` - -```bash CLI -oxen create-remote --name my-user/my-repo-name -``` - - - -By default we add a `README.md` file to the repository with an initial commit. If you want to create an empty repository without adding a `README.md` you can pass `empty=True` to the `create` method. - - - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo.create("my-user/my-repo-name", empty=True) -``` - -```bash CLI -oxen create-remote --name my-user/my-repo-name --empty -``` - - - -The reason you may want to start with an empty repository is if you already started a local repository and want to push it to the remote repository. This local repository already has a commit history. When pushing to a remote, commit histories must match. Hence we need to start with an empty remote repository without any commits if we want to push a local repository with a commit history. - - -### Add Files - -You can add files to the remote repository by passing the path to the file and the destination directory. This will upload the file to the remote repository and stage it for commit. 
- -```python Python -from oxen import RemoteRepo -repo = RemoteRepo("ox/CatDogBBox") -repo.add("images/000000002754.jpg", dst="images/") -``` - -### Commit Changes - -You can commit changes to the remote repository by passing a message. - -```python Python -repo.commit("Adding the 1,000,001st image to the dataset") -``` - -### File Exploration - -To see the files in the remote repository you can use `ls`. - - - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo("ox/CatDogBBox") -print(repo.ls()) -``` - - - -To view a specific directory you can pass the directory name to the `ls` method. - -Note: the directories are paginated so you will need to use the `page_num` parameter to view the next page of results. -There are also `total_pages`, `page_number`, and `total_entries` attributes that give you information about the pagination. - - - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo("ox/CatDogBBox") -images_results = repo.ls("images", page_num=1, page_size=10) -print(images_results) -print(images_results.total_pages) -print(images_results.page_number) -print(images_results.total_entries) -``` - - - -### Downloading Data - -You can download individual files and folders if you do not need the entire data repository for your job. - - - -```bash CLI -oxen download ox/CatDogBBox annotations/test.csv -``` - -```python Python -from oxen import RemoteRepo -repo = RemoteRepo("ox/CatDogBBox") -repo.download("annotations/test.csv") -``` - -```bash cURL -# URL Format: https://hub.oxen.ai/api/repos/:namespace/:repo_name/file/:revision/:path -# :revision can be a branch name or commit hash -curl -X GET -H "Authorization: Bearer $TOKEN" \ - https://hub.oxen.ai/api/repos/ox/CatDogBBox/file/main/annotations/test.csv \ - -o ~/Downloads/test.csv -``` - - - -### Checkout a Branch - -If you have a data on a separate branch that you want to view you can checkout a branch by passing the branch name to the `checkout` method. 
- -```python Python -from oxen import RemoteRepo -repo = RemoteRepo("ox/CatDogBBox") -repo.checkout("my-branch-name") -print(repo.ls()) -``` - -### Create a New Branch - -The `checkout` method also allows you to create a new branch if the branch does not exist. - -```python Python -from oxen import RemoteRepo -repo = RemoteRepo("ox/CatDogBBox") -repo.checkout("my-new-branch-name", create=True) -print(repo.ls()) -``` - -### View Branches - -To see all the branches in the remote repository you can use the `branches` method. - -```python Python -from oxen import RemoteRepo -repo = RemoteRepo("ox/CatDogBBox") -print(repo.branches()) -``` - -### Workspaces - -Under the hood, the way that we enable remote collaboration is through a concept called a [workspace](/concepts/workspaces). A workspace can be thought of as a working copy of changes, that is stored on the remote server. Just like you can `add` files before committing locally, you can `add` files to a workspace on the remote server before committing. This allows you to build up a set of changes remotely before committing them in bulk. - - - -```python Python -from oxen import RemoteRepo -from oxen import Workspace - -repo = RemoteRepo("ox/CatDogBBox") -workspace = Workspace(repo, "add-images") -workspace.add("/path/to/image.png") -status = workspace.status() -print(status.added_files()) -workspace.commit("Adding the 1,000,001st image to the dataset") -``` - -```bash CLI -oxen workspace add image.png -w my-workspace-id -oxen workspace status -w my-workspace-id -oxen workspace commit -w my-workspace-id -m "Adding the 1,000,001st image to the dataset" -``` - - - -The `RemoteRepo.add` method is a shortcut for creating a workspace and adding files to it. It creates a ephemeral workspace and adds the files to it, and deletes the workspace after committing. - -To learn more about workspaces, check out the [workspaces documentation](/concepts/workspaces). 
- -### Connect Local to Remote - -Remote repositories are identified by a remote URL. This is the URL that you can use to clone the repository. - -```python Python -from oxen import RemoteRepo - -repo = RemoteRepo.create("my-user/my-repo-name", empty=True) -print(repo.url()) -``` - -You can use this URL to clone the repository. - -```bash Python -# Local Repository -from oxen import Repo -from oxen import RemoteRepo - -remote_repo = RemoteRepo.create("my-user/my-repo-name", empty=True) -repo_url = remote_repo.url() - -local_repo = Repo("/path/to/local/repo") -local_repo.clone(repo_url) -``` - -Or you can set the remote of a local repository to the remote repository. - -```bash Python -from oxen import Repo -from oxen import RemoteRepo - -remote_repo = RemoteRepo.create("my-user/my-repo-name", empty=True) -repo_url = remote_repo.url() - -local_repo = Repo("/path/to/local/repo") -remote_repo.set_remote("origin", remote_repo.url()) -``` - -## Local Workflow - -Local workflow looks a lot like git. The downside is that you have to duplicate all the data locally. The upside is that oxen is optimized to make local workflows fast. - -### Clone Dataset - -Clone your first Oxen repository from the [OxenHub](https://oxen.ai/explore). - - - -```bash CLI -oxen clone https://hub.oxen.ai/ox/CatDogBBox -``` - -```python Python -import oxen - -# Clone the repository -repo = oxen.clone("ox/CatDogBBox") -``` - - - -### Initialize User - -Each change you make will be associated with a name and email. Set them before you get started so you know who changed what. The user data is saved by default in `~/.config/oxen/user_config.toml`. - - - -```bash CLI -oxen config --name "Bessie Oxington" --email "bessie@yourcomany.com" -``` - -```python Python -from oxen.user import config_user -config_user("Bessie Oxington", "bessie@oxen.ai") -``` - - - -### Create Repository - -Initialize your first Oxen repository, and commit the first version of your data. 
- - - -```bash CLI -# Initialize the repository -oxen init -# Write data to a file -printf '%s\n' 'name,age' 'bob,12' 'jane,13' > people.csv -# Stage the data for commit -oxen add people.csv -# Commit the changes with a message -oxen commit -m "Adding my data" -``` - -```python Python -import os -from oxen import Repo - -# Instantiate a Repo object and create the repo directory -repo = Repo("/path/to/data", mkdir=True) -# Initialize the repository -repo.init() -# Write data to a file -data_path = os.path.join(repo.path, "people.csv") -with open(data_path, "w") as f: - f.write("name,age\nbob,12\njane,13") -# Stage the data for commit -repo.add(data_path) -# Commit the changes with a message -repo.commit("Adding my data") -``` - - - -### Version Your Data - -Once your data has been committed, you can always return to that version. - -Confidently overwrite the file, move the file, delete the file, it doesn't matter. Oxen will always have a copy of the data at the time of the previous commit. - -### Create Branch - -It is good practice to create a new branch for changes you make to your data. This will allow you to easily compare the parallel versions of your data over time. - - - -```bash CLI -# Checkout a branch named `modify-data` -oxen checkout -b modify-data -# Overwrite data in existing file -printf '%s\n' 'name,age' 'bob,12' 'jane,13' 'joe,14' > people.csv -``` - -```python Python -import os -from oxen import Repo - -repo = Repo("/path/to/data") -# Create a new branch called `modify-data` -repo.checkout("modify-data", create=True) -# Overwrite data in existing file -data_path = os.path.join(repo.path, "people.csv") -with open(data_path, "w") as f: - f.write("name,age\nbob,12\njane,13\njoe,14") -``` - - - -### Delete Branch - -Once finished with a branch, you can delete it. 
- - - -```bash CLI -# Checkout main branch locally -oxen checkout main -# Delete 'other_branch' locally -oxen branch -d new_branch # may need -D if branch is not merged into main -# Delete branch in remote repo -oxen push origin --delete new_branch -``` - -```python Python -import os -from oxen import Repo - -# Instantiate a Repo object -repo = Repo("/path/to/data") -# Checkout the main branch -repo.checkout("main") -# Delete new_branch -repo.branch('new_branch', delete=True) -# Delete remote branch -repo.push('origin', 'new_branch', delete=True) -``` - - - -### Diff Changes - -View the change you made with the `oxen diff` command. This will show you the changes you made to your data since the last commit. - -```bash CLI -oxen diff image_classification_data.csv -``` - -Output: - -``` -Column changes: - + label (str) - -Row changes: - Δ 1 (modified) - + 3 (added) - - 2 (removed) - -shape: (6, 7) -+-------------+-----+-----+-------+--------+-------------+-------------------+ -| file | x | y | width | height | label.right | .oxen.diff.status | -| --- | --- | --- | --- | --- | --- | --- | -| str | i64 | i64 | i64 | i64 | str | str | -+-------------+-----+-----+-------+--------+-------------+-------------------+ -| image_0.jpg | 0 | 0 | 10 | 10 | cat | modified | -| image_1.jpg | 1 | 2 | 10 | 20 | null | removed | -| image_1.jpg | 200 | 100 | 10 | 20 | dog | added | -| image_2.jpg | 4 | 10 | 20 | 20 | null | removed | -| image_3.jpg | 4 | 10 | 20 | 20 | dog | added | -| image_4.jpg | 10 | 10 | 10 | 10 | dog | added | -+-------------+-----+-----+-------+--------+-------------+-------------------+ -``` - -Once you [push](#push-data) you changes to [OxenHub](https://oxen.ai), you can view the changes you made in your commit history. - -

- oxen cli demo -

- -The diff command line tool is more powerful than it looks on the surface. Oxen has the ability to diff files of many formats, and the ability to specify keys are targets in tabular diffs to make it easier to see what changed. - -For advanced usage, check out the [full diff documentation](/concepts/diffs). - -### Restore Changes - -If you are not happy with the changes you made to your data, you can restore them to the previous commit with the `oxen restore` command. - - - -```bash CLI -oxen restore --source tables/people.csv -``` - - - -### Commit Changes - -Once you are happy with the changes you have made to your data, you can commit them to the repository with a new message. - - - -```bash CLI -oxen add people.csv -oxen commit -m "Adding Joe to the dataset" -``` - -```python Python -from oxen import Repo - -repo = Repo("/path/to/data") -# Stage the data for commit -data_path = os.path.join(repo.path, "people.csv") -repo.add(data_path) -# Commit the changes with a message -repo.commit("Adding Joe to the dataset") -``` - - - -### View History - -To see the commit history of your repository, you can use the `oxen log` command. - - - -```bash CLI -oxen log -``` - -```python Python -from oxen import Repo - -# Instantiate a Repo object -repo = Repo("/path/to/data") -# Get the commit history -commits = repo.log() -``` - - - -### Checkout Main Branch - -Once you are done making changes to your data, you can return to the main branch with the `oxen checkout` command. - -Never fear, the file now has now been reverted to the inital commit again, but your changes will be saved in the branch you created. - - - -```bash CLI -oxen checkout main -``` - -```python Python -from oxen import Repo - -# Instantiate a Repo object -repo = Repo("/path/to/data") -# Checkout the main branch -repo.checkout("main") -``` - - - -### List Branches - -To see the branches in your repository, you can use the `oxen branch` command. 
- - - -```bash CLI -oxen branch -``` - -```python Python -from oxen import Repo - -# Instantiate a Repo object -repo = Repo("/path/to/data") -# Get the branches -print(repo.branches()) -``` - - - -### Push Data - -Once your data has been committed locally, you can sync it to the OxenHub. - -OxenHub is a free service that allows you to collaborate on your data in the cloud. You can create a free account at [https://oxen.ai](https://oxen.ai). - - - -```bash CLI -# Go create repo at https://oxen.ai -# ... -oxen config --set-remote origin https://hub.oxen.ai// -oxen config --auth hub.oxen.ai -oxen push origin main -# to push your other branch simply change the branch name from `main` to `modify-data` -``` - -```python Python -# Go create repo at https://oxen.ai -# ... -# Set where to push the data to (replace and with your remote) -repo.set_remote("origin", "https://hub.oxen.ai//") -# Set your auth token (defaults to hub.oxen.ai host) -oxen.auth.config_auth("YOUR_AUTH_TOKEN") -# Push the changes to the remote -repo.push() -``` - - - -### Clone Data - -Clone your data faster than ever before. Oxen has been optimized to the core to make pulling large datasets as fast as possible. - - - -```bash CLI -oxen clone https://hub.oxen.ai/ox/CatDogBBox -``` - -```python Python -from oxen import Repo - -# Clone the repository -repo = Repo("/path/to/dst") -repo = Repo.clone("https://hub.oxen.ai/ox/CatDogBBox") -``` - - - -### Pull Changes - -Only pull the changes you need. Oxen will only pull the files that have changed since the last time you pulled. 
- - - -```bash CLI -oxen pull origin main -``` - -```python Python -from oxen import Repo -repo = Repo("/path/to/repo") -repo.pull() -``` - - diff --git a/features/workspaces.mdx b/getting-started/workspaces.mdx similarity index 73% rename from features/workspaces.mdx rename to getting-started/workspaces.mdx index de3f68e..c97cf5e 100644 --- a/features/workspaces.mdx +++ b/getting-started/workspaces.mdx @@ -1,64 +1,29 @@ --- -title: '🧩 Partial Clones' -description: 'Oxen allows you to interact with your data without having to download the entire dataset locally.' +title: '🌎 Remote Workspaces' +description: 'Oxen allows you to interact with your data on the server without downloading it or committing it right away.' --- -Say you are working with a dataset with 100GB of images, you may want to contribute back to the dataset, or only need a small subset of the data to run a model. In these cases, it doesn't make sense to download the entire dataset locally. Instead, you can use partial clones. +Say you are working in a repository with over 100GB of data. For a variety of reasons, you may only want to interact with a subset of this data. It is expensive to clone the whole history. Maybe you want to add a single row to a dataset or upload a set of 10 images. You may only need a small subset of the data downloaded to view it locally before sending back up changes. In these cases, it doesn't make sense to download the entire repository history locally. Instead, you can use a combination of tools that oxen provides to only interact with the data you need. Oxen has three main ways of interacting with subsets of your data. -1) **Partial Clones** - Clone a subtree of the data in your repository to a local working directory. -2) **Download Read Only** - Download a read only copy of the subset to your local machine. -3) **Remote Workspaces** - Interact with your data all server side, no files are downloaded locally. 
+1) **Remote Workspaces** - Interact with your data all server side, no files are downloaded locally. +2) **Partial Clones** - Clone a subtree of the data in your repository to a local working directory. +3) **Download Read Only** - Download a read only copy of the subset to your local machine. Each of these methods has it's own benefits and trade offs. We will go over each of them in more detail below. - -## Partial Clones - -The first command line parameter you should be aware of is the `--filter` flag. This flag is inclusive for the paths you want to clone. - -```bash -oxen clone https://hub.oxen.ai/ox/Flowers --filter "images/roses" -``` - -This will clone all the data under the `images/roses` directory into a local working directory. Under the hood, it also creates a `.oxen` directory which contains the merkle tree for the cloned data, and content addressable copies of each file in the subtree. - -![Partial Clone](/images/PartialClones.png) - -You can also specify a depth parameter to control how deep the clone is. If you have many nested subdirectories, you can use the `--depth` flag to limit how deep the clone goes. - -```bash -oxen clone https://hub.oxen.ai/ox/Flowers --filter "." --depth 1 -``` - -Note that full clones and partial clones end up using ~2x the storage. This is because the clone contains the merkle tree for the cloned data, and content addressable copies of each file in the subtree. - -## Download Read Only - -If you have no intention of making any changes to the data, the easiest way to interact with a subset is to download a read only copy. This can be done with the `oxen download` command. - -```bash -oxen download ox/Flowers images/roses -``` - -Under the hood, this command does not download any of the history, content addressed version files, or other metadata. It simply downloads the data unpacks it to a local directory. 
- -![Download Read Only](/images/partial-download.jpg) - -This is the most efficient way to download data if you are simply going to read the data or throw it away later. - ## Remote Workspaces -You may not need a local copy of the data at all. If you are working with a remote dataset, you can interact with it all server side. +The concept of a remote workspace in Oxen is akin to having a working directory on the remote server where you can stage changes before committing. This allows you to perform multiple write operations without requiring a commit message for each one. Data written to workspaces is not ephemeral, as it does survive a server restart, but it is **not committed** to the history of the repository, so may be deleted without a trace. ![Remote Workspace](/images/RemoteWorkspaces.png) -Conceptually you can think of a workspace as a server side working directory where you can stage changes before committing them. Under the hood, a workspace is tied to a commit id. This means whatever changes you make will always be with respect to the commit you created the workspace off of. +Under the hood, a workspace is tied to a commit id and an origin branch. This means whatever changes you make will always be with respect to the commit you created the workspace off of. If the remote branch advances, you may have to resolve merge conflicts before committing. ### Instantiating a Workspace -A workspace is created off of a `RemoteRepo` and a branch name. The branch name is just a convenience for the user to create a workspace on the underlying commit id. +A workspace is created off of a remote `repository` and a `branch` name. The branch name is just a convenience for the user to create a workspace on the underlying `commit` id. @@ -190,3 +155,38 @@ oxen workspace commit -m "adding an image" -w my-workspace-id -b add-images 🎉 You have now committed data to the remote branch without cloning the full repo. 
Note: If the remote branch cannot be merged cleanly, the remote commit will fail, and you will have to resolve the merge conflicts with some more advanced commands which we will cover later. + +## Partial Clones + +The first command line parameter you should be aware of is the `--filter` flag. This flag is inclusive for the paths you want to clone. + +```bash +oxen clone https://hub.oxen.ai/ox/Flowers --filter "images/roses" +``` + +This will clone all the data under the `images/roses` directory into a local working directory. Under the hood, it also creates a `.oxen` directory which contains the merkle tree for the cloned data, and content addressable copies of each file in the subtree. + +![Partial Clone](/images/PartialClones.png) + +You can also specify a depth parameter to control how deep the clone is. If you have many nested subdirectories, you can use the `--depth` flag to limit how deep the clone goes. + +```bash +oxen clone https://hub.oxen.ai/ox/Flowers --filter "." --depth 1 +``` + +Note that full clones and partial clones end up using ~2x the storage. This is because the clone contains the merkle tree for the cloned data, and content addressable copies of each file in the subtree. + +## Download Read Only + +If you have no intention of making any changes to the data, the easiest way to interact with a subset is to download a read only copy. This can be done with the `oxen download` command. + +```bash +oxen download ox/Flowers images/roses +``` + +Under the hood, this command does not download any of the history, content addressed version files, or other metadata. It simply downloads the data unpacks it to a local directory. + +![Download Read Only](/images/partial-download.jpg) + +This is the most efficient way to download data if you are simply going to read the data or throw it away later. 
+ diff --git a/http-api/example.mdx b/http-api/example.mdx index 31694a8..fbc94a6 100644 --- a/http-api/example.mdx +++ b/http-api/example.mdx @@ -579,13 +579,13 @@ upload_paths = { ## Next Steps -- [Download Files](/http-api/files/download_file) - Learn how to retrieve files -- [List Files](/http-api/files/list_entries) - Browse repository contents -- [Workspaces](/http-api/workspaces/list_workspaces) - Work with remote data without downloading -- [Branches](/http-api/branches/get_branches) - Manage repository branches +- [Download Files](/http-api/files/download-file) - Learn how to retrieve files +- [List Files](/http-api/directories/list-directory-contents) - Browse repository contents +- [Workspaces](/http-api/workspaces/list-workspaces) - Work with remote data without downloading +- [Branches](/http-api/branches/list-all-branches) - Manage repository branches ## Related Resources - [HTTP API Overview](/http-api) - [Python SDK](/python-api/remote_repo) -- [Authentication Guide](/getting-started/authentication) +- [Authentication Guide](/getting-started/auth) diff --git a/images/fine_tuning/bloxy-fine-tuning.png b/images/fine_tuning/bloxy-fine-tuning.png new file mode 100644 index 0000000..0cc7d5f Binary files /dev/null and b/images/fine_tuning/bloxy-fine-tuning.png differ diff --git a/images/platform_oxen.png b/images/platform_oxen.png new file mode 100644 index 0000000..3cc309c Binary files /dev/null and b/images/platform_oxen.png differ diff --git a/inference-api/quickstart/video-generation.mdx b/inference-api/quickstart/video-generation.mdx index 99be641..f520964 100644 --- a/inference-api/quickstart/video-generation.mdx +++ b/inference-api/quickstart/video-generation.mdx @@ -114,5 +114,5 @@ curl -H "Authorization: Bearer YOUR_API_KEY" \ ## What's Next - [Video Generation Reference](/inference-api/reference/video_generation) for the full parameter list -- [Kling O3 Pro: Reference to Video](/inference-api/reference/models/kling_o3_pro_reference_to_video) for 
multi-shot video with reference images +- [Kling O3 Pro: Reference to Video](/inference-api/reference/models/kling-video-o3-pro-reference-to-video) for multi-shot video with reference images - [Async Queue Reference](/inference-api/reference/async_queue) for batch generation diff --git a/inference-api/reference/models/claude-opus-4-1-20250805.mdx b/inference-api/reference/models/claude-opus-4-1-20250805.mdx index 30a6766..3614062 100644 --- a/inference-api/reference/models/claude-opus-4-1-20250805.mdx +++ b/inference-api/reference/models/claude-opus-4-1-20250805.mdx @@ -144,4 +144,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/claude-opus-4-20250514.mdx b/inference-api/reference/models/claude-opus-4-20250514.mdx index 33bbfb5..9f967cd 100644 --- a/inference-api/reference/models/claude-opus-4-20250514.mdx +++ b/inference-api/reference/models/claude-opus-4-20250514.mdx @@ -142,4 +142,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/claude-opus-4-5-20251101.mdx b/inference-api/reference/models/claude-opus-4-5-20251101.mdx index 1513dec..0b6f9d5 100644 --- a/inference-api/reference/models/claude-opus-4-5-20251101.mdx +++ b/inference-api/reference/models/claude-opus-4-5-20251101.mdx @@ -144,4 +144,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/claude-opus-4-6.mdx b/inference-api/reference/models/claude-opus-4-6.mdx index 62cf73d..8544dad 100644 --- a/inference-api/reference/models/claude-opus-4-6.mdx +++ b/inference-api/reference/models/claude-opus-4-6.mdx @@ -142,4 +142,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/claude-opus-4-7.mdx b/inference-api/reference/models/claude-opus-4-7.mdx index 4257f78..2c5d5dc 100644 --- a/inference-api/reference/models/claude-opus-4-7.mdx +++ b/inference-api/reference/models/claude-opus-4-7.mdx @@ -143,4 +143,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/claude-sonnet-4-20250514.mdx b/inference-api/reference/models/claude-sonnet-4-20250514.mdx index 813b8e2..0469444 100644 --- a/inference-api/reference/models/claude-sonnet-4-20250514.mdx +++ b/inference-api/reference/models/claude-sonnet-4-20250514.mdx @@ -142,4 +142,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/claude-sonnet-4-5.mdx b/inference-api/reference/models/claude-sonnet-4-5.mdx index 74cb8f7..358f3cf 100644 --- a/inference-api/reference/models/claude-sonnet-4-5.mdx +++ b/inference-api/reference/models/claude-sonnet-4-5.mdx @@ -130,4 +130,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/claude-sonnet-4-6.mdx b/inference-api/reference/models/claude-sonnet-4-6.mdx index c1cb255..33f3f3d 100644 --- a/inference-api/reference/models/claude-sonnet-4-6.mdx +++ b/inference-api/reference/models/claude-sonnet-4-6.mdx @@ -142,4 +142,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/deepseek-v4-flash.mdx b/inference-api/reference/models/deepseek-v4-flash.mdx index c90c936..5db8276 100644 --- a/inference-api/reference/models/deepseek-v4-flash.mdx +++ b/inference-api/reference/models/deepseek-v4-flash.mdx @@ -194,4 +194,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/deepseek-v4-pro.mdx b/inference-api/reference/models/deepseek-v4-pro.mdx index 97c48a4..0dea3d7 100644 --- a/inference-api/reference/models/deepseek-v4-pro.mdx +++ b/inference-api/reference/models/deepseek-v4-pro.mdx @@ -194,4 +194,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-2-0-flash-001.mdx b/inference-api/reference/models/gemini-2-0-flash-001.mdx index f99b0b2..54315a1 100644 --- a/inference-api/reference/models/gemini-2-0-flash-001.mdx +++ b/inference-api/reference/models/gemini-2-0-flash-001.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-2-5-flash-lite-preview-09-2025.mdx b/inference-api/reference/models/gemini-2-5-flash-lite-preview-09-2025.mdx index bf8bb3b..65daa49 100644 --- a/inference-api/reference/models/gemini-2-5-flash-lite-preview-09-2025.mdx +++ b/inference-api/reference/models/gemini-2-5-flash-lite-preview-09-2025.mdx @@ -197,4 +197,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/gemini-2-5-flash.mdx b/inference-api/reference/models/gemini-2-5-flash.mdx index 6bd9de0..a2eded8 100644 --- a/inference-api/reference/models/gemini-2-5-flash.mdx +++ b/inference-api/reference/models/gemini-2-5-flash.mdx @@ -198,4 +198,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-2-5-pro.mdx b/inference-api/reference/models/gemini-2-5-pro.mdx index d5eebd9..2c9fc06 100644 --- a/inference-api/reference/models/gemini-2-5-pro.mdx +++ b/inference-api/reference/models/gemini-2-5-pro.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-3-1-flash-lite-preview.mdx b/inference-api/reference/models/gemini-3-1-flash-lite-preview.mdx index c20b86c..b4f7c68 100644 --- a/inference-api/reference/models/gemini-3-1-flash-lite-preview.mdx +++ b/inference-api/reference/models/gemini-3-1-flash-lite-preview.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-3-1-pro-preview.mdx b/inference-api/reference/models/gemini-3-1-pro-preview.mdx index 6da85b9..d1f07b4 100644 --- a/inference-api/reference/models/gemini-3-1-pro-preview.mdx +++ b/inference-api/reference/models/gemini-3-1-pro-preview.mdx @@ -74,4 +74,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemini-3-flash-preview.mdx b/inference-api/reference/models/gemini-3-flash-preview.mdx index 837a305..da2197c 100644 --- a/inference-api/reference/models/gemini-3-flash-preview.mdx +++ b/inference-api/reference/models/gemini-3-flash-preview.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/gemma-3-27b-it.mdx b/inference-api/reference/models/gemma-3-27b-it.mdx index 2890dd4..d20aa87 100644 --- a/inference-api/reference/models/gemma-3-27b-it.mdx +++ b/inference-api/reference/models/gemma-3-27b-it.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gemma-4-31b-it.mdx b/inference-api/reference/models/gemma-4-31b-it.mdx index 2850494..e4da745 100644 --- a/inference-api/reference/models/gemma-4-31b-it.mdx +++ b/inference-api/reference/models/gemma-4-31b-it.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-4-1-2025-04-14.mdx b/inference-api/reference/models/gpt-4-1-2025-04-14.mdx index 2474b92..fa64971 100644 --- a/inference-api/reference/models/gpt-4-1-2025-04-14.mdx +++ b/inference-api/reference/models/gpt-4-1-2025-04-14.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. 
+This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-4-1-mini-2025-04-14.mdx b/inference-api/reference/models/gpt-4-1-mini-2025-04-14.mdx index 8b737a7..84b50a1 100644 --- a/inference-api/reference/models/gpt-4-1-mini-2025-04-14.mdx +++ b/inference-api/reference/models/gpt-4-1-mini-2025-04-14.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-4-1-nano-2025-04-14.mdx b/inference-api/reference/models/gpt-4-1-nano-2025-04-14.mdx index f01448c..5687909 100644 --- a/inference-api/reference/models/gpt-4-1-nano-2025-04-14.mdx +++ b/inference-api/reference/models/gpt-4-1-nano-2025-04-14.mdx @@ -202,4 +202,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/gpt-4o-mini.mdx b/inference-api/reference/models/gpt-4o-mini.mdx index 11a321a..1b70a46 100644 --- a/inference-api/reference/models/gpt-4o-mini.mdx +++ b/inference-api/reference/models/gpt-4o-mini.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-4o.mdx b/inference-api/reference/models/gpt-4o.mdx index 7dd2442..27ede67 100644 --- a/inference-api/reference/models/gpt-4o.mdx +++ b/inference-api/reference/models/gpt-4o.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-1-2025-11-13.mdx b/inference-api/reference/models/gpt-5-1-2025-11-13.mdx index 1e15c0d..64dd130 100644 --- a/inference-api/reference/models/gpt-5-1-2025-11-13.mdx +++ b/inference-api/reference/models/gpt-5-1-2025-11-13.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. 
+This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-2-2025-12-11.mdx b/inference-api/reference/models/gpt-5-2-2025-12-11.mdx index 6d816b4..2f7d0fc 100644 --- a/inference-api/reference/models/gpt-5-2-2025-12-11.mdx +++ b/inference-api/reference/models/gpt-5-2-2025-12-11.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-2-chat-latest.mdx b/inference-api/reference/models/gpt-5-2-chat-latest.mdx index 01d78da..945b111 100644 --- a/inference-api/reference/models/gpt-5-2-chat-latest.mdx +++ b/inference-api/reference/models/gpt-5-2-chat-latest.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/gpt-5-2025-08-07.mdx b/inference-api/reference/models/gpt-5-2025-08-07.mdx index 4931d91..a969a3c 100644 --- a/inference-api/reference/models/gpt-5-2025-08-07.mdx +++ b/inference-api/reference/models/gpt-5-2025-08-07.mdx @@ -201,4 +201,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-3-chat-latest.mdx b/inference-api/reference/models/gpt-5-3-chat-latest.mdx index 47a183b..ae3e0d4 100644 --- a/inference-api/reference/models/gpt-5-3-chat-latest.mdx +++ b/inference-api/reference/models/gpt-5-3-chat-latest.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-4-2026-03-05.mdx b/inference-api/reference/models/gpt-5-4-2026-03-05.mdx index 8f36dd1..416fbef 100644 --- a/inference-api/reference/models/gpt-5-4-2026-03-05.mdx +++ b/inference-api/reference/models/gpt-5-4-2026-03-05.mdx @@ -201,4 +201,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-4-mini.mdx b/inference-api/reference/models/gpt-5-4-mini.mdx index c0792d9..7aa4cba 100644 --- a/inference-api/reference/models/gpt-5-4-mini.mdx +++ b/inference-api/reference/models/gpt-5-4-mini.mdx @@ -201,4 +201,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-5-2026-04-23.mdx b/inference-api/reference/models/gpt-5-5-2026-04-23.mdx index 233cfe4..f10e2bb 100644 --- a/inference-api/reference/models/gpt-5-5-2026-04-23.mdx +++ b/inference-api/reference/models/gpt-5-5-2026-04-23.mdx @@ -201,4 +201,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/gpt-5-5-pro-2026-04-23.mdx b/inference-api/reference/models/gpt-5-5-pro-2026-04-23.mdx index 0626bfd..c2700d6 100644 --- a/inference-api/reference/models/gpt-5-5-pro-2026-04-23.mdx +++ b/inference-api/reference/models/gpt-5-5-pro-2026-04-23.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-mini.mdx b/inference-api/reference/models/gpt-5-mini.mdx index d53d39b..99f7b9e 100644 --- a/inference-api/reference/models/gpt-5-mini.mdx +++ b/inference-api/reference/models/gpt-5-mini.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-5-nano.mdx b/inference-api/reference/models/gpt-5-nano.mdx index 2c66098..e23f26c 100644 --- a/inference-api/reference/models/gpt-5-nano.mdx +++ b/inference-api/reference/models/gpt-5-nano.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. 
+This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/gpt-oss-120b.mdx b/inference-api/reference/models/gpt-oss-120b.mdx index bc8e632..87ecd77 100644 --- a/inference-api/reference/models/gpt-oss-120b.mdx +++ b/inference-api/reference/models/gpt-oss-120b.mdx @@ -202,4 +202,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/ministral-3b-latest.mdx b/inference-api/reference/models/ministral-3b-latest.mdx index b7b4d57..bf7739e 100644 --- a/inference-api/reference/models/ministral-3b-latest.mdx +++ b/inference-api/reference/models/ministral-3b-latest.mdx @@ -194,4 +194,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/ministral-8b-latest.mdx b/inference-api/reference/models/ministral-8b-latest.mdx index dab0c0e..30691d6 100644 --- a/inference-api/reference/models/ministral-8b-latest.mdx +++ b/inference-api/reference/models/ministral-8b-latest.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/mistral-large-2407.mdx b/inference-api/reference/models/mistral-large-2407.mdx index 2a96a7d..de1343c 100644 --- a/inference-api/reference/models/mistral-large-2407.mdx +++ b/inference-api/reference/models/mistral-large-2407.mdx @@ -194,4 +194,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/mistral-small-2503.mdx b/inference-api/reference/models/mistral-small-2503.mdx index 6e54066..5a3d15b 100644 --- a/inference-api/reference/models/mistral-small-2503.mdx +++ b/inference-api/reference/models/mistral-small-2503.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/moonshotai-kimi-k2-5.mdx b/inference-api/reference/models/moonshotai-kimi-k2-5.mdx index bcea468..90b6620 100644 --- a/inference-api/reference/models/moonshotai-kimi-k2-5.mdx +++ b/inference-api/reference/models/moonshotai-kimi-k2-5.mdx @@ -199,4 +199,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/nvidia-nemotron-120b-a12b.mdx b/inference-api/reference/models/nvidia-nemotron-120b-a12b.mdx index 5c33c0b..d33dce2 100644 --- a/inference-api/reference/models/nvidia-nemotron-120b-a12b.mdx +++ b/inference-api/reference/models/nvidia-nemotron-120b-a12b.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/o1.mdx b/inference-api/reference/models/o1.mdx index aafc35f..14e8e36 100644 --- a/inference-api/reference/models/o1.mdx +++ b/inference-api/reference/models/o1.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/o3-2025-04-16.mdx b/inference-api/reference/models/o3-2025-04-16.mdx index f4e2769..5251c3a 100644 --- a/inference-api/reference/models/o3-2025-04-16.mdx +++ b/inference-api/reference/models/o3-2025-04-16.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/o3-mini.mdx b/inference-api/reference/models/o3-mini.mdx index c34ff2c..e8d218f 100644 --- a/inference-api/reference/models/o3-mini.mdx +++ b/inference-api/reference/models/o3-mini.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/o4-mini-2025-04-16.mdx b/inference-api/reference/models/o4-mini-2025-04-16.mdx index d38194c..ca0bbc7 100644 --- a/inference-api/reference/models/o4-mini-2025-04-16.mdx +++ b/inference-api/reference/models/o4-mini-2025-04-16.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/open-mistral-7b.mdx b/inference-api/reference/models/open-mistral-7b.mdx index f9fb9a6..bce67d0 100644 --- a/inference-api/reference/models/open-mistral-7b.mdx +++ b/inference-api/reference/models/open-mistral-7b.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/open-mixtral-8x22b.mdx b/inference-api/reference/models/open-mixtral-8x22b.mdx index 02c9f6c..67685e1 100644 --- a/inference-api/reference/models/open-mixtral-8x22b.mdx +++ b/inference-api/reference/models/open-mixtral-8x22b.mdx @@ -197,4 +197,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/open-mixtral-8x7b.mdx b/inference-api/reference/models/open-mixtral-8x7b.mdx index 851611e..46bdcb2 100644 --- a/inference-api/reference/models/open-mixtral-8x7b.mdx +++ b/inference-api/reference/models/open-mixtral-8x7b.mdx @@ -195,4 +195,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/openai-gpt-oss-20b.mdx b/inference-api/reference/models/openai-gpt-oss-20b.mdx index 851d177..e0dd22c 100644 --- a/inference-api/reference/models/openai-gpt-oss-20b.mdx +++ b/inference-api/reference/models/openai-gpt-oss-20b.mdx @@ -204,4 +204,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/pixtral-12b.mdx b/inference-api/reference/models/pixtral-12b.mdx index 0ef59e7..429797d 100644 --- a/inference-api/reference/models/pixtral-12b.mdx +++ b/inference-api/reference/models/pixtral-12b.mdx @@ -195,4 +195,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/qwen3-6-plus.mdx b/inference-api/reference/models/qwen3-6-plus.mdx index af69acb..3ad247a 100644 --- a/inference-api/reference/models/qwen3-6-plus.mdx +++ b/inference-api/reference/models/qwen3-6-plus.mdx @@ -188,4 +188,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/qwen3-vl-2b-instruct.mdx b/inference-api/reference/models/qwen3-vl-2b-instruct.mdx index 5bf9db1..36c0b69 100644 --- a/inference-api/reference/models/qwen3-vl-2b-instruct.mdx +++ b/inference-api/reference/models/qwen3-vl-2b-instruct.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/qwen3-vl-4b-instruct.mdx b/inference-api/reference/models/qwen3-vl-4b-instruct.mdx index 46ecdc1..70eaedf 100644 --- a/inference-api/reference/models/qwen3-vl-4b-instruct.mdx +++ b/inference-api/reference/models/qwen3-vl-4b-instruct.mdx @@ -198,4 +198,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/qwen3-vl-8b-instruct.mdx b/inference-api/reference/models/qwen3-vl-8b-instruct.mdx index 1bcd851..f55697e 100644 --- a/inference-api/reference/models/qwen3-vl-8b-instruct.mdx +++ b/inference-api/reference/models/qwen3-vl-8b-instruct.mdx @@ -184,4 +184,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. 
See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/sonar-deep-research.mdx b/inference-api/reference/models/sonar-deep-research.mdx index 909dc7d..723e7d7 100644 --- a/inference-api/reference/models/sonar-deep-research.mdx +++ b/inference-api/reference/models/sonar-deep-research.mdx @@ -198,4 +198,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/sonar-pro.mdx b/inference-api/reference/models/sonar-pro.mdx index 562c5bd..668e1e5 100644 --- a/inference-api/reference/models/sonar-pro.mdx +++ b/inference-api/reference/models/sonar-pro.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. 
diff --git a/inference-api/reference/models/sonar-reasoning-pro.mdx b/inference-api/reference/models/sonar-reasoning-pro.mdx index e9a6939..0f199f3 100644 --- a/inference-api/reference/models/sonar-reasoning-pro.mdx +++ b/inference-api/reference/models/sonar-reasoning-pro.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/sonar.mdx b/inference-api/reference/models/sonar.mdx index 34c2517..ed49ad9 100644 --- a/inference-api/reference/models/sonar.mdx +++ b/inference-api/reference/models/sonar.mdx @@ -200,4 +200,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. +This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/models/zai-org-glm-5.mdx b/inference-api/reference/models/zai-org-glm-5.mdx index 0a4a147..b463f64 100644 --- a/inference-api/reference/models/zai-org-glm-5.mdx +++ b/inference-api/reference/models/zai-org-glm-5.mdx @@ -196,4 +196,4 @@ curl -H "Authorization: Bearer $OXEN_API_KEY" https://hub.oxen.ai/api/ai/models/ ## Request parameters -This model follows the standard OpenAI chat completions request body. See the [chat completions reference](../inference-api.mdx) for the full parameter list. 
+This model follows the standard OpenAI chat completions request body. See the [chat completions reference](/inference-api/reference/chat_completions) for the full parameter list. diff --git a/inference-api/reference/video_generation.mdx b/inference-api/reference/video_generation.mdx index bfeaa0e..c1f87dd 100644 --- a/inference-api/reference/video_generation.mdx +++ b/inference-api/reference/video_generation.mdx @@ -19,7 +19,7 @@ For long-running or batch generation, consider using the [async queue](/inferenc |---|---|---|---|---| | `model` | string | **yes** | -- | Video model name (e.g. `kling-video-v2-6-pro-text-to-video`, `kling-video-o3-pro-reference-to-video`) | | `prompt` | string | **one of** | -- | Text prompt. Use this or `multi_prompt`, not both. | -| `multi_prompt` | array | **one of** | -- | Multi-shot prompts with per-shot duration. See [Kling O3 Pro reference](/inference-api/reference/models/kling_o3_pro_reference_to_video) for details. | +| `multi_prompt` | array | **one of** | -- | Multi-shot prompts with per-shot duration. See [Kling O3 Pro reference](/inference-api/reference/models/kling-video-o3-pro-reference-to-video) for details. | | `duration` | integer | no | 5 | Video duration in seconds (when using `prompt`). | | `aspect_ratio` | string | no | `"16:9"` | Aspect ratio (e.g. `"16:9"`, `"9:16"`, `"1:1"`). | | `input_image` | string/array | no | -- | Reference image(s) for image-to-video or reference-to-video models. 
| diff --git a/mint.json b/mint.json deleted file mode 100644 index 9042d3a..0000000 --- a/mint.json +++ /dev/null @@ -1,255 +0,0 @@ -{ - "$schema": "https://mintlify.com/schema.json", - "name": "Oxen.ai", - "logo": { - "dark": "/logo/dark.svg", - "light": "/logo/light.svg" - }, - "favicon": "/favicon.png", - "colors": { - "primary": "#0E87CB", - "light": "#0E87CB", - "dark": "#161616", - "background": { - "dark": "#0A0A0A" - }, - "anchors": { - "from": "#0E87CB", - "to": "#0E87CB" - } - }, - "topbarLinks": [ - { - "name": "Support", - "url": "https://discord.com/invite/s3tBEn7Ptg" - } - ], - "topbarCtaButton": { - "name": "Sign Up", - "url": "https://oxen.ai/register" - }, - "tabs": [ - { - "name": "Python API", - "url": "python-api" - }, - { - "name": "Repository API", - "url": "http-api", - "openapi": "https://dev.hub.oxen.ai/api/_spec/oxen_server_openapi.json" - }, - { - "name": "Fine-Tuning API", - "url": "fine-tuning-api", - "openapi": "https://dev.hub.oxen.ai/api/_spec/oxen_hub_api.json" - }, - { - "name": "Inference API", - "url": "inference-api" - } - ], - "anchors": [ - { - "name": "Documentation", - "icon": "book-open-cover", - "url": "https://docs.oxen.ai" - }, - { - "name": "Blog", - "icon": "newspaper", - "url": "https://blog.oxen.ai" - }, - { - "name": "GitHub", - "icon": "github", - "url": "https://github.com/Oxen-AI/Oxen" - } - ], - "navigation": [ - { - "group": "Get Started", - "pages": [ - "getting-started/intro", - { - "group": "⚡️ Models", - "pages": [ - "getting-started/inference", - "examples/inference/chat_completions", - "examples/inference/vision_language_models", - "examples/inference/image_generation", - "examples/inference/image_editing", - "examples/inference/video_generation" - ] - }, - { - "group": "🛠️ Fine-Tuning", - "pages": [ - "getting-started/fine-tuning", - "examples/fine-tuning/text_generation", - "examples/fine-tuning/chat_completions", - "examples/fine-tuning/image_understanding", - "examples/fine-tuning/image_generation", 
- "examples/fine-tuning/image_editing", - "examples/fine-tuning/video_generation" - ] - }, - "getting-started/datasets", - "getting-started/batch_inference", - "getting-started/versioning", - "features/deploy_from_directory" - ] - }, - { - "group": "Developer Tools", - "pages": [ - "getting-started/install", - { - "group": "💻 Command Line Interface", - "pages": [ - "getting-started/command-line/config", - "getting-started/command-line/import_data", - "getting-started/command-line/local_development", - "getting-started/command-line/push_changes", - "getting-started/command-line/dev_tools" - ] - }, - "getting-started/python", - "getting-started/oxen-server", - "features/performance" - ] - }, - { - "group": "Other Concepts", - "pages": [ - "concepts/diffs", - "concepts/file_metadata", - "concepts/workspaces" - ] - }, - { - "group": "Release Notes", - "pages": [ - "concepts/feature-updates" - ] - }, - { - "group": "Python API", - "pages": [ - "python-api/index", - "python-api/clone", - "python-api/data_frame", - "python-api/datasets", - "python-api/df_utils", - "python-api/diff/diff", - "python-api/diff/line_diff", - "python-api/diff/tabular_diff", - "python-api/diff/text_diff", - "python-api/init", - "python-api/oxen_fs", - "python-api/remote_repo", - "python-api/repo", - "python-api/repositories", - "python-api/workspace" - ] - }, - { - "group": "Repository API", - "pages": [ - { - "group": "Getting Started", - "pages": [ - "http-api/index", - "http-api/example" - ] - } - ] - }, - { - "group": "Inference API", - "pages": [ - "inference-api/overview", - { - "group": "Quick Starts", - "pages": [ - "inference-api/quickstart/chat", - "inference-api/quickstart/image-generation", - "inference-api/quickstart/video-generation", - "inference-api/quickstart/async-queue" - ] - }, - { - "group": "API Reference", - "pages": [ - "inference-api/reference/chat_completions", - "inference-api/reference/image_generation", - "inference-api/reference/image_editing", - 
"inference-api/reference/video_generation", - "inference-api/reference/async_queue", - "inference-api/reference/models/overview", - "inference-api/reference/model-references" - ] - }, - { - "group": "Model Walkthroughs", - "pages": [ - "inference-api/reference/models/walkthroughs/overview", - "inference-api/reference/models/walkthroughs/kling_o3_pro_reference_to_video", - "inference-api/reference/models/walkthroughs/kling_o3_pro_video_to_video_edit", - "inference-api/reference/models/walkthroughs/seedance_2_reference_to_video", - "inference-api/reference/models/walkthroughs/topaz_starlight_precise_2_5" - ] - } - ] - }, - { - "group": "Fine-Tuning API", - "pages": [ - "fine-tuning-api/overview", - { - "group": "Quick Starts", - "pages": [ - "fine-tuning-api/quickstart/text", - "fine-tuning-api/quickstart/image-generation", - "fine-tuning-api/quickstart/image-editing", - "fine-tuning-api/quickstart/video" - ] - }, - { - "group": "Tutorials", - "pages": [ - "fine-tuning-api/tutorials/01_fine_tuning", - "fine-tuning-api/tutorials/03_fine_tuning_image_generation", - "fine-tuning-api/tutorials/02_fine_tuning_image" - ] - }, - { - "group": "API Reference", - "pages": [ - "fine-tuning-api/reference/text_generation", - "fine-tuning-api/reference/text_chat_messages", - "fine-tuning-api/reference/image_generation", - "fine-tuning-api/reference/image_editing", - "fine-tuning-api/reference/multi_image_editing", - "fine-tuning-api/reference/image_to_text", - "fine-tuning-api/reference/image_to_video", - "fine-tuning-api/reference/text_to_video" - ] - }, - "fine-tuning-api/parameters" - ] - } - ], - "footerSocials": { - "twitter": "https://twitter.com/oxen_ai", - "github": "https://github.com/Oxen-AI/Oxen", - "linkedin": "https://www.linkedin.com/company/oxenai/" - }, - "analytics": { - "posthog": { - "apiKey": "phc_Do6VnS78puWwBQuPcJ7TkjixBZwsV4Xxc3ABHVNvjmE" - }, - "ga4": { - "measurementId": "G-H6D2C08EKP" - } - } -} diff --git a/python-api/data_frame.mdx 
b/python-api/data_frame.mdx index 3cda527..90a381f 100644 --- a/python-api/data_frame.mdx +++ b/python-api/data_frame.mdx @@ -12,7 +12,7 @@ class DataFrame() The DataFrame class allows you to perform CRUD operations on a remote data frame. -If you pass in a [Workspace](/concepts/workspaces) or a [RemoteRepo](/concepts/remote-repos) the data is indexed into DuckDB on an oxen-server without downloading the data locally. +If you pass in a [Workspace](/getting-started/workspaces) or a [RemoteRepo](/concepts/remote-repos) the data is indexed into DuckDB on an oxen-server without downloading the data locally. ## Examples diff --git a/python-api/index.mdx b/python-api/index.mdx index 53ee0d4..d10dde9 100644 --- a/python-api/index.mdx +++ b/python-api/index.mdx @@ -1,41 +1,275 @@ --- -title: All Modules -description: '🐍 Here lies detailed Python module documentation' +title: 'Introduction' +description: 'Learn how to get started with the oxenai Python package.' --- -For a detailed guide to get up and running with Python, see [Getting Started](/getting-started/python). +## Install -## Clone +```bash +pip install oxenai +``` + +## Clone Repository + +Clone a repository from the [Oxen Hub](https://oxen.ai) or your own [oxen-server](/getting-started/oxen-server). Detailed documentation for the [clone](/python-api/clone) method can be found in the [Module Reference](#module-reference) below. + +```python +import oxen +oxen.clone("ox/SpanishToEnglish") +``` + +This will create a directory called `SpanishToEnglish` in your current working directory and download the latest version of the repository. + +If you have not set up your API Key locally, you will get an error when cloning data. View our [Authentication & Authorization documentation](/getting-started/auth) to learn more. + +## Initialize Local Repository + +If you are creating a new repository from scratch, you can initialize it with the [init](/python-api/init) method.
+ +We will be using a fictional repository called `CatsVsDogs` for this example. + +```python +import oxen +import os + +# Create an empty directory named CatsVsDogs +directory = "CatsVsDogs" +os.makedirs(directory) + +# Initialize the Oxen Repository +repo = oxen.init(directory) +``` + +This will create a `.oxen` directory to keep track of changes as you make them. + +## Load Existing Repository + +Use the [repo](/python-api/repo) class to interact with a repository that has already been initialized. + +```python +from oxen import Repo + +# Load the repository from the CatsVsDogs directory +repo = Repo("CatsVsDogs") +# Check the status of the repository +print(repo.status()) +``` + +## Add Files + +Now let's create a README.md file and [add](/python-api/repo#add) it to the local staging area. This will not commit the changes to the repository, but it will prepare them to be committed. + +```python +# ... continue from previous example + +# Create a README.md file +filename = os.path.join(repo.path, "README.md") +with open(filename, "w") as f: + f.write("# Cats vs. Dogs\n\nWhich is it? We will be using machine learning to find out!") + +# Add the README.md file to the staging area +repo.add(filename) + +# Confirm that the file has been staged +print(repo.status()) +``` + +## Commit Changes + +Now that we have added the README.md file to the staging area, we can commit the changes to the repository. + +```python +# ... continue from previous example + +# Commit the changes to the repository +repo.commit("Adding README.md") +``` + +## Diff Changes + +Oxen.ai has powerful diff tools built in that allow you to see the changes to files between commits, branches, and more. + +```python +result = oxen.diff("README.md") +print(result.get()) +``` + +To learn more about diffs, check out the [diff](/concepts/diffs) documentation or the [Python API Documentation](/python-api/diff/diff).
+ +## Push To Remote + +It's one thing to version your data locally, but the real power comes from sharing your data with others. Oxen repositories can be pushed to a remote repository hosted on [Oxen Hub](https://oxen.ai) or your own [oxen-server](/getting-started/oxen-server). + +There are a few steps when pushing to a remote for the first time. + +1. [Create Remote](/python-api/remote_repo#create_repo) +2. [Point Local to Remote](/python-api/repo#set_remote) +3. [Push Changes](/python-api/repo#push) + +### Create Remote + +Before you can push to a remote repository, you must create it. This can be done with the [create_repo](/python-api/remote_repo#create_repo) method. + +```python +from oxen.remote_repo import create_repo + +# Create a remote repository +remote_name = "MyNamespace/CatsVsDogs" +remote_repo = create_repo(remote_name) +``` + +### Point Local to Remote + +Now that we have created the remote repository, we need to point our local repository to sync to it. This can be done with the [set_remote](/python-api/repo#set_remote) method. + +```python +from oxen import Repo + +# Load the local repository +repo = Repo("CatsVsDogs") + +# Point the local repository to the remote +repo.set_remote("origin", remote_repo.url()) +``` + +### Push Changes + +Now that we have created the remote repository and pointed our local repository to it, we can [push](/python-api/repo#push) our changes to the remote repository. + +```python +# Push the changes to the remote repository +repo.push() +``` + +### Full Push Example + +The end-to-end workflow from scratch looks like this: + +```python +from oxen import Repo +from oxen.remote_repo import create_repo +from oxen.auth import config_auth + +# 0. Load the local repository +repo = Repo("CatsVsDogs") + +# 1. Configure Authentication +config_auth("YOUR_AUTH_TOKEN") + +# 2. Create a remote repository +remote_name = "MyNamespace/CatsVsDogs" +remote_repo = create_repo(remote_name) + +# 3.
Point the local repository to the remote +repo.set_remote("origin", remote_repo.url) + +# 4. Push the changes to the remote repository +repo.push() +``` + +## Pull Data + +Now that we have pushed our changes to the remote repository, we can [pull](/python-api/repo#pull) them down to another machine. + +```python +import oxen +import os + +repo_path = "CatsVsDogs" +if os.path.exists(repo_path): + # if you already have a local copy of the repository, you can load it + repo = oxen.Repo(repo_path) +else: + # if you don't have a local copy of the repository, you can clone it + repo = oxen.clone("ox/CatsVsDogs") + +# Pull the latest changes from the remote repository +repo.pull() +``` + +## OxenFS (fsspec Integration) + +OxenFS allows you to conveniently read and write files through a Pythonic file interface. + +```python +import oxen + +fs = oxen.OxenFS("openai", "gsm8k") +with fs.open("gsm8k_test.parquet") as f: + content = f.read() +``` + +It also integrates directly with third-party libraries such as Pandas: + +```python +df = pd.read_parquet("oxen://openai:gsm8k@main/gsm8k_test.parquet") +``` + +See the full documentation for [OxenFS](/python-api/oxen_fs). + +## Branching + +Branching is a powerful feature of Oxen that allows you to create a named version of your data without affecting the original version. This is useful when you want to experiment with changes without affecting the original version. + +### Create Branch + +To create a new branch, use the [Repo.checkout](/python-api/repo#checkout) method. + +```python +from oxen import Repo + +repo = Repo("CatsVsDogs") +repo.checkout("add-dogs", create=True) +``` + +This both creates the branch and checks it out (the command line equivalent of `oxen checkout -b add-dogs`). + +### List Branches + +To list all of the branches in a repository, use the [Repo.branches](/python-api/repo#branches) method.
+ +```python +from oxen import Repo + +repo = Repo("CatsVsDogs") +print(repo.branches()) +``` + +Output: + +``` +[Branch(name=add-dogs, commit_id=3168391af834ac18), Branch(name=main, commit_id=3168391af834ac18)] +``` + +As you can see, there should be a `main` branch and an `add-dogs` branch, each tied to a commit id. The commit ids will be the same at this point, because the branches have not diverged in content. + +## Module Reference + +Detailed documentation for each Python module. + +### Clone [clone](/python-api/clone) is used to download a repository to your local machine. -## Initialize Repository +### Initialize Repository [init](/python-api/init) is used to initialize a new local repository. -## Configure User +### Configure User [user](/python-api/user) is used to configure the user for a local repository. -## Setup Auth +### Setup Auth [auth](/python-api/auth) is used to configure authentication for remote repositories. -## Repositories - -The [Repositories](/python-api/repositories) page has an overview of the two -repository classes, and detailed documentation for each class can be found on -their respective pages below. - -#### Local Repository - -[Repo](/python-api/repo) is used to interact with data locally. +### Repositories -#### Remote Repository +The [Repositories](/python-api/repositories) page has an overview of the two repository classes, and detailed documentation for each class can be found on their respective pages below. -[RemoteRepo](/python-api/remote_repo) is used to interact with a remote data -without downloading all of it locally. +- [Repo](/python-api/repo) is used to interact with data locally. +- [RemoteRepo](/python-api/remote_repo) is used to interact with remote data without downloading all of it locally. 
-## OxenFS +### OxenFS -[OxenFS](/python-api/oxen_fs) is an [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) backend that allows you to read and write files in your Oxen repo through a Pythonic file interface.
It also provides a convenient integration point with third-party libraries. \ No newline at end of file +[OxenFS](/python-api/oxen_fs) is an [fsspec](https://filesystem-spec.readthedocs.io/en/latest/) backend that allows you to read and write files in your Oxen repo through a Pythonic file interface. It also provides a convenient integration point with third-party libraries. diff --git a/python-api/oxen_fs.mdx b/python-api/oxen_fs.mdx index 305f383..efc205d 100644 --- a/python-api/oxen_fs.mdx +++ b/python-api/oxen_fs.mdx @@ -39,7 +39,7 @@ with fs.open("data/train.csv") as f: ### Writing Files You must have write access to the repository to write files. See: -https://docs.oxen.ai/getting-started/python#private-repositories +https://docs.oxen.ai/python-api#private-repositories OxenFS will automatically commit the file to the repository when the context is exited (or the file is closed some other way). New diff --git a/python-api/repositories.mdx b/python-api/repositories.mdx index d7ca49d..b983114 100644 --- a/python-api/repositories.mdx +++ b/python-api/repositories.mdx @@ -51,7 +51,7 @@ print(df.head()) ### Add Files -Oxen has the concept of [Remote Workspaces](/concepts/workspaces) that make it easy to add data to a remote repository without ever downloading it locally. +Oxen has the concept of [Remote Workspaces](/getting-started/workspaces) that make it easy to add data to a remote repository without ever downloading it locally. 
```python Python from oxen import RemoteRepo diff --git a/snippets/cta-button.mdx b/snippets/cta-button.mdx new file mode 100644 index 0000000..26ae1f8 --- /dev/null +++ b/snippets/cta-button.mdx @@ -0,0 +1,5 @@ +export const CTAButton = ({ href, children }) => ( + + {children} + +); diff --git a/style.css b/style.css new file mode 100644 index 0000000..b22cf0d --- /dev/null +++ b/style.css @@ -0,0 +1,19 @@ +.cta-button { + display: inline-block; + padding: 0.4rem 0.9rem; + margin: 1.25rem 0; + background-color: #8d34ff; + color: white !important; + font-weight: 600; + font-size: 0.875rem; + border-radius: 0.375rem; + text-decoration: none !important; + border-bottom: none !important; + box-shadow: none !important; + transition: opacity 0.15s ease; +} + +.cta-button:hover { + opacity: 0.85; + color: white !important; +}