Leveraging Git-Based Projects in Domino 5.0
By May Hu, Senior Product Manager at Domino, on February 14, 2022, in Product Updates
Domino automatically tracks all experimentation artifacts so data science work is reproducible, discoverable, and reusable – increasing Model Velocity and mitigating regulatory risk. These artifacts can be documented as part of the Domino File System (DFS), but many companies now prefer to use a centralized Git code repository (e.g. GitHub, GitLab, Bitbucket) so data science code can be integrated with the rest of the company’s CI/CD workflows to improve consistency and governance.
Domino supports common Git workflows such as pulling latest content, pushing changes, browsing files and more – all from within a Domino workspace to/from the Git service provider of your choice. This Git-first experience gives users more control over the syncing and versioning of complex workflows, and makes it easy for them to engage in version-controlled, code-based collaboration with other team members.
Domino 5.0 improves on existing capabilities by simplifying the steps needed to switch branches, and by guiding users through a consistent process to resolve conflicts when merging code into their chosen repository, rather than leaving them to resolve conflicts manually while jumping between different environments.
Switching Branches in a Workspace
Branches allow data scientists to develop features, fix bugs, or safely experiment with new ideas in a contained area of their repository. To maximize productivity, data scientists can quickly switch branches inside their workspace for both the main code repository and any additional imported repositories.
Data scientists can easily select from up to 10 branches that are listed in the drop-down menu. If repositories have more than 10 branches, there is an option to search for additional branches.
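The drop-down behavior described above can be pictured with a minimal Python sketch. This is an illustration only, not Domino's implementation: the function names `listed_branches` and `search_branches` and the sample branch names are hypothetical, and the case-insensitive substring match is an assumption about how a search might work.

```python
from typing import List

MAX_LISTED = 10  # the article says up to 10 branches appear in the drop-down


def listed_branches(branches: List[str], limit: int = MAX_LISTED) -> List[str]:
    """Return the branches that would fit in the drop-down (hypothetical helper)."""
    return branches[:limit]


def search_branches(branches: List[str], query: str) -> List[str]:
    """Return every branch whose name contains the query, ignoring case.

    Substring matching is an assumption made for this sketch.
    """
    needle = query.lower()
    return [name for name in branches if needle in name.lower()]


if __name__ == "__main__":
    # Made-up repository with 15 branches, more than the drop-down can list.
    branches = ["main"] + [f"feature/exp-{i}" for i in range(1, 15)]
    print(listed_branches(branches))
    print(search_branches(branches, "exp-12"))
```

With 15 branches, only the first 10 are listed, and the search step is what makes the remaining five reachable.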
Resolving Merge Conflicts
Merge conflicts occur when competing changes are made to the same lines of code or even to entire files (e.g. one person edits a file that another person deletes). Failure to manage conflicts correctly can result in an organization’s repository becoming corrupt, which will require data scientists to spend time identifying the reasons for the errors in the repository.
When syncing changes in a workspace to a Git repository, or pulling the latest changes from it into a workspace, Domino first fetches the latest content from the remote branch (git fetch). Next, changes are applied on top of the updated branch (git rebase). If a conflict is detected, Domino guides users through a UI-based workflow to consistently resolve conflicts when merging code into their chosen repository.
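For readers who want to see the fetch-then-rebase sequence outside the UI, here is a minimal Python sketch of the two Git steps named above. It is not Domino's code: the function names, the returned status strings, and the injectable `run` parameter (used so the logic can be exercised without a real repository) are all assumptions made for illustration. Only `git fetch` and `git rebase` themselves come from the article.

```python
import subprocess
from typing import Callable, List

# A runner takes a list of git arguments and returns the process exit code.
Runner = Callable[[List[str]], int]


def run_git(args: List[str]) -> int:
    """Run `git <args>` in the current directory and return its exit code."""
    return subprocess.run(["git", *args]).returncode


def fetch_and_rebase(remote: str, branch: str, run: Runner = run_git) -> str:
    """Fetch the remote branch, then replay local commits on top of it.

    Returns one of three hypothetical status strings:
    "fetch-failed", "conflict", or "synced".
    """
    # Step 1: get the latest content from the remote branch.
    if run(["fetch", remote, branch]) != 0:
        return "fetch-failed"
    # Step 2: apply local changes on top of the updated branch.
    if run(["rebase", f"{remote}/{branch}"]) != 0:
        # A non-zero exit here usually means the rebase stopped on a conflict;
        # this sketch treats every failure that way for simplicity.
        return "conflict"
    return "synced"
```

Calling `fetch_and_rebase("origin", "main")` inside a Git working copy would run the real commands. A "conflict" result corresponds to the point at which Domino would present its conflict-resolution workflow.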
How it Works
Switching Branches in a Workspace
- Follow this article to create a Git-based project in Domino.
- Create and launch a workspace with the desired IDE.
- Switch between branches with ease, without leaving the workspace.
Resolving Conflicts When Pulling Files
- Select “Pull” for either the main code repository or imported repository.
- If there are merge conflicts, the following warning will appear:
“Use Remote Changes” will discard changes in your workspace and overwrite files in the workspace with remote changes.
“Resolve Manually” will lead users to resolve conflicts file by file. For each file in conflict, users can choose one of the following options:
- “Mark as resolved” assumes that the files have been edited to resolve conflict markers.
- “Use my changes” will overwrite remote files with changes in the workspace.
- “Use origin repo changes” will discard changes in the workspace and overwrite the file with remote changes.
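Because “Mark as resolved” assumes the conflict markers have already been edited out, it can be worth checking a file before choosing that option. The Python sketch below is a hypothetical helper, not part of Domino: it scans text for the standard markers Git writes into a conflicted file (`<<<<<<<`, `|||||||`, `=======`, `>>>>>>>`) and reports the lines where they remain.

```python
from typing import List

# Prefixes Git writes at the start of a line when it cannot merge two versions.
# "||||||| " only appears when the diff3 conflict style is enabled.
PREFIX_MARKERS = ("<<<<<<< ", "||||||| ", ">>>>>>> ")
SEPARATOR = "======="


def conflict_marker_lines(text: str) -> List[int]:
    """Return the 1-based line numbers that still look like conflict markers."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # The separator must be the whole line, so longer runs of "=" such as
        # heading underlines are not flagged.
        if line.startswith(PREFIX_MARKERS) or line.rstrip() == SEPARATOR:
            hits.append(number)
    return hits


if __name__ == "__main__":
    # Made-up file content with one unresolved conflict.
    conflicted = (
        "a = 1\n"
        "<<<<<<< HEAD\n"
        "b = 2\n"
        "=======\n"
        "b = 3\n"
        ">>>>>>> my change\n"
    )
    print(conflict_marker_lines(conflicted))
```

An empty list suggests the file is ready to be marked as resolved. A non-empty list points at the lines that still need editing.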
Resolving Conflicts When Syncing Changes
- Click “Sync” for either the main code repository or imported repository.
- If conflicts are detected, users will receive the following warning and options:
- “Force my changes” will overwrite remote files with changes in your workspace. This means that the commit history on the remote will match the commit history in your workspace.
- “Resolve Manually” will lead users to resolve conflicts by the filename.
Domino continues to help data scientists become more productive while helping organizations drive consistency and efficiency in their journey to become model-driven. Domino 5.0 allows data scientists to easily comply with enterprise-wide requirements for leveraging centralized code repositories. With the ability to easily switch between branches in a workspace, data scientists can focus on improving their code and running experiments. The new steps to manage conflicts help data scientists check in code more efficiently, and also ensure that the organization’s repository can be trusted and maintained.
Domino is the Enterprise MLOps platform that seamlessly integrates code-driven model development, deployment, and monitoring to support rapid iteration and optimal model performance so companies can be certain to achieve maximum value from their data science models.
About the Author
May Hu is a Senior Product Manager with years of experience in ML platforms and cloud technologies. She currently leads product strategy for the collaboration and automation charter at Domino Data Lab.