In this short article I will show two different ways to interact with GitHub using Python. First I will use the GitHub API directly, sending HTTP requests and parsing the JSON responses. Next I will rely on the git command installed on my computer and execute it from Python to clone a repository, modify it, commit the modifications and finally push them to the remote repository. This latter approach can be used with any Git repository and is not limited to GitHub repositories.
Note that I am just dabbling in Python so I will be very grateful if you comment on anything that can be improved!
Using the GitHub API
GitHub has an API that, among other things, allows you to interact with repositories on GitHub.
GitHub Developer Documentation
The GitHub developer’s entry web page can be found here. From there you can find the documentation of their API, which is currently at version 3. The API contains much more than the small part I use here, which allows me to retrieve information about the commits in a repository. I found the documentation well written and easy to understand.
Invoking the GitHub API from Python
Before I knew about any of the GitHub API client libraries for Python I tried to interact directly with the GitHub API. After having had a look at a few of those libraries, I decided to keep this code as I felt that it was reasonably easy to understand and that it does what it is supposed to do. The following function retrieves a string containing the date of the last commit for a user’s GitHub repository:
import http.client  # This module was named httplib in Python 2.
import json

# Replace with your own GitHub user name; it is sent in the User-Agent header.
github_user = "your-github-username"


def retrieve_latest_commit_date_for_github_repository(in_username, in_repository_name):
    """Retrieves the date of the last commit for the default branch (typically master)
    of the user's GitHub repository.

    :param in_username: Name of the user whose repository to get the last commit for.
    :param in_repository_name: Name of the user's repository to get the last commit for.
    :return: String containing date of last commit, or None if GitHub request failed.
    """
    github_api_https_connection = http.client.HTTPSConnection('api.github.com')
    repository_last_commit_date = None
    github_http_request_headers = {"Accept": "application/vnd.github.v3+json",
                                   "User-Agent": github_user}

    try:
        # Request only the one last commit for the supplied user's repository with supplied name.
        github_request_path = "/repos/" + in_username + "/" + in_repository_name + \
            "/commits?page=1&per_page=1"
        github_api_https_connection.request("GET", github_request_path, None,
                                            github_http_request_headers)
        github_repository_last_commit_response = github_api_https_connection.getresponse()

        if github_repository_last_commit_response.status == 200:
            # Response was successful, now read and parse the JSON data.
            github_repository_last_commit_response_text = \
                github_repository_last_commit_response.read().decode("utf-8")
            github_repository_last_commit_response_object = \
                json.loads(github_repository_last_commit_response_text)
            # The response is a list of commits; the first entry is the most recent one.
            repository_last_commit_date = \
                github_repository_last_commit_response_object[0]['commit']['author']['date']
        else:
            message = "ERROR: Request to GitHub failed with status %s and the reason was %s" % \
                (github_repository_last_commit_response.status,
                 github_repository_last_commit_response.reason)
            print(message)
    finally:
        github_api_https_connection.close()

    return repository_last_commit_date
Note that when using the above code, you need to define a global variable named github_user containing your own GitHub user name. This name is sent in the User-Agent HTTP header with each request to GitHub, which uses it to keep track of who makes requests.
Since the above code only reads public data from a repository, no authentication is required. For requests to the GitHub API that are not authenticated, there is a rate limit of 60 requests per hour at the time of writing.
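GitHub reports the remaining quota in the headers of each API response, so a script can check how close it is to the limit. The following is a minimal sketch, assuming the X-RateLimit-Remaining and X-RateLimit-Reset response headers; the function name describe_rate_limit is my own invention and not part of any library.

```python
import datetime


def describe_rate_limit(response_headers):
    """Returns a (remaining_requests, reset_time_utc) tuple parsed from rate limit headers.

    :param response_headers: List of (name, value) pairs, such as the list
        returned by HTTPResponse.getheaders().
    """
    # Header names are case-insensitive, so normalize them to lower case.
    headers = {name.lower(): value for name, value in response_headers}
    remaining = int(headers["x-ratelimit-remaining"])
    # The reset time is given in seconds since the Unix epoch (UTC).
    reset_time = datetime.datetime.fromtimestamp(
        int(headers["x-ratelimit-reset"]), tz=datetime.timezone.utc)
    return remaining, reset_time
```

In the function shown earlier, the list to pass in would be the result of calling getheaders() on the response object.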
With the above function, I can now find out when the last commit was made to a GitHub repository, for instance my own Message-Cowboy repository:
latest_commit_date = retrieve_latest_commit_date_for_github_repository("krizsan", "message-cowboy")
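The function returns the date as a string. Below is a small sketch of how that string could be turned into a datetime object, assuming it has the form 2011-04-14T16:00:49Z (ISO 8601 in UTC). The helper name parse_github_commit_date and the sample value are made up for illustration.

```python
import datetime


def parse_github_commit_date(in_date_string):
    """Parses a date string of the form YYYY-MM-DDTHH:MM:SSZ into a timezone-aware datetime.

    Raises ValueError if the string does not have the assumed format.
    """
    parsed_date = datetime.datetime.strptime(in_date_string, "%Y-%m-%dT%H:%M:%SZ")
    # The trailing Z means UTC, so attach the UTC timezone explicitly.
    return parsed_date.replace(tzinfo=datetime.timezone.utc)


if __name__ == "__main__":
    commit_date = parse_github_commit_date("2011-04-14T16:00:49Z")
    age = datetime.datetime.now(datetime.timezone.utc) - commit_date
    print("Last commit was %d days ago" % age.days)
```

Remember that the retrieval function returns None when the request fails, so check for that before parsing.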
Cloning, Committing and Pushing
Spurred by the success of the above code snippet, I went on to the next task: Clone a repository, modify a file, commit and push the modification to GitHub.
First I tried the same approach that I used above but gave up after realizing it was not a trivial task. Next I tried a few GitHub client libraries for Python but none of them seemed very straightforward given the task at hand.
Executing Git Commands from Python
After having searched for a while, I found what I think is a very elegant solution: Use the local git shell command from Python. I polished the code a little and this is the result:
import subprocess


def execute_shell_command(cmd, work_dir):
    """Executes a shell command in a subprocess, waiting until it has completed.

    :param cmd: Command to execute.
    :param work_dir: Working directory path.
    """
    pipe = subprocess.Popen(cmd, shell=True, cwd=work_dir, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, universal_newlines=True)
    # communicate() waits for the process to finish, so no separate wait() is needed.
    (out, error) = pipe.communicate()
    print(out, error)


def git_add(file_path, repo_dir):
    """Adds the file at supplied path to the Git index.
    File will not be copied to the repository directory.
    No check is performed to ensure that the file is located in the repository directory.

    :param file_path: Path to file to add to Git index.
    :param repo_dir: Repository directory.
    """
    cmd = 'git add ' + file_path
    execute_shell_command(cmd, repo_dir)


def git_commit(commit_message, repo_dir):
    """Commits the Git repository located in supplied repository directory
    with the supplied commit message.

    :param commit_message: Commit message. Must not contain double quotes,
        since it is inserted into the shell command as-is.
    :param repo_dir: Directory containing Git repository to commit.
    """
    cmd = 'git commit -am "%s"' % commit_message
    execute_shell_command(cmd, repo_dir)


def git_push(repo_dir):
    """Pushes any changes in the Git repository located in supplied repository
    directory to the remote Git repository.

    :param repo_dir: Directory containing Git repository to push.
    """
    cmd = 'git push'
    execute_shell_command(cmd, repo_dir)


def git_clone(repo_url, repo_dir):
    """Clones the remote Git repository at supplied URL into the local directory
    at supplied path.
    The local directory into which the repository is to be cloned is assumed
    to exist and to be empty.

    :param repo_url: URL of remote Git repository.
    :param repo_dir: Directory into which to clone the remote repository.
    """
    cmd = 'git clone ' + repo_url + ' ' + repo_dir
    execute_shell_command(cmd, repo_dir)
The above code assumes that you have the git command installed and configured on your computer. I have only tested it on *nix operating systems and in addition, I only implemented the Git commands needed. Implementing additional commands should be trivial. Note that this code can be used with any Git repository, not only GitHub.
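Because the commands above are built by joining strings and run through a shell, paths or commit messages containing spaces or quotes can break them. One possible alternative, sketched here under my own assumptions, is to pass the command as a list of arguments so that no shell is involved. The names run_command and git_status are hypothetical, and git_status is only there to show what an additional command could look like.

```python
import subprocess
import sys


def run_command(args, work_dir):
    """Runs a command given as a list of arguments, without a shell.

    :param args: List containing the program name followed by its arguments.
    :param work_dir: Working directory path.
    :return: Text the command wrote to standard output.
    Raises subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    completed = subprocess.run(args, cwd=work_dir, stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE, universal_newlines=True,
                               check=True)
    return completed.stdout


def git_status(repo_dir):
    """Hypothetical additional command: returns the short status of the repository."""
    return run_command(["git", "status", "--short"], repo_dir)


if __name__ == "__main__":
    # Demonstrate with the Python interpreter itself, so that git is not required.
    print(run_command([sys.executable, "-c", "print('hello')"], "."))
```

With this variant a commit could be expressed as run_command(["git", "commit", "-am", commit_message], repo_dir), and the message would reach git unchanged regardless of which characters it contains.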
Programmatically Update a File in a Git Repository
With the above code in place, it became simple to write code to clone a remote repository, add/modify a file in it, commit the changes and push them back to the remote repository. In this example I add/update a file named .date in a remote repository:
import os
import shutil
import tempfile


def update_date_file_in_remote_git_repository(in_repo_url):
    """Clones the remote Git repository at supplied URL and adds/updates a .date
    file containing the current date and time.
    The changes are then pushed back to the remote Git repository.
    """
    # Create temporary directory to clone the Git project into.
    repo_path = tempfile.mkdtemp()
    print("Repository path: " + repo_path)
    date_file_path = os.path.join(repo_path, '.date')

    try:
        # Clone the remote Git repository.
        git_clone(in_repo_url, repo_path)

        # Create/update file with current date and time.
        if os.path.exists(date_file_path):
            os.remove(date_file_path)
        execute_shell_command('date > ' + date_file_path, repo_path)

        # Add new .date file to repository, commit and push the changes.
        git_add(date_file_path, repo_path)
        git_commit('Updated .date file', repo_path)
        git_push(repo_path)
    finally:
        # Delete the temporary directory holding the cloned project.
        shutil.rmtree(repo_path)
Note that I use the execute_shell_command function from earlier to run the date command and pipe its output into the .date file to get the current date and time.
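Relying on the date command ties the code to systems where that command behaves this way. As a sketch of an alternative, the file could be written directly from Python. The function name write_date_file is hypothetical, and note that the timestamp format (ISO 8601, local time) differs from what the date command prints.

```python
import datetime
import os
import tempfile


def write_date_file(directory_path):
    """Writes the current local date and time to a .date file in the supplied directory.

    :param directory_path: Directory in which to create or overwrite the .date file.
    :return: Path to the written file.
    """
    date_file_path = os.path.join(directory_path, ".date")
    # Opening in "w" mode replaces any existing file, so no prior delete is needed.
    with open(date_file_path, "w") as date_file:
        date_file.write(datetime.datetime.now().isoformat() + "\n")
    return date_file_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as temp_dir:
        path = write_date_file(temp_dir)
        with open(path) as date_file:
            print(date_file.read())
```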
Happy coding!
How would you authenticate a user to log in to GitHub and perform all of the listed activities through code?