- `set_github_host()` and `set_gitlab_host()` now print the elapsed time after setting up the host (#798).
- Fixed `set_github_host()` / `set_gitlab_host()` failing when `set_parallel()` is called first. The order of function calls no longer matters (#796).
- Fixed `get_orgs()` failing for GitLab hosts with more than 10,000 groups. Since GitLab 15.7, the `x-total` header is suppressed from REST API responses when the result set exceeds 10,000 records, so the organization count is now obtained with a GraphQL `groups { count }` query instead (#793).

This patch improves parallel processing (`set_parallel()` and `is_parallel()` are now pipeable and GitLab batch fetching runs concurrently) and fixes several bugs in `get_repos()` and `get_files()`.
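
The ID batching behind the `with_code` fix in this patch (#781) can be pictured with a short base R sketch. The helpers `batch_ids()` and `fetch_in_batches()`, the batch size of 100 and the `fetch_batch` callback are hypothetical stand-ins, not GitStats internals. Also note that the actual fix batches only after detecting the resolver's limit error, whereas this sketch batches unconditionally for simplicity.

```r
# Hypothetical sketch, not GitStats code: split repository IDs into chunks so
# that no single GraphQL query exceeds the resolver's ID limit.
batch_ids <- function(ids, batch_size = 100) {
  stopifnot(batch_size >= 1)
  if (length(ids) == 0) return(list())
  # Chunk number for each ID: 1 for the first `batch_size` IDs, 2 for the next, ...
  unname(split(ids, ceiling(seq_along(ids) / batch_size)))
}

# `fetch_batch` stands in for one GraphQL request and must return a data
# frame; the per-batch results are stacked row-wise.
fetch_in_batches <- function(ids, fetch_batch, batch_size = 100) {
  results <- lapply(batch_ids(ids, batch_size), fetch_batch)
  do.call(rbind, results)
}

# Usage with a fake fetcher that simply echoes the IDs back
fake_fetch <- function(ids) {
  data.frame(repo_id = as.character(ids), stringsAsFactors = FALSE)
}
lengths(batch_ids(1:250))                  # chunk sizes: 100 100 50
nrow(fetch_in_batches(1:250, fake_fetch))  # 250 rows, as if from one query
```

In a parallel setup, the `lapply()` over chunks is the step one would hand to workers; the sketch keeps it sequential.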
- `set_parallel()` and `is_parallel()` now take a `gitstats` object as their first argument, making them pipeable (e.g. `gitstats |> set_parallel(4)`). The parallel state is stored on the object (#790).
- GitLab batches are now fetched concurrently when `set_parallel()` is active. Previously, batches were fetched sequentially even with parallelism enabled (#786).
- Added `is_parallel()`, indicating whether parallel processing is currently active; public `get_*()` methods now emit `ℹ Running in parallel mode.` when `verbose = TRUE` (#779).
- Fixed `get_repos()` with `with_code` failing for GitLab when the code search returns more than 1000 repositories. The GraphQL resolver rejects queries with too many IDs, so `get_repos()` now detects this limit error and automatically batches the IDs (#781).
- `set_*_host()` in case the user doesn't pass the full path in `repos` (#782).
- `get_files()` function (#776).
- Moved `EngineGraphQL` R6 private methods into standalone package-level functions, enabling reuse by parallel workers without R6 serialization issues.

This release introduces external storage backends (PostgreSQL and SQLite) for persisting pulled data across sessions, and optional parallel processing via `mirai` for faster API calls. It also brings several performance improvements, including faster file tree retrieval and optimized GitLab repository queries.
- Added `set_postgres_storage()`, `set_sqlite_storage()` and `set_local_storage()` to configure external storage backends. PostgreSQL (via `RPostgres`/`DBI`) and SQLite (via `RSQLite`/`DBI`) are supported for persisting data in a database. Metadata (R classes, attributes) is preserved in a `_metadata` table (#602).
- Added `remove_from_storage()` to remove a named table from the active storage backend (#747).
- Added `remove_postgres_storage()` and `remove_sqlite_storage()` to fully remove a database storage backend and revert to local storage: the PostgreSQL variant drops the `GitStats` schema and the SQLite variant deletes the database file (#759).
- Added `get_storage_metadata()` to retrieve metadata (R classes, custom attributes, column types) for a stored table (#748).
- Added optional parallel processing via the `mirai` package. Use `set_parallel()` to enable concurrent data fetching across repositories and organizations (#736).
- Added `add_languages` parameter to `get_repos()` (`TRUE` by default). When set to `FALSE`, language data is excluded from the output and the GitLab REST languages API calls are skipped, which speeds up the process.
- Cached `set_owner_type()` results to avoid redundant GraphQL calls when multiple `get_*()` functions are used in the same session (#738).
- Optimized `get_repos_trees()`, substantially improving the speed of retrieving repository file trees (#740).
- Optimized `get_repos()` for GitLab when specific repositories are set. Previously, the `repos_by_user` GraphQL query searched the entire GitLab instance; now repositories are queried directly by `fullPath` (#750).
- Fixed `get_repos()` returning `NA` for `commit_sha` on archived GitLab projects. When the GraphQL API returns `null` for `lastCommit`, a REST Branches API fallback can now retrieve the SHA. Use `fill_empty_sha = TRUE` in `get_repos()` to enable it (#746).

This minor release includes new functions for retrieving pull requests (`get_pull_requests()`) and their statistics (`get_pull_requests_stats()`), prettified repository URL output in `get_repos_urls()`, and some refactoring and code cleanup.
- Prettified the output of the `get_repos_urls()` function (#710).
- Added `get_pull_requests()` function for getting information about pull requests (#722).
- Added `get_pull_requests_stats()` function (#726).

This patch release covers fixes for the `get_files()` function and updates to the `until` parameter in `get_release_logs()`, `get_commits()` and `get_issues()`.
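
The inclusive `until` behaviour introduced in this release (#718) amounts to treating the given date as covering the whole day. A minimal base R sketch, assuming UTC timestamps and a hypothetical helper `filter_until()`; it illustrates the semantics only and is not GitStats' internal code.

```r
# Hypothetical helper, not GitStats code: keep timestamps up to and including
# the whole `until` day. The cutoff is midnight UTC at the start of the next
# day, compared with a strict `<`.
filter_until <- function(timestamps, until) {
  next_day <- as.Date(until) + 1
  cutoff <- as.POSIXct(paste(next_day, "00:00:00"), tz = "UTC")
  timestamps[timestamps < cutoff]
}

ts <- as.POSIXct(
  c("2025-12-07 10:00:00", "2025-12-08 23:59:59", "2025-12-09 00:00:00"),
  tz = "UTC"
)
filter_until(ts, "2025-12-08")  # keeps the first two; 2025-12-09 00:00:00 is dropped
```

Using midnight of `as.Date(until)` itself as the cutoff would reproduce the previous, exclusive behaviour and drop the `2025-12-08 23:59:59` record as well.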
- `pattern` results with empty files structure (#711).
- Updated the `until` parameter in `get_release_logs()`, `get_commits()` and `get_issues()`. The functions now include records from the specified date (e.g. passing `"2025-12-08"` to `until` includes data from December 8th, 2025), whereas previously they only fetched data up to, but not including, that date (#718).

This patch introduces the `show_hosts()` function to display host information, addresses empty GitLab project values in file tables, and changes the `verbose` logic to default to `FALSE` while retaining critical messages.
- Added `show_hosts()` function to print information on hosts set with `set_*_host()` functions (#672).
- Updated the `verbose` logic: user functions now have `verbose` set to `FALSE` by default. The most important messages are still printed (e.g. the time span of the whole process) (#704).

A patch release with improvements and fixes to pulling repository data when getting commits and files, and a change in how progress bars work: they are now displayed at the `GitHost` level instead of the organization level.
- `get_repos_data()` method in `get_commits()` and `get_files()` with switching to the REST API engine (#690).

A minor release with substantial performance improvements in searching repositories by code, new features such as filtering repository data by language, and new columns in the `get_repos()` and `get_files()` output.
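
For the `repo_id` change in this release (#675), the key fact is that GitLab's GraphQL API returns global IDs such as `gid://gitlab/Project/278964`, whereas its REST API returns the plain numeric ID. A sketch of normalizing both to digit-only character strings, using a hypothetical helper `normalize_repo_id()`; GitStats may implement this differently.

```r
# Hypothetical helper, not part of GitStats: turn GitLab project IDs into
# digit-only character strings.
normalize_repo_id <- function(id) {
  # Numeric REST IDs: format without scientific notation (e.g. 1e6 -> "1000000")
  if (is.numeric(id)) return(sprintf("%.0f", as.numeric(id)))
  # GraphQL global IDs: drop everything up to and including the last "/"
  sub("^.*/", "", id)
}

normalize_repo_id(c("gid://gitlab/Project/278964", "278964"))
#> [1] "278964" "278964"
normalize_repo_id(1e6)
#> [1] "1000000"
```

Storing the ID as character avoids scientific-notation formatting of large numbers and keeps a single column type regardless of which API the ID came from.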
- Added `commit_sha` column to `get_repos()` and `get_files()` outputs (#546).
- Fixed the `depth` parameter in `get_files()`: previously the values 0 and 1 returned the same output, i.e. files from the root. It now works as explained in the function documentation: 0 returns files from the root and 1 goes one level deeper (#663).
- Added `language` parameter to `get_repos()` to pull only repositories with the given language (#654). For the GitHub Search API it translates into a `language` query; in other cases the repositories output is simply filtered by the given language.
- Added `repo_fullpath` column, which replaced `fullname` in the output of `get_repos()`. The `fullname` column was flawed for GitLab repositories, as it was built from the repository name and the organization. In GitLab, the repository name (more of a user-friendly label) differs from the repository path (which appears in the URL), unlike GitHub, where the repository name is the repository path (#659). The `repo_name` column for GitLab repositories now mirrors the repository path.
- `verbose` role to control the display of response error statuses (#669).
- `get_repos()` no longer fails when the user passes text, e.g. with spaces, to the `with_code` parameter (#673).
- The `repo_id` column in `get_repos()` and `get_files()` outputs for GitLab hosts now consists only of digits, formatted as character (#675).
- `GraphQL` errors in GitHub and GitLab (#622).
- `HTTP 404` error (#653).
- `organizations` and `repositories` scope (#640). This change was motivated by the need to enable calls of Search API-based functions on public hosts (such as `get_repos(with_code = {code})`), whose performance is acceptable on large public repositories. For other, slower functions, users are informed of the estimated data retrieval time via a progress bar.
- `get_repos_with_R_packages()` function (#644), as it is not in line with the `GitStats` logic of getting formatted git data from repositories.
- `repo_name` instead of `repository`, `githost` instead of `platform`) for some tables (`issues`, `files_content` and `repos`) (#632).
- `GraphQL` (502 Bad Gateway) occurring while pulling commits (#636).
- `org` is set as a scope (#639).
- `get_repos_trees()` function (#614).

A patch release with hot fixes to functions pulling repositories with code.
- `GraphQL` errors due to old GitLab versions, with switching to the REST engine (#615, @ThomUK).

This release introduces new functions to get data on organizations and issues, alongside several important fixes and optimizations, such as more efficient handling of GitLab API limits. Additional enhancements include a renamed function, added time usage information and shorter data-pulling messages.
- Fixed `get_files()` when `pattern` is not defined (#605). Previously, the function call resulted in an empty response or an error.
- `GraphQL` API methods for parsing search responses (still gathered via REST) into repositories output.
- Renamed `get_R_package_usage()` to `get_repos_with_R_package()`, as the output resembles that of `get_repos()`.
- `GitStats` is set to scan whole hosts (#589). Set the `type` parameter to `api` by default. Setting `type` to `web` results in parsing GitLab API URLs, which may be time-consuming, so it should not be the default option.
- `GitStats` is set to scan whole hosts (#583).

This release brings substantial improvements: it is now possible to scan whole organizations and particular repositories for one host at the same time, the function preparing commit statistics has been boosted, and the workflow for getting files has been simplified.
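
The `time_aggregation` idea in the `get_commits_stats()` changes of this release (#556, #557) boils down to bucketing commit dates by period. A base R sketch with a hypothetical `count_commits()` helper; the period names `"yearly"`, `"monthly"` and `"daily"` are assumptions and not necessarily the values GitStats accepts.

```r
# Hypothetical sketch, not GitStats code: count commits per time period.
count_commits <- function(committed_date,
                          time_aggregation = c("yearly", "monthly", "daily")) {
  time_aggregation <- match.arg(time_aggregation)
  # Format string that collapses each date to its period label
  fmt <- switch(time_aggregation,
    yearly  = "%Y",
    monthly = "%Y-%m",
    daily   = "%Y-%m-%d"
  )
  period <- format(as.Date(committed_date), fmt)
  counts <- table(period)  # table() sorts the labels, i.e. chronologically here
  data.frame(period = names(counts), commits = as.integer(counts),
             stringsAsFactors = FALSE)
}

dates <- c("2023-01-15", "2023-06-01", "2024-02-10")
count_commits(dates, "yearly")   # 2023: 2 commits, 2024: 1 commit
count_commits(dates, "monthly")  # one row per month that has commits
```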
- It is now possible to pass both `orgs` and `repos` to `set_*_host()` functions (#400).
- Improved `get_commits_stats()` function (#556, #557) with:
  - `group_var` parameter,
  - `time_interval` parameter to `time_aggregation`,
  - `yearly` aggregation to `time_aggregation` parameter,
  - `GitStats` to `commits_data` object, which allows building the workflow in one pipeline (`create_gitstats() |> set_*_host() |> get_commits() |> get_commits_stats()`).
- Merged `get_files_content()` and `get_files_structure()` into one `get_files()` (#564).
- Added `.error` parameter to the `set_*_host()` functions to control whether an error should be raised when wrong input is passed (#547).
- `author_name` and `author_login` if it was missing in `commits_table` (#550).
- `GraphQL` response error when pulling repositories with R error. Earlier, `GitStats` just returned an empty table with no clue as to what had happened, as errors from `GraphQL` are returned as list outputs (they do not break code).
- `orgs` parameter in `set_github_host()` (#562).

This is a patch release which introduces some hot fixes and new data in the `get_commits()` output.
- Added `repo_url` column to the output of `get_commits()` (#535).
- `verbose` mode is set to `FALSE` (#525) and fixed checking token scopes for GitLab (#526).
- Fixed `get_repos_urls()` output when individual repositories are set in `set_*_host()` (#529). Earlier, the function pulled all repositories for an organization, even though repositories, not whole organizations, were defined for the host. This is similar to the issue solved earlier (#439).

This is a patch release which improves the speed of `get_R_package_usage()` and makes it possible to pull data on multiple R packages at once, adds a new `get_storage()` function, and brings some fixes for checking token scopes and setting hosts.
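
The host-format changes of this release (#399, #475) mean that `host` may be given as `{host_url}`, `http://{host_url}` or `https://{host_url}`. A sketch of reducing these to one canonical form with a hypothetical `normalize_host()` helper; stripping trailing slashes is an extra assumption, and how GitStats then builds the API URL is not shown.

```r
# Hypothetical helper, not GitStats code: reduce the accepted host formats
# to a bare host name.
normalize_host <- function(host) {
  host <- sub("^https?://", "", host)  # drop an optional http:// or https:// prefix
  sub("/+$", "", host)                 # drop trailing slashes (assumption)
}

normalize_host(c("github.com", "http://github.com", "https://github.com/"))
#> [1] "github.com" "github.com" "github.com"
```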
- Improved `get_R_package_usage()` function:
  - `packages` parameter replacing old `package_name`) (#494),
  - `split_output` parameter has been added: when set to `TRUE`, a list of tibbles (one element per package) is returned instead of one tibble.
- `get_repos()` (#492). Earlier this was only possible for GitHub organizations and GitLab groups.
- Added `get_storage()` function to retrieve data from the `GitStats` object, either whole or particular datasets (e.g. `commits`, `repositories` or `R_package_usage`) (#509).
- `GitHost` is not passed to `GitStats`. This also applies when `GitStats` looks for default tokens (not defined by the user). Earlier, if token tests failed, an empty token was passed and `GitStats` was created, which was misleading for the user.
- `github.com` or `https://github.com`) to `set_github_host()` (#475).
- `{host_url}`, `http://{host_url}` or `https://{host_url}`) to `host` parameter in `set_*_host()` function (#399).

This minor release comes with a new `get_files_structure()` function and adjustments to `get_files_content()`, so the user can pull a custom files tree from a repository (by defining a file pattern and directory depth) and pull the files' content.
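
The `depth` and `pattern` controls of `get_files_structure()` in this release (#338) can be illustrated on a plain vector of file paths. This base R sketch uses a hypothetical `filter_files()` helper and assumes that depth counts `/` separators and is cumulative (depth 1 keeps root files plus files one level down), in line with the description given for #663; it is not GitStats' actual implementation.

```r
# Hypothetical sketch, not GitStats code: filter a repository's file paths
# by directory depth and an optional regex pattern.
filter_files <- function(paths, depth = Inf, pattern = NULL) {
  # Depth = number of "/" separators: "README.md" -> 0, "R/utils.R" -> 1
  n_sep <- nchar(paths) - nchar(gsub("/", "", paths, fixed = TRUE))
  keep <- n_sep <= depth
  if (!is.null(pattern)) keep <- keep & grepl(pattern, paths)
  paths[keep]
}

files <- c("README.md", "R/utils.R", "inst/extdata/data.csv")
filter_files(files, depth = 0)          # "README.md"
filter_files(files, depth = 1)          # "README.md" "R/utils.R"
filter_files(files, pattern = "\\.R$")  # "R/utils.R"
```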
- Added `get_files_structure()` function to pull the files structure for a given repository, with the possibility to control the level of directories (`depth` parameter) and to limit the output to files matching a regex passed to the `pattern` parameter (#338). Together with that, `get_files()` was renamed to `get_files_content()` to better reflect its purpose.
- Adjusted `get_files_content()` so it can make use of the `files_structure` pulled to `GitStats` storage with `get_files_structure()`, if `file_path` is set to `NULL` and the `use_files_structure` parameter to `TRUE` (both are the defaults) (#467).
- Added `progress` parameter to user functions to control showing the `cli` progress bar separately from messages (which are controlled with `verbose`) (#465).
- `orgs` nor `repos` specified) from warning to info (#456).
- `gh-pages`, lint and check for bumping version.

This is a patch release with substantial improvements to some functions (`get_repos()`, `get_files()` and `get_R_package_usage()`): it adds the `with_files` and `in_files` parameters, fixes the cache feature and introduces a new `get_repos_urls()` function, a minimalist version of `get_repos()`:
- Added `get_repos_urls()` function to fetch repository URLs (either web or API; choose with the `type` parameter). It can also return only the URLs of repositories that contain a given file or files (by passing an argument to the `with_files` parameter) or a text in code blobs (`with_code` parameter). This is a minimalist version of `get_repos()`, which skips the whole process of parsing (search response into repositories output) and adding statistics on repositories. This makes its output poorer in content but faster to obtain (#425).
- Added `with_files` parameter to `get_repos()`, which makes it possible to search for repositories with a given file or files and return the full output for those repositories.
- `with_code` parameter (as a character vector) in `get_repos()` and `get_repos_urls()` (#282).
- Added `in_files` parameter to `get_repos()`, which works with the `with_code` parameter. When both are defined, `GitStats` searches code blobs only in the given files.
- Removed `dplyr::glimpse()` from `get_*()` functions, so output is printed to the console only if the result of a `get_*()` function is not assigned to an object (#426).
- `get_R_package_usage()` output now also includes the repository full name (#438).
- Improved `get_R_package_usage()` by optimizing the search of package names in `DESCRIPTION` and `NAMESPACE` files: the filtering method was removed and replaced with a `filename:` filter directly in the search endpoint query (#428).
- Fixed `get_files()` when the scanning scope is set to repositories. Earlier, it pulled the given files from whole organizations, even if the scanning scope was set to `repos` with `set_*_host()`. Now it shows only files for the given repositories (#439).
- The `verbose` parameter now controls showing of the progress bars (#453).

This is a patch release with some hot issues that needed to be addressed, notably covering `set_*_host()` functions with `verbose` control, tweaking the `verbose` feature in general, fixing pulling data for GitLab subgroups and speeding up the `get_files()` function.
- `GitStats` is set to scan whole hosts, with switching to the Search API instead of pulling files via `GraphQL` (with iteration over organizations and repositories) (#411).
- `orgs` or `repos`) `GitStats` no longer pulls all organizations. Pulling all organizations from a host is triggered only when the user decides to pull repositories from organizations. If the user decides, e.g., to pull repositories by code, there is no need to pull all organizations (which may be a time-consuming process), as `GitStats` then uses the Search API (#393).
- `set_*_host()` functions with `verbose_off()` or `verbose` parameter (#413).
- Setting `verbose` to `FALSE` no longer hides the output of the `get_*()` functions, i.e. a glimpse of the table always appears after pulling data, even if `verbose` is switched off. The `verbose` parameter now only serves to show and hide messages to the user (#423).
- `set_*_host()` function (#415).

This is a major release with general changes in workflow (simplifying it), changes in setting `GitStats` hosts, deprecation of some not very useful features (like plots and setting parameters separately) and a new `get_release_logs()` function.
- The `set_host()` function is replaced with the more explicit `set_github_host()` and `set_gitlab_host()` (#373). If you wish to connect to a public host (e.g. `api.github.com`), you do not need to pass an argument to the `host` parameter.
- To get data on `repositories`, `commits`, `R_package_usage` or others, you should directly use the corresponding `get_*()` functions instead of `pull_*()`, which are deprecated. These `get_*()` functions pull data from the API, parse it into a table, add some goodies (additional columns) if needed and return a table instead of a `GitStats` object, which in our opinion is more intuitive and user-friendly (#345). That means you no longer need to pipe two or three additional function calls as before, e.g. `pull_repos(gitstats_object) %>% get_repos() %>% get_repos_stats()`; you just run `get_repos(gitstats_object)` to get the data you need.
- When you call a `get_*()` function again, `GitStats` pulls the data from its storage rather than from the API as it did the first time, unless you change the function's parameters (e.g. the starting date with `since` in `get_commits()`) or directly change the `cache` parameter of the function (#333).
- `pull_repos_contributors()` as a separate function is deprecated. The `add_contributors` parameter is now set to `TRUE` by default in `get_repos()`, which seems more reasonable as the user gets all the data.
- In `get_commits()`, the old parameters (`date_from` and `date_until`) were replaced with new, more concise ones (`since` and `until`).
- The `set_params()` function is removed (#386). The logic is now moved straight to the `get_*()` functions. For example, if you want to pull repositories with a specific code blob, you do not need to define anything with `set_params()` (as previously with the `search_mode` and `phrase` parameters); you simply run `get_repos(with_code = 'your_code')` (#333).
- The `verbose` parameter has been introduced to limit messages to the user when pulling data; it can be set in all `get_*()` functions. You can also turn verbose mode on or off globally with the `verbose_on()`/`verbose_off()` functions.
- The `get_repos_stats()` function was deprecated as its role was unclear: unlike `get_commit_stats()`, it did not aggregate repository data into a new stats table, but only added some new numeric columns, like the number of contributors (`contributors_n`) or last activity in `difftime` format, which is now done within `get_repos()`.
- Filtering by `team` and by language is no longer supported: these features were quite heavy for the package performance and did not bring much added value. If needed, users can always filter the output (formatted responses pulled from the API) by contributors or language (#384).
- `GitStats`, they have been deprecated as the package is meant to be basically for back-end purposes and this is the field where the developers' effort should now go (#381). If needed and requested, plot functions may be brought back in future releases.
- Added new `get_release_logs()` function (#356).
- `get_orgs()` is renamed to `show_orgs()` to reflect that it does not pull data from the API, but only shows what is in the `GitStats` object.
- `author_login` and `author_name` (#332). This is due to the mix of GitHub/GitLab handles and display names in the `author` column (the original author name field in the commits API response).
- `GitStats` object: now when you return a `GitStats` object in the console, it prints its data divided into sections to give more readable information to the user: scanning scope (organizations and repositories) and storage (the output tables stored in `GitStats`, with basic information on dimensions) (#329).
- `contributors` response (#331).
- The `gts_to_posixt()` helper, which took dependencies on `stringr`, caused for some users an empty value to be passed to the `since` parameter of the commits endpoint, which ended in a Bad Request Error (400) and an infinite loop of retrying the request (#360).
- `pull_R_package_usage()` with `get_R_package_usage()` functions to pull repositories where the package name is found in `DESCRIPTION` or `NAMESPACE` files or in code blobs with phrases related to using an R package (`library(package)`, `require(package)`) (#326, #341),
- `pull_files()` with `get_files()` to pull content of text files (#200).
- `GitStats` with `set_host()` function by using `repos` parameter instead of `orgs` (#330).
- `id` to `repo_id` and `name` to `repo_name`,
- `default_branch` column to repositories output as a consequence of #200.
- `get_*_stats()` functions to prepare summary stats from pulled data: repositories and commits (#276),
- `gitstats_plot()`, which takes as input `repos_stats` or `commits_stats` class objects (#276),
- `get_*` to `pull_*`; `get_*` functions are now for retrieving already pulled data from the `GitStats` object (#294),
- `setup()` to `set_params()` (#294),
- `set_connection()` to `set_host()` (#271),
- `add_team_member()` to `set_team_member()` (#271).
- `GITHUB_PAT` or `GITLAB_PAT`), there is no need to pass them as an argument to `set_host()` (#120),
- `pull_users()` function to pull information on users (#199),
- `orgs` are passed (#258),
- `get_orgs()` function to print all organizations (#283),
- `reset()` function (#270)
- `reset_language()` or setting `language` parameter to `All` in `setup()` function (#231)
- `contributors` as basic stat when pulling repos by org and by phrase to improve speed of pulling repositories data. Added `pull_repos_contributors()` user function and `add_contributors` parameter to `pull_repos()` function to conditionally add information on contributors to the repositories table (#235)
- `api_url` column as an address to the repository, not the host (#201),
- `%>%`) (#289).

This is the first release of `GitStats` with the following features:
- `create_gitstats()` - creating a `GitStats` object,
- `set_connection()` - adding hosts to a `GitStats` object,
- `setup()` - setting the search parameter to `org`, `team` or `phrase`, and setting the programming language of repositories,
- `get_repos()` - pulling repositories from the GitHub and GitLab APIs into a standardized table,
- `get_commits()` - pulling commits from the GitHub and GitLab APIs into a standardized table,
- `set_team_member()` - adding team members to a `GitStats` object.