Calculate statistics

Created on Wed Jul 19 04:43:43 2023

Copyright 2023 Roy Ruddle

Licensed under the Apache License, Version 2.0 (the “License”); you may not use this file except in compliance with the License. You may obtain a copy of the License at

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an “AS IS” BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

vizdataquality.calculate.calc(df, options=None)

Profile a data frame or series to calculate aspects of data quality and descriptive statistics.

Parameters:
  • df (DataFrame or Series) – The data.

  • options (dict, optional) – The descriptive statistics to output. The default, None, outputs everything.

Returns:

The descriptive statistics, with a separate row for each variable. The variable names are the index and each column is a different descriptive statistic.

Return type:

DataFrame
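The shape of the returned table can be illustrated with a small pandas-only sketch. This is not the vizdataquality implementation: the helper `profile_sketch` and the statistic names `n_values`, `n_missing` and `n_unique` are hypothetical, chosen only to show one row per variable with the variable names as the index.

```python
import pandas as pd


def profile_sketch(data):
    """Hypothetical stand-in for illustration; not the library's code."""
    # Accept a Series by promoting it to a one-column DataFrame.
    df = data.to_frame() if isinstance(data, pd.Series) else data
    # Each statistic is a Series indexed by column name, so the result
    # has one row per variable and one column per statistic.
    return pd.DataFrame(
        {
            "n_values": df.count(),        # non-missing values per column
            "n_missing": df.isna().sum(),  # missing values per column
            "n_unique": df.nunique(),      # distinct non-missing values
        }
    )


df = pd.DataFrame(
    {"age": [34, None, 51, 34], "city": ["Leeds", "York", None, None]}
)
profile = profile_sketch(df)

# With the library installed, the documented call would be:
#   vizdataquality.calculate.calc(df)
```

The statistics that calc actually reports, and the keys accepted in options, are defined by the library and may differ from the names used here.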

vizdataquality.calculate.get_non_numeric_values(df)

Return a list of the unique, non-numeric values in a dataframe.

Parameters:

df (DataFrame) – The data.

Returns:

The unique, non-numeric values.

Return type:

list
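As a rough illustration of the idea, the sketch below collects values that cannot be converted to a number. It assumes that "non-numeric" means "not parseable by `pandas.to_numeric`" and that missing values are ignored; the library's own rules may differ, and `non_numeric_values_sketch` is a hypothetical helper, not part of vizdataquality.

```python
import pandas as pd


def non_numeric_values_sketch(df):
    """Hypothetical approximation; not the library's code."""
    found = []
    for column in df.columns:
        # Ignore missing values so that only real entries are tested.
        values = df[column].dropna()
        # Entries that cannot be parsed as numbers are coerced to NaN.
        as_number = pd.to_numeric(values, errors="coerce")
        for value in values[as_number.isna()].unique():
            if value not in found:
                found.append(value)
    return found


df = pd.DataFrame(
    {
        "height": ["1.82", "n/a", "1.75", "unknown"],
        "weight": ["70", "n/a", "sixty", "82"],
    }
)
values = non_numeric_values_sketch(df)

# With the library installed, the documented call would be:
#   vizdataquality.calculate.get_non_numeric_values(df)
```

A list like this is typically useful for spotting placeholder strings that prevent a column from being treated as numeric.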

vizdataquality.calculate.get_value_lengths_examples(df)

Get examples of the shortest, median and longest values of each column in a dataframe.

Parameters:

df (DataFrame) – The data.

Returns:

A dataframe containing the examples. The first column (‘Examples’) specifies what each row contains (e.g., Shortest value).

Return type:

DataFrame
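The layout described above can be sketched with pandas alone. This is an approximation under stated assumptions, not the library's code: values are compared by the length of their string form, missing values are skipped, the lower-middle value stands in for the median, and the row labels other than "Shortest value" are hypothetical.

```python
import pandas as pd


def value_length_examples_sketch(df):
    """Hypothetical approximation; not the library's code."""
    # The first column labels what each row contains.
    rows = {"Examples": ["Shortest value", "Median value", "Longest value"]}
    for column in df.columns:
        # Compare values by the length of their string form.
        values = df[column].dropna().astype(str)
        ordered = sorted(values, key=len)
        if not ordered:
            # A column with no values has no examples to show.
            rows[column] = [None, None, None]
            continue
        middle = ordered[(len(ordered) - 1) // 2]
        rows[column] = [ordered[0], middle, ordered[-1]]
    return pd.DataFrame(rows)


df = pd.DataFrame(
    {
        "code": ["A", "ABC", "AB", None],
        "name": ["Li", "Roberta", "Sam", "Anna"],
    }
)
examples = value_length_examples_sketch(df)

# With the library installed, the documented call would be:
#   vizdataquality.calculate.get_value_lengths_examples(df)
```

How the library breaks ties between values of equal length, or picks the median for an even number of values, is not stated here, so the sketch simply keeps the first candidate in sorted order.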