abraxos.extract.read_csv_chunks

abraxos.extract.read_csv_chunks(path, chunksize, **kwargs)

Reads a CSV file in chunks and captures malformed lines.

Parameters:
  • path (str) – Path to the CSV file.

  • chunksize (int) – Number of rows per chunk.

  • **kwargs (dict) – Additional keyword arguments passed through to pandas.read_csv.

Yields:

ReadCsvResult – A named tuple with the malformed lines found in the chunk (bad_lines) and the chunk's parsed DataFrame (dataframe).

Return type:

collections.abc.Generator[abraxos.extract.ReadCsvResult, None, None]

Examples

>>> from abraxos.extract import read_csv_chunks
>>> for result in read_csv_chunks('data.csv', chunksize=100):
...     print(result.bad_lines)
...     print(result.dataframe)
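
To make the chunk-and-capture pattern concrete, the following is a minimal standard-library sketch of the idea, not the abraxos implementation. The names ChunkResult and read_chunks_sketch are hypothetical, rows are plain lists rather than a pandas DataFrame, and a line is treated as malformed only when its field count differs from the header's, which is an assumption about what "malformed" means here.

```python
import csv
import io
from collections import namedtuple
from typing import Iterator, TextIO

# Hypothetical stand-in for abraxos.extract.ReadCsvResult. The real result
# carries a pandas DataFrame; this sketch keeps well-formed rows as lists.
ChunkResult = namedtuple("ChunkResult", ["bad_lines", "rows"])


def read_chunks_sketch(handle: TextIO, chunksize: int) -> Iterator[ChunkResult]:
    """Yield one ChunkResult per `chunksize` records read from `handle`."""
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")

    reader = csv.reader(handle)
    header = next(reader, None)
    if header is None:  # empty input: nothing to yield
        return

    bad, good = [], []
    for row in reader:
        if not row:  # csv.reader returns [] for blank lines; skip them
            continue
        # Assumption: a record is malformed when its width differs from the header's.
        (good if len(row) == len(header) else bad).append(row)
        if len(good) + len(bad) == chunksize:
            yield ChunkResult(bad, good)
            bad, good = [], []

    if good or bad:  # final, possibly short, chunk
        yield ChunkResult(bad, good)


# In-memory sample: record 2 has an extra field, record 4 is missing one.
sample = io.StringIO("id,name\n1,ada\n2,grace,extra\n3,alan\n4\n5,edsger\n")
for result in read_chunks_sketch(sample, chunksize=2):
    print(result.bad_lines, result.rows)
```

Because malformed records are collected per chunk instead of raising, a caller can log or quarantine them and keep processing the rest of the file, which is the same consumption pattern as the read_csv_chunks loop above.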