Unit Testing for Document Chunkers #354

Open
khaledsulayman opened this issue Nov 8, 2024 · 0 comments

Currently, the unit tests for the document chunkers mostly cover class instantiation and little of the actual chunking behavior.

We should definitely add tests for the .chunk_documents() method in each chunker class. Additionally, it would be good to add tests around some of the custom logic we have for building chunks based on docling's .json outputs.
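As a rough starting point, a `.chunk_documents()` test could look something like the sketch below. `FakeTextChunker`, its constructor arguments, and the list-of-strings return type are all assumptions made up for illustration; the real chunker classes would be swapped in and the assertions adjusted to whatever they actually return.

```python
# Hypothetical stand-in for a real chunker class (assumption: the real
# constructors and return types are not shown in this issue).
class FakeTextChunker:
    def __init__(self, documents, chunk_word_count):
        self.documents = documents
        self.chunk_word_count = chunk_word_count

    def chunk_documents(self):
        """Split each document into chunks of at most chunk_word_count words."""
        chunks = []
        for doc in self.documents:
            words = doc.split()
            for start in range(0, len(words), self.chunk_word_count):
                end = start + self.chunk_word_count
                chunks.append(" ".join(words[start:end]))
        return chunks


def test_chunk_documents_respects_word_limit():
    chunker = FakeTextChunker(["one two three four five"], chunk_word_count=2)
    chunks = chunker.chunk_documents()
    assert chunks == ["one two", "three four", "five"]
    assert all(len(chunk.split()) <= 2 for chunk in chunks)


def test_chunk_documents_preserves_content():
    # No words should be lost or reordered by chunking.
    docs = ["alpha beta gamma", "delta"]
    chunks = FakeTextChunker(docs, chunk_word_count=2).chunk_documents()
    assert " ".join(chunks).split() == " ".join(docs).split()


def test_chunk_documents_empty_input():
    assert FakeTextChunker([], chunk_word_count=2).chunk_documents() == []
```

The test functions use plain `assert` statements, so pytest should collect them as-is. The three properties checked here (size bound, content preserved, empty input) are only suggestions for what is worth pinning down.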

On the other hand, the DocumentChunker interface may change as a result of #334, so we should decide whether to hold off on these tests until that refactor is done. Once it is complete, we will only need to test one class, and it should be easier to parametrize the tests across different filetypes.
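For a sense of what the post-refactor tests might look like, here is a table-driven sketch against a single class. `FakeDocumentChunker`, its `(path, text)` constructor, and the supported-suffix set are hypothetical and do not reflect the interface proposed in #334. It uses `unittest.subTest` to stay dependency-free; `pytest.mark.parametrize` would express the same table more compactly.

```python
import unittest


# Hypothetical unified chunker, used only to illustrate the test shape.
class FakeDocumentChunker:
    SUPPORTED = {".md", ".pdf"}  # assumption, not the project's real list

    def __init__(self, path, text):
        suffix = "." + path.rsplit(".", 1)[-1] if "." in path else ""
        if suffix not in self.SUPPORTED:
            raise ValueError(f"unsupported filetype: {suffix!r}")
        self.text = text

    def chunk_documents(self):
        # Split on blank lines and drop empty pieces.
        return [part.strip() for part in self.text.split("\n\n") if part.strip()]


class TestChunkerAcrossFiletypes(unittest.TestCase):
    # One row per filetype: (path, document text, expected chunk count).
    CASES = [
        ("notes.md", "# Title\n\nBody text.", 2),
        ("paper.pdf", "Only one paragraph.", 1),
    ]

    def test_chunk_counts(self):
        for path, text, expected in self.CASES:
            with self.subTest(path=path):
                chunks = FakeDocumentChunker(path, text).chunk_documents()
                self.assertEqual(len(chunks), expected)

    def test_unsupported_filetype_raises(self):
        with self.assertRaises(ValueError):
            FakeDocumentChunker("data.xyz", "text")
```

With one class under test, covering a new filetype is just another row in `CASES`.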
