DAY 1 16:55-17:10 JST Seminar Room B
Ja / En / Ko
Onsite

Introduction of Web Corpus Built from Large-scale Proprietary Crawl Data

We built a Japanese web corpus from content collected from approximately 1.5 billion URLs that we crawled ourselves. This presentation covers how we built the corpus and how we use it within our company.

Speaker

Hirata Kodai / LY Corporation

Engineer at a search company

Joined the company as a new graduate in 2023. Involved in the development of web crawlers and projects using crawl data.
