event
PPG Seminar Series: “From software heritage to code commons: A vision for transparent and responsible AI in code-based model training”

Information
-
Date
25-03-2025 to 25-03-2025
-
Hour
14:04 - 15:04
-
Location
Avenida Getúlio Vargas, Quitandinha, Petrópolis, RJ - Brasil
"From software heritage to code commons: A vision for transparent and responsible AI in code-based model training" is the theme of the next lecture offered in the seminar organized by the Graduate Program (PPG-LNCC), delivered by Roberto Di Cosmo, Director of the Innovation and Research Initiative for Free Software. The event takes place this Tuesday (25th) at 2 PM. As in previous editions, the lectures are free and open to the general public. The seminars are held in a hybrid format. For the online model, the event will be streamed via the Zoom application and live on the LNCC YouTube channel. For the in-person model, the lecture will take place in Auditorium B of the institution. To register for this webinar, visit: https://us02web.zoom.us/webinar/register/WN_aKCmybKlRqalpLYmGHIM7A#/registration _____________________________________________________________________________________________________ Abstract There is a strong interplay between software development and machine learning: AI models are providing new tools to develop software, while the inclusion of large publicly available codebases in training datasets helps improve large language models’ reasoning abilities, well beyond coding tasks. In the specific domain of source code the issue of transparency of the training dataset assumes a special weight in the broader debate around open versus closed models. Software Heritage, launched by Inria and in partnership with UNESCO, has been building the largest archive of publicly available source code for nearly a decade, and provides today the Software Hash Identifier for the over 50 billion software artifacts it collected from over 300 million projects, ensuring availability, guaranteeing integrity and enabling traceability of all its contents. Because of the core values that inform its approach to open access and code preservation, it is naturally concerned by these challenges. 
In this talk, we will start from the principled stance on the use of the Software Heritage archive for training models, report on the lessons learned from the collaboration with the BigCode project that created StarCoder2, and then focus on the challenges, ethical considerations, and technical limitations that arise in current approaches to using open codebases in AI, in particular when it comes to transparency, accountability, and resource efficiency.

These limitations underscore the need for a Code Commons: a dedicated initiative to expand Software Heritage into a central resource for transparency, quality, accountability, and sustainability in machine learning on code. By promoting transparency and responsible stewardship, Software Heritage aims to help researchers, developers, and organizations navigate the challenges of AI in code-based applications. This talk invites all stakeholders to collaborate on this ambitious vision.

_____________________________________________________________________________________________________

Speaker

Roberto Di Cosmo is a full professor of Computer Science at University Paris Cité, currently on leave at Inria to lead Software Heritage, a nonprofit international initiative in partnership with UNESCO to build the universal software source code archive. His research interests include functional programming, parallel and distributed programming, semantics of programming languages, type systems, rewriting, linear logic, software engineering, and analysis of large software collections.

A long-term Free Software advocate, he has contributed to its adoption since 1998 with books, seminars, articles, and software. He created the Free Software thematic group of Systematic in October 2007, then IRILL (www.irill.org) in 2010, a research structure dedicated to Free and Open Source Software quality. He is president of the board of IMDEA Software and a member of the French national council for Open Science.
Coordenação de Pós-graduação e Aperfeiçoamento: copga@lncc.br
Serviço de Comunicação Institucional: secin@lncc.br
Instituto de Inteligência Artificial: instituto.ia@lncc.br
Information
-
Date
25-03-2025 to 25-03-2025
-
Hour
14:00 - 15:30
-
Location
Avenida Getúlio Vargas, Quitandinha, Petrópolis, RJ - Brasil