"},{"location":"aup/","title":"ACCEPTABLE USE POLICY","text":"Effective Date: January 8th 2025
This Acceptable Use Policy (\"AUP\") applies to the use of PriorLabs' Services. Where this AUP uses terms that are defined in the General Terms and Conditions (\"GTC\"), those terms shall have the meaning ascribed to them in the GTC.
PriorLabs reserves the right to change this AUP in accordance with the GTC at https://www.priorlabs.ai/aup.
"},{"location":"aup/#1-what-type-of-activity-is-prohibited","title":"1. What type of activity is prohibited?","text":"Customer shall not use, and encourage or allow any other person or entity to use the Services in prohibited manners, including but not limited to the following:
Customer may not upload any personal data within the meaning of the GDPR to the Contract Software or the Services.
Customer may not upload any material to the Contract Software or the Services that infringes the intellectual property rights or other rights of third parties, including but not limited to trademarks, copyrights, trade secrets, rights of publicity, or otherwise violating, infringing or misappropriating the rights of any third party.
Customer may not misappropriate, reverse-engineer, copy, disassemble, decompile, extract source code, trade secrets, or know-how, including PriorLabs' models, algorithms or artificial intelligence systems, or otherwise misuse or manipulate the Contract Software or Services or any part thereof.
Customer may not use the Services or the Contract Software in a way that imposes an unreasonable or disproportionately large load on PriorLabs' infrastructure, thereby adversely impacting the availability, reliability or stability of PriorLabs' Services.
Customer may not upload any viruses, spam, trojan horses, worms or any other malicious, harmful, or deleterious programs or code, including prompt-based manipulation or scraping behaviors, to the Contract Software or the Services.
Customer may not attempt to use the Services and Contract Software in a manner that compromises, circumvents, or tests the vulnerability of any of PriorLabs' technical safeguards or other security measures.
Customer may not use PriorLabs' Services or the Contract Software in any manner that may subject PriorLabs or any third party to liability, damages or danger.
Customer shall not use the Contract Software improperly or allow it to be used improperly. In particular, Customer shall not use or upload to the Contract Software any content that is illegal or immoral, including content that incites hatred or criminal acts, constitutes hate speech, illicit deep fakes, or fake news, glorifies or trivializes violence, is sexually offensive or pornographic, is capable of seriously endangering children or young people morally or impairing their well-being, or may damage the reputation of PriorLabs, and shall not refer to such content.
This list of prohibited uses is provided by way of example and should not be considered exhaustive.
"},{"location":"aup/#2-who-is-prohibited-from-using-the-services","title":"2. Who is prohibited from using the Services?","text":"Consumers within the meaning of Section 13 German Civil Code may not use PriorLabs' Services.
"},{"location":"cla/","title":"Contributor Agreement","text":""},{"location":"cla/#individual-contributor-exclusive-license-agreement","title":"Individual Contributor Exclusive License Agreement","text":""},{"location":"cla/#including-the-traditional-patent-license-option","title":"(including the Traditional Patent License OPTION)","text":"Thank you for your interest in contributing to PriorLabs's TabPFN (\"We\" or \"Us\").
The purpose of this contributor agreement (\"Agreement\") is to clarify and document the rights granted by contributors to Us. To make this document effective, please follow the instructions at https://www.priorlabs.ai/sign-cla.
"},{"location":"cla/#how-to-use-this-contributor-agreement","title":"How to use this Contributor Agreement","text":"If You are an employee and have created the Contribution as part of your employment, You need to have Your employer approve this Agreement or sign the Entity version of this document. If You do not own the Copyright in the entire work of authorship, any other author of the Contribution should also sign this \u2013 in any event, please contact Us at noah.homa@gmail.com
"},{"location":"cla/#1-definitions","title":"1. Definitions","text":"\"You\" means the individual Copyright owner who Submits a Contribution to Us.
\"Contribution\" means any original work of authorship, including any original modifications or additions to an existing work of authorship, Submitted by You to Us, in which You own the Copyright.
\"Copyright\" means all rights protecting works of authorship, including copyright, moral and neighboring rights, as appropriate, for the full term of their existence.
\"Material\" means the software or documentation made available by Us to third parties. When this Agreement covers more than one software project, the Material means the software or documentation to which the Contribution was Submitted. After You Submit the Contribution, it may be included in the Material.
\"Submit\" means any act by which a Contribution is transferred to Us by You by means of tangible or intangible media, including but not limited to electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Us, but excluding any transfer that is conspicuously marked or otherwise designated in writing by You as \"Not a Contribution.\"
\"Documentation\" means any non-software portion of a Contribution.
"},{"location":"cla/#2-license-grant","title":"2. License grant","text":""},{"location":"cla/#21-copyright-license-to-us","title":"2.1 Copyright license to Us","text":"Subject to the terms and conditions of this Agreement, You hereby grant to Us a worldwide, royalty-free, Exclusive, perpetual and irrevocable (except as stated in Section 8.2) license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, under the Copyright covering the Contribution to use the Contribution by all means, including, but not limited to:
Moral Rights remain unaffected to the extent they are recognized and not waivable by applicable law. Notwithstanding, You may add your name to the attribution mechanism customarily used in the Materials you Contribute to, such as the header of the source code files of Your Contribution, and We will respect this attribution when using Your Contribution.
"},{"location":"cla/#23-copyright-license-back-to-you","title":"2.3 Copyright license back to You","text":"Upon such grant of rights to Us, We immediately grant to You a worldwide, royalty-free, non-exclusive, perpetual and irrevocable license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, under the Copyright covering the Contribution to use the Contribution by all means, including, but not limited to:
This license back is limited to the Contribution and does not provide any rights to the Material.
"},{"location":"cla/#3-patents","title":"3. Patents","text":""},{"location":"cla/#31-patent-license","title":"3.1 Patent license","text":"Subject to the terms and conditions of this Agreement You hereby grant to Us and to recipients of Materials distributed by Us a worldwide, royalty-free, non-exclusive, perpetual and irrevocable (except as stated in Section 3.2) patent license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, to make, have made, use, sell, offer for sale, import and otherwise transfer the Contribution and the Contribution in combination with any Material (and portions of such combination). This license applies to all patents owned or controlled by You, whether already acquired or hereafter acquired, that would be infringed by making, having made, using, selling, offering for sale, importing or otherwise transferring of Your Contribution(s) alone or by combination of Your Contribution(s) with any Material.
"},{"location":"cla/#32-revocation-of-patent-license","title":"3.2 Revocation of patent license","text":"You reserve the right to revoke the patent license stated in section 3.1 if We make any infringement claim that is targeted at your Contribution and not asserted for a Defensive Purpose. An assertion of claims of the Patents shall be considered for a \"Defensive Purpose\" if the claims are asserted against an entity that has filed, maintained, threatened, or voluntarily participated in a patent infringement lawsuit against Us or any of Our licensees.
"},{"location":"cla/#4-license-obligations-by-us","title":"4. License obligations by Us","text":"We agree to license the Contribution only under the terms of the license or licenses that We are using on the Submission Date for the Material (including any rights to adopt any future version of a license).
In addition, We may use the following licenses for Documentation in the Contribution: CC-BY-4.0, CC-BY-ND-4.0, CC-BY-NC-4.0, CC-BY-NC-ND-4.0, CC-BY-NC-SA-4.0, CC-BY-SA-4.0, CC0-1.0, MIT License, Apache License, GNU General Public License (GPL) v2.0, GNU General Public License (GPL) v3.0, GNU Affero General Public License v3.0, GNU Lesser General Public License (LGPL) v2.1, GNU Lesser General Public License (LGPL) v3.0, Mozilla Public License 2.0, Eclipse Public License 2.0, Microsoft Public License (Ms-PL), Microsoft Reciprocal License (Ms-RL), BSD 2-Clause \"Simplified\" or \"FreeBSD\" license, BSD 3-Clause \"New\" or \"Revised\" license (including any right to adopt any future version of a license).
We agree to license patents owned or controlled by You only to the extent necessary to (sub)license Your Contribution(s) and the combination of Your Contribution(s) with the Material under the terms of the license or licenses that We are using on the Submission Date.
"},{"location":"cla/#5-disclaimer","title":"5. Disclaimer","text":"THE CONTRIBUTION IS PROVIDED \"AS IS\". MORE PARTICULARLY, ALL EXPRESS OR IMPLIED WARRANTIES INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT ARE EXPRESSLY DISCLAIMED BY YOU TO US AND BY US TO YOU. TO THE EXTENT THAT ANY SUCH WARRANTIES CANNOT BE DISCLAIMED, SUCH WARRANTY IS LIMITED IN DURATION AND EXTENT TO THE MINIMUM PERIOD AND EXTENT PERMITTED BY LAW.
"},{"location":"cla/#6-consequential-damage-waiver","title":"6. Consequential damage waiver","text":"TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT WILL YOU OR WE BE LIABLE FOR ANY LOSS OF PROFITS, LOSS OF ANTICIPATED SAVINGS, LOSS OF DATA, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL AND EXEMPLARY DAMAGES ARISING OUT OF THIS AGREEMENT REGARDLESS OF THE LEGAL OR EQUITABLE THEORY (CONTRACT, TORT OR OTHERWISE) UPON WHICH THE CLAIM IS BASED.
"},{"location":"cla/#7-approximation-of-disclaimer-and-damage-waiver","title":"7. Approximation of disclaimer and damage waiver","text":"IF THE DISCLAIMER AND DAMAGE WAIVER MENTIONED IN SECTION 5. AND SECTION 6. CANNOT BE GIVEN LEGAL EFFECT UNDER APPLICABLE LOCAL LAW, REVIEWING COURTS SHALL APPLY LOCAL LAW THAT MOST CLOSELY APPROXIMATES AN ABSOLUTE WAIVER OF ALL CIVIL OR CONTRACTUAL LIABILITY IN CONNECTION WITH THE CONTRIBUTION.
"},{"location":"cla/#8-term","title":"8. Term","text":"8.1 This Agreement shall come into effect upon Your acceptance of the terms and conditions.
8.2 This Agreement shall apply for the term of the copyright and patents licensed here. However, You shall have the right to terminate the Agreement if We do not fulfill the obligations as set forth in Section 4. Such termination must be made in writing.
8.3 In the event of a termination of this Agreement Sections 5, 6, 7, 8 and 9 shall survive such termination and shall remain in full force thereafter. For the avoidance of doubt, Free and Open Source Software (sub)licenses that have already been granted for Contributions at the date of the termination shall remain in full force after the termination of this Agreement.
"},{"location":"cla/#9-miscellaneous","title":"9. Miscellaneous","text":"9.1 This Agreement and all disputes, claims, actions, suits or other proceedings arising out of this agreement or relating in any way to it shall be governed by the laws of Germany excluding its private international law provisions.
9.2 This Agreement sets out the entire agreement between You and Us for Your Contributions to Us and overrides all other agreements or understandings.
9.3 In case of Your death, this agreement shall continue with Your heirs. In case of more than one heir, all heirs must exercise their rights through a commonly authorized person.
9.4 If any provision of this Agreement is found void and unenforceable, such provision will be replaced to the extent possible with a provision that comes closest to the meaning of the original provision and that is enforceable. The terms and conditions set forth in this Agreement shall apply notwithstanding any failure of essential purpose of this Agreement or any limited remedy to the maximum extent possible under law.
9.5 You agree to notify Us of any facts or circumstances of which you become aware that would make this Agreement inaccurate in any respect.
"},{"location":"contribute/","title":"Contribute","text":"Put out project that people could contribute to and provide instructions for contributing
"},{"location":"docs/","title":"","text":"PriorLabs is building breakthrough foundation models that understand spreadsheets and databases. While foundation models have transformed text and images, tabular data has remained largely untouched. We're tackling this opportunity with technology that could revolutionize how we approach scientific discovery, medical research, financial modeling, and business intelligence.
"},{"location":"docs/#why-tabpfn","title":"Why TabPFN","text":"Rapid Training
TabPFN significantly reduces training time: in just 2.8 seconds it outperforms an ensemble of the strongest baselines that was tuned for 4 hours.
Superior Accuracy
TabPFN consistently outperforms state-of-the-art methods like gradient-boosted decision trees (GBDTs) on datasets with up to 10,000 samples. It achieves higher accuracy and better performance metrics across a range of classification and regression tasks.
Robustness
The model demonstrates robustness to various dataset characteristics, including uninformative features, outliers, and missing values, maintaining high performance where other methods struggle.
Generative Capabilities
As a generative transformer-based model, TabPFN can be fine-tuned for specific tasks, generate synthetic data, estimate densities, and learn reusable embeddings. This makes it versatile for various applications beyond standard prediction tasks.
Sklearn Interface
TabPFN follows the interfaces provided by scikit-learn, making it easy to integrate into existing workflows and utilize familiar functions for fitting, predicting, and evaluating models.
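Because the estimators mirror scikit-learn, a drop-in usage sketch looks like the following (a minimal sketch assuming the `tabpfn` package is installed and exposes `TabPFNClassifier` as documented; the dataset and split are illustrative):

```python
# Minimal sketch of the scikit-learn-style workflow with TabPFN.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNClassifier  # assumes `pip install tabpfn`

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = TabPFNClassifier()       # no hyperparameter tuning required
clf.fit(X_train, y_train)      # in-context learning: no gradient training here
pred = clf.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
```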
Minimal Preprocessing
The model handles various types of raw data, including missing values and categorical variables, with minimal preprocessing. This reduces the burden on users to perform extensive data preparation.
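As a sketch of what "minimal preprocessing" means in practice, the following passes data containing missing values straight to the model, with no imputation or scaling pipeline (assuming, as described above, that the estimator tolerates NaNs natively; the synthetic data is illustrative):

```python
# Sketch: raw data with ~10% missing values fed directly to TabPFN.
import numpy as np
from tabpfn import TabPFNClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[rng.random(X.shape) < 0.1] = np.nan            # inject missing values
y = (np.nan_to_num(X[:, 0]) + np.nan_to_num(X[:, 1]) > 0).astype(int)

clf = TabPFNClassifier()
clf.fit(X[:150], y[:150])                        # NaNs passed through as-is
print(clf.predict_proba(X[150:])[:3])            # per-class probabilities
```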
API Client
The fastest way to get started with TabPFN. Access our models through the cloud without requiring local GPU resources.
TabPFN Client
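A rough sketch of the cloud workflow (assuming the `tabpfn-client` package exposes an `init` helper for authentication and a scikit-learn-compatible `TabPFNClassifier`; check the client documentation for the exact entry points of your version):

```python
# Hypothetical cloud-backed usage: predictions are computed by the
# PriorLabs API, so no local GPU is needed.
from sklearn.datasets import make_classification
from tabpfn_client import init, TabPFNClassifier  # assumes `pip install tabpfn-client`

init()  # one-time interactive login to obtain an access token

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
clf = TabPFNClassifier()
clf.fit(X[:100], y[:100])        # training data is uploaded to the API here
print(clf.predict(X[100:]))      # inference runs in the cloud
```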
User Interface
Visual interface for no-code interaction with TabPFN. Perfect for quick experimentation and visualization.
Access GUI
Python Package
Local installation for research and privacy-sensitive use cases, with GPU support and a scikit-learn compatible interface.
TabPFN Local
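For local use, here is a sketch of GPU selection (assuming `TabPFNClassifier` accepts a `device` argument, as the package documentation indicates; it falls back to CPU when CUDA is unavailable):

```python
# Sketch: local inference with optional GPU acceleration.
import torch
from sklearn.datasets import make_classification
from tabpfn import TabPFNClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

device = "cuda" if torch.cuda.is_available() else "cpu"
clf = TabPFNClassifier(device=device)  # weights and inference live on `device`
clf.fit(X[:200], y[:200])
print(clf.predict(X[200:])[:10])
```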
R Integration
Currently in development. Bringing TabPFN's capabilities to the R ecosystem for data scientists and researchers. Contact us for more information, or to get involved!
"},{"location":"enterprise/","title":"TabPFN Business","text":"
Unlock the hidden value in your company's databases and spreadsheets using TabPFN. Our state-of-the-art tabular foundation model is faster and more accurate than previous methods in 96% of use cases and requires 50% less data.
Save your data science team hours & days of work and enable them to focus on mission-critical business problems, even when data availability is limited.
"},{"location":"enterprise/#why-tabpfn-business","title":"Why TabPFN Business?","text":""},{"location":"enterprise/#access-to-enterprise-grade-features","title":"Access to Enterprise-Grade Features","text":"Please select all the ways you would like to hear from PriorLabs:
EmailYou can unsubscribe at any time by clicking the link in the footer of our emails.
We use Mailchimp as our marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices.
"},{"location":"newsletter/","title":"Stay Updated with TabPFN","text":"Join our newsletter to get the latest updates on TabPFN's development, best practices, and breakthrough research in tabular machine learning.
"},{"location":"newsletter/#what-youll-get","title":"What You'll Get","text":"Please select all the ways you would like to hear from PriorLabs:
Email: You can unsubscribe at any time by clicking the link in the footer of our emails.
We use Mailchimp as our marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices.
"},{"location":"privacy_policy/","title":"Privacy policy","text":"PRIVACY POLICY\nLast updated: January 8th, 2025\n1. General information\nPrior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau (hereinafter \u201cPriorLabs\u201d, \u201cwe\u201d or \u201cus\u201d) takes the protection of personal data very seriously. \nWe treat personal data confidentially and always in accordance with the applicable data protection laws, in particular Regulation (EU) 2016/679 (hereinafter \u201cGeneral Data Protection Regulation\u201d or \u201cGDPR\u201d), the German Federal Data Protection Act (hereinafter \u201cBDSG\u201d), and in accordance with the provisions of this privacy policy.\nThe aim of this privacy policy is to inform you (hereinafter \u201cdata subject\u201d or \u201cyou\u201d) in accordance with Art. 12 et seq. GDPR about how we process your personal data and for what purposes we process your personal data when using our website https://priorlabs.ai/ (hereinafter \u201cWebsite\u201d), our services or contacting us.\nUnless otherwise stated in this privacy policy, the terms used here have the meaning as defined in the GDPR.\n2. Data controller\nPriorLabs acts as a controller within the meaning of the GDPR in relation to your personal data processed in connection with the use of our Website, Service or a contact made to or by PriorLabs. \nIf you have any questions about this privacy policy or the processing of your personal data, you can contact us at the following contact details:\nPrior Labs GmbH\nElisabeth-Emter-Weg 18\n79110 Freiburg im Breisgau\nE-mail: dataprotection@priorlabs.ai\n\nCategories, purposes and legal bases of the personal data processed\nWe process different categories of your personal data for different purposes. Below you can see which data we process in which contexts, for which purposes and on which legal basis we base the respective processing.\n2.1. Visiting our Website\nWhen visiting our Website for informational purposes, i.e., mere viewing and without you providing us with any other information, certain personal data is automatically collected each time the Website are called up and stored in so-called server log files. These are:\n\u2022 Browser type and version. The specific type and model of Internet browser you are using, such as Google Chrome, Mozilla Firefox, or Microsoft Edge, along with the specific version of the browser.\n\u2022 Operating system used. Your operating system for your digital activity, such as Windows, macOS, Linux, iOS, or Android.\n\u2022 Host name of the accessing computer. The unique name that your device has on the Internet or on a local network.\n\u2022 The date and time of access. The exact time of access to the Website. \n\u2022 IP address of the requesting computer. The unique numeric identifier assigned to a device when it connects to the Internet. \nSuch data is not merged with other data sources, and the data is not evaluated for marketing purposes. \nLegal basis:\nThe legal basis for the temporary storage and processing of such personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. Our legitimate interest here is to be able to provide you with technically functional, attractive and user-friendly Website as well as to ensure the security of our systems.\nDuration of the storage:\nSuch personal data will be deleted as soon as it is no longer required to achieve the purpose for which it was collected. For personal data stored in log files, this is the case after 7 days at the latest. 
\nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.2. Use of our Services\nWe provide you with a software to TabPFN foundation models in the context of the analysis, processing and evaluation of tabular business data (\u201cServices\u201d). Please note our Acceptable Use Policy which strictly prohibits the upload of personal data to use our Services. \nAlthough, you are not allowed to upload (tabular) personal data to have them analyzed, processed and evaluated, we are processing certain personal data when you are accessing our Services via our API.\n2.2.1. User account\nWhen you register your user account, we process the following personal data:\n\u2022 First and last name\n\u2022 E-mail address\n\u2022 Password\n\nLegal basis:\nWe process the aforementioned information to create your user account and, thus, such data will be processed for the performance of a contract or in order to take steps prior to entering into a contract in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nDuration of the storage:\nYou can delete your user account at any time by sending an e-mail with your request to dataprotection@priorlabs.ai. We will delete your user account when it has been inactive for 3 years.\n2.2.2. Usage data\nWhen you use our service, we process certain personal data about how you use it and the device you use to access it. We process the following usage data in the form of log files:\n\u2022 IP address of the requesting computer. The unique numeric identifier assigned to a device when it connects to the Internet. \n\u2022 Browser type and version. The specific type and model of Internet browser you are using, such as Google Chrome, Mozilla Firefox, or Microsoft Edge, along with the specific version of the browser.\n\u2022 Operating system used. Your operating system for your digital activity, such as Windows, macOS, Linux, iOS, or Android.\n\u2022 The date and time of access. The exact time of access to the Website. \n\u2022 Host name of the accessing computer. The unique name that your device has on the Internet or on a local network.\nThe processing of this data is used for the technical provision of our services and their contents, as well as to optimise their usability and ensure the security of our information technology systems.\nLegal basis:\nThe legal basis for the temporary storage and processing of such personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. Our legitimate interest here is the technical provision of our services and their contents, as well as to optimise their usability and ensure the security of our information technology systems to be able to provide you with technically functional, attractive and user-friendly Website as well as to ensure the security of our systems.\nDuration of the storage:\nSuch personal data will be deleted as soon as it is no longer required to achieve the purpose for which it was collected. For personal data stored in log files, this is the case after 7 days at the latest. \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.3. Contact\nIt is possible to contact us on our Website by e-mail. 
When you contact us, we collect and process certain information in connection with your specific request, such as, e.g., your name, e-mail address, and other data requested by us or data you voluntarily provide to us (hereinafter \u201cContact Data\u201d). \nLegal basis:\nIf you contact us as part of an existing contractual relationship or contact us in advance for information about our range of services, the Contact Data will be processed for the performance of a contract or in order to take steps prior to entering into a contract and to respond to your contact request in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nOtherwise, the legal basis for the processing of Contact Data is Art. 6 para. 1 sent. 1 lit. f GDPR. The Contact Data is processed to pursue our legitimate interests in responding appropriately to customer/contact inquiries.\nDuration of storage:\nWe will delete Contact Data as soon as the purpose for data storage and processing no longer applies (e.g., after your request has been processed). \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.4. Newsletter\nWith your consent, we may process your personal data to send you a newsletter via e-mail that contains information about our products and services. To send you the newsletter, we require processing your e-mail address, date and time of your registration, your IP address and browser type. \nOur newsletters contain so-called tracking links that enable us to analyze the behavior of newsletter recipients. We may collect personal data such as regarding the opening of the newsletter (date and time), selected links, and the following information of the accessing computer system: IP address used, browser type and version, device type and operating system (\u201cTracking Data\u201d). This enables us to statistically analyze the success or failure of online marketing campaigns.\nLegal basis:\nThe data processing activities with regard to the newsletter sending and newsletter tracking only take place if and insofar as you have expressly consented to it within the merits of Article 6 para. 1 sent. 1 lit. a GDPR. Your prior consent for such processing activities is obtained during the newsletter subscription process (double opt-in) by way of independent consent declaration referring to this privacy policy.\nYou can revoke your consent at any time with effect for the future by clicking on the unsubscribe link in e-mails. The withdrawal of your consent does not affect the lawfulness of processing based on your consent before its withdrawal. \nDuration of storage:\nWe will delete your personal data as soon as the purpose for data storage and processing no longer applies. Your e-mail address will be stored for as long as the subscription to our newsletter is active. \nHowever, in some cases, e.g., due to legal retention periods, we might be under the legal obligation to continue the storage of your personal data.\n2.5. Social media and professional networks and platforms\nWe utilize the possibility of company appearances on social and professional networks and platforms (LinkedIn, Github, X and Discord) in order to be able to communicate with you and to inform you about our services and news about us. \nYou can, inter alia, access the respective network or platform by clicking on the respective network icon displayed on our Website, which includes a hyperlink. 
A hyperlink activated by clicking on it opens the external destination in a new browser window of your browser. No personal data is transferred to the respective network before this activation.\n2.5.1. Visiting our page on social media and professional networks and platforms\nThe respective aforementioned network or platform is, in principle, solely responsible for the processing of personal data when you visit our company page on one of those networks or platforms. \nPlease do not contact us via one of the networks or platforms if you wish to avoid this. You use such networks and platforms and their functions on your own responsibility. \n2.5.2. Communication via social media and professional networks and platforms\nWe process information that you have made available to us via our company page on the respective network or platform, e.g., your (user) name, e-mail address, contact details, communication content, job title, company name, industry, education, contact options, photo, and other data you voluntarily provide to us. The (user) names of the registered network or platform users who have visited our company page on the networks or platforms may be visible to us. \nLegal basis:\nIf you contact us as part of an existing contractual relationship or contact us in advance for information about our range of services, the personal data will be processed for the performance of a contract or in order to take steps prior to entering into a contract and to respond to your contact request in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nOtherwise, the legal basis for the processing of the personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. The personal data is processed to pursue our legitimate interests in responding appropriately to customer/contact inquiries.\nDuration of storage:\nWe will delete your personal data as soon as the purpose for data storage and processing no longer applies (e.g., after your request has been processed). \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n3. Data receiver\nWe might transfer your personal data to certain data receivers if such transfer is necessary to fulfill our contractual and legal obligations.\nIn individual cases, we transfer personal data to our consultants in legal or tax matters, whereby these recipients act independently in their own data protection responsibilities and are also obliged to comply with the requirements of the GDPR and other applicable data protection regulations. In addition, they are bound by special confidentiality and secrecy obligations due to their professional position. \nIn the event of corporate transactions (e.g., sale of our business or a part of it), we may transfer personal data to involved advisors or to potential buyers.\nAdditionally, we also use services provided by various specialized companies, e.g., IT service providers, that process data on our behalf (hereinafter \u201cData Processors\u201d). We have concluded a data processing agreement according to Art. 28 GDPR or EU standard contractual clauses of the EU Commission pursuant to Art. 46 para. 2 lit. c GDPR with each service provider and they only process data in accordance with our instructions and not for their own purposes. 
\nOur current Data Processors are:\nData Processor Purpose of commissioning the Data Processor / purpose of processing\nOpenAI Processing text inputs to our model API\nMailchimp Newsletter Signup\nGoogle Analytics Usage analytics\n4. Data transfers to third countries\nYour personal data is generally processed in Germany and other countries within the European Economic Area (EEA).\nHowever, it may also be necessary for personal data to be transferred to recipients located outside the EEA, i.e., to third countries, such as the USA. If possible, we conclude the currently applicable EU standard contractual clauses of the EU Commission pursuant to Art. 46 para. 2 lit. c GDPR with all processors located outside the EEA. Otherwise, we ensure that a transfer only takes place if an adequacy decision exists with the respective third country and the recipient is certified under this, if necessary. We will provide you with respective documentation on request.\n5. Your rights\nThe following rights are available to you as a data subject in accordance with the provisions of the GDPR:\n5.1. Right of revocation\nYou may revoke your consent to the processing of your personal data at any time pursuant to Art. 7 para. 3 GDPR. Please note, that the revocation is only effective for the future. Processing that took place before the revocation remains unaffected. \n5.2. Right of access\nUnder the conditions of Art. 15 GDPR you have the right to request confirmation from us at any time as to whether we are processing personal data relating to you. If this is the case, you also have the right within the scope of Art. 15 GDPR to receive access to the personal data as well as certain other information about the personal data and a copy of your personal data. The restrictions of \u00a7 34 BDSG apply.\n5.3. Right to rectification\nUnder the conditions of Art. 16 GDPR you have the right to request us to correct the personal data stored about you if it is inaccurate or incomplete.\n5.4. Right to erasure\nYou have the right, under the conditions of Art. 17 GDPR, to demand that we delete the personal data concerning you without delay. \n5.5. Right to restrict processing\nYou have the right to request that we restrict the processing of your personal data under the conditions of Art. 18 GDPR.\n5.6. Right to data portability\nYou have the right, under the conditions of Art. 20 GDPR, to request that we hand over, in a structured, common and machine-readable format, the personal data concerning you that you have provided to us. Please note that this right only applies where the processing is based on your consent, or a contract and the processing is carried out by automated means.\n5.7. Right to object\nYou have the right to object to the processing of your personal data under the conditions of Art. 21 GDPR.\n5.8. Right to complain to a supervisory authority\nSubject to the requirements of Art. 77 GDPR, you have the right to file a complaint with a competent supervisory authority. As a rule, the data subject may contact the supervisory authority of his or her habitual residence or place of work or place of the alleged infringement or the registered office of PriorLabs. The supervisory authority responsible for PriorLabs is the State Commissioner for Data Protection and Freedom of Information for Baden-W\u00fcrttemberg. A list of all German supervisory authorities and their contact details can be found here.\n6. 
Obligation to provide data\nWhen you visit our Website, you may be required to provide us with certain personal data as described in this privacy policy. Beyond that, you are under no obligation to provide us with personal data. However, if you do not provide us with your personal data as required, you may not be able to contact us and/or we may not be able to contact you to respond to your inquiries or questions.\n7. Automated decisions/profiling\nThe processing of your personal data carried out by us does not contain any automated decisions in individual cases within the meaning of Art. 22 para. 1 GDPR.\n8. Changes to this privacy policy\nWe review this privacy policy regularly and may update it at any time. If we make changes to this privacy policy, we will change the date of the last update above. Please review this privacy policy regularly to be aware of any updates. The current version of this privacy policy can be accessed at any time at Priorlabs.ai/privacy.\n
"},{"location":"tabpfn-license/","title":"TabPFN License","text":" Prior Labs License\n Version 1.0, January 2025\n http://priorlabs.ai/tabpfn-license\n\n This license is a derivative of the Apache 2.0 license\n (http://www.apache.org/licenses/) with a single modification:\n The added Paragraph 10 introduces an enhanced attribution requirement\n inspired by the Llama 3 license.\n\n TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n 1. Definitions.\n\n \"License\" shall mean the terms and conditions for use, reproduction,\n and distribution as defined by Sections 1 through 9 of this document.\n\n \"Licensor\" shall mean the copyright owner or entity authorized by\n the copyright owner that is granting the License.\n\n \"Legal Entity\" shall mean the union of the acting entity and all\n other entities that control, are controlled by, or are under common\n control with that entity. For the purposes of this definition,\n \"control\" means (i) the power, direct or indirect, to cause the\n direction or management of such entity, whether by contract or\n otherwise, or (ii) ownership of fifty percent (50%) or more of the\n outstanding shares, or (iii) beneficial ownership of such entity.\n\n \"You\" (or \"Your\") shall mean an individual or Legal Entity\n exercising permissions granted by this License.\n\n \"Source\" form shall mean the preferred form for making modifications,\n including but not limited to software source code, documentation\n source, and configuration files.\n\n \"Object\" form shall mean any form resulting from mechanical\n transformation or translation of a Source form, including but\n not limited to compiled object code, generated documentation,\n and conversions to other media types.\n\n \"Work\" shall mean the work of authorship, whether in Source or\n Object form, made available under the License, as indicated by a\n copyright notice that is included in or attached to the work\n (an example is provided in the Appendix below).\n\n \"Derivative Works\" shall mean any work, whether in Source or Object\n form, that is based on (or derived from) the Work and for which the\n editorial revisions, annotations, elaborations, or other modifications\n represent, as a whole, an original work of authorship. For the purposes\n of this License, Derivative Works shall not include works that remain\n separable from, or merely link (or bind by name) to the interfaces of,\n the Work and Derivative Works thereof.\n\n \"Contribution\" shall mean any work of authorship, including\n the original version of the Work and any modifications or additions\n to that Work or Derivative Works thereof, that is intentionally\n submitted to Licensor for inclusion in the Work by the copyright owner\n or by an individual or Legal Entity authorized to submit on behalf of\n the copyright owner. 
For the purposes of this definition, \"submitted\"\n means any form of electronic, verbal, or written communication sent\n to the Licensor or its representatives, including but not limited to\n communication on electronic mailing lists, source code control systems,\n and issue tracking systems that are managed by, or on behalf of, the\n Licensor for the purpose of discussing and improving the Work, but\n excluding communication that is conspicuously marked or otherwise\n designated in writing by the copyright owner as \"Not a Contribution.\"\n\n \"Contributor\" shall mean Licensor and any individual or Legal Entity\n on behalf of whom a Contribution has been received by Licensor and\n subsequently incorporated within the Work.\n\n 2. Grant of Copyright License. Subject to the terms and conditions of\n this License, each Contributor hereby grants to You a perpetual,\n worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n copyright license to reproduce, prepare Derivative Works of,\n publicly display, publicly perform, sublicense, and distribute the\n Work and such Derivative Works in Source or Object form.\n\n 3. Grant of Patent License. Subject to the terms and conditions of\n this License, each Contributor hereby grants to You a perpetual,\n worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n (except as stated in this section) patent license to make, have made,\n use, offer to sell, sell, import, and otherwise transfer the Work,\n where such license applies only to those patent claims licensable\n by such Contributor that are necessarily infringed by their\n Contribution(s) alone or by combination of their Contribution(s)\n with the Work to which such Contribution(s) was submitted. If You\n institute patent litigation against any entity (including a\n cross-claim or counterclaim in a lawsuit) alleging that the Work\n or a Contribution incorporated within the Work constitutes direct\n or contributory patent infringement, then any patent licenses\n granted to You under this License for that Work shall terminate\n as of the date such litigation is filed.\n\n 4. Redistribution. You may reproduce and distribute copies of the\n Work or Derivative Works thereof in any medium, with or without\n modifications, and in Source or Object form, provided that You\n meet the following conditions:\n\n (a) You must give any other recipients of the Work or\n Derivative Works a copy of this License; and\n\n (b) You must cause any modified files to carry prominent notices\n stating that You changed the files; and\n\n (c) You must retain, in the Source form of any Derivative Works\n that You distribute, all copyright, patent, trademark, and\n attribution notices from the Source form of the Work,\n excluding those notices that do not pertain to any part of\n the Derivative Works; and\n\n (d) If the Work includes a \"NOTICE\" text file as part of its\n distribution, then any Derivative Works that You distribute must\n include a readable copy of the attribution notices contained\n within such NOTICE file, excluding those notices that do not\n pertain to any part of the Derivative Works, in at least one\n of the following places: within a NOTICE text file distributed\n as part of the Derivative Works; within the Source form or\n documentation, if provided along with the Derivative Works; or,\n within a display generated by the Derivative Works, if and\n wherever such third-party notices normally appear. 
The contents\n of the NOTICE file are for informational purposes only and\n do not modify the License. You may add Your own attribution\n notices within Derivative Works that You distribute, alongside\n or as an addendum to the NOTICE text from the Work, provided\n that such additional attribution notices cannot be construed\n as modifying the License.\n\n You may add Your own copyright statement to Your modifications and\n may provide additional or different license terms and conditions\n for use, reproduction, or distribution of Your modifications, or\n for any such Derivative Works as a whole, provided Your use,\n reproduction, and distribution of the Work otherwise complies with\n the conditions stated in this License.\n\n 5. Submission of Contributions. Unless You explicitly state otherwise,\n any Contribution intentionally submitted for inclusion in the Work\n by You to the Licensor shall be under the terms and conditions of\n this License, without any additional terms or conditions.\n Notwithstanding the above, nothing herein shall supersede or modify\n the terms of any separate license agreement you may have executed\n with Licensor regarding such Contributions.\n\n 6. Trademarks. This License does not grant permission to use the trade\n names, trademarks, service marks, or product names of the Licensor,\n except as required for reasonable and customary use in describing the\n origin of the Work and reproducing the content of the NOTICE file.\n\n 7. Disclaimer of Warranty. Unless required by applicable law or\n agreed to in writing, Licensor provides the Work (and each\n Contributor provides its Contributions) on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n implied, including, without limitation, any warranties or conditions\n of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n PARTICULAR PURPOSE. You are solely responsible for determining the\n appropriateness of using or redistributing the Work and assume any\n risks associated with Your exercise of permissions under this License.\n\n 8. Limitation of Liability. In no event and under no legal theory,\n whether in tort (including negligence), contract, or otherwise,\n unless required by applicable law (such as deliberate and grossly\n negligent acts) or agreed to in writing, shall any Contributor be\n liable to You for damages, including any direct, indirect, special,\n incidental, or consequential damages of any character arising as a\n result of this License or out of the use or inability to use the\n Work (including but not limited to damages for loss of goodwill,\n work stoppage, computer failure or malfunction, or any and all\n other commercial damages or losses), even if such Contributor\n has been advised of the possibility of such damages.\n\n 9. Accepting Warranty or Additional Liability. While redistributing\n the Work or Derivative Works thereof, You may choose to offer,\n and charge a fee for, acceptance of support, warranty, indemnity,\n or other liability obligations and/or rights consistent with this\n License. 
However, in accepting such obligations, You may act only\n on Your own behalf and on Your sole responsibility, not on behalf\n of any other Contributor, and only if You agree to indemnify,\n defend, and hold each Contributor harmless for any liability\n incurred by, or claims asserted against, such Contributor by reason\n of your accepting any such warranty or additional liability.\n\n ---------------------- ADDITIONAL PROVISION --------------------------\n\n 10. Additional attribution.\n If You distribute or make available the Work or any Derivative\n Work thereof relating to any part of the source or model weights,\n or a product or service (including another AI model) that contains\n any source or model weights, You shall (A) provide a copy of this\n License with any such materials; and (B) prominently display\n \u201cBuilt with TabPFN\u201d on each related website, user interface, blogpost,\n about page, or product documentation. If You use the source or model\n weights or model outputs to create, train, fine tune, distil, or\n otherwise improve an AI model, which is distributed or made available,\n you shall also include \u201cTabPFN\u201d at the beginning of any such AI model name.\n To clarify, internal benchmarking and testing without external\n communication shall not qualify as distribution or making available\n pursuant to this Section 10 and no attribution under this Section 10\n shall be required.\n\n\n END OF TERMS AND CONDITIONS\n
"},{"location":"tabpfn-nature/","title":"Accurate predictions on small data with a tabular foundation model","text":"This page contains links to download, install, and set up TabPFN, as well as tutorials and resources to help you get started.
API Client
The fastest way to get started with TabPFN. Access our models through the cloud without requiring local GPU resources.
TabPFN Client
User Interface
Visual interface for no-code interaction with TabPFN. Perfect for quick experimentation and visualization.
Access GUI
Python Package
Local installation for research and privacy-sensitive use cases, with GPU support and a scikit-learn compatible interface.
TabPFN Local
R Integration
Currently in development. Bringing TabPFN's capabilities to the R ecosystem for data scientists and researchers. Contact us for more information, or to get involved!
GENERAL TERMS AND CONDITIONS
1. Scope of Application
1.1. These general terms and conditions ("GTC") govern the provision of access to the TabPFN foundation models as available at https://www.priorlabs.ai ("Services") provided by Prior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau ("PriorLabs").
1.2. The Services of PriorLabs are directed exclusively at business customers (Unternehmer) within the meaning of Sec. 14 German Civil Code (Bürgerliches Gesetzbuch, BGB) ("Customer"). PriorLabs may require the Customer to provide sufficient proof of its status as a business customer prior to the conclusion of the contract.
1.3. Conflicting or additional contractual conditions of the Customer shall only apply if PriorLabs expressly confirms them in writing.
2. Conclusion of Contract
2.1. The contract is concluded with the inclusion of these GTC ("Contract") at the earliest when the Customer registers and sets up an account via the Services ("PriorLabs Account").
2.2. Upon conclusion of the Contract, the Customer shall provide PriorLabs with all information that PriorLabs reasonably requires in order to provide the Services correctly and completely. The Customer is obliged to inform PriorLabs immediately of any relevant changes.
3. Registration and PriorLabs Account
3.1. In order to fully use the Services, the registration and setting up of a PriorLabs Account is required. By registering or using a PriorLabs Account, the Customer agrees and represents that they created their PriorLabs Account, and they will use their PriorLabs Account only for themselves. Each Customer shall register only one PriorLabs Account. A PriorLabs Account is not transferable.
3.2. If and to the extent PriorLabs stores Customer's data, PriorLabs disclaims any liability for the storage, accessibility, or integrity of such data.
3.3. The Customer is obliged (i) to provide complete and correct information about its person or entity at the time of registration and (ii) in case of respective changes, to correct this information without undue delay insofar as such information is mandatory for the performance of the Contract.
3.4. If PriorLabs receives a notice or otherwise has reason to believe that the information or documents provided by the Customer are wholly or partially incorrect, incomplete or not up to date, PriorLabs is entitled to request the Customer to remedy the situation immediately. If the Customer fails to correct or complete the information or documents within the set deadline, PriorLabs is entitled to restrict access to the Services and block the Customer until the Customer has fully complied with the request.
3.5. The Customer must keep their log-in information secret and carefully secure access to their PriorLabs Account. The Customer shall take reasonable precautions to prevent unauthorized access to the PriorLabs Account and to protect the Services from unauthorized use. The Customer is obliged to inform PriorLabs immediately if there are indications that a PriorLabs Account has been misused by a third party. The Customer's liability for any activity of or interaction with a corrupted account is subject to statutory rules.
4. Contract Software
4.1. PriorLabs has developed the TabPFN foundation models that allow the analysis, processing and evaluation of tabular data ("Contract Software").
4.2. PriorLabs may, to the extent available, provide the Customer with customer documentation for the Contract Software in digital form (e.g., as a PDF file).
4.3. PriorLabs provides the Contract Software "as is" with the functionality, scope and performance and in a condition suitable for the contractual use. PriorLabs disclaims any liability for the availability, accuracy, or correctness of the use of the Contract Software and does not warrant its integration into the Customer's IT systems.
4.4. The functionality, scope and performance of the Contract Software may change during the Contract Term (as defined below). PriorLabs reserves the right to add, remove, change or substitute elements of the Contract Software as deemed necessary at any time, in particular for the purpose of increasing efficiency, improvements, additional features, and/or safety, or due to changes in the legal situation, technical developments or for reasons of IT security, or to cease providing the Services altogether.
5. PriorLabs Intellectual Property
5.1. PriorLabs remains the sole owner of all right, title, and interest in the Contract Software, including but not limited to any models, algorithms, and neural networks. To the extent PriorLabs provides any Services or access to the Contract Software free of charge, PriorLabs does not waive any rights in such Services or the Contract Software.
5.2. Except as stated in these GTC, PriorLabs does not grant the Customer any rights to patents, copyrights, trade secrets, trademarks, or any other rights in respect of the Contract Software.
5.3. By using the Contract Software or any Services, the Customer does not acquire ownership of any rights in the Contract Software, Services, documentation, and/or any related intellectual property other than as stated in these GTC.
6. API Access
6.1. PriorLabs allows registered Customers, as and to the extent available from time to time, non-exclusive, non-transferable and non-sublicensable access to the Contract Software via an application programming interface ("API"), to be used exclusively as provided on the PriorLabs website or as described in the customer documentation for the API ("API Access").
6.2. The Customer's access to and use of the Services must at all times be in accordance with applicable laws and regulations. The Customer is solely responsible for knowing and complying with the applicable laws and regulations. Permitted conditions of use and scope of use of the Services are further set out in the Acceptable Use Policy available under https://www.priorlabs.ai/aup ("AUP"). The Customer acknowledges that the provisions set out in the AUP shall be deemed material obligations under this Contract.
7. Customer Content; Licenses
7.1. The Customer must own or hold valid rights of sufficient scope to any material, documents, data or other content uploaded into the Services and to be processed by the Contract Software ("Customer Content"). The Customer Content consists exclusively of non-personal data within the meaning of the General Data Protection Regulation ("GDPR"), as set out in the AUP.
7.2. PriorLabs shall take appropriate physical, technical, and organizational security measures with regard to the Contract Software and any Customer Content.
7.3. The Customer grants PriorLabs the non-exclusive, worldwide, sublicensable right (i) to use Customer Content for the performance of PriorLabs' obligations under this Contract and, in particular, to reproduce such data on the server under PriorLabs' name itself or through a subcontractor for the purpose of providing the Service, and (ii) to use Customer Content as so-called training data in order to develop, test, and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.
7.4. The Customer is fully responsible for all Customer Content uploaded to the Services; in particular, the Customer ensures that Customer Content is fit for PriorLabs' use in accordance with this Contract (including any necessary licenses pursuant to Section 7.3) and does not violate any applicable law or other rights of third parties, in particular copyright, trade secrets, or rights under the GDPR.
8. Service Results
8.1. The Contract Software may be used to generate certain analyses, content, documents, reports, or other results ("Service Results") based on Customer Content.
8.2. The Customer may freely use the Service Results. PriorLabs provides the Service Results "as is". The Customer is responsible for reviewing any Service Results of its use of the Contract Software. PriorLabs does not warrant the accuracy, correctness, completeness, usability, or fitness for a certain purpose of the Service Results and does not assume any liability for the Customer's use of Service Results. In particular, PriorLabs disclaims all warranty if the Customer modifies, adapts or combines Service Results with third-party material or products.
8.3. PriorLabs may use the Service Results to develop, test and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.
9. Obligations of the Customer
9.1. The Customer shall create their own backup copies of Customer Content in case of loss of data. PriorLabs provides a corresponding function for creating backup copies.
9.2. The Customer shall inform PriorLabs without undue delay as soon as they become aware of the infringement of an intellectual property right or copyright in the Contract Software.
9.3. The Customer shall ensure that all of its employees authorized to use the Contract Software (i) have received sufficient training on the safe use of the Contract Software, (ii) exercise the necessary care when using it, and (iii) comply with these GTC, including the AUP.
9.4. The Customer shall subject any end-users of the Contract Software and the Services to obligations reflecting the stipulations of this Contract, in particular the AUP.
10. Blocking of Accesses
10.1. PriorLabs is entitled to block access to the Contract Software and the Services temporarily or permanently if there are reliable indications that the Customer or, where applicable, one of its employees is violating or has violated material obligations under these GTC, including the Acceptable Use Policy, and/or applicable intellectual property, data protection or other statutory laws, or if PriorLabs has another justified interest in the blocking, such as IT-security concerns.
10.2. When deciding on a blocking, PriorLabs shall give due consideration to the legitimate interests of the Customer. PriorLabs shall inform the Customer of the blocking within a reasonable timeframe before the blocking comes into effect, provided that the information does not conflict with the purpose of the blocking. The blocking shall continue until the contractual or legal violation has been remedied in an appropriate manner.
11. Limitation of Liability
11.1. The Services are provided free of charge. Therefore, PriorLabs' liability is in any case limited to acts of intent or gross negligence.
11.2. The strict liability for damages for defects of the Services already existing at the beginning of the Contract Term (as defined below) in terms of Section 536a German Civil Code is excluded. The Services are provided on an "as is" basis, which, in accordance with Section 4 of these GTC, refers in particular to the marketability, availability, and security aspects of the Contract Software.
12. Indemnity
The Customer shall indemnify PriorLabs from any and all claims of end-users or third parties who assert claims against PriorLabs on account of the use of the Services by the Customer or the Customer's end-users, in particular concerning any Customer Content used in combination with the Contract Software. The provisions of this Section shall apply mutatis mutandis to any liquidated damages (Vertragsstrafen) as well as to any administrative fines (Bußgeld) or penalties imposed by the authorities or by the courts, to the extent that the Customer is responsible for such.
13. Term; Termination of the Contract
13.1. If not agreed otherwise, the Contract is concluded for an indefinite period of time until terminated by either Party ("Contract Term").
13.2. The Customer may terminate the Contract at any time by deleting its PriorLabs Account.
13.3. PriorLabs reserves the right to terminate the Contract at any time but will consider the Customer's legitimate interests to the extent possible, e.g., by sending the notice of termination in due time to the email address provided by the Customer upon registration of the PriorLabs Account.
13.4. The right of PriorLabs and the Customer to extraordinary termination without notice for cause shall remain unaffected.
14. Changes to this Contract
14.1. PriorLabs may change this Contract during the Contract Term in compliance with the following procedure, provided that the amendment is reasonable for the Customer, i.e., without significant legal or economic disadvantages, taking into account the interests of the Customer, and that there is a valid reason for the amendment. Such a reason exists, in particular, in cases of new technical developments or changes in the regulatory environment.
14.2. PriorLabs shall inform the Customer of any changes to this Contract at least 30 calendar days before the planned entry into force of the changes. The Customer may object to the changes within 30 calendar days from receipt of the notification. If no objection is made and the Customer continues to use the Services after expiry of the objection period, the changes shall be deemed to have been effectively agreed for all Services to be provided from the end of the objection period. In the notification, PriorLabs will inform the Customer of all relevant changes to the Contract, the objection period and the legal consequences of the expiry of the objection period without exercise of the right of objection. If the Customer objects to the changes, PriorLabs may terminate the Contract pursuant to Section 13.
15. Final Provisions
15.1. Should individual provisions of the Contract be or become invalid in whole or in part, this shall not affect the validity of the remaining provisions. Invalid provisions shall be replaced first and foremost by provisions that most closely correspond to the invalid provisions in a legally effective manner. The same applies to any loopholes.
15.2. The law of the Federal Republic of Germany shall apply, with the exception of its provisions on the choice of law which would lead to the application of another legal system. The validity of the CISG ("UN Sales Convention") is excluded.
15.3. For Customers who are merchants (Kaufleute) within the meaning of the German Commercial Code (Handelsgesetzbuch), a special fund (Sondervermögen) under public law or a legal entity under public law, the exclusive place of jurisdiction for all disputes arising from the contractual relationship shall be Berlin, Germany.

Status: January 2025
***
"},{"location":"terms/","title":"Terms","text":"GENERAL TERMS AND CONDITIONS\n1. Scope of Application\n1.1. These general terms and conditions (\"GTC\") govern the provision of access to the TabPFN foundation models as available at https://www.priorlabs.ai (\u201cServices\u201d) provided by Prior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau (\u201cPriorLabs\").\n1.2. The Services of PriorLabs are directed exclusively at business customers (Unternehmer) within the meaning of Sec. 14 German Civil Code (B\u00fcrgerliches Gesetzbuch, BGB) (\u201cCustomer\u201d). PriorLabs may require the Customer to provide sufficient proof of its status as business customer prior to the conclusion of the contract. \n1.3. Conflicting or additional contractual conditions of the Customer shall only apply if PriorLabs expressly confirms them in writing. \n2. Conclusion of Contract\n2.1. The contract is concluded with the inclusion of these GTC (\u201cContract\u201d) at the earliest of (i) when the Customer registers and sets up an account via the Services (\u201cPriorLabs Account\u201d). \n2.2. Upon conclusion of the Contract, the Customer shall provide PriorLabs with all information that PriorLabs reasonably requires in order to provide the Services correctly and completely. The Customer is obliged to inform PriorLabs immediately of any relevant changes. \n3. Registration and PriorLabs Account\n3.1. In order to fully use the Services, the registration and setting up of a PriorLabs Account is required. By registering or using a PriorLabs Account, [the Customer agrees and represents that they created their PriorLabs Account, and they will use their PriorLabs Account only for themselves. Each Customer shall register only one PriorLabs Account. A PriorLabs Account is not transferable.\n3.2. If and to the extent, PriorLabs stores Customer\u2019s data, PriorLabs disclaims any liability for the storage, accessibility, or integrity of such data.\n3.3. The Customer is obliged (i) to provide complete and correct information about its person or entity at the time of registration and (ii) in case of respective changes to correct without undue delay this information insofar such information is mandatory for the performance of the Contract. \n3.4. If PriorLabs receives a notice or otherwise has reason to believe that the information or documents provided by the Customer are wholly or partially incorrect, incomplete or not up to date, PriorLabs is entitled to request the Customer to remedy the situation immediately. If the Customer fails to correct or complete the information or document within the set deadline, PriorLabs is entitled to restrict access to the Services and block the Customer until the Customer has fully complied with the request.\n3.5. The Customer must keep their log-in information secret and carefully secure access to their PriorLabs Account. The Customer shall take reasonable precautions to prevent unauthorized access to the PriorLabs Account, and to protect the Services from unauthorized use. The Customer is obliged to inform PriorLabs immediately if there are indications that a PriorLabs Account has been misused by a third party. The Customer\u2019s liability for any activity of or interaction with a corrupted account is subject to statutory rules.\n4. Contract Software\n4.1. PriorLabs has developed the TabPFN foundation models that allows the analysis, processing and evaluation of tabular data (\u201cContract Software\u201d).\n4.2. 
PriorLabs may, to the extent available, provide the Customer with Customer documentation for the Contract Software in digital form (e.g. as a PDF file).
4.3. PriorLabs provides the Contract Software "as is" with the functionality, scope and performance and in a condition suitable for the contractual use. PriorLabs disclaims any liability for the availability, accuracy, or correctness of the use of the Contract Software and does not warrant its integration into the Customer's IT systems.
4.4. The functionality, scope and performance of the Contract Software may change during the Contract Term (as defined below). PriorLabs reserves the right to add, remove, change or substitute elements of the Contract Software as deemed necessary at any time, in particular for the purpose of increasing efficiency, improvements, additional features, and/or safety or due to changes in the legal situation, technical developments or for reasons of IT security, or to cease providing the Services altogether.
5. PriorLabs Intellectual Property
5.1. PriorLabs remains the sole owner of all right, title, and interest in the Contract Software, including but not limited to any models, algorithms, and neural networks. To the extent PriorLabs provides any Services or access to the Contract Software free of charge, PriorLabs does not waive any rights in such Services or the Contract Software.
5.2. Except as stated in these GTC, PriorLabs does not grant the Customer any rights to patents, copyrights, trade secrets, trademarks, or any other rights in respect of the Contract Software.
5.3. By using the Contract Software or using any Services, the Customer does not acquire ownership of any rights in the Contract Software, Services, documentation, and/or any related intellectual property other than as stated in these GTC.
6. API Access
6.1. PriorLabs allows registered Customers, as and to the extent available from time to time, access to the Contract Software via an application programming interface ("API") on a non-exclusive, non-transferable and non-sublicensable basis, to use it exclusively as provided on the PriorLabs website or as described in the Customer documentation for the API ("API Access").
6.2. The Customer's access to and use of the Services must at all times be in accordance with applicable laws and regulations. The Customer is solely responsible for knowing and complying with the applicable laws and regulations. Permitted conditions of use and scope of use of the Services are further set out in the Acceptable Use Policy available under https://www.priorlabs.ai/aup ("AUP"). The Customer acknowledges that the provisions set out in the AUP shall be deemed material obligations under this Contract.
7. Customer Content; Licenses
7.1. The Customer must own or hold valid rights of sufficient scope to any material, documents, data or other content uploaded into the Services and to be processed by the Contract Software ("Customer Content"). The Customer Content consists exclusively of non-personal data within the meaning of the General Data Protection Regulation ("GDPR"), as set out in the AUP.
7.2. PriorLabs shall take appropriate physical, technical, and organizational security measures with regard to the Contract Software and any Customer Content.
7.3.
The Customer grants PriorLabs the non-exclusive, worldwide, sublicensable right (i) to use Customer Content for the performance of PriorLabs' obligations under this Contract and, in particular, to reproduce such data on the server under PriorLabs' name itself or through a subcontractor for the purpose of providing the Service, and (ii) to use Customer Content as so-called training data in order to develop, test, and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.
7.4. The Customer is fully responsible for all Customer Content uploaded to the Services; in particular, the Customer ensures that Customer Content is fit for PriorLabs' use in accordance with this Contract (including any necessary licenses pursuant to Section 7.3) and does not violate any applicable law or other rights of third parties, in particular copyright, trade secrets, or rights under the GDPR.
8. Service Results
8.1. The Contract Software may be used to generate certain analyses, content, documents, reports, or other results ("Service Results") based on Customer Content.
8.2. The Customer may freely use the Service Results. PriorLabs provides the Service Results "as is". The Customer is responsible for reviewing any Service Results of its use of the Contract Software. PriorLabs does not warrant the accuracy, correctness, completeness, usability, or fitness for a certain purpose of the Service Results and does not assume any liability for the Customer's use of Service Results. In particular, PriorLabs disclaims all warranty if the Customer modifies, adapts or combines Service Results with third-party material or products.
8.3. PriorLabs may use the Service Results to develop, test and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.
9. Obligations of the Customer
9.1. The Customer shall create their own backup copies of Customer Data in case of loss of data. PriorLabs provides a corresponding function for creating backup copies.
9.2. The Customer shall inform PriorLabs without undue delay as soon as they become aware of the infringement of an intellectual property right or copyright in the Contract Software.
9.3. The Customer shall ensure that all of its employees authorized to use the Contract Software have (i) received sufficient training on the safe use of the Contract Software, (ii) exercise the necessary care when using it, and (iii) are compliant with these GTC including the AUP.
9.4. The Customer shall subject any end-users of the Contract Software and the Services to obligations reflecting the stipulations of this Contract, in particular the AUP.
10. Blocking of Accesses
10.1. PriorLabs is entitled to block access to the Contract Software and the Services temporarily or permanently if there are reliable indications that the Customer or, where applicable, one of its employees is violating or has violated material obligations under these GTC, including the Acceptable Use Policy, and/or applicable intellectual property, data protection or other statutory laws, or if PriorLabs has another justified interest in the blocking, such as IT-security concerns.
10.2. When deciding on a blocking, PriorLabs shall give due consideration to the legitimate interests of the Customer.
PriorLabs shall inform the Customer of the blocking within a reasonable timeframe before the blocking comes into effect, provided that the information does not conflict with the purpose of the blocking. The blocking shall continue until the contractual or legal violation has been remedied in an appropriate manner.
11. Limitation of Liability
11.1. The Services are provided free of charge. Therefore, PriorLabs' liability is in any case limited to acts of intent or gross negligence.
11.2. The strict liability for damages for defects of the Services already existing at the beginning of the Contract Term (as defined below) in terms of Section 536a German Civil Code is excluded. The Services are provided on an "as is" basis, which, in accordance with Section 4 of these GTC, refers in particular to the marketability, availability, and security aspects of the Contract Software.
12. Indemnity
The Customer shall indemnify PriorLabs from any and all claims of end-users or third parties who assert claims against PriorLabs on account of the use of the Services by the Customer or the Customer's end-users, in particular concerning any Customer Content used in combination with the Contract Software. The provisions of this Section shall apply mutatis mutandis to any liquidated damages (Vertragsstrafen) as well as to any administrative fines (Bußgeld) or penalties imposed by the authorities or by the courts, to the extent that the Customer is responsible for such.
13. Term; Termination of the Contract
13.1. If not agreed otherwise, the Contract is concluded for an indefinite period of time until terminated by either Party ("Contract Term").
13.2. The Customer may terminate the Contract at any time by deleting its PriorLabs Account.
13.3. PriorLabs reserves the right to terminate the Contract at any time but will consider the Customer's legitimate interests to the extent possible, e.g., by sending the notice of termination in due time to the email address provided by the Customer upon registration of the PriorLabs Account.
13.4. The right of PriorLabs and the Customer to extraordinary termination without notice for cause shall remain unaffected.
14. Changes to this Contract
14.1. PriorLabs may change this Contract during the Contract Term in compliance with the following procedure, provided that the amendment is reasonable for the Customer, i.e. without significant legal or economic disadvantages, taking into account the interests of the Customer, and that there is a valid reason for the amendment. Such a reason exists, in particular, in cases of new technical developments or changes in the regulatory environment.
14.2. PriorLabs shall inform the Customer of any changes to this Contract at least 30 calendar days before the planned entry into force of the changes. The Customer may object to the changes within 30 calendar days from receipt of the notification. If no objection is made and the Customer continues to use the Services after expiry of the objection period, the changes shall be deemed to have been effectively agreed for all Services to be provided from the end of the objection period. In the notification, PriorLabs will inform the Customer of all relevant changes to the Contract, the objection period and the legal consequences of the expiry of the objection period without exercise of the right of objection. If the Customer objects to the changes, PriorLabs may terminate the Contract pursuant to Section 13.
15. Final Provisions
15.1.
Should individual provisions of the Contract be or become invalid in whole or in part, this shall not affect the validity of the remaining provisions. Invalid provisions shall be replaced first and foremost by provisions that most closely correspond to the invalid provisions in a legally effective manner. The same applies to any contractual gaps.
15.2. The law of the Federal Republic of Germany shall apply, with the exception of its provisions on the choice of law which would lead to the application of another legal system. The validity of the CISG ("UN Sales Convention") is excluded.
15.3. For Customers who are merchants (Kaufleute) within the meaning of the German Commercial Code (Handelsgesetzbuch), a special fund (Sondervermögen) under public law or a legal entity under public law, Berlin, Germany, shall be the exclusive place of jurisdiction for all disputes arising from the contractual relationship.

Status: January 2025
***
"},{"location":"getting_started/api/","title":"TabPFN API Guide","text":""},{"location":"getting_started/api/#authentication","title":"Authentication","text":""},{"location":"getting_started/api/#interactive-login","title":"Interactive Login","text":"The first time you use TabPFN, you'll be guided through an interactive login process:
from tabpfn_client import init
init()
"},{"location":"getting_started/api/#managing-access-tokens","title":"Managing Access Tokens","text":"You can save your token for use on other machines:
import tabpfn_client

# Get your token
token = tabpfn_client.get_access_token()

# Use token on another machine
tabpfn_client.set_access_token(token)
"},{"location":"getting_started/api/#rate-limits","title":"Rate Limits","text":"Our API implements a fair usage system that resets daily at 00:00:00 UTC.
"},{"location":"getting_started/api/#usage-cost-calculation","title":"Usage Cost Calculation","text":"The cost for each API request is calculated as:
api_cost = (num_train_rows + num_test_rows) * num_cols * n_estimators
Where n_estimators
is by default 4 for classification tasks and 8 for regression tasks.
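For orientation, the cost can be estimated ahead of time. The helper below is our own illustration, not part of tabpfn_client; it only restates the formula and the defaults given above.

def estimate_api_cost(num_train_rows: int, num_test_rows: int, num_cols: int,
                      task: str = "classification") -> int:
    # Defaults stated above: n_estimators = 4 (classification), 8 (regression).
    n_estimators = 4 if task == "classification" else 8
    return (num_train_rows + num_test_rows) * num_cols * n_estimators

# 1,000 train rows + 200 test rows, 20 columns, classification:
print(estimate_api_cost(1_000, 200, 20))  # 96000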
Track your API usage through response headers:
X-RateLimit-Limit: Your total allowed usage
X-RateLimit-Remaining: Remaining usage
X-RateLimit-Reset: Reset timestamp (UTC)
"},{"location":"getting_started/api/#current-limitations","title":"Current Limitations","text":"Important Data Guidelines
Maximum total cells per request must be below 100,000:
(num_train_rows + num_test_rows) * num_cols < 100,000
For regression with full output turned on (return_full_output=True
), the number of test samples must be below 500.
These limits will be relaxed in future releases.
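A quick client-side pre-flight check can catch both constraints before a request is sent. This helper is our own sketch, not a function of tabpfn_client:

MAX_TOTAL_CELLS = 100_000
MAX_TEST_ROWS_FULL_OUTPUT = 500

def check_request_limits(num_train_rows: int, num_test_rows: int, num_cols: int,
                         *, return_full_output: bool = False) -> None:
    # Total cells must stay below 100,000.
    total_cells = (num_train_rows + num_test_rows) * num_cols
    if total_cells >= MAX_TOTAL_CELLS:
        raise ValueError(f"{total_cells} total cells; must be below {MAX_TOTAL_CELLS}")
    # Regression with return_full_output=True needs fewer than 500 test samples.
    if return_full_output and num_test_rows >= MAX_TEST_ROWS_FULL_OUTPUT:
        raise ValueError("return_full_output=True requires fewer than 500 test samples")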
"},{"location":"getting_started/api/#managing-user-data","title":"Managing User Data","text":"You can access and manage your personal information:
from tabpfn_client import UserDataClient
print(UserDataClient.get_data_summary())
"},{"location":"getting_started/api/#error-handling","title":"Error Handling","text":"The API uses standard HTTP status codes:
200: Success
400: Invalid request
429: Rate limit exceeded
Example response when the limit is reached:
{\n \"error\": \"API_LIMIT_REACHED\",\n \"message\": \"Usage limit exceeded\",\n \"next_available_at\": \"2024-01-07 00:00:00\"\n}\n
"},{"location":"getting_started/install/","title":"Installation","text":"You can access our models through our API (https://github.com/automl/tabpfn-client), via our user interface built on top of the API (https://www.ux.priorlabs.ai/) or locally.
Python API Client (No GPU, Online):
pip install tabpfn-client

# TabPFN Extensions installs optional functionalities around the TabPFN model
# These include post-hoc ensembles, interpretability tools, and more
git clone https://github.com/PriorLabs/tabpfn-extensions
pip install -e tabpfn-extensions
Python Local (GPU):
# Installs the TabPFN package for local use; a GPU is recommended for fitting
pip install tabpfn
Web Interface: You can access our models through our web interface at https://www.ux.priorlabs.ai/.
Warning
R support is currently under development. You can find a work in progress at TabPFN R. Looking for contributors!
"},{"location":"getting_started/intended_use/","title":"Usage tips","text":"Note
For a simple example of getting started with classification, see the classification tutorial.
We provide two comprehensive demo notebooks that guide you through installation and the main functionality: one Colab tutorial using the cloud API and one Colab tutorial using a local GPU.
"},{"location":"getting_started/intended_use/#when-to-use-tabpfn","title":"When to use TabPFN","text":"TabPFN excels in handling small to medium-sized datasets with up to 10,000 samples and 500 features. For larger datasets, methods such as CatBoost, XGBoost, or AutoGluon are likely to outperform TabPFN.
"},{"location":"getting_started/intended_use/#intended-use-of-tabpfn","title":"Intended Use of TabPFN","text":"TabPFN is intended as a powerful drop-in replacement for traditional tabular data prediction tools, where top performance and fast training matter. It still requires data scientists to prepare the data using their domain knowledge. Data scientists will see benefits in performing feature engineering, data cleaning, and problem framing to get the most out of TabPFN.
"},{"location":"getting_started/intended_use/#limitations-of-tabpfn","title":"Limitations of TabPFN","text":"TabPFN is computationally efficient and can run inference on consumer hardware for most datasets. Training on a new dataset is recommended to run on a GPU as this speeds it up significantly. TabPFN is not optimized for real-time inference tasks, but V2 can perform much faster predictions than V1 of TabPFN.
"},{"location":"getting_started/intended_use/#data-preparation","title":"Data Preparation","text":"TabPFN can handle raw data with minimal preprocessing. Provide the data in a tabular format, and TabPFN will automatically handle missing values, encode categorical variables, and normalize features. While TabPFN works well out-of-the-box, performance can further be improved using dataset-specific preprocessings.
"},{"location":"getting_started/intended_use/#interpreting-results","title":"Interpreting Results","text":"TabPFN's predictions come with uncertainty estimates, allowing you to assess the reliability of the results. You can use SHAP to interpret TabPFN's predictions and identify the most important features driving the model's decisions.
"},{"location":"getting_started/intended_use/#hyperparameter-tuning","title":"Hyperparameter Tuning","text":"TabPFN provides strong performance out-of-the-box without extensive hyperparameter tuning. If you have additional computational resources, you can automatically tune its hyperparameters using post-hoc ensembling or random tuning.
"},{"location":"reference/tabpfn/base/","title":"Base","text":""},{"location":"reference/tabpfn/base/#tabpfn.base","title":"base","text":"Common logic for TabPFN models.
"},{"location":"reference/tabpfn/base/#tabpfn.base.create_inference_engine","title":"create_inference_engine","text":"create_inference_engine(\n *,\n X_train: ndarray,\n y_train: ndarray,\n model: PerFeatureTransformer,\n ensemble_configs: Any,\n cat_ix: list[int],\n fit_mode: Literal[\n \"low_memory\", \"fit_preprocessors\", \"fit_with_cache\"\n ],\n device_: device,\n rng: Generator,\n n_jobs: int,\n byte_size: int,\n forced_inference_dtype_: dtype | None,\n memory_saving_mode: (\n bool | Literal[\"auto\"] | float | int\n ),\n use_autocast_: bool\n) -> InferenceEngine\n
Creates the appropriate TabPFN inference engine based on fit_mode
.
Each execution mode will perform slightly different operations based on the mode specified by the user. In the case where preprocessors will be fit after prepare
, we will use them to further transform the associated borders with each ensemble config member.
Parameters:
Name Type Description DefaultX_train
ndarray
Training features
requiredy_train
ndarray
Training target
requiredmodel
PerFeatureTransformer
The loaded TabPFN model.
requiredensemble_configs
Any
The ensemble configurations to create multiple \"prompts\".
requiredcat_ix
list[int]
Indices of inferred categorical features.
requiredfit_mode
Literal['low_memory', 'fit_preprocessors', 'fit_with_cache']
Determines how we prepare inference (pre-cache or not).
requireddevice_
device
The device for inference.
requiredrng
Generator
Numpy random generator.
requiredn_jobs
int
Number of parallel CPU workers.
requiredbyte_size
int
Byte size for the chosen inference precision.
requiredforced_inference_dtype_
dtype | None
If not None, the forced dtype for inference.
requiredmemory_saving_mode
bool | Literal['auto'] | float | int
GPU/CPU memory saving settings.
requireduse_autocast_
bool
Whether we use torch.autocast for inference.
required"},{"location":"reference/tabpfn/base/#tabpfn.base.determine_precision","title":"determine_precision","text":"determine_precision(\n inference_precision: (\n dtype | Literal[\"autocast\", \"auto\"]\n ),\n device_: device,\n) -> tuple[bool, dtype | None, int]\n
Decide whether to use autocast or a forced precision dtype.
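A small usage sketch based purely on the signature above (the import path is assumed from this page's module, tabpfn.base):

import torch
from tabpfn.base import determine_precision  # assumed import path

# "auto" lets the function pick based on the device; returns the three values below.
use_autocast_, forced_inference_dtype_, byte_size = determine_precision(
    "auto", torch.device("cpu")
)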
Parameters:
Name Type Description Defaultinference_precision
dtype | Literal['autocast', 'auto']
\"auto\"
, decide automatically based on the device.\"autocast\"
, explicitly use PyTorch autocast (mixed precision).torch.dtype
, force that precision.device_
device
The device on which inference is run.
requiredReturns:
Name Type Descriptionuse_autocast_
bool
True if mixed-precision autocast will be used.
forced_inference_dtype_
dtype | None
If not None, the forced precision dtype for the model.
byte_size
int
The byte size per element for the chosen precision.
"},{"location":"reference/tabpfn/base/#tabpfn.base.initialize_tabpfn_model","title":"initialize_tabpfn_model","text":"initialize_tabpfn_model(\n model_path: str | Path | Literal[\"auto\"],\n which: Literal[\"classifier\", \"regressor\"],\n fit_mode: Literal[\n \"low_memory\", \"fit_preprocessors\", \"fit_with_cache\"\n ],\n static_seed: int,\n) -> tuple[\n PerFeatureTransformer,\n InferenceConfig,\n FullSupportBarDistribution | None,\n]\n
Common logic to load the TabPFN model, set up the random state, and optionally download the model.
Parameters:
Name Type Description Defaultmodel_path
str | Path | Literal['auto']
Path or directive (\"auto\") to load the pre-trained model from.
requiredwhich
Literal['classifier', 'regressor']
Which TabPFN model to load.
requiredfit_mode
Literal['low_memory', 'fit_preprocessors', 'fit_with_cache']
Determines caching behavior.
requiredstatic_seed
int
Random seed for reproducibility logic.
requiredReturns:
Name Type Descriptionmodel
PerFeatureTransformer
The loaded TabPFN model.
config
InferenceConfig
The configuration object associated with the loaded model.
bar_distribution
FullSupportBarDistribution | None
The BarDistribution for regression (None
if classifier).
TabPFNClassifier class.
Example
import sklearn.datasets
from tabpfn import TabPFNClassifier

model = TabPFNClassifier()

X, y = sklearn.datasets.load_iris(return_X_y=True)

model.fit(X, y)
predictions = model.predict(X)
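The same fitted model also exposes class probabilities, which follow the usual scikit-learn conventions (a sketch continuing the example above):

proba = model.predict_proba(X)          # shape (n_samples, n_classes_)
print(proba.sum(axis=1))                # each row sums to 1
print(model.classes_[proba.argmax(1)])  # typically matches model.predict(X)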
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier","title":"TabPFNClassifier","text":" Bases: ClassifierMixin
, BaseEstimator
TabPFNClassifier class.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.class_counts_","title":"class_counts_instance-attribute
","text":"class_counts_: NDArray[Any]\n
The number of classes per class found in the target data during fit()
.
instance-attribute
","text":"classes_: NDArray[Any]\n
The unique classes found in the target data during fit()
.
instance-attribute
","text":"config_: InferenceConfig\n
The configuration of the loaded model to be used for inference.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.device_","title":"device_instance-attribute
","text":"device_: device\n
The device determined to be used.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.executor_","title":"executor_instance-attribute
","text":"executor_: InferenceEngine\n
The inference engine used to make predictions.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.feature_names_in_","title":"feature_names_in_instance-attribute
","text":"feature_names_in_: NDArray[Any]\n
The feature names of the input data.
May not be set if the input data does not have feature names, such as with a numpy array.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.forced_inference_dtype_","title":"forced_inference_dtype_instance-attribute
","text":"forced_inference_dtype_: _dtype | None\n
The forced inference dtype for the model based on inference_precision
.
instance-attribute
","text":"inferred_categorical_indices_: list[int]\n
The indices of the columns that were inferred to be categorical, as a product of any features deemed categorical by the user and what would work best for the model.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.interface_config_","title":"interface_config_instance-attribute
","text":"interface_config_: ModelInterfaceConfig\n
Additional configuration of the interface for expert users.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.label_encoder_","title":"label_encoder_instance-attribute
","text":"label_encoder_: LabelEncoder\n
The label encoder used to encode the target variable.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.n_classes_","title":"n_classes_instance-attribute
","text":"n_classes_: int\n
The number of classes found in the target data during fit()
.
instance-attribute
","text":"n_features_in_: int\n
The number of features in the input data used during fit()
.
instance-attribute
","text":"n_outputs_: Literal[1]\n
The number of outputs the model has. Only 1 for now.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.preprocessor_","title":"preprocessor_instance-attribute
","text":"preprocessor_: ColumnTransformer\n
The column transformer used to preprocess the input data to be numeric.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.use_autocast_","title":"use_autocast_instance-attribute
","text":"use_autocast_: bool\n
Whether torch's autocast should be used.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.fit","title":"fit","text":"fit(X: XType, y: YType) -> Self\n
Fit the model.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredy
YType
The target variable.
required"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.predict","title":"predict","text":"predict(X: XType) -> ndarray\n
Predict the class labels for the provided input samples.
Parameters:
Name Type Description DefaultX
XType
The input samples.
requiredReturns:
Type Descriptionndarray
The predicted class labels.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X: XType) -> ndarray\n
Predict the probabilities of the classes for the provided input samples.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredReturns:
Type Descriptionndarray
The predicted probabilities of the classes.
"},{"location":"reference/tabpfn/constants/","title":"Constants","text":""},{"location":"reference/tabpfn/constants/#tabpfn.constants","title":"constants","text":"Various constants used throughout the library.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig","title":"ModelInterfaceConfigdataclass
","text":"Constants used as default HPs in the model interfaces.
These constants are not exposed to the models' init on purpose to reduce the complexity for users. Furthermore, most of these should not be optimized over by the (standard) user.
Several of the preprocessing options are supported by our code for efficiency reasons (to avoid loading TabPFN multiple times). However, these can also be applied outside of the model interface.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.CLASS_SHIFT_METHOD","title":"CLASS_SHIFT_METHODclass-attribute
instance-attribute
","text":"CLASS_SHIFT_METHOD: Literal[\"rotate\", \"shuffle\"] | None = (\n \"shuffle\"\n)\n
The method used to shift classes during preprocessing for ensembling to emulate the effect of invariance to class order. Without ensembling, TabPFN is not invariant to class order due to using a transformer. Shifting classes can have a positive effect on the model's performance. The options are: - If \"shuffle\", the classes are shuffled. - If \"rotate\", the classes are rotated (think of a ring). - If None, no class shifting is done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FEATURE_SHIFT_METHOD","title":"FEATURE_SHIFT_METHODclass-attribute
instance-attribute
","text":"FEATURE_SHIFT_METHOD: (\n Literal[\"shuffle\", \"rotate\"] | None\n) = \"shuffle\"\n
The method used to shift features during preprocessing for ensembling to emulate the effect of invariance to feature position. Without ensembling, TabPFN is not invariant to feature position due to using a transformer. Moreover, shifting features can have a positive effect on the model's performance. The options are: - If \"shuffle\", the features are shuffled. - If \"rotate\", the features are rotated (think of a ring). - If None, no feature shifting is done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FINGERPRINT_FEATURE","title":"FINGERPRINT_FEATUREclass-attribute
instance-attribute
","text":"FINGERPRINT_FEATURE: bool = True\n
Whether to add a fingerprint feature to the data. The added feature is a hash of the row, counting up for duplicates. This helps TabPFN to distinguish between duplicated data points in the input data. Otherwise, duplicates would be less obvious during attention. This is expected to improve prediction performance and help with stability if the data has many sample duplicates.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORM","title":"FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORMclass-attribute
instance-attribute
","text":"FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORM: bool = True\n
Whether to repair any borders of the bar distribution in regression that are NaN after the transformation. This can happen due to multiple reasons and should in general always be done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_CLASSES","title":"MAX_NUMBER_OF_CLASSESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_CLASSES: int = 10\n
The number of classes seen during pretraining for classification. If the number of classes is larger than this number, TabPFN requires an additional step to predict for more classes than this.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_FEATURES","title":"MAX_NUMBER_OF_FEATURESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_FEATURES: int = 500\n
The number of features that the pretraining was intended for. If the number of features is larger than this number, you may see degraded performance. Note, this is not the number of features seen by the model during pretraining but also accounts for expected generalization (i.e., length extrapolation).
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_SAMPLES","title":"MAX_NUMBER_OF_SAMPLESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_SAMPLES: int = 10000\n
The number of samples that the pretraining was intended for. If the number of samples is larger than this number, you may see degraded performance. Note, this is not the number of samples seen by the model during pretraining but also accounts for expected generalization (i.e., length extrapolation).
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_UNIQUE_FOR_CATEGORICAL_FEATURES","title":"MAX_UNIQUE_FOR_CATEGORICAL_FEATURESclass-attribute
instance-attribute
","text":"MAX_UNIQUE_FOR_CATEGORICAL_FEATURES: int = 30\n
The maximum number of unique values for a feature to be considered categorical. Otherwise, it is considered numerical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCE","title":"MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCEclass-attribute
instance-attribute
","text":"MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCE: int = 100\n
The minimum number of samples required in the data to run our inference of which features might be categorical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MIN_UNIQUE_FOR_NUMERICAL_FEATURES","title":"MIN_UNIQUE_FOR_NUMERICAL_FEATURESclass-attribute
instance-attribute
","text":"MIN_UNIQUE_FOR_NUMERICAL_FEATURES: int = 4\n
The minimum number of unique values for a feature to be considered numerical. Otherwise, it is considered categorical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.OUTLIER_REMOVAL_STD","title":"OUTLIER_REMOVAL_STDclass-attribute
instance-attribute
","text":"OUTLIER_REMOVAL_STD: float | None | Literal[\"auto\"] = \"auto\"\n
The number of standard deviations from the mean to consider a sample an outlier. - If None, no outliers are removed. - If float, the number of standard deviations from the mean to consider a sample an outlier. - If "auto", the OUTLIER_REMOVAL_STD is automatically determined: 12.0 for classification and None for regression.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.POLYNOMIAL_FEATURES","title":"POLYNOMIAL_FEATURESclass-attribute
instance-attribute
","text":"POLYNOMIAL_FEATURES: Literal['no', 'all'] | int = 'no'\n
The number of two-factor polynomial features to generate and add to the original data before passing the data to TabPFN. The polynomial features are generated by multiplying the original features together, e.g., this might add a feature x1*x2 to the features, if x1 and x2 are features. In total, this can add up to O(n^2) features. Adding polynomial features can improve predictive performance by exploiting simple feature engineering. - If "no", no polynomial features are added. - If "all", all possible polynomial features are added. - If an int, determines the maximum number of polynomial features to add to the original data.
class-attribute
instance-attribute
","text":"PREPROCESS_TRANSFORMS: list[PreprocessorConfig] | None = (\n None\n)\n
The preprocessing applied to the data before passing it to TabPFN. See PreprocessorConfig
for options and more details. If a list of PreprocessorConfig
is provided, the preprocessors are (repeatedly) applied across different estimators.
By default, for classification, two preprocessors are applied: 1. Uses the original input data, all features transformed with a quantile scaler, and the first n components of an SVD transformer (where n is a fraction of the number of features or samples). Categorical features are ordinal-encoded, but all categories occurring fewer than 10 times are ignored. 2. Uses the original input data, with categorical features ordinal-encoded.
By default, for regression, two preprocessors are applied: 1. The same as for classification, with a slightly different quantile scaler. 2. The original input data power-transformed, with categorical features one-hot encoded.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.REGRESSION_Y_PREPROCESS_TRANSFORMS","title":"REGRESSION_Y_PREPROCESS_TRANSFORMSclass-attribute
instance-attribute
","text":"REGRESSION_Y_PREPROCESS_TRANSFORMS: tuple[\n Literal[\"safepower\", \"power\", \"quantile_norm\", None],\n ...,\n] = (None, \"safepower\")\n
The preprocessing applied to the target variable before passing it to TabPFN for regression. This can be understood as scaling the target variable to better predict it. The preprocessors should be passed as a tuple/list and are then (repeatedly) used by the estimators in the ensembles.
By default, we use no preprocessing and a power transformation (if we have more than one estimator).
The options are None (no transformation), "safepower", "power", and "quantile_norm"."},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.SUBSAMPLE_SAMPLES","title":"SUBSAMPLE_SAMPLES","text":"class-attribute
instance-attribute
","text":"SUBSAMPLE_SAMPLES: int | float | None = None\n
Subsample the input data sample/row-wise before performing any preprocessing and the TabPFN forward pass. - If None, no subsampling is done. - If an int, the number of samples to subsample (or oversample if SUBSAMPLE_SAMPLES
is larger than the number of samples). - If a float, the percentage of samples to subsample.
class-attribute
instance-attribute
","text":"USE_SKLEARN_16_DECIMAL_PRECISION: bool = False\n
Whether to round the probabilities to float 16 to match the precision of scikit-learn. This can help with reproducibility and compatibility with scikit-learn but is not recommended for general use. This is not exposed to the user or as a hyperparameter. To improve reproducibility, set ._sklearn_16_decimal_precision = True before calling .predict() or .predict_proba()
.
staticmethod
","text":"from_user_input(\n *, inference_config: dict | ModelInterfaceConfig | None\n) -> ModelInterfaceConfig\n
Converts the user input to a ModelInterfaceConfig
object.
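A sketch of the conversion (the exact rules follow below; the import path tabpfn.constants is assumed from this page):

from tabpfn.constants import ModelInterfaceConfig  # assumed import path

# Dict keys must match ModelInterfaceConfig attributes:
config = ModelInterfaceConfig.from_user_input(
    inference_config={"FINGERPRINT_FEATURE": False, "MAX_NUMBER_OF_CLASSES": 10}
)
# None yields a config with default values:
defaults = ModelInterfaceConfig.from_user_input(inference_config=None)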
The input inference_config can be a dictionary, a ModelInterfaceConfig
object, or None. If a dictionary is passed, the keys must match the attributes of ModelInterfaceConfig
. If a ModelInterfaceConfig
object is passed, it is returned as is. If None is passed, a new ModelInterfaceConfig
object is created with default values.
Module that defines different ways to run inference with TabPFN.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngine","title":"InferenceEnginedataclass
","text":" Bases: ABC
These define how tabpfn inference can be run.
As there are many things that can be cached, with multiple ways to parallelize, Executor
defines three primary things:
Most will define a method prepare()
which is specific to that inference engine. These do not share a common interface.
What to cache:
As we can prepare much of the transformer context in advance, there is a tradeoff in terms of how much memory is spent on caching. This memory is used when prepare()
is called, usually in fit()
.
Using the cached data for inference:
Based on what has been prepared for the transformer context, iter_outputs()
will use this cached information to make predictions.
Controlling parallelism:
As we have trivially parallel parts for inference, we can parallelize them. However, as the GPU is typically a bottleneck in most systems, we can define where and how we would like to parallelize the inference.
abstractmethod
","text":"iter_outputs(\n X: ndarray, *, device: device, autocast: bool\n) -> Iterator[tuple[Tensor, EnsembleConfig]]\n
Iterate over the outputs of the model.
One for each ensemble configuration that was used to initialize the executor.
Parameters:
Name Type Description DefaultX
ndarray
The input data to make predictions on.
requireddevice
device
The device to run the model on.
requiredautocast
bool
Whether to use torch.autocast during inference.
required"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCacheKV","title":"InferenceEngineCacheKVdataclass
","text":" Bases: InferenceEngine
Inference engine that caches the actual KV cache calculated from the context of the processed training data.
This is by far the most memory intensive inference engine, as for each ensemble member we store the full KV cache of that model. For now this is held in CPU RAM (TODO(eddiebergman): verify)
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCacheKV.prepare","title":"prepareclassmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n ensemble_configs: Sequence[EnsembleConfig],\n n_workers: int,\n model: PerFeatureTransformer,\n device: device,\n rng: Generator,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int,\n autocast: bool\n) -> InferenceEngineCacheKV\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredn_workers
int
The number of workers to use.
requiredmodel
PerFeatureTransformer
The model to use.
requireddevice
device
The device to run the model on.
requiredrng
Generator
The random number generator.
requireddtype_byte_size
int
Size of the dtype in bytes.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
requiredautocast
bool
Whether to use torch.autocast during inference.
required"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCachePreprocessing","title":"InferenceEngineCachePreprocessingdataclass
","text":" Bases: InferenceEngine
Inference engine that caches the preprocessing for feeding as model context on predict.
This will fit the preprocessors on the training data, as well as cache the transformed training data on RAM (not GPU RAM).
This saves some time on each predict call, at the cost of increasing the amount of memory in RAM. The main functionality performed at predict()
time is to forward pass through the model which is currently done sequentially.
classmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n model: PerFeatureTransformer,\n ensemble_configs: Sequence[EnsembleConfig],\n n_workers: int,\n rng: Generator,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int\n) -> InferenceEngineCachePreprocessing\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredmodel
PerFeatureTransformer
The model to use.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredn_workers
int
The number of workers to use.
requiredrng
Generator
The random number generator.
requireddtype_byte_size
int
The byte size of the dtype.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
requiredReturns:
Type DescriptionInferenceEngineCachePreprocessing
The prepared inference engine.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineOnDemand","title":"InferenceEngineOnDemanddataclass
","text":" Bases: InferenceEngine
Inference engine that does not cache anything, computes everything as needed.
This is one of the slowest ways to run inference, as computation that could be cached is recomputed on every call. However, the memory demand is the lowest, and it can be more trivially parallelized across GPUs with some work.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineOnDemand.prepare","title":"prepareclassmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n model: PerFeatureTransformer,\n ensemble_configs: Sequence[EnsembleConfig],\n rng: Generator,\n n_workers: int,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int\n) -> InferenceEngineOnDemand\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredmodel
PerFeatureTransformer
The model to use.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredrng
Generator
The random number generator.
requiredn_workers
int
The number of workers to use.
requireddtype_byte_size
int
The byte size of the dtype.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
required"},{"location":"reference/tabpfn/preprocessing/","title":"Preprocessing","text":""},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing","title":"preprocessing","text":"Defines the preprocessing configurations that define the ensembling of different members.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig","title":"ClassifierEnsembleConfigdataclass
","text":" Bases: EnsembleConfig
Configuration for a classifier ensemble member.
See EnsembleConfig for more details.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.generate_for_classification","title":"generate_for_classificationclassmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for classpermutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig","title":"EnsembleConfigdataclass
","text":"Configuration for an ensemble member.
Attributes:
Name Type Descriptionfeature_shift_count
int
How much to shift the features columns.
class_permutation
int
Permutation to apply to classes
preprocess_config
PreprocessorConfig
Preprocessor configuration to use.
subsample_ix
NDArray[int64] | None
Indices of samples to use for this ensemble member. If None
, no subsampling is done.
classmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for class permutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
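For illustration, a hypothetical invocation with illustrative argument values; the import path is assumed from this reference page:
from tabpfn.preprocessing import (\n    EnsembleConfig,\n    default_classifier_preprocessor_configs,\n)\n\n# four ensemble members for a 3-class problem, no subsampling\nconfigs = EnsembleConfig.generate_for_classification(\n    n=4,\n    subsample_size=None,\n    max_index=1_000,\n    add_fingerprint_feature=False,\n    polynomial_features=\"no\",\n    feature_shift_decoder=\"shuffle\",\n    preprocessor_configs=default_classifier_preprocessor_configs(),\n    class_shift_method=\"rotate\",\n    n_classes=3,\n    random_state=0,\n)\n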
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
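Similarly for regression, a hedged sketch that also converts the first configuration into a pipeline via to_pipeline (import paths assumed):
from sklearn.preprocessing import StandardScaler\nfrom tabpfn.preprocessing import (\n    EnsembleConfig,\n    default_regressor_preprocessor_configs,\n)\n\nconfigs = EnsembleConfig.generate_for_regression(\n    n=4,\n    subsample_size=0.9,  # subsample 90% of the samples\n    max_index=1_000,\n    add_fingerprint_feature=True,\n    polynomial_features=\"no\",\n    feature_shift_decoder=\"shuffle\",\n    preprocessor_configs=default_regressor_preprocessor_configs(),\n    target_transforms=[None, StandardScaler()],\n    random_state=0,\n)\npipeline = configs[0].to_pipeline(random_state=0)\n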
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.PreprocessorConfig","title":"PreprocessorConfigdataclass
","text":"Configuration for data preprocessors.
Attributes:
Name Type Descriptionname
Literal['per_feature', 'power', 'safepower', 'power_box', 'safepower_box', 'quantile_uni_coarse', 'quantile_norm_coarse', 'quantile_uni', 'quantile_norm', 'quantile_uni_fine', 'quantile_norm_fine', 'robust', 'kdi', 'none', 'kdi_random_alpha', 'kdi_uni', 'kdi_random_alpha_uni', 'adaptive', 'norm_and_kdi', 'kdi_alpha_0.3_uni', 'kdi_alpha_0.5_uni', 'kdi_alpha_0.8_uni', 'kdi_alpha_1.0_uni', 'kdi_alpha_1.2_uni', 'kdi_alpha_1.5_uni', 'kdi_alpha_2.0_uni', 'kdi_alpha_3.0_uni', 'kdi_alpha_5.0_uni', 'kdi_alpha_0.3', 'kdi_alpha_0.5', 'kdi_alpha_0.8', 'kdi_alpha_1.0', 'kdi_alpha_1.2', 'kdi_alpha_1.5', 'kdi_alpha_2.0', 'kdi_alpha_3.0', 'kdi_alpha_5.0']
Name of the preprocessor.
categorical_name
Literal['none', 'numeric', 'onehot', 'ordinal', 'ordinal_shuffled', 'ordinal_very_common_categories_shuffled']
Name of the categorical encoding method. Options: \"none\", \"numeric\", \"onehot\", \"ordinal\", \"ordinal_shuffled\", \"ordinal_very_common_categories_shuffled\".
append_original
bool
Whether to append the original features to the transformed features.
subsample_features
float
Fraction of features to subsample. -1 means no subsampling.
global_transformer_name
str | None
Name of the global transformer to use.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig","title":"RegressorEnsembleConfigdataclass
","text":" Bases: EnsembleConfig
Configuration for a regression ensemble member.
See EnsembleConfig for more details.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.generate_for_classification","title":"generate_for_classificationclassmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for class permutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.balance","title":"balance","text":"balance(x: Iterable[T], n: int) -> list[T]\n
Take a list of elements and make a new list where each appears n
times.
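A minimal sketch of the described behavior; the exact ordering of the returned list is an assumption here:
balance([\"a\", \"b\"], 2)\n# -> [\"a\", \"a\", \"b\", \"b\"]  (ordering assumed)\n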
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.default_classifier_preprocessor_configs","title":"default_classifier_preprocessor_configs","text":"default_classifier_preprocessor_configs() -> (\n    list[PreprocessorConfig]\n)\n
Default preprocessor configurations for classification.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.default_regressor_preprocessor_configs","title":"default_regressor_preprocessor_configs","text":"default_regressor_preprocessor_configs() -> (\n list[PreprocessorConfig]\n)\n
Default preprocessor configurations for regression.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.fit_preprocessing","title":"fit_preprocessing","text":"fit_preprocessing(\n configs: Sequence[EnsembleConfig],\n X_train: ndarray,\n y_train: ndarray,\n *,\n random_state: int | Generator | None,\n cat_ix: list[int],\n n_workers: int,\n parallel_mode: Literal[\"block\", \"as-ready\", \"in-order\"]\n) -> Iterator[\n tuple[\n EnsembleConfig,\n SequentialFeatureTransformer,\n ndarray,\n ndarray,\n list[int],\n ]\n]\n
Fit preprocessing pipelines in parallel.
Parameters:
Name Type Description Defaultconfigs
Sequence[EnsembleConfig]
List of ensemble configurations.
requiredX_train
ndarray
Training data.
requiredy_train
ndarray
Training target.
requiredrandom_state
int | Generator | None
Random number generator.
requiredcat_ix
list[int]
Indices of categorical features.
requiredn_workers
int
Number of workers to use.
requiredparallel_mode
Literal['block', 'as-ready', 'in-order']
Parallel mode to use.
\"block\"
: Blocks until all workers are done. Returns in order.\"as-ready\"
: Returns results as they are ready. Any order.\"in-order\"
: Returns results in order, blocking only until the next result in order is ready.Returns:
Type DescriptionEnsembleConfig
Iterator of tuples containing the ensemble configuration, the fitted
SequentialFeatureTransformer
preprocessing pipeline, the transformed training data, the transformed target,
ndarray
and the indices of categorical features.
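A hedged usage sketch, assuming configs was produced by one of the generate_for_* classmethods documented above:
import numpy as np\n\nX_train = np.random.rand(100, 5)\ny_train = np.random.randint(0, 2, size=100)\n\n# consume the iterator of fitted pipelines and transformed data\nfor config, pipeline, X_t, y_t, cat_ix in fit_preprocessing(\n    configs,\n    X_train,\n    y_train,\n    random_state=0,\n    cat_ix=[],  # no categorical features in this sketch\n    n_workers=1,\n    parallel_mode=\"block\",\n):\n    print(type(config).__name__, X_t.shape)\n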
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.fit_preprocessing_one","title":"fit_preprocessing_one","text":"fit_preprocessing_one(\n config: EnsembleConfig,\n X_train: ndarray,\n y_train: ndarray,\n random_state: int | Generator | None = None,\n *,\n cat_ix: list[int]\n) -> tuple[\n EnsembleConfig,\n SequentialFeatureTransformer,\n ndarray,\n ndarray,\n list[int],\n]\n
Fit preprocessing pipeline for a single ensemble configuration.
Parameters:
Name Type Description Defaultconfig
EnsembleConfig
Ensemble configuration.
requiredX_train
ndarray
Training data.
requiredy_train
ndarray
Training target.
requiredrandom_state
int | Generator | None
Random seed.
None
cat_ix
list[int]
Indices of categorical features.
requiredReturns:
Type DescriptionEnsembleConfig
Tuple containing the ensemble configuration, the fitted preprocessing pipeline,
SequentialFeatureTransformer
the transformed training data, the transformed target, and the indices of
ndarray
categorical features.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.generate_index_permutations","title":"generate_index_permutations","text":"generate_index_permutations(\n n: int,\n *,\n max_index: int,\n subsample: int | float,\n random_state: int | Generator | None\n) -> list[NDArray[int64]]\n
Generate indices for subsampling from the data.
Parameters:
Name Type Description Defaultn
int
Number of indices to generate.
requiredmax_index
int
Maximum index to generate.
requiredsubsample
int | float
Number of indices to subsample. If int
, subsample that many indices. If float, subsample that fraction of indices.
random_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[NDArray[int64]]
List of indices to subsample.
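For illustration, a minimal call using the documented keywords:
indices = generate_index_permutations(\n    4,\n    max_index=100,\n    subsample=0.5,\n    random_state=0,\n)\n# a list of 4 integer arrays, each subsampling half of the indices below 100\n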
"},{"location":"reference/tabpfn/regressor/","title":"Regressor","text":""},{"location":"reference/tabpfn/regressor/#tabpfn.regressor","title":"regressor","text":"TabPFNRegressor class.
Example
import sklearn.datasets\nfrom tabpfn import TabPFNRegressor\n\nmodel = TabPFNRegressor()\nX, y = sklearn.datasets.make_regression(n_samples=50, n_features=10)\n\nmodel.fit(X, y)\npredictions = model.predict(X)\n
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor","title":"TabPFNRegressor","text":" Bases: RegressorMixin
, BaseEstimator
TabPFNRegressor class.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.bardist_","title":"bardist_instance-attribute
","text":"bardist_: FullSupportBarDistribution\n
The bar distribution of the target variable, used by the model.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.config_","title":"config_instance-attribute
","text":"config_: InferenceConfig\n
The configuration of the loaded model to be used for inference.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.device_","title":"device_instance-attribute
","text":"device_: device\n
The device determined to be used.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.executor_","title":"executor_instance-attribute
","text":"executor_: InferenceEngine\n
The inference engine used to make predictions.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.feature_names_in_","title":"feature_names_in_instance-attribute
","text":"feature_names_in_: NDArray[Any]\n
The feature names of the input data.
May not be set if the input data does not have feature names, such as with a numpy array.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.forced_inference_dtype_","title":"forced_inference_dtype_instance-attribute
","text":"forced_inference_dtype_: _dtype | None\n
The forced inference dtype for the model based on inference_precision
.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.inferred_categorical_indices_","title":"inferred_categorical_indices_instance-attribute
","text":"inferred_categorical_indices_: list[int]\n
The indices of the columns inferred to be categorical, determined jointly from the features the user marked as categorical and what works best for the model.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.interface_config_","title":"interface_config_instance-attribute
","text":"interface_config_: ModelInterfaceConfig\n
Additional configuration of the interface for expert users.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.n_features_in_","title":"n_features_in_instance-attribute
","text":"n_features_in_: int\n
The number of features in the input data used during fit()
.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.n_outputs_","title":"n_outputs_instance-attribute
","text":"n_outputs_: Literal[1]\n
The number of outputs the model supports. Only 1 for now.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.preprocessor_","title":"preprocessor_instance-attribute
","text":"preprocessor_: ColumnTransformer\n
The column transformer used to preprocess the input data to be numeric.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.renormalized_criterion_","title":"renormalized_criterion_instance-attribute
","text":"renormalized_criterion_: FullSupportBarDistribution\n
The normalized bar distribution used for computing the predictions.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.use_autocast_","title":"use_autocast_instance-attribute
","text":"use_autocast_: bool\n
Whether torch's autocast should be used.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.y_train_mean_","title":"y_train_mean_instance-attribute
","text":"y_train_mean_: float\n
The mean of the target variable during training.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.y_train_std","title":"y_train_stdinstance-attribute
","text":"y_train_std: float\n
The standard deviation of the target variable during training.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.fit","title":"fit","text":"fit(X: XType, y: YType) -> Self\n
Fit the model.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredy
YType
The target variable.
requiredReturns:
Type DescriptionSelf
self
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.predict","title":"predict","text":"predict(\n X: XType,\n *,\n output_type: Literal[\n \"mean\",\n \"median\",\n \"mode\",\n \"quantiles\",\n \"full\",\n \"main\",\n ] = \"mean\",\n quantiles: list[float] | None = None\n) -> (\n ndarray\n | list[ndarray]\n | dict[str, ndarray]\n | dict[str, ndarray | FullSupportBarDistribution]\n)\n
Predict the target variable.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredoutput_type
Literal['mean', 'median', 'mode', 'quantiles', 'full', 'main']
Determines the type of output to return.
\"mean\"
, we return the mean over the predicted distribution.\"median\"
, we return the median over the predicted distribution.\"mode\"
, we return the mode over the predicted distribution.\"quantiles\"
, we return the quantiles of the predicted distribution. The parameter output_quantiles
determines which quantiles are returned.\"main\"
, we return all of the output types above in a dict.\"full\"
, we return the full output of the model, including the logits and the criterion, and all the output types from \"main\".'mean'
quantiles
list[float] | None
The quantiles to return if output=\"quantiles\"
.
By default, the [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
quantiles are returned. The predictions per quantile match the input order.
None
Returns:
Type Descriptionndarray | list[ndarray] | dict[str, ndarray] | dict[str, ndarray | FullSupportBarDistribution]
The predicted target variable or a list of predictions per quantile.
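For illustration, continuing the fitted model from the Example at the top of this page (reusing X as prediction input):
preds = model.predict(X, output_type=\"quantiles\", quantiles=[0.1, 0.5, 0.9])\n# a list of three arrays, one per requested quantile, in input order\nmain = model.predict(X, output_type=\"main\")\n# a dict holding the mean, median, mode and quantile outputs\n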
"},{"location":"reference/tabpfn/utils/","title":"Utils","text":""},{"location":"reference/tabpfn/utils/#tabpfn.utils","title":"utils","text":"A collection of random utilities for the TabPFN models.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_categorical_features","title":"infer_categorical_features","text":"infer_categorical_features(\n X: ndarray,\n *,\n provided: Sequence[int] | None,\n min_samples_for_inference: int,\n max_unique_for_category: int,\n min_unique_for_numerical: int\n) -> list[int]\n
Infer the categorical features from the given data.
Note
This function may infer particular columns as not categorical if that better suits the model's predictions and its pre-training.
Parameters:
Name Type Description DefaultX
ndarray
The data to infer the categorical features from.
requiredprovided
Sequence[int] | None
Any user provided indices of what is considered categorical.
requiredmin_samples_for_inference
int
The minimum number of samples required for automatic inference of features which were not provided as categorical.
requiredmax_unique_for_category
int
The maximum number of unique values for a feature to be considered categorical.
requiredmin_unique_for_numerical
int
The minimum number of unique values for a feature to be considered numerical.
requiredReturns:
Type Descriptionlist[int]
The indices of inferred categorical features.
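A hedged sketch with illustrative threshold values (the thresholds here are not the library defaults):
import numpy as np\n\nX = np.array([[0, 1.2], [1, 3.4], [0, 5.6], [2, 7.8]])\ncat_ix = infer_categorical_features(\n    X,\n    provided=[0],  # the user marks column 0 as categorical\n    min_samples_for_inference=100,\n    max_unique_for_category=30,\n    min_unique_for_numerical=4,\n)\n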
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_device_and_type","title":"infer_device_and_type","text":"infer_device_and_type(\n device: str | device | None,\n) -> device\n
Infer the device and data type from the given device string.
Parameters:
Name Type Description Defaultdevice
str | device | None
The device to infer the type from.
requiredReturns:
Type Descriptiondevice
The inferred device.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_fp16_inference_mode","title":"infer_fp16_inference_mode","text":"infer_fp16_inference_mode(\n device: device, *, enable: bool | None\n) -> bool\n
Infer whether fp16 inference should be enabled.
Parameters:
Name Type Description Defaultdevice
device
The device to validate against.
requiredenable
bool | None
Whether it should be enabled, True
or False
, otherwise if None
, detect if it's possible and use it if so.
Returns:
Type Descriptionbool
Whether to use fp16 inference or not.
Raises:
Type DescriptionValueError
If fp16 inference was enabled and device type does not support it.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_random_state","title":"infer_random_state","text":"infer_random_state(\n random_state: int | RandomState | Generator | None,\n) -> tuple[int, Generator]\n
Infer the random state from the given input.
Parameters:
Name Type Description Defaultrandom_state
int | RandomState | Generator | None
The random state to infer.
requiredReturns:
Type Descriptiontuple[int, Generator]
A static integer seed and a random number generator.
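For illustration, a minimal sketch; the returned generator is a numpy Generator, so it can be used for subsampling directly:
seed, rng = infer_random_state(42)\nsubsample = rng.permutation(10)[:5]  # reproducible subsample of 5 indices\n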
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.is_autocast_available","title":"is_autocast_available","text":"is_autocast_available(device_type: str) -> bool\n
Infer whether autocast is available for the given device type.
Parameters:
Name Type Description Defaultdevice_type
str
The device type to check for autocast availability.
requiredReturns:
Type Descriptionbool
Whether autocast is available for the given device type.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.load_model_criterion_config","title":"load_model_criterion_config","text":"load_model_criterion_config(\n model_path: None | str | Path,\n *,\n check_bar_distribution_criterion: bool,\n cache_trainset_representation: bool,\n which: Literal[\"regressor\", \"classifier\"],\n version: Literal[\"v2\"] = \"v2\",\n download: bool,\n model_seed: int\n) -> tuple[\n PerFeatureTransformer,\n BCEWithLogitsLoss\n | CrossEntropyLoss\n | FullSupportBarDistribution,\n InferenceConfig,\n]\n
Load the model, criterion, and config from the given path.
Parameters:
Name Type Description Defaultmodel_path
None | str | Path
The path to the model.
requiredcheck_bar_distribution_criterion
bool
Whether to check if the criterion is a FullSupportBarDistribution, which is the expected criterion for models trained for regression.
requiredcache_trainset_representation
bool
Whether the model should know to cache the trainset representation.
requiredwhich
Literal['regressor', 'classifier']
Whether the model is a regressor or classifier.
requiredversion
Literal['v2']
The version of the model.
'v2'
download
bool
Whether to download the model if it doesn't exist.
requiredmodel_seed
int
The seed of the model.
requiredReturns:
Type Descriptiontuple[PerFeatureTransformer, BCEWithLogitsLoss | CrossEntropyLoss | FullSupportBarDistribution, InferenceConfig]
The model, criterion, and config.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.translate_probs_across_borders","title":"translate_probs_across_borders","text":"translate_probs_across_borders(\n logits: Tensor, *, frm: Tensor, to: Tensor\n) -> Tensor\n
Translate the probabilities across the borders.
Parameters:
Name Type Description Defaultlogits
Tensor
The logits defining the distribution to translate.
requiredfrm
Tensor
The borders to translate from.
requiredto
Tensor
The borders to translate to.
requiredReturns:
Type DescriptionTensor
The translated probabilities.
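A minimal sketch with assumed shapes; it assumes each distribution has one more border than it has bars:
import torch\n\nlogits = torch.randn(8, 3)  # 8 distributions over 3 bars\nfrm = torch.tensor([0.0, 1.0, 2.0, 3.0])  # 4 borders define 3 bars\nto = torch.tensor([0.0, 0.5, 1.5, 3.0])\nprobs = translate_probs_across_borders(logits, frm=frm, to=to)\n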
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.update_encoder_outlier_params","title":"update_encoder_outlier_params","text":"update_encoder_outlier_params(\n model: Module,\n remove_outliers_std: float | None,\n seed: int | None,\n *,\n inplace: Literal[True]\n) -> None\n
Update the encoder to handle outliers in the model.
Warning
This only happens inplace.
Parameters:
Name Type Description Defaultmodel
Module
The model to update.
requiredremove_outliers_std
float | None
The standard deviation to remove outliers.
requiredseed
int | None
The seed to use, if any.
requiredinplace
Literal[True]
Whether to do the operation inplace.
requiredRaises:
Type DescriptionValueError
If inplace
is not True
.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.validate_X_predict","title":"validate_X_predict","text":"validate_X_predict(\n    X: XType, estimator: TabPFNRegressor | TabPFNClassifier\n) -> ndarray\n
Validate the input data for prediction.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.validate_Xy_fit","title":"validate_Xy_fit","text":"validate_Xy_fit(\n X: XType,\n y: YType,\n estimator: TabPFNRegressor | TabPFNClassifier,\n *,\n max_num_features: int,\n max_num_samples: int,\n ensure_y_numeric: bool = False,\n ignore_pretraining_limits: bool = False\n) -> tuple[ndarray, ndarray, NDArray[Any] | None, int]\n
Validate the input data for fitting.
"},{"location":"reference/tabpfn/model/bar_distribution/","title":"Bar distribution","text":""},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution","title":"bar_distribution","text":""},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution","title":"BarDistribution","text":" Bases: Module
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.average_bar_distributions_into_this","title":"average_bar_distributions_into_this","text":"average_bar_distributions_into_this(\n    list_of_bar_distributions: Sequence[BarDistribution],\n    list_of_logits: Sequence[Tensor],\n    *,\n    average_logits: bool = False\n) -> Tensor\n
Average the given bar distributions, each described by its logits, into this distribution. The average_logits flag controls whether averaging happens in logit space or in probability space.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.cdf","title":"cdf","text":"cdf(logits: Tensor, ys: Tensor) -> Tensor\n
Calculates the cdf of the distribution described by the logits. The cdf is scaled by the width of the bars.
Parameters:
Name Type Description Defaultlogits
Tensor
tensor of shape (batch_size, ..., num_bars) with the logits describing the distribution
requiredys
Tensor
tensor of shape (batch_size, ..., n_ys to eval) or (n_ys to eval) with the targets.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.cdf_temporary","title":"cdf_temporary","text":"cdf_temporary(logits: Tensor) -> Tensor\n
Cumulative distribution function.
TODO: this already exists here, make sure to merge, at the moment still used.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.get_probs_for_different_borders","title":"get_probs_for_different_borders","text":"get_probs_for_different_borders(\n logits: Tensor, new_borders: Tensor\n) -> Tensor\n
The logits describe the density of the distribution over the current self.borders.
This function returns the logits if the self.borders were changed to new_borders. This is useful to average the logits of different models.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.icdf","title":"icdf","text":"icdf(logits: Tensor, left_prob: float) -> Tensor\n
Implementation of the quantile function (inverse CDF). logits: tensor of any shape, with the last dimension being logits. left_prob: the probability mass to the left of the result. Returns the position with left_prob probability weight to the left.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.mean_of_square","title":"mean_of_square","text":"mean_of_square(logits: Tensor) -> Tensor\n
Computes E[x^2].
Parameters:
Name Type Description Defaultlogits
Tensor
Output of the model.
requiredReturns:
Type DescriptionTensor
mean of square
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.pi","title":"pi","text":"pi(\n logits: Tensor,\n best_f: float | Tensor,\n *,\n maximize: bool = True\n) -> Tensor\n
Acquisition Function: Probability of Improvement.
Parameters:
Name Type Description Defaultlogits
Tensor
as returned by Transformer
requiredbest_f
float | Tensor
best evaluation so far (the incumbent)
requiredmaximize
bool
whether to maximize
True
Returns:
Type DescriptionTensor
probability of improvement
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.plot","title":"plot","text":"plot(\n logits: Tensor,\n ax: Axes | None = None,\n zoom_to_quantile: float | None = None,\n **kwargs: Any\n) -> Axes\n
Plots the distribution.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.ucb","title":"ucb","text":"ucb(\n logits: Tensor,\n best_f: float,\n rest_prob: float = 1 - 0.682 / 2,\n *,\n maximize: bool = True\n) -> Tensor\n
UCB utility. Rest Prob is the amount of utility above (below) the confidence interval that is ignored.
Higher rest_prob is equivalent to lower beta in the standard GP-UCB formulation.
Parameters:
Name Type Description Defaultlogits
Tensor
Logits, as returned by the Transformer.
requiredrest_prob
float
The amount of utility above (below) the confidence interval that is ignored.
The default is equivalent to using GP-UCB with beta=1
. To get the corresponding beta
, where beta
is from the standard GP definition of UCB ucb_utility = mean + beta * std
, you can use this computation:
beta = math.sqrt(2)*torch.erfinv(torch.tensor(2*(1-rest_prob)-1))
1 - 0.682 / 2
best_f
float
Unused
requiredmaximize
bool
Whether to maximize.
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution","title":"FullSupportBarDistribution","text":" Bases: BarDistribution
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.average_bar_distributions_into_this","title":"average_bar_distributions_into_this","text":"average_bar_distributions_into_this(\n    list_of_bar_distributions: Sequence[BarDistribution],\n    list_of_logits: Sequence[Tensor],\n    *,\n    average_logits: bool = False\n) -> Tensor\n
Average the given bar distributions, each described by its logits, into this distribution. The average_logits flag controls whether averaging happens in logit space or in probability space.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.cdf","title":"cdf","text":"cdf(logits: Tensor, ys: Tensor) -> Tensor\n
Calculates the cdf of the distribution described by the logits. The cdf is scaled by the width of the bars.
Parameters:
Name Type Description Defaultlogits
Tensor
tensor of shape (batch_size, ..., num_bars) with the logits describing the distribution
requiredys
Tensor
tensor of shape (batch_size, ..., n_ys to eval) or (n_ys to eval) with the targets.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.cdf_temporary","title":"cdf_temporary","text":"cdf_temporary(logits: Tensor) -> Tensor\n
Cumulative distribution function.
TODO: this already exists here, make sure to merge, at the moment still used.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.ei_for_halfnormal","title":"ei_for_halfnormal","text":"ei_for_halfnormal(\n scale: float,\n best_f: Tensor | float,\n *,\n maximize: bool = True\n) -> Tensor\n
EI for a standard normal distribution with mean 0 and variance scale
times 2.
This is the same as the half-normal EI. It was tested against this MC approximation:
ei_for_halfnormal = lambda scale, best_f: (torch.distributions.HalfNormal(torch.tensor(scale)).sample((10_000_000,))- best_f ).clamp(min=0.).mean()\nprint([(ei_for_halfnormal(scale,best_f), FullSupportBarDistribution().ei_for_halfnormal(scale,best_f)) for scale in [0.1,1.,10.] for best_f in [.1,10.,4.]])\n
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.forward","title":"forward","text":"forward(\n logits: Tensor,\n y: Tensor,\n mean_prediction_logits: Tensor | None = None,\n) -> Tensor\n
Returns the negative log density (the loss).
Parameters: logits: tensor of shape (T, B, self.num_bars). y: tensor of shape (T, B). mean_prediction_logits: optional logits for a mean prediction.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.get_probs_for_different_borders","title":"get_probs_for_different_borders","text":"get_probs_for_different_borders(\n logits: Tensor, new_borders: Tensor\n) -> Tensor\n
The logits describe the density of the distribution over the current self.borders.
This function returns the logits if the self.borders were changed to new_borders. This is useful to average the logits of different models.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.icdf","title":"icdf","text":"icdf(logits: Tensor, left_prob: float) -> Tensor\n
Implementation of the quantile function (inverse CDF). logits: tensor of any shape, with the last dimension being logits. left_prob: the probability mass to the left of the result. Returns the position with left_prob probability weight to the left.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.mean_of_square","title":"mean_of_square","text":"mean_of_square(logits: Tensor) -> Tensor\n
Computes E[x^2].
Parameters:
Name Type Description Defaultlogits
Tensor
Output of the model.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.pdf","title":"pdf","text":"pdf(logits: Tensor, y: Tensor) -> Tensor\n
Probability density function at y.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.pi","title":"pi","text":"pi(\n logits: Tensor,\n best_f: Tensor | float,\n *,\n maximize: bool = True\n) -> Tensor\n
Acquisition Function: Probability of Improvement.
Parameters:
Name Type Description Defaultlogits
Tensor
as returned by Transformer (evaluation_points x batch x feature_dim)
requiredbest_f
Tensor | float
best evaluation so far (the incumbent)
requiredmaximize
bool
whether to maximize
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.plot","title":"plot","text":"plot(\n logits: Tensor,\n ax: Axes | None = None,\n zoom_to_quantile: float | None = None,\n **kwargs: Any\n) -> Axes\n
Plots the distribution.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.sample","title":"sample","text":"sample(logits: Tensor, t: float = 1.0) -> Tensor\n
Samples values from the distribution, with temperature t.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.ucb","title":"ucb","text":"ucb(\n logits: Tensor,\n best_f: float,\n rest_prob: float = 1 - 0.682 / 2,\n *,\n maximize: bool = True\n) -> Tensor\n
UCB utility. Rest Prob is the amount of utility above (below) the confidence interval that is ignored.
Higher rest_prob is equivalent to lower beta in the standard GP-UCB formulation.
Parameters:
Name Type Description Defaultlogits
Tensor
Logits, as returned by the Transformer.
requiredrest_prob
float
The amount of utility above (below) the confidence interval that is ignored.
The default is equivalent to using GP-UCB with beta=1
. To get the corresponding beta
, where beta
is from the standard GP definition of UCB ucb_utility = mean + beta * std
, you can use this computation:
beta = math.sqrt(2)*torch.erfinv(torch.tensor(2*(1-rest_prob)-1))
1 - 0.682 / 2
best_f
float
Unused
requiredmaximize
bool
Whether to maximize.
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.get_bucket_limits","title":"get_bucket_limits","text":"get_bucket_limits(\n num_outputs: int,\n full_range: tuple | None = None,\n ys: Tensor | None = None,\n *,\n verbose: bool = False,\n widen_bucket_limits_factor: float | None = None\n) -> Tensor\n
Decide on a set of bucket limits based on a distribution of ys.
Parameters:
Name Type Description Defaultnum_outputs
int
This is only tested for num_outputs=1, but should work for larger num_outputs as well.
requiredfull_range
tuple | None
If ys is not passed, this is the range of the ys that should be used to estimate the bucket limits.
None
ys
Tensor | None
If ys is passed, this is the ys that should be used to estimate the bucket limits. Do not pass full_range in this case.
None
verbose
bool
Unused
False
widen_bucket_limits_factor
float | None
If set, the bucket limits are widened by this factor. This allows a slightly larger range than the actual data.
None
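For illustration, a hedged call estimating limits from data (per the parameter docs, full_range is omitted when ys is given); the import path is assumed from this reference page:
import torch\n\nfrom tabpfn.model.bar_distribution import get_bucket_limits\n\nys = torch.randn(10_000)\nborders = get_bucket_limits(\n    num_outputs=100,\n    ys=ys,\n    widen_bucket_limits_factor=1.1,  # 10% wider than the observed range\n)\n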
"},{"location":"reference/tabpfn/model/config/","title":"Config","text":""},{"location":"reference/tabpfn/model/config/#tabpfn.model.config","title":"config","text":""},{"location":"reference/tabpfn/model/config/#tabpfn.model.config.InferenceConfig","title":"InferenceConfig dataclass
","text":"Configuration for the TabPFN model.
"},{"location":"reference/tabpfn/model/config/#tabpfn.model.config.InferenceConfig.from_dict","title":"from_dictclassmethod
","text":"from_dict(config: dict) -> InferenceConfig\n
Create a Config object from a dictionary.
This method also does some sanity checking initially.
"},{"location":"reference/tabpfn/model/encoders/","title":"Encoders","text":""},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders","title":"encoders","text":""},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.CategoricalInputEncoderPerFeatureEncoderStep","title":"CategoricalInputEncoderPerFeatureEncoderStep","text":" Bases: SeqEncStep
Expects input of size 1.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.CategoricalInputEncoderPerFeatureEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.FrequencyFeatureEncoderStep","title":"FrequencyFeatureEncoderStep","text":" Bases: SeqEncStep
Encoder step to add frequency-based features to the input.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.FrequencyFeatureEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputEncoder","title":"InputEncoder","text":" Bases: Module
Base class for input encoders.
All input encoders should subclass this class and implement the forward
method.
forward(x: Tensor, single_eval_pos: int) -> Tensor\n
Encode the input tensor.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor to encode.
requiredsingle_eval_pos
int
The position to use for single evaluation.
requiredReturns:
Type DescriptionTensor
The encoded tensor.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep","title":"InputNormalizationEncoderStep","text":" Bases: SeqEncStep
Encoder step to normalize the input in different ways.
Can be used to normalize the input to a ranking, remove outliers, or normalize the input to have unit variance.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep.reset_seed","title":"reset_seed","text":"reset_seed() -> None\n
Reset the random seed.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoder","title":"LinearInputEncoder","text":" Bases: Module
A simple linear input encoder.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoder.forward","title":"forward","text":"forward(*x: Tensor, **kwargs: Any) -> tuple[Tensor]\n
Apply the linear transformation to the input.
Parameters:
Name Type Description Default*x
Tensor
The input tensors to concatenate and transform.
()
**kwargs
Any
Unused keyword arguments.
{}
Returns:
Type Descriptiontuple[Tensor]
A tuple containing the transformed tensor.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoderStep","title":"LinearInputEncoderStep","text":" Bases: SeqEncStep
A simple linear input encoder step.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.NanHandlingEncoderStep","title":"NanHandlingEncoderStep","text":" Bases: SeqEncStep
Encoder step to handle NaN and infinite values in the input.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.NanHandlingEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveDuplicateFeaturesEncoderStep","title":"RemoveDuplicateFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to remove duplicate features.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveDuplicateFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveEmptyFeaturesEncoderStep","title":"RemoveEmptyFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to remove empty (constant) features.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveEmptyFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SeqEncStep","title":"SeqEncStep","text":" Bases: Module
Abstract base class for sequential encoder steps.
SeqEncStep is a wrapper around a module that defines the expected input keys and the produced output keys. The outputs are assigned to the output keys in the order specified by out_keys
.
Subclasses should either implement _forward
or _fit
and _transform
. Subclasses that transform x
should always use _fit
and _transform
, creating any state that depends on the train set in _fit
and using it in _transform
. This allows fitting on data first and doing inference later without refitting. Subclasses that work with y
can alternatively use _forward
instead.
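To make this contract concrete, a minimal hypothetical subclass; the exact hook signatures and state handling here are simplifying assumptions:
import torch\n\nclass StandardizeStep(SeqEncStep):  # hypothetical example step\n    def _fit(self, x: torch.Tensor, **kwargs):\n        # derive train-set statistics once and store them on the step\n        self.mean_ = x.mean(dim=0, keepdim=True)\n        self.std_ = x.std(dim=0, keepdim=True).clamp_min(1e-8)\n\n    def _transform(self, x: torch.Tensor, **kwargs) -> torch.Tensor:\n        # reuse the fitted statistics, so inference needs no refitting\n        return (x - self.mean_) / self.std_\n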
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SeqEncStep.forward","title":"forward","text":"forward(\n    state: dict,\n    cache_trainset_representation: bool = False,\n    **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SequentialEncoder","title":"SequentialEncoder","text":" Bases: Sequential
, InputEncoder
An encoder that applies a sequence of encoder steps.
SequentialEncoder allows building an encoder from a sequence of EncoderSteps. The input is passed through each step in the provided order.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SequentialEncoder.forward","title":"forward","text":"forward(input: dict, **kwargs: Any) -> Tensor\n
Apply the sequence of encoder steps to the input.
Parameters:
Name Type Description Defaultinput
dict
The input state dictionary. If the input is not a dict and the first layer expects one input key, the input tensor is mapped to the key expected by the first layer.
required**kwargs
Any
Additional keyword arguments passed to each encoder step.
{}
Returns:
Type DescriptionTensor
The output of the final encoder step.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.VariableNumFeaturesEncoderStep","title":"VariableNumFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to handle variable number of features.
Transforms the input to a fixed number of features by appending zeros. Also normalizes the input by the number of used features to keep the variance of the input constant, even when zeros are appended.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.VariableNumFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.normalize_data","title":"normalize_data","text":"normalize_data(\n data: Tensor,\n *,\n normalize_positions: int = -1,\n return_scaling: bool = False,\n clip: bool = True,\n std_only: bool = False,\n mean: Tensor | None = None,\n std: Tensor | None = None\n) -> Tensor | tuple[Tensor, tuple[Tensor, Tensor]]\n
Normalize data to mean 0 and std 1.
Parameters:
Name Type Description Defaultdata
Tensor
The data to normalize. (T, B, H)
requirednormalize_positions
int
If > 0, only use the first normalize_positions
positions for normalization.
-1
return_scaling
bool
If True, return the scaling parameters as well (mean, std).
False
std_only
bool
If True, only divide by std.
False
clip
bool
If True, clip the data to [-100, 100].
True
mean
Tensor | None
If given, use this value instead of computing it.
None
std
Tensor | None
If given, use this value instead of computing it.
None
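For illustration, a minimal sketch normalizing with statistics from the first 10 positions and returning the scaling:
import torch\n\ndata = torch.randn(20, 4, 8)  # (T, B, H)\nnormed, (mean, std) = normalize_data(\n    data, normalize_positions=10, return_scaling=True\n)\n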
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.select_features","title":"select_features","text":"select_features(x: Tensor, sel: Tensor) -> Tensor\n
Select features from the input tensor based on the selection mask.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredsel
Tensor
The boolean selection mask indicating which features to keep.
requiredReturns:
Type DescriptionTensor
The tensor with selected features.
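A minimal sketch, assuming features lie on the last dimension:
import torch\n\nx = torch.randn(5, 3)\nsel = torch.tensor([True, False, True])\nkept = select_features(x, sel)  # keeps columns 0 and 2 -> shape (5, 2)\n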
"},{"location":"reference/tabpfn/model/layer/","title":"Layer","text":""},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer","title":"layer","text":""},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.LayerNorm","title":"LayerNorm","text":" Bases: LayerNorm
Custom LayerNorm module that supports saving peak memory factor.
This module extends the PyTorch LayerNorm implementation to handle FP16 inputs efficiently and support saving peak memory factor.
Parameters:
Name Type Description Default*args
Any
Positional arguments passed to the base LayerNorm class.
()
**kwargs
Any
Keyword arguments passed to the base LayerNorm class.
{}
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.LayerNorm.forward","title":"forward","text":"forward(\n input: Tensor,\n *,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None\n) -> Tensor\n
Perform layer normalization on the input tensor.
Parameters:
Name Type Description Defaultinput
Tensor
The input tensor.
requiredallow_inplace
bool
Whether to allow in-place operations. Default is False.
False
save_peak_mem_factor
int | None
The factor to save peak memory. Default is None.
None
Returns:
Type DescriptionTensor
The layer normalized tensor.
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer","title":"PerFeatureEncoderLayer","text":" Bases: Module
Transformer encoder layer that processes each feature block separately.
This layer consists of multi-head attention between features, multi-head attention between items, and feedforward neural networks (MLPs).
It supports various configurations and optimization options.
Parameters:
Name Type Description Defaultd_model
int
The dimensionality of the input and output embeddings.
requirednhead
int
The number of attention heads.
requireddim_feedforward
int | None
The dimensionality of the feedforward network. Default is None (2 * d_model).
None
activation
str
The activation function to use in the MLPs.
'relu'
layer_norm_eps
float
The epsilon value for layer normalization.
1e-05
pre_norm
bool
Whether to apply layer normalization before or after the attention and MLPs.
False
device
device | None
The device to use for the layer parameters.
None
dtype
dtype | None
The data type to use for the layer parameters.
None
recompute_attn
bool
Whether to recompute attention during backpropagation.
False
second_mlp
bool
Whether to include a second MLP in the layer.
False
layer_norm_with_elementwise_affine
bool
Whether to use elementwise affine parameters in layer normalization.
False
zero_init
bool
Whether to initialize the output of the MLPs to zero.
False
save_peak_mem_factor
int | None
The factor to save peak memory, only effective with post-norm.
None
attention_between_features
bool
Whether to apply attention between feature blocks.
True
multiquery_item_attention
bool
Whether to use multiquery attention for items.
False
multiquery_item_attention_for_test_set
bool
Whether to use multiquery attention for the test set.
False
attention_init_gain
float
The gain value for initializing attention parameters.
1.0
d_k
int | None
The dimensionality of the query and key vectors. Default is (d_model // nhead).
None
d_v
int | None
The dimensionality of the value vectors. Default is (d_model // nhead).
None
precomputed_kv
None | Tensor | tuple[Tensor, Tensor]
Precomputed key-value pairs for attention.
None
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer.empty_trainset_representation_cache","title":"empty_trainset_representation_cache","text":"empty_trainset_representation_cache() -> None\n
Empty the trainset representation cache.
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer.forward","title":"forward","text":"forward(\n state: Tensor,\n single_eval_pos: int | None = None,\n *,\n cache_trainset_representation: bool = False,\n att_src: Tensor | None = None\n) -> Tensor\n
Pass the input through the encoder layer.
Parameters:
Name Type Description Defaultstate
Tensor
The transformer state passed as input to the layer of shape (batch_size, num_items, num_feature_blocks, d_model).
requiredsingle_eval_pos
int | None
The position from which on everything is treated as test set.
None
cache_trainset_representation
bool
Whether to cache the trainset representation. If single_eval_pos is set (> 0 and not None), create a cache of the trainset KV. This may require a lot of memory. Otherwise, use cached KV representations for inference.
False
att_src
Tensor | None
The tensor to attend to from the final layer of the encoder. It has a shape of (batch_size, num_train_items, num_feature_blocks, d_model). This does not work with multiquery_item_attention_for_test_set and cache_trainset_representation at this point.
None
Returns:
Type DescriptionTensor
The transformer state passed through the encoder layer.
"},{"location":"reference/tabpfn/model/loading/","title":"Loading","text":""},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading","title":"loading","text":""},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading.download_model","title":"download_model","text":"download_model(\n to: Path,\n *,\n version: Literal[\"v2\"],\n which: Literal[\"classifier\", \"regressor\"],\n model_name: str | None = None\n) -> Literal[\"ok\"] | list[Exception]\n
Download a TabPFN model, trying all available sources.
Parameters:
Name Type Description Defaultto
Path
The directory to download the model to.
requiredversion
Literal['v2']
The version of the model to download.
requiredwhich
Literal['classifier', 'regressor']
The type of model to download.
requiredmodel_name
str | None
Optional specific model name to download.
None
Returns:
Type DescriptionLiteral['ok'] | list[Exception]
\"ok\" if the model was downloaded successfully, otherwise a list of
Literal['ok'] | list[Exception]
exceptions that occurred that can be handled as desired.
"},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading.load_model","title":"load_model","text":"load_model(*, path: Path, model_seed: int) -> tuple[\n PerFeatureTransformer,\n BCEWithLogitsLoss\n | CrossEntropyLoss\n | FullSupportBarDistribution,\n InferenceConfig,\n]\n
Loads a model from a given path.
Parameters:
Name Type Description Defaultpath
Path
Path to the checkpoint
requiredmodel_seed
int
The seed to use for the model
required"},{"location":"reference/tabpfn/model/memory/","title":"Memory","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory","title":"memory","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator","title":"MemoryUsageEstimator","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.convert_bytes_to_unit","title":"convert_bytes_to_unitclassmethod
","text":"convert_bytes_to_unit(\n value: float, unit: Literal[\"b\", \"mb\", \"gb\"]\n) -> float\n
Convenience method to convert bytes to a different unit.
Parameters:
Name Type Description Defaultvalue
float
The number of bytes.
requiredunit
Literal['b', 'mb', 'gb']
The unit to convert to.
requiredReturns:
Type Descriptionfloat
The number of bytes in the new unit.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.convert_units","title":"convert_unitsclassmethod
","text":"convert_units(\n value: float,\n from_unit: Literal[\"b\", \"mb\", \"gb\"],\n to_unit: Literal[\"b\", \"mb\", \"gb\"],\n) -> float\n
Convert a value from one unit to another.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.estimate_memory_of_one_batch","title":"estimate_memory_of_one_batchclassmethod
","text":"estimate_memory_of_one_batch(\n X: Tensor,\n model: Module,\n *,\n cache_kv: bool,\n dtype_byte_size: int,\n unit: Literal[\"b\", \"mb\", \"gb\"] = \"gb\",\n n_train_samples: int | None = None\n) -> float\n
Estimate the memory usage of a single batch.
The calculation is done based on the assumption that save_peak_mem_factor is not used (since this estimation is used to determine whether to use it).
Parameters:
Name Type Description DefaultX
Tensor
The input tensor.
requiredmodel
Module
The model to estimate the memory usage of.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredunit
Literal['b', 'mb', 'gb']
The unit to convert the memory usage to.
'gb'
n_train_samples
int | None
The number of training samples (only for cache_kv mode)
None
Returns:
Type Descriptionfloat
The estimated memory usage of a single batch.
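A hedged sketch of estimating batch memory before inference; the tensor layout and the float16 byte size are assumptions, and model stands for an already-loaded TabPFN module:
>>> import torch
>>> from tabpfn.model.memory import MemoryUsageEstimator
>>> X = torch.randn(1, 1024, 10)  # assumed (batch, n_samples, n_features) layout
>>> mem_gb = MemoryUsageEstimator.estimate_memory_of_one_batch(
...     X, model, cache_kv=False, dtype_byte_size=2, unit='gb',  # 2 bytes ~ float16
... )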
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.estimate_memory_remainder_after_batch","title":"estimate_memory_remainder_after_batchclassmethod
","text":"estimate_memory_remainder_after_batch(\n X: Tensor,\n model: Module,\n *,\n cache_kv: bool,\n device: device,\n dtype_byte_size: int,\n safety_factor: float,\n n_train_samples: int | None = None,\n max_free_mem: float | int | None = None\n) -> float\n
Estimate the amount of free memory remaining after a batch is computed (used to decide whether to save peak memory).
Parameters:
Name Type Description DefaultX
Tensor
The input tensor.
requiredmodel
Module
The model to estimate the memory usage of.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddevice
device
The device to use.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredsafety_factor
float
The safety factor to apply.
requiredn_train_samples
int | None
The number of training samples (only for cache_kv mode)
None
max_free_mem
float | int | None
The amount of free memory available.
None
Returns:
Type Descriptionfloat
The amount of free memory available after a batch is computed.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.get_max_free_memory","title":"get_max_free_memoryclassmethod
","text":"get_max_free_memory(\n device: device,\n *,\n unit: Literal[\"b\", \"mb\", \"gb\"] = \"gb\",\n default_gb_cpu_if_failed_to_calculate: float\n) -> float\n
How much memory to use at most, in GB; the value is calculated from an estimate of the system's free memory.
For CUDA devices, the free memory of the GPU is used. For CPU, it defaults to 32 GB.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.get_max_free_memory--returns","title":"Returns:","text":"The maximum memory usage in GB.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.reset_peak_memory_if_required","title":"reset_peak_memory_if_requiredclassmethod
","text":"reset_peak_memory_if_required(\n save_peak_mem: bool | Literal[\"auto\"] | float | int,\n model: Module,\n X: Tensor,\n *,\n cache_kv: bool,\n device: device,\n dtype_byte_size: int,\n safety_factor: float = 5.0,\n n_train_samples: int | None = None\n) -> None\n
Reset the peak memory if required.
Parameters:
Name Type Description Defaultsave_peak_mem
bool | 'auto' | float | int
If bool, specifies whether to save peak memory or not. If \"auto\", the amount of free memory is estimated and the option is enabled or disabled based on the estimated usage. If float or int, it is considered as the amount of memory available (in GB) explicitly specified by the user. In this case, this value is used to estimate whether or not to save peak memory.
requiredmodel
Module
The model to reset the peak memory of.
requiredX
Tensor
The input tensor.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddevice
device
The device to use.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredsafety_factor
float
The safety factor to apply.
5.0
n_train_samples
int
The number of training samples (to be used only for cache_kv mode)
None
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.support_save_peak_mem_factor","title":"support_save_peak_mem_factor","text":"support_save_peak_mem_factor(\n method: MethodType,\n) -> Callable\n
Can be applied to a method acting on a tensor 'x' whose first dimension is a flat batch dimension (i.e. the operation is trivially parallel over the first dimension).
For additional tensor arguments, it is assumed that the first dimension is again the batch dimension, and that non-tensor arguments can be passed as-is to splits when parallelizing over the batch dimension.
The decorator adds two options: 'add_input', which adds the principal input 'x' to the result of the method, and 'allow_inplace'. By setting 'allow_inplace', the caller indicates that 'x' is not used after the call and its buffer can be reused for the output.
Setting 'allow_inplace' does not guarantee that the operation is performed in place; callers should always use the return value.
Moreover, it adds an optional int parameter 'save_peak_mem_factor' that is only supported in combination with 'allow_inplace' during inference and subdivides the operation into the specified number of chunks to reduce peak memory consumption.
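A minimal sketch of the decorator on a hypothetical module method (the class and method names are illustrative, not from the library):
>>> import torch
>>> from tabpfn.model.memory import support_save_peak_mem_factor
>>> class Block(torch.nn.Module):
...     @support_save_peak_mem_factor
...     def _act(self, x: torch.Tensor) -> torch.Tensor:
...         return torch.nn.functional.gelu(x)
>>> x = torch.randn(64, 128)  # first dimension is a flat batch dimension
>>> with torch.no_grad():  # save_peak_mem_factor is inference-only
...     y = Block()._act(x, add_input=True, allow_inplace=True, save_peak_mem_factor=4)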
"},{"location":"reference/tabpfn/model/mlp/","title":"Mlp","text":""},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp","title":"mlp","text":""},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.Activation","title":"Activation","text":" Bases: Enum
Enum for activation functions.
"},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.MLP","title":"MLP","text":" Bases: Module
Multi-Layer Perceptron (MLP) module.
This module consists of two linear layers with an activation function in between. It supports various configurations such as the hidden size, activation function, initializing the output to zero, and recomputing the forward pass during backpropagation.
Parameters:
Name Type Description Defaultsize
int
The input and output size of the MLP.
requiredhidden_size
int
The size of the hidden layer.
requiredactivation
Activation | str
The activation function to use. Can be either an Activation enum or a string representing the activation name.
requireddevice
device | None
The device to use for the linear layers.
requireddtype
dtype | None
The data type to use for the linear layers.
requiredinitialize_output_to_zero
bool
Whether to initialize the output layer weights to zero. Default is False.
False
recompute
bool
Whether to recompute the forward pass during backpropagation. This can save memory but increase computation time. Default is False.
False
Attributes:
Name Type Descriptionlinear1
Linear
The first linear layer.
linear2
Linear
The second linear layer.
activation
Activation
The activation function to use.
Example
mlp = MLP(size=128, hidden_size=256, activation='gelu', device='cuda')
x = torch.randn(32, 128, device='cuda', dtype=torch.float32)
output = mlp(x)
"},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.MLP.forward","title":"forward","text":"forward(\n x: Tensor,\n *,\n add_input: bool = False,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None\n) -> Tensor\n
Performs the forward pass of the MLP.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredadd_input
bool
Whether to add input to the output. Default is False.
False
allow_inplace
bool
Indicates that 'x' is not used after the call and its buffer can be reused for the output. The operation is not guaranteed to be inplace. Default is False.
False
save_peak_mem_factor
int | None
If provided, enables a memory-saving technique that reduces peak memory usage during the forward pass. This requires 'add_input' and 'allow_inplace' to be True. See the documentation of the decorator 'support_save_peak_mem_factor' for details. Default is None.
None
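Continuing the MLP example above, a hedged sketch of the memory-saving call (the factor value is illustrative):
>>> with torch.no_grad():  # the memory-saving path is inference-only
...     out = mlp(x, add_input=True, allow_inplace=True, save_peak_mem_factor=8)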
"},{"location":"reference/tabpfn/model/multi_head_attention/","title":"Multi head attention","text":""},{"location":"reference/tabpfn/model/multi_head_attention/#tabpfn.model.multi_head_attention","title":"multi_head_attention","text":""},{"location":"reference/tabpfn/model/multi_head_attention/#tabpfn.model.multi_head_attention.MultiHeadAttention","title":"MultiHeadAttention","text":" Bases: Module
forward(\n x: Tensor,\n x_kv: Tensor | None = None,\n *,\n cache_kv: bool = False,\n add_input: bool = False,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None,\n reuse_first_head_kv: bool = False,\n only_cache_first_head_kv: bool = False,\n use_cached_kv: bool = False,\n use_second_set_of_queries: bool = False\n)\n
X is the current hidden state and has a shape of [batch, ..., seq_len, input_size]. If keys and values are present in the cache and 'freeze_kv' is not set, they are obtained from there, and 'x_kv' has to be None. Otherwise, if 'x_kv' is not None, keys and values are obtained by applying the respective linear transformations to 'x_kv'. Otherwise, keys and values are obtained by applying the respective linear transformations to 'x' (self-attention).
"},{"location":"reference/tabpfn/model/preprocessing/","title":"Preprocessing","text":""},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing","title":"preprocessing","text":""},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.AddFingerprintFeaturesStep","title":"AddFingerprintFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Adds a fingerprint feature to the features, based on a hash of each row.
If is_test = True, it keeps the first hash even if there are collisions. If is_test = False, it handles hash collisions by counting up and rehashing until a unique hash is found.
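A usage sketch, assuming the step can be constructed without arguments (it may accept e.g. a random state in practice):
>>> import numpy as np
>>> from tabpfn.model.preprocessing import AddFingerprintFeaturesStep
>>> X = np.random.rand(100, 4)
>>> step = AddFingerprintFeaturesStep()  # assumed no-arg constructor
>>> step.fit(X, categorical_features=[])
>>> result = step.transform(X)  # appends one fingerprint column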
fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.AddFingerprintFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep","title":"FeaturePreprocessingTransformerStep","text":"Base class for feature preprocessing steps.
Its main abstraction is to provide the categorical feature indices along the pipeline.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.KDITransformerWithNaN","title":"KDITransformerWithNaN","text":" Bases: KDITransformer
KDI transformer that can handle NaN values. It performs KDI with NaNs replaced by mean values, and then re-inserts the NaN values at their original positions after the transformation.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep","title":"RemoveConstantFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Remove features that are constant in the training data.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep","title":"ReshapeFeatureDistributionsStep","text":" Bases: FeaturePreprocessingTransformerStep
Reshape the feature distributions using different transformations.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.get_adaptive_preprocessors","title":"get_adaptive_preprocessorsstaticmethod
","text":"get_adaptive_preprocessors(\n num_examples: int = 100, random_state: int | None = None\n) -> dict[str, ColumnTransformer]\n
Returns a dictionary of adaptive column transformers that can be used to preprocess the data. Adaptive column transformers preprocess the data based on the column type; they receive a pandas dataframe whose column names indicate the column type. Column types are not datatypes, but strings that indicate how the data should be preprocessed.
Parameters:
Name Type Description Defaultnum_examples
int
The number of examples in the dataset.
100
random_state
int | None
The random state to use for the transformers.
None
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.get_column_types","title":"get_column_types staticmethod
","text":"get_column_types(X: ndarray) -> list[str]\n
Returns a list of column types for the given data that indicate how the data should be preprocessed.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SafePowerTransformer","title":"SafePowerTransformer","text":" Bases: PowerTransformer
Power Transformer which reverts features back to their original values if they are transformed to large values or the output column does not have unit variance. This happens e.g. when the input data has a large number of outliers.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer","title":"SequentialFeatureTransformer","text":" Bases: UserList
A transformer that applies a sequence of feature preprocessing steps. It is closely related to sklearn's Pipeline, but is designed to work with categorical_features lists that are always passed on.
Currently this class is only used once, so it could be made less general if needed.
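A sketch of composing steps, assuming list-style construction (implied by the UserList base) and no-arg step constructors:
>>> import numpy as np
>>> from tabpfn.model.preprocessing import (
...     AddFingerprintFeaturesStep,
...     RemoveConstantFeaturesStep,
...     SequentialFeatureTransformer,
... )
>>> X = np.random.rand(50, 5)
>>> X[:, 0] = 1.0  # constant column, dropped by the first step
>>> pipeline = SequentialFeatureTransformer([
...     RemoveConstantFeaturesStep(),
...     AddFingerprintFeaturesStep(),
... ])
>>> result = pipeline.fit_transform(X, categorical_features=[])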
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fit all the steps in the pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.fit_transform","title":"fit_transform","text":"fit_transform(\n X: ndarray, categorical_features: list[int]\n) -> _TransformResult\n
Fit and transform the data using the fitted pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transform the data using the fitted pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep","title":"ShuffleFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Shuffle the features in the data.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical feature.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.add_safe_standard_to_safe_power_without_standard","title":"add_safe_standard_to_safe_power_without_standard","text":"add_safe_standard_to_safe_power_without_standard(\n input_transformer: TransformerMixin,\n) -> Pipeline\n
In edge cases, PowerTransformer can create inf values and similar, which makes the subsequent standard scaling crash. This fixes that issue.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.make_box_cox_safe","title":"make_box_cox_safe","text":"make_box_cox_safe(\n input_transformer: TransformerMixin | Pipeline,\n) -> Pipeline\n
Make Box-Cox safe.
The Box-Cox transformation can only be applied to strictly positive data. By first applying MinMax scaling, we achieve this without loss of functionality. Additionally, test data must also be clipped.
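For intuition, an illustrative sklearn pipeline that captures the same idea; this is not the library's exact implementation, and the feature range is an assumption:
>>> import numpy as np
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.preprocessing import MinMaxScaler, PowerTransformer
>>> safe_box_cox = Pipeline([
...     ('minmax', MinMaxScaler(feature_range=(1e-10, 1))),  # strictly positive inputs
...     ('box_cox', PowerTransformer(method='box-cox')),
... ])
>>> Xt = safe_box_cox.fit_transform(np.random.randn(100, 3))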
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.skew","title":"skew","text":"skew(x: ndarray) -> float\n
"},{"location":"reference/tabpfn/model/transformer/","title":"Transformer","text":""},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer","title":"transformer","text":""},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.LayerStack","title":"LayerStack","text":" Bases: Module
Similar to nn.Sequential, but with support for passing keyword arguments to layers and for stacking the same layer multiple times.
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer","title":"PerFeatureTransformer","text":" Bases: Module
A Transformer model that processes one token per feature and sample.
This model extends the standard Transformer architecture to operate on a per-feature basis. It allows for processing each feature separately while still leveraging the power of self-attention.
The model consists of an encoder, decoder, and optional components such as a feature positional embedding and a separate decoder for each feature.
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer.forward","title":"forward","text":"forward(*args: Any, **kwargs: Any) -> dict[str, Tensor]\n
Performs a forward pass through the model.
This method supports multiple calling conventions:
model((x,y), **kwargs)
model(train_x, train_y, test_x, **kwargs)
model((style,x,y), **kwargs)
Parameters:
Name Type Description Defaulttrain_x
torch.Tensor | None The input data for the training set.
requiredtrain_y
torch.Tensor | None The target data for the training set.
requiredtest_x
torch.Tensor | None The input data for the test set.
requiredx
torch.Tensor The input data.
requiredy
torch.Tensor | None The target data.
requiredstyle
torch.Tensor | None The style vector.
requiredsingle_eval_pos
int The position to evaluate at.
requiredonly_return_standard_out
bool Whether to only return the standard output.
requireddata_dags
Any The data DAGs for each example.
requiredcategorical_inds
list[int] The indices of categorical features.
requiredfreeze_kv
bool Whether to freeze the key and value weights.
requiredReturns:
Type Descriptiondict[str, Tensor]
The output of the model, which can be a tensor or a dictionary of tensors.
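A heavily hedged sketch of the (train_x, train_y, test_x) calling convention; model stands for an already-constructed PerFeatureTransformer, and the tensor shapes shown are assumptions rather than documented fact:
>>> import torch
>>> train_x = torch.randn(100, 1, 5)  # assumed (n_train, batch, n_features)
>>> train_y = torch.randn(100, 1)     # assumed (n_train, batch)
>>> test_x = torch.randn(20, 1, 5)
>>> out = model(train_x, train_y, test_x)  # dict[str, Tensor]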
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer.reset_save_peak_mem_factor","title":"reset_save_peak_mem_factor","text":"reset_save_peak_mem_factor(\n factor: int | None = None,\n) -> None\n
Sets the save_peak_mem_factor for all layers.
This factor controls how much memory is saved during the forward pass in inference mode.
Setting this factor > 1 will cause the model to save more memory during the forward pass in inference mode.
A value of 8 is good for a 4x larger width in the fully-connected layers, and yields a situation where we need around 2*num_features*num_items*emsize*2 bytes of memory for a forward pass (using mixed precision).
WARNING: It should only be used with post-norm.
Parameters:
Name Type Description Defaultfactor
int | None
The save_peak_mem_factor to set. Recommended value is 8.
None
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.SerializableGenerator","title":"SerializableGenerator","text":" Bases: Generator
A serializable version of the torch.Generator that can be saved and pickled.
"},{"location":"reference/tabpfn_client/browser_auth/","title":"Browser auth","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth","title":"browser_auth","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth.BrowserAuthHandler","title":"BrowserAuthHandler","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth.BrowserAuthHandler.try_browser_login","title":"try_browser_login","text":"try_browser_login() -> Tuple[bool, Optional[str]]\n
Attempts to perform browser-based login. Returns (success: bool, token: Optional[str]).
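Usage sketch, assuming a no-argument constructor:
>>> from tabpfn_client.browser_auth import BrowserAuthHandler
>>> success, token = BrowserAuthHandler().try_browser_login()  # may open a browser window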
"},{"location":"reference/tabpfn_client/client/","title":"Client","text":""},{"location":"reference/tabpfn_client/client/#tabpfn_client.client","title":"client","text":""},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager","title":"DatasetUIDCacheManager","text":"Manages a cache of the last 50 uploaded datasets, tracking dataset hashes and their UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.add_dataset_uid","title":"add_dataset_uid","text":"add_dataset_uid(hash: str, dataset_uid: str)\n
Adds a new dataset to the cache, removing the oldest item if the cache exceeds 50 entries. Assumes the dataset is not already in the cache.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.delete_uid","title":"delete_uid","text":"delete_uid(dataset_uid: str) -> Optional[str]\n
Deletes an entry from the cache based on the dataset UID.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.get_dataset_uid","title":"get_dataset_uid","text":"get_dataset_uid(*args)\n
Generates a hash from all received arguments and returns the cached dataset UID if present, otherwise None.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.load_cache","title":"load_cache","text":"load_cache()\n
Loads the cache from disk if it exists, otherwise initializes an empty cache.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.save_cache","title":"save_cache","text":"save_cache()\n
Saves the current cache to disk.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.GCPOverloaded","title":"GCPOverloaded","text":" Bases: Exception
Exception raised when the Google Cloud Platform service is overloaded or unavailable.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient","title":"ServiceClient","text":" Bases: Singleton
Singleton class for handling communication with the server. It encapsulates all the API calls to the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_all_datasets","title":"delete_all_datasetsclassmethod
","text":"delete_all_datasets() -> [str]\n
Delete all datasets uploaded by the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_all_datasets--returns","title":"Returns","text":"deleted_dataset_uids : [str] The list of deleted dataset UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset","title":"delete_datasetclassmethod
","text":"delete_dataset(dataset_uid: str) -> list[str]\n
Delete the dataset with the provided UID from the server. Note that deleting a train set will lead to deleting all associated test sets.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset--parameters","title":"Parameters","text":"dataset_uid : str The UID of the dataset to be deleted.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset--returns","title":"Returns","text":"deleted_dataset_uids : [str] The list of deleted dataset UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.download_all_data","title":"download_all_dataclassmethod
","text":"download_all_data(save_dir: Path) -> Union[Path, None]\n
Download all data uploaded by the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.download_all_data--returns","title":"Returns","text":"save_path : Path | None The path to the downloaded file. Return None if download fails.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit","title":"fitclassmethod
","text":"fit(X, y, config=None) -> str\n
Upload a train set to server and return the train set UID if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit--parameters","title":"Parameters","text":"X : array-like of shape (n_samples, n_features) The training input samples. y : array-like of shape (n_samples,) or (n_samples, n_outputs) The target values. config : dict, optional Configuration for the fit method. Includes tabpfn_systems and paper_version.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit--returns","title":"Returns","text":"train_set_uid : str The unique ID of the train set in the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_data_summary","title":"get_data_summaryclassmethod
","text":"get_data_summary() -> dict\n
Get the data summary of the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_data_summary--returns","title":"Returns","text":"data_summary : dict The data summary returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_password_policy","title":"get_password_policyclassmethod
","text":"get_password_policy() -> dict\n
Get the password policy from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_password_policy--returns","title":"Returns","text":"password_policy : {} The password policy returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.is_auth_token_outdated","title":"is_auth_token_outdatedclassmethod
","text":"is_auth_token_outdated(access_token) -> Union[bool, None]\n
Check if the provided access token is valid and return True if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login","title":"loginclassmethod
","text":"login(email: str, password: str) -> tuple[str, str]\n
Login with the provided credentials and return the access token if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login--parameters","title":"Parameters","text":"email : str password : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login--returns","title":"Returns","text":"access_token : str | None The access token returned from the server. Return None if login fails. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict","title":"predictclassmethod
","text":"predict(\n train_set_uid: str,\n x_test,\n task: Literal[\"classification\", \"regression\"],\n predict_params: Union[dict, None] = None,\n tabpfn_config: Union[dict, None] = None,\n X_train=None,\n y_train=None,\n) -> dict[str, ndarray]\n
Predict the class labels for the provided data (test set).
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict--parameters","title":"Parameters","text":"train_set_uid : str The unique ID of the train set in the server. x_test : array-like of shape (n_samples, n_features) The test input.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict--returns","title":"Returns","text":"y_pred : array-like of shape (n_samples,) The predicted class labels.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register","title":"registerclassmethod
","text":"register(\n email: str,\n password: str,\n password_confirm: str,\n validation_link: str,\n additional_info: dict,\n)\n
Register a new user with the provided credentials.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register--parameters","title":"Parameters","text":"email : str password : str password_confirm : str validation_link: str additional_info : dict
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register--returns","title":"Returns","text":"is_created : bool True if the user is created successfully. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.retrieve_greeting_messages","title":"retrieve_greeting_messagesclassmethod
","text":"retrieve_greeting_messages() -> list[str]\n
Retrieve greeting messages that are new for the user.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.send_reset_password_email","title":"send_reset_password_emailclassmethod
","text":"send_reset_password_email(email: str) -> tuple[bool, str]\n
Let the server send an email for resetting the password.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.send_verification_email","title":"send_verification_emailclassmethod
","text":"send_verification_email(\n access_token: str,\n) -> tuple[bool, str]\n
Let the server send an email for verifying the email.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.try_browser_login","title":"try_browser_loginclassmethod
","text":"try_browser_login() -> tuple[bool, str]\n
Attempts the browser-based login flow. Returns (success: bool, message: str).
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.try_connection","title":"try_connectionclassmethod
","text":"try_connection() -> bool\n
Check if server is reachable and accepts the connection.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email","title":"validate_emailclassmethod
","text":"validate_email(email: str) -> tuple[bool, str]\n
Send entered email to server that checks if it is valid and not already in use.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email--parameters","title":"Parameters","text":"email : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email--returns","title":"Returns","text":"is_valid : bool True if the email is valid. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email","title":"verify_emailclassmethod
","text":"verify_email(\n token: str, access_token: str\n) -> tuple[bool, str]\n
Verify the email with the provided token.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email--parameters","title":"Parameters","text":"token : str access_token : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email--returns","title":"Returns","text":"is_verified : bool True if the email is verified successfully. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/config/","title":"Config","text":""},{"location":"reference/tabpfn_client/config/#tabpfn_client.config","title":"config","text":""},{"location":"reference/tabpfn_client/config/#tabpfn_client.config.Config","title":"Config","text":""},{"location":"reference/tabpfn_client/constants/","title":"Constants","text":""},{"location":"reference/tabpfn_client/constants/#tabpfn_client.constants","title":"constants","text":""},{"location":"reference/tabpfn_client/estimator/","title":"Estimator","text":""},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator","title":"estimator","text":""},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNClassifier","title":"TabPFNClassifier","text":" Bases: BaseEstimator
, ClassifierMixin
, TabPFNModelSelection
predict(X)\n
Predict class labels for samples in X.
Parameters:
Name Type Description DefaultX
The input samples.
requiredReturns:
Type DescriptionThe predicted class labels.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict class probabilities for X.
Parameters:
Name Type Description DefaultX
The input samples.
requiredReturns:
Type DescriptionThe class probabilities of the input samples.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNModelSelection","title":"TabPFNModelSelection","text":"Base class for TabPFN model selection and path handling.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor","title":"TabPFNRegressor","text":" Bases: BaseEstimator
, RegressorMixin
, TabPFNModelSelection
predict(\n X: ndarray,\n output_type: Literal[\n \"mean\",\n \"median\",\n \"mode\",\n \"quantiles\",\n \"full\",\n \"main\",\n ] = \"mean\",\n quantiles: Optional[list[float]] = None,\n) -> Union[ndarray, list[ndarray], dict[str, ndarray]]\n
Predict regression target for X.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor.predict--parameters","title":"Parameters","text":"X : array-like of shape (n_samples, n_features) The input samples. output_type : str, default=\"mean\" The type of prediction to return: - \"mean\": Return mean prediction - \"median\": Return median prediction - \"mode\": Return mode prediction - \"quantiles\": Return predictions for specified quantiles - \"full\": Return full prediction details - \"main\": Return main prediction metrics quantiles : list[float] or None, default=None Quantiles to compute when output_type=\"quantiles\". Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor.predict--returns","title":"Returns","text":"array-like or dict The predicted values.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.validate_data_size","title":"validate_data_size","text":"validate_data_size(\n X: ndarray, y: Union[ndarray, None] = None\n)\n
Check the integrity of the training data. - check if the number of rows between X and y is consistent if y is not None (ValueError) - check if the number of rows is less than MAX_ROWS (ValueError) - check if the number of columns is less than MAX_COLS (ValueError)
"},{"location":"reference/tabpfn_client/prompt_agent/","title":"Prompt agent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent","title":"prompt_agent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent.PromptAgent","title":"PromptAgent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent.PromptAgent.password_req_to_policy","title":"password_req_to_policystaticmethod
","text":"password_req_to_policy(password_req: list[str])\n
Small function that receives password requirements as a list of strings like \"Length(8)\" and returns a corresponding PasswordPolicy object.
"},{"location":"reference/tabpfn_client/service_wrapper/","title":"Service wrapper","text":""},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper","title":"service_wrapper","text":""},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.InferenceClient","title":"InferenceClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle inference, including: - fitting - prediction
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserAuthenticationClient","title":"UserAuthenticationClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle user authentication, including: - user registration and login - access token caching
This is implemented as a singleton class with classmethods.
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserAuthenticationClient.try_browser_login","title":"try_browser_loginclassmethod
","text":"try_browser_login() -> tuple[bool, str]\n
Try to authenticate using browser-based login
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserDataClient","title":"UserDataClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle user data, including: - query, or delete user account data - query, download, or delete uploaded data
"},{"location":"reference/tabpfn_client/tabpfn_common_utils/regression_pred_result/","title":"Regression pred result","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/regression_pred_result/#tabpfn_client.tabpfn_common_utils.regression_pred_result","title":"regression_pred_result","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/utils/#tabpfn_client.tabpfn_common_utils.utils","title":"utils","text":""},{"location":"reference/tabpfn_extensions/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils","title":"utils","text":""},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils.get_tabpfn_models","title":"get_tabpfn_models","text":"get_tabpfn_models() -> Tuple[Type, Type, Type]\n
Get TabPFN models with fallback between local and client versions.
"},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils.is_tabpfn","title":"is_tabpfn","text":"is_tabpfn(estimator: Any) -> bool\n
Check if an estimator is a TabPFN model.
"},{"location":"reference/tabpfn_extensions/utils_todo/","title":"Utils todo","text":""},{"location":"reference/tabpfn_extensions/utils_todo/#tabpfn_extensions.utils_todo","title":"utils_todo","text":""},{"location":"reference/tabpfn_extensions/utils_todo/#tabpfn_extensions.utils_todo.infer_categorical_features","title":"infer_categorical_features","text":"infer_categorical_features(\n X: ndarray, categorical_features\n) -> List[int]\n
Infer the categorical features from the input data. We take self.categorical_features
as the initial list of categorical features.
Parameters:
Name Type Description DefaultX
ndarray
The input data.
requiredReturns:
Type DescriptionList[int]
Tuple[int, ...]: The indices of the categorical features.
"},{"location":"reference/tabpfn_extensions/benchmarking/experiment/","title":"Experiment","text":""},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment","title":"experiment","text":""},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment.Experiment","title":"Experiment","text":"Base class for experiments. Experiments should be reproducible, i.e. the settings should give all the information needed to run the experiment. Experiments should be deterministic, i.e. the same settings should always give the same results.
"},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment.Experiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
Runs the experiment.
Should set self.results
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/","title":"Classifier as regressor","text":""},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor","title":"classifier_as_regressor","text":""},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor","title":"ClassifierAsRegressor","text":" Bases: RegressorMixin
Wrapper class to use a classifier as a regressor.
This class takes a classifier estimator and converts it into a regressor by encoding the target labels and treating the regression problem as a classification task.
Parameters:
Name Type Description Defaultestimator
object Classifier estimator to be used as a regressor.
requiredAttributes:
Name Type Descriptionlabel_encoder_
LabelEncoder Label encoder used to transform target regression labels to classes.
y_train_
array-like of shape (n_samples,) Transformed target labels used for training.
categorical_features
list List of categorical feature indices.
Example>>> from sklearn.datasets import load_diabetes\n>>> from sklearn.model_selection import train_test_split\n>>> from tabpfn_extensions import ManyClassClassifier, TabPFNClassifier, ClassifierAsRegressor\n>>> x, y = load_diabetes(return_X_y=True)\n>>> x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)\n>>> clf = TabPFNClassifier()\n>>> clf = ManyClassClassifier(clf, n_estimators=10, alphabet_size=clf.max_num_classes_)\n>>> reg = ClassifierAsRegressor(clf)\n>>> reg.fit(x_train, y_train)\n>>> y_pred = reg.predict(x_test)\n
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.fit","title":"fit","text":"fit(X, y)\n
Fit the classifier as a regressor.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Training data.
requiredy
array-like of shape (n_samples,) Target labels.
requiredReturns:
Name Type Descriptionself
object Fitted estimator.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.get_optimization_mode","title":"get_optimization_mode","text":"get_optimization_mode()\n
Get the optimization mode for the regressor.
Returns:
Type Descriptionstr Optimization mode (\"mean\").
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.predict","title":"predict","text":"predict(X)\n
Predict the target values for the input data.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Input data.
requiredReturns:
Name Type Descriptiony_pred
array-like of shape (n_samples,) Predicted target values.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.predict_full","title":"predict_full","text":"predict_full(X)\n
Predict the full set of output values for the input data.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Input data.
requiredReturns:
Type Descriptiondict Dictionary containing the predicted output values, including: - \"mean\": Predicted mean values. - \"median\": Predicted median values. - \"mode\": Predicted mode values. - \"logits\": Predicted logits. - \"buckets\": Predicted bucket probabilities. - \"quantile_{q:.2f}\": Predicted quantile values for each quantile q.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.probabilities_to_logits_multiclass","title":"probabilities_to_logits_multiclassstaticmethod
","text":"probabilities_to_logits_multiclass(\n probabilities, eps=1e-06\n)\n
Convert probabilities to logits for a multi-class problem.
Parameters:
Name Type Description Defaultprobabilities
array-like of shape (n_samples, n_classes) Input probabilities for each class.
requiredeps
float, default=1e-6 Small value to avoid division by zero or taking logarithm of zero.
1e-06
Returns:
Name Type Descriptionlogits
array-like of shape (n_samples, n_classes) Output logits for each class.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Set the categorical feature indices.
Parameters:
Name Type Description Defaultcategorical_features
list List of categorical feature indices.
required"},{"location":"reference/tabpfn_extensions/hpo/search_space/","title":"Search space","text":""},{"location":"reference/tabpfn_extensions/hpo/search_space/#tabpfn_extensions.hpo.search_space","title":"search_space","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/","title":"Tuned tabpfn","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn","title":"tuned_tabpfn","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNBase","title":"TunedTabPFNBase","text":" Bases: BaseEstimator
Base class for tuned TabPFN models with proper categorical handling.
"},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNClassifier","title":"TunedTabPFNClassifier","text":" Bases: TunedTabPFNBase
, ClassifierMixin
TabPFN Classifier with hyperparameter tuning and proper categorical handling.
"},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNRegressor","title":"TunedTabPFNRegressor","text":" Bases: TunedTabPFNBase
, RegressorMixin
TabPFN Regressor with hyperparameter tuning and proper categorical handling.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/","title":"Experiments","text":""},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments","title":"experiments","text":""},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionExperiment","title":"FeatureSelectionExperiment","text":" Bases: Experiment
This class is used to run experiments on generating synthetic data.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
Parameters: tabpfn, and kwargs, where kwargs may include indices: a list of indices of the X features to use.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionInPredictExperiment","title":"FeatureSelectionInPredictExperiment","text":" Bases: Experiment
This class is used to run experiments on generating synthetic data.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionInPredictExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
Parameters: tabpfn, and kwargs, where kwargs may include indices: a list of indices of the X features to use.
"},{"location":"reference/tabpfn_extensions/interpretability/feature_selection/","title":"Feature selection","text":""},{"location":"reference/tabpfn_extensions/interpretability/feature_selection/#tabpfn_extensions.interpretability.feature_selection","title":"feature_selection","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/","title":"Shap","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap","title":"shap","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap.get_shap_values","title":"get_shap_values","text":"get_shap_values(\n estimator, test_x, attribute_names=None, **kwargs\n) -> ndarray\n
Computes SHAP (SHapley Additive exPlanations) values for the model's predictions on the given input features.
Parameters:
Name Type Description Defaulttest_x
Union[DataFrame, ndarray]
The input features to compute SHAP values for.
requiredkwargs
dict
Additional keyword arguments to pass to the SHAP explainer.
{}
Returns:
Type Descriptionndarray
np.ndarray: The computed SHAP values.
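A sketch of computing and plotting SHAP values (clf and x_test stand for a fitted TabPFN estimator and its test data):
>>> from tabpfn_extensions.interpretability.shap import get_shap_values, plot_shap
>>> shap_values = get_shap_values(clf, x_test)
>>> plot_shap(shap_values)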
"},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap.plot_shap","title":"plot_shap","text":"plot_shap(shap_values: ndarray)\n
Plots the shap values for the given test data. It will plot aggregated shap values for each feature, as well as per sample shap values. Additionally, if multiple samples are provided, it will plot the 3 most important interactions with the most important feature.
Parameters:
Name Type Description Defaultshap_values
ndarray
required"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/","title":"Many class classifier","text":""},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier","title":"many_class_classifier","text":""},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier","title":"ManyClassClassifier","text":" Bases: OutputCodeClassifier
Output-Code multiclass strategy with deciary codebook.
This class extends the original OutputCodeClassifier to support n-ary codebooks (with n=alphabet_size), allowing for handling more classes.
Parameters:
Name Type Description Defaultestimator
estimator object An estimator object implementing fit and one of decision_function or predict_proba. The base classifier should be able to handle up to alphabet_size classes.
random_state
int, RandomState instance, default=None The generator used to initialize the codebook. Pass an int for reproducible output across multiple function calls. See Glossary <random_state>.
None
Attributes:
Name Type Descriptionestimators_
list of int(n_classes * code_size) estimators
Estimators used for predictions.
classes_
ndarray of shape (n_classes,) Array containing labels.
code_book_
ndarray of shape (n_classes, len(estimators_))
Deciary array containing the code of each class.
>>> from sklearn.datasets import load_iris\n>>> from tabpfn.scripts.estimator import ManyClassClassifier, TabPFNClassifier\n>>> from sklearn.model_selection import train_test_split\n>>> x, y = load_iris(return_X_y=True)\n>>> x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)\n>>> clf = TabPFNClassifier()\n>>> clf = ManyClassClassifier(clf, alphabet_size=clf.max_num_classes_)\n>>> clf.fit(x_train, y_train)\n>>> clf.predict(x_test)\n
"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier.fit","title":"fit","text":"fit(X, y, **fit_params)\n
Fit underlying estimators.
Parameters:
Name Type Description DefaultX
{array-like, sparse matrix} of shape (n_samples, n_features) Data.
requiredy
array-like of shape (n_samples,) Multi-class targets.
required**fit_params
dict Parameters passed to the estimator.fit method of each sub-estimator.
{}
Returns:
Name Type Descriptionself
object Returns a fitted instance of self.
"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict probabilities using the underlying estimators.
Parameters:
Name Type Description DefaultX
{array-like, sparse matrix} of shape (n_samples, n_features) Data.
requiredReturns:
Name Type Descriptionp
ndarray of shape (n_samples, n_classes) Returns the probability of the samples for each class in the model, where classes are ordered as they are in self.classes_.
Bases: ABC
, BaseEstimator
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
Returns:
Either only OOF predictions, or OOF predictions and the loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/abstract_validation_utils/#tabpfn_extensions.post_hoc_ensembles.abstract_validation_utils.AbstractValidationUtils.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/","title":"Greedy weighted ensemble","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble","title":"greedy_weighted_ensemble","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsemble","title":"GreedyWeightedEnsemble","text":" Bases: AbstractValidationUtils
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
Returns:
Either only OOF predictions, or OOF predictions and the loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsemble.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleClassifier","title":"GreedyWeightedEnsembleClassifier","text":" Bases: GreedyWeightedEnsemble
, AbstractValidationUtilsClassification
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
- X (ndarray, required): training data (features).
- y (ndarray, required): training labels.
- return_loss_per_estimator (bool, default False): if True, also return the loss per estimator.
- impute_dropped_instances (bool, default True): if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
- _extra_processing (bool, default False)
Returns:
- list[ndarray] | tuple[list[ndarray], list[float]]: either only OOF predictions, or OOF predictions and loss per estimator.
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleClassifier.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleRegressor","title":"GreedyWeightedEnsembleRegressor","text":" Bases: GreedyWeightedEnsemble
, AbstractValidationUtilsRegression
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
- X (ndarray, required): training data (features).
- y (ndarray, required): training labels.
- return_loss_per_estimator (bool, default False): if True, also return the loss per estimator.
- impute_dropped_instances (bool, default True): if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
- _extra_processing (bool, default False)
Returns:
- list[ndarray] | tuple[list[ndarray], list[float]]: either only OOF predictions, or OOF predictions and loss per estimator.
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleRegressor.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.caruana_weighted","title":"caruana_weighted","text":"caruana_weighted(\n predictions: list[ndarray],\n labels: ndarray,\n seed,\n n_iterations,\n loss_function,\n)\n
Caruana's ensemble selection with replacement.
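For intuition, here is a minimal, self-contained sketch of Caruana-style greedy selection with replacement; the function name and structure are illustrative, not the module's internals:
import numpy as np\n\ndef greedy_selection(predictions, labels, n_iterations, loss_function):\n    # Repeatedly add (with replacement) the base model whose inclusion\n    # most lowers the ensemble's validation loss.\n    chosen = []\n    running_sum = np.zeros_like(predictions[0])\n    for _ in range(n_iterations):\n        losses = [\n            loss_function(labels, (running_sum + p) / (len(chosen) + 1))\n            for p in predictions\n        ]\n        best = int(np.argmin(losses))\n        chosen.append(best)\n        running_sum += predictions[best]\n    counts = np.bincount(chosen, minlength=len(predictions))\n    return counts / counts.sum()  # selection counts become ensemble weights\n
Models selected several times receive proportionally larger weights, which is what makes selection "with replacement" useful.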
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/","title":"Pfn phe","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe","title":"pfn_phe","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor","title":"AutoPostHocEnsemblePredictor","text":" Bases: BaseEstimator
A wrapper for effectively performing post hoc ensembling with TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.fit","title":"fit","text":"fit(\n X: ndarray,\n y: ndarray,\n categorical_feature_indices: list[int] | None = None,\n) -> AutoPostHocEnsemblePredictor\n
Fits the post hoc ensemble on the given data.
Parameters:
- X (ndarray, required): the input data to fit the ensemble on.
- y (ndarray, required): the target values to fit the ensemble on.
- categorical_feature_indices (list[int] | None, default None): the indices of the categorical features in the data. If None, no categorical features are assumed to be present.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.predict","title":"predict","text":"predict(X: ndarray) -> ndarray\n
Predicts the target values for the given data.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.predict_proba","title":"predict_proba","text":"predict_proba(X: ndarray) -> ndarray\n
Predicts the class probabilities for the given data.
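A minimal usage sketch based on the fit, predict, and predict_proba methods documented above; it assumes the default constructor is usable (the constructor options live in pfn_phe and are not shown here):
from sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions.post_hoc_ensembles.pfn_phe import AutoPostHocEnsemblePredictor\n\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)\n\n# Assumption: default constructor arguments; see pfn_phe for all options.\npredictor = AutoPostHocEnsemblePredictor()\npredictor.fit(X_train, y_train)\nproba = predictor.predict_proba(X_test)  # shape (n_samples, n_classes)\nlabels = predictor.predict(X_test)\n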
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/","title":"Save splitting","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting","title":"save_splitting","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.assert_valid_splits","title":"assert_valid_splits","text":"assert_valid_splits(\n splits: list[list[list[int], list[int]]],\n y: ndarray,\n *,\n non_empty: bool = True,\n each_selected_class_in_each_split_subset: bool = True,\n same_length_training_splits: bool = True\n)\n
Verify that the splits are valid.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.fix_split_by_dropping_classes","title":"fix_split_by_dropping_classes","text":"fix_split_by_dropping_classes(\n x: ndarray,\n y: ndarray,\n n_splits: int,\n spliter_kwargs: dict,\n) -> list[list[list[int], list[int]]]\n
Fixes stratified splits for an edge case.
For each class that has fewer instances than the number of splits, we oversample before splitting into n_splits and then remove all oversampled and original samples of that class from the splits, effectively removing the class from the data without touching the indices.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.get_cv_split_for_data","title":"get_cv_split_for_data","text":"get_cv_split_for_data(\n x: ndarray,\n y: ndarray,\n splits_seed: int,\n n_splits: int,\n *,\n stratified_split: bool,\n safety_shuffle: bool = True,\n auto_fix_stratified_splits: bool = False,\n force_same_length_training_splits: bool = False\n) -> list[list[list[int], list[int]]] | str\n
Safety shuffle and generate (safe) splits.
If the first entry of the return value is a str, no valid split could be generated, and the str states the reason. Due to the safety shuffle, x and y are also returned and must be used.
Note: the function does not support repeated splits at this point. Simply call this function multiple times with different seeds to get repeated splits.
Test with:
import numpy as np\n\nif __name__ == \"__main__\":\n    print(\n        get_cv_split_for_data(\n            x=np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]).T,\n            y=np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]),\n            splits_seed=42,\n            n_splits=3,\n            stratified_split=True,\n            auto_fix_stratified_splits=True,\n        )\n    )\n
Parameters:
- x (ndarray, required): the data to split.
- y (ndarray, required): the labels to split.
- splits_seed (int, required): the seed to use for the splits, or a RandomState object.
- n_splits (int, required): the number of splits to generate.
- stratified_split (bool, required): whether to use stratified splits.
- safety_shuffle (bool, default True): whether to shuffle the data before splitting.
- auto_fix_stratified_splits (bool, default False): whether to try to fix stratified splits automatically, by dropping classes with fewer than n_splits samples.
- force_same_length_training_splits (bool, default False): whether to force all training splits to have the same number of samples, by duplicating random instances in the training subset of any split that is too small.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/","title":"Sklearn interface","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface","title":"sklearn_interface","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNClassifier","title":"AutoTabPFNClassifier","text":" Bases: ClassifierMixin
, BaseEstimator
Automatic Post Hoc Ensemble Classifier for TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNClassifier--parameters","title":"Parameters","text":"max_time : int | None, default=None\n The maximum time to spend on fitting the post hoc ensemble.\npreset: {\"default\", \"custom_hps\", \"avoid_overfitting\"}, default=\"default\"\n The preset to use for the post hoc ensemble.\nges_scoring_string : str, default=\"roc\"\n The scoring string to use for the greedy ensemble search.\n Allowed values are: {\"accuracy\", \"roc\" / \"auroc\", \"f1\", \"log_loss\"}.\ndevice : {\"cpu\", \"cuda\"}, default=\"cuda\"\n The device to use for training and prediction.\nrandom_state : int, RandomState instance or None, default=None\n Controls both the randomness base models and the post hoc ensembling method.\ncategorical_feature_indices: list[int] or None, default=None\n The indices of the categorical features in the input data. Can also be passed to `fit()`.\nphe_init_args : dict | None, default=None\n The initialization arguments for the post hoc ensemble predictor.\n See post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more options and all details.\n
predictor_ : AutoPostHocEnsemblePredictor\n The predictor interface used to make predictions, see post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more.\nphe_init_args_ : dict\n The optional initialization arguments used for the post hoc ensemble predictor.\n
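For illustration, a short sketch that wires together the documented constructor arguments; the values chosen here are arbitrary examples, not recommended defaults:
from sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier\n\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\nclf = AutoTabPFNClassifier(\n    max_time=60,                    # seconds to spend fitting the ensemble\n    preset='avoid_overfitting',     # one of the documented presets\n    ges_scoring_string='log_loss',  # metric for the greedy ensemble search\n    device='cpu',\n    random_state=0,\n)\nclf.fit(X_train, y_train)\nproba = clf.predict_proba(X_test)\n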
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNRegressor","title":"AutoTabPFNRegressor","text":" Bases: RegressorMixin
, BaseEstimator
Automatic Post Hoc Ensemble Regressor for TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNRegressor--parameters","title":"Parameters","text":"max_time : int | None, default=None\n The maximum time to spend on fitting the post hoc ensemble.\npreset: {\"default\", \"custom_hps\", \"avoid_overfitting\"}, default=\"default\"\n The preset to use for the post hoc ensemble.\nges_scoring_string : str, default=\"mse\"\n The scoring string to use for the greedy ensemble search.\n Allowed values are: {\"rmse\", \"mse\", \"mae\"}.\ndevice : {\"cpu\", \"cuda\"}, default=\"cuda\"\n The device to use for training and prediction.\nrandom_state : int, RandomState instance or None, default=None\n Controls both the randomness base models and the post hoc ensembling method.\ncategorical_feature_indices: list[int] or None, default=None\n The indices of the categorical features in the input data. Can also be passed to `fit()`.\nphe_init_args : dict | None, default=None\n The initialization arguments for the post hoc ensemble predictor.\n See post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more options and all details.\n
predictor_ : AutoPostHocEnsemblePredictor\n The predictor interface used to make predictions, see post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more.\nphe_init_args_ : dict\n The optional initialization arguments used for the post hoc ensemble predictor.\n
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/","title":"SklearnBasedDecisionTreeTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN","title":"SklearnBasedDecisionTreeTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase","title":"DecisionTreeTabPFNBase","text":" Bases: BaseDecisionTree
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply tree for different kinds of tree types. TODO: this function could also be overridden in each type of tree.
:param bootstrap_X: :return: output of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for adaptive tree. - If not None: Prunes nodes based on the performance of the holdout data y - If None: Predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier","title":"DecisionTreeTabPFNClassifier","text":" Bases: ClassifierMixin
, DecisionTreeTabPFNBase
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply tree for different kinds of tree types. TODO: this function could also be overridden in each type of tree.
:param bootstrap_X: :return: output of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict","title":"predict","text":"predict(X, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param check_input: :return: Labels of the predictions
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for adaptive tree. - If not None: Prunes nodes based on the performance of the holdout data y - If None: Predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor","title":"DecisionTreeTabPFNRegressor","text":" Bases: RegressorMixin
, DecisionTreeTabPFNBase
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply tree for different kinds of tree types. TODO: this function could also be overridden in each type of tree.
:param bootstrap_X: :return: output of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict","title":"predict","text":"predict(X, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param check_input: :return: Labels of the predictions
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for adaptive tree. - If not None: Prunes nodes based on the performance of the holdout data y - If None: Predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict_full","title":"predict_full","text":"predict_full(X)\n
Predicts X :param X: Data that should be evaluated :return: Labels of the predictions
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/","title":"SklearnBasedRandomForestTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN","title":"SklearnBasedRandomForestTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase","title":"RandomForestTabPFNBase","text":"Base Class for common functionalities.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier","title":"RandomForestTabPFNClassifier","text":" Bases: RandomForestTabPFNBase
, RandomForestClassifier
RandomForestTabPFNClassifier.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict","title":"predict","text":"predict(X)\n
Predict class for X.
The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
y : ndarray of shape (n_samples,) or (n_samples, n_outputs) The predicted classes.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict_proba--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
p : ndarray of shape (n_samples, n_classes), or a list of such arrays The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_
.
set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
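A usage sketch under the assumption that the default constructor works out of the box; if your TabPFN setup needs explicit configuration, pass it through the constructor instead:
from sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN import RandomForestTabPFNClassifier\n\nX, y = load_iris(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)\n\nrf = RandomForestTabPFNClassifier()  # assumption: defaults are usable\nrf.set_categorical_features([])      # iris has no categorical columns\nrf.fit(X_train, y_train)\nproba = rf.predict_proba(X_test)     # mean class probabilities across trees\n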
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor","title":"RandomForestTabPFNRegressor","text":" Bases: RandomForestTabPFNBase
, RandomForestRegressor
RandomForestTabPFNRegressor.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.predict","title":"predict","text":"predict(X)\n
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the trees in the forest.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.predict--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
y : ndarray of shape (n_samples,) or (n_samples, n_outputs) The predicted values.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/configs/","title":"Configs","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/configs/#tabpfn_extensions.rf_pfn.configs","title":"configs","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/utils/#tabpfn_extensions.rf_pfn.utils","title":"utils","text":"Copyright 2023
Author: Lukas Schweizer schweizer.lukas@web.de
"},{"location":"reference/tabpfn_extensions/rf_pfn/utils/#tabpfn_extensions.rf_pfn.utils.preprocess_data","title":"preprocess_data","text":"preprocess_data(\n data,\n nan_values=True,\n one_hot_encoding=False,\n normalization=True,\n categorical_indices=None,\n)\n
This method preprocesses data regarding missing values, categorical features, and data normalization (for the kNN model). :param data: Data to preprocess :param nan_values: Preprocesses NaN values if True :param one_hot_encoding: Whether to use one-hot encoding for categorical features :param normalization: Normalizes data if True :param categorical_indices: Categorical columns of the data :return: Preprocessed version of the data
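A small sketch exercising the documented signature; the exact return format is assumed to mirror the input:
import numpy as np\nfrom tabpfn_extensions.rf_pfn.utils import preprocess_data\n\ndata = np.array([[1.0, np.nan, 0.0], [2.0, 3.0, 1.0], [4.0, 5.0, 0.0]])\n\n# Impute NaNs, one-hot encode column 2, and normalize the numeric columns.\nclean = preprocess_data(\n    data,\n    nan_values=True,\n    one_hot_encoding=True,\n    normalization=True,\n    categorical_indices=[2],\n)\n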
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/","title":"Scoring utils","text":""},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils","title":"scoring_utils","text":""},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.safe_roc_auc_score","title":"safe_roc_auc_score","text":"safe_roc_auc_score(y_true, y_score, **kwargs)\n
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC) score.
This function is a safe wrapper around sklearn.metrics.roc_auc_score
that handles cases where the input data may have missing classes or binary classification problems.
Parameters:
- y_true (array-like of shape (n_samples,), required): true binary labels or binary label indicators.
- y_score (array-like of shape (n_samples,) or (n_samples, n_classes), required): target scores; can be probability estimates of the positive class, confidence values, or a non-thresholded measure of decisions.
- **kwargs (dict): additional keyword arguments to pass to sklearn.metrics.roc_auc_score.
Returns:
- float: the ROC AUC score.
Raises:
- ValueError: if there are missing classes in y_true that cannot be handled.
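A minimal usage sketch; the fallback behavior for missing classes happens inside the wrapper and is not shown:
import numpy as np\nfrom tabpfn_extensions.scoring.scoring_utils import safe_roc_auc_score\n\ny_true = np.array([0, 1, 1, 0, 1])\ny_score = np.array([0.2, 0.9, 0.7, 0.4, 0.6])  # probability of the positive class\nprint(safe_roc_auc_score(y_true, y_score))\n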
score_classification(\n optimize_metric: Literal[\n \"roc\", \"auroc\", \"accuracy\", \"f1\", \"log_loss\"\n ],\n y_true,\n y_pred,\n sample_weight=None,\n *,\n y_pred_is_labels: bool = False\n)\n
General function to score classification predictions.
Parameters:
- optimize_metric ({\"roc\", \"auroc\", \"accuracy\", \"f1\", \"log_loss\"}, required): the metric to use for scoring the predictions.
- y_true (array-like of shape (n_samples,), required): true labels or binary label indicators.
- y_pred (array-like of shape (n_samples,) or (n_samples, n_classes), required): predicted labels, probabilities, or confidence values.
- sample_weight (array-like of shape (n_samples,), default None): sample weights.
- y_pred_is_labels (bool, default False): whether y_pred contains labels rather than probabilities.
Returns:
- float: the score for the specified metric.
Raises:
- ValueError: if an unknown metric is specified.
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.score_regression","title":"score_regression","text":"score_regression(\n optimize_metric: Literal[\"rmse\", \"mse\", \"mae\"],\n y_true,\n y_pred,\n sample_weight=None,\n)\n
General function to score regression predictions.
Parameters:
- optimize_metric ({\"rmse\", \"mse\", \"mae\"}, required): the metric to use for scoring the predictions.
- y_true (array-like of shape (n_samples,), required): true target values.
- y_pred (array-like of shape (n_samples,), required): predicted target values.
- sample_weight (array-like of shape (n_samples,), default None): sample weights.
Returns:
- float: the score for the specified metric.
Raises:
- ValueError: if an unknown metric is specified.
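A short usage sketch, assuming the score comes back as a plain float as documented:
import numpy as np\nfrom tabpfn_extensions.scoring.scoring_utils import score_regression\n\ny_true = np.array([1.0, 2.0, 3.0])\ny_pred = np.array([1.1, 1.9, 3.2])\nprint(score_regression('rmse', y_true, y_pred))  # root mean squared error\nprint(score_regression('mae', y_true, y_pred))   # mean absolute error\n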
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.score_survival","title":"score_survival","text":"score_survival(\n optimize_metric: Literal[\"cindex\"],\n y_true,\n y_pred,\n event_observed,\n sample_weight=None,\n)\n
General function to score survival predictions.
Parameters:
- optimize_metric ({\"cindex\"}, required): the metric to use for scoring the predictions.
- y_true (array-like of shape (n_samples,), required): true target values.
- y_pred (array-like of shape (n_samples,), required): predicted target values.
- event_observed (array-like of shape (n_samples,), required): indicator of whether the event was observed for each sample.
- sample_weight (array-like of shape (n_samples,), default None): sample weights.
Returns:
- float: the score for the specified metric.
Raises:
- ValueError: if an unknown metric is specified.
"},{"location":"reference/tabpfn_extensions/sklearn_ensembles/configs/","title":"Configs","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/configs/#tabpfn_extensions.sklearn_ensembles.configs","title":"configs","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/","title":"Meta models","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/#tabpfn_extensions.sklearn_ensembles.meta_models","title":"meta_models","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/#tabpfn_extensions.sklearn_ensembles.meta_models.get_tabpfn_outer_ensemble","title":"get_tabpfn_outer_ensemble","text":"get_tabpfn_outer_ensemble(config: TabPFNConfig, **kwargs)\n
This will create a model very similar to our standard TabPFN estimators, but it uses multiple model weights to generate predictions. Thus the configs.TabPFNModelPathsConfig
can contain multiple paths which are all used.
A product of the preprocessor_transforms and paths is created to yield interesting ensemble members.
This only supports multiclass for now. If you want to add regression, you probably want to add the y_transforms to the relevant_config_product. :param config: TabPFNConfig :param kwargs: kwargs are passed to get_single_tabpfn, e.g. device :return: A TabPFNEnsemble, which is a soft voting classifier that mixes multiple standard TabPFN estimators.
"},{"location":"reference/tabpfn_extensions/sklearn_ensembles/weighted_ensemble/","title":"Weighted ensemble","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/weighted_ensemble/#tabpfn_extensions.sklearn_ensembles.weighted_ensemble","title":"weighted_ensemble","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/","title":"Experiments","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments","title":"experiments","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.EmbeddingUnsupervisedExperiment","title":"EmbeddingUnsupervisedExperiment","text":" Bases: Experiment
This class is used to run experiments on synthetic toy functions.
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.GenerateSyntheticDataExperiment","title":"GenerateSyntheticDataExperiment","text":" Bases: Experiment
This class is used to run experiments on generating synthetic data.
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.GenerateSyntheticDataExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
:param tabpfn: the unsupervised TabPFN model to use :param kwargs: indices: list of indices of X features to use :return:
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.OutlierDetectionUnsupervisedExperiment","title":"OutlierDetectionUnsupervisedExperiment","text":" Bases: Experiment
This class is used to run experiments for outlier detection.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/","title":"Unsupervised","text":""},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised","title":"unsupervised","text":""},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel","title":"TabPFNUnsupervisedModel","text":" Bases: BaseEstimator
TabPFN model for unsupervised tasks: imputation, outlier detection, and synthetic data generation.
This model combines a TabPFNClassifier for categorical features and a TabPFNRegressor for numerical features to perform various unsupervised learning tasks on tabular data.
Parameters:
- tabpfn_clf (TabPFNClassifier, optional, default None): TabPFNClassifier instance for handling categorical features. If not provided, the model assumes that there are no categorical features in the data.
- tabpfn_reg (TabPFNRegressor, optional, default None): TabPFNRegressor instance for handling numerical features. If not provided, the model assumes that there are no numerical features in the data.
Attributes:
- categorical_features (list): list of indices of categorical features in the input data.
Example: >>> tabpfn_clf = TabPFNClassifier()\n>>> tabpfn_reg = TabPFNRegressor()\n>>> model = TabPFNUnsupervisedModel(tabpfn_clf, tabpfn_reg)\n>>>\n>>> X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]\n>>> model.fit(X)\n>>>\n>>> X_imputed = model.impute(X)\n>>> X_outliers = model.outliers(X)\n>>> X_synthetic = model.generate_synthetic_data(n_samples=100)\n
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.fit","title":"fit","text":"fit(X: ndarray, y: Optional[ndarray] = None) -> None\n
Fit the model to the input data.
Parameters:
- X (array-like of shape (n_samples, n_features), required): input data to fit the model.
- y (array-like of shape (n_samples,), optional, default None): target values.
Returns:
- self (TabPFNUnsupervisedModel): the fitted model.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.generate_synthetic_data","title":"generate_synthetic_data","text":"generate_synthetic_data(\n n_samples=100, t=1.0, n_permutations=3\n)\n
Generate synthetic data using the trained models. Synthetic data is produced via the imputation method, applied to a matrix of NaNs. Samples are generated feature by feature in one pass, so per feature the generated samples are independent of one another.
Parameters:
- n_samples (int, default 100): number of synthetic samples to generate.
- t (float, default 1.0): temperature for sampling from the imputation distribution. Lower values result in more deterministic samples.
Returns:
- torch.Tensor of shape (n_samples, n_features): generated synthetic data.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.get_embeddings","title":"get_embeddings","text":"get_embeddings(\n X: tensor, per_column: bool = False\n) -> tensor\n
Get the transformer embeddings for the test data X.
Parameters:
- X (tensor, required): the test data.
- per_column (bool, default False): if True, compute embeddings per column (see get_embeddings_per_column).
Returns:
- tensor: torch.Tensor of shape (n_samples, embedding_dim).
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.get_embeddings_per_column","title":"get_embeddings_per_column","text":"get_embeddings_per_column(X: tensor) -> tensor\n
Alternative implementation for get_embeddings, where we get the embeddings for each column as a label separately and concatenate the results. This alternative needs more passes but might be more accurate.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute","title":"impute","text":"impute(\n X: tensor, t: float = 1e-09, n_permutations: int = 10\n) -> tensor\n
Impute missing values in the input data.
Parameters:
- X (torch.Tensor of shape (n_samples, n_features), required): input data with missing values encoded as np.nan.
- t (float, default 1e-09): temperature for sampling from the imputation distribution. Lower values result in more deterministic imputations.
- n_permutations (int, default 10): number of feature permutations to average over.
Returns:
- tensor: torch.Tensor of shape (n_samples, n_features), imputed data with missing values replaced.
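A minimal imputation sketch following the unsupervised tutorial's setup; the fit-then-impute flow is documented, while the toy data here is an assumption:
import numpy as np\nimport torch\nfrom tabpfn_extensions import TabPFNClassifier, TabPFNRegressor, unsupervised\n\nmodel = unsupervised.TabPFNUnsupervisedModel(\n    tabpfn_clf=TabPFNClassifier(), tabpfn_reg=TabPFNRegressor()\n)\n\nX = np.random.rand(50, 4)\nX[0, 2] = np.nan  # cell to be imputed\nmodel.fit(X)\nX_imputed = model.impute(torch.tensor(X, dtype=torch.float32))\n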
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute_","title":"impute_","text":"impute_(\n X: tensor,\n t: float = 1e-09,\n n_permutations: int = 10,\n condition_on_all_features: bool = True,\n) -> tensor\n
Impute missing values (np.nan) in X by sampling all cells independently from the trained models
:param X: Input data of the shape (num_examples, num_features) with missing values encoded as np.nan :param t: Temperature for sampling from the imputation distribution, lower values are more deterministic :return: Imputed data, with missing values replaced
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute_single_permutation_","title":"impute_single_permutation_","text":"impute_single_permutation_(\n X: tensor,\n feature_permutation: list[int] | tuple[int],\n t: float = 1e-09,\n condition_on_all_features: bool = True,\n) -> tensor\n
Impute missing values (np.nan) in X by sampling all cells independently from the trained models
:param X: Input data of the shape (num_examples, num_features) with missing values encoded as np.nan :param t: Temperature for sampling from the imputation distribution, lower values are more deterministic :return: Imputed data, with missing values replaced
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.outliers","title":"outliers","text":"outliers(X: tensor, n_permutations: int = 10) -> tensor\n
Preferred implementation for outliers: we calculate the sample probability for each sample in X by multiplying the probabilities of each feature according to the chain rule of probability. The first feature is estimated by using a zero feature as input.
Parameters:
- X (tensor, required): samples to calculate the sample probability for, shape (n_samples, n_features).
- n_permutations (int, default 10): number of feature permutations to average over.
Returns:
- tensor: unnormalized sample probability for each sample in X, shape (n_samples,).
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.efficient_random_permutation_","title":"efficient_random_permutation_","text":"efficient_random_permutation_(indices)\n
Generate a single random permutation from a very large space.
:param n: The size of the permutation (number of elements) :return: A list representing a random permutation of numbers from 0 to n-1
"},{"location":"research/papers/","title":"Papers","text":""},{"location":"research/papers/#tabpfn-followups","title":"TabPFN Followups","text":"Forecastpfn: Synthetically-trained zero-shot forecasting Dooley, Khurana, Mohapatra, Naidu, White Advances in Neural Information Processing Systems, 2024, Volume 36.
Interpretable machine learning for TabPFN Rundel, Kobialka, von Crailsheim, Feurer, Nagler, Rügamer World Conference on Explainable Artificial Intelligence, 2024, Pages 465--476.
Scaling tabpfn: Sketching and feature selection for tabular prior-data fitted networks Feuer, Hegde, Cohen arXiv preprint arXiv:2311.10609, 2023.
In-Context Data Distillation with TabPFN Ma, Thomas, Yu, Caterini arXiv preprint arXiv:2402.06971, 2024.
Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification Liu, Yang, Liang, Pang, Zou arXiv preprint arXiv:2406.06891, 2024.
Towards Localization via Data Embedding for TabPFN Koshil, Nagler, Feurer, Eggensperger NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Enhancing Classification Performance Through the Synergistic Use of XGBoost, TABPFN, and LGBM Models Prabowo, others 2023 15th International Congress on Advanced Applied Informatics Winter (IIAI-AAI-Winter), 2023, Pages 255--259.
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features Hoo, Müller, Salinas, Hutter NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
TabPFGen--Tabular Data Generation with TabPFN Ma, Dankar, Stein, Yu, Caterini arXiv preprint arXiv:2406.05216, 2024.
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data Helli, Schnurr, Hollmann, Müller, Hutter arXiv preprint arXiv:2411.10634, 2024.
TabFlex: Scaling Tabular Learning to Millions with Linear Attention Zeng, Kang, Mueller NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Retrieval & Fine-Tuning for In-Context Tabular Models Thomas, Ma, Hosseinzadeh, Golestan, Yu, Volkovs, Caterini arXiv preprint arXiv:2406.05207, 2024.
TabDPT: Scaling Tabular Foundation Models Ma, Thomas, Hosseinzadeh, Kamkari, Labach, Cresswell, Golestan, Yu, Volkovs, Caterini arXiv preprint arXiv:2410.18164, 2024.
Why In-Context Learning Transformers are Tabular Data Classifiers Breejen, Bae, Cha, Yun arXiv preprint arXiv:2405.13396, 2024.
MotherNet: Fast Training and Inference via Hyper-Network Transformers Mueller, Curino, Ramakrishnan NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Mixture of In-Context Prompters for Tabular PFNs Xu, Cirit, Asadi, Sun, Wang arXiv preprint arXiv:2405.16156, 2024.
Fast and Accurate Zero-Training Classification for Tabular Engineering Data Picard, Ahmed arXiv preprint arXiv:2401.06948, 2024.
Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning den Breejen, Bae, Cha, Kim, Koh, Yun NeurIPS 2023 Second Table Representation Learning Workshop, 2023.
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks Feuer, Schirrmeister, Cherepanova, Hegde, Hutter, Goldblum, Cohen, White arXiv preprint arXiv:2402.11137, 2024.
Exploration of autoregressive models for in-context learning on tabular data Baur, Kim NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting Margeloiu, Bazaga, Simidjievski, Liò, Jamnik arXiv preprint arXiv:2406.01805, 2024.
Large Scale Transfer Learning for Tabular Data via Language Modeling Gardner, Perdomo, Schmidt arXiv preprint arXiv:2406.12031, 2024.
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations Hu, Fountalis, Tian, Vasiloglou arXiv preprint arXiv:2406.16349, 2024.
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling Gorishniy, Kotelnikov, Babenko arXiv preprint arXiv:2410.24210, 2024.
Pre-Trained Tabular Transformer for Real-Time, Efficient, Stable Radiomics Data Processing: A Comprehensive Study Jiang, Jia, Zhang, Li 2023 IEEE International Conference on E-health Networking, Application & Services (Healthcom), 2023, Pages 276--281.
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models Margeloiu, Jiang, Simidjievski, Jamnik arXiv preprint arXiv:2409.16118, 2024.
Augmenting Small-size Tabular Data with Class-Specific Energy-Based Models Margeloiu, Jiang, Simidjievski, Jamnik NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
ForecastPFN: Zero-Shot Low-Resource Forecasting Khurana, Dooley, Naidu, White, AI No Source, No Year.
What exactly has TabPFN learned to do? McCarter The Third Blogpost Track at ICLR 2024, No Year.
Statistical foundations of prior-data fitted networks Nagler International Conference on Machine Learning, 2023, Pages 25660--25676.
Why In-Context Learning Transformers are Tabular Data Classifiers den Breejen, Bae, Cha, Yun arXiv e-prints, 2024.
"},{"location":"research/papers/#tabpfn-application","title":"TabPFN Application","text":"Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells Offensperger, Tin, Duran-Frigola, Hahn, Dobner, Ende, Strohbach, Rukavina, Brennsteiner, Ogilvie, others Science, 2024, Volume 384, Issue 6694, Pages eadk5864.
Deep learning for cross-selling health insurance classification Chu, Than, Jo 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST), 2024, Pages 453--457.
Early fault classification in rotating machinery with limited data using TabPFN Magadán, Roldán-Gómez, Granda, Suárez IEEE Sensors Journal, 2023.
Artificial intelligence-driven predictive framework for early detection of still birth Alzakari, Aldrees, Umer, Cascone, Innab, Ashraf SLAS technology, 2024, Volume 29, Issue 6, Pages 100203.
Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning El-Melegy, Mamdouh, Ali, Badawy, El-Ghar, Alghamdi, El-Baz Bioengineering, 2024, Volume 11, Issue 7, Pages 635.
A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy Karabacak, Schupper, Carr, Margetis Asian Spine Journal, 2024, Volume 18, Issue 4, Pages 541.
Comparing the Performance of a Deep Learning Model (TabPFN) for Predicting River Algal Blooms with Varying Data Composition Yang, Park Journal of Wetlands Research, 2024, Volume 26, Issue 3, Pages 197--203.
Adapting TabPFN for Zero-Inflated Metagenomic Data Perciballi, Granese, Fall, Zehraoui, Prifti, Zucker NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Comprehensive peripheral blood immunoprofiling reveals five immunotypes with immunotherapy response characteristics in patients with cancer Dyikanov, Zaitsev, Vasileva, Wang, Sokolov, Bolshakov, Frank, Turova, Golubeva, Gantseva, others Cancer Cell, 2024, Volume 42, Issue 5, Pages 759--779.
Predicting dementia in Parkinson's disease on a small tabular dataset using hybrid LightGBM--TabPFN and SHAP Tran, Byeon Digital Health, 2024, Volume 10, Pages 20552076241272585.
Enhancing actuarial non-life pricing models via transformers Brauer European Actuarial Journal, 2024, Pages 1--22.
Machine learning-based diagnostic prediction of minimal change disease: model development study Noda, Ichikawa, Shibagaki Scientific Reports, 2024, Volume 14, Issue 1, Pages 23460.
Using AutoML and generative AI to predict the type of wildfire propagation in Canadian conifer forests Khanmohammadi, Cruz, Perrakis, Alexander, Arashpour Ecological Informatics, 2024, Volume 82, Pages 102711.
Machine learning applications on lunar meteorite minerals: From classification to mechanical properties prediction Peña-Asensio, Trigo-Rodríguez, Sort, Ibáñez-Insa, Rimola International Journal of Mining Science and Technology, 2024.
Data-Driven Prognostication in Distal Medium Vessel Occlusions Using Explainable Machine Learning Karabacak, Ozkara, Faizy, Hardigan, Heit, Lakhani, Margetis, Mocco, Nael, Wintermark, others American Journal of Neuroradiology, 2024.
"},{"location":"tutorials/cheat_sheet/","title":"Cheat Sheet / Best practices","text":"Look at Autogluon cheat sheet [https://auto.gluon.ai/stable/cheatsheet.html]
"},{"location":"tutorials/classification/","title":"Classification","text":"TabPFN provides a powerful interface for handling classification tasks on tabular data. The TabPFNClassifier
class can be used for binary and multi-class classification problems.
Below is an example of how to use TabPFNClassifier
for a multi-class classification task:
from tabpfn_client import TabPFNClassifier\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Load the Iris dataset\nX, y = load_iris(return_X_y=True)\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and train classifier\nclassifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)\nclassifier.fit(X_train, y_train)\n\n# Evaluate\ny_pred = classifier.predict(X_test)\nprint('Test Accuracy:', accuracy_score(y_test, y_pred))\n
from tabpfn import TabPFNClassifier\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Load the Iris dataset\nX, y = load_iris(return_X_y=True)\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and train classifier\nclassifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)\nclassifier.fit(X_train, y_train)\n\n# Evaluate\ny_pred = classifier.predict(X_test)\nprint('Test Accuracy:', accuracy_score(y_test, y_pred))\n
"},{"location":"tutorials/classification/#example-with-autotabpfnclassifier","title":"Example with AutoTabPFNClassifier","text":"Abstract
AutoTabPFNClassifier yields the most accurate predictions for TabPFN and is recommended for most use cases. The AutoTabPFNClassifier and AutoTabPFNRegressor automatically run a hyperparameter search and build an ensemble of strong hyperparameter configurations. You can control the runtime using `max_time`; no further adjustments are needed to get the best results.
import numpy as np\nimport sklearn.metrics\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier\n\n# we refer to the PHE variant of TabPFN as AutoTabPFN in the code\nclf = AutoTabPFNClassifier(device='auto', max_time=30)\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n\nclf.fit(X_train, y_train)\n\npreds = clf.predict_proba(X_test)\ny_eval = np.argmax(preds, axis=1)\n\nprint('ROC AUC: ', sklearn.metrics.roc_auc_score(y_test, preds[:,1], multi_class='ovr'), 'Accuracy', sklearn.metrics.accuracy_score(y_test, y_eval))\n
"},{"location":"tutorials/distshift/","title":"TabPFN's Out-of-Distribution Excellence","text":"Recent research demonstrates TabPFN's out-of-distribution (OOD) performance on tabular data, with further improvements through Drift-Resilient modifications.
"},{"location":"tutorials/distshift/#key-performance-metrics","title":"Key Performance Metrics","text":"Model OOD Accuracy OOD ROC AUC TabPFN Base 0.688 0.786 TabPFN + Drift-Resilient 0.744 0.832 XGBoost 0.664 0.754 CatBoost 0.677 0.766"},{"location":"tutorials/distshift/#technical-improvements","title":"Technical Improvements","text":"The Drift-Resilient modifications introduce:
The enhanced model shows robust generalization across:
For comprehensive documentation and implementation details, visit the GitHub repository.
"},{"location":"tutorials/distshift/#citation","title":"Citation","text":"@inproceedings{\n helli2024driftresilient,\n title={Drift-Resilient Tab{PFN}: In-Context Learning Temporal Distribution Shifts on Tabular Data},\n author={Kai Helli and David Schnurr and Noah Hollmann and Samuel M{\\\"u}ller and Frank Hutter},\n booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},\n year={2024},\n url={https://openreview.net/forum?id=p3tSEFMwpG}\n}\n
"},{"location":"tutorials/regression/","title":"Regression","text":"TabPFN can also be applied to regression tasks using the TabPFNRegressor
class. This allows for predictive modeling of continuous outcomes.
An example usage of TabPFNRegressor
is shown below:
from tabpfn_client import TabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn.metrics\n\nreg = TabPFNRegressor(device='auto')\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
from tabpfn import TabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn.metrics\n\nreg = TabPFNRegressor(device='auto')\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
This example demonstrates how to train and evaluate a regression model. For more details on TabPFNRegressor and its parameters, refer to the API Reference section.
"},{"location":"tutorials/regression/#example-with-autotabpfnregressor","title":"Example with AutoTabPFNRegressor","text":"Abstract
AutoTabPFNRegressor yields the most accurate predictions for TabPFN and is recommended for most use cases. The AutoTabPFNClassifier and AutoTabPFNRegressor automatically run a hyperparameter search and build an ensemble of strong hyperparameter configurations. You can control the runtime using `max_time`; no further adjustments are needed to get the best results.
from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn.metrics\n\nreg = AutoTabPFNRegressor(max_time=30) # runs for 30 seconds\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
"},{"location":"tutorials/timeseries/","title":"Time Series Tutorial","text":"TabPFN can be used for time series forecasting by framing it as a tabular regression problem. This tutorial demonstrates how to use the TabPFN Time Series package for accurate zero-shot forecasting. It was developed by Shi Bin Hoo, Samuel M\u00fcller, David Salinas and Frank Hutter.
"},{"location":"tutorials/timeseries/#quick-start","title":"Quick Start","text":"First, install the package:
!git clone https://github.com/liam-sbhoo/tabpfn-time-series.git\n!pip install -r tabpfn-time-series/requirements.txt\n
See the demo notebook for a complete example.
"},{"location":"tutorials/timeseries/#how-it-works","title":"How It Works","text":"TabPFN performs time series forecasting by:
This approach provides several benefits:
Join our Discord community for support and discussions about TabPFN time series forecasting.
"},{"location":"tutorials/unsupervised/","title":"Unsupervised functionalities","text":"Warning
This functionality is currently only supported using the Local TabPFN Version but not the API.
"},{"location":"tutorials/unsupervised/#data-generation","title":"Data Generation","text":"import numpy as np\nimport torch\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions import TabPFNClassifier, TabPFNRegressor\nfrom tabpfn_extensions import unsupervised\n\n# Load the breast cancer dataset\ndf = load_breast_cancer(return_X_y=False)\nX, y = df[\"data\"], df[\"target\"]\nattribute_names = df[\"feature_names\"]\n\n# Split the data\nX_train, X_test, y_train, y_test = train_test_split(\n X, y, test_size=0.5, random_state=42\n)\n\n# Initialize TabPFN models\nclf = TabPFNClassifier(n_estimators=3)\nreg = TabPFNClassifier(n_estimators=3)\n\n# Initialize unsupervised model\nmodel_unsupervised = unsupervised.TabPFNUnsupervisedModel(\n tabpfn_clf=clf, tabpfn_reg=reg\n)\n\n# Select features for analysis (e.g., first two features)\nfeature_indices = [0, 1]\n\n# Create and run synthetic experiment\nexp_synthetic = unsupervised.experiments.GenerateSyntheticDataExperiment(\n task_type=\"unsupervised\"\n)\n\n# Convert data to torch tensors\nX_tensor = torch.tensor(X_train, dtype=torch.float32)\ny_tensor = torch.tensor(y_train, dtype=torch.float32)\n\n# Run the experiment\nresults = exp_synthetic.run(\n tabpfn=model_unsupervised,\n X=X_tensor,\n y=y_tensor,\n attribute_names=attribute_names,\n temp=1.0,\n n_samples=X_train.shape[0] * 3, # Generate 3x original samples\n indices=feature_indices,\n)\n
"},{"location":"tutorials/unsupervised/#outlier-detection","title":"Outlier Detection","text":"import torch\nfrom sklearn.datasets import load_breast_cancer\nfrom tabpfn_extensions import unsupervised\nfrom tabpfn_extensions import TabPFNClassifier, TabPFNRegressor\n\n# Load data\ndf = load_breast_cancer(return_X_y=False)\nX, y = df[\"data\"], df[\"target\"]\nattribute_names = df[\"feature_names\"]\n\n# Initialize models\nclf = TabPFNClassifier(n_estimators=4)\nreg = TabPFNRegressor(n_estimators=4)\nmodel_unsupervised = unsupervised.TabPFNUnsupervisedModel(\n tabpfn_clf=clf, tabpfn_reg=reg\n)\n\n# Run outlier detection\nexp_outlier = unsupervised.experiments.OutlierDetectionUnsupervisedExperiment(\n task_type=\"unsupervised\"\n)\nresults = exp_outlier.run(\n tabpfn=model_unsupervised,\n X=torch.tensor(X),\n y=torch.tensor(y),\n attribute_names=attribute_names,\n indices=[4, 12], # Analyze features 4 and 12\n)\n
"}]}
\ No newline at end of file
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Home","text":"OVERWRITE!
"},{"location":"aup/","title":"ACCEPTABLE USE POLICY","text":"Effective Date: January 8th 2025
This Acceptable Use Policy (\"AUP\") applies to the use of PriorLabs' Services. Where this AUP uses terms that are defined in the General Terms and Conditions (\"GTC\"), those terms shall have the meaning ascribed to them in the GTC.
PriorLabs reserves the right to change this AUP in accordance with the GTC at https://www.priorlabs.ai/aup.
"},{"location":"aup/#1-what-type-of-activity-is-prohibited","title":"1. What type of activity is prohibited?","text":"Customer shall not use, and encourage or allow any other person or entity to use the Services in prohibited manners, including but not limited to the following:
Customer may not upload any personal data within the meaning of the GDPR to the Contract Software or the Services.
Customer may not upload any material to the Contract Software or the Services that infringes the intellectual property rights or other rights of third parties, including but not limited to trademarks, copyrights, trade secrets, rights of publicity, or otherwise violating, infringing or misappropriating the rights of any third party.
Customer may not misappropriate, reverse-engineer, copy, disassemble, decompile, extract source code, trade secrets, or know-how, including PriorLabs' models, algorithms or artificial intelligence systems, or otherwise misuse or manipulate the Contract Software or Services or any part thereof.
Customer may not use the Services or the Contract Software in a way that imposes an unreasonable or disproportionately large load on PriorLabs' infrastructure, which adversely impacting the availability, reliability or stability of PriorLabs' Services.
Customer may not upload any viruses, spam, trojan horses, worms or any other malicious, harmful, or deleterious programs or code, including prompt-based manipulation or scraping behaviors, to the Contract Software or the Services.
Customer may not attempt to use the Services and Contract Software in a manner that compromises, circumvents, or tests the vulnerability of any of PriorLabs' technical safeguards or other security measures.
Customer may not use PriorLabs' Services or the Contract Software in any manner that may subject PriorLabs or any third party to liability, damages or danger.
Customer shall not use the Contract Software improperly or allow it to be used improperly, and in particular shall not use or upload to the Contract Software any content that is illegal or immoral and/or such content that serves to incite hatred, hate speech, illicit deep fakes, or fake news, or incites criminal acts or glorifies or trivializes violence, is sexually offensive or pornographic, is capable of seriously endangering children or young people morally or impairing their well-being or may damage the reputation of PriorLabs, and shall not refer to such content.
This list of prohibited uses is provided by way of example and should not be considered exhaustive.
"},{"location":"aup/#2-who-is-prohibited-from-using-the-services","title":"2. Who is prohibited from using the Services?","text":"Consumers within the meaning of Section 13 German Civil Code may not use PriorLabs' Services.
"},{"location":"cla/","title":"Contributor Agreement","text":""},{"location":"cla/#individual-contributor-exclusive-license-agreement","title":"Individual Contributor Exclusive License Agreement","text":""},{"location":"cla/#including-the-traditional-patent-license-option","title":"(including the Traditional Patent License OPTION)","text":"Thank you for your interest in contributing to PriorLabs's TabPFN (\"We\" or \"Us\").
The purpose of this contributor agreement (\"Agreement\") is to clarify and document the rights granted by contributors to Us. To make this document effective, please follow the instructions at https://www.priorlabs.ai/sign-cla.
"},{"location":"cla/#how-to-use-this-contributor-agreement","title":"How to use this Contributor Agreement","text":"If You are an employee and have created the Contribution as part of your employment, You need to have Your employer approve this Agreement or sign the Entity version of this document. If You do not own the Copyright in the entire work of authorship, any other author of the Contribution should also sign this \u2013 in any event, please contact Us at noah.homa@gmail.com
"},{"location":"cla/#1-definitions","title":"1. Definitions","text":"\"You\" means the individual Copyright owner who Submits a Contribution to Us.
\"Contribution\" means any original work of authorship, including any original modifications or additions to an existing work of authorship, Submitted by You to Us, in which You own the Copyright.
\"Copyright\" means all rights protecting works of authorship, including copyright, moral and neighboring rights, as appropriate, for the full term of their existence.
\"Material\" means the software or documentation made available by Us to third parties. When this Agreement covers more than one software project, the Material means the software or documentation to which the Contribution was Submitted. After You Submit the Contribution, it may be included in the Material.
\"Submit\" means any act by which a Contribution is transferred to Us by You by means of tangible or intangible media, including but not limited to electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, Us, but excluding any transfer that is conspicuously marked or otherwise designated in writing by You as \"Not a Contribution.\"
\"Documentation\" means any non-software portion of a Contribution.
"},{"location":"cla/#2-license-grant","title":"2. License grant","text":""},{"location":"cla/#21-copyright-license-to-us","title":"2.1 Copyright license to Us","text":"Subject to the terms and conditions of this Agreement, You hereby grant to Us a worldwide, royalty-free, Exclusive, perpetual and irrevocable (except as stated in Section 8.2) license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, under the Copyright covering the Contribution to use the Contribution by all means, including, but not limited to:
Moral Rights remain unaffected to the extent they are recognized and not waivable by applicable law. Notwithstanding, You may add your name to the attribution mechanism customarily used in the Materials you Contribute to, such as the header of the source code files of Your Contribution, and We will respect this attribution when using Your Contribution.
"},{"location":"cla/#23-copyright-license-back-to-you","title":"2.3 Copyright license back to You","text":"Upon such grant of rights to Us, We immediately grant to You a worldwide, royalty-free, non-exclusive, perpetual and irrevocable license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, under the Copyright covering the Contribution to use the Contribution by all means, including, but not limited to:
This license back is limited to the Contribution and does not provide any rights to the Material.
"},{"location":"cla/#3-patents","title":"3. Patents","text":""},{"location":"cla/#31-patent-license","title":"3.1 Patent license","text":"Subject to the terms and conditions of this Agreement You hereby grant to Us and to recipients of Materials distributed by Us a worldwide, royalty-free, non-exclusive, perpetual and irrevocable (except as stated in Section 3.2) patent license, with the right to transfer an unlimited number of non-exclusive licenses or to grant sublicenses to third parties, to make, have made, use, sell, offer for sale, import and otherwise transfer the Contribution and the Contribution in combination with any Material (and portions of such combination). This license applies to all patents owned or controlled by You, whether already acquired or hereafter acquired, that would be infringed by making, having made, using, selling, offering for sale, importing or otherwise transferring of Your Contribution(s) alone or by combination of Your Contribution(s) with any Material.
"},{"location":"cla/#32-revocation-of-patent-license","title":"3.2 Revocation of patent license","text":"You reserve the right to revoke the patent license stated in section 3.1 if We make any infringement claim that is targeted at your Contribution and not asserted for a Defensive Purpose. An assertion of claims of the Patents shall be considered for a \"Defensive Purpose\" if the claims are asserted against an entity that has filed, maintained, threatened, or voluntarily participated in a patent infringement lawsuit against Us or any of Our licensees.
"},{"location":"cla/#4-license-obligations-by-us","title":"4. License obligations by Us","text":"We agree to license the Contribution only under the terms of the license or licenses that We are using on the Submission Date for the Material (including any rights to adopt any future version of a license).
In addition, We may use the following licenses for Documentation in the Contribution: CC-BY-4.0, CC-BY-ND-4.0, CC-BY-NC-4.0, CC-BY-NC-ND-4.0, CC-BY-NC-SA-4.0, CC-BY-SA-4.0, CC0-1.0, MIT License, Apache License, GNU General Public License (GPL) v2.0, GNU General Public License (GPL) v3.0, GNU Affero General Public License v3.0, GNU Lesser General Public License (LGPL) v2.1, GNU Lesser General Public License (LGPL) v3.0, Mozilla Public License 2.0, Eclipse Public License 2.0, Microsoft Public License (Ms-PL), Microsoft Reciprocal License (Ms-RL), BSD 2-Clause \"Simplified\" or \"FreeBSD\" license, BSD 3-Clause \"New\" or \"Revised\" license (including any right to adopt any future version of a license).
We agree to license patents owned or controlled by You only to the extent necessary to (sub)license Your Contribution(s) and the combination of Your Contribution(s) with the Material under the terms of the license or licenses that We are using on the Submission Date.
"},{"location":"cla/#5-disclaimer","title":"5. Disclaimer","text":"THE CONTRIBUTION IS PROVIDED \"AS IS\". MORE PARTICULARLY, ALL EXPRESS OR IMPLIED WARRANTIES INCLUDING, WITHOUT LIMITATION, ANY IMPLIED WARRANTY OF SATISFACTORY QUALITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT ARE EXPRESSLY DISCLAIMED BY YOU TO US AND BY US TO YOU. TO THE EXTENT THAT ANY SUCH WARRANTIES CANNOT BE DISCLAIMED, SUCH WARRANTY IS LIMITED IN DURATION AND EXTENT TO THE MINIMUM PERIOD AND EXTENT PERMITTED BY LAW.
"},{"location":"cla/#6-consequential-damage-waiver","title":"6. Consequential damage waiver","text":"TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT WILL YOU OR WE BE LIABLE FOR ANY LOSS OF PROFITS, LOSS OF ANTICIPATED SAVINGS, LOSS OF DATA, INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL AND EXEMPLARY DAMAGES ARISING OUT OF THIS AGREEMENT REGARDLESS OF THE LEGAL OR EQUITABLE THEORY (CONTRACT, TORT OR OTHERWISE) UPON WHICH THE CLAIM IS BASED.
"},{"location":"cla/#7-approximation-of-disclaimer-and-damage-waiver","title":"7. Approximation of disclaimer and damage waiver","text":"IF THE DISCLAIMER AND DAMAGE WAIVER MENTIONED IN SECTION 5. AND SECTION 6. CANNOT BE GIVEN LEGAL EFFECT UNDER APPLICABLE LOCAL LAW, REVIEWING COURTS SHALL APPLY LOCAL LAW THAT MOST CLOSELY APPROXIMATES AN ABSOLUTE WAIVER OF ALL CIVIL OR CONTRACTUAL LIABILITY IN CONNECTION WITH THE CONTRIBUTION.
"},{"location":"cla/#8-term","title":"8. Term","text":"8.1 This Agreement shall come into effect upon Your acceptance of the terms and conditions.
8.2 This Agreement shall apply for the term of the copyright and patents licensed here. However, You shall have the right to terminate the Agreement if We do not fulfill the obligations as set forth in Section 4. Such termination must be made in writing.
8.3 In the event of a termination of this Agreement Sections 5, 6, 7, 8 and 9 shall survive such termination and shall remain in full force thereafter. For the avoidance of doubt, Free and Open Source Software (sub)licenses that have already been granted for Contributions at the date of the termination shall remain in full force after the termination of this Agreement.
"},{"location":"cla/#9-miscellaneous","title":"9. Miscellaneous","text":"9.1 This Agreement and all disputes, claims, actions, suits or other proceedings arising out of this agreement or relating in any way to it shall be governed by the laws of Germany excluding its private international law provisions.
9.2 This Agreement sets out the entire agreement between You and Us for Your Contributions to Us and overrides all other agreements or understandings.
9.3 In case of Your death, this agreement shall continue with Your heirs. In case of more than one heir, all heirs must exercise their rights through a commonly authorized person.
9.4 If any provision of this Agreement is found void and unenforceable, such provision will be replaced to the extent possible with a provision that comes closest to the meaning of the original provision and that is enforceable. The terms and conditions set forth in this Agreement shall apply notwithstanding any failure of essential purpose of this Agreement or any limited remedy to the maximum extent possible under law.
9.5 You agree to notify Us of any facts or circumstances of which you become aware that would make this Agreement inaccurate in any respect.
"},{"location":"contribute/","title":"Contribute","text":"Put out project that people could contribute to and provide instructions for contributing
"},{"location":"docs/","title":"","text":"PriorLabs is building breakthrough foundation models that understand spreadsheets and databases. While foundation models have transformed text and images, tabular data has remained largely untouched. We're tackling this opportunity with technology that could revolutionize how we approach scientific discovery, medical research, financial modeling, and business intelligence.
"},{"location":"docs/#why-tabpfn","title":"Why TabPFN","text":"Rapid Training
TabPFN significantly reduces training time: in just a few seconds it outperforms traditional models that have been tuned for hours. For instance, it surpasses an ensemble of the strongest baselines with 2.8 seconds of compute, versus 4 hours of tuning for the baselines.
Superior Accuracy
TabPFN consistently outperforms state-of-the-art methods like gradient-boosted decision trees (GBDTs) on datasets with up to 10,000 samples. It achieves higher accuracy and better performance metrics across a range of classification and regression tasks.
Robustness
The model demonstrates robustness to various dataset characteristics, including uninformative features, outliers, and missing values, maintaining high performance where other methods struggle.
Generative Capabilities
As a generative transformer-based model, TabPFN can be fine-tuned for specific tasks, generate synthetic data, estimate densities, and learn reusable embeddings. This makes it versatile for various applications beyond standard prediction tasks.
Sklearn Interface
TabPFN follows the interfaces provided by scikit-learn, making it easy to integrate into existing workflows and utilize familiar functions for fitting, predicting, and evaluating models.
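For example, a first classification experiment takes only a few lines; this sketch assumes the local tabpfn package, but the cloud client exposes the same estimator interface:

from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

clf = TabPFNClassifier()            # sklearn-style estimator, no tuning required
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_test)   # class probabilities, as in scikit-learn
print('ROC AUC:', roc_auc_score(y_test, proba[:, 1]))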
Minimal Preprocessing
The model handles various types of raw data, including missing values and categorical variables, with minimal preprocessing. This reduces the burden on users to perform extensive data preparation.
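As a toy illustration of the minimal-preprocessing claim, missing values can be passed through directly as NaN, without an imputation or scaling step:

import numpy as np
from tabpfn import TabPFNClassifier

# Toy data with missing entries left as np.nan
X = np.array([[1.0, 0.0], [2.0, np.nan], [np.nan, 1.0], [3.0, 1.0]] * 10)
y = np.array([0, 0, 1, 1] * 10)

clf = TabPFNClassifier()
clf.fit(X, y)               # no imputer or scaler in the pipeline
print(clf.predict(X[:4]))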
API Client
The fastest way to get started with TabPFN. Access our models through the cloud without requiring local GPU resources.
TabPFN Client
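A sketch of cloud usage via the tabpfn-client package; the init() login step and the import path follow the client package's documented flow, so treat the exact names as assumptions if your version differs:

# pip install tabpfn-client
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from tabpfn_client import init, TabPFNClassifier

init()  # interactive login / API-token setup on first use

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = TabPFNClassifier()    # same sklearn-style interface as the local package
clf.fit(X_train, y_train)   # inference runs on the PriorLabs cloud API
print(clf.predict(X_test))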
User Interface
Visual interface for no-code interaction with TabPFN. Perfect for quick experimentation and visualization.
Access GUI
Python Package
Local installation for research and privacy-sensitive use cases, with GPU support and a scikit-learn compatible interface.
TabPFN Local
R Integration
Currently in development. Bringing TabPFN's capabilities to the R ecosystem for data scientists and researchers. Contact us for more information, or to get involved!
"},{"location":"enterprise/","title":"TabPFN Business","text":"
Unlock the hidden value in your company's databases and spreadsheets using TabPFN. Our state-of-the-art tabular foundation model is faster and more accurate than previous methods in 96% of use cases, and requires 50% less data.
Save your data science team hours & days of work and enable them to focus on mission-critical business problems, even when data availability is limited.
"},{"location":"enterprise/#why-tabpfn-business","title":"Why TabPFN Business?","text":""},{"location":"enterprise/#access-to-enterprise-grade-features","title":"Access to Enterprise-Grade Features","text":"Please select all the ways you would like to hear from PriorLabs:
"},{"location":"newsletter/","title":"Stay Updated with TabPFN","text":"Join our newsletter to get the latest updates on TabPFN's development, best practices, and breakthrough research in tabular machine learning.
"},{"location":"newsletter/#what-youll-get","title":"What You'll Get","text":"Please select all the ways you would like to hear from PriorLabs:
You can unsubscribe at any time by clicking the link in the footer of our emails.
We use Mailchimp as our marketing platform. By clicking below to subscribe, you acknowledge that your information will be transferred to Mailchimp for processing. Learn more about Mailchimp's privacy practices.
"},{"location":"privacy_policy/","title":"Privacy policy","text":"PRIVACY POLICY\nLast updated: January 8th, 2025\n1. General information\nPrior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau (hereinafter \u201cPriorLabs\u201d, \u201cwe\u201d or \u201cus\u201d) takes the protection of personal data very seriously. \nWe treat personal data confidentially and always in accordance with the applicable data protection laws, in particular Regulation (EU) 2016/679 (hereinafter \u201cGeneral Data Protection Regulation\u201d or \u201cGDPR\u201d), the German Federal Data Protection Act (hereinafter \u201cBDSG\u201d), and in accordance with the provisions of this privacy policy.\nThe aim of this privacy policy is to inform you (hereinafter \u201cdata subject\u201d or \u201cyou\u201d) in accordance with Art. 12 et seq. GDPR about how we process your personal data and for what purposes we process your personal data when using our website https://priorlabs.ai/ (hereinafter \u201cWebsite\u201d), our services or contacting us.\nUnless otherwise stated in this privacy policy, the terms used here have the meaning as defined in the GDPR.\n2. Data controller\nPriorLabs acts as a controller within the meaning of the GDPR in relation to your personal data processed in connection with the use of our Website, Service or a contact made to or by PriorLabs. \nIf you have any questions about this privacy policy or the processing of your personal data, you can contact us at the following contact details:\nPrior Labs GmbH\nElisabeth-Emter-Weg 18\n79110 Freiburg im Breisgau\nE-mail: dataprotection@priorlabs.ai\n\nCategories, purposes and legal bases of the personal data processed\nWe process different categories of your personal data for different purposes. Below you can see which data we process in which contexts, for which purposes and on which legal basis we base the respective processing.\n2.1. Visiting our Website\nWhen visiting our Website for informational purposes, i.e., mere viewing and without you providing us with any other information, certain personal data is automatically collected each time the Website are called up and stored in so-called server log files. These are:\n\u2022 Browser type and version. The specific type and model of Internet browser you are using, such as Google Chrome, Mozilla Firefox, or Microsoft Edge, along with the specific version of the browser.\n\u2022 Operating system used. Your operating system for your digital activity, such as Windows, macOS, Linux, iOS, or Android.\n\u2022 Host name of the accessing computer. The unique name that your device has on the Internet or on a local network.\n\u2022 The date and time of access. The exact time of access to the Website. \n\u2022 IP address of the requesting computer. The unique numeric identifier assigned to a device when it connects to the Internet. \nSuch data is not merged with other data sources, and the data is not evaluated for marketing purposes. \nLegal basis:\nThe legal basis for the temporary storage and processing of such personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. Our legitimate interest here is to be able to provide you with technically functional, attractive and user-friendly Website as well as to ensure the security of our systems.\nDuration of the storage:\nSuch personal data will be deleted as soon as it is no longer required to achieve the purpose for which it was collected. For personal data stored in log files, this is the case after 7 days at the latest. 
\nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.2. Use of our Services\nWe provide you with software to access the TabPFN foundation models in the context of the analysis, processing and evaluation of tabular business data (\u201cServices\u201d). Please note our Acceptable Use Policy which strictly prohibits the upload of personal data to use our Services. \nAlthough you are not allowed to upload (tabular) personal data to have them analyzed, processed and evaluated, we are processing certain personal data when you are accessing our Services via our API.\n2.2.1. User account\nWhen you register your user account, we process the following personal data:\n\u2022 First and last name\n\u2022 E-mail address\n\u2022 Password\n\nLegal basis:\nWe process the aforementioned information to create your user account and, thus, such data will be processed for the performance of a contract or in order to take steps prior to entering into a contract in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nDuration of the storage:\nYou can delete your user account at any time by sending an e-mail with your request to dataprotection@priorlabs.ai. We will delete your user account when it has been inactive for 3 years.\n2.2.2. Usage data\nWhen you use our service, we process certain personal data about how you use it and the device you use to access it. We process the following usage data in the form of log files:\n\u2022 IP address of the requesting computer. The unique numeric identifier assigned to a device when it connects to the Internet. \n\u2022 Browser type and version. The specific type and model of Internet browser you are using, such as Google Chrome, Mozilla Firefox, or Microsoft Edge, along with the specific version of the browser.\n\u2022 Operating system used. Your operating system for your digital activity, such as Windows, macOS, Linux, iOS, or Android.\n\u2022 The date and time of access. The exact time of access to the Website. \n\u2022 Host name of the accessing computer. The unique name that your device has on the Internet or on a local network.\nThe processing of this data is used for the technical provision of our services and their contents, as well as to optimise their usability and ensure the security of our information technology systems.\nLegal basis:\nThe legal basis for the temporary storage and processing of such personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. Our legitimate interest here is the technical provision of our services and their contents, as well as optimising their usability and ensuring the security of our information technology systems.\nDuration of the storage:\nSuch personal data will be deleted as soon as it is no longer required to achieve the purpose for which it was collected. For personal data stored in log files, this is the case after 7 days at the latest. \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.3. Contact\nIt is possible to contact us on our Website by e-mail. 
When you contact us, we collect and process certain information in connection with your specific request, such as, e.g., your name, e-mail address, and other data requested by us or data you voluntarily provide to us (hereinafter \u201cContact Data\u201d). \nLegal basis:\nIf you contact us as part of an existing contractual relationship or contact us in advance for information about our range of services, the Contact Data will be processed for the performance of a contract or in order to take steps prior to entering into a contract and to respond to your contact request in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nOtherwise, the legal basis for the processing of Contact Data is Art. 6 para. 1 sent. 1 lit. f GDPR. The Contact Data is processed to pursue our legitimate interests in responding appropriately to customer/contact inquiries.\nDuration of storage:\nWe will delete Contact Data as soon as the purpose for data storage and processing no longer applies (e.g., after your request has been processed). \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n2.4. Newsletter\nWith your consent, we may process your personal data to send you a newsletter via e-mail that contains information about our products and services. To send you the newsletter, we require processing your e-mail address, date and time of your registration, your IP address and browser type. \nOur newsletters contain so-called tracking links that enable us to analyze the behavior of newsletter recipients. We may collect personal data such as regarding the opening of the newsletter (date and time), selected links, and the following information of the accessing computer system: IP address used, browser type and version, device type and operating system (\u201cTracking Data\u201d). This enables us to statistically analyze the success or failure of online marketing campaigns.\nLegal basis:\nThe data processing activities with regard to the newsletter sending and newsletter tracking only take place if and insofar as you have expressly consented to it within the merits of Article 6 para. 1 sent. 1 lit. a GDPR. Your prior consent for such processing activities is obtained during the newsletter subscription process (double opt-in) by way of independent consent declaration referring to this privacy policy.\nYou can revoke your consent at any time with effect for the future by clicking on the unsubscribe link in e-mails. The withdrawal of your consent does not affect the lawfulness of processing based on your consent before its withdrawal. \nDuration of storage:\nWe will delete your personal data as soon as the purpose for data storage and processing no longer applies. Your e-mail address will be stored for as long as the subscription to our newsletter is active. \nHowever, in some cases, e.g., due to legal retention periods, we might be under the legal obligation to continue the storage of your personal data.\n2.5. Social media and professional networks and platforms\nWe utilize the possibility of company appearances on social and professional networks and platforms (LinkedIn, Github, X and Discord) in order to be able to communicate with you and to inform you about our services and news about us. \nYou can, inter alia, access the respective network or platform by clicking on the respective network icon displayed on our Website, which includes a hyperlink. 
A hyperlink activated by clicking on it opens the external destination in a new browser window of your browser. No personal data is transferred to the respective network before this activation.\n2.5.1. Visiting our page on social media and professional networks and platforms\nThe respective aforementioned network or platform is, in principle, solely responsible for the processing of personal data when you visit our company page on one of those networks or platforms. \nPlease do not contact us via one of the networks or platforms if you wish to avoid this. You use such networks and platforms and their functions on your own responsibility. \n2.5.2. Communication via social media and professional networks and platforms\nWe process information that you have made available to us via our company page on the respective network or platform, e.g., your (user) name, e-mail address, contact details, communication content, job title, company name, industry, education, contact options, photo, and other data you voluntarily provide to us. The (user) names of the registered network or platform users who have visited our company page on the networks or platforms may be visible to us. \nLegal basis:\nIf you contact us as part of an existing contractual relationship or contact us in advance for information about our range of services, the personal data will be processed for the performance of a contract or in order to take steps prior to entering into a contract and to respond to your contact request in accordance with Art. 6 para. 1 sent. 1 lit. b GDPR. \nOtherwise, the legal basis for the processing of the personal data is Art. 6 para. 1 sent. 1 lit. f GDPR. The personal data is processed to pursue our legitimate interests in responding appropriately to customer/contact inquiries.\nDuration of storage:\nWe will delete your personal data as soon as the purpose for data storage and processing no longer applies (e.g., after your request has been processed). \nHowever, in some cases, e.g., due to legal retention periods we might be under the legal obligation to continue the storage of your personal data.\n3. Data receiver\nWe might transfer your personal data to certain data receivers if such transfer is necessary to fulfill our contractual and legal obligations.\nIn individual cases, we transfer personal data to our consultants in legal or tax matters, whereby these recipients act independently in their own data protection responsibilities and are also obliged to comply with the requirements of the GDPR and other applicable data protection regulations. In addition, they are bound by special confidentiality and secrecy obligations due to their professional position. \nIn the event of corporate transactions (e.g., sale of our business or a part of it), we may transfer personal data to involved advisors or to potential buyers.\nAdditionally, we also use services provided by various specialized companies, e.g., IT service providers, that process data on our behalf (hereinafter \u201cData Processors\u201d). We have concluded a data processing agreement according to Art. 28 GDPR or EU standard contractual clauses of the EU Commission pursuant to Art. 46 para. 2 lit. c GDPR with each service provider and they only process data in accordance with our instructions and not for their own purposes. 
\nOur current Data Processors are:\nData Processor Purpose of commissioning the Data Processor / purpose of processing\nOpenAI Processing text inputs to our model API\nMailchimp Newsletter Signup\nGoogle Analytics Usage analytics\n4. Data transfers to third countries\nYour personal data is generally processed in Germany and other countries within the European Economic Area (EEA).\nHowever, it may also be necessary for personal data to be transferred to recipients located outside the EEA, i.e., to third countries, such as the USA. If possible, we conclude the currently applicable EU standard contractual clauses of the EU Commission pursuant to Art. 46 para. 2 lit. c GDPR with all processors located outside the EEA. Otherwise, we ensure that a transfer only takes place if an adequacy decision exists with the respective third country and the recipient is certified under this, if necessary. We will provide you with respective documentation on request.\n5. Your rights\nThe following rights are available to you as a data subject in accordance with the provisions of the GDPR:\n5.1. Right of revocation\nYou may revoke your consent to the processing of your personal data at any time pursuant to Art. 7 para. 3 GDPR. Please note, that the revocation is only effective for the future. Processing that took place before the revocation remains unaffected. \n5.2. Right of access\nUnder the conditions of Art. 15 GDPR you have the right to request confirmation from us at any time as to whether we are processing personal data relating to you. If this is the case, you also have the right within the scope of Art. 15 GDPR to receive access to the personal data as well as certain other information about the personal data and a copy of your personal data. The restrictions of \u00a7 34 BDSG apply.\n5.3. Right to rectification\nUnder the conditions of Art. 16 GDPR you have the right to request us to correct the personal data stored about you if it is inaccurate or incomplete.\n5.4. Right to erasure\nYou have the right, under the conditions of Art. 17 GDPR, to demand that we delete the personal data concerning you without delay. \n5.5. Right to restrict processing\nYou have the right to request that we restrict the processing of your personal data under the conditions of Art. 18 GDPR.\n5.6. Right to data portability\nYou have the right, under the conditions of Art. 20 GDPR, to request that we hand over, in a structured, common and machine-readable format, the personal data concerning you that you have provided to us. Please note that this right only applies where the processing is based on your consent, or a contract and the processing is carried out by automated means.\n5.7. Right to object\nYou have the right to object to the processing of your personal data under the conditions of Art. 21 GDPR.\n5.8. Right to complain to a supervisory authority\nSubject to the requirements of Art. 77 GDPR, you have the right to file a complaint with a competent supervisory authority. As a rule, the data subject may contact the supervisory authority of his or her habitual residence or place of work or place of the alleged infringement or the registered office of PriorLabs. The supervisory authority responsible for PriorLabs is the State Commissioner for Data Protection and Freedom of Information for Baden-W\u00fcrttemberg. A list of all German supervisory authorities and their contact details can be found here.\n6. 
Obligation to provide data\nWhen you visit our Website, you may be required to provide us with certain personal data as described in this privacy policy. Beyond that, you are under no obligation to provide us with personal data. However, if you do not provide us with your personal data as required, you may not be able to contact us and/or we may not be able to contact you to respond to your inquiries or questions.\n7. Automated decisions/profiling\nThe processing of your personal data carried out by us does not contain any automated decisions in individual cases within the meaning of Art. 22 para. 1 GDPR.\n8. Changes to this privacy policy\nWe review this privacy policy regularly and may update it at any time. If we make changes to this privacy policy, we will change the date of the last update above. Please review this privacy policy regularly to be aware of any updates. The current version of this privacy policy can be accessed at any time at Priorlabs.ai/privacy.\n
"},{"location":"tabpfn-license/","title":"TabPFN License","text":" Prior Labs License\n Version 1.0, January 2025\n http://priorlabs.ai/tabpfn-license\n\n This license is a derivative of the Apache 2.0 license\n (http://www.apache.org/licenses/) with a single modification:\n The added Paragraph 10 introduces an enhanced attribution requirement\n inspired by the Llama 3 license.\n\n TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION\n\n 1. Definitions.\n\n \"License\" shall mean the terms and conditions for use, reproduction,\n and distribution as defined by Sections 1 through 9 of this document.\n\n \"Licensor\" shall mean the copyright owner or entity authorized by\n the copyright owner that is granting the License.\n\n \"Legal Entity\" shall mean the union of the acting entity and all\n other entities that control, are controlled by, or are under common\n control with that entity. For the purposes of this definition,\n \"control\" means (i) the power, direct or indirect, to cause the\n direction or management of such entity, whether by contract or\n otherwise, or (ii) ownership of fifty percent (50%) or more of the\n outstanding shares, or (iii) beneficial ownership of such entity.\n\n \"You\" (or \"Your\") shall mean an individual or Legal Entity\n exercising permissions granted by this License.\n\n \"Source\" form shall mean the preferred form for making modifications,\n including but not limited to software source code, documentation\n source, and configuration files.\n\n \"Object\" form shall mean any form resulting from mechanical\n transformation or translation of a Source form, including but\n not limited to compiled object code, generated documentation,\n and conversions to other media types.\n\n \"Work\" shall mean the work of authorship, whether in Source or\n Object form, made available under the License, as indicated by a\n copyright notice that is included in or attached to the work\n (an example is provided in the Appendix below).\n\n \"Derivative Works\" shall mean any work, whether in Source or Object\n form, that is based on (or derived from) the Work and for which the\n editorial revisions, annotations, elaborations, or other modifications\n represent, as a whole, an original work of authorship. For the purposes\n of this License, Derivative Works shall not include works that remain\n separable from, or merely link (or bind by name) to the interfaces of,\n the Work and Derivative Works thereof.\n\n \"Contribution\" shall mean any work of authorship, including\n the original version of the Work and any modifications or additions\n to that Work or Derivative Works thereof, that is intentionally\n submitted to Licensor for inclusion in the Work by the copyright owner\n or by an individual or Legal Entity authorized to submit on behalf of\n the copyright owner. 
For the purposes of this definition, \"submitted\"\n means any form of electronic, verbal, or written communication sent\n to the Licensor or its representatives, including but not limited to\n communication on electronic mailing lists, source code control systems,\n and issue tracking systems that are managed by, or on behalf of, the\n Licensor for the purpose of discussing and improving the Work, but\n excluding communication that is conspicuously marked or otherwise\n designated in writing by the copyright owner as \"Not a Contribution.\"\n\n \"Contributor\" shall mean Licensor and any individual or Legal Entity\n on behalf of whom a Contribution has been received by Licensor and\n subsequently incorporated within the Work.\n\n 2. Grant of Copyright License. Subject to the terms and conditions of\n this License, each Contributor hereby grants to You a perpetual,\n worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n copyright license to reproduce, prepare Derivative Works of,\n publicly display, publicly perform, sublicense, and distribute the\n Work and such Derivative Works in Source or Object form.\n\n 3. Grant of Patent License. Subject to the terms and conditions of\n this License, each Contributor hereby grants to You a perpetual,\n worldwide, non-exclusive, no-charge, royalty-free, irrevocable\n (except as stated in this section) patent license to make, have made,\n use, offer to sell, sell, import, and otherwise transfer the Work,\n where such license applies only to those patent claims licensable\n by such Contributor that are necessarily infringed by their\n Contribution(s) alone or by combination of their Contribution(s)\n with the Work to which such Contribution(s) was submitted. If You\n institute patent litigation against any entity (including a\n cross-claim or counterclaim in a lawsuit) alleging that the Work\n or a Contribution incorporated within the Work constitutes direct\n or contributory patent infringement, then any patent licenses\n granted to You under this License for that Work shall terminate\n as of the date such litigation is filed.\n\n 4. Redistribution. You may reproduce and distribute copies of the\n Work or Derivative Works thereof in any medium, with or without\n modifications, and in Source or Object form, provided that You\n meet the following conditions:\n\n (a) You must give any other recipients of the Work or\n Derivative Works a copy of this License; and\n\n (b) You must cause any modified files to carry prominent notices\n stating that You changed the files; and\n\n (c) You must retain, in the Source form of any Derivative Works\n that You distribute, all copyright, patent, trademark, and\n attribution notices from the Source form of the Work,\n excluding those notices that do not pertain to any part of\n the Derivative Works; and\n\n (d) If the Work includes a \"NOTICE\" text file as part of its\n distribution, then any Derivative Works that You distribute must\n include a readable copy of the attribution notices contained\n within such NOTICE file, excluding those notices that do not\n pertain to any part of the Derivative Works, in at least one\n of the following places: within a NOTICE text file distributed\n as part of the Derivative Works; within the Source form or\n documentation, if provided along with the Derivative Works; or,\n within a display generated by the Derivative Works, if and\n wherever such third-party notices normally appear. 
The contents\n of the NOTICE file are for informational purposes only and\n do not modify the License. You may add Your own attribution\n notices within Derivative Works that You distribute, alongside\n or as an addendum to the NOTICE text from the Work, provided\n that such additional attribution notices cannot be construed\n as modifying the License.\n\n You may add Your own copyright statement to Your modifications and\n may provide additional or different license terms and conditions\n for use, reproduction, or distribution of Your modifications, or\n for any such Derivative Works as a whole, provided Your use,\n reproduction, and distribution of the Work otherwise complies with\n the conditions stated in this License.\n\n 5. Submission of Contributions. Unless You explicitly state otherwise,\n any Contribution intentionally submitted for inclusion in the Work\n by You to the Licensor shall be under the terms and conditions of\n this License, without any additional terms or conditions.\n Notwithstanding the above, nothing herein shall supersede or modify\n the terms of any separate license agreement you may have executed\n with Licensor regarding such Contributions.\n\n 6. Trademarks. This License does not grant permission to use the trade\n names, trademarks, service marks, or product names of the Licensor,\n except as required for reasonable and customary use in describing the\n origin of the Work and reproducing the content of the NOTICE file.\n\n 7. Disclaimer of Warranty. Unless required by applicable law or\n agreed to in writing, Licensor provides the Work (and each\n Contributor provides its Contributions) on an \"AS IS\" BASIS,\n WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or\n implied, including, without limitation, any warranties or conditions\n of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A\n PARTICULAR PURPOSE. You are solely responsible for determining the\n appropriateness of using or redistributing the Work and assume any\n risks associated with Your exercise of permissions under this License.\n\n 8. Limitation of Liability. In no event and under no legal theory,\n whether in tort (including negligence), contract, or otherwise,\n unless required by applicable law (such as deliberate and grossly\n negligent acts) or agreed to in writing, shall any Contributor be\n liable to You for damages, including any direct, indirect, special,\n incidental, or consequential damages of any character arising as a\n result of this License or out of the use or inability to use the\n Work (including but not limited to damages for loss of goodwill,\n work stoppage, computer failure or malfunction, or any and all\n other commercial damages or losses), even if such Contributor\n has been advised of the possibility of such damages.\n\n 9. Accepting Warranty or Additional Liability. While redistributing\n the Work or Derivative Works thereof, You may choose to offer,\n and charge a fee for, acceptance of support, warranty, indemnity,\n or other liability obligations and/or rights consistent with this\n License. 
However, in accepting such obligations, You may act only\n on Your own behalf and on Your sole responsibility, not on behalf\n of any other Contributor, and only if You agree to indemnify,\n defend, and hold each Contributor harmless for any liability\n incurred by, or claims asserted against, such Contributor by reason\n of your accepting any such warranty or additional liability.\n\n ---------------------- ADDITIONAL PROVISION --------------------------\n\n 10. Additional attribution.\n If You distribute or make available the Work or any Derivative\n Work thereof relating to any part of the source or model weights,\n or a product or service (including another AI model) that contains\n any source or model weights, You shall (A) provide a copy of this\n License with any such materials; and (B) prominently display\n \u201cBuilt with TabPFN\u201d on each related website, user interface, blogpost,\n about page, or product documentation. If You use the source or model\n weights or model outputs to create, train, fine tune, distil, or\n otherwise improve an AI model, which is distributed or made available,\n you shall also include \u201cTabPFN\u201d at the beginning of any such AI model name.\n To clarify, internal benchmarking and testing without external\n communication shall not qualify as distribution or making available\n pursuant to this Section 10 and no attribution under this Section 10\n shall be required.\n\n\n END OF TERMS AND CONDITIONS\n
"},{"location":"tabpfn-nature/","title":"Accurate predictions on small data with a tabular foundation model","text":"API Client
The fastest way to get started with TabPFN. Access our models through the cloud without requiring local GPU resources.
TabPFN Client
User Interface
Visual interface for no-code interaction with TabPFN. Perfect for quick experimentation and visualization.
Access GUI
Python Package
Local installation for research and privacy-sensitive use cases, with GPU support and a scikit-learn compatible interface.
TabPFN Local
R Integration
Currently in development. Bringing TabPFN's capabilities to the R ecosystem for data scientists and researchers. Contact us for more information, or to get involved!
GENERAL TERMS AND CONDITIONS\n1. Scope of Application\n1.1. These general terms and conditions (\"GTC\") govern the provision of access to the TabPFN foundation models as available at https://www.priorlabs.ai (\u201cServices\u201d) provided by Prior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau (\u201cPriorLabs\u201d).\n1.2. The Services of PriorLabs are directed exclusively at business customers (Unternehmer) within the meaning of Sec. 14 German Civil Code (B\u00fcrgerliches Gesetzbuch, BGB) (\u201cCustomer\u201d). PriorLabs may require the Customer to provide sufficient proof of its status as business customer prior to the conclusion of the contract. \n1.3. Conflicting or additional contractual conditions of the Customer shall only apply if PriorLabs expressly confirms them in writing. \n2. Conclusion of Contract\n2.1. The contract is concluded with the inclusion of these GTC (\u201cContract\u201d) at the earliest of (i) when the Customer registers and sets up an account via the Services (\u201cPriorLabs Account\u201d). \n2.2. Upon conclusion of the Contract, the Customer shall provide PriorLabs with all information that PriorLabs reasonably requires in order to provide the Services correctly and completely. The Customer is obliged to inform PriorLabs immediately of any relevant changes. \n3. Registration and PriorLabs Account\n3.1. In order to fully use the Services, the registration and setting up of a PriorLabs Account is required. By registering or using a PriorLabs Account, the Customer agrees and represents that they created their PriorLabs Account, and they will use their PriorLabs Account only for themselves. Each Customer shall register only one PriorLabs Account. A PriorLabs Account is not transferable.\n3.2. If and to the extent PriorLabs stores Customer\u2019s data, PriorLabs disclaims any liability for the storage, accessibility, or integrity of such data.\n3.3. The Customer is obliged (i) to provide complete and correct information about its person or entity at the time of registration and (ii) in case of respective changes to correct without undue delay this information insofar such information is mandatory for the performance of the Contract. \n3.4. If PriorLabs receives a notice or otherwise has reason to believe that the information or documents provided by the Customer are wholly or partially incorrect, incomplete or not up to date, PriorLabs is entitled to request the Customer to remedy the situation immediately. If the Customer fails to correct or complete the information or document within the set deadline, PriorLabs is entitled to restrict access to the Services and block the Customer until the Customer has fully complied with the request.\n3.5. The Customer must keep their log-in information secret and carefully secure access to their PriorLabs Account. The Customer shall take reasonable precautions to prevent unauthorized access to the PriorLabs Account, and to protect the Services from unauthorized use. The Customer is obliged to inform PriorLabs immediately if there are indications that a PriorLabs Account has been misused by a third party. The Customer\u2019s liability for any activity of or interaction with a corrupted account is subject to statutory rules.\n4. Contract Software\n4.1. PriorLabs has developed the TabPFN foundation models that allow the analysis, processing and evaluation of tabular data (\u201cContract Software\u201d).\n4.2. 
PriorLabs may, to the extent available, provide the Customer with Customer documentation for the Contract Software in digital form (e.g. as a pdf file).\n4.3. PriorLabs provides the Contract Software \"as is\" with the functionality, scope and performance and in a condition suitable for the contractual use. PriorLabs disclaims any liability for the availability, accuracy, or correctness of the use of the Contract Software and does not warrant the integration in the Customer\u2019s IT systems. \n4.4. The functionality, scope and performance of the Contract Software may change during the Contract Term (as defined below). PriorLabs reserves the right to add, remove, change or substitute elements of the Contract Software as deemed necessary at any time, in particular for the purpose of increasing efficiency, improvements, additional features, and/or safety or due to changes in the legal situation, technical developments or for reasons of IT security, or cease providing the Services altogether. \n5. PriorLabs Intellectual Property\n5.1. PriorLabs remains the sole owner of all right, title, and interest in the Contract Software, including but not limited to any models, algorithms, and neural networks. To the extent PriorLabs provides any Services or access to the Contract Software free of charge, PriorLabs does not waive any rights in such Services or the Contract Software. \n5.2. Except as stated in these GTC, PriorLabs does not grant the Customer any rights to patents, copyrights, trade secrets, trademarks, or any other rights in respect to the Contract Software. \n5.3. By using the Contract Software or using any Services, the Customer does not acquire ownership of any rights in the Contract Software, Services, documentation, and/or any related intellectual property other than stated in these GTC.\n6. API Access \n6.1. PriorLabs allows registered Customers, as and to the extent available from time to time, access to the Contract Software via an application programming interface (\u201cAPI\u201d), non-exclusively, non-transferable and non-sublicensable to use it exclusively as provided on the PriorLabs website or as described in the Customer documentation for the API (\u201cAPI Access\u201d). \n6.2. The Customer\u2019s access to and use of the Services must at all times be in accordance with applicable laws and regulations. The Customer is solely responsible for knowing and complying with the applicable laws and regulations. Permitted conditions of use and scope of use of the Services are further set out in the Acceptable Use Policy available under https://www.priorlabs.ai/aup (\u201cAUP\u201d). The Customer acknowledges that the provisions set out in the AUP shall be deemed material obligations under this Contract.\n7. Customer Content; Licenses\n7.1. The Customer must own or hold valid rights of sufficient scope to any material, documents, data or other content uploaded into the Services and to be processed by the Contract Software (\u201cCustomer Content\u201d). The Customer Content consists exclusively of non-personal data within the meaning of the General Data Protection Regulation (\u201cGDPR\u201d), as set out in the AUP. \n7.2. PriorLabs shall take appropriate physical, technical, and organizational security measures with regard to the Contract Software and any Customer Content. \n7.3. 
The Customer grants PriorLabs the non-exclusive, worldwide, sublicensable right (i) to use Customer Content for the performance of PriorLabs\u2019 obligations under this Contract and, in particular, to reproduce such data on the server under PriorLabs\u2019 name itself or through a subcontractor for the purpose of providing the Service, and (ii) to use Customer Content as so-called training data in order to develop, test, and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.\n7.4. The Customer is fully responsible for all Customer Content uploaded to the Services, in particular the Customer ensures that Customer Content is fit for PriorLabs\u2019 use in accordance with this Contract (including any necessary licenses pursuant to Section 7.3) and does not violate any applicable law or other rights of third parties, in particular copyright, trade secrets, or rights under the GDPR.\n8. Service Results\n8.1. The Contract Software may be used to generate certain analyses, content, documents, reports, or other results (\u201cService Results\u201d) based on Customer Content.\n8.2. The Customer may freely use the Service Results. PriorLabs provides the Service Results \"as is\". The Customer is responsible for reviewing any Service Results of its use of the Contract Software. PriorLabs does not warrant the accuracy, correctness, completeness, usability, or fitness for a certain purpose of the Service Results and does not assume any liability for Customer\u2019s use of Service Results. In particular, PriorLabs disclaims all warranty if the Customer modifies, adapts or combines Service Results with third-party material or products.\n8.3. PriorLabs may use the Service Results to develop, test and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.\n9. Obligations of the Customer\n9.1. The Customer shall create their own backup copies of Customer Data in case of loss of data. PriorLabs provides a corresponding function for creating backup copies.\n9.2. The Customer shall inform PriorLabs without undue delay as soon as they become aware of the infringement of an intellectual property right or copyright in the Contract Software.\n9.3. The Customer shall ensure that all of its employees authorized to use the Contract Software have (i) received sufficient training on the safe use of the Contract Software, (ii) exercise the necessary care when using it, and (iii) are compliant with these GTC including the AUP .\n9.4. The Customer shall subject any end-users of the Contract Software and the Services to obligations reflecting the stipulations of this Contract, in particular the AUP. \n10. Blocking of Accesses\n10.1. PriorLabs is entitled to block access to the Contract Software and the Services temporarily or permanently if there are reliable indications that the Customer or, where applicable, one of its employees is violating or has violated material obligations under this GTC, including the Acceptable Use Policy, and/or applicable intellectual property, data protection of other statutory laws or if PriorLabs has another justified interest in the blocking, such as IT-security concerns. \n10.2. When deciding on a blocking, PriorLabs shall give due consideration to the legitimate interests of the Customer. 
PriorLabs shall inform the Customer of the blocking within a reasonable timeframe before the blocking comes into effect, provided that the information does not conflict with the purpose of the blocking. The blocking shall continue until the contractual or legal violation has been remedied in an appropriate manner.\n11. Limitation of Liability \n11.1. The Services are provided free of charge. Therefore, PriorLabs\u2019 liability is in any cases limited to acts of intent or gross negligence.\n11.2. The strict liability for damages for defects of the Services already existing at the beginning of the Contract Term (as defined below) in terms of Section 536a German Civil Code is excluded. The Services are provided on an \u201cas is\u201d basis, which, in accordance with Section 4 of these GTC, refers in particular to the marketability, availability, and security aspects of the Contract Software.\n12. Indemnity\nThe Customer shall indemnify PriorLabs from any and all claims of end-users or third parties who assert claims against PriorLabs on account of the use of the Services by the Customer or the Customer\u2019s end-users, in particular concerning any Customer Content used in combination with the Contract Software. The provisions of this Section shall apply mutatis mutandis to any liquidated damages (Vertragsstrafen) as well as to any administrative fines (Bu\u00dfgeld) or penalties imposed by the authorities or by the courts, to the extent that the Customer is responsible for such.\n13. Term; Termination of the Contract\n13.1. If not agreed otherwise, the Contract is concluded for an indefinite period of time until terminated by either Party (\"Contract Term\"). \n13.2. The Customer may terminate the Contract at any time by deleting its PriorLabs Account. \n13.3. PriorLabs reserves the right to terminate the Contract at any time but will consider the Customer\u2019s legitimate interests to the extent possible, e.g., by sending the notice of termination in due time to the email address provided by the Customer upon registration of the PriorLabs Account.\n13.4. The right of PriorLabs and the Customer to extraordinary termination without notice for cause shall remain unaffected.\n14. Changes to this Contract\n14.1. PriorLabs may change this Contract during the Contract Term in compliance with the following procedure, provided that the amendment is reasonable for the Customer, i.e. without significant legal or economic disadvantages, taking into account the interests of the Customer and that there is a valid reason for the amendment. Such a reason exists, in particular, in cases of new technical developments or changes in the regulatory environment.\n14.2. PriorLabs shall inform the Customer of any changes to this Contract at least 30 calendar days before the planned entry into force of the changes. The Customer may object to the changes within 30 calendar days from receipt of the notification. If no objection is made and the Customer continues to use the Services after expiry of the objection period, the changes shall be deemed to have been effectively agreed for all Services to be provided from the end of the objection period. In the notification, PriorLabs will inform the Customer of all relevant changes to the Contract, the objection period and the legal consequences of the expiry of the objection period without exercise of the right of objection. If the Customer objects to the changes, PriorLabs may terminate the Contract pursuant to Section 13.\n15. Final Provisions\n15.1. 
Should individual provisions of the Contract be or become invalid in whole or in part, this shall not affect the validity of the remaining provisions. Invalid provisions shall be replaced first and foremost by provisions that most closely correspond to the invalid provisions in a legally effective manner. The same applies to any loopholes.\n15.2. The law of the Federal Republic of Germany shall apply with the exception of its provisions on the choice of law which would lead to the application of another legal system. The validity of the CISG (\"UN Sales Convention\") is excluded. \n15.3. For Customers who are merchants (Kaufleute) within the meaning of the German Commercial Code (Handelsgesetzbuch), a special fund (Sonderverm\u00f6gen) under public law or a legal entity under public law, Berlin, Germany, shall be the exclusive place of jurisdiction for all disputes arising from the contractual relationship.\n\nStatus: January 2025\n***\n
"},{"location":"terms/","title":"Terms","text":"GENERAL TERMS AND CONDITIONS\n1. Scope of Application\n1.1. These general terms and conditions (\"GTC\") govern the provision of access to the TabPFN foundation models as available at https://www.priorlabs.ai (\u201cServices\u201d) provided by Prior Labs GmbH, Elisabeth-Emter-Weg 18, 79110 Freiburg im Breisgau (\u201cPriorLabs\").\n1.2. The Services of PriorLabs are directed exclusively at business customers (Unternehmer) within the meaning of Sec. 14 German Civil Code (B\u00fcrgerliches Gesetzbuch, BGB) (\u201cCustomer\u201d). PriorLabs may require the Customer to provide sufficient proof of its status as business customer prior to the conclusion of the contract. \n1.3. Conflicting or additional contractual conditions of the Customer shall only apply if PriorLabs expressly confirms them in writing. \n2. Conclusion of Contract\n2.1. The contract is concluded with the inclusion of these GTC (\u201cContract\u201d) at the earliest of (i) when the Customer registers and sets up an account via the Services (\u201cPriorLabs Account\u201d). \n2.2. Upon conclusion of the Contract, the Customer shall provide PriorLabs with all information that PriorLabs reasonably requires in order to provide the Services correctly and completely. The Customer is obliged to inform PriorLabs immediately of any relevant changes. \n3. Registration and PriorLabs Account\n3.1. In order to fully use the Services, the registration and setting up of a PriorLabs Account is required. By registering or using a PriorLabs Account, [the Customer agrees and represents that they created their PriorLabs Account, and they will use their PriorLabs Account only for themselves. Each Customer shall register only one PriorLabs Account. A PriorLabs Account is not transferable.\n3.2. If and to the extent, PriorLabs stores Customer\u2019s data, PriorLabs disclaims any liability for the storage, accessibility, or integrity of such data.\n3.3. The Customer is obliged (i) to provide complete and correct information about its person or entity at the time of registration and (ii) in case of respective changes to correct without undue delay this information insofar such information is mandatory for the performance of the Contract. \n3.4. If PriorLabs receives a notice or otherwise has reason to believe that the information or documents provided by the Customer are wholly or partially incorrect, incomplete or not up to date, PriorLabs is entitled to request the Customer to remedy the situation immediately. If the Customer fails to correct or complete the information or document within the set deadline, PriorLabs is entitled to restrict access to the Services and block the Customer until the Customer has fully complied with the request.\n3.5. The Customer must keep their log-in information secret and carefully secure access to their PriorLabs Account. The Customer shall take reasonable precautions to prevent unauthorized access to the PriorLabs Account, and to protect the Services from unauthorized use. The Customer is obliged to inform PriorLabs immediately if there are indications that a PriorLabs Account has been misused by a third party. The Customer\u2019s liability for any activity of or interaction with a corrupted account is subject to statutory rules.\n4. Contract Software\n4.1. PriorLabs has developed the TabPFN foundation models that allows the analysis, processing and evaluation of tabular data (\u201cContract Software\u201d).\n4.2. 
PriorLabs may, to the extent available, provide the Customer with Customer documentation for the Contract Software in digital form (e.g. as a PDF file).\n4.3. PriorLabs provides the Contract Software \"as is\" with the functionality, scope and performance and in a condition suitable for the contractual use. PriorLabs disclaims any liability for the availability, accuracy, or correctness of the use of the Contract Software and does not warrant its integration into the Customer\u2019s IT systems. \n4.4. The functionality, scope and performance of the Contract Software may change during the Contract Term (as defined below). PriorLabs reserves the right to add, remove, change or substitute elements of the Contract Software as deemed necessary at any time, in particular for the purpose of increasing efficiency, improvements, additional features, and/or safety or due to changes in the legal situation, technical developments or for reasons of IT security, or cease providing the Services altogether. \n5. PriorLabs Intellectual Property\n5.1. PriorLabs remains the sole owner of all right, title, and interest in the Contract Software, including but not limited to any models, algorithms, and neural networks. To the extent PriorLabs provides any Services or access to the Contract Software free of charge, PriorLabs does not waive any rights in such Services or the Contract Software. \n5.2. Except as stated in these GTC, PriorLabs does not grant the Customer any rights to patents, copyrights, trade secrets, trademarks, or any other rights in respect to the Contract Software. \n5.3. By using the Contract Software or using any Services, the Customer does not acquire ownership of any rights in the Contract Software, Services, documentation, and/or any related intellectual property other than stated in these GTC.\n6. API Access \n6.1. PriorLabs grants registered Customers, as and to the extent available from time to time, non-exclusive, non-transferable and non-sublicensable access to the Contract Software via an application programming interface (\u201cAPI\u201d), to be used exclusively as provided on the PriorLabs website or as described in the Customer documentation for the API (\u201cAPI Access\u201d). \n6.2. The Customer\u2019s access to and use of the Services must at all times be in accordance with applicable laws and regulations. The Customer is solely responsible for knowing and complying with the applicable laws and regulations. Permitted conditions of use and scope of use of the Services are further set out in the Acceptable Use Policy available under https://www.priorlabs.ai/aup (\u201cAUP\u201d). The Customer acknowledges that the provisions set out in the AUP shall be deemed material obligations under this Contract.\n7. Customer Content; Licenses\n7.1. The Customer must own or hold valid rights of sufficient scope to any material, documents, data or other content uploaded into the Services and to be processed by the Contract Software (\u201cCustomer Content\u201d). The Customer Content consists exclusively of non-personal data within the meaning of the General Data Protection Regulation (\u201cGDPR\u201d), as set out in the AUP. \n7.2. PriorLabs shall take appropriate physical, technical, and organizational security measures with regard to the Contract Software and any Customer Content. \n7.3. 
The Customer grants PriorLabs the non-exclusive, worldwide, sublicensable right (i) to use Customer Content for the performance of PriorLabs\u2019 obligations under this Contract and, in particular, to reproduce such data on the server under PriorLabs\u2019 name itself or through a subcontractor for the purpose of providing the Service, and (ii) to use Customer Content as so-called training data in order to develop, test, and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.\n7.4. The Customer is fully responsible for all Customer Content uploaded to the Services; in particular, the Customer ensures that Customer Content is fit for PriorLabs\u2019 use in accordance with this Contract (including any necessary licenses pursuant to Section 7.3) and does not violate any applicable law or other rights of third parties, in particular copyright, trade secrets, or rights under the GDPR.\n8. Service Results\n8.1. The Contract Software may be used to generate certain analyses, content, documents, reports, or other results (\u201cService Results\u201d) based on Customer Content.\n8.2. The Customer may freely use the Service Results. PriorLabs provides the Service Results \"as is\". The Customer is responsible for reviewing any Service Results of its use of the Contract Software. PriorLabs does not warrant the accuracy, correctness, completeness, usability, or fitness for a certain purpose of the Service Results and does not assume any liability for Customer\u2019s use of Service Results. In particular, PriorLabs disclaims all warranty if the Customer modifies, adapts or combines Service Results with third-party material or products.\n8.3. PriorLabs may use the Service Results to develop, test and improve the Contract Software, in particular the underlying artificial intelligence systems and/or foundation models.\n9. Obligations of the Customer\n9.1. The Customer shall create their own backup copies of Customer Content in case of loss of data. PriorLabs provides a corresponding function for creating backup copies.\n9.2. The Customer shall inform PriorLabs without undue delay as soon as they become aware of the infringement of an intellectual property right or copyright in the Contract Software.\n9.3. The Customer shall ensure that all of its employees authorized to use the Contract Software (i) have received sufficient training on the safe use of the Contract Software, (ii) exercise the necessary care when using it, and (iii) comply with these GTC, including the AUP.\n9.4. The Customer shall subject any end-users of the Contract Software and the Services to obligations reflecting the stipulations of this Contract, in particular the AUP. \n10. Blocking of Accesses\n10.1. PriorLabs is entitled to block access to the Contract Software and the Services temporarily or permanently if there are reliable indications that the Customer or, where applicable, one of its employees is violating or has violated material obligations under these GTC, including the Acceptable Use Policy, and/or applicable intellectual property, data protection or other statutory laws, or if PriorLabs has another justified interest in the blocking, such as IT security concerns. \n10.2. When deciding on a blocking, PriorLabs shall give due consideration to the legitimate interests of the Customer. 
PriorLabs shall inform the Customer of the blocking within a reasonable timeframe before the blocking comes into effect, provided that the information does not conflict with the purpose of the blocking. The blocking shall continue until the contractual or legal violation has been remedied in an appropriate manner.\n11. Limitation of Liability \n11.1. The Services are provided free of charge. Therefore, PriorLabs\u2019 liability is in any case limited to acts of intent or gross negligence.\n11.2. The strict liability for damages for defects of the Services already existing at the beginning of the Contract Term (as defined below) in terms of Section 536a German Civil Code is excluded. The Services are provided on an \u201cas is\u201d basis, which, in accordance with Section 4 of these GTC, refers in particular to the marketability, availability, and security aspects of the Contract Software.\n12. Indemnity\nThe Customer shall indemnify PriorLabs from any and all claims of end-users or third parties who assert claims against PriorLabs on account of the use of the Services by the Customer or the Customer\u2019s end-users, in particular concerning any Customer Content used in combination with the Contract Software. The provisions of this Section shall apply mutatis mutandis to any liquidated damages (Vertragsstrafen) as well as to any administrative fines (Bu\u00dfgeld) or penalties imposed by the authorities or by the courts, to the extent that the Customer is responsible for such.\n13. Term; Termination of the Contract\n13.1. If not agreed otherwise, the Contract is concluded for an indefinite period of time until terminated by either Party (\"Contract Term\"). \n13.2. The Customer may terminate the Contract at any time by deleting its PriorLabs Account. \n13.3. PriorLabs reserves the right to terminate the Contract at any time but will consider the Customer\u2019s legitimate interests to the extent possible, e.g., by sending the notice of termination in due time to the email address provided by the Customer upon registration of the PriorLabs Account.\n13.4. The right of PriorLabs and the Customer to extraordinary termination without notice for cause shall remain unaffected.\n14. Changes to this Contract\n14.1. PriorLabs may change this Contract during the Contract Term in compliance with the following procedure, provided that the amendment is reasonable for the Customer, i.e. without significant legal or economic disadvantages, taking into account the interests of the Customer, and that there is a valid reason for the amendment. Such a reason exists, in particular, in cases of new technical developments or changes in the regulatory environment.\n14.2. PriorLabs shall inform the Customer of any changes to this Contract at least 30 calendar days before the planned entry into force of the changes. The Customer may object to the changes within 30 calendar days from receipt of the notification. If no objection is made and the Customer continues to use the Services after expiry of the objection period, the changes shall be deemed to have been effectively agreed for all Services to be provided from the end of the objection period. In the notification, PriorLabs will inform the Customer of all relevant changes to the Contract, the objection period and the legal consequences of the expiry of the objection period without exercise of the right of objection. If the Customer objects to the changes, PriorLabs may terminate the Contract pursuant to Section 13.\n15. Final Provisions\n15.1. 
Should individual provisions of the Contract be or become invalid in whole or in part, this shall not affect the validity of the remaining provisions. Invalid provisions shall be replaced first and foremost by provisions that most closely correspond to the invalid provisions in a legally effective manner. The same applies to any gaps in the Contract.\n15.2. The law of the Federal Republic of Germany shall apply with the exception of its provisions on the choice of law which would lead to the application of another legal system. The validity of the CISG (\"UN Sales Convention\") is excluded. \n15.3. For Customers who are merchants (Kaufleute) within the meaning of the German Commercial Code (Handelsgesetzbuch), a special fund (Sonderverm\u00f6gen) under public law or a legal entity under public law, Berlin, Germany, shall be the exclusive place of jurisdiction for all disputes arising from the contractual relationship.\n\nStatus: January 2025\n***\n
"},{"location":"getting_started/api/","title":"TabPFN API Guide","text":""},{"location":"getting_started/api/#authentication","title":"Authentication","text":""},{"location":"getting_started/api/#interactive-login","title":"Interactive Login","text":"The first time you use TabPFN, you'll be guided through an interactive login process:
from tabpfn_client import init\ninit()\n
"},{"location":"getting_started/api/#managing-access-tokens","title":"Managing Access Tokens","text":"You can save your token for use on other machines:
import tabpfn_client\n# Get your token\ntoken = tabpfn_client.get_access_token()\n\n# Use token on another machine\ntabpfn_client.set_access_token(token)\n
"},{"location":"getting_started/api/#rate-limits","title":"Rate Limits","text":"Our API implements a fair usage system that resets daily at 00:00:00 UTC.
"},{"location":"getting_started/api/#usage-cost-calculation","title":"Usage Cost Calculation","text":"The cost for each API request is calculated as:
api_cost = (num_train_rows + num_test_rows) * num_cols * n_estimators\n
Where n_estimators
is by default 4 for classification tasks and 8 for regression tasks.
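As a quick sanity check, the formula can be evaluated by hand. The numbers below are made-up examples, not actual limits or prices:
# Hypothetical example: cost of a classification request\n# 1,000 training rows, 100 test rows, 10 columns, default n_estimators=4\nnum_train_rows, num_test_rows, num_cols, n_estimators = 1000, 100, 10, 4\napi_cost = (num_train_rows + num_test_rows) * num_cols * n_estimators\nprint(api_cost)  # 44000\n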
Track your API usage through response headers:
| Header | Description |
| --- | --- |
| X-RateLimit-Limit | Your total allowed usage |
| X-RateLimit-Remaining | Remaining usage |
| X-RateLimit-Reset | Reset timestamp (UTC) |
"},{"location":"getting_started/api/#current-limitations","title":"Current Limitations","text":"Important Data Guidelines
Maximum total cells per request must be below 100,000:
(num_train_rows + num_test_rows) * num_cols < 100,000\n
For regression with full output turned on (return_full_output=True
), the number of test samples must be below 500.
These limits will be relaxed in future releases.
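A simple client-side guard against the cell limit might look like the sketch below; the helper function is ours for illustration and not part of tabpfn_client:
# Hypothetical helper: check the 100,000-cell limit before sending a request\ndef within_cell_limit(num_train_rows: int, num_test_rows: int, num_cols: int) -> bool:\n    return (num_train_rows + num_test_rows) * num_cols < 100_000\n\nprint(within_cell_limit(9000, 500, 10))  # True: 95,000 cells\n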
"},{"location":"getting_started/api/#managing-user-data","title":"Managing User Data","text":"You can access and manage your personal information:
from tabpfn_client import UserDataClient\nprint(UserDataClient.get_data_summary())\n
"},{"location":"getting_started/api/#error-handling","title":"Error Handling","text":"The API uses standard HTTP status codes:
| Code | Meaning |
| --- | --- |
| 200 | Success |
| 400 | Invalid request |
| 429 | Rate limit exceeded |

Example response when the limit is reached:
{\n \"error\": \"API_LIMIT_REACHED\",\n \"message\": \"Usage limit exceeded\",\n \"next_available_at\": \"2024-01-07 00:00:00\"\n}\n
"},{"location":"getting_started/install/","title":"Installation","text":"You can access our models through our API (https://github.com/automl/tabpfn-client), via our user interface built on top of the API (https://www.ux.priorlabs.ai/) or locally.
Python API Client (No GPU, Online):\npip install tabpfn-client\n\n# TabPFN Extensions installs optional functionality around the TabPFN model\n# This includes post-hoc ensembles, interpretability tools, and more\ngit clone https://github.com/PriorLabs/tabpfn-extensions\npip install -e tabpfn-extensions\n
Python Local (GPU):\n# Install the TabPFN package to run models locally (a GPU is recommended)\npip install tabpfn\n
Web Interface: You can access our models through our user interface here.
Warning
R support is currently under development. You can find a work in progress at TabPFN R. Looking for contributors!
"},{"location":"getting_started/intended_use/","title":"Usage tips","text":"Note
For a simple example of getting started with classification, see the classification tutorial.
We provide two comprehensive demo notebooks that guide you through installation and the main functionality: one Colab tutorial using the cloud API and one Colab tutorial using a local GPU.
"},{"location":"getting_started/intended_use/#when-to-use-tabpfn","title":"When to use TabPFN","text":"TabPFN excels in handling small to medium-sized datasets with up to 10,000 samples and 500 features. For larger datasets, methods such as CatBoost, XGBoost, or AutoGluon are likely to outperform TabPFN.
"},{"location":"getting_started/intended_use/#intended-use-of-tabpfn","title":"Intended Use of TabPFN","text":"TabPFN is intended as a powerful drop-in replacement for traditional tabular data prediction tools, where top performance and fast training matter. It still requires data scientists to prepare the data using their domain knowledge. Data scientists will see benefits in performing feature engineering, data cleaning, and problem framing to get the most out of TabPFN.
"},{"location":"getting_started/intended_use/#limitations-of-tabpfn","title":"Limitations of TabPFN","text":"TabPFN is computationally efficient and can run inference on consumer hardware for most datasets. Training on a new dataset is recommended to run on a GPU as this speeds it up significantly. TabPFN is not optimized for real-time inference tasks, but V2 can perform much faster predictions than V1 of TabPFN.
"},{"location":"getting_started/intended_use/#data-preparation","title":"Data Preparation","text":"TabPFN can handle raw data with minimal preprocessing. Provide the data in a tabular format, and TabPFN will automatically handle missing values, encode categorical variables, and normalize features. While TabPFN works well out-of-the-box, performance can further be improved using dataset-specific preprocessings.
"},{"location":"getting_started/intended_use/#interpreting-results","title":"Interpreting Results","text":"TabPFN's predictions come with uncertainty estimates, allowing you to assess the reliability of the results. You can use SHAP to interpret TabPFN's predictions and identify the most important features driving the model's decisions.
"},{"location":"getting_started/intended_use/#hyperparameter-tuning","title":"Hyperparameter Tuning","text":"TabPFN provides strong performance out-of-the-box without extensive hyperparameter tuning. If you have additional computational resources, you can automatically tune its hyperparameters using post-hoc ensembling or random tuning.
"},{"location":"reference/tabpfn/base/","title":"Base","text":""},{"location":"reference/tabpfn/base/#tabpfn.base","title":"base","text":"Common logic for TabPFN models.
"},{"location":"reference/tabpfn/base/#tabpfn.base.create_inference_engine","title":"create_inference_engine","text":"create_inference_engine(\n *,\n X_train: ndarray,\n y_train: ndarray,\n model: PerFeatureTransformer,\n ensemble_configs: Any,\n cat_ix: list[int],\n fit_mode: Literal[\n \"low_memory\", \"fit_preprocessors\", \"fit_with_cache\"\n ],\n device_: device,\n rng: Generator,\n n_jobs: int,\n byte_size: int,\n forced_inference_dtype_: dtype | None,\n memory_saving_mode: (\n bool | Literal[\"auto\"] | float | int\n ),\n use_autocast_: bool\n) -> InferenceEngine\n
Creates the appropriate TabPFN inference engine based on fit_mode
.
Each execution mode will perform slightly different operations based on the mode specified by the user. In the case where preprocessors will be fit after prepare
, we will use them to further transform the associated borders with each ensemble config member.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X_train | ndarray | Training features | required |
| y_train | ndarray | Training target | required |
| model | PerFeatureTransformer | The loaded TabPFN model. | required |
| ensemble_configs | Any | The ensemble configurations to create multiple \"prompts\". | required |
| cat_ix | list[int] | Indices of inferred categorical features. | required |
| fit_mode | Literal['low_memory', 'fit_preprocessors', 'fit_with_cache'] | Determines how we prepare inference (pre-cache or not). | required |
| device_ | device | The device for inference. | required |
| rng | Generator | Numpy random generator. | required |
| n_jobs | int | Number of parallel CPU workers. | required |
| byte_size | int | Byte size for the chosen inference precision. | required |
| forced_inference_dtype_ | dtype \| None | If not None, the forced dtype for inference. | required |
| memory_saving_mode | bool \| Literal['auto'] \| float \| int | GPU/CPU memory saving settings. | required |
| use_autocast_ | bool | Whether we use torch.autocast for inference. | required |
"},{"location":"reference/tabpfn/base/#tabpfn.base.determine_precision","title":"determine_precision","text":"determine_precision(\n inference_precision: (\n dtype | Literal[\"autocast\", \"auto\"]\n ),\n device_: device,\n) -> tuple[bool, dtype | None, int]\n
Decide whether to use autocast or a forced precision dtype.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| inference_precision | dtype \| Literal['autocast', 'auto'] | If \"auto\", decide automatically based on the device. If \"autocast\", explicitly use PyTorch autocast (mixed precision). If a torch.dtype, force that precision. | required |
| device_ | device | The device on which inference is run. | required |

Returns:
| Name | Type | Description |
| --- | --- | --- |
| use_autocast_ | bool | True if mixed-precision autocast will be used. |
| forced_inference_dtype_ | dtype \| None | If not None, the forced precision dtype for the model. |
| byte_size | int | The byte size per element for the chosen precision. |
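For example, a call might look like the following sketch (the import path mirrors this page's module and is an assumption):
# Sketch: pick inference precision automatically for the current device\nimport torch\nfrom tabpfn.base import determine_precision\n\ndevice_ = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\nuse_autocast_, forced_inference_dtype_, byte_size = determine_precision('auto', device_)\n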
"},{"location":"reference/tabpfn/base/#tabpfn.base.initialize_tabpfn_model","title":"initialize_tabpfn_model","text":"initialize_tabpfn_model(\n model_path: str | Path | Literal[\"auto\"],\n which: Literal[\"classifier\", \"regressor\"],\n fit_mode: Literal[\n \"low_memory\", \"fit_preprocessors\", \"fit_with_cache\"\n ],\n static_seed: int,\n) -> tuple[\n PerFeatureTransformer,\n InferenceConfig,\n FullSupportBarDistribution | None,\n]\n
Common logic to load the TabPFN model, set up the random state, and optionally download the model.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_path | str \| Path \| Literal['auto'] | Path or directive (\"auto\") to load the pre-trained model from. | required |
| which | Literal['classifier', 'regressor'] | Which TabPFN model to load. | required |
| fit_mode | Literal['low_memory', 'fit_preprocessors', 'fit_with_cache'] | Determines caching behavior. | required |
| static_seed | int | Random seed for reproducibility logic. | required |

Returns:
| Name | Type | Description |
| --- | --- | --- |
| model | PerFeatureTransformer | The loaded TabPFN model. |
| config | InferenceConfig | The configuration object associated with the loaded model. |
| bar_distribution | FullSupportBarDistribution \| None | The BarDistribution for regression (None if classifier). |
"},{"location":"reference/tabpfn/classifier/","title":"Classifier","text":""},{"location":"reference/tabpfn/classifier/#tabpfn.classifier","title":"classifier","text":"TabPFNClassifier class.
Example
import sklearn.datasets\nfrom tabpfn import TabPFNClassifier\n\nmodel = TabPFNClassifier()\n\nX, y = sklearn.datasets.load_iris(return_X_y=True)\n\nmodel.fit(X, y)\npredictions = model.predict(X)\n
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier","title":"TabPFNClassifier","text":" Bases: ClassifierMixin
, BaseEstimator
TabPFNClassifier class.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.class_counts_","title":"class_counts_instance-attribute
","text":"class_counts_: NDArray[Any]\n
The number of samples per class found in the target data during fit()
.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.classes_","title":"classes_instance-attribute
","text":"classes_: NDArray[Any]\n
The unique classes found in the target data during fit()
.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.config_","title":"config_instance-attribute
","text":"config_: InferenceConfig\n
The configuration of the loaded model to be used for inference.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.device_","title":"device_instance-attribute
","text":"device_: device\n
The device determined to be used.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.executor_","title":"executor_instance-attribute
","text":"executor_: InferenceEngine\n
The inference engine used to make predictions.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.feature_names_in_","title":"feature_names_in_instance-attribute
","text":"feature_names_in_: NDArray[Any]\n
The feature names of the input data.
May not be set if the input data does not have feature names, such as with a numpy array.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.forced_inference_dtype_","title":"forced_inference_dtype_instance-attribute
","text":"forced_inference_dtype_: _dtype | None\n
The forced inference dtype for the model based on inference_precision
.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.inferred_categorical_indices_","title":"inferred_categorical_indices_instance-attribute
","text":"inferred_categorical_indices_: list[int]\n
The indices of the columns that were inferred to be categorical, as a product of any features deemed categorical by the user and what would work best for the model.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.interface_config_","title":"interface_config_instance-attribute
","text":"interface_config_: ModelInterfaceConfig\n
Additional configuration of the interface for expert users.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.label_encoder_","title":"label_encoder_instance-attribute
","text":"label_encoder_: LabelEncoder\n
The label encoder used to encode the target variable.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.n_classes_","title":"n_classes_instance-attribute
","text":"n_classes_: int\n
The number of classes found in the target data during fit()
.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.n_features_in_","title":"n_features_in_instance-attribute
","text":"n_features_in_: int\n
The number of features in the input data used during fit()
.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.n_outputs_","title":"n_outputs_instance-attribute
","text":"n_outputs_: Literal[1]\n
The number of outputs the model has. Only 1 for now.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.preprocessor_","title":"preprocessor_instance-attribute
","text":"preprocessor_: ColumnTransformer\n
The column transformer used to preprocess the input data to be numeric.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.use_autocast_","title":"use_autocast_instance-attribute
","text":"use_autocast_: bool\n
Whether torch's autocast should be used.
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.fit","title":"fit","text":"fit(X: XType, y: YType) -> Self\n
Fit the model.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | XType | The input data. | required |
| y | YType | The target variable. | required |
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.predict","title":"predict","text":"predict(X: XType) -> ndarray\n
Predict the class labels for the provided input samples.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | XType | The input samples. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The predicted class labels. |
"},{"location":"reference/tabpfn/classifier/#tabpfn.classifier.TabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X: XType) -> ndarray\n
Predict the probabilities of the classes for the provided input samples.
Parameters:
| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | XType | The input data. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | The predicted probabilities of the classes. |
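For example, reusing the iris setup from the module example above (a minimal sketch):
# Sketch: class probabilities with TabPFNClassifier\nimport sklearn.datasets\nfrom tabpfn import TabPFNClassifier\n\nX, y = sklearn.datasets.load_iris(return_X_y=True)\nmodel = TabPFNClassifier()\nmodel.fit(X, y)\n\nproba = model.predict_proba(X[:3])  # shape: (3, n_classes_)\nprint(proba.sum(axis=1))  # each row sums to 1\n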
"},{"location":"reference/tabpfn/constants/","title":"Constants","text":""},{"location":"reference/tabpfn/constants/#tabpfn.constants","title":"constants","text":"Various constants used throughout the library.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig","title":"ModelInterfaceConfigdataclass
","text":"Constants used as default HPs in the model interfaces.
These constants are not exposed to the models' init on purpose to reduce the complexity for users. Furthermore, most of these should not be optimized over by the (standard) user.
Several of the preprocessing options are supported by our code for efficiency reasons (to avoid loading TabPFN multiple times). However, these can also be applied outside of the model interface.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.CLASS_SHIFT_METHOD","title":"CLASS_SHIFT_METHODclass-attribute
instance-attribute
","text":"CLASS_SHIFT_METHOD: Literal[\"rotate\", \"shuffle\"] | None = (\n \"shuffle\"\n)\n
The method used to shift classes during preprocessing for ensembling to emulate the effect of invariance to class order. Without ensembling, TabPFN is not invariant to class order due to using a transformer. Shifting classes can have a positive effect on the model's performance. The options are: - If \"shuffle\", the classes are shuffled. - If \"rotate\", the classes are rotated (think of a ring). - If None, no class shifting is done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FEATURE_SHIFT_METHOD","title":"FEATURE_SHIFT_METHODclass-attribute
instance-attribute
","text":"FEATURE_SHIFT_METHOD: (\n Literal[\"shuffle\", \"rotate\"] | None\n) = \"shuffle\"\n
The method used to shift features during preprocessing for ensembling to emulate the effect of invariance to feature position. Without ensembling, TabPFN is not invariant to feature position due to using a transformer. Moreover, shifting features can have a positive effect on the model's performance. The options are: - If \"shuffle\", the features are shuffled. - If \"rotate\", the features are rotated (think of a ring). - If None, no feature shifting is done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FINGERPRINT_FEATURE","title":"FINGERPRINT_FEATUREclass-attribute
instance-attribute
","text":"FINGERPRINT_FEATURE: bool = True\n
Whether to add a fingerprint feature to the data. The added feature is a hash of the row, counting up for duplicates. This helps TabPFN to distinguish between duplicated data points in the input data. Otherwise, duplicates would be less obvious during attention. This is expected to improve prediction performance and help with stability if the data has many sample duplicates.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORM","title":"FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORMclass-attribute
instance-attribute
","text":"FIX_NAN_BORDERS_AFTER_TARGET_TRANSFORM: bool = True\n
Whether to repair any borders of the bar distribution in regression that are NaN after the transformation. This can happen due to multiple reasons and should in general always be done.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_CLASSES","title":"MAX_NUMBER_OF_CLASSESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_CLASSES: int = 10\n
The number of classes seen during pretraining for classification. If the number of classes is larger than this number, TabPFN requires an additional step to predict for more than this many classes.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_FEATURES","title":"MAX_NUMBER_OF_FEATURESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_FEATURES: int = 500\n
The number of features that the pretraining was intended for. If the number of features is larger than this number, you may see degraded performance. Note that this is not simply the number of features seen by the model during pretraining; it also accounts for expected generalization (i.e., length extrapolation).
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_NUMBER_OF_SAMPLES","title":"MAX_NUMBER_OF_SAMPLESclass-attribute
instance-attribute
","text":"MAX_NUMBER_OF_SAMPLES: int = 10000\n
The number of samples that the pretraining was intended for. If the number of samples is larger than this number, you may see degraded performance. Note that this is not simply the number of samples seen by the model during pretraining; it also accounts for expected generalization (i.e., length extrapolation).
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MAX_UNIQUE_FOR_CATEGORICAL_FEATURES","title":"MAX_UNIQUE_FOR_CATEGORICAL_FEATURESclass-attribute
instance-attribute
","text":"MAX_UNIQUE_FOR_CATEGORICAL_FEATURES: int = 30\n
The maximum number of unique values for a feature to be considered categorical. Otherwise, it is considered numerical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCE","title":"MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCEclass-attribute
instance-attribute
","text":"MIN_NUMBER_SAMPLES_FOR_CATEGORICAL_INFERENCE: int = 100\n
The minimum number of samples required in the data before we infer which features might be categorical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.MIN_UNIQUE_FOR_NUMERICAL_FEATURES","title":"MIN_UNIQUE_FOR_NUMERICAL_FEATURESclass-attribute
instance-attribute
","text":"MIN_UNIQUE_FOR_NUMERICAL_FEATURES: int = 4\n
The minimum number of unique values for a feature to be considered numerical. Otherwise, it is considered categorical.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.OUTLIER_REMOVAL_STD","title":"OUTLIER_REMOVAL_STDclass-attribute
instance-attribute
","text":"OUTLIER_REMOVAL_STD: float | None | Literal[\"auto\"] = \"auto\"\n
The number of standard deviations from the mean to consider a sample an outlier. - If None, no outliers are removed. - If float, the number of standard deviations from the mean to consider a sample an outlier. - If \"auto\", the OUTLIER_REMOVAL_STD is automatically determined. -> 12.0 for classification and None for regression.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.POLYNOMIAL_FEATURES","title":"POLYNOMIAL_FEATURESclass-attribute
instance-attribute
","text":"POLYNOMIAL_FEATURES: Literal['no', 'all'] | int = 'no'\n
The number of 2 factor polynomial features to generate and add to the original data before passing the data to TabPFN. The polynomial features are generated by multiplying the original features together, e.g., this might add a feature x1*x2
to the features, if x1
and x2
are features. In total, this can add up to O(n^2) additional features. Adding polynomial features can improve predictive performance by exploiting simple feature engineering. - If \"no\", no polynomial features are added. - If \"all\", all possible polynomial features are added. - If an int, determines the maximal number of polynomial features to add to the original data.
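To illustrate what such 2-factor products look like, here is a generic numpy sketch (not TabPFN's internal implementation):
# Sketch: pairwise product features x_i * x_j for a toy matrix\nimport numpy as np\n\nX = np.array([[1.0, 2.0, 3.0],\n              [4.0, 5.0, 6.0]])\ni, j = np.triu_indices(X.shape[1], k=1)  # all unordered pairs with i < j\npoly = X[:, i] * X[:, j]  # columns x1*x2, x1*x3, x2*x3\nX_aug = np.hstack([X, poly])  # up to O(n^2) additional columns\n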
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.PREPROCESS_TRANSFORMS","title":"PREPROCESS_TRANSFORMSclass-attribute
instance-attribute
","text":"PREPROCESS_TRANSFORMS: list[PreprocessorConfig] | None = (\n None\n)\n
The preprocessing applied to the data before passing it to TabPFN. See PreprocessorConfig
for options and more details. If a list of PreprocessorConfig
is provided, the preprocessors are (repeatedly) applied across different estimators.
By default, for classification, two preprocessors are applied: 1. Uses the original input data, with all features transformed by a quantile scaler and the first n-many components of an SVD transformer appended (whereby n is a fraction of the number of features or samples). Categorical features are ordinal encoded, but all categories with fewer than 10 occurrences are ignored. 2. Uses the original input data, with categorical features ordinal encoded.
By default, for regression, two preprocessors are applied: 1. The same as for classification, with a minimally different quantile scaler. 2. The original input data power transformed and categories one-hot encoded.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.REGRESSION_Y_PREPROCESS_TRANSFORMS","title":"REGRESSION_Y_PREPROCESS_TRANSFORMSclass-attribute
instance-attribute
","text":"REGRESSION_Y_PREPROCESS_TRANSFORMS: tuple[\n Literal[\"safepower\", \"power\", \"quantile_norm\", None],\n ...,\n] = (None, \"safepower\")\n
The preprocessing applied to the target variable before passing it to TabPFN for regression. This can be understood as scaling the target variable to better predict it. The preprocessors should be passed as a tuple/list and are then (repeatedly) used by the estimators in the ensembles.
By default, we use no preprocessing and a power transformation (if we have more than one estimator).
The options are \"safepower\", \"power\", \"quantile_norm\", or None, as listed in the type signature above."},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.SUBSAMPLE_SAMPLES","title":"SUBSAMPLE_SAMPLESclass-attribute
instance-attribute
","text":"SUBSAMPLE_SAMPLES: int | float | None = None\n
Subsample the input data sample/row-wise before performing any preprocessing and the TabPFN forward pass. - If None, no subsampling is done. - If an int, the number of samples to subsample (or oversample if SUBSAMPLE_SAMPLES
is larger than the number of samples). - If a float, the percentage of samples to subsample.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.USE_SKLEARN_16_DECIMAL_PRECISION","title":"USE_SKLEARN_16_DECIMAL_PRECISIONclass-attribute
instance-attribute
","text":"USE_SKLEARN_16_DECIMAL_PRECISION: bool = False\n
Whether to round the probabilities to float 16 to match the precision of scikit-learn. This can help with reproducibility and compatibility with scikit-learn but is not recommended for general use. This is not exposed to the user or as a hyperparameter. To improve reproducibility, set ._sklearn_16_decimal_precision = True
before calling .predict()
or .predict_proba()
.
"},{"location":"reference/tabpfn/constants/#tabpfn.constants.ModelInterfaceConfig.from_user_input","title":"from_user_inputstaticmethod
","text":"from_user_input(\n *, inference_config: dict | ModelInterfaceConfig | None\n) -> ModelInterfaceConfig\n
Converts the user input to a ModelInterfaceConfig
object.
The input inference_config can be a dictionary, a ModelInterfaceConfig
object, or None. If a dictionary is passed, the keys must match the attributes of ModelInterfaceConfig
. If a ModelInterfaceConfig
object is passed, it is returned as is. If None is passed, a new ModelInterfaceConfig
object is created with default values.
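For example (a sketch; as stated above, dictionary keys must match ModelInterfaceConfig attribute names, and the import path is an assumption):
# Sketch: build a ModelInterfaceConfig from a plain dictionary\nfrom tabpfn.constants import ModelInterfaceConfig\n\nconfig = ModelInterfaceConfig.from_user_input(\n    inference_config={'FINGERPRINT_FEATURE': False}\n)\n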
"},{"location":"reference/tabpfn/inference/","title":"Inference","text":""},{"location":"reference/tabpfn/inference/#tabpfn.inference","title":"inference","text":"Module that defines different ways to run inference with TabPFN.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngine","title":"InferenceEnginedataclass
","text":" Bases: ABC
These define how TabPFN inference can be run.
As there are many things that can be cached, with multiple ways to parallelize, Executor
defines three primary things:
Most will define a method prepare()
which is specific to that inference engine. These do not share a common interface.
What to cache:
As we can prepare a lot of the transformer context, there is a tradeoff in terms of how much memory should be spent on caching. This memory is used when prepare()
is called, usually in fit()
.
Using the cached data for inference:
Based on what has been prepared for the transformer context, iter_outputs()
will use this cached information to make predictions.
Controlling parallelism:
As we have trivially parallel parts for inference, we can parallelize them. However, as the GPU is typically a bottleneck in most systems, we can define where and how we would like to parallelize the inference.
abstractmethod
","text":"iter_outputs(\n X: ndarray, *, device: device, autocast: bool\n) -> Iterator[tuple[Tensor, EnsembleConfig]]\n
Iterate over the outputs of the model.
One for each ensemble configuration that was used to initialize the executor.
Parameters:
Name Type Description DefaultX
ndarray
The input data to make predictions on.
requireddevice
device
The device to run the model on.
requiredautocast
bool
Whether to use torch.autocast during inference.
required"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCacheKV","title":"InferenceEngineCacheKVdataclass
","text":" Bases: InferenceEngine
Inference engine that caches the actual KV cache calculated from the context of the processed training data.
This is by far the most memory intensive inference engine, as for each ensemble member we store the full KV cache of that model. For now this is held in CPU RAM (TODO(eddiebergman): verify)
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCacheKV.prepare","title":"prepareclassmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n ensemble_configs: Sequence[EnsembleConfig],\n n_workers: int,\n model: PerFeatureTransformer,\n device: device,\n rng: Generator,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int,\n autocast: bool\n) -> InferenceEngineCacheKV\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredn_workers
int
The number of workers to use.
requiredmodel
PerFeatureTransformer
The model to use.
requireddevice
device
The device to run the model on.
requiredrng
Generator
The random number generator.
requireddtype_byte_size
int
Size of the dtype in bytes.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
requiredautocast
bool
Whether to use torch.autocast during inference.
required"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineCachePreprocessing","title":"InferenceEngineCachePreprocessingdataclass
","text":" Bases: InferenceEngine
Inference engine that caches the preprocessing for feeding as model context on predict.
This will fit the preprocessors on the training data, as well as cache the transformed training data on RAM (not GPU RAM).
This saves some time on each predict call, at the cost of increasing the amount of memory in RAM. The main functionality performed at predict()
time is to forward pass through the model which is currently done sequentially.
classmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n model: PerFeatureTransformer,\n ensemble_configs: Sequence[EnsembleConfig],\n n_workers: int,\n rng: Generator,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int\n) -> InferenceEngineCachePreprocessing\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredmodel
PerFeatureTransformer
The model to use.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredn_workers
int
The number of workers to use.
requiredrng
Generator
The random number generator.
requireddtype_byte_size
int
The byte size of the dtype.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
requiredReturns:
Type DescriptionInferenceEngineCachePreprocessing
The prepared inference engine.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineOnDemand","title":"InferenceEngineOnDemanddataclass
","text":" Bases: InferenceEngine
Inference engine that does not cache anything, computes everything as needed.
This is one of the slowest ways to run inference, as computation that could be cached is recomputed on every call. However the memory demand is lowest and can be more trivially parallelized across GPUs with some work.
"},{"location":"reference/tabpfn/inference/#tabpfn.inference.InferenceEngineOnDemand.prepare","title":"prepareclassmethod
","text":"prepare(\n X_train: ndarray,\n y_train: ndarray,\n *,\n cat_ix: list[int],\n model: PerFeatureTransformer,\n ensemble_configs: Sequence[EnsembleConfig],\n rng: Generator,\n n_workers: int,\n dtype_byte_size: int,\n force_inference_dtype: dtype | None,\n save_peak_mem: bool | Literal[\"auto\"] | float | int\n) -> InferenceEngineOnDemand\n
Prepare the inference engine.
Parameters:
Name Type Description DefaultX_train
ndarray
The training data.
requiredy_train
ndarray
The training target.
requiredcat_ix
list[int]
The categorical indices.
requiredmodel
PerFeatureTransformer
The model to use.
requiredensemble_configs
Sequence[EnsembleConfig]
The ensemble configurations to use.
requiredrng
Generator
The random number generator.
requiredn_workers
int
The number of workers to use.
requireddtype_byte_size
int
The byte size of the dtype.
requiredforce_inference_dtype
dtype | None
The dtype to force inference to.
requiredsave_peak_mem
bool | Literal['auto'] | float | int
Whether to save peak memory usage.
required"},{"location":"reference/tabpfn/preprocessing/","title":"Preprocessing","text":""},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing","title":"preprocessing","text":"Defines the preprocessing configurations that define the ensembling of different members.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig","title":"ClassifierEnsembleConfigdataclass
","text":" Bases: EnsembleConfig
Configuration for a classifier ensemble member.
See EnsembleConfig for more details.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.generate_for_classification","title":"generate_for_classificationclassmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for classpermutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.ClassifierEnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig","title":"EnsembleConfigdataclass
","text":"Configuration for an ensemble member.
Attributes:
Name Type Descriptionfeature_shift_count
int
How much to shift the features columns.
class_permutation
int
Permutation to apply to classes
preprocess_config
PreprocessorConfig
Preprocessor configuration to use.
subsample_ix
NDArray[int64] | None
Indices of samples to use for this ensemble member. If None
, no subsampling is done.
classmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How shift features
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for classpermutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift the features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.EnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.PreprocessorConfig","title":"PreprocessorConfigdataclass
","text":"Configuration for data preprocessors.
Attributes:
Name Type Descriptionname
Literal['per_feature', 'power', 'safepower', 'power_box', 'safepower_box', 'quantile_uni_coarse', 'quantile_norm_coarse', 'quantile_uni', 'quantile_norm', 'quantile_uni_fine', 'quantile_norm_fine', 'robust', 'kdi', 'none', 'kdi_random_alpha', 'kdi_uni', 'kdi_random_alpha_uni', 'adaptive', 'norm_and_kdi', 'kdi_alpha_0.3_uni', 'kdi_alpha_0.5_uni', 'kdi_alpha_0.8_uni', 'kdi_alpha_1.0_uni', 'kdi_alpha_1.2_uni', 'kdi_alpha_1.5_uni', 'kdi_alpha_2.0_uni', 'kdi_alpha_3.0_uni', 'kdi_alpha_5.0_uni', 'kdi_alpha_0.3', 'kdi_alpha_0.5', 'kdi_alpha_0.8', 'kdi_alpha_1.0', 'kdi_alpha_1.2', 'kdi_alpha_1.5', 'kdi_alpha_2.0', 'kdi_alpha_3.0', 'kdi_alpha_5.0']
Name of the preprocessor.
categorical_name
Literal['none', 'numeric', 'onehot', 'ordinal', 'ordinal_shuffled', 'ordinal_very_common_categories_shuffled']
Name of the categorical encoding method. Options: \"none\", \"numeric\", \"onehot\", \"ordinal\", \"ordinal_shuffled\", \"ordinal_very_common_categories_shuffled\".
append_original
bool
Whether to append original features to the transformed features
subsample_features
float
Fraction of features to subsample. -1 means no subsampling.
global_transformer_name
str | None
Name of the global transformer to use.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig","title":"RegressorEnsembleConfigdataclass
","text":" Bases: EnsembleConfig
Configuration for a regression ensemble member.
See EnsembleConfig for more details.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.generate_for_classification","title":"generate_for_classificationclassmethod
","text":"generate_for_classification(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n class_shift_method: Literal[\"rotate\", \"shuffle\"] | None,\n n_classes: int,\n random_state: int | Generator | None\n) -> list[ClassifierEnsembleConfig]\n
Generate ensemble configurations for classification.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift the features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredclass_shift_method
Literal['rotate', 'shuffle'] | None
How to shift classes for class permutation.
requiredn_classes
int
Number of classes.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[ClassifierEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.generate_for_regression","title":"generate_for_regressionclassmethod
","text":"generate_for_regression(\n *,\n n: int,\n subsample_size: int | float | None,\n max_index: int,\n add_fingerprint_feature: bool,\n polynomial_features: Literal[\"no\", \"all\"] | int,\n feature_shift_decoder: (\n Literal[\"shuffle\", \"rotate\"] | None\n ),\n preprocessor_configs: Sequence[PreprocessorConfig],\n target_transforms: Sequence[\n TransformerMixin | Pipeline | None\n ],\n random_state: int | Generator | None\n) -> list[RegressorEnsembleConfig]\n
Generate ensemble configurations for regression.
Parameters:
Name Type Description Defaultn
int
Number of ensemble configurations to generate.
requiredsubsample_size
int | float | None
Number of samples to subsample. If int, subsample that many samples. If float, subsample that fraction of samples. If None
, no subsampling is done.
max_index
int
Maximum index to generate for.
requiredadd_fingerprint_feature
bool
Whether to add fingerprint features.
requiredpolynomial_features
Literal['no', 'all'] | int
Maximum number of polynomial features to add, if any.
requiredfeature_shift_decoder
Literal['shuffle', 'rotate'] | None
How to shift the features.
requiredpreprocessor_configs
Sequence[PreprocessorConfig]
Preprocessor configurations to use on the data.
requiredtarget_transforms
Sequence[TransformerMixin | Pipeline | None]
Target transformations to apply.
requiredrandom_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[RegressorEnsembleConfig]
List of ensemble configurations.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.RegressorEnsembleConfig.to_pipeline","title":"to_pipeline","text":"to_pipeline(\n *, random_state: int | Generator | None\n) -> SequentialFeatureTransformer\n
Convert the ensemble configuration to a preprocessing pipeline.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.balance","title":"balance","text":"balance(x: Iterable[T], n: int) -> list[T]\n
Take a list of elements and make a new list where each appears n
times.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.default_classifier_preprocessor_configs","title":"default_classifier_preprocessor_configs","text":"default_classifier_preprocessor_configs() -> (\n    list[PreprocessorConfig]\n)\n
Default preprocessor configurations for classification.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.default_regressor_preprocessor_configs","title":"default_regressor_preprocessor_configs","text":"default_regressor_preprocessor_configs() -> (\n list[PreprocessorConfig]\n)\n
Default preprocessor configurations for regression.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.fit_preprocessing","title":"fit_preprocessing","text":"fit_preprocessing(\n configs: Sequence[EnsembleConfig],\n X_train: ndarray,\n y_train: ndarray,\n *,\n random_state: int | Generator | None,\n cat_ix: list[int],\n n_workers: int,\n parallel_mode: Literal[\"block\", \"as-ready\", \"in-order\"]\n) -> Iterator[\n tuple[\n EnsembleConfig,\n SequentialFeatureTransformer,\n ndarray,\n ndarray,\n list[int],\n ]\n]\n
Fit preprocessing pipelines in parallel.
Parameters:
Name Type Description Defaultconfigs
Sequence[EnsembleConfig]
List of ensemble configurations.
requiredX_train
ndarray
Training data.
requiredy_train
ndarray
Training target.
requiredrandom_state
int | Generator | None
Random number generator.
requiredcat_ix
list[int]
Indices of categorical features.
requiredn_workers
int
Number of workers to use.
requiredparallel_mode
Literal['block', 'as-ready', 'in-order']
Parallel mode to use.
\"block\"
: Blocks until all workers are done. Returns in order.\"as-ready\"
: Returns results as they are ready. Any order.\"in-order\"
: Returns results in order, blocking only on the next result to be returned.Returns:
Type DescriptionIterator[tuple[EnsembleConfig, SequentialFeatureTransformer, ndarray, ndarray, list[int]]]
An iterator of tuples containing the ensemble configuration, the fitted preprocessing pipeline, the transformed training data, the transformed target, and the indices of categorical features.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.fit_preprocessing_one","title":"fit_preprocessing_one","text":"fit_preprocessing_one(\n config: EnsembleConfig,\n X_train: ndarray,\n y_train: ndarray,\n random_state: int | Generator | None = None,\n *,\n cat_ix: list[int]\n) -> tuple[\n EnsembleConfig,\n SequentialFeatureTransformer,\n ndarray,\n ndarray,\n list[int],\n]\n
Fit preprocessing pipeline for a single ensemble configuration.
Parameters:
Name Type Description Defaultconfig
EnsembleConfig
Ensemble configuration.
requiredX_train
ndarray
Training data.
requiredy_train
ndarray
Training target.
requiredrandom_state
int | Generator | None
Random seed.
None
cat_ix
list[int]
Indices of categorical features.
requiredReturns:
Type Descriptiontuple[EnsembleConfig, SequentialFeatureTransformer, ndarray, ndarray, list[int]]
A tuple containing the ensemble configuration, the fitted preprocessing pipeline, the transformed training data, the transformed target, and the indices of categorical features.
"},{"location":"reference/tabpfn/preprocessing/#tabpfn.preprocessing.generate_index_permutations","title":"generate_index_permutations","text":"generate_index_permutations(\n n: int,\n *,\n max_index: int,\n subsample: int | float,\n random_state: int | Generator | None\n) -> list[NDArray[int64]]\n
Generate indices for subsampling from the data.
Parameters:
Name Type Description Defaultn
int
Number of indices to generate.
requiredmax_index
int
Maximum index to generate.
requiredsubsample
int | float
Number of indices to subsample. If int
, subsample that many indices. If float, subsample that fraction of indices.
random_state
int | Generator | None
Random number generator.
requiredReturns:
Type Descriptionlist[NDArray[int64]]
List of indices to subsample.
"},{"location":"reference/tabpfn/regressor/","title":"Regressor","text":""},{"location":"reference/tabpfn/regressor/#tabpfn.regressor","title":"regressor","text":"TabPFNRegressor class.
Example
import sklearn.datasets\nfrom tabpfn import TabPFNRegressor\n\nmodel = TabPFNRegressor()\nX, y = sklearn.datasets.make_regression(n_samples=50, n_features=10)\n\nmodel.fit(X, y)\npredictions = model.predict(X)\n
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor","title":"TabPFNRegressor","text":" Bases: RegressorMixin
, BaseEstimator
TabPFNRegressor class.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.bardist_","title":"bardist_instance-attribute
","text":"bardist_: FullSupportBarDistribution\n
The bar distribution of the target variable, used by the model.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.config_","title":"config_instance-attribute
","text":"config_: InferenceConfig\n
The configuration of the loaded model to be used for inference.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.device_","title":"device_instance-attribute
","text":"device_: device\n
The device determined to be used.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.executor_","title":"executor_instance-attribute
","text":"executor_: InferenceEngine\n
The inference engine used to make predictions.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.feature_names_in_","title":"feature_names_in_instance-attribute
","text":"feature_names_in_: NDArray[Any]\n
The feature names of the input data.
May not be set if the input data does not have feature names, such as with a numpy array.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.forced_inference_dtype_","title":"forced_inference_dtype_instance-attribute
","text":"forced_inference_dtype_: _dtype | None\n
The forced inference dtype for the model based on inference_precision
.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.inferred_categorical_indices_","title":"inferred_categorical_indices_instance-attribute
","text":"inferred_categorical_indices_: list[int]\n
The indices of the columns that were inferred to be categorical, as a product of any features deemed categorical by the user and what would work best for the model.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.interface_config_","title":"interface_config_instance-attribute
","text":"interface_config_: ModelInterfaceConfig\n
Additional configuration of the interface for expert users.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.n_features_in_","title":"n_features_in_instance-attribute
","text":"n_features_in_: int\n
The number of features in the input data used during fit()
.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.n_outputs_","title":"n_outputs_instance-attribute
","text":"n_outputs_: Literal[1]\n
The number of outputs the model supports. Only 1 for now.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.preprocessor_","title":"preprocessor_instance-attribute
","text":"preprocessor_: ColumnTransformer\n
The column transformer used to preprocess the input data to be numeric.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.renormalized_criterion_","title":"renormalized_criterion_instance-attribute
","text":"renormalized_criterion_: FullSupportBarDistribution\n
The normalized bar distribution used for computing the predictions.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.use_autocast_","title":"use_autocast_instance-attribute
","text":"use_autocast_: bool\n
Whether torch's autocast should be used.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.y_train_mean_","title":"y_train_mean_instance-attribute
","text":"y_train_mean_: float\n
The mean of the target variable during training.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.y_train_std","title":"y_train_stdinstance-attribute
","text":"y_train_std: float\n
The standard deviation of the target variable during training.
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.fit","title":"fit","text":"fit(X: XType, y: YType) -> Self\n
Fit the model.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredy
YType
The target variable.
requiredReturns:
Type DescriptionSelf
self
"},{"location":"reference/tabpfn/regressor/#tabpfn.regressor.TabPFNRegressor.predict","title":"predict","text":"predict(\n X: XType,\n *,\n output_type: Literal[\n \"mean\",\n \"median\",\n \"mode\",\n \"quantiles\",\n \"full\",\n \"main\",\n ] = \"mean\",\n quantiles: list[float] | None = None\n) -> (\n ndarray\n | list[ndarray]\n | dict[str, ndarray]\n | dict[str, ndarray | FullSupportBarDistribution]\n)\n
Predict the target variable.
Parameters:
Name Type Description DefaultX
XType
The input data.
requiredoutput_type
Literal['mean', 'median', 'mode', 'quantiles', 'full', 'main']
Determines the type of output to return.
\"mean\"
, we return the mean over the predicted distribution.\"median\"
, we return the median over the predicted distribution.\"mode\"
, we return the mode over the predicted distribution.\"quantiles\"
, we return the quantiles of the predicted distribution. The parameter output_quantiles
determines which quantiles are returned.\"main\"
, we return all of the output types above in a dict.\"full\"
, we return the full output of the model, including the logits and the criterion, and all the output types from \"main\".'mean'
quantiles
list[float] | None
The quantiles to return if output=\"quantiles\"
.
By default, the [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
quantiles are returned. The predictions per quantile match the input order.
None
Returns:
Type Descriptionndarray | list[ndarray] | dict[str, ndarray] | dict[str, ndarray | FullSupportBarDistribution]
The predicted target variable or a list of predictions per quantile.
"},{"location":"reference/tabpfn/utils/","title":"Utils","text":""},{"location":"reference/tabpfn/utils/#tabpfn.utils","title":"utils","text":"A collection of random utilities for the TabPFN models.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_categorical_features","title":"infer_categorical_features","text":"infer_categorical_features(\n X: ndarray,\n *,\n provided: Sequence[int] | None,\n min_samples_for_inference: int,\n max_unique_for_category: int,\n min_unique_for_numerical: int\n) -> list[int]\n
Infer the categorical features from the given data.
Note
This function may infer that particular columns are not categorical, depending on what suits the model's predictions and its pre-training.
Parameters:
Name Type Description DefaultX
ndarray
The data to infer the categorical features from.
requiredprovided
Sequence[int] | None
Any user provided indices of what is considered categorical.
requiredmin_samples_for_inference
int
The minimum number of samples required for automatic inference of features which were not provided as categorical.
requiredmax_unique_for_category
int
The maximum number of unique values for a feature to be considered categorical.
requiredmin_unique_for_numerical
int
The minimum number of unique values for a feature to be considered numerical.
requiredReturns:
Type Descriptionlist[int]
The indices of inferred categorical features.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_device_and_type","title":"infer_device_and_type","text":"infer_device_and_type(\n device: str | device | None,\n) -> device\n
Infer the device and data type from the given device string.
Parameters:
Name Type Description Defaultdevice
str | device | None
The device to infer the type from.
requiredReturns:
Type Descriptiondevice
The inferred device
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_fp16_inference_mode","title":"infer_fp16_inference_mode","text":"infer_fp16_inference_mode(\n device: device, *, enable: bool | None\n) -> bool\n
Infer whether fp16 inference should be enabled.
Parameters:
Name Type Description Defaultdevice
device
The device to validate against.
requiredenable
bool | None
Whether it should be enabled, True
or False
, otherwise if None
, detect if it's possible and use it if so.
Returns:
Type Descriptionbool
Whether to use fp16 inference or not.
Raises:
Type DescriptionValueError
If fp16 inference was enabled and device type does not support it.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.infer_random_state","title":"infer_random_state","text":"infer_random_state(\n random_state: int | RandomState | Generator | None,\n) -> tuple[int, Generator]\n
Infer the random state from the given input.
Parameters:
Name Type Description Defaultrandom_state
int | RandomState | Generator | None
The random state to infer.
requiredReturns:
Type Descriptiontuple[int, Generator]
A static integer seed and a random number generator.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.is_autocast_available","title":"is_autocast_available","text":"is_autocast_available(device_type: str) -> bool\n
Infer whether autocast is available for the given device type.
Parameters:
Name Type Description Defaultdevice_type
str
The device type to check for autocast availability.
requiredReturns:
Type Descriptionbool
Whether autocast is available for the given device type.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.load_model_criterion_config","title":"load_model_criterion_config","text":"load_model_criterion_config(\n model_path: None | str | Path,\n *,\n check_bar_distribution_criterion: bool,\n cache_trainset_representation: bool,\n which: Literal[\"regressor\", \"classifier\"],\n version: Literal[\"v2\"] = \"v2\",\n download: bool,\n model_seed: int\n) -> tuple[\n PerFeatureTransformer,\n BCEWithLogitsLoss\n | CrossEntropyLoss\n | FullSupportBarDistribution,\n InferenceConfig,\n]\n
Load the model, criterion, and config from the given path.
Parameters:
Name Type Description Defaultmodel_path
None | str | Path
The path to the model.
requiredcheck_bar_distribution_criterion
bool
Whether to check if the criterion is a FullSupportBarDistribution, which is the expected criterion for models trained for regression.
requiredcache_trainset_representation
bool
Whether the model should know to cache the trainset representation.
requiredwhich
Literal['regressor', 'classifier']
Whether the model is a regressor or classifier.
requiredversion
Literal['v2']
The version of the model.
'v2'
download
bool
Whether to download the model if it doesn't exist.
requiredmodel_seed
int
The seed of the model.
requiredReturns:
Type Descriptiontuple[PerFeatureTransformer, BCEWithLogitsLoss | CrossEntropyLoss | FullSupportBarDistribution, InferenceConfig]
The model, criterion, and config.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.translate_probs_across_borders","title":"translate_probs_across_borders","text":"translate_probs_across_borders(\n logits: Tensor, *, frm: Tensor, to: Tensor\n) -> Tensor\n
Translate the probabilities across the borders.
Parameters:
Name Type Description Defaultlogits
Tensor
The logits defining the distribution to translate.
requiredfrm
Tensor
The borders to translate from.
requiredto
Tensor
The borders to translate to.
requiredReturns:
Type DescriptionTensor
The translated probabilities.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.update_encoder_outlier_params","title":"update_encoder_outlier_params","text":"update_encoder_outlier_params(\n model: Module,\n remove_outliers_std: float | None,\n seed: int | None,\n *,\n inplace: Literal[True]\n) -> None\n
Update the encoder to handle outliers in the model.
Warning
This only happens inplace.
Parameters:
Name Type Description Defaultmodel
Module
The model to update.
requiredremove_outliers_std
float | None
The standard deviation to remove outliers.
requiredseed
int | None
The seed to use, if any.
requiredinplace
Literal[True]
Whether to do the operation inplace.
requiredRaises:
Type DescriptionValueError
If inplace
is not True
.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.validate_X_predict","title":"validate_X_predict","text":"validate_X_predict(\n    X: XType, estimator: TabPFNRegressor | TabPFNClassifier\n) -> ndarray\n
Validate the input data for prediction.
"},{"location":"reference/tabpfn/utils/#tabpfn.utils.validate_Xy_fit","title":"validate_Xy_fit","text":"validate_Xy_fit(\n X: XType,\n y: YType,\n estimator: TabPFNRegressor | TabPFNClassifier,\n *,\n max_num_features: int,\n max_num_samples: int,\n ensure_y_numeric: bool = False,\n ignore_pretraining_limits: bool = False\n) -> tuple[ndarray, ndarray, NDArray[Any] | None, int]\n
Validate the input data for fitting.
"},{"location":"reference/tabpfn/model/bar_distribution/","title":"Bar distribution","text":""},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution","title":"bar_distribution","text":""},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution","title":"BarDistribution","text":" Bases: Module
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.average_bar_distributions_into_this","title":"average_bar_distributions_into_this","text":"average_bar_distributions_into_this(\n    list_of_bar_distributions: Sequence[BarDistribution],\n    list_of_logits: Sequence[Tensor],\n    *,\n    average_logits: bool = False\n) -> Tensor\n
Average the given bar distributions into the borders of this distribution. Each distribution in list_of_bar_distributions is described by the corresponding logits in list_of_logits. If average_logits is True, the logits are averaged directly instead of the probabilities.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.cdf","title":"cdf","text":"cdf(logits: Tensor, ys: Tensor) -> Tensor\n
Calculates the cdf of the distribution described by the logits. The cdf is scaled by the width of the bars.
Parameters:
Name Type Description Defaultlogits
Tensor
tensor of shape (batch_size, ..., num_bars) with the logits describing the distribution
requiredys
Tensor
tensor of shape (batch_size, ..., n_ys to eval) or (n_ys to eval) with the targets.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.cdf_temporary","title":"cdf_temporary","text":"cdf_temporary(logits: Tensor) -> Tensor\n
Cumulative distribution function.
TODO: this already exists here; make sure to merge. It is still used at the moment.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.get_probs_for_different_borders","title":"get_probs_for_different_borders","text":"get_probs_for_different_borders(\n logits: Tensor, new_borders: Tensor\n) -> Tensor\n
The logits describe the density of the distribution over the current self.borders.
This function returns the logits if the self.borders were changed to new_borders. This is useful to average the logits of different models.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.icdf","title":"icdf","text":"icdf(logits: Tensor, left_prob: float) -> Tensor\n
Implementation of the quantile function (the inverse CDF).
Parameters:
Name Type Description Defaultlogits
Tensor
Tensor of any shape, with the last dimension being the logits.
requiredleft_prob
float
The probability mass to the left of the result.
requiredReturns:
Type DescriptionTensor
The position with left_prob probability weight to the left.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.mean_of_square","title":"mean_of_square","text":"mean_of_square(logits: Tensor) -> Tensor\n
Computes E[x^2].
Parameters:
Name Type Description Defaultlogits
Tensor
Output of the model.
requiredReturns:
Type DescriptionTensor
mean of square
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.pi","title":"pi","text":"pi(\n logits: Tensor,\n best_f: float | Tensor,\n *,\n maximize: bool = True\n) -> Tensor\n
Acquisition Function: Probability of Improvement.
Parameters:
Name Type Description Defaultlogits
Tensor
as returned by Transformer
requiredbest_f
float | Tensor
best evaluation so far (the incumbent)
requiredmaximize
bool
whether to maximize
True
Returns:
Type DescriptionTensor
probability of improvement
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.plot","title":"plot","text":"plot(\n logits: Tensor,\n ax: Axes | None = None,\n zoom_to_quantile: float | None = None,\n **kwargs: Any\n) -> Axes\n
Plots the distribution.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.BarDistribution.ucb","title":"ucb","text":"ucb(\n logits: Tensor,\n best_f: float,\n rest_prob: float = 1 - 0.682 / 2,\n *,\n maximize: bool = True\n) -> Tensor\n
UCB utility. rest_prob is the amount of utility above (below) the confidence interval that is ignored.
Higher rest_prob is equivalent to lower beta in the standard GP-UCB formulation.
Parameters:
Name Type Description Defaultlogits
Tensor
Logits, as returned by the Transformer.
requiredrest_prob
float
The amount of utility above (below) the confidence interval that is ignored.
The default is equivalent to using GP-UCB with beta=1
. To get the corresponding beta
, where beta
is from the standard GP definition of UCB ucb_utility = mean + beta * std
, you can use this computation:
beta = math.sqrt(2)*torch.erfinv(torch.tensor(2*(1-rest_prob)-1))
1 - 0.682 / 2
best_f
float
Unused
requiredmaximize
bool
Whether to maximize.
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution","title":"FullSupportBarDistribution","text":" Bases: BarDistribution
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.average_bar_distributions_into_this","title":"average_bar_distributions_into_this","text":"average_bar_distributions_into_this(\n    list_of_bar_distributions: Sequence[BarDistribution],\n    list_of_logits: Sequence[Tensor],\n    *,\n    average_logits: bool = False\n) -> Tensor\n
Average the given bar distributions into the borders of this distribution. Each distribution in list_of_bar_distributions is described by the corresponding logits in list_of_logits. If average_logits is True, the logits are averaged directly instead of the probabilities.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.cdf","title":"cdf","text":"cdf(logits: Tensor, ys: Tensor) -> Tensor\n
Calculates the cdf of the distribution described by the logits. The cdf is scaled by the width of the bars.
Parameters:
Name Type Description Defaultlogits
Tensor
tensor of shape (batch_size, ..., num_bars) with the logits describing the distribution
requiredys
Tensor
tensor of shape (batch_size, ..., n_ys to eval) or (n_ys to eval) with the targets.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.cdf_temporary","title":"cdf_temporary","text":"cdf_temporary(logits: Tensor) -> Tensor\n
Cumulative distribution function.
TODO: this already exists here; make sure to merge. It is still used at the moment.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.ei_for_halfnormal","title":"ei_for_halfnormal","text":"ei_for_halfnormal(\n scale: float,\n best_f: Tensor | float,\n *,\n maximize: bool = True\n) -> Tensor\n
EI for a standard normal distribution with mean 0 and variance scale
times 2.
Which is the same as the half normal EI. Tested this with MC approximation:
ei_for_halfnormal = lambda scale, best_f: (torch.distributions.HalfNormal(torch.tensor(scale)).sample((10_000_000,))- best_f ).clamp(min=0.).mean()\nprint([(ei_for_halfnormal(scale,best_f), FullSupportBarDistribution().ei_for_halfnormal(scale,best_f)) for scale in [0.1,1.,10.] for best_f in [.1,10.,4.]])\n
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.forward","title":"forward","text":"forward(\n logits: Tensor,\n y: Tensor,\n mean_prediction_logits: Tensor | None = None,\n) -> Tensor\n
Returns the negative log density (the loss).
Parameters:
Name Type Description Defaultlogits
Tensor
Tensor of shape (T, B, self.num_bars) describing the distribution.
requiredy
Tensor
Tensor of targets, of shape (T, B).
requiredmean_prediction_logits
Tensor | None
Logits for the mean prediction, if used.
None
Returns:
Type DescriptionTensor
The negative log density (the loss).
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.get_probs_for_different_borders","title":"get_probs_for_different_borders","text":"get_probs_for_different_borders(\n logits: Tensor, new_borders: Tensor\n) -> Tensor\n
The logits describe the density of the distribution over the current self.borders.
This function returns the logits if the self.borders were changed to new_borders. This is useful to average the logits of different models.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.icdf","title":"icdf","text":"icdf(logits: Tensor, left_prob: float) -> Tensor\n
Implementation of the quantile function (the inverse CDF).
Parameters:
Name Type Description Defaultlogits
Tensor
Tensor of any shape, with the last dimension being the logits.
requiredleft_prob
float
The probability mass to the left of the result.
requiredReturns:
Type DescriptionTensor
The position with left_prob probability weight to the left.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.mean_of_square","title":"mean_of_square","text":"mean_of_square(logits: Tensor) -> Tensor\n
Computes E[x^2].
Parameters:
Name Type Description Defaultlogits
Tensor
Output of the model.
required"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.pdf","title":"pdf","text":"pdf(logits: Tensor, y: Tensor) -> Tensor\n
Probability density function at y.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.pi","title":"pi","text":"pi(\n logits: Tensor,\n best_f: Tensor | float,\n *,\n maximize: bool = True\n) -> Tensor\n
Acquisition Function: Probability of Improvement.
Parameters:
Name Type Description Defaultlogits
Tensor
as returned by Transformer (evaluation_points x batch x feature_dim)
requiredbest_f
Tensor | float
best evaluation so far (the incumbent)
requiredmaximize
bool
whether to maximize
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.plot","title":"plot","text":"plot(\n logits: Tensor,\n ax: Axes | None = None,\n zoom_to_quantile: float | None = None,\n **kwargs: Any\n) -> Axes\n
Plots the distribution.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.sample","title":"sample","text":"sample(logits: Tensor, t: float = 1.0) -> Tensor\n
Samples values from the distribution.
Temperature t.
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.FullSupportBarDistribution.ucb","title":"ucb","text":"ucb(\n logits: Tensor,\n best_f: float,\n rest_prob: float = 1 - 0.682 / 2,\n *,\n maximize: bool = True\n) -> Tensor\n
UCB utility. rest_prob is the amount of utility above (below) the confidence interval that is ignored.
Higher rest_prob is equivalent to lower beta in the standard GP-UCB formulation.
Parameters:
Name Type Description Defaultlogits
Tensor
Logits, as returned by the Transformer.
requiredrest_prob
float
The amount of utility above (below) the confidence interval that is ignored.
The default is equivalent to using GP-UCB with beta=1
. To get the corresponding beta
, where beta
is from the standard GP definition of UCB ucb_utility = mean + beta * std
, you can use this computation:
beta = math.sqrt(2)*torch.erfinv(torch.tensor(2*(1-rest_prob)-1))
1 - 0.682 / 2
best_f
float
Unused
requiredmaximize
bool
Whether to maximize.
True
"},{"location":"reference/tabpfn/model/bar_distribution/#tabpfn.model.bar_distribution.get_bucket_limits","title":"get_bucket_limits","text":"get_bucket_limits(\n num_outputs: int,\n full_range: tuple | None = None,\n ys: Tensor | None = None,\n *,\n verbose: bool = False,\n widen_bucket_limits_factor: float | None = None\n) -> Tensor\n
Decide on a set of bucket limits based on a distribution of ys.
Parameters:
Name Type Description Defaultnum_outputs
int
This is only tested for num_outputs=1, but should work for larger num_outputs as well.
requiredfull_range
tuple | None
If ys is not passed, this is the range of the ys that should be used to estimate the bucket limits.
None
ys
Tensor | None
If ys is passed, this is the ys that should be used to estimate the bucket limits. Do not pass full_range in this case.
None
verbose
bool
Unused
False
widen_bucket_limits_factor
float | None
If set, the bucket limits are widened by this factor. This allows a slightly larger range than the actual data.
None
"},{"location":"reference/tabpfn/model/config/","title":"Config","text":""},{"location":"reference/tabpfn/model/config/#tabpfn.model.config","title":"config","text":""},{"location":"reference/tabpfn/model/config/#tabpfn.model.config.InferenceConfig","title":"InferenceConfig dataclass
","text":"Configuration for the TabPFN model.
"},{"location":"reference/tabpfn/model/config/#tabpfn.model.config.InferenceConfig.from_dict","title":"from_dictclassmethod
","text":"from_dict(config: dict) -> InferenceConfig\n
Create a Config object from a dictionary.
This method also does some sanity checking initially.
"},{"location":"reference/tabpfn/model/encoders/","title":"Encoders","text":""},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders","title":"encoders","text":""},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.CategoricalInputEncoderPerFeatureEncoderStep","title":"CategoricalInputEncoderPerFeatureEncoderStep","text":" Bases: SeqEncStep
Expects input of size 1.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.CategoricalInputEncoderPerFeatureEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.FrequencyFeatureEncoderStep","title":"FrequencyFeatureEncoderStep","text":" Bases: SeqEncStep
Encoder step to add frequency-based features to the input.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.FrequencyFeatureEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputEncoder","title":"InputEncoder","text":" Bases: Module
Base class for input encoders.
All input encoders should subclass this class and implement the forward
method.
forward(x: Tensor, single_eval_pos: int) -> Tensor\n
Encode the input tensor.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor to encode.
requiredsingle_eval_pos
int
The position to use for single evaluation.
requiredReturns:
Type DescriptionTensor
The encoded tensor.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep","title":"InputNormalizationEncoderStep","text":" Bases: SeqEncStep
Encoder step to normalize the input in different ways.
Can be used to normalize the input to a ranking, remove outliers, or normalize the input to have unit variance.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.InputNormalizationEncoderStep.reset_seed","title":"reset_seed","text":"reset_seed() -> None\n
Reset the random seed.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoder","title":"LinearInputEncoder","text":" Bases: Module
A simple linear input encoder.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoder.forward","title":"forward","text":"forward(*x: Tensor, **kwargs: Any) -> tuple[Tensor]\n
Apply the linear transformation to the input.
Parameters:
Name Type Description Default*x
Tensor
The input tensors to concatenate and transform.
()
**kwargs
Any
Unused keyword arguments.
{}
Returns:
Type Descriptiontuple[Tensor]
A tuple containing the transformed tensor.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoderStep","title":"LinearInputEncoderStep","text":" Bases: SeqEncStep
A simple linear input encoder step.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.LinearInputEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.NanHandlingEncoderStep","title":"NanHandlingEncoderStep","text":" Bases: SeqEncStep
Encoder step to handle NaN and infinite values in the input.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.NanHandlingEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveDuplicateFeaturesEncoderStep","title":"RemoveDuplicateFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to remove duplicate features.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveDuplicateFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveEmptyFeaturesEncoderStep","title":"RemoveEmptyFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to remove empty (constant) features.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.RemoveEmptyFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SeqEncStep","title":"SeqEncStep","text":" Bases: Module
Abstract base class for sequential encoder steps.
SeqEncStep is a wrapper around a module that defines the expected input keys and the produced output keys. The outputs are assigned to the output keys in the order specified by out_keys
.
Subclasses should either implement _forward
or _fit
and _transform
. Subclasses that transform x
should always use _fit
and _transform
, creating any state that depends on the train set in _fit
and using it in _transform
. This allows fitting on data first and doing inference later without refitting. Subclasses that work with y
can alternatively use _forward
instead.
forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SequentialEncoder","title":"SequentialEncoder","text":" Bases: Sequential
, InputEncoder
An encoder that applies a sequence of encoder steps.
SequentialEncoder allows building an encoder from a sequence of EncoderSteps. The input is passed through each step in the provided order.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.SequentialEncoder.forward","title":"forward","text":"forward(input: dict, **kwargs: Any) -> Tensor\n
Apply the sequence of encoder steps to the input.
Parameters:
Name Type Description Defaultinput
dict
The input state dictionary. If the input is not a dict and the first layer expects one input key, the input tensor is mapped to the key expected by the first layer.
required**kwargs
Any
Additional keyword arguments passed to each encoder step.
{}
Returns:
Type DescriptionTensor
The output of the final encoder step.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.VariableNumFeaturesEncoderStep","title":"VariableNumFeaturesEncoderStep","text":" Bases: SeqEncStep
Encoder step to handle variable number of features.
Transforms the input to a fixed number of features by appending zeros. Also normalizes the input by the number of used features to keep the variance of the input constant, even when zeros are appended.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.VariableNumFeaturesEncoderStep.forward","title":"forward","text":"forward(\n state: dict,\n cache_trainset_representation: bool = False,\n **kwargs: Any\n) -> dict\n
Perform the forward pass of the encoder step.
Parameters:
Name Type Description Defaultstate
dict
The input state dictionary containing the input tensors.
requiredcache_trainset_representation
bool
Whether to cache the training set representation. Only supported for _fit and _transform (not _forward).
False
**kwargs
Any
Additional keyword arguments passed to the encoder step.
{}
Returns:
Type Descriptiondict
The updated state dictionary with the output tensors assigned to the output keys.
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.normalize_data","title":"normalize_data","text":"normalize_data(\n data: Tensor,\n *,\n normalize_positions: int = -1,\n return_scaling: bool = False,\n clip: bool = True,\n std_only: bool = False,\n mean: Tensor | None = None,\n std: Tensor | None = None\n) -> Tensor | tuple[Tensor, tuple[Tensor, Tensor]]\n
Normalize data to mean 0 and std 1.
Parameters:
Name Type Description Defaultdata
Tensor
The data to normalize. (T, B, H)
requirednormalize_positions
int
If > 0, only use the first normalize_positions
positions for normalization.
-1
return_scaling
bool
If True, return the scaling parameters as well (mean, std).
False
std_only
bool
If True, only divide by std.
False
clip
bool
If True, clip the data to [-100, 100].
True
mean
Tensor | None
If given, use this value instead of computing it.
None
std
Tensor | None
If given, use this value instead of computing it.
None
"},{"location":"reference/tabpfn/model/encoders/#tabpfn.model.encoders.select_features","title":"select_features","text":"select_features(x: Tensor, sel: Tensor) -> Tensor\n
Select features from the input tensor based on the selection mask.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredsel
Tensor
The boolean selection mask indicating which features to keep.
requiredReturns:
Type DescriptionTensor
The tensor with selected features.
"},{"location":"reference/tabpfn/model/layer/","title":"Layer","text":""},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer","title":"layer","text":""},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.LayerNorm","title":"LayerNorm","text":" Bases: LayerNorm
Custom LayerNorm module that supports saving peak memory factor.
This module extends the PyTorch LayerNorm implementation to handle FP16 inputs efficiently and support saving peak memory factor.
Parameters:
Name Type Description Default*args
Any
Positional arguments passed to the base LayerNorm class.
()
**kwargs
Any
Keyword arguments passed to the base LayerNorm class.
{}
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.LayerNorm.forward","title":"forward","text":"forward(\n input: Tensor,\n *,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None\n) -> Tensor\n
Perform layer normalization on the input tensor.
Parameters:
Name Type Description Defaultinput
Tensor
The input tensor.
requiredallow_inplace
bool
Whether to allow in-place operations. Default is False.
False
save_peak_mem_factor
int | None
The factor to save peak memory. Default is None.
None
Returns:
Type DescriptionTensor
The layer normalized tensor.
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer","title":"PerFeatureEncoderLayer","text":" Bases: Module
Transformer encoder layer that processes each feature block separately.
This layer consists of multi-head attention between features, multi-head attention between items, and feedforward neural networks (MLPs).
It supports various configurations and optimization options.
Parameters:
Name Type Description Defaultd_model
int
The dimensionality of the input and output embeddings.
requirednhead
int
The number of attention heads.
requireddim_feedforward
int | None
The dimensionality of the feedforward network. Default is None (2 * d_model).
None
activation
str
The activation function to use in the MLPs.
'relu'
layer_norm_eps
float
The epsilon value for layer normalization.
1e-05
pre_norm
bool
Whether to apply layer normalization before or after the attention and MLPs.
False
device
device | None
The device to use for the layer parameters.
None
dtype
dtype | None
The data type to use for the layer parameters.
None
recompute_attn
bool
Whether to recompute attention during backpropagation.
False
second_mlp
bool
Whether to include a second MLP in the layer.
False
layer_norm_with_elementwise_affine
bool
Whether to use elementwise affine parameters in layer normalization.
False
zero_init
bool
Whether to initialize the output of the MLPs to zero.
False
save_peak_mem_factor
int | None
The factor to save peak memory, only effective with post-norm.
None
attention_between_features
bool
Whether to apply attention between feature blocks.
True
multiquery_item_attention
bool
Whether to use multiquery attention for items.
False
multiquery_item_attention_for_test_set
bool
Whether to use multiquery attention for the test set.
False
attention_init_gain
float
The gain value for initializing attention parameters.
1.0
d_k
int | None
The dimensionality of the query and key vectors. Default is (d_model // nhead).
None
d_v
int | None
The dimensionality of the value vectors. Default is (d_model // nhead).
None
precomputed_kv
None | Tensor | tuple[Tensor, Tensor]
Precomputed key-value pairs for attention.
None
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer.empty_trainset_representation_cache","title":"empty_trainset_representation_cache","text":"empty_trainset_representation_cache() -> None\n
Empty the trainset representation cache.
"},{"location":"reference/tabpfn/model/layer/#tabpfn.model.layer.PerFeatureEncoderLayer.forward","title":"forward","text":"forward(\n state: Tensor,\n single_eval_pos: int | None = None,\n *,\n cache_trainset_representation: bool = False,\n att_src: Tensor | None = None\n) -> Tensor\n
Pass the input through the encoder layer.
Parameters:
Name Type Description Defaultstate
Tensor
The transformer state passed as input to the layer of shape (batch_size, num_items, num_feature_blocks, d_model).
requiredsingle_eval_pos
int | None
The position from which on everything is treated as test set.
None
cache_trainset_representation
bool
Whether to cache the trainset representation. If single_eval_pos is set (> 0 and not None), create a cache of the trainset KV. This may require a lot of memory. Otherwise, use cached KV representations for inference.
False
att_src
Tensor | None
The tensor to attend to from the final layer of the encoder. It has a shape of (batch_size, num_train_items, num_feature_blocks, d_model). This does not work with multiquery_item_attention_for_test_set and cache_trainset_representation at this point.
None
Returns:
Type DescriptionTensor
The transformer state passed through the encoder layer.
"},{"location":"reference/tabpfn/model/loading/","title":"Loading","text":""},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading","title":"loading","text":""},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading.download_model","title":"download_model","text":"download_model(\n to: Path,\n *,\n version: Literal[\"v2\"],\n which: Literal[\"classifier\", \"regressor\"],\n model_name: str | None = None\n) -> Literal[\"ok\"] | list[Exception]\n
Download a TabPFN model, trying all available sources.
Parameters:
Name Type Description Defaultto
Path
The directory to download the model to.
requiredversion
Literal['v2']
The version of the model to download.
requiredwhich
Literal['classifier', 'regressor']
The type of model to download.
requiredmodel_name
str | None
Optional specific model name to download.
None
Returns:
Type DescriptionLiteral['ok'] | list[Exception]
\"ok\" if the model was downloaded successfully, otherwise a list of
Literal['ok'] | list[Exception]
exceptions that occurred that can be handled as desired.
"},{"location":"reference/tabpfn/model/loading/#tabpfn.model.loading.load_model","title":"load_model","text":"load_model(*, path: Path, model_seed: int) -> tuple[\n PerFeatureTransformer,\n BCEWithLogitsLoss\n | CrossEntropyLoss\n | FullSupportBarDistribution,\n InferenceConfig,\n]\n
Loads a model from a given path.
Parameters:
Name Type Description Defaultpath
Path
Path to the checkpoint
requiredmodel_seed
int
The seed to use for the model
required"},{"location":"reference/tabpfn/model/memory/","title":"Memory","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory","title":"memory","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator","title":"MemoryUsageEstimator","text":""},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.convert_bytes_to_unit","title":"convert_bytes_to_unitclassmethod
","text":"convert_bytes_to_unit(\n value: float, unit: Literal[\"b\", \"mb\", \"gb\"]\n) -> float\n
Convenience method to convert bytes to a different unit.
Parameters:
Name Type Description Defaultvalue
float
The number of bytes.
requiredunit
Literal['b', 'mb', 'gb']
The unit to convert to.
requiredReturns:
Type Descriptionfloat
The number of bytes in the new unit.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.convert_units","title":"convert_unitsclassmethod
","text":"convert_units(\n value: float,\n from_unit: Literal[\"b\", \"mb\", \"gb\"],\n to_unit: Literal[\"b\", \"mb\", \"gb\"],\n) -> float\n
Convert a value from one unit to another.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.estimate_memory_of_one_batch","title":"estimate_memory_of_one_batchclassmethod
","text":"estimate_memory_of_one_batch(\n X: Tensor,\n model: Module,\n *,\n cache_kv: bool,\n dtype_byte_size: int,\n unit: Literal[\"b\", \"mb\", \"gb\"] = \"gb\",\n n_train_samples: int | None = None\n) -> float\n
Estimate the memory usage of a single batch.
The calculation is done based on the assumption that save_peak_mem_factor is not used (since this estimation is used to determine whether to use it).
Parameters:
Name Type Description DefaultX
Tensor
The input tensor.
requiredmodel
Module
The model to estimate the memory usage of.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredunit
Literal['b', 'mb', 'gb']
The unit to convert the memory usage to.
'gb'
n_train_samples
int | None
The number of training samples (only for cache_kv mode)
None
Returns:
Type Descriptionfloat
The estimated memory usage of a single batch.
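A hedged sketch of calling the estimator; the tiny stand-in module and tensor layout below are assumptions for illustration (in practice this is called with a TabPFN transformer and its real input batch):
>>> import torch
>>> from tabpfn.model.memory import MemoryUsageEstimator
>>> model = torch.nn.Linear(16, 16)  # stand-in Module, not a real TabPFN model
>>> X = torch.randn(1, 256, 16)      # assumed (batch, items, features) layout
>>> est_gb = MemoryUsageEstimator.estimate_memory_of_one_batch(
...     X, model, cache_kv=False, dtype_byte_size=2, unit="gb"
... )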
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.estimate_memory_remainder_after_batch","title":"estimate_memory_remainder_after_batchclassmethod
","text":"estimate_memory_remainder_after_batch(\n X: Tensor,\n model: Module,\n *,\n cache_kv: bool,\n device: device,\n dtype_byte_size: int,\n safety_factor: float,\n n_train_samples: int | None = None,\n max_free_mem: float | int | None = None\n) -> float\n
Estimates the amount of free memory that will remain after a single batch is computed, which is used to decide whether peak-memory saving is needed.
Parameters:
Name Type Description DefaultX
Tensor
The input tensor.
requiredmodel
Module
The model to estimate the memory usage of.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddevice
device
The device to use.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredsafety_factor
float
The safety factor to apply.
requiredn_train_samples
int | None
The number of training samples (only for cache_kv mode)
None
max_free_mem
float | int | None
The amount of free memory available.
None
Returns:
Type Descriptionfloat
The amount of free memory available after a batch is computed.
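A hedged call sketch using the documented keyword-only parameters; the stand-in model and safety factor are illustrative assumptions:
>>> import torch
>>> from tabpfn.model.memory import MemoryUsageEstimator
>>> remaining_gb = MemoryUsageEstimator.estimate_memory_remainder_after_batch(
...     torch.randn(1, 256, 16), torch.nn.Linear(16, 16),  # stand-ins for illustration
...     cache_kv=False, device=torch.device("cpu"), dtype_byte_size=2, safety_factor=5.0
... )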
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.get_max_free_memory","title":"get_max_free_memoryclassmethod
","text":"get_max_free_memory(\n device: device,\n *,\n unit: Literal[\"b\", \"mb\", \"gb\"] = \"gb\",\n default_gb_cpu_if_failed_to_calculate: float\n) -> float\n
Estimates how much memory can be used at most, in GB, based on an estimation of the system's free memory.
For CUDA, the free memory of the GPU is used. For CPU, it defaults to 32 GB.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.get_max_free_memory--returns","title":"Returns:","text":"The maximum memory usage in GB.
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.MemoryUsageEstimator.reset_peak_memory_if_required","title":"reset_peak_memory_if_requiredclassmethod
","text":"reset_peak_memory_if_required(\n save_peak_mem: bool | Literal[\"auto\"] | float | int,\n model: Module,\n X: Tensor,\n *,\n cache_kv: bool,\n device: device,\n dtype_byte_size: int,\n safety_factor: float = 5.0,\n n_train_samples: int | None = None\n) -> None\n
Reset the peak memory if required.
Parameters:
Name Type Description Defaultsave_peak_mem
bool | 'auto' | float | int
If bool, specifies whether to save peak memory or not. If "auto", the amount of free memory is estimated and the option is enabled or disabled based on the estimated usage. If float or int, it is treated as the amount of available memory (in GB) explicitly specified by the user; this value is then used to estimate whether or not to save peak memory.
requiredmodel
Module
The model to reset the peak memory of.
requiredX
Tensor
The input tensor.
requiredcache_kv
bool
Whether key and value tensors are cached.
requireddevice
device
The device to use.
requireddtype_byte_size
int
The size of the data type in bytes.
requiredsafety_factor
float
The safety factor to apply.
5.0
n_train_samples
int
The number of training samples (to be used only for cache_kv mode)
None
"},{"location":"reference/tabpfn/model/memory/#tabpfn.model.memory.support_save_peak_mem_factor","title":"support_save_peak_mem_factor","text":"support_save_peak_mem_factor(\n method: MethodType,\n) -> Callable\n
A decorator that can be applied to a method acting on a tensor 'x' whose first dimension is a flat batch dimension (i.e., the operation is trivially parallel over the first dimension).
For additional tensor arguments, it is assumed that the first dimension is again the batch dimension, and that non-tensor arguments can be passed as-is to splits when parallelizing over the batch dimension.
The decorator adds options 'add_input' to add the principal input 'x' to the result of the method and 'allow_inplace'. By setting 'allow_inplace', the caller indicates that 'x' is not used after the call and its buffer can be reused for the output.
Setting 'allow_inplace' does not guarantee that the operation is performed in place; for clarity and simplicity, callers should always use the return value.
Moreover, it adds an optional int parameter 'save_peak_mem_factor' that is only supported in combination with 'allow_inplace' during inference and subdivides the operation into the specified number of chunks to reduce peak memory consumption.
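The chunking idea behind the decorator can be sketched as follows; this is a simplified re-implementation for illustration under stated assumptions, not the library's actual code:
import torch

def chunked_apply(fn, x, *, save_peak_mem_factor=None, add_input=False):
    # Simplified sketch: split the flat batch dimension into chunks, apply fn
    # per chunk, and write results back into x's buffer (the 'allow_inplace'
    # contract), so only one chunk's activations are live at a time.
    if save_peak_mem_factor is None:
        out = fn(x)
        return x + out if add_input else out
    for idx in torch.chunk(torch.arange(x.shape[0]), save_peak_mem_factor):
        out = fn(x[idx])
        x[idx] = x[idx] + out if add_input else out
    return x

Peak memory then scales with the chunk size rather than the full batch, which is why the factor is only supported during inference, where no autograd graph has to be kept.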
"},{"location":"reference/tabpfn/model/mlp/","title":"Mlp","text":""},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp","title":"mlp","text":""},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.Activation","title":"Activation","text":" Bases: Enum
Enum for activation functions.
"},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.MLP","title":"MLP","text":" Bases: Module
Multi-Layer Perceptron (MLP) module.
This module consists of two linear layers with an activation function in between. It supports various configurations such as the hidden size, activation function, initializing the output to zero, and recomputing the forward pass during backpropagation.
Parameters:
Name Type Description Defaultsize
int
The input and output size of the MLP.
requiredhidden_size
int
The size of the hidden layer.
requiredactivation
Activation | str
The activation function to use. Can be either an Activation enum or a string representing the activation name.
requireddevice
device | None
The device to use for the linear layers.
requireddtype
dtype | None
The data type to use for the linear layers.
requiredinitialize_output_to_zero
bool
Whether to initialize the output layer weights to zero. Default is False.
False
recompute
bool
Whether to recompute the forward pass during backpropagation. This can save memory but increase computation time. Default is False.
False
Attributes:
Name Type Descriptionlinear1
Linear
The first linear layer.
linear2
Linear
The second linear layer.
activation
Activation
The activation function to use.
Example
>>> mlp = MLP(size=128, hidden_size=256, activation='gelu', device='cuda')
>>> x = torch.randn(32, 128, device='cuda', dtype=torch.float32)
>>> output = mlp(x)
"},{"location":"reference/tabpfn/model/mlp/#tabpfn.model.mlp.MLP.forward","title":"forward","text":"forward(\n x: Tensor,\n *,\n add_input: bool = False,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None\n) -> Tensor\n
Performs the forward pass of the MLP.
Parameters:
Name Type Description Defaultx
Tensor
The input tensor.
requiredadd_input
bool
Whether to add input to the output. Default is False.
False
allow_inplace
bool
Indicates that 'x' is not used after the call and its buffer can be reused for the output. The operation is not guaranteed to be inplace. Default is False.
False
save_peak_mem_factor
int | None
If provided, enables a memory-saving technique that reduces peak memory usage during the forward pass. This requires 'add_input' and 'allow_inplace' to be True. See the documentation of the decorator 'support_save_peak_mem_factor' for details. Default is None.
None
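A short sketch of the memory-saving call path; per the parameter docs, 'save_peak_mem_factor' requires both 'add_input' and 'allow_inplace' to be True and is meant for inference:
>>> import torch
>>> from tabpfn.model.mlp import MLP
>>> mlp = MLP(size=64, hidden_size=128, activation="gelu", device="cpu", dtype=torch.float32)
>>> x = torch.randn(512, 64)  # flat batch dimension first
>>> with torch.inference_mode():
...     out = mlp(x, add_input=True, allow_inplace=True, save_peak_mem_factor=4)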
"},{"location":"reference/tabpfn/model/multi_head_attention/","title":"Multi head attention","text":""},{"location":"reference/tabpfn/model/multi_head_attention/#tabpfn.model.multi_head_attention","title":"multi_head_attention","text":""},{"location":"reference/tabpfn/model/multi_head_attention/#tabpfn.model.multi_head_attention.MultiHeadAttention","title":"MultiHeadAttention","text":" Bases: Module
forward(\n x: Tensor,\n x_kv: Tensor | None = None,\n *,\n cache_kv: bool = False,\n add_input: bool = False,\n allow_inplace: bool = False,\n save_peak_mem_factor: int | None = None,\n reuse_first_head_kv: bool = False,\n only_cache_first_head_kv: bool = False,\n use_cached_kv: bool = False,\n use_second_set_of_queries: bool = False\n)\n
X is the current hidden state and has a shape of [batch, ..., seq_len, input_size]. If keys and values are present in the cache and 'freeze_kv' is not set, they are obtained from there and 'x_kv' has to be None. Otherwise, if 'x_kv' is not None, keys and values are obtained by applying the respective linear transformations to 'x_kv'. Otherwise, keys and values are obtained by applying the respective linear transformations to 'x' (self-attention).
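A hedged sketch of the cache-then-reuse pattern implied by these flags; 'attn' is assumed to be an already-constructed MultiHeadAttention and the tensors are placeholders (constructor arguments are not documented on this page):
>>> _ = attn(train_x, cache_kv=True)        # self-attention over train items; caches K/V
>>> out = attn(test_x, use_cached_kv=True)  # reuses the cached K/V; x_kv must be None here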
"},{"location":"reference/tabpfn/model/preprocessing/","title":"Preprocessing","text":""},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing","title":"preprocessing","text":""},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.AddFingerprintFeaturesStep","title":"AddFingerprintFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Adds a fingerprint feature to the features, based on a hash of each row.
If is_test = True, it keeps the first hash even if there are collisions. If is_test = False, it handles hash collisions by counting up and rehashing until a unique hash is found.
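The collision handling described above can be sketched as follows; this is a simplified, hypothetical re-implementation for illustration, not the library's actual hashing code:
import hashlib

import numpy as np

def row_fingerprints(X: np.ndarray, *, is_test: bool) -> np.ndarray:
    # Hash each row to a float in [0, 1). At train time (is_test=False), bump a
    # salt and rehash until the value is unique; at test time keep the first hash.
    seen: set = set()
    out = np.empty(len(X), dtype=np.float64)
    for i, row in enumerate(X):
        salt = 0
        while True:
            digest = hashlib.sha256(row.tobytes() + salt.to_bytes(4, "little")).digest()
            value = int.from_bytes(digest[:8], "little") / 2**64
            if is_test or value not in seen:
                break
            salt += 1
        seen.add(value)
        out[i] = value
    return out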
fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.AddFingerprintFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep","title":"FeaturePreprocessingTransformerStep","text":"Base class for feature preprocessing steps.
Its main abstraction is really just to pass categorical feature indices along the pipeline.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.FeaturePreprocessingTransformerStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.KDITransformerWithNaN","title":"KDITransformerWithNaN","text":" Bases: KDITransformer
KDI transformer that can handle NaN values. It performs KDI with NaNs replaced by mean values and then restores the NaNs at their original positions after the transformation.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep","title":"RemoveConstantFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Remove features that are constant in the training data.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.RemoveConstantFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep","title":"ReshapeFeatureDistributionsStep","text":" Bases: FeaturePreprocessingTransformerStep
Reshape the feature distributions using different transformations.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.get_adaptive_preprocessors","title":"get_adaptive_preprocessorsstaticmethod
","text":"get_adaptive_preprocessors(\n num_examples: int = 100, random_state: int | None = None\n) -> dict[str, ColumnTransformer]\n
Returns a dictionary of adaptive column transformers that can be used to preprocess the data. Adaptive column transformers preprocess the data based on the column type; they receive a pandas DataFrame whose column names indicate the column type. Column types are not datatypes, but rather strings that indicate how the data should be preprocessed.
Parameters:
Name Type Description Defaultnum_examples
int
The number of examples in the dataset.
100
random_state
int | None
The random state to use for the transformers.
None
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.get_column_types","title":"get_column_types staticmethod
","text":"get_column_types(X: ndarray) -> list[str]\n
Returns a list of column types for the given data, indicating how the data should be preprocessed.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ReshapeFeatureDistributionsStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SafePowerTransformer","title":"SafePowerTransformer","text":" Bases: PowerTransformer
A PowerTransformer that reverts features back to their original values if they are transformed to very large values or if the output column does not have unit variance. This happens, e.g., when the input data has a large number of outliers.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer","title":"SequentialFeatureTransformer","text":" Bases: UserList
A transformer that applies a sequence of feature preprocessing steps. It is closely related to sklearn's Pipeline, but it is designed to work with categorical_features lists that are always passed on.
Currently this class is only used once, so it could also be made less general if needed.
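A minimal usage sketch, assuming the steps shown take no required constructor arguments and that fit_transform returns the transformed data together with the updated categorical indices:
>>> import numpy as np
>>> from tabpfn.model.preprocessing import (
...     RemoveConstantFeaturesStep,
...     SequentialFeatureTransformer,
...     ShuffleFeaturesStep,
... )
>>> X = np.random.rand(100, 5)
>>> X[:, 0] = 1.0  # constant feature; RemoveConstantFeaturesStep drops it
>>> pipeline = SequentialFeatureTransformer(
...     [RemoveConstantFeaturesStep(), ShuffleFeaturesStep()]
... )
>>> result = pipeline.fit_transform(X, categorical_features=[])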
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fit all the steps in the pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.fit_transform","title":"fit_transform","text":"fit_transform(\n X: ndarray, categorical_features: list[int]\n) -> _TransformResult\n
Fit and transform the data using the fitted pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.SequentialFeatureTransformer.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transform the data using the fitted pipeline.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep","title":"ShuffleFeaturesStep","text":" Bases: FeaturePreprocessingTransformerStep
Shuffle the features in the data.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep.fit","title":"fit","text":"fit(X: ndarray, categorical_features: list[int]) -> Self\n
Fits the preprocessor.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features)
requiredcategorical_features
list[int]
list of indices of categorical features.
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.ShuffleFeaturesStep.transform","title":"transform","text":"transform(X: ndarray) -> _TransformResult\n
Transforms the data.
Parameters:
Name Type Description DefaultX
ndarray
2d array of shape (n_samples, n_features).
required"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.add_safe_standard_to_safe_power_without_standard","title":"add_safe_standard_to_safe_power_without_standard","text":"add_safe_standard_to_safe_power_without_standard(\n input_transformer: TransformerMixin,\n) -> Pipeline\n
In edge cases, PowerTransformer can create inf values and similar. The subsequent standard scaling then crashes. This helper fixes that issue.
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.make_box_cox_safe","title":"make_box_cox_safe","text":"make_box_cox_safe(\n input_transformer: TransformerMixin | Pipeline,\n) -> Pipeline\n
Make Box-Cox safe.
The Box-Cox transformation can only be applied to strictly positive data. By first applying MinMax scaling, we achieve this without loss of generality. Additionally, for test data, we also need clipping.
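A sketch of the construction described above; the exact feature range and the test-time clipping step are assumptions, not the library's actual values:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

# Scale into a strictly positive range first so Box-Cox is applicable;
# the real helper additionally clips test data into the fitted range.
safe_box_cox = Pipeline([
    ("scale_positive", MinMaxScaler(feature_range=(1e-5, 1.0))),  # assumed range
    ("box_cox", PowerTransformer(method="box-cox")),
])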
"},{"location":"reference/tabpfn/model/preprocessing/#tabpfn.model.preprocessing.skew","title":"skew","text":"skew(x: ndarray) -> float\n
"},{"location":"reference/tabpfn/model/transformer/","title":"Transformer","text":""},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer","title":"transformer","text":""},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.LayerStack","title":"LayerStack","text":" Bases: Module
Similar to nn.Sequential, but with support for passing keyword arguments to layers, and it stacks the same layer multiple times.
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer","title":"PerFeatureTransformer","text":" Bases: Module
A Transformer model that processes one token per feature and sample.
This model extends the standard Transformer architecture to operate on a per-feature basis. It allows for processing each feature separately while still leveraging the power of self-attention.
The model consists of an encoder, decoder, and optional components such as a feature positional embedding and a separate decoder for each feature.
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer.forward","title":"forward","text":"forward(*args: Any, **kwargs: Any) -> dict[str, Tensor]\n
Performs a forward pass through the model.
This method supports multiple calling conventions (see the usage sketch after the parameter list):
model((x,y), **kwargs)
model(train_x, train_y, test_x, **kwargs)
model((style,x,y), **kwargs)
Parameters:
Name Type Description Defaulttrain_x
torch.Tensor | None The input data for the training set.
requiredtrain_y
torch.Tensor | None The target data for the training set.
requiredtest_x
torch.Tensor | None The input data for the test set.
requiredx
torch.Tensor The input data.
requiredy
torch.Tensor | None The target data.
requiredstyle
torch.Tensor | None The style vector.
requiredsingle_eval_pos
int The position to evaluate at.
requiredonly_return_standard_out
bool Whether to only return the standard output.
requireddata_dags
Any The data DAGs for each example.
requiredcategorical_inds
list[int] The indices of categorical features.
requiredfreeze_kv
bool Whether to freeze the key and value weights.
requiredReturns:
Type Descriptiondict[str, Tensor]
The output of the model, which can be a tensor or a dictionary of tensors.
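A usage sketch of the calling conventions listed above, assuming 'model' is a constructed PerFeatureTransformer and the tensors are shaped as the model expects; the use of single_eval_pos with the tuple conventions is an assumption based on the parameter list:
>>> out = model(train_x, train_y, test_x)                # model(train_x, train_y, test_x, **kwargs)
>>> out = model((x, y), single_eval_pos=n_train)         # model((x, y), **kwargs)
>>> out = model((style, x, y), single_eval_pos=n_train)  # model((style, x, y), **kwargs)
>>> # 'out' is a dict[str, Tensor]; inspect out.keys() for the available outputs.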
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.PerFeatureTransformer.reset_save_peak_mem_factor","title":"reset_save_peak_mem_factor","text":"reset_save_peak_mem_factor(\n factor: int | None = None,\n) -> None\n
Sets the save_peak_mem_factor for all layers.
This factor controls how much memory is saved during the forward pass in inference mode.
Setting this factor > 1 will cause the model to save more memory during the forward pass in inference mode.
A value of 8 is good for a 4x larger width in the fully-connected layers, and yields a situation where a forward pass needs around 2 * num_features * num_items * emsize * 2 bytes of memory (using mixed precision).
WARNING: It should only be used with post-norm.
Parameters:
Name Type Description Defaultfactor
int | None
The save_peak_mem_factor to set. Recommended value is 8.
None
"},{"location":"reference/tabpfn/model/transformer/#tabpfn.model.transformer.SerializableGenerator","title":"SerializableGenerator","text":" Bases: Generator
A serializable version of the torch.Generator that can be saved and pickled.
"},{"location":"reference/tabpfn_client/browser_auth/","title":"Browser auth","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth","title":"browser_auth","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth.BrowserAuthHandler","title":"BrowserAuthHandler","text":""},{"location":"reference/tabpfn_client/browser_auth/#tabpfn_client.browser_auth.BrowserAuthHandler.try_browser_login","title":"try_browser_login","text":"try_browser_login() -> Tuple[bool, Optional[str]]\n
Attempts to perform a browser-based login. Returns (success: bool, token: Optional[str]).
"},{"location":"reference/tabpfn_client/client/","title":"Client","text":""},{"location":"reference/tabpfn_client/client/#tabpfn_client.client","title":"client","text":""},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager","title":"DatasetUIDCacheManager","text":"Manages a cache of the last 50 uploaded datasets, tracking dataset hashes and their UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.add_dataset_uid","title":"add_dataset_uid","text":"add_dataset_uid(hash: str, dataset_uid: str)\n
Adds a new dataset to the cache, removing the oldest item if the cache exceeds 50 entries. Assumes the dataset is not already in the cache.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.delete_uid","title":"delete_uid","text":"delete_uid(dataset_uid: str) -> Optional[str]\n
Deletes an entry from the cache based on the dataset UID.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.get_dataset_uid","title":"get_dataset_uid","text":"get_dataset_uid(*args)\n
Generates a hash from all received arguments and returns the cached dataset UID if it is in the cache, otherwise None.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.load_cache","title":"load_cache","text":"load_cache()\n
Loads the cache from disk if it exists, otherwise initializes an empty cache.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.DatasetUIDCacheManager.save_cache","title":"save_cache","text":"save_cache()\n
Saves the current cache to disk.
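A hedged usage sketch of the cache round-trip; whether initialization already loads the on-disk cache is not documented here, so load_cache() is called explicitly, and the hash/UID values are placeholders:
>>> from tabpfn_client.client import DatasetUIDCacheManager
>>> cache = DatasetUIDCacheManager()
>>> cache.load_cache()
>>> uid = cache.get_dataset_uid(X, y)  # hashes all arguments; None on cache miss
>>> if uid is None:
...     cache.add_dataset_uid("<dataset-hash>", "<uid-from-server>")  # placeholders
>>> cache.save_cache()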
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.GCPOverloaded","title":"GCPOverloaded","text":" Bases: Exception
Exception raised when the Google Cloud Platform service is overloaded or unavailable.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient","title":"ServiceClient","text":" Bases: Singleton
Singleton class for handling communication with the server. It encapsulates all the API calls to the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_all_datasets","title":"delete_all_datasetsclassmethod
","text":"delete_all_datasets() -> [str]\n
Delete all datasets uploaded by the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_all_datasets--returns","title":"Returns","text":"deleted_dataset_uids : [str] The list of deleted dataset UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset","title":"delete_datasetclassmethod
","text":"delete_dataset(dataset_uid: str) -> list[str]\n
Delete the dataset with the provided UID from the server. Note that deleting a train set will lead to deleting all associated test sets.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset--parameters","title":"Parameters","text":"dataset_uid : str The UID of the dataset to be deleted.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.delete_dataset--returns","title":"Returns","text":"deleted_dataset_uids : [str] The list of deleted dataset UIDs.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.download_all_data","title":"download_all_dataclassmethod
","text":"download_all_data(save_dir: Path) -> Union[Path, None]\n
Download all data uploaded by the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.download_all_data--returns","title":"Returns","text":"save_path : Path | None The path to the downloaded file. Return None if download fails.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit","title":"fitclassmethod
","text":"fit(X, y, config=None) -> str\n
Upload a train set to server and return the train set UID if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit--parameters","title":"Parameters","text":"X : array-like of shape (n_samples, n_features) The training input samples. y : array-like of shape (n_samples,) or (n_samples, n_outputs) The target values. config : dict, optional Configuration for the fit method. Includes tabpfn_systems and paper_version.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.fit--returns","title":"Returns","text":"train_set_uid : str The unique ID of the train set in the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_data_summary","title":"get_data_summaryclassmethod
","text":"get_data_summary() -> dict\n
Get the data summary of the user from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_data_summary--returns","title":"Returns","text":"data_summary : dict The data summary returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_password_policy","title":"get_password_policyclassmethod
","text":"get_password_policy() -> dict\n
Get the password policy from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.get_password_policy--returns","title":"Returns","text":"password_policy : {} The password policy returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.is_auth_token_outdated","title":"is_auth_token_outdatedclassmethod
","text":"is_auth_token_outdated(access_token) -> Union[bool, None]\n
Check if the provided access token is valid and return True if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login","title":"loginclassmethod
","text":"login(email: str, password: str) -> tuple[str, str]\n
Login with the provided credentials and return the access token if successful.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login--parameters","title":"Parameters","text":"email : str password : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.login--returns","title":"Returns","text":"access_token : str | None The access token returned from the server. Return None if login fails. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict","title":"predictclassmethod
","text":"predict(\n train_set_uid: str,\n x_test,\n task: Literal[\"classification\", \"regression\"],\n predict_params: Union[dict, None] = None,\n tabpfn_config: Union[dict, None] = None,\n X_train=None,\n y_train=None,\n) -> dict[str, ndarray]\n
Predict the class labels (or regression targets, depending on task) for the provided data (test set).
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict--parameters","title":"Parameters","text":"train_set_uid : str The unique ID of the train set in the server. x_test : array-like of shape (n_samples, n_features) The test input.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.predict--returns","title":"Returns","text":"y_pred : array-like of shape (n_samples,) The predicted class labels.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register","title":"registerclassmethod
","text":"register(\n email: str,\n password: str,\n password_confirm: str,\n validation_link: str,\n additional_info: dict,\n)\n
Register a new user with the provided credentials.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register--parameters","title":"Parameters","text":"email : str password : str password_confirm : str validation_link: str additional_info : dict
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.register--returns","title":"Returns","text":"is_created : bool True if the user is created successfully. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.retrieve_greeting_messages","title":"retrieve_greeting_messagesclassmethod
","text":"retrieve_greeting_messages() -> list[str]\n
Retrieve greeting messages that are new for the user.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.send_reset_password_email","title":"send_reset_password_emailclassmethod
","text":"send_reset_password_email(email: str) -> tuple[bool, str]\n
Let the server send an email for resetting the password.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.send_verification_email","title":"send_verification_emailclassmethod
","text":"send_verification_email(\n access_token: str,\n) -> tuple[bool, str]\n
Let the server send an email for verifying the email.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.try_browser_login","title":"try_browser_loginclassmethod
","text":"try_browser_login() -> tuple[bool, str]\n
Attempts a browser-based login flow. Returns (success: bool, message: str).
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.try_connection","title":"try_connectionclassmethod
","text":"try_connection() -> bool\n
Check if server is reachable and accepts the connection.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email","title":"validate_emailclassmethod
","text":"validate_email(email: str) -> tuple[bool, str]\n
Send the entered email to the server, which checks whether it is valid and not already in use.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email--parameters","title":"Parameters","text":"email : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.validate_email--returns","title":"Returns","text":"is_valid : bool True if the email is valid. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email","title":"verify_emailclassmethod
","text":"verify_email(\n token: str, access_token: str\n) -> tuple[bool, str]\n
Verify the email with the provided token.
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email--parameters","title":"Parameters","text":"token : str access_token : str
"},{"location":"reference/tabpfn_client/client/#tabpfn_client.client.ServiceClient.verify_email--returns","title":"Returns","text":"is_verified : bool True if the email is verified successfully. message : str The message returned from the server.
"},{"location":"reference/tabpfn_client/config/","title":"Config","text":""},{"location":"reference/tabpfn_client/config/#tabpfn_client.config","title":"config","text":""},{"location":"reference/tabpfn_client/config/#tabpfn_client.config.Config","title":"Config","text":""},{"location":"reference/tabpfn_client/constants/","title":"Constants","text":""},{"location":"reference/tabpfn_client/constants/#tabpfn_client.constants","title":"constants","text":""},{"location":"reference/tabpfn_client/estimator/","title":"Estimator","text":""},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator","title":"estimator","text":""},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNClassifier","title":"TabPFNClassifier","text":" Bases: BaseEstimator
, ClassifierMixin
, TabPFNModelSelection
predict(X)\n
Predict class labels for samples in X.
Parameters:
Name Type Description DefaultX
The input samples.
requiredReturns:
Type DescriptionThe predicted class labels.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict class probabilities for X.
Parameters:
Name Type Description DefaultX
The input samples.
requiredReturns:
Type DescriptionThe class probabilities of the input samples.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNModelSelection","title":"TabPFNModelSelection","text":"Base class for TabPFN model selection and path handling.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor","title":"TabPFNRegressor","text":" Bases: BaseEstimator
, RegressorMixin
, TabPFNModelSelection
predict(\n X: ndarray,\n output_type: Literal[\n \"mean\",\n \"median\",\n \"mode\",\n \"quantiles\",\n \"full\",\n \"main\",\n ] = \"mean\",\n quantiles: Optional[list[float]] = None,\n) -> Union[ndarray, list[ndarray], dict[str, ndarray]]\n
Predict regression target for X.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor.predict--parameters","title":"Parameters","text":"X : array-like of shape (n_samples, n_features) The input samples. output_type : str, default=\"mean\" The type of prediction to return: - \"mean\": Return mean prediction - \"median\": Return median prediction - \"mode\": Return mode prediction - \"quantiles\": Return predictions for specified quantiles - \"full\": Return full prediction details - \"main\": Return main prediction metrics quantiles : list[float] or None, default=None Quantiles to compute when output_type=\"quantiles\". Default is [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.TabPFNRegressor.predict--returns","title":"Returns","text":"array-like or dict The predicted values.
"},{"location":"reference/tabpfn_client/estimator/#tabpfn_client.estimator.validate_data_size","title":"validate_data_size","text":"validate_data_size(\n X: ndarray, y: Union[ndarray, None] = None\n)\n
Check the integrity of the training data.
- check if the number of rows between X and y is consistent if y is not None (ValueError)
- check if the number of rows is less than MAX_ROWS (ValueError)
- check if the number of columns is less than MAX_COLS (ValueError)
"},{"location":"reference/tabpfn_client/prompt_agent/","title":"Prompt agent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent","title":"prompt_agent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent.PromptAgent","title":"PromptAgent","text":""},{"location":"reference/tabpfn_client/prompt_agent/#tabpfn_client.prompt_agent.PromptAgent.password_req_to_policy","title":"password_req_to_policystaticmethod
","text":"password_req_to_policy(password_req: list[str])\n
Small function that receives password requirements as a list of strings like \"Length(8)\" and returns a corresponding PasswordPolicy object.
"},{"location":"reference/tabpfn_client/service_wrapper/","title":"Service wrapper","text":""},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper","title":"service_wrapper","text":""},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.InferenceClient","title":"InferenceClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle inference, including: - fitting - prediction
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserAuthenticationClient","title":"UserAuthenticationClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle user authentication, including: - user registration and login - access token caching
This is implemented as a singleton class with classmethods.
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserAuthenticationClient.try_browser_login","title":"try_browser_loginclassmethod
","text":"try_browser_login() -> tuple[bool, str]\n
Try to authenticate using browser-based login
"},{"location":"reference/tabpfn_client/service_wrapper/#tabpfn_client.service_wrapper.UserDataClient","title":"UserDataClient","text":" Bases: ServiceClientWrapper
, Singleton
Wrapper of ServiceClient to handle user data, including: - query, or delete user account data - query, download, or delete uploaded data
"},{"location":"reference/tabpfn_client/tabpfn_common_utils/regression_pred_result/","title":"Regression pred result","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/regression_pred_result/#tabpfn_client.tabpfn_common_utils.regression_pred_result","title":"regression_pred_result","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_client/tabpfn_common_utils/utils/#tabpfn_client.tabpfn_common_utils.utils","title":"utils","text":""},{"location":"reference/tabpfn_extensions/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils","title":"utils","text":""},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils.get_tabpfn_models","title":"get_tabpfn_models","text":"get_tabpfn_models() -> Tuple[Type, Type, Type]\n
Get TabPFN models with fallback between local and client versions.
"},{"location":"reference/tabpfn_extensions/utils/#tabpfn_extensions.utils.is_tabpfn","title":"is_tabpfn","text":"is_tabpfn(estimator: Any) -> bool\n
Check if an estimator is a TabPFN model.
"},{"location":"reference/tabpfn_extensions/utils_todo/","title":"Utils todo","text":""},{"location":"reference/tabpfn_extensions/utils_todo/#tabpfn_extensions.utils_todo","title":"utils_todo","text":""},{"location":"reference/tabpfn_extensions/utils_todo/#tabpfn_extensions.utils_todo.infer_categorical_features","title":"infer_categorical_features","text":"infer_categorical_features(\n X: ndarray, categorical_features\n) -> List[int]\n
Infer the categorical features from the input data. We take self.categorical_features
as the initial list of categorical features.
Parameters:
Name Type Description DefaultX
ndarray
The input data.
requiredReturns:
Type DescriptionList[int]
List[int]: The indices of the categorical features.
"},{"location":"reference/tabpfn_extensions/benchmarking/experiment/","title":"Experiment","text":""},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment","title":"experiment","text":""},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment.Experiment","title":"Experiment","text":"Base class for experiments. Experiments should be reproducible, i.e. the settings should give all the information needed to run the experiment. Experiments should be deterministic, i.e. the same settings should always give the same results.
"},{"location":"reference/tabpfn_extensions/benchmarking/experiment/#tabpfn_extensions.benchmarking.experiment.Experiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
Runs the experiment.
Should set self.results
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/","title":"Classifier as regressor","text":""},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor","title":"classifier_as_regressor","text":""},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor","title":"ClassifierAsRegressor","text":" Bases: RegressorMixin
Wrapper class to use a classifier as a regressor.
This class takes a classifier estimator and converts it into a regressor by encoding the target labels and treating the regression problem as a classification task.
Parameters:
Name Type Description Defaultestimator
object Classifier estimator to be used as a regressor.
requiredAttributes:
Name Type Descriptionlabel_encoder_
LabelEncoder Label encoder used to transform target regression labels to classes.
y_train_
array-like of shape (n_samples,) Transformed target labels used for training.
categorical_features
list List of categorical feature indices.
Example>>> from sklearn.datasets import load_diabetes\n>>> from sklearn.model_selection import train_test_split\n>>> from tabpfn_extensions import ManyClassClassifier, TabPFNClassifier, ClassifierAsRegressor\n>>> x, y = load_diabetes(return_X_y=True)\n>>> x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)\n>>> clf = TabPFNClassifier()\n>>> clf = ManyClassClassifier(clf, n_estimators=10, alphabet_size=clf.max_num_classes_)\n>>> reg = ClassifierAsRegressor(clf)\n>>> reg.fit(x_train, y_train)\n>>> y_pred = reg.predict(x_test)\n
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.fit","title":"fit","text":"fit(X, y)\n
Fit the classifier as a regressor.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Training data.
requiredy
array-like of shape (n_samples,) Target labels.
requiredReturns:
Name Type Descriptionself
object Fitted estimator.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.get_optimization_mode","title":"get_optimization_mode","text":"get_optimization_mode()\n
Get the optimization mode for the regressor.
Returns:
Type Descriptionstr Optimization mode (\"mean\").
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.predict","title":"predict","text":"predict(X)\n
Predict the target values for the input data.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Input data.
requiredReturns:
Name Type Descriptiony_pred
array-like of shape (n_samples,) Predicted target values.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.predict_full","title":"predict_full","text":"predict_full(X)\n
Predict the full set of output values for the input data.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Input data.
requiredReturns:
Type Descriptiondict Dictionary containing the predicted output values, including: - \"mean\": Predicted mean values. - \"median\": Predicted median values. - \"mode\": Predicted mode values. - \"logits\": Predicted logits. - \"buckets\": Predicted bucket probabilities. - \"quantile_{q:.2f}\": Predicted quantile values for each quantile q.
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.probabilities_to_logits_multiclass","title":"probabilities_to_logits_multiclassstaticmethod
","text":"probabilities_to_logits_multiclass(\n probabilities, eps=1e-06\n)\n
Convert probabilities to logits for a multi-class problem.
Parameters:
Name Type Description Defaultprobabilities
array-like of shape (n_samples, n_classes) Input probabilities for each class.
requiredeps
float, default=1e-6 Small value to avoid division by zero or taking logarithm of zero.
1e-06
Returns:
Name Type Descriptionlogits
array-like of shape (n_samples, n_classes) Output logits for each class.
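A minimal sketch of a standard probability-to-logit conversion under the stated eps clipping; logits are only defined up to an additive constant per sample, and the library's exact normalization may differ:
import numpy as np

def probabilities_to_logits(probabilities: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Clip to avoid log(0), take logs, and center per sample (assumed convention).
    p = np.clip(probabilities, eps, 1.0)
    logits = np.log(p)
    return logits - logits.mean(axis=1, keepdims=True)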
"},{"location":"reference/tabpfn_extensions/classifier_as_regressor/classifier_as_regressor/#tabpfn_extensions.classifier_as_regressor.classifier_as_regressor.ClassifierAsRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Set the categorical feature indices.
Parameters:
Name Type Description Defaultcategorical_features
list List of categorical feature indices.
required"},{"location":"reference/tabpfn_extensions/hpo/search_space/","title":"Search space","text":""},{"location":"reference/tabpfn_extensions/hpo/search_space/#tabpfn_extensions.hpo.search_space","title":"search_space","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/","title":"Tuned tabpfn","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn","title":"tuned_tabpfn","text":""},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNBase","title":"TunedTabPFNBase","text":" Bases: BaseEstimator
Base class for tuned TabPFN models with proper categorical handling.
"},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNClassifier","title":"TunedTabPFNClassifier","text":" Bases: TunedTabPFNBase
, ClassifierMixin
TabPFN Classifier with hyperparameter tuning and proper categorical handling.
"},{"location":"reference/tabpfn_extensions/hpo/tuned_tabpfn/#tabpfn_extensions.hpo.tuned_tabpfn.TunedTabPFNRegressor","title":"TunedTabPFNRegressor","text":" Bases: TunedTabPFNBase
, RegressorMixin
TabPFN Regressor with hyperparameter tuning and proper categorical handling.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/","title":"Experiments","text":""},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments","title":"experiments","text":""},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionExperiment","title":"FeatureSelectionExperiment","text":" Bases: Experiment
This class is used to run feature selection experiments.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
:param tabpfn: the TabPFN estimator to run the experiment with.
:param kwargs: indices: list of indices from X features to use.
:return:
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionInPredictExperiment","title":"FeatureSelectionInPredictExperiment","text":" Bases: Experiment
This class is used to run feature selection experiments within predict.
"},{"location":"reference/tabpfn_extensions/interpretability/experiments/#tabpfn_extensions.interpretability.experiments.FeatureSelectionInPredictExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
:param tabpfn: the TabPFN estimator to run the experiment with.
:param kwargs: indices: list of indices from X features to use.
:return:
"},{"location":"reference/tabpfn_extensions/interpretability/feature_selection/","title":"Feature selection","text":""},{"location":"reference/tabpfn_extensions/interpretability/feature_selection/#tabpfn_extensions.interpretability.feature_selection","title":"feature_selection","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/","title":"Shap","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap","title":"shap","text":""},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap.get_shap_values","title":"get_shap_values","text":"get_shap_values(\n estimator, test_x, attribute_names=None, **kwargs\n) -> ndarray\n
Computes SHAP (SHapley Additive exPlanations) values for the model's predictions on the given input features.
Parameters:
Name Type Description Defaulttest_x
Union[DataFrame, ndarray]
The input features to compute SHAP values for.
requiredkwargs
dict
Additional keyword arguments to pass to the SHAP explainer.
{}
Returns:
Type Descriptionndarray
np.ndarray: The computed SHAP values.
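A usage sketch combining get_shap_values with plot_shap (documented below); the data variables and feature_names are assumed to exist:
>>> from tabpfn_extensions import TabPFNClassifier
>>> from tabpfn_extensions.interpretability.shap import get_shap_values, plot_shap
>>> clf = TabPFNClassifier()
>>> clf.fit(X_train, y_train)
>>> shap_values = get_shap_values(clf, X_test, attribute_names=feature_names)
>>> plot_shap(shap_values)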
"},{"location":"reference/tabpfn_extensions/interpretability/shap/#tabpfn_extensions.interpretability.shap.plot_shap","title":"plot_shap","text":"plot_shap(shap_values: ndarray)\n
Plots the shap values for the given test data. It plots aggregated shap values for each feature, as well as per-sample shap values. Additionally, if multiple samples are provided, it plots the 3 most important interactions with the most important feature.
Parameters:
Name Type Description Defaultshap_values
ndarray
required"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/","title":"Many class classifier","text":""},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier","title":"many_class_classifier","text":""},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier","title":"ManyClassClassifier","text":" Bases: OutputCodeClassifier
Output-Code multiclass strategy with deciary codebook.
This class extends the original OutputCodeClassifier to support n-ary codebooks (with n = alphabet_size), allowing it to handle more classes.
Parameters:
Name Type Description Defaultestimator
estimator object An estimator object implementing :term:fit
and one of :term:decision_function
or :term:predict_proba
. The base classifier should be able to handle up to alphabet_size
classes.
random_state
int, RandomState instance, default=None The generator used to initialize the codebook. Pass an int for reproducible output across multiple function calls. See :term:Glossary <random_state>
.
None
Attributes:
Name Type Descriptionestimators_
list of int(n_classes * code_size)
estimators Estimators used for predictions.
classes_
ndarray of shape (n_classes,) Array containing labels.
code_book_
ndarray of shape (n_classes, len(estimators_)
) Deciary array containing the code of each class.
>>> from sklearn.datasets import load_iris\n>>> from tabpfn.scripts.estimator import ManyClassClassifier, TabPFNClassifier\n>>> from sklearn.model_selection import train_test_split\n>>> x, y = load_iris(return_X_y=True)\n>>> x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)\n>>> clf = TabPFNClassifier()\n>>> clf = ManyClassClassifier(clf, alphabet_size=clf.max_num_classes_)\n>>> clf.fit(x_train, y_train)\n>>> clf.predict(x_test)\n
"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier.fit","title":"fit","text":"fit(X, y, **fit_params)\n
Fit underlying estimators.
Parameters:
Name Type Description DefaultX
{array-like, sparse matrix} of shape (n_samples, n_features) Data.
requiredy
array-like of shape (n_samples,) Multi-class targets.
required**fit_params
dict Parameters passed to the estimator.fit
method of each sub-estimator.
{}
Returns:
Name Type Descriptionself
object Returns a fitted instance of self.
"},{"location":"reference/tabpfn_extensions/many_class/many_class_classifier/#tabpfn_extensions.many_class.many_class_classifier.ManyClassClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict probabilities using the underlying estimators.
Parameters:
Name Type Description DefaultX
{array-like, sparse matrix} of shape (n_samples, n_features) Data.
requiredReturns:
Name Type Descriptionp
ndarray of shape (n_samples, n_classes) Returns the probability of the samples for each class in the model, where classes are ordered as they are in self.classes_
.
Bases: ABC
, BaseEstimator
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
Returns:
either only OOF predictions, or OOF predictions and the loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/abstract_validation_utils/#tabpfn_extensions.post_hoc_ensembles.abstract_validation_utils.AbstractValidationUtils.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/","title":"Greedy weighted ensemble","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble","title":"greedy_weighted_ensemble","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsemble","title":"GreedyWeightedEnsemble","text":" Bases: AbstractValidationUtils
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
Returns:
either only OOF predictions, or OOF predictions and the loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsemble.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleClassifier","title":"GreedyWeightedEnsembleClassifier","text":" Bases: GreedyWeightedEnsemble
, AbstractValidationUtilsClassification
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
either only OOF predictions or OOF predictions and loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleClassifier.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleRegressor","title":"GreedyWeightedEnsembleRegressor","text":" Bases: GreedyWeightedEnsemble
, AbstractValidationUtilsRegression
get_oof_per_estimator(\n X: ndarray,\n y: ndarray,\n *,\n return_loss_per_estimator: bool = False,\n impute_dropped_instances: bool = True,\n _extra_processing: bool = False\n) -> list[ndarray] | tuple[list[ndarray], list[float]]\n
Get OOF predictions for each base model.
Parameters:
Name Type Description DefaultX
ndarray
training data (features)
requiredy
ndarray
training labels
requiredreturn_loss_per_estimator
bool
if True, also return the loss per estimator.
False
impute_dropped_instances
bool
if True, impute instances that were dropped during the splits (e.g., due to not enough instances per class).
True
_extra_processing
bool
False
either only OOF predictions or OOF predictions and loss per estimator.
Type Descriptionlist[ndarray] | tuple[list[ndarray], list[float]]
If self.is_holdout is True, the OOF predictions can return NaN values for instances not covered during repeated holdout.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.GreedyWeightedEnsembleRegressor.not_enough_time","title":"not_enough_time","text":"not_enough_time(current_repeat: int) -> bool\n
Simple heuristic to stop cross-validation early if not enough time is left for another repeat.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/greedy_weighted_ensemble/#tabpfn_extensions.post_hoc_ensembles.greedy_weighted_ensemble.caruana_weighted","title":"caruana_weighted","text":"caruana_weighted(\n predictions: list[ndarray],\n labels: ndarray,\n seed,\n n_iterations,\n loss_function,\n)\n
Caruana's ensemble selection with replacement.
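A minimal sketch of the technique (greedy forward selection with replacement, minimizing a loss over the running average of OOF predictions; illustrative rather than the exact implementation behind this function):
import numpy as np\n\ndef caruana_weights_sketch(predictions, labels, n_iterations, loss_function):\n    # predictions: list of OOF prediction arrays, one per base model.\n    ensemble_sum = np.zeros_like(predictions[0])\n    counts = np.zeros(len(predictions), dtype=int)\n    for i in range(n_iterations):\n        # Try adding each model (with replacement) and keep the best.\n        losses = [loss_function(labels, (ensemble_sum + p) / (i + 1)) for p in predictions]\n        best = int(np.argmin(losses))\n        ensemble_sum += predictions[best]\n        counts[best] += 1\n    return counts / counts.sum()  # selection frequencies become the ensemble weights\n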
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/","title":"Pfn phe","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe","title":"pfn_phe","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor","title":"AutoPostHocEnsemblePredictor","text":" Bases: BaseEstimator
A wrapper for effectively performing post hoc ensembling with TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.fit","title":"fit","text":"fit(\n X: ndarray,\n y: ndarray,\n categorical_feature_indices: list[int] | None = None,\n) -> AutoPostHocEnsemblePredictor\n
Fits the post hoc ensemble on the given data.
Parameters:
Name Type Description DefaultX
ndarray
The input data to fit the ensemble on.
requiredy
ndarray
The target values to fit the ensemble on.
requiredcategorical_feature_indices
list[int] | None
The indices of the categorical features in the data. If None, no categorical features are assumed to be present.
None
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.predict","title":"predict","text":"predict(X: ndarray) -> ndarray\n
Predicts the target values for the given data.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/pfn_phe/#tabpfn_extensions.post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor.predict_proba","title":"predict_proba","text":"predict_proba(X: ndarray) -> ndarray\n
Predicts the class probabilities for the given data.
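A minimal usage sketch, assuming the default constructor arguments are acceptable (check the class options before relying on this):
from sklearn.datasets import load_breast_cancer\nfrom tabpfn_extensions.post_hoc_ensembles.pfn_phe import AutoPostHocEnsemblePredictor\n\nX, y = load_breast_cancer(return_X_y=True)\n\npredictor = AutoPostHocEnsemblePredictor()  # assumption: defaults suffice here\npredictor.fit(X, y)\nproba = predictor.predict_proba(X)  # class probabilities, shape (n_samples, n_classes)\nlabels = predictor.predict(X)\n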
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/","title":"Save splitting","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting","title":"save_splitting","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.assert_valid_splits","title":"assert_valid_splits","text":"assert_valid_splits(\n splits: list[list[list[int], list[int]]],\n y: ndarray,\n *,\n non_empty: bool = True,\n each_selected_class_in_each_split_subset: bool = True,\n same_length_training_splits: bool = True\n)\n
Verify that the splits are valid.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.fix_split_by_dropping_classes","title":"fix_split_by_dropping_classes","text":"fix_split_by_dropping_classes(\n x: ndarray,\n y: ndarray,\n n_splits: int,\n spliter_kwargs: dict,\n) -> list[list[list[int], list[int]]]\n
Fixes stratified splits for an edge case.
For each class that has fewer instances than the number of splits, we oversample before splitting into n_splits and then remove all oversampled and original samples from the splits; this effectively removes the class from the data without touching the indices.
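To see the end effect of this fix, here is an illustrative sketch that simply identifies the rare classes whose indices end up excluded from every split (the library's actual approach is the oversample-then-remove trick described above):
import numpy as np\n\ndef find_rare_classes_sketch(y, n_splits):\n    # Classes with fewer samples than n_splits cannot appear in every\n    # stratified fold; these are the classes the fix effectively removes.\n    classes, counts = np.unique(y, return_counts=True)\n    return classes[counts < n_splits]\n\nprint(find_rare_classes_sketch(np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]), n_splits=3))  # [4]\n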
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/save_splitting/#tabpfn_extensions.post_hoc_ensembles.save_splitting.get_cv_split_for_data","title":"get_cv_split_for_data","text":"get_cv_split_for_data(\n x: ndarray,\n y: ndarray,\n splits_seed: int,\n n_splits: int,\n *,\n stratified_split: bool,\n safety_shuffle: bool = True,\n auto_fix_stratified_splits: bool = False,\n force_same_length_training_splits: bool = False\n) -> list[list[list[int], list[int]]] | str\n
Safety shuffle and generate (safe) splits.
If it returns a str as the first entry, no valid split could be generated and the str is the reason why. Due to the safety shuffle, the original x and y are also returned and must be used.
Note: the function does not support repeated splits at this point. Simply call this function multiple times with different seeds to get repeated splits.
Test with:
import numpy as np\n\nif __name__ == \"__main__\":\n    print(\n        get_cv_split_for_data(\n            x=np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]).T,\n            y=np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4]),\n            splits_seed=42,\n            n_splits=3,\n            stratified_split=True,\n            auto_fix_stratified_splits=True,\n        )\n    )\n
Parameters:
Name Type Description Defaultx
ndarray
The data to split.
requiredy
ndarray
The labels to split.
requiredsplits_seed
int
The seed to use for the splits, or a RandomState object.
requiredn_splits
int
The number of splits to generate.
requiredstratified_split
bool
Whether to use stratified splits.
requiredsafety_shuffle
bool
Whether to shuffle the data before splitting.
True
auto_fix_stratified_splits
bool
Whether to try to fix stratified splits automatically, by dropping classes with fewer than n_splits samples.
False
force_same_length_training_splits
bool
Whether to force the training splits to have the same number of samples, by duplicating a random instance in the training subset of any split that is too small until all training splits have the same number of samples.
False
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/","title":"Sklearn interface","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface","title":"sklearn_interface","text":""},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNClassifier","title":"AutoTabPFNClassifier","text":" Bases: ClassifierMixin
, BaseEstimator
Automatic Post Hoc Ensemble Classifier for TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNClassifier--parameters","title":"Parameters","text":"max_time : int | None, default=None\n The maximum time to spend on fitting the post hoc ensemble.\npreset: {\"default\", \"custom_hps\", \"avoid_overfitting\"}, default=\"default\"\n The preset to use for the post hoc ensemble.\nges_scoring_string : str, default=\"roc\"\n The scoring string to use for the greedy ensemble search.\n Allowed values are: {\"accuracy\", \"roc\" / \"auroc\", \"f1\", \"log_loss\"}.\ndevice : {\"cpu\", \"cuda\"}, default=\"cuda\"\n The device to use for training and prediction.\nrandom_state : int, RandomState instance or None, default=None\n Controls both the randomness base models and the post hoc ensembling method.\ncategorical_feature_indices: list[int] or None, default=None\n The indices of the categorical features in the input data. Can also be passed to `fit()`.\nphe_init_args : dict | None, default=None\n The initialization arguments for the post hoc ensemble predictor.\n See post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more options and all details.\n
predictor_ : AutoPostHocEnsemblePredictor\n The predictor interface used to make predictions, see post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more.\nphe_init_args_ : dict\n The optional initialization arguments used for the post hoc ensemble predictor.\n
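A short usage sketch of the documented options (for a full runnable example see the classification tutorial):
from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier\n\n# Optimize the ensemble for log loss instead of the default ROC AUC,\n# and cap the hyperparameter search at 60 seconds.\nclf = AutoTabPFNClassifier(max_time=60, ges_scoring_string='log_loss', device='cpu')\n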
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNRegressor","title":"AutoTabPFNRegressor","text":" Bases: RegressorMixin
, BaseEstimator
Automatic Post Hoc Ensemble Regressor for TabPFN models.
"},{"location":"reference/tabpfn_extensions/post_hoc_ensembles/sklearn_interface/#tabpfn_extensions.post_hoc_ensembles.sklearn_interface.AutoTabPFNRegressor--parameters","title":"Parameters","text":"max_time : int | None, default=None\n The maximum time to spend on fitting the post hoc ensemble.\npreset: {\"default\", \"custom_hps\", \"avoid_overfitting\"}, default=\"default\"\n The preset to use for the post hoc ensemble.\nges_scoring_string : str, default=\"mse\"\n The scoring string to use for the greedy ensemble search.\n Allowed values are: {\"rmse\", \"mse\", \"mae\"}.\ndevice : {\"cpu\", \"cuda\"}, default=\"cuda\"\n The device to use for training and prediction.\nrandom_state : int, RandomState instance or None, default=None\n Controls both the randomness base models and the post hoc ensembling method.\ncategorical_feature_indices: list[int] or None, default=None\n The indices of the categorical features in the input data. Can also be passed to `fit()`.\nphe_init_args : dict | None, default=None\n The initialization arguments for the post hoc ensemble predictor.\n See post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more options and all details.\n
predictor_ : AutoPostHocEnsemblePredictor\n The predictor interface used to make predictions, see post_hoc_ensembles.pfn_phe.AutoPostHocEnsemblePredictor for more.\nphe_init_args_ : dict\n The optional initialization arguments used for the post hoc ensemble predictor.\n
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/","title":"SklearnBasedDecisionTreeTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN","title":"SklearnBasedDecisionTreeTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase","title":"DecisionTreeTabPFNBase","text":" Bases: BaseDecisionTree
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply the tree for different kinds of tree types. TODO: This function could also be overridden in each type of tree.
:param bootstrap_X: :return: Array of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for the adaptive tree. - If not None: prunes nodes based on the performance on the holdout data y - If None: predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNBase.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier","title":"DecisionTreeTabPFNClassifier","text":" Bases: ClassifierMixin
, DecisionTreeTabPFNBase
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply the tree for different kinds of tree types. TODO: This function could also be overridden in each type of tree.
:param bootstrap_X: :return: Array of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict","title":"predict","text":"predict(X, check_input=True)\n
Predicts X_test :param X: Data that should be evaluated :param check_input: :return: Labels of the predictions
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for the adaptive tree. - If not None: prunes nodes based on the performance on the holdout data y - If None: predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X, check_input=True)\n
Predicts X_test :param X: Data that should be evaluated :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNClassifier.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor","title":"DecisionTreeTabPFNRegressor","text":" Bases: RegressorMixin
, DecisionTreeTabPFNBase
Class that implements a DT-TabPFN model based on the sklearn package.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.apply_tree","title":"apply_tree","text":"apply_tree(X)\n
Apply the tree for different kinds of tree types. TODO: This function could also be overridden in each type of tree.
:param bootstrap_X: :return: Array of shape (N_samples, N_nodes, N_estimators)
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict","title":"predict","text":"predict(X, check_input=True)\n
Predicts X_test :param X: Data that should be evaluated :param check_input: :return: Predicted values
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict_","title":"predict_","text":"predict_(X, y=None, check_input=True)\n
Predicts X :param X: Data that should be evaluated :param y: True labels of holdout data used for the adaptive tree. - If not None: prunes nodes based on the performance on the holdout data y - If None: predicts the data based on the previous holdout performances :param check_input: :return: Probabilities of each class
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.predict_full","title":"predict_full","text":"predict_full(X)\n
Predicts X :param X: Data that should be evaluated :return: Predicted values
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedDecisionTreeTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedDecisionTreeTabPFN.DecisionTreeTabPFNRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/","title":"SklearnBasedRandomForestTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN","title":"SklearnBasedRandomForestTabPFN","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase","title":"RandomForestTabPFNBase","text":"Base Class for common functionalities.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNBase.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier","title":"RandomForestTabPFNClassifier","text":" Bases: RandomForestTabPFNBase
, RandomForestClassifier
RandomForestTabPFNClassifier.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict","title":"predict","text":"predict(X)\n
Predict class for X.
The predicted class of an input sample is a vote by the trees in the forest, weighted by their probability estimates. That is, the predicted class is the one with highest mean probability estimate across the trees.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
y : ndarray of shape (n_samples,) or (n_samples, n_outputs) The predicted classes.
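The soft-voting rule described for predict above can be sketched in a few lines:
import numpy as np\n\n# Per-tree class probabilities: shape (n_trees, n_samples, n_classes).\nproba = np.array([[[0.7, 0.3]], [[0.4, 0.6]], [[0.6, 0.4]]])\nmean_proba = proba.mean(axis=0)        # average over trees (soft vote)\npredicted = mean_proba.argmax(axis=1)  # class with the highest mean probability\nprint(mean_proba, predicted)  # [[0.5667 0.4333]] [0]\n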
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict_proba","title":"predict_proba","text":"predict_proba(X)\n
Predict class probabilities for X.
The predicted class probabilities of an input sample are computed as the mean predicted class probabilities of the trees in the forest. The class probability of a single tree is the fraction of samples of the same class in a leaf.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNClassifier.predict_proba--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
p : ndarray of shape (n_samples, n_classes), or a list of such arrays The class probabilities of the input samples. The order of the classes corresponds to that in the attribute :term:classes_
.
set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor","title":"RandomForestTabPFNRegressor","text":" Bases: RandomForestTabPFNBase
, RandomForestRegressor
RandomForestTabPFNRegressor.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.fit","title":"fit","text":"fit(X, y, sample_weight=None)\n
Fits RandomForestTabPFN :param X: Feature training data :param y: Label training data :param sample_weight: Weights of each sample :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.predict","title":"predict","text":"predict(X)\n
Predict regression target for X.
The predicted regression target of an input sample is computed as the mean predicted regression targets of the trees in the forest.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.predict--parameters","title":"Parameters","text":"X : {array-like, sparse matrix} of shape (n_samples, n_features) The input samples. Internally, its dtype will be converted to dtype=np.float32
. If a sparse matrix is provided, it will be converted into a sparse csr_matrix
.
y : ndarray of shape (n_samples,) or (n_samples, n_outputs) The predicted values.
"},{"location":"reference/tabpfn_extensions/rf_pfn/SklearnBasedRandomForestTabPFN/#tabpfn_extensions.rf_pfn.SklearnBasedRandomForestTabPFN.RandomForestTabPFNRegressor.set_categorical_features","title":"set_categorical_features","text":"set_categorical_features(categorical_features)\n
Sets categorical features :param categorical_features: Categorical features :return: None.
"},{"location":"reference/tabpfn_extensions/rf_pfn/configs/","title":"Configs","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/configs/#tabpfn_extensions.rf_pfn.configs","title":"configs","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/utils/","title":"Utils","text":""},{"location":"reference/tabpfn_extensions/rf_pfn/utils/#tabpfn_extensions.rf_pfn.utils","title":"utils","text":"Copyright 2023
Author: Lukas Schweizer schweizer.lukas@web.de
"},{"location":"reference/tabpfn_extensions/rf_pfn/utils/#tabpfn_extensions.rf_pfn.utils.preprocess_data","title":"preprocess_data","text":"preprocess_data(\n data,\n nan_values=True,\n one_hot_encoding=False,\n normalization=True,\n categorical_indices=None,\n)\n
This method preprocesses data regarding missing values, categorical features, and data normalization (for the kNN model) :param data: Data to preprocess :param nan_values: Preprocesses NaN values if True :param one_hot_encoding: Whether to use one-hot encoding for categorical features :param normalization: Normalizes the data if True :param categorical_indices: Categorical columns of the data :return: Preprocessed version of the data
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/","title":"Scoring utils","text":""},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils","title":"scoring_utils","text":""},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.safe_roc_auc_score","title":"safe_roc_auc_score","text":"safe_roc_auc_score(y_true, y_score, **kwargs)\n
Compute the Area Under the Receiver Operating Characteristic Curve (ROC AUC) score.
This function is a safe wrapper around sklearn.metrics.roc_auc_score
that handles cases where the input data has missing classes, as well as binary classification problems.
Parameters:
Name Type Description Defaulty_true
array-like of shape (n_samples,) True binary labels or binary label indicators.
requiredy_score
array-like of shape (n_samples,) or (n_samples, n_classes) Target scores, can either be probability estimates of the positive class, confidence values, or non-thresholded measure of decisions.
required**kwargs
dict Additional keyword arguments to pass to sklearn.metrics.roc_auc_score
.
{}
Returns:
Name Type Descriptionfloat
The ROC AUC score.
Raises:
Type DescriptionValueError
If there are missing classes in y_true
that cannot be handled.
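For intuition, a wrapper of this kind might look like the following sketch (illustrative only, assuming integer class labels 0..K-1; the library's actual handling may differ):
import numpy as np\nfrom sklearn.metrics import roc_auc_score\n\ndef safe_roc_auc_sketch(y_true, y_score, **kwargs):\n    if y_score.ndim == 2 and y_score.shape[1] == 2:\n        # Binary case: sklearn expects only the positive-class scores.\n        return roc_auc_score(y_true, y_score[:, 1], **kwargs)\n    present = np.unique(y_true)\n    if y_score.ndim == 2 and len(present) < y_score.shape[1]:\n        # Some classes are missing from y_true: keep only observed columns\n        # and renormalize so each row sums to one again.\n        sub = y_score[:, present]\n        sub = sub / sub.sum(axis=1, keepdims=True)\n        return roc_auc_score(y_true, sub, multi_class='ovr', labels=present)\n    return roc_auc_score(y_true, y_score, **kwargs)\n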
score_classification(\n optimize_metric: Literal[\n \"roc\", \"auroc\", \"accuracy\", \"f1\", \"log_loss\"\n ],\n y_true,\n y_pred,\n sample_weight=None,\n *,\n y_pred_is_labels: bool = False\n)\n
General function to score classification predictions.
Parameters:
Name Type Description Defaultoptimize_metric
{\"roc\", \"auroc\", \"accuracy\", \"f1\", \"log_loss\"} The metric to use for scoring the predictions.
requiredy_true
array-like of shape (n_samples,) True labels or binary label indicators.
requiredy_pred
array-like of shape (n_samples,) or (n_samples, n_classes) Predicted labels, probabilities, or confidence values.
requiredsample_weight
array-like of shape (n_samples,), default=None Sample weights.
None
Returns:
Name Type Descriptionfloat
The score for the specified metric.
Raises:
Type DescriptionValueError
If an unknown metric is specified.
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.score_regression","title":"score_regression","text":"score_regression(\n optimize_metric: Literal[\"rmse\", \"mse\", \"mae\"],\n y_true,\n y_pred,\n sample_weight=None,\n)\n
General function to score regression predictions.
Parameters:
Name Type Description Defaultoptimize_metric
{\"rmse\", \"mse\", \"mae\"} The metric to use for scoring the predictions.
requiredy_true
array-like of shape (n_samples,) True target values.
requiredy_pred
array-like of shape (n_samples,) Predicted target values.
requiredsample_weight
array-like of shape (n_samples,), default=None Sample weights.
None
Returns:
Name Type Descriptionfloat
The score for the specified metric.
Raises:
Type DescriptionValueError
If an unknown metric is specified.
"},{"location":"reference/tabpfn_extensions/scoring/scoring_utils/#tabpfn_extensions.scoring.scoring_utils.score_survival","title":"score_survival","text":"score_survival(\n optimize_metric: Literal[\"cindex\"],\n y_true,\n y_pred,\n event_observed,\n sample_weight=None,\n)\n
General function to score survival predictions.
Parameters:
Name Type Description Defaultoptimize_metric
{\"rmse\", \"mse\", \"mae\"} The metric to use for scoring the predictions.
requiredy_true
array-like of shape (n_samples,) True target values.
requiredy_pred
array-like of shape (n_samples,) Predicted target values.
requiredsample_weight
array-like of shape (n_samples,), default=None Sample weights.
None
Returns:
Name Type Descriptionfloat
The score for the specified metric.
Raises:
Type DescriptionValueError
If an unknown metric is specified.
"},{"location":"reference/tabpfn_extensions/sklearn_ensembles/configs/","title":"Configs","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/configs/#tabpfn_extensions.sklearn_ensembles.configs","title":"configs","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/","title":"Meta models","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/#tabpfn_extensions.sklearn_ensembles.meta_models","title":"meta_models","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/meta_models/#tabpfn_extensions.sklearn_ensembles.meta_models.get_tabpfn_outer_ensemble","title":"get_tabpfn_outer_ensemble","text":"get_tabpfn_outer_ensemble(config: TabPFNConfig, **kwargs)\n
This will create a model very similar to our standard TabPFN estimators, but it uses multiple model weights to generate predictions. Thus the configs.TabPFNModelPathsConfig
can contain multiple paths which are all used.
A product of the preprocessor_transforms and paths is created to yield interesting ensemble members.
This only supports multiclass for now. If you want to add regression, you probably want to add the y_transforms to the relevant_config_product. :param config: TabPFNConfig :param kwargs: kwargs are passed to get_single_tabpfn, e.g. device :return: A TabPFNEnsemble, which is a soft voting classifier that mixes multiple standard TabPFN estimators.
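The member-generation step is essentially a Cartesian product; an illustrative sketch (the transform names and checkpoint paths below are hypothetical):
from itertools import product\n\npreprocessor_transforms = ['none', 'quantile', 'power']  # hypothetical names\nmodel_paths = ['weights_a.ckpt', 'weights_b.ckpt']       # hypothetical paths\n\n# One ensemble member per (transform, weights-file) combination.\nensemble_members = [{'transform': t, 'path': p} for t, p in product(preprocessor_transforms, model_paths)]\nprint(len(ensemble_members))  # 6\n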
"},{"location":"reference/tabpfn_extensions/sklearn_ensembles/weighted_ensemble/","title":"Weighted ensemble","text":""},{"location":"reference/tabpfn_extensions/sklearn_ensembles/weighted_ensemble/#tabpfn_extensions.sklearn_ensembles.weighted_ensemble","title":"weighted_ensemble","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/","title":"Experiments","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments","title":"experiments","text":""},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.EmbeddingUnsupervisedExperiment","title":"EmbeddingUnsupervisedExperiment","text":" Bases: Experiment
This class is used to run experiments on synthetic toy functions.
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.GenerateSyntheticDataExperiment","title":"GenerateSyntheticDataExperiment","text":" Bases: Experiment
This class is used to run experiments on generating synthetic data.
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.GenerateSyntheticDataExperiment.run","title":"run","text":"run(tabpfn, **kwargs)\n
:param tabpfn: :param kwargs: indices: list of indices from X features to use :return:
"},{"location":"reference/tabpfn_extensions/unsupervised/experiments/#tabpfn_extensions.unsupervised.experiments.OutlierDetectionUnsupervisedExperiment","title":"OutlierDetectionUnsupervisedExperiment","text":" Bases: Experiment
This class is used to run experiments for outlier detection.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/","title":"Unsupervised","text":""},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised","title":"unsupervised","text":""},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel","title":"TabPFNUnsupervisedModel","text":" Bases: BaseEstimator
TabPFN unsupervised model for imputation, outlier detection, and synthetic data generation.
This model combines a TabPFNClassifier for categorical features and a TabPFNRegressor for numerical features to perform various unsupervised learning tasks on tabular data.
Parameters:
Name Type Description Defaulttabpfn_clf
TabPFNClassifier, optional TabPFNClassifier instance for handling categorical features. If not provided, the model assumes that there are no categorical features in the data.
None
tabpfn_reg
TabPFNRegressor, optional TabPFNRegressor instance for handling numerical features. If not provided, the model assumes that there are no numerical features in the data.
None
Attributes:
Name Type Descriptioncategorical_features
list List of indices of categorical features in the input data.
Example>>> tabpfn_clf = TabPFNClassifier()\n>>> tabpfn_reg = TabPFNRegressor()\n>>> model = TabPFNUnsupervisedModel(tabpfn_clf, tabpfn_reg)\n>>>\n>>> X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]\n>>> model.fit(X)\n>>>\n>>> X_imputed = model.impute(X)\n>>> X_outliers = model.outliers(X)\n>>> X_synthetic = model.generate_synthetic_data(n_samples=100)\n
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.fit","title":"fit","text":"fit(X: ndarray, y: Optional[ndarray] = None) -> None\n
Fit the model to the input data.
Parameters:
Name Type Description DefaultX
array-like of shape (n_samples, n_features) Input data to fit the model.
requiredy
array-like of shape (n_samples,), optional Target values.
None
Returns:
Name Type Descriptionself
None
TabPFNUnsupervisedModel Fitted model.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.generate_synthetic_data","title":"generate_synthetic_data","text":"generate_synthetic_data(\n n_samples=100, t=1.0, n_permutations=3\n)\n
Generate synthetic data using the trained models. Uses the imputation method to generate synthetic data by passing in a matrix of NaNs. Samples are generated feature by feature in one pass, so per feature the generated samples are not dependent on each other.
Parameters:
Name Type Description Defaultn_samples
int, default=100 Number of synthetic samples to generate.
100
t
float, default=1.0 Temperature for sampling from the imputation distribution. Lower values result in more deterministic samples.
1.0
Returns:
Type Descriptiontorch.Tensor of shape (n_samples, n_features) Generated synthetic data.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.get_embeddings","title":"get_embeddings","text":"get_embeddings(\n X: tensor, per_column: bool = False\n) -> tensor\n
Get the transformer embeddings for the test data X.
Parameters:
Name Type Description DefaultX
tensor
required Returns:
Type Descriptiontensor
torch.Tensor of shape (n_samples, embedding_dim)
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.get_embeddings_per_column","title":"get_embeddings_per_column","text":"get_embeddings_per_column(X: tensor) -> tensor\n
Alternative implementation for get_embeddings, where we get the embeddings for each column as a label separately and concatenate the results. This alternative needs more forward passes but may be more accurate.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute","title":"impute","text":"impute(\n X: tensor, t: float = 1e-09, n_permutations: int = 10\n) -> tensor\n
Impute missing values in the input data.
Parameters:
Name Type Description DefaultX
torch.Tensor of shape (n_samples, n_features) Input data with missing values encoded as np.nan.
requiredt
float, default=0.000000001 Temperature for sampling from the imputation distribution. Lower values result in more deterministic imputations.
1e-09
Returns:
Type Descriptiontensor
torch.Tensor of shape (n_samples, n_features) Imputed data with missing values replaced.
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute_","title":"impute_","text":"impute_(\n X: tensor,\n t: float = 1e-09,\n n_permutations: int = 10,\n condition_on_all_features: bool = True,\n) -> tensor\n
Impute missing values (np.nan) in X by sampling all cells independently from the trained models
:param X: Input data of the shape (num_examples, num_features) with missing values encoded as np.nan :param t: Temperature for sampling from the imputation distribution, lower values are more deterministic :return: Imputed data, with missing values replaced
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.impute_single_permutation_","title":"impute_single_permutation_","text":"impute_single_permutation_(\n X: tensor,\n feature_permutation: list[int] | tuple[int],\n t: float = 1e-09,\n condition_on_all_features: bool = True,\n) -> tensor\n
Impute missing values (np.nan) in X by sampling all cells independently from the trained models
:param X: Input data of the shape (num_examples, num_features) with missing values encoded as np.nan :param t: Temperature for sampling from the imputation distribution, lower values are more deterministic :return: Imputed data, with missing values replaced
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.TabPFNUnsupervisedModel.outliers","title":"outliers","text":"outliers(X: tensor, n_permutations: int = 10) -> tensor\n
Preferred implementation for outliers, where we calculate the sample probability for each sample in X by multiplying the probabilities of each feature according to the chain rule of probability. The first feature is estimated using a zero feature as input.
Args X: Samples to calculate the sample probability for, shape (n_samples, n_features)
Returns:
Type Descriptiontensor
Sample unnormalized probability for each sample in X, shape (n_samples,)
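The chain-rule scoring can be sketched as follows (an illustrative pseudolikelihood; in practice summing log-probabilities avoids numerical underflow):
import numpy as np\n\ndef chain_rule_score_sketch(per_feature_probs):\n    # per_feature_probs: list of arrays, each of shape (n_samples,), holding\n    # p(x_j | x_1..x_{j-1}) for every sample. Their product is an\n    # unnormalized sample probability; low values suggest outliers.\n    scores = np.ones_like(per_feature_probs[0])\n    for p in per_feature_probs:\n        scores = scores * p\n    return scores\n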
"},{"location":"reference/tabpfn_extensions/unsupervised/unsupervised/#tabpfn_extensions.unsupervised.unsupervised.efficient_random_permutation_","title":"efficient_random_permutation_","text":"efficient_random_permutation_(indices)\n
Generate a single random permutation from a very large space.
:param n: The size of the permutation (number of elements) :return: A list representing a random permutation of numbers from 0 to n-1
"},{"location":"research/papers/","title":"Papers","text":""},{"location":"research/papers/#tabpfn-followups","title":"TabPFN Followups","text":"Forecastpfn: Synthetically-trained zero-shot forecasting Dooley, Khurana, Mohapatra, Naidu, White Advances in Neural Information Processing Systems, 2024, Volume 36.
Interpretable machine learning for TabPFN Rundel, Kobialka, von Crailsheim, Feurer, Nagler, Rügamer World Conference on Explainable Artificial Intelligence, 2024, Pages 465--476.
Scaling tabpfn: Sketching and feature selection for tabular prior-data fitted networks Feuer, Hegde, Cohen arXiv preprint arXiv:2311.10609, 2023.
In-Context Data Distillation with TabPFN Ma, Thomas, Yu, Caterini arXiv preprint arXiv:2402.06971, 2024.
Tokenize features, enhancing tables: the FT-TABPFN model for tabular classification Liu, Yang, Liang, Pang, Zou arXiv preprint arXiv:2406.06891, 2024.
Towards Localization via Data Embedding for TabPFN Koshil, Nagler, Feurer, Eggensperger NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Enhancing Classification Performance Through the Synergistic Use of XGBoost, TABPFN, and LGBM Models Prabowo, others 2023 15th International Congress on Advanced Applied Informatics Winter (IIAI-AAI-Winter), 2023, Pages 255--259.
The Tabular Foundation Model TabPFN Outperforms Specialized Time Series Forecasting Models Based on Simple Features Hoo, Müller, Salinas, Hutter NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
TabPFGen--Tabular Data Generation with TabPFN Ma, Dankar, Stein, Yu, Caterini arXiv preprint arXiv:2406.05216, 2024.
Drift-Resilient TabPFN: In-Context Learning Temporal Distribution Shifts on Tabular Data Helli, Schnurr, Hollmann, Müller, Hutter arXiv preprint arXiv:2411.10634, 2024.
TabFlex: Scaling Tabular Learning to Millions with Linear Attention Zeng, Kang, Mueller NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Retrieval & Fine-Tuning for In-Context Tabular Models Thomas, Ma, Hosseinzadeh, Golestan, Yu, Volkovs, Caterini arXiv preprint arXiv:2406.05207, 2024.
TabDPT: Scaling Tabular Foundation Models Ma, Thomas, Hosseinzadeh, Kamkari, Labach, Cresswell, Golestan, Yu, Volkovs, Caterini arXiv preprint arXiv:2410.18164, 2024.
Why In-Context Learning Transformers are Tabular Data Classifiers Breejen, Bae, Cha, Yun arXiv preprint arXiv:2405.13396, 2024.
MotherNet: Fast Training and Inference via Hyper-Network Transformers Mueller, Curino, Ramakrishnan NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Mixture of In-Context Prompters for Tabular PFNs Xu, Cirit, Asadi, Sun, Wang arXiv preprint arXiv:2405.16156, 2024.
Fast and Accurate Zero-Training Classification for Tabular Engineering Data Picard, Ahmed arXiv preprint arXiv:2401.06948, 2024.
Fine-Tuning the Retrieval Mechanism for Tabular Deep Learning den Breejen, Bae, Cha, Kim, Koh, Yun NeurIPS 2023 Second Table Representation Learning Workshop, 2023.
TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks Feuer, Schirrmeister, Cherepanova, Hegde, Hutter, Goldblum, Cohen, White arXiv preprint arXiv:2402.11137, 2024.
Exploration of autoregressive models for in-context learning on tabular data Baur, Kim NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
TabMDA: Tabular Manifold Data Augmentation for Any Classifier using Transformers with In-context Subsetting Margeloiu, Bazaga, Simidjievski, Liò, Jamnik arXiv preprint arXiv:2406.01805, 2024.
Large Scale Transfer Learning for Tabular Data via Language Modeling Gardner, Perdomo, Schmidt arXiv preprint arXiv:2406.12031, 2024.
AnnotatedTables: A Large Tabular Dataset with Language Model Annotations Hu, Fountalis, Tian, Vasiloglou arXiv preprint arXiv:2406.16349, 2024.
TabM: Advancing Tabular Deep Learning with Parameter-Efficient Ensembling Gorishniy, Kotelnikov, Babenko arXiv preprint arXiv:2410.24210, 2024.
Pre-Trained Tabular Transformer for Real-Time, Efficient, Stable Radiomics Data Processing: A Comprehensive Study Jiang, Jia, Zhang, Li 2023 IEEE International Conference on E-health Networking, Application & Services (Healthcom), 2023, Pages 276--281.
TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models Margeloiu, Jiang, Simidjievski, Jamnik arXiv preprint arXiv:2409.16118, 2024.
Augmenting Small-size Tabular Data with Class-Specific Energy-Based Models Margeloiu, Jiang, Simidjievski, Jamnik NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
FORECASTPFN: ZERO-SHOT LOW-RESOURCE FORECASTING Khurana, Dooley, Naidu, White, AI No Source, No Year.
What exactly has TabPFN learned to do? McCarter The Third Blogpost Track at ICLR 2024, No Year.
Statistical foundations of prior-data fitted networks Nagler International Conference on Machine Learning, 2023, Pages 25660--25676.
Why In-Context Learning Transformers are Tabular Data Classifiers den Breejen, Bae, Cha, Yun arXiv e-prints, 2024, Pages arXiv--2405.
"},{"location":"research/papers/#tabpfn-application","title":"TabPFN Application","text":"Large-scale chemoproteomics expedites ligand discovery and predicts ligand behavior in cells Offensperger, Tin, Duran-Frigola, Hahn, Dobner, Ende, Strohbach, Rukavina, Brennsteiner, Ogilvie, others Science, 2024, Volume 384, Issue 6694, Pages eadk5864.
Deep learning for cross-selling health insurance classification Chu, Than, Jo 2024 International Conference on Green Energy, Computing and Sustainable Technology (GECOST), 2024, Pages 453--457.
Early fault classification in rotating machinery with limited data using TabPFN Magadán, Roldán-Gómez, Granda, Suárez IEEE Sensors Journal, 2023.
Artificial intelligence-driven predictive framework for early detection of still birth Alzakari, Aldrees, Umer, Cascone, Innab, Ashraf SLAS technology, 2024, Volume 29, Issue 6, Pages 100203.
Prostate Cancer Diagnosis via Visual Representation of Tabular Data and Deep Transfer Learning El-Melegy, Mamdouh, Ali, Badawy, El-Ghar, Alghamdi, El-Baz Bioengineering, 2024, Volume 11, Issue 7, Pages 635.
A machine learning-based approach for individualized prediction of short-term outcomes after anterior cervical corpectomy Karabacak, Schupper, Carr, Margetis Asian Spine Journal, 2024, Volume 18, Issue 4, Pages 541.
Comparing the Performance of a Deep Learning Model (TabPFN) for Predicting River Algal Blooms with Varying Data Composition Yang, Park Journal of Wetlands Research, 2024, Volume 26, Issue 3, Pages 197--203.
Adapting TabPFN for Zero-Inflated Metagenomic Data Perciballi, Granese, Fall, Zehraoui, Prifti, Zucker NeurIPS 2024 Third Table Representation Learning Workshop, No Year.
Comprehensive peripheral blood immunoprofiling reveals five immunotypes with immunotherapy response characteristics in patients with cancer Dyikanov, Zaitsev, Vasileva, Wang, Sokolov, Bolshakov, Frank, Turova, Golubeva, Gantseva, others Cancer Cell, 2024, Volume 42, Issue 5, Pages 759--779.
Predicting dementia in Parkinson's disease on a small tabular dataset using hybrid LightGBM--TabPFN and SHAP Tran, Byeon Digital Health, 2024, Volume 10, Pages 20552076241272585.
Enhancing actuarial non-life pricing models via transformers Brauer European Actuarial Journal, 2024, Pages 1--22.
Machine learning-based diagnostic prediction of minimal change disease: model development study Noda, Ichikawa, Shibagaki Scientific Reports, 2024, Volume 14, Issue 1, Pages 23460.
Using AutoML and generative AI to predict the type of wildfire propagation in Canadian conifer forests Khanmohammadi, Cruz, Perrakis, Alexander, Arashpour Ecological Informatics, 2024, Volume 82, Pages 102711.
Machine learning applications on lunar meteorite minerals: From classification to mechanical properties prediction Peña-Asensio, Trigo-Rodríguez, Sort, Ibáñez-Insa, Rimola International Journal of Mining Science and Technology, 2024.
Data-Driven Prognostication in Distal Medium Vessel Occlusions Using Explainable Machine Learning Karabacak, Ozkara, Faizy, Hardigan, Heit, Lakhani, Margetis, Mocco, Nael, Wintermark, others American Journal of Neuroradiology, 2024.
"},{"location":"tutorials/cheat_sheet/","title":"Cheat Sheet / Best practices","text":"Look at Autogluon cheat sheet [https://auto.gluon.ai/stable/cheatsheet.html]
"},{"location":"tutorials/classification/","title":"Classification","text":"TabPFN provides a powerful interface for handling classification tasks on tabular data. The TabPFNClassifier
class can be used for binary and multi-class classification problems.
Below is an example of how to use TabPFNClassifier
for a multi-class classification task:
from tabpfn_client import TabPFNClassifier\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Load the Iris dataset\nX, y = load_iris(return_X_y=True)\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and train classifier\nclassifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)\nclassifier.fit(X_train, y_train)\n\n# Evaluate\ny_pred = classifier.predict(X_test)\nprint('Test Accuracy:', accuracy_score(y_test, y_pred))\n
from tabpfn import TabPFNClassifier\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import train_test_split\nfrom sklearn.metrics import accuracy_score\n\n# Load the Iris dataset\nX, y = load_iris(return_X_y=True)\n\n# Split data\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Initialize and train classifier\nclassifier = TabPFNClassifier(device='cuda', N_ensemble_configurations=10)\nclassifier.fit(X_train, y_train)\n\n# Evaluate\ny_pred = classifier.predict(X_test)\nprint('Test Accuracy:', accuracy_score(y_test, y_pred))\n
"},{"location":"tutorials/classification/#example-with-autotabpfnclassifier","title":"Example with AutoTabPFNClassifier","text":"Abstract
AutoTabPFNClassifier yields the most accurate predictions for TabPFN and is recommended for most use cases. The AutoTabPFNClassifier and AutoTabPFNRegressor automatically run a hyperparameter search and build an ensemble of strong hyperparameter configurations. You can control the runtime using max_time and need no further adjustments to get the best results.
from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNClassifier\nimport numpy as np\nimport sklearn.metrics\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\n\n# we refer to the PHE variant of TabPFN as AutoTabPFN in the code\nclf = AutoTabPFNClassifier(device='auto', max_time=30)\nX, y = load_breast_cancer(return_X_y=True)\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\n\nclf.fit(X_train, y_train)\n\npreds = clf.predict_proba(X_test)\ny_eval = np.argmax(preds, axis=1)\n\nprint('ROC AUC: ', sklearn.metrics.roc_auc_score(y_test, preds[:,1], multi_class='ovr'), 'Accuracy', sklearn.metrics.accuracy_score(y_test, y_eval))\n
"},{"location":"tutorials/distshift/","title":"TabPFN's Out-of-Distribution Excellence","text":"Recent research demonstrates TabPFN's out-of-distribution (OOD) performance on tabular data, with further improvements through Drift-Resilient modifications.
"},{"location":"tutorials/distshift/#key-performance-metrics","title":"Key Performance Metrics","text":"Model OOD Accuracy OOD ROC AUC TabPFN Base 0.688 0.786 TabPFN + Drift-Resilient 0.744 0.832 XGBoost 0.664 0.754 CatBoost 0.677 0.766"},{"location":"tutorials/distshift/#technical-improvements","title":"Technical Improvements","text":"The Drift-Resilient modifications introduce:
The enhanced model shows robust generalization across:
For comprehensive documentation and implementation details, visit the GitHub repository.
"},{"location":"tutorials/distshift/#citation","title":"Citation","text":"@inproceedings{\n helli2024driftresilient,\n title={Drift-Resilient Tab{PFN}: In-Context Learning Temporal Distribution Shifts on Tabular Data},\n author={Kai Helli and David Schnurr and Noah Hollmann and Samuel M{\\\"u}ller and Frank Hutter},\n booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},\n year={2024},\n url={https://openreview.net/forum?id=p3tSEFMwpG}\n}\n
"},{"location":"tutorials/regression/","title":"Regression","text":"TabPFN can also be applied to regression tasks using the TabPFNRegressor
class. This allows for predictive modeling of continuous outcomes.
An example usage of TabPFNRegressor
is shown below:
from tabpfn_client import TabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn\n\nreg = TabPFNRegressor(device='auto')\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
from tabpfn import TabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn\n\nreg = TabPFNRegressor(device='auto')\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
This example demonstrates how to train and evaluate a regression model. For more details on TabPFNRegressor and its parameters, refer to the API Reference section.
"},{"location":"tutorials/regression/#example-with-autotabpfnregressor","title":"Example with AutoTabPFNRegressor","text":"Abstract
AutoTabPFNRegressor yields the most accurate predictions for TabPFN and is recommended for most use cases. The AutoTabPFNClassifier and AutoTabPFNRegressor automatically run a hyperparameter search and build an ensemble of strong hyperparameter configurations. You can control the runtime using max_time and need no further adjustments to get the best results.
from tabpfn_extensions.post_hoc_ensembles.sklearn_interface import AutoTabPFNRegressor\nfrom sklearn.datasets import load_diabetes\nfrom sklearn.model_selection import train_test_split\nimport numpy as np\nimport sklearn\n\nreg = AutoTabPFNRegressor(max_time=30) # runs for 30 seconds\nX, y = load_diabetes(return_X_y=True)\n\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)\nreg.fit(X_train, y_train)\npreds = reg.predict(X_test)\n\nprint('Mean Squared Error (MSE): ', sklearn.metrics.mean_squared_error(y_test, preds))\nprint('Mean Absolute Error (MAE): ', sklearn.metrics.mean_absolute_error(y_test, preds))\nprint('R-squared (R^2): ', sklearn.metrics.r2_score(y_test, preds))\n
"},{"location":"tutorials/timeseries/","title":"Time Series Tutorial","text":"TabPFN can be used for time series forecasting by framing it as a tabular regression problem. This tutorial demonstrates how to use the TabPFN Time Series package for accurate zero-shot forecasting. It was developed by Shi Bin Hoo, Samuel M\u00fcller, David Salinas and Frank Hutter.
"},{"location":"tutorials/timeseries/#quick-start","title":"Quick Start","text":"First, install the package:
!git clone https://github.com/liam-sbhoo/tabpfn-time-series.git\n!pip install -r tabpfn-time-series/requirements.txt\n
See the demo notebook for a complete example.
"},{"location":"tutorials/timeseries/#how-it-works","title":"How It Works","text":"TabPFN performs time series forecasting by:
This approach provides several benefits:
Join our Discord community for support and discussions about TabPFN time series forecasting.
"},{"location":"tutorials/unsupervised/","title":"Unsupervised functionalities","text":"Warning
This functionality is currently only supported using the Local TabPFN Version but not the API.
"},{"location":"tutorials/unsupervised/#data-generation","title":"Data Generation","text":"import numpy as np\nimport torch\nfrom sklearn.datasets import load_breast_cancer\nfrom sklearn.model_selection import train_test_split\nfrom tabpfn_extensions import TabPFNClassifier, TabPFNRegressor\nfrom tabpfn_extensions import unsupervised\n\n# Load the breast cancer dataset\ndf = load_breast_cancer(return_X_y=False)\nX, y = df[\"data\"], df[\"target\"]\nattribute_names = df[\"feature_names\"]\n\n# Split the data\nX_train, X_test, y_train, y_test = train_test_split(\n X, y, test_size=0.5, random_state=42\n)\n\n# Initialize TabPFN models\nclf = TabPFNClassifier(n_estimators=3)\nreg = TabPFNClassifier(n_estimators=3)\n\n# Initialize unsupervised model\nmodel_unsupervised = unsupervised.TabPFNUnsupervisedModel(\n tabpfn_clf=clf, tabpfn_reg=reg\n)\n\n# Select features for analysis (e.g., first two features)\nfeature_indices = [0, 1]\n\n# Create and run synthetic experiment\nexp_synthetic = unsupervised.experiments.GenerateSyntheticDataExperiment(\n task_type=\"unsupervised\"\n)\n\n# Convert data to torch tensors\nX_tensor = torch.tensor(X_train, dtype=torch.float32)\ny_tensor = torch.tensor(y_train, dtype=torch.float32)\n\n# Run the experiment\nresults = exp_synthetic.run(\n tabpfn=model_unsupervised,\n X=X_tensor,\n y=y_tensor,\n attribute_names=attribute_names,\n temp=1.0,\n n_samples=X_train.shape[0] * 3, # Generate 3x original samples\n indices=feature_indices,\n)\n
"},{"location":"tutorials/unsupervised/#outlier-detection","title":"Outlier Detection","text":"import torch\nfrom sklearn.datasets import load_breast_cancer\nfrom tabpfn_extensions import unsupervised\nfrom tabpfn_extensions import TabPFNClassifier, TabPFNRegressor\n\n# Load data\ndf = load_breast_cancer(return_X_y=False)\nX, y = df[\"data\"], df[\"target\"]\nattribute_names = df[\"feature_names\"]\n\n# Initialize models\nclf = TabPFNClassifier(n_estimators=4)\nreg = TabPFNRegressor(n_estimators=4)\nmodel_unsupervised = unsupervised.TabPFNUnsupervisedModel(\n tabpfn_clf=clf, tabpfn_reg=reg\n)\n\n# Run outlier detection\nexp_outlier = unsupervised.experiments.OutlierDetectionUnsupervisedExperiment(\n task_type=\"unsupervised\"\n)\nresults = exp_outlier.run(\n tabpfn=model_unsupervised,\n X=torch.tensor(X),\n y=torch.tensor(y),\n attribute_names=attribute_names,\n indices=[4, 12], # Analyze features 4 and 12\n)\n
"}]}
\ No newline at end of file
diff --git a/site/sitemap.xml.gz b/site/sitemap.xml.gz
index f9a8f66..ba7cbfd 100644
Binary files a/site/sitemap.xml.gz and b/site/sitemap.xml.gz differ
diff --git a/site/tabpfn-nature/index.html b/site/tabpfn-nature/index.html
index 91bf609..1b4d548 100644
--- a/site/tabpfn-nature/index.html
+++ b/site/tabpfn-nature/index.html
@@ -2640,7 +2640,6 @@ This page contains links to download, install, and set up TabPFN, as well as tutorials and resources to help you get started.