Introduction
Even after our Data API and Chat API become public, there will still be use cases that are not easily covered by them. These use cases can sometimes be addressed by scraping the content of the page that the viewer is seeing.
We primarily recommend scraping to developers of browser extensions and desktop apps whose tools are not BEAM-specific and who want to avoid integrating the "Login with BEAM" functionality just to get an Access Token.
We strongly recommend that you consider our APIs first and resort to scraping only when they cannot cover your use case. Only a limited set of use cases falls under our fair use policy; anything beyond them will be considered bot behavior and may be blocked.
How do I know if what I am building may be considered bot behavior?
If any of these statements apply to the tool you are building, its use would not fall under fair use for scraping:
- The data I am scraping from the site is already available via one of the official public APIs without requiring a User Access Token (Note: since chat events usually require a User Access Token, scraping chat is a valid use case).
- I am submitting content to the site on behalf of the user (e.g., chat message, search query, account login or signup)
- I am launching a new tab to get updated content rather than waiting for real-time updates to happen on the page
- I am trying to extract a single value (e.g., live viewer counter) from the page and not a repeatable section (e.g., chat message)
- I am trying to save the entirety of the site (e.g., full page as HTML) and not an isolated section
If in doubt, you can always join our Discord server and ask there, or send us an email at developer@beamstream.gg.
Getting Started
BEAM® uses a syntax inspired by RDFa (Resource Description Framework in Attributes) to annotate HTML elements that contain structured information. Because BEAM is highly customizable, the markup that you see in your own browser may differ significantly from what other users are getting.
Using the RDFa specs, you can query the page's DOM tree with your favorite tool to extract the information that you need. Sometimes you will be able to extract information that is not visible on the page because the user has hidden it. Examples include timestamps when the time display is hidden, and full names or profile links when the user has disabled showing them.
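As an illustration of querying by RDFa-style attributes, here is a minimal sketch in Python using only the standard library. The attribute values are assumptions made up for this example: the type name "ChatMessage" and the property names "author", "timestamp", and "text" are hypothetical, and the real names are defined in the specs for each content type. The sketch also assumes well-formed markup (every non-void element is closed) and reads a value from the `content` attribute when present, falling back to the element's text, which is how a hidden timestamp could still be extracted.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class RdfaCollector(HTMLParser):
    """Collects elements with a given `typeof` and their `property` values."""

    def __init__(self, wanted_type):
        super().__init__()
        self.wanted_type = wanted_type
        self.items = []      # one dict per matched item
        self._item = None    # item currently being read, if any
        self._depth = 0      # open non-void elements inside the current item
        self._prop = None    # (name, depth) of the property being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._item is None:
            # Outside an item: only react to the start of a matching item.
            if attrs.get("typeof") == self.wanted_type and tag not in VOID_TAGS:
                self._item, self._depth = {}, 1
            return
        name = attrs.get("property")
        content = attrs.get("content")
        if tag in VOID_TAGS:
            # A void element can only carry its value in `content`.
            if name and content is not None:
                self._item[name] = content
            return
        self._depth += 1
        # Properties nested inside another property are ignored in this sketch.
        if name and self._prop is None:
            if content is not None:
                self._item[name] = content
            else:
                self._item[name] = ""
                self._prop = (name, self._depth)

    def handle_data(self, data):
        if self._prop is not None:
            self._item[self._prop[0]] += data

    def handle_endtag(self, tag):
        if self._item is None or tag in VOID_TAGS:
            return
        if self._prop is not None and self._prop[1] == self._depth:
            name = self._prop[0]
            self._item[name] = self._item[name].strip()
            self._prop = None
        self._depth -= 1
        if self._depth == 0:
            self.items.append(self._item)
            self._item = None


def extract_items(html, wanted_type):
    """Return a list of {property: value} dicts for each matching item."""
    parser = RdfaCollector(wanted_type)
    parser.feed(html)
    parser.close()
    return parser.items


# Hypothetical markup; real pages will use the names given in the specs.
SAMPLE = """
<div class="chat">
  <div typeof="ChatMessage">
    <a property="author" href="/u/alice">alice</a>
    <time property="timestamp" content="2024-01-01T12:00:00Z" hidden></time>
    <span property="text">hello <b>world</b></span>
  </div>
  <div typeof="ChatMessage">
    <a property="author" href="/u/bob">bob</a>
    <span property="text">hi</span>
  </div>
</div>
"""

if __name__ == "__main__":
    for message in extract_items(SAMPLE, "ChatMessage"):
        print(message)
```

In a browser extension the same idea is usually expressed with attribute selectors on the live DOM; the parser above is only meant to show that the extraction keys on the RDFa-style attributes rather than on class names or element order, which may differ between users.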
You won't be able to scrape more information than the user could potentially see on the site. Anything that is scrapable but not currently visible is something the user could reveal by changing their settings.
Users may have ad-blocking or userscript extensions installed, which may alter the DOM. While there is no reason for these extensions to touch our RDFa tags, they can still negatively impact your scraping code in other ways. If a user complains about your code not working, ask them which extensions they are using and suggest that they temporarily disable them.
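One way to make scraping code more tolerant of an altered DOM is to check each scraped item before using it, and to set incomplete items aside instead of failing. The following is a minimal sketch under stated assumptions: scraped items are plain dictionaries, and the required field names "author" and "text" are hypothetical placeholders for whatever your tool actually needs.

```python
def split_usable(items, required=("author", "text")):
    """Separate complete items from ones missing required fields.

    Returns (usable, broken), where each entry of `broken` is a tuple of
    the item and the list of required names that were missing or empty.
    """
    usable, broken = [], []
    for item in items:
        missing = [name for name in required if not item.get(name)]
        if missing:
            broken.append((item, missing))
        else:
            usable.append(item)
    return usable, broken


if __name__ == "__main__":
    scraped = [
        {"author": "alice", "text": "hello"},
        {"text": "author element was removed by an extension"},
    ]
    usable, broken = split_usable(scraped)
    print(f"{len(usable)} usable, {len(broken)} incomplete")
```

A steadily growing share of incomplete items for one user is a hint that something on their side is modifying the page, which is a good moment to ask about installed extensions.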
If you are sure you want to scrape BEAM's pages, use the menu to navigate to the content that you are interested in and learn its specs.