Element Capture #73
Comments
It could also help tools like https://www.chromatic.com/ with visual regression testing! |
We use the browser to create videos for Storrito.com and Audiocado.com, the described feature would have saved me many headaches. I would prefer access to an element's raw pixel buffer. Converting this to JPG, PNG, or compressed video frame beforehand would significantly slow down all video rendering efforts. |
I am also very interested in this API and second that it will be important to get the frame in an uncompressed way. |
This is going to be a big thing, because many websites try to capture elements in order to generate images, as I do at https://startupholic.com/post-thumbnail-maker. (For now, I capture the image with a workaround using canvas.) |
The proposed API would also work very well together with my pixel-density property proposal: This would allow an element to be captured as a raw pixel buffer, while also making it possible to scale the resolution up or down. It would also fit the designers' workflow of exporting assets at different resolutions for different DPI environments (usually 0.5x, 1x, 2x, 4x). |
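As a rough illustration of why the scale factor matters for raw capture, the sketch below estimates the size of one uncompressed RGBA frame at the density factors mentioned above. The function name `rawFrameBytes` and the 1280x720 element size are made up for this example; nothing here is part of either proposal.

```javascript
// Hypothetical helper: bytes needed for one uncompressed RGBA frame.
// cssWidth/cssHeight are the element's size in CSS pixels;
// scale is the pixel-density factor (0.5, 1, 2, 4, ...).
function rawFrameBytes(cssWidth, cssHeight, scale = 1) {
  const width = Math.round(cssWidth * scale);
  const height = Math.round(cssHeight * scale);
  return width * height * 4; // 4 bytes per pixel: R, G, B, A
}

// Print the per-frame cost of an assumed 1280x720 element at each factor.
for (const scale of [0.5, 1, 2, 4]) {
  console.log(`${scale}x: ${rawFrameBytes(1280, 720, scale)} bytes per frame`);
}
```

The cost grows with the square of the scale factor, so a 4x capture needs 16 times the memory of a 1x capture of the same element.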
A repo is now live at https://github.com/WICG/element-capture Happy incubating!! |
It's a small detail, but this should probably be a global function (ex. getElementCapture) or have a much less generic (and longer) name. Adding a property to HTMLElement prevents adding a reflected attribute named My suggestion is to extend getDisplayMedia to accept an element option. Capturing an element or the screen/tab is pretty much the same thing security-wise. Since this will prompt anyway, that's a small surface area change. |
I'm a fan of extending |
One possibility I would like would be
Communicating to the user what kind of permissions they would be giving is going to be quite tricky whichever way we choose, because the nature and contents of an Element can change; think of the contents of a div or iframe, for instance. For that reason, I am mostly thinking about prompting the user for permission to capture the entire tab, and then extrapolating from that permission a permission to capture occluded elements. The delta in permission seems manageable to me. I intend to do a write-up on that. If we do end up going with that approach, then after prompting the user once to share their entire tab, you'd be able to capture any and all elements. |
Why would this API even need permissions in the first place, unless it is for something like a 3rd party iframe / cross-origin resource? I could just as well replicate the functionality with a library like html2canvas and get the underlying pixel data without having to ask for permissions. I understand it when you share your desktop etc., since that goes beyond the current browser window, but since all the HTML data is already accessible, why would getting the pixel data of an element suddenly be an elevated security risk? For a lot of use cases, like capturing a preview, it would be very strange for a user to be suddenly prompted with a security warning, which might induce FUD for no reason. |
Just to clarify further, one of the main use cases for the API would be to generate previews of content If this API should truly replace that need for doing it server side, then permissions are the wrong approach. To summarize: |
For anyone not aware, it's worth mentioning that html2canvas only attempts to re-draw the DOM using its own heuristics and the drawing tools available in the canvas. It has no way of asking the browser for an image of any portion of the page. It is imperfect for many edge cases and requires workarounds if the content it's drawing has anything cross-origin anywhere in the tree it's trying to draw (you can taint the canvas very easily). From the user's perspective, this feature adds the ability to create a live video stream of any part of the page and stream it out to anywhere. As a user, if I visit a webpage, I don't expect a livestream of what I'm looking at to start happening without my awareness. So for me the crux of it is that
With the prevalence of 3rd party scripts being included on websites these days, including the trend of serving these from 1st party domains to avoid blockers, giving all of these scripts the ability to flip on a livestream without intervention feels like a step too far. |
A script could just as easily record snapshots of document.body.innerHTML and "replay" the session on a backend if it had malicious intent, without the user ever knowing. I really fail to see any added security benefit from asking for permissions. Any data point that would go into creating a screenshot / video is already available to the browser (via parsing .getBoundingClientRect and computed style, as well as capturing event listeners). |
The modern internet is already a nightmare in terms of forced cookie banners on every site that require user consent. |
Understand where you're coming from, I think it's a very good point. From a pure security perspective this doesn't expose data not already exposed in some other fashion to the code running on the page (I think?). In that sense, I'm on the same page. For me it comes down to the user being aware of what is happening / understanding what the browser is doing on their behalf (facilitating the video capture). That could be a permission flow, it could also be a box around the streamed area or a recording icon in the tab. Without any indication that it's happening I would personally just disable the feature. I realise I might be an outlier here though 😛 |
If the tab indicates that a capture is happening (especially for longer time frames such as a video stream), then that could be a good non-invasive compromise that does not disrupt the user / developer flow, similar to how it's done for tabs that play audio, capture the webcam, etc. In terms of API, the other important thing is compression. It will be crucial to expose an API that is able to capture an uncompressed screenshot (PNG / Uint8Array). Forcing compression on the video stream would be a solution, but it would break use cases such as high-quality video export (think video editors etc.). |
Regarding the security model: For the canvas element, as soon as a third party resource gets used to draw on a canvas (for example loading a network image without CORS), the canvas gets "tainted" and it is not possible to grab an image from it anymore. If it's hard for browsers to handle these security concerns, then a permission prompt should be introduced. But if the browser already has a notion of whether the DOM is tainted by third party content, then it would be cool to use it and allow capture of locally generated content without a permission prompt. |
@JonnyBurger, the proposal assumes (1) cross-origin isolation and (2) an opt-in header, likely in the form of a Required Document Policy. At the moment, this is briefly covered in the original message in this thread. I will expand on this later. To ensure good velocity, I am starting out with the assumption of a prompt. |
Having thought about this in the meantime, another approach occurs to me, which is relevant if the user has already initiated capture using pre-existing means such as So where we already have Region Capture mutate an existing video track:
track.cropTo(cropTarget); // Region Capture
We could now have:
track.restrictCaptureTo(cropTarget); // Element Capture
The resulting track would behave the same as previously, but:
|
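For concreteness, here is a minimal sketch of that flow, assuming `restrictCaptureTo` ends up as a promise-returning method on the video track (as `cropTo` is in Region Capture). `captureRestricted` is a hypothetical helper name, and `mediaDevices` is passed in as a parameter (in a page it would be `navigator.mediaDevices`) so the function can be exercised with a stub.

```javascript
// Sketch only: restrictCaptureTo is the name floated in this thread, not a
// confirmed API, so its presence is feature-detected before use.
async function captureRestricted(mediaDevices, target) {
  // Existing API: prompts the user to pick a surface to share.
  const stream = await mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();
  if (!track) {
    throw new Error("No video track in the captured stream");
  }
  if (typeof track.restrictCaptureTo !== "function") {
    track.stop(); // do not leave an unrestricted capture running
    throw new Error("restrictCaptureTo is not supported");
  }
  // Assumed behaviour: resolves once frames contain only the target.
  await track.restrictCaptureTo(target);
  return track;
}
```

Stopping the track on the unsupported path is a design choice of this sketch: it avoids silently falling back to capturing the whole tab.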
This bit is very interesting. I'm imagining capturing an element that represents a video game being played, say a canvas, but I need to define where the audio should be captured from. That could be a separate DOM tree that contains some Maybe audio source can be the second argument? |
There are separate tracks for audio and video, so I think it's enough to |
Ah of course, I forgot for a sec we are operating on the _video_Track |
It's interesting, but with this approach it will be difficult to capture multiple elements. There is a draft of getDisplayMediaSet for capturing multiple display surfaces, but it's about capturing multiple tabs or windows, not multiple |
You could:
const track2 = track1.clone();
track1.restrictCaptureTo(cropTarget1);
track2.restrictCaptureTo(cropTarget2);
Implementation-wise, this will require some work in Chrome to support. But in terms of API shape, this should work. |
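The clone-then-restrict pattern generalizes to any number of targets. A hedged sketch follows, assuming `restrictCaptureTo` exists on the track and returns a promise; `restrictToEach` is a hypothetical helper name, and whether a browser would support several differently restricted clones is an open question, as noted above.

```javascript
// Sketch only: returns one restricted track per target.
// The original track serves the first target; each further target gets a clone.
async function restrictToEach(track, targets) {
  // Create all clones first, before any track is restricted,
  // mirroring the order of the snippet above.
  const tracks = targets.map((_, i) => (i === 0 ? track : track.clone()));
  await Promise.all(tracks.map((t, i) => t.restrictCaptureTo(targets[i])));
  return tracks;
}
```

Usage would look like `const [t1, t2] = await restrictToEach(track, [cropTarget1, cropTarget2]);`, with each resulting track attached to its own stream or video element.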
There's also a more general requirement to be able to rasterize DOM elements without the output format being video. I realise the semantics and use cases can be different, as the rasterized DOM elements can be updated at any time, which is automatically handled by the MediaStream. But I guarantee that if only a MediaStream API is offered, it will be (ab)used for the [snapshot] rasterizing use-case, given the plethora of hacky solutions that exist: Something to think about? |
Definitely screenshot-production is an important use-case that interests many (me included). I think we should keep in mind that:
|
1 is a valid point. However, for 2 & 3: Perhaps the option of a 'raw' video stream with raw RGBA pixel 'encoding' and a single frame (a 0fps option) would allow for greater freedom of use here? It would adhere to the spec proposal, would be compatible with the Element Capture API, and would solve the rasterization use-case. |
Indeed. And that's why I had previously proposed a dedicated API for screenshots. And such an API could be extended to support capturing an element. Until such a screenshot-grabbing API is introduced, the current Element Capture API can be used to polyfill, but it's of course imperfect when tackling missions other than that to which it is tailored (video).
Why would there be visual loss? There should be no lossy encoding involved.
That's an interesting consideration. Thanks for raising it. I'll keep it in mind.
I don't think the word "abusive" applies here. I would associate it more with the idea that the way the API is used is harmful to the user, which implies the misuse should somehow be blocked. |
TL;DR
The proposed API will allow a website to capture an HTML element as a video stream. Only the target element and its descendant elements will be captured. Parent and sibling elements will not be captured, even if they draw over/under the target element.
In a way, this can be thought of as similar to HTMLCanvasElement.captureStream(), expanded to any HTMLElement, with some additional security gating added to address the concern of leaking cross-origin content.
Introduction
State of the Art
The Web Platform currently offers the ability to screen-capture using getDisplayMedia(). The resulting MediaStream, composed of video and potentially also audio, can be manipulated locally and/or transmitted remotely. Some common use-cases include:
It is also possible to crop the resulting video using Region Capture. This is useful in multiple scenarios:
Issues
The main technique available nowadays is to use getDisplayMedia() and Region Capture; in some ways, it's the only technique. One major downside is that this captures both occluding as well as occluded content.
Consider this example, where the progress bar is occluding content which is unintentionally captured:
Unintentionally capturing occluding content is usually the problem, but occluded content is also a concern if the target element is partially transparent or has bevelled edges.
Proposed Solution
Add a method along the lines of:
Initially, scope the discussion to a MediaStream with a single video MediaStreamTrack. Audio is an obvious follow-up, to be discussed later.
Successful invocations of this method are only possible from contexts that have:
The second requirement could be tricky, as sibling elements could affect the shape of the target element, and so information could leak. As a start, we could go with requiring opt-in from all of the content in the page, and later explore if this requirement can be safely reduced in scope for elements whose rendering is unaffected by adjacent elements, perhaps via some CSS property or some other sandboxing property.
The necessity of a user prompt is debatable. Ideally, requirements 1 and 2 should ensure that any content captured is either already known to the capturing page, or can be known to it via communication with embedded documents. (If those documents opted in to being captured, it is arguable that they would be willing to transmit such information.) However, some non-DOM information could leak, like link-purpling. It is therefore best to start with a required user prompt, and re-examine that requirement later.
Privacy & Security Considerations
Pending.